Thursday, 2017-10-12

pabelangermaybe it isn't fixed00:00
pabelangerianw: mind adding http://logs.openstack.org/16/502316/3/check/gate-openstackci-beaker-ubuntu-trusty/241fbcf/console.html#_2017-10-11_23_55_43_441780 to your list of things to fix?00:00
pabelangerthat is blocking system-config patches from landing00:00
pabelangerI'll look at mirror-update.o.o again00:01
*** vhosakot has joined #openstack-infra00:01
ianwhmm, ok00:02
jeblairpabelanger: let me know what you see -- i don't understand what i'm seeing in /var/log/reprepro/ubuntu-mirror.log00:02
pabelangerjeblair: I deleted the lockfile and I'm manually running reprepro update on ubuntu mirror00:03
*** yamahata has quit IRC00:03
pabelangerI _think_ we need to increase our timeout from 30mins to longer00:04
jeblairpabelanger: ok, that explains the abbreviated output00:04
pabelangerwhich then kills reprepro and leaves lockfile00:04
*** dingyichen has joined #openstack-infra00:04
*** srobert_ has joined #openstack-infra00:04
jeblairpabelanger: we still only release the volume if it's successful, right?  (so i'm curious how we're getting out of sync)00:04
pabelangerbut, a few hours ago, I did run reprepro check and checkpool fast, and things looked correct00:04
pabelangerjeblair: right, should only vos release when check / checkpool pass00:05
pabelanger  processing updates for 'xenial-security|main|amd64'00:05
pabelangercurrently00:05
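[editor's note: the update flow being debugged here, restated as a minimal shell sketch; paths, the lock fd, and the volume name are assumptions for illustration, not the real system-config script. The key points from the discussion: the run is bounded by a ~30 minute timeout, a killed reprepro leaves its own db lockfile behind, and vos release only happens after the checks pass.]
    # hypothetical sketch of the reprepro mirror-update pattern (assumed paths)
    exec 9>/var/run/reprepro/ubuntu.lock        # flock dir that comes up later in this log
    flock -n 9 || exit 1                        # bail if a run is already in progress
    if timeout 30m reprepro --confdir /etc/reprepro/ubuntu update; then
        reprepro --confdir /etc/reprepro/ubuntu check &&
        reprepro --confdir /etc/reprepro/ubuntu checkpool fast &&
        vos release mirror.ubuntu               # publish the AFS volume only on success
    fi
    # if timeout fires, reprepro gets SIGTERMed mid-run and leaves its own
    # db lockfile behind, which later runs refuse to start over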
*** gmann_afk is now known as gmann00:06
*** ijw has quit IRC00:07
*** ijw has joined #openstack-infra00:08
*** sree has joined #openstack-infra00:08
*** srobert has quit IRC00:08
*** Swami has quit IRC00:08
ianwpabelanger: the fact this is in keyring & cryptography ... possibly related to missing packages?00:08
ianwit does not trivially reproduce on a trusty node in a virtualenv00:09
*** vhosakot has quit IRC00:09
pabelangerianw: ya, I think I'm going to see about an autohold, once I figure out reprepro issue00:10
ianwlet me add that and kick one off00:10
*** sree has quit IRC00:12
*** yamahata has joined #openstack-infra00:12
*** sbezverk has joined #openstack-infra00:16
*** edmondsw has joined #openstack-infra00:16
ianwpabelanger: http://paste.openstack.org/show/623394/00:17
ianwerror in cryptography setup command: Invalid environment marker: python_version < '3'00:17
*** Goneri has quit IRC00:17
clarkboh swift was running into that too00:17
notmynameyup00:18
notmynamehad to update setuptools to a newer-than-distro version00:18
ianwhmm, the root cause seems to be pip 7 ish00:18
ianwwhy is the latest pip not on trusty00:18
ianwto the build logs!00:19
ianwhttp://logs.openstack.org/16/502316/3/check/gate-openstackci-beaker-ubuntu-trusty/241fbcf/console.html#_2017-10-11_23_51_08_43158800:20
ianwsudo pip install 'pip<8' 'virtualenv<14'00:20
ianwwhy would you do that00:20
*** edmondsw has quit IRC00:21
ianwbecause http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/jobs/macros.yaml#n569 ... hmmm00:21
ianwbecause of this https://review.openstack.org/#/c/270995/ ~2 year old patch00:22
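[editor's note: hedged reconstruction of the failure mode, assuming the usual cause: the setuptools bundled by pip<8-era virtualenvs predates PEP 508, so it rejects environment markers like python_version < '3' in modern cryptography/keyring sdists. Roughly:]
    sudo pip install 'pip<8' 'virtualenv<14'   # the pin from the macro above
    virtualenv demo && demo/bin/pip install cryptography
    # -> error in cryptography setup command:
    #    Invalid environment marker: python_version < '3'
    demo/bin/pip install -U setuptools         # the workaround notmyname describes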
jeblairdoes that mean that the ubuntu mirror is actually okay?00:22
*** gouthamr has joined #openstack-infra00:24
pabelangerno, I think there is an issue with the reprepro database for xenial-security00:26
*** vhosakot has joined #openstack-infra00:27
jeblairpabelanger: why's that?00:27
jeblairtell me what you're looking at and what you see00:27
pabelangersure, 1 sec00:28
pabelangerlet me get pastebin00:28
openstackgerritIan Wienand proposed openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs"  https://review.openstack.org/51136000:29
ianwjeblair: remember what that's all about ^ ?  i'm guessing no00:29
*** jkilpatr has quit IRC00:29
jeblairianw: nope, sorry.00:30
openstackgerritIan Wienand proposed openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs"  https://review.openstack.org/51136000:30
pabelangerjeblair: http://paste.openstack.org/show/623396/00:30
pabelanger  processing updates for 'u|xenial-security|main|amd64'00:30
pabelangerdoesn't look correct00:30
*** caphrim007 has joined #openstack-infra00:31
*** andreas_s has joined #openstack-infra00:31
pabelangerand reprepro update command appears to have hung on what I posted00:31
pabelangerI did use strace to look at pid, but I didn't see much going on00:31
ianwpabelanger: dmesg ... i had an issue with zero sized files on AFS the other day doing the ceph stuff00:32
ianw?00:32
ianwi had to clear everything out00:32
jeblairi'm stracing it now, and it does not seem to be working -- i haven't even gotten the current system call returned.00:32
pabelangerianw: clear out from where?00:32
ianwsorry, clear out the mirror and restart00:33
ianwbut this was just for the ceph luminous, so not big00:33
pabelangerah00:33
pabelangeryah, I am hoping we don't need to do the same00:33
mnaserxenial-security is probably not that big, unless you have to wipe everything :(00:34
pabelangerright, I _think_ i've already cleared the files on xenial-security, but update still not happy00:34
openstackgerritIan Wienand proposed openstack-infra/openstack-zuul-jobs master: Remove pin pip from beaker legacy jobs  https://review.openstack.org/51136100:34
pabelangerwhich makes me think, when timeout killed reprepro before, something may have gotten corrupted in the database00:35
pabelangerwhich, is possible, according to the warning it prints00:35
jeblairpabelanger: so let's assume there's something unhappy about the afs client on mirror-update that has it stuck.  how is it possible that we released an inconsistent volume?00:35
*** andreas_s has quit IRC00:35
ianwjeblair: of course, if we merge that unpin ... then do we break things even worse? :/00:35
*** Apoorva_ has joined #openstack-infra00:36
jeblairianw: zuulv3 would tell us pre-merge :/00:36
pabelangerjeblair: I don't think we did. I mean, I manually released a few hours ago, because I thought things were okay. But I believe the actual issue is, we build newer images with new packages, but our indexes were still old00:36
jeblairpabelanger: ooh... so the underlying fix we need is to use our own mirrors when building images?00:37
pabelangerand when we run apt-get install on bindep-fallback, it fails because we still point to old packages, while it expects newer00:37
pabelangerjeblair: Yah, that is possible00:37
pabelangerit would help prevent something like this I think00:37
jeblairpabelanger: what do you mean by 'point to old packages'?00:37
pabelangerI am speculating here, but give me a second00:38
*** srobert_ has quit IRC00:38
pabelangerjeblair: http://logs.openstack.org/c7/c722a78bea5d1a75cb204cc783b2480131bd5bc4/post/static-election-publish/d11a220/console.html#_2017-10-11_01_54_31_75558800:38
jeblairpabelanger: do you mean the index on the image is out of date?  cause i thought the first thing we do after configure-mirrors is to apt-get update.  the only thing i could see that could cause an inconsistency is to actually have a package installed on the image that causes a conflict00:38
pabelangerthat error to me, means we already have libcurl4-gnutls-dev installed, but it is a newer version than what our index is saying it should be00:39
*** Apoorva has quit IRC00:39
pabelangerI think the indexes on the image are newer than the AFS mirrors, but because we apt-get clean in configure_mirror today, the image boots properly00:40
pabelangerthen, once we hit the mirror's old indexes, apt-get gets confused00:40
jeblairapt-get clears out the index; clean should just clear out cached packages, i think.00:40
*** Apoorva_ has quit IRC00:40
jeblairi looked on a ready xenial node and see i  libcurl3-gnutls:amd64              7.47.0-1ubuntu2.2 installed, no libcurl400:41
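[editor's note: a quick way to check the image-vs-mirror skew being discussed; these are assumed diagnostic commands, not a record of what anyone ran:]
    apt-get update                            # pull indexes from the AFS mirror
    dpkg -s libcurl3-gnutls | grep Version    # version baked into the image
    apt-cache policy libcurl3-gnutls          # candidate versions the mirror index offers
    # if the installed version is newer than every candidate, installing
    # libcurl4-gnutls-dev fails on its exact-version dependency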
pabelangerthe odd thing is, I _think_ this might work on zuulv3 jobs. I say think because I thought I saw a job properly pass an hour ago00:41
* EmilienM online for the next hour if needed00:41
pabelangerokay, it is possible I am wrong. So, please look and see if you find anything00:42
EmilienMpabelanger: see #tripleo when you can00:43
jeblairpabelanger: is that the current repo error, or the previous one?00:45
*** thorst has joined #openstack-infra00:46
*** thorst has quit IRC00:46
pabelangerjeblair: I believe that has been the issue all along, clarkb right?00:46
jeblairi just ran those commands on the xenial node i logged into and they worked00:47
pabelangerya00:47
pabelangerhttp://logs.openstack.org/55/511255/1/check/legacy-devstack-gate-tox-py3-run-tests/8134612/job-output.txt.gz#_2017-10-11_15_52_26_84205000:47
jeblairpabelanger: those are both old runs though, are we sure that's still a problem?00:48
pabelangerjeblair: which cloud?00:48
jeblairpabelanger: rax-ord00:48
pabelangerjeblair: no, I'm not 100% it is an issue still.00:49
pabelangerI thought I fixed it a few hours ago00:49
pabelangerbut, when I started running reprepro update manually, and it stopped (hung), I assumed it still wasn't fixed00:49
pabelangerso, possible this is a 2nd (new) issue00:50
jeblairokay here's the recent error: http://logs.openstack.org/56/511356/1/check/gate-election-python35/e52bb05/console.html00:50
*** rook is now known as rook-afk00:50
jeblairhttps://etherpad.openstack.org/p/fkQc9nXfgN00:51
pabelangerthat is also ovh00:51
mnaserjeblair i think what happens is 2.2 is installed, but then if you try and do apt-get install libcurl3-gnutls-devel, it will try to pull 2.300:51
*** s-shiono has joined #openstack-infra00:51
mnaserbecause it tries to install libcurl3-gnutls-devel-<whatever>-2.300:51
mnaserand that wants libcurl3-gnutls-<foo>-2.3 which does not exist in the mirrors00:52
*** LindaWang has joined #openstack-infra00:52
mnaser(or didn't this morning at least)00:52
clarkbthe gnutls one is the ine weve had all along00:52
jeblairmnaser: right, though i just ran those commands on a 30m old rax-ord node and it only wanted to install 2.200:52
mnaserapt-get update before doing that jeblair ?00:52
jeblairso the question i now have is: under what circumstances does it want to install 2.300:53
jeblairmnaser: yes00:53
jeblairpabelanger seems to be suggesting we should look at the cloud region as a nexus00:53
tonybpabelanger: I'm still getting the gnutls issue :(00:53
mnaseryou're bringing up a good point here, the volume would have not been released00:53
mnasertonyb do you have logs of a failed job?00:53
pabelangeryah, we are debugging now00:54
jeblairmnaser, tonyb: i started an etherpad and put tonyb's links there: https://etherpad.openstack.org/p/fkQc9nXfgN00:54
tonybmnaser: reykjavik00:54
jeblairboth ovh-gra100:54
*** lewo` has quit IRC00:54
tonybmnaser: http://logs.openstack.org/56/511356/1/check/gate-election-python27-ubuntu-xenial/695ae09/console.html#_2017-10-12_00_09_31_881052 stupid clipboard00:54
mnasertonyb np :)00:55
mnaserjeblair ill try and search elasticsearch and see if the theory of ovh-gra1 only holds00:55
tonybjeblair: Thanks.00:55
jeblairmnaser: cool, i'll try to see what i can find out about those nodes and image build times00:55
*** sbezverk has quit IRC00:55
jeblairwe put the image build information on the node.  we don't output it in all jobs.  :(00:56
*** dhinesh has quit IRC00:56
mnaserjeblair do you build the image multiple times for different formats in nodepool00:57
mnaseror build once then convert00:57
pabelangerOh00:57
clarkbmnaser: once and convert00:57
jeblairmnaser: once then convert00:57
pabelangerjeblair: I think rax are using the old images00:57
jeblairpabelanger: that's not surprising00:58
pabelangerjeblair: I can see in nodepool we are still trying to upload xenial images00:58
jeblairpabelanger: so the cloud connection is "broken everywhere but rax"00:58
pabelangerif so, they were the last good images before the breakage00:58
jeblairi'm going to assume that's the case for the moment and stop my investigations00:58
dmsimardWow, achievement unlocked. ARA mentioned in top comment of a frontpage HackerNews thread (without getting thrashed) https://news.ycombinator.com/item?id=1545059400:58
pabelangerkk00:59
*** priteau has joined #openstack-infra00:59
jeblairdmsimard: congrats!00:59
*** cuongnv has joined #openstack-infra00:59
jeblairi'm going to see if i can get myself on a non-rax node00:59
mnaserelasticsearch isn't cooperating, it shows bars for events but the messages are not showing things :<00:59
mnaseror at least it's taking a loooong time to load01:00
*** namnh has joined #openstack-infra01:00
clarkbmnaser: ya I noticed e-r is out to lunch too will have to investigate in the morning01:00
jeblairii  curl                               7.47.0-1ubuntu2.3                     amd64        command line tool for transferring data with URL syntax01:00
mnaserthere we have it01:00
dmsimardSorry for distracting from the issues, I'll go back to my cave01:00
* tonyb is going to take a tangent and try to create a minimal bindep.txt for the election repo01:01
jeblairokay, so it does look like the problem is that images are newer than mirrors01:01
*** jcoufal has joined #openstack-infra01:01
jeblairtonyb: time well spent regardless!01:01
pabelangerso, when I checked a few hours ago, I must have been looking at a rax node01:01
tonybjeblair: Yeah I've been putting it off as 'hard'01:01
jeblairso fixes are: short-term: get a mirror update finished and released. long-term: build images with our mirrors01:01
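[editor's note: the failure mode jeblair just summarized, sketched with assumed version strings: the -dev package pins the library to an exact version, so a stale mirror index cannot satisfy an image built against a newer upstream archive:]
    # image (built from upstream) has: libcurl3-gnutls 7.47.0-1ubuntu2.3
    # mirror index still offers:       libcurl4-gnutls-dev 7.47.0-1ubuntu2.2
    #   which Depends: libcurl3-gnutls (= 7.47.0-1ubuntu2.2)
    apt-get install libcurl4-gnutls-dev
    # -> unmet dependency error (paraphrased), since apt will not silently
    #    downgrade the already-installed libcurl3-gnutls to match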
mnaserjeblair: compounded alongside the mirrors failing to update, boo01:01
fungitonyb: not hard at all. adding a bindep.txt is self-testing01:02
tonybfungi: hehe okay01:02
*** aeng has quit IRC01:02
pabelangerYah, and reprepro update still hasn't moved past the pastebin from above, so I am guessing we've corrupted something with the timeout command01:03
jeblairpabelanger: if you think afs is being weird, how about we reboot mirror-update?01:03
fungitonyb: just take the http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/data/bindep-fallback.txt and whittle it down to the things you think your jobs for that repo will need from a distro package perspective. odds are, on the election repo, the answer is "very little"01:03
pabelangerjeblair: ya, happy to try that01:03
jeblairpabelanger: the main process is still stuck and doing nothing01:03
jeblairi'm less inclined to think it's corruption and more inclined to think it's afs01:03
*** priteau has quit IRC01:04
pabelangersure, lets reboot01:04
*** jcoufal_ has joined #openstack-infra01:04
tonybfungi: Yeah.  I'm going to try with an empty one ;P01:04
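[editor's note: for flavor, a minimal bindep.txt of the kind fungi describes; the entries are illustrative assumptions, not the election repo's actual file:]
    # bindep.txt: distro packages this repo's jobs need
    build-essential [platform:dpkg]
    gettext
    # an empty file is also valid and means "no distro packages needed",
    # which skips the large bindep-fallback.txt list entirely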
jeblairthe only other thing i see running is npm-mirror-update running since apr 1401:04
jeblairi think i should just issue 'reboot' now.  any objections?01:05
pabelanger++01:05
jeblairand there goes bandersnatch.01:05
jeblairi'll wait till it's done, then reboot immediately.01:05
*** liusheng has joined #openstack-infra01:05
pabelangerdmsimard: I agree with comment, tower has a lot of moving parts01:05
jeblairi'm not going to cast any stones at CI/CD systems for having lots of moving parts.01:06
jeblairrebooting01:06
SamYapleoh you just found the issue...01:06
*** kiennt26 has joined #openstack-infra01:06
SamYaplei was going to pop on to say ovh is a valid mirror01:06
SamYapleit just looks like you have newer packages than ovh has already installed01:07
SamYaplealways the slow poke01:07
jeblairSamYaple: yep!01:07
jeblairSamYaple: all mirrors are old01:07
pabelangermirror-update.o.o back01:07
*** jcoufal has quit IRC01:07
jeblairall images are new, except rax.  so rax is the only thing working now (because we're unable to upload there atm)01:07
jeblairpabelanger: you want to do the rerpreprepepro thing?01:07
pabelangerjeblair: yah01:08
*** sbezverk has joined #openstack-infra01:08
SamYaplewell my gates are working too, but thats because i build everything in docker containers01:08
SamYaplethats what got me looking down the "versions too new" path01:08
jeblairthe systemic fix is to build our images with our mirrors so they can't get ahead of each other01:08
SamYapleyea01:09
*** namnh has quit IRC01:09
dmsimardSamYaple: I know I wanted to ask you something earlier but I forget what :/01:09
mnaserquestion01:09
mnaserdont we want to fix the upload-to-rax problem first01:10
pabelangerokay, reprepro running now01:10
jeblairpabelanger: that is a lot of 'v's :)01:10
pabelangermoar v's01:10
mnaseror otherwise we'll have a significantly smaller portion of ci that is functioning01:10
pabelanger  reading '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_xenial-security_main_amd64_Packages'01:10
pabelangerlast thing in console ATM01:10
SamYaplejeblair: another option would be to just run apt-get with the option "-t=xenial" as that will stomp and downgrade things as needed01:10
*** hemna_ has quit IRC01:10
*** yamahata has quit IRC01:10
SamYaplethat might cause other problems though, something to keep in mind01:11
jeblairmnaser: we generally expect images to be out of date -- we try not to rely on them being current01:11
SamYapleit comes in handy in a pinch01:11
mnaserjeblair gotcha, and actually i realized that curl will update gracefully if the mirrors are okay now01:11
SamYapledmsimard: was it "SamYaple: how are you so successful and attractive?"01:11
mnaser(of course i always realize these things after speaking up)01:11
pabelangerfile looks valid01:11
tonybhttps://review.openstack.org/#/c/511365/ \o/  Possibly more than needed and won't work for rpms but I'll merge it anyway01:12
pabelangerjeblair: are you seeing anything in strace?01:12
jeblairpread(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 90521600) = 409601:12
jeblairthat's the last line01:13
jeblairreprepro 2003 root    5u   REG   0,25  90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db01:13
dmsimardSamYaple: nope. And it's annoying the hell out of me now :(01:13
*** liusheng has quit IRC01:13
jeblairi can access that file okay01:13
jeblairpabelanger: any chance that's a potentially corrupted file?01:13
SamYapledmsimard: sounds like me yea01:14
jeblairpabelanger: reprepro is at 100% cpu01:14
pabelangerjeblair: possible, however I believe we can regenerate it with reprepro with another command01:14
pabelangerlet me check man page01:14
mnaserbtw, not sure if anyone knows this or not but -fF with strace is quite useful01:14
mnaserit'll actually hop into subprocesses/threads01:14
*** cuongnv has quit IRC01:14
jeblairpabelanger: i feel like 100% cpu and no system call activity after reading a bunch of null data from a file looks a lot like "infinite loop because of bad data"01:14
jeblairmnaser: i used -f but not -ff01:15
mnaserjeblair im old, man says => "This option is now obsolete and it has the same functionality as -f."01:15
mnaserold typing habits die hard i guess01:15
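[editor's note: the stuck-process triage used in this thread, gathered into one assumed sketch:]
    PID=$(pgrep reprepro)
    strace -f -p "$PID"    # last call seen: a pread() returning 4096 NUL bytes
    lsof -p "$PID"         # maps that fd back to /afs/.../ubuntu/db/checksums.db
    top -b -n1 -p "$PID"   # 100% cpu with no further syscalls suggests it is
                           # spinning in userspace on bad data, not blocked on afs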
*** liusheng has joined #openstack-infra01:15
pabelangerreprepro collectnewchecksums01:15
pabelangerI think that is the command01:15
pabelangerjeblair: yah, seems to make sense01:15
ianwpabelanger / jeblair : can we do https://review.openstack.org/#/c/511360/ to unblock system-config, and i'll jump on any further issues?01:15
pabelangerreprepro _listchecksums should show what current checksums are01:16
pabelangerjeblair: I'm going to kill reprepro and try _listchecksums01:16
jeblairpabelanger: ++01:17
jeblairianw: +201:17
dmsimardSamYaple: OH I remember now01:18
*** baoli has joined #openstack-infra01:18
dmsimardSamYaple: remember how we discussed bindep supporting different dances for sources and things01:18
pabelangerappears to be running, will let it finish01:18
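[editor's note: the recovery commands from this exchange collected in one place; the confdir/basedir values are assumed from paths seen elsewhere in the log, and the subcommands are the ones named above from reprepro's man page:]
    REPREPRO="reprepro --confdir /etc/reprepro/ubuntu -b /afs/.openstack.org/mirror/ubuntu"
    $REPREPRO _listchecksums > /tmp/checksums.txt   # dump what the db currently holds
    $REPREPRO collectnewchecksums                   # recompute checksums the db is missing
    $REPREPRO check && $REPREPRO checkpool fast     # verify before any vos release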
jeblairianw, pabelanger: i need to afk.  aiui next steps are 1) fix pip stuff 2) fix reprepro and release the mirror 3) approve and enqueue 511360 into zuulv2 gate01:18
dmsimardSamYaple: or different profiles and such01:19
jeblairi think if we do all of those things, we can zuulv3?01:19
SamYapledmsimard: https://review.openstack.org/#/c/506502/ ?01:19
pabelangerjeblair: okay, I'll keep working on reprepro01:19
SamYaple(please some one +3 that patch, im begging)01:19
dmsimardSamYaple: I was wondering whether that was still relevant with zuul v3, considering roles (and their dependencies) should likely be self contained01:19
dmsimardso if you need something in a role, it should likely be installed inside that role01:19
SamYapledmsimard: so actually that patch is to use bindep in docker containers for image building (which we are currently doing in LOCI)01:20
SamYaplesignificantly different use case to the gate01:20
dmsimardSamYaple: oh, huh, interesting.01:20
SamYapledmsimard: https://github.com/openstack/loci/blob/master/bindep.txt01:20
dmsimardSamYaple: sort of makes sense I guess01:21
SamYapledmsimard: it makes image building very very clean. if i can get the bindep syntax changed from the above patch, then i can do https://review.openstack.org/#/c/506823/3/bindep.txt01:21
dmsimardI wouldn't have thought about bindep for installing packages in containers :p01:21
SamYaplewhich is even more better01:21
SamYaplewell its great because its one stop for all rpm/deb/pacman/emerge01:21
SamYapleand with different architectures01:22
dmsimardnot unlike ansible but I guess ansible is more verbose01:22
ianwjeblair: ok, i'll try to push all that along01:22
SamYapleno duplication in the case where there are same-named packages across multiple distros01:22
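[editor's note: the cross-distro bindep syntax being praised here, with illustrative package names; profiles select which line applies under which package manager:]
    libffi-dev    [platform:dpkg]    # Debian/Ubuntu name
    libffi-devel  [platform:rpm]     # RHEL/CentOS/Fedora name
    # one file covers every distro; lines whose profile doesn't match are ignored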
dmsimardI should look at ansible-container again, I try to look at least once every 3 months01:22
SamYapleall in all, weve been very happy with it01:22
dmsimardthey haven't yet fulfilled my dream01:22
SamYapleheh you and me both01:22
ianwthe logs i assume we're just pruning as fast as we can01:23
dmsimardSamYaple: https://github.com/ansible/ansible-container/issues/399#issuecomment-31610919301:23
*** mrunge has quit IRC01:24
SamYapledmsimard: i actually dont mind running systemd as pid 1. docker did/does have the pid reaping problem (there are docker daemon options to fix that now)01:24
SamYaplebut i dont want to include systemd in my image because it adds like 80mb01:24
SamYapleim building keystone in less than a 40mb layer. i dont need to triple that01:25
dmsimardSamYaple: oh, it's just a bit awkward to do it but that's not ansible-container's fault01:25
*** cuongnv has joined #openstack-infra01:25
openstackgerritTony Breeds proposed openstack-infra/irc-meetings master: bindep:  Supply a bindep.txt file to avoid the 'global' set  https://review.openstack.org/51136901:25
SamYapleits a cool idea, i just find it hard to imagine it becoming practical01:25
*** yamamoto has quit IRC01:26
dmsimardSamYaple: the use case is mostly to take a role that already exists and works with modern distros and use it to build an image with ansible-container01:26
dmsimardbut there's all sort of things that make this awkward01:26
SamYapleyes. and that i agree with as a migration step01:26
SamYaplebut i dont really like it as a "long-term" solution01:27
dmsimardmigration ? there's no migration, if you want to install on a bare metal, vm or container image you use the same role with the same params and everything :p01:27
SamYapleno i get that, im just not sure i get the benefit at that point is *my* point01:27
*** baoli has quit IRC01:27
fungis/migration/cross-platform portability/ ?01:27
dmsimardfungi: :)01:28
dmsimardIf I do a service task in ansible that says start the service, it better be able to start the darn service :)01:28
*** kiennt26 has quit IRC01:29
SamYaplei do understand the feeling :)01:29
dmsimardin the meantime, I'll keep cursing at this elk container thing01:31
openstackgerritMerged openstack-infra/irc-meetings master: bindep:  Supply a bindep.txt file to avoid the 'global' set  https://review.openstack.org/51136901:32
pabelangerianw: 260GB is the size of the ubuntu mirror01:33
pabelangerwould take a bit to re-mirror i think01:34
*** vhosakot has quit IRC01:38
*** fanzhang has joined #openstack-infra01:38
*** baoli has joined #openstack-infra01:38
*** kiennt26 has joined #openstack-infra01:41
*** ijw has quit IRC01:41
*** ijw has joined #openstack-infra01:41
ianw:/01:42
pabelangercurrently running reprepro export xenial01:42
pabelangerin an effort to see if we regenerate01:42
*** larainema has joined #openstack-infra01:43
*** kiennt26 has quit IRC01:43
*** baoli has quit IRC01:43
*** kiennt26 has joined #openstack-infra01:43
*** ijw has quit IRC01:44
*** sdague has quit IRC01:44
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add documentation on force-merging a change  https://review.openstack.org/51124801:45
*** jcoufal_ has quit IRC01:45
*** kiennt26 has quit IRC01:46
*** kiennt26 has joined #openstack-infra01:46
*** kiennt26 has quit IRC01:47
*** kaisers has quit IRC01:48
*** kaisers has joined #openstack-infra01:49
*** psachin has joined #openstack-infra01:49
*** kiennt26 has joined #openstack-infra01:49
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add documentation on force-merging a change  https://review.openstack.org/51124801:50
*** edmondsw has joined #openstack-infra01:51
*** rosmaita has quit IRC01:53
*** nikhil has quit IRC01:54
*** hongbin has joined #openstack-infra01:55
openstackgerritMerged openstack-infra/project-config master: Online inap-mtl01 region  https://review.openstack.org/51132801:57
*** kukacz has quit IRC02:00
*** dhinesh has joined #openstack-infra02:00
*** kukacz has joined #openstack-infra02:01
openstackgerritMerged openstack-infra/project-config master: Revert "Pin pip to <8 for openstackci-beaker jobs"  https://review.openstack.org/51136002:02
*** thorst has joined #openstack-infra02:02
ianwok, will try system-config in a bit with ^ and see where we're at02:03
*** thorst has quit IRC02:07
*** baoli has joined #openstack-infra02:07
*** jascott1 has quit IRC02:08
*** jascott1 has joined #openstack-infra02:08
*** hichihara has joined #openstack-infra02:10
openstackgerritMerged openstack-infra/irc-meetings master: Create alternate time for Neutron Drivers meeting  https://review.openstack.org/51129302:11
*** baoli has quit IRC02:12
*** jascott1 has quit IRC02:12
*** kiennt26 has quit IRC02:13
*** kiennt26 has joined #openstack-infra02:15
*** thorst has joined #openstack-infra02:17
dmsimardclarkb: I'm done fighting with my local elk instance for tonight, I wanted to test the type things we've talked about.. I'll look at it some more tomorrow.02:18
*** thorst has quit IRC02:19
dmsimardIt might be that I'm testing with stuff that is too up to date compared to what we're running on logstash.o.o.02:20
clarkboh ya we are old for reasons02:20
clarkbmostly of the javascript variety02:21
*** gildub has quit IRC02:25
*** baoli has joined #openstack-infra02:25
openstackgerritMerged openstack-infra/irc-meetings master: Update Neutron team meeting chairperson  https://review.openstack.org/51130302:28
pabelangerclarkb: still working on getting ubuntu mirror working again, it is possible we may need to rebuild it from scratch, but doing so might take ~2 days. We have 260GB to deal with02:28
pabelangerI don't plan on deleting anything, but something we might need to dig into in the morning02:28
ianwcan i help?02:29
clarkbpabelanger: do we think it is reprepro or afs or both?02:29
pabelangerianw: right now, waiting for reprepro export to finish, then going to try running update again02:29
pabelangerianw: however, I might have to pass off to you shortly, getting late02:29
pabelangerclarkb: we rebooted mirror-update to make sure afs was good02:29
pabelangerbut same issues02:30
pabelangercurrent hope is export regenerates everything we need02:30
pabelangerso update cmd works02:30
pabelangerits been reading files on disk for a while now02:30
clarkbexport of reprepro?02:30
pabelangerhttps://mirrorer.alioth.debian.org/reprepro.1.html02:31
pabelangeryah02:31
*** liujiong has joined #openstack-infra02:31
pabelangerreprepro export xenial02:31
*** gouthamr has quit IRC02:31
clarkbgotcha thats like an in place rebuild02:32
pabelangeryah, I hope02:33
*** coolsvap has joined #openstack-infra02:33
pabelangerI also just found https://github.com/esc/reprepro/blob/master/docs/recovery02:34
*** srobert has joined #openstack-infra02:38
*** markvoelker has quit IRC02:39
ianwthe only other thing i can think to do as a prophylactic is maybe attach a volume and get the data on afs01 so if it needs to be imported, it's there?02:39
ianwbut if it slows things down, it would be even worse02:39
*** mrunge has joined #openstack-infra02:39
*** andreas_s has joined #openstack-infra02:40
*** gildub has joined #openstack-infra02:42
*** srobert has quit IRC02:43
*** gcb has joined #openstack-infra02:43
*** dfflanders has joined #openstack-infra02:44
pabelangerOh interesting02:48
pabelangerhttp://paste.openstack.org/show/623401/02:48
pabelangerI just got that on export02:48
pabelangerand see some afs warnings in dmesg02:49
*** andreas_s has quit IRC02:49
*** dfflanders has quit IRC02:49
*** yamahata has joined #openstack-infra02:51
ianwoh dear02:52
*** junbo has quit IRC02:52
*** edmondsw has quit IRC02:53
ianwthings that handle an error from close() are few and far between too02:53
*** nicolasbock has quit IRC02:53
ianwhandle it properly, anyway02:53
pabelangerI'm starting down the recovery doc now02:54
pabelangerrereference first02:54
tonybproject-config is still frozen correct?02:57
mnasertonyb v2 changes likely wont merge, v3 changes welcome afaik02:57
tonybmnaser: okay I was hoping to do both but I can just make a note and do the v3 change after the switch02:58
*** andreas_s has joined #openstack-infra02:59
mnasertonyb you can still propose the v3 change and it will be reviewed and merged02:59
pabelangerianw: ya, something appears to be up with AFS02:59
*** priteau has joined #openstack-infra03:00
ianwi've been pinging afs01 from mirror-update03:00
ianwno dropped packets, a few quite high spikes though (~8ms)03:00
tonybmnaser: okay, I'll think on that for a bit03:00
ianwseeing as basically nothing has changed, gotta feel like it's network between the two03:00
pabelangerianw: okay, I have to call it. But xenial-updates and xenial-security both have issues03:01
pabelangerxenial and xenial-backports update properly03:01
pabelangerso, it is possible could just try first deleteing ubuntu-secutiry from reprepro, then mirror it03:01
pabelangerthen, if it works, we do the same for -updates03:02
*** mtreinish has quit IRC03:02
pabelangerianw: good luck, I'll read up on backscroll in the morning03:02
ianwalright, let me think about it before i do anything :)03:03
*** hichihara has quit IRC03:03
*** priteau has quit IRC03:05
*** mtreinish has joined #openstack-infra03:07
ianwinteresting, we don't make /var/run/reprepro on reboot i guess03:10
*** kiennt26 has quit IRC03:11
*** gouthamr has joined #openstack-infra03:12
*** andreas_s has quit IRC03:12
*** thorst has joined #openstack-infra03:12
*** thorst has quit IRC03:12
*** kiennt26 has joined #openstack-infra03:16
*** baoli has quit IRC03:17
*** Srinivas has joined #openstack-infra03:20
Srinivashi all, i am facing this while running jobs in jenkins, " ERROR! Unexpected Exception: 'module' object has no attribute '_vendor'" anyone know this issue?03:20
*** links has joined #openstack-infra03:25
openstackgerritIan Wienand proposed openstack-infra/system-config master: Create flock directories in /var/run  https://review.openstack.org/51138003:28
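[editor's note: /var/run is a tmpfs on modern distros, so anything created there disappears on reboot, which is why the flock directory went missing after the reboot above. The real fix is the system-config change just proposed; a generic sketch of the idea on a systemd host might be:]
    # /etc/tmpfiles.d/reprepro.conf (assumed path) -- recreate the dir at boot:
    #   d /var/run/reprepro 0755 root root -
    mkdir -p /var/run/reprepro   # or just recreate it at the start of each cron run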
*** udesale has joined #openstack-infra03:35
*** yamamoto has joined #openstack-infra03:39
*** yamamoto_ has joined #openstack-infra03:44
*** yamamoto has quit IRC03:47
openstackgerritNam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy job from Oslo.log  https://review.openstack.org/51138403:52
*** dave-mccowan has quit IRC03:54
*** ykarel|afk has joined #openstack-infra03:56
openstackgerritNam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Oslo.log legacy job  https://review.openstack.org/51138503:56
openstackgerritNam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Oslo.log legacy job  https://review.openstack.org/51138503:58
openstackgerritNam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy job from Oslo.log  https://review.openstack.org/51138404:01
*** sree has joined #openstack-infra04:03
*** yamamoto_ has quit IRC04:03
*** hongbin has quit IRC04:04
ianwpabelanger: ok, as suggested, i removed xenial-updates & xenial-security -- i dropped them from distributions (/etc/reprepro/ubuntu/distributions.ianw) $REPREPRO --delete clearvanished04:04
ianwit seemed to remove a bunch of things04:04
ianwsee logs in /tmp/ianw/out.log (sorry it's just a huge stream)04:04
ianwi put them back, and am rerunning a "normal" update04:04
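[editor's note: the drop-and-readd sequence ianw just described, restated as an assumed sketch using his $REPREPRO shorthand:]
    # 1. remove xenial-updates / xenial-security from /etc/reprepro/ubuntu/distributions
    $REPREPRO --delete clearvanished   # purge db state (and pool files) for vanished dists
    # 2. restore the distributions file, then rebuild the dropped suites
    $REPREPRO -VVV update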
SamYapleianw: are you saying that updates and security wont be mirrored anymore?04:04
SamYapleoh ok04:05
SamYaplephew. scared me for a second04:05
ianwi don't know what it's doing, it's sitting there at 100% cpu with -VVV not saying anything04:05
ianwi am going to go do something else for about 45 minutes and not look at it, see if something happens04:05
SamYapleianw: you dont think this has to do with the docker mirror we added do you?04:05
SamYapleseems like a lot of this started right after that04:06
ianwSamYaple: no, my guess is that transient network errors have introduced AFS issues, which have corrupted reprepro's state somehow04:06
SamYaplegot it04:06
ianwthe only thing more obscure than AFS internals is reprepro internals, which makes for an interesting combo04:06
SamYaple:)04:07
SamYaplei really have to finish my apt mirroring utility04:07
SamYapleive never found a really good one04:07
SamYapleand i like to push to my ceph radosgw without having an intermediate clone locally, which *nothing* does04:07
*** armax has quit IRC04:08
*** armax has joined #openstack-infra04:08
*** armax has quit IRC04:08
*** armax has joined #openstack-infra04:09
SamYapleis there something on paper about how infra is going to solve the unsigned mirrors for apt issue?04:09
*** armax has quit IRC04:09
SamYaplewe could just resign the Release file after the mirroring04:09
*** armax has joined #openstack-infra04:10
ianwi don't think there's anything to solve; we deliberately leave it unsigned to avoid it being used as a public mirror04:10
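[editor's note: for reference, re-signing a mirrored Release file as SamYaple suggests would look roughly like this, with an assumed mirror-specific key; as ianw notes, infra deliberately leaves its mirrors unsigned:]
    gpg --default-key "$MIRROR_KEY" -abs -o Release.gpg Release       # detached signature
    gpg --default-key "$MIRROR_KEY" --clearsign -o InRelease Release
    # clients would then need the mirror key in their apt trusted keyring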
*** armax has quit IRC04:10
ianwas jeblair noted, you can't seem to strace this process.  or at least it doesn't seem to be doing anything04:10
ianwi installed gdb in a hail-mary to see if i can see what's going on04:10
SrinivasSamYaple: hi all, i am facing this while running jobs in jenkins, " ERROR! Unexpected Exception: 'module' object has no attribute '_vendor'" anyone know this issue?04:10
ianwi haven't bothered with symbols -> http://paste.openstack.org/show/623403/04:11
ianwit's somewhere doing something in db code every time04:11
ianwthe dbs it has open are04:12
ianwreprepro 17829 root    5u   REG   0,25  42790912 2537578 /afs/.openstack.org/mirror/ubuntu/db/references.db04:12
ianwreprepro 17829 root    6u   REG   0,25  90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db04:12
ianwreprepro 17829 root    7u   REG   0,25 485736448 2537576 /afs/.openstack.org/mirror/ubuntu/db/contents.cache.db04:12
ianwi think if they are corrupt, we are SOL basically04:12
clarkbianw: I'm just happy that hail mary as a long shot term transcends murican football04:13
SamYaplewould it be so wrong to purge it all and completely resync?04:13
ianwthis might have been a hospital pass from pabelanger :)04:13
SamYaplei know it will take time04:13
ianwnot sure if that term transcends04:13
ianwit's 250something gb over afs ... that is our last option04:14
ianwif i had to learn something from this now, it's that i think we should get things pointing to reverse proxies04:15
*** claudiub|2 has joined #openstack-infra04:15
ianwthat way, we can at least roll out a config to point it to upstream if this happens again04:15
SamYaplereverse proxies work great for non-https things04:15
SamYaplebut some repos are only https04:15
clarkbyou can totally http -> https04:16
SamYapleyea thats true04:16
SamYaplei guess that wouldnt be so bad, we are already doing custom urls. wouldnt be much different04:16
clarkbianw we could do that if we need to04:16
SamYapleas far as a workflow goes i mean04:17
clarkbin this case can we get by if we rebuild images against the mirror while we rebuild it?04:19
ianwclarkb: i think reverse proxies would be more reliable04:20
*** yamamoto has joined #openstack-infra04:21
ianwof course right now, we can't merge system-config until https://review.openstack.org/511360 is deployed.  i don't know why zuul hasn't reconfigured, it seems like it's been ages04:22
SamYaplei do apt-cacher-ng at my house with pretty good success04:22
SamYaplei would be ok with reverse proxies04:22
SamYapleit would save a great deal of space too04:23
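[editor's note: a minimal sketch of the reverse-proxy idea, assuming Apache mod_proxy; the upstream hostname is illustrative. This is the "plain http in front, https behind" shape clarkb mentions:]
    SSLProxyEngine on                 # needed when the backend is https
    ProxyPass        /ubuntu/ https://upstream-mirror.example.com/ubuntu/
    ProxyPassReverse /ubuntu/ https://upstream-mirror.example.com/ubuntu/
    CacheEnable disk /ubuntu/         # optional mod_cache_disk; still far
                                      # smaller than a full 260GB mirror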
*** markvoelker has joined #openstack-infra04:39
*** edmondsw has joined #openstack-infra04:39
*** edmondsw has quit IRC04:44
adriantare we still having issues with Zuul? I've got a patch I +2 and +1 workflow for and it doesn't seem to want to merge :(04:45
adrianthttps://review.openstack.org/#/c/509016/04:45
*** bhavik1 has joined #openstack-infra04:49
clarkbadriant: it needs to be +1'd by jenkins first. a recheck should get it going04:50
adriantclarkb: ty!04:50
adriantclarkb: although I'd have assumed the zuul +1 was enough :(04:51
*** stakeda has joined #openstack-infra04:53
ykarel|afkclarkb, why jenkins has not posted +1 on https://review.openstack.org/#/c/510735/, any idea?04:55
clarkbit wouldve been if we managed to keep using zuulv3 for gating but we had to roll back04:55
*** gouthamr has quit IRC04:56
*** thorst has joined #openstack-infra04:56
ykarel|afkthe patch has workflow +1 but gate jobs are not running04:56
clarkbykarel|afk: it did, look at the comments (toggle ci if you need to)04:56
*** ykarel|afk is now known as ykarel04:56
ykarelclarkb, yes it's there but gate jobs are not there in http://status.openstack.org/zuul/04:58
ykarelnot running04:58
clarkbit is there when I look04:59
ykarelclarkb, yes it's there, sorry05:00
ykareli mislooked05:00
*** thorst has quit IRC05:00
*** priteau has joined #openstack-infra05:00
*** bhavik1 has quit IRC05:01
*** eumel8 has joined #openstack-infra05:03
*** priteau has quit IRC05:05
ykarelclarkb, how are tarballs pushed? is there some issue? i cannot find https://tarballs.openstack.org/puppet-tripleo/puppet-tripleo-5.6.4.tar.gz05:05
ianwok, i think reprepro is dead, nothing has happened05:07
ykarellooks like there is some issue that's why some reverts are going: https://review.openstack.org/#/q/status:merged+project:openstack/releases+branch:master+topic:newton/tripleo05:08
*** CHIPPY has joined #openstack-infra05:11
*** markvoelker has quit IRC05:14
*** stakeda has quit IRC05:18
*** ykarel_ has joined #openstack-infra05:20
*** ykarel has quit IRC05:22
ianwi sent out a note in reply to mordred.  i'm running out of ideas if i can't get system-config changes merged05:29
*** jtomasek has joined #openstack-infra05:33
*** bhavik1 has joined #openstack-infra05:35
*** CHIPPY has quit IRC05:36
*** mrunge has quit IRC05:44
*** eumel8 has quit IRC05:45
*** cshastri has joined #openstack-infra05:45
*** threestrands has quit IRC05:48
*** dhajare has joined #openstack-infra05:53
*** e0ne has joined #openstack-infra05:55
*** eumel8 has joined #openstack-infra05:55
*** lewo has joined #openstack-infra05:56
*** e0ne has quit IRC05:59
*** udesale__ has joined #openstack-infra06:03
*** martinkopec has joined #openstack-infra06:03
*** udesale has quit IRC06:03
*** udesale has joined #openstack-infra06:06
*** mrunge has joined #openstack-infra06:06
*** sshnaidm|off is now known as sshnaidm06:06
*** udesale__ has quit IRC06:07
*** martinkopec has quit IRC06:08
*** martinkopec has joined #openstack-infra06:09
*** markvoelker has joined #openstack-infra06:11
*** kjackal_ has joined #openstack-infra06:12
*** bhavik1 has quit IRC06:16
*** pahuang has quit IRC06:18
*** yamahata has quit IRC06:20
ianwok, stracing reprepro the last entry is06:23
ianw3170  pread(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 90521600) = 409606:23
ianwwhich lsof tells me06:23
ianwreprepro 3170 root    6u   REG   0,25  90628096 2537568 /afs/.openstack.org/mirror/ubuntu/db/checksums.db06:23
sshnaidminfra-root, zuulv3 can't install ansible properly and fails, is it known issue? I didn't see it in etherpad before:     fatal error: openssl/opensslv.h: No such file or directory:  http://logs.openstack.org/07/472607/102/check/legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-puppet/a263fa3/job-output.txt.gz#_2017-10-12_06_13_19_53200706:24
*** pahuang has joined #openstack-infra06:27
*** edmondsw has joined #openstack-infra06:28
ianwi'm running "find pool -type f -print | reprepro --confdir /etc/reprepro/ubuntu -b . _detect" which can hopefully recreate it?06:29
*** pgadiya has joined #openstack-infra06:30
*** Swami has joined #openstack-infra06:31
*** edmondsw has quit IRC06:32
ianwoh jeez, if this has to checksum the whole thing, over afs ...06:32
ianwit's up to 1mb, the old file was 80mb06:33
*** Swami has quit IRC06:33
AJaegeroops ;(06:33
ianwso let's say 5 minutes a megabyte, 5*80 == 6 hours?06:34
*** yamahata has joined #openstack-infra06:34
ianwthe old checksum file is still there06:34
*** udesale has quit IRC06:34
AJaegerjlk: something is wrong with the periodic translation jobs, see http://logs.openstack.org/periodic/git.openstack.org/openstack/glance/stable/newton/propose-translation-update/2302cc1/ - I expected that to be a *master* job since we only converted master...06:35
*** rcernin has joined #openstack-infra06:35
*** udesale has joined #openstack-infra06:35
AJaegerjlk: That one failed to install packages as well. Missing root?06:37
*** armaan has joined #openstack-infra06:38
*** srobert has joined #openstack-infra06:40
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Install zanata dependencies as root  https://review.openstack.org/51139606:40
AJaegerjlk, ianw, quick fix for the second problem ^06:40
ianwAJaeger: see my notes though, not sure how much will merge06:41
*** markvoelker has quit IRC06:44
*** dhinesh has quit IRC06:44
*** srobert has quit IRC06:44
eumel8AJaeger: There are more tasks in this role which require root06:45
*** pgadiya has quit IRC06:45
AJaegereumel8: want to do a followup fix?06:46
chandankumarianw: hello06:46
AJaegerianw: those jobs look fine - but in general I agree ;( Thanks for wading through it06:47
chandankumarianw: how to add initial core reviewers for this http://git.openstack.org/cgit/openstack/python-tempestconf/ ? we need to add 4 people for the same.06:47
ianwchandankumar: i added you as core, you can now add as you want06:49
eumel8AJaeger: just wondering if this full playbook doesn't run as root06:49
chandankumarianw: thanks :-)06:49
chandankumarianw: one more help i need on this review https://review.openstack.org/#/c/511194/06:50
AJaegereumel8: it does not run as root, see the link above if you want to check06:51
chandankumarianw: https://review.openstack.org/#/admin/groups/1842,members i am not able to add other core reviewers06:51
chandankumarianw: my email-id is chkumar@redhat.com06:51
AJaegerchandankumar: log out and in again06:52
*** priteau has joined #openstack-infra06:52
AJaegerchandankumar: and please add the QA PTL first thing - the repo is part of QA06:52
AJaegerianw: so, 511396 passed tests06:53
* AJaeger will be offline for a couple of hours now06:54
chandankumarAJaeger: it is a part of Refstack, i will add hogepodge but still facing the same issue after logging out and logging in again06:54
eumel8ok06:54
chandankumarAJaeger: some people also complain that they are not able to add me as a reviewer06:54
ianwinfra-root / pabelanger: http://lists.openstack.org/pipermail/openstack-infra/2017-October/005610.html is likely my last update on the reprepro thing.  it's currently trying to recreate the checksums.db as described.  if that doesn't work, i'm out of ideas for now06:55
*** priteau has quit IRC06:55
*** priteau has joined #openstack-infra06:55
ianwchandankumar: i don't know, i'm almost out.  want me to add someone else?06:56
chandankumarianw: please add luigi toscano06:56
chandankumarand Chris Hoge06:57
*** thorst has joined #openstack-infra06:57
ianwchandankumar: what's you account id?06:57
ianwclick on your name and settings06:57
chandankumarianw: username chkumar24606:58
chandankumarUsername06:58
chandankumarchkumar24606:58
chandankumarFull Name Chandan Kumar06:58
chandankumarEmail Address chkumar@redhat.com06:58
ianwAccount ID below that06:58
chandankumarAccount ID 1239306:58
ianwOct 12, 2017 5:48 PMAddedChandan Kumar (8944)06:59
*** priteau has quit IRC06:59
ianwthat's the problem.  i think someone will have to manually delete the account.  check back in US hours for another infra root, i've got to EOD sorry06:59
chandankumarianw: no problem thanks :-)07:00
*** thorst_ has joined #openstack-infra07:00
chandankumartake rest, have a nice night ahead :-)07:00
*** eumel8 has quit IRC07:01
*** Dinesh_Bhor has quit IRC07:01
*** slaweq has joined #openstack-infra07:01
*** thorst has quit IRC07:01
*** thorst_ has quit IRC07:05
*** Dinesh_Bhor has joined #openstack-infra07:07
*** vsaienk0 has joined #openstack-infra07:09
*** dingyichen has quit IRC07:10
*** Hal has joined #openstack-infra07:13
*** Hal is now known as Guest6633707:14
sshnaidmis the issue with "permission denied" back? http://logs.openstack.org/84/508884/1/check/legacy-tripleo-ci-centos-7-nonha-multinode-oooq/f7a99fa/logs/devstack-gate-setup-workspace-new.txt07:14
sshnaidmI thought it was solved yesterday07:14
*** yamahata has quit IRC07:15
*** pcaruana has joined #openstack-infra07:17
*** florianf has joined #openstack-infra07:20
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove unused kuryr-libnetwork jobs  https://review.openstack.org/51140407:22
*** aviau has quit IRC07:23
*** aviau has joined #openstack-infra07:24
*** gildub has quit IRC07:24
*** armaan has quit IRC07:24
*** jpich has joined #openstack-infra07:29
*** shardy has joined #openstack-infra07:30
*** florianf has quit IRC07:32
*** tesseract has joined #openstack-infra07:32
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: DNM: test containers update  https://review.openstack.org/51117507:32
*** florianf has joined #openstack-infra07:32
*** rwsu has joined #openstack-infra07:33
*** andreas_s has joined #openstack-infra07:35
*** ykarel__ has joined #openstack-infra07:38
AJaegersshnaidm: see the status emails by ianw and monty on openstack-dev, this is not solved yet. You know, when it rains, it pours... ;(07:39
ethfciguys i feel it is high time for a 'stop the line'?07:40
ethfcifor days now Jenkins and Zuul have been dead...07:40
*** ykarel_ has quit IRC07:41
*** markvoelker has joined #openstack-infra07:41
ethfcistill facing the 'libcurl4-gnutls-dev' issue...07:43
*** stakeda has joined #openstack-infra07:45
*** egonzalez has joined #openstack-infra07:45
AJaegerethfci: http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html07:46
*** hashar has joined #openstack-infra07:57
*** d0ugal has joined #openstack-infra07:59
*** eumel8 has joined #openstack-infra08:00
*** gildub has joined #openstack-infra08:03
*** dbecker has joined #openstack-infra08:05
openstackgerritTovin Seven proposed openstack-infra/openstack-zuul-jobs master: Remove legacy oslo.db job  https://review.openstack.org/51141208:05
openstackgerritTovin Seven proposed openstack-infra/project-config master: Remove legacy oslo.db job  https://review.openstack.org/51141408:05
*** yamamoto has quit IRC08:05
*** yamamoto has joined #openstack-infra08:08
*** markvoelker has quit IRC08:14
*** edmondsw has joined #openstack-infra08:15
*** s-shiono has quit IRC08:17
*** priteau has joined #openstack-infra08:18
*** edmondsw has quit IRC08:20
*** shardy has quit IRC08:29
*** shardy has joined #openstack-infra08:29
kazshAJaeger: G'day, got PTL's +1 accordingly, please check https://review.openstack.org/#/c/509119/08:31
*** ralonsoh has joined #openstack-infra08:33
*** lucas-afk is now known as lucasagomes08:34
*** tosky has joined #openstack-infra08:35
*** derekh has joined #openstack-infra08:38
toskyAJaeger: hi! Going back to the previous questions about python-tempestconf by chandankumar: I'm adding the missing people, but when the project is approved under refstack, will we add refstack-core instead of specifying for example the PTL directly in python-tempestconf?08:49
*** spectr has quit IRC08:49
AJaegertosky: that all depends on how the refstack team wants to have it working ;)08:49
toskyack08:50
AJaegerYou could add refstack-core (or could have created the repo with reusing the refstack ACLs).08:50
AJaegerOr add a subteam of refstack... I would either add refstack-core or the PTL - and let the PTL decide the rest (after discussion with the team obviously)08:50
AJaegerkazsh: thanks, will review later08:51
*** yamamoto has quit IRC08:55
toskysure, I added the PTL for now08:56
*** thorst has joined #openstack-infra09:02
*** e0ne has joined #openstack-infra09:04
*** thorst has quit IRC09:05
*** gildub has quit IRC09:09
*** jascott1 has joined #openstack-infra09:10
openstackgerritDuong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Karbor  https://review.openstack.org/51143209:12
openstackgerritDuong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Karbor  https://review.openstack.org/51143309:12
openstackgerritDuong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Karbor  https://review.openstack.org/51143209:14
*** jascott1 has quit IRC09:14
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: Configure OVB jobs to use local mirrors for images  https://review.openstack.org/51143409:14
*** yamamoto has joined #openstack-infra09:15
*** yamamoto has quit IRC09:15
*** ykarel__ is now known as ykarel|lunch09:17
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Use native propose-translation jobs  https://review.openstack.org/51143509:17
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix propose-translation-update  https://review.openstack.org/51143609:17
*** spectr has joined #openstack-infra09:19
*** ociuhandu has quit IRC09:23
openstackgerritDuong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Murano  https://review.openstack.org/51143809:24
openstackgerritDuong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Murano  https://review.openstack.org/51143909:24
*** chenying_ has joined #openstack-infra09:25
*** logan- has quit IRC09:27
AJaegerjlk, mordred, please review  https://review.openstack.org/511435  https://review.openstack.org/511436  and https://review.openstack.org/511396  - and review whether we need root access in other places as well. Hope that gets us moving forward with translations...09:31
openstackgerritDuong Ha-Quang proposed openstack-infra/openstack-zuul-jobs master: Remove legacy jobs in Solum  https://review.openstack.org/51144009:31
openstackgerritDuong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in Solum  https://review.openstack.org/51144109:31
*** yamamoto has joined #openstack-infra09:44
*** kiennt26 has quit IRC09:53
*** priteau has quit IRC09:53
*** electrofelix has joined #openstack-infra10:00
*** thorst has joined #openstack-infra10:02
*** liujiong has quit IRC10:02
*** sree has quit IRC10:02
*** sree has joined #openstack-infra10:03
*** egonzalez has quit IRC10:03
*** edmondsw has joined #openstack-infra10:04
*** andreas_s has quit IRC10:06
*** rpittau has quit IRC10:06
*** andreas_s has joined #openstack-infra10:06
*** rpittau has joined #openstack-infra10:06
*** edmondsw has quit IRC10:08
*** thorst has quit IRC10:08
*** andreas_s has quit IRC10:11
*** spectr has quit IRC10:12
*** markvoelker has joined #openstack-infra10:12
*** pbourke has quit IRC10:15
*** ykarel|lunch is now known as ykarel10:15
*** egonzalez has joined #openstack-infra10:17
*** pbourke has joined #openstack-infra10:17
*** andreas_s has joined #openstack-infra10:20
*** sbezverk has quit IRC10:21
*** spectr has joined #openstack-infra10:25
*** sree has quit IRC10:26
*** mikal has quit IRC10:28
*** thingee has quit IRC10:28
*** thingee has joined #openstack-infra10:28
*** mikal has joined #openstack-infra10:30
*** udesale has quit IRC10:30
*** gcb has quit IRC10:30
*** panda|rover|off is now known as panda|rover10:30
*** andreas_s has quit IRC10:33
*** boden has joined #openstack-infra10:34
*** andreas_s has joined #openstack-infra10:39
*** armaan has joined #openstack-infra10:40
*** florianf has quit IRC10:40
*** florianf has joined #openstack-infra10:40
*** priteau has joined #openstack-infra10:42
*** armaan has quit IRC10:43
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: Use infra proxy server for trunk.r.o in delorean-deps  https://review.openstack.org/50888410:45
*** markvoelker has quit IRC10:45
*** clayton has quit IRC10:49
*** clayton has joined #openstack-infra10:51
*** andreas_s has quit IRC10:53
*** edmondsw has joined #openstack-infra10:53
*** florianf has quit IRC10:54
*** andreas_s has joined #openstack-infra10:54
*** florianf has joined #openstack-infra10:55
*** andreas_s has quit IRC10:56
*** andreas_s has joined #openstack-infra10:56
*** edmondsw has quit IRC10:56
*** logan- has joined #openstack-infra10:57
*** sambetts|afk is now known as sambetts11:01
*** zoli is now known as zoli|lunch11:02
*** zoli|lunch is now known as zoli11:02
*** priteau has quit IRC11:02
*** priteau has joined #openstack-infra11:03
*** dave-mccowan has joined #openstack-infra11:04
*** andreas_s has quit IRC11:07
*** priteau has quit IRC11:08
*** shardy is now known as shardy_lunch11:08
*** priteau has joined #openstack-infra11:10
*** wolverineav has joined #openstack-infra11:10
*** sdague has joined #openstack-infra11:11
*** gildub has joined #openstack-infra11:12
*** florianf has quit IRC11:13
*** florianf has joined #openstack-infra11:14
*** priteau has quit IRC11:16
*** gmann is now known as gmann_afk11:17
*** jkilpatr has joined #openstack-infra11:23
*** Srinivas has quit IRC11:26
*** cuongnv has quit IRC11:26
*** yamamoto has quit IRC11:27
*** martinkopec has quit IRC11:29
*** priteau has joined #openstack-infra11:35
*** andreas_s has joined #openstack-infra11:35
*** ykarel has quit IRC11:37
*** ykarel has joined #openstack-infra11:38
*** nicolasbock has joined #openstack-infra11:39
*** andreas_s has quit IRC11:39
*** andreas_s has joined #openstack-infra11:40
*** markvoelker has joined #openstack-infra11:43
*** nicolasbock has quit IRC11:45
strigaziHello AJaeger, Can we merge magnum's zuulv3 patch? https://review.openstack.org/#/c/508676/ I'm a little lost with all these new fast failures/RETRY_LIMIT11:46
*** andreas_s has quit IRC11:49
*** yamamoto has joined #openstack-infra11:50
mordredinfra-root: I've got a doctor's appointment this morning - out for the next few hours11:52
*** coolsvap has quit IRC11:55
AJaegermordred: all the best!11:55
jamespagemorning/afternoon all - we had some issues with installation of libcurl4-gnutls-dev in our check/gate jobs yesterday which I understood were due to an ubuntu archive cache problem11:56
jamespagestill seeing some of those today - https://review.openstack.org/#/c/504310/11:56
jamespagedo we still have some inconsistency somewhere?11:56
AJaegerstrigazi: the syntax and set up look fine, otherwise Zuul would have complained - it was able to run the jobs successfully. There's no regression from legacy to your jobs, so the migration looks fine. With all the problems, the -1s are to be expected. I think you can merge the change but I doubt you will be able to since there's no +1 by Jenkins yet.11:56
AJaegerjamespage: http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html11:57
AJaegerjamespage: basically: the problem is not fixed yet11:57
*** nicolasbock has joined #openstack-infra11:58
*** stakeda has quit IRC11:59
*** andreas_s has joined #openstack-infra11:59
*** dprince has joined #openstack-infra12:00
*** hashar has quit IRC12:01
*** trown|outtypewww is now known as trown12:02
*** lucasagomes is now known as lucas-hungry12:02
*** andreas_s has quit IRC12:03
*** andreas_s has joined #openstack-infra12:04
*** thorst has joined #openstack-infra12:06
*** andreas_s has quit IRC12:08
*** andreas_s has joined #openstack-infra12:08
*** shardy_lunch is now known as shardy12:09
*** edmondsw has joined #openstack-infra12:10
*** edmondsw_ has joined #openstack-infra12:10
*** gcb has joined #openstack-infra12:12
*** edmondsw has quit IRC12:14
*** markvoelker has quit IRC12:16
sambettsAJaeger: are the zuul v2 docs completely gone now?? I'm trying to change a configuration in our third party CI and all the zuul docs are for zuul v3 now12:19
*** lifeless has quit IRC12:24
AJaegersambetts: do you mean the infra-manual? You can check it out and build locally...12:25
*** gmann_afk is now known as gmann12:26
openstackgerritMajor Hayden proposed openstack-infra/project-config master: Remove OpenStack/Ceph/Virt repo from CentOS  https://review.openstack.org/49300312:26
*** hrybacki|trainin is now known as hrybacki12:26
sambettsAJaeger: so it's not published any more? not planning on doing https://docs.openstack.org/infra/zuul/v2 like the other projects have for ocata/pike etc12:26
*** eharney has joined #openstack-infra12:28
*** lifeless has joined #openstack-infra12:31
*** markvoelker has joined #openstack-infra12:33
*** rosmaita has joined #openstack-infra12:33
efriedGood morning infra.  Is this a known issue?  I'm seeing quite a bit of it: http://logs.openstack.org/06/502306/10/check/gate-nova-specs-docs-ubuntu-xenial/e9f94d9/console.html12:33
*** udesale has joined #openstack-infra12:36
*** kgiusti has joined #openstack-infra12:36
*** udesale has quit IRC12:37
*** udesale has joined #openstack-infra12:37
rosmaitaalso on https://review.openstack.org/#/c/493654/712:38
rosmaitaactually, https://review.openstack.org/#/c/493654/812:39
*** ociuhandu has joined #openstack-infra12:40
*** adarazs is now known as adarazs_brb12:41
vsaienk0efried: we need to switch to bindep to fix it  https://review.openstack.org/#/c/444201/12:41
*** links has quit IRC12:41
efriedvsaienk0 That needs to be done for every project?12:42
vsaienk0looks like the upstream deb repo is broken, and we install the default package list, which actually is not needed for ironic tests, so adding a bindep file to our project with exact dependencies fixes the problem12:42
eumel8efried, rosmaita: those are known issues. look at http://lists.openstack.org/pipermail/openstack-dev/2017-October/123489.html12:42
*** andreas_s has quit IRC12:43
*** LindaWang has quit IRC12:43
*** andreas_s has joined #openstack-infra12:43
*** liusheng has quit IRC12:45
*** chenying_ has quit IRC12:45
vsaienk0efried: ideally each project should have its own bindep file with only the exact dependencies it needs. Not having this file forces jobs to install the default package list12:45
*** liusheng has joined #openstack-infra12:45
*** bobh has joined #openstack-infra12:47
*** mriedem has joined #openstack-infra12:47
*** florianf has quit IRC12:52
*** florianf has joined #openstack-infra12:52
openstackgerritStephen Finucane proposed openstack-dev/pbr master: Discover Distribution through the class hierarchy  https://review.openstack.org/39918812:54
openstackgerritStephen Finucane proposed openstack-dev/pbr master: Remove unnecessary 'if True'  https://review.openstack.org/51080612:54
stephenfindhellmann, mordred: Want to take a look at those? Think they can merge now (setuptools had changed stuff under the hood)12:54
*** LindaWang has joined #openstack-infra12:56
*** andreas_s has quit IRC12:57
*** lucas-hungry is now known as lucasagomes12:58
*** andreas_s has joined #openstack-infra13:01
*** adarazs_brb is now known as adarazs13:02
*** jcoufal has joined #openstack-infra13:03
*** esberglu has quit IRC13:03
openstackgerritMerged openstack-infra/tripleo-ci master: Remove unnecessary scripts from tripleo-ci  https://review.openstack.org/51082813:06
AJaegersambetts: ah, that's what you mean - better ask the zuul folks, can't help with that one13:10
AJaegerefried, rosmaita http://lists.openstack.org/pipermail/openstack-dev/2017-October/date.html13:11
AJaegerargh, wrong link - I see eumel8 gave the correct one...13:11
rosmaitaAJaeger thanks13:11
sambettsAJaeger: what channel can I find zuul folks in? here or do they have a separate one?13:12
*** mat128 has joined #openstack-infra13:13
AJaegersambetts: #zuul ;)13:13
*** baoli has joined #openstack-infra13:13
sambettsah not openstack-zuul (tried that one and it didn't exist)13:13
AJaegersambetts: but here as well - just give them a chance to wake up and drink their morning coffee, please :)13:13
sambettsof course :D13:13
AJaegerrosmaita: regarding bindep: Yes, that might help in this case - you might want to review https://review.openstack.org/#/c/468159/13:15
aspiersFYI, how Google uses Gerrit: https://gitenterprise.me/2017/10/10/gerrit-user-summit-gerrit-at-google/13:15
aspierspretty interesting setup at scale13:15
AJaegerjamespage: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123489.html is the link I wanted to point you to earlier13:15
*** camunoz has joined #openstack-infra13:17
*** esberglu has joined #openstack-infra13:18
*** edmondsw_ is now known as edmondsw13:19
*** ykarel has quit IRC13:20
*** ykarel has joined #openstack-infra13:20
*** esberglu has quit IRC13:21
*** esberglu has joined #openstack-infra13:23
*** tikitavi has joined #openstack-infra13:23
*** chlong has joined #openstack-infra13:25
*** cshastri has quit IRC13:27
*** jaosorior has quit IRC13:27
*** sree has joined #openstack-infra13:27
*** dbecker has quit IRC13:27
*** dbecker has joined #openstack-infra13:27
*** sree has quit IRC13:31
*** gmann is now known as gmann_afk13:31
openstackgerritJean-Philippe Evrard proposed openstack-infra/irc-meetings master: Moving the OpenStack-Ansible meeting time and channel  https://review.openstack.org/51147913:33
*** sree has joined #openstack-infra13:37
*** sbezverk has joined #openstack-infra13:39
fungiokay, i'm here and catching up on scrollback now. i can already see we're still out of inodes on the logs volume even though the tempfile deletion pass and expired log purging are still going from before i went to sleep13:41
*** sree has quit IRC13:41
AJaegerstill out of inodes? Ooops ;( And Ubuntu mirror also still broken ;(13:42
AJaegerfungi: you didn't sleep long enough ;) Good morning!13:42
AJaegerfungi, jeblair, good news: for the first time we had periodic jobs running with Zuul v3. Read the backscroll and etherpad for some of the issues it uncovered ;)13:43
*** cshastri has joined #openstack-infra13:43
*** kashyap has joined #openstack-infra13:44
kashyapCan anyone link to the upstream live migration job, please?13:47
fungiykarel: the missing tripleo release tarballs from yesterday are due to the stale ubuntu mirror issue. we'll rerun the jobs to build and publish them as soon as that's sorted out13:48
*** gouthamr has joined #openstack-infra13:49
*** kiennt26 has joined #openstack-infra13:50
*** nikhil_ has joined #openstack-infra13:52
*** kiennt26 has quit IRC13:52
*** nikhil_ is now known as Guest4851613:52
*** kiennt26 has joined #openstack-infra13:53
*** Guest48516 is now known as nikhil_k13:53
*** andreas_s has quit IRC13:54
*** andreas_s has joined #openstack-infra13:55
*** kiennt26 has quit IRC13:55
*** kiennt26 has joined #openstack-infra13:55
ykarelfungi, Ok13:56
*** srobert has joined #openstack-infra13:56
*** srobert has quit IRC13:56
*** smatzek has joined #openstack-infra13:56
fungithe last mirror-update pass ianw started seems to still be running13:57
*** srobert has joined #openstack-infra13:57
openstackgerritPetr Kovar proposed openstack-infra/irc-meetings master: Update chair for doc team meeting  https://review.openstack.org/51148413:57
*** andreas_s has quit IRC13:57
AJaegerfungi, ianw sent a status report via email - on the infra list13:57
*** andreas_s has joined #openstack-infra13:57
fungiAJaeger: thanks, i'm caught up on irc scrollback so proceeding to e-mail backlog next13:58
fungialso, i seem to be having some massive packet loss from here today... not sure what's up13:58
fungi(my home broadband uplink i mean)13:58
AJaegernot good ;(13:59
*** sree has joined #openstack-infra14:00
Shrewsfungi: Oh no! Pack   loss i  not goo . Hope  t gets b tter fo  you.14:03
fungi;)14:03
*** gildub has quit IRC14:06
*** jaosorior has joined #openstack-infra14:09
AJaeger:)14:09
*** hamzy has quit IRC14:12
*** hongbin has joined #openstack-infra14:13
*** chlong has quit IRC14:13
*** eumel8 has quit IRC14:14
*** florianf has quit IRC14:15
*** florianf has joined #openstack-infra14:15
*** chlong has joined #openstack-infra14:19
pabelangerianw: thanks14:20
pabelangerYa, inodes look bad still14:21
jeblairi'm on mirror-update and looking through the screen windows and don't see one running reprepro14:22
fungiinfra-root: making a judgement call here, i think we're going to need to drop our retention on the logs site14:22
*** hemna_ has joined #openstack-infra14:23
fungijeblair: one of the screen windows was when i looked a moment ago14:23
jeblairnor do i see a reprepro process running14:23
pabelangerjeblair: yes, ianw's last update said it failed to write to AFS directories14:23
*** yamamoto has quit IRC14:23
pabelanger/tmp/ianw contains logs14:23
fungijeblair: no, wait, you're right. that was the flock busywait i saw14:23
jeblairpabelanger: where's that update?  last i saw ianw said it was still running.14:24
fungijeblair: infra mailing list14:24
pabelangerjeblair: ianw posted a reply to infra ML14:24
pabelangerI'm still getting up to speed myself14:24
jeblair"I restarted for good luck,"14:24
*** cshastri has quit IRC14:25
fungimay as well have been "for great justice"14:25
jeblairthat makes me think we should see either a process running, or a prompt right after a process died14:25
jeblairi can find neither14:25
*** rbrndt has joined #openstack-infra14:27
jeblairalso, do we still have the old images?  can we delete the new ones from nodepool to get things working again?14:27
*** psachin has quit IRC14:28
pabelangerlooking14:29
pabelangerwe still haven't upload to rackspace, so we should be good there for now14:29
pabelangerI don't think our other clouds have a good image anymore14:29
jeblairokay. we should have deleted the new images yesterday, as soon as this started.14:30
*** armax has joined #openstack-infra14:30
fungibut does nodepool remove the old image from disk if it hasn't been able to upload newer images?14:30
pabelangersince the AFS read-only mirror is working, we could also do the work needed to have DIBs use them14:30
jeblairfungi: good point, it does not.14:30
pabelangerwhich should fix our issues for now, but pin us at a specific version of xenial14:30
pabelangerfungi: yah14:30
fungithinking rackspace may have saved us here14:31
jeblairShrews: around?14:31
fungisince there are several people focusing on the mirror situation, i'll focus on the logs site14:31
jeblairfungi: ++14:31
jeblairfungi: i support whatever retention period you want to use :)14:31
fungiinfra-root: last call for objections, i'm planning to reduce our log retention from 4 to 3 weeks (and if that doesn't help fast enough, i'll drop it to 2)14:32
pabelanger++14:32
Shrewsjeblair: yes14:32
clarkbfungi: seems reasonable, its still just an inode problem right?14:32
jeblairShrews: can you and pabelanger work on getting the old rax images uploaded everywhere?14:33
fungiclarkb: yes, but we have so many that traversing them to find tempfiles or 4-week old files seems to be taking too long, so we need something with a higher hit-rate for now i expect14:33
jeblairthis is a tricky thing we've never done before, so best to proceed carefully14:33
clarkbfungi: maybe before deleting the old stuff really quickly do an inode count between some common jobs like tempest just to make sure we haven't regressed there too?14:34
pabelangerjeblair: Shrews: if we manually upload, we could use cloud-images section in nodepool?14:34
*** andreas_s has quit IRC14:34
pabelangerotherwise, will defer to Shrews14:34
*** andreas_s has joined #openstack-infra14:35
clarkbthough that will just tell us if it has changed, not what or why (so maybe less important)14:35
fungiclarkb: i'll see if i can spot anything real quick14:35
Shrewspabelanger: not sure. lemme catch up on things14:35
fungiclarkb: but my guess is that all these other unrelated issues are just resulting in higher log volume as jobs fail more quickly and people are rechecking them all14:36
clarkbya14:37
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace  https://review.openstack.org/51149214:37
jeblairpabelanger, Shrews: rolling forward is a good option too :)14:38
pabelangerjeblair: clarkb: fungi: ^in case we want to go this route. Should make our DIB builds use AFS mirrors for ubuntu14:38
fungiclarkb: actually, i'm not even sure i can easily get a representative sample since we're not successfully uploading logs14:38
*** xarses has joined #openstack-infra14:38
clarkbfungi: oh right ugh14:39
fungii need to find one which successfully managed to upload all its files14:39
pabelangerjeblair: clarkb: fungi: regardless of what we do, I believe we also want to disable nb03.o.o, as it has a slow uplink to clouds. Takes upwards of 6 hours to upload, vs 30mins on nb04.o.o14:39
*** caphrim007 has quit IRC14:39
*** links has joined #openstack-infra14:39
*** andreww has joined #openstack-infra14:41
pabelangerI've stopped nb03 for now14:41
*** andreww has quit IRC14:41
fungiclarkb: i expect that it will be easier to find the culprit(s) and address the issue once we have working log uploads again, so i'm going ahead with the >3-week purge now14:42
pabelangerOkay, I think ubuntu-xenial-0000001137 is the DIB we want to save14:42
pabelangerthat is our oldest ubuntu-xenial image14:42
Shrewspabelanger: so cloud-images is a feature/zuulv3 thing (builders aren't running that), so that's a no-go14:42
pabelangerkk14:43
*** xarses has quit IRC14:43
pabelangerubuntu-xenial-0000001138 is also an option I think14:43
*** andreww has joined #openstack-infra14:44
*** supertakumi86 has joined #openstack-infra14:44
pabelangerand believe that is what we are booting in rackspace now14:44
pabelangertrying to confirm14:44
Shrewspabelanger: is rebuilding a new image an option?14:44
openstackgerritMichael Turek proposed openstack/diskimage-builder master: Add iscsi-boot element  https://review.openstack.org/51149414:44
andreykurilinhi folks! There are a lot of POST_FAILURES in zuul_v2. Is it ok?14:44
pabelangerShrews: it is, but we'll need 51149214:44
pabelangerI'm happy to give it a try14:44
*** supertakumi86 has quit IRC14:45
openstackgerritMichael Turek proposed openstack/diskimage-builder master: Add iscsi-boot element  https://review.openstack.org/51149414:45
pabelangerinfra-root: do we want to roll forward an image with 511492 first?14:45
jeblairclarkb: want to work on a status alert?14:45
pabelangerbut, we should first copy ubuntu-xenial-0000001138 or ubuntu-xenial-0000001137 to be safe14:46
*** jkilpatr has quit IRC14:46
clarkbjeblair: sure14:46
jeblairpabelanger: whatever you and Shrews think is safest and quickest14:46
fungifor the current 3-week expiration pass i've also switched from -exec rm {} \; to -delete and removed the check for removing empty directories for now in hopes this will cover ground more quickly14:46
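For reference, an expiration pass along these lines might look roughly like the following; the path and the two-week cutoff are illustrative, not the exact command run on the server. find's -delete unlinks files in-process instead of forking an rm per file, which is why it covers ground faster than -exec:

    find /srv/static/logs -type f -mtime +14 -delete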
*** iyamahat has joined #openstack-infra14:46
pabelangerjeblair: ack14:47
pabelangerShrews: okay, so sounds like you want to try new image? I'll let you review 511492 and I'll save our DIBs14:47
clarkbhow does "Job log retention is being reduced to get inode consumption under control. Separately we are updating job instance images to use our ubuntu mirrors temporarily addressing the problems with Xenial packaging."14:48
clarkber how does that look14:48
Shrewspabelanger: i think so. if we delete the bad images, i think a new one is just going to be built anyway14:48
clarkbmaybe too verbose14:48
jeblairclarkb: maybe cover the problem and symptoms first.  i don't think folks need to know what we're doing14:48
fungis/updating/reverting/ maybe14:48
fungibut yeah14:49
pabelangerShrews: okay, lets land 511492 and give it a go14:49
Shrewspabelanger: +A'd14:49
*** iyamahat_ has joined #openstack-infra14:49
clarkb"Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow."14:50
clarkbthat better?14:50
jeblairclarkb: wfm14:50
fungiship it14:50
pabelanger++14:50
Shrewspabelanger: if nb03 is stopped, 0000001137 and 0000001138 will not be deleted, but good to make a backup anyway14:50
pabelangerShrews: ya, doing that in /opt/nodepool_dib.backup-pabelanger now14:51
*** iyamahat__ has joined #openstack-infra14:51
*** Swami has joined #openstack-infra14:51
Shrewspabelanger: once your change lands, we'll delete 0000001140 and 000000114114:51
clarkb#status alert Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow.14:51
openstackstatusclarkb: sending alert14:51
jeblairi'm going to add a new volume to the afs server because the debian volume has 7G free.   then i will increase the quota on that volume.  then i will reboot both the mirror-update server and afs01.dfw.  then i will attempt the mirror repair again.14:51
pabelangerclarkb: jobs that don't need gnutls, could stop using bindep-fallback.txt and properly add their own bindep.txt also14:51
pabelangernot long term, but help mitigate the issue14:52
*** iyamahat has quit IRC14:52
*** jkilpatr has joined #openstack-infra14:53
*** andreas_s has quit IRC14:53
*** iyamahat__ has quit IRC14:53
*** yamahata has joined #openstack-infra14:53
*** iyamahat__ has joined #openstack-infra14:53
-openstackstatus- NOTICE: Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow.14:53
*** ChanServ changes topic to "Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow."14:54
clarkbAnything that uses cryptography needs it though right? That is going to be many things14:54
clarkbthe subset that doesn't need it is probably small enough that identifying it is too much work14:54
*** rcernin has quit IRC14:55
*** iyamahat_ has quit IRC14:55
*** hamzy has joined #openstack-infra14:55
openstackstatusclarkb: finished sending alert14:57
*** iyamahat__ has quit IRC14:58
*** iyamahat has joined #openstack-infra14:58
pabelangerclarkb: Shrews: 511492 failed with POST_FAILURE. I suggest we add nodepool to emergency file, and manually apply. While we attempt to recheck it14:58
clarkbpabelanger: wfm, though its builders that need it?14:59
jeblairclarkb, pabelanger: is the npm mirror in production?14:59
pabelangerno, there is a patch up to remove it from AFS15:00
pabelangerclarkb: ya, nb0415:00
jeblairokay, i'll kill the process15:00
Shrewspabelanger: are you confident that will fix the build image? because the other option here is just pause image builds and delete the bad images and run with the older images for awhile15:01
jeblairoh that's nice, the npm mirror releases even if the process is killed15:01
*** eharney has quit IRC15:02
*** chlong has quit IRC15:02
*** andreas_s has joined #openstack-infra15:02
Shrewsbut if we know it will fix it, would be best to move forward with the fix, IMO15:02
clarkbShrews: I dont think any  of our images are old enough to work at this point15:03
jeblairclarkb: rax15:03
pabelangerShrews: no, we need to first upload the good images; only rackspace has them today15:03
clarkbright except in rax15:03
pabelangerso, that process needs to be manual15:03
jeblairclarkb: we have the images on disk15:03
clarkboh right we keep all the formats15:03
clarkbuntil all formats can be removed15:04
jeblairand this is why15:04
Shrewsoh, rax has the oldest ones. got it15:05
*** jaosorior has quit IRC15:05
pabelangerokay, patch manually applied15:07
pabelangerready to image-build xenial15:07
pabelangerclarkb: Shrews:^15:07
*** dhajare has quit IRC15:08
Shrewspabelanger: cool. kick it off15:08
pabelangerstarted15:09
evrardjphey, is there an env var I can use in my job to check if I am under zuul v3 or jenkins?15:09
pabelangerevrardjp: I think we said $(whoami)? If user is jenkins, then you are jenkins15:10
pabelangerotherwise it would be zuul15:10
evrardjpok15:10
AJaegerpabelanger: NO!15:10
pabelangerevrardjp: listen to AJaeger15:10
evrardjpI couldn't use that :)15:10
AJaegerevrardjp: check email by Monty, let me find it quickly...15:10
*** annp has joined #openstack-infra15:11
evrardjpAJaeger: let me search then15:11
pabelangerShrews: k, we are pulling packages from http://mirror.dfw.rax.openstack.org/ubuntu now15:11
AJaegerevrardjp: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123049.html15:11
evrardjpwow that was fast15:11
AJaeger;)15:11
AJaegerbbl, shutting down here so won't be able to read backscroll for some time...15:12
*** sree has quit IRC15:12
*** AJaeger has quit IRC15:12
evrardjpAJaeger but that's only openstack's zuul's behavior, I don't have a magic variable I can use outside if need be15:13
evrardjpat least I have something.15:13
evrardjpthanks!15:13
Shrewspabelanger: i think we'll need delete 1141 before the new one will upload. i see 1141 still uploading rax too15:14
*** iyamahat has quit IRC15:14
*** yamahata has quit IRC15:14
*** sadasu has joined #openstack-infra15:15
Shrews(my weechat session is picking a very poor time to randomly freeze on me)15:15
*** andreas_s has quit IRC15:15
*** eharney has joined #openstack-infra15:15
pabelangerk, we had a minor issue with --allow-unauthenticated, working on patch15:16
*** chlong has joined #openstack-infra15:16
jeblairis the rubygems mirror in production?15:19
pabelangerno, we are using reverse proxy for that also15:20
jeblairokay we *really* need to clean this stuff up15:20
fungiinfra-root: the three-week expiration is not gaining traction fast enough to keep pace with new log uploads either. i'm going to switch to a two-week expiration as a last-ditch before we have to consider disabling uploads for a while to bring utilization back down or randomly deleting trees of the filesystem, since using find to stat modify times is just too slow15:20
jeblairnpm and ruby are both making this work very difficult15:20
jeblairinfra-root: i'm going to delete both from afs15:21
fungijeblair: sounds good15:21
clarkbfungi: do you think stat might not be able to keep up? or are older logs less inody?15:22
fungiclarkb: i expect it's a combination of both of those plus we're uploading a lot more logs with zuul v3 also running check jobs15:22
fungiproblem is finding those newer inody job logs and purging them is at least as expensive if not moreso than the date-based expirations15:23
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace  https://review.openstack.org/51149215:23
fungiif we had the tree sharded by date/time somehow this would be a cinch15:23
pabelangerclarkb: Shrews: ^updates needed for AFS mirrors in DIBs. Logic come from nodepool-dsvm jobs15:23
pabelangerhelps if I git add15:24
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace  https://review.openstack.org/51149215:24
*** yamamoto has joined #openstack-infra15:24
fungiclarkb: and also as i said earlier, all the recent issues in general are causing people to recheck changes far more frequently in vain hope that they'll suddenly work15:24
clarkbpabelanger: looks like it should work15:25
pabelangerclarkb: I think we might have to ask tripleo to drop more things, /etc for example15:25
pabelangerwith 2 zuuls running and uploading to logs.o.o, I won't be surprised if tripleo jobs are eating all the inodes now15:26
clarkbpabelanger: ya once we've got some breathing room we'll need to gather data and see where inode usage is15:26
jeblairi'm going to restart afs02.dfw15:27
*** udesale has quit IRC15:27
*** electrofelix has quit IRC15:30
fungii have my fingers crossed on the current >2-week purge, but it's not looking good so far and i'm afraid we're going to have to do recursive deletes based on some sort of filesystem glob rather than by age to get things back to sanity before we can make further progress through less disruptive means15:30
jeblairi'm going to restart afs01.dfw now15:31
*** yamamoto has quit IRC15:31
fungilike, i can delete _all_ logs for specific jobs by name, or just remove jobs at random by wiping a high-level subdirectory or two15:31
jeblairand mirror-update15:31
*** dangers has joined #openstack-infra15:32
clarkbfungi: we might be able to construct some deletes based on change numbers to roughly correlate to dates?15:32
*** LindaWang has quit IRC15:32
fungithat would be a very rough correlation15:32
fungilike what, delete jobs for any change id numbers below a certain threshold?15:34
*** andreas_s has joined #openstack-infra15:34
*** Swami has quit IRC15:34
*** Swami has joined #openstack-infra15:35
*** hashar has joined #openstack-infra15:36
fungia quick listing of 6-digit change ids prior to 500000 says there are 4646 of those15:36
clarkbya15:36
clarkbthough at this point those may actually be relatively active as they would've aged out already on their own otherwise15:37
*** iyamahat has joined #openstack-infra15:37
*** vsaienk0 has quit IRC15:37
*** ykarel has quit IRC15:37
*** annp has quit IRC15:37
pabelangerI'm starting to like the idea of top-level hash by UTC date again15:38
*** iyamahat has quit IRC15:38
*** iyamahat has joined #openstack-infra15:38
jeblairlet's not redesign the system now15:38
*** ykarel has joined #openstack-infra15:38
pabelangerokay, ubuntu-xenial now building properly with latest 511492 applied15:39
pabelangerin devstack-cache element now15:39
*** e0ne has quit IRC15:40
*** andreas_s has quit IRC15:43
clarkbfind gate-tripleo-ci-centos-7-3nodes-multinode-nv | grep -v '/tmp/ansible' | wc -l reports 39549 /me finds some other comparisons15:43
*** yamamoto has joined #openstack-infra15:45
*** yamamoto has quit IRC15:45
clarkber that was for multiple runs, it's 9889 for a single run. 917 for a single run of the tempest multinode job15:46
pabelanger100x15:46
pabelanger:(15:46
*** links has quit IRC15:47
*** andreas_s has joined #openstack-infra15:48
clarkblegacy multinode job under zuulv3 is roughly in that 900 range too15:48
jeblairpabelanger: how do i run the reprepro _detect command?15:48
pabelangerjeblair: it would be: reprepro _detect15:49
pabelangerreprepro --configdir /etc/reprepro/ubuntu _detect maybe15:49
jeblairpabelanger: i ran:15:50
jeblaircd /afs/.openstack.org/mirror/ubuntu15:50
jeblairfind pool -type f -print | reprepro -b . _detect15:50
jeblairand then got:15:50
jeblairError opening config file './conf/distributions': No such file or directory(2)15:50
jeblairpabelanger: so i'm looking for the exact command you or ianw ran15:50
jeblairshould i not be doing the find thing?15:51
jeblairthat's what it said in step 3 of https://github.com/esc/reprepro/blob/master/docs/recovery which ianw linked15:51
ihrachysall those POST_FAILURE failures that happened on all my patches yesterday, are those gone and we can recheck?15:51
ihrachysfailures as in https://review.openstack.org/#/c/507966/15:51
jeblairihrachys: no, see channel topic15:51
ihrachysok thanks15:52
*** kiennt26 has quit IRC15:52
Shrewspabelanger: so... i'm not sure how any image uploads are working at all. i'm seeing shade exceptions during image upload in builder logs15:52
pabelangerShrews: not sure either, I haven't looked at logs yet15:53
*** vsaienk0 has joined #openstack-infra15:54
pabelangerjeblair: hmm, let me see, I didn't try _detect15:54
pabelangerjeblair: I think you need to pass --confdir /etc/reprepro/ubuntu to your reprepro command15:55
*** yamahata has joined #openstack-infra15:55
jeblairpabelanger: okay -- is that the command i should be running?15:55
*** egonzalez has quit IRC15:55
jeblairi'm basically trying to just follow whatever instructions you and ianw gave.  i thought it was clear, but it's becoming less so15:55
*** priteau has quit IRC15:55
jeblairpabelanger: like, i'm very confused why you don't think i should run _detect.  what do you think i should do?15:56
pabelangerjeblair: I am not sure, which document are you looking at currently? What is it you want reprepro to do?15:56
*** kashyap has left #openstack-infra15:56
jeblairhttps://github.com/esc/reprepro/blob/master/docs/recovery15:56
jeblairpabelanger: ianw said checksums.db was bad.  that says that's what you do when checksums.db is bad.15:56
jeblairpabelanger: is that not what you were doing yesterday?15:56
pabelangerno, I only tried step 1 (rereference) before I passed the torch to ianw15:57
pabelangerso, this is new process for me also15:57
jeblairpabelanger: how did you determine referencesdb was bad?15:57
pabelangerjeblair: I didn't, I never deleted referencedb, but just tried rereference to see if there was any corruption15:58
pabelangerthe command worked properly15:58
jeblairpabelanger: thanks.  i'll take it from here.15:58
pabelangerokay15:59
openstackgerritBrian Rosmaita proposed openstack-infra/project-config master: Remove workflow +1 on glance_store from swift-core  https://review.openstack.org/51151715:59
*** andreas_s has quit IRC16:00
*** links has joined #openstack-infra16:01
*** ralonsoh has quit IRC16:01
*** chlong has quit IRC16:03
*** tikitavi has quit IRC16:04
jeblair#status log removed mirror.npm volume from afs16:04
openstackstatusjeblair: finished logging16:04
*** ykarel is now known as ykarel|afk16:05
*** edmondsw has quit IRC16:05
pabelangerShrews: it looks like maybe just rackspace upload have the issue in shade. I can see an inap working in debug log16:06
*** erlon has quit IRC16:07
smatzekThe trove gate has been broken by one thing or another since the PTG. I've been working for 2-3 weeks trying to fix it up.  In the past couple days I've seen errors like this from the gate-trove-python27-ubuntu-xenial checks whereas the openstack-tox-py27 runs clean.  Is this a known issue with v3? "libcurl4-gnutls-dev : Depends: libcurl3-gnutls (= 7.47.0-1ubuntu2.2) but 7.47.0-1ubuntu2.3 is to be installed"16:08
smatzekhttp://logs.openstack.org/87/507087/15/check/gate-trove-python27-ubuntu-xenial/3168c43/console.html16:08
jeblairsmatzek: we're not running v3.  it's a known issue.  see topic.16:08
Shrewspabelanger: i think it's only providers using tasks to upload images16:09
pabelangerShrews: okay, I think that is only rackspace for us16:09
*** dangers has quit IRC16:09
smatzekthanks, I read the upload issue but glazed over the gnutls16:09
*** masber has quit IRC16:09
clarkbfungi: poking around the tmp/ansible fix definitely seems to have cut down on tripleo inode consumption but they are still about an order of magnitude more inodes per job run than say multinode devstack + tempest16:10
pabelangerwe're just compressing ubuntu-xenial DIB now, shouldn't be much longer before we start uploads16:10
*** andreas_s has joined #openstack-infra16:10
pabelangersmatzek: hopefully not much longer, new images should be coming online in the next hour16:10
*** dangers has joined #openstack-infra16:10
*** camunoz has quit IRC16:11
clarkband are significantly smaller under zuulv3 I think because log collection is broken there for them for some reason16:11
fungiclarkb: good to know. we _could_ just delete the logs for those jobs specifically. is there a solid file glob i could match on to get all those?16:11
*** edmondsw has joined #openstack-infra16:12
fungitrying to do it by arbitrary pattern matching with find is not going to be fast enough16:12
clarkbfungi: gate-tripleo-ci- is the job name prefix16:12
clarkbthe ara install at top level of all zuulv3 jobs is coming in at 400-600 inodes depending on job looks like16:13
clarkbwhich may significantly bump inode overhead for all the things that weren't really copying many files before hand16:13
Shrewspabelanger: i am REALLY confused as to how rax has _any_ uploads16:13
*** dhinesh has joined #openstack-infra16:14
*** andreas_s has quit IRC16:14
clarkb455 inodes for nova pep8 under zuulv3, 24 under v216:15
jeblairpabelanger: i have a dumb question -- is it possible for us to just copy the db files from the most recent read-only release into place, and then run reprepro normally?16:15
clarkb441 of the v3 side is ara16:15
jeblairfungi, clarkb: ^ do you know enough about reprepro to know if that's okay?16:15
*** AJaeger has joined #openstack-infra16:16
*** edmondsw has quit IRC16:16
fungijeblair: that _seems_ like it should be okay, but i don't really know. i get the impression that's where it keeps its state anyway so makes sense that it should be able to roll forward again from there16:16
pabelangerjeblair: I don't see why we can't try. reprepro should be smart enough to detect differences and update where needed. reprepro check and reprepro checkpool should be how we audit16:16
*** priteau has joined #openstack-infra16:16
jeblairya -- like maybe it re-downloads some new files or something. that'd be fine.16:16
jeblairokay, i'll give that a shot16:16
jeblairand verify with pabelanger's suggested commands16:17
fungipabelanger: unfortunately the checks seem to be designed to retrieve every file from the filesystem (so over slow udp datagrams in the case of afs) to recalculate checksums, right?16:17
jeblairfungi: well, that's been the repair process to date anyway16:17
fungior does it have a check mode to just verify filenames?16:18
* SpamapS peeking back in and seeing reprepro and inode issues... quickly retreats like a groundhog seeing his shadow16:18
*** dangers` has joined #openstack-infra16:19
*** AJaeger has quit IRC16:19
*** AJaeger has joined #openstack-infra16:19
*** dangers has quit IRC16:20
fungiclarkb: are there specific paths under those gate-tripleo-ci-* log trees i should remove, or are those scattered and better to just remove the entire tree for each job matching that name pattern?16:20
SpamapSreprepro's check process likely is also reading all of the metadata from every package.16:20
clarkbfungi: logs/undercloud/tmp/ansible logs/ara_oooq logs/undercloud/etc seem to be large hitters16:21
*** AJaeger has quit IRC16:21
*** sadasu has quit IRC16:21
clarkbfungi: I'm currently trying to get a count for what I hope is a representative nova change to see if we should expect to be able to store 4 weeks of nova change logs16:21
clarkbchange 509039 has ~157208 inodes16:22
mordredmorning all16:22
*** AJaeger has joined #openstack-infra16:22
clarkbwe have inodes for about 5122 nova changes if we treat that as representative16:22
*** camunoz has joined #openstack-infra16:22
mordredclarkb: holy crap! - 157208 is a lot of inodes for one change16:22
pabelangerfungi: yah, it was a slow process last night when I did reprepro export, that walked all the files in the pool for generating indexes16:22
clarkbmordred: ya I'm going to start trying to break that down16:23
*** andreas_s has joined #openstack-infra16:24
fungiclarkb: thanks, i'm still waiting for ls | wc -l to return a count for the pattern /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*16:24
fungiwell, ls -d specifically16:24
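Counting by shell glob rather than a stat-heavy find walk might look like the following; printf is a shell builtin, so the huge expansion avoids the argv limit that exec'ing ls with the same glob can hit:

    printf '%s\n' /srv/static/logs/??/??????/*/*/gate-tripleo-ci-* | wc -l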
pabelangerclarkb: as I am watching this DIB rebuild again, I'm noticing we're approaching 1 hour build times again. I think it is possible we might want to delete our git cache on builders for a fresh (and smaller) start shortly16:26
stephenfindhellmann: Got a few mins?16:27
stephenfinCurious about the rationale behind pre/post-versioning in pbr16:27
dhellmannstephenfin : I'm on a call. ~30 min?16:27
* stephenfin didn't know you could do 'Sem-Ver:' trailers16:27
stephenfindhellmann: I'll probably be gone by then. Tomorrow is fine :)16:27
dhellmannstephenfin : ok. or email to the -dev list (this sounds like something others might be interested in and have input into)16:28
jeblairit looks like the checksumsdb on the ro volume is corrupt, so i'm back to running the find | _detect command16:28
AJaegerfungi, what about removing /srv/static/logs/0b0bbd59a9be905da869ace3797919f9cd6217/ etc - those are logs that nobody finds...16:28
AJaegerthese came from initial Zuul v3 logs16:28
clarkbmordred: the check logs for patchset 4 of that change come to 11637316:28
clarkblooks like ~8 rechecks16:28
clarkbI think there is a lot of weight behind the "constant rechecks are just making it worse" theory based on ^16:29
jeblairi am running that command with a copy of the db directory on local disk, so if there is any trouble writing to the fileserver, we shouldn't lose the whole operation.16:29
mordredclarkb: yah - that makes an amount of sense16:29
pabelangerclarkb: Shrews: ubuntu-xenial DIB finished, we've started uploads16:29
jeblairbut as ianw calculated, best case for this is probably 6 hours16:30
Shrewspabelanger: ++16:30
clarkbopenstack-tox-pep8 is ~1800 and old pep8 job is ~200 over those rechecks16:30
clarkbmaybe we should consider not building a static ara for every build? we could maybe just upload them on failures?16:30
*** jkilpatr_ has joined #openstack-infra16:31
*** jpich has quit IRC16:31
fungiAJaeger: sure, i can do that and maybe it'll free up some as well16:31
pabelangerclarkb: wow, large difference16:32
*** jkilpatr has quit IRC16:32
pabelangerclarkb: sounds like we need to update ARA regardless. But I do only look at it today if there is a failure16:32
*** andreas_s has quit IRC16:33
*** trown is now known as trown|lunch16:33
clarkbpabelanger: ya me too, which is why I had that idea:) it is really handy for understanding failures, but skimming successes tends to happen in the job output log for me (or job specific logs)16:33
pabelangerclarkb: maybe we can see how much effort would be involved from dmsimard to clean up little files16:34
Shrewspabelanger: I'm actually unclear as to why those are already uploading since 1141 is less than 24 hours old16:34
*** ykarel|afk has quit IRC16:34
*** sambetts is now known as sambetts|afk16:34
pabelangerShrews: not sure myself16:34
SpamapSis there a summary of why some jobs take up so many inodes? Purely curious.16:35
*** Apoorva has joined #openstack-infra16:35
*** Guest66337 has quit IRC16:35
clarkbSpamapS: in the general case, ara seems to be a big hitter. In specific cases some tripleo jobs were copying all of ansibles /tmp contents16:36
*** hashar is now known as hasharAway16:36
clarkbSpamapS: there are also bits of some jobs like tripleo that copy a good chunk of /etc, which if grabbing /etc/selinux gets about an ara's worth of files too16:36
*** vsaienk0 has quit IRC16:36
pabelanger2017-10-12 16:36:50,057 INFO nodepool.builder.UploadWorker.0: Image build ubuntu-xenial-0000001155 in ovh-bhs1 is ready16:37
pabelangerOVH nice and fast :)16:37
clarkboh also tripleo has its internal ara_oooq which is much larger than the zuul ara16:37
jeblairclarkb: ara is optional in zuulv3; the process to disable it is just to uninstall it and restart the executors.16:38
pabelangerclarkb: we could propose disabling ara_oooq now, as zuulv3 would make it unnecessary?16:38
Shrewspabelanger: oh, nm. uploads happen anytime. dib rebuilds happen only after 24 hours16:38
SpamapSclarkb: if it's just for debugging and not quick viewing... tar instead of rsync?16:38
jeblairSpamapS: that would likely be so difficult to use as to not be worthwhile; i would find it easier to just read the raw json.16:39
pabelangerSpamapS: clarkb: I think stackviz has the right idea of what it does, a single json file IIRC, to render data from16:39
bodenhi, recently I’ve seen a number of failures in the v2 jobs “ERROR: These requested packages were not installed..”  is this a known issue?16:39
pabelangerboden: it is known, we are pushing up new images now to try and fix it16:39
jeblairi think i can channel dmsimard here and say the right way to use ara in this situation is with a centralized reporting database.16:39
dmsimardI don't believe there is a short term opportunity to reduce the amount of files generated in an ARA report, it's basically one html file per result/host/etc. The best option is to consider a centralized instance of sorts (not unlike openstack-health)16:40
fungiSpamapS: if i were to redesign this entire system, i'm starting to think that it would have made more sense to archive a tarball of logs from each build and then have a human-friendly frontend which temporarily unpacks and serves it on demand... but then you also get the benefits that anyone or any system who wants to grab all the logs for a build can just pull that tarball16:40
dmsimardjeblair: thou hath summoned me16:40
AJaegerwe also store both  job-output.json.gz  and job-output.txt.gz - and the json is 3 times as large as the txt16:40
jeblairdmsimard: nailed it!16:40
bodenpabelanger: ack16:40
bodenthanks16:40
pabelangerjeblair: dmsimard: woah16:40
jeblairAJaeger: the json has all the information in it; we still haven't gotten the text quite right yet.  it very frequently does not have info we need to diagnose errors.16:41
SpamapSjeblair: I was just thinking for the instances where people try to grab all of /etc, not ARA.16:41
AJaegerjeblair: ack16:41
jeblairSpamapS: ah16:41
dmsimardThe thing about a centralized instance is that we don't want every ansible run to synchronously report each task/result to a central (mysql) database remotely, just the added latency I suspect would be noticeable -- especially for regions farther away from the database server.16:42
dmsimardWe would need to asynchronously import the data, somehow.16:42
clarkbSpamapS: well we've also asked that that stop happening and it has improved over time, but we're still finding cases here and there where I think it must be a blacklist instead of a whitelist of copies16:42
dmsimardA quick hack would likely be to recover the sqlite database and import it in a central location a bit like we trigger logstash things16:43
openstackgerritwes hayutin proposed openstack-infra/tripleo-ci master: be more prescriptive in log collection  https://review.openstack.org/51152616:43
SpamapSclarkb: we should du -i in the executor too.16:43
dmsimardSo we wouldn't generate the report on each job but we would copy the sqlite database. The database is pretty small.16:43
SpamapSDiskAccountant should have stopped this.16:43
SpamapSWe're limiting on storage bytes, but we should also limit inodes.16:44
jeblairSpamapS: our limit is super high until after the cutover because we have to support some legacy jobs that use a lot of space16:44
jeblairSpamapS: also, we still haven't cut over16:44
SpamapSthis happened on log storage from check jobs yeah?16:44
dmsimardI'd need to think about the process involved in importing databases over and over.16:44
SpamapSAnd just the duplicated check load?16:44
jeblairso i'd like to suggest that we focus this conversation right now to whether we need to make any emergency changes to zuulv3 right now to stop our inode use?16:44
jeblairbecause the ci system has had at least a partial outage for over a day16:45
jeblairand we should focus on nothing other than correcting that now.16:45
jeblairwhen we clear that status alert, we can talk about what to do later in v316:45
clarkbjeblair: I think it is definitely a large overhead for the previously "small" jobs16:45
*** yamamoto has joined #openstack-infra16:45
clarkber ara by default for each job is16:45
dmsimardara is probably a nice to have, if it can help alleviate the load we can toggle it off -- no need to do it by uninstalling ara from the executor IMO, we could just disable the generation from inside the role that does the generation16:46
pabelangerIf ARA is a large amount of inodes over v2, then (reluctantly) I'd be in favor of disabling it on zuulv3 for now16:46
clarkbI'd be happy trying it with just failed jobs to start if that is easy16:46
SpamapSRight, I was suggesting we have DiskAccountant kill jobs that abuse the inode table of the executor. But I guess the problem is actually not the executor running out of inodes, but the end target running out.16:46
jeblairhow about we just turn off the v3 check pipeline right now?16:46
*** armaan has joined #openstack-infra16:46
clarkbthat works too16:46
*** andreas_s has joined #openstack-infra16:46
pabelangersure16:46
fungithat may help drop some load. even with `rm -rf /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/{undercloud/tmp/ansible,ara_oooq,undercloud/etc}` going we're still not gaining ground16:46
*** smatzek has quit IRC16:46
jeblairmordred: are you available?16:46
jeblairguess not16:47
*** smatzek has joined #openstack-infra16:47
jeblairclarkb: you want to disable v3 check?16:47
SpamapSYeah I think that's the thing to do.16:47
dmsimardI'll go ahead and propose a toggle to disable the ara report generation just in case, it could come handy in the future16:47
clarkbjeblair: ya I think we should to try and get logs fs into a happier state16:48
fungibasically, i think short of burning down whole swaths of the logs tree, i don't think we can delete files faster than we're uploading them at the moment16:48
jeblairclarkb: sorry, i meant, are you available to make that change?16:48
clarkboh yes, I can16:48
*** dimak has quit IRC16:48
jeblairclarkb: cool, it's yours16:48
clarkbI'll push that up momentarily16:48
fungii suppose another option is we could artificially constrain our nodepool quota so we run fewer jobs at a time16:49
fungii mean, after we stop the v3 check pipeline16:49
*** leyal has quit IRC16:49
*** lihi has quit IRC16:50
*** oanson has quit IRC16:50
*** oanson has joined #openstack-infra16:50
fungibut let's see if this makes a significant dent first, i guess16:51
*** smatzek has quit IRC16:51
openstackgerritClark Boylan proposed openstack-infra/project-config master: Disable zuulv3 check pipeline  https://review.openstack.org/51152716:51
clarkbdoes that look right?16:51
pabelangerlooking16:52
*** yamamoto has quit IRC16:52
*** lucasagomes is now known as lucas-afk16:52
*** Swami has quit IRC16:53
mordredjeblair: yes! I am here16:53
pabelangeryah, think so16:53
mordred+2 on turning of v3 check16:54
*** dimak has joined #openstack-infra16:54
*** leyal has joined #openstack-infra16:54
*** lihi has joined #openstack-infra16:55
jeblairclarkb: zuul reported back with an expected post-fail.  that means the syntax check passed.  i say force-merge it now.16:56
mordred++16:56
clarkbok on it16:56
*** smatzek has joined #openstack-infra16:57
pabelangerperiodic is also large on zuulv3 (303 patches), so zuulv3 might start processing them with no check16:57
clarkbdo we want to disable periodic too?16:57
jeblairgood point.  i'm in favor of disabling periodic16:58
openstackgerritMerged openstack-infra/project-config master: Disable zuulv3 check pipeline  https://review.openstack.org/51152716:58
clarkbok working on a patch for periodic now16:58
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Add a toggle to disable ARA static report generation  https://review.openstack.org/51152816:58
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Add a toggle to enable saving the ARA sqlite database  https://review.openstack.org/51152916:58
dmsimardinfra-root ^16:58
*** derekh has quit IRC16:58
*** andreas_s has quit IRC17:00
openstackgerritClark Boylan proposed openstack-infra/project-config master: Similarly to disabling check, disable periodic  https://review.openstack.org/51153017:00
clarkbthere is periodic17:00
*** Goneri has joined #openstack-infra17:01
*** baoli has quit IRC17:01
jeblairclarkb: i think you need to leave the pipelines and change the trigger to "trigger: {}"17:02
clarkbthanks17:02
clarkb(just saw the error with jobs trying to use a pipeline that no longer exists)17:02
openstackgerritClark Boylan proposed openstack-infra/project-config master: Similarly to disabling check, disable periodic  https://review.openstack.org/51153017:03
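The shape of that change, per the advice above, is to keep each pipeline defined but empty its trigger so nothing new gets enqueued; a minimal sketch with the other pipeline fields elided:

    - pipeline:
        name: periodic
        manager: independent
        trigger: {}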
*** panda|rover is now known as panda|rover|off17:04
mordreddmsimard: both lgtm17:04
*** caphrim007 has joined #openstack-infra17:05
*** eroux has joined #openstack-infra17:06
*** priteau has quit IRC17:06
*** baoli has joined #openstack-infra17:07
*** jbadiapa has quit IRC17:07
*** baoli has quit IRC17:09
*** baoli has joined #openstack-infra17:10
*** dangers` has quit IRC17:10
*** iyamahat has quit IRC17:10
*** iyamahat has joined #openstack-infra17:10
clarkb511530 has no post failures, ready for me to merge it?17:10
pabelanger++17:11
*** baoli has quit IRC17:11
*** tesseract has quit IRC17:11
*** baoli has joined #openstack-infra17:11
mordredclarkb: wfm17:12
openstackgerritMerged openstack-infra/project-config master: Similarly to disabling check, disable periodic  https://review.openstack.org/51153017:12
clarkbmordred: want me to remove you from project bootstrappers when I remove myself?17:12
*** sree has joined #openstack-infra17:12
*** dangers has joined #openstack-infra17:13
*** caphrim007_ has joined #openstack-infra17:13
pabelangerjust citycloud-sto2 and infracloud (both regions) left for latest ubuntu-xenial DIB17:13
mordredclarkb: yes please17:13
pabelangerwe should be seeing some results of bindep-fallback.txt already17:13
clarkbmordred: done17:13
pabelangergoing to try and find a log17:13
*** ociuhandu has quit IRC17:14
pabelangerboden: which review did you see the failure on?17:15
openstackgerritMonty Taylor proposed openstack-infra/shade master: Temporarily disable volume and os_image functional tests  https://review.openstack.org/50815617:15
openstackgerritMonty Taylor proposed openstack-infra/shade master: Fix image task uploads  https://review.openstack.org/51153217:15
mordredShrews: ^^ those should help/fix the image upload issue17:15
pabelangerboden: okay, I found 51022417:16
bodenpabelanger anything recent in neutron-lib… for example https://review.openstack.org/#/c/502416/  https://review.openstack.org/#/c/510224/17:16
*** caphrim007 has quit IRC17:16
pabelangerboden: thanks17:16
mordredShrews: or at least help with the log spam - since I think the uploads are actually accidentally occurring correctly whilst we log a bunch of errors - but logging errors while a thing works cloud-side means finding real errors is unpossible17:17
clarkbdf shows IFree appears to be slowly increasing17:17
clarkbcourse now that i have said that...17:17
*** sree has quit IRC17:17
clarkbalso would be nice if du had -i on that server, oh well17:18
mordredclarkb: yah- it rises for a bit and then gets cratered17:18
*** dhinesh has quit IRC17:18
bodenpabelanger: also still seeing issues with vmware-nsx… ex: https://review.openstack.org/#/c/509661/17:19
clarkbmordred: its over 100k now at least17:19
mordredclarkb: \o/17:19
pabelangerboden: okay, just confirming it is fixed with 51022417:19
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Add override-branch to all periodic jobs  https://review.openstack.org/51153317:19
pabelangerboden: I would also suggest adding your own bindep.txt to both projects, and figuring out which OS packages you need bindep to install. Possibly you might be able to mitigate the errors from today, since you are using bindep-fallback.txt right now17:20
bodenpabelanger: ok, I’ll have to read up on that17:21
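A minimal bindep.txt along the lines pabelanger suggests might look like this; the package names are illustrative, and each project would list only what its own jobs actually need:

    cat > bindep.txt <<'EOF'
    # compile-time deps for common Python C extensions
    build-essential [platform:dpkg]
    libssl-dev [platform:dpkg]
    libffi-dev [platform:dpkg]
    openssl-devel [platform:rpm]
    libffi-devel [platform:rpm]
    EOF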
clarkbwow and back down to 80k ish17:21
AJaegerwe also had broken periodic jobs in v3, see 511533 - we didn't specify override-branch and just run the job for each branch...17:21
pabelangerwill need to recheck 510224, all jobs ran on rackspace17:22
*** slaweq_ has joined #openstack-infra17:23
inc0good morning, zuulv3 is down?17:23
pabelangeryes, we are stopping check pipelines until we can recover logs.o.o17:23
pabelanger(on zuulv3)17:23
inc0ok17:23
pabelangerclarkb: should we restart zuulv3 to dump pipelines? Or let it run out17:23
clarkboh right it won't dump them on its own17:24
clarkbjeblair: ^ what do you thinK/17:24
AJaegerteam, we have 303 changes currently in periodic and 117 in check17:24
jeblairclarkb: yes we should17:24
AJaegerpabelanger: yes!17:24
*** slaweq_ has quit IRC17:24
jeblairi will do it17:24
clarkbjeblair: thanks17:24
pabelanger++17:24
clarkband now down to 45k ish inodes17:24
clarkbso ya not keeping up17:25
pabelangerShrews: just citycloud-sto2 left for ubuntu-xenial DIB17:25
pabelangerand rackspace of course17:25
jeblairzuulv3 restarting17:25
*** felipemonteiro has joined #openstack-infra17:26
*** links has quit IRC17:27
clarkbwe are back down to 0 free inodes17:27
pabelangerokay, confirmed gnutls package is no longer breaking on xenial with ovh17:27
pabelangertelnet://158.69.88.129:1988517:27
pabelangerclarkb: care to +3 51149217:28
*** shardy has quit IRC17:29
pabelangerthat's what we used for ubuntu DIBs17:29
jeblairas best as i can tell, the checksum correction will take more than 7 more hours.17:29
pabelangerkk17:29
clarkbpabelanger: done17:30
fungipabelanger: excellent job!17:30
clarkbwatching df -i output I am imagining a game of hungry hungry hippos17:30
fungii take it we're still waiting on uploads to complete elsewhere17:30
fungiclarkb: yes, that's a great analogy17:31
jlkAJaeger: reviewed. I don't think anything else needs root.17:31
*** pblaho has quit IRC17:31
pabelangerYah, as long as we don't vos release our ubuntu AFS mirror, we should be protected until we can repair read/write.  Just means we've pinned for a bit17:31
AJaegerjlk: great, thanks17:31
fungiclarkb: heh, one of my polls (i have watch reporting once a minute on inode count for that filesystem) showed 1 free inode17:32
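The once-a-minute poll fungi describes is easy to reproduce; a sketch, assuming the logs volume is mounted at /srv/static/logs:

    watch -n 60 df -i /srv/static/logs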
pabelangerwe also could pause ubuntu DIBs too, if we felt the need17:32
jlkAJaeger: You'll need infra root to push it through though.17:32
*** sree has joined #openstack-infra17:32
AJaegerjlk: why that?17:32
clarkbis there an easy way to strace ssh on static.o.o and filter writes to logs/ ?17:32
clarkband maybe we can see realtime what is going into the fs?17:33
jlkor maybe not infra root, but people with more voting rights than I have :D17:33
mordredjlk, AJaeger: which change?17:33
clarkbclarkb@static:/srv/static/logs/61$ find 509761 | wc -l returns 284098517:33
fungiclarkb: sshd forks on each incoming connection, so you'd need to -f17:33
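A rough sketch of what clarkb asks about, following fungi's note about forked children: attach to the master sshd with -f, trace file-creating syscalls, and filter for the logs tree (the pgrep expression is illustrative):

    sudo strace -f -e trace=open,openat,creat -p "$(pgrep -o -x sshd)" 2>&1 \
        | grep -F '/srv/static/logs/'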
AJaegerjlk: yes, I know...17:33
pabelangerclarkb: Oh, I guess we should also disable the check-tripleo pipeline for zuulv317:33
pabelangersince they will run tripleo jobs17:33
mordredclarkb: holy crap17:34
jeblairpabelanger: please update commit message on https://review.openstack.org/473911 so we can merge the change.  i've already manually removed the crontab entries.17:34
AJaegermordred: https://review.openstack.org/511396 , https://review.openstack.org/511436  ,  https://review.openstack.org/51143517:34
pabelangerjeblair: ack17:34
clarkba single tripleo change is .4% of our inode total17:34
SamYaplehaha wow17:35
*** sshnaidm is now known as sshnaidm|off17:35
AJaegerjeblair: let me do a proposal for periodic translation jobs...17:36
jeblairAJaeger: let's check with mordred and see which approach he favors; he's thought about this more in the context of the migrated jobs17:36
*** ociuhandu has joined #openstack-infra17:36
clarkbI need to pop out for breakfast now. If someone else is able to get check-tripleo disabled in v3 that would be great17:36
jeblairclarkb: can you elaborate?  i don't know what you're asking17:37
mordredAJaeger, jeblair: those patches look good to me - what's the other approach?17:37
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Remove npm / rubygem crontab entries  https://review.openstack.org/47391117:37
pabelangerjeblair: fungi: ^more info in commit message for rubygems / npm17:38
jeblairmordred: https://review.openstack.org/511533 i think is the change we're discussing with AJaeger17:38
AJaegermordred, jeblair what about http://paste.openstack.org/show/623489/ instead of https://review.openstack.org/#/c/511436/1/zuul.d/jobs.yaml ?17:38
jeblairi've re-enabled all of the crontab entries on mirror-update except ubuntu17:38
AJaegerjeblair: problem with that approach is that we have some repos that don't run the job on all branches17:38
*** felipemonteiro has quit IRC17:39
jeblairAJaeger: you're saying some projects only run propose-translation-update on master, but some run on all branches?  in which case, put no branch matcher on the job, but do add one to the project-pipeline invocation of the job.17:42
jeblairAJaeger: also, branches can be a yaml list, so you don't have to do regexes any more17:42
*** SumitNaiksatam has joined #openstack-infra17:43
AJaegerjeblair: most projects run the translation proposal only on master, some on stable/pike and ocata, some only on pike, others only on ocata17:43
clarkbjeblair: pabelanger pointed out we are still running check tripleo in zuulv317:43
pabelangerclarkb: mordred: what if you did http://logs.openstack.org/07/472607/ ? what is the number of inodes on that?17:43
AJaegerall depending on whether translations were ready at that time.17:43
jeblairthis is certainly how new periodic jobs should be constructed.  the question i'd like mordred to weigh in on is whether we should go ahead and do this for these broken legacy jobs, or is the other approach better.17:44
*** trown|lunch is now known as trown17:44
jeblairmordred you +2d 511533 without any feedback on my comments, so i guess that means he favors your approach.  that's fine.17:44
mordredwait17:44
jeblairwow that didn't make sense17:44
jeblairmordred you +2d 511533 without any feedback on my comments, so i guess that means you favor AJaeger's approach.17:45
jeblairi did not switch conversation partners well17:45
jeblairanyway, i need to focus on fires17:45
mordredjeblair, AJaeger: I'm sorry, I feel completely lost. I do not understand how branch matchers can have any impact on periodic jobs17:45
jeblairso let's pin this for later.17:45
jeblairclarkb: i'll take care of it17:45
*** apetrich has quit IRC17:45
mordredok. I've removed my vote (which I gave having missed jeblair's comments) - and yes, I'd like to pin it for later because I'm not being a good participant in it right now, it seems17:46
jeblairwe need to focus on the 30-hour long ci outage now.17:46
*** armaan_ has joined #openstack-infra17:46
AJaegerjeblair: ok - we can discuss later, I'll add to etherpad17:46
mordredyup17:46
*** edmondsw has joined #openstack-infra17:46
pabelangerokay, gnutls is under control now. I've seen a few projects working properly again with bindep-fallback.txt17:48
*** apetrich has joined #openstack-infra17:48
pabelangerwhat can I help with now?17:48
openstackgerritJames E. Blair proposed openstack-infra/project-config master: Disable trigger for v3 check-tripleo pipeline  https://review.openstack.org/51154017:48
jeblairpabelanger: did you end up creating and uploading new images?17:49
*** Swami has joined #openstack-infra17:49
pabelangerjeblair: yes, we used AFS mirrors for packages in the xenial DIB; once we built and uploaded new images, the issue resolved itself17:49
*** armaan has quit IRC17:49
*** dhinesh has joined #openstack-infra17:49
jeblairpabelanger: and they're uploaded to all regions?17:49
pabelangerjeblair: except rackspace, because of a shade bug17:50
jeblairok.  and that's fine because they have working old images17:50
pabelangerand rackspace is running 3 day old images, that are not affected17:50
pabelangeryah17:50
*** sree has quit IRC17:50
ShrewsThey should eventually complete in rax17:50
*** rwsu has quit IRC17:51
pabelangeronce https://review.openstack.org/511492/ lands, we can also remove nb04 from the emergency file17:51
pabelangerI recommend we keep nb03 disabled, and potentially rebuild it in rackspace for faster DIB uploads17:51
*** sree has joined #openstack-infra17:51
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add new translation templates  https://review.openstack.org/51154117:52
mordredthe longest portion of rackspace 'upload' time is actually the image import step I believe (which is where the bug is)17:52
jeblairpabelanger, mordred: +3 https://review.openstack.org/511540  ?17:52
jeblairfungi: do you need any help with logs?17:53
mordredjeblair: done17:53
pabelangershould I look at updating base jobs in zuulv3 with ARA disabled?17:53
jeblairpabelanger: no17:53
pabelangerkk17:53
jeblairi'm not ready to talk about anything that isn't directly related to clearing that status message17:54
*** dhinesh has quit IRC17:54
*** dhinesh has joined #openstack-infra17:54
*** armaan_ has quit IRC17:54
fungijeblair: i think i've got it deleting the two most effective things we can hope for at the moment (subtrees of tripleo jobs clarkb identified, and any logs older than 2 weeks). it's _almost_ keeping up now, and as we finish winding down the other high-use pipelines on zuulv3 i have hopes it'll finally gain ground17:54
*** armaan has joined #openstack-infra17:55
*** slaweq_ has joined #openstack-infra17:55
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Add new translation templates  https://review.openstack.org/51154117:55
fungii haven't seen it break 100k free inodes yet (polling once a minute) but it's been a little while since i've seen it at 0 free (i've seen a few sub-1k though)17:56
*** sree has quit IRC17:56
*** baoli has quit IRC17:56
fungias opposed to earlier where it was basically pegged to 0 free on every poll17:56
fungino, wait, there it just came back 0 again17:56
SamYaplefungi: you jinxed it!17:56
fungiindeed :/17:56
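The once-a-minute polling described above amounts to something like this sketch; df -i reports free inodes in the IFree column:

    watch -n 60 df -i /srv/static/logs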
pabelangerokay, I'll work on getting system-config jobs working17:56
*** baoli has joined #openstack-infra17:57
pabelangerlooks like openstackci-beaker is failing17:57
jeblairpabelanger: thanks17:57
*** jascott1 has joined #openstack-infra17:57
mordredas soon as the latest patch from jeblair lands we should likely restart zuul again to clear that pipeline yeah?17:57
jeblairmordred: there's only one thing in it now, so if it lands soon, no big deal17:57
fungiprobably so17:57
mordredoh - ok. cool17:57
fungioh, good point we already restarted so that cleared it out anyway17:58
*** camunoz has quit IRC17:58
fungiand there hasn't been a lot of time for it to accumulate again17:58
*** armaan has quit IRC17:59
*** trown is now known as trown|brb18:00
mordredShrews: https://review.openstack.org/#/c/508156/ came back green - wanna +A it?18:01
mordredShrews: the follow up also is mostly green - the red is POST_FAILURE from the current incident18:01
*** dangers has quit IRC18:02
*** eharney has quit IRC18:03
openstackgerritPaul Belanger proposed openstack-infra/puppet-openstack_infra_spec_helper master: Cap signet < 0.8.0  https://review.openstack.org/51154318:03
fungihuh, no more rsync processes on static.o.o for the past few minutes, and now we're up over 100k free inodes18:04
fungiover 200k free18:05
*** dprince has quit IRC18:05
mordredfungi: I haven't seen that number over 200k in a WHILE18:07
pabelangerjeblair: okay, I've confirmed ubuntu-trusty is also affected by the gnutls issue. I've started an image-build for ubuntu-trusty now18:07
*** dangers has joined #openstack-infra18:08
*** ijw has joined #openstack-infra18:08
*** dhinesh has quit IRC18:08
*** dhinesh has joined #openstack-infra18:09
*** trown|brb is now known as trown18:09
jeblairi'm looking into what's holding up 51139618:09
*** dhinesh has quit IRC18:09
pabelangerYah, was just taking a peek myself18:10
pabelangerwe have a lot of ready nodes on nl01, and few currently building18:10
jeblairslow building node in inap-mtl0118:11
pabelangermordred: fungi: clarkb: https://review.openstack.org/511543/ is the fix for system-config openstackci-beaker jobs, if you'd like to review. I'm working on fixing ubuntu-trusty issue now18:11
pabelangerjeblair: maybe trying to boot new xenial image18:11
fungithanks pabelanger!18:11
jeblairpabelanger: does that take 30m?18:12
pabelangerjeblair: yah, I've seen upwards of an hour18:12
mgagnejeblair: have new images been uploaded? is Nodepool playing catch up since yesterday?18:12
jeblairwe should maybe set the launch timeout to 10m there like rax18:12
fungii suppose it could if there's a thundering herd on the storage distribution network warming nova image caches18:12
pabelangerjeblair: +118:13
jeblairmgagne: yes new images18:13
pabelangerinap boots fast, so 10mins should be plenty18:13
*** camunoz has joined #openstack-infra18:14
Shrewsmordred: +2'd the first, the follow up has a comment from me18:14
openstackgerritJames E. Blair proposed openstack-infra/project-config master: Set v3 nodepool inap timeout to 600  https://review.openstack.org/51154518:14
mordredShrews: thanks18:14
pabelanger+218:15
jeblairokay, that should eventually clear.  i don't think disabling v3 check-tripleo is urgent enough to do anything other than just check back in a bit.18:15
pabelangergreat18:16
jeblairfungi: how much headroom do you think we need before we can send an all-clear?18:16
fungiwatching what jobs are uploading logs in real-time (by grepping the process list for rsync) i just saw a gate-tripleo-ci-centos-7-3nodes-multinode-nv build suck up 10k inodes18:16
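Watching uploads by grepping the process list, as fungi describes, might look like this sketch (the [r]sync pattern keeps grep from matching its own process entry):

    watch -n 5 "ps -eo args | grep '[r]sync'"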
Shrewsmordred: oh, maybe what you've done in that exception format will work (assumes 'message' is an attribute, right?)18:17
fungijeblair: a sane amount would be when we get down to 99% inode consumption maybe? like around 7.7m free18:17
fungijeblair: right now we're at 0.02% free18:18
*** jdandrea has quit IRC18:18
openstackgerritPaul Belanger proposed openstack-infra/puppet-openstack_infra_spec_helper master: Add bindep.txt file  https://review.openstack.org/51154618:19
mordredShrews: yes - just replied with that - status is a dict with message as a key - but we could change it to the other syntax if you prefer18:19
Shrewsmordred: yeah, i'd rather be explicit than clever  :)18:19
openstackgerritMerged openstack-infra/project-config master: Install zanata dependencies as root  https://review.openstack.org/51139618:20
openstackgerritMerged openstack-infra/project-config master: Switch ubuntu DIB to use AFS mirror in rackspace  https://review.openstack.org/51149218:20
fungia gate-tripleo-ci-centos-7-containers-multinode build just now uploaded 8k files18:20
clarkbalmost 2k of that is various ara things iirc, and etc is another 1k or so; then multiply etc by the number of nodes18:21
fungiand right after that i saw a gate-tempest-dsvm-py35-ubuntu-xenial job upload 600 files18:21
fungiso we're talking over an order of magnitude higher inode counts from tripleo jobs than devstack-gate jobs18:22
pabelangeryah18:22
fungii saw a gate-tempest-dsvm-ironic-ipa-partition-bios-pxe_ipmitool-coreos-src-ubuntu-xenial build upload 500 files18:23
fungi(that's one heck of a job name!)18:23
clarkbya multinode tempest/grenade is in the 1k range18:24
mordredShrews: kk. update coming18:25
*** anupn has quit IRC18:26
*** baoli_ has joined #openstack-infra18:31
*** dhinesh has joined #openstack-infra18:31
pabelangermordred: clarkb: care to +3 https://review.openstack.org/511545/ for inap launch-timeout 60018:31
openstackgerritMerged openstack-infra/project-config master: Disable trigger for v3 check-tripleo pipeline  https://review.openstack.org/51154018:32
mordredpabelanger: done18:32
*** baoli has quit IRC18:33
fungii just saw a legacy-tempest-dsvm-neutron-ovsfw build upload logs... i guess the orphaned nodes from the zuulv3 restart are still chugging along18:34
fungithat might explain why cleanup hasn't sped up just yet18:34
openstackgerritMerged openstack-infra/project-config master: Set v3 nodepool inap timeout to 600  https://review.openstack.org/51154518:36
jeblairfungi: ah, yeah, we may still have the bug where a scheduler restart doesn't abort executor jobs18:36
jeblairthough it should cause them to get deleted.18:36
jeblairthe nodes i mean18:37
pabelangerubuntu-trusty DIB now compressing18:37
openstackgerritMonty Taylor proposed openstack-infra/shade master: Fix image task uploads  https://review.openstack.org/51153218:38
openstackgerritMonty Taylor proposed openstack-infra/shade master: Add group parameter to create_server  https://review.openstack.org/51130518:38
jeblairfungi: at this point you should see no more legacy- jobs upload18:38
mordredShrews: k. that should fix your comment18:38
jeblairfungi: the only nodepool v3 nodes in use are for infra-post jobs18:38
jeblairfungi: and the change to disable check-tripleo in v3 has landed18:38
jeblairi'm going to afk for about an hour for lunch, etc.18:39
mordredjeblair: have good lunching18:39
clarkbwhat time did the tripleo ansible tmp fix get in yesterday? mordred do you recall?18:40
mordredclarkb: I do not - I can go look though18:40
mordredclarkb: I made the Stop collecting ephemeral temp dirs patch at around 22:16 - which is right around the time we force-merged the other tmp patches18:41
openstackgerritMonty Taylor proposed openstack-infra/shade master: Image should be optional  https://review.openstack.org/51129918:42
openstackgerritMonty Taylor proposed openstack-infra/shade master: Add method to set bootable flag on volumes  https://review.openstack.org/50247918:42
openstackgerritMonty Taylor proposed openstack-infra/shade master: Allow domain_id for roles  https://review.openstack.org/49699218:42
openstackgerritMonty Taylor proposed openstack-infra/shade master: Move role normalization to normalize.py  https://review.openstack.org/50017018:42
clarkbmordred: thanks18:43
mordredShrews: ^^ if you get a sec, those 4 have been on hold due to other gate issue, but would be nice to have if we're gonna cut a new release for the upload bug18:43
mordredShrews: (3 of them are for fixing bugs humans have reported running in to)18:43
clarkbgate-tripleo-ci-centos-7-ovb-ha-oooq has added more /etc collection in the last 7 days or so18:43
clarkbovercloud-*/etc seems to be the bulk of it that is new18:44
*** vsaienk0 has joined #openstack-infra18:44
clarkbwent from ~4k to ~37k18:45
clarkb23k or so of that is the ansible tmp stuff18:45
clarkbthen good chunk of the rest looks like etc18:45
Shrewsmordred: ack18:46
clarkbalso /var/log/extra and /var/log/config-data18:48
clarkbwe are copying all of the apache modules multiple times (basically once per openstack service?)18:49
clarkbEmilienM: logs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/nova/etc/httpd/conf.modules.d is probably an easy sort of thing to just stop collecting18:50
clarkbEmilienM: are we using a whitelist for logs yet?18:50
clarkbbut its in logs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/heat_api/etc/httpd/conf.modules.d as well and so on18:51
openstackgerritMerged openstack-infra/shade master: Temporarily disable volume and os_image functional tests  https://review.openstack.org/50815618:52
clarkblogs/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/var/log/config-data/keystone/etc/httpd/conf.modules.d18:53
*** vsaienk0 has quit IRC18:54
clarkblooks like we also copy the system systemd units18:55
EmilienMo/18:55
EmilienMclarkb: yes we have whitelist and exclude18:55
clarkbEmilienM: why are we copying all of the apache modules multiple times then?18:55
EmilienMweshay|ruck: can you take a look please? I'm in a call right now18:55
clarkband systemd system units?18:55
EmilienMclarkb: I don't know now18:55
weshay|ruckaye18:56
weshay|ruckEmilienM, k18:56
EmilienMty18:56
weshay|ruckclarkb, I have an email out for review w/ a few patches for logs on openstack-dev18:57
pabelangerlikely drop SSH host keys in http://logs.openstack.org/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/etc/ssh/18:57
pabelanger:( http://logs.openstack.org/61/509761/2/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/c9aeee4/logs/overcloud-controller-0/etc/sysconfig/network-scripts/18:58
pabelangerdon't think we need all of sysconfig/network-scripts too18:58
*** camunoz has quit IRC18:58
pabelangerweshay|ruck: clarkb: lets create a topic in gerrit so we can review them18:59
weshay|ruckhttps://github.com/openstack-infra/tripleo-ci/blob/master/toci-quickstart/config/collect-logs.yml#L15118:59
weshay|ruckya.. I'll nuke that18:59
*** slaweq_ has quit IRC18:59
pabelangerYah, I think we should be whitelisting specific files, not just directories18:59
clarkblogs/61/509761/2/check/gate-tripleo-ci-centos-7-containers-multinode/99f9196/logs/subnode-2/etc/selinux is another big consumer19:00
clarkbpabelanger: yes that is what we've been asking for since like march19:00
pabelangerclarkb: agree19:00
pabelangerokay, I have to run out for an errand19:00
pabelangerI will try to be back shortly19:00
fungiworth noting, tripleo isn't the only team with high-inode-count build logs... i just saw a gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial build upload 5k files19:02
Shrewsmordred: reviewed the shade changes. all look good except for one19:03
*** ihrachys_ has joined #openstack-infra19:03
*** harlowja has quit IRC19:03
*** slaweq_ has joined #openstack-infra19:03
clarkbfungi: do you have the full path to that? i'd be curious to go see what tehy are grabbing19:03
fungiclarkb: /srv/static/logs/44/479844/6/check/gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial/1541ded/19:03
clarkbthanks19:03
dmsimardfungi: OSA have heavy playbooks and use ARA so there's likely a lot of files19:03
*** rbrndt has quit IRC19:03
dmsimard(because of ARA)19:03
*** ihrachys has quit IRC19:03
funginoted19:03
Shrewsmordred: i think you need to s/payload/kwargs/ in 511305 ?19:04
clarkbfungi: ara is 2800 of that19:04
fungiyikes. still a lot of files, but ara is over 50%?19:04
dmsimardfungi: that's probably not even the heaviest one, gate-openstack-ansible-openstack-ansible-aio-ubuntu-trusty is likely heavier than that19:05
clarkbthey are also grabbing a lot of stuff out of etc that shouldn't be grabbed19:05
*** lukebrowning has quit IRC19:05
clarkband looks like redundant sets possibly19:05
dmsimardfungi: wait, wrong job name, hang on.19:05
*** masber has joined #openstack-infra19:05
dmsimardfungi: gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial http://logs.openstack.org/21/474721/7/check/gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial/d8cdf1d/logs/ara/19:06
*** slaweq_ has quit IRC19:06
*** baoli_ has quit IRC19:06
*** slaweq_ has joined #openstack-infra19:06
dmsimardthat one likely has a bunch of files :(19:07
fungidmsimard: oh, yeah, /srv/static/logs/21/474721/7/check/gate-openstack-ansible-openstack-ansible-aio-ubuntu-xenial/d8cdf1d contains 10k files, so rivalling tripleo jobs19:07
*** AJaeger has quit IRC19:07
*** lukebrowning has joined #openstack-infra19:07
dmsimardgate-openstack-ansible-openstack-ansible-ceph-ubuntu-xenial should be on about the same level19:07
*** SumitNaiksatam has quit IRC19:08
pabelangerokay, ubuntu-trusty DIBs uploading19:08
fungiand we're back under 10k free inodes. so we're really still not keeping pace with the rate at which new builds are uploading logs (or maybe only barely)19:08
pabelangerafk now19:08
clarkbmaybe its worth a general email explaining that we shouldn't be copying all of /etc19:09
clarkbbut ya ara is the bigger chunk of the pie for osa at least19:09
*** AJaeger has joined #openstack-infra19:09
dmsimardI'll try and think of the plumbing involved for shifting from static reports to sqlite to central19:09
*** camunoz has joined #openstack-infra19:10
*** masber has quit IRC19:10
fungihow terrible would ara performance be if the report files were passed around as a bundle (tarball or something) and unpacked on the fly?19:11
*** baoli has joined #openstack-infra19:11
fungii guess you'd need some backend support to deal with that, or end up transferring all the data to the browser as a giant blob up-front19:12
clarkbodyssey4me: ^ fyi, any chance you can work to clean up the collection of /etc in your jobs?19:12
fungi3 inodes free :/19:12
dmsimardfungi: unpacked on the fly ? I've never done something like this before -- right now every file is gzipped individually and then there's the necessary mime types to make the webserver extract them on the fly19:13
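Roughly how the individually-gzipped scheme works, as an illustration rather than the site's actual configuration:

    gzip -9 job-output.txt   # stored as job-output.txt.gz: same inode count, far fewer blocks
    # the webserver then needs a mapping along the lines of
    #   AddEncoding x-gzip .gz
    # so browsers receive Content-Encoding: gzip and decompress on the fly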
fungiyeah, we're back to returning POST_FAILURE again19:13
clarkbodyssey4me: http://logs.openstack.org/44/479844/6/check/gate-openstack-ansible-os_nova-ansible-func-ubuntu-xenial/1541ded/logs/etc/ has 2111 inodes in use and a bunch of that is copying stuff that isn't really relevant to the jobs19:13
dmsimardfungi: We would need some sort of middleware ?19:13
dmsimardevrardjp, cloudnull ^ see clarkb's question19:13
fungidmsimard: yeah, probably unless you serialized all the report data into a single file19:13
fungiwhich would likely be a big hit browser-side, i'm guessing19:14
fungi(hit to performance, not hit on the solid gold singles chart)19:14
* cloudnull reading19:15
*** gouthamr has quit IRC19:15
dmsimardfungi: It's not very realistic, no, there's too much data to display everything in one single file. I'll try and think of something relative to the sqlite database instead. The sqlite database is several orders of magnitude smaller than even the gzipped static report, not to mention it's just one file.19:16
clarkboh nice that sounds like win win win19:16
clarkbdmsimard: is that a useable feature today? we just have to turn it on?19:16
cloudnull^ I can go kick that out to the tests repo, if so19:17
fungidmsimard: oh! neat, i didn't realize sqlite could do multiple tables in one file, but i'll admit i've done very little with it so far19:17
cloudnullclarkb: we have the log/config collection tasks within the tests role. are we needing to just prune that back?19:18
dmsimardclarkb: Well the sqlite database already exists, that's where the callback saves its data and from where the web interface reads it. The static report generation is more or less a crawler that crawls all the pages of the interface and generates static files out of every page.19:18
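Assuming the ara 0.x CLI of this era, regenerating a report from a saved database would look roughly like the following sketch (the database path is a hypothetical placeholder):

    export ARA_DATABASE="sqlite:////srv/static/logs/path/to/ansible.sqlite"
    ara generate html /tmp/ara-report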
clarkbcloudnull: ya if we could stop grabbing all of /etc and multiple copies of it that would be good.19:19
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs  https://review.openstack.org/51155519:19
evrardjpcloudnull: I am off for today, but if you're doing something to make the whole etc collection an archive, that would be great19:19
clarkbcloudnull: its fine to copy things relevant to the job, openstack service logs and config or whatever19:19
evrardjpclarkb: it's multiple copies because it's multiple "hosts"19:19
clarkbcloudnull: but do you really need all of rc.* and fonts and logrotate and so on19:19
dmsimardfungi: BonnyCI ran with a mysql database of like 42 000 playbook runs :)19:19
evrardjpwe generally need it, but it can definitely be an archive19:19
clarkbevrardjp: right the problem is that stuff like ^ is all going to be identical and has no relevance to the job really19:19
clarkbevrardjp: its fine to copy the bits that are relevant to the job and different, like openstack service config19:20
evrardjpexactly19:20
evrardjpI agree19:20
evrardjpplus one or two locations, like apt sources19:20
evrardjpor yum repos19:20
dmsimardfungi: the challenge here is to go from a sqlite database saved on logs.o.o to an interface, somehow -- whether that's a centralized instance, or something generated on the fly from that database19:20
evrardjpthe rest doesn't matter19:20
evrardjpand in all cases we can iterate later to add some small stuff19:20
cloudnullyea I think we did an /etc/.* just because it was easy, we could be a lot more tactical19:21
evrardjpbut I think we should generally not ship those files directly19:21
evrardjpwe should just archive those19:21
dmsimardfungi: ultimately, it's sqlalchemy with a sqlite connection string -- sqlite is usually on the filesystem. Maybe we can work out something that uses the sqlite database over http or something like that.19:21
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs  https://review.openstack.org/51155519:21
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove legacy-python{34,35} jobs  https://review.openstack.org/51155719:21
evrardjpif we want the detail, we download and unarchive19:21
clarkbevrardjp: sounds like a plan then? prune and archive?19:21
evrardjparchive and don't even collect19:21
evrardjpso when the instance is destroyed, we don't care anymore19:22
fungidmsimard: yeah, still needs some backend support, like a filter callout in apache maybe (that's how we do the fancy clickable log stuff with os_loganalyze)19:22
evrardjpI had a very long day, so I'd be happy if someone can take over this... cloudnull?19:22
cloudnullsure thing19:23
cloudnulllooks like we just need to adjust https://github.com/openstack/openstack-ansible-tests/blob/master/test-log-collect.sh#L40-L6219:23
cloudnullwhich will take the pressure off every one of our roles.19:23
cloudnullfor now we could comment all that out19:23
cloudnulland then work it back in19:24
evrardjpsame for /var/log, we can keep only what's interesting for us19:24
evrardjpanything that can unblock others is good... but we still need logs in the end, because losing them reduces our ability to use gate results, and we don't want to spend cycles for nothing either :p19:26
jeblairfungi: back.  looks like there's little progress on inodes?19:26
clarkbya there is a balance to be reached19:26
clarkbwith devstack-gate we try to add things when we notice we need them and be specific19:26
clarkband we remove things as we notice they aren't useful too19:27
clarkbrather than just wholesale copy (so I think pruning and archiving to a single file is a big win there, thanks)19:27
*** andreww has quit IRC19:27
evrardjpclarkb: yeah, I guess here we noticed "we need /etc/<something> "19:27
evrardjpand then yes we need /etc/<somethingelse>19:27
fungijeblair: yes and no. i saw it fall all the way back to 0 but now we're nearing 300k free again19:27
evrardjpand then it finished to be a boatload of things19:27
evrardjpwhich is obviously wrong :p19:27
jeblairfungi: what's the next most dramatic step we can take?19:28
*** xarses has joined #openstack-infra19:28
fungijeblair: a few potential options: artificially constrain our nodepool quota, disable some of the top-offender jobs, or delete entire subtrees of the filesystem19:28
jeblairfungi: oh!  what if i produced a list of v3 check jobs from zuul logs and we just rm-rfd those paths?19:29
*** eharney has joined #openstack-infra19:29
clarkbjeblair: ++19:29
*** baoli has quit IRC19:29
fungijeblair: maybe... deleting jobs by name still seems to go pretty slowly mainly because there are a lot of wildcarded parent directories to get to the job names19:29
cloudnullevrardjp: https://review.openstack.org/51156019:30
jeblairfungi: i'm talking exact paths19:30
jeblairfungi: i should have said 'build' rather than 'job' :)19:30
*** baoli has joined #openstack-infra19:30
fungijeblair: oh, yeah i could probably loop over those pretty easily19:30
jeblairlemme see what i can produce19:30
cloudnullwe can add it back once there's less pressure, and we have some time to think about everything that we might really need.19:30
*** camunoz has quit IRC19:31
fungithanks jeblair!19:32
evrardjpcloudnull: ok. Alternative would be to tar them19:32
evrardjplet's already do this19:32
fungii still worry that tarring up files you don't know for sure you need just avoids doing the actual work of figuring out what information is actually useful19:32
cloudnull++19:33
cloudnullI think it'd be better to get a list together of what we really need19:33
fungiand makes it easier to never get around to working through that19:33
evrardjpfungi: oh yes, I mean tarring only what's useful19:33
fungioh, got it. that would be cool as long as they're not useful to browse directly on a frequent basis19:34
evrardjpcloudnull: I updated the commit message with the reason and let's do it19:34
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Image should be optional  https://review.openstack.org/51129919:34
cloudnullwe could do something like an include list from file and use that with our existing rsync commands so that its easy to add and remove as needed.19:34
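A sketch of the include-list idea cloudnull mentions, using rsync filter rules; the listed files are hypothetical examples, not OSA's actual whitelist:

    # '+' lines are includes, the trailing '- *' excludes everything else
    printf '%s\n' '+ nova/' '+ nova/nova.conf' '+ apt/' '+ apt/sources.list' '- *' \
        > /tmp/log-collect.filter
    rsync -a --include-from=/tmp/log-collect.filter /etc/ "$WORKSPACE/logs/etc/"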
ianwjeblair / pabelanger : looks like current status on the mirror is trying to rebuild the checksums.db, is that right?19:34
evrardjpcloudnull: yes, that's what I thought, basically saying WHAT we really want to collect.19:34
cloudnullbut it'd be good to circulate that with the osa community so that we make sure we get everything useful for our folks19:34
evrardjplet me merge your patch quick then19:34
jeblairianw: yes.  i have added space to the partition and added quota to the volume, rebooted all servers involved, and am running the checksum rebuild with a db directory on local disk.19:35
cloudnullevrardjp: ok.19:35
*** andreas_s has joined #openstack-infra19:35
jeblairianw: (that way we avoid afs write errors on the db).19:36
cloudnullassuming jenkins doesn't kick us in the teeth that should be merged soon, which will have immediate impact on ALL of our role jobs.19:36
clarkbI'm semi manually going through and clearing out tmp/ansible from tripleo change logs19:36
clarkbcloudnull: thanks19:36
clarkbcloudnull: I think we are making progress on the inode front so optimistic it will get through19:37
fungioh, wow, we're up over half a million inodes free now! i suspect we owe some of this to job volume falling now that zuul is no longer backlogged19:37
jeblairianw: the immediate gnutls issue has been resolved by uploading new images built from our mirror.19:37
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Add group parameter to create_server  https://review.openstack.org/51130519:37
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Image should be optional  https://review.openstack.org/51129919:37
clarkbfungi: ya and I've cleared out about 100k so far19:37
cloudnullsorry for the issues fungi clarkb.19:37
*** florianf has quit IRC19:38
jeblairclarkb, fungi: should we delete all v3 check and check-tripleo pipeline builds?19:38
ianwjeblair: excellent, thanks; i am glad that works.  dib's "use this mirror during build" is maybe not as robust as i'd like19:39
jeblairianw: you can see recent project-config changes merged to do that if you want to retro-review it19:39
clarkbjeblair: I think that would make a significant impact, I'd be in favor19:39
mordredjeblair: yah. I'm also in favor19:40
fungijeblair: sure, i expect a majority of them to exhibit failures for issues we've since fixed19:40
*** andreas_s has quit IRC19:40
jeblairi have a list of 132272 zuulv3 builds.  119842 of which are check*19:40
fungiso probably could stand fresh check results anyway19:40
ianwand i see system-config seems green, so that's good too19:40
jeblairfungi: static.openstack.org:~corvus/log-delete19:42
fungiwe've just reached 0.1% free inodes now, so about 1/10th of what i'd like to see freed up before we #status ok19:42
fungithanks jeblair! i'll start culling those19:42
jeblairfungi: cool, thanks19:42
fungiworth noting, my deletion of /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/{undercloud/tmp/ansible,ara_oooq,undercloud/etc}19:42
fungifinally completed19:42
ianwcool, the only reason not to use DIB_DISTRIBUTION_MIRROR is that it leaves the mirror behind in the image.  which doesn't matter in this case, but i didn't feel it was suitable for the general case19:42
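For reference, the mechanism ianw is weighing looks roughly like this sketch (the mirror URL and element list are illustrative):

    export DIB_DISTRIBUTION_MIRROR=http://mirror.dfw.rax.openstack.org/ubuntu
    disk-image-create -o ubuntu-xenial ubuntu-minimal vm simple-init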
mordredfungi: \o/19:42
fungiyay stray newlines in my clipboard19:42
dmsimardclarkb, fungi: I was discussing with a colleague.. looking at https://github.com/openstack-infra/puppet-openstackci/blob/master/files/log_archive_maintenance.sh#L4-L10 would it make sense to do a .tar.gz archive of the whole job logs instead of just gzipping every file ?19:43
clarkbdmsimard: the reason to not do that is for browseability in your web browser19:43
dmsimardyeah, I get that19:43
clarkbif you tarball everything the nyou have to download and extract locally19:44
dmsimardbut past a certain threshold, meh, I don't know19:44
clarkbfungi: oh I think ansible should have been ansible*19:44
fungidmsimard: they cease to be browsable but i suppose we could consider doing that for logs over a week old or something19:44
ianwjeblair: was I right that zuulv2 didn't reload to see https://review.openstack.org/#/c/511360/?19:44
dmsimardotherwise, we could consider rotating logs off to another node (cold storage) or something19:44
fungiclarkb: thanks, i'll add that in a separate pass19:44
dmsimardjust trying to think of different other things that could help19:44
clarkbfungi: actually you can just rm undercloud/tmp19:45
clarkbfungi: since the only content there is the ansible related stuff19:45
dmsimardfungi: maybe a threshold between 10 and 30 days, I don't know. Just saying the likelihood of someone looking at logs >1 week gets increasingly smaller, and on that topic, it'd probably be interesting to look at apache logs to get some stats on what people are looking at.19:45
fungiclarkb: thanks, that'll help19:45
ianwdmsimard: heh, .tar.gz is my oldest "one day i'll fix this" change -> https://review.openstack.org/#/c/122615/ (not related to reducing inodes though)19:46
fungidmsimard: we already were only keeping 30 days and that had us at 95% blocks used19:46
ianw"log packages are around 7MiB from my testing. This is big but not ridiculous." i think this is no longer true19:46
fungiso we needed to reduce retention for now anyway (and i've effectively dropped it to 14 days with the pass currently underway)19:46
dmsimardianw: that review is very interesting, we were actually discussing something like that earlier fungi and I19:47
dmsimardianw: the problem with ARA is that while it's not big, it's a lot of smaller files and we could perhaps make a .tar.gz and serve that instead19:47
dmsimardHowever I'm looking at another possibility right now, involving just having to store the sqlite database19:48
openstackgerritMerged openstack-infra/zuul-jobs master: Add a toggle to disable ARA static report generation  https://review.openstack.org/51152819:48
jeblairianw: re zuulv2 and pip8; i don't know.  i forgot about that one.19:49
fungijeblair: i'm using your list thusly: cd /srv/static/logs/ ; cat ~corvus/log-delete | xargs rm -rf19:49
jeblairfungi: that sounds about right19:49
fungiactually, i think i'm going with my earlier plan for safety19:50
clarkbwe just passed 1 million free19:51
jeblairfungi: for loop?19:51
*** vhosakot has joined #openstack-infra19:51
fungijeblair: sed s,^,/srv/static/logs/, ~corvus/log-delete | xargs rm -rf19:51
jeblairfungi: heh, that one makes me nervous; i'd sed that to a new file and just xargs from that file19:52
fungijeblair: fair, i just re-audited the file to make sure it contains no absolute paths19:52
fungiand no ".."19:53
fungideletion underway just catting the file and treating them as relative paths19:53
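The safer two-step variant jeblair suggests would be something like:

    sed 's,^,/srv/static/logs/,' ~corvus/log-delete > /tmp/log-delete.abs
    grep -c '\.\.' /tmp/log-delete.abs   # audit: should print 0
    xargs -a /tmp/log-delete.abs rm -rf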
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Add publish-deploy-guide job  https://review.openstack.org/51156319:54
fungiwe're finally well over a million inodes free19:54
*** baoli has quit IRC19:55
fungiso i have hopes the current several patterns/lists under deletion will get us to our 1% free #status ok in relatively short order now19:55
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: convert deploy-guide to native zuul v3  https://review.openstack.org/51156419:57
mordredjeblair: good idea with the v3 job list!19:57
*** harlowja has joined #openstack-infra19:57
fungidefinitely. this should knock out a good chunk19:57
fungialong with the adjusted pattern for tripleo-ci ansible tempfiles and halving retention19:58
*** pcaruana has quit IRC19:58
mordred++19:58
*** baoli has joined #openstack-infra19:59
fungiand with a few teams making headway on reducing the number of files they're collecting, we should be in better shape in a couple weeks when we get back to a month of logs19:59
*** baoli has quit IRC19:59
SamYaplejust switch to btrfs with dynamic inodes. simple. there have never been issues at scale with btrfs.20:00
mordredSamYaple: yah - I can't see any potential issues with that at all20:01
jeblair2m inodes now, i clocked it at +125826 inodes/sec20:01
fungiSamYaple: reiserfs also didn't have an inode maximum20:01
jeblairer20:01
jeblair2m inodes now, i clocked it at +125826 inodes/min20:01
jeblair /sec would be truly impressive.20:01
mordredjeblair: that other rate would hve been amazing20:01
jeblairso maybe 45m to get to 7m free?20:02
SamYaplefungi: i dont want to... murder... performance though20:02
fungimy freighter can make the kessel run in 125826 parsecs20:02
fungiSamYaple: ooh, too soon20:02
clarkbI've really been enjoying zfs locally20:02
clarkbbut scrubbing 12 TB of logs is probably very slow :/20:03
SamYapleyea zfs is da bomb for a lot of things. but it has its weaknesses20:03
dmsimardclarkb, fungi: I was reading ianw's middleware patch for serving tgz's ( https://review.openstack.org/#/c/122615/ ) and it gave me an idea.. how about we always create the /ara/ log directory with the ara sqlite database in it, and then a middleware intercepts requests to that directory, if a static report is not generated, it generates it ? It doesn't sound overly complex to achieve and it would make it so the ara reports would only be generated on demand and if required20:03
fungiclarkb: really, the slowness is the nearly a billion inodes20:03
funginot so much the block size20:03
clarkbfungi: ya though it checksums all the data too iirc20:04
clarkbat a block level20:04
openstackgerritMerged openstack-infra/shade master: Fix image task uploads  https://review.openstack.org/51153220:04
SamYapleall data and metadata, yup20:04
fungiclarkb: oh, so bandwidth hit20:04
clarkbits so stupid simple to use20:04
clarkbreally like the simplicity of it20:04
SamYapleand it can generate block devices20:04
SamYapleits really nice20:05
SamYaplethinly or thick provisioned block devices at that20:05
clarkbwould be two commands to have our current logs lvm set in place20:05
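The two commands clarkb presumably has in mind, sketched with hypothetical device names rather than the server's actual layout:

    zpool create logs mirror /dev/xvdb /dev/xvdc   # creating the pool also mounts it
    zfs create -o compression=lz4 logs/static      # dataset for the log tree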
fungiclarkb: i'll reserve "simple to use" for something with mainline kernel support i can use for my boot/rootfs20:06
SamYaplefungi: 16.04 started including it20:07
SamYaplefungi: so you can totally do that by default20:07
fungii suppose if i were to switch to freebsd it would be mainline20:07
clarkbI mean ext4 + lvm is also relatively simple, just more verbose20:08
fungiwell, ubuntu is shipping out-of-tree kernel drivers for zfs, right? i thought the cddl was incompatible with the gplv220:08
clarkbfungi: ya its a module20:08
SamYaplethere was a big license uproar about it, but it landed as "meh"20:08
SamYaplethe whole thing was "prove damages" and no one could20:08
SamYapleso it didnt really go anywhere20:08
fungiclarkb: yeah, all my personal systems boot from lvm220:08
clarkbya Fontana had a talk about it at seagl20:09
fungigrub has fine support for searching logical volumes these days20:09
SamYapleand zfs ;)20:09
clarkbtldr hard to show damages because source is provided on both sides and they aren't charging money for it that you'd be able to charge elsewhere20:09
SamYaplethat was my takeaway too20:09
clarkbfungi: my zfs box is booting lvm + ext4 on a dedicated device which then mounts the zfs pool20:09
fungiheh20:09
clarkbapproaching 2 million inodes20:10
clarkbalso scrubs are auto niced for you20:11
fungiit's like we've finally reached warp factor 220:11
clarkbso in theory they don't have major impact20:11
fungiand the reactor hasn't even shaken apart20:12
SamYaplewarp 2 on which scale?20:12
SamYaplethis is important20:12
fungioh, cochrane scale, sorry20:13
clarkbjeblair: what was the magic sauce for handling all those emails and their inode counts back in the day? I imagine that was a very high inode-to-disk ratio?20:14
dmsimardbefore we got sidetracked by filesystems discussion I was trying to brainstorm about solutions to help with the unfortunate contribution of ARA to the inode exhaustion :p Another low hanging fruit would be to consider generating an ara report only when there is a job failure20:14
clarkbdmsimard: ya that was the idea pabelanger had earlier, I like it because I really only look at ara when things have broken20:14
pabelangerand back20:14
pabelangercatching up on backscroll20:14
clarkbthat seems like a relatively easy intermediate fix20:14
fungiclarkb: "back in the day" your mailbox was one file which kept getting appended to20:15
SamYaplewas this inode issue the underlying problem with the mirror?20:15
dmsimardclarkb, jeblair: would a post job know that the job is going to fail ?20:15
dmsimardI guess the executor knows, but it's probably not passed on as a piece of information to the post jobs20:15
pabelangerianw: ya, the configure-mirror role should protect us with DIB_DISTRIBUTION_MIRROR, however we could also fix it in finalize.d if we wanted20:15
*** gouthamr has joined #openstack-infra20:16
mordreddmsimard: yah - I believe there is a status variable that the post job should know about20:16
clarkbSamYaple: no they were separate issues20:16
mordreddmsimard: it's called 'success'20:16
clarkbSamYaple: mirror is on afs, the logs volume is a 12TB ext4 fs20:16
SamYaplegot it. and also ouch20:17
clarkbya when it rains it pours20:17
SamYapleall this unrelated to rolling out zuulv3 ya?20:17
pabelangerokay, ubuntu-trusty DIB uploading to rackspace and citycloud-kna1 still20:17
dmsimardmordred: zuul_success ?20:18
dmsimardmordred: like http://git.openstack.org/cgit/openstack-infra/project-config/tree/roles/submit-logstash-jobs/tasks/main.yaml#n620:18
clarkbSamYaple: other than zuulv3 double running jobs potentially adding inodes to the logs fs, correct20:18
mordreddmsimard: yes!20:18
fungiSamYaple: correct (discounting that we were adding a few additional build logs from running extra copies of a lot of jobs under v3 which likely didn't help matters)20:18
SamYapleman. craziness20:18
dmsimardmordred: ok, I'll send a patch.20:18
*** hasharAway has quit IRC20:18
mordreddmsimard: cool - I think that'll buy us time to think about some of the other options20:19
pabelangermordred: Shrews: do you think a new shade release will be out today to address rackspace uploads? I'm asking, because we might have to remove ubuntu-trusty from rackspace, since it is broken20:19
*** esberglu has quit IRC20:19
fungipabelanger: honestly, we run so few jobs on trusty at this point that having it missing from a few regions for a while won't hurt us much20:20
mordredpabelanger: yes - it should be not too much longer20:20
pabelangerfungi: yah, that is true20:20
fungii would just go ahead and delete it there regardless of the timeline for getting a replacement uploaded20:20
pabelangerfungi: I'll propose the patch20:21
fungipabelanger: we can't just delete the trusty images there and wait for a corrected upload to eventually work?20:21
openstackgerritMerged openstack-infra/zuul-jobs master: Add a toggle to enable saving the ARA sqlite database  https://review.openstack.org/51152920:22
*** esberglu has joined #openstack-infra20:22
pabelangerfungi: Hmm, I think we could20:22
pabelangerthat might be better20:22
pabelangerfungi: I'll start with rax-ord and see20:23
*** ijw has quit IRC20:25
*** ijw has joined #openstack-infra20:25
mnaserwhen all this zuulv3 stuff settles, i would like to work with some infra core to add some monitoring, these things are so much easier to solve when you know they're coming up beforehand :(20:26
mnaseri can share some of the stuff we do and the tooling of how to do it in a distributed way (mostly stateless sensu-server, servers define their own checks in sensu-client) .. but yeah, i think it'd make all of our lives easier to find out about issues in advance (hopefully)20:27
clarkbmnaser: at the ptg rough plan was to have a spec detailing options available then going from there (just because there are so many tools and they have their own strengths and weaknesses)20:27
clarkbmnaser: I think we were initially wary of sensu due to its open core nature and the need to run a message bus for it20:28
clarkb(but it should be on the list of options probably)20:28
mnaserclarkb i have a document going over many of the OSS monitoring tools and why we ended up at sensu so i'll find that and share it20:28
*** AJaeger has quit IRC20:29
clarkb2.5million20:29
*** kgiusti has left #openstack-infra20:29
*** rbrndt has joined #openstack-infra20:29
mnaseryeah... but honestly, we haven't run into any issues where we were like "dang, we'd want the enterprise for this one" .. the nice thing is that you don't have to maintain the checks on the server (unlike most other tools) but at the client, which makes it cleaner in writing puppet manifests and what not, but anyways, /me puts name down for that20:29
*** rbrndt has quit IRC20:29
dmsimardmnaser: I'm accountable for drafting a spec to do proactive monitoring20:30
dmsimardmnaser: I signed up for that :)20:30
mnaseroh even better :>20:30
*** jkilpatr_ has quit IRC20:30
pabelangerShrews: when you have a moment, I'm not sure why image-delete is not working. I get back 'Image upload not found'20:30
pabelangerShrews: sudo -H -u nodepool nodepool image-delete --provider rax-ord --image ubuntu-trusty --upload-id 0000002659 --build-id 000000000120:30
dmsimardmnaser: I believe it was part of when we discussed https://etherpad.openstack.org/p/queens-infra-metric-collection20:31
* fungi would prefer to see something based on an established standard, ideally snmp, but is willing to entertain other options20:31
pabelangerI don't mind nagios, after all these years20:31
fungidefinitely not a fan of monitoring systems which need server-side agents speaking nonstandard protocols20:31
fungii did deal with nrpe for many years on that front, but having proper snmp backends to check this is so much nicer20:32
*** e0ne has joined #openstack-infra20:32
fungiand net-snmp is very extensible if you want to write your own extensions for custom mibs20:32
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Update some legacy jobs  https://review.openstack.org/51155520:33
Shrewspabelanger: hrm, not sure. may need to do some digging20:33
clarkbfun question time. Assuming we've got inodes and ubuntu mirror stuff under control, is the last outstanding item for the v3 re-rollout 511260 to fix cache usage?20:33
fungiand anyway, the two major issues we have would have been spotted by 1. trending inode usage for filesystems (there is a standard oid for that, easy enough to check over snmp) and 2. evaluating the last updated timestamps on our mirrors (these can be polled over http and analyzed quite trivially)20:34
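For the inode-trending case, net-snmp's UCD-SNMP-MIB already exposes per-disk inode usage, assuming a disk entry is configured in snmpd.conf on the monitored host; a sketch:

    # on the monitored host, in snmpd.conf:  disk /srv/static/logs 10%
    snmpget -v2c -c public static.openstack.org UCD-SNMP-MIB::dskPercentNode.1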
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove legacy-.*python{34,35} jobs  https://review.openstack.org/51155720:34
fungiclarkb: as far as i know, yes (well, and getting check/periodic pipelines added back i guess so people can dry-run their v3 jobs again)20:35
jeblairinfra-root, dmsimard: when folks have a moment, i'd like to have some semi-structured conversation about 1) options for ara in v3 followed by 2) refreshing the rollout plan for v320:36
clarkbI'm good now20:36
jeblairassuming we think that fires are out enough we can do that while we wait for bg tasks to complete20:36
*** rbrndt has joined #openstack-infra20:37
fungijeblair: yep, i think we're in a good place for that now. inode usage has been dropping steadily rather than increasing for a while20:37
dmsimardjeblair: I am working on a patch for emit-ara-html to be able to only generate a report on job failure, it's a low hanging fruit that we can put through fairly easily.20:37
fungithanks dmsimard!20:37
jeblairlet's use this etherpad: https://etherpad.openstack.org/p/hdYC2ZKfWd20:37
dmsimardjeblair: Beyond that, it requires a bit of thinking outside the box -- whether that's figuring out how to translate a sqlite database to an ara interface on the fly somehow, or use a centralized instance, etc.20:37
jeblairdmsimard: yeah -- let me articulate my current thinking:20:38
jeblairpoint 1: we think running ara on every v3 job is bad for inodes20:38
jeblairpoint 2: we want to roll out v3 soon20:38
jeblairpoint 3: we should come up with short-term solutions to give us breathing room to roll out v320:39
Shrewspabelanger: oh, i think you have upload-id and build-id backwards20:39
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Remove ubuntu-trusty from rackspace  https://review.openstack.org/51157020:39
pabelangerShrews: oh, maybe20:39
jeblairpoint 4: there are long term changes that may make this better20:39
pabelangerShrews: let me test20:39
jeblairso i'm thinking we mostly need to decide on a short term solution now to give us room to roll out v3 and implement long-term solutions20:39
pabelangerShrews: Better! Thanks20:40
jeblairassuming we accept that point 1 is valid :)20:40
mordredI agree with those 4 points and the goal20:40
* Shrews debugs pabelanger20:40
*** hamzy has quit IRC20:40
mordred(like, I think that making it so that running ara on every change IS a thing that we want to do - but that is also likely to take slightly longer)20:40
pabelangerthanks! it is now deleting20:40
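The corrected invocation, swapping the two IDs as Shrews suggested:

    sudo -H -u nodepool nodepool image-delete --provider rax-ord \
        --image ubuntu-trusty --build-id 0000002659 --upload-id 0000000001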
*** Apoorva_ has joined #openstack-infra20:41
dmsimardI agree with that as well, however I'd appreciate highlighting that while ara contributes to the issue it is not solely responsible :(20:41
SamYaplesilly question, what is the ARA thing?20:41
clarkbSamYaple: ara has a floor of like 400 files per job and a ceiling much higher depending on the job, so it uses a lot more inodes than before, when jobs might have had a couple files logged20:41
dmsimardSamYaple: this: http://logs.openstack.org/21/474721/7/check/gate-openstack-ansible-openstack-ansible-ceph-ubuntu-xenial/779e047/logs/ara/20:41
*** e0ne has quit IRC20:41
clarkbIn theory our jobs succeed more than they fail and ara is a useful debugging tool. I'd be inclined to start with ara only on failure and if that isn't enough then possibly just remove it by default?20:42
SamYapleoh i see20:42
clarkbwould it be possible to have ara locally accept the json file and emit a report?20:42
dmsimardGoing back to my previous statement, I'd like to make sure that we follow up with the other projects to make sure they are not needlessly logging things20:42
*** eharney has quit IRC20:42
clarkbso that we can continue logging the json file and then only feed it to ara if you know you want it?20:42
clarkbdmsimard: yes we should continue pushing on that too20:43
jeblairyes, though we also have longer term plans to rework logging so we may not care as much.20:43
dmsimardclarkb: I am looking into doing a bit of what you are proposing, but with the sqlite database instead. Running off of the JSON would require more work.20:43
*** eharney has joined #openstack-infra20:43
mordredclarkb, dmsimard: I think that falls into the category of "medium to longer term we can make improvements to how we're using ARA or how ARA works or whatnot"20:43
*** Apoorva has quit IRC20:43
jeblairfor instance, following mordred's proposal to its conclusion means we get to the point where we say "every job gets 100MB.  put whatever you want in there.  it goes in swift.  we don't care"20:44
jeblairso i think it's worthwhile to push back on some really large inode jobs, but i think that's not a long-term sustainable strategy.20:44
* dirk has a few inodes to give away20:44
* mordred takes dirk's inodes20:44
clarkbjeblair: swift too has inode like limits last I looked into it20:45
mordredjeblair, clarkb, dmsimard: so far I'm the biggest fan of dmsimard's change to run ara only on failure20:45
clarkbjeblair: basically there is a ceiling on reasonable number of objects within a container to maintain performance20:45
pabelangerI like #2 so far, but this isn't long term right?20:45
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Add method to set bootable flag on volumes  https://review.openstack.org/50247920:45
mordredclarkb: nod - but once we get to that point we'll have a good central place in-which to place limits20:45
mordredpabelanger: right - only short term20:45
jeblairwhat's the collecting the sqlite file option?20:45
notmynameclarkb: that "ceiling" is rather large, and it doesn't affect client performance20:46
jeblairi also favor ara-on-failure at the moment, but i do want to make sure we survey the options20:46
mordredjeblair: ++20:46
clarkbnotmyname: I think its roughly in the range of our current inode limit though ~1billion20:46
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Add group parameter to create_server  https://review.openstack.org/51130520:47
fungijeblair: doing something server-side with ara to render reports out of a database file in a similar way to how we use osla to render log files20:47
mordredjeblair: dmsimard was investigating something wsgi-like to do the report generation only on-demand20:47
mordredfungi: jinx20:47
notmynameclarkb: but it's a per-container thing. so if you're doing a new container per job, that's ok. or container per project may be better20:47
jeblairah, interesting.  i feel like that's probably a long-term thing.  like, we should be weighing that against running a central server, (or static generation into swift)20:48
funginotmyname: agreed, and we could be doing something similar with local filesystems, but what we're doing now is akin to dumping them all in one container20:48
jeblairdmsimard: is server-side sqlite generation a 2 day project or longer?20:48
mordredyah - I think that's not a 'by tomorrow' kind of option and might take a little longer for us to be comfortable with it- especially since the main value in the ara reports is helping to diagnose job issues20:48
mordredso if we need to make sure the on-the-fly report generation is solid ...20:49
dmsimardjeblair: I'm not sure if it's the best approach, it would mean logs.o.o would be using its cpu for generating reports20:49
notmynameack. the current feature/deep work (hoped for by early next year) will solve that once and for all (ie N billions of objects per container is no problem, only limited by your installed hardware capacity)20:49
pabelangerdoesn't need to be logs.o.o, we could stand up another server20:49
mordrednotmyname: cool20:49
mordredpabelanger: the files are sqlite20:49
clarkboh that is good to know re swift20:49
funginotmyname: neat!20:49
jeblairpabelanger: it'd have to get the sqlite file from logs.o.o, which is not scalable20:49
dmsimardjeblair: The direction I'm looking at, is more like... ara.openstack.org/?database=path/to/sqlite/in/logs.sqlite or something20:50
mordredI think it's likely better use of hacking resources to figure out a centralized ara than an on-demand ara20:50
dmsimardI don't know20:50
mordredI could be wrong about that - but it's an unknown enough I doubt it's the solution for this week20:50
dmsimardA centralized ara is not complicated, we just need to figure out how to feed the data back to the instance -- because we don't want to have the callback do a call to a remote mysql server on each task. Just the latency from farther nodepool regions would not be good.20:51
jeblairdmsimard: i put that as long term option #320:51
mordreddmsimard: yah - well, there's also grouping issues we'd need to figure out too with centralized20:51
dmsimardAn example would be to have a "post subunit gearman" thing and then have that import data back into a central instance20:51
dmsimardmordred: right, that too.20:51
pabelangermqtt could be a long(er) term option too20:52
mordreddmsimard: having one ara with 10000 playbook runs called "run.yaml" ... :)20:52
dmsimardthere's permalinks for playbooks but not group of playbooks20:52
fungioh, i just realized this is specifically long-term options for ara, not more general options for increasing inode capacity/decreasing inode usage on the logs site20:52
jeblairpabelanger: how does mqtt help?20:52
mordredfungi: well - no, option 1 is not about ara20:52
mordredfungi: long term #1 is about offloading caring about inodes to swift - but has a few steps between us and it to be viable20:52
dmsimardI have to step away momentarily... not going to have time to finish the patch for only running on failure, if anyone can put that up I can review in ~1hr20:53
dirkIs moving stuff into a readonly squashfs mirror an option? That should save plenty of inodes20:53
pabelangerjeblair: in my brain, we publish to mqtt, instead of sqlite, then have a series of ara collectors to generate static bits, then upload someplace.  However, that would be a lot more moving parts20:53
dmsimardnevermind, looks like I got time20:54
mordreddmsimard: if you run out of time I can take over20:54
jeblairdirk: re-architecting log storage is a long-term option.  that would be up there with "use ceph" and "use swift"20:55
pabelangerfungi: Shrews: I've deleted ubuntu-trusty images from rax-ord20:55
mordreddmsimard: or I can write it if you'd like to think about other things20:55
jeblairfungi: that docs-draft thing is worth keeping in mind as it's a medium-term mitigation.20:55
dmsimardmordred: nope, I'll have time to finish it.20:56
dmsimardmordred: also, it's something we have to be very cautious about.. remember it's in the base job and it's not something that is integration tested :(20:57
*** caphrim007_ has quit IRC20:57
mordreddmsimard: ++20:57
fungijeblair: agreed20:57
clarkbfor 4 weeks of logs ~28760941 is the number of inodes we can use per day20:57
*** caphrim007 has joined #openstack-infra20:58
jeblairokay, we only have two short term options: #2 run ara on success.  #3 don't run ara at all.  anything else we can do short-term?20:58
clarkbif we want to run 25k jobs per day that is about 1150 inodes per job average20:58
clarkb(25k per day is our rough peak from a year ago iirc)20:58
mordredjeblair: those are the only things I can think of right now for short term20:58
*** dangers is now known as dangers_away20:58
jeblairclarkb: that makes me think reducing log retention should be in the short-term list20:58
*** caphrim007_ has joined #openstack-infra20:59
clarkbI think 10-15k is likely a more reasonable current average of jobs per day, which would roughly double the per-job inode budget20:59
*** caphrim007_ has quit IRC20:59
jeblairshould we also consider reducing to 3 weeks retention as a short-term solution?21:00
*** iyamahat has quit IRC21:00
mordred++21:00
clarkbdevstack, grenade, tempest, and tox related jobs all fit into that set of limitations based on my scanning. But osa, tripleo, and potentially others don't21:00
clarkbjeblair: ya I think so21:00
fungiranking preferences, i would have to say i vote 2,3,4,121:00
pabelangerI like #2 if we can swing it21:00
pabelangerbut, understand if we have to do #321:00
mordredwell - also - the osa/tripleo inode counts are the same in v2 and v321:00
dmsimardmordred: what are the values for zuul_success ? It seems like it's either undefined or true21:01
mordredthe main thing is the additional inodes from ara run by v3's ansible21:01
clarkbmordred: yes, roughly the same21:01
mordreddmsimard: yah - use the | boolean filter21:01
dmsimardmordred: ok21:01
mordredclarkb: I _think_ a normal zuul-generated-ara report is around 500 inodes21:01
fungiwe're up over 3m inodes free now, btw21:01
*** iyamahat has joined #openstack-infra21:01
*** caphrim007 has quit IRC21:02
clarkbmordred: ~400 seems to be the low end21:02
clarkbmordred: for eg pep8 jobs21:02
mordred(as opposed to the multi-10k reports from some of the larger jobs that are using ara in their job content)21:02
Shrewswait, "run ara on success"? not failure? i must've missed something, cause failure is where ara is most helpful, yeah?21:02
jeblairi was leaning toward 2+4 together then fallback to 3.21:02
mordredShrews: on failure21:02
*** trown is now known as trown|outtypewww21:02
clarkbjeblair: ya I think that is my preference too21:02
fungiShrews: run on failure, using the zuul_success variable to determine whether there was a failure21:02
clarkb3 is fallback from 221:02
mordredmy preference is also 2+4 and fallback to 321:03
jeblairfungi: if i interpret your earlier statement, you'd prefer to keep retention at 4 weeks even if it means not running ara at all?21:03
fungii'm uncertain option 4 is strictly necessary (aside from the one-time expiration i'm doing right now to deal with the current crisis)21:04
dmsimardWhat we need to keep in mind is that we'll need to retrofit the 'generate ara only on failure' to openstack-ansible, tripleo and kolla-ansible as well, they are using it outside of zuul v321:04
fungibut yeah, i see options 3 and 4 as roughly equal preference21:04
*** edmondsw has quit IRC21:05
clarkbdmsimard: I think thats a separate concern of continuing to work with various projects to prune and curate the logs they collect21:05
fungiso maybe my preference is 2,3|4,3&4,121:05
clarkbdmsimard: that might involve only running ara on failure along with cleaning up etc/ and so on21:05
jeblairfungi, clarkb, mordred: i think the compromise position then is 2, then 4, then 3.  how's that sound?21:06
fungiwfm21:06
dmsimardI'll have a patch up for #2 soon.. just being extra careful about it and testing every bit of it21:06
mordred++21:06
fungidmsimard: appreciated!21:06
clarkbjeblair: sounds like a plan21:06
jeblairokay, i put that in the etherpad21:07
jeblairthe next thing, while we're all here, is how we should proceed with v3 rollout21:07
dmsimardmordred: zuul_success is undefined on failure, right ? (double making sure)21:07
jeblairi'm inclined to say that we should allocate tomorrow as a day for continued stabilization21:07
*** jkilpatr_ has joined #openstack-infra21:08
jeblairthe mirror issue may not be resolved by tonight, or even by tomorrow21:08
SamYaplejeblair: but i want my zuulv3 for the weekend :(21:08
clarkb511260 finally appears close to merging21:08
dmsimard+1, rolling out on a friday is not a good idea21:08
*** thorst has quit IRC21:08
clarkbjeblair: do we want to maybe turn check et al back on in v3 and watch it?21:08
fungijeblair: yes, at this point i'd be concerned about rolling back onto v3 on a friday21:08
clarkbmaybe after the ara on failure thing is in place21:09
mordreddmsimard, jeblair: two things - a) it's always defined in post playbooks b) it's not set if a pre-playbook or a previous post playbook failed21:09
dmsimardmordred: ok.21:09
mordredso I think we should also make a patch to zuul to make sure we set either zuul_success or a new variable if ANY of the playbooks fail21:09
jeblairmordred: maybe we need a new var?21:09
mordredjeblair: yah. let's do that ... I can make that patch21:09
jeblairyeah, one of those things.  i'm not sure which yet.  :)21:10
fungiclarkb: i could see turning check pipelines back on in v3 sometime tomorrow, as overall ci volume tends to trail off around 16:00z or so21:10
*** kjackal_ has quit IRC21:10
fungion fridays21:10
jeblaireven with ara-on-failure in place, are we worried about general additional volume?21:10
pabelangerjeblair: fungi: if we did roll out on friday, job volume would possibly be low over the weekend for the fixes eager people would want to make21:10
pabelangerbut, agree we should stabilize first21:10
jeblairit's not a lot, though it will run continuously over the weekend and still emit more than normal log volume21:10
fungipabelanger: but fewer of _us_ around to deal with lurking bugs in zuul v3 we have yet to uncover21:10
jeblairalso, we've had >24h partial outage; i kinda don't want to push it.21:11
jeblairit might be nice for folks to be able to land changes for a few minutes.  :)21:11
fungiyup21:11
*** eharney has quit IRC21:12
clarkbya I also don't like feeling compelled to firefight over the weekend :)21:12
jeblairi'm going to start a new section on that etherpad at the bottom21:12
*** gouthamr has quit IRC21:13
fungii so hope it's a section listing new drinking games i can try over the weekend21:14
jeblairi don't normally like to say "can i please work on the weekend?" but at this point would be willing to help flip the switch sunday evening.21:14
*** eharney has joined #openstack-infra21:14
mordredyah. I think a sunday rollout is not a terrible idea21:14
jeblairand of course, our 1100utc plan for monday or tuesday would be fine too.21:14
fungii could see doing it late sunday _if_ ianw and yolanda are handy to keep an eye on things21:15
mordredyes - all three work for me - I'll be flying starting afternoon local time on monday ...21:15
mordredso if we do monday morning I may be less available to help than other times21:15
clarkbya I don't mind a later sunday21:15
fungiotherwise it'll be one of those where i wake up and never get a chance to pour a cup of coffee21:15
mordredI think my preference is sunday, tuesday, monday21:15
mordred(or, rather, those are ordered by amount of time/effort I'll be able to directly contribute)21:16
clarkbI like sunday because we'll be able to hopefully sort out any issues without the full load of the system on it21:17
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Switch success to false if a post playbook fails  https://review.openstack.org/51161921:17
clarkbit was definitely easier to fix problems over the weekend after the last rollout21:17
dmsimardwhat's the rush?21:17
dmsimardwhy sunday ?21:17
fungidmsimard: the longer we wait, the later into the release cycle we creep with these disruptions21:18
mordreddmsimard: job config is effectively frozen for people21:18
mordredand yah - what fungi said21:18
dmsimardokay, that's fair21:18
mordreddmsimard, jeblair, clarkb: https://review.openstack.org/511619 is the zuul_success patch21:19
clarkbI've also got to start prepping for event things (summit mostly) so earlier the better for me21:19
jeblairokay, to my secret disappointment, no one has vetoed sunday  :)21:19
mordredjeblair: :)21:19
jeblairwhat time sunday works for east-coasters?21:19
dmsimardmordred: I'm covering for the edge case where it might not be defined21:19
jeblairand, erm, central coasters?21:19
mordreddmsimard: that's great - that patch is just making sure that if a post playbook fails that subsequent post playbooks get success==false21:20
dmsimardMy mother is coming to visit this weekend, I'm east coaster and can respond to pings but might not be available for longer periods of sustained work21:20
fungii have no hard scheduled obligations next week, and am happy to support whatever/whenever people want to do the next v3 rollout attempt21:20
fungii can be around as late as, say, 04:00z21:21
jeblairdmsimard: well, i'm not expecting us to do sustained work on sunday, more like perform the transition so that people start the day on v321:21
fungi(which is technically utc monday, not sunday, but whatevs)21:21
jeblairstart monday on v3 that is21:21
mordredjeblair: agree21:21
pabelangerI'm traveling on Wed, Thur, Friday next week too21:21
mordredturns out flipping the switch itself is actually not too hard21:21
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure  https://review.openstack.org/51162221:21
dmsimardmordred, jeblair, fungi, clarkb, pabelanger ^21:22
dmsimardI *think* that's okay, but please do review it carefully, no integration tests and all21:22
Shrewsi can be around late sunday, but am having my tooth cleaned monday morning so less around then21:22
*** aviau has quit IRC21:22
fungiShrews: just the one tooth, eh?21:22
dmsimardlol21:22
*** aviau has joined #openstack-infra21:22
*** jascott1 has quit IRC21:22
* clarkb somewhat arbitrarily throws out 2200UTC21:23
Shrewsfungi: of course. all i need21:23
clarkbthat is mid-afternoon for pacific coasters and evening for eastern/central coasters21:23
fungiShrews: as long as you can open a beer with it, you're all set i suppose21:23
jeblairclarkb: that wfm21:23
mordredclarkb: 2200 wfm21:23
fungii'm cool with a 22-23:00z window21:24
dmsimardjeblair: vars across jobs are merged or replaced ?21:24
fungithat's like 6-7pm local here so early by my standards21:24
dmsimardjeblair: I mean, if we put 'ara_generate_report: failure' as a var in the base job(s), it's going to stick around, right ?21:24
jeblairdmsimard: merged21:24
dmsimardjeblair: ok, nice.21:24
*** mat128 has quit IRC21:25
jeblairdmsimard: yes.  it will be overridable by children, but i'm not worried about folks overriding that.  for now.21:25
mordred++21:25
clarkbok should we call it 2200UTC sunday then?21:25
mordred++21:25
clarkbuse the rest of today and tomorrow to stabilize21:25
jeblairdmsimard: let's just stick in some documentation asking folks to please not override it. :)21:25
jeblairclarkb: ++21:25
fungiclarkb: sgtm21:25
dmsimardjeblair: I was more worried about someone declaring just 'vars' and removing it more than someone overriding the default value21:26
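A minimal sketch of what carrying that default in the base job could look like; the exact job layout in project-config may differ:

    - job:
        name: base
        vars:
          ara_generate_report: failure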
clarkbmordred: do you want to follow up to your thread about the mirror and devstack-gate with a zomg inodes but now we are looking to be in a better place and aiming for sunday rollout?21:26
*** aeng has joined #openstack-infra21:26
mordredsure!21:26
openstackgerritDavid Moreau Simard proposed openstack-infra/project-config master: Test ARA report generation only on failure in base-test  https://review.openstack.org/51162421:26
dmsimard^ testing the toggle in base-test21:27
fungiwe're just over 4m inodes free now, so about halfway to where we want to be for a 1% free cushion21:27
clarkboh maybe wait for the cushion before emailing21:27
clarkband possibly sneak in a "please review your logs and remove unnecessary things like logging all of etc or selinux or systemd units"21:27
fungiyes, we can gloat about what a great position we're in with only 99% inode utilization on that volume ;)21:28
clarkbI could also delete the logs for this one change and free up 2.8 million inodes21:28
clarkbto get us there quicker >_>21:28
fungideletion of /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/ercloud/tmp completed a little while ago, looks like21:29
clarkbfungi: hrm I was still seeing them21:29
clarkbwas it undercloud and no ercloud?21:30
fungier, that's a terrible pattern21:30
fungii must have missed an un when i edited that line. restarting :/21:30
fungi"und" added and rerunning. /srv/static/logs/??/??????/*/*/gate-tripleo-ci-*/logs/undercloud/tmp this time21:31
mordredfungi: that seems gooder21:31
pabelangerfungi: clarkb: jeblair: gnutls issue on ubuntu-trusty fixed now too21:31
clarkbfungi: oh I think there is a bug in that too21:31
fungipabelanger: excellent!21:31
pabelangerand bad images from rackspace deleted21:31
clarkbfungi: needs to be gate-tripleo-ci-*/*/logs/undercloud/tmp21:31
pabelangershould be re-uploaded once shade has been released21:32
clarkbfungi: to get the build uuid21:32
fungiclarkb: yep, you're right. fixing21:32
pabelangerfungi: clarkb: which means we can then land https://review.openstack.org/511543/ to fix system-config21:32
*** thorst has joined #openstack-infra21:33
clarkbpabelanger: I've approved it so should be fine if the recheck comes around good21:33
mnaseryay, things getting fixed21:34
pabelangerdistributed sysops21:34
mordreddmsimard: one comment21:34
*** eharney has quit IRC21:34
mordreddmsimard: otherwise looks good to me21:34
mordredclarkb: I would not oppose you deleting all the logs for that one job :)21:35
pabelangerclarkb: I've also just removed nb04.o.o from emergency file21:35
*** thorst has quit IRC21:36
*** lifeless has quit IRC21:37
dmsimardmordred: the reason it took longer to get the patch up is that I was testing exactly your comment21:39
dmsimardmordred: the problem is that false and failure have different behaviors21:39
dmsimardmordred: false == never generate, failure == only generate on failure21:39
mordreddmsimard: yes - they do - but the first condition is checking for true21:40
dmsimardtrue would be == always generate21:40
mordreddmsimard: so == true and | bool should both have the same effect21:40
mordredyou're using | bool on the false branch already21:40
dmsimardoh21:40
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Image should be optional  https://review.openstack.org/51129921:40
dmsimardlet me test21:40
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Add method to set bootable flag on volumes  https://review.openstack.org/50247921:40
mordreddmsimard: it's possible this is dumb - my python brain is rejecting == true - but maybe in jinja == true is ok?21:40
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Allow domain_id for roles  https://review.openstack.org/49699221:41
clarkb511260 should enter the gate shortly, last job is running against it now21:41
dmsimardmordred: yeah, == true is okay and works21:41
clarkbdid we decide on whether or not we should reenable v3 pipelines?21:41
dmsimardmordred: but | bool also works21:41
clarkbjeblair: ^21:41
dmsimardmordred: I was afraid of using | bool for what might end up being a string (that would evaluate to true)21:42
dmsimardmordred: in python, a non-empty string is true21:42
mordreddmsimard: no - | bool on a string is false (I just checked that in ansible)21:42
*** srobert_ has joined #openstack-infra21:42
dmsimardmordred: but in the jinja bool filter, it looks like a non-empty string (even one that is not 'false') is false21:42
mordreddmsimard: BUT ... if == true works in jinja, let's do it21:42
openstackgerritMerged openstack-infra/puppet-openstack_infra_spec_helper master: Cap signet < 0.8.0  https://review.openstack.org/51154321:42
dmsimardmordred: | bool is fine, I tested it and it works.. it's just my brain confusing python and ansible/jinja boolean :/21:43
mordreddmsimard: or, rather, I'm fine either way now that we've verified that | bool and == true both have the same impact21:43
clarkbpabelanger: ^ there we go21:43
openstackgerritDavid Shrewsbury proposed openstack-infra/shade master: Move role normalization to normalize.py  https://review.openstack.org/50017021:43
jeblairclarkb: drat.  we did not.21:43
mordredmaybe re-enable tomorrow once the system has stabilized more?21:43
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure  https://review.openstack.org/51162221:43
dmsimardmordred: ^ now with | bool21:43
jeblairmordred: yeah, that sounds like a plan21:44
clarkbwfm21:44
pabelangerclarkb: yah, rechecking system-config now21:44
clarkbover 5 million now21:44
dmsimardmordred: also added a comment for posterity21:44
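A hedged sketch of the conditional being debated above, using the toggle semantics discussed in-channel (true = always generate, failure = only on failure, false = never). The task name, report command, and ara_report_path variable are illustrative, not the merged role's exact contents:

    - name: Generate ARA report
      command: ara generate html "{{ ara_report_path }}"
      when: >-
        ara_generate_report | bool or
        (ara_generate_report == 'failure' and
         not zuul_success | default(false) | bool)

Note the default(false): per mordred above, zuul_success is not set when a pre-playbook fails, and treating that case as a failure is the safe behavior for a report-on-failure toggle.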
dmsimardclarkb: wow that's over 900021:44
fungimordred: jeblair: clarkb: i suggested tomorrow as well, mainly because around 16:00z on a friday utilization will start to trail off heading into the weekend so we can continue to make progress on inode cleanup in the background21:44
openstackgerritMohammed Naser proposed openstack-infra/openstack-zuul-jobs master: Drop tox_constraints_file from include_role for release notes  https://review.openstack.org/51162721:45
openstackgerritMohammed Naser proposed openstack-infra/openstack-zuul-jobs master: Move tox_envlist into job variables for releasenote jobs  https://review.openstack.org/51162821:45
*** srobert has quit IRC21:45
mnaseri identified two issues with releasenote jobs, very quick review mordred ^21:45
fungisomewhere around 16-19:00z at any rate21:45
mnaseryou can also see the failure happening here - http://logs.openstack.org/54/511054/1/check/build-openstack-releasenotes/6aeff53/job-output.txt.gz21:45
fungibut for now, i need to disappear for a while. back later21:46
mnaser(sorry just stomping here mid-discussion)21:46
*** jcoufal has quit IRC21:46
*** srobert_ has quit IRC21:47
mordredmnaser: yes. great patches21:47
*** esberglu has quit IRC21:47
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Switch success to false if a post playbook fails  https://review.openstack.org/51161921:48
*** esberglu has joined #openstack-infra21:48
mnasershould i add those in any etherpad that's being used so they get eyes, or maybe someone around here can give them a quick review (i haven't been following much today)21:48
*** iyamahat_ has joined #openstack-infra21:48
*** boden has quit IRC21:49
mordredclarkb, pabelanger, fungi: mnaser's patches above lgtm - could use an extra set of eyeballs21:49
*** iyamahat has quit IRC21:49
*** slaweq_ has quit IRC21:50
pabelanger+321:50
jeblairmordred: i know we all have brainhurt, but do you want to chat about periodic jobs and branches now?21:50
mordredjeblair: yes - although I just had an idea I want to float first ...21:51
jeblairhttps://review.openstack.org/511533 was the change which brought it up21:51
mordredjeblair: does abandoning a change emit an event zuul can act on?21:51
jeblairmordred: yes21:51
jeblair(whether it correctly does act on it in either master or v3 atm, i could not say for certain)21:52
pabelangerclarkb: zomg, centos-7 failed again with ssh key21:52
mordredjeblair: what if we made a 'cleanup' pipeline that ran when a change is abandoned, and had something that would delete the logs for an abandoned change ...21:52
pabelangerclarkb: looks like I'll be debugging that next21:52
mordred(came to mind as I just abandoned a bunch of DNM test patches that each had a ton of tests associated with them)21:52
*** esberglu has quit IRC21:52
jeblairmordred: the v3 logs have the abandoned tests enabled, so it should be functioning.21:53
pabelangerfor now, I have to run.21:53
jeblairmordred: interesting.  i *think* that would take code changes.21:53
jeblairmordred: i believe we have zuul hard-coded to remove abandoned changes from pipelines21:53
jeblairmordred: so we'd have to drop that and rely on the 'status:open' pipeline requirement21:53
jeblairmordred: i think it's feasible; couple hours of work.21:54
*** threestrands has joined #openstack-infra21:54
jeblair(tbh, i think that's the better long-term structure for the code anyway)21:55
*** Apoorva_ has quit IRC21:55
jeblair(i think the hard-coding predates pipeline requirements)21:55
mordredjeblair: maybe let's put that on the backburner for next time we feel like hacking in such areas ... if we did that, I think perhaps we rename the merge-check template to "system-required" and put the 'delete logs' job into that - so that the openstack story is "you always have to have the system-required template"21:55
*** gouthamr has joined #openstack-infra21:55
jeblair++21:56
mordredmaybe we should do that second part anyway21:56
jeblair++21:56
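A hypothetical sketch of what such a cleanup pipeline's trigger could look like (trimmed to just the trigger; per jeblair it would also need the zuul code change to stop hard-removing abandoned changes from pipelines):

    - pipeline:
        name: cleanup
        manager: independent
        trigger:
          gerrit:
            - event: change-abandoned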
mordredjeblair: ok - so - branches21:57
jeblairthe idea in v3 is that periodic jobs are just like regular jobs.  so instead of putting "periodic-foo-master" and "periodic-foo-pike" on a project, you just put "foo"21:57
mordredjeblair: yes! this I agree with wholeheartedly21:58
jeblairzuul emits trigger events for every project-branch combination21:58
mordredjeblair: and now I believe I understand what you were saying21:58
*** wolverineav has quit IRC21:58
jeblairso if you add a periodic job to a project, it'll run on all that project's branches21:58
jeblairso if you only want it to run on a subset of branches, you just use branch matchers in the project-pipeline in the regular way21:58
mordredjeblair: so instead of putting branch-override ... yah. that21:58
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Drop tox_constraints_file from include_role for release notes  https://review.openstack.org/51162721:58
mordredjeblair: cool. that all makes sense to me- and yes, I think that's definitely the way to go for new jobs21:59
jeblair(or, you can put the branch matcher on the job definition itself, if that's something you can say globally)21:59
jeblairmordred: yep21:59
*** yamamoto has joined #openstack-infra21:59
*** lifeless has joined #openstack-infra22:00
mordredjeblair: for *legacy* jobs... I think ajaeger's patch - except s/override-branch/branches/ is the right thing22:00
mordredjeblair: because those generated jobs are all expecting to only be triggered for the branch in question22:00
*** wolverineav has joined #openstack-infra22:00
mordredjeblair: and we should definitely replace them all with new v3 jobs that are done correctly - but I don't think we should try to correct them semantically in place22:00
jeblairokay, that works for me.22:01
mordred(I worry that if we tried to collapse at that scale we'd get something weirdly wrong)22:01
jeblairgood point22:01
mordredbranches can take a scalar right?22:01
jeblairmordred: a scalar or a list22:01
mordredso "branches: master" works? awesome22:01
mordredI'll modify that patch real quick22:01
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364522:02
*** esberglu has joined #openstack-infra22:02
jeblairyep.  as does "branches: [stable/ocata, stable/pike]".  this is, i think, going to be a big improvement.  :)22:02
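Putting that together, a sketch of branch-restricting legacy jobs in a periodic project-pipeline (project and job names hypothetical):

    - project:
        name: openstack/example
        periodic:
          jobs:
            - legacy-periodic-foo:
                branches: master
            - legacy-periodic-bar:
                branches: [stable/ocata, stable/pike]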
jeblairthe checksums process is 93% complete; i'm afk for 20m.22:03
*** iyamahat_ has quit IRC22:03
*** iyamahat__ has joined #openstack-infra22:03
mordredjeblair: while I'm doing that - you feel like restarting v3 scheduler to pick up the zuul_success fix so we can test dmsimard's patch?22:03
mordredor - afk - that's better22:03
clarkbpabelanger: I have a change up to help debug that by dumping data from config drive and /home/root/.ssh/authorized_keys, not sure if it merged22:04
dmsimardI have to relocate, I'll be back in >1hr22:06
*** esberglu has quit IRC22:06
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add branches to all periodic jobs  https://review.openstack.org/51153322:07
*** rbrndt has quit IRC22:08
openstackgerritMerged openstack-infra/system-config master: Add documentation on force-merging a change  https://review.openstack.org/51124822:08
openstackgerritMerged openstack-infra/shade master: Add group parameter to create_server  https://review.openstack.org/51130522:10
*** rbrndt has joined #openstack-infra22:10
*** jascott1 has joined #openstack-infra22:12
*** bobh has quit IRC22:12
*** mriedem1 has joined #openstack-infra22:14
*** mriedem has quit IRC22:15
*** jascott1 has quit IRC22:16
*** jascott1 has joined #openstack-infra22:16
*** Keitaro has quit IRC22:20
*** jascott1 has quit IRC22:21
*** gildub has joined #openstack-infra22:22
clarkbpabelanger: https://review.openstack.org/#/c/501887/22:25
*** jascott1 has joined #openstack-infra22:25
clarkbI've rechecked it, I was sort of hoping we'd catch a failure premerge22:26
clarkbwe are at 99% used now22:26
clarkbfungi: ^ fyi22:27
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Use native propose-translation jobs  https://review.openstack.org/51143522:28
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add branches to all periodic jobs  https://review.openstack.org/51153322:28
clarkbI've approved 511260 now22:28
*** threestrands has quit IRC22:28
mordredjeblair, clarkb: ^^ updated both patches - they should be correct for v3 now22:28
clarkbI'm reviewing the ara on failure related changes now22:30
mordredcool22:30
jeblairmordred: back.  i think that's an executor fix.  we can leave the scheduler and restart the execs22:30
mordredjeblair: oh! good point22:30
jeblairmordred: i will do that22:30
mordredjeblair: thanks22:30
*** gouthamr has quit IRC22:30
jeblairokay, i ran "service zuul-executor stop" on all ze machines, and they all stopped cleanly22:32
jeblairmordred: there's something i think i need your help with though22:33
jeblairmordred: every time zuul is installed, it seems to ignore the "-e git+https" requirement and re-installs the new version of gitpython22:33
jeblairmordred: i think clarkb said pbr may somehow be involved22:33
fungii just knew if i stopped to eat something, i'd miss the great unveiling of the 99%22:34
clarkbjeblair: mordred yes it is, basically setuptools doesn't understand those requirements so when pbr reads those reqs into setuptools it strips all the git stuff out and uses the egg as is22:34
clarkbjeblair: mordred you have to install requirements with pip directly to get it to do what you expect22:34
*** Keitaro has joined #openstack-infra22:34
clarkbmordred: dmsimard re https://review.openstack.org/#/c/511624/1 what tests use base-test? I'd like to see them not include ara reports on success22:35
jeblairclarkb: yeah, but we re-install zuul on every commit.  so that means we're now uninstalling our gitpython fork on every commit.22:35
clarkbjeblair: probably the thing to do is make the zuul install do a pip install -U /path/to/zuul && pip install -U /path/to/zuul/requirements.txt ?22:35
ianwjeblair: i see the checksums has now exceeded the size of the original.  that's good, i guess?22:36
jeblairclarkb: the procedure is to land a change to base-test, then create a DNM change which reparents any job (say, unittests) to base-test, and examine the results.  if that works, then we copy the change from base-test to base.22:36
*** lin_yang has joined #openstack-infra22:36
jeblairclarkb: nothing normally uses base-test.  so it's safe to land changes to it as long as they look reasonable.22:37
clarkbjeblair: gotcha, so we have to merge the things first22:37
jeblairyep.  we should put this in some documentation around there :)22:37
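Pending that documentation, the DNM step jeblair describes amounts to a one-line reparent in whatever job is used as the guinea pig, e.g. (job name hypothetical):

    - job:
        name: openstack-zuul-jobs-linters
        parent: base-test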
jeblairianw: neat!22:37
*** threestrands has joined #openstack-infra22:37
clarkbI've approved the parent of https://review.openstack.org/#/c/511624/1 if we can get a second review on that change it would be great to get this tested soon22:38
jeblairdone22:39
clarkbtyty22:39
mordredjeblair: I agree with the thing clarkb said - we could also do pip install -U /path/to/zuul/requirements.txt && pip install --no-deps -U /path/to/zuul22:40
jeblairclarkb: yeah, i think your install procedure will work22:40
jeblairi just tested that manually...22:40
jeblairmordred: should i do clarkb's thing or try yours?22:40
clarkbmordred's is likely a bit quicker22:41
mordredjeblair: try mine - it's less churn - clarkb's will work - but will result in gitpython temporarily being changed22:41
fungiand does also cause some dependencies to be installed and then reinstalled i guess22:41
clarkbseems like inode cleanup is really flying now. I wonder if that is the addition of the undercloud/tmp cleanup?22:42
fungi(in the case of the ones which have to be installed from git urls)22:42
fungiclarkb: i have a feeling it's ci load dropping off for the evening22:42
fungiwe tend to make more progress on bulk cleanup tasks like this on the logs site off-hours and on weekends22:43
*** mriedem1 has quit IRC22:43
clarkbah22:43
openstackgerritMonty Taylor proposed openstack-infra/puppet-zuul master: Split zuul and requirements install  https://review.openstack.org/51163722:43
fungiwhereas during peak load it's lucky to be marching in place22:43
mordredjeblair, clarkb: ^^ like that22:43
jeblairoh that's easier than what i was about to do. :)22:43
jeblairmordred: needs a "-r" though, right?22:44
mordredjeblair: were you going to make a puppet resource dependency graph?22:44
jeblairmordred: yes22:44
mordredjeblair: YES22:44
openstackgerritMonty Taylor proposed openstack-infra/puppet-zuul master: Split zuul and requirements install  https://review.openstack.org/51163722:44
jeblairlgtm22:45
clarkbapproved22:45
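The net effect of 511637, including the -r jeblair caught below, re-expressed as a hypothetical Ansible sketch for illustration (the real change is a pair of puppet exec resources; paths are illustrative):

    - name: Install requirements first so -e git+https entries are honored
      command: pip install -U -r /opt/zuul/requirements.txt

    - name: Install zuul without re-resolving deps, keeping the gitpython fork
      command: pip install --no-deps -U /opt/zuul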
mordredjeblair: does https://review.openstack.org/#/c/511533 and its parent look better to you now?22:45
jeblairi re-installed deps manually and restarted all ze machines22:46
jeblairmordred: yep22:47
clarkbbecause I was curious I checked my local 4TB zpool's inode count and it has more than 7 times the number of inodes of our 12TB fs22:47
clarkb(and that was just with default fs creation commands)22:47
clarkbif we ever get around to moving this filesystem maybe we should multiply the inode count by some big number22:48
jeblairya22:48
mordredwho would ever need more than 640k of inodes22:48
mordreds/0//22:48
mordredbleh22:48
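If the filesystem is ever recreated, the inode count is a mkfs-time knob; for example, a hypothetical Ansible task asking ext4 for one inode per 4 KiB instead of the 16 KiB default (device name illustrative):

    - name: Create the logs filesystem with a denser inode ratio
      filesystem:
        fstype: ext4
        dev: /dev/example
        opts: "-i 4096"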
*** rbrndt has quit IRC22:52
jeblairfungi: want to send the status ok?22:54
*** iyamahat__ has quit IRC22:54
jeblairianw: however, being at 105% of completion has thrown off my time estimates.  :|22:54
ianwi got all day :)22:55
ianwi just hope it fixes it22:55
clarkbjeblair: that is a neat trick22:55
jeblairthe file it currently has open is in the "/u/" directory22:56
jeblairi have no idea how close to alphabetical it is though.22:56
fungijeblair: you bet22:56
jeblairnext time (please no) -- find > file, then reprepro < file.22:57
openstackgerritMerged openstack-infra/zuul-jobs master: Add the ability to generate an ARA report only on job failure  https://review.openstack.org/51162222:57
*** iyamahat has joined #openstack-infra22:58
fungistatus ok Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now22:59
fungithat look okay?22:59
jeblair++23:00
clarkbyes23:00
ianwjeblair: combined with "pv" in the middle, it might even come up with an accurate %23:00
fungi#status ok Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now23:00
openstackstatusfungi: sending ok23:00
*** aeng has quit IRC23:01
*** ChanServ changes topic to "Discussion of OpenStack Developer and Community Infrastructure | docs http://docs.openstack.org/infra/ | bugs https://storyboard.openstack.org/ | source https://git.openstack.org/cgit/openstack-infra/ | channel logs http://eavesdrop.openstack.org/irclogs/%23openstack-infra/"23:03
-openstackstatus- NOTICE: Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now23:03
jeblairianw: oh it finished!23:03
jeblair13484 files were added but not used.23:03
jeblairThe next deleteunreferenced call will delete them.23:03
ianw:/ that seems like ... a lot23:03
*** xarses has quit IRC23:04
jeblairianw: i'm a little worried that maybe i should have started with step 1?23:04
ianw? i maybe wouldn't run the /usr/local/bin script, isn't deleteunreferenced a separate step there (checking ...)23:04
jeblairi was assuming that references.db was okay, but i don't know that.  (i'm not actually sure how to know that)23:04
openstackgerritMerged openstack-infra/puppet-zuul master: Split zuul and requirements install  https://review.openstack.org/51163723:05
ianwhmm, that one i regenerated yesterday afternoon23:05
jeblairianw: okay, assuming that's okay, then i think we've done step1 and step223:06
openstackstatusfungi: finished sending ok23:06
openstackgerritMerged openstack-infra/shade master: Image should be optional  https://review.openstack.org/51129923:06
jeblairianw: there is no step 323:06
clarkbcan we serve the read-write volume before releasing it?23:07
jeblairianw: should i try running "reprepro update" now?23:07
ianwjeblair: i would just run "k5start -t -f /etc/reprepro.keytab service/reprepro -- reprepro --confdir  /etc/reprepro/ubuntu update" by hand23:07
ianwyeah, that :)23:07
jeblairclarkb: yeah, but we're not there yet i don't think23:07
openstackgerritMerged openstack-infra/shade master: Add method to set bootable flag on volumes  https://review.openstack.org/50247923:07
clarkbgotcha23:07
ianwoh with a -VVVV23:07
openstackgerritMerged openstack-infra/shade master: Allow domain_id for roles  https://review.openstack.org/49699223:07
openstackgerritMerged openstack-infra/shade master: Move role normalization to normalize.py  https://review.openstack.org/50017023:07
ianw*hopefully* it does something other than peg at 100% cpu saying nothing23:07
jeblairokay, i will copy the db files from the local disk into afs, delete *.old (i still have them on the local disk), then run that.23:08
jeblairianw: reprepro --confdir /etc/reprepro/ubuntu -VVVV update23:10
jeblairlook right?23:10
ianwyep23:10
*** wolverineav has quit IRC23:10
*** wolverineav has joined #openstack-infra23:10
mordredpabelanger, Shrews: remote:   https://review.openstack.org/511643 Release 1.24.0 of shade ... patch submitted to cut new release of shade23:11
jeblairgood news!  it did not hang.23:12
*** bobh has joined #openstack-infra23:12
jeblairthe bad news: File "pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb" is already registered with different checksums!23:12
mordredjeblair: grumble23:12
clarkbthat is a neat trick23:12
ianwhaha, i knew it was a Farnsworth "good news, everybody!"23:13
jeblairhttp://paste.openstack.org/show/623503/23:14
*** rwsu has joined #openstack-infra23:14
jeblairthe "expected" values match what's in afs23:15
clarkbpabelanger: first recheck of that ssh debugging change didn't fail, trying again23:15
*** hongbin has quit IRC23:15
dmsimardclarkb: did you get your answer for base-test ?23:15
dmsimardclarkb: I figure we could just create an adhoc no-op job based off of base-test if there wasn't any.23:16
clarkbdmsimard: ya we need to push a change to eg ozj once things merge to reparent onto base-test23:16
clarkbjeblair: does that imply the got: side is what is already registered?23:16
jeblairclarkb: ya... i'm trying to extract a text version of the checksum db to examine23:17
*** aeng has joined #openstack-infra23:18
jeblairpool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb :1:1c15ec44003064eb9e664462f764e98aa5e9d36c :2:e4ca9514867498531f1feea0d081c1d3df8d91d9b8bc0f353315e9a9a362e2a2 d7159bf89cc9df87ba64db43c7d8bd1a 166987323:18
ianwthe sizes aren't even close?23:18
jeblairthat also seems to match "expected"23:19
jeblairwhat does "got" mean?23:19
jeblairfrom whence was it "got"?23:19
clarkbdmsimard: the update to base-test is in the gate now, so once that merges we just push a change to ozj or whatever to use base-test in some test (probably a test that runs against ozj)23:19
*** felipemonteiro has joined #openstack-infra23:19
clarkbjeblair: perhaps got is what the upstream mirror gave it?23:19
dmsimardclarkb: right, I won't be at a keyboard for a while. If we want to test it soon, someone else can do it.23:20
clarkbdmsimard: ok I'll push one up then23:20
jeblairwhat is our upstream?23:20
dmsimardclarkb: thanks!23:20
ianwI think it means "I read the info from disk and got this value, so that's what i expect, but the database told me this other value"?23:20
jeblairthe lines before that were23:21
jeblair  processing updates for 'trusty-security|main|amd64'23:21
jeblair  reading '/afs/.openstack.org/mirror/ubuntu/lists/ubuntu-security_trusty-security_main_amd64_Packages'23:21
jeblairso was 'got' from that file?23:21
jeblairyes23:22
jeblairthe values in that file match "got"23:22
openstackgerritMerged openstack-infra/project-config master: Test ARA report generation only on failure in base-test  https://review.openstack.org/51162423:22
*** Apoorva has joined #openstack-infra23:23
ianwjeblair: it seems to be the wrong size ... see the one i downloaded from upstream in /tmp23:23
jeblairianw: yeah, that matches the index23:23
jeblairso the file we have in our archive is wrong23:23
ianwso maybe we just do this over and over, copying in the wrong stuff?23:23
pabelangermordred: I really like the idea of an abandon pipeline, that's a great idea.23:24
jeblairor if we remove that file, and remove its entry from checksums.db, will it re-download it and add it?23:24
openstackgerritClark Boylan proposed openstack-infra/openstack-zuul-jobs master: Reparent ozj integration jobs to base-test for testing  https://review.openstack.org/51164623:24
ianwit's  probably a lot easier to just wget it in than fiddle the db?23:24
clarkbthere is the ara test I think23:24
tonybBefore I write one, is there a tool that will take a repo name and grovel around in the zuul(v2) config data and list all the jobs that will be run? (bonus points if it can check for branch exclusions)23:24
jeblairianw: i think if we wget it, we still have to fiddle the db to add the newly correct entry to checksums23:25
fungitonyb: not only project and target branch but also changed files in the diff can determine which jobs will run23:25
jeblairianw: there are commands to both remove a single entry from checksums and add/update one.23:25
pabelangerclarkb: cool, I'll add some rechecks to that too. We might also want to set up an autohold for the job23:26
fungitonyb: as well as the pipeline into which the ref is enqueued23:26
pabelangerand caught up on backscroll23:26
ianwjeblair: should we just delete the file and rerun the update, and see if it just downloads it?23:26
tonybfungi: That is true but for my use case today I don't think that matters23:26
ianwmaybe it can recover from that23:26
ianwif not, move on to replacing23:27
tonybfungi: Hmm I will need to consider the pipeline23:27
jeblairianw: i'm pretty sure we need a db fiddle either way, cause i *think* what's happening here is a comparison of checksums.db with the package list.  i don't know if it's going out to the actual files at all.23:27
jeblairianw: so i think we should either remove the file and remove the checksums.db entry; or replace the file and replace the checksums.db entry.23:27
pabelangermordred: 511643 has some errors23:28
jeblairianw: i'm hopeful that if we did the second thing, it would auto-correct.  i have no basis other than hope for that though.23:28
EmilienMI know you're very busy but if someone can ping me when release-tarball jobs are kicked off again, thanks a lot23:28
ianwjeblair: i'm just hoping it stat()s the file or something (i have no idea).  i think start simple, remove the file from disk and try update, see what happens23:28
ianwjust reading up on the checksum remove cmds now23:29
*** Swami has quit IRC23:29
clarkbianw: not to completely distract you from the ubuntu mirror but do you know if there is a fix for http://logs.openstack.org/46/511646/1/infra-check/base-integration-centos-7/910a17b/job-output.txt.gz#_2017-10-12_23_27_37_155946 pushed up yet?23:29
jeblairianw: okay, i'm happy to try 1) remove file; rerun.  then if that fails, 2) also remove checksum; rerun.23:29
mordredpabelanger: so it does :)23:29
*** caphrim007 has joined #openstack-infra23:29
clarkbianw: need to check ansible_default_ipv6 is defined instead of ansible_default_ipv6.address is defined23:29
*** aeng has quit IRC23:30
jeblairianw: i'll wait for you to finish reading before i execute.23:30
fungiEmilienM: they should be safe to run now. i know there were several tripleo releases i need to reenqueue but was hoping someone on the release team could put together a list of all releases that need reenqueuing besides those so i can do them all in one batch (per my e-mail to the dev list)23:30
ianwjeblair: ++ on that plan23:30
jeblairtonyb: what's the use case (curiosity)23:30
*** caphrim007_ has joined #openstack-infra23:30
ianwclarkb: ahh, no i haven't.  does that explain the occasional errors?23:30
clarkbianw: I think so23:30
clarkbianw: should I go ahead and push a patch or do you want to?23:30
ianwthat would be nice, i wasn't looking forward to debugging that23:30
ianwi can, let me pull it up23:31
tonybjeblair: I want a list of all the jobs (and nodes) that tripleo runs to help understand the impact of keeping stable/newton around for longer23:31
ianwclarkb: that's weird though, i thought that variable was always defined, and just blank23:31
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Let v2 publish shade releases again  https://review.openstack.org/51164923:31
mordredpabelanger: ^^23:31
ianwi couldn't find it documented though, maybe i didn't look hard enough23:31
clarkbianw: seems to imply it's not defined if no ipv623:31
tonybjeblair: if it were 1 or 2 repos I'd just do it by hand but ...23:31
mordredpabelanger: we need to re-enable publish-to-pypi for shade in v2 :)23:32
pabelangerclarkb: I've added auto-hold for gate-infra-puppet-apply-3-centos-7 on nodepool.o.o23:32
jeblairtonyb: gotcha.  fwiw, i expect us to have a rest api in zuulv3 in a couple of months that would help with this sort of thing.23:32
tonybjeblair: \o/23:32
pabelangermordred: ack23:32
clarkbtonyb: globally tripleo was ~1/3 of all jobs run when I last checked23:32
mordredclarkb, jeblair, ianw: if you have a sec - https://review.openstack.org/511649 is needed for  us to cut a shade release23:32
jeblairianw: no change after rming the file23:32
pabelangermordred: do we need to disable in zuulv3?23:33
jeblairianw: will proceed to checksums.db surgery23:33
mordredpabelanger: no - we have those pipelines disabled in v3 anyway23:33
tonybclarkb: Wow23:33
clarkbtonyb: er not jobs run23:33
clarkbtonyb: sorry it was a cpu time calculation23:33
clarkbtripleo was 1/3 of all cpu usage23:33
tonybclarkb: Ahh okay.  that's less shocking ;P23:34
pabelangermordred: kk23:34
*** caphrim007 has quit IRC23:34
ianwclarkb: do you know if an ansible when: list is a "&&" or a "&"?   i.e. with "- ansible_default_ipv6 is defined" then "- ansible_default_ipv6.address is defined", is the second condition going to bail too?23:34
openstackgerritLin Yang proposed openstack-infra/project-config master: Add OpenStack client check to python-rsdclient  https://review.openstack.org/51165023:35
ianwcause i'm sure i saw, in the no-routable-address case, that being blank23:35
clarkbianw: I'm not sure if it will short circuit23:35
clarkbdmsimard: pabelanger mordred ^ do you know?23:35
jeblairianw: it is now doing more things.23:35
jeblairlooks like it's actually downloading package files.23:36
ianw\o/23:36
pabelangerclarkb: looking23:36
jeblairreprepro --confdir /etc/reprepro/ubuntu -VVVV _forget pool/main/c/ceph/librbd1-dbg_0.80.11-0ubuntu1.14.04.3_amd64.deb23:36
jeblairwas the command i ran to drop the checksums.db entry for that, btw23:36
*** thorst has joined #openstack-infra23:37
dmsimardianw: I don't understand the question, probably missing context23:37
ianw  when:23:37
ianw    - ansible_default_ipv6 is defined23:37
ianw    - ansible_default_ipv6.address is defined23:37
ianwdmsimard: ^ does that work when ansible_default_ipv6 is not defined at all is basically the question23:37
dmsimardIf you're only interested in the second condition, it should work by itself without the first one but I'd test it first to make sure23:38
clarkbdmsimard: we think the first is required because of http://logs.openstack.org/46/511646/1/infra-check/base-integration-centos-7/910a17b/job-output.txt.gz#_2017-10-12_23_27_37_15594623:38
ianwdmsimard: empirical evidence shows it doesn't, but i agree i thought it would too :)23:39
*** bobh has quit IRC23:39
mnaserbut is it possible that ansible_default_ipv6 is defined but without an address?23:39
ianwmnaser: i'm pretty sure i saw that with ipv6 but no routable address23:39
pabelangerianw: clarkb: we might want to use nodepool.public_ipv623:39
pabelangerwhich we setup in inventory with zuul23:40
clarkbmnaser: the log message is "ansible_default_ipv6' is undefined"23:40
clarkbmnaser: implying its the root var that does not exist23:40
pabelangerthen you can when: nodepool.public_ipv623:40
dmsimardianw, clarkb: I have a sandbox on my laptop just to keep testing this sort of junk with conditionals and other things. It's never straightforward :(23:40
pabelangerhttp://logs.openstack.org/69/511069/1/infra-check/project-config-nodepool/d1f74c9/zuul-info/host-info.ubuntu-xenial.yaml23:40
pabelanger  ansible_default_ipv6: {}23:41
mnaserclarkb: which is why i'm asking if it's possible that "ansible_default_ipv6" is defined with "ansible_default_ipv6.address" undefined.   if that's the case, the second conditional can be dropped23:41
ianwpabelanger: that's cool ... i mean this "default_ipv6" *should* be exactly what we want to express, as it seems to only put a routable ipv6 in there23:41
mnaserbut i guess pabelanger just confirmed it can be23:41
openstackgerritMerged openstack-infra/project-config master: Let v2 publish shade releases again  https://review.openstack.org/51164923:41
pabelangerso, need to check if ansible_default_ipv6 is empty23:42
pabelangerwhich means, no ipv623:42
clarkbI think the two checks ianw has above should cover all cases23:42
pabelangeror, nodepool.public_ipv623:42
pabelangerwhich is str23:42
*** thorst has quit IRC23:42
clarkbit's just a matter of knowing if ansible short-circuits or not23:42
clarkbI guess we might also want to check that the address is not empty23:42
mnaserand until a new release of ansible changes the behaviour too (ha, ha, ha :-P)23:42
ianwclarkb: such confusion!  i think i will propose an update to the ansible doc page when we figure this out23:43
clarkbmnaser: indeed23:43
*** aeng has joined #openstack-infra23:43
* mnaser goes back to getting pdfs to accountant who is unable to unzip an archived file23:43
ianwmnaser: just fax them23:44
mnaserso his specific request: please attach every single pdf into the email without a zip file, because that stuff is complicated ..23:44
clarkbfungi: is the undercloud/tmp delete done? I'm not seeing it in ps23:45
pabelangerclarkb: ianw: you can use with_dict: ansible_default_ipv623:45
pabelangerthen {{ item.address }}23:45
pabelangerand should do the right thing23:45
*** markvoelker has quit IRC23:46
*** gouthamr has joined #openstack-infra23:46
ianwand check if that's defined?23:46
pabelangerya, or | default({})23:46
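Combining the suggestions above, one condition that treats a missing ansible_default_ipv6 and an empty {} the same way (untested, per dmsimard's caveat; the task body is a placeholder):

    - name: Run IPv6 setup only when a default v6 address exists
      debug:
        msg: "default v6 address is {{ ansible_default_ipv6.address }}"
      when: (ansible_default_ipv6 | default({})).address is defined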
openstackgerritMerged openstack-infra/system-config master: Remove npm / rubygem crontab entries  https://review.openstack.org/47391123:46
*** bobh has joined #openstack-infra23:47
EmilienMfungi: ack, thx for the update23:48
*** tosky has quit IRC23:49
fungiclarkb: still running. it's in a window of the root screen session there23:49
fungii've cycled to that window now, you should see it if you attach23:50
clarkbfungi: thanks23:50
clarkbfungi: that is the log archive maintenance script?23:51
funginope23:51
*** wolverineav has quit IRC23:51
fungiwhen you pointed out there was a missing subdirectory level, i went back to the original set of three patterns to delete and added the additional */23:51
ianwwhy did i start looking : https://github.com/ansible/ansible/issues/2367523:51
clarkbapparently screen -x maintains different windows on different attaches23:51
fungioh neat23:51
fungiwell, anyway, it's one of the three windows under that root screen session23:52
fungithe other two are the v3 log deletion and the 2-week expiration23:53
clarkbyup I see it now23:53
jeblairianw: reprepro finished23:54
jeblairthe file i deleted exists now and is the correct size23:55
ianwyay team!23:55
jeblair818 files lost their last reference.23:55
jeblair(dumpunreferenced lists such files, use deleteunreferenced to delete them.)23:55
pabelangergreat work23:55
jeblairokay, what do we want to do next?23:55
jeblairclarkb: i think you suggested that we switch one or more mirrors to serving from the rw volume, yeah?23:56
jeblairthat's basically a single character apache config change23:56
clarkbjeblair: ya and then make sure that jobs are happy wit hthe mirror before we commit to it via vos release23:56
pabelangerwe should increase the timeout kill value too, I don't think 30m is enough time23:56
pabelangermaybe make it 9023:56
clarkbjeblair: ianw do we want to consider rerunning reprepro and seeing if it mostly noops?23:57
jeblairpabelanger: can you write that change?  let's get it merged before we turn cron back on23:57
ianwi would - run the full /usr/local/bin script23:57
pabelangersure23:57
jeblairianw: that will do the vos release which i don't want to do just yet23:57
clarkbfungi: looking at ps I'm worried that the rm has been spending all of its time globbing things?23:57
jeblairianw: but i can do the rest of the reprepro steps there23:57
ianwoh, yeah, with that commented out, and maybe the timeout commented out too23:57
jeblairianw: so maybe deleteunreferenced next?23:57
clarkbfungi: we might need to run that through find too instead?23:58
clarkbfungi: strace seems to agree that it's just sitting there for the most part23:58
ianwjeblair: i think so, since it's better to know what happens now than when cron hits it23:58
