Friday, 2018-11-30

00:09 *** flaper87 has quit IRC
00:14 *** flaper87 has joined #openstack-tc
00:27 *** tosky has quit IRC
00:49 <clarkb> Ok I've got to run now. Please do ping if others are interested in discussing more. I think it is an important thing we want to sort out
01:59 *** dklyle has joined #openstack-tc
02:05 *** dklyle has quit IRC
02:19 *** mriedem_afk has quit IRC
03:02 *** ricolin has joined #openstack-tc
03:11 *** dklyle has joined #openstack-tc
03:12 *** whoami-rajat has joined #openstack-tc
03:17 *** dklyle has quit IRC
06:12 *** diablo_rojo has quit IRC
06:32 *** e0ne has joined #openstack-tc
06:32 *** flaper87 has quit IRC
07:31 *** e0ne has quit IRC
07:38 *** Luzi has joined #openstack-tc
08:42 *** tosky has joined #openstack-tc
08:53 *** jpich has joined #openstack-tc
09:27 *** e0ne has joined #openstack-tc
10:44 *** ricolin has quit IRC
10:59 *** cdent has joined #openstack-tc
11:00 *** dtantsur|mtg is now known as dtantsur|afk
11:20 *** cdent has quit IRC
11:46 *** cdent has joined #openstack-tc
12:00 *** Luzi has quit IRC
12:44 *** jaypipes is now known as leakypipes
12:54 *** e0ne has quit IRC
13:37 *** cdent has quit IRC
13:37 *** e0ne has joined #openstack-tc
13:49 *** whoami-rajat has quit IRC
13:58 *** EmilienM is now known as EvilienM
14:11 <openstackgerrit> Sean McGinnis proposed openstack/governance master: Add stable:follows-policy for cinder deliverables  https://review.openstack.org/621164
14:19 *** mriedem has joined #openstack-tc
14:24 *** jamesmcarthur has joined #openstack-tc
14:32 *** cdent has joined #openstack-tc
14:36 *** lbragstad is now known as elbragstad
14:38 *** whoami-rajat has joined #openstack-tc
15:01 <dhellmann> clarkb : good topic, and thanks for not waiting for office hours to raise it
15:03 <dhellmann> I'd like to include "gate stability" or "quality" somehow as a goal, but I'm struggling to come up with a way to quantify it in a per-team way so we can measure progress
15:04 <dhellmann> I'm not sure asking a specific group of people to dedicate their time to debugging the issues is the right approach. Where would we find those people?
15:04 <dhellmann> I do like the approach of incentivizing everyone to make their tests reliable by "rewarding" stable jobs with priority
15:05 <dhellmann> the implementation details there may be tricky
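
[editor's note: dhellmann's "quantify it in a per-team way" could be made concrete with the same Zuul builds API that backs zuul.openstack.org. The sketch below is a minimal, hypothetical illustration rather than an endorsed metric: it assumes the /api/builds endpoint with project/pipeline/limit parameters and a result field, which may differ across Zuul versions, and the project names are only examples.]

    import requests

    ZUUL_API = "https://zuul.openstack.org/api/builds"  # whitelabeled tenant API

    def gate_success_rate(project, pipeline="gate", limit=200):
        """Fraction of a project's recent completed gate builds that succeeded."""
        resp = requests.get(
            ZUUL_API,
            params={"project": project, "pipeline": pipeline, "limit": limit},
            timeout=30,
        )
        resp.raise_for_status()
        # Builds still in progress have no result yet; skip them.
        done = [b for b in resp.json() if b.get("result")]
        if not done:
            return None
        return sum(b["result"] == "SUCCESS" for b in done) / len(done)

    for project in ("openstack/nova", "openstack/cinder"):  # example projects
        rate = gate_success_rate(project)
        print(project, "no data" if rate is None else "{:.0%}".format(rate))
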
15:06 <cdent> that this [t F3u] was true in the past is part of why we have trouble now: we hope/think it is going to be other people that fix it. having it rotate and/or be "part time" is a nice idea, but the amount of experience and expertise needed to do so is large, sadly
15:06 <purplerbot> <clarkb> one (admittedly less practical idea) I had was to encourage a sort of "sdague/jogo/mtreinish" rotation. Basically have a group of people that can take on the tasks they did in the past, but be explicit that it shouldn't be a full time thing to help avoid burn out but also ensure more than one person knows what to do [2018-11-29 23:03:51.525961] [n F3u]
15:07 <cdent> I recently put the word out internally that this specifically is a critical area, and there were some warm rumblings in response, but I don't know if it will turn into anything real
15:09 <dhellmann> we need to design the system so we don't need heroes to keep it running
15:09 <smcginnis> ++
15:13 <cdent> yes
15:14 <cdent> heroes are rare (and bad for health). When there are many people, they are easier to find.
15:33 *** jamesmcarthur has quit IRC
15:35 <fungi> time to apply all that behavioral psychology i learned at university, i guess
15:36 <fungi> we can give users a lever that dispenses food pellets. also electrifying the cage floor is probably a viable tactic
15:37 * cdent re-reads walden two
15:38 <dhellmann> heh
15:43 <ttx> OH: "Heroes are bad for health"
15:43 *** dansmith is now known as SteelyDan
15:45 <dims> ttx : LOL
15:46 *** jamesmcarthur has joined #openstack-tc
15:49 *** jamesmcarthur has quit IRC
15:49 *** jamesmcarthur has joined #openstack-tc
15:50 *** mriedem has quit IRC
15:54 <mnaser> for example, it took me probably 20 minutes today to find out we were uselessly creating swap in OSA jobs because we didn't use any of it, and another 15-20 minutes to push a fix to disable that behaviour
15:54 <mnaser> clarkb / infra-core: do we have stats on the number of always-failing non-voting jobs?
15:54 <mnaser> i feel like those contribute a lot.
15:54 <cdent> yeah, good point
15:54 <fungi> what was the savings in job runtime from not creating a swapfile?
15:56 <mnaser> fungi: on ovh, 15-18 minutes
15:56 <fungi> wow
15:56 <mnaser> i don't know if this was a one-off
15:56 <fungi> i guess it wasn't being created sparse
15:56 <cdent> that much? that's rather surprising
15:57 <mnaser> we can't do sparse on certain os like centos 7
15:57 <cdent> it makes it seem like $stuff is _way_ oversubscribed
15:57 <fungi> sparse swapfiles should be nearly instantaneous to create, but you risk crashing the node hard if you use all the disk
15:57 <mnaser> http://logs.openstack.org/36/619636/1/gate/openstack-ansible-deploy-aio_metal-ubuntu-bionic/72c540f/logs/ara-report/result/f6ed9f8a-419a-41b8-8d81-19d6e5aac6cc/
15:58 <mnaser> sparse swapfiles don't work on xfs (which is centos-7)
15:58 <fungi> on the other hand, if the job does use more memory than anticipated, without swap you'll be unable to debug it when the oom killer sacrifices something which makes the node no longer accessible
15:58 <mnaser> fungi: i went over some of our numbers for successful jobs and we're far away from swapping
15:58 <fungi> so there are always trade-offs
15:58 <mnaser> like, some 4gb away from swapping..
15:59 <mnaser> the other thing i'm struggling with right now with my ptl hat on is
15:59 <fungi> but yeah, if you don't use most of the ram and we're using platforms which don't support sparse swap (or you need the additional filesystem space) then dropping it is certainly a good call
15:59 <mnaser> contributions to implement things that need CI resources which are then not maintained by those who push them
16:00 <fungi> sure. in general "contributions to implement things [...] which are then not maintained by those who push them" has been a perpetual problem in openstack
16:01 <mnaser> the thing that bothers me is things like this
16:01 <mnaser> http://zuul.openstack.org/builds?job_name=openstack-ansible-deploy-aio_distro_lxc-opensuse-150&job_name=openstack-ansible-deploy-aio_distro_lxc-opensuse-423&branch=master&branch=openstack-ansible-deploy-distro_ceph-opensuse-423&branch=openstack-ansible-deploy-distro_ceph-opensuse-150
16:01 <mnaser> that's a lot of wasted CI resources
16:02 <mnaser> and i'm really just wondering if we should come up with some policy that says if a job is non-voting and failing for some period of time N, it will be removed.
16:02 * cdent is still stuck on it taking so long to do a dd?
16:02 <fungi> cdent: slow disk
16:02 <cmurphy> mnaser: there's nothing stopping you from creating that policy for your project
16:02 <fungi> mnaser: we've certainly done that from time to time, but yes, maybe a policy within openstack would be good there
16:03 <cdent> fungi: isn't that something that ought to be investigated too?
16:03 <cdent> I have felt (since my dawn of openstack) that we are constantly in a state of oversubscription and it is _that_ which causes us so much pain
16:03 <fungi> cdent: yes, it's something we can bring to the attention of the service provider, but i think they have us on cheaper storage there by choice
16:04 <mnaser> cdent: hardware is expensive
16:05 <mnaser> no one is writing a check for that hardware :)
16:05 <mnaser> so there isn't exactly an SLA for donated infrastructure
16:06 <mnaser> cmurphy / fungi: that's true, but i would be more comfortable if it was an openstack-y thing rather than grumpy-mo-keeps-seeing-failing-jobs-and-has-no-time-to-fix-them-so-he-removed-them
16:07 <cmurphy> it's not i'm-grumpy-so-i-removed-them, it's "Our team's policy is to only keep running jobs that are consistently useful"
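
[editor's note: mnaser's proposed policy ("non-voting and failing for some period of time N → removed") is mechanically checkable against the same Zuul API that serves the builds page he linked. The following is a rough, hypothetical sketch, not an existing tool: it assumes the /api/builds endpoint reports job_name, branch, result, and a voting flag, which may differ by Zuul version; the lookback window and job names are only examples.]

    import requests

    ZUUL_API = "https://zuul.openstack.org/api/builds"

    def always_failing_nonvoting(job_name, branch="master", limit=50):
        """True if every recent completed run of this non-voting job failed."""
        resp = requests.get(
            ZUUL_API,
            params={"job_name": job_name, "branch": branch, "limit": limit},
            timeout=30,
        )
        resp.raise_for_status()
        # Keep only completed, non-voting runs; be defensive if the
        # voting field is absent in this Zuul version.
        runs = [b for b in resp.json()
                if b.get("result") and not b.get("voting", True)]
        return bool(runs) and all(b["result"] != "SUCCESS" for b in runs)

    # The two jobs mnaser linked above, as example input.
    for job in ("openstack-ansible-deploy-aio_distro_lxc-opensuse-150",
                "openstack-ansible-deploy-aio_distro_lxc-opensuse-423"):
        if always_failing_nonvoting(job):
            print(job, "- non-voting and 100% failing; removal candidate")
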
16:14 <openstackgerrit> Lance Bragstad proposed openstack/governance master: Update charter to include PTL appointment  https://review.openstack.org/620928
16:21 *** mriedem has joined #openstack-tc
16:24 <fungi> mnaser: out of curiosity, did you happen to notice whether the slow swap creation was happening only in one of the two ovh regions? i've been trying to narrow down why we have 20x as many job timeouts in one as in the other, even when we ran for nearly a week with the same max-servers in both
16:25 <mnaser> fungi: i have not dug that far into it, to be honest
16:25 <fungi> if filesystem access is waaay slower in one of them than the other, that could certainly explain it
16:25 <fungi> no worries, that gives me something to test next
16:25 <mnaser> yep, that could be a helpful next step
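
[editor's note: fungi's "something to test next" amounts to a disk-throughput comparison between the two ovh regions. A minimal probe, run on a node in each region, might look like the hypothetical sketch below: it approximates, with sequential zero-writes plus fsync, what a non-sparse swapfile creation (dd from /dev/zero) does, and also shows how to tell whether a file actually came out sparse, the distinction discussed above. The path and sizes are arbitrary examples.]

    import os
    import time

    def write_throughput(path, size_mb=512, chunk_mb=1):
        """Write size_mb of zeroes with fsync and return MB/s, roughly
        mimicking a non-sparse swapfile creation via dd."""
        chunk = b"\0" * (chunk_mb * 1024 * 1024)
        start = time.monotonic()
        with open(path, "wb") as f:
            for _ in range(size_mb // chunk_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.monotonic() - start
        os.unlink(path)
        return size_mb / elapsed

    def is_sparse(path):
        """A sparse file allocates fewer blocks than its apparent size."""
        st = os.stat(path)
        return st.st_blocks * 512 < st.st_size

    print("~{:.0f} MB/s sequential write".format(write_throughput("/tmp/probe.bin")))

    # Demonstrate sparseness detection: truncate leaves a hole, no blocks.
    with open("/tmp/sparse.bin", "wb") as f:
        f.truncate(1024 * 1024)
    print("sparse?", is_sparse("/tmp/sparse.bin"))
    os.unlink("/tmp/sparse.bin")
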
16:38 *** jamesmcarthur has quit IRC
16:42 *** jamesmcarthur has joined #openstack-tc
16:43 <clarkb> mnaser: fwiw I think that quality and efficiency aren't exactly the same thing here. Yes we are inefficient, but separately we also seem to have regressions in quality which impact efficiency. Not running jobs that always fail will address efficiency positively and potentially quality negatively (because those jobs should pass if they test something useful)
16:48 *** whoami-rajat has quit IRC
16:55 *** whoami-rajat has joined #openstack-tc
17:00 *** jpich has quit IRC
17:11 *** mriedem is now known as mriedem_lunch
17:11 <fungi> speaking of centos, looks like rhel 8 *will* still include python 2.7? https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8-beta/html/8.0_beta_release_notes/new-features#web_servers_databases_dynamic_languages_2
17:12 <fungi> "Python 2.7 is available in the python2 package. However, Python 2 will have a shorter life cycle and its aim is to facilitate smoother transition to Python 3 for customers."
17:35 *** weshay is now known as he_hates_me
17:36 *** he_hates_me is now known as weshay
17:45 *** e0ne has quit IRC
17:51 *** openstackgerrit has quit IRC
18:15 *** jamesmcarthur has quit IRC
18:16 <clarkb> fwiw I don't think a group of individuals should be the only people that care/act on quality concerns. But I also don't really see any change in behavior without something setting an example for others
18:19 <scas> a rant on software quality from yesteryear is still relevant today https://queue.acm.org/detail.cfm?id=2349257
18:28 <clarkb> another approach may be to set an expectation that teams have an "at least triage, but fixing is even better" day or week each milestone
18:28 <clarkb> and don't prescribe activity directly. But instead use that as a reminder that we care about this stuff.
18:29 *** diablo_rojo has joined #openstack-tc
18:31 <elbragstad> clarkb ++
18:31 *** jamesmcarthur has joined #openstack-tc
18:32 <clarkb> I think in theory we've used the feature freeze and RC period for this sort of work, but it is hard to tell how effective that is, as all the release projects get very quiet and the deployment projects get very busy
18:33 <clarkb> (so as an outsider I don't have enough insight to know if those set-aside periods are useful for this task)
18:36 *** logan- has joined #openstack-tc
18:44 <scas> a bugbusting event does work in other long-lived open source projects, but it's the coordination that's always the unknown unknown
19:10 *** openstackgerrit has joined #openstack-tc
19:10 <openstackgerrit> Doug Hellmann proposed openstack/governance master: clean up readme  https://review.openstack.org/621270
19:13 *** mriedem_lunch is now known as mriedem
19:20 *** jamesmcarthur has quit IRC
19:21 *** jamesmcarthur has joined #openstack-tc
19:28 *** jamesmcarthur has quit IRC
19:28 *** jamesmcarthur has joined #openstack-tc
19:29 *** jamesmcarthur_ has joined #openstack-tc
19:33 *** jamesmcarthur has quit IRC
19:42 <openstackgerrit> Doug Hellmann proposed openstack/governance master: add board working group data handling  https://review.openstack.org/621277
20:08 *** whoami-rajat has quit IRC
20:14 *** jamesmcarthur_ has quit IRC
20:15 *** jamesmcarthur has joined #openstack-tc
20:20 *** jamesmcarthur has quit IRC
20:38 <openstackgerrit> Jeremy Stanley proposed openstack/project-team-guide master: Document use of the openstack-discuss mailing list  https://review.openstack.org/621284
20:47 <fungi> trying to untangle the mentions of mailing lists in the governance-sigs repo, and having a hard time separating ideas people had about how sigs were going to work from how things actually shook out. for example, the bi-weekly newsletter/summary etherpad seems to have never actually been touched and i don't remember a single one ever going to any mailing list
20:48 <fungi> mrhillsman: ttx: i think https://git.openstack.org/cgit/openstack/governance-sigs/tree/doc/source/index.rst#n55 might be due for removal from that document?
20:48 <fungi> dhellmann: ^
20:48 <fungi> happy to just rip that out while i'm making other edits
20:50 *** openstackgerrit has quit IRC
20:50 <mrhillsman> ++
20:52 <fungi> seems it got overly-specific about process which wasn't actually in use yet
22:54 *** mriedem has quit IRC
23:07 *** cdent has quit IRC
23:42 *** tosky has quit IRC
