Tuesday, 2020-11-03

*** lseki has joined #openstack-infra00:00
*** tosky has quit IRC00:26
*** yamamoto has joined #openstack-infra00:32
*** yamamoto has quit IRC00:39
*** yamamoto has joined #openstack-infra00:48
*** yamamoto has quit IRC00:48
*** yamamoto has joined #openstack-infra00:49
*** mmethot_ has quit IRC00:51
*** yamamoto has quit IRC00:54
*** ramishra has joined #openstack-infra00:55
<openstackgerrit> wu.shiming proposed openstack/openstack-zuul-jobs master: Fix hacking min version to 3.2.0  https://review.opendev.org/760746  00:59
*** zzzeek has quit IRC01:13
*** zzzeek has joined #openstack-infra01:16
*** lbragstad has joined #openstack-infra01:37
*** yamamoto has joined #openstack-infra01:37
*** yamamoto has quit IRC01:39
*** yamamoto has joined #openstack-infra01:40
*** hamalq has quit IRC01:41
*** yamamoto has quit IRC01:41
*** yamamoto has joined #openstack-infra01:42
*** yamamoto has quit IRC01:50
*** lbragstad has quit IRC02:13
*** lbragstad has joined #openstack-infra02:40
*** kaisers has quit IRC02:48
*** rcernin has quit IRC02:50
*** ianychoi has quit IRC03:00
*** rcernin has joined #openstack-infra03:00
*** rcernin has quit IRC03:07
*** rcernin has joined #openstack-infra03:20
*** ianychoi has joined #openstack-infra03:21
*** rcernin has quit IRC03:25
*** lbragstad has quit IRC03:27
*** rcernin has joined #openstack-infra03:31
*** lbragstad has joined #openstack-infra03:33
*** ociuhandu has joined #openstack-infra03:40
*** rcernin has quit IRC03:40
*** rcernin has joined #openstack-infra03:40
*** ociuhandu has quit IRC03:49
*** auristor has quit IRC03:49
*** auristor has joined #openstack-infra03:53
*** ykarel|away has joined #openstack-infra04:43
*** lbragstad has quit IRC04:44
*** mtreinish has quit IRC04:44
*** ykarel|away is now known as ykarel04:47
*** evrardjp has quit IRC05:33
*** evrardjp has joined #openstack-infra05:33
*** matt_kosut has joined #openstack-infra05:53
*** ysandeep|away is now known as ysandeep05:53
*** ysandeep is now known as ysandeep|ruck05:53
*** lmiccini has joined #openstack-infra06:35
*** rcernin has quit IRC06:45
*** gnuoy has joined #openstack-infra06:48
*** ramishra has quit IRC07:14
*** vishalmanchanda has joined #openstack-infra07:22
*** dklyle has quit IRC07:28
*** ramishra has joined #openstack-infra07:37
*** ralonsoh has joined #openstack-infra07:41
*** eolivare has joined #openstack-infra07:42
*** zzzeek has quit IRC07:55
*** zzzeek has joined #openstack-infra07:56
*** slaweq has joined #openstack-infra08:01
*** hashar has joined #openstack-infra08:03
*** jcapitao has joined #openstack-infra08:05
*** sboyron has joined #openstack-infra08:11
*** andrewbonney has joined #openstack-infra08:14
*** rpittau|afk is now known as rpittau08:19
*** gfidente has joined #openstack-infra08:35
*** lpetrut has joined #openstack-infra08:36
*** tosky has joined #openstack-infra08:40
*** ykarel has quit IRC08:42
*** matbu has quit IRC08:51
*** matbu has joined #openstack-infra08:52
*** jpena|off is now known as jpena08:56
*** ysandeep|ruck is now known as ysandeep|lunch08:58
*** lyarwood has quit IRC09:01
*** lucasagomes has joined #openstack-infra09:04
*** sboyron has quit IRC09:12
*** sboyron has joined #openstack-infra09:12
*** ociuhandu has joined #openstack-infra09:19
*** dtantsur|afk is now known as dtantsur09:35
*** derekh has joined #openstack-infra09:37
*** nightmare_unreal has joined #openstack-infra09:41
<Tengu> hello there  09:52
<Tengu> fyi there are apparently CI jobs failing with oom-killer on a regular basis. Not sure if it has to be reported here though... ?  09:53
*** sshnaidm|afk is now known as sshnaidm|rover09:56
*** ysandeep|lunch is now known as ysandeep|ruck10:02
*** ykarel has joined #openstack-infra10:06
<frickler> infra-root: ^^ see https://review.opendev.org/757488 which seems to fix this, do we want to globally revert the swap size to 8G?  10:33
*** d34dh0r53 has quit IRC10:50
*** dchen has quit IRC10:51
*** soniya29 is now known as soniya29|afk11:11
*** dtantsur is now known as dtantsur|bbl11:16
*** jcapitao is now known as jcapitao_lunch11:36
<openstackgerrit> Jens Harbott (frickler) proposed openstack/openstack-zuul-jobs master: Set default swap size back to 8G  https://review.opendev.org/761119  11:43
<frickler> Tengu: ^^ FYI  11:44
*** mtreinish has joined #openstack-infra11:47
*** rfolco has joined #openstack-infra12:00
<Tengu> frickler: ah, thanks. So I should update my depends-on then  12:09
<Tengu> frickler: and moving to 8GB makes the two jobs pass in my review!  12:09
*** janders has joined #openstack-infra12:16
<janders> I'm having permission issues with my gerrit account, I suspect it has to do with the cleanup after the security issue a couple weeks back. Do you happen to know how to get some assistance with this? Thanks in advance.  12:17
*** eolivare has quit IRC12:19
<frickler> Tengu: I don't think a dependency will work on this patch since it is a config project  12:23
*** rlandy has joined #openstack-infra12:26
*** jcapitao_lunch is now known as jcapitao12:27
*** jpena is now known as jpena|lunch12:32
<Tengu> janders: did you update to fedora-33? if so, it might also be due to some new policies being enforced in the ssh client  12:33
<janders> Tengu: yes!  12:33
<janders> Tengu: setenforce 0 and retest?  12:33
<Tengu> janders: nope, not related to selinux  12:33
<Tengu> wait a second  12:33
<janders> Tengu: ACK  12:34
<Tengu> janders: I didn't upgrade to fc-33, but there's a thing that changed in /etc/ssh/ssh_config.d/50-redhat.conf  12:34
<Tengu> janders: as a quick workaround, you can comment out the "Include /etc/ssh/ssh_config.d/*.conf" in the /etc/ssh/ssh_config file  12:34
<Tengu> no restart is needed, since it's only for the ssh client, which re-reads this file upon each invocation.  12:35
<Tengu> not sure about the proper correction - maybe it's related to the ssh key.... as said, I didn't upgrade yet.  12:35
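For reference, a minimal sketch of the client-side workaround Tengu describes, assuming the stock Fedora openssh layout (the drop-in directory is where 50-redhat.conf and the new crypto policy hooks come from):

    # /etc/ssh/ssh_config  (client config; no daemon restart needed)
    # temporarily skip the distribution drop-ins:
    #Include /etc/ssh/ssh_config.d/*.conf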
<janders> Tengu: trying this now  12:36
<Tengu> janders: is it better?  12:37
<janders> Tengu: no change  12:37
<Tengu> oh? err... weird.  12:38
<Tengu> janders: https://www.reddit.com/r/Fedora/comments/jhxbdh/no_ssh_public_key_auth_after_upgrade_to_fedora_33/  12:38
<Tengu> just found that one.  12:38
<Tengu> might be better.  12:38
<ykarel> Tengu, have you seen https://fedoraproject.org/wiki/Changes/StrongCryptoSettings2 ?  12:42
<ykarel> if things don't work, ^ suggests update-crypto-policies --set DEFAULT:FEDORA32  12:42
<ykarel> or update-crypto-policies --set LEGACY  12:42
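For convenience, the commands ykarel mentions, plus a check of the active policy (standard crypto-policies usage on Fedora; run as root):

    update-crypto-policies --show                    # show the currently active system-wide policy
    update-crypto-policies --set DEFAULT:FEDORA32    # relax back to the Fedora 32 defaults
    update-crypto-policies --set LEGACY              # or fall back further to LEGACY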
<Tengu> ykarel: one of the proposals in the reddit link points to that update-crypto-policies  12:43
<Tengu> guess I should be good with my key(s) here.  12:44
<Tengu> but it will become funky once I update my private infra. Time to re-key a bit.  12:44
<ykarel> Tengu, ack, I haven't hit the issue myself, but someone shared the above link for this problem so I'm passing it along here  12:45
<Tengu> ykarel: thanks for sharing :). as said - I'm still on fc-32  12:46
*** lbragstad has joined #openstack-infra12:53
*** dtantsur|bbl is now known as dtantsur12:54
<janders> thank you for your assistance frickler, Tengu, ykarel - I'm back in business  12:55
<janders> it was some weirdness with ssh ports  12:55
<frickler> janders: great to hear that, let us know if you see any further issues  12:58
<janders> frickler: thank you, will do!  12:58
*** eolivare has joined #openstack-infra12:59
*** __ministry1 has joined #openstack-infra13:01
*** lpetrut has quit IRC13:02
*** lbragstad has quit IRC13:06
*** lbragstad has joined #openstack-infra13:09
*** __ministry1 has quit IRC13:21
*** __ministry1 has joined #openstack-infra13:21
*** derekh has quit IRC13:27
*** mmethot has joined #openstack-infra13:31
*** jpena|lunch is now known as jpena13:32
*** raukadah is now known as chandankumar13:32
*** __ministry1 has quit IRC13:38
*** __ministry1 has joined #openstack-infra13:38
*** slaweq has quit IRC13:42
*** __ministry1 has quit IRC13:46
*** __ministry1 has joined #openstack-infra13:46
*** slaweq has joined #openstack-infra13:47
<Tengu> frickler: fyi the depends-on is working :).  13:49
<Tengu> let's get your patch merged now.  13:49
*** __ministry1 has quit IRC13:49
*** __ministry1 has joined #openstack-infra13:51
*** __ministry1 has quit IRC13:52
*** __ministry1 has joined #openstack-infra13:59
*** __ministry1 has quit IRC14:01
*** __ministry1 has joined #openstack-infra14:02
*** derekh has joined #openstack-infra14:05
*** __ministry1 has quit IRC14:06
*** d34dh0r53 has joined #openstack-infra14:10
*** jamesdenton has joined #openstack-infra14:17
*** Goneri has joined #openstack-infra14:36
*** ykarel is now known as ykarel|away14:44
*** lmiccini has quit IRC14:45
*** ysandeep|ruck is now known as ysandeep|away14:55
*** lmiccini has joined #openstack-infra14:55
*** wolsen has quit IRC15:00
*** mordred has quit IRC15:00
*** kwazar has joined #openstack-infra15:05
*** ykarel|away has quit IRC15:14
<fungi> frickler: Tengu: increasing the swap space (unless we go back to sparse fallocate, which I thought breaks on newer kernels?) is going to break other jobs which rely on that available disk. we've already had folks run into the 1GB decrease from the current swap dd causing job failures in providers with limited disk  15:18
<fungi> you know you can adjust how much swap devstack creates in your job configuration, right?  15:18
<fungi> it seems to me that any build trying to use >1GB swap is in for a rough time for other reasons, and probably means severe slowdowns occupying nodes out of our quota for far longer than necessary  15:20
<clarkb> also 8GB of swap will take much longer to allocate. I've pointed out periodically for years now that openstack software and jobs could really use a diet. Many of them over-deploy services and memory use has skyrocketed since the beginning of the project  15:20
<Tengu> fungi: 1GB of swap is kind of similar to nothing.... ? Frankly, I don't care - a backport is blocked due to resource limitations, and it's a bit annoying. Going to 8GB makes it pass, and that swap thing is NOT related to the change itself.  15:21
<Tengu> maybe we can cut that thing in two and push 4GB instead?  15:22
<Tengu> a bit less than 8, but a bit more than 1, and everyone will be happy?  15:22
<clarkb> you can set it on your job is what fungi is saying  15:22
*** hashar is now known as hasharOut15:22
<clarkb> rather than trying to make it one size fits all  15:22
<clarkb> but also 1GB should be plenty  15:22
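As a rough illustration of what "set it on your job" can look like: something along these lines in a project's .zuul.yaml, assuming the job inherits from the devstack base job and that the swap knob is still the configure_swap_size variable (in MiB) consumed by the configure-swap role - worth double-checking the current role defaults before copying:

    - job:
        name: my-devstack-variant      # hypothetical job name
        parent: devstack
        vars:
          # ask for 4 GiB of swap instead of the new 1 GiB default
          configure_swap_size: 4096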
<Tengu> and do some more hit'n'fail the whole way down.. how to lose time -.-  15:23
<clarkb> the swap on the test nodes is there to keep a job on the edge from failing. It is not there to double available "memory"  15:23
* Tengu remembers the old days where swap was usually twice the ram  15:23
<fungi> yeah, but that doesn't tend to work so well on virtual machines  15:24
<fungi> the idea is to add enough swap so that inactive or infrequently accessed pages can be stashed on disk, but 1GB is usually far more than enough for that, especially in an hour-long job  15:25
<fungi> it's so that you can free up more ram for things like filesystem cache to speed up execution time  15:25
<Tengu> doesn't seem to be enough for the 2 devstack jobs on this change: https://review.opendev.org/760911  though it's not really part of the change itself.  15:25
<frickler> one day I would also like to learn from our resource providers whether our quotas are really limited by ram or more by cpu. in my own clouds, a flavor with 8 vcpus would naturally have 16-24 gig of ram  15:26
<fungi> it's not that there's not enough swap, it's that there's not enough ram. if the kernel is trying to stick things in swap because there's truly no more room in ram, then you're sunk  15:26
<Tengu> usually the limit is more CPU than RAM in cloud providers....  15:26
<Tengu> remembering that time I worked with AWS, the main issue was getting enough cores, not enough ram.  15:26
<Tengu> ram is cheap, compared to cpu.  15:27
<clarkb> Tengu: yes, and for years I have warned openstack of this problem  15:27
<clarkb> no one seems to care  15:27
<Tengu> and with motherboards supporting 4TB of ram... :)  15:27
<Tengu> clarkb: well.... I know tripleo is something terrible for the ram (and the whole openstack world in the end)  15:27
<clarkb> some services are much larger than others and the privsep daemons are not small  15:28
<fungi> part of why we limit the amount of ram to a consistent amount across our providers is that's what jobs tend to try to use the most of, and this is a means of assuring that projects design their tests and test frameworks to not need more than that (so that people can also run them in their personal developer environments or in local virtual machines)  15:28
<fungi> when jobs start running out of memory, "just throw some more ram at the problem" isn't indefinitely scalable  15:29
<fungi> throw some more effort at not wasting as much ram  15:29
<fungi> the root of the issue is that, for the most part, developers are happy not to really have to care how much ram a piece of software uses until they run out, and then it's a scramble to try and figure out how to put the entire project on a diet  15:31
<Tengu> so in this case, it's not on me - just trying to push a change that missed a backport. basically, I just don't care, downstream has been done - I'm not the right person to talk to about resources for tests -.-  15:32
<fungi> also stated as "enjoy your tragedy of the commons"  15:32
<fungi> (it's not my fault the park is full of litter, I just wanted to take a walk in it)  15:32
<Tengu> and, well, going from 8GB to 1GB of swap IS an infra change, and it's no wonder it makes things fail - the project I'm hitting now didn't get any update for the past 5 months or so in that branch.... so it's a bit "meh" in the end.  15:33
<clarkb> do the jobs run the cinder backup service?  15:33
<fungi> if more people brought a trash back with them when they went for a walk, this would be far less of an issue  15:33
<fungi> er, a trash bag  15:34
<clarkb> iirc the one thing that was done before was disabling that, since no tempest tests exercise it  15:34
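If a job wants to try that kind of diet, the usual devstack-job mechanism is the devstack_services dictionary in the job vars; a small sketch, assuming the conventional devstack service names (c-bak for cinder-backup):

    - job:
        name: my-devstack-variant      # hypothetical job name
        parent: devstack
        vars:
          devstack_services:
            # cinder-backup: switched off where no tempest tests exercise it
            c-bak: false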
<Tengu> not that I know... it's apparently testing the openstacksdk itself.  15:34
<fungi> Tengu: yes, it's an infra change to allow you to use newer linux kernels. we could instead just decide not to let you test on newer kernels  15:34
<Tengu> no idea what it does, but it's apparently running multiple things in the same fashion tempest does  15:34
<clarkb> Tengu: that infra change is a result of ext4 breaking sparse allocation for swap files  15:34
<clarkb> we don't have control over that  15:34
*** dklyle has joined #openstack-infra15:34
<Tengu> would putting 4G as a global value be acceptable?  15:35
*** wolsen has joined #openstack-infra15:35
<fungi> well, for starters, this discussion is about devstack specifically, so speaking about defaults in devstack it's a question of how many projects will need to adjust the swap value up in their builds to get more virtual memory, vs how many will need to adjust it down to get more disk  15:36
<fungi> ideally a good default would minimize both of those  15:36
<Tengu> 4 gigs is probably low enough to have no impact on the actual disk space.  15:37
<Tengu> there were 8, before....  15:37
<frickler> fungi: do you know which project would be low on disk? the change also affected at least tripleo  15:37
<clarkb> it was sparse-allocated before  15:37
<Tengu> unless the disk size was also reduced?  15:37
<clarkb> so not actually 8GB in most cases  15:37
<Tengu> humpf..  15:37
<clarkb> it would be 0 unless used  15:37
<Tengu> thanks ext4 -.-  15:37
<fungi> yeah, it was using 0GB of disk before unless you started paging things out to swap  15:37
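For context, the ext4 change being referenced comes down to how the swap file is created on the test node; a simplified sketch of the before/after (not the exact role code):

    # before: allocate the swap file without writing data, so it cost ~0 bytes of
    # real disk until pages were actually swapped out
    fallocate -l 8G /swapfile
    # newer kernels refuse swap files whose blocks were never actually written
    # (holes/unwritten extents), so the role now writes them out with dd instead:
    dd if=/dev/zero of=/swapfile bs=1M count=1024   # 1 GiB now, vs 8 GiB before
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile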
<fungi> however, there's the added problem that jobs using more swap are likely doing so not to improve performance but because they've run out of ram, so they're going to start crawling when active memory is being read from/written to swap, which means the job will run way longer and consume more overall quota in our providers which could be used to run different (better performing) jobs instead  15:38
<frickler> that's all fine in theory, but nobody is maintaining devstack anymore, in particular stable branches  15:39
<fungi> and on top of that, swap thrash tortures the storage in those providers and in many cases drags down overall i/o for other server instances on the same compute nodes or the same storage networks/backends, leading to "noisy neighbor" issues which actually impact our other builds  15:39
<fungi> so choosing a smaller default swap value helps force jobs to abort early (oom) if they try to use too much memory  15:40
<fungi> rather than making it worse for everyone  15:40
<Tengu> is there a way to edit the .zuul.yaml thing in order to actually set the ram we need to make it pass?  15:40
<fungi> frickler: it's okay if people don't maintain software in stable branches any more. just eol those branches  15:41
<fungi> that's literally what eol means: "nobody's maintaining this"  15:41
<Tengu> stable/train isn't going EOL anytime soon.  15:41
<Tengu> devstack, on the other hand.....  15:41
<fungi> it totally can if openstack wants it to  15:41
*** mordred has joined #openstack-infra15:41
<Tengu> and, well, it's more masking the actual problem....  15:42
<Tengu> like passing those jobs as non-voting, in the end.  15:42
<fungi> it's a project decision. either find people who want to maintain those jobs and frameworks, or don't and acknowledge to your users that it's not happening  15:42
<Tengu> yay - all green.  15:42
<tosky> clarkb: I've just noticed the question - in cinder we re-enabled cinder-backup on a few jobs  15:43
<fungi> i'm not an active core reviewer on devstack so i'm not going to tell the devstack maintainers not to change their default swap size, but i think it's likely to lead to problems and it's not overall a responsible use of test resources. on the other hand, stable branches get far fewer changes, so maybe changing the default in stable/train and earlier is an acceptable compromise?  15:44
<fungi> or maybe just un-backporting the fallocate->dd change on branches which don't run on newer kernels  15:45
<clarkb> ironic in particular had disk space issues iirc  15:45
<clarkb> may want to double check with them at least  15:45
<sean-k-mooney> fungi: well really none of the ci jobs should be swapping  15:45
<sean-k-mooney> we all know the amount of ram available in the ci vms - it's 8GB  15:46
<sean-k-mooney> we should not be running tests in such a way that we swap in the first place  15:46
*** mordred has quit IRC15:47
<sean-k-mooney> if we need to drop the test concurrency to make that happen or use smaller guest vms, that is what we should do instead of swapping  15:47
*** wolsen has quit IRC15:47
<Tengu> (for the record, the undercloud is supposed to get 23GB or so.... just saying)  15:48
<sean-k-mooney> 23GB of what? cinder space or swap?  15:50
<Tengu> memory.  15:50
<sean-k-mooney> no it's not  15:50
<sean-k-mooney> not in the ci  15:50
<Tengu> according to the specs we provide customers.  15:50
*** bnemec has quit IRC15:50
<sean-k-mooney> not according to the spec we provide for test vms  15:50
<Tengu> 8 gigs without any swap for the undercloud is not working (tested here, even the undercloud deploy has issues). anyway....  15:51
<Tengu> we'll do the change in devstack itself, and be off with it.  15:51
<Tengu> next time I get this kind of issue, I'll probably just abandon the patch and go on to other tasks.  15:52
<sean-k-mooney> https://docs.opendev.org/opendev/infra-manual/latest/testing.html  15:52
<sean-k-mooney> "There is at least 8GB of system memory available."  15:52
<clarkb> Tengu: right, we're dealing with the resources available to us  15:52
<clarkb> unfortunately there are no good answers and everything is essentially a compromise  15:52
<clarkb> but we're doing our best, and when we noticed memory was a problem I sent out the alarm  15:52
<clarkb> but it's been many months and very little has happened as a result  15:53
<Tengu> couldn't we move to xfs instead? iirc it's the default for centos/rhel anyway.. ?  15:53
<sean-k-mooney> Tengu: this is really a ooo issue in that it has not kept its minimum requirements within the resources available to test it  15:53
<clarkb> Tengu: xfs doesn't do sparse allocations either  15:53
<clarkb> aiui it never has, whereas ext4 has regressed  15:53
<Tengu> sean-k-mooney: not even ooo in fact... devstack isn't ooo, is it?  15:53
<sean-k-mooney> no, but devstack runs fine within the requirements, depending on what services you enable  15:54
<Tengu> clarkb: or maybe allocate an actual volume to the VM? though it makes things a bit more complicated to handle.  15:54
<sean-k-mooney> you just have to configure the services to work within the 8GB ram and 80GB disk requirements  15:54
<clarkb> Tengu: we've tried that too and it doesn't work reliably because nova and cinder leak the volumes  15:54
<clarkb> Tengu: then you run out of volume quota and can't boot more instances  15:54
<Tengu> sean-k-mooney: yeah - well. in this case, again, I didn't do anything with the job description. Just applying a |#ßæð backport that applied without any issue..... so meh, again.  15:55
<clarkb> but you also can't delete the leaked volumes because cinder and nova disagree on them being used or something  15:55
<clarkb> (also not all clouds support that)  15:55
<Tengu> clarkb: erf.... "great"....  15:55
<clarkb> and then you can't delete the old images that the volumes use ...  15:56
<clarkb> it turns into a large chain reaction of pain  15:56
<Tengu> guess so. Happy to NOT use anything like that for my local lab then. just plain libvirt with some ansible...  15:57
<sean-k-mooney> clarkb: the image deletion depends on if you are doing a shallow copy or not, but ya  15:57
<Tengu> anyway. we'll push the change on devstack, since it's the one thing causing the issue.  15:57
<Tengu> frickler: guess you'll abandon your patch against openstack-zuul-jobs?  15:58
<clarkb> my suggestion for right now would be to bump up the value on a per-job basis  15:58
<clarkb> that way we don't break ironic with a global change  15:58
<Tengu> basically https://review.opendev.org/757488  15:59
<clarkb> and longer term, figuring out how to make the jobs leaner would be excellent. I think my original thread on this had a bunch of notes on how people can do that  15:59
*** mordred has joined #openstack-infra15:59
<Tengu> or deprecate devstack.  16:00
<Tengu> that's probably what will happen. eventually.  16:00
<sean-k-mooney> we can deprecate devstack after we deprecate ooo  16:00
<Tengu> first time I see devstack jobs, actually.  16:00
<sean-k-mooney> devstack is actually one of the better installers IMO  16:00
<clarkb> it works well for CI  16:01
<clarkb> definitely has its problems  16:01
<sean-k-mooney> it does, but it's easy to debug vs the complexity of others  16:01
* Tengu sticks to his tripleo-lab  16:01
<sean-k-mooney> i find myself gravitating to the simpler installers like devstack and kolla-ansible more  16:01
<sean-k-mooney> i had hopes for ooo standalone but since i can't get our deployment folks to support that in multinode i have basically given up on that too  16:02
<Tengu> standalone isn't for multinode, indeed..  16:02
<Tengu> anyway. we'll see if it merges - downstream has been done, I'm fine with it.  16:03
<Tengu> now is EOD for me. see you around  16:03
<zbr> fungi: clarkb: small doc fix on bindep regarding the undocumented compile tag: https://review.opendev.org/#/c/751253/  16:03
<zbr> i think some would find it useful  16:03
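For anyone who hasn't used it: bindep profiles go in square brackets after the package name, and "compile" is the conventional profile for build-time dependencies, selected with something like `bindep compile`; an illustrative bindep.txt fragment (package names are just examples):

    # installed for every profile
    git
    # only pulled in when the compile profile is requested
    gcc [compile]
    libffi-dev [compile platform:dpkg]
    libffi-devel [compile platform:rpm]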
*** gyee has joined #openstack-infra16:14
<clarkb> http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007493.html found the original thread from last year  16:14
<clarkb> Tengu: sean-k-mooney ^ fyi  16:14
*** dwilde has joined #openstack-infra16:22
*** d34dh0r53 has quit IRC16:24
*** dwilde has quit IRC16:24
*** d34dh0r53 has joined #openstack-infra16:24
*** gyee has quit IRC16:31
*** dave-mccowan has joined #openstack-infra16:36
*** wolsen has joined #openstack-infra16:36
*** gyee has joined #openstack-infra16:41
*** gyee has quit IRC16:45
*** rpittau is now known as rpittau|afk16:46
*** dtantsur is now known as dtantsur|afk16:47
*** bnemec has joined #openstack-infra17:00
*** hamalq has joined #openstack-infra17:01
*** lpetrut has joined #openstack-infra17:02
<ade_lee> clarkb, hey - so I tried an experiment yesterday where I commented out the fips bits and just had the node reboot.  https://zuul.opendev.org/t/openstack/build/16680efdff1a4b9d8153783ec002a779/log/job-output.txt#871-882  17:05
<ade_lee> clarkb, other than the weirdness that the reboot did not just do it immediately, what is notable is that it doesn't come back  17:05
<clarkb> looks like it did come back though?  17:06
<clarkb> you need to re-enable the logger daemon after the reboot, which was the original role I shared  17:06
<ade_lee> also - I noticed this from the previous run -- where fips is enabled .. https://zuul.opendev.org/t/openstack/build/94645ae4c6b741f0ac352028f29e45a6  17:06
<clarkb> but it appears the job continues post reboot  17:06
<clarkb> does fips change the hostkeys somehow?  17:07
<ade_lee> clarkb, well - when it continues without fips, we get this ..  17:07
<ade_lee> https://zuul.opendev.org/t/openstack/build/16680efdff1a4b9d8153783ec002a779  17:07
<ade_lee> see the perms denied ..  17:07
<clarkb> right, but the reboot succeeds?  17:08
<clarkb> basically reboot failing seems to be related to fips somehow?  17:08
<ade_lee> clarkb, several folks I've chatted to -- weshay|ruck for instance, indicated to me that nodepool reclaims the node when it's rebooted  17:08
<clarkb> no it does not  17:08
<clarkb> many many many years ago during the zuulv2 days this was true, but it hasn't been true since the zuulv3 upgrade  17:09
<clarkb> your example shows that this seems to work fine?  17:09
<ade_lee> ok cool - that's good to know -- can you repost the role that restarts the logger?  17:10
<clarkb> https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/start-zuul-console is the role to run post reboot to get the console log back  17:10
<clarkb> without that you see the "2020-11-02 21:12:38.076687 | [controller] Waiting on logger entries" from your example  17:10
<fungi> so just including "start-zuul-console" after the reboot ought to do it  17:11
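In playbook terms that amounts to roughly the following, as a sketch rather than the exact fips role (module options abbreviated):

    - hosts: all
      tasks:
        # reboot the node (e.g. after switching fips on) and wait for ssh to return
        - name: Reboot test node
          become: true
          reboot:
            reboot_timeout: 900
        # restart the zuul console streamer so task output is captured again
        - name: Restart zuul console streamer
          include_role:
            name: start-zuul-console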
<clarkb> and then it is later that the job fails  17:11
<ade_lee> I'll try it - as for fips mucking with the keys -- that would be unexpected  17:11
*** elod has quit IRC17:11
*** elod has joined #openstack-infra17:12
<ade_lee> but if it works when I restart the logger, then something fips-related is clearly a problem ..  17:12
<ade_lee> clarkb, fungi thanks -- off to experiment some more ..  17:12
*** dave-mccowan has quit IRC17:12
*** jcapitao has quit IRC17:15
*** Ajohn has joined #openstack-infra17:16
*** dave-mccowan has joined #openstack-infra17:16
*** gfidente is now known as gfidente|afk17:21
*** fresta has quit IRC17:22
*** sshnaidm|rover is now known as sshnaidm|afk17:22
*** lucasagomes has quit IRC17:24
*** dikonoor has joined #openstack-infra17:27
*** ociuhandu has quit IRC17:31
*** dikonoor has quit IRC17:36
<clarkb> just to confirm, the mkswap man page seems to indicate that xfs + fallocate wouldn't work either  17:45
*** tosky has quit IRC17:57
*** derekh has quit IRC18:03
*** dwilde has joined #openstack-infra18:05
*** d34dh0r53 has quit IRC18:05
*** lpetrut has quit IRC18:26
*** hamalq has quit IRC18:27
*** hamalq has joined #openstack-infra18:27
*** arxcruz has quit IRC18:28
*** gyee has joined #openstack-infra18:32
*** jpena is now known as jpena|off18:33
*** ralonsoh has quit IRC18:49
*** eolivare has quit IRC18:52
*** jamesdenton has quit IRC18:54
*** ianw_pto is now known as ianw18:59
*** vishalmanchanda has quit IRC19:02
*** arxcruz has joined #openstack-infra19:28
<ade_lee> clarkb, fungi I think we figured out the problem.  This is the error message we are getting .. http://paste.openstack.org/show/799667/  19:32
<fungi> ade_lee: indeed, looks like maybe fips mode could have cleared and regenerated the ssh host key?  19:33
<fungi> or otherwise changed how it's being served  19:34
<ade_lee> clarkb, fungi the problem is not that the keys are regenerated, but that we default to using ed25519 keys when we first set things up  19:34
<ade_lee> and those types of keys are disabled in fips  19:34
<fungi> aha!  19:34
<fungi> good point  19:34
<fungi> we'd need zuul to add all available host keys to the known_hosts for that build i guess  19:35
<ade_lee> yup -- or default to using ecdsa perhaps  19:36
<clarkb> I thought ecdsa was considered less secure ...  19:36
<clarkb> something about unexplainable magic numbers  19:36
*** andrewbonney has quit IRC19:37
<ade_lee> clarkb, not sure - but the logs show that it's trying to use ecdsa and getting an ed25519 key  19:38
<clarkb> ya, the issue is we're only registering the ed25519 key  19:38
<clarkb> so when it gets something else it can't verify it. As fungi mentions, we can probably have zuul write them all down  19:38
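The "write them all down" idea is essentially what ssh-keyscan does from the command line; for comparison, something like this collects every host key type the server offers (illustration only, hostname is hypothetical):

    # gather rsa, ecdsa and ed25519 host keys and append them to known_hosts
    ssh-keyscan -t rsa,ecdsa,ed25519 -p 22 node.example.com >> ~/.ssh/known_hosts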
<fungi> fips isn't about being "secure", it's about being predictable and vetted  19:39
<fungi> the usg (via fips) doesn't trust some newer and ostensibly stronger ciphers and hashes because it hasn't confidently vetted them yet  19:40
<ade_lee> clarkb, fungi so - how do we get zuul to write all the keys down?  19:40
<fungi> ade_lee: i'll have to think about it more when we're not in the middle of our weekly meeting  19:40
<ade_lee> fungi, ack thanks  19:41
<fungi> corvus: ^ this is an interesting corner case for zuul's handling of node ssh host keys  20:01
<fungi> i'm refreshing my memory now on where we do the ssh-keygen call to fetch those  20:01
<clarkb> fungi: it's in nodepool utils.nodescan  20:06
<clarkb> then passed all the way up the chain from there  20:06
<fungi> aha, so we'd need to plumb it through zk, maybe need to change the message format?  20:09
<fungi> though perhaps multiline content there is fine  20:09
<clarkb> well, it seems nodepool grabs all the keys  20:09
<clarkb> I'm working from there to see if I can find where we might select the key type to use  20:09
<clarkb> hrm, it seems like zuul writes all the keys it gets  20:11
<clarkb> is it possible that centos-8 is only generating ed25519 by default?  20:11
<clarkb> oh, unless it is paramiko only grabbing a single key  20:13
<clarkb> https://opendev.org/zuul/nodepool/src/branch/master/nodepool/nodeutils.py#L88 that may be it  20:13
<clarkb> everything else assumes a list, but there we may only grab a single key  20:13
<clarkb> the comment in there also says paramiko only grabs the rsa key  20:14
<clarkb> I'm guessing that has changed :)  20:14
<clarkb> http://docs.paramiko.org/en/stable/api/transport.html#paramiko.transport.Transport.get_remote_server_key is very quiet about what that actually does  20:15
*** jamesdenton has joined #openstack-infra20:17
<clarkb> skimming the code, I think it is using the host key that was used to establish the connection, so not a true scan  20:17
<clarkb> what we want is an ssh-keyscan equivalent for paramiko  20:21
*** tosky has joined #openstack-infra20:27
<clarkb> reading rfc 4253, I think what you can do is a series of connections, each with a different supported host key type offered by the client  20:32
<clarkb> not sure if paramiko exposes that though  20:32
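Paramiko does let a client narrow the host key algorithms it offers for a single handshake via Transport.get_security_options(), so the one-connection-per-key-type idea can be sketched like this (a rough illustration of the approach, not the actual nodepool patch):

    import socket

    import paramiko

    def scan_host_keys(host, port=22):
        # collect one host key per algorithm by reconnecting with a restricted offer
        collected = []
        for key_type in ("ssh-ed25519", "ecdsa-sha2-nistp256", "ssh-rsa"):
            sock = socket.create_connection((host, port), timeout=10)
            transport = paramiko.transport.Transport(sock)
            try:
                # only offer this one host key algorithm during the handshake
                transport.get_security_options().key_types = [key_type]
                transport.start_client(timeout=10)
                collected.append(transport.get_remote_server_key())
            except paramiko.SSHException:
                # server has no key of this type (or rejected the restricted offer)
                pass
            finally:
                transport.close()
        return collected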
*** slaweq has quit IRC20:33
*** slaweq has joined #openstack-infra20:44
*** zxiiro has joined #openstack-infra20:59
*** raissa has joined #openstack-infra21:01
*** hasharOut is now known as hashar21:12
*** rcernin has joined #openstack-infra21:18
*** rcernin has quit IRC21:18
*** rcernin has joined #openstack-infra21:19
<clarkb> ade_lee: fungi https://review.opendev.org/761229 is a rudimentary attempt at addressing this in nodepool  21:21
*** rfolco has quit IRC21:24
*** rfolco has joined #openstack-infra21:24
<ade_lee> clarkb, cool thanks -- I'll add it as a depends-on in my patch and see if it works!  21:24
<clarkb> well, depends-on won't help here  21:24
<clarkb> that is part of the nodepool service running in production that hands out test nodes, and isn't speculatively applied  21:25
<ade_lee> clarkb, ah ok  21:25
*** raissa has quit IRC21:25
<ade_lee> clarkb, so I have to wait for it to be merged to try it out?  21:26
<clarkb> fwiw it seems that paramiko is what prefers ed25519  21:26
*** ChanServ has quit IRC21:26
<clarkb> https://github.com/paramiko/paramiko/blob/dccc563c3a581a14ceaa261b5fe0d08539f84b87/paramiko/transport.py#L171-L178 is the list I think  21:26
<clarkb> and that is why we end up using that by default  21:27
<ade_lee> clarkb, one of the reasons paramiko is not fips certified ..  21:28
<clarkb> I mean, you can disable it, which is essentially how my change works  21:28
<clarkb> but also I'm not really convinced of the value of fips if it bypasses ed25519 in favor of ecdsa  21:28
<clarkb> since literally all the writing on the subject seems to be that ed25519 is far superior  21:28
*** rfolco has quit IRC21:29
<ade_lee> clarkb, as fungi mentioned, fips is all about using things that are vetted and predictable -- you have to use fips certified algorithms etc.  21:30
<clarkb> right, but the reason people don't like ecdsa is it has unvettable magic numbers  21:30
<clarkb> whereas the other algorithm is actually understandable  21:30
<clarkb> (which makes me think it isn't even achieving that goal)  21:31
<clarkb> but I guess if you are the same entity creating the fips vetting and the magic numbers...  21:31
<ade_lee> :/  21:31
*** ChanServ has joined #openstack-infra21:32
*** tepper.freenode.net sets mode: +o ChanServ21:32
<clarkb> and ya, apparently it is NIST in both cases  21:32
<ade_lee> yup  21:32
<fungi> clarkb: "unvetted" depends on who's doing the vetting  21:46
<fungi> ahh, yeah, as you said  21:46
<fungi> fips is really mostly only for the usg and contractors... anyone who doesn't trust "certain three-letter agencies" should probably not use fips mode  21:47
*** gfidente|afk has quit IRC21:52
*** matt_kosut has quit IRC21:59
*** hashar has quit IRC22:01
*** slaweq has quit IRC22:02
*** aedc has joined #openstack-infra22:03
*** aedc has quit IRC22:04
*** sboyron has quit IRC22:34
*** tkajinam has joined #openstack-infra22:59
*** tosky has quit IRC23:01
*** mmethot has quit IRC23:18
*** dchen has joined #openstack-infra23:34
*** mmethot has joined #openstack-infra23:34
