Friday, 2019-03-01

*** bhavikdbavishi has joined #softwarefactory02:06
*** bhavikdbavishi has quit IRC03:18
*** bhavikdbavishi has joined #softwarefactory04:06
*** raukadah is now known as chandankumar05:09
jangutterhi, I did an upgrade last night from 3.0 to 3.2 (!!!) and I'm getting weird errors in zuul-scheduler.log: ERROR gear.Server: Exception in connect loop: ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:645)08:19
*** jpena|off is now known as jpena08:48
jangutterI found a hint: reverting https://softwarefactory-project.io/r/#/c/11591/ silenced the warnings (unencrypting the gearman comms). I wonder if my localCA needs to be regenerated?09:08
*** bhavikdbavishi has quit IRC09:32
*** bhavikdbavishi has joined #softwarefactory09:32
*** sshnaidm is now known as sshnaidm|off10:00
*** bhavikdbavishi has quit IRC10:46
*** bhavikdbavishi has joined #softwarefactory11:29
*** jpena is now known as jpena|lunch12:51
tristanCjangutter: the ssl cert should have been auto-generated, are all the packages up to date? (e.g. yum update)12:56
janguttertristanC: Lemme check! Thanks.12:56
janguttertristanC: No packages marked for update. I am using my own certs for the reverse proxy though, would that make a difference?12:57
janguttertristanC: sfconfig.yaml:network.tls_cert_file and friends. (thanks for that, btw!)12:58
tristanCjangutter: hum, maybe, so the gearman.crt and key are generated with https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/sfconfig/components.py#n69 12:59
tristanCusing the CA from /var/lib/software-factory/bootstrap-data/certs/localCA.pem 13:01
janguttertristanC: yeah, what's weird to me is I'm getting "WRONG_VERSION_NUMBER".... I'm not a TLS expert, but could it be the localCA.pem is still pretty old and doesn't have new cyphers?13:04
tristanCjangutter: does this command return "OK": openssl verify -CAfile /etc/zuul/ssl/localCA.pem /etc/zuul/ssl/gearman.crt ?13:05
janguttertristanC: where else is localCA used? (i.e. would it break stuff if I go delete it and recreate it?)13:05
janguttertristanC: /etc/zuul/ssl/gearman.crt: OK13:05
tristanCjangutter: and "rpm -q rh-python35-python-gear" says rh-python35-python-gear-0.13.0-1.el7.noarch right?13:08
jangutterOooh! nope13:09
jangutter0.12 13:09
tristanCactually, that's normal, this version of python-gear should work13:10
tristanCso yeah, maybe the localCA is too old, though we didn't have such an issue on our SF, which was upgraded from 3.013:11
jangutterThe backtrace has: ssl_version=ssl.PROTOCOL_TLSv1 13:11
tristanCwhich is what python-gear is using13:13
jangutterfrom what I can gather WRONG_VERSION_NUMBER seems to be a generic error (you also get it if you're trying to do tls on unencrypted links).13:15
tristanCyes, so in SF-3.2, the zuul-scheduler gearman service is now protected by TLS13:16
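A minimal sketch of the point made above, that WRONG_VERSION_NUMBER is a generic symptom of speaking TLS to a peer that answers in plain text. It does not touch gearman or Zuul: it feeds a few non-TLS bytes to a TLS client through Python's in-memory `ssl.MemoryBIO`. The helper name and the fake reply bytes are assumptions made up for illustration; OpenSSL typically labels the resulting failure WRONG_VERSION_NUMBER, but the exact reason string can vary between OpenSSL builds.

```python
import ssl


def handshake_against_plaintext(reply=b"plain text, not TLS\n"):
    """Start a TLS client handshake and hand it non-TLS bytes as the peer's answer.

    Returns the ssl.SSLError raised by the handshake, or None if the TLS
    layer is still waiting for more bytes (e.g. when the reply is too short
    to contain a full record header).
    """
    ctx = ssl.create_default_context()
    # No certificate checks: the handshake never gets that far in this sketch.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    incoming = ssl.MemoryBIO()  # bytes "received" from the peer
    outgoing = ssl.MemoryBIO()  # bytes the client would send (ClientHello)
    tls = ctx.wrap_bio(incoming, outgoing, server_side=False)

    # Pretend the peer answered in plain text instead of with a ServerHello.
    incoming.write(reply)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        return None
    except ssl.SSLError as exc:
        return exc
    return None


if __name__ == "__main__":
    err = handshake_against_plaintext()
    print(type(err).__name__, getattr(err, "reason", None))
```

The same mismatch in the other direction (a plain-text client connecting to a TLS-only server) fails inside the server's handshake, which matches the gear.Server traceback quoted earlier in the log.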
tristanCjangutter: is this working: "echo status | /usr/local/bin/gearman-client"?13:17
jangutterShould be libexec, let me check.13:17
jangutterI'm restarting zuul with the TLS back in place.13:18
tristanCjangutter: this script should be copied to /usr/local/bin13:19
jangutterNope, I only see: cgit-config-generator.py  resources2repoxplorer.py  resources.sh in there.13:19
tristanCjangutter: another thing to check is if the gearman service has been restarted, it should be a scheduler child process, you can get its pid using "sudo netstat -nepal | grep 4730.*LISTEN"13:19
jangutterYep, ss -nlp | grep 4730 shows it listening.13:20
tristanCjangutter: that's odd, did you run "sfconfig" after the upgrade?13:20
jangutteryep, multiple times afterwards in fact.13:20
jangutterI checked the libexec path is in the ansible scripts shipped with sf-3.2 btw.13:21
jangutteroh. Zuul's not starting now :-(13:22
jangutterzapping the TLS config made it work again (phew).13:23
tristanCjangutter: my bad, the script is indeed copied to libexec now, sorry it's late here :)13:23
jangutterHey, this is definitely not a priority! Thanks very much for helping.13:24
tristanCjangutter: could you paste the error you get? e.g. the line before might be helpful13:24
jangutterI tried to find what exactly triggered it but failed. Lemme paste the error into pastebin13:25
jangutterhttps://pastebin.com/RHQtgDAm 13:26
tristanCjangutter: and could it be that the gearman process (the one listening on 4730) didn't get restarted with the tls settings?13:27
jangutterHmmmm.... let me check that theory! I thought the gearman process forked off the zuul-scheduler service.13:27
jangutterOK, the 4730 port goes dead if I stop zuul-scheduler.13:28
jangutterIf I re-enable TLS, it pauses at: INFO zuul.ConfigLoader: Loading configuration from /etc/opt/rh/rh-python35/zuul/main.yaml 13:29
tristanCjangutter: if it pauses there, it likely means it is waiting for executor/merger to perform merge task over gearman13:31
tristanCjangutter: perhaps try to restart rh-python35-zuul-executor now13:31
tristanCjangutter: it seems like if the scheduler reach "Loading configuration", then it manage to connect to gearman13:32
tristanCmanaged*13:32
janguttersystemctl restart zuul-executor took loooong13:33
tristanCjangutter: it could have been that services didn't get restarted properly, especially if you went from 3.0 to 3.213:33
jangutterAh failing: AttributeError: 'MergeJob' object has no attribute 'updated'13:33
jangutterzuul-scheduler is now failing on an exception.13:33
jangutterI'm restarting it... looks fine thus far.13:34
jangutterINFO zuul.Scheduler: Full reconfiguration complete13:34
jangutterBingo, thanks!13:35
jangutterlooks like "turning it off, then turning it on" made it work!13:35
tristanCjangutter: the updated attribute error is a known benign issue: see https://review.openstack.org/#/c/633259/ 13:35
tristanCjangutter: yeah, i guess the service didn't get restarted as expected, before 3.2 the upgrade used to stop everything, do the upgrade, and start everything13:36
tristanCjangutter: in 3.2, sf-config tries to be smarter and it should only restart a service if that service's package got updated13:37
jangutterIf I went from 3.0 to 3.1, I might not have noticed.13:37
tristanCjangutter: but this may not work well if the sfconfig process failed13:37
tristanCjangutter: perhaps you should restart the instance too, to make sure everything is running the right version, and perhaps update the kernel too13:38
jangutterecho status | /usr/libexec/software-factory/gearman-client is pausing though.13:38
jangutterBut, thanks I think I'll do some rebooting.13:39
jangutterhave a great weekend!13:39
tristanCjangutter: gearman-client doesn't exit iirc, it should print the list of jobs and end with a single ".\n"13:39
jangutteryeah, it's just quiet.13:40
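A small sketch of how the "status" reply described above could be read programmatically. It assumes the usual layout of the gearman text admin protocol: one line per function with tab-separated name, queued jobs, running jobs and available workers, closed by a line holding a single ".". The function name `parse_status`, the `FunctionStatus` tuple and the sample reply are made up for illustration; this is not the code behind the gearman-client script.

```python
from typing import Dict, NamedTuple


class FunctionStatus(NamedTuple):
    queued: int   # jobs waiting in the queue
    running: int  # jobs currently being worked on
    workers: int  # workers able to run this function


def parse_status(reply: str) -> Dict[str, FunctionStatus]:
    """Parse a gearman admin "status" reply into a dict keyed by function name.

    Raises ValueError if the terminating "." line never arrives, which is
    what a silent or stalled server looks like from the client side.
    """
    result: Dict[str, FunctionStatus] = {}
    for line in reply.splitlines():
        if line == ".":
            return result
        if not line:
            continue  # tolerate stray blank lines
        name, queued, running, workers = line.split("\t")
        result[name] = FunctionStatus(int(queued), int(running), int(workers))
    raise ValueError("incomplete reply: no terminating '.' line")


if __name__ == "__main__":
    # Invented sample data, not output captured from a real server.
    sample = "merger:merge\t0\t0\t1\nexecutor:execute\t2\t1\t3\n.\n"
    for name, status in parse_status(sample).items():
        print(name, status)
```

An empty but healthy server would answer with just ".\n", which parses to an empty dict, so a client that prints nothing at all has not yet received a complete reply.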
jangutterGimme a sec, let me run the openssl manually.13:40
jangutternope, I'll do more digging.13:41
tristanCalright, i'll leave now, let me know if reboot helped13:42
jangutterweird, gearman-client works, but it's just reallly slooow.13:49
jangutterrebooting.13:50
*** jpena|lunch is now known as jpena13:54
*** bhavikdbavishi has quit IRC13:55
janguttertristanC: thanks again, after a reboot, all the services seem to be much happier now.13:55
*** chandankumar is now known as raukadah14:26
*** bhavikdbavishi has joined #softwarefactory15:20
*** jangutter has quit IRC16:58
*** bhavikdbavishi has quit IRC17:11
*** bhavikdbavishi has joined #softwarefactory17:17
sfbenderJavier Peña created DLRN master: Do not fallback to master on branches starting with rhos-  https://softwarefactory-project.io/r/15121 17:47
*** jpena is now known as jpena|off18:03
*** irclogbot_3 has joined #softwarefactory18:11
*** bhavikdbavishi has quit IRC18:46
*** rfolco|rover has quit IRC19:21
*** irclogbot_3 has quit IRC19:48
*** irclogbot_3 has joined #softwarefactory20:03
*** rfolco has joined #softwarefactory20:37
*** irclogbot_3 has quit IRC21:37
