Thursday, 2016-05-26

00:04 *** openstack has joined #openstack-sprint
00:18 <pabelanger> decided to switch to rsync
00:18 <pabelanger> had to bring meetbot back online
00:25 *** baoli has joined #openstack-sprint
00:46 *** openstackstatus has quit IRC
00:55 *** openstack has joined #openstack-sprint
00:55 <pabelanger> \o/
00:56 <pabelanger> logs persisted to cinder on eavesdrop.o.o
00:57 <pabelanger> just waiting for confirmation that logging is still working
00:59 *** anteaya has quit IRC
01:00 <pabelanger> w00t: http://eavesdrop.openstack.org/irclogs/%23openstack-sprint/latest.log.html
01:00 <pabelanger> I'll finish the migration of eavesdrop.o.o tomorrow
01:01 <pabelanger> should be able to use the same rsync process for lists.o.o too, and have minimal downtime for migrating the data to cinder
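The minimal-downtime rsync process pabelanger mentions can be sketched roughly as a two-pass copy: a bulk sync while the service is still up, then a short final sync with the service stopped. The paths and the `mailman` service name below are illustrative assumptions, not the actual layout on lists.o.o.

```shell
# Two-pass rsync sketch: downtime is limited to the second (delta) copy.
# src/dst paths and the service name are assumptions for illustration.
migrate_to_cinder() {
    src=/var/lib/mailman        # hypothetical data directory on the server
    dst=/mnt/cinder-volume      # hypothetical mount point of the cinder volume

    # Pass 1: bulk copy while the service is live (can take a long time).
    rsync -avH "$src/" "$dst/"

    # Pass 2: stop the service, copy only what changed, then cut over.
    service mailman stop
    rsync -avH --delete "$src/" "$dst/"
    # ...swap the mount / update fstab here, then restart the service...
    service mailman start
}
```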
02:04 <jhesketh> with server deletions are we taking snapshots or anything before offlining them?
02:04 <jhesketh> do we have any recovery plans if we delete something we actually needed?
02:40 *** openstack has joined #openstack-sprint
03:03 *** rfolco has quit IRC
03:26 *** baoli has quit IRC
03:46 *** rfolco has joined #openstack-sprint
04:16 *** rfolco has quit IRC
04:27 *** baoli has joined #openstack-sprint
05:39 *** baoli has quit IRC
06:55 *** openstack has joined #openstack-sprint
12:46 *** baoli has joined #openstack-sprint
12:49 *** baoli_ has joined #openstack-sprint
12:52 *** baoli has quit IRC
13:16 *** anteaya has joined #openstack-sprint
13:18 <pabelanger> jhesketh: I've been deleting servers as I have replaced them, once I've confirmed things are working correctly.
13:19 <jhesketh> pabelanger: sure, I more meant as a backup in case we've missed something and don't notice for a week
13:20 <pabelanger> jhesketh: ya, I've assumed we'd find that data on our bup.o.o server, but that is a big assumption on my behalf
13:22 <jhesketh> Hmm okay
13:22 <pabelanger> I think I'll push eavesdrop.o.o to tomorrow
13:22 <jhesketh> There's probably no harm in taking snapshots, right? As in the storage is reasonably cheap
13:22 <pabelanger> since there are only 2 meetings on Friday
13:22 <pabelanger> jhesketh: I don't think so
13:23 *** rcarrillocruz has joined #openstack-sprint
13:24 <rcarrillocruz> oh
13:24 <rcarrillocruz> was not aware of a sprint :/
13:25 <pabelanger> np
13:26 <pabelanger> What is the plan for logstash-workerXX.o.o? You and yolanda are working on 01?
13:26 <rcarrillocruz> we started on it about 45 min ago
13:26 <rcarrillocruz> playbooks and roles run with latest changes
13:26 <fungi> pabelanger: yeah, i wouldn't make assumptions about backups being viable unless you 1. confirm the server in question actually has backups configured, and 2. test restoring some data from it
13:26 <rcarrillocruz> but the ansible-puppet role fails
13:26 <rcarrillocruz> as it cannot access hiera files
13:26 *** yolanda has joined #openstack-sprint
13:26 <rcarrillocruz> not sure why
13:27 <rcarrillocruz> do you folks run launch-node.py as root?
13:27 <yolanda> hi
13:27 <rcarrillocruz> yolanda: could you share the commands you are running with pabelanger so he can log in and check
13:27 <yolanda> sure
13:27 <pabelanger> rcarrillocruz: yolanda: Ah, yes, that is a bug with our file permissions. We need to change the hieradata to puppet:puppet IMO
13:27 <pabelanger> fungi: ack
13:27 <rcarrillocruz> oh ok
13:28 <pabelanger> I've been doing launch-node.py as root
13:28 <jhesketh> fungi: what do you think about snapshotting nodes before deletion?
13:28 <rcarrillocruz> so, yolanda, ansible-playbook as 'root' should work then
13:28 <yolanda> yes, running with root now
13:28 <fungi> jhesketh: for servers with actual state on their filesystems, i'm not opposed to snapshotting
13:28 <fungi> jhesketh: for stuff like zmXX or logstash-workerXX i'm less concerned
13:29 <jhesketh> fungi: well, that's the tricky part, right. If we've got all the state elsewhere (such as git in the case of apps.o.o) we're fine. It's more for what we might have missed or not noticed
13:30 <yolanda> rcarrillocruz, weird, it timed out on creating keypairs
13:30 <yolanda> going to retry
13:30 <jhesketh> But yeah, those workers are more obvious
13:30 <rcarrillocruz> rax api or net transient issue, i assume
13:31 <fungi> jhesketh: the main thing which makes snapshotting useful for some of those is so that we can dig up logs from before the upgrade (or similar stuff like periodic mysqldumps)
13:31 <jhesketh> That's a good point too
13:31 <rcarrillocruz> once we migrate all the stuff to trusty, i'd like to start a conversation on how we are going to manage the infra resources. Making a playbook that leverages the cloud-launcher role to have feature parity with launch_node.py is ok, but the advantage of the role is to have a full infra defined in a yaml
13:32 <jhesketh> Can we set retentions on snapshots? Do we need to, or do we have enough quota to keep them indefinitely?
13:32 <rcarrillocruz> i.e. work out a process for adding new servers / resources in a resources.yml and continuously deploy them
13:32 <pabelanger> rcarrillocruz: I think that is some of the proposal nibalizer added, but ya. Eager to do that too
13:33 <jhesketh> +1
13:33 <rcarrillocruz> it's linked to what nibalizer talked about, the one-off thing, yeah...
13:33 <fungi> rcarrillocruz: and the design for that is also predicated on refactoring a lot of our manifests so that we can start having hostnames differ from service names/site names
13:34 <rcarrillocruz> my next work item is adding the ability for servers in the resources.yml to have a 'node_count', so if you have, let's say, a server 'logstash-worker.openstack.org' with the node_count attribute set to two, the role would create two numbered resources
13:34 <rcarrillocruz> fungi: ++
13:34 <fungi> jhesketh: we've hardly used snapshotting (aside from our old nodepool image method, and that was in a separate tenant) so no idea
13:39 <yolanda> rcarrillocruz, it worked fine with root
13:39 <anteaya> rcarrillocruz: we talked about the sprint several times in the weekly meeting, how could you have found out about it prior to it starting?
13:39 <rcarrillocruz> \o/
13:39 <rcarrillocruz> pabelanger: ^
13:39 <rcarrillocruz> so
13:39 <rcarrillocruz> i guess i leave it to both of you
13:39 <rcarrillocruz> to get rid of ls workers
13:39 <rcarrillocruz> and you spin replacements
13:39 <anteaya> rcarrillocruz yolanda glad you are here, just wanting to know how you consume information of this nature?
13:40 <rcarrillocruz> clarkb mentioned also the dns corrections and modifying the firewall for the changes
13:41 <rcarrillocruz> let me search for that irc conversation, i'll paste it here
13:42 <pabelanger> rcarrillocruz: cool, so a bug in our permissions. Will wait until nibalizer is around to confirm we can update /etc/puppet/hieradata to puppet:puppet
13:42 <rcarrillocruz> this is what clarkb put yesterday about additional steps:
13:42 <rcarrillocruz> http://paste.openstack.org/show/505652/
13:43 <rcarrillocruz> pabelanger, yolanda: ^
13:43 <rcarrillocruz> anteaya: sorry, i don't follow what you mean by consume information of this nature
13:43 <rcarrillocruz> not sure if i missed an earlier comment
13:43 <rcarrillocruz> oh
13:43 <rcarrillocruz> the sprint announcement
13:43 <rcarrillocruz> well
13:43 <pabelanger> rcarrillocruz: Yup, had to do the same with zuul.o.o
13:43 <rcarrillocruz> i missed the meeting
13:43 <rcarrillocruz> i used to be able to attend
13:43 <rcarrillocruz> but haven't been able to make the last one
13:44 <rcarrillocruz> i should have read the meeting wrap-up i guess
13:44 <rcarrillocruz> anyway, it's good that incidentally i was working on something that is *about* this sprint topic :D
13:45 <anteaya> rcarrillocruz: oh okay, I just assumed you were reading meeting logs
13:45 <anteaya> I'm glad you are here
13:45 <rcarrillocruz> out for a bit, my wife is about to arrive with the kid
13:45 <rcarrillocruz> be back in 1h, haven't had lunch yet (starving!)
13:45 <anteaya> rcarrillocruz: happy family time
13:46 <yolanda> rcarrillocruz, so i totally lost the scope of the sprint, and the intentions with logstash. I have not been able to follow chat and contribute to infra so much lately... so what shall we do with that logstash-test? first of all, is that naming something temporary? i guess we need to be launching replacements for logstash 1-20?
13:46 <anteaya> yolanda: hope all is well with your family
13:47 <yolanda> anteaya, a bit better, but things are still complicated. Thanks...
13:47 <anteaya> yolanda: I understand, yes it is hard to be all the places you want to be
13:47 <anteaya> thanks for your help here
13:48 <yolanda> family is first priority... but trying to be back to work normally
13:48 <yolanda> rcarrillocruz, i guess the first step is to approve your changes, so we have a functional launch-node play
13:49 <anteaya> yolanda: yup, I agree with your priorities
13:49 <anteaya> and I understand enjoying the routine of work
13:50 <anteaya> clarkb was back within about 3 weeks of the babies being born
13:52 <yolanda> we may be a bit workaholic...
13:54 <anteaya> agreed
13:54 <anteaya> a big family of workaholics
14:07 <pabelanger> yolanda: rcarrillocruz: okay, I am going to do logstash-worker01.o.o using launch-node.py, just to confirm everything works correctly on ubuntu-trusty for the first one. Then I'll review the topic and see how to use cloud-launcher for the next server
14:07 <yolanda> ++
14:07 <rcarrillocruz> ++
14:07 <pabelanger> clarkb: is there a procedure for taking a logstash-worker out of service?
14:09 <pabelanger> dropping the DNS TTL to 5 mins on logstash-workers
14:12 <clarkb> pabelanger: no, you just stop services on them, logstash and the 4 jenkins log workers
14:13 <clarkb> as rcarrillocruz mentioned, you will have to bounce iptables on the elasticsearch hosts and logstash.o.o after dns updates
14:13 <pabelanger> clarkb: perfect
14:13 <pabelanger> ack
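The "bounce iptables after dns updates" step exists because the firewall rules on the elasticsearch hosts and logstash.o.o are written in terms of worker hostnames, which get resolved to IPs when the rules are loaded. A hedged sketch, where the exact hostnames and the `iptables-persistent` service name are assumptions:

```shell
# Check the replacement worker's A record; the TTL is the second field of
# dig's answer output (pabelanger dropped it to 5 minutes above).
check_worker_dns() {
    dig +noall +answer logstash-worker01.openstack.org A
}

# Reload the firewall on a host so hostname-based rules re-resolve to the
# new IPs; how rules are persisted varies per host (iptables-persistent on
# Ubuntu is an assumption here).
bounce_iptables() {
    host=$1
    ssh "root@$host" service iptables-persistent restart
}
```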
14:14 <rcarrillocruz> yolanda: can you pls add the steps to create a worker with the launcher onto https://etherpad.openstack.org/p/newton-infra-distro-upgrade-plans ?
14:14 <rcarrillocruz> with the source ansible/hacking/env-setup etc
14:14 <rcarrillocruz> rax params and all
14:14 <rcarrillocruz> so pabelanger or whoever can just copy, paste and run it
14:15 <pabelanger> rcarrillocruz: why do we need the ansible devel branch?
14:16 <rcarrillocruz> because of this bug: https://github.com/ansible/ansible/issues/14146
14:16 <rcarrillocruz> you can't have include with_items nested, it's broken
14:16 <rcarrillocruz> that will land in ansible 2.1, which is about to be released
14:16 <pabelanger> 2.1 was released yesterday :)
14:17 <yolanda> i tested with ansible 2.2 on my venv
14:17 <rcarrillocruz> was it?
14:17 <rcarrillocruz> \o/
14:17 <rcarrillocruz> haven't seen the announcement on ansible-devel
14:17 * rcarrillocruz goes to check
14:18 <rcarrillocruz> weeee
14:18 <rcarrillocruz> you folks ok if I bump the version on puppetmaster
14:18 <rcarrillocruz> ?
14:20 <pabelanger> clarkb: does it make more sense to just delete the logstash-worker from rackspace vs stopping services?
14:21 <clarkb> pabelanger: I would leave the old one up, make the new one, bounce iptables, make sure the new one works, then delete the old
14:21 <pabelanger> okay, I'm fine with that
14:21 <clarkb> its fine to have both running at the same time (we normally have 20 instances)
14:21 <rcarrillocruz> hmm
14:21 <rcarrillocruz> https://github.com/openstack-infra/puppet-ansible/blob/master/manifests/init.pp
14:22 <rcarrillocruz> so by default the class installs latest from pip
14:22 <rcarrillocruz> i vaguely recall an issue with the pip provider where latest had issues
14:22 <rcarrillocruz> can someone confirm if the puppetmaster ansible is now on 2.1?
14:22 <rcarrillocruz> i.e. 'ansible --version'
14:22 <yolanda> checking
14:23 <yolanda> 2.1.0.0
14:23 <rcarrillocruz> \o/
14:23 <rcarrillocruz> yesterday you pasted me 2.0.0.2
14:23 <rcarrillocruz> so all good then
14:24 <rcarrillocruz> just paste the ansible-playbook command and we'll be good :D
14:24 <yolanda> yes
14:24 <yolanda> it got updated
14:24 <rcarrillocruz> thx a bunch
14:24 <yolanda> let me try without my venv, just with the ansible we have
14:24 <rcarrillocruz> and sorry for being a pain and being woman-in-the-middle :/
14:25 <yolanda> glad to help a bit with the sprint, even just for that...
14:25 <pabelanger> launching replacement logstash-worker01.o.o now
14:28 <yolanda> confirmed that the play works with the ansible 2.1 version
14:30 <rcarrillocruz> 'o/
14:30 <rcarrillocruz> \o/
14:31 <rcarrillocruz> and with that, i can resume the work on my tests for the cloud launcher... was reluctant to test against devel in requirements.txt
14:47 <pabelanger> clarkb: so looking at the logstash-worker01.o.o replacement, I guess our services don't launch on start up? They need to be manually started?
14:48 <rcarrillocruz> that would follow a common pattern in other manifests, like zuul...
14:48 <clarkb> pabelanger: ya, that sounds right
14:49 <pabelanger> rcarrillocruz: zuul-mergers will start on boot, that's why I was asking
14:49 <pabelanger> clarkb: ack
14:49 <clarkb> logstash should start iirc
14:49 <clarkb> but not the workers
14:50 <pabelanger> Right, jenkins workers are stopped
15:20 <nibalizer> pabelanger: rcarrillocruz heya
15:20 <nibalizer> what's up with hiera?
15:20 <pabelanger> clarkb: okay, I think logstash-worker01.o.o is upgraded properly. Is there an easy way to confirm it is working properly?
15:22 <pabelanger> nibalizer: We think the permissions on /etc/puppet/hieradata are too restrictive for non-root users in the puppet group. And have some issues using launch-node.py as non-root
15:22 <clarkb> pabelanger: tail the /var/log/logprocessor files
15:22 <clarkb> you should see those log files reporting work is happening
15:23 <pabelanger> clarkb: the only issue I see is some HTTPError: HTTP Error 404: Not Found
15:23 <pabelanger> not sure if that is expected
15:24 <clarkb> ya, that is expected since it is greedy and many log files dont exist on all the jobs
15:24 <clarkb> pabelanger: but the log files are advancing?
15:24 <pabelanger> clarkb: yes
15:25 <clarkb> then should be good
15:25 <pabelanger> perfect!
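The verification clarkb walks pabelanger through can be summed up as: watch the log-processor logs and confirm they keep advancing, treating 404s for missing job logs as expected noise. A small sketch; the exact glob under /var/log is an assumption about the filenames:

```shell
# Follow the jenkins-log-worker logs on a replacement node.  If lines keep
# appearing, the worker is pulling work; a quiet file can also mean part of
# the pipeline further up is plugged, not just a dead worker.
check_worker_logs() {
    tail -f /var/log/logprocessor/*.log   # path glob is an assumption
}
```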
15:25 <nibalizer> pabelanger: well, those are the keys to the kingdom... where are you seeing a permission denied?
15:25 <clarkb> they will not advance if part of the pipeline is plugged
15:25 <nibalizer> i am surprised that launch-node is going into /etc/puppet/hiera
15:26 <pabelanger> nibalizer: the issue revolves around copying hiera data onto the new node, the bits fail to copy properly (which I assume is because of permission issues). I haven't debugged it, but have switched to root for the moment.
15:26 <rcarrillocruz> nibalizer: not the script itself, but the remote-adhoc-puppet playbook, which in turn uses ansible-puppet
15:28 <nibalizer> oh interesting
15:28 <nibalizer> yep, that will fail
15:29 <nibalizer> we wrote the hiera-copy stuff assuming root
15:29 <nibalizer> so it comes down to which is more valuable - running launch_node.py as nonroot or having /etc/hieradata locked off
15:29 <pabelanger> right
15:29 <nibalizer> there is at least one more option, that sudo is used to grab the file from /etc/hieradata
15:30 <nibalizer> so yea, i'd defer to fungi, clarkb and jeblair on that one
15:32 <fungi> yes, i've been running in an interactive root shell because the hiera copying doesn't work if launch-node.py is run as non-root
15:37 <rcarrillocruz> pabelanger: so, does trusty work ok for the logstash worker manifest?
15:37 <pabelanger> rcarrillocruz: yup
15:37 <rcarrillocruz> nice
15:39 <pabelanger> going to do a few more now, then try out cloud-launcher after lunch
15:40 <rcarrillocruz> ++
15:42 * rcarrillocruz goes for a coffee
16:27 <pabelanger> okay, letting logstash-worker05.o.o build while I get some food
16:27 <pabelanger> once online, I'll use cloud-launcher for logstash-worker06.o.o
16:53 *** rfolco has joined #openstack-sprint
16:54 <rcarrillocruz> pabelanger: i can't see the entry from yolanda on the etherpad about the exact one-liner she ran
16:54 <rcarrillocruz> but left instructions at the bottom
16:54 <rcarrillocruz> you should be able to work it out with that
17:01 <pabelanger> switching to cloud-launcher
17:02 <clarkb> I am about to do logstash.o.o again
17:06 *** baoli_ has quit IRC
17:06 *** baoli has joined #openstack-sprint
17:13 <pabelanger> rcarrillocruz: first issue: http://paste.openstack.org/show/505708/
17:13 <rcarrillocruz> ah well
17:13 <rcarrillocruz> put this in your .ansible.cfg:
17:13 <rcarrillocruz> [defaults]
17:13 <rcarrillocruz> host_key_checking = no
17:13 <rcarrillocruz> alternatively
17:14 <rcarrillocruz> export ANSIBLE_HOST_KEY_CHECKING=False
17:14 <rcarrillocruz> that should get you past that issue
17:14 <clarkb> please do not set that in root's ansible.cfg
17:14 <pabelanger> Right, I'm not a fan of having to set up defaults in ansible.cfg to run it, honestly
17:14 <clarkb> the env var for one-time use would be preferable
17:15 <pabelanger> we should be able to set up a pre_task or something to dynamically disable it
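clarkb's preference above, sketched: scope the host-key-checking override to a single invocation by prefixing the command with the environment variable, rather than persisting it in root's ansible.cfg. `ANSIBLE_HOST_KEY_CHECKING` is a real ansible knob; the playbook arguments are whatever you would normally pass.

```shell
# Wrapper that disables host-key checking for exactly one ansible-playbook
# run; nothing is written to any ansible.cfg, so the default stays strict.
launch_once() {
    ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook "$@"
}
```

Usage would look like `launch_once remote-adhoc-puppet.yaml -l logstash-worker06.openstack.org` (playbook name and limit here are illustrative).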
17:16 <jeblair> it looks like ansible on puppetmaster is sploding:
17:16 <jeblair> 2016-05-26 17:13:10,734 p=13399 u=root |  An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: string indices must be integers, not str
17:16 <jeblair> 2016-05-26 17:13:10,735 p=13399 u=root |  fatal: [paste.openstack.org -> localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Traceback (most recent call last):\n  File \"/tmp/ansible_MIFKWV/ansible_module_puppet_post_puppetdb.py\", line 149, in <module>\n    main()\n  File \"/tmp/ansible_MIFKWV/ansible_module_puppet_post_puppetdb.py\", line 78, in main\n    fqdn = ...
17:16 <jeblair> ... p['hostvars']['inventory_hostname']\nTypeError: string indices must be integers, not str\n", "module_stdout": "", "msg": "MODULE FAILURE", "parsed": false}
17:16 <jeblair> that's for every host
17:16 <rcarrillocruz> ansible 2.1 issue?
17:16 <jeblair> i guess it's just for posting facts
17:17 <jeblair> oh, istr someone saying posting facts is broken
17:17 <jeblair> is that what was meant, or is that something new?
17:18 <rcarrillocruz> not aware of issues specific to ansible 2.1, but it got upgraded on the puppetmaster yesterday
17:18 <rcarrillocruz> so unless someone identifies that prior to yesterday, could be linked
17:19 <rcarrillocruz> although that error message suggests the fix should be easy
17:19 <rcarrillocruz> passing a filter to type cast it to int
17:19 <pabelanger> jeblair: missing if __name__ == '__main__' bits?
17:20 <pabelanger> ya, ansible-puppet doesn't have that
17:20 <pabelanger> let me patch, see if that helps
17:20 <jeblair> how is it working at all then?
17:21 <pabelanger> I thought the magic bits were optional until 2.1? I think that is what dshrews mentioned?
17:21 <pabelanger> honestly, I'm just guessing ATM
17:21 <jeblair> yeah, i just mean, the "run puppet" task works, but the "post facts" task doesn't.  they are 2 different modules, but both lack the ifnamemain
17:22 <pabelanger> Right, not sure actually
17:25 <jeblair> it also looks like no 'group' hiera files are being copied over
17:25 <clarkb> finally tracked down the logstash.o.o fails, related to an apache update it looks like
17:25 <jeblair> grep "hieradata/production/group" puppet_run_all.log
17:25 <jeblair> returns nothing
17:26 <jeblair> group copying last worked at 2016-05-25 12:52:02,469
17:26 <rcarrillocruz> pabelanger: going thru now?
17:27 <rcarrillocruz> i gotta leave shortly to buy some food
17:27 <pabelanger> rcarrillocruz: stopped for the moment, going to try again shortly
17:27 <jeblair> "MODULE FAILURE" first happened at 2016-05-25 13:03:58,514
17:27 <jeblair> do we know when we upgraded to 2.1?
17:28 <rcarrillocruz> yesterday, jeblair
17:28 <rcarrillocruz> the class installs latest from pip by default
17:28 <jeblair> i mean a specific time
17:28 <rcarrillocruz> time not sure, pypi page should tell
17:28 <rcarrillocruz> sec
17:28 <pabelanger> I don't see a timestamp on https://pypi.python.org/pypi/ansible/2.1.0.0
17:28 <jeblair> i'm trying to determine if both of those problems (which started within 10 minutes of each other) are related to 2.1
17:29 <jeblair> well, i'm more interested in when our puppet upgraded it :)
17:29 <rcarrillocruz> hmm, i don't see the upload time
17:29 <rcarrillocruz> https://pypi.python.org/pypi/ansible/2.1.0.0
17:29 *** baoli_ has joined #openstack-sprint
17:29 <jeblair> May 25 13:02:02 puppetmaster puppet-user[5811]: (/Stage[main]/Ansible/Package[ansible]/ensure) ensure changed '2.0.2.0' to '2.1.0.0'
17:30 <pabelanger> was just about to link that
17:30 <jeblair> so yep, i think that both the post module failure and the group hiera copying issues are likely related
17:31 <rcarrillocruz> ttyl
17:32 *** baoli has quit IRC
17:33 <jeblair> remote:   https://review.openstack.org/321772 Pin to ansible 2.0.2.0
17:34 <pabelanger> jeblair: +2
17:34 <jeblair> clarkb: would you +3 that please?
17:35 * clarkb looks
17:36 <anteaya> jeblair: where is the remote: bit coming from, is that an artifact from gertty?
17:36 <clarkb> approved, note that that host may not be able to puppet itself right now
17:36 <anteaya> when you post a patch url
17:36 <clarkb> we may have to manually downgrade after that merges
17:41 <jeblair> anteaya: it's what gerrit sends back to 'git review' (and git review prints it on the terminal)
17:42 <jeblair> anteaya: i've just been copy/pasting the whole line, so 'remote:' shows up
17:42 <anteaya> jeblair: ah yes, makes sense, thank you
17:42 <jeblair> clarkb: and yeah, i can manually downgrade once that lands
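The manual downgrade clarkb and jeblair anticipate (because the broken puppetmaster may not be able to apply the pin to itself) is just a pip pin applied by hand. A hedged sketch; the version comes from the pin change linked above:

```shell
# Downgrade ansible on the puppetmaster by hand to the pinned version,
# then confirm what is actually running.  Assumes system-wide pip, which
# is how the puppet-ansible class installs it.
downgrade_ansible() {
    pip install 'ansible==2.0.2.0'
    ansible --version
}
```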
17:46 <fungi> pabelanger: were you wanting to snapshot status.o.o and then delete it? or want me to?
17:46 <fungi> (the old one i mean, not the new one of course)
17:47 <pabelanger> fungi: Ya, I just have it shut down for the moment.  I'll defer to you on how to handle it
17:48 <fungi> pabelanger: i'm creating an image from it called "status-precise-backup" and will delete the server instance once that shows up
17:48 <pabelanger> fungi: ack
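What fungi describes (image the old server, then delete the instance only once the image shows up) can be sketched with python-openstackclient. The server and image names are taken from the conversation; treating this as a sketch rather than the exact commands fungi ran:

```shell
# Create a glance image from the old server as a backup, check that it has
# gone active, and only then delete the instance (delete left commented out
# as a reminder that it is the irreversible step).
snapshot_then_delete() {
    openstack server image create --name status-precise-backup status.openstack.org
    openstack image show status-precise-backup -f value -c status
    # openstack server delete status.openstack.org   # only once the image is active
}
```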
17:50 <fungi> once it looks like ansible is happy on puppetmaster again, i'll try booting paste01.openstack.org (bkero's change to pass through the vhost name merged just a little while ago)
18:05 <bkero> \o/
18:12 <clarkb> pabelanger: for the recently completed 08 host, are you using the new ansible thingy or the old launch script?
18:12 <clarkb> curious because the downgrade of ansible will break the new thing aiui
18:12 <pabelanger> clarkb: I reverted to launch-node.py
18:13 <pabelanger> I tried cloud-launcher a few more times, but ran into issues
18:13 <pabelanger> likely trivial to fix, but don't want to get bogged down debugging it right now
18:13 <clarkb> kk
18:14 <clarkb> I am really happy at how many of these we are knocking out. I am sad I didn't have time to help more early in the week, but the list of precise is dwindling
18:14 <pabelanger> Ya, so far things are working really well
18:14 <pabelanger> I am pleased
18:16 <clarkb> pabelanger: if my current stack for logstash.o.o fails, I will need to add a third change which adds the guards to puppet-logstash
18:16 <anteaya> pabelanger: eavesdrop is done?
18:16 <pabelanger> anteaya: not yet, going to do it tomorrow.  only 2 meetings scheduled
18:16 <pabelanger> but data is persisted
18:16 <anteaya> pabelanger: awesome
18:17 <anteaya> now I understand what is in the etherpad for that server
18:17 <pabelanger> I quickly looked at lists.o.o for the data, but need to figure out the mount point
18:17 <anteaya> fungi: storyboard.o.o is done I think, yes?
18:17 <anteaya> pabelanger: cool
18:17 <rcarrillocruz> pabelanger: paste me the issues so I can check when back home
18:18 <pabelanger> rcarrillocruz: will do
18:18 <clarkb> jeblair: does zuul 2.5 have a story for our privileged long-running slaves? wondering if jenkins.o.o needs to be treated separately from the other jenkinses
18:18 <anteaya> zuul got restarted yesterday to pick up a patch, what is the status of its server?
18:18 <clarkb> anteaya: zuul is still precise
18:19 <anteaya> okay
18:19 <pabelanger> Ya, that should be straightforward to upgrade. We just need to schedule the outage I think
18:19 <jeblair> clarkb: thanks for asking!  https://review.openstack.org/321584  https://review.openstack.org/321615  https://review.openstack.org/321616
18:19 <anteaya> so zuul, wiki and static I think have no notes beside them
18:20 <clarkb> pabelanger: doing the ES hosts shouldn't be too difficult either. The process there will be to run that temporary no-allocation curl command, shut off ES on a host, detach its cinder volume, boot a new host reattaching that cinder volume, start ES, delete the old host, then enable allocation again
18:20 <anteaya> all others appear to be in some sort of progress
18:20 <clarkb> pabelanger: really similar to how we did the ES upgrades
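clarkb's per-host elasticsearch replacement procedure can be sketched as three steps: park shard allocation, shuffle the cinder volume onto a replacement host, and re-enable allocation. The cluster-settings call is the standard elasticsearch allocation toggle of that era; host names, the volume argument, and the `elasticsearch` service name are illustrative assumptions:

```shell
# Stop the cluster from reallocating shards while one node is down
# ("transient" means the setting does not survive a full cluster restart).
disable_allocation() {
    curl -s -XPUT http://localhost:9200/_cluster/settings \
        -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'
}

# Move a host's data volume to its replacement: stop ES, detach the volume,
# reattach it to the new host (booted separately), start ES, delete the old.
replace_es_host() {
    old=$1 new=$2 volume=$3
    ssh "root@$old" service elasticsearch stop
    openstack server remove volume "$old" "$volume"
    openstack server add volume "$new" "$volume"
    ssh "root@$new" service elasticsearch start
    openstack server delete "$old"
}

# Let the cluster rebalance again once the replacement node has rejoined.
enable_allocation() {
    curl -s -XPUT http://localhost:9200/_cluster/settings \
        -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
}
```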
18:21 <pabelanger> clarkb: Ya, was going to ask what was needed for that.  But makes sense
18:21 <fungi> anteaya: yep, just crossed it out
18:21 *** baoli_ has quit IRC
18:21 <anteaya> fungi: awesome, thank you
18:21 *** baoli has joined #openstack-sprint
18:22 <clarkb> just looking at the list, static is likely to be tricky
18:22 <clarkb> we will need to pause all new jobs, wait for running jobs to finish (or kill them early), then do a cinder volume shuffle
18:23 <pabelanger> clarkb: maybe we can do both zuul.o.o and static.o.o at the same time
18:24 <clarkb> pabelanger: good idea
18:25 <clarkb> jeblair: comments on https://review.openstack.org/#/c/321616/2
18:27 <jeblair> i can not type that line correctly
18:28 <clarkb> jeblair: comment on https://review.openstack.org/#/c/321615/2 as well
18:28 <clarkb> now to review the big change that implements the thing
18:29 <clarkb> "big"
18:29 <jeblair> heh, it totally replaces a comment with the thing the comment said we would replace it with someday!
18:29 <anteaya> ...how big is it...
18:29 <clarkb> well, its dense
18:30 <clarkb> all that erb to read
18:30 <clarkb> half the characters are non-alnum
18:31 <clarkb> jeblair: squashing may not be a terrible idea, but either way
18:32 <jeblair> clarkb: there is some trepidation the hiera thing may not work. i'd like to land both and try it, but if it doesn't, i kept them separate so we can revert the 2nd and have a working zl.
18:32 <clarkb> gotcha
18:33 <jeblair> testing also caught an issue with 615; fixing that too
18:36 <jeblair> both sysconfig changes updated
18:37 <pabelanger> okay, 10 of 20 logstash-workers upgraded to ubuntu-trusty
18:37 <pabelanger> Going to take a break and walk down to pick up my daughter from school
18:37 <pabelanger> I'll finish off the other 10 when I get back
18:41 <jeblair> oh, updated one more time because i forgot our group double-accounting
18:41 <jeblair> pabelanger: btw, you can #status in any statusbot channel (incl this one)
18:45 <clarkb> jeblair: one more thing on https://review.openstack.org/#/c/321615/4 - I don't think the regexes currently match up between ansible and puppet, I left a comment with what a possible regex would be
18:46 <jeblair> clarkb: oh yep
18:47 <jeblair> done
19:01 *** baoli has quit IRC
19:01 *** baoli has joined #openstack-sprint
19:08 <rcarrillocruz> back
19:08 <rcarrillocruz> sup pabelanger, how many workers have been migrated?
19:12 <anteaya> rcarrillocruz: he is getting his daughter
19:13 <rcarrillocruz> cool, thx
19:13 <anteaya> welcome
19:13 <nibalizer> yall are doing a great job!
19:14 <nibalizer> sorry im not helping!
19:14 <anteaya> nibalizer: I think you have helped in a few key moments
19:19 <clarkb> nodepool needs a hug
19:21 <anteaya> <hug>
19:23 *** baoli has quit IRC
19:29 <pabelanger> jeblair: neat, TIL
19:30 <pabelanger> rcarrillocruz: 10 of 20 ATM
19:30 <pabelanger> going to finish them off using launch-node.py
19:30 <pabelanger> plan to do some more testing of cloud-launcher once I'm finished
19:34 <rcarrillocruz> you remember what kind of issues you had earlier so I could poke?
19:35 <pabelanger> rcarrillocruz: ssh host checking was 1
19:35 <pabelanger> let me see if I have backscroll of the other
19:35 <pabelanger> I did a quick hack to disable it via env in the playbook
19:36 <clarkb> my logstash fixes are still hanging out in check
19:37 <anteaya> playing cards, drinking beer
19:39 <pabelanger> clarkb: Ya, don't see any stale nodes.  Just busy today it seems
19:41 *** baoli has joined #openstack-sprint
19:45 <pabelanger> #status log logstash-worker11.openstack.org now running ubuntu-trusty and processing requests
19:45 <openstackstatus> pabelanger: finished logging
19:46 <clarkb> pabelanger: busy, and osic and bluebox are basically offline due to fip things
19:46 <clarkb> or rather osic is, not sure about bluebox
19:47 <pabelanger> clarkb: Ya, looking forward to when we fix shade
19:47 <fungi> looks like we're back to a working ansible version on puppetmaster again
19:47 <fungi> thanks jeblair!
19:48 <clarkb> fungi: jeblair: did someone manually downgrade or did it sort itself out?
19:48 <fungi> i don't know, i simply checked `pip list|grep ansible`
19:49 <pabelanger> not I
19:52 <pabelanger> clarkb: we do have a large number of servers in delete state on nodepool.o.o, about 132
19:52 <pabelanger> 86 in OSIC alone
19:53 <pabelanger> so, not that bad, if we account for the FIP issue
19:53 <clarkb> pabelanger: they are in that state due to the fip issue
19:53 <clarkb> pabelanger: they take an hour to build, time out, then get deleted
19:53 <pabelanger> we seem to be on an uptick of deleting nodes however: http://grafana.openstack.org/dashboard/db/nodepool
19:54 <pabelanger> Hmm, something up with ORD: http://grafana.openstack.org/dashboard/db/nodepool-rackspace
19:54 <pabelanger> 13 min time-to-ready ATM
19:55 <pabelanger> status.rackspace.com is reporting some ORD storage maintenance today:
19:55 <pabelanger> https://status.rackspace.com/
20:02 <fungi> unfortunately my first attempt at booting paste01 failed, so i'm rerunning with --keep and going out for a walk
20:02 <fungi> bbiaw
20:08 <pabelanger> I'm having some issues using ansible-playbook on puppetmaster.o.o
20:08 <pabelanger> looks to be related to the inventory
20:09 <anteaya> fungi: enjoy your walk
20:11 <pabelanger> I suspect JJB is running on the jenkins servers, which is affecting it
20:12 <clarkb> can has approval for https://review.openstack.org/#/c/321778/ and its dependency?
20:13 <jeblair> fungi, clarkb, pabelanger: neat.  i did not manually fix puppetmaster, guess it fixed itself
20:13 <clarkb> with that stack in I will retry making logstash.o.o
20:13 <jeblair> clarkb: you might use zuul enqueue
20:13 <clarkb> jeblair: I don't think I need enqueue, they both passed testing
20:13 <clarkb> just need review and approvals
20:13 <jeblair> clarkb: oh, thought you mentioned something being stuck in check
20:14 <clarkb> they were, I occupied my time with other stuff so it was fine
20:15 <pabelanger> http://paste.openstack.org/show/505734/
20:15 <pabelanger> that's the error I am seeing now when I run ansible-playbook
20:15 <pabelanger> hoping it fixes itself
20:16 <clarkb> hrm, looks like it can't talk to osic?
20:17 <clarkb> maybe try using openstackclient against the same clouds.yaml
20:18 <pabelanger> clarkb: ya, looks to be an issue
20:18 <pabelanger> going to hop into #osic to see what is going on
20:18 <clarkb> ok
20:23 <pabelanger> clarkb: all quiet in #osic. Do we have another contact besides cloudnull?  I believe he is on vacation today
20:24 <pabelanger> additionally, guess we found a bug in the openstack inventory
20:25 <pabelanger> since losing a cloud stops our puppet wheel
20:26 <clarkb> oh, you know what
20:26 <clarkb> I think the ssl cert had a really short time before expiry
20:26 <clarkb> pabelanger: maybe check if the ssl cert for it is still good?
20:27 <pabelanger> Hmm
20:28 <pabelanger> Issued On Thursday, May 26, 2016 at 2:27:00 PM according to chrome
20:29 <pabelanger> I'm also using python-openstackclient
20:30 <clarkb> ya, so thats brand new, I wonder if related
20:30 <pabelanger> https://bugs.launchpad.net/python-openstackclient/+bug/1447704
20:30 <openstack> Launchpad bug 1447704 in python-openstackclient "token issue fails for keystone v2 if OS_PROJECT_DOMAIN_NAME or OS_USER_DOMAIN_NAME are set" [Medium,Fix released] - Assigned to Hieu LE (hieulq)
20:30 <pabelanger> looks like the same backtrace I am seeing
20:31 <pabelanger> DiscoveryFailure: Could not determine a suitable URL for the plugin
20:32 <clarkb> maybe we updated other libs?
20:32 <pabelanger> python-openstackclient 2.5.0 was just tagged 1 hour ago
20:32 <pabelanger> with a fix
20:32 <pabelanger> let me test in a venv
20:35 <pabelanger> same issue
20:35 <pabelanger> and --insecure doesn't work either
20:37 <pabelanger> http://paste.openstack.org/show/505739/
20:37 <pabelanger> SNIMissingWarning is new to me
clarkbpabelanger: thats part of urllib3 trying to be a good citizen by annoying its users in hopes they will get the services they talk to to fix their ssl certs20:38
jeblairoh, it's over here :)20:38
clarkber not SNI there is a different one. In any case urllib3 has a handful of warnings that are "hey user bad things that you probably can't easily fix yourself"20:38
pabelangerright20:39
jeblairOpenStackCloudException: error fetching floating IPs list: 503 Service Unavailable20:40
jeblairThe server is currently unavailable. Please try again at a later time.20:40
jeblairthat's what running nodepool is seeing20:40
jeblairor at least one of the errors20:40
pabelangerso, maybe they are down20:40
jeblairof course, it may have something cached20:40
pabelangerI have a query into #osic20:40
jeblairwe might see something different if we restart nodepool20:40
pabelangerjeblair: clarkb: seems to be related to the new SSL cert, according to #osic. They are working on it20:49
clarkbfun20:50
clarkbpabelanger: I am guessing that broken ansible inventory is preventing the puppet modules from updating on puppetmaster because we use ansible to do that20:50
jeblairoh, what's wrong with ansible inventory?20:51
clarkbjeblair: the osic thing20:52
clarkbit doesn't gracefully handle clouds being gone20:52
jeblairoh20:52
pabelangerclarkb: ya20:52
pabelangerI don't know how to tell openstack inventory to skip osic20:52
jeblairwe could remove it from that clouds.yaml20:52
jeblairit might be nice to fix it so that it fails gracefully, but some day we're going to have to think about what that means for a system that wants to automatically create servers that don't exist20:53
pabelangerYa, I think commenting it out for the moment is our fix20:54
clarkbit will put itself back in if you don't disable puppet on the puppetmaster20:54
jeblairi will do these things20:54
pabelangerthanks20:55
jeblair#status log puppet disabled on puppetmaster (for the puppetmaster host itsself -- not globally) and OSIC manually removed from clouds.yaml because OSIC is down which is causing ansible openstack inventory to fail20:57
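(For reference, dropping a cloud from the inventory amounts to commenting out or deleting its entry in clouds.yaml. A hypothetical sketch — cloud names and auth values here are invented, not infra's real file:)

```yaml
clouds:
  rax:
    profile: rackspace
    auth:
      username: infra
      project_name: infra
  # osic removed while its endpoint's SSL cert is broken; the ansible
  # openstack inventory otherwise fails hard when any one cloud is down.
  # osic:
  #   auth:
  #     auth_url: https://cloud1.osic.example.org:5000/v2.0
```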
openstackstatusjeblair: finished logging20:57
clarkbtyty20:57
clarkbnew logstash.o.o launching now21:03
pabelanger#osic says they are reverting the SSL cert now21:03
jeblairpuppet run all is running21:03
pabelangernodepool.o.o is building nodes again in OSIC21:04
pabelanger#status log logstash-worker12.openstack.org now running ubuntu-trusty and processing requests21:05
openstackstatuspabelanger: finished logging21:05
fungibkero: if i try to launch paste01.openstack.org it doesn't puppet sufficiently for me to even be able to log into it, so no idea what's wrong there. if i launch paste.openstack.org it works fine: 2001:4800:7817:104:be76:4eff:fe06:83b8, 23.253.238.18721:06
fungi(for definitions of fine where i needed to `sudo start openstack-paste` because it doesn't start automagically, that is)21:07
bkerofungi: Huh, let me check the service resource again21:08
fungiso anyway, i'm inclined to just replace it with the trusty one i booted as paste.o.o and have tested and confirmed to be up and working/serving content from trove21:12
bkerook21:13
* bkero looks at the puppetboard run anyway to see if it's something we need to be worried about21:13
clarkbpabelanger: I got a whole bunch of http://paste.openstack.org/show/505745/21:14
clarkbthat almost seems like an issue with new ansible21:15
clarkbbut pip says 2.0.2.0 is installed21:15
clarkbin any case I appear to have a new logstash.o.o that didn't break during puppeting21:16
clarkbshould I go ahead and use it or debug the above issues first?21:16
jeblairclarkb: yeah, that's the same inventory error.  apparently i failed at preventing it from being reverted21:17
fungibkero: i can retry with paste01 one more time just to confirm it wasn't a fluke21:17
jeblairclarkb: because it's 'localhost'21:18
clarkbaha21:19
bkerofungi: ok, do you know where i can see the report made by the puppet run?21:19
clarkbjeblair: do you think I should rerun? I do not know which step requires the inventory, but the host is built21:19
clarkbjeblair: its pretty cheap to delete and rebuild for safety though21:19
jeblairclarkb: i think if you run with our ansible kick thing, it should be fine21:19
fungibkero: nope. i don't think launch-node does trigger a report?21:19
clarkbjeblair: I am not sure I know what that is21:19
bkeroOh, hrm21:20
fungibkero: i'm really not sure if it does or not anyway21:20
jeblairclarkb: tools/kick.sh (which runs the adhoc playbook)21:20
bkeroclarkb: any clue if launch-node generates puppet reports?21:20
fungibkero: puppet apply logs in syslog, but since we don't get that back through ansible, if it doesn't puppet far enough to set up my account i can't ssh in to look at the errors21:20
clarkbbkero: no idea21:20
clarkbjeblair: oh puppet ran and everything just fine21:21
bkeroi tested locally, but obv the environment is different21:21
clarkbwhich is why I am confused about why it needs to execute the inventory script, maybe to update the cache21:21
jeblair(though, looking at that, i wonder if disabling localhost (which i have now done) will have further adverse effects)21:21
jeblairclarkb: oh!  yes, launch-node does a cache flush21:21
jeblairclarkb: i thought it just removed, but maybe it repopulates too?21:21
clarkbya I am guessing that is what it is trying to do21:21
clarkbI am going to just redo since its quick and low cost and will ensure all that data is correct21:22
jeblairmanage-projects is running on review.o.o and taking a seriously long time21:22
jeblairclarkb: it runs 'expand-groups' after clearing the cache21:22
jeblairclarkb: which uses ansible to list hosts, so yeah21:23
clarkbok rebuilding now21:24
bkeroI'm surprised puppet could be borked enough to at least not set up users. I wonder if install_puppet had a network hiccup or something21:24
pabelangerclarkb: that usually happens when ansible inventory is doing something21:25
pabelangerclarkb: I don't know what, but it eventually fixes itself21:26
fungibkero: yeah, the failure is consistent. this is the error i get back from the launch script:21:29
fungifatal: [paste01.openstack.org]: FAILED! => {"changed": false, "disabled": false,21:29
fungi "error": true, "failed": true, "msg": "puppet did not run", "rc": 1, "stderr": "", "stdout": "", "stdout_lines": []}21:29
fungibkero: and if i try to ssh to the ipv4 or ipv6 address of the kept (broken) server, it prompts me for a password implying it didn't get far enough to puppet my ssh key on there21:30
bkerofungi: the comments in ansible-puppet seem to indicate that it's a compilation failure.21:31
fungiand if i make the hostname paste.openstack.org instead, it's fine21:31
clarkbwell it failed again, I am just going to ignore that stuff for now and move forward with finishing this server replacement21:31
clarkbany objections?21:32
jeblairclarkb: sounds good21:33
bkerofungi: The only difference I can think of is if the "vhost_name" parameter makes catalog application fail :/21:34
fungibkero: which is odd since it's a class parameter passed directly to http://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/manifests/paste.pp#n621:36
bkerofungi: Yep. Shouldn't make a diff.21:36
clarkbDNS updates are done21:37
clarkbwill bounce iptables on jenkinses and logstash workers as soon as the new stuff resolves21:37
pabelangerokay, moving on to logstash-worker13.o.o replacement21:39
clarkbactually I think only the jenkinses need it21:39
clarkbsince the workers connect to it21:39
*** rfolco has quit IRC21:43
clarkbhttp://logstash.openstack.org/ forbidden!21:47
clarkbI think this means I need to do the file stuff for 2.4?21:47
clarkbeverything else seems to be functioning21:49
fungiclarkb: i get a ton of the same errors you pasted in http://paste.openstack.org/show/505745/ every time i successfully launch a server too, so it's not just you. nobody else seemed to be able to reproduce it, but i guess you can21:52
fungialso, i'm updating dns for the new paste.o.o now21:53
clarkbI am making sure all the workers are talking to new logstash.o.o then will work on fixing apache config, then can delete old one21:54
fungi#status log paste.openstack.org now running ubuntu-trusty and successfully responding to requests21:59
openstackstatusfungi: finished logging21:59
fungii wonder if we should take the downtime during the gerrit rename maintenance window as an opportunity to replace static.o.o22:01
anteayafungi: earlier there was a thought that when zuul was being replaced, that downtime might make a good static replacement window22:03
anteayabut I don't know if zuul has already been replaced22:03
anteayaif not, two windows to replace static22:03
fungiyeah, that's a possibility, or we also do zuul during that same window (but zuul seems like it would be potentially quicker to replace?)22:03
anteayayeah the gerrit downtime is not until a week tomorrow22:04
clarkbfungi: pabelanger mentioned doing zuul at the same time22:04
anteayabut I'm not on the root end so whatever rooters want22:04
anteayayeah that's right it was pabelanger's idea22:05
anteayasorry about that, forgot who mentioned it22:05
fungiwe've made huge progress this week, so if some stuff gets pushed off i won't object22:05
fungii mean, we've already identified at least a couple we won't be able to migrate for a while22:05
fungi(planet, wiki)22:05
anteayapleia2: was still working on planet last I understood22:06
anteayais there more to the tale?22:06
clarkbfungi: you were saying we did the Require all granted stuff in two different ways? one of them is using mod version to switch on including it, the other is what? I need to add it to kibana's vhost22:06
anteayaI think wiki was the only server that didn't get touched or talked about22:06
fungiclarkb: yeah, i don't remember and would resort to digging up examples22:06
anteayaresort!22:08
anteayaresorts are nice I hear22:08
anteayabeaches and so on22:08
clarkbnow to find the conditional for installing mod version22:08
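(The mod_version approach being discussed wraps the 2.4-only directive in an `<IfVersion>` guard so one vhost template works on both precise's Apache 2.2 and trusty's 2.4. A generic sketch — the directory path is assumed, this is not the exact kibana vhost:)

```apache
<Directory /opt/kibana>
  <IfVersion >= 2.4>
    Require all granted
  </IfVersion>
  <IfVersion < 2.4>
    Order allow,deny
    Allow from all
  </IfVersion>
</Directory>
```

The guard only works if mod_version itself is loaded, hence the hunt for the conditional that installs it.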
pabelanger2016-05-26 22:08:45,320 Error connecting to logstash.openstack.org port 473022:08
pabelangerthat is what I am seeing now22:08
clarkbpabelanger: you have to bounce iptables on logstash.o.o22:09
clarkbafter dns is updated22:09
pabelangerclarkb: did22:09
pabelangerwell, I think I did22:09
clarkbpabelanger: on the new one?22:09
pabelangerI have to check, I wrote a quick ansible playbook to do the bouncing22:10
clarkb23.x.y.z is new. 166.x.y.z is old22:10
pabelanger08c356e5-d225-4163-9dce-c57b4d68eb55 : ok=0    changed=0    unreachable=1    failed=022:10
clarkbpabelanger: thats the right uuid, maybe you raced the record timing out?22:11
clarkbin any case the other 19 are working22:11
pabelangerokay better22:11
pabelangerI had to fix SSH host keys on puppetmaster.o.o22:11
pabelanger#status log logstash-worker13.openstack.org now running ubuntu-trusty and processing requests22:12
openstackstatuspabelanger: finished logging22:12
fungii'm not seeing any new traffic to the old paste server for ~10 minutes now, so i'm going to halt and start snapshotting it22:12
clarkbpabelanger: which repo did you do the mod version thing in again?22:12
fungiup 449 days22:12
fungisorry to see some of these uptimes die22:12
pabelangerclarkb: puppet-graphite I think22:13
pabelangerya, that's right22:13
clarkbyup found it, thanks22:15
fungithe status-precise-backup image exists now, so i'm deleting the old offline server instance22:15
pabelangerack22:16
clarkbhttps://review.openstack.org/321875 should be all that is needed to finish up logstash.o.o move22:17
bkerofungi: odd. I'm trying to replicate again locally using a simple manifest. Seems to apply fine. O_o http://paste.openstack.org/show/505749/22:24
bkerocreating a user for you, write perms, etc22:24
bkeroon a clean trusty host :/22:25
pabelanger#status log logstash-worker14.openstack.org now running ubuntu-trusty and processing requests22:33
openstackstatuspabelanger: finished logging22:33
clarkbpabelanger: if now is a good time for reviews https://review.openstack.org/#/c/321875/ has your name on it :)22:34
pabelangerclarkb: WFM22:35
pabelangerI can +A if needed22:35
clarkb+A would be great22:35
bkeroDoes openstack-infra's hiera limit access to variables based on node name?22:43
clarkbbkero: ya ansible only copies the data that belongs to a specific host or group22:44
bkeroclarkb: I'm wondering. http://paste.openstack.org/show/505752/22:45
bkeropaste01.openstack.org fails, but paste.openstack.org succeeds (as trusty)22:45
bkeroI'm wondering if 1) that hostname regex isn't matching like I assume it is, or 2) hiera values aren't accessible.22:46
clarkboh ya the hostname changes so the ansible hiera matching stuff won't pick it up22:46
bkeroWould that cause ansible-puppet to hit this? https://github.com/openstack-infra/ansible-puppet/blob/master/library/puppet#L21922:47
bkeroThat's what's happening22:47
clarkbI would expect it to run and fail on hiera lookups22:48
bkerofungi's user isn't even being created, so it can't be getting terribly far based on my local run. http://sprunge.us/hOGO22:50
clarkbwhat is hiera's behavior when there is no hieradata?22:51
clarkbmaybe its bailing out really early?22:51
bkeroThat's why the hiera() function has 2 parameters22:52
bkeroif it fails it returns the 2nd22:52
bkeroif it doesn't have a 2nd...dunno22:52
clarkbpossible it also has different behavior when the files that should have the data don't exist22:53
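(A reminder of the lookup semantics under discussion — a minimal Puppet sketch with invented key names: `hiera()` returns its second argument when the key is missing, while a single-argument lookup of a missing key, or of hieradata that was never copied for that hostname, aborts catalog compilation before any resources are applied, which matches the "puppet did not run" symptom:)

```puppet
# Missing key, default supplied: compilation continues.
$db_pass = hiera('paste_db_password', 'changeme')

# Missing key, no default: compilation fails outright, so nothing
# (including user accounts and ssh keys) ever gets applied.
$db_host = hiera('paste_db_host')
```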
bkerofungi: I'm guessing that's it ^22:53
bkeroclarkb: is infra's hiera data in a secret repo?22:54
clarkbbkero: the secret info is, there is a public set of data in system-config too22:54
bkeroclarkb: I'm guessing the new hostname should probably be in system-config/hiera/common.yaml22:55
bkerofungi: ^22:56
bkero(sorry for double-ping)22:56
clarkbit should go in cacti hosts maybe, depending on how dns is configured22:56
clarkbbut otherwise I don't think it needs anything there22:56
bkeroCacti? For paste.o.o?22:57
clarkbya that list there tells cacti which hosts to poll22:57
bkeroAh22:58
pabelanger#status log logstash-worker15.openstack.org now running ubuntu-trusty and processing requests23:01
*** rfolco has joined #openstack-sprint23:01
openstackstatuspabelanger: finished logging23:01
clarkbok kibana apache 2.4 fix has merged, almost done here23:01
fungibkero: clarkb: ooh! great point. our host-based hiera split is almost certainly at odds with the idea that we'll have host name patterns in most cases going forward23:07
fungii didn't even consider that23:07
fungii'll follow up to the ml thread with that as yet another caveat23:08
bkerofungi: Don't know if you want another review to add that to the correct hiera groups, or to just go with the old naming scheme and adding a Node tag to it.23:08
fungiwe can hash it out on the ml. there are more stakeholders potentially following that thread23:08
bkerook23:08
bkeroSounds good23:09
*** asselin_ has quit IRC23:23
pabelanger#status log logstash-worker16.openstack.org now running ubuntu-trusty and processing requests23:29
openstackstatuspabelanger: finished logging23:29
clarkbI am not seeing puppet update logstash.o.o with my apache 2.4 fix, guessing ansible + puppet are still both unhappy?23:29
clarkbmaybe I didn't get the key accepted like I thought I did23:29
clarkbno ssh works23:29
pabelangerstarting wheel again now23:30
clarkblogstash.o.o doesn't show up in the puppet run all log23:32
pabelangerclarkb: try the UUId23:32
pabelangersuspect there are 2 logstash.o.o servers ATM23:32
clarkboh right23:32
clarkboh hrm it looks like our puppet runs are taking a large amount of time and we may not be updating every 15 minutes currently23:34
clarkbI need to practice patience23:34
pabelangerwe usually get back to back puppet runs23:37
fungithe snapshot for the old paste.o.o instance is still 0% complete even after i went out to dinner and came back23:37
fungiopenstack has bugs?23:37
pabelangerJJB is usually the reason we don't23:37
clarkbfungi: are we snapshotting for safety? or are you wanting to boot off of snapshot?23:38
fungii'm snapshotting before deleting stuff that's not farm-style23:38
fungijhesketh talked me into it23:39
fungiit _looks_ like the ci-backup-rs-ord migration completed23:44
fungithough there are two still listed in the server list, the one on the new flavor seems to have inherited the ip addresses we put in dns and the legacy one was assigned different addresses23:45
clarkbfatal: [git06.openstack.org]: FAILED! => {"failed": true, "msg": "'dict object' has no attribute 'gitinfo'"}23:48
clarkbthat seems unhappy23:49
clarkband the puppet-kibana module doesn't appear to have updated on the puppetmaster which means it isn't getting updated on logstash.o.o23:49
clarkbbut I am quickly running out of steam for the day, may have to pick that up in the morning23:49
bkerowomp womp missing ansible dict elements23:49
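(That git06 failure is the classic Jinja2 symptom of dereferencing a host variable that was never set. A plain-Python sketch of the failure mode and the defensive form — not infra's actual playbook code, names invented:)

```python
# Host facts as a dict; 'gitinfo' was never populated for this host.
host_vars = {"hostname": "git06.openstack.org"}

# Direct subscripting of the missing key raises, which Ansible surfaces
# as "'dict object' has no attribute 'gitinfo'".
try:
    gitinfo = host_vars["gitinfo"]
except KeyError:
    gitinfo = None

# The defensive form supplies a default instead of failing the run
# (in a template, the equivalent is the Jinja2 `default` filter).
gitinfo = host_vars.get("gitinfo", {})
print(gitinfo)  # -> {}
```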

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!