Monday, 2017-12-11

*** jkilpatr has joined #openstack-sprint00:09
*** baoli has quit IRC03:34
*** skramaja has joined #openstack-sprint05:26
*** ianychoi_ is now known as ianychoi06:12
fricklerianw: hi, sorry for being late, is there anything left I could help you with? or are you done for today?08:37
* frickler is seeing messages of type "< openstackrecheck> Console logs not available after ..." again this morning for the first time in weeks, does anyone know what happened there?09:04
ianwfrickler: hey, thanks for the reviews10:47
ianwyou can pick out any parts, or wait for clarkb etc10:48
ianwif you want to jump further into ethercalc, feel free10:48
ianwbasically, ssh to the testing host 23.253.119.134 and "cd /opt/system-config/production; puppet apply -v --modulepath=modules:/etc/puppet/modules manifests/site.pp" and keep fixing stuff till it works :)  there's some notes on the etherpad, for sure our puppet needs to ship a .service file instead of an upstart, for example10:50
ianwi'm off but will jump back in tomorrow!10:50
fricklerI've looked at your notes for ethercalc and was wondering whether we should do a systemd service file directly10:51
fricklerotherwise I can go on with iterating. do I need to become root for that?10:52
ianwfrickler: we'll want to replace https://git.openstack.org/cgit/openstack-infra/puppet-ethercalc/tree/templates/upstart.erb (and all the stuff that writes that out) with a .service file10:54
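A minimal sketch of what that .service file could look like, written here as a heredoc so it can be tried by hand on the test host first; the user, install path and lack of extra ethercalc flags are assumptions, not the final puppet template:

```
sudo tee /etc/systemd/system/ethercalc.service <<'EOF'
[Unit]
Description=EtherCalc collaborative spreadsheet
After=network.target

[Service]
User=ethercalc
ExecStart=/usr/local/bin/ethercalc
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now ethercalc
```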
ianwfrickler: yep; for these hosts log in as yourself and sudo -s, for ci hosts you log in as root@ (your key should be deployed, i went through all that today :)10:55
fricklerianw: yeah, I'm on the host already, will go ahead and try to build a service definition10:56
ianwthat'll be step 1 ... the nodejs deployment stuff might need fiddling.  i think that will work out common for etherpad too, so that's good.  it's just a matter of trying & fixing failures till it works really10:58
ianwi thought this would be an easy one :)  status.o.o is probably *really* an easy one ... but you never know :)10:59
fricklereverything looks easy from the outside probably ;)11:01
*** skramaja_ has joined #openstack-sprint11:20
*** skramaja has quit IRC11:21
*** skramaja has joined #openstack-sprint11:25
*** skramaja_ has quit IRC11:25
*** jkilpatr has quit IRC11:37
*** ianychoi has quit IRC11:37
*** ianychoi has joined #openstack-sprint11:50
*** jkilpatr has joined #openstack-sprint12:11
*** baoli has joined #openstack-sprint13:10
*** clarkb has joined #openstack-sprint13:15
clarkbfrickler: dmsimard the puppetmaster:/etc/puppet/hieradata/production git repo is where we keep the root non public hiera data13:20
fricklerclarkb: so do I connect with my account and use sudo then?13:20
clarkbfrickler: correct13:20
clarkbthe reason our email addresses are not public is that we found people were using our puppet modules with our email addresses still configured as the root contact, resulting in us getting their root email13:21
fricklerclarkb: can you take a look at elasticsearch in the meantime? seems the cluster is stuck13:21
fricklerprobably since ian did some updates earlier13:21
clarkblooks like es02 and es04 did not have their elasticsearch processes running (we don't let them start on boot to give us more control over cluster management) so I started the service on those two hosts13:22
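For reference, bringing those processes back and checking the cluster amounts to something like this on each affected host (standard elasticsearch service name and default REST port assumed):

```
sudo service elasticsearch start
curl -s 'http://localhost:9200/_cluster/health?pretty'   # wait for status to leave red
```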
clarkbin general editing this hiera repo is what we'll do to update ssl certs or db credentials and so on13:23
clarkbso adding yourself to the sysadmins list is a good first exposure to where that lives and how to update it13:23
fricklero.k., so I'll start with that now13:23
clarkband since it's a shared repo, when we edit it we'll usually drop a note in IRC to let others know not to conflict with us13:25
clarkbfrickler: looks like you are all done?13:27
frickleroh, I'm seeing a note in the log about issues with google mail13:27
fricklermy address is also gmail rebranded13:27
fricklermaybe I should set up something different from my work email anyway13:28
fricklerbut for now I'm done with editing, yes13:28
clarkbit may not be an issue for you, I think pabelanger's red hat email is gmail too13:28
clarkbI personally had problems with gmail and switched away from it though13:29
clarkbfrickler: dmsimard the next thing I had in mind was to replace a logstash-workerNN.openstack.org node each, since those are straightforward to replace and should let us focus more on process than on specific service details13:30
fricklermy private stuff is hosted at hetzner.de but I need to move things around there a bit first13:30
clarkbfrickler: dmsimard if you haven't seen it yet you probably want to start at https://git.openstack.org/cgit/openstack-infra/system-config/tree/launch/README13:30
clarkbsystem-config/launch is our openstack cloud VM launching tool for booting new instances in clouds13:31
clarkbwhen executed from the puppetmaster it can make use of our clouds.yaml on that node making the process fairly straightforward13:32
clarkbI personally have a git clone of system-config in my homedir on the puppetmaster that I run that from13:32
fricklerI noticed that I'm in admin group but not puppet. Is the idea to set this up manually when needed or should this get better automation?13:32
clarkbfrickler: I think we've always just run the manual group addition like in that doc, but we probably could automate that instead13:33
clarkbif you want to work on a change to automate that I think it would be a good addition13:33
clarkb(but maybe for later so we can focus on launch node things now)13:34
frickleryeah, I'll put it on my todo list13:34
*** skramaja has quit IRC13:36
fricklerthe pip install is failing for me https://git.openstack.org/cgit/openstack-infra/system-config/tree/launch/README#n2113:37
clarkbfun, is it a dependency issue?13:37
fricklerfailing to build multiple wheels13:37
* clarkb works to reproduce13:37
fricklerhttp://paste.openstack.org/show/62861513:38
fricklerthats the full log13:38
clarkblooks like it failed to find Python.h which comes from python-dev13:40
clarkbI think this may be a new dependency or we otherwise were able to pull wheels for it in the past?13:40
*** ianychoi has quit IRC13:40
dmsimardI'm here for around ~30 minutes before I have to afk briefly, going to look at step 0 and bootstrapping13:40
clarkboh wait I see13:40
clarkbfrickler: we have python 2 dev files installed but not python3 and virtualenv defaulted to python3 for some reason13:41
clarkbfrickler: I'm testing with `virtualenv -p python2 launch-env`13:42
Shrewsclarkb: i'm probably going to need the same bootstrapping as frickler and dmsimard13:42
clarkbShrews: good morning, feel free to follow along, ask questions, etc. We have plenty of logstash worker nodes so there should be plenty of room.13:43
*** baoli has quit IRC13:43
clarkbShrews: dmsimard has indicated he is editing the sysadmins list in hiera and since that is a shared git repo we will have to wait for him to indicate completion before you add yourself13:43
clarkbShrews: the file for that is puppetmaster.openstack.org:/etc/puppet/hieradata/production/common.yaml when dmsimard is done13:43
Shrewsk k13:43
clarkbbasically you edit and commit as root and sign off on the change with your name in the commit message13:44
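The edit-and-sign-off flow described above looks roughly like this (repo path from the conversation; the commit message wording is only an example):

```
ssh puppetmaster.openstack.org
sudo -s
cd /etc/puppet/hieradata/production
$EDITOR common.yaml                              # e.g. add yourself to the sysadmins list
git add common.yaml
git commit -m "Add <nick> to sysadmins list"     # sign off with your name/nick in the message
```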
dmsimardclarkb: I see symlinks from <nickname> to production13:44
clarkbdmsimard: yes, that is an artifact of puppet environments13:44
dmsimardclarkb: is that a system used to "lock" ? i.e, we grep to see if there is a user doing it ?13:44
dmsimardoh, ok13:44
clarkbI've personally not used puppet environments any time recently because they are often quite clunky (and I think ansible-puppet may have mostly negated their usefulness by local applying everything)13:45
clarkbInstead I do my best to run puppet locally until I'm happy with it (which is probably a better way to do things anyways)13:45
clarkbfrickler: yes virtualenv -p python2 launch-env seemed to work13:45
clarkbfrickler: that forced virtualenv to make the env with python2 instead of python313:45
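Putting the README step and the workaround together, the working sequence on the puppetmaster was roughly the following (the python2 dev headers were already present there; keep whatever version pins launch/README specifies, which are omitted here):

```
virtualenv -p python2 launch-env             # a bare "virtualenv" picked python3 and failed to build wheels
./launch-env/bin/pip install ansible shade   # use the pinned versions from launch/README
```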
dmsimardclarkb: do people typically remain as their own user or they sudo as root ? i.e, I'd want to move to /etc/puppet/hieradata13:45
clarkbdmsimard: I think it's a mix. I know pleia2 for example was really good about always sudoing everything and never properly becoming root. I came from an env where we didn't have sudo and only had proper root, so I end up as proper root more often than is probably good13:46
fricklerclarkb: confirmed, ansible installed fine now13:47
dmsimardoh wow, nano as default editor on git commit.. that's something I haven't seen in a long time13:47
dmsimard:D13:48
fricklerdmsimard: I stumbled over that too ;)13:48
clarkbdmsimard: I think that is how we've avoided the vi(m) vs emacs battle :P13:48
clarkbfrickler: great, I'll push a patch up for that now and then add you two to the infra root gerrit group so you can review it for me :)13:48
dmsimardShrews: I'm done editing hieradata13:48
Shrewsdmsimard: k13:49
Shrewsdmsimard: i noticed you didn't sign your commit. want to amend before I change anything?13:50
dmsimardShrews: let me see..13:51
clarkbfrickler: https://review.openstack.org/527092 and I will have gerrit groups updated momentarily13:51
dmsimardShrews: by sign you mean append my nickname to the commit description ?13:51
dmsimardShrews: or gpg sign ?13:52
Shrewsdmsimard: just a nick in the commit msg13:52
dmsimardShrews: ok, I added it13:52
dmsimardShrews: er, hang on..13:52
clarkb13:50:23   Shrews | dmsimard: k13:53
clarkbsilly weechat mouse mode13:54
dmsimardIt's picking up the author as "Your Name <you@example.com>"13:54
dmsimard¯\_(ツ)_/¯13:54
dmsimardfixing that13:54
dmsimardShrews: ok, go13:55
clarkbfrickler: dmsimard you have been added to the infra-core group in gerrit. So you can now +/-2 +/-A changes like https://review.openstack.org/52709213:56
fricklerclarkb: already done ;)13:57
clarkbdmsimard: you'll want to read https://git.openstack.org/cgit/openstack-infra/system-config/tree/launch/README next and follow the steps through to line 21 (but using my edit in change 527092)13:57
Shrewsalright, done. i don't have any option but a gmail address, so *shrug* if there's a problem with that13:57
clarkbShrews: ^ you'll want to follow that too13:57
dmsimardyou'd think that "virtualenv" would be py2 and "virtualenv-3.4" would be py3 :D13:58
clarkbya I'm not sure why it's picking python3 yet13:58
clarkbI think because it got installed under python3?13:58
dmsimardahhhh13:59
dmsimardThe default is the interpreter that virtualenv was installed with (/usr/bin/python3)13:59
clarkbI'm going to make tea while everyone makes virtualenvs14:00
* Shrews is now virtual14:01
dmsimardThose are some kind of old versions of shade and ansible by now -- Ansible 2.1 is EOL actually. Are they pinned for a good reason ?14:01
clarkbdmsimard: they are pinned because releases of both tend to break things. I'm not sure that they are pinned to those specific versions for a good reason though14:02
clarkbI expect that ansible 2.3 would work as well14:02
dmsimardclarkb: yeah that's totally fair, I would up the pin. I'll guinea pig ?14:02
clarkbdmsimard: maybe after the first round so that we can hopefully avoid problems first time through?14:03
dmsimardsure14:03
clarkbwhen we upgrade nodes typically what that actually means is replacing the instance with a new instance running newer software14:04
clarkbI only know of one case where we upgraded in place which was the lists.openstack.org upgrade and we did that to keep the IP and its reputation for sending email14:05
clarkbto upgrade logstash worker nodes we will be using the replace method14:05
clarkbSo the next step is looking at the old instance(s) to see what flavor/size/distro we need `openstack --os-cloud openstackci-rax --os-region DFW server show logstash-worker01.openstack.org` should be runnable as a normal user on puppetmaster to give you that info14:06
clarkbin this case we see the flavor is 'performance1-4' and it is indeed a trusty node so we will want to replace it with a 16.04 xenial node14:06
Shrewsaye14:07
dmsimardclarkb: should we grab a copy of clouds.yaml from root and put it in our home directory ?14:07
clarkbdmsimard: no, you should probably use the root copy; it should be readable by your user14:07
* dmsimard looks14:08
clarkbdmsimard: the root copy is the default for openstack client and this way we can keep it up to date more easily14:08
clarkbyou can also do things like flavor list and image list to get a sense of what flavors and images are available14:08
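For example, all of these run as a normal user on the puppetmaster against the control-plane account (cloud and region names taken from the conversation; the grep is just a convenience):

```
openstack --os-cloud openstackci-rax --os-region DFW server show logstash-worker01.openstack.org
openstack --os-cloud openstackci-rax --os-region DFW flavor list
openstack --os-cloud openstackci-rax --os-region DFW image list | grep -i xenial
```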
clarkbone piece of information that the launch README doesn't really call out, and that is probably worth being more explicit about, is that we have two tenants/users/projects/whateveritscalledtoday14:09
clarkbwe have the openstackci account and the openstackjenkins/openstackzuul account. openstackci is where we run the control plane servers and openstackjenkins/openstackzuul is what we give nodepool access to14:09
Shrewsyeah, that's kinda important14:10
clarkbin this case we are using the openstackci account because logstash workers are in the control plane but when you work with nodepool nodes you will use the openstackzuul/openstackjenkins account14:10
dmsimardclarkb: yeah I guess that's why I was asking for the clouds.yaml -- in order to use openstackclient "freely"14:10
clarkbdmsimard: you should be able to use it freely already14:11
clarkbdoes the command I pasted above work for you?14:11
clarkb(it should work as is)14:11
Shrewsi don't see the other account(s) in clouds.yaml14:11
Shrewsoh, all-clouds.yaml has them14:12
clarkbShrews: oh that brings up another important piece of info. We have two clouds.yaml files: the default file only has control plane stuff, and then there is all-clouds.yaml which you can point an env var at for everything14:12
Shrews*nod*14:12
dmsimardclarkb: what is this magic, are we created as uid 0 ?14:12
clarkbthe reason for this is the ansible-puppet things use the default file and we don't want it attempting to puppet nodepool nodes14:12
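A sketch of switching to the full file when the nodepool accounts are needed (OS_CLIENT_CONFIG_FILE is the standard os-client-config variable; the all-clouds.yaml path and the openstackzuul cloud name here are assumptions based on the conversation):

```
export OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml
openstack --os-cloud openstackzuul-rax --os-region DFW server list
```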
*** baoli has joined #openstack-sprint14:12
clarkbdmsimard: I think its just group membership14:12
dmsimardclarkb: huh, I totally expected osc to seek in ~/.config, not ~/root/.config14:13
clarkbdmsimard: ya group admin gets rw access to the file14:13
dmsimardwell, wfm14:13
clarkbdmsimard: it is actually looking at /etc/openstack/clouds.yaml14:13
Shrewsdmsimard: shade (or occ, rather) will look in /etc/openstack and ~/.config14:13
Shrewspart of occ magic14:13
dmsimardclarkb: ohhhhhh, yeah /etc/openstack totally makes more sense than my confused explanation14:14
Shrewsos-client-config for non-shorthand14:14
clarkbso now we should all pick a unique logstash-workerNN NN value then we can start running some boots14:14
* frickler picks 0114:14
* Shrews picks 0214:15
dmsimardI have to afk briefly but I'm all set up, I'll pick a number when I'm back14:15
dmsimardLet's use the pad to keep up with who's doing what14:15
clarkbdmsimard: ++ to using etherpad to track14:15
dmsimardpad is here: https://etherpad.openstack.org/p/infra-sprint-xenial-upgrades14:15
dmsimardok, brb14:16
Shrewsclarkb: and we use launch-node.py, right?14:16
fricklerclarkb: what does our quota look like? do we need to check before launching new servers?14:17
clarkbShrews: correct14:17
clarkbfrickler: I don't actually know but we can ask openstackclient for that info (or we can just execute the command and if we don't have enough quota it will fail fast)14:17
clarkbbefore we start though a few more things14:17
Shrewsclarkb: value for $FQDN can be the same as the thing we are replacing?14:18
clarkbsince this is a base distro image upgrade we should be careful to explicitly set the image name we want. Also make sure the flavor matches the old server's14:18
clarkbShrews: yes14:18
* clarkb will make a quick paste for what the commands should look like in this specific case14:18
Shrewsa3b50a75-2fe0-437a-bf7a-04c2cf0adf4c | Ubuntu 16.04 LTS (Xenial Xerus) (PVHVM)14:19
clarkbya, something like http://paste.openstack.org/show/628623/14:20
clarkbreplacing the NN with your chosen value14:21
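A hedged reconstruction of what the pasted command looks like (flavor and image names are from this conversation; check launch/README for the exact flags your checkout supports):

```
cd ~/system-config/launch
./launch-node.py logstash-workerNN.openstack.org \
    --cloud openstackci-rax --region DFW \
    --flavor performance1-4 \
    --image "Ubuntu 16.04 LTS (Xenial Xerus) (PVHVM)"
```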
clarkbalso I tend to run this in screen14:21
clarkbsome server builds take longer than expected and being able to close the laptop is nice14:21
Shrewsoh, yeah. that's a good tip14:22
dmsimardOh yay other screen users14:22
* dmsimard needs to learn tmux14:22
Shrewsespecially since i have a chiro appointment soon14:22
clarkbfrickler: Shrews but ya I think you can go ahead and run that whenever you are ready14:22
clarkbin this specific case the server we are bringing up is largely stateless and will start its life firewalled off from the rest of the cluster so very little to worry about :)14:23
* Shrews launching14:24
* frickler is launching too and will be back in a couple of minutes14:24
* fungi sprints in, very late14:26
fungii'll get something good in the channel topic in just a sec14:27
fungididn't we have an ml thread discussing this? was it just in meetings?14:28
fungii guess i can link the 'pad14:28
clarkbthere is a ml thread too but I think the etherpad is likely most useful14:29
*** ChanServ changes topic to "OpenStack Infra team Xenial upgrade sprint | Coordinating at https://etherpad.openstack.org/p/infra-sprint-xenial-upgrades"14:31
clarkbShrews: frickler let me know when that completes (there should be a bunch of information about dns related items which we'll talk about next once we have that info)14:32
Shrewsooh exception14:33
clarkbwoo fun14:33
fungithe launch script raises an exception if puppet (or anything really) fails during the process14:34
Shrewshttp://paste.openstack.org/show/628626/14:34
clarkbok I think I actually know what this bug is14:35
clarkbI expect we'll be seeing a lot of this one because systemd14:35
* fungi shakes fist at systemd14:35
clarkbwell systemd and puppet. The problem (I think) is that we use sys V init scripts which systemd supports but you have to reload its config for it to find them14:35
clarkbpuppet does not reload this config for us automatically so we'll need to add some puppetry to do that14:35
clarkbI've done this before for zanata let me dig up that change14:36
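By hand the missing step is just a daemon-reload after the init script lands, which is what the added puppetry needs to trigger (unit name taken from the failure that shows up later in this log):

```
sudo systemctl daemon-reload
sudo systemctl status jenkins-log-worker-B   # the sysv-generated unit should now resolve
```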
dmsimardok, I'm back14:36
fungifor basically any service we're adding custom initscripts with i suppose14:36
clarkbfungi: ya14:36
* Shrews has to step away a sec. brb14:36
fungii guess the distro packages use maintscripts to register their initscripts with systemd14:36
dmsimardfungi: that was the thread for the sprint: http://lists.openstack.org/pipermail/openstack-infra/2017-November/005702.html14:36
*** mrhillsman has joined #openstack-sprint14:37
clarkbfungi: ya and puppet's excuse is that this is how you are suppoesd to use puppet14:37
fungidmsimard: ahh, back in november. no wonder i wasn't spotting it14:37
clarkbwe basically need to add the code that was removed in https://review.openstack.org/#/c/423369/3/manifests/wildfly.pp to the puppet for logstash workers14:39
clarkb(it was removed in ^ because an external dep solved the problem for us, but we don't have external deps for logstash workers like that so we'll carry it ourselves)14:39
clarkbdoes someone else want to work on that change or should I?14:39
clarkbShrews: frickler another note: by default launch-node.py will clean up after itself on failure, so you shouldn't need to do anything special here14:40
fungiand if you need it _not_ to clean up after itself, add --keep14:41
dmsimardclarkb: I can send a patch.14:41
clarkbdmsimard: cool I think you want to edit worker.pp in puppet-log_processor repo14:41
fungii'm starting on subunit-worker01 (to replace subunit-worker02) since i had actually started trying to boot it on xenial a month or so ago and then got sidetracked by other stuff14:42
fungiodds are i'll want to copy dmsimard's patch for that14:42
clarkbyup14:42
fungishould we patch tools/launch-node.py to switcn the default image to 'Ubuntu 16.04 LTS (Xenial Xerus) (PVHVM)' now?14:45
fungis/switcn/switch/14:45
clarkbfungi: probably a good idea to prevent regressions launching new servers or mistakes if we forget to specify the image14:45
fungipatch on the way then14:46
fungido we have a review topic we're using?14:46
dmsimardshould we use a topic for sprint patches ?14:46
dmsimardwow, fungi beat me to it :)14:47
fungiheh14:47
fungilet's use topic:xenial-upgrades14:47
dmsimardwfm14:47
dmsimardhttps://review.openstack.org/527109 is up for logprocessor14:47
fricklerianw started with topic infra-xenial already14:48
fungiahh, i'll adjust accordingly. now i see it in the notes section of the pad14:48
fungitotally missed it earlier14:48
clarkbdmsimard: +214:48
fricklercouple of patches that could be reviewed there already https://review.openstack.org/#/q/status:open+topic:infra-xenial14:49
dmsimardok /me switches topic14:49
ShrewsSo once that lands to the puppet-log_processor repo, do we need to update a repo on puppetmaster, or is that done automatically by the launch script?14:49
clarkbShrews: the puppet modules are updated by the ansible run puppet cron, which runs every 15 minutes but, due to how long it takes to get through, effectively runs every 45 minutes14:50
clarkbin this case I think we can go ahead and update the git repo early to speed up the process14:51
dmsimardfrickler: oh your comment on https://review.openstack.org/#/c/515279/ .. I remember writing a blog post exactly for stuff around those lines when 14.04 came out14:51
fungiheh, pabelanger already beat me to https://review.openstack.org/502856 "Bump default image to xenial to launch-node.py"14:52
fungiso we can already skip specifying --image14:52
dmsimardclarkb: it looks like https://review.openstack.org/#/c/515279/ would save us some trouble14:54
clarkbreading now14:55
Shrewsclarkb: so I'm not seeing another logstash-worker02 in the server list. i guess the process didn't get far enough to create it14:55
Shrewsor it automatically deleted it?14:56
*** jeblair has joined #openstack-sprint14:56
fungilikely the latter14:56
clarkbShrews: it automatically deleted it14:56
Shrewsmaybe i should just look at the code :)14:56
clarkblaunch-node tries to be helpful that way14:56
fungiif the launch fails for any reason then the script will delete the instance14:56
Shrewsyay14:56
fungiunless you specify --keep and then you can use the temporary root ssh key for that uuid in /tmp to log into it if you need to investigate it directly14:57
*** baoli has quit IRC14:57
clarkbfungi: are you willing to be second reviewer on https://review.openstack.org/#/c/527109/1 ?14:57
Shrewsfungi: *nod* thx14:57
*** baoli has joined #openstack-sprint15:00
*** baoli has quit IRC15:00
*** pabelanger has joined #openstack-sprint15:01
pabelangero/15:01
pabelangerrunning a little behind this morning15:01
pabelangerjust getting coffee and will start reviewing changes that are up15:02
fungisubunit-worker01 puppet-user[11998]: (/Stage[main]/Subunit2sql/Package[subunit2sql]/ensure) change from absent to latest failed: Could not update: Execution of '/usr/local/bin/pip install -q --upgrade subunit2sql' returned 1: Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-9bVQMS/netifaces/setup.py';f=getattr(tokenize, 'open',15:03
fungiopen)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-wuw6O6-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-9bVQMS/netifaces/15:03
fricklernot directly related but adding me to accessbot could also use another review https://review.openstack.org/52612515:03
fungilooks like building netifaces from sdist is failing when attempting to install subunit2sql15:03
*** baoli has joined #openstack-sprint15:04
clarkbfungi: I wonder if that is due to the same issue we had with virtualenv on the puppetmaster (using python3 instead of 2)15:04
fungiahh, right, this is the case where we want to override --upgrade-strategy for pip15:04
dmsimardfrickler: ah I guess I should do that too.15:04
fungiit's calling /usr/bin/python according to that message, so should be python 2.715:05
clarkboh ya15:05
* clarkb pops out for a few, brb15:06
fungii think it's just trying to upgrade to a later netifaces than the distro package because it sees that what's on pypi is newer (even though what's already installed is sufficient)15:06
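The pip behaviour under discussion: with the default (eager) upgrade strategy, pip rebuilds dependencies like netifaces from sdist even when the already-installed version satisfies the requirement, while only-if-needed leaves them alone:

```
# eager (the default at the time) tries to upgrade every dependency:
pip install -q --upgrade subunit2sql
# only-if-needed skips deps whose installed versions already satisfy requirements:
pip install -q --upgrade --upgrade-strategy only-if-needed subunit2sql
```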
pabelangercool, looks like logstash-workers have been started15:06
pabelangerI'm going to delete puppetdb.o.o and puppetdb01.o.o, and clean up system-config15:07
ShrewsALL: I have to step out for a chiropractor appointment now. I'll catch up on things when I return. Shouldn't be long.15:07
fungididn't we have a change up to do --upgrade-strategy=only-if-needed for pip in some module recently? codesearch isn't turning it up for me so maybe hasn't merged yet?15:09
clarkbfungi: yes, ara in puppet-zuul iirc15:10
fungithanks, finding15:10
clarkbdoesn't look like it merged though15:10
clarkbfungi: https://review.openstack.org/#/c/516740/ yup not merged yet15:11
pabelangerremote:   https://review.openstack.org/449167 Remove puppetdb / puppetboard server15:12
fungiannoying that gerrit message searches unconditionally replace hyphens with spaces so you can't search for strings containing hyphens15:12
pabelangerclarkb: fungi: any objections to deleting puppetdb^/puppetboard above? it is still precise15:12
pabelangererr15:12
pabelangeryah, precise15:13
clarkbfor some reason I thought that was already done so no objection from me15:13
fungipabelanger: by all means15:13
pabelangerokay, done15:17
pabelangerupdating etherpad15:17
dmsimardpabelanger, clarkb: that reminds me.. i'll take the opportunity of the sprint week to write the draft for continuous deployment dashboard to replace puppetboard15:20
clarkbwaiting on gating for the log processor fix was clearly a missed opportunity to make breakfast15:21
dmsimardyeah it hasn't passed check yet15:22
* dmsimard starts working on draft15:22
pabelangerdmsimard: great15:22
pabelangerokay, working on tripleo mirror now, going to ping them for a larger flavor. 100GB is the max listed right now15:23
dmsimardpabelanger: yeah good idea15:23
pabelangerI also think, we might be able to now move mirror-update.o.o into a zuulv3 job and periodic pipeline (may have to create)15:25
clarkbI'm worried that the log processor fix for centos 7 is out to lunch15:28
clarkber the centos7 job is15:28
clarkbwe may have to recheck it and if that happens I am making breakfast15:29
dmsimardclarkb: should we check out the patch locally ?15:31
dmsimardor wait it out ?15:31
clarkbI think we should wait it out, if the job doesn't go out to lunch it runs fairly quickly and this way we can't lose track of where we have or haven't fixed this particular systemd/xenial thing15:32
dmsimardack.15:32
clarkb(also its not an emergency)15:32
dmsimardindeed.15:33
jeblairpabelanger: let's not tackle mirror-update right now.  i think it will take some work, and just replacing the server will be easier.15:33
pabelangersure15:37
jeblairwhat data does grafana require be migrated?15:38
pabelangerjeblair: for AFS services, we can join the new (xenial) servers to the existing AFS cells right? Then after some sync process retire the original trusty based servers?15:38
jeblairpabelanger: depends on the servers -- can you be more specific15:39
pabelangerjeblair: sure, afsdb01/afsdb02 right now. Could we bring online afsdb03 and join the existing?15:40
pabelangerjeblair: I think we'd need to update puppet-grafana in system-config to working xenial, it is also possible we might need to patch grafyaml too. I think they changed some of the APIs in newer versions.15:41
jeblairpabelanger: yes -- i forget off the top of my head how to tell it to join, but we should be able to tell it to sync its data from the others, then remove them.15:41
pabelangerokay cool15:42
jeblairpabelanger: okay, that's not a data migration though...15:42
jeblairthere's a lot of servers under "these servers require data to be migrated" which i don't think require data to be migrated15:43
pabelangerYah, I might have put it there by mistake. We shouldn't need any data because of grafyaml15:43
*** baoli has quit IRC15:45
*** baoli has joined #openstack-sprint15:45
clarkbdmsimard: frickler Shrews ok if someone can recheck that change when the centos7 job finally times out I am going to make breakfast (there are plenty of other roots around now to answer questions, walk through process so feel free to ping them too)15:47
dmsimardk15:48
pabelangerI'm going to start on eavesdrop01.o.o replacement15:49
*** baoli has quit IRC15:50
pabelangerIIRC, we'll need to migrate the volume between servers15:50
pabelangerclarkb: mind a +3: https://review.openstack.org/449167/15:51
*** baoli has joined #openstack-sprint15:57
fungipabelanger: yeah, /dev/mapper/main-meetbot seems to be on a cinder volume15:58
pabelangeryah15:58
pabelangerremote:   https://review.openstack.org/527139 Update eavesdrop.o.o to support xenial15:58
pabelangerreworks eavesdrop.o.o to support numeric hosts15:59
pabelangerand ups our testing to start on xenial15:59
fricklerso I have a patch to make puppet-ethercalc work on xenial. question is: do we need to keep it backwards compatible for < xenial at the same time? or can we avoid a lot of extra code and target only xenial/systemd-based hosts?16:03
clarkblooks like the log processor fix is queuing the centos7 job again so we may not need a recheck after all16:08
clarkbfrickler: I think it best to keep support for both16:08
clarkbfrickler: makes the upgrade process (replacing servers) a little simpler16:09
fricklerclarkb: hmm, I just submitted the xenial-only version, will update later: https://review.openstack.org/527144 Update to work on Ubuntu Xenial or newer16:09
clarkbpabelanger: done16:13
pabelangerclarkb: frickler: Yah, that one doesn't look too bad to support both. for ethercalc16:13
pabelangerclarkb: danke16:13
*** baoli has quit IRC16:15
pabelangerI'm going to run into town for a quick errand / lunch. But have 2 servers in my name16:16
pabelangerI also added a 'Bug Fixes' section to https://etherpad.openstack.org/p/infra-sprint-xenial-upgrades so we can quickly identify things we need to merge16:16
pabelangerwe should also pick a topic to make it easier, if somebody wants to do so16:16
pabelangershould be back in 45mins16:16
clarkbthere is a topic for the sprint already, not sure if we need another for bugfixes?16:17
dmsimardlet's use the same topic ?16:22
dmsimardwe have infra-xenial right now16:22
*** baoli has joined #openstack-sprint16:27
*** baoli has quit IRC16:29
*** baoli has joined #openstack-sprint16:29
* jeblair looks up grafana stats on cacti16:33
jeblairgrafana has like no cpu or memory usage.  i think we can shrink the flavor16:33
jeblairthe 1 year max used ram is 771M (!)16:34
clarkb++ to shrinking flavor16:35
jeblairthe load average is 0.01616:35
jeblair1 year max16:35
jeblair2G then?16:35
clarkbwhat is it now?16:36
clarkbbut ya thats double max ram usage which seems like safe overhead16:37
jeblair8G16:37
clarkb2G sounds good to me16:37
jeblairhttp://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=2715&rra_id=4&view_type=&graph_start=1479957086&graph_end=151301027016:38
jeblairwe will need to reboot it at least once every 2 years i think.16:38
fricklerpabelanger: clarkb: next question then, may also affect other upgrades: do we want to continue piping the output to log files like here for backwards compatibility? or can we use native systemd/journald log handling? http://git.openstack.org/cgit/openstack-infra/puppet-ethercalc/tree/templates/upstart.erb#n2616:39
clarkbfrickler: I'm good with journald, the one thing we should check on that before we commit to it though is whether or not journald is logging persistently on our nodes16:40
clarkbshould get infra-root's larger opinion too16:40
* Shrews catches up16:42
fungias long as stuff gets logged *somewhere* and i can find it, i'm fine16:42
clarkbit doesn't look like journald is currently logging persistently on ubuntu fwiw16:42
clarkbwe can address that though16:43
jeblairme too.  that hasn't been my experience with journald to date, but if someone's willing to go on a limb and guarantee that, i'm fine with it.  :)16:43
clarkbShrews: zuul had a burp on one job for the systemd fix so we are still waiting for that to merge, but its close now16:44
Shrewsah, then i haven't missed much fun16:45
Shrewslooks like it just merged17:00
fricklerso https://review.openstack.org/527109 merged, do we need to update puppetmaster or is there a cron?17:00
clarkbfrickler: there is a cron but its a "slow" one. The cron that updates those puppet modules is our main run puppet with ansible cron job17:00
clarkbfrickler: Shrews dmsimard you can check where that cron is by looking at /var/log/puppet_run_all.log on the puppetmaster17:01
clarkbit looks like it just started at 1700UTC which I think means it will have just updated the module for us17:02
clarkbfrickler: Shrews dmsimard you can confirm this by running git log at puppetmaster:/etc/puppet/modules/log_processor17:02
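Those two checks as commands on the puppetmaster:

```
sudo tail -n 50 /var/log/puppet_run_all.log                  # see when the run_all cron last started
git -C /etc/puppet/modules/log_processor log -3 --oneline    # confirm the merged fix is present
```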
clarkbonce you've poked at those items and have convinced yourselves that I actually did check before making those claims :) I think we can go ahead and try the new instance boot again17:03
Shrewsyuppers17:03
Shrewsi used the wait time to setup my tmux properly17:04
* clarkb migrates into the office now that the kids are awake17:06
Shrewsis this something to be concerned about? http://paste.openstack.org/show/628639/17:11
Shrewsit seems we progressed past that, but happened to notice it in the output17:11
clarkbpabelanger: I think ^ may be related to your host removal work17:11
clarkbShrews: my guess is that the host deletions pabelanger has been doing have resulted in some groups defined that don't match any instances17:12
clarkbpabelanger: is that something you can look into?17:12
clarkbif that is the cause then I don't think we need to worry about it17:12
frickleranyway it failed again for me, will retry with --keep for better debugging, not sure about the failure from the log17:14
clarkbfrickler: can you share the log?17:14
jeblair17:12 < openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Support xenial on health  https://review.openstack.org/52716917:14
jeblair17:14 < openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Support xenial on stackalytics  https://review.openstack.org/52717117:14
jeblairsince the first step is to update the node selector and the node-os comment in site.pp, and then wait for that to gate, is there any reason we shouldn't do a bunch of those ahead of time ^ ?17:15
clarkbjeblair: probably not, just split them up so that failures can be debugged individually17:15
jeblairclarkb: ya, i've pushed up 3 all based on tip so far17:16
fricklerhttp://paste.openstack.org/show/628641/ is the tail of it, neglected to tee all of it17:16
clarkbfrickler: ya may need --keep or a bigger screen buffer to see why puppet is unhappy17:18
clarkbsome dependency for logrotate failed looks like17:19
fricklerya, need to amend my tmux settings to have more scrollback and searching17:19
clarkbmaybe its a new package name or different dir path for that config?17:19
jeblairjust so we're really clear, i'm pushing up a bunch of changes, but i don't plan on doing all these servers, i'm just trying to save time so that the initial step (with a bunch of waiting) is already done.  please grab/update/abandon my changes as needed as you work on servers.17:20
fricklerclarkb: /tmp/launch-log on puppetmaster is the complete log now, instance is kept for checking17:25
clarkbfrickler: it's looking like the reload for systemctl isn't finding the sys v compat scripts? maybe permissions or something is wrong with them?17:27
clarkbfrickler: running the systemctl reload in the foreground may have more details? possibly also list-units?17:28
clarkbI need to pop out again to help with kids now that they are awake. Back in a bit. Look forward to seeing what you find out17:30
fricklerclarkb: http://paste.openstack.org/show/628645/ looking deeper into the service definitions now17:32
Shrewsi know less about puppet than anybody, but there is this in that log: | Dec 11 17:20:36 logstash-worker01 puppet-agent[10308]: Could not run: SIGTERM17:32
clarkbShrews: that is expected since we are puppet apply only I think. That happened as a result of the puppet agent stop I think17:34
dmsimardfrickler: the daemon reload isn't working17:35
dmsimardfrickler: (/Stage[main]/Openstack_project::Logstash_worker/Log_processor::Worker[B]/Service[jenkins-log-worker-B]/enable) change from false to true failed: Could not enable jenkins-log-worker-B:17:35
fricklerya, fix upcoming17:35
jeblaircould folks +3 https://review.openstack.org/527168 please?17:36
dmsimardjeblair: do we actually have different grafana numbered nodes ?17:36
jeblairdmsimard: not yet -- we're transitioning all of the hosts to numbered hosts so it's easier to replace them17:37
dmsimardjeblair: makes sense17:37
jeblairdmsimard: so the replacement for grafana.o.o will be grafana01.o.o, with a cname in dns17:37
dmsimardjeblair: in any case, that pattern should match numbered or not17:37
jeblairyep17:37
fricklerclarkb: dmsimard: that fixed it on my node: https://review.openstack.org/527193 Fix multiple workers for systemd17:37
jeblairi'm using \d* so we continue to have puppet operate on the current host17:38
dmsimardfrickler: makes sense17:38
jeblairdmsimard, frickler: not sure if you're aware -- the node-os comment is read by the infra apply jobs, so adding that xenial comment causes those jobs to run, and we verify that at least puppet apply -noop works on that os.17:39
dmsimardI wasn't aware those comments were actually important, thanks for that17:40
* frickler needs a break now, will take another look later17:40
jeblairit looks like i got 36% of the way through site.pp updating the node matchers and os comments.  i'm going to stop there and leave more for others to do.  :)17:41
dmsimardjeblair: have a comment on https://review.openstack.org/#/c/527172/17:45
dmsimardquestion came up when I was looking at https://review.openstack.org/#/c/527186/1/manifests/site.pp with the files group left intact17:45
clarkbfrickler: back and reviewing your fix as well as jeblairs now17:53
clarkboh frickler is taking a break, I have a comment on the fix I'll just update the patch17:54
clarkbdmsimard: no patchset on https://review.openstack.org/527193 can you rereview? jeblair care to review as well?17:55
dmsimardclarkb: ah I guess frickler's patch was working although it was a little bit uglier with two dashes17:56
clarkbdmsimard: ya and may have confused systemd slightly depending on how important that name is17:56
clarkbfigure better to just get it matching the name used elsewhere and not worry about it17:56
* dmsimard nods17:57
jeblairdmsimard: good catch thanks18:01
*** baoli has quit IRC18:01
*** baoli_ has joined #openstack-sprint18:04
clarkbI'm just going to approve all those changes without check results as long as my eyeballs don't catch anything wrong with them. Then if tests do fail we can sort them out (otherwise there is just too much state to track)18:05
clarkbits unfortunate that our puppet apply --noop testing won't catch the systemd reload issue though18:06
fungii'm looking at the implementation of that in puppet-zuul18:09
fungilooks like there's a manifests/systemd_reload.pp classfile implementing it18:09
fungiwhich gets called out as a require line in services18:09
fungibut then there's also what looks like basically a duplicate implementation of it in manifests/executor.pp18:10
clarkbfungi: that would be one way to do it. The tricky thing is requiring something that won't necessarily be in place on all platforms (but hiding it in a class of its own is one way to do that)18:10
fungiam i right in thinking that's redundant?18:10
clarkboh ya if there is something else doing it then it probably is redundant /me looks18:10
fungior is it serving some subtle purpose i'm not picking up?18:11
clarkbit looks redundant to me as well, but maybe there is an ordering issue that isn't immediately apparent that that works around18:12
*** baoli has joined #openstack-sprint18:46
*** baoli_ has quit IRC18:47
clarkbfor anyone wondering why it got quiet all of a sudden we are mostly just waiting on CI to finish and changes to merge at this point (lots of demand in zuul right now)18:54
pabelangerand back18:57
pabelangercatching up on backscroll18:57
clarkbthe log_processor fix has finally started jobs19:05
clarkbhopeflly will be in gate in the not too distant future then shrews and dmsimard (and frickler if still around) can give it another go.19:05
fungii have a couple of puppet-subunit2sql changes proposed to help me build the replacement worker19:12
clarkbI'll do another round of reviews shortly19:13
pabelangerremote:   https://review.openstack.org/526194 Remove zuulv2 long lived servers19:18
pabelangercould use another +3 on^ had to rebase19:18
pabelangerclarkb: Shrews: is the pastebin from above on expand-groups.sh still an issue?19:18
dmsimardpabelanger: I believe so19:22
pabelangerk, lets land 526194, then delete ansible-inventory cache, since we've deleted some servers19:23
pabelangerokay, tripleo has bumped the flavor for mirror to 150GB19:27
pabelangeruploading xenial cloud image to tripleo-test-cloud-rh1 now19:27
clarkbfungi: did you see https://review.openstack.org/#/c/527193/ ? you may need similar for subunit2sql19:33
fungiclarkb: oh, thanks! i missed that. will update my open change if it's not merged yet19:36
fungiadded19:39
pabelangerokay, mirror01.regionone.tripleo-test-cloud-rh1.openstack.org launched properly19:40
pabelangersetting up DNS now19:40
pabelangerhttp://mirror01.regionone.tripleo-test-cloud-rh1.openstack.org/19:43
pabelangereverything seems okay19:43
pabelangerI'm going to redirect mirror.regionone to mirror01.regionone now19:44
pabelangerDNS updated, waiting to confirm it correct19:49
clarkbpabelanger: remember to use hour-long TTLs on those records (to avoid DNS requerying)19:50
pabelangerclarkb: Yup! confirmed at 60min19:50
pabelangerand cname is working19:50
pabelangerwill accept ssh hostkey on puppetmaster19:51
pabelangerremote:   https://review.openstack.org/507266 Comment out server in puppet.conf19:54
pabelangerI believe that will stop puppet from hanging for 2mins when we boot new servers19:55
clarkbpabelanger: will puppet apply do that?19:55
clarkbseems like that should be a noop19:55
clarkbespecially now that ianw's change to stop the agent is in19:56
ianwoh good19:56
clarkbianw: good morning19:56
ianwsorry, just catching up with reviews etc19:56
ianwmorning!19:57
pabelangerclarkb: I think it is a race condition: we install puppet with install_puppet.sh, but the server boots and puppet-agent tries to connect to the puppetmaster while puppet apply is running. So I think it might be too late20:01
pabelangeralso, trying to see the change ian did20:01
clarkbpabelanger: ya but ianw's patch explicitly stop puppet agent20:02
clarkband puppet apply shouldn't talk to a server aiui20:02
pabelangerclarkb: I don't think it worked, cause It still happened when I tried bringing tripleo mirror online20:02
pabelangerlet me see which system-config I had20:03
pabelanger | Dec 11 19:36:28 mirror01 puppet-agent[4061]: Could not request certificate: Failed to open TCP connection to puppet:8140 (getaddrinfo: Name or service not known)20:03
clarkbya I'm thinking your system-config may not have been up to date? the change just merged a few hours ago20:03
pabelangeralso, I see HTTP requests to new tripleo mirror now20:03
clarkbShrews: the fix for log_processor appears to be about to merge, will you be able to give that another shot in a few minutes?20:04
Shrewsclarkb: yeah20:04
Shrewsgetting frustrated with sockets so could use a diversion20:04
ianwi dropped a comment ... so the package just assumes that there's a resolvable remote host called "puppet" ?20:04
clarkbianw: ya thats puppets default behaviopr20:05
pabelangeryup20:05
pabelangerI think I had ianw commit when I ran it just now20:05
ianwok, TIL :)20:06
pabelangerbut, will know in a moment when I try to launch next server20:06
pabelangerserver=puppet20:06
pabelangerthat is what the default is in puppet.conf for us20:07
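What change 507266 boils down to on a freshly launched node, sketched by hand (the conf path is the standard puppet 3 location; the actual change presumably lands this via our install tooling rather than running sed on the host):

```
sudo sed -i 's/^server=puppet/#server=puppet/' /etc/puppet/puppet.conf
sudo systemctl disable --now puppet   # keep the agent from retrying the nonexistent "puppet" host
```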
pabelangerat one point I think we managed it on server boot20:07
clarkbis the problem that we can't stop/disable the service until after we've already started it and sent it off trying?20:07
pabelanger | + systemctl disable puppet20:08
pabelangerokay I see that in my console20:08
pabelanger | Executing /lib/systemd/systemd-sysv-install disable puppet20:08
pabelanger | Dec 11 19:32:26 mirror01 systemd[1]: Started Puppet agent.20:09
pabelangerso, something started it again20:09
pabelangerthen20:09
pabelanger | Dec 11 19:32:28 mirror01 puppet-agent[4061]: Could not request certificate: Failed to open TCP connection to puppet:8140 (getaddrinfo: Name or service not known)20:09
clarkbhuh20:11
clarkbShrews: arg zuul just put the trusty job back to queuing20:12
clarkbI'm worried that infracloud networking is falling over with nodepool running at full capacity20:12
Shrewsclarkb: did we put your nodepool fix in?20:12
Shrewsclarkb: if we didn't, we could be hitting that again20:13
clarkbShrews: I don't think so. that reminds me, I want to say tobias had comments for me to address and I completely forgot with the sprint stuff this morning20:13
* Shrews checks nodepool20:13
clarkblooks like the comments are more along the lines of "this is weird and test doesn't do a good job reproducing but dunno what is going on yet"20:14
Shrewsclarkb: hrm, only 1 ready&locked node, so unlikely we're hitting the issue you found20:15
Shrewsjust busy20:15
clarkbShrews: ya I'm thinking the networking in hpcloud just can't handle the demand and is dropping connections20:16
pabelangerokay, moving on to eavesdrop01.o.o20:23
pabelanger | Dec 11 20:30:35 eavesdrop01 puppet-user[11951]: Could not find data item openstack_meetbot_password in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:347 on node eavesdrop01.openstack.org20:32
pabelangerhow did we handle hiera data for numeric hosts again?20:32
pabelangerdid we just move them into a group20:32
clarkbpabelanger: yes that is what I have been doing with eg translate20:32
pabelangerokay, wanted to confirm20:33
pabelangerI'll send a patch for eavesdrop here shortly20:33
clarkbShrews: ok fix for log_processor merged20:34
clarkbShrews: puppetmaster:/var/log/puppet_run_all.log says that the ansible puppet cron is currently running so we can either wait for it to finish or just manually update the puppet module on the puppet master20:35
clarkbShrews: if you are able to give the node launch another go right now I can walk through updating the puppet module20:35
clarkbdmsimard: ^you too20:35
dmsimardyeah will give a try after extinguishing a fire20:36
clarkb(I expect at this point frickler has called it a day)20:36
Shrewsclarkb: waiting for the puppet repo to update20:38
Shrewsclarkb: oh, that's what you want to walk me thru20:38
Shrews:)20:38
Shrewsyeah, i'm ready20:38
clarkbShrews: cool, so the module is at /etc/puppet/modules/log_processor20:40
ianwwith something like 527144 ... do we care about effectively dropping trusty support?  should we put a tag in before merging maybe?20:40
ianwthat's puppet-ethercalc btw, moving from an upstart file to a .service file20:40
clarkbShrews: as root you will want to do a `git remote update` to fetch latest changes then `git checkout origin/master` it might be `git checkout origin master` I can never remember where git wants the /20:41
clarkbShrews: however the cron will update it in 3 minutes if you want to wait (and avoid conflicts though git should sort those for us in this case)20:41
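For reference, the manual update looks like this (the detached HEAD it leaves behind is expected, as discussed just below):

```
cd /etc/puppet/modules/log_processor
sudo git remote update
sudo git checkout origin/master   # detached HEAD here is fine; we track reviewed upstream states
```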
clarkbianw: ya frickler had asked about that and I had asked to keep trusty support for now. Simplifies the transition/upgrade too20:42
Shrewsclarkb: those commands put me in a detached HEAD state. is that the norm?20:42
clarkbShrews: yes20:43
clarkbShrews: rather than try and curate a local branch we just checkout upstream states20:43
clarkbShrews: its easier this way when you rely on code review to specify a state20:43
Shrewsclarkb: that's done then20:43
clarkbcool I think you can give the launch node script another go then20:43
Shrewsif i could get my copy-pasta fixed20:48
Shrewsk. kicked off20:49
Shrewsfwiw, launch-node.py does not play nicely with tee20:50
clarkbis it writing to stderr?20:50
Shrewsi guess?20:50
clarkbI wonder if that is because that is how ansible does it?20:50
clarkbmordred or dmsimard may know20:50
dmsimardShrews: launch-node.py 2>&1 | tee -a file.out ?20:51
dmsimardor maybe the PYTHONUNBUFFERED thing20:51
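Combining both suggestions into one invocation (PYTHONUNBUFFERED is the standard python env var; the redirect must come before the pipe, and the log file name is just an example):

```
PYTHONUNBUFFERED=1 ./launch-node.py <args> 2>&1 | tee -a launch.log
```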
Shrewsi'll just depend on my tmux buffer20:52
dmsimardif we were really motivated, we could do, like, launch-node.py | systemd-cat -t launch-node20:55
dmsimardthat sends the output straight to the journal and then you can do, like, journalctl -u launch-node20:55
clarkbdmsimard: currently no journald on that node20:55
clarkbfungi: https://review.openstack.org/#/c/527203/2 failed testing20:57
fungigrar20:58
* dmsimard pictures fungi growling20:58
fungiit's not a particularly intimidating growl20:58
*** jkilpatr has quit IRC20:59
Shrewsclarkb: looks like we can haz new node21:00
clarkbShrews: yay21:00
clarkbShrews: ok now don't immediately do the dns stuff yet21:00
clarkbbecause dns is a pita we should probably talk about it a little21:01
clarkbdmsimard: maybe you want to get to the point where you have a launched logstash worker too and we can go through that together?21:01
fungicue rant about proprietary dns hosting api21:01
dmsimardclarkb: fire almost extinguished21:01
clarkbdmsimard: cool21:01
Shrewsaaaaaaaand go go gadget rant21:01
clarkbShrews: does it work if I grab lunch and dmsimard gets a node launched before we dig into the next step?21:02
dmsimardfor context, I don't think I've mentioned this before but I'm basically infra-root for RDO's infrastructure21:02
dmsimardso from time to time there's those fires :)21:02
Shrewsclarkb: yes. i will task switch back to the finger gateway, but we do have the zuul meeting in an hour21:02
jeblairi'm back from lunch if needed here21:03
clarkbShrews: oh right zuul meeting21:04
clarkbShrews: we can also just go through the dns stuff and take the pressure off getting everything done in that time21:04
clarkbits not the end of the world to go through it multiple times21:04
clarkbShrews: so the deal with DNS is its hosted by rackspace and they use a proprietary client and service for managing it21:05
clarkbShrews: this works reasonably well for when you are just adding a new host (and not replacing an existing one) because adding records is super easy21:05
clarkbShrews: the problems largely lie in removing old records safely because there is no version control like you get with gandi and other services21:05
clarkbShrews: and since we share the openstack.org domain with the foundation we have had cases of stepping on each others toes in the past :/21:06
clarkbShrews: in this case of replacing an instance my preferred method is to use the command line client to update only the reverse PTR records, then log in to the web ui and delete the old A and AAAA records and add new ones21:07
clarkbthis means we'll only run half of the commands printed out by the launch script (2/4 that update the reverse ptr records)21:07
clarkbfungi: jeblair do you recall if the reverse ptr records are the first two commands or the second two? I think they are the first two21:08
jeblairthey are the first21:08
Shrewsso command line for one direction resolution, gui for the other21:09
jeblairexample: http://paste.openstack.org/show/628658/21:09
clarkbShrews: correct21:09
clarkbShrews: so you can go ahead and run the commands above line 15 in jeblair's example (but use the commands that were printed out for your launch invocation)21:10
jeblairwe have some (a lot of) hiera data assigned by fqdn.  i'm guessing that as we transition nodes to numbered, we're going to need to move those to groups, yeah?21:11
clarkbjeblair: yup, pabelanger ran into that with eavesdrop and I did with translate*. Making a copy of the hiera data in a group is what I did for translate21:12
clarkbthen once things are transitioned we can remove the fqdn specific data21:12
fungiclarkb: the entries with ip addresses are the address records, then entries with server uuids are the reverse ptrs21:12
fungii don't recall what order they wind up in21:12
jeblairclarkb: i can never remember how our split group system works.  what do i need to do to make a grafana group and add grafana01 to it?21:12
clarkbjeblair: in the site.pp add group = grafana line like the other examples in there21:13
jeblairthat's the only thing?21:13
clarkbjeblair: then we need to update the ansible group file that I can never remember the path to /me finds it21:13
jeblairyeah, that's the thing i was worried about :)21:13
clarkbjeblair: openstack-infra/system-config/modules/openstack_project/files/puppetmaster/groups.txt21:14
Shrewsclarkb: done21:14
clarkbShrews: ok next step is the fun step21:14
* clarkb actually goes through process with shrews to figure it out21:15
Shrewsyou mean the fun doesn't stop there?????21:15
Shrews:)21:15
clarkbShrews: go to https://www.rackspace.com/login then click on cloud control panel login21:15
clarkbShrews: username and password can be found in the file being sourced on line 16 in jeblairs example21:16
funginext, attempt to extrude your brain matter through a colander21:16
clarkbShrews: once there click on Networking -> Cloud DNS21:17
clarkbthen click on openstack.org21:17
fungibecause, you know, dns is totally a network thing21:17
clarkbNow my favorite part of this whole process: it doesn't load all of the records for you to search at once, so you want to scroll that scroll bar until it's done loading all the things it can load21:18
* fungi wonders why they don't also put database services under the "storage" menu21:18
jeblairremote:   https://review.openstack.org/527245 Create a grafana group21:18
jeblairclarkb: can you ^ pls?21:18
clarkbjeblair: yup21:18
clarkbShrews: let me know when you get there21:18
Shrewsclarkb: there, and see logstash-worker0221:19
pabelangerremote:   https://review.openstack.org/527246 Add eavesdrop into groups.txt21:19
fungiyeah, i basically scroll as far down as it will go, then do that again, and again, and again... until it stops letting me do it any longer or i get distracted and go do something else21:19
pabelangerclarkb: jeblair: also^21:19
fungiShrews: there will be two, one for ipv4 and one for ipv6... and they won't be even remotely adjacent in the ui21:19
Shrewsoh21:20
clarkbpabelanger: you have two different regexes in use fwiw21:20
jeblairpabelanger, clarkb: i used \d* and pabelanger used \d+.  which is better?21:20
fungiwhich is why once you've gotten it to load all the paginated chunks of the set, you can then use in-browser keyword searching to find them all21:20
clarkbjeblair: I think you got yours right because it matches the node spec in site.pp21:20
jeblairi mean, specifically, because i don't understand the group system, i don't know if things will break if they are different21:20
clarkbpabelanger should update his change to use * in groups.txt I think21:20
pabelangerah, I copypasted another21:20
pabelangerlet me fix21:20
Shrewsah yes. i see both A and AAAA entries21:20
jeblairclarkb: sounds like you are inclined to think they may break -- ie, puppet will expect a group to be present that ansible won't have placed on the filesystem, unless they match?21:21
clarkbjeblair: I don't think they will break but the old server will continue to fail to find the group it thinks it is in and fall back to the fqdn system instead until it is gone21:21
clarkbShrews: cool now you can use browser search to find logstash-worker02 (I think you were 02)21:21
jeblairclarkb: that sounds reasonable too.  hrm21:21
pabelangerokay, updated21:22
pabelangerremote:   https://review.openstack.org/527246 Add eavesdrop into groups.txt21:22
Shrewsclarkb: yup21:22
clarkbShrews: then click the little gear next to the records name and click modify record21:22
clarkbShrews: then, if modifying the AAAA record, replace the ipv6 address with the one launch printed out, or the ipv4 address if modifying the A record21:22
clarkbShrews: and do that for both the A and AAAA records21:22
clarkbpabelanger: approved21:23
*** jkilpatr has joined #openstack-sprint21:24
Shrewsclarkb: done21:24
clarkbShrews: then you can `dig +short logstash-worker02.openstack.org` and `dig +short AAAA logstash-worker02.openstack.org` to see when the records update21:25
clarkbonce that happens there is one last step we have for the lgostash workers which is updating the firewalls to accept the new host and making sure services on new host are functioning21:26
Shrewsgroovy. i can dig it21:26
Shrewsfar out21:26
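(A minimal sketch of watching for those records to flip over, using the same dig commands; the 30-second interval is arbitrary, ctrl-c once both show the new addresses:)
    # poll the forward records until they show the new addresses
    while true; do
        dig +short logstash-worker02.openstack.org
        dig +short AAAA logstash-worker02.openstack.org
        sleep 30
    done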
pabelangerokay, I'm going to delete the old mirror in triple-test-cloud-rh1, I don't see any traffic in apache logs for 45mins now21:27
* Shrews speaks fungi language21:27
clarkbpabelanger: sounds good21:28
fungignarly21:29
Shrewspfft, that's 2 decades beyond21:29
jeblairi need a translator21:29
dmsimardok fire extinguished21:30
dmsimardgoing through a logstashworker now.21:30
pabelangerand deleted21:30
Shrewsclarkb: anyhoo, dig seems to be immediately returning the correct things (from multiple places)21:30
clarkbShrews: awesome, so now some logstash-worker specific things. We use unauthenticated connectivity to gearman (which could be changed) and to elasticsearch (which can't be changed without paying them money or writing our own auth plugin for es)21:31
pabelangerianw: clarkb: so, what are we thinking on https://review.openstack.org/507266/ (puppet DNS error on server boot)21:31
fungiclarkb: there are also two other steps... updating the ssh host key cached by root on puppetmaster, and truncating the ansible inventory cache21:32
pabelangerclarkb: Shrews: we'll also need to restart firewalls, to pick up new IP addresses21:32
clarkbShrews: this means we have to kick the firewall on logstash.openstack.org (where gearman server runs) and elasticsearch[2-7].openstack.org where elasticsearch runs to have it pick up the new IPs based on name21:32
clarkbfungi: oh right21:32
fungisteps which i frequently forget21:32
pabelangeri think the last time we changed out logstash workers I wrote an ansible-playbook to restart firewalls, I think I added it to system-config21:33
clarkbShrews: the way to restart the firewall on those nodes is to run `service restart iptables-persistent`21:33
ianwpabelanger: my only thought is that it's quite untested on everything other than xenial?21:33
clarkbfungi: doesn't launch node automatically truncate the cache file now?21:33
clarkbfungi: I think it may, but the ssh key add will need to be done21:34
clarkbpabelanger: oh cool21:34
ianwpabelanger: maybe we should just limit it to that for now?21:34
pabelangerianw: sure, we can do it for xenial, then add it to others21:34
fungiclarkb: oh, maybe21:34
clarkbpabelanger: I don't see it, maybe it hasn't merged?21:34
pabelangerclarkb: yah, looking now21:35
clarkbShrews: anyways let me know once that is run on logstash.o.o and elasticsearch[2-7].o.o (can just ssh directly or figure out ansible)21:35
ianwpabelanger: although actually, the apply tests do run it21:35
ianwhttp://logs.openstack.org/66/507266/2/check/legacy-infra-puppet-apply-3-centos-7/5db1915/job-output.txt.gz#_2017-12-11_20_21_21_95069021:36
Shrewsclarkb: will do21:36
fungiclarkb: easiest way to be sure is to check whether the old instance continues to appear in the inventory cache file, i guess21:36
Shrewsclarkb: should these be done in any particular order?21:38
pabelangerclarkb: yah, I don't see it any more but it wasn't a big playbook. I can whip up a replacement if needed21:38
Shrewsclarkb: like logstash.o.o first, then the elasticsearch nodes?21:38
clarkbShrews: probably best if elasticsearch is done first as it's at the end of the data processing pipeline21:38
clarkbShrews: this way we don't try processing anything until the whole pipeline can talk21:38
clarkbfungi: ya there is code to make sure the inventory cache file is not out of date in launch script21:39
fungioh, good21:40
pabelangerianw: yah, your call. If you want only xenial, I can propose that.21:42
dmsimardI guess I'll go learn what the DNS stuff looks like while logstash-worker03 is installing.21:42
Shrewsclarkb: that should be 'service iptables-persistent restart', right?21:43
clarkbShrews: possibly, systemctl goes one way and service the other so I mix them up21:43
dmsimardShrews: oh, that's different from trusty to xenial21:43
clarkbdmsimard: ya21:43
dmsimardShrews: in xenial it's netfilter-persistent21:43
clarkbShrews: if your command works and mine doesn't then yours is correct :)21:44
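(A minimal sketch of doing that over plain ssh from an admin workstation or puppetmaster; the hostnames follow the [2-7] range mentioned above and the zero-padding is an assumption, adjust to match the real inventory:)
    # restart the persistent firewall on the hosts that allow the workers in
    # (these hosts are still trusty, so the service is iptables-persistent;
    # on xenial it would be netfilter-persistent)
    for host in logstash.openstack.org elasticsearch0{2..7}.openstack.org; do
        ssh "$host" sudo service iptables-persistent restart
    done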
dmsimardclarkb: where is the rackdns script ?21:46
clarkbyou mean where does things like rdns-create live?21:47
clarkbdmsimard: http://paste.openstack.org/show/628658/ is jeblairs example. It lives in the virtualenv that is sourced early in that21:47
dmsimardoh, root/rackdns-venv/21:47
Shrewsclarkb: those are done21:48
clarkbShrews: ok now we want to hop on the node itself and check the services are working, then we will swing around and do the thing fungi mentioned and remove the old instance21:49
clarkbShrews: there are 4 log worker processes that log in /var/log/logprocessor and one logstash jvm process that logs in /var/log/logstash21:49
clarkbShrews: if you tail the files in /var/log/logprocessor you should see it grabbing gearman jobs and pushing log files21:50
clarkblogstash on the other hand seems to make on-demand http connections to the elasticsearch servers, so as long as the process is running it should be fine21:50
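(A minimal sketch of those checks run on the new worker; the process name patterns in the grep are assumptions:)
    # confirm the four log worker processes and the logstash jvm are up
    ps -ef | grep -E 'logstash|logprocessor' | grep -v grep
    # watch the workers grab gearman jobs and push log files
    sudo tail -f /var/log/logprocessor/*
    # logstash itself logs less, but its log dir is worth a glance too
    sudo tail -n 50 /var/log/logstash/*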
dmsimardwhat was that about the firewall ? I think I need that too. /me reads backlog21:51
dmsimardgetting connection denied to gearman from the new worker21:51
clarkbunfortunately logstash doesn't log as well as I'd like21:51
Shrewsclarkb: yep, seeing that21:51
fungipretty ironic considering its name21:51
clarkbdmsimard: yup we use the dns names to set up firewall rules so you need to "restart" the iptables-persistent service once you are happy with the state of dns21:52
clarkbon logstash.o.o and elasticsearch[2-7].o.o21:52
fungior netfilter-persistent21:52
Shrewsthose machines are still trusty21:52
fungiahh, right-o21:52
clarkbShrews: so I think this node is happy21:52
dmsimardclarkb: hmm, so we need to change the DNS before the worker can connect to gearman ?21:52
clarkbdmsimard: correct21:52
dmsimardthus we can't really validate that it works21:52
dmsimardshould we perhaps use /etc/hosts ?21:53
dmsimardat least before changing the DNS to ensure it works21:53
fungiif we're worried about not being able to switch back and forth quickly enough, set a low ttl on the record21:53
clarkbfungi: ya that21:53
clarkbdmsimard: ^21:53
dmsimardTTLs are mostly a suggestion though21:53
dmsimardbut sure21:53
clarkbthis is also fairly specific to the logstash workers, of which we have many and which can be replaced at any time21:53
clarkbbecause elasticsearch is money grabbing for features21:54
dmsimardlol21:54
*** pabelanger_ has joined #openstack-sprint21:55
*** EmilienM_ has joined #openstack-sprint21:55
clarkbShrews: now before zuul meeting. As root on puppet master you need to ssh to logstash-worker02 and accept its ssh host key. This is so that ansible can ssh to it for puppetting21:55
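(A minimal sketch of that step; the assumption is that ansible connects as root, so the key needs to land in root's known_hosts on puppetmaster:)
    # run as root on puppetmaster: connect once so the new host key is
    # recorded in root's known_hosts (login as root@ is an assumption)
    ssh root@logstash-worker02.openstack.org hostname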
*** EmilienM has quit IRC21:56
*** pabelanger has quit IRC21:56
clarkbShrews: then for deleting the old instance, when we are happy with how the new one is functioning (seems fine to me so far)21:56
clarkbShrews: I like to do something like `openstack --os-cloud openstackci-rax --os-region DFW server show cf873928-122c-447b-ad24-d1e213d277f0` to confirm the uuid I think is the old instance is actually the old instance21:56
*** EmilienM_ is now known as EmilienM21:56
dmsimardTTL is already 300, short enough21:56
clarkbShrews: then I can change the 'show' in that command to 'delete' to delete it21:56
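(A minimal sketch of that verify-then-delete flow, reusing the UUID, cloud, and region from clarkb's example above:)
    # confirm the UUID really is the old instance before removing it
    openstack --os-cloud openstackci-rax --os-region DFW server show cf873928-122c-447b-ad24-d1e213d277f0
    # once satisfied, run the same command with 'delete' instead of 'show'
    openstack --os-cloud openstackci-rax --os-region DFW server delete cf873928-122c-447b-ad24-d1e213d277f0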
*** EmilienM has quit IRC21:56
*** EmilienM has joined #openstack-sprint21:56
Shrewsclarkb: known_hosts updated21:57
*** pabelanger_ is now known as pabelanger21:57
dmsimardwill it work if we do a rdns create/record create on a record that already exists ?21:58
clarkbdmsimard: sort of21:58
Shrewsclarkb: old server deleted. many thx for the guidance21:58
dmsimardclarkb: heh, okay, let's see.21:58
fungiyeah, having the same reverse dns for multiple systems is perfectly fine21:58
clarkbdmsimard: I walked Shrews through it above; for the reverse dns you can run the commands that launch spat out. So basically everything above line 15 in jeblair's example21:58
clarkbdmsimard: but when replacing a server it is easier to update the forward A and AAAA records through the gui21:59
Shrewsetherpad updated. now meeting21:59
clarkbdmsimard: otherwise you get a round robin between the instances21:59
clarkbShrews: thanks!21:59
dmsimardclarkb: yeah I've seen that, but for an existing node I'd tend to do a delete before the create -- or there is a record modify command, but not a rdns modify.21:59
pabelangerIIRC, rdns won't update, but will create a 2nd DNS entry21:59
Shrewsclarkb: oh, updating ansible inventory cache?21:59
clarkbShrews: launch handled that for us, built-in features21:59
Shrewscool cool cool22:00
clarkbdmsimard: ya rdns is specific to the IP address22:00
clarkbdmsimard: and the other rdns record gets removed when you delete the old instance22:00
clarkbdmsimard: whereas A and AAAA are specific to the name22:00
pabelangersorry, record-create will not update22:00
jeblairit's zuul meeting time in #openstack-meeting-alt22:00
clarkbdmsimard: so its an artifact of how DNS + rax dns service operate22:00
fungidmsimard: problem is you need to know the "record id" for it, which you can only get from the api, but the api refuses to return more than 100 records i think, and has no pagination, so you usually can't get the info you need to delete or modify a record via the api22:00
dmsimardbah22:01
dmsimardjeblair has not written a raxtty yet? :D22:01
fungifor the a/aaaa records22:01
fungii doubt jeblair has any interest in writing a client for a proprietary api22:01
dmsimardit was mostly a joke, but indeed22:01
clarkbdmsimard: so your general process here is to run the commands for reverse dns, then ignore the forward dns commands. Switch over to the rax gui using the steps I described above for Shrews and modify the A and AAAA records to point at the new IP addresses22:02
clarkbdmsimard: then once dig reports new addrs "restart" the iptables-persistent service on the nodes that firewall things (logstash.o.o and elasticsearch[2-7].o.o)22:02
dmsimardyup, I'll figure it out and report back if I have issues22:02
clarkband be very careful when modifying openstack.org records as there is no revision control and it is a shared resource :/22:03
clarkbDNS is basically the least optimal part of this whole process22:03
clarkbdmsimard: also totally happy to walk you through it step by step like I did with shrews after the zuul meeting if you like22:21
*** baoli has quit IRC22:28
*** baoli has joined #openstack-sprint22:29
*** baoli has quit IRC22:33
*** larainema has quit IRC22:45
dmsimardclarkb: DNS updated so I'll check every once in a while. Someone mentioned there was an ansible inventory somewhere ?22:52
clarkbdmsimard: there is, it is what the ansible that runs puppet uses to know what to puppet, but the launch node script automatically updates that for you so you should be fine22:53
dmsimardclarkb: oh, it was mostly to do like ansible -i inventory -m command "dig ..." :)22:54
clarkboh that, I think the default inventory will work22:54
clarkbbut default inventory has every control plane host in it so be careful22:54
dmsimardyeah, but where is it ?22:55
dmsimardoh, /etc/ansible/hosts, got it22:55
clarkbdmsimard: /etc/ansible/hosts/openstack it uses the openstack dynamic inventory thing22:55
clarkb(with a cache file that is the thing that launch-node.py updates)22:55
dmsimardansible -i /etc/ansible/hosts/openstack logstash.openstack.org,elasticsearch* --list-hosts <-- does what I wanted22:59
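(Extending that into the actual ad-hoc run is a small step; a sketch, with the command module arguments as an assumption:)
    # run the dig check across the firewall hosts from the dynamic inventory
    ansible -i /etc/ansible/hosts/openstack 'logstash.openstack.org,elasticsearch*' \
        -m command -a 'dig +short logstash-worker03.openstack.org'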
dmsimardclarkb: new logstash-worker03 is processing things \o/23:02
clarkbdmsimard: woot23:02
dmsimardso delete the old one and done ?23:03
clarkbdmsimard: so the last two steps are to make sure root accepts the host key for the new host on puppetmaster (just ssh to the host and accept it if it looks good), then delete the old one23:03
clarkbdmsimard: like I told Shrews, I like to use openstack server show $uuid to check that the uuid I have is the right one, then change show to delete to delete it23:03
*** jesusaur has quit IRC23:03
clarkbdmsimard: and you have to use uuid in this case because there are duplicate matching names23:03
dmsimardyeah23:04
dmsimardI always use UUIDs anyway, even for flavors and images23:04
dmsimardname matching is nice but..23:04
pabelangerheads up, I'm modifying /etc/puppet/hieradata for eavesdrop0123:04
pabelangertesting out, then will commit changes23:04
clarkbdmsimard: looks like 582c3ddf-a669-4c2b-bdd3-87a5ca088d0f in this case23:05
dmsimardyeah23:05
dmsimard582c3ddf-a669-4c2b-bdd3-87a5ca088d0f is deleted \o/23:06
pabelangercool23:06
dmsimardok, that was easy enough once we churned through some of the patches23:06
clarkbdmsimard: if the host key has been accepted I think that's it23:06
dmsimardI have to step away for dinner but I'll probably take a few out23:06
dmsimardclarkb: yeah, did that too.23:07
clarkbya I'm about to call it a day myself. Got up very early and expect I'll try that again to walk frickler through the rest of the process23:07
dmsimardclarkb: I'll send you a link later tonight for continuous deployment dashboard spec23:07
dmsimardno rush, just sayin23:07
*** jesusaur has joined #openstack-sprint23:09
ianwwould someone mind a quick eye on https://review.openstack.org/#/c/526975/ and i'll see about status.o.o23:10
ianwi'm also working through the puppet for nodejs and ethercalc23:10
clarkbianw: ya I can take a look before I call it a day23:10
ianwyep we were chatting yesterday, all good23:11
clarkbianw: re 526975 I think you also want to add a status group? see https://review.openstack.org/52724523:13
ianwclarkb: ok, done23:15
clarkbianw: one thing inline23:17
pabelangerokay, hieradata for the eavesdrop group works, I've committed the change23:18
jeblairi've added a grafana group to private hiera23:18
clarkbianw: +2 thanks23:19
pabelangerokay, eavesdrop server failed. running with --keep to debug and propose fixes23:20
ianwhiera will fall back to the fqdn if the group doesn't exist?23:26
clarkbianw: yes23:26
clarkbmost specific match wins23:26
clarkbin the case of status.o.o -> status01.o.o there won't be an fqdn file for status01.o.o. What I did for translate was to copy the existing translate.o.o fqdn hiera data to a group for translate23:27
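(A minimal sketch of that copy; the fqdn/ and group/ layout under the production hieradata checkout and the file names are assumptions:)
    # as root on puppetmaster
    cd /etc/puppet/hieradata/production
    # copy the per-host data into a group file, as clarkb describes doing
    # for translate; the same approach would apply to a status group
    cp fqdn/translate.openstack.org.yaml group/translate.yaml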
clarkband I've now got kids telling me it's walk time so I gotta go23:27
clarkbthanks everyone see you tomorrow23:27
jeblairhttp://grafana01.openstack.org/dashboard/db/zuul-status23:29
jeblairthat looks really promising23:29
jeblairi'll delete dns for the old server and add a cname now23:30
pabelangernice23:38
jeblairnew dns has taken effect for me23:41
jeblairi'll delete the old server tomorrow unless someone screams23:42
*** baoli has joined #openstack-sprint23:46
*** baoli_ has joined #openstack-sprint23:50
*** baoli has quit IRC23:50
pabelangerokay, I see the issue with eavesdrop0123:52
pabelangerDec 11 23:26:21 eavesdrop01 puppet-user[11794]: (/Stage[main]/Ptgbot/Exec[install_ptgbot]) Failed to call refresh: Could not find command 'pip3'23:52
pabelangerI'll start working on a fix23:52
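(The eventual fix belongs in the puppet module, but as a hedged sketch of the manual equivalent on a xenial host — the package name is assumed to be python3-pip:)
    # pip3 comes from the python3-pip package on ubuntu xenial
    sudo apt-get update && sudo apt-get install -y python3-pip
    pip3 --version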
