Thursday, 2016-02-25

00:00 <mordred> crinkle: fwiw:  delete from compute_nodes where id=21;
00:00 <crinkle> mordred: cool
00:00 <mordred> crinkle: I'm looking for pre-written scripts for doing the soft deletes
00:01 <mordred> crinkle: it seems to be ... tricky
00:01 <nibalizer> 284468
00:01 <nibalizer> logs ^^
00:05 *** cody-somerville has joined #openstack-sprint
00:20 *** dfflanders has quit IRC
00:21 *** cody-somerville has quit IRC
00:58 *** delatte has joined #openstack-sprint
01:00 *** delattec has quit IRC
01:05 *** yolanda_ has joined #openstack-sprint
01:18 *** yolanda has quit IRC
01:18 *** yolanda_ is now known as yolanda
02:40 *** sivaramakrishna has joined #openstack-sprint
03:31 *** baoli_ has quit IRC
03:40 *** baoli has joined #openstack-sprint
03:42 *** baoli has quit IRC
03:43 *** baoli has joined #openstack-sprint
03:48 *** baoli has quit IRC
04:28 *** yolanda has quit IRC
05:44 *** mrmartin has joined #openstack-sprint
05:48 *** mrmartin has quit IRC
05:54 *** cody-somerville has joined #openstack-sprint
06:30 <pabelanger> sigh
06:30 <pabelanger> https://github.com/openstack/puppet-nova/blob/stable/kilo/manifests/init.pp#L603
06:31 <pabelanger> kilo hardcodes /var/log/nova to 0750
06:31 <pabelanger> which causes apache to 404
06:31 <pabelanger> will hack on it in the morning
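[The fix pabelanger describes isn't shown in this log; a hedged sketch of the kind of override he means follows. The resource form and mode value are assumptions, not the actual patch.]

```puppet
# Hypothetical sketch only: relax /var/log/nova so apache can serve the
# logs; stable/kilo puppet-nova hardcodes 0750 (see the link above).
file { '/var/log/nova':
  ensure => directory,
  owner  => 'nova',
  group  => 'adm',
  mode   => '0755',
}
```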
06:34 *** cody-somerville has quit IRC
06:38 *** mrmartin has joined #openstack-sprint
06:53 *** imcsk8 has quit IRC
07:06 *** imcsk8 has joined #openstack-sprint
08:10 *** AJaeger has quit IRC
08:55 *** mrmartin has quit IRC
09:08 *** trusted has joined #openstack-sprint
09:10 *** trusted has quit IRC
09:10 *** sivaramakrishna has quit IRC
09:11 *** sivaramakrishna has joined #openstack-sprint
09:11 *** sivaramakrishna is now known as Guest92287
09:19 *** yolanda has joined #openstack-sprint
09:35 *** yolanda has quit IRC
09:44 *** mrmartin has joined #openstack-sprint
10:19 *** Guest92287 has quit IRC
10:30 *** _degorenko|afk is now known as degorenko
10:34 *** mrmartin has quit IRC
10:35 *** mrmartin has joined #openstack-sprint
10:47 *** mrmartin has joined #openstack-sprint
11:12 *** mrmartin has quit IRC
11:53 *** rfolco has joined #openstack-sprint
12:11 *** mrmartin has joined #openstack-sprint
12:15 *** NobodyCa1 has joined #openstack-sprint
12:17 *** NobodyCam has quit IRC
12:43 *** mrmartin has quit IRC
12:45 *** yolanda has joined #openstack-sprint
12:55 *** baoli has joined #openstack-sprint
13:26 *** baoli_ has joined #openstack-sprint
13:28 *** baoli has quit IRC
14:33 <clarkb> anyone else doing pancakes? otherwise we are heading over soon
14:38 <fungi> clarkb: i am not. trying to catch up on some work and have been eating more than i'm used to this week
14:39 <fungi> though if anyone's driving to the airport this afternoon and willing to have a passenger, i'm in need of a ride. if not, i'll probably just arrange a cab
14:42 <clarkb> I am catching the shuttle at noon
14:42 <clarkb> which is a bit early for you
14:48 <fungi> well, also i'm not booked on your shuttle, so they'd likely balk
15:09 *** NobodyCa1 is now known as Nobodycam
15:10 *** Nobodycam is now known as NobodyCam
15:13 <anteaya> morning
15:19 *** dfflanders has joined #openstack-sprint
15:21 <fungi> anteaya: how's minnesota?
15:21 <anteaya> much the same as fort collins
15:21 <anteaya> driving a rav4
15:21 <anteaya> liking it
15:22 <anteaya> looks like devstack had a problem last night: http://lists.openstack.org/pipermail/openstack-dev/2016-February/087518.html
15:22 <anteaya> sdague and jhesketh worked on it: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-02-25.log.html#t2016-02-25T11:55:42
15:22 <anteaya> and ansible was pinned in devstack-gate: https://review.openstack.org/#/c/284652/2
15:22 <anteaya> that is my summary thus far
15:25 <anteaya> and I understand that you got infra cloud connected to nodepool yesterday, yay!
15:25 <anteaya> \o/
15:25 <anteaya> nice work
15:25 *** dfflanders has quit IRC
15:27 <fungi> it was exciting to see
15:27 <nibalizer> good morning
15:28 <fungi> even if that short-lived excitement was soon replaced by new bugs to fix
15:28 <clarkb> good morning. jeblair, pabelanger and I are at the hpe lobby. anyone checked into why nodepool can't boot things yet?
15:29 <fungi> i'm just about to check out of my room and will be there shortly
15:30 <anteaya> yay
15:30 <anteaya> fungi: new bugs, yay!
15:30 <anteaya> nibalizer clarkb morning
15:32 *** yolanda has quit IRC
15:37 <nibalizer> clarkb: i have done no investigating
15:37 <nibalizer> gonna head to the hpe office in a few
15:50 <clarkb> https://www.cloudbees.com/press/cloudbees-announces-first-jenkins-based-continuous-delivery-service-aws-and-openstack-private
16:07 <nibalizer> heya
16:07 <nibalizer> we're at the security desk
16:08 <fungi> yep
16:08 <nibalizer> can has fetching?
16:08 <clarkb> pleia2 is on the way
16:09 *** yolanda has joined #openstack-sprint
16:23 <rcarrillocruz> https://review.openstack.org/#/c/284789/
16:23 <rcarrillocruz> reviews pls, fix for the set project quotas playbook
16:26 <clarkb> file=/var/lib/nova/instances/979306d6-e169-4ba8-a343-b266b4f422c1/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
16:27 <clarkb> compute029 doesn't have the updated nova.conf
16:27 <clarkb> so this may just be a matter of making sure puppet ran everywhere
16:31 <clarkb> mordred: are you around? new ansible may have broken infracloud puppeting. We don't run puppet against the infracloud hosts
16:32 <clarkb> https://jenkins05.openstack.org/job/gate-rally-dsvm-rally-cinder/848/console
16:35 <mordred> clarkb: uhm
16:36 <mordred> clarkb: I am here - would it be useful for me to dial in?
16:36 <clarkb> mordred: sure
16:37 <clarkb> we are currently talking about plans for the mid term of the clouds
16:37 <clarkb> but you are welcome to join in
16:37 <mordred> neat
16:38 <clarkb> mordred: the other issue is: remember that host that nodepool couldn't delete and you cleaned up? it seems to still count against quota
16:39 <mordred> clarkb: neat. I may need to delete more records
16:39 <clarkb> we bumped quota and are now booting nodes again successfully
16:39 <clarkb> but we have one node's worth of extra quota counted against us and it does not exist
16:41 * jeblair thinks we should just redeploy west ;)
16:42 <clarkb> jeblair: how were you checking disk IO of the running VM yesterday? I can start that up on compute029
16:46 <jeblair> iotop
16:47 <clarkb> we seem to do a consistent 1-2 megabytes per second on that VM
16:47 <clarkb> for writes
16:47 <jeblair> that's what we saw on 035 yesterday too
16:47 <clarkb> note 029 has not updated its libvirt config
16:48 <clarkb> because puppet isn't running on the infracloud things
16:48 <yolanda> infra-root, is it ok to restart gerrit? it's performing badly
16:48 <fungi> yolanda: fine by me
16:56 *** rfolco has quit IRC
16:57 <crinkle> okay I believe https://review.openstack.org/#/c/276375/ is reasonably safe to apply to west though it will cause a short downtime
16:58 <Clint> fungi: "ssh: Enable HashKnownHosts by default to try to limit the spread of ssh worms."
16:59 *** rfolco has joined #openstack-sprint
16:59 *** rfolco has quit IRC
17:00 <fungi> Clint: thanks! i feel like we should just set that globally across all our puppeted servers in fact
17:01 <Clint> i certainly do personally
17:04 <fungi> i understand the theory as to why it's sometimes a security improvement for personal systems (since an unhashed known_hosts allows an attacker to get a list of other places you may have an account) but for our servers i don't see it being anything beyond a complication
17:04 <mordred> fungi: ++
17:06 <clarkb> mordred: [DEPRECATION WARNING]: Using bare variables is deprecated. from the puppet run all log
17:06 <Clint> an attacker can get that list of places from my shell history
17:06 <clarkb> I think that is new with new ansible
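[For context: the warning clarkb quotes is Ansible 2.x deprecating bare variable references in loop keywords. A sketch of the change involved; the task content is illustrative, not the actual openstack-infra playbook.]

```yaml
# Illustrative only -- not the real run_all playbook.
# Deprecated bare-variable form that triggers the warning:
- command: do_thing {{ item }}
  with_items: my_hosts
# Quoted-template form that avoids the Ansible 2.x deprecation warning:
- command: do_thing {{ item }}
  with_items: "{{ my_hosts }}"
```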
17:06 <Clint> so all it does is irritate me
17:09 <yolanda> crinkle: is it ok now to approve https://review.openstack.org/#/c/276375/ ?
17:09 <yolanda> if that lands i can remove the management address from my patches
17:10 <clarkb> nibalizer: puppet did not run
17:11 <clarkb> but that's what it said, so it actually tried
17:11 <crinkle> yolanda: I think so
17:11 <crinkle> except i guess puppet isn't running ^
17:11 <mordred> clarkb: https://review.openstack.org/284820
17:18 <fungi> Clint: true dat
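[For reference, the Debian/Ubuntu default Clint quotes is a client-side ssh_config option; disabling it fleet-wide, as fungi and Clint prefer, would look roughly like this. The file placement is an assumption about deployment, not a documented infra config.]

```
# /etc/ssh/ssh_config sketch: hashing obscures hostnames in
# ~/.ssh/known_hosts; "no" keeps them readable for admins.
Host *
    HashKnownHosts no
```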
17:21 *** krtaylor has quit IRC
17:23 <jeblair> http://docs.openstack.org/infra/system-config/sysadmin.html#ssh-access
17:23 <jeblair> crinkle: ^
17:23 <crinkle> jeblair: ty
17:26 <jeblair> yolanda: can you see my comment on https://review.openstack.org/280326 ?
17:27 *** sivaramakrishna has joined #openstack-sprint
17:28 <yolanda> checking
17:29 <yolanda> yes i will fix
17:30 <crinkle> nibalizer: https://review.openstack.org/284830
17:30 <nibalizer> ty
17:30 <yolanda> jeblair, amended
17:31 <jeblair> yolanda: thanks
17:33 <nibalizer> fungi: 268366 and 284830
17:33 <nibalizer> sets up infracloud-root
17:33 *** krtaylor has joined #openstack-sprint
17:35 <jeblair> mordred, nibalizer, yolanda: can we chat about https://review.openstack.org/281892 and https://review.openstack.org/243399 ?
17:37 <clarkb> nibalizer: Could not find data item openstackci_infracloud_password in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:1188 on node controller00.hpuswest.ic.openstack.org
17:52 <clarkb> the rally job succeeded \o/
17:52 <clarkb> so while a little slow we are producing good results
17:57 <crinkle> \o/
17:58 <yolanda> jeblair, patches fixed and waiting for review again
18:04 *** sivaramakrishna has quit IRC
18:04 *** degorenko is now known as _degorenko|afk
18:06 <Clint> https://jenkins01.openstack.org/job/gate-tempest-dsvm-neutron-full/21745/console aborted
18:09 <fungi> Clint: did the change which triggered it get a new patchset, or was it in the gate and a change ahead of it failed so testing for it was restarted as new job runs?
18:11 <Clint> fungi: no new patchset, no explicit dependency
18:12 <fungi> was it for an approved change?
18:13 <mordred>   sensu_metrics_check: name=swift-sla-metrics plugin=metrics-os-api.py
18:13 <mordred>                        args='-S swift --scheme {{ monitoring.graphite.host_prefix }}'
18:13 <nibalizer> i have fixed up controller00's puppet i think
18:13 <yolanda> jeblair so we have port 80 in west, but it is not needed
18:13 <yolanda> rcarrillocruz sent a change to remove it for west
18:13 <rcarrillocruz> i pushed a change to disable that for west
18:13 <rcarrillocruz> y
18:13 <jeblair> yolanda, crinkle: yes -- so let's make it the same
18:14 <jeblair> pabelanger: are you doing any logs on the baremetal nodes yet?
18:14 <yolanda> jeblair so i think the way is to not use 80, and remove it from west, rather than adding it to east
18:14 <jeblair> pabelanger: or just the controller for now?
18:14 <Clint> fungi: got approved ~1h ago
18:14 <pabelanger> jeblair: just controllers, but easy to enable on computes
18:14 <pabelanger> err
18:15 <pabelanger> computes
18:15 <jeblair> pabelanger: okay, so if you're not doing it on baremetal for now, then it may be safe to turn off port 80 on baremetal
18:15 <jeblair> crinkle: sound right to you?
18:15 <mordred> https://github.com/blueboxgroup/ursula-monitoring/tree/master/sensu/plugins
18:15 <pabelanger> jeblair: Ya, a matter of just saying we want it and updating a line in puppet
18:16 <crinkle> jeblair: yolanda I don't see anything running on port 80... I'm wondering why it was there, nginx is running on 8080
18:16 <mordred> https://github.com/blueboxgroup/ursula-monitoring/blob/master/sensu/plugins/check-glance-store.py
18:16 <mordred> https://github.com/blueboxgroup/ursula-monitoring/blob/master/sensu/plugins/metrics-nova.py
18:16 <rcarrillocruz> probably a copy pasta from another definition
18:17 <crinkle> could be
18:17 <rcarrillocruz> based as a template for baremetal
18:17 <fungi> Clint: what change was it? my connection is not well suited to browsing the jenkins webui
18:17 <crinkle> jeblair: yolanda I think it should be safe to remove it
18:18 <Clint> fungi: https://review.openstack.org/#/c/271599
18:19 <fungi> Clint: oh, it's a cinder change. it's almost certainly still in the gate and a change ahead of it in the integrated queue failed and caused a gate reset
18:20 <mordred> https://github.com/blueboxgroup/ursula-monitoring/tree/master/collectd/plugins/openstack
18:20 <fungi> Clint: http://status.openstack.org/zuul/ would tell you for sure (i similarly can't easily look at that from this machine)
18:21 <Clint> fungi: yup, thanks
18:23 <crinkle> this might be useful for the compute nodes https://collectd.org/wiki/index.php/Plugin:virt
18:23 <crinkle> since sometimes the controller lies about what the computes are doing
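[The virt plugin crinkle links reads guest CPU/disk/network stats directly from libvirt on each compute node, sidestepping the controller's view. A minimal sketch of enabling it; the option values are illustrative defaults, not a tested infra config.]

```
# collectd.conf fragment (sketch): poll running domains via libvirt.
LoadPlugin virt
<Plugin virt>
  Connection "qemu:///system"
  RefreshInterval 60
  HostnameFormat name
</Plugin>
```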
18:26 <jeblair> mordred: https://review.openstack.org/281892 and https://review.openstack.org/243399
18:26 <yolanda> infra-root, an extra +2 is needed for patches on infra-cloud apart from jeblair's, care to review those?
18:35 <mordred> https://galaxy.ansible.com/openstack-infra/puppet/ btw
18:36 <clarkb> nibalizer: has puppet updated the libvirt settings on compute029 yet?
18:36 <yolanda> jeblair, resent
18:37 <clarkb> if that improves performance we can probably merge your change to bump max-servers
18:37 <nibalizer> clarkb: checking
18:38 <fungi> nibalizer: why is 268366 using depends-on instead of being rebased onto 284830 (they're in the same git repo)?
18:39 <nibalizer> clarkb: yes it ran
18:39 <clarkb> nibalizer: if you look in /etc/nova/nova.conf is something=file=unsafe present
18:39 <clarkb> ?
18:40 <clarkb> if so check the currently running VM for its qcow2 disk line in ps
18:40 <nibalizer> i don't see a vm on 029
18:40 <clarkb> scheduler is probably moving them around now
18:46 <fungi> is anyone driving to the denver airport this afternoon/evening i can bum a ride from? i don't really care what time... my flight doesn't take off until after midnight so i expect to just work from the airport for a while anyway
18:46 <fungi> if not, not a big deal, just don't want to waste money on cab fare needlessly
18:51 <crinkle> yolanda: jeblair https://review.openstack.org/284872
18:53 <pabelanger> jeblair: mordred: I asked the trystack team to publish their grafana dashboard tooling too: https://github.com/trystack/trystack-collectd as another example of how to get the data
18:54 <yolanda> crinkle, pleia2, or anyone interested, do you want to join a skype call about the move of the servers in an hour?
18:54 <nibalizer> clarkb: hrm
18:54 <jeblair> pabelanger: cool, and yeah, more collectd
18:54 <crinkle> yolanda: yes
18:54 <nibalizer> so according to openstack server show the vm is on 029
18:54 <nibalizer> but when i ps on 029 it does not exist
18:54 <nibalizer> so i wonder if we lost another one
18:54 <pleia2> yolanda: I assumed we'd do it here in the room with the phone thing
18:54 <clarkb> nibalizer: if it updated the hostname when it restarted then that seems likely
18:54 <nibalizer> clarkb: yea the vm doesn't ping
18:54 <pleia2> we can use the skype phone number on the conference call device
18:55 <yolanda> i had an invite with skype so we need to change the method
18:55 <yolanda> ah ok
18:55 <clarkb> so we may want a controlled puppet run everywhere
18:55 <pleia2> I don't use skype, I always use the phone numbers :)
18:55 <clarkb> then redo mordred's cleanups
18:55 <yolanda> let's do it with the skype phone
18:55 <crinkle> nibalizer: yep, nova hypervisor-list | grep compute029 has one up one down
18:55 <nibalizer> gg
18:55 <clarkb> skype stopped working on linux this week too
18:55 <clarkb> so ya, bad skype
18:55 <nibalizer> our mirror pings
18:56 <nibalizer> because it's a good mirror
18:56 *** AJaeger has joined #openstack-sprint
18:59 <clarkb> I am on the shuttle now so service may be spotty, but ya, just getting it everywhere then cleaning up may get us past this
19:00 <fungi> pleia2: yolanda: and we can dial the conference phone into pbx.openstack.org if there's more than one person not in the room who wants to be on the call
19:06 <crinkle> yolanda: https://review.openstack.org/#/c/277605/5/manifests/site.pp
19:09 <yolanda> jeblair, crinkle, can you revisit https://review.openstack.org/260018 ? fixed rabbit port to use ssl
19:11 *** baoli_ has quit IRC
19:22 <yolanda> fungi ok i'll share the phone number and conf and we can do it in the room, sounds good
19:27 <nibalizer> ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174 | devstack-trusty-infracloud-west-8311836        | ACTIVE | public=15.184.55.16
19:32 <yolanda> fungi thanks for approving, i'll look at puppetmaster to babysit
19:36 <clarkb> nibalizer: is that a new host with better cache?
19:37 <nibalizer> clarkb: no that's an orphaned node
19:37 <nibalizer> that needs the mordred cleanup funtime
19:37 <clarkb> ah
19:40 <clarkb> we should write up a proposal for nova to do that
19:40 <clarkb> possibly a "no really just delete it from the db please"
19:41 <clarkb> and maybe we can identify compute hosts with a unique id set in config?
19:41 <clarkb> then this only changes if we explicitly change it in config
19:42 <clarkb> I wonder how something like mesos does that
19:42 <jeblair> or generate a uuid and store it in host-local storage on boot?
19:42 <jeblair> (first boot)
19:43 <jeblair> (and if you blow away your host, you get another uuid and you need to clean up the old one, but so what?)
19:44 <clarkb> ya, as long as it is more explicit than hostname
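[One existing knob close to what clarkb describes is nova's `host` option, which overrides the hostname-derived service identity. A sketch; the value shown is a made-up placeholder.]

```ini
# /etc/nova/nova.conf sketch: pin the compute service identity explicitly
# so a hostname change does not register a second compute node.
[DEFAULT]
host = compute029-stable-id
```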
19:49 *** baoli has joined #openstack-sprint
19:50 *** krotscheck is now known as krotscheck_dcm
19:51 *** baoli has quit IRC
20:02 <nibalizer> https://review.openstack.org/#/c/284820/
20:02 <nibalizer> infra root should be able to do that
20:02 <clarkb> nibalizer we can manually remove that instance from the nodepool db too, since it is gone cloud side
20:02 <nibalizer> clarkb: ok
20:02 <nibalizer> would i have to stop nodepool?
20:02 <clarkb> that should boot a new thing with cache hopefully
20:02 <clarkb> nibalizer: no
20:02 <nibalizer> we're doing a conf call to determine the future of the gear
20:02 <clarkb> ah
20:03 <clarkb> just thinking it would be good to discover if cache mode fixes performance with io
20:03 <clarkb> but incapable of doing anything myself so don't worry about it
20:08 <jeblair> i wonder if "missing hardware" might explain some of the nodes that are unresponsive :)
20:10 <crinkle> afaik there are only two that are unresponsive https://etherpad.openstack.org/p/infra-cloud-inventory-status
20:10 <crinkle> others might not be fully deployed or have other miscellaneous errors
20:10 <mordred> it'd be great if we could get ipv6 in the new ecopod
20:10 <mordred> :)
20:11 <clarkb> +1
20:11 <mordred> and also maybe not so much with the tagged vlans
20:11 <mordred> AND
20:11 <mordred> if they're unracking everything anyway
20:11 <mordred> maybe they can do battery remediation
20:12 <clarkb> tagged is fine if we can have two nics
20:12 <mordred> yah
20:12 <clarkb> noe that we grok vlans eith glean and friemds
20:13 <mordred> what?
20:14 <clarkb> now that we grok vlans
20:14 <clarkb> the trouble before was around interface setup when needing to tag
20:15 <clarkb> but that's mostly solved and leaves us with neutron funkyness on one interface
20:25 <rcarrillocruz> folks
20:25 <rcarrillocruz> for the ansible role to create the resources baseline
20:26 <rcarrillocruz> what are the resources that are common to all clouds?
20:26 <rcarrillocruz> trying to model it
20:26 <rcarrillocruz> flavors are common to all clouds i assume
20:26 <rcarrillocruz> ?
20:26 <rcarrillocruz> anything else?
20:26 <rcarrillocruz> images
20:26 <rcarrillocruz> ...
20:28 <clarkb> flavors images users/projects/domains
20:28 <rcarrillocruz> that's it?
20:28 <rcarrillocruz> thanks
20:28 <clarkb> the mirror maybe, but I think that's a layer above
20:28 <rcarrillocruz> yeah
20:29 <mordred> crinkle (and everyone else) this: http://paste.openstack.org/show/488271 is a pseudo-code first draft of the delete script
20:29 <clarkb> networks
20:29 <clarkb> and subnets
20:30 <mordred> I took the previous script which was using pt-archiver to move things to shadow tables
20:30 <mordred> and am rewriting it to just delete things
20:30 <mordred> because screw that
20:30 <mordred> also, it'll now take a server argument which is the UUID of a server to purge from the db
20:31 <mordred> I've also got a dump of the nova db from west on my laptop so that I can practice a few times :)
20:31 <rcarrillocruz> clarkb: ?
20:31 <rcarrillocruz> i had the impression you get the networks and subnets created for you on public cloud providers
20:32 <rcarrillocruz> so it's not really common to all clouds?
20:32 <mordred> it depends
20:32 <mordred> some clouds create them for you, some do not
20:32 <rcarrillocruz> ok
20:32 <rcarrillocruz> tell you what
20:32 <mordred> same with flavors - some clouds we create them, some we do not
20:32 <rcarrillocruz> i'll model it in a yaml
20:32 <mordred> yah
20:32 <mordred> ++
20:32 <mordred> I think that's a great first step
20:32 <rcarrillocruz> with a dict as 'common_cloud_resources'
20:32 <mordred> images is a thing we want to be the same everywhere
20:32 <rcarrillocruz> those get created on all clouds defined
20:32 <rcarrillocruz> then on a cloud/region basis we can have specific additional resources
20:32 <mordred> yah
20:33 <rcarrillocruz> not mutually exclusive
20:35 <anteaya> oh good news rcarrillocruz and clarkb, dougwig and kevinbenton were sitting close to me at lunch and are willing to listen to you tell them your use case involving default security groups
20:35 <anteaya> I suggested next week might be a good time for a chat
20:36 <anteaya> and recommended they try to find you in infra
20:38 <rcarrillocruz> nice anteaya
20:38 <clarkb> denver is an airport where pre would be useful
20:38 <rcarrillocruz> mordred: fwiw, can't wait to see this coming: https://github.com/ansible/ansible/issues/13262
20:39 <rcarrillocruz> the include with_items workaround is nasty :/
20:39 <Clint> clarkb: they don't have pre?
20:39 <clarkb> I don't
20:39 <Clint> ohhh
20:39 <clarkb> portland + pre is not super useful
20:39 <clarkb> lines, what lines
20:39 <clarkb> denver on the other hand
20:42 <clarkb> also there are special baggage claims for skis
20:42 *** _degorenko|afk has quit IRC
20:43 *** _degorenko|afk has joined #openstack-sprint
20:44 <rcarrillocruz> folks, can you +2+A https://review.openstack.org/#/c/284801/ , had to rebase
20:47 <rcarrillocruz> thx fungi
20:49 <pabelanger> So, crinkle was mentioning that our puppet-infracloud jobs in the gate were pretty slow.  Did we want to review the jobs and possibly trim them?  EG: setup -nv for centos / fedora and or drop beaker jobs?
20:50 <pabelanger> since we depend on bare-precise / bare-trusty and are limited by rax
20:51 <crinkle> pabelanger: in this case they were waiting to run puppet-lint and puppet-syntax jobs
20:52 <rcarrillocruz> mordred, clarkb: http://paste.openstack.org/show/488275/
20:53 <pabelanger> crinkle: right, which is on bare-trusty nodes (rax only).  i think we could talk with fungi and convert -infra to bindep jobs (which use ubuntu-trusty nodes and are on all clouds now)
20:55 <fungi> pabelanger: the challenge i think will be that the plan so far has been to switch jobs over on a per-job-template basis (since we need to force them to specific node types), so we'd need to use slightly different templates just for the jobs using bindep if we're talking about splitting some projects to this and not others which used the same job-templates
20:56 <pabelanger> fungi: right
20:57 <fungi> that also complicates the unwinding once we finish migrating
20:57 <fungi> so would want to make sure we track what can be recombined into common job-templates again in the end
21:02 <AJaeger> fungi, yeah - I had to duplicate the tox job for moving the manual jobs in 283445
21:02 <fungi> pabelanger: also i have a feeling we're about to be not landing many puppet-infracloud changes soon for at least some weeks while all the hardware is offline
21:02 <anteaya> clarkb: yes denver is an airport where pre would be useful
21:03 <AJaeger> and right now our systems are really busy - 380 changes in the check queue...
21:03 <pabelanger> fungi: agreed.  Just looking at optimizations
21:13 <fungi> clarkb: current discussion is around whether we should be doing 802.3ad or similar and then 802.1q over that. you almost certainly have some input here
21:14 <crinkle> could someone check on the puppetmaster and see if the last puppet run on controller00 was successful? it looks like puppet has run on the controller but not on the computes, so the computes are trying to reach a rabbitmq that has moved ports
21:14 <fungi> clarkb: as opposed to putting different native vlans on the two uplinks
21:21 <crinkle> o.0 Feb 25 21:20:30 controller00 puppet-user[10150]: Could not find data item elasticsearch_nodes in any Hiera data file and no default supplied at /opt/system-config/production/manifests/site.pp:8 on node controller00.hpuswest.ic.openstack.org
21:22 <crinkle> root@controller00:~# grep elasticsearch_nodes /opt/system-config/production/hiera/common.yaml
21:22 <crinkle> elasticsearch_nodes:
21:25 <crinkle> aha http://git.openstack.org/cgit/openstack-infra/ansible-puppet/tree/defaults/main.yml#n15
21:25 *** cody-somerville has joined #openstack-sprint
21:25 <nibalizer> cody-somerville: http://paste.openstack.org/show/488279/
21:27 <yolanda> infra-root, are you familiar with this error?
21:27 <yolanda> fatal: [controller00.hpuseast.ic.openstack.org]: FAILED! => {"changed": false, "disabled": false, "error": true, "failed": true, "invocation": {"module_args": {"environment": null, "facter_basename": "ansible", "facts": null, "logdest": "syslog", "manifest": "/opt/system-config/production/manifests/site.pp", "puppetmaster": null, "show_diff": false,
21:27 <yolanda> "timeout": "30m"}, "module_name": "puppet"}, "msg": "puppet did not run", "rc": 1, "stderr": "", "stdout": "", "stdout_lines": []}
21:28 <yolanda> how can i better debug it?
21:28 <crinkle> yolanda: please see what i pasted above ^
21:28 <crinkle> the last ansible-puppet change broke the hiera config
21:29 <yolanda> crinkle there was a typo in the flag, i fixed it with https://review.openstack.org/284936
21:30 <yolanda> but it still fails for me
21:30 <fungi> nibalizer: i'm reading through lspci... two gigabit interfaces, a mellanox 10g interface... i wonder what the fourth nic is?
21:31 <yolanda> but yes, apart from that there is something with hieradata now
21:31 <crinkle> yolanda: the hiera.yaml is pointing the hiera data at   :datadir: "/etc/puppet/hieradata"
21:31 <clarkb> fungi: neutron wants its own logical interface
21:31 <clarkb> fungi: so link aggregation doesn't really help
21:31 <clarkb> you can do link aggregation but you still need two logical interfaces
21:34 <yolanda> nibalizer, can you check those hiera paths in ansible-puppet? looks as if they are not really matching our needs
21:34 <clarkb> I think the ideal would be 10GbE for neutron, and 1gig or better for control
21:34 <clarkb> and on the controller use 10GbE for image stuff
21:35 <clarkb> actually you want 10GbE for all the things because images
21:35 <fungi> clarkb: sounds like these have a 10gbe mellanox interface and then a couple of 1gbe nics
21:35 <fungi> the hardware in west anyway
21:35 <yolanda> mmm
21:35 <yolanda> :yaml:
21:35 <yolanda>   :datadir: "/opt/system-config/"
21:35 <yolanda> needs to be that, not /etc/puppet/hieradata...
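[Putting yolanda's two lines in context, the corrected hiera.yaml would look something like this. Only the :datadir: value comes from the discussion; the backends and hierarchy entries are assumed boilerplate, not the actual file.]

```yaml
# /etc/puppet/hiera.yaml sketch -- :datadir: per yolanda above,
# the rest is assumed.
:backends:
  - yaml
:yaml:
  :datadir: "/opt/system-config/"
:hierarchy:
  - common
```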
21:35 <clarkb> looks like the unhappy VM is still in nodepool and still unhappy
21:35 <Clint> fungi: if you up all 4 interfaces, how many of them have link?
21:36 <clarkb> mordred: ^ you are working on cleaning it out of the cloud? has that happened yet?
21:36 <fungi> also i have a feeling the ipmi boot isn't going to work over the 10gbe nic
21:36 <clarkb> mordred: not sure if nodepool needs encouragement or if it isn't expected to work yet
21:36 <fungi> Clint: two
21:37 <fungi> eth0 and eth1 have no carrier, eth2 and eth3 have carrier
21:37 <fungi> after ip link set up
21:37 <Clint> fungi: and which one are we using?
21:38 <fungi> eth2
21:38 <crinkle> could someone make recommendations on https://review.openstack.org/284939 ?
21:38 <Clint> so if eth3 is shared with the ilo port that might make sense
21:40 <yolanda> crinkle nibalizer https://review.openstack.org/#/c/284942
21:41 <fungi> Clint: yep, that's what i'm suspecting
21:41 <jeblair> Speed: 10000Mb/s
21:41 <jeblair> from ethtool eth2 on compute035
21:41 <fungi> Clint: because i see the ilo show up in lspci
21:42 <yolanda> and i need https://review.openstack.org/284936 and https://review.openstack.org/284938 to land as well
21:42 <fungi> jeblair: did you manually install ethtool? i was only finding mii-tool installed on compute001
21:42 <jeblair> fungi: yep
21:43 <mordred> clarkb: I'm still working on the script
21:43 <yolanda> infra-root, to fix puppet ^
21:44 <crinkle> yolanda: I +1'd all your things and abandoned my thing
21:44 <yolanda> thx
21:45 <yolanda> i did some tests changing those parameters on my controller and puppet ran
21:48 <jhesketh> Howdy
21:48 <nibalizer> yolanda: commented on 284938/1
21:48 <jeblair> jhesketh: o/
21:49 <nibalizer> yolanda: crinkle is the problem with ansible-puppet why controller00 is failing puppet?
21:49 <crinkle> nibalizer: yes
21:49 <nibalizer> cool
21:49 <crinkle> and most likely everything in infra is failing
21:50 <nibalizer> crinkle: surprisingly not
21:50 <nibalizer> probably because we do not have manage_true
21:50 <crinkle> ah
21:50 <nibalizer> er manage_config: true
21:50 <jeblair> manage_config=true but yeah
21:51 <nibalizer> in the other playbooks
21:53 <nibalizer> 0 controller00 puppet-user[11898]: Could not find data item elasticsearch_nodes in any Hiera data file and no default supplied
21:53 <nibalizer> crinkle: yolanda
21:53 <crinkle> right, that is what led to this
21:53 <crinkle> there is a hiera.yaml that points the hiera data dir to /etc/puppet/hieradata
21:54 <jeblair> (is elasticsearch_nodes something that controller00 should be attempting to look up?)
21:54 <crinkle> jeblair: it's in the public common.yaml
21:54 <nibalizer> the problem is 284942
21:54 <nibalizer> we need that
21:55 <crinkle> jeblair: http://git.openstack.org/cgit/openstack-infra/system-config/tree/manifests/site.pp#n8
21:57 <clarkb> http://www.meetup.com/OpenStack-Colorado/events/228594900/ if you want to learn about magnum and kolla
21:58 <Clint> alas, i have a prior engagement
21:58 <nibalizer> https://review.openstack.org/#/c/284954/
21:59 <clarkb> mordred: would it be a bad idea to remove that node from the nodepool db so that nodepool can boot another machine?
21:59 <clarkb> I think we should've applied the new nova compute config everywhere at this point
21:59 <yolanda> infra-root can you review https://review.openstack.org/#/c/284942/ ?
21:59 <mordred> clarkb: sure. which node is it?
22:00 <clarkb> 8311836 with uuid ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174
22:00 *** delatte has quit IRC
22:01 <mordred> clarkb: done
22:01 <mordred> clarkb: I have not yet figured out the quota thing
22:01 <clarkb> mordred: done meaning nova no longer knows about it?
22:01 <jeblair> clarkb: puppet has been broken; i'm not sure if we know whether it has been applied everywhere
22:01 <clarkb> jeblair: rgr
22:02 <mordred> clarkb: I have done the same db things to that node as I did to the previous one
22:02 <clarkb> mordred: woot thanks
22:02 <mordred> update instances set task_state='deleted', deleted=1 where uuid='ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174';
22:02 <mordred> for what it's worth
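[mordred's two-step cleanup (soft-delete the stuck instance, then purge soft-deleted rows) can be sketched in SQL. The child table shown is an assumption about what a full purge touches, not the actual script at the paste link above.]

```sql
-- Sketch only. Step 1: mark the stuck instance deleted (as above).
UPDATE instances SET task_state='deleted', deleted=1
 WHERE uuid='ed78ffbb-ce70-43a1-8c5d-b0bbbe7d8174';
-- Step 2: purge soft-deleted rows, child tables first (assumed subset;
-- a real purge would walk every table referencing the instance).
DELETE FROM instance_system_metadata
 WHERE instance_uuid IN (SELECT uuid FROM instances WHERE deleted != 0);
DELETE FROM instances WHERE deleted != 0;
```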
22:02 <yolanda> infra-root, i need reviews again for https://review.openstack.org/284938, i refactored
22:02 <jeblair> clarkb: where's the setting?
22:02 <clarkb> jeblair: /etc/nova/nova.conf grep for 'unsafe'
22:03 <jeblair> clarkb: it is applies everywhere
22:03 <jeblair> applied
22:04 <clarkb> cool
22:04 <mordred> clarkb: how many instances does nova think are in use now?
22:04 <jeblair> (confirmed with ansible adhoc grep)
22:04 <clarkb> mordred: nova? or nodepool?
22:04 <mordred> either
22:04 <clarkb> nodepool says 0
22:04 <crinkle> one (the mirror)
22:04 <crinkle> from nova's perspective
22:04 <mordred> that's in a different project
22:04 <mordred> | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL       | 17 | 894a11e0a16a4c29bb8b884c1c70bf2c | instances       |      2 |        0 |          NULL |       0 | 7dbe0f121e424a74be2eed25399e2c75 |
22:04 <mordred> | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL       | 18 | 894a11e0a16a4c29bb8b884c1c70bf2c | ram             |  16384 |        0 |          NULL |       0 | 7dbe0f121e424a74be2eed25399e2c75 |
22:04 <mordred> | 2016-02-24 20:20:24 | 2016-02-25 18:12:57 | NULL       | 19 | 894a11e0a16a4c29bb8b884c1c70bf2c | cores           |     32 |        0 |          NULL |       0 | 7dbe0f121e424a74be2eed25399e2c75 |
22:04 <mordred> that's the quota usage for the nodepool project
22:04 <mordred> I think I'd like to set instance, ram and core usage to 0
22:05 <mordred> I have now done that
22:05 <mordred> I think quotas should be correct now
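[The reset mordred describes, reconstructed as hedged SQL against the quota_usages rows pasted above; the exact statement he ran is not in the log, so this is an assumption.]

```sql
-- Sketch: zero the stale usage counters for the nodepool project,
-- matching the three rows shown above.
UPDATE quota_usages SET in_use = 0
 WHERE project_id = '894a11e0a16a4c29bb8b884c1c70bf2c'
   AND resource IN ('instances', 'ram', 'cores');
```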
22:07 <clarkb> the next VM booted should use cache unsafe on its boot disk
22:07 <mordred> woot
22:07 <clarkb> you can confirm this by running ps -elf on that process
22:07 <clarkb> and part of the command is the string used for the root disk
22:07 <clarkb> it was saying cache=none, should say cache=unsafe
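[The check clarkb describes boils down to grepping the qemu command line for its cache mode; a sketch using an illustrative (not live-captured) disk argument.]

```shell
# Sketch: the disk argument from `ps -elf | grep qemu` looks like the
# sample below; extract the cache mode from it.
disk_arg='file=/var/lib/nova/instances/979306d6-e169-4ba8-a343-b266b4f422c1/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=unsafe'
echo "$disk_arg" | grep -o 'cache=[a-z]*'
# -> cache=unsafe
```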
Clintclarkb: so it hasn't yet today?22:08
clarkbno because puppet wasn't running on the computes22:08
clarkbthen it ran and it broke the single VM which made nodepool stop making new ones22:09
Clintahh22:09
fungifragility, thy name is openstack22:11
clarkbLaunchStatusException: Server ca1453ce-23bb-41e7-ba8c-9e44415f0064 for node id: 8316914 status: ERROR22:12
clarkbso thats neat22:12
mordredmmm22:12
mordredthat's a great status22:12
crinkleclarkb: run nova hypervisor-list22:12
jeblairclarkb: yeah, rabbitmq is broken because of the puppet break22:12
clarkbaha22:12
jeblairi think all puppet-fixing patches have been aprvd22:13
jeblairbut only just now22:13
nibalizer28493822:14
nibalizer28495422:15
Clintfull of lies22:15
jeblair936, 938 and 94222:15
*** delatte has joined #openstack-sprint22:25
yolandainfra-root, i need https://review.openstack.org/#/c/284447/22:30
yolandahttps://review.openstack.org/#/c/284447/ , for controller to work22:31
*** delattec has joined #openstack-sprint22:33
mordredhttps://review.openstack.org/28496922:35
mordredclarkb, crinkle: ^^22:35
mordredthat should be usable to delete an instance from the db - and to clean out soft-deleted records22:35
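Background on the "soft-deleted records" part: nova does not remove rows on delete; it marks them by setting the `deleted` column (conventionally to the row's own id, to preserve unique constraints) plus a `deleted_at` timestamp. Cleaning them out means deleting rows where `deleted != 0`. A reduced sketch with sqlite3 (the actual script in review 284969 is not quoted in the log):

```shell
rm -f /tmp/nova_sim.db
sqlite3 /tmp/nova_sim.db <<'EOF'
CREATE TABLE instances (id INTEGER, deleted INTEGER DEFAULT 0);
INSERT INTO instances VALUES (20, 0);   -- live instance
INSERT INTO instances VALUES (21, 21);  -- soft-deleted (deleted = id)
-- purge soft-deleted records
DELETE FROM instances WHERE deleted != 0;
EOF
sqlite3 /tmp/nova_sim.db "SELECT id FROM instances;"
```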
yolandacrinkle, i fixed it... and also changed https://review.openstack.org/#/c/284881/ to properly send neutron in west; that change needs to land22:36
*** delatte has quit IRC22:36
crinkleyolanda: commented on 88122:37
mordredperhaps we should just make a cron to install that script and run it every 15 minutes?22:39
mordredwe pretty much never want to keep actual deleted records around22:39
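The cron mordred suggests would boil down to a crontab entry along these lines; the script name and path are assumptions, since only "install that script and run it every 15 minutes" is stated:

```
# /etc/cron.d/nova-purge-deleted (hypothetical path and script name)
*/15 * * * * root /usr/local/bin/nova-purge-soft-deleted
```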
crinklepuppet is still broken, hiera.yaml still has wrong datadir22:40
crinkleoh, possibly because 284936 merged after the last puppet run22:41
fungii now have a cab booked departing to den from the hilton garden inn at 5:30pm local if anyone needs to share a ride22:46
jeblairmordred: yeah, add it to puppet-infracloud... :|22:47
crinklepuppet looks fixed22:50
* crinkle crosses fingers22:50
jeblaircrinkle: \o/22:50
yolandainfra-root, this change https://review.openstack.org/#/c/284447/ is blocking addition of controller in useast... can you review ?22:56
crinklepuppet successfully ran on controller00, was it able to start on the computes?22:58
crinklenibalizer: yolanda ^22:58
crinklemy reading of puppetboard makes it look like controller hasn't submitted a report in a couple of hours22:59
yolandaand this https://review.openstack.org/284881 needs to land first23:00
*** delatte has joined #openstack-sprint23:01
crinkleI don't think 881 and 447 will pass puppet apply separately, I think they need to be in the same change23:02
*** delattec has quit IRC23:03
yolandainfra-root, i abandoned 881 and added west to https://review.openstack.org/28444723:08
nibalizercrinkle: we failed to post facts23:10
nibalizerso no23:10
nibalizerim debugging23:10
crinklenibalizer: :'(23:10
nibalizerweeping uncontrollably23:12
fungithanks yolanda23:12
funginibalizer: *sniffle*23:12
yolandajeblair, can you add another +2 to 28444723:18
jeblairnibalizer: https://review.openstack.org/28434623:19
mordredjeblair: shouldn't it reference remote_puppet_adhoc?23:22
mordredoh. piddle. that didn't land yet because linters23:23
mordredsilly me - I got indentation of the shellscript wrong23:24
mordredhttps://review.openstack.org/#/c/284352/23:24
yolandathx jeblair23:25
yolandajeblair http://paste.openstack.org/show/488292/23:38
rcarrillocruzhuh23:38
rcarrillocruzmordred: there's no os_domain module23:38
rcarrillocruz?23:38
* rcarrillocruz sad panda23:38
mordredrcarrillocruz: https://github.com/ansible/ansible-modules-extras/blob/devel/cloud/openstack/os_keystone_domain.py23:39
rcarrillocruzaha!23:39
rcarrillocruzthanks23:39
mordreda domain is an admin-only thing, so  it gets prefixed with the name of the service23:39
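A minimal use of the module mordred links could look like the following ansible task; the domain name, description, and `cloud` entry are made-up values for illustration:

```yaml
- name: Ensure an identity domain exists
  os_keystone_domain:
    cloud: mycloud        # assumed entry in clouds.yaml
    name: infra
    description: Infra cloud domain
    state: present
```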
rcarrillocruzany reason for 'keystone' in middle23:40
rcarrillocruzah, see the pattern23:40
rcarrillocruzsomething i see there's missing is an os_quota thing23:40
rcarrillocruz?23:40
rcarrillocruzin that case, would that be os_server_quota23:40
rcarrillocruzos_port_quota23:40
rcarrillocruzetc23:40
rcarrillocruzor os_quota with the resource passed in as param23:41
rcarrillocruzthinking about potential streams of work for myself23:41
rcarrillocruzmordred: ^23:43
mordredhrm23:44
mordredI think probably os_nova_quota23:44
mordredagain - it's an admin thing, so you know you're setting nova quotas23:44
rcarrillocruzerm, yeah23:45
mordredI think we could also have the ansible modules do validation that the project_id you're passing in is a valid project_id23:45
crinklehave we put together an etherpad yet for our networking request? and/or could someone start that?23:49
crinkleclarkb: can you review and help expand on the bottom of https://etherpad.openstack.org/p/mitaka-infra-midcycle so we can send it as an email to the dc ops team?23:53
*** krtaylor has quit IRC23:58

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!