Tuesday, 2016-12-13

*** kevinz has joined #openstack-nimble00:50
*** kevinz has quit IRC00:58
*** kevinz has joined #openstack-nimble01:39
*** kevinz has quit IRC02:34
*** kevinz has joined #openstack-nimble02:40
liushengzhenguo: ping02:57
zhenguoliusheng: pong02:57
liushengzhenguo: a question, we don't need to configure the [pxe] section in our devstack installation?02:58
zhenguoliusheng: seems not02:58
liushengzhenguo: in ironic.conf02:58
liushengzhenguo: why02:58
zhenguoliusheng: it has default values02:59
zhenguoliusheng: we should not change it in devstack env03:00
liushengzhenguo: my local devstack can also create instances now. I just compared the ironic.conf in my devstack and in the tempest job; the obvious difference is that the tempest job's ironic.conf didn't configure this section03:00
zhenguoliusheng: oh, you mean your env is ok now?03:01
liushengzhenguo: yes, but I have re-installed my devstack in a vm, and it is ok. It seems it can't work in a devstack installed on a physical server.03:02
liushengzhenguo: it is strange03:02
zhenguoliusheng: hah03:03
liushengzhenguo: I have checked the ironic.conf both in my env and in another ironic tempest job; it has the [pxe] section, but our tempest job doesn't include it03:04
zhenguoliusheng: I just checked the tempest ironic.conf, it has a pxe section03:06
liushengzhenguo: oh, my mistake :(03:07
liushengzhenguo: yes, it has03:07
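
A quick way to make the comparison above systematic is to diff the [pxe] sections programmatically; a minimal sketch (the file paths are placeholders, not the job's real paths):

```python
# Diff the [pxe] sections of two ironic.conf files, e.g. the local
# devstack copy vs. one collected from the tempest job logs.
import configparser

def pxe_section(path):
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(path)
    return dict(cfg['pxe']) if cfg.has_section('pxe') else {}

local = pxe_section('/etc/ironic/ironic.conf')  # placeholder path
job = pxe_section('tempest-job-ironic.conf')    # placeholder path
for key in sorted(set(local) | set(job)):
    if local.get(key) != job.get(key):
        print('%s: %r != %r' % (key, local.get(key), job.get(key)))
```
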
zhenguoliusheng: but I'm not sure why it doesn't work03:07
liushengzhenguo: it's driving me crazy :(03:08
zhenguoliusheng: can you read more information from the console log03:08
liushengzhenguo: do you think it is related to "crazy"03:08
liushengenabled_drivers = fake,agent_ssh,agent_ipmitool,pxe_ssh,pxe_ipmitool03:08
zhenguoliusheng: lol03:08
liushengthis config option ?03:08
zhenguoliusheng: no, the ironic node console log03:08
liushengzhenguo: I cannot find useful info from the console log03:09
zhenguoliusheng: does it have a DHCP process03:09
liushengzhenguo: it is enabled_drivers = fake,agent_ssh,agent_ipmitool in our tempest job, but enabled_drivers = fake,agent_ssh,agent_ipmitool,pxe_ssh,pxe_ipmitool in my env03:10
zhenguoliusheng: seems not related, as we only use agent_ssh.03:10
liushengzhenguo: you mean neutron-dhcp-agent ?03:11
zhenguoliusheng: no,03:11
zhenguoliusheng: I mean when the node starts, it should have a DHCP process; I want to know if it can get an IP from neutron03:12
zhenguoliusheng: I still suspect it's a network problem03:12
liushengzhenguo: I don't know how to confirm; the ironic-conductor's log just says "timeout for waiting call-back"03:13
zhenguoliusheng: the only way is to get some clues from the ironic-bm-logs03:13
zhenguoliusheng: you can check the ironic-bm-logs in your env03:17
zhenguoliusheng: which VIM plugin do you use to make it human readable?03:18
liushengzhenguo: Nothing to boot: No such file or directory (http://ipxe.org/2d03e13b03:19
liushengNo more network devices03:19
liushengPress Ctrl-B for the iPXE command line...03:19
liushengNo bootable device.03:19
liushengzhenguo: you can use less -R {filename} to read03:19
zhenguoliusheng: ok, thanks03:19
liushengzhenguo: or install the plugin AnsiEsc03:19
liushengzhenguo: http://vim.sourceforge.net/scripts/script.php?script_id=30203:20
liushengzhenguo: the above is the useful info from the console log03:20
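
Besides less -R and the AnsiEsc plugin, the escape codes can simply be stripped; a small sketch:

```python
# Strip ANSI escape sequences from an ironic-bm-logs console file so
# it is readable in any editor. Usage: python strip_ansi.py <logfile>
import re
import sys

ANSI_RE = re.compile(r'\x1b\[[0-9;?]*[A-Za-z]')

with open(sys.argv[1], errors='replace') as f:
    sys.stdout.write(ANSI_RE.sub('', f.read()))
```
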
zhenguoliusheng: seems it's still a network problem03:22
zhenguoliusheng: we don't get an IP from the neutron network's DHCP03:22
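
One way to check that from the neutron side is to confirm the tenant network actually has active DHCP ports; a sketch using openstacksdk (the cloud name is an assumption about the local clouds.yaml):

```python
# List neutron's DHCP ports to verify DHCP is available on the
# network the node boots from.
import openstack

conn = openstack.connect(cloud='devstack')  # assumed clouds.yaml entry
for port in conn.network.ports(device_owner='network:dhcp'):
    print(port.network_id, port.status, port.fixed_ips)
```
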
liushengzhenguo: hmm, maybe it can't support dynamically creating networks in the tempest job :(03:25
zhenguoliusheng: I remember the Ironic devstack plugin also sets other options for the tempest network03:25
zhenguoliusheng: no, we have already created the network03:25
zhenguoliusheng: some other options03:25
liushengzhenguo: maybe you can help; I haven't gotten my foot in the door of ironic yet, lol03:26
zhenguoliusheng: haha. sure03:27
zhenguoliusheng: ironic's tempest job can work, and it really creates and deletes instances03:39
zhenguoliusheng: the best way is to follow it03:39
liushengzhenguo: yes, but I don't know what the difference between ironic's and nimble's jobs is03:40
*** kevinz has quit IRC03:58
*** kevinz has joined #openstack-nimble04:50
*** kevinz has quit IRC06:14
zhenguoliusheng: I will dig into the tempest failure after the taskflow work; maybe you can do other things first06:37
liushengzhenguo: thank you a lot! :)06:37
zhenguoliusheng: np :D06:38
zhenguoliusheng: Alex mentioned that nova has a config option for names, maybe we can follow that approach06:38
zhenguoliusheng: but on the db side, we should not set name as unique; please continue the patch06:39
liushengzhenguo: OK, got it06:39
zhenguoliusheng: thanks06:39
*** yuntongjin has joined #openstack-nimble06:55
shaohe_fengzhenguo: hi06:57
zhenguoshaohe_feng: hi06:57
shaohe_fengzhenguo: I create the etherpad, you can add it to wiki. https://etherpad.openstack.org/p/nimble-task06:58
shaohe_fengzhenguo: and I have seen you are working on configdrive06:58
*** kevinz has joined #openstack-nimble06:58
shaohe_fengzhenguo: so I will work on the quotas.06:58
zhenguoshaohe_feng: ok, thanks06:59
zhenguoshaohe_feng: I have taken over the taskflow work06:59
shaohe_fengzhenguo: so many tasks for you.06:59
shaohe_fengzhenguo: you can update the etherpad.07:00
zhenguoshaohe_feng: I found a way to revert tasks, will update soon; hope the create taskflow work will be done by tomorrow.07:00
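
For context, taskflow's revert mechanism works roughly like this: when a task raises, the engine calls revert() on the tasks that already completed, in reverse order. A minimal sketch with illustrative tasks (not Nimble's actual ones):

```python
from taskflow import engines, task
from taskflow.patterns import linear_flow


class BuildNetwork(task.Task):
    def execute(self):
        print('allocating network ports')

    def revert(self, *args, **kwargs):
        print('releasing network ports')  # undo what execute() did


class DeployNode(task.Task):
    def execute(self):
        raise RuntimeError('deploy failed')  # forces the revert path


flow = linear_flow.Flow('create-instance').add(BuildNetwork(), DeployNode())
engines.run(flow)  # BuildNetwork.revert() runs, then the error propagates
```
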
shaohe_fengzhenguo: so I can help to work on the configdrive07:01
shaohe_fengzhenguo: OK. another questions.07:01
shaohe_fengzhenguo: if the delete api deletes from the DB07:01
shaohe_fengzhenguo: and the create is still in progress07:01
zhenguoshaohe_feng: sure, you can work on that first :P07:01
zhenguoshaohe_feng: I will focus on create/delete refactor task07:01
zhenguoshaohe_feng: it's a tricky one07:02
shaohe_fengzhenguo: something goes wrong with the nimble daemon07:02
zhenguoshaohe_feng: maybe we need to add a lock07:02
shaohe_fengzhenguo: the nimble daemon restarts07:02
shaohe_fengzhenguo: I mean the nimble daemon running "create instance" restarts07:03
shaohe_fengzhenguo: how does it know what resources it should free?07:03
zhenguoshaohe_feng: not sure, I will refactor the how process07:04
zhenguoshaohe_feng: please check tomorrow, hah07:04
zhenguos/how/whole07:04
shaohe_fengzhenguo: so resources such as networks and volumes will become zombies07:04
shaohe_fengzhenguo: who will reap07:05
shaohe_fengzhenguo: who will reap them?07:05
zhenguoshaohe_feng: yes, nobody will remove them07:05
zhenguoshaohe_feng: and as the instance has been deleted, you can't even find the network information07:05
shaohe_fengzhenguo: can we avoid the zombie resource?07:06
zhenguoshaohe_feng: you mean the scenario of deleting an instance while create is still in progress07:06
zhenguoshaohe_feng: or when creating an instance failed07:07
shaohe_fengzhenguo: yes.07:08
zhenguoshaohe_feng: do you think it makes sense to prevent deleting while we are in the building process07:09
shaohe_fengzhenguo: prevent is a simple design.07:11
zhenguoshaohe_feng: yes, but simple doesn't mean bad, hah07:12
shaohe_fengzhenguo: agree.07:14
zhenguoshaohe_feng: why would you want to delete it when it's still in the building process07:14
zhenguoshaohe_feng: if we can make sure it will not stay in the building process forever07:14
shaohe_fengzhenguo: yes, we need a scenario.07:14
shaohe_fengzhenguo: https://wiki.openstack.org/wiki/Nimble#Task_track07:15
zhenguoshaohe_feng: yes, until someone requests us to do that, I think just preventing deletion is ok07:15
shaohe_fengzhenguo: let me discuss it with ZangRui.07:15
shaohe_fengzhenguo: OK.07:15
zhenguoshaohe_feng: thanks07:15
zhenguoshaohe_feng: and when an instance is in the building process and a delete request comes in, do you think we should just return an error, or wait for the process to finish and then delete it07:18
shaohe_fengzhenguo: an error is simple.07:37
shaohe_fengzhenguo: returning 202 with a background task to delete is also OK.07:38
zhenguoshaohe_feng: I think we can discuss it more when the deleting refactor is in progress.07:39
zhenguoshaohe_feng: after I finished the create taskflow work07:39
shaohe_fengzhenguo: OK, another idea: add a "deleted" field in the DB, with a default value of "False".07:41
shaohe_fengzhenguo: when a delete API request comes in, if the status is error, deploying, or finished, just reap the resources and delete the instance.07:42
shaohe_fengzhenguo: if the status is building, just mark the instance's "deleted" field in the DB as "True"07:43
shaohe_fengzhenguo: and the create task checks this "deleted" field and starts the revert.07:44
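
A sketch of that design (names are illustrative, not actual Nimble code): the delete API flips the flag, and each task in the create flow re-checks it so taskflow's revert chain fires early instead of after all resources are built:

```python
from taskflow import task


class InstanceDeleted(Exception):
    """Raised inside the flow to trigger taskflow's revert chain."""


def instance_get(instance_id):
    """Hypothetical DB accessor standing in for the real DB API."""
    raise NotImplementedError


class BuildVolumes(task.Task):
    def execute(self, instance_id):
        if instance_get(instance_id).deleted:  # the soft "deleted" column
            raise InstanceDeleted(instance_id)
        # ... actually create the volumes here ...

    def revert(self, instance_id, *args, **kwargs):
        """Free whatever this task managed to create."""
```
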
zhenguoshaohe_feng: seems ok07:44
zhenguoshaohe_feng: besides this, I also want to add a deleted field to the db07:45
zhenguoshaohe_feng: as users may want to check usage information for last month's instances07:45
shaohe_fengzhenguo: if the nimble daemon restarts, at the initialization phase it can check the status and "deleted" field of all instances and then reap the zombie resources.07:46
shaohe_fengzhenguo: let me check the nova code to see if there is a "deleted" field on instances07:47
zhenguoshaohe_feng: do you think it's a bit wasteful if we create all the resources and just delete them right after?07:47
zhenguoshaohe_feng: there are deleted fields in all nova, cinder, glance tables.07:48
shaohe_fengzhenguo: maybe users will have regrets after they create the instance for some reason, maybe some mistake.07:49
zhenguoshaohe_feng: in every task in the create instance flow, we should check if it has been deleted and raise an InstanceNotFound exception to trigger the revert work, instead of waiting for all resources to be created07:50
shaohe_fengzhenguo: such as using the wrong key to inject, or other reasons.07:50
shaohe_fengzhenguo: agree.07:50
zhenguoshaohe_feng: yes, we should gracefully handle users' requests instead of just returning a Forbidden07:51
zhenguoshaohe_feng: do you know whether we need to proved floating ip associate/unassociate API? or neutron API can do that07:53
zhenguos/proved/provide07:53
shaohe_fengzhenguo: https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/models.py#L196 nova marks many models with deleted.07:55
zhenguoshaohe_feng: yes07:57
shaohe_fengzhenguo: the neutron API does that; does that mean nimble always requests floating ips from neutron?07:58
zhenguoshaohe_feng: yes07:58
zhenguoshaohe_feng: so we don't need to provide a separate API?07:59
zhenguoshaohe_feng: I know we can allocate floating IPs from neutron, but can neutron also provide the instance associate API?08:00
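
It can: associating is just an update of the floating IP's port_id through the neutron API, so a separate Nimble API is not strictly needed. A sketch with python-neutronclient (the session and IDs are placeholders):

```python
from neutronclient.v2_0 import client

keystone_session = ...                        # existing keystoneauth1 session
external_net_id = '<external-network-uuid>'   # placeholder
instance_port_id = '<instance-port-uuid>'     # placeholder

neutron = client.Client(session=keystone_session)
fip = neutron.create_floatingip(
    {'floatingip': {'floating_network_id': external_net_id}})

# associate: point the floating IP at the instance's neutron port
neutron.update_floatingip(fip['floatingip']['id'],
                          {'floatingip': {'port_id': instance_port_id}})

# disassociate: clear the port_id again
neutron.update_floatingip(fip['floatingip']['id'],
                          {'floatingip': {'port_id': None}})
```
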
shaohe_fengzhenguo: we need to dig into how nova uses the deleted information08:01
shaohe_fengzhenguo: https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2075  you can see the delete and soft_delete description08:01
*** yuntongjin has quit IRC08:02
zhenguoshaohe_feng: yes, but it seems we don't need soft_delete08:02
shaohe_fengzhenguo: 'deleted' - only return (or exclude) deleted instances08:03
zhenguoshaohe_feng: yes, it's a query option08:03
zhenguoshaohe_feng: if you specify the deleted parameter when querying instances, it just returns the deleted instances08:04
zhenguoshaohe_feng: but I remember someone said that nova will remove all "deleted" parameters from its API08:05
shaohe_fengzhenguo: yes, but we need to know more about how "deleted" takes effect.08:06
zhenguoshaohe_feng: yes,08:07
shaohe_fengzhenguo: Oh, so we need to check it. If nova removes all "deleted" parameters, should we still support it?08:07
shaohe_fengzhenguo: does RuiChen know about this "deleted" field in the DB?08:08
zhenguoshaohe_feng: I am also worried about that; maybe you can ask Alex about it08:08
shaohe_fengzhenguo: OK. If it removes all "deleted" parameters from the API, does it also remove the field from the DB?08:08
zhenguoshaohe_feng: not sure08:09
*** kevinz has quit IRC08:13
* zhenguo brb08:18
*** yuntongjin has joined #openstack-nimble08:38
zhenguoshaohe_feng, liusheng: I found an issue when using node_cache: if many create requests are received, it seems they all get scheduled to the same node :(08:51
zhenguowe should add a threading lock around access to the node_cache, and after scheduling remove the selected node from the cache.08:57
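
A minimal sketch of that fix (names illustrative): guard the cache with a threading.Lock and claim the chosen node inside the critical section, so concurrent create requests can't all pick the same one:

```python
import threading

_cache_lock = threading.Lock()
_node_cache = {}  # node uuid -> node record


def node_matches(node, filters):
    """Hypothetical placement check against the request's filters."""
    return True


def pick_node(filters):
    with _cache_lock:
        for uuid, node in list(_node_cache.items()):
            if node_matches(node, filters):
                del _node_cache[uuid]  # claim it so nobody else reuses it
                return node
    return None  # nothing suitable left in the cache
```
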
liushengzhenguo: maybe we will support multiple workers?08:58
zhenguoliusheng: you mean nimble-engine?08:59
liushengzhenguo: yes08:59
zhenguoliusheng: yes, but not sure whether we need to support active-active mode08:59
zhenguoliusheng: maybe just active and standby?08:59
liushengzhenguo: for most other openstack services, multi-worker is a-a mode, right?09:01
zhenguoliusheng: yes09:01
liushengzhenguo: if we support multiple workers in a-a mode with node_cache, we may need an external lock mechanism09:02
zhenguoliusheng: I think we can use one worker for now, as I think the biggest baremetal cluster is about 4000 nodes09:03
zhenguoliusheng: one worker should be enough to handle that09:04
liushengzhenguo: hah, hope so09:04
zhenguoliusheng: hah, if there are more, we can change it in the future09:04
zhenguoliusheng: but if there is more than one worker, I think we can't use a cache and should store the nodes in the DB09:05
liushengzhenguo: ok, yes, the DB naturally supports a lock mechanism09:06
zhenguoliusheng: yes09:06
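
If that route is taken, SELECT ... FOR UPDATE is the usual DB-side serialization; a SQLAlchemy sketch (the model is illustrative, not Nimble's actual schema):

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Node(Base):
    """Illustrative nodes table."""
    __tablename__ = 'nodes'
    uuid = sa.Column(sa.String(36), primary_key=True)
    reserved_by = sa.Column(sa.String(36), nullable=True)


def reserve_node(session, instance_uuid):
    """Atomically claim a free node; the row lock is held until commit."""
    with session.begin():
        node = (session.query(Node)
                .filter_by(reserved_by=None)
                .with_for_update()
                .first())
        if node is not None:
            node.reserved_by = instance_uuid
        return node
```
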
*** yuntongjin has quit IRC09:19
*** Kevin_Zheng has quit IRC10:47
*** RuiChen has quit IRC11:06
openstackgerritZhenguo Niu proposed openstack/nimble: [WIP] Add create instance taskflow  https://review.openstack.org/40355512:01
*** liusheng has quit IRC12:17
openstackgerritZhenguo Niu proposed openstack/nimble: Add create instance taskflow  https://review.openstack.org/40355512:47
-openstackstatus- NOTICE: Launchpad SSO is not currently working, so logins to our services like review.openstack.org and wiki.openstack.org are failing; the admins at Canonical are looking into the issue but there is no estimated time for a fix yet.16:24
*** ChanServ changes topic to "Launchpad SSO is not currently working, so logins to our services like review.openstack.org and wiki.openstack.org are failing; the admins at Canonical are looking into the issue but there is no estimated time for a fix yet."16:24
*** ChanServ changes topic to "Bugs: bugs.launchpad.net/nimble | Review: https://review.openstack.org/#/q/project:openstack/nimble,n,z"17:01
-openstackstatus- NOTICE: Canonical admins have resolved the issue with login.launchpad.net, so authentication should be restored now.17:01

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!