Tuesday, 2016-12-13

*** kevinz has joined #openstack-nimble00:50
*** kevinz has quit IRC00:58
*** kevinz has joined #openstack-nimble01:39
*** kevinz has quit IRC02:34
*** kevinz has joined #openstack-nimble02:40
liushengzhenguo: ping02:57
zhenguoliusheng: pong02:57
liushengzhenguo: a question, we don't need to configure the [pxe] section in our devstack installation?02:58
zhenguoliusheng: seems not02:58
liushengzhenguo: in ironic.conf02:58
liushengzhenguo: why02:58
zhenguoliusheng: it has default values02:59
zhenguoliusheng: we should not change it in devstack env03:00
liushengzhenguo: my local devstack can also create instances now. I just compared the ironic.conf in my devstack and in the tempest job; the obvious difference is that the tempest job's ironic.conf didn't configure this section03:00
zhenguoliusheng: oh, you mean your env is ok now?03:01
liushengzhenguo: yes, but I have re-installed my devstack in a vm, and it is ok. It seems it can't work in a devstack installed on a physical server.03:02
liushengzhenguo: it is strange03:02
zhenguoliusheng: hah03:03
liushengzhenguo: I have checked the ironic.conf both in my env and in another ironic tempest job; it has the [pxe] section, but our tempest job doesn't include it03:04
zhenguoliusheng: I just checked the tempest ironic.conf, it has a pxe section03:06
liushengzhenguo: oh, my mistake :(03:07
liushengzhenguo: yes, it has03:07
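
A quick way to make the comparison above systematic is to diff the [pxe] sections programmatically; a minimal sketch (the file paths are placeholders, not the job's real paths):

```python
# Diff the [pxe] sections of two ironic.conf files, e.g. the local
# devstack copy vs. one collected from the tempest job logs.
import configparser

def pxe_section(path):
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(path)
    return dict(cfg['pxe']) if cfg.has_section('pxe') else {}

local = pxe_section('/etc/ironic/ironic.conf')  # placeholder path
job = pxe_section('tempest-job-ironic.conf')    # placeholder path
for key in sorted(set(local) | set(job)):
    if local.get(key) != job.get(key):
        print('%s: %r != %r' % (key, local.get(key), job.get(key)))
```
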
zhenguoliusheng: but I'm not sure why it doesn't work03:07
liushengzhenguo: it's driving me crazy :(03:08
zhenguoliusheng: can you read more information from the console log03:08
liushengzhenguo: do you think it is related to "crazy"03:08
liushengenabled_drivers = fake,agent_ssh,agent_ipmitool,pxe_ssh,pxe_ipmitool03:08
zhenguoliusheng: lol03:08
liushengthis config option ?03:08
zhenguoliusheng: no, the ironic node console log03:08
liushengzhenguo: I cannot find useful info from the console log03:09
zhenguoliusheng: does it have a DHCP process03:09
liushengzhenguo: it is enabled_drivers = fake,agent_ssh,agent_ipmitool in our tempest job, but enabled_drivers = fake,agent_ssh,agent_ipmitool,pxe_ssh,pxe_ipmitool in my env03:10
zhenguoliusheng: seems not related, as we only use agent_ssh.03:10
liushengzhenguo: you mean neutron-dhcp-agent ?03:11
zhenguoliusheng: no,03:11
zhenguoliusheng: I mean when the node starts, it should have a DHCP process; I want to know if it can get an IP from neutron03:12
zhenguoliusheng: I still suspect it's a network problem03:12
liushengzhenguo: I don't know how to confirm; the ironic-conductor's log just says "timeout for waiting call-back"03:13
zhenguoliusheng: the only way is to get some clues from the ironic-bm-logs03:13
zhenguoliusheng: you can check the ironic-bm-logs in your env03:17
zhenguoliusheng: which VIM plugin do you use to make it human readable?03:18
liushengzhenguo: Nothing to boot: No such file or directory (http://ipxe.org/2d03e13b03:19
liushengNo more network devices03:19
liushengPress Ctrl-B for the iPXE command line...03:19
liushengNo bootable device.03:19
liushengzhenguo: you can use less -R {filename} to read03:19
zhenguoliusheng: ok, thanks03:19
liushengzhenguo: or install the plugin AnsiEsc03:19
liushengzhenguo: http://vim.sourceforge.net/scripts/script.php?script_id=30203:20
liushengzhenguo: the above is the useful info from the console log03:20
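
Besides less -R and the AnsiEsc plugin, the escape codes can simply be stripped; a small sketch:

```python
# Strip ANSI escape sequences from an ironic-bm-logs console file so
# it is readable in any editor. Usage: python strip_ansi.py <logfile>
import re
import sys

ANSI_RE = re.compile(r'\x1b\[[0-9;?]*[A-Za-z]')

with open(sys.argv[1], errors='replace') as f:
    sys.stdout.write(ANSI_RE.sub('', f.read()))
```
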
zhenguoliusheng: seems it's still a network problem03:22
zhenguoliusheng: we don't get an IP from the neutron network's DHCP03:22
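
One way to check that from the neutron side is to confirm the tenant network actually has active DHCP ports; a sketch using openstacksdk (the cloud name is an assumption about the local clouds.yaml):

```python
# List neutron's DHCP ports to verify DHCP is available on the
# network the node boots from.
import openstack

conn = openstack.connect(cloud='devstack')  # assumed clouds.yaml entry
for port in conn.network.ports(device_owner='network:dhcp'):
    print(port.network_id, port.status, port.fixed_ips)
```
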
liushengzhenguo: hmm, maybe it can't support dynamically creating networks in the tempest job :(03:25
zhenguoliusheng: I remember the Ironic devstack plugin also sets other options for the tempest network03:25
zhenguoliusheng: no, we have already created the network03:25
zhenguoliusheng: some other options03:25
liushengzhenguo: maybe you can help; I haven't gotten my foot in the door of ironic yet, lol03:26
zhenguoliusheng: haha. sure03:27
zhenguoliusheng: ironic's tempest job can work, and it really creates and deletes instances03:39
zhenguoliusheng: the best way is to follow it03:39
liushengzhenguo: yes, but I don't know what the difference between ironic's and nimble's jobs is03:40
*** kevinz has quit IRC03:58
*** kevinz has joined #openstack-nimble04:50
*** kevinz has quit IRC06:14
zhenguoliusheng: I will dig into the tempest failure after the taskflow work; maybe you can do other things first06:37
liushengzhenguo: thank you a lot! :)06:37
zhenguoliusheng: np :D06:38
zhenguoliusheng: Alex mentioned that nova has a config option for names, maybe we can follow that approach06:38
zhenguoliusheng: but on the db side, we should not set name as unique; please continue the patch06:39
liushengzhenguo: OK, got it06:39
zhenguoliusheng: thanks06:39
*** yuntongjin has joined #openstack-nimble06:55
shaohe_fengzhenguo: hi06:57
zhenguoshaohe_feng: hi06:57
shaohe_fengzhenguo: I create the etherpad, you can add it to wiki. https://etherpad.openstack.org/p/nimble-task06:58
shaohe_fengzhenguo: and I have seen you are working on configdrive06:58
*** kevinz has joined #openstack-nimble06:58
shaohe_fengzhenguo: so I will work on the quotas.06:58
zhenguoshaohe_feng: ok, thanks06:59
zhenguoshaohe_feng: I have taken over the taskflow work06:59
shaohe_fengzhenguo: so many tasks for you.06:59
shaohe_fengzhenguo: you can update the etherpad.07:00
zhenguoshaohe_feng: I found a way to revert tasks, will update soon; hope the create taskflow work will be done by tomorrow.07:00
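
For context, taskflow's revert mechanism works roughly like this: when a task raises, the engine calls revert() on the tasks that already completed, in reverse order. A minimal sketch with illustrative tasks (not Nimble's actual ones):

```python
from taskflow import engines, task
from taskflow.patterns import linear_flow


class BuildNetwork(task.Task):
    def execute(self):
        print('allocating network ports')

    def revert(self, *args, **kwargs):
        print('releasing network ports')  # undo what execute() did


class DeployNode(task.Task):
    def execute(self):
        raise RuntimeError('deploy failed')  # forces the revert path


flow = linear_flow.Flow('create-instance').add(BuildNetwork(), DeployNode())
engines.run(flow)  # BuildNetwork.revert() runs, then the error propagates
```
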
shaohe_fengzhenguo: so I can help to work on the configdrive07:01
shaohe_fengzhenguo: OK. another questions.07:01
shaohe_fengzhenguo: if the delete api deletes from the DB07:01
shaohe_fengzhenguo: and the create is still in progress07:01
zhenguoshaohe_feng: sure, you can work on that first :P07:01
zhenguoshaohe_feng: I will focus on create/delete refactor task07:01
zhenguoshaohe_feng: it's a tricky one07:02
shaohe_fengzhenguo: something goes wrong with the nimble daemon07:02
zhenguoshaohe_feng: maybe we need to add a lock07:02
shaohe_fengzhenguo: the nimble daemon restarts07:02
shaohe_fengzhenguo: I mean the nimble daemon running "create instance" restarts07:03
shaohe_fengzhenguo: how does it know what resources it should free?07:03
zhenguoshaohe_feng: not sure, I will refactor the how process07:04
zhenguoshaohe_feng: please check tomorrow, hah07:04
zhenguos/how/whole07:04
shaohe_fengzhenguo: so resources such as networks and volumes will become zombies07:04
shaohe_fengzhenguo: who will reap07:05
shaohe_fengzhenguo: who will reap them?07:05
zhenguoshaohe_feng: yes, nobody will remove them07:05
zhenguoshaohe_feng: and as the instance has been deleted, you can't even find the network information07:05
shaohe_fengzhenguo: can we avoid the zombie resource?07:06
zhenguoshaohe_feng: you mean the scenario of deleting an instance while create is still in progress07:06
zhenguoshaohe_feng: or when creating an instance failed07:07
shaohe_fengzhenguo: yes.07:08
zhenguoshaohe_feng: do you think it makes sense to prevent deleting while we are in the building process07:09
shaohe_fengzhenguo: prevent is a simple design.07:11
zhenguoshaohe_feng: yes, but simple doesn't mean bad, hah07:12
shaohe_fengzhenguo: agree.07:14
zhenguoshaohe_feng: why would you want to delete it when it's still in the building process07:14
zhenguoshaohe_feng: if we can make sure it will not stay in the building process forever07:14
shaohe_fengzhenguo: yes, we need a scenario.07:14
shaohe_fengzhenguo: https://wiki.openstack.org/wiki/Nimble#Task_track07:15
zhenguoshaohe_feng: yes, until someone requests us to do that, I think just preventing deletion is ok07:15
shaohe_fengzhenguo: let me discuss it with ZangRui.07:15
shaohe_fengzhenguo: OK.07:15
zhenguoshaohe_feng: thanks07:15
zhenguoshaohe_feng: and when an instance is in the building process and a delete request comes in, do you think we should just return an error, or wait for the process to finish and then delete it07:18
shaohe_fengzhenguo: an error is simple.07:37
shaohe_fengzhenguo: returning 202 with a background task to delete is also OK.07:38
zhenguoshaohe_feng: I think we can discuss it more when the deleting refactor is in progress.07:39
zhenguoshaohe_feng: after I finished the create taskflow work07:39
shaohe_fengzhenguo: OK, another idea: add a "deleted" field in the DB, with a default value of "False".07:41
shaohe_fengzhenguo: when a delete API request comes in, if the status is error, deploying, or finished, just reap the resources and delete the instance.07:42
shaohe_fengzhenguo: if the status is building, just mark the instance's "deleted" field in the DB as "True"07:43
shaohe_fengzhenguo: and the create task checks this "deleted" field and starts the revert.07:44
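
A sketch of that design (names are illustrative, not actual Nimble code): the delete API flips the flag, and each task in the create flow re-checks it so taskflow's revert chain fires early instead of after all resources are built:

```python
from taskflow import task


class InstanceDeleted(Exception):
    """Raised inside the flow to trigger taskflow's revert chain."""


def instance_get(instance_id):
    """Hypothetical DB accessor standing in for the real DB API."""
    raise NotImplementedError


class BuildVolumes(task.Task):
    def execute(self, instance_id):
        if instance_get(instance_id).deleted:  # the soft "deleted" column
            raise InstanceDeleted(instance_id)
        # ... actually create the volumes here ...

    def revert(self, instance_id, *args, **kwargs):
        """Free whatever this task managed to create."""
```
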
zhenguoshaohe_feng: seems ok07:44
zhenguoshaohe_feng: besides this, I also want to add a deleted field to the db07:45
zhenguoshaohe_feng: as users may want to check usage information for last month's instances07:45
shaohe_fengzhenguo: if the nimble daemon restarts, at the initialization phase it can check the status and "deleted" field of all instances and then reap the zombie resources.07:46
shaohe_fengzhenguo: let me check the nova code to see if there is a "deleted" field on instances07:47
zhenguoshaohe_feng: do you think it's a bit wasteful if we create all the resources and just delete them right after?07:47
zhenguoshaohe_feng: there are deleted fields in all nova, cinder, glance tables.07:48
shaohe_fengzhenguo: maybe users will have regrets after they create the instance for some reason, maybe some mistake.07:49
zhenguoshaohe_feng: in every task in the create instance flow, we should check if it has been deleted and raise an InstanceNotFound exception to trigger the revert work, instead of waiting for all resources to be created07:50
shaohe_fengzhenguo: such as using the wrong key to inject, or other reasons.07:50
shaohe_fengzhenguo: agree.07:50
zhenguoshaohe_feng: yes, we should gracefully handle users' requests instead of just returning a Forbidden07:51
zhenguoshaohe_feng: do you know whether we need to proved floating ip associate/unassociate API? or neutron API can do that07:53
zhenguos/proved/provide07:53
shaohe_fengzhenguo: https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/models.py#L196 nova marks many models with deleted.07:55
zhenguoshaohe_feng: yes07:57
shaohe_fengzhenguo: the neutron API does that; does that mean nimble always requests floating ips from neutron?07:58
zhenguoshaohe_feng: yes07:58
zhenguoshaohe_feng: so we don't need to provide a separate API?07:59
zhenguoshaohe_feng: I know we can allocate floating IPs from neutron, but can neutron also provide the instance associate API?08:00
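
It can: associating is just an update of the floating IP's port_id through the neutron API, so a separate Nimble API is not strictly needed. A sketch with python-neutronclient (the session and IDs are placeholders):

```python
from neutronclient.v2_0 import client

keystone_session = ...                        # existing keystoneauth1 session
external_net_id = '<external-network-uuid>'   # placeholder
instance_port_id = '<instance-port-uuid>'     # placeholder

neutron = client.Client(session=keystone_session)
fip = neutron.create_floatingip(
    {'floatingip': {'floating_network_id': external_net_id}})

# associate: point the floating IP at the instance's neutron port
neutron.update_floatingip(fip['floatingip']['id'],
                          {'floatingip': {'port_id': instance_port_id}})

# disassociate: clear the port_id again
neutron.update_floatingip(fip['floatingip']['id'],
                          {'floatingip': {'port_id': None}})
```
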
shaohe_fengzhenguo: we need to dig into how nova uses the deleted information08:01
shaohe_fengzhenguo: https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2075  you can see the delete and soft_delete description08:01
*** yuntongjin has quit IRC08:02
zhenguoshaohe_feng: yes, but it seems we don't need soft_delete08:02
shaohe_fengzhenguo: 'deleted' - only return (or exclude) deleted instances08:03
zhenguoshaohe_feng: yes, it's a query option08:03
zhenguoshaohe_feng: if you specify the deleted parameter when querying instances, it just returns the deleted instances08:04
zhenguoshaohe_feng: but I remember someone said that nova will remove all "deleted" parameters from its API08:05
shaohe_fengzhenguo: yes, but we need to know more about how "deleted" takes effect.08:06
zhenguoshaohe_feng: yes,08:07
shaohe_fengzhenguo: Oh, so we need to check it. If nova removes all "deleted" parameters, should we still support it?08:07
shaohe_fengzhenguo: does RuiChen know about this "deleted" field in the DB?08:08
zhenguoshaohe_feng: I am also worried about that; maybe you can ask Alex about it08:08
shaohe_fengzhenguo: OK. If it removes all "deleted" parameters from the API, does it also remove the field from the DB?08:08
zhenguoshaohe_feng: not sure08:09
*** kevinz has quit IRC08:13
* zhenguo brb08:18
*** yuntongjin has joined #openstack-nimble08:38
zhenguoshaohe_feng, liusheng: I found an issue when using node_cache: if many create requests are received, it seems they all get scheduled to the same node :(08:51
zhenguowe should add a threading lock around access to the node_cache, and after scheduling remove the selected node from the cache.08:57
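
A minimal sketch of that fix (names illustrative): guard the cache with a threading.Lock and claim the chosen node inside the critical section, so concurrent create requests can't all pick the same one:

```python
import threading

_cache_lock = threading.Lock()
_node_cache = {}  # node uuid -> node record


def node_matches(node, filters):
    """Hypothetical placement check against the request's filters."""
    return True


def pick_node(filters):
    with _cache_lock:
        for uuid, node in list(_node_cache.items()):
            if node_matches(node, filters):
                del _node_cache[uuid]  # claim it so nobody else reuses it
                return node
    return None  # nothing suitable left in the cache
```
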
liushengzhenguo: maybe we will support multiple workers?08:58
zhenguoliusheng: you mean nimble-engine?08:59
liushengzhenguo: yes08:59
zhenguoliusheng: yes, but not sure whether we need to support active-active mode08:59
zhenguoliusheng: maybe just active and standby?08:59
liushengzhenguo: for most other openstack services, multi-worker is a-a mode, right?09:01
zhenguoliusheng: yes09:01
liushengzhenguo: if we support multiple workers in a-a mode with node_cache, we may need an external lock mechanism09:02
zhenguoliusheng: I think we can use one worker for now, as I think the biggest baremetal cluster is about 4000 nodes09:03
zhenguoliusheng: one worker should be enough to handle that09:04
liushengzhenguo: hah, hope so09:04
zhenguoliusheng: hah, if there are more, we can change it in the future09:04
zhenguoliusheng: but if there is more than one worker, I think we can't use a cache and should store the nodes in the DB09:05
liushengzhenguo: ok, yes, the DB naturally supports a lock mechanism09:06
zhenguoliusheng: yes09:06
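
If that route is taken, SELECT ... FOR UPDATE is the usual DB-side serialization; a SQLAlchemy sketch (the model is illustrative, not Nimble's actual schema):

```python
import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Node(Base):
    """Illustrative nodes table."""
    __tablename__ = 'nodes'
    uuid = sa.Column(sa.String(36), primary_key=True)
    reserved_by = sa.Column(sa.String(36), nullable=True)


def reserve_node(session, instance_uuid):
    """Atomically claim a free node; the row lock is held until commit."""
    with session.begin():
        node = (session.query(Node)
                .filter_by(reserved_by=None)
                .with_for_update()
                .first())
        if node is not None:
            node.reserved_by = instance_uuid
        return node
```
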
*** yuntongjin has quit IRC09:19
*** Kevin_Zheng has quit IRC10:47
*** RuiChen has quit IRC11:06
openstackgerritZhenguo Niu proposed openstack/nimble: [WIP] Add create instance taskflow  https://review.openstack.org/40355512:01
*** liusheng has quit IRC12:17
openstackgerritZhenguo Niu proposed openstack/nimble: Add create instance taskflow  https://review.openstack.org/40355512:47
-openstackstatus- NOTICE: Launchpad SSO is not currently working, so logins to our services like review.openstack.org and wiki.openstack.org are failing; the admins at Canonical are looking into the issue but there is no estimated time for a fix yet.16:24
*** ChanServ changes topic to "Launchpad SSO is not currently working, so logins to our services like review.openstack.org and wiki.openstack.org are failing; the admins at Canonical are looking into the issue but there is no estimated time for a fix yet."16:24
*** ChanServ changes topic to "Bugs: bugs.launchpad.net/nimble | Review: https://review.openstack.org/#/q/project:openstack/nimble,n,z"17:01
-openstackstatus- NOTICE: Canonical admins have resolved the issue with login.launchpad.net, so authentication should be restored now.17:01

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!