Tuesday, 2024-01-16

<opendevreview> Merged openstack/manila-specs master: Add spec for share/share-snapshot deferred deletion  https://review.opendev.org/c/openstack/manila-specs/+/901700  [01:24]
<opendevreview> Merged openstack/manila stable/xena: Fix error message from share server API  https://review.opendev.org/c/openstack/manila/+/905355  [12:01]
<opendevreview> Merged openstack/manila stable/xena: skip periodic update on replicas in 'error_deleting'  https://review.opendev.org/c/openstack/manila/+/892345  [12:03]
<opendevreview> Merged openstack/python-manilaclient stable/wallaby: Support --os-key option  https://review.opendev.org/c/openstack/python-manilaclient/+/877023  [12:07]
<opendevreview> Merged openstack/python-manilaclient stable/wallaby: Use suitable api version for OSC.  https://review.opendev.org/c/openstack/python-manilaclient/+/878037  [12:15]
<opendevreview> Merged openstack/manila stable/wallaby: Validate provider_location while managing snapshot  https://review.opendev.org/c/openstack/manila/+/897035  [12:42]
<opendevreview> Merged openstack/manila stable/2023.1: Change status and error handling for /shares API  https://review.opendev.org/c/openstack/manila/+/905459  [12:42]
<opendevreview> kiran pawar proposed openstack/manila master: Retry on connection error to neutron  https://review.opendev.org/c/openstack/manila/+/905695  [14:11]
<klindgren> hello, I am having some issues with Manila and looking for some guidance. I am trying to use Manila with the CephFS native driver. I have a single Ceph cluster with multiple CephFS filesystems. I am trying to add each filesystem to Manila as its own backend and have share requests split between them. When I do so, all share provisioning requests go to the first backend. The scheduler sees both backends but always filters down to 2 hosts; it then weighs and goodness-scores them both to 0, and then always chooses the first backend, no matter how many shares are provisioned against it. If I try setting the goodness setting on the second backend, the scheduler reports that the goodness filter is not found and defaults to 0. Looking at the code, it seems like the goodness filter defaults to None, and it's up to the share driver to implement it?  [16:40]
<klindgren> How can I get share create requests to split between two backends? I tried the simple scheduler and it stack traces on `results = db.service_get_all_share_sorted(elevated)` in the simple driver with a `sqlalchemy.exc.InvalidRequestError: Entity namespace for "coalesce(anon_1.share_gigabytes, :coalesce_1)" has no property "topic"`.  [16:40]
<klindgren> So I'm just wondering what I am missing and how this is supposed to work?  [16:40]
<gouthamr> hey klindgren: what are you trying to set the goodness function to? When creating multiple backends off of the same ceph cluster, the capacity information would be the same (cephfs filesystems can span the whole ceph cluster) - so all things will appear the same unless the request has some characteristics (share type extra specs) that need to match the backend capabilities  [16:49]
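(Aside: a minimal sketch of the extra-spec/capability matching gouthamr describes - when both backends report identical capacity, only a distinguishing capability such as share_backend_name can steer a request. This is a simplified illustration, not Manila's actual CapabilitiesFilter; the function name is made up, and the real filter also understands scoped keys and operators such as `<is>` and `<in>`.)

```python
# Simplified illustration only: match share type extra specs against the
# capabilities a backend reports. Plain string equality is an assumption
# made to keep the sketch short; Manila's real CapabilitiesFilter is richer.
def backend_matches(extra_specs: dict, capabilities: dict) -> bool:
    for key, wanted in extra_specs.items():
        if str(capabilities.get(key)) != str(wanted):
            return False
    return True


specs = {"share_backend_name": "cephfs-fstest-2"}
backend_1 = {"share_backend_name": "cephfs-fstest-1", "storage_protocol": "CEPHFS"}
backend_2 = {"share_backend_name": "cephfs-fstest-2", "storage_protocol": "CEPHFS"}
print(backend_matches(specs, backend_1), backend_matches(specs, backend_2))  # False True
```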
<klindgren> for testing I was just setting the goodness_function to "100", to try to make a request go to the second backend, but the debug logs say that the goodness function is not defined.  [16:51]
<klindgren> My hope was that I could write a filter that would simply split the number of shares between the backends with the same share_backend_name. The reason for multiple CephFS filesystems on the same cluster is that each filesystem is tied to a different pair of MDS daemons, because the workloads we have are extremely metadata intensive.  [16:52]
<klindgren> I had already seen that the capacity is reported the same, though I would have expected the scheduler to also look at resources allocated against a backend (vs. actually consumed) and eventually weigh one higher than the other, simply because it had no shares/resources allocated to it.  [16:54]
<gouthamr> yes; that's how it works with drivers that allow controlling over-provisioning with Manila… but the CephFS driver does not  [16:55]
<gouthamr> are you aware of a way that the MDS load can be detected? Maybe that's a good goodness_function that we can implement  [16:56]
<klindgren> re: detecting MDS load, I can ask around - I deal much less on that side of the house.  [16:58]
<gouthamr> thanks; I'll look at some docs as well or consult some ceph folks.. in the meanwhile, the simple filter failure sure looks like a db query bug…  [16:59]
<gouthamr> could you please report it on bugs.launchpad.net/manila with the stack trace?  [17:00]
<klindgren> Sure.  [17:05]
<klindgren> When you say "that's how it works with drivers that allow controlling over-provisioning with Manila… but the CephFS driver does not", can you help me understand how scheduling works then? Like, if I have 2 backends both providing the same features and I just want share creates to round-robin between the two - is that not possible? Or only possible with specific drivers?  [17:07]
<klindgren> Is there some way that I can have each filesystem on the ceph cluster exposed as a pool and use pool_weight_multiplier = -1.0, so that it spreads requests across the pools instead of packing them all into the same pool?  [17:08]
<klindgren> is it possible to just set a simple "no more than x shares per backend"?  [17:09]
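(Aside: on the pool_weight_multiplier question above - the usual pattern behind capacity-style weighers is that each pool gets a score of multiplier × metric and the highest score wins, so flipping the multiplier's sign flips spreading into packing. The toy sketch below illustrates only that pattern; it is not Manila's real weigher code, and the free/total ratio used as the metric is an assumption. Note that in this thread both CephFS backends report identical free and total capacity, so a capacity-based metric alone cannot separate them.)

```python
# Toy weigher: score = multiplier * metric, highest score wins.
# Not Manila's real weigher code; the free/total ratio metric is an
# assumption used purely to show how the multiplier's sign matters.
def weigh_pools(pools, multiplier):
    return {
        p["name"]: multiplier * (p["free_capacity_gb"] / p["total_capacity_gb"])
        for p in pools
    }


pools = [
    {"name": "cephfs-fstest-1", "free_capacity_gb": 400000.0, "total_capacity_gb": 429237.19},
    {"name": "cephfs-fstest-2", "free_capacity_gb": 429161.58, "total_capacity_gb": 429237.19},
]
for multiplier in (1.0, -1.0):
    scores = weigh_pools(pools, multiplier)
    # +1.0 prefers the pool with more free space (spread); -1.0 prefers the fuller pool (pack).
    print(multiplier, "->", max(scores, key=scores.get))
```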
<klindgren> Per my ceph teammates: `The number of requests per second could be used, but it's pretty spiky. The command "ceph fs status" can be used to see those metrics`  [17:20]
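(Aside: if MDS request rate were ever used as a goodness input, one rough approach is to poll `ceph fs status` in JSON form and read the per-MDS rate, averaging over several samples to smooth out the spikiness mentioned above. The sketch below is only an outline under assumptions: `--format json` is the standard Ceph CLI JSON switch, but the key names used here (`mdsmap`, `rate`, `state`) should be verified against the output of the Ceph release in use.)

```python
# Rough sketch: derive a per-filesystem MDS load number from "ceph fs status".
# Assumptions to verify: the command accepts "--format json", and the JSON
# contains an "mdsmap" list whose entries carry "rate" (requests/sec) and
# "state" fields. Key names differ across Ceph releases.
import json
import subprocess


def mds_request_rate(fs_name: str) -> float:
    out = subprocess.run(
        ["ceph", "fs", "status", fs_name, "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    status = json.loads(out)
    # Sum the request rate over active MDS ranks; this is spiky, so a real
    # metric would average over several samples.
    return sum(
        float(mds.get("rate", 0))
        for mds in status.get("mdsmap", [])
        if mds.get("state") == "active"
    )


if __name__ == "__main__":
    print(mds_request_rate("cephfs02"))
```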
<gouthamr> klindgren: thanks for that feedback; round robin's not possible because we don't preserve context of scheduling decisions.. i think your issue may be resolved with implementing "allocated_capacity_gb" in the cephfs driver, so that you could use it in a goodness_function configuration  [17:29]
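(Aside: to make the allocated_capacity_gb suggestion concrete - if the backend's reported pool stats included the sum of provisioned share sizes, a goodness_function could prefer the less-allocated backend even when free space looks identical. The snippet below is illustrative only, not a CephFS driver patch; `share_sizes_gb` is a hypothetical stand-in for "sizes of shares on this backend", and the exact variable syntax available inside a goodness_function expression should be checked against the Manila scheduler docs.)

```python
# Illustrative only: a pool stats dict that also reports provisioned
# (allocated) capacity. share_sizes_gb is a hypothetical input standing in
# for "sizes of the shares already created on this backend".
def build_pool_stats(share_sizes_gb, total_gb, free_gb):
    return {
        "pool_name": "cephfs",
        "total_capacity_gb": total_gb,
        "free_capacity_gb": free_gb,
        "allocated_capacity_gb": sum(share_sizes_gb),  # provisioned, not consumed
    }


# A goodness expression along the lines of
#   "100 - (allocated_capacity_gb / total_capacity_gb * 100)"
# would then score the less-allocated backend higher.
print(build_pool_stats([100, 250, 50], total_gb=429237.19, free_gb=429161.58))
```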
<klindgren> https://bugs.launchpad.net/manila/+bug/2049528 - re: simple scheduler stack trace  [17:31]
<gouthamr> thanks klindgren  [17:31]
<klindgren> That also appears to require implementing the goodness function in the share driver as well? From what I was able to see, it looks like only the NetApp drivers have this functionality - everything else appears to inherit the base class, which returns None.  [17:38]
<klindgren> I did have an additional question about our user cluster: we have multiple control plane servers that we were planning on running manila-share on, with the same backends configured (like we do for pretty much everything else we run OpenStack-wise). However, it appears this would cause 3 host entries to show up exposing the same backends - e.g., if we had the same 20 backends configured on 3 control plane nodes, that would result in 60 backends showing up? What is the recommended deployment for this? To set the hostname in the config file on all the share servers to the same value?  [17:42]
<gouthamr> klindgren: goodness functions can be layered on top of whatever the driver supports, and evaluated in the scheduler.. so no, the driver doesn't _need_ to implement a custom one; it can be oblivious to it... but, drivers can implement a default goodness function (this one will kick in if there's no configured goodness_function)  [17:45]
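(Aside: a simplified picture of "evaluated in the scheduler" - the configured goodness_function string is evaluated against the stats each backend reports, producing a 0-100 score that the goodness weigher applies. The toy version below uses Python's eval purely for illustration; Manila has its own restricted expression evaluator, and which variable names are visible to a real goodness_function is an assumption to verify, so treat this as a sketch of the data flow only.)

```python
# Toy stand-in for evaluating a goodness expression against backend stats.
# Manila does NOT use eval(); it has its own restricted evaluator. This only
# shows the flow: config string + reported stats -> a 0..100 score.
def toy_goodness(expression: str, stats: dict) -> float:
    score = float(eval(expression, {"__builtins__": {}}, dict(stats)))
    return max(0.0, min(100.0, score))


stats = {"total_capacity_gb": 429237.19, "free_capacity_gb": 429161.58,
         "allocated_capacity_gb": 400.0}
print(toy_goodness("100", stats))  # the constant expression from this discussion
print(toy_goodness("100 - (allocated_capacity_gb / total_capacity_gb * 100)", stats))
```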
<klindgren> Ok - so in the example where I set goodness_function in the config to 100, and the scheduler doesn't see a function defined, is that a bug then? When I added debug statements around `host_state.capabilities`, the goodness_function is not contained in the data about the backend. It just contains:  [17:49]
<klindgren> {'pool_name': 'cephfs', 'total_capacity_gb': 429237.19, 'free_capacity_gb': 429161.58, 'qos': 'False', 'reserved_percentage': 0, 'reserved_snapshot_percentage': 0, 'reserved_share_extend_percentage': 0, 'dedupe': [False], 'compression': [False], 'thin_provisioning': [False], 'share_backend_name': 'cephfs-fstest-2', 'storage_protocol': 'CEPHFS', 'vendor_name': 'Ceph', 'driver_version': '1.0', 'timestamp': datetime.datetime(2024, 1, 16, 15, 22, 10, 250194), 'driver_handles_share_servers': False, 'snapshot_support': True, 'create_share_from_snapshot_support': True, 'revert_to_snapshot_support': False, 'mount_snapshot_support': False, 'replication_type': None, 'replication_domain': None, 'sg_consistent_snapshot_support': None, 'security_service_update_support': False, 'network_allocation_update_support': False, 'share_server_multiple_subnet_support': False, 'ipv4_support': True, 'ipv6_support': False}  [17:49]
<klindgren> even when the backend is defined as:  [17:51]
<klindgren> [cephfs-fstest-2]  [17:51]
<klindgren> driver_handles_share_servers = False  [17:51]
<klindgren> share_backend_name = cephfs-fstest-2  [17:51]
<klindgren> share_driver = manila.share.drivers.cephfs.driver.CephFSDriver  [17:51]
<klindgren> cephfs_conf_path = /etc/ceph/fstest-2.conf  [17:51]
<klindgren> cephfs_auth_id = manila  [17:51]
<klindgren> cephfs_filesystem_name = cephfs02  [17:51]
<klindgren> cephfs_cluster_name = cephtest  [17:51]
<klindgren> goodness_function = "100"  [17:51]
<gouthamr> klindgren: regarding your other question.. we'd recommend you only run one instance of manila-share per backend... what most deployments do is run manila's share-manager service under pacemaker or similar services that handle HA... the service is effectively deployed active/passive:  [17:51]
<gouthamr> klindgren: in that case, yes, the "host" attribute in the config file on each controller node is set to a common string  [17:52]
<klindgren> pacemaker :puke: - I've personally never had a good experience with pacemaker clusters, but I guess we can come up with something for ensuring only one copy is running at a time.  [17:53]
<gouthamr> klindgren: hmm, i don't see that behavior on the CI - i.e., the scheduler does see a "goodness_function" (defaults to None) - https://zuul.opendev.org/t/openstack/build/cf3bfa7cdca4489caad829bcd11c4bab/log/controller/logs/screen-m-sch.txt#892 ..  [17:56]
<gouthamr> klindgren: what version of openstack are you using?  [17:56]
<klindgren> 2023.1  [17:56]
<klindgren> I am using kolla-ansible under the hood here, which by default only configures the share-specific stuff under the manila.conf for the share service. However, I modified the scheduler's manila.conf as well and added it there, and the logs still say it's not found.  [17:58]
<gouthamr> klindgren: the share-manager service node is where you need to put this.. and it should work the way you've configured it; i'm confused and reading code to see why that doesn't work  [17:59]
<gouthamr> klindgren: is it possible for you to enable debug=True on the node containing manila-share? the service will spit out the config opts right at the top.. can you see if this is getting picked up?  [18:03]
<opendevreview> Takashi Kajinami proposed openstack/manila master: Drop upgrade scripts for old releases  https://review.opendev.org/c/openstack/manila/+/905754  [18:04]
<klindgren> it's already running in debug - checking.  [18:04]
<gouthamr> klindgren: ack; you can use this pastebin to share long pastes: paste.openstack.org  [18:05]
<klindgren> ```2024-01-16 18:04:57.979 7 DEBUG oslo_service.service [None req-5e7d5451-80c5-451d-9791-b9971f10b132 - - - - - -] cephfs-fstest-2.goodness_function = 100 log_opt_values /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_config/cfg.py:2609```  [18:05]
<klindgren> looks like it's seeing it  [18:05]
<klindgren> but the scheduler still logs that it's not set on share creates:  [18:08]
<klindgren> https://paste.openstack.org/show/bbsu5ESqhqogNkTzVgnc/  [18:08]
<gouthamr> klindgren: i just tried doing this on my machine, and it showed up in the scheduler..  [18:13]
<gouthamr> klindgren: by this, i mean i set "goodness_function = 100".. and i bounced the manila-share service.. the scheduler reflected it in the backend's pool  [18:14]
<gouthamr> if you turn on debug=True in the scheduler's manila.conf, you'll see a message in the host_manager: "Received share service update from <host> ..." -- this message contains the host's stats  [18:15]
<klindgren> does the `share_backend_name` need to exactly match the config stanza name? I saw something about that as a bug 7-8 years ago that got fixed.  [18:18]
<klindgren> https://paste.openstack.org/show/buBeMSGeiX9KSfUF6pnv/  [18:18]
<klindgren> I believe this is what you are talking about - it doesn't have the goodness_function stuff in the updates  [18:19]
<gouthamr> klindgren: i see "'goodness_function': '55'" here  [18:20]
<klindgren> hrm  [18:20]
<klindgren> I guess I have it set to 55 right now - I had it at 100, but I see 55 now.  [18:21]
<klindgren> So at `2024-01-16 18:04:58.274` it says it's set to 55, but at `2024-01-16 18:06:02.110` it says that it's not defined. No other updates happened between those:  [18:24]
<klindgren> https://paste.openstack.org/show/b1WWLlBSSg9a8M0q2ntx/  [18:24]
<gouthamr> A couple of things to try to isolate this: bounce the scheduler service - allow it to get fresh updates and not rely on any data that's possibly stale.. it is also possible that updates from multiple controllers are messing with this? has the config opt been set everywhere this backend has been defined?  [18:56]
<klindgren> I can work on debugging this some more. Everything is on a single host for now - to avoid initial roll-out complications. Might also look at just moving to the latest release for this component.  [19:10]
<gouthamr> klindgren: thanks; i reported https://bugs.launchpad.net/manila/+bug/2049538 .. please take a look/subscribe to it, and feel free to add any comments there  [19:30]
<gouthamr> i'll use it when discussing with my ceph engineering colleagues  [19:30]
