Tuesday, 2024-01-16

<opendevreview> Merged openstack/manila-specs master: Add spec for share/share-snapshot deferred deletion  https://review.opendev.org/c/openstack/manila-specs/+/901700  [01:24]
<opendevreview> Merged openstack/manila stable/xena: Fix error message from share server API  https://review.opendev.org/c/openstack/manila/+/905355  [12:01]
<opendevreview> Merged openstack/manila stable/xena: skip periodic update on replicas in 'error_deleting'  https://review.opendev.org/c/openstack/manila/+/892345  [12:03]
<opendevreview> Merged openstack/python-manilaclient stable/wallaby: Support --os-key option  https://review.opendev.org/c/openstack/python-manilaclient/+/877023  [12:07]
<opendevreview> Merged openstack/python-manilaclient stable/wallaby: Use suitable api version for OSC.  https://review.opendev.org/c/openstack/python-manilaclient/+/878037  [12:15]
<opendevreview> Merged openstack/manila stable/wallaby: Validate provider_location while managing snapshot  https://review.opendev.org/c/openstack/manila/+/897035  [12:42]
<opendevreview> Merged openstack/manila stable/2023.1: Change status and error handling for /shares API  https://review.opendev.org/c/openstack/manila/+/905459  [12:42]
<opendevreview> kiran pawar proposed openstack/manila master: Retry on connection error to neutron  https://review.opendev.org/c/openstack/manila/+/905695  [14:11]
<klindgren> hello, I am having some issues with Manila and looking for some guidance. I am trying to use Manila with the CephFS native driver. I have a single Ceph cluster with multiple CephFS filesystems. I am trying to add each filesystem to Manila as its own backend and have share requests split between them. When I do so, all share provisioning requests go to the first backend. The scheduler sees both backends but always filters down to 2 hosts; it then weighs and goodness-scores them both to 0, and then always chooses the first backend, no matter how many shares are provisioned against it. If I try setting the goodness setting on the second backend, the scheduler reports that the goodness filter is not found and defaults to 0. Looking at the code, it seems like the goodness filter defaults to None, and it's up to the share driver to implement it?  [16:40]
<klindgren> How can I get share create requests to split between two backends? I tried the simple scheduler and it stack traces on `results = db.service_get_all_share_sorted(elevated)` in the simple driver with a `sqlalchemy.exc.InvalidRequestError: Entity namespace for "coalesce(anon_1.share_gigabytes, :coalesce_1)" has no property "topic"`.  [16:40]
<klindgren> So I'm just wondering what I am missing and how this is supposed to work?  [16:40]
<gouthamr> hey klindgren: what are you trying to set the goodness function to? When creating multiple backends off of the same ceph cluster, the capacity information would be the same (cephfs filesystems can span the whole ceph cluster) - so all things will appear the same unless the request has some characteristics (share type extra specs) that need to match the backend capabilities  [16:49]
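(Aside: a minimal sketch of the extra-spec/capability matching gouthamr describes - when both backends report identical capacity, only a distinguishing capability such as share_backend_name can steer a request. This is a simplified illustration, not Manila's actual CapabilitiesFilter; the function name is made up, and the real filter also understands scoped keys and operators such as `<is>` and `<in>`.)

```python
# Simplified illustration only: match share type extra specs against the
# capabilities a backend reports. Plain string equality is an assumption
# made to keep the sketch short; Manila's real CapabilitiesFilter is richer.
def backend_matches(extra_specs: dict, capabilities: dict) -> bool:
    for key, wanted in extra_specs.items():
        if str(capabilities.get(key)) != str(wanted):
            return False
    return True


specs = {"share_backend_name": "cephfs-fstest-2"}
backend_1 = {"share_backend_name": "cephfs-fstest-1", "storage_protocol": "CEPHFS"}
backend_2 = {"share_backend_name": "cephfs-fstest-2", "storage_protocol": "CEPHFS"}
print(backend_matches(specs, backend_1), backend_matches(specs, backend_2))  # False True
```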
<klindgren> for testing I was just setting the goodness_function to "100", to try to make a request go to the second backend, but the debug logs say that the goodness function is not defined.  [16:51]
<klindgren> My hope was that I could write a filter that would simply split the number of shares between the backends with the same share_backend_name. The reason for multiple CephFS filesystems on the same cluster is that each filesystem is tied to a different pair of MDS daemons, because the workloads we have are extremely metadata intensive.  [16:52]
<klindgren> I had already seen that the capacity is reported the same, though I would have expected the scheduler to also look at resources allocated against a backend (vs. actually consumed) and eventually weigh one higher than the other, simply because it had no shares/resources allocated to it.  [16:54]
<gouthamr> yes; that's how it works with drivers that allow controlling over-provisioning with Manila… but the CephFS driver does not  [16:55]
<gouthamr> are you aware of a way that the MDS load can be detected? Maybe that's a good goodness_function that we can implement  [16:56]
<klindgren> re: detecting MDS load, I can ask around - I deal much less on that side of the house.  [16:58]
<gouthamr> thanks; I'll look at some docs as well or consult some ceph folks.. in the meanwhile, the simple filter failure sure looks like a db query bug…  [16:59]
<gouthamr> could you please report it on bugs.launchpad.net/manila with the stack trace?  [17:00]
<klindgren> Sure.  [17:05]
<klindgren> When you say "that's how it works with drivers that allow controlling over-provisioning with Manila… but the CephFS driver does not", can you help me understand how scheduling works then? Like, if I have 2 backends both providing the same features and I just want share creates to round-robin between the two - is that not possible? Or only possible with specific drivers?  [17:07]
<klindgren> Is there some way that I can have each filesystem on the ceph cluster exposed as a pool and use pool_weight_multiplier = -1.0, so that it spreads requests across the pools instead of packing them all into the same pool?  [17:08]
<klindgren> is it possible to just set a simple "no more than x shares per backend"?  [17:09]
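(Aside: on the pool_weight_multiplier question above - the usual pattern behind capacity-style weighers is that each pool gets a score of multiplier × metric and the highest score wins, so flipping the multiplier's sign flips spreading into packing. The toy sketch below illustrates only that pattern; it is not Manila's real weigher code, and the free/total ratio used as the metric is an assumption. Note that in this thread both CephFS backends report identical free and total capacity, so a capacity-based metric alone cannot separate them.)

```python
# Toy weigher: score = multiplier * metric, highest score wins.
# Not Manila's real weigher code; the free/total ratio metric is an
# assumption used purely to show how the multiplier's sign matters.
def weigh_pools(pools, multiplier):
    return {
        p["name"]: multiplier * (p["free_capacity_gb"] / p["total_capacity_gb"])
        for p in pools
    }


pools = [
    {"name": "cephfs-fstest-1", "free_capacity_gb": 400000.0, "total_capacity_gb": 429237.19},
    {"name": "cephfs-fstest-2", "free_capacity_gb": 429161.58, "total_capacity_gb": 429237.19},
]
for multiplier in (1.0, -1.0):
    scores = weigh_pools(pools, multiplier)
    # +1.0 prefers the pool with more free space (spread); -1.0 prefers the fuller pool (pack).
    print(multiplier, "->", max(scores, key=scores.get))
```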
<klindgren> Per my ceph teammates: `The number of requests per second could be used, but it's pretty spiky. The command "ceph fs status" can be used to see those metrics`  [17:20]
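(Aside: if MDS request rate were ever used as a goodness input, one rough approach is to poll `ceph fs status` in JSON form and read the per-MDS rate, averaging over several samples to smooth out the spikiness mentioned above. The sketch below is only an outline under assumptions: `--format json` is the standard Ceph CLI JSON switch, but the key names used here (`mdsmap`, `rate`, `state`) should be verified against the output of the Ceph release in use.)

```python
# Rough sketch: derive a per-filesystem MDS load number from "ceph fs status".
# Assumptions to verify: the command accepts "--format json", and the JSON
# contains an "mdsmap" list whose entries carry "rate" (requests/sec) and
# "state" fields. Key names differ across Ceph releases.
import json
import subprocess


def mds_request_rate(fs_name: str) -> float:
    out = subprocess.run(
        ["ceph", "fs", "status", fs_name, "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    status = json.loads(out)
    # Sum the request rate over active MDS ranks; this is spiky, so a real
    # metric would average over several samples.
    return sum(
        float(mds.get("rate", 0))
        for mds in status.get("mdsmap", [])
        if mds.get("state") == "active"
    )


if __name__ == "__main__":
    print(mds_request_rate("cephfs02"))
```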
<gouthamr> klindgren: thanks for that feedback; round robin's not possible because we don't preserve context of scheduling decisions.. i think your issue may be resolved with implementing "allocated_capacity_gb" in the cephfs driver, so that you could use it in a goodness_function configuration  [17:29]
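(Aside: to make the allocated_capacity_gb suggestion concrete - if the backend's reported pool stats included the sum of provisioned share sizes, a goodness_function could prefer the less-allocated backend even when free space looks identical. The snippet below is illustrative only, not a CephFS driver patch; `share_sizes_gb` is a hypothetical stand-in for "sizes of shares on this backend", and the exact variable syntax available inside a goodness_function expression should be checked against the Manila scheduler docs.)

```python
# Illustrative only: a pool stats dict that also reports provisioned
# (allocated) capacity. share_sizes_gb is a hypothetical input standing in
# for "sizes of the shares already created on this backend".
def build_pool_stats(share_sizes_gb, total_gb, free_gb):
    return {
        "pool_name": "cephfs",
        "total_capacity_gb": total_gb,
        "free_capacity_gb": free_gb,
        "allocated_capacity_gb": sum(share_sizes_gb),  # provisioned, not consumed
    }


# A goodness expression along the lines of
#   "100 - (allocated_capacity_gb / total_capacity_gb * 100)"
# would then score the less-allocated backend higher.
print(build_pool_stats([100, 250, 50], total_gb=429237.19, free_gb=429161.58))
```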
<klindgren> https://bugs.launchpad.net/manila/+bug/2049528 - re: simple scheduler stack trace  [17:31]
<gouthamr> thanks klindgren  [17:31]
<klindgren> That also appears to require implementing the goodness function in the share driver as well? From what I was able to see, it looks like only the NetApp drivers have this functionality - everything else appears to inherit the base class, which returns None.  [17:38]
<klindgren> I did have an additional question about our user cluster: we have multiple control plane servers that we were planning on running manila-share on, with the same backends configured (like we do for pretty much everything else we run OpenStack-wise). However, it appears this would cause 3 host entries to show up exposing the same backends - e.g., if we had the same 20 backends configured on 3 control plane nodes, that would result in 60 backends showing up? What is the recommended deployment for this? To set the hostname in the config file on all the share servers to the same value?  [17:42]
<gouthamr> klindgren: goodness functions can be layered on top of whatever the driver supports, and evaluated in the scheduler.. so no, the driver doesn't _need_ to implement a custom one; it can be oblivious to it... but, drivers can implement a default goodness function (this one will kick in if there's no configured goodness_function)  [17:45]
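(Aside: a simplified picture of "evaluated in the scheduler" - the configured goodness_function string is evaluated against the stats each backend reports, producing a 0-100 score that the goodness weigher applies. The toy version below uses Python's eval purely for illustration; Manila has its own restricted expression evaluator, and which variable names are visible to a real goodness_function is an assumption to verify, so treat this as a sketch of the data flow only.)

```python
# Toy stand-in for evaluating a goodness expression against backend stats.
# Manila does NOT use eval(); it has its own restricted evaluator. This only
# shows the flow: config string + reported stats -> a 0..100 score.
def toy_goodness(expression: str, stats: dict) -> float:
    score = float(eval(expression, {"__builtins__": {}}, dict(stats)))
    return max(0.0, min(100.0, score))


stats = {"total_capacity_gb": 429237.19, "free_capacity_gb": 429161.58,
         "allocated_capacity_gb": 400.0}
print(toy_goodness("100", stats))  # the constant expression from this discussion
print(toy_goodness("100 - (allocated_capacity_gb / total_capacity_gb * 100)", stats))
```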
<klindgren> Ok - so in the example where I set goodness_function in the config to 100, and the scheduler doesn't see a function defined, is that a bug then? When I added debug statements around `host_state.capabilities`, the goodness_function is not contained in the data about the backend. It just contains:  [17:49]
<klindgren> {'pool_name': 'cephfs', 'total_capacity_gb': 429237.19, 'free_capacity_gb': 429161.58, 'qos': 'False', 'reserved_percentage': 0, 'reserved_snapshot_percentage': 0, 'reserved_share_extend_percentage': 0, 'dedupe': [False], 'compression': [False], 'thin_provisioning': [False], 'share_backend_name': 'cephfs-fstest-2', 'storage_protocol': 'CEPHFS', 'vendor_name': 'Ceph', 'driver_version': '1.0', 'timestamp': datetime.datetime(2024, 1, 16, 15, 22, 10, 250194), 'driver_handles_share_servers': False, 'snapshot_support': True, 'create_share_from_snapshot_support': True, 'revert_to_snapshot_support': False, 'mount_snapshot_support': False, 'replication_type': None, 'replication_domain': None, 'sg_consistent_snapshot_support': None, 'security_service_update_support': False, 'network_allocation_update_support': False, 'share_server_multiple_subnet_support': False, 'ipv4_support': True, 'ipv6_support': False}  [17:49]
<klindgren> even when the backend is defined as:  [17:51]
<klindgren> [cephfs-fstest-2]  [17:51]
<klindgren> driver_handles_share_servers = False  [17:51]
<klindgren> share_backend_name = cephfs-fstest-2  [17:51]
<klindgren> share_driver = manila.share.drivers.cephfs.driver.CephFSDriver  [17:51]
<klindgren> cephfs_conf_path = /etc/ceph/fstest-2.conf  [17:51]
<klindgren> cephfs_auth_id = manila  [17:51]
<klindgren> cephfs_filesystem_name = cephfs02  [17:51]
<klindgren> cephfs_cluster_name = cephtest  [17:51]
<klindgren> goodness_function = "100"  [17:51]
<gouthamr> klindgren: regarding your other question.. we'd recommend you only run one instance of manila-share per backend... what most deployments do is run manila's share-manager service under pacemaker or similar services that handle HA... the service is effectively deployed active/passive:  [17:51]
<gouthamr> klindgren: in that case, yes, the "host" attribute in the config file on each controller node is set to a common string  [17:52]
<klindgren> pacemaker :puke: - I've personally never had a good experience with pacemaker clusters, but I guess we can come up with something for ensuring only one copy is running at a time.  [17:53]
<gouthamr> klindgren: hmm, i don't see that behavior on the CI - i.e., the scheduler does see a "goodness_function" (defaults to None) - https://zuul.opendev.org/t/openstack/build/cf3bfa7cdca4489caad829bcd11c4bab/log/controller/logs/screen-m-sch.txt#892 ..  [17:56]
<gouthamr> klindgren: what version of openstack are you using?  [17:56]
<klindgren> 2023.1  [17:56]
<klindgren> I am using kolla-ansible under the hood here, which by default only configures the share-specific stuff under the manila.conf for the share service. However, I modified the scheduler's manila.conf as well and added it there, and the logs still say it's not found.  [17:58]
<gouthamr> klindgren: the share-manager service node is where you need to put this.. and it should work the way you've configured it; i'm confused and reading code to see why that doesn't work  [17:59]
<gouthamr> klindgren: is it possible for you to enable debug=True on the node containing manila-share? the service will spit out the config opts right at the top.. can you see if this is getting picked up?  [18:03]
<opendevreview> Takashi Kajinami proposed openstack/manila master: Drop upgrade scripts for old releases  https://review.opendev.org/c/openstack/manila/+/905754  [18:04]
<klindgren> it's already running in debug - checking.  [18:04]
<gouthamr> klindgren: ack; you can use this pastebin to share long pastes: paste.openstack.org  [18:05]
<klindgren> ```2024-01-16 18:04:57.979 7 DEBUG oslo_service.service [None req-5e7d5451-80c5-451d-9791-b9971f10b132 - - - - - -] cephfs-fstest-2.goodness_function = 100 log_opt_values /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_config/cfg.py:2609```  [18:05]
<klindgren> looks like it's seeing it  [18:05]
<klindgren> but the scheduler still logs that it's not set on share creates:  [18:08]
<klindgren> https://paste.openstack.org/show/bbsu5ESqhqogNkTzVgnc/  [18:08]
<gouthamr> klindgren: i just tried doing this on my machine, and it showed up in the scheduler..  [18:13]
<gouthamr> klindgren: by this, i mean i set "goodness_function = 100".. and i bounced the manila-share service.. the scheduler reflected it in the backend's pool  [18:14]
<gouthamr> if you turn on debug=True in the scheduler's manila.conf, you'll see a message in the host_manager: "Received share service update from <host> ..." -- this message contains the host's stats  [18:15]
<klindgren> does the `share_backend_name` need to exactly match the config stanza name? I saw something about that as a bug 7-8 years ago that got fixed.  [18:18]
<klindgren> https://paste.openstack.org/show/buBeMSGeiX9KSfUF6pnv/  [18:18]
<klindgren> I believe this is what you are talking about - it doesn't have the goodness_function stuff in the updates  [18:19]
<gouthamr> klindgren: i see "'goodness_function': '55'" here  [18:20]
<klindgren> hrm  [18:20]
<klindgren> I guess I have it set to 55 right now - I had it at 100, but I see 55 now.  [18:21]
<klindgren> So at `2024-01-16 18:04:58.274` it says it's set to 55, but at `2024-01-16 18:06:02.110` it says that it's not defined. No other updates happened between those:  [18:24]
<klindgren> https://paste.openstack.org/show/b1WWLlBSSg9a8M0q2ntx/  [18:24]
<gouthamr> A couple of things to try to isolate this: bounce the scheduler service - allow it to get fresh updates and not rely on any data that's possibly stale.. it is also possible that updates from multiple controllers are messing with this? has the config opt been set everywhere this backend has been defined?  [18:56]
<klindgren> I can work on debugging this some more. Everything is on a single host for now - to avoid initial roll-out complications. Might also look at just moving to the latest release for this component.  [19:10]
<gouthamr> klindgren: thanks; i reported https://bugs.launchpad.net/manila/+bug/2049538 .. please take a look/subscribe to it, and feel free to add any comments there  [19:30]
<gouthamr> i'll use it when discussing with my ceph engineering colleagues  [19:30]
