Thursday, 2020-11-05

[00:01] <ianw> infra-root: pypa/pip have enabled the opendev app with https://github.com/pypa/pip/issues/9103
[00:01] <ianw> infra-root: i've proposed https://review.opendev.org/#/c/761467/2 and https://review.opendev.org/#/c/761468/2 to set up the same project-config/tenant setup as for pyca
[00:01] <ianw> I'll leave it to the reviews on those to essentially agree that we're happy to have our resources put towards this; for my part, I think, similar to pyca, it's going to help everyone
[00:01] <clarkb> what sort of job do we expect them to be running? openstack constraints integration testing type deals?
[00:06] <ianw> my understanding would be tox-type testing, particularly on debian/centos and perhaps fedora
[00:14] *** tosky has quit IRC
[00:15] <openstackgerrit> Ian Wienand proposed opendev/system-config master: [wip] borg: add fuse https://review.opendev.org/761275
[00:19] <clarkb> ianw: couple of comments on ^
[00:21] <ianw> clarkb: not sure what you mean by "if it depends on borgbackup"? i think adding it to extras includes it? i'll have a poke at the unmount; you're probably right, but it does seem to have a command
[00:21] <clarkb> I mean does pip install borgbackup[fuse] imply installing borgbackup? I don't actually know
[00:22] <clarkb> and for the unmount I think you're just doing `borg /opt/backups` rather than something like borg unmount /opt/backups
[00:22] <fungi> usually [extras] will install everything which gets installed without that, plus whatever is in the extras_require
[00:28] <ianw> is grafana.opendev.org not responding for others?
[00:29] <clarkb> I can get it via ssh but not https
[00:30] <clarkb> looks like an iptables problem
[00:30] <clarkb> it doesn't have ports 80 and 443 open in iptables
[00:31] <fungi> how would that have happened?
[00:31] <clarkb> did the groups change for it? we use the webservers group for 80 and 443 in many cases
[00:31] <ianw> hrm, weird
[00:33] <fungi> last ssh login (before now) was nearly a month ago, so i doubt we did anything manually directly on that server
[00:33] <ianw> grafana[0-9].opendev.org is in the webservers group
[00:35] <ianw> that wants a *
[00:35] <ianw> hrm
[00:36] <fungi> d'oh!
[00:36] <openstackgerrit> Ian Wienand proposed opendev/system-config master: Add * match to grafana.opendev.org https://review.opendev.org/761476
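The "that wants a *" fix turns on how a shell-style glob treats a single `[0-9]`. As a rough sketch, assuming the group file is matched with fnmatch-style globs and that the new server has a two-digit name such as grafana01.opendev.org (both assumptions, not confirmed in the log), Python's `fnmatch` shows why `[0-9]` alone misses the host while `[0-9]*` catches it:

```python
from fnmatch import fnmatchcase

# Hypothetical host names; the real inventory entries are not shown in the log.
HOSTS = ["grafana01.opendev.org", "grafana.opendev.org", "gitea05.opendev.org"]

OLD_PATTERN = "grafana[0-9].opendev.org"   # exactly one digit after "grafana"
NEW_PATTERN = "grafana[0-9]*.opendev.org"  # one digit, then anything (e.g. more digits)


def matching(pattern, hosts):
    """Return the hosts that a shell-style glob pattern selects."""
    return [host for host in hosts if fnmatchcase(host, pattern)]


if __name__ == "__main__":
    print(matching(OLD_PATTERN, HOSTS))  # []
    print(matching(NEW_PATTERN, HOSTS))  # ['grafana01.opendev.org']
```

Note that a trailing `*` in a glob also matches dots, so it is broader than "more digits"; that is usually harmless for a host-naming convention like this one.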
[00:36] <fungi> that changed with the cleanup of the old grafana server i guess
[00:38] <ianw> i need to shut that down
[00:38] <ianw> i'll get the opendev one working then do that today
[00:38] <ianw> i think i didn't notice because my url bar has auto-filled in the old openstack.org server
[00:39] <fungi> ahh, okay, i didn't realize that was still in progress
[00:40] *** DSpider has quit IRC
[00:42] <ianw> neither did I :)
[00:48] <ianw> clarkb: "This is a convenience wrapper that just calls the platform-specific shell command - usually this is either umount or fusermount -u."
[00:49] <ianw> so yeah, can just unmount
[00:49] <ianw> testinfra works though; it runs a test backup to the test backup server, then can mount it via fuse. pretty cool!
[00:49] <clarkb> nice
[00:57] <openstackgerrit> Ian Wienand proposed opendev/system-config master: borg-backup: add fuse https://review.opendev.org/761275
[01:06] *** whoami-rajat___ has joined #opendev
[01:07] <openstackgerrit> Merged opendev/elastic-recheck master: Add query for bug 1901739 https://review.opendev.org/759967
[01:07] <openstack> bug 1901739 in OpenStack Compute (nova) " libvirt.libvirtError: internal error: missing block job data for disk 'vda'" [High,Confirmed] https://launchpad.net/bugs/1901739
[01:16] <openstackgerrit> melanie witt proposed opendev/elastic-recheck master: Add query for bug 1902002 https://review.opendev.org/761478
[01:16] <openstack> bug 1902002 in devstack "Fail to get default route device in CI jobs" [Medium,In progress] https://launchpad.net/bugs/1902002 - Assigned to Dr. Jens Harbott (j-harbott)
[01:16] <openstackgerrit> Merged opendev/system-config master: Add * match to grafana.opendev.org https://review.opendev.org/761476
[01:18] <openstackgerrit> melanie witt proposed opendev/elastic-recheck master: Add query for bug 1902002 https://review.opendev.org/761478
[01:18] <openstack> bug 1902002 in devstack "Fail to get default route device in CI jobs" [Medium,In progress] https://launchpad.net/bugs/1902002 - Assigned to Dr. Jens Harbott (j-harbott)
[02:18] <ianw> grafana opendev is back, i'll clean up the old one now
[02:28] <openstackgerrit> Ian Wienand proposed opendev/system-config master: grafana: redirect http to CNAME https://review.opendev.org/761487
[02:38] <ianw> i think the new graphite server is good too. i'll clean up the old one
[02:41] <ianw> #status log remove old graphite01.opendev.org server and storage
[02:42] <openstackstatus> ianw: finished logging
[02:42] <ianw> #status log removed grafana02.openstack.org, CNAME now goes to grafana.opendev.org
[02:42] <openstackstatus> ianw: finished logging
[02:45] <openstackgerrit> Merged opendev/system-config master: borg-backup: add fuse https://review.opendev.org/761275
[02:57] <openstackgerrit> Ian Wienand proposed opendev/system-config master: grafana: fix typo in test name https://review.opendev.org/761489
[03:21] *** hamalq has quit IRC
[03:49] *** ykarel has joined #opendev
[03:50] <melwitt> does anyone know if this kind of 503 is a common/known thing? ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='opendev.org', port=443): Max retries exceeded with url: /openstack/requirements/raw/branch/master/upper-constraints.txt (Caused by ResponseError('too many 503 error responses',))
[04:59] <mnaser> melwitt: interesting you bring that up, i am getting a few failures in our downstream with `error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.`
[05:01] <melwitt> O.o
[05:01] <melwitt> the gate is killing me rn I swear :(
[05:03] <mnaser> :(
[05:04] <ianw> melwitt: that ... shouldn't happen. one of our backends might be unhappy
[05:04] <ianw> melwitt: which job?
[05:04] <melwitt> that was nova-live-migration https://zuul.opendev.org/t/openstack/build/5a4e126cd734457fa1024575ec193440/log/logs/devstacklog.txt#8474
[05:05] <ianw> mnaser: i agree with your point re: zuul as a 3rd party CI. but ... I think we need to reach out and try to bring people along for the journey
[05:05] <melwitt> I'll be back in a couple of hours to recheck for the 9th time o/
[05:06] <ianw> let me see if i can find where the lb sent that
[05:06] <mnaser> ianw: i only voice this because i kinda tried the experiment with cherrypy and it really led to nothing but just a normal reporting job amongst many others
[05:06] <mnaser> if there's no incentive to move towards gating, eh.
[05:08] <ianw> i do see your point and somewhat agree, and pyca has been similar. we could do their wheel releases for them, which they get manually involved with, but there's been resistance
[05:09] <ianw> however, i feel like having skin in the game gives a chance for adoption: when things come up, you can point out that zuul would have stopped that breaking change, etc.
[05:09] <ianw> i think that was 23.253.203.147
[05:11] <ianw> i think that went to gitea05 balance_git_https/gitea05.opendev.org
[05:18] <ianw> sorry, it actually went to balance_git_https/gitea06.opendev.org
[05:20] <ianw> that host does actually look unhappy
[05:22] <ianw> 2020/11/05 03:21:43 cmd/web.go:107:runWeb() [I] Starting Gitea on PID: 1
[05:24] <ianw> 2020-11-05 03:21:32.393 | ERROR: Could not install packages due to an EnvironmentError: HTTPSConnectionPool(host='opendev.org', port=443): Max retries exceeded with url: /openstack/requirements/raw/branch/master/upper-constraints.txt (Caused by ResponseError('too many 503 error responses',))
[05:24] <ianw> it seems this managed to happen right as the container was restarting
[05:29] *** fressi has joined #opendev
[05:32] *** whoami-rajat___ is now known as whoami-rajat__
[05:39] <ianw> clarkb: just mounted fuse backups on ethercalc02, all seems to work. i think everything is ready to roll out to more servers now
[06:27] <mnaser> ianw: just had a downstream job fail on 'error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.'
[06:27] *** sboyron has joined #opendev
[06:36] *** ysandeep|away is now known as ysandeep|ruck
[06:58] *** Tengu has quit IRC
[07:05] *** Tengu has joined #opendev
[07:05] *** mschoenlaub has joined #opendev
[07:06] *** mschoenlaub has quit IRC
[07:09] *** marios has joined #opendev
[07:11] *** lpetrut has joined #opendev
[07:20] *** melwitt has quit IRC
[07:21] *** melwitt has joined #opendev
[07:21] *** ykarel_ has joined #opendev
[07:22] *** eolivare has joined #opendev
[07:24] *** ykarel has quit IRC
[07:31] <frickler> mnaser: RPC sounds like some internal call, not a download. do you have the logs accessible somewhere?
[07:32] <mnaser> frickler: that was during a git clone -- i don't have the log in an easily locatable way but i'll try to keep more for the next time
[07:34] <frickler> iiuc we did unblock that crawling job, maybe it isn't limiting itself enough yet
[07:35] <mnaser> frickler: that was git clones to https://opendev.org though
[07:39] *** ykarel_ is now known as ykarel
[07:43] <frickler> ah, right, the crawler went against gerrit. I do see some spikes on the gitea-lb cacti graphs since 0400, not sure if those might be related or whether they are normal and just smoothed out in the longer intervals
[07:44] <frickler> seems selection of custom intervals in the cacti graphs doesn't work for me, I always get only the default view
[07:46] <mnaser> frickler: caught it -- http://paste.openstack.org/show/799721/
[07:46] <mnaser> i think it's almost always horizon at the root
[07:54] *** slaweq has joined #opendev
[07:59] *** ralonsoh has joined #opendev
[08:07] <frickler> I didn't find anything obvious in the logs. I also tried cloning horizon from every gitea instance and found no issues there, either
[08:13] *** tosky has joined #opendev
[08:14] *** andrewbonney has joined #opendev
[08:18] <zbr> fungi: ianw: ansible-lint does require ansible >=2.9, and the "can't upgrade ansible in place" issue is still valid, with no fix being planned.
[08:19] <zbr> the ansible team sees it as a pip/setuptools bug, and the other side has other priorities, so we need to be careful to avoid it.
[08:19] <ianw> zbr: ansible-lint only specifies ansible>=2.8 in its setup.cfg
[08:20] <zbr> well, that is easy to fix.
[08:21] <ianw> i'm not sure how that fits with https://review.opendev.org/#/c/761473/
[08:21] <zbr> i wanted to make the linter ansible-version agnostic but it is not possible now; it may take a year or more and help from ansible core to implement some missing features.
[08:22] <ianw> it doesn't seem right to lint the jobs with an ansible that zuul isn't running, although i'm not sure how much that actually matters
[08:22] *** ysandeep|ruck is now known as ysandeep|lunch
[08:25] <zbr> hmm... now i check and I see that we still have the 2.8 pipelines in linter so it should really support 2.8
[08:25] <zbr> if it failed, it is likely due to a messed-up ansible install (due to upgrade/downgrade)
[08:26] <ianw> there's no upgrade or downgrade happening in the tox jobs; it was pinning the version to 2.7 which caused the failure
[08:27] <zbr> yep. that is because old pip does not check for conflicts; the new resolver would have prevented it.
[08:27] <zbr> add a "pip check" as the first command, to prevent running code with broken deps.
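zbr's suggestion relies on `pip check` flagging installed packages whose declared requirements are not met. Below is only a simplified, hypothetical illustration of that kind of check: the real tool implements full PEP 440 version and specifier handling, whereas this sketch handles just `>=` constraints on purely numeric versions, and the installed ansible version is made up for the example.

```python
def parse_version(text):
    """Turn '2.7.18' into (2, 7, 18), dropping trailing zeros so 2.8 == 2.8.0."""
    parts = [int(piece) for piece in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def find_conflicts(installed, requirements):
    """Report unmet '>=' requirements (hypothetical helper, not pip's API).

    installed:    {distribution: installed version string}
    requirements: {dependent: [(dependency, minimum version string), ...]}
    """
    problems = []
    for dependent, deps in requirements.items():
        for dep, minimum in deps:
            have = installed.get(dep)
            if have is None:
                problems.append(f"{dependent} requires {dep}, which is not installed.")
            elif parse_version(have) < parse_version(minimum):
                problems.append(
                    f"{dependent} has requirement {dep}>={minimum}, "
                    f"but you have {dep} {have}."
                )
    return problems


if __name__ == "__main__":
    # Hypothetical environment: tox pinned ansible to a 2.7 release while
    # ansible-lint declares ansible>=2.8, as discussed above.
    installed = {"ansible": "2.7.18"}
    requirements = {"ansible-lint": [("ansible", "2.8")]}
    for line in find_conflicts(installed, requirements):
        print(line)
    # A non-empty result is where a "pip check"-style first step would stop the job.
```

Running a check like this before anything else fails fast on a broken environment instead of letting the linter run against an ansible it does not support.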
[08:28] <zbr> upgrade/downgrade can still happen inside tox jobs based on the order the deps are defined, but that was not the issue in this case.
[08:29] <zbr> I could try to add an extra check for version in linter but I am not sure it is worth the effort.
[08:31] <ianw> i'm not actually sure what the failure mode would be leaving ansible uncapped. perhaps a later version would correctly parse something that would not actually parse in the earlier version zuul is using?
[08:33] <openstackgerrit> zbr proposed zuul/zuul-jobs master: More E208 https://review.opendev.org/761293
[08:36] <zbr> linting should not be confused with functional testing; linting is more about testing practices and detecting upcoming changes that may break your code, which is why it is better to use the upper bounds instead of lower ones. For functional testing it is different.
[08:36] *** sshnaidm|rover has quit IRC
[08:36] <zbr> i have a good example from flake8, where it had to be run on a newer version of python in order to detect a big range of issues, even if the linted code did support a lower version of python.
[08:37] <zbr> to test compatibility, we would have to run functional testing with both lower and upper bounds, but that brings huge extra costs.
[08:38] *** sshnaidm|rover has joined #opendev
[08:38] <zbr> i personally find the version mix provides decent coverage of both.
[08:39] *** rpittau|afk is now known as rpittau
[08:43] *** sshnaidm|rover has quit IRC
[08:45] *** sshnaidm|rover has joined #opendev
[08:52] *** sshnaidm|rover has quit IRC
[08:56] *** sshnaidm|rover has joined #opendev
[09:00] *** sshnaidm|rover has quit IRC
[09:01] *** jaicaa has quit IRC
[09:02] *** jaicaa has joined #opendev
[09:17] <kevinz> frickler: ianw: Following up on the discussion about https://review.opendev.org/#/c/760790/. We'd like to introduce OpenEuler 20.09 to Devstack; it is an RPM-based operating system that now works on AArch64 and x86_64
[09:18] <kevinz> If DIB is essential, I will ask the OpenEuler team to offer some support in upstreaming this feature.
[09:20] <kevinz> But if uploading images is fine temporarily, I think adding a job to test this Devstack support would be a good plus, so that we can work in parallel to make that happen quickly
[09:20] <frickler> infra-root: ^^ there seems to be a generic cloud image available; not sure whether it would be o.k. for us to start with that or whether we'd have to insist on having dib support in order to get our customizations in place from the start
[09:20] *** ysandeep|lunch is now known as ysandeep|ruck
[09:21] <frickler> I'm also not sure whether we'd have a procedure in place to use upstream images in nodepool at all, or whether that would have to be done manually
[09:33] *** Green_Bird has joined #opendev
[09:49] *** sshnaidm has joined #opendev
[10:00] *** DSpider has joined #opendev
[10:10] *** fressi has quit IRC
[10:15] *** hashar has joined #opendev
[10:32] *** noonedeadpunk has quit IRC
[10:32] *** noonedeadpunk has joined #opendev
[10:35] *** ysandeep|ruck is now known as ysandeep|brb
[10:48] <kevinz> frickler: Thanks! will wait for more comments here :-D
[10:51] *** ysandeep|brb is now known as ysandeep|ruck
[10:57] *** fressi has joined #opendev
[11:13] *** fressi has quit IRC
[11:21] *** noonedeadpunk has quit IRC
[11:25] *** noonedeadpunk has joined #opendev
[11:49] *** sboyron has quit IRC
[11:52] *** sboyron has joined #opendev
[12:17] *** marios has quit IRC
[12:21] *** marios has joined #opendev
[13:00] *** marios has quit IRC
[13:03] *** marios has joined #opendev
[14:16] *** dmellado has quit IRC
[14:20] *** dmellado has joined #opendev
[14:34] <openstackgerrit> Merged openstack/project-config master: tox.ini : update Ansible pin https://review.opendev.org/761473
[15:00] *** dtantsur has joined #opendev
[15:06] <dtantsur> hi folks! sorry if it has been asked too often already, but would it be possible to enable code search on opendev git?
[15:08] <mordred> dtantsur: it exists? https://opendev.org/explore/code?tab=&q=novaclient https://opendev.org/sardonic/sardonic/search?q=cmdb
[15:11] <mordred> there is an open issue upstream in gitea for making that all pluggable so that something like elasticsearch could be used to power the indexing ... so at the moment I think codesearch.openstack.org is still better at searching
[15:12] <dtantsur> mordred: this is empty for me: https://opendev.org/openstack/ironic/search?q=automated_clean
[15:12] <dtantsur> how does it look for you?
[15:12] <mordred> similar. I'm guessing automated_clean is something that is in the ironic repo?
[15:13] <dtantsur> yep. I've tried many things including "if" :)
[15:13] <mordred> fascinating
[15:13] <mordred> well - it's not a thing that's intentionally turned off
[15:13] <dtantsur> I've never had any results whenever I tried, so I assumed it might have been turned off
[15:13] <dtantsur> fascinating indeed
[15:13] <mordred> but it's also not a subsystem that's gotten a lot of love - largely because it's currently a per-gitea-node thing
[15:20] *** hashar is now known as hasharKids
[16:01] <clarkb> the current code search uses a go lib that seems to have odd behaviors that don't map well to how humans search for text
[16:01] <clarkb> the elasticsearch support comes in the next release and should be more familiar to those who have used our logstash
[16:01] <clarkb> I expect we can try deploying ES alongside gitea in non-clustered mode just to ensure that all works well
[16:02] <clarkb> frickler: kevinz: we strongly prefer dib because what we've found with the upstream images is that they change behaviors or do things with cloud-init that don't make sense. It's just easier to have a single common image that uses glean for our test nodes
[16:06] *** ysandeep|ruck is now known as ysandeep|away
[16:11] *** dmellado has quit IRC
[16:13] *** dmellado has joined #opendev
[16:21] <openstackgerrit> Merged openstack/project-config master: Add manila client,ui,tempest plugin core teams https://review.opendev.org/758868
[17:01] *** marios is now known as marios|out
[17:04] *** ykarel has quit IRC
[17:05] *** ykarel has joined #opendev
[17:06] <openstackgerrit> Merged openstack/project-config master: Update neutron grafana dashboard https://review.opendev.org/758208
[17:14] *** marios|out has quit IRC
[17:16] <openstackgerrit> Clark Boylan proposed opendev/system-config master: Update gerrit plugins on 2.16 and 3.0 https://review.opendev.org/761641
[17:16] <clarkb> ensuring we're keeping our gerrit images up to date after hashar's feedback
[17:19] <openstackgerrit> Merged opendev/system-config master: Document dual account split for Gerrit admins https://review.opendev.org/760051
[17:21] *** rpittau is now known as rpittau|afk
[17:21] *** Green_Bird has quit IRC
[17:25] *** Green_Bird has joined #opendev
[17:30] <hasharKids> clarkb: hi, I hope my reply was not perceived as patronizing!
[17:30] *** hasharKids is now known as hashar
[17:30] <fungi> hashar: not at all! it had a lot of good reminders
[17:30] <clarkb> hashar: nope, it was useful to get input on whether or not we are on track
[17:30] <clarkb> and the bits about the js stuff were helpful too
[17:30] <hashar> but generally Gerrit upstream recommends using the very latest patch release of any minor series
[17:30] <fungi> yeah, reassuring to see it basically matches our upgrade plan
[17:30] <clarkb> hashar: yup, our docker builds build off of stable-* branches and get the latest commit
[17:30] <hashar> so 2.x.max(y)
[17:31] <clarkb> so we should be at least as new as the most recent release for each stable branch when we rebuild
[17:32] <hashar> also note I haven't been directly involved in the Gerrit upgrade planning. Christian Aistleitner has done all the hard work
[17:32] <hashar> so the reference is his writing at https://groups.google.com/g/repo-discuss/c/G5wucKJg9Ag/m/pLin-i3mBgAJ :]
[17:33] <hashar> I merely echoed and mentioned a few things we found after we upgraded
[17:38] <fungi> 2.x.max(y) or newer, yeah. in many cases there are subsequent stable branch commits which are not yet tagged as point releases
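hashar's "2.x.max(y)" shorthand means: within each minor series, take the highest patch release. A small sketch of that selection, using made-up version numbers rather than Gerrit's actual tags, also shows why the patch component has to be compared as an integer rather than as text:

```python
def latest_per_series(versions):
    """Map each 'major.minor' series to its highest 'major.minor.patch' release."""
    best = {}
    for text in versions:
        major, minor, patch = (int(piece) for piece in text.split("."))
        series = (major, minor)
        if series not in best or patch > best[series]:
            best[series] = patch
    return {
        f"{major}.{minor}": f"{major}.{minor}.{patch}"
        for (major, minor), patch in sorted(best.items())
    }


if __name__ == "__main__":
    # Hypothetical release list, for illustration only.
    releases = ["2.16.9", "2.16.25", "3.0.12", "2.16.24", "3.0.13"]
    print(latest_per_series(releases))  # {'2.16': '2.16.25', '3.0': '3.0.13'}
    # Plain string comparison picks the wrong one: "9" sorts after "2".
    print(max(["2.16.9", "2.16.25"]))  # prints 2.16.9
```

As fungi notes, building from the head of a stable-* branch can land on commits newer than the highest tag, so the image is "2.x.max(y) or newer" rather than exactly a tagged release.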
[17:38] *** hamalq has joined #opendev
[17:39] <clarkb> yup, we were already testing with the notedb migration improvements prior to the latest 2.16 release as a result
[17:45] *** eolivare has quit IRC
[17:51] <hashar> ahh great!
[17:51] <hashar> do you, or will you, run the production Gerrit out of a Docker image?
[17:52] <clarkb> hashar: we do and we will :)
[17:52] <fungi> yeah, we build a docker image with zuul jobs, so that our chosen set of plugins will be included
[17:53] <fungi> we also continuously deploy image updates with a zuul job too
[17:53] <hashar> nice. You are way more automated than us :-D
[17:53] <clarkb> we don't auto-restart though
[17:53] <hashar> for later, you might be interested in the multi-site plugin https://gerrit.googlesource.com/plugins/multi-site
[17:53] <fungi> right, we'd rather still control the outage times for restarts
[17:53] <hashar> as I understand it, that lets one do rolling upgrades with zero downtime
[17:54] <fungi> but yeah, if multi-site is robust enough now, maybe rolling restarts of cluster members behind an lb would suffice
[17:55] <frickler> that might even allow us to distribute over multiple providers. not sure how we'd lb ssh though?
[17:55] <hashar> there are some explanations by Luca Milanesio (a Gerrit maintainer, and the person behind https://www.gerritforge.com/) at https://www.mediawiki.org/wiki/Topic:Vwkvtt6hlurmo42t
[17:55] <clarkb> frickler: aiui it's all primary/secondary
[17:55] <clarkb> frickler: not active/active
[17:56] <fungi> right, we'd configure haproxy or whatever to only ever send connections to one cluster member or the other, never both
[17:56] <fungi> but i agree, it will likely be disruptive for ssh stream-events connections. they'll get reset and need to reconnect
[17:56] *** ykarel is now known as ykarel|away
[17:56] <fungi> which could leave windows of time where events are missed
[17:57] <clarkb> one thing at a time :)
[17:58] <fungi> yeah, i'm not in any hurry to add multi-site but it's neat to consider for down the road
[17:58] <fungi> it would also be lots of additional complexity optimizing away one-minute restart outages which happen at most once a month
[17:59] <fungi> so we should definitely weigh the positives and negatives of such a solution
[17:59] <hashar> another advantage is reduced latency, which is helpful when your users are geographically distributed all across the world
[18:00] <fungi> how does it reduce latency if only one cluster member is active?
[18:00] <hashar> I mean, you could have a Gerrit in asia for example
[18:00] <fungi> or is there an active/active model with multi-site too, not just active/standby?
[18:00] <clarkb> it was my understanding that the gerrit clustering doesn't do active/active
[18:00] <hashar> but maybe some locks have to happen all the way back to a reference that is held in the US, so maybe that doesn't help much
[18:01] <clarkb> you sync from the primary to the standbys using the replication plugin
[18:01] <clarkb> and you can't sanely do that in both directions I don't think
[18:01] <fungi> in theory clients could read from the standby node, but not write to it
[18:01] <clarkb> (but maybe that has changed since I last looked at this)
[18:02] <hashar> the link I pasted above was us complaining about multi-site not really working for us (https://www.mediawiki.org/wiki/Topic:Vwkvtt6hlurmo42t), but one of its maintainers pointed out the doc we used was outdated
[18:02] <hashar> seems like the plugin has been largely improved and the doc has been updated as a result of the above discussion
[18:02] <hashar> https://gerrit.googlesource.com/plugins/multi-site/+/HEAD/DESIGN.md might give more clues
[18:03] <hashar> but as Clark said, one thing at a time. You can look at it next year I guess :]
[18:11] *** ykarel|away has quit IRC
[18:25] <openstackgerrit> Merged opendev/system-config master: Update gerrit plugins on 2.16 and 3.0 https://review.opendev.org/761641
[18:29] *** andrewbonney has quit IRC
[18:39] *** hashar is now known as hashardinner
[18:41] *** ralonsoh has quit IRC
[18:42] <fungi> ooh, python 3.10.0a2 just dropped!
[18:44] *** sshnaidm is now known as sshnaidm|afk
[18:52] <fungi> i've booted review-test back up and then downed the gerrit container on it
[18:53] *** dtantsur is now known as dtantsur|afk
[19:16] *** lpetrut has quit IRC
[19:16] *** _mlavalle_2 has quit IRC
[19:31] *** Tengu has quit IRC
[20:22] *** rchurch has quit IRC
[20:38] *** hashardinner is now known as hashar
[20:46] *** dwilde has quit IRC
[20:46] *** d34dh0r53 has joined #opendev
[21:10] <fungi> any tmux users who aren't aware, be on the lookout for updates to fix code execution by carefully crafted escape sequences: https://www.openwall.com/lists/oss-security/2020/11/05/3
[21:19] <ianw> fungi: when you have a sec, would you mind a double check on the grafana http -> https redirect one-liner @ https://review.opendev.org/#/c/761487/ ... just making sure there isn't a better way to do it
[21:21] <fungi> we can redirect /.* to /$1
[21:21] <fungi> so old http urls continue to work
[21:22] <fungi> i think redirecting / only does any good if folks load up / explicitly?
[21:24] <ianw> fungi: i think Redirect just replaces the string and leaves the rest of the url alone ... i mean it seems to work that way? e.g. http://grafana.opendev.org/dashboards
[21:25] <fungi> oh, maybe
[21:25] <fungi> could be i'm confusing it with rewrite
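ianw's reading matches how Apache's mod_alias `Redirect` is documented: a request whose path begins with the given URL-path is redirected, and any path beyond the matched prefix is appended to the target URL. Regex-based rules (`RedirectMatch`, mod_rewrite) are the ones that need an explicit capture group such as `$1`. The sketch below models only that prefix behaviour in Python; the `Redirect / https://grafana.opendev.org/` directive it assumes is hypothetical (the actual change is not quoted in the log), and it ignores query strings and other mod_alias details.

```python
def apply_redirect(prefix, target, path):
    """Simplified model of a mod_alias-style prefix Redirect.

    Returns the Location the client would be sent to, or None when the
    request path does not start with `prefix` on a path-segment boundary.
    """
    if not path.startswith(prefix):
        return None
    rest = path[len(prefix):]
    # Only match whole path segments: "/foo" should not match "/foobar".
    if rest and not prefix.endswith("/") and not rest.startswith("/"):
        return None
    if target.endswith("/") and rest.startswith("/"):
        rest = rest[1:]  # avoid a doubled slash when both sides carry one
    return target + rest


if __name__ == "__main__":
    # Hypothetical directive: Redirect / https://grafana.opendev.org/
    for path in ["/", "/dashboards", "/d/abc123/some-dashboard"]:
        print(path, "->", apply_redirect("/", "https://grafana.opendev.org/", path))
```

With a prefix of `/`, every path matches and keeps its remainder, which is why an old deep link such as /dashboards still lands on the equivalent https URL.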
[21:27] *** hashar has quit IRC
[21:28] *** sboyron has quit IRC
[21:59] <ianw> yeah, something about "nice thing about standards is there's so many to choose from" :)
[22:01] <fungi> re.*
[22:01] <fungi> lgtm then
[22:06] *** slaweq has quit IRC
[22:08] *** mlavalle has joined #opendev
[22:11] *** hamalq has quit IRC
[22:31] <openstackgerrit> Merged opendev/system-config master: grafana: redirect http to CNAME https://review.opendev.org/761487
[22:37] <ianw> does limestone not have ipv4 nat? or is glean doing something wrong?
[22:37] <ianw> wrt https://review.opendev.org/#/c/761178/
[22:37] <ianw> https://d4eb7e3efe98cba79a4b-f4d168cdb20f40841821e4b213645c0f.ssl.cf2.rackcdn.com/739139/12/gate/neutron-tempest-plugin-scenario-linuxbridge/9a6b4f7/zuul-info/zuul-info.controller.txt
[22:37] <clarkb> ianw: something is going on there. I pinged logan yesterday in -infra but haven't heard back
[22:37] <ianw> ahh, ok
[22:37] <clarkb> it should have a 10/8 network and glean should configure it to dhcp
[22:38] <clarkb> but I haven't actually poked at the openstack apis and hosts yet
[22:38] <openstackgerrit> Merged opendev/system-config master: grafana: fix typo in test name https://review.opendev.org/761489
[22:39] <openstackgerrit> Merged openstack/project-config master: Add pypa/project-config https://review.opendev.org/761467
[22:43] <ianw> fungi/clarkb: you seemed to have some opinions on the 8gb swap reset @ https://review.opendev.org/761119 in the linked irc conversation, so i haven't approved. it does seem that the larger swap takes ~20 seconds to create, which doesn't seem too bad to me
[22:43] <clarkb> the problem is projects like ironic run out of disk with even the 1gb swap
[22:44] <clarkb> and increasing it to 8gb will only make that bigger. If this weren't a last-ditch method to avoid jobs failing when they need to swap a little, I'd be more on board, but the swap isn't really there to double the "memory"
[22:44] <clarkb> if jobs have those problems they need to reduce memory or be multinode and distribute the memory load
[22:45] <clarkb> ultimately if the rest of openstack says projects like ironic are the ones that need to change then ok, we can land something like that, but I think that gets the purpose of the swap device wrong
[22:50] <ianw> yeah, good points; we probably should communicate that though
[22:50] <ianw> back to limestone, in "nodepool list" the nodes have a 10. ip address. so presumably openstacksdk is seeing an address defined
[22:51] <clarkb> ya it was probably a mistake to make it so big previously, but we figured it's sparsely allocated so it doesn't actually matter unless you need it, and if you run out of disk and need swap you'll break anyway
[22:51] <openstackgerrit> Merged openstack/project-config master: Add pypa tenant https://review.opendev.org/761468
[22:51] <ianw> i just jumped on a random focal node and it has ipv4
[22:51] <clarkb> ianw: was it configured by dhcp (just want to confirm that assumption on my part)
[22:52] <ianw> Nov  5 22:21:38 ubuntu-focal-limestone-regionone-0021581754 dhclient[466]: DHCPREQUEST for 10.4.70.27 on ens3 to 255.255.255.255 port 67 (xid=0xcc89a40)
[22:52] <ianw> yep
[22:53] <clarkb> I wonder if there is some issue with dhcp for some hosts, like maybe neutron isn't setting up the mapping in dnsmasq in some cases and then it fails?
[22:53] <clarkb> I've jumped on a bionic node and it too looks fine, has a default route via ens3 and a 10/8 address
[22:54] <clarkb> another thing it could be is we're running out of addresses in the pool?
[22:54] <ianw> similar on another two nodes i've jumped on
[22:56] <clarkb> allocation_pools     | 10.4.70.10-10.4.70.254
[22:56] <clarkb> that should be plenty for what I think is a ~50 node max-server limit
[22:56] <ianw> unfortunately the syslog in that job doesn't go back to the start of boot
[22:57] <clarkb> there are 62 ports in use
[22:58] <clarkb> all that is telling me that we're well below our allocation limit so that shouldn't be the problem
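clarkb's conclusion is easy to double-check: the allocation pool 10.4.70.10-10.4.70.254 is inclusive at both ends, so it holds 245 addresses, and with 62 ports in use about 183 remain, well above what a ~50-server limit needs. A quick check with Python's standard `ipaddress` module (the helper name is illustrative):

```python
from ipaddress import ip_address


def pool_size(first, last):
    """Number of addresses in an inclusive allocation range."""
    return int(ip_address(last)) - int(ip_address(first)) + 1


if __name__ == "__main__":
    size = pool_size("10.4.70.10", "10.4.70.254")
    ports_in_use = 62  # count clarkb reported from the cloud
    print(size, size - ports_in_use)  # 245 183
```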
[22:58] <fungi> it's possible that with random macs and decent churn we're overrunning the pool in dhcpd if the leases are established with too long of a timeout?
[22:58] <clarkb> oh maybe
[22:58] <clarkb> usually neutron leases are very short, but that isn't necessarily the case
[22:59] <clarkb> ianw: does your focal node say what the lease period is?
[23:00] <ianw> option dhcp-lease-time 86400;
[23:00] <clarkb> that is one day, right? I wonder if that is the problem
[23:01] <ianw> option dhcp-renewal-time 43200;
[23:01] <ianw> option dhcp-rebinding-time 75600;
[23:01] <ianw> dunno what those are
[23:01] <fungi> yeah, that's a day. it really depends on the dhcpd though as to whether it will recycle leases which don't respond to ping/arp under pressure
[23:01] <clarkb> ianw: the renewal time is when the client should renew; it's usually set to 1/2 the lease time
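The three values ianw pasted line up with the defaults RFC 2131 (section 4.4.5) specifies when a server does not set them explicitly: the renewal time T1 is 0.5 × the lease duration and the rebinding time T2 is 0.875 × the lease duration. A few lines of Python confirm the arithmetic for the 86400-second (one-day) lease seen on the focal node; whether this server derived them from the defaults or configured them explicitly is not visible in the log.

```python
def dhcp_timers(lease_seconds):
    """Default T1 (renewal) and T2 (rebinding) times per RFC 2131 section 4.4.5.

    T1 = 0.5 * lease and T2 = 0.875 * lease; floor division keeps whole
    seconds and rounds down for leases not divisible by 8.
    """
    t1 = lease_seconds // 2
    t2 = lease_seconds * 7 // 8
    return t1, t2


if __name__ == "__main__":
    lease = 86400  # option dhcp-lease-time from the focal node
    t1, t2 = dhcp_timers(lease)
    print(lease / 3600, t1, t2)  # 24.0 43200 75600
```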
[23:02] <clarkb> I'm trying to see where neutron might expose this and if we can see it as non-cloud-admins
[23:02] <fungi> however, if the api is claiming to have assigned an ip address for the nodes in question, then i don't expect it to be a pool problem
[23:02] <clarkb> looks like it's a config option in the dhcp agent config
[23:02] <fungi> i want to say neutron sets up reservations in dnsmasq?
[23:03] <clarkb> not something exposed by the api?
[23:03] <ianw> not sure if the journal file will have the syslog
[23:03] <clarkb> fungi: yes it uses mac address maps that dnsmasq assigns
[23:03] <fungi> if it's really all explicit reservations then lease times are irrelevant
[23:03] <fungi> since it's not doing an actual dhcp "pool"
[23:03] <clarkb> ah ok.
[23:04] <fungi> (where allocation within the pool is left up to the dhcpd)
[23:04] <fungi> sounds like neutron is responsible for tracking allocations and just tells the dhcpd what's been assigned instead
[23:14] <ianw> i think we should probably convert the journal to export from the start of boot, not from the time devstack started
[23:15] <clarkb> ianw: I think devstack does that since some people reuse the nodes for CI
[23:15] <clarkb> but in our case it would be fine
[23:16] <ianw> at the moment we're flying blind, but i guess there's nothing obvious/systematic, at least right now
[23:38] <ianw> i need to force-merge the pypa project-config pipeline config
[23:39] <ianw> so trying the instructions
[23:39] <clarkb> ianw: why is that?
[23:39] <clarkb> oh it's a config project with no jobs
[23:39] <clarkb> change adds the jobs
[23:39] <ianw> chicken and egg, because there's no pipeline config to merge before there's a config :)
[23:39] <ianw> "Members added to group Project Bootstrappers: n/a"
[23:40] <fungi> i think that's normal if the account running the set-members command isn't itself a member
[23:40] <clarkb> ianw: web ui says it added you
[23:41] <ianw> yeah, i think it's a red herring because the email is "n/a"
[23:41] <fungi> aha
[23:41] <fungi> that makes sense
[23:41] <fungi> in a gerrit sort of way
[23:41] <clarkb> oh, that reminds me: one thing I ran into when testing new gerrit is that it really wants an email on accounts
[23:41] <clarkb> so we may have to add email addrs to those accounts at some point
[23:42] <fungi> also you can use the ls-members command to look at the list of group members, if needed
[23:42] <clarkb> I had to set one to update the public key for the test project creator account
[23:42] <clarkb> fungi: we should test if the upgrade will break our admin accounts without email addrs set
[23:42] <clarkb> that is something we can test though
[23:42] <fungi> that could pose problems since gerrit also doesn't like accounts to share e-mail addresses, so all admins will need two e-mail addresses
[23:43] <clarkb> well we can also set it to a bogus value
[23:43] <clarkb> we don't actually need the review emails
[23:43] <clarkb> just need to convince it to not complain when setting things like public keys
[23:43] <clarkb> (my worry is there is a chicken and egg where we might not be able to change things like that because we need the email field to have something in it)
[23:43] <ianw> https://review.opendev.org/#/c/761681/ looks merged by ianw.admin, that's good
[23:44] <fungi> but also having it try to e-mail bogus addresses could be problematic
[23:44] <ianw> can probably use + addresses?
[23:44] <ianw> to make something valid but different
[23:44] <fungi> yeah, i mean it's no problem for me, i run my own mailserver so i can add whatever addresses i want
[23:44] <clarkb> ya we'll sort it out on the test node
[23:44] <clarkb> it's possible it's a non-issue too
[23:46] <ianw> https://zuul.opendev.org/t/pypa/status has all the pipelines
[23:47] <ianw> and removed. ++ to fungi for great instructions
[23:48] <fungi> +++ to gerrit's documentation
[23:55] *** tosky has quit IRC

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!