Tuesday, 2020-10-27

[01:00] *** mugsie has quit IRC
[01:04] *** mugsie has joined #openstack-release
[01:20] *** dave-mccowan has quit IRC
[01:25] *** armax has quit IRC
[03:55] *** armstrong has quit IRC
[04:31] *** ykarel has joined #openstack-release
[04:35] *** ykarel has quit IRC
[04:36] *** ykarel has joined #openstack-release
[05:10] *** ykarel has quit IRC
[05:16] *** ykarel has joined #openstack-release
[05:18] *** ykarel_ has joined #openstack-release
[05:21] *** ykarel has quit IRC
[05:29] *** vishalmanchanda has joined #openstack-release
[05:32] *** armax has joined #openstack-release
[05:33] *** evrardjp has quit IRC
[05:33] *** evrardjp has joined #openstack-release
[06:22] *** e0ne has joined #openstack-release
[06:24] *** e0ne has quit IRC
[07:06] *** e0ne has joined #openstack-release
[07:08] *** sboyron has joined #openstack-release
[07:20] *** dtantsur|afk is now known as dtantsur
[07:42] *** rpittau|afk is now known as rpittau
[07:45] *** e0ne has quit IRC
[07:56] *** ykarel_ is now known as ykarel
[08:11] *** slaweq has joined #openstack-release
[08:13] *** ykarel has quit IRC
[08:14] *** e0ne has joined #openstack-release
[08:16] *** ykarel has joined #openstack-release
[08:19] *** hberaud has joined #openstack-release
[08:28] *** jbadiapa has quit IRC
[08:36] *** tosky has joined #openstack-release
[08:38] *** jbadiapa has joined #openstack-release
[08:54] *** e0ne has quit IRC
[09:40] <openstackgerrit> Hervé Beraud proposed openstack/releases master: ignore trailing projects in R-2  https://review.opendev.org/755861
[09:44] <openstackgerrit> Hervé Beraud proposed openstack/releases master: ignore trailing projects in R-2  https://review.opendev.org/755861
[10:06] *** ykarel_ has joined #openstack-release
[10:09] *** ykarel has quit IRC
[10:20] *** hberaud has quit IRC
[10:21] *** hberaud has joined #openstack-release
[10:46] <openstackgerrit> Hervé Beraud proposed openstack/releases master: Adding a tool to track project who need to drop eol branches  https://review.opendev.org/758990
[11:17] *** ykarel_ is now known as ykarel
[11:37] *** e0ne has joined #openstack-release
[11:45] *** sboyron has quit IRC
[12:08] <hberaud> smcginnis: I wonder why your changes (https://review.opendev.org/#/c/759309/) are not present => https://releases.openstack.org/reference/reviewer_guide.html
[12:08] <hberaud> I didn't see a job failure
[12:17] <hberaud> the logs seem ok => https://zuul.opendev.org/t/openstack/build/caffc17288b84f4ab6a048ab2c7f5614/log/job-output.txt
[12:29] <smcginnis> hberaud: That's odd. I wonder if there's an AFS sync issue or something like that again.
[12:29] <smcginnis> fungi: Do you have a moment to check on that? ^
[12:29] <hberaud> I didn't see anything weird in the logs
[12:30] <fungi> looking
[12:34] *** dtantsur is now known as dtantsur|brb
[12:37] <fungi> the bottom of https://releases.openstack.org/reference/reviewer_guide.html says it was last "updated: Mon Oct 26 13:04:28 2020, commit bcc0fbced"
[12:38] <fungi> https://zuul.opendev.org/t/openstack/builds/?job_name=publish-tox-docs-releases shows that was the last commit to build
[12:40] <fungi> and that's not the most recent merge commit
[12:41] <fungi> b5cc5e9 is the current branch tip, not bcc0fbc
[13:01] <fungi> oh, my terminal was wrapping something around, 0c59b5c is the current branch tip
[13:01] <fungi> okay, so that was the problem
[13:03] <fungi> hberaud: smcginnis: the builds completed out of order. the publish-tox-docs-releases build for bcc0fbc finished after the build for 0c59b5c
[13:06] <fungi> they started out of order, which usually indicates a restart for the job, first one may have lost its node somehow
[13:07] *** dave-mccowan has joined #openstack-release
[13:16] <hberaud> fungi: I see, thanks
[13:17] <hberaud> so normally the doc will be fixed by the next build
[13:18] <fungi> yes
[13:18] <fungi> we just happened to publish the second-most-recent state immediately after the most-recent state
[13:19] <fungi> there are ways to avoid that, like using a semaphore in the job or running that job in a supercedent pipeline
[13:25] <hberaud> fungi: AFAIK we already have a semaphore for this job https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L413
[13:25] <hberaud> https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L432
[13:26] <hberaud> as we already faced a race condition with access during doc deployment with rsync
[13:37] *** brinzhang_ has quit IRC
[13:43] <hberaud> our current semaphore ensures that our resource (the doc server) is only synced by one job at a time, but it doesn't protect us in this case (when the builds complete out of order): the resource is available, so the upload can be launched (if I correctly understood our use case)
[13:46] *** dave-mccowan has quit IRC
[13:50] *** dave-mccowan has joined #openstack-release
[14:02] <smcginnis> Thanks fungi. That makes sense.
[14:02] <fungi> oh, good point, that prevents two builds from running concurrently, but doesn't necessarily prevent them from running out of order
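For reference, a Zuul job semaphore of the kind hberaud links above is declared roughly as follows (a minimal sketch; the semaphore name is illustrative rather than copied from project-config). With max: 1 it guarantees mutual exclusion, which matches the limitation fungi confirms: builds are serialized, but nothing forces them to publish in commit order.

    # Sketch of a Zuul semaphore definition and its use on a job
    - semaphore:
        name: releases-docs-site        # illustrative name
        max: 1                          # at most one build holds the semaphore at a time

    - job:
        name: publish-tox-docs-releases
        semaphore: releases-docs-site   # serializes builds, but does not order them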
[14:02] <smcginnis> We should have another build sometime soon, so it all sorts out in the end.
[14:02] <fungi> a supercedent pipeline would preserve ordering
[14:03] *** dtantsur|brb is now known as dtantsur
[14:03] <fungi> (as would a dependent pipeline)
[14:07] *** slaweq is now known as slaweq|ptg
[14:19] <hberaud> fungi: I don't expect this kind of scenario to happen on every build, but if a supercedent pipeline can help us definitively fix similar bugs, why not
[14:21] <fungi> there are tradeoffs
[14:22] <fungi> release-post is set to independent (unlike post, which is supercedent) because you're also running some jobs in there that need to run for every enqueued ref, though the docs job isn't one of them
[14:23] <fungi> it could be moved to an existing supercedent pipeline like post or promote, but those pipelines don't have as high of a priority as the release and release-post pipelines so jobs in them can take longer to run when we're short on resources
[14:26] <hberaud> I see
[14:26] <fungi> or we could create a release-specific supercedent pipeline, but seems like it would be overkill just for that one job
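For illustration, a release-specific supercedent pipeline of the kind fungi describes could look roughly like this (a sketch only; the pipeline name, precedence, and trigger shown here are assumptions, not an existing project-config definition). A supercedent manager keeps only the newest queued ref per project/branch, so an older ref can no longer publish over a newer one.

    # Sketch of a high-priority supercedent pipeline for post-merge docs publication
    - pipeline:
        name: release-post-docs         # hypothetical name
        manager: supercedent            # newer refs supersede queued older ones per project/branch
        precedence: high                # keep the release pipelines' scheduling priority
        trigger:
          gerrit:
            - event: ref-updated
              ref: ^refs/heads/.*$      # run on branch updates (merges)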
[14:27] <fungi> or it's possible tag-releases no longer needs to run for every ref, and would be fine just running the most recently enqueued ref, but i'm not deeply familiar with how that job decides what should be tagged
[14:28] <hberaud> most of our doc changes are not really high priority, except the last changes of a cycle when we update the cycle status
[14:35] <fungi> digging deeper in the scheduler's debug log, it doesn't appear there was any retry for the earlier build, they simply started out of order. the only other cause i can think of is if the node request for that build of the earlier ref failed and had to be resubmitted, and since the later ref got the node request for its build satisfied sooner it ran first
[14:38] <hberaud> I don't think we want to move release-post to supercedent; I'd personally prefer to move this doc job to an existing supercedent pipeline, it looks safer for us
[14:44] <hberaud> also AFAIK I haven't seen a similar issue in the past, and now we are aware of it, so maybe we could leave our existing config as it is for now and move to another pipeline only if we hit a similar issue again; that would avoid the tradeoffs for now, thoughts?
[14:46] <fungi> yeah, i'm still trying to nail down the precise circumstances which allowed this to happen, but i'm getting the feeling it would be a very rare race condition
[14:47] <hberaud> yep, same thing here
[14:48] <hberaud> I don't want to sacrifice our pipeline priority for a hypothetical and unlikely use case
[14:49] <hberaud> fungi: anyway, thanks for your great help
[14:51] <fungi> my pleasure, i'm still trying to get to the bottom of this because i want to understand it better in case it comes up again
[14:54] <hberaud> fungi: sure, don't hesitate to share your observations, I'm interested in this topic
[14:56] <fungi> it gets stranger. so i've confirmed that the ref-updated events from those changes merging did arrive in order, but the scheduler initiated the node requests for them out of sequence... and nearly an hour later
[14:57] <hberaud> I see
[14:57] <fungi> i wonder if the mergers were overloaded, and eventually returned the build refs out of sequence
[15:10] <clarkb> node requests can be fulfilled out of order
[15:11] <clarkb> the reason for this is a specific provider grabs each request and they may fulfill them in different orders. If we need to sequence them that has to happen prior to submitting the node requests
[15:17] <fungi> in this case the scheduler submitted the node requests out of order, though it looks like the mergers returned refs in order
[15:18] <fungi> but there's nearly an hour between when the mergers return and when the node requests are submitted, so i'm still trying to work out what happened there
[15:19] <fungi> clarkb: do you happen to know if semaphores block nodes from being requested? i guess that would make sense. seems like maybe there was a series of merges for openstack/releases so this may have been the end of a long chain of serialized builds. maybe the ordering for submitting node requests gets racy
[15:20] <clarkb> I think semaphores should order the node request submissions
[15:39] <fungi> i don't see any indication that the node requests failed and were resubmitted
[15:40] <fungi> this is a heavily filtered timeline: http://paste.openstack.org/show/799436
[15:40] *** ykarel has quit IRC
[15:40] <fungi> you can see the merge requests happened and returned immediately after the triggering events
[15:41] <fungi> but the node requests were added out of order
[15:42] <fungi> but also the only explanation i have for the ~45-minute gap between when the mergers returned refs and when the node requests were submitted is that there were other builds queued/underway with the same semaphore
[15:46] *** sboyron has joined #openstack-release
[15:59] *** e0ne has quit IRC
[16:07] *** tosky_ has joined #openstack-release
[16:08] *** tosky has quit IRC
[16:08] *** slaweq|ptg is now known as slaweq
[16:22] <clarkb> if the node requests fail the job should have a node failure iirc
[16:22] <clarkb> they don't get resubmitted
[16:36] <fungi> yeah, so that definitely wasn't it
[16:36] <fungi> i don't have a good explanation for why the scheduler seems to have submitted these node requests out of sequence
[16:36] <fungi> corvus: no idea if you're around, but maybe you have a suggestion of what to look at next
[16:36] <clarkb> corvus may have thoughts?
[16:56] *** tosky_ is now known as tosky
[17:10] *** ricolin has quit IRC
[17:30] *** vishalmanchanda has quit IRC
[17:34] *** rpittau is now known as rpittau|afk
[18:13] *** dtantsur is now known as dtantsur|afk
[18:25] *** hberaud has quit IRC
[19:54] *** hberaud has joined #openstack-release
[20:05] *** hberaud has quit IRC
[20:07] *** melwitt has joined #openstack-release
[20:16] *** slaweq has quit IRC
[20:18] *** rpittau|afk is now known as rpittau
[20:19] *** slaweq has joined #openstack-release
[20:23] *** hberaud has joined #openstack-release
[20:33] *** otherwiseguy_ is now known as otherwiseguy
[20:58] *** gouthamr has quit IRC
[20:58] *** gouthamr has joined #openstack-release
[20:59] *** gouthamr has quit IRC
[20:59] *** gouthamr has joined #openstack-release
[21:03] *** rpittau is now known as rpittau|afk
[21:04] *** slaweq has quit IRC
[21:22] *** slaweq has joined #openstack-release
[21:28] *** hberaud has quit IRC
[22:06] *** sboyron has quit IRC
[23:10] *** slaweq has quit IRC
[23:52] *** tosky has quit IRC
[23:55] *** armax has quit IRC
