15:02:28 #startmeeting third-party
15:02:29 Meeting started Mon Jun 15 15:02:28 2015 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:30 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:33 The meeting name has been set to 'third_party'
15:02:37 hello
15:02:45 hello
15:02:45 hey
15:02:55 akerr patrickeast how are you?
15:02:56 Hi
15:03:02 Hi guys, I am looking for hardware requirements/references for CI infrastructure - can someone point me in the right direction?
15:03:03 Hi
15:03:04 Alexey_Nexenta: hello
15:03:12 IlyaG: I'm not a guy
15:03:28 what shall we discuss today?
15:03:53 hello
15:04:25 #link https://wiki.openstack.org/wiki/VirtualSprints#OpenStack_Common-CI_Solution
15:04:33 hi
15:04:37 asselin_'s sprint is coming up
15:04:46 please consider attending if you can
15:05:03 we will need people to conduct patch reviews and tests
15:05:11 so everyone can be helpful
15:05:28 wznoinsk: do you want to go into any more detail about your experience
15:05:33 thanks for posting to the mailing list
15:06:12 #link http://lists.openstack.org/pipermail/openstack-infra/2015-June/002851.html
15:06:15 hi
15:06:17 anteaya: no problem
15:06:27 wznoinsk: care to expand a bit?
15:06:29 o/
15:06:31 what is pygerrit?
15:06:42 is that the name of your script you wrote?
15:07:08 anteaya: I wasn't able to drill down exactly what piece of the operation is causing the problem or how to reproduce it outside of a python script
15:07:35 pygerrit is python lib for gerrit: https://pypi.python.org/pypi/pygerrit/0.2.2
15:07:40 ah
15:07:47 is anyone else using pygerrit?
15:08:27 I'm not advocating for it, just curious
15:08:47 wznoinsk: it seems that staying away from Control-Z is your recommended way forward?
15:09:43 it didn't affect my 'ssh ... gerrit stream-events' on the command line when I did ctrl+z so my python/pygerrit is one situation you want to avoid ctrl+z with... rather than avoid ctrl+z at all
15:10:08 ah okay thanks for that clarification
15:10:24 wznoinsk: any other thoughts worth sharing at this time?
15:11:28 I got this link from someone on -infra, can't remember who it was tho: https://gerrit-documentation.storage.googleapis.com/ReleaseNotes/ReleaseNotes-2.9.2.html
15:11:36 it seems like this is fixed in gerrit 2.9.2
15:12:04 anteaya: nope, that was the most interesting one
15:12:25 wonderful
15:12:35 thank you for helping us with this issue
15:13:40 if anyone is able to help review any puppet-openstackci patches this week
15:13:45 there are two right now
15:13:49 #link https://review.openstack.org/#/q/status:open+project:openstack-infra/puppet-openstackci+branch:master+topic:downstream-puppet,n,z
15:14:00 it would be great if you are able to take a look at them
15:14:23 on a slightly different note, I think it would be beneficial for CI operators (especially the ones that have their accounts disabled) to attend these meetings, it's an opportunity to get the issues solved quicker than on ML I think, how do we attract them? ;-)
15:14:33 IlyaG: you had some questions
15:15:03 IlyaG: did this person leave?
15:15:28 looks like it
15:15:33 wznoinsk: a very good point
15:15:41 wznoinsk: I'm open to suggestions
15:15:58 as one of the big things I need when re-enabling a system is trust
15:16:10 and open communication goes a long way to establishing that
15:16:31 it needs promoting this irc meeting during any disabled/naughty account discussion on ml
15:16:37 since if someone takes out our system, I have to trust they will pay more attention next time
15:16:51 wznoinsk: I'm open to that
15:17:03 I don't do it myself as it may appear to be self-serving
15:17:12 however I do not discourage others from doing so
15:17:35 I do link to common third party resources when I have the energy
15:17:54 but I do admit that sometimes I lag behind in linking all the things
15:18:12 I welcome help and support in providing links to all the things
15:18:30 wznoinsk, system operators have to want to get involved, they just can't miss all the irc/email traffic on it if they are paying attention
15:19:23 can the ML not inject a link into the footer of every message on a list-by-list basis?
15:19:30 anteaya: Ilya wanted to ask if you have any recommendations/requirements for CI hardware?
15:19:43 akerr: I don't know I haven't investigated
15:19:44 I would forward it to him
15:19:54 akerr, we use openstack-dev, it would have no way of knowing
15:19:56 akerr: I find footers tend to be ignored
15:20:02 ++
15:20:18 Alexey_Nexenta: ah okay thanks, are you working on the same ci system?
15:20:39 yes, working on Nexenta CI
15:20:43 well somehow the third-party-announce emails have a very specific link in their footer, so it's able to figure it out
15:20:45 Alexey_Nexenta: we don't have a list, it depends entirely on what you are planning on testing
15:21:41 akerr: the footer is the third party announce mailing list information
15:21:56 some of the conversations happen on -dev and -infra too
15:22:13 if folks want a busier footer for third party announce, I can look into it
15:22:34 anteaya: true, but when an account is disabled it goes through the announce email. Why not just also advertise this meeting there
15:22:40 Alexey_Nexenta: you would be testing cinder, right?
15:22:49 akerr: I can look into it
15:22:52 hm, for announce, it might make sense, ++
15:22:56 patrickeast: yes, cinder drivers
15:23:20 #action anteaya to look into adding link to third party meetings in -announce ml footer
15:23:30 Our main problem is that we keep getting timeout errors
15:23:34 Alexey_Nexenta: ok cool, same here, fwiw i’ve found that with any fewer than ~3 build slaves able to run tests our system gets too far behind
15:23:55 Alexey_Nexenta: with a single jenkins slave it ends up ~12-24 hours behind on builds very quickly
15:24:10 with 3 it can stay only maybe 2-3 behind (with 35min builds)
15:24:37 so any hardware that can accommodate that number of devstack vm’s and the controllers should work
15:24:38 Alexey_Nexenta: A good optimization is to only run on patches that Jenkins has +1'd
15:25:06 ameade: yes, I think we only run those
15:26:33 Alexey_Nexenta: what is your current setup?
15:27:06 anteaya: we use John Griffith's sos-ci
15:27:23 hodos|2: how many other team members do you have in this meeting?
15:27:36 anteaya: 3 threads, fuel over vsphere
15:27:41 anteaya: just us two
15:27:45 okay thanks
15:28:06 hodos|2: are you using zuul and jenkins then?
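[editor's note] The sizing advice above (one Jenkins slave falls 12-24 hours behind, roughly three keep up with 35-minute builds) can be sanity-checked with simple throughput arithmetic. The sketch below is a rough linear model, not anything the participants ran: the patch arrival rate was never stated in the meeting, so ARRIVALS_PER_HOUR is a hypothetical value, and the function names are illustrative only.

```python
# Rough capacity model for a third-party CI system.
# Assumption: builds arrive at a steady rate and every worker runs one
# build at a time. Real queues are burstier than this.

def builds_per_hour(workers: int, build_minutes: float) -> float:
    """Builds the whole pool can finish in one hour."""
    return workers * 60.0 / build_minutes


def backlog_after(hours: float, arrivals_per_hour: float,
                  workers: int, build_minutes: float) -> float:
    """Queued builds after `hours`, starting from an empty queue."""
    growth = arrivals_per_hour - builds_per_hour(workers, build_minutes)
    return max(0.0, growth * hours)


def hours_behind(backlog: float, workers: int, build_minutes: float) -> float:
    """Time needed to drain `backlog` if no new patches arrived."""
    return backlog / builds_per_hour(workers, build_minutes)


# Hypothetical arrival rate; only the 35-minute build time comes from the log.
ARRIVALS_PER_HOUR = 4.0
BUILD_MINUTES = 35.0

if __name__ == "__main__":
    for workers in (1, 3):
        queued = backlog_after(8, ARRIVALS_PER_HOUR, workers, BUILD_MINUTES)
        lag = hours_behind(queued, workers, BUILD_MINUTES)
        print(f"{workers} worker(s): {queued:.1f} builds queued after 8h, "
              f"{lag:.1f}h to drain")
```

Under these assumptions a single worker cannot keep pace with the incoming patches while three can, which matches the direction of the experience reported above. Plugging in a measured arrival rate for the projects being tested gives a first estimate of how many devstack VMs the hardware has to host.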
15:28:17 * anteaya is not familiar with jgriffith's setup
15:28:33 anteaya: nope, just python + ansible
15:28:35 it replaces zuul, gearman, jenkins, all that
15:28:42 oh okay
15:28:45 no idea then
15:28:55 for timeouts I suggest you talk to jgriffith
15:29:06 it isn't an infra supported setup
15:29:18 anteaya: it seems to do the job, timeouts are happening on a different side
15:29:19 what kind of timeouts are you guys getting?
15:30:02 we get ssh timeouts on some of the tempest tests when our system is stressed (still looking into why it's happening)
15:30:40 patrickeast: we are probably using a slow setup for the openstack cloud: we have an all-in-one node, but i guess, it's the same: backend or network is too slow
15:31:25 patrickeast: most of those are happening with compute_with_volumes
15:31:32 have you adjusted any of the timeout settings? iirc there are like 5 different ones to play with depending on what is timing out
15:31:57 it may be too radical but you may look into containers instead of vm's and stack inside
15:32:28 patrickeast: can you point me to how to change those, cause i couldn't find any info
15:32:56 patrickeast: same here (ssh timeouts), boot_pattern intermittently fails by timeout. If you solve this problem, could you let me know? I'm still working on it too
15:33:16 we change a lot of our timeouts to crazy high numbers... let me see if i can find the variables
15:33:53 hodos|2: so the ones we change are env variables for devstack-gate and jenkins, which unfortunately are not going to be helpful for sos-ci
15:34:08 hodos|2: but, at some point they should be ending up in the tempest config file
15:34:28 or i suppose in the sos-ci code if it has any global timeout for test runs like jenkins does
15:35:10 patrickeast: well, i meant tempest timeouts, but i couldn't find the variable names for local.conf
15:35:26 marcusvrn1: will do, we have looked into them a bit in the past and didn’t figure it out, but we are going back in to deep dive this sprint
15:35:35 hodos|2: those variables are in tempest.conf
15:36:02 #link https://github.com/openstack/tempest/blob/master/etc/tempest.conf.sample
15:36:31 In the [volume] section you can set build_timeout
15:36:36 of tempest.conf
15:36:50 you also might want to adjust [compute] build_timeout
15:37:00 marcusvrn1: thanks, i thought you're able to modify tempest.conf through local.conf with TEMPEST_ vars
15:37:10 hodos|2:
15:37:17 hodos|2: yep, you are able
15:37:26 hodos|2: you can, just define those variables in a section for your tempest config
15:37:36 add [[post-extra|$TEMPEST_CONFIG]] to local.conf to set tempest variables
15:37:53 hodos|2: in the local.conf you have to add something like "[[post-extra|\$TEMPEST_CONFIG]]" and then add which sections and variables you want to add
15:38:32 akerr marcusvrn1 patrickeast: thanks, will try
15:39:11 patrickeast: thanks
15:39:25 have we reached a happy place?
15:40:00 anything more to say on this topic?
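[editor's note] The local.conf advice above can be made concrete with a small helper that renders the `[[post-extra|$TEMPEST_CONFIG]]` meta-section followed by the tempest.conf sections to override. This is a minimal sketch: the helper name `render_post_extra` and the value 600 are hypothetical, and only the `[volume]` and `[compute]` `build_timeout` options were named in the meeting. Check option names against the tempest.conf.sample linked above for the Tempest version in use.

```python
# Render a devstack local.conf fragment that overrides tempest.conf options.
# Assumption: the timeout values are placeholders, not recommendations.

def render_post_extra(settings: dict) -> str:
    """Return a [[post-extra|$TEMPEST_CONFIG]] block for local.conf.

    `settings` maps a tempest.conf section name to a dict of options.
    """
    lines = ["[[post-extra|$TEMPEST_CONFIG]]"]
    for section, options in settings.items():
        lines.append(f"[{section}]")
        for key, value in options.items():
            lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


# Hypothetical overrides for the two options mentioned in the meeting.
TIMEOUTS = {
    "compute": {"build_timeout": 600},
    "volume": {"build_timeout": 600},
}

if __name__ == "__main__":
    # Append the printed text to local.conf before running stack.sh.
    print(render_post_extra(TIMEOUTS), end="")
```

The backslash in the quoted "\$TEMPEST_CONFIG" variant above is presumably only needed when the fragment is written from a shell heredoc that would otherwise expand the variable; in local.conf itself the literal `$TEMPEST_CONFIG` is what devstack expects.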
15:40:08 one thing to remember when using a proxy is to set no_proxy to include at least 127.0.0.1 and the ip of the management interface when running commands that connect to keystone
15:40:47 o/
15:40:52 hey asselin_
15:41:08 anteaya, hi
15:41:53 Alexey_Nexenta hodos|2 thanks for coming and asking, I hope you have enough to move forward this week
15:42:24 does anyone have anything else they would like to discuss today?
15:42:55 anteaya: yep, we're hoping to speed up our env, by removing bottlenecks in network, and providing SSDs to backend
15:43:13 hodos|2: okay sounds great
15:43:17 anteaya: I focus on getting my CIs as good as possible, is not having CIs mentioned on the mailing list enough to say they're ok assuming they follow the rules from the wiki?
15:43:19 thanks for the help everyone, will try that advice
15:43:48 wznoinsk: we don't have any stamp that says "this is a good ci"
15:44:08 wznoinsk: the biggest thing I look for is if someone has a question is someone around to answer it
15:44:11 or will be soon
15:44:39 anteaya: ok, is there anything in the works to have something like an 'infra approved' ci kind of thing?
15:44:53 wznoinsk: the fact you show up to meetings regularly and help others when you can goes a long way for me to add my voice of confidence when someone has a question about your system
15:45:15 Alexey_Nexenta: welcome, I hope you find a solution
15:45:19 wznoinsk: no
15:45:28 wznoinsk, not infra approved, but close: https://github.com/rasselin/os-ext-testing
15:45:33 wznoinsk: as any system can go sideways at any time
15:45:55 wznoinsk: we have an infra supported workflow
15:46:11 wznoinsk, we'll have a CI monitoring dashboard, hopefully soon, that will help with the "this is a good CI" question
15:46:11 anteaya: thanks for the link, that's what I was after
15:46:13 which is what asselin_ is helping folks with and is working on improving
15:46:24 wznoinsk: ah sorry I mis-understood
15:46:42 wznoinsk, this is the goal: http://specs.openstack.org/openstack-infra/infra-specs/specs/openstackci.html
15:47:06 wznoinsk: some folks are looking to have their system blessed so they can stop maintaining it, a turn it on and forget it solution
15:47:17 wznoinsk: sorry I mis-understood your question
15:48:01 anything else on this topic?
15:48:24 anyone with anything else to discuss today?
15:49:08 can anyone think of any reason why we shouldn't wrap up the meeting?
15:49:29 okey dokey
15:49:43 thanks everyone for your attendance and participation
15:49:46 have a good week
15:49:51 see you next monday
15:49:54 #endmeeting