#ceph IRC Log


IRC Log for 2011-01-12

Timestamps are in GMT/BST.

[0:29] * sagelap (~sage@static-66-14-234-139.bdsl.verizon.net) has joined #ceph
[0:29] * sagelap (~sage@static-66-14-234-139.bdsl.verizon.net) has left #ceph
[0:40] * greglap (~Adium@ has joined #ceph
[1:14] * gnp421 (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) has joined #ceph
[1:16] * NoahWatkins (~NoahWatki@soenat3.cse.ucsc.edu) has joined #ceph
[1:24] <gnp421> Is there any other documentation for the RADOS API other than the brief Wiki page?
[1:25] <greglap> gnp421: the wiki and the header file
[1:25] <greglap> are you after anything specific?
[1:26] <gnp421> more information but if the header file has information, I'll check it out
[1:27] <greglap> well it's got the code, so there's at least some there ;)
[1:27] <gnp421> lol true
[1:28] <greglap> there's not a lot of documentation in general, but I don't remember librados being too weird about most things
[1:36] <gnp421> ok, I'll give it a shot
[1:44] <Tv|work> new branch "clitests" in git, builds in to of gtest changes to do simple command line tool testing
[1:44] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[1:44] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:47] <Tv|work> s/in to/on top/
[1:50] * MarkN1 (~nathan@ Quit (Ping timeout: 480 seconds)
[1:54] * MarkN (~nathan@ has joined #ceph
[2:11] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:23] * NoahWatkins (~NoahWatki@soenat3.cse.ucsc.edu) Quit (Remote host closed the connection)
[2:40] * cmccabe (~cmccabe@ has left #ceph
[2:51] * NoahWatkins (~NoahWatki@c-98-234-57-117.hsd1.ca.comcast.net) has joined #ceph
[2:52] * NoahWatkins (~NoahWatki@c-98-234-57-117.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[3:00] * bchrisman (~Adium@ has joined #ceph
[3:06] * MarkN (~nathan@ Quit (Ping timeout: 480 seconds)
[3:09] * Zarly (603641e1@ircip1.mibbit.com) has joined #ceph
[3:15] * MarkN (~nathan@ has joined #ceph
[3:24] * Zarly (603641e1@ircip1.mibbit.com) has left #ceph
[3:37] * MKFG (~MK_FG@ has joined #ceph
[3:37] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[3:37] * MKFG is now known as MK_FG
[3:45] * MK_FG (~MK_FG@ Quit (Quit: o//)
[3:47] * MK_FG (~MK_FG@ has joined #ceph
[3:59] * MK_FG (~MK_FG@ Quit (Quit: o//)
[4:01] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[4:02] * MK_FG (~MK_FG@ has joined #ceph
[4:37] * gnp421 (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[5:31] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Ping timeout: 480 seconds)
[5:33] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[6:25] * gnp421 (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) has joined #ceph
[6:33] * earth (~summer@75-137-144-177.dhcp.gwnt.ga.charter.com) has joined #ceph
[6:36] * gnp421 (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) Quit (Read error: Connection reset by peer)
[6:54] * ijuz__ (~ijuz@p4FFF69B1.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[7:03] * ijuz__ (~ijuz@p57999889.dip.t-dialin.net) has joined #ceph
[7:15] * joshd (~jdurgin@ has joined #ceph
[7:18] * gnp421 (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) has joined #ceph
[8:27] * gnp421 (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) Quit (Read error: Connection reset by peer)
[8:32] * joshd (~jdurgin@ Quit (Quit: Leaving.)
[8:41] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:13] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:59] <jantje> wido: yes, I did hit #563 all the time
[10:03] <jantje> and I hit #666 as well
[10:06] * earth (~summer@75-137-144-177.dhcp.gwnt.ga.charter.com) Quit (Ping timeout: 480 seconds)
[10:09] * Yoric (~David@ has joined #ceph
[11:09] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[11:10] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[11:25] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[11:27] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[11:32] * allsystemsarego (~allsystem@ has joined #ceph
[11:37] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[11:37] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[11:46] * SimonB (~Simon@mail.openminds.co.uk) has joined #ceph
[11:46] <SimonB> Morning all.
[11:55] * SimonB (~Simon@mail.openminds.co.uk) Quit (Quit: leaving)
[11:58] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[12:05] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[12:08] * SimonB (~Simon@mail.openminds.co.uk) has joined #ceph
[12:10] <SimonB> Hi all, can someone kindly point me in the direction of what im doing with my ceph.conf? just c&p'ing examples isnt helping me understand too much tbh. I just want to setup a two node to play with for the moment.
[12:13] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[12:23] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[12:23] * Michiel_ (~michiel@mike.dwaas.org) has joined #ceph
[12:24] <Michiel_> hi guys
[12:24] * Michiel_ is now known as Guest4133
[12:24] * Guest4133 is now known as MichielM
[12:31] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[12:36] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[12:51] <jantje> hi
[13:05] * Yoric (~David@ Quit (Quit: Yoric)
[13:31] <wido> jantje: hi!
[13:32] <wido> Yes, 563 and 666 are a bit of a 'killers' right now
[13:37] * Yoric (~David@ has joined #ceph
[13:38] <wido> SimonB: sure! What is the problem?
[13:39] <wido> SimonB: what is your goal? What kind of setup are you trying to configure?
[13:39] <MichielM> hmm
[13:39] <MichielM> 2011-01-12 13:39:18.254490 mds e23: 2/2/2 up {0=up:active,1=up:resolve(laggy or crashed)}
[13:39] <MichielM> after a stresstest with bonnie
[13:41] <SimonB> wido: just a nice simple two node cluster. I have currently one disk on each formatted with btrfs. Thats about it at the moment. Currently when i run mkcephfs Im getting a seg fault and a warning about 'no btrfs'. So I presume my conf file is really wrong?
[13:41] <SimonB> running debian squeeze by the way.
[13:41] <wido> SimonB: could you post your config somewhere? pastebin.com?
[13:44] <SimonB> wido: Im guessing its very wrong but: http://pastebin.com/XJ42Y23V
[13:45] <wido> SimonB: not that bad at all ;)
[13:45] <wido> first, you don't need to format the drive with btrfs, mkcephfs will do this for your
[13:45] <wido> you*
[13:46] <wido> second, your OSD's need a journal, this can be a block device or a file
[13:46] <wido> when you specify a file, please set the osd journal size = 100M"
[13:46] <wido> "osd journal size = 100M", you can make the journal as large or small as you want
[13:47] <SimonB> ok, sure. I'll give that a shot now.
[13:47] <wido> SimonB: my ceph.conf: http://pastebin.com/4K6yXDU2
[13:47] <wido> everything on one node
[13:50] <SimonB> and thats now created, thank you very much.
[13:50] <MichielM> [WRN] message from mon1 was stamped 0.001001s in the future, clocks not synchronized
[13:50] <MichielM> how the heck can i get them in sync
[13:50] <MichielM> ntp wont resolve it
[13:52] <wido> MichielM: "mon clock drift allowed = 1"
[13:52] <wido> then your clocks are allowed to drift 1 second
[13:52] <MichielM> ah ok
[13:52] <MichielM> thx
[13:52] <wido> you could also specify "mon clock drift warn backoff = 30", then you will only get a message every 30 seconds
[13:53] <wido> MichielM: how many MDS'es do you have? (Just saw your line above)
[13:53] <MichielM> wido: just 2
[13:53] <MichielM> on test cluster
[13:53] <wido> MichielM: are they both running? Or did one of them really crash?
[13:54] <MichielM> yep
[13:54] <MichielM> no
[13:54] <MichielM> afaik not
[13:54] <MichielM> hmm
[13:54] <MichielM> now he's dead
[13:55] <wido> is there a file called "core" in / ?
[13:55] <MichielM> 2011-01-12 13:54:56.207414 mds e26: 2/2/2 up {0=up:active,1=up:resolve}
[13:55] <MichielM> -rw------- 1 root root 60350464 2011-01-11 10:53 core
[13:55] <MichielM> yep
[13:55] <wido> Yes, it crashed
[13:55] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) Quit (Ping timeout: 480 seconds)
[13:55] <MichielM> no shit :)
[13:55] <wido> MichielM: you might want to check out: http://ceph.newdream.net/wiki/Troubleshooting
[13:56] <wido> MichielM: you never know :)
[13:56] <wido> if you could make a backtrace? It might be an issue which is already known
[13:57] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[14:00] <MichielM> wido: sure
[14:01] <MichielM> bah
[14:01] <MichielM> Program terminated with signal 6, Aborted.
[14:01] <MichielM> #0 0x00007f53b6c90a75 in ?? ()
[14:01] <MichielM> (gdb) backtrace
[14:01] <MichielM> #0 0x00007f53b6c90a75 in ?? ()
[14:01] <MichielM> #1 0x00007f53b6c945c0 in ?? ()
[14:01] <MichielM> #2 0x0000000000000000 in ?? ()
[14:01] <wido> MichielM: seems you don't have the debug symbols installed
[14:01] <wido> which distro are you running?
[14:01] <MichielM> ubuntu
[14:01] <MichielM> just dig
[14:01] <MichielM> *did
[14:01] <MichielM> but indeed
[14:02] <MichielM> before the crash i didnt :)
[14:02] <MichielM> i'll restart the daemon with the dbg symbols and kill it again with bonie
[14:02] <MichielM> *bonnie
[14:03] * sakib (~sakib@ has joined #ceph
[14:03] <wido> MichielM: I think that installing the debug symbols would be enough
[14:04] <wido> apt-get install ceph-dbg
[14:04] <MichielM> yeah i did
[14:04] <wido> then your backtrace should display more useful data
[14:04] <SimonB> bah my mount command is returning 'error 5'. Sorry about this.
[14:04] <MichielM> SimonB: dmesg
[14:04] <wido> SimonB: what does 'ceph -s' show?
[14:05] <wido> OSD's, MDS and MON online?
[14:06] <MichielM> interesting.. mds1 keeps crashing now
[14:06] <MichielM> with a segfault
[14:06] <SimonB> wido: http://pastebin.com/cL2GdFQG
[14:07] <MichielM> [pid 5962] write(3, "2011-01-12 14:06:51.966557 7f1404ccc710 mds1.cache ... [dir 1 / [2,head] rep@0.0 dir_auth=0 state=268435456 f(v1 m2011-01-12 11:55:25.418833 4=2+2) n(v383 rc2011-01-12 11:57:06.154929 b1453654016 6441=6439+2)/n(v383 rc2011-01-12 11:56:53.038149 b1453654016 6054=6052+2) hs=0+0,ss=0+0 | subtree 0x219f618]\n", 306) = 306
[14:07] <MichielM> [pid 5962] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[14:07] <SimonB> MichielM: dmesg last says connection established
[14:07] <MichielM> working with auth or not?
[14:07] <MichielM> and indeed what does -s say?
[14:07] <MichielM> (ceph -s)
[14:07] <SimonB> caph -s shows [ 88.303468] ceph: loaded (mon/mds/osd proto 15/32/24, osdmap 5/5 5/5)
[14:08] <SimonB> [ 88.464704] ceph: client4199 fsid 46ee0c21-ca2e-a0fe-e98a-c821dc9d3e78
[14:08] <SimonB> no it doesnt
[14:08] <SimonB> [ 88.465176] ceph: mon0 session established
[14:08] <SimonB> it shows: http://pastebin.com/cL2GdFQG
[14:09] <wido> SimonB: you have no online OSD's
[14:09] <wido> that is not so good :)
[14:09] <MichielM> v
[14:09] <MichielM> pg v3: 528 pgs: 528 creating; 0 KB data, 0 KB used, 0 KB / 0 KB avail
[14:09] <MichielM> ah you win
[14:09] <wido> did the mkcephfs succeed?
[14:09] <SimonB> it didnt throw any errors or warnings.
[14:10] * SimonB goes and looks
[14:10] <MichielM> are the osd processes running ?
[14:11] <SimonB> they were. Bear with me I'm making a mess here.
[14:15] <SimonB> ok found this error, "unable to open OSD superblock" which I guess would explain it.
[14:15] <wido> SimonB: yes, that means your mkcephfs went wrong
[14:16] <SimonB> yup. seems so :/
[14:16] <wido> try to shut everything down, remove all the mon / osd data
[14:16] <wido> and do a fresh mkcephfs
[14:25] <SimonB> ok I can see the mkfs failing. But not seeing anything other than snippets of what correct syntax should be.
[14:25] * SimonB goes to check the conf file for typos
[14:27] <wido> SimonB: on what does it fail?
[14:27] <SimonB> Im getting the following (sorry for the c&p)
[14:27] <SimonB> === osd.0 ===
[14:27] <SimonB> umount: /data/osd0: not mounted
[14:27] <SimonB> usage: mkfs.btrfs [options] dev [ dev ... ]
[14:33] <wido> SimonB: could you post your current ceph.conf?
[14:34] <SimonB> http://pastebin.com/eVYFW4SB
[14:38] <wido> SimonB: ah, I think I see your problem
[14:38] <SimonB> that I don't know what Im doing? heh
[14:38] <wido> you should set 'journal dio = false', this is when you are running your journal on a tmpfs (which I assume for /var/run)
[14:41] <SimonB> done and still the same. Sorry about this.
[14:51] <SimonB> so for some reason mkbtrfs is being passed something it doesnt like as far as I can see, but Im just trying to see what is being passed other than btrfs_devs? or what btrfs_devs is returning.
[14:56] <SimonB> ok $btrfs_devs has a null value when running mkcephfs
[15:00] <wido> SimonB: that could be a bug, but I'm not sure
[15:00] <wido> SimonB: no, it is a typo!
[15:00] <SimonB> yup it is :(
[15:00] <wido> you specified "btrfs dev"
[15:00] <wido> should be devs ;)
[15:00] <SimonB> just found that myself too
[15:00] <wido> np :)
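Editor's note: the typo above ("btrfs dev" instead of "btrfs devs") produced no warning from mkcephfs. A minimal sketch of a pre-flight check that flags near-miss option names follows. It assumes the config parses as INI with Python's configparser, which may not hold for every ceph.conf. `KNOWN_KEYS`, `suspicious_keys` and the sample config are hypothetical, and the key list contains only options named in this log, not a complete set.

```python
import configparser
import difflib

# Option names taken from this conversation only; NOT a complete list.
KNOWN_KEYS = {
    "btrfs devs",
    "osd journal size",
    "journal dio",
    "debug ms",
    "mon clock drift allowed",
    "mon clock drift warn backoff",
}


def suspicious_keys(conf_text):
    """Return (section, key, suggestion) for keys that nearly match a known one."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    findings = []
    for section in parser.sections():
        for key in parser[section]:
            if key in KNOWN_KEYS:
                continue
            close = difflib.get_close_matches(key, KNOWN_KEYS, n=1, cutoff=0.8)
            if close:
                findings.append((section, key, close[0]))
    return findings


# Hypothetical config fragment reproducing the typo discussed above.
SAMPLE = """
[osd.0]
btrfs dev = /dev/sdb
journal dio = false
"""

print(suspicious_keys(SAMPLE))  # prints [('osd.0', 'btrfs dev', 'btrfs devs')]
```

Unknown keys with no close match are deliberately ignored here, since the list of known options is incomplete.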
[15:04] <SimonB> gone back to error 5 when mounting now
[15:04] * Yoric (~David@ Quit (Quit: Yoric)
[15:08] * Yoric (~David@ has joined #ceph
[15:09] * MarkN (~nathan@ Quit (synthon.oftc.net osmotic.oftc.net)
[15:09] * atg (~atg@please.dont.hacktheinter.net) Quit (synthon.oftc.net osmotic.oftc.net)
[15:09] * darkfader (~floh@host-93-104-226-28.customer.m-online.net) Quit (synthon.oftc.net osmotic.oftc.net)
[15:09] * Meths (rift@ Quit (synthon.oftc.net oxygen.oftc.net)
[15:09] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (synthon.oftc.net oxygen.oftc.net)
[15:09] * ElectricBill (~bill@smtpv2.cosi.net) Quit (synthon.oftc.net oxygen.oftc.net)
[15:09] * Yoric (~David@ Quit ()
[15:10] * Yoric (~David@ has joined #ceph
[15:10] * Meths (rift@ has joined #ceph
[15:10] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[15:10] * ElectricBill (~bill@smtpv2.cosi.net) has joined #ceph
[15:10] * SimonB (~Simon@mail.openminds.co.uk) Quit (synthon.oftc.net charm.oftc.net)
[15:10] * pruby (~tim@leibniz.catalyst.net.nz) Quit (synthon.oftc.net charm.oftc.net)
[15:10] * MK_FG (~MK_FG@ Quit (synthon.oftc.net charm.oftc.net)
[15:10] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (synthon.oftc.net charm.oftc.net)
[15:10] * __jt__ (~james@jamestaylor.org) Quit (synthon.oftc.net charm.oftc.net)
[15:10] * eternaleye (~eternaley@ Quit (synthon.oftc.net charm.oftc.net)
[15:10] * bchrisman (~Adium@ Quit (reticulum.oftc.net synthon.oftc.net)
[15:10] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (reticulum.oftc.net synthon.oftc.net)
[15:10] * johnl_ (~johnl@ Quit (reticulum.oftc.net synthon.oftc.net)
[15:10] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (reticulum.oftc.net synthon.oftc.net)
[15:10] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (reticulum.oftc.net synthon.oftc.net)
[15:11] * bchrisman (~Adium@ has joined #ceph
[15:11] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[15:11] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[15:11] * johnl_ (~johnl@ has joined #ceph
[15:11] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[15:11] * MarkN (~nathan@ has joined #ceph
[15:11] * darkfader (~floh@host-93-104-226-28.customer.m-online.net) has joined #ceph
[15:11] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[15:11] * SimonB (~Simon@mail.openminds.co.uk) has joined #ceph
[15:11] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[15:11] * MK_FG (~MK_FG@ has joined #ceph
[15:11] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[15:11] * eternaleye (~eternaley@ has joined #ceph
[15:11] * __jt__ (~james@jamestaylor.org) has joined #ceph
[15:23] <wido> SimonB: what does ceph -s show now?
[15:23] <wido> btw, you can remove 'debug ms = 1' from your conf
[15:24] <wido> also, remove the 'grou[
[15:24] <wido> group and mount, not needed
[15:24] * __jt__ (~james@jamestaylor.org) Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * MK_FG (~MK_FG@ Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * SimonB (~Simon@mail.openminds.co.uk) Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * eternaleye (~eternaley@ Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * pruby (~tim@leibniz.catalyst.net.nz) Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * atg (~atg@please.dont.hacktheinter.net) Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * darkfader (~floh@host-93-104-226-28.customer.m-online.net) Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * MarkN (~nathan@ Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * johnl_ (~johnl@ Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * bchrisman (~Adium@ Quit (reticulum.oftc.net synthon.oftc.net)
[15:24] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (reticulum.oftc.net synthon.oftc.net)
[15:25] * bchrisman (~Adium@ has joined #ceph
[15:25] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[15:25] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[15:25] * johnl_ (~johnl@ has joined #ceph
[15:25] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[15:25] * MarkN (~nathan@ has joined #ceph
[15:25] * darkfader (~floh@host-93-104-226-28.customer.m-online.net) has joined #ceph
[15:25] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[15:25] * SimonB (~Simon@mail.openminds.co.uk) has joined #ceph
[15:25] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[15:25] * MK_FG (~MK_FG@ has joined #ceph
[15:25] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[15:25] * eternaleye (~eternaley@ has joined #ceph
[15:25] * __jt__ (~james@jamestaylor.org) has joined #ceph
[15:40] <SimonB> ok for my last silly question of the day... Windows based clients? at first I thought samba ontop, but how would file locking then work if you have the same shares on multiple nodes? I know im kind of missing the point here, but Im hoping Im not being dumb again.
[15:41] <wido> SimonB: no clue at all :)
[15:41] <wido> I never use Windows
[15:41] <wido> I heard from some people that running Samba caused problems with file locks
[15:41] <darkfader> SimonB: if you have the same share on multiple nodes the samba servers have to do some kind of distributed locking
[15:42] <SimonB> thats exactly what I thought, I just wanted to make sure I wasnt missing something obvious again :)
[15:42] <darkfader> i think there is something in samba for that these days, but i can't say if it actually works
[15:42] <SimonB> Thanks guys.
[15:42] <wido> SimonB: np! Does the FS work now?
[15:42] <darkfader> it doesn't even work flawless if you use w2k8 file servers or netapps ;)
[15:42] <wido> darkfader: format C:\ ;)
[15:43] <SimonB> it works great thanks. Thank you for helping me with it.
[15:44] <wido> great! Btw, which Ceph version and kernel are you using?
[15:45] <SimonB> kernel 2.6.32-5-amd64 and ceph 0.24.1-1~bpo60
[15:46] <wido> I'd recommend using a newer kernel, since there are a LOT of fixes in the new kernels
[15:47] <wido> 2.6.37 would be better for testing Ceph
[15:47] <wido> also, there are a lot of btrfs fixes since .32
[15:47] <darkfader> wido: can you explain me one thing about btrfs?
[15:48] <darkfader> what will happen if ceph becomes stable before btrfs?
[15:50] <wido> darkfader: Then we have to hope that btrfs works good enough for Ceph :)
[15:50] <SimonB> I will do. I just threw these boxes together yesterday purely for testing ceph, so its just a default Debian Squeeze box.
[15:51] <wido> but sagewk is submitting some stuff towards btrfs, so I guess he'll fix some really important bugs if they show up
[15:51] <darkfader> oh that sounds great
[15:51] <darkfader> <- happy
[15:52] <MichielM> are there any tests online ?
[15:52] <MichielM> or benchmarks
[15:53] <wido> MichielM: not really
[15:56] <MichielM> hmm ok
[15:56] <MichielM> what is the largest set you've tested it with ?
[15:56] <MichielM> and is the rbd stuff also available as a kernel patch ?
[15:56] <MichielM> trying to get it move into proxmox :-)
[15:57] <MichielM> 2.6.35 is latest kernel for that, and ceph is included in kernel
[15:58] <wido> MichielM: I had a cluster with 32 OSD's
[15:58] <wido> and there are two flavours of RBD, the kernel one and Qemu-RBD
[15:58] <MichielM> ah got one with 16 now
[15:59] <wido> if Proxmox wants RBD, they should just get the new Qemu version, since RBD is included in Qemu 0.13
[15:59] <MichielM> im able to test with a cluster next week
[16:00] <MichielM> 288 osd's
[16:00] <MichielM> 2 or 3TB each
[16:00] <MichielM> so ceph@PB :)
[16:00] <MichielM> hmm
[16:00] <wido> serious? 288 OSD's? I think the Ceph dev's would like to take a sneak peek at it
[16:00] <MichielM> yeah
[16:00] <wido> are those all physical or virtual machines?
[16:01] <MichielM> physical
[16:01] <MichielM> 12 nodes with 24 disks
[16:01] <wido> Ah, ok :)
[16:01] <MichielM> but im also waiting for a new delivery
[16:01] <MichielM> another 24
[16:01] <wido> There was a guy a few months ago who had 207 physical nodes, all with one disk
[16:01] <MichielM> so 36*24 osd's
[16:01] <MichielM> ah ok
[16:01] <wido> I hope you didn't buy the hardware for Ceph?
[16:01] <MichielM> no :)
[16:02] <MichielM> but does it matter?
[16:02] <wido> but those machines seem right to build a decent test. You should contact them, they might (or else I would) want to take a look and perform some tests
[16:02] <wido> No, not at all. But would be kind of silly to buy such hardware in this state :)
[16:02] <MichielM> 207 physical nodes vs 864 osd's ?
[16:02] <MichielM> still 1 process per osd right
[16:02] <MichielM> yep also got a bunch of blades
[16:02] <wido> Yes, 1 process per OSD
[16:02] <MichielM> 8x20 quad-core with 16GB
[16:03] <MichielM> so power to test
[16:03] <wido> but some nice stuff, that would be great
[16:03] <wido> commercial company or research / educational?
[16:03] <MichielM> comm
[16:03] <wido> k
[16:03] <MichielM> see your /q\
[16:03] <wido> ttyl!
[16:04] <MichielM> bye then
[16:13] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:21] <stingray> gregaf: http://stingr.net/d/stuff/mds.0.log-1.bz2
[16:47] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[16:49] * SimonB (~Simon@mail.openminds.co.uk) Quit (Quit: heading home)
[17:04] <jantje> hi !
[17:23] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) Quit (Quit: Leaving.)
[17:45] * sagephone (~yaaic@ has joined #ceph
[17:46] <sagephone> will be in late this morning...11:30 or so. let's meet then
[17:50] * greglap (~Adium@ has joined #ceph
[17:51] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:58] <greglap> hi jantje
[18:03] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:10] * Yoric (~David@ Quit (Quit: Yoric)
[18:43] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:43] * sagephone (~yaaic@ Quit (Ping timeout: 480 seconds)
[18:48] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:55] * jantje (~jan@paranoid.nl) Quit (Read error: Connection reset by peer)
[18:55] * jantje (~jan@paranoid.nl) has joined #ceph
[19:08] <wido> I think the current unstable is a bit broken
[19:09] <wido> well, building it with dpkg-buildpackage -j4 fails due to unfinished jobs
[19:11] <wido> Hmm, it won't build in a single process either: http://pastebin.com/kpL1juPi
[19:11] <wido> I'm running Ubuntu 10.04 btw
[19:13] <gregaf> cmccabe: looks like merging refactor_pg didn't work out too well
[19:17] * cmccabe (~cmccabe@ has joined #ceph
[19:19] <wido> gregaf: cmccabe was not here ;)
[19:19] <wido> cmccabe: looks like merging refactor_pg didn't work out too well (by gregaf)
[19:19] <gregaf> heh, I noticed, but he's on jabber and he's looking at it
[19:19] <wido> oh, ok :)
[19:19] <cmccabe> wido, gregaf: looks like a compiler version skew problem
[19:19] <cmccabe> wido, gregaf: your version needs more #include
[19:19] <wido> I thought so, i'm running gcc 4.4.3
[19:19] <cmccabe> 4.2.4
[19:20] <gregaf> hmm
[19:20] <gregaf> oh right, you're still on that older machine
[19:20] <cmccabe> technically it's the includes of libstdc++ that is the problem
[19:20] <wido> gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
[19:20] <gregaf> gcc (Debian 4.4.5-6) 4.4.5
[19:22] <cmccabe> I have fixed it here, will rewrite unstable's history in one sec
[19:23] <wido> rewriting history, that is interesting ;)
[19:23] <cmccabe> it's mostly so that git-bisect can work
[19:23] <cmccabe> ok, it's out there
[19:23] <cmccabe> you will probably need to re-create your unstable checkout
[19:25] <Tv|work> rewriting git history is usually a bad idea :(
[19:25] <gregaf> nah, just git fetch; git reset --hard origin/unstable
[19:25] <gregaf> Tv|work: we only do it on recent pushes of unstable
[19:25] <gregaf> though we do try and avoid it generally speaking
[19:26] <Tv|work> that just makes unstable generally unusable as a base of work
[19:26] <Tv|work> like, what if i had started a branch off of it, and didn't happen to hear you mention that
[19:26] <gregaf> well we've only had trouble with it once so far
[19:27] <cmccabe> tv: the main reason I did it is so that git-bisect would work
[19:27] <gregaf> and having merge commits that don't compile makes bisecting later hell
[19:27] <cmccabe> tv: so far, we've had more problems with bisect not working than with people getting burned by history rewrites
[19:27] <cmccabe> tv: however, maybe I should send a message to the list?
[19:27] <Tv|work> yeah, the autobuilder found plenty of non-buildable commits already ;)
[19:28] <gregaf> cmccabe: looks like you killed our merge commits though, now it's a flat tree :(
[19:28] <Tv|work> ahahaha
[19:28] <cmccabe> blah
[19:28] <Tv|work> here's an idea
[19:28] <Tv|work> restore unstable to what it was before the merges
[19:28] <Tv|work> work on another branch until you two agree its good
[19:28] <cmccabe> there is no disagreement; it's just a compiler version skew thing
[19:28] <Tv|work> until it compiles for both of you
[19:29] <cmccabe> let's just redo the merges
[19:29] <cmccabe> there were two of them right?
[19:29] <cmccabe> refactor_pg and then ?
[19:30] <cmccabe> gregaf: what was the name of the branch you merged
[19:30] <gregaf> uclient_dentries
[19:31] <cmccabe> actually, I'm a little worried we might screw something else up
[19:31] <cmccabe> like yehuda's commits probably weren't on uclient_dentries or refactor_pg
[19:31] <gregaf> yeah, don't change it again
[19:32] <cmccabe> maybe it's best to leave things as-is
[19:32] <gregaf> it's just sad to lose our merge commits
[19:32] <Tv|work> the flat tree sounds pretty bad
[19:32] <Tv|work> i'd rather restore what was & fix that
[19:32] <cmccabe> tv: the branches were just like 4 or 5 commits each
[19:32] <cmccabe> tv: the amount of information lost is pretty minimal
[19:32] <gregaf> I miss it sometimes myself but try to remember in the future to check if it made a merge commit, otherwise redo it with --no-ff to force one
[19:33] <gregaf> yeah, mine was only 2 and was only a separate branch because I wanted Sage to check it before push anyway ;)
[19:33] <Tv|work> cmccabe: i hate it when history is misleading
[19:33] <Tv|work> i'm exactly the type that goes digging in old commits to find out what something was meant for & why it was done the way it was
[19:33] <Tv|work> it's not just the flat tree shape, it's lost commit messages etc
[19:33] <cmccabe> tv: there are no lost commit messages
[19:34] <cmccabe> tv: except I guess the two merge commits, but I don't think they had any text besides the auto-generated
[19:34] <Tv|work> did you end up rebasing instead of merging, or what?
[19:34] <cmccabe> tv: anyway, if you can re-create those merge commits, it would be nice. I don't think it's worth a lot of time though.
[19:34] <cmccabe> tv: the merges were so trivial that there was nothing to explain
[19:35] <gregaf> I think it probably just fast-forwarded, tv
[19:35] <Tv|work> oh i'm fine with ff
[19:35] <cmccabe> tv: the only reason we even did merge commits instead of fast-forwarding is because sage likes to know when a feature or idea goes in
[19:35] <Tv|work> you guys made it sound like you squashed one branch of the merge into just a single (non-merge) commit with the merge commit message
[19:35] <cmccabe> tv: and merge commits are kind of a mental note of that
[19:35] <gregaf> ah, no
[19:36] <gregaf> heck, back when I started sage was still trying to maintain a flat tree and liked us to rebase stuff before pushing :p
[19:36] <gregaf> never did quite figure that one out
[19:36] <cmccabe> tv: another reason why we sometimes do merge commits, even when fast-forwarding would be simple, is to make reverting something simpler
[19:36] <Tv|work> i end up doing a lot with topic branches, instead
[19:36] <Tv|work> merge when you approach a minor release
[19:36] <Tv|work> lets you postpone the decision of what to integrate
[19:36] <Tv|work> so anyway, who wants to explain "dout" to me?
[19:37] <cmccabe> tv: I suppose that would be me
[19:37] <cmccabe> tv: so basically, dout is our logging mechanism.
[19:37] <gregaf> we're still operating in tiny team mode, we can probably talk about branching in our meeting today, it'll fit in well with other topics :)
[19:37] <cmccabe> tv: dout is our main (really, in most cases, only) mechanism of knowing what is going on
[19:37] <Tv|work> where's the code?
[19:37] <cmccabe> tv: debug.cc, DoutStreambuf.cc, debug.h
[19:38] <cmccabe> tv: dout has multiple log levels and the goal is that when the user sets a high log level, many messages are logged.
[19:38] <cmccabe> tv: at a lower log level, those messages are never printed and incur almost 0 overhead
[19:38] <Tv|work> yeah i saw the users
[19:39] <cmccabe> dout can log to syslog, a file, and stderr. Or all 3 simultaneously if you want.
[19:39] <cmccabe> command-line programs always log to stderr.
[19:39] <cmccabe> users almost always configure daemons to log to a file
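Editor's note: the leveled-logging idea described above can be sketched roughly as follows. This is a Python illustration only, not Ceph's dout (which lives in the C++ files named earlier). `LeveledLog`, `threshold` and `expensive_message` are hypothetical names. The sketch shows two points from the discussion: messages above the configured level cost little more than an integer comparison, and one logger can write to several sinks at once.

```python
import io


class LeveledLog:
    """Toy leveled logger; an illustration, not Ceph's dout implementation."""

    def __init__(self, threshold, sinks):
        self.threshold = threshold   # highest level that still gets written
        self.sinks = list(sinks)     # file-like objects, e.g. sys.stderr

    def log(self, level, make_message):
        # Cheap integer comparison first: suppressed messages are never built.
        if level > self.threshold:
            return False
        line = make_message() + "\n"
        for sink in self.sinks:
            sink.write(line)
        return True


built = []


def expensive_message():
    built.append(1)  # records that the message text was actually constructed
    return "mds cache dump: " + ", ".join(str(i) for i in range(3))


# Two in-memory sinks stand in for stderr and a log file.
stderr_like, file_like = io.StringIO(), io.StringIO()
log = LeveledLog(threshold=5, sinks=[stderr_like, file_like])

log.log(1, lambda: "mds starting")   # written to both sinks
log.log(20, expensive_message)       # above threshold: skipped, never built
```

Passing a callable rather than a finished string is what keeps suppressed messages cheap in this sketch; a C++ macro can get a similar effect by wrapping the stream expression in a level check.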
[19:39] <Tv|work> i wish that were so
[19:39] <Tv|work> there's a bunch of clis trying to log to /var/log
[19:40] <cmccabe> tv: that should be fixed in unstable
[19:40] <stingray> gregaf:
[19:40] <cmccabe> tv: that is what the set_foreground_logging() function does
[19:40] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[19:40] <cmccabe> tv: all command-line programs should call it immediately after common_init
[19:41] <Tv|work> rebuilding with latest unstable to see if that's true
[19:41] <gregaf> stingray: sorry I haven't been too responsive the last few days, I've been taking some sick time
[19:41] <stingray> np
[19:41] <gregaf> I did see your objecter logs and scanned them briefly, I can take a deeper look shortly
[19:41] <stingray> me too
[19:41] <stingray> there's a logfile for you
[19:41] <stingray> ah ok
[19:41] <stingray> don't worry too much
[19:42] <stingray> I'm going home now, see you tomorrow
[19:42] <stingray> :)
[19:42] <cmccabe> tv: your unit tests will probably need to call set_foreground_logging in the setup hook
[19:42] <gregaf> okay, I'll let you know if I figure anything out or need more data :)
[19:42] <gregaf> feel better!
[19:42] <Tv|work> cmccabe: not talking about unit tests, talking about the cli tools
[19:42] <Tv|work> cmccabe: and not seeing output change on the /var/log parts
[19:43] <cmccabe> tv: for what tool?
[19:44] <Tv|work> cmccabe: just wait for clitests branch to be merged, this'll be easier then
[19:44] <Tv|work> right now fixing that will just cause trouble during merge
[19:44] <Tv|work> (the tests are assuming the old behavior)
[19:44] <cmccabe> tv: it logs to stderr, never to stdout
[19:45] <cmccabe> tv: so it should not affect your tests
[19:45] <cmccabe> tv: anyway, if you spot an actual bug, let me know
[19:46] * earth (~summer@75-137-144-177.dhcp.gwnt.ga.charter.com) has joined #ceph
[19:48] <cmccabe> tv: one problem that we have currently is that if people call dout() before calling set_foreground_logging(), the logs will go to the wrong place from then on
[19:48] <cmccabe> tv: unfortunately this is not well-documented or enforced in the initialization code
[19:49] <cmccabe> tv: I should probably put in a workaround to re-initialize dout when set_foreground_logging is called.
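The ordering problem and the proposed workaround can be sketched as below. This is an assumption-laden model, not the actual dout code: the `Log` class, its lazy sink selection and the `demo` helper are hypothetical, and only the name `set_foreground_logging` comes from the discussion.

```python
import io


class Log:
    """Hypothetical logger that picks its sink lazily, on first use."""

    def __init__(self, file_sink, stderr_sink):
        self.file_sink = file_sink
        self.stderr_sink = stderr_sink
        self.foreground = False
        self.active = None          # chosen on the first write

    def _choose_sink(self):
        return self.stderr_sink if self.foreground else self.file_sink

    def write(self, msg):
        if self.active is None:
            self.active = self._choose_sink()
        self.active.write(msg + "\n")

    def set_foreground_logging(self):
        self.foreground = True
        # The workaround: forget any sink chosen earlier so that the next
        # write re-evaluates the configuration.
        self.active = None


def demo():
    file_sink, stderr_sink = io.StringIO(), io.StringIO()
    log = Log(file_sink, stderr_sink)
    log.write("early message")          # logged before the mode switch
    log.set_foreground_logging()
    log.write("late message")
    return file_sink.getvalue(), stderr_sink.getvalue()
```

Without the reset in `set_foreground_logging`, the second message would also land in the file sink, which is the "wrong place from then on" behaviour described above.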
[20:01] * alexxy (~alexxy@ Quit (Remote host closed the connection)
[20:05] * alexxy (~alexxy@ has joined #ceph
[20:14] <Tv|work> alright i'm wasting a lot of time playing catchup with the tests (against a changing unstable) -- can we get a decision on the gtests branch? gregaf? sagewk? weigh in please.
[20:14] <gregaf> oh, sorry, I haven't had a chance to look at it yet, sage isn't in yet for some reason...
[20:14] <gregaf> what kind of decision were you looking for?
[20:15] <Tv|work> ideally, the kind that's spelled "git merge gtest"
[20:15] <cmccabe> tv: ask sage to take a look
[20:15] <Tv|work> i don't want to put it in unstable without an explicit ok -- i don't know the codebase nearly as well as you guys do, so i'm extra careful
[20:16] <Tv|work> and, well, there are no tests to reassure me ;)
[20:16] <gregaf> heh
[20:16] <cmccabe> tv: another thing you can do is rebase -i unstable
[20:16] <cmccabe> tv: that should get the changes from unstable pretty easily
[20:16] <gregaf> I'm not sure this will help you avoid wasting time against the changing unstable, though...?
[20:16] <Tv|work> cmccabe: the trouble is the moving target, not my lack of tools to cope with it
[20:17] <Tv|work> gregaf: well, that can make everyone fix the tests they break..
[20:17] <gregaf> or you just mean you want to checkout unstable and run the tests
[20:17] <gregaf> got it
[20:17] <Tv|work> and i just want to tie up loose ends before i move on too much
[20:18] <cmccabe> tv: what particular changes to unstable were hard to deal with recently?
[20:18] <cmccabe> tv: besides my ham-fisted rebase
[20:19] <Tv|work> cauthtool output kept fluctuating
[20:19] <Tv|work> not bad but just made me realize i'm chasing a moving target
[20:19] <cmccabe> tv: so this ties into yehuda's concern that parsing stdout will be a maintenance burden
[20:19] <Tv|work> cmccabe: well, the output change was made for a reason
[20:19] <Tv|work> the whole idea behind testing is that you're more likely to get it right from the start
[20:20] <Tv|work> look, i'm an advocate of TDD (at least in more flexible languages than C++); if you don't want to have tests we're just going to have to agree to disagree
[20:20] <cmccabe> tv: I tend to favor integrating the tests with the programs
[20:20] <Tv|work> i've already exposed segfaults, bad logging, bad error messages, ...
[20:21] <cmccabe> tv: for example, I created a few cephtool commands that were intended specifically for automated testing
[20:21] <cmccabe> tv: their output is not very human-friendly, but it is test-friendly
[20:21] <Tv|work> i'm not advocating you test the tests
[20:21] <cmccabe> tv: the point is that having special test hooks can prevent the frustration of chasing constantly changing human-readable output
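One way to picture the test-hook idea is a single code path with two renderers, one for people and one for tests. Everything here is hypothetical: `pool_stats`, the renderers, the `run` helper and the numbers are invented for illustration and do not correspond to any cephtool command.

```python
import json


def pool_stats():
    """Hypothetical command body; the numbers are made up for illustration."""
    return {"pool": "data", "objects": 12, "bytes": 4096}


def render_human(stats):
    # Free to change wording or layout without breaking tests.
    return f"pool '{stats['pool']}' holds {stats['objects']} objects ({stats['bytes']} bytes)"


def render_machine(stats):
    # Stable, sorted keys: what a test or script would parse.
    return json.dumps(stats, sort_keys=True)


def run(machine_readable=False):
    stats = pool_stats()            # one code path for both interfaces
    return render_machine(stats) if machine_readable else render_human(stats)
```

The trade-off raised in the discussion still applies: tests that only read the machine output say nothing about whether the human-readable output is sensible.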
[20:21] <Tv|work> but if you're writing something just for tests, maybe the underlying tool isn't very good..
[20:22] <gregaf> Tv|work: yehudasa is asking that you make the gtest stuff require a flag to build, rather than compiling automatically
[20:22] <gregaf> and did you mean to commit the Apple Xcode project files?
[20:22] <Tv|work> gregaf: i think it won't build until you say "make check" -- is that good enough?
[20:22] <Tv|work> i can confirm that, let me just clean my tree
[20:22] <gregaf> oh, yeah, that's fine
[20:23] <gregaf> I haven't built it or anything yet, just looking at the files ;)
[20:23] <cmccabe> gregaf: tv believes that gtest needs to be bundled code, and the Xcode project files are part of that :P
[20:23] <cmccabe> gregaf: he found a comment on a wiki that said some issues could result if libgtest was compiled with different flags than the code using it
[20:23] <gregaf> cmccabe: having separate testing paths is a recipe for both useless testing and additional code maintenance
[20:24] <Tv|work> cmccabe: Xcode?
[20:24] <cmccabe> gregaf: the code paths are not separate, the only thing that's separate is the user interface vs. testing interface
[20:24] <gregaf> *shrug*
[20:24] <cmccabe> gregaf: the important thing is not the user interface-- that's not where our bugs are.
[20:24] <Tv|work> cmccabe: maybe your ui just sucks...
[20:24] <gregaf> I've not even been in industry that long and I've personally seen things like that where the test output was fine and the user output was nonsensical
[20:24] <Tv|work> cmccabe: think about git porcelain vs plumbing, for inspiration
[20:24] <cmccabe> gregaf: the important thing is to test the underlying code
[20:25] <Tv|work> cmccabe: your user interface is the reason you can't have more users at this point
[20:25] <Tv|work> cmccabe: new user experience: ceph is a buggy pile of crap
[20:25] <cmccabe> can we all agree that what needs work in ceph is stability and performance, not fancy UIs
[20:25] <cmccabe> fancy UIs won't help if stability is not there
[20:25] <Tv|work> i count ~3 segfaults per day just using the basic command line tools
[20:25] <Tv|work> not fancy
[20:25] <cmccabe> therefore, I propose testing stability rather than UI
[20:25] <cmccabe> I don't think this is a radical concept
[20:25] <gregaf> the UI is part of stability
[20:26] <cmccabe> our UI does need work, but that's a separate issue
[20:26] <cmccabe> actually, tv, have you run gceph yet?
[20:26] <cmccabe> I'm sure you'll have a lot of ideas for improving it... it's very rough.
[20:26] <yehudasa> cmccabe: it's all part of the user experience.. both stability and config
[20:26] <sagewk> ok guys let's move this discussion into the meeting
[20:26] <cmccabe> automated tests can't really solve user experience problems
[20:27] <Tv|work> osdmontool not crashing when you don't follow the magic incantation 100% is not "fancy"
[20:27] <sagewk> now? :)
[20:27] <cmccabe> tv: that gets more into fuzzing
[20:27] <cmccabe> ok
[20:37] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[20:48] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[20:56] <Tv|work> confirmed: gtest does not compile until you say "make check" (it does get automake'd, but that's different) (and it's a very light compile anyway)
[20:57] <Tv|work> confirmed: src/gtest gets included in "make dist" tarball, without any changes
[20:58] <sagewk> tv: ok cool
[20:59] * raso (~raso@debian-multimedia.org) Quit (Ping timeout: 480 seconds)
[20:59] <Tv|work> confirmed: the dist tarball passes "./configure --with-debug && make && make check"
[21:00] <Tv|work> with gtest running too
[21:01] <sagewk> ok, i'll go ahead and merge it into unstable. can you send an email briefly summarizing how to write/run the tests?
[21:01] <sagewk> maybe it should go in the wiki actually
[21:02] <Tv|work> sagewk: or source tree itself..
[21:02] <Tv|work> sagewk: gtest has pretty good docs itself, but i'll make sure to point them out, somehow
[21:02] <Tv|work> (part of why i picked gtest was the quality of that project)
[21:40] * Meths_ (rift@ has joined #ceph
[21:45] * Meths (rift@ Quit (Read error: Operation timed out)
[21:57] * bchrisman (~Adium@ has joined #ceph
[21:57] <Tv|work> sagewk: here's what i mentioned briefly: http://autotest.kernel.org/ http://www.linux-kvm.org/page/KVM-Autotest -- haven't evaluated yet
[21:58] * bchrisman (~Adium@ Quit ()
[21:58] <cmccabe> tv: kvm-autotest / autotest looks like a tool for writing single-machine tests, is that correct?
[21:59] * Meths_ is now known as Meths
[21:59] <Tv|work> cmccabe: haven't evaluated yet
[21:59] <cmccabe> tv: I almost feel like I made a mistake by calling what I wrote a test framework, when it's much more of a cluster management framework
[21:59] <cmccabe> tv: which different tests / test frameworks could be plugged in to
[22:00] <cmccabe> tv: however if a test framework has a stubborn "only testing one machine in isolation" bias, it might not be useful for certain tests
[22:00] <cmccabe> gtg, lunch in the SF office!
[22:00] <sagewk> slides: "Enables coordinated multimachine tests"
[22:01] <cmccabe> sagewk: cool. back in a bit
[22:01] * bchrisman (~Adium@ has joined #ceph
[22:01] <Tv|work> doing anything single machine is a vanishing quality in the modern world
[22:03] <Tv|work> back at the firewall gig i had, we managed resource allocation manually (whiteboard) because we often needed to change the network cables around etc for specific stress tests, but just being able to launch a 7-machine test (2 clients, 2 destinations, 3 firewalls) from a single command line made things much more comfortable
[22:03] <sagewk> http://autotest.kernel.org/wiki/AutotestApi -- there's a section on multi-machine server side tests
[22:07] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[22:11] <Tv|work> sagewk, *: http://ceph.newdream.net/wiki/Testing
[22:12] <sagewk> should there be a separate makefile in src/test/Makefile.am ?
[22:13] <Tv|work> sagewk: perhaps, i didn't want to make a change that "big" in that same commit
[22:13] <Tv|work> sagewk: since it didn't exist before, and gtest isn't the only thing from current Makefile.am that belongs there
[22:14] <Tv|work> sagewk: my instincts were to roll this thing in fully (clitests too) and then see if there's cleanup to do
[22:15] <sagewk> ok. pushed now.. let's separate those out to avoid further clutter in the already cluttered main makefile :)
[22:16] <Tv|work> oh yes please
[22:17] * bchrisman (~Adium@ has joined #ceph
[22:17] <Tv|work> alright wiki page for Testing is as good as i can make it right now, if you think something is missing give me hints what to write about
[22:18] <Tv|work> oh one more thing, make check runs clitests too
[22:18] <sagewk> you mentioned an interactive mode for updating (and generating initial?) cli tests?
[22:19] <Tv|work> sagewk: yeah i didn't write an explicit "use it like this" in the wiki, i don't want to be copy-pasting upstream docs.. pass -i
[22:20] <Tv|work> to me it felt easier than riding a bicycle; i tried it once and will know it the rest of my life
[22:20] <Tv|work> i did put in an example of (re), that was harder to grok
[22:21] <Tv|work> 5 minutes hands on with this should make it feel pretty trivial
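The "(re)" marker mentioned above can be illustrated with a small matcher. This assumes the convention that an expected-output line ending in " (re)" is treated as a regular expression while any other line must match literally; the function names and the sample lines are hypothetical, and the real tool's rules may differ.

```python
import re


def line_matches(expected, actual):
    """Compare one expected line against one line of actual output."""
    suffix = " (re)"
    if expected.endswith(suffix):
        pattern = expected[: -len(suffix)]
        # The whole line must match, not just a prefix.
        return re.fullmatch(pattern, actual) is not None
    return expected == actual


def output_matches(expected_lines, actual_lines):
    if len(expected_lines) != len(actual_lines):
        return False
    return all(line_matches(e, a) for e, a in zip(expected_lines, actual_lines))
```

A regex line is useful for output that legitimately varies between runs, such as a generated key or a timestamp, while the rest of the transcript stays an exact comparison.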
[22:21] <sagewk> k
[22:23] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[22:24] <Tv|work> ooh crap run-cli-tests assumes working dir to be src/test, fixing
[22:28] <Tv|work> there
[22:28] <gregaf> Tv|work: did you link in that testing page from anywhere else on the wiki?
[22:29] <Tv|work> gregaf: not yet, looking for suitable place
[22:29] <gregaf> k
[22:29] <Tv|work> best i can come up with is under "Misc" on main page
[22:29] <gregaf> just checking since I couldn't find it
[22:29] <Tv|work> but that just seems like a trashcan
[22:29] <gregaf> yeah, it's getting big enough that we may need to reorganize
[22:29] <Tv|work> oh well i'll put it in the trashcan for now ;)
[22:30] <gregaf> at some point we may want to add a "Developer" section to go along with all the admin stuff or whatever
[22:30] <Tv|work> yes
[22:31] <sagewk> "at some point" could be now
[22:31] <Tv|work> it might need a bit more vision
[22:31] <Tv|work> sagewk: this is stuff i've been wanting to wrestle out of you guys.. like "what really is an osdmap" etc
[22:32] <Tv|work> now that the stuff i've been working on is merged, i'll try to spend some time on "new developer love"
[22:32] <sagewk> until now that's been in the RTFM (where m == osdi paper) category
[22:32] <Tv|work> (while giving you old developers tough love about writing tests ;)
[22:33] <Tv|work> sagewk: yeah i'm more looking at a "quick overview & reminder" kind of stuff than actual explanations
[22:33] <gregaf> it would be good to get that information somewhere more accessible
[22:33] <Tv|work> sagewk: like, i want to draw a picture with the different components, and a bunch of funky-colored arrows
[22:33] <sagewk> yep
[22:34] <sagewk> we'll be getting some technical writers soon that we can shunt some of that off to
[22:35] <Tv|work> well i'll use drawing diagrams as a learning tool anyway ;)
[22:35] <Tv|work> but <3 a good technical writer
[22:45] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:46] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has left #ceph
[22:47] <gregaf> Tv|work: is run-cli-tests designed so that if any tests fail it will return an error code?
[22:47] <gregaf> or when you run make check do you need to do a manual inspection of the output?
[22:47] <Tv|work> gregaf: should exit with non-zero on failures
[22:48] <gregaf> cool, just checking
[22:48] <Tv|work> later on we can plug test & clitests etc into a unified result display, but now it'll just run one then the other
[22:48] <Tv|work> actually, multiple runs of clitests, then multiple runs of unittests; one run per "container"
[22:49] <Tv|work> unified result display more for CI type of stuff
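The "run one then the other, exit non-zero on failure" behaviour can be sketched as a small driver. This is not run-cli-tests or the Makefile logic; `run_all` and `demo` are hypothetical, and the child commands are trivial Python one-liners standing in for real test containers.

```python
import subprocess
import sys


def run_all(commands):
    """Run each test command in turn; return 0 only if every one succeeded."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd)      # keep going so later suites still run
    return 1 if failed else 0


def demo():
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    bad = [sys.executable, "-c", "raise SystemExit(3)"]
    return run_all([ok]), run_all([ok, bad, ok])
```

Collapsing every failure into a single non-zero status is enough for make check; a unified result display for CI would need the per-suite details that this sketch throws away.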
[23:04] <Tv|work> i wrote this in an attempt to clear things up: http://ceph.newdream.net/wiki/TestingVocabulary feedback is welcome!
[23:05] <Tv|work> especially on whether you guys agree (/find meaningful) the system vs integration test thing
[23:05] <Tv|work> because i find i often need that boundary, when talking about test setups
[23:07] <cmccabe> to me, testing on a cluster, with only ceph software, should be considered a system test
[23:07] <cmccabe> since integration test tends to connote 3rd party software
[23:07] <gregaf> yeah, I'm not sure what the difference would be between a system and integration test
[23:07] <Tv|work> cmccabe: what would you call running e.g. just osd & rgw locally, then?
[23:07] <Tv|work> i'm not 100% happy with the language
[23:07] <Tv|work> but i find that distinction to be needed often
[23:07] <Tv|work> and think we should have a way of saying it
[23:07] <cmccabe> tv: running locally is another kind of system test, although a less useful one
[23:07] <gregaf> I don't think there's really a difference between running daemons on one machine and running them on 2 machines...
[23:07] <Tv|work> cmccabe: it's often very much enough, during development
[23:08] <cmccabe> tv: we want to get away from running things locally all the time
[23:08] <Tv|work> cmccabe: not all the time
[23:08] <Tv|work> but it's way faster to develop that way
[23:08] <cmccabe> tv: it really conceals a lot of problems in my opinion
[23:08] <Tv|work> no single solution covers the whole space, but you don't want to run cluster-based tests all the time either
[23:08] <cmccabe> tv: we have enough hardware that you should be able to just queue up a real test and wait for it to finish
[23:08] <Tv|work> cmccabe: you'll kill productivity very fast that way
[23:08] <cmccabe> tv: I know this is practical because I worked at a company where we did it.
[23:08] <cmccabe> tv: our software was solid.
[23:09] <cmccabe> tv: and the tests took between 15-30 minutes in general
[23:09] <Tv|work> a good edit-compile-test is definitely <5 minutes
[23:09] <cmccabe> tv: well, I think there is a role for single-machine tests.
[23:09] <Tv|work> sure, that doesn't cover everything -- but you don't want everything all the time
[23:09] <cmccabe> tv: for example all unit tests are single-machine.
[23:10] <cmccabe> tv: we need to be testing things the way our customers will use them, rather than in some artificial environment.
[23:10] <Tv|work> cmccabe: by that argument, we shouldn't do unit tests either
[23:10] <Tv|work> cmccabe: there's multiple goals
[23:10] <cmccabe> tv: no, unit tests have a different goal
[23:10] <Tv|work> quick validation of basics is very important
[23:10] <cmccabe> tv: right. See my above comment
[23:10] <cmccabe> (02:10:00 PM) cmccabe: tv: well, I think there is a role for single-machine tests.
[23:10] <cmccabe> (02:10:06 PM) cmccabe: tv; for example all unit tests are single-machine.
[23:10] <Tv|work> so what do you call them?
[23:11] <cmccabe> call what?
[23:11] <Tv|work> the tests that are above unit tests and below integration/whatever tests
[23:12] <cmccabe> I think the point gregaf was making is that he considers tests on multiple nodes to also be system tests
[23:12] <gregaf> Tv|work: you're going to need to give me some examples of what you'd consider integration tests and what you'd consider system tests
[23:12] <cmccabe> I agree with him, but will use whatever nomenclature you guys decide on.
[23:13] <Tv|work> imagine me working on an osd bug.. i'll run it locally, even tied to a gdb, while feeding it "pre-prepared input" that's known to trigger this bug
[23:13] <gregaf> one node or many might to your mind indicate the type of test you're running, but I've run all the tests I've run on a single node and on many nodes, so I don't know what you're driving at here
[23:13] <Tv|work> now maintain a collection of these inputs, as a regression test
[23:13] <Tv|work> that's more than a unit test, definitely -- but only needs osd & maybe mon to run
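The idea of keeping bug-triggering inputs as a regression collection can be sketched in miniature. Everything below is hypothetical: `parse_message` is a toy length-prefixed decoder standing in for the code under test, and the corpus entries are invented. In the scenario described, the inputs would be fed to a running daemon rather than to a function.

```python
def parse_message(data):
    """Hypothetical decoder standing in for the code under test."""
    if len(data) < 2:
        raise ValueError("short message")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


# Each entry is an input that once triggered a bug, with the expected outcome.
CORPUS = [
    (b"\x03abc", b"abc"),
    (b"\x05ab", ValueError),     # length field larger than payload
    (b"", ValueError),
]


def replay(corpus):
    """Feed every stored input to the decoder; return the inputs that misbehave."""
    failures = []
    for data, expected in corpus:
        try:
            got = parse_message(data)
        except ValueError:
            got = ValueError
        if got != expected:
            failures.append(data)
    return failures
```

Each fixed bug adds one entry to the corpus, so the collection grows into a regression suite without new test code.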
[23:14] <cmccabe> small interruption:
[23:14] <gregaf> so you'd call that a system test?
[23:14] <Tv|work> perhaps
[23:14] <cmccabe> it's impossible (currently) to feed an OSD input directly
[23:14] <Tv|work> i'll use any words that make sense, as long as we can agree on something
[23:14] <Tv|work> cmccabe: open a socket
[23:14] <cmccabe> we instead rely on running all the daemons and running test sequences
[23:14] <cmccabe> so in essence you run the whole system
[23:14] <cmccabe> a system which you're testing
[23:15] <Tv|work> hence me calling it a system test
[23:15] <cmccabe> :)
[23:15] <cmccabe> but if I run all the daemons, but they happen to be on different machines, that is also a system test
[23:15] <cmccabe> it's not integrating with anything 3rd party
[23:15] <Tv|work> sure
[23:15] <cmccabe> ok
[23:15] <Tv|work> but an integration test is something more; it's a thing that won't be very useful unless given realistic conditions
[23:16] <Tv|work> whereas that definition of system testing might not be very realistic
[23:16] <gregaf> so an integration test would be?
[23:16] <cmccabe> I'm fine with "unrealistic" system tests
[23:16] <cmccabe> that expose bugs by running in wild and wacky ways
[23:16] <cmccabe> I wrote some myself... check out test_unfound.sh and test_lost.sh
[23:17] <Tv|work> gregaf: e.g. metadata cluster on separate nodes to expose latency issues, kernel mounts to trigger those bugs, kernel compile running on a client machine to trigger lots of realistic IO
[23:17] <sagewk> can we not spend all day debating test framework please?
[23:18] <sagewk> let's spend some time looking at existing options, so we don't invent the wheel, and then do a skype later this afternoon or tomorrow
[23:18] <cmccabe> do we have a menu of existing options yet?
[23:18] <Tv|work> sagewk: this is not really a fight about frameworks though
[23:18] <sagewk> also, let's keep in mind that i'm reviewing resumes and scheduling interviews for a qa manager as we speak. :)
[23:19] <Tv|work> this is me trying to avoid 2 weeks down the line people saying "smoke" when they mean "performance", etc
[23:19] <cmccabe> sagewk: I'm really glad to hear we're looking at staffing for qa. It will make my job so much easier!
[23:20] <cmccabe> sagewk: I've been writing C++ for 10 years, would like to get back to that :)
[23:20] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has left #ceph
[23:20] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:20] <cmccabe> sagewk: so do you want me to do anything to prepare for this meeting?
[23:21] <ijuz__> well, there is also the possibility to do coverage testing, when you hit all code or "feature groups", then you can be halfway sure that things are working properly, then you develop your tests to reach a coverage that is as high as possible (that is at least like it is done for hardware) (and i'm not sure how good the coverage tools for C code are)
[23:21] <sagewk> look at autotest?
[23:21] <cmccabe> k
[23:22] <Tv|work> ijuz__: coverage is more like a tool to discover where your tests suck
[23:22] <Tv|work> Tv|work: so maybe under the "styles" header, that would apply
[23:23] <Tv|work> added
[23:24] <sagewk> cmccabe: or similar frameworks? it doesn't feel like we should need to roll our own scheduling/queueing daemons to do any of this
[23:24] <gregaf> Tv|work: so I think I understand your difference between system (I'd call it a module test, maybe?) and integration testing now, and I'm not sure that's a useful distinction to make when testing Ceph
[23:24] <gregaf> maybe that's just because we've been conflating these tests for the last year (look at the qa "workunits")
[23:25] <gregaf> but as a distributed system Ceph daemons are pretty heavily interdependent on the state others are holding
[23:25] <Tv|work> gregaf: it's more, "system" tests are faster to run than "integration" tests, so developing with those is more pleasant and pinpoints bugs better
[23:26] <Tv|work> gregaf: stuff like, if you're working on the kernel client, "system" might mean running in uml+gdb, "integration" might mean actual hardware or kvm
[23:26] <gregaf> well that sounds more like a thing of timing than inherent differences in the test
[23:27] <gregaf> like if you're running any tests on a running OSD, you're going to need to have either the monitor or a monitor simulator running
[23:27] <gregaf> we can separate short-running tests from long-running tests as a practical matter
[23:27] <Tv|work> gregaf: think about it this way: "integration" tests run as root and stomp all over the machine; "system" tests try to be nice & isolated
[23:27] <ijuz__> Tv|work: yes and that is IMO in the end what you want to know
[23:27] <gregaf> and we can split up Ceph tests and RADOS tests
[23:28] <gregaf> nice and isolated?
[23:28] <Tv|work> gregaf: for something like apache, you might say "system" test is "runs as my own user on a high port" and "integration" test is "starts as root, switches uids, listens to port 80, serves different vhosts as different users, ..."
[23:28] <cmccabe> sagewk: autotest has a shared machine pool and scheduler, that looks promising
[23:29] <cmccabe> sagewk: I will try to identify other competitors though so this isn't a no-bid contract, so to speak
[23:29] <sagewk> k thanks
[23:32] <gregaf> well to me this continues to just sound like a question of test scale rather than an actual difference in code coverage or methodology or whatever
[23:32] <gregaf> but it's not something i'm too worried about
[23:32] <sagewk> it sounds like it's the same harness to do both.. and more of a qualitative description of the test itself?
[23:33] <Tv|work> part of the framework should be shared if possible, sure
[23:33] <Tv|work> but by that definition, "system" tests shouldn't e.g. touch etc
[23:33] <Tv|work> so there's differences too
[23:33] <cmccabe> as long as I can run a test on 20 nodes to cover my rear end after changing the OSD code, you can call it a purple people eater for all I care
[23:33] <Tv|work> by etc i mean /etc
[23:34] <cmccabe> also I'm keen to dive into the performance optimization stuff
[23:34] <Tv|work> cmccabe: way too early for optimization
[23:34] <gregaf> in general performance optimizations aren't appropriate at this point, cmccabe
[23:35] <gregaf> unless we have customers telling us that we're an order of magnitude off in some task (indicating an architecture issue), we need a demonstrably stable system before we focus on optimizing performance
[23:35] <cmccabe> agree completely
[23:35] <cmccabe> however, I have some bugs sitting in my queue that do talk about order of magnitude problems
[23:36] <cmccabe> anyway. The time allocation stuff is something we'll talk about at scrum no doubt.
[23:37] <cmccabe> as always
[23:45] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.