#ceph IRC Log


IRC Log for 2011-11-21

Timestamps are in GMT/BST.

[2:09] * nolan (~nolan@phong.sigbus.net) Quit (Remote host closed the connection)
[2:09] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[2:12] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit ()
[2:39] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[2:55] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[3:29] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[3:29] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:26] * andresambrois (~aa@r190-135-153-236.dialup.adsl.anteldata.net.uy) Quit (Remote host closed the connection)
[5:59] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Ping timeout: 480 seconds)
[6:01] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[6:26] * stass (stas@ssh.deglitch.com) has joined #ceph
[7:01] * tnt_ (~tnt@98.107-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:55] * tnt_ (~tnt@98.107-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:03] * tnt_ (~tnt@98.107-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:17] * tnt_ (~tnt@98.107-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:22] * tnt_ (~tnt@217.151-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:27] * tnt___ (~tnt@20.124-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:32] * tnt_ (~tnt@217.151-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:36] * tnt___ (~tnt@20.124-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:50] * tnt__ (~tnt@68.140-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[9:02] * tnt___ (~tnt@198.143-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[9:04] * tnt__ (~tnt@68.140-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:13] * jclendenan (~jclendena@ has joined #ceph
[9:29] * tnt__ (~tnt@42.147-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[9:31] * tnt___ (~tnt@198.143-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[10:41] * tnt____ (~tnt@42.147-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[10:43] * tnt__ (~tnt@42.147-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[11:06] * tnt____ (~tnt@42.147-67-87.adsl-dyn.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[11:08] * tnt_ (~tnt@42.147-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[11:16] * korobkov (~korobkov@fryxell.info) has joined #ceph
[11:19] <korobkov> Hello, all. I'm a newcomer and have just set up a small test ceph cluster. Could you tell me, is there a risk of data loss or corruption? What should I do to minimize this risk? Is recovery possible?
[11:20] <korobkov> (yes, I know, ceph is still experimental. but I don't know what is and isn't dangerous in using it... and how dangerous...)
[11:21] * tnt___ (~tnt@42.147-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[11:23] * tnt_ (~tnt@42.147-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[11:33] * tnt___ (~tnt@42.147-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[11:46] * The_Bishop (~bishop@port-92-206-183-175.dynamic.qsc.de) Quit (Remote host closed the connection)
[11:51] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:51] * fronlius (~fronlius@testing78.jimdo-server.com) Quit ()
[11:51] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:51] * Nightdog (~karl@190.84-48-62.nextgentel.com) has joined #ceph
[11:54] * Colomonkey (~r.nap@ Quit (Quit: Restart)
[11:54] * rosco (~r.nap@ has joined #ceph
[11:55] * rosco is now known as colomonkey
[12:03] * The_Bishop (~bishop@port-92-206-183-175.dynamic.qsc.de) has joined #ceph
[13:04] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[13:48] <psomas> Is there any way to turn osd debugging/verbose logging on the fly? injectargs and sighup don't seem to work
[13:53] <todin> psomas: why don't they work?
[13:54] <psomas> The wiki says ceph mds tell '(mds name)' injectargs --debug_mds 20
[13:54] <psomas> I tried that but s/mds/osd/g, and it didn't work, in the logs i saw "ignoring empty injectargs"
[13:54] <psomas> and the sighup doesn't seem to work, ie the new conf is not read/loaded
[14:11] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[14:11] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit ()
[14:12] <todin> psomas: I don't have an mds, but for an osd it works
[14:12] <todin> ceph osd tell 1 injectargs '--debug-osd 20'
[14:13] <todin> here the osd side
[14:13] <todin> 2011-11-21 14:12:16.186747 7ff1a8937700 osd.1 13 do_command tid 0 [injectargs,--debug-osd 0]
[14:13] <todin> 2011-11-21 14:12:16.186979 7ff1a8937700 osd.1 13 do_command r=0 applying configuration change: debug_osd = '0'
[14:13] <todin> 2011-11-21 14:12:16.187019 7ff1a8937700 log [INF] : applying configuration change: debug_osd = '0'
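A minimal sketch, not part of the original conversation, of how one might check an OSD log for the confirmation lines todin pasted above. It assumes the "applying configuration change: NAME = 'VALUE'" wording shown in that paste; the function name applied_changes is hypothetical.

```python
import re

# Matches the confirmation wording seen in the OSD log paste above.
# Assumption: other options are logged with the same "name = 'value'" shape.
CHANGE_RE = re.compile(r"applying configuration change: (\w+) = '([^']*)'")


def applied_changes(log_text):
    """Return (option, value) pairs for every confirmed change in log_text."""
    return CHANGE_RE.findall(log_text)


# Two of the lines from the paste, copied verbatim as sample input.
sample = """\
2011-11-21 14:12:16.186747 7ff1a8937700 osd.1 13 do_command tid 0 [injectargs,--debug-osd 0]
2011-11-21 14:12:16.186979 7ff1a8937700 osd.1 13 do_command r=0 applying configuration change: debug_osd = '0'
"""

print(applied_changes(sample))
```

An empty result after running injectargs would be consistent with the "ignoring empty injectargs" message psomas reported, i.e. the arguments never reached the daemon.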
[14:16] * Meths (rift@ Quit (Read error: Connection reset by peer)
[14:16] * Meths (rift@ has joined #ceph
[14:54] * korobkov (~korobkov@fryxell.info) Quit (Remote host closed the connection)
[14:58] * korobkov (~korobkov@fryxell.info) has joined #ceph
[15:28] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) has joined #ceph
[15:41] * korobkov (~korobkov@fryxell.info) has left #ceph
[15:44] * PuerScyphu (~PuerScyph@client-212-117-1-158.inturbo.lt) has joined #ceph
[15:44] * PuerScyphu (~PuerScyph@client-212-117-1-158.inturbo.lt) has left #ceph
[15:48] * eternaleye (~eternaley@ Quit (Ping timeout: 480 seconds)
[15:55] * eternaleye (~eternaley@ has joined #ceph
[16:10] * dgandhi (~dwarren@ has joined #ceph
[16:12] <psomas> I've hit a strange bug. When I create an RBD image, and then try to remove it, librados i think hangs
[16:14] <psomas> I can provide client and server logs
[16:15] <psomas> Actually, it hangs when trying to delete an object (which is always in the same osd). rados rm -p rbd [oid] has the same behavior
[16:15] <psomas> And I get the same results, if I try to create that object
[16:28] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[16:40] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:48] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[16:54] * Olivier_bzh (~langella@xunil.moulon.inra.fr) Quit (Quit: Leaving.)
[16:56] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[17:15] * Tv (~tv@cpe-76-168-227-45.socal.res.rr.com) has joined #ceph
[17:24] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) has joined #ceph
[17:31] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[17:32] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:33] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[18:07] <sagewk> psomas: sounds like a pg wasn't active. the rbd create doesn't touch every object, but rbd rm does (to clean up), so you only notice it then
[18:09] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[18:09] <psomas> sagewk: y i saw that in the logs
[18:09] <psomas> from ceph -w/s i could see that all the pgs were active+clean except for 1 active+clean+scrubbing
[18:09] <psomas> but it didn't happen just one time
[18:10] <psomas> it happens repeatedly
[18:10] <psomas> and on a different pg i think
[18:10] <sagewk> hmm, can you reproduce with osd log turned up?
[18:12] <psomas> y, I tried with debug-ms=1, debug-osd=15-20 for the osds, and debug-rados=20 in the cli tool
[18:27] <sagewk> psomas: and you reproduced it?
[18:27] <psomas> y
[18:27] <sagewk> psomas: can you open a bug and attach/link to the logs?
[18:28] <psomas> sure, but I'll probably do it a bit later
[18:31] * bchrisman (~Adium@ has joined #ceph
[18:37] <Tv> i have successfully converted this laptop into a space heater
[18:37] <sagewk> as opposed to a lap heater?
[18:38] <sagewk> tv: did you see the teuthology bugs i opened yesterday?
[18:39] <Tv> not yet, looking now
[18:39] <sagewk> thanks
[18:40] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:42] * adjohn (~adjohn@ has joined #ceph
[18:45] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[18:45] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[18:45] * fronlius_ is now known as fronlius
[18:49] * fronlius (~fronlius@testing78.jimdo-server.com) Quit ()
[18:59] * dgandhi (~dwarren@ Quit (Quit: Leaving.)
[19:00] * elder (~elder@c-71-193-71-178.hsd1.mn.comcast.net) has joined #ceph
[19:05] <Tv> if you want to skype me for the daily, i'm just firstname.lastname.. my update is nothing special: public & cluster address via subnet, maybe teuthology bugs if they make enough sense, then back to new sepia planning and chef
[19:09] <damoxc> does anyone know what "libceph: osd7 (null)" means? I'm getting it spewed into syslog
[19:09] <damoxc> this is mixing bcache + rbd
[19:19] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit (Quit: cp)
[19:22] * adjohn (~adjohn@ Quit (Quit: adjohn)
[19:23] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:31] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:41] <damoxc> and happens when attaching a cache device to rbd0 which causes it to go a bit crazy
[19:41] <damoxc> is there any way to increase the verbosity of the logging?
[19:41] <gregaf> damoxc: not sure what the (null) bit is about, that's weird
[19:42] <gregaf> we've got a way to turn up logging, let me check the wiki for it
[19:42] <damoxc> gregaf: cool thanks
[19:42] <gregaf> ugh, wiki database fail
[19:43] <damoxc> :-(
[19:43] <gregaf> http://ceph.newdream.net/wiki/Debugging section "Kernel Client Debugging"
[19:44] <gregaf> or the on/off scripts Sage gave me: http://pastebin.com/avGJzg6J and http://pastebin.com/fVTNAPAT
[19:44] <gregaf> :)
[19:45] <damoxc> excellent! i'll see if I can figure out what's up
[19:46] <damoxc> would that syslog message be generated by a dout?
[19:47] <gregaf> I think so? don't think we generate output any other way, but sagewk would know better than me
[19:48] <sagewk> damoxc: iirc yehudasa looked at this and bcache was doing something bad?
[19:48] <sagewk> yehudasa_: ping
[19:49] <damoxc> sagewk: it was, that has since been fixed
[19:49] * cp (~cp@ has joined #ceph
[19:49] <damoxc> well, as of today
[19:49] <Tv> 4GB of ram is so little these days :(
[19:49] <damoxc> sagewk: it wasn't calling something correctly to init the device
[19:49] <yehudasa_> sagewk: pong
[19:50] <damoxc> sagewk: now i'm able to create it as a backing device, mkfs.ext4 it, mount it, write to it, all fine
[19:50] <damoxc> sagewk: just attaching a cache device causes that message to spew
[19:51] <sagewk> damoxc: i thought it was related to bio splitting? yehudasa_ knows more ;)
[19:52] <damoxc> sagewk: okay :-) on a side note, in another cluster i'm having a problem with 3 of the osds becoming down and out, despite still running, looks like something to do with being unable to heartbeat
[19:53] <yehudasa_> damoxc, sagewk: bcache wasn't respecting the merge_bvec() callback
[19:53] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:53] <damoxc> yehudasa_: fixes for that were pushed today, this is a new issue
[19:54] <yehudasa_> damoxc: wasn't it spewing that before?
[19:54] <damoxc> yehudasa_: no it was a kernel oops before
[19:55] <damoxc> yehudasa_: and also at a different point, the oops occurred when trying to mount an ext4 formatted bcache device with a rbd device backing it, that's all fine now, it only happens when attaching the ssd for caching that this happens
[20:02] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:03] <yehudasa_> damoxc: what commit are you on?
[20:03] <damoxc> yehudasa_: bcache or ceph?
[20:03] <yehudasa_> ceph
[20:03] <yehudasa_> I mean ceph client
[20:04] <damoxc> yehudasa_: 339573406737461cfb17bebabf7ba536a302d841
[20:08] * Tv (~tv@cpe-76-168-227-45.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[20:08] * Tv (~tv@cpe-76-168-227-45.socal.res.rr.com) has joined #ceph
[20:09] <Tv> I just pushed a machine with 4GB RAM so deep into swap it needed to be hard reset :(
[20:14] * Tv__ (~Tv__@cpe-76-168-227-45.socal.res.rr.com) has joined #ceph
[20:14] * Tv__ (~Tv__@cpe-76-168-227-45.socal.res.rr.com) Quit (Remote host closed the connection)
[20:14] * Tv (~tv@cpe-76-168-227-45.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[20:14] * Tv__ (~Tv__@cpe-76-168-227-45.socal.res.rr.com) has joined #ceph
[20:14] * adjohn (~adjohn@ has joined #ceph
[20:15] <yehudasa_> damoxc: that message is at ceph_fault()
[20:16] <yehudasa_> damoxc: need the full logs to know what actually happened
[20:16] <yehudasa_> damoxc: probably some osd down?
[20:16] <damoxc> yehudasa_: just compiled a kernel with the stuff to enable debugging
[20:17] <damoxc> yehudasa_: don't think an osd is down
[20:17] <damoxc> yehudasa_: i have some pgs stuck as creating, could that cause any problems?
[20:17] <yehudasa_> damoxc: yes
[20:17] <yehudasa_> damoxc: though I'm not sure whether you'd see the same message
[20:28] <damoxc> yehudasa_: http://damoxc.net/ceph-syslog, it's 12mb
[20:38] <damoxc> yehudasa_: is that enough info to go on?
[21:23] * adjohn is now known as Guest17823
[21:23] * Guest17823 (~adjohn@ Quit (Read error: Connection reset by peer)
[21:23] * adjohn (~adjohn@ has joined #ceph
[21:46] <yehudasa_> damoxc: you need to set permissions on http://damoxc.net/ceph-syslog
[21:47] <damoxc> yehudasa_: whoops, try now
[21:57] <yehudasa_> damoxc: for some reason you're getting EFAULT on the connection
[21:57] <yehudasa_> damoxc: that is, when trying to write data (kernel_sendpage returns that)
[21:58] * The_Bishop (~bishop@port-92-206-183-175.dynamic.qsc.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[22:23] <damoxc> yehudasa_: hmm odd
[22:28] <damoxc> yehudasa_: would that indicate something wrong with the network, or osd, or pgs?
[22:35] * cp_ (~cp@ has joined #ceph
[22:38] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[22:41] * cp (~cp@ Quit (Read error: Operation timed out)
[22:41] * cp_ is now known as cp
[22:48] <Tv__> hmm it seems sagewk added --with-hadoop to gitbuilders but didn't tell it to install the -dev packages; gitbuilder-i386 has been broken for a while
[22:58] <Tv__> uhhh and it seems apt/sources.list has been hand-edited in a funky way?
[23:19] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[23:27] * stickman123 (~mmeuleman@ has joined #ceph
[23:28] <stickman123> hey, i'm looking into building out a clustered storage environment for a KVM cluster and i'm going to be using infiniband for all storage communication, but I can't seem to find any evidence about Ceph supporting infiniband or RDMA (outside of IPoIB)
[23:29] <stickman123> right now i'm between Ceph and GlusterFS, with the goal being to use the same hosts for storage as for KVM virtualization
[23:37] <joshd> stickman123: ceph doesn't support infiniband
[23:37] * Nightdog (~karl@190.84-48-62.nextgentel.com) Quit (Remote host closed the connection)
[23:48] <Tv__> well, there's always IPoIB
[23:50] <stickman123> sorry, got pulled into a call
[23:50] <stickman123> my goal is to set up a solution using open source software and infiniband to be able to scale computing and performance linearly and dynamically
[23:51] <stickman123> how much RAM and CPU does ceph tend to use?
[23:52] <stickman123> would setting up ceph on the same host as a virtual machine host be a bad idea in general?
[23:58] <joshd> stickman123: it uses about 200 MB during normal operation, during recovery it uses more

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.