#ceph IRC Log

Index

IRC Log for 2012-10-17

Timestamps are in GMT/BST.

[0:00] <cowbell> IMHO pki encryption for handshake and AES for content lightens the load a little.
[0:00] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[0:01] <cowbell> then there's the socket burden..
[0:02] * loicd1 (~loic@z2-8.pokersource.info) has joined #ceph
[0:03] <rlr219> sjust: looks like it is recovering nicely. it is moving in right direction. I have to go, but I will monitor and contact you back if it doesn't work. Thanks for your time today!
[0:03] <nhm> I'm curious how fast intel's hardware AES implementation could encrypt content.
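
A toy sketch of the hybrid scheme cowbell is suggesting, i.e. asymmetric crypto only for the handshake and AES for the bulk content, written with the Python 'cryptography' package purely for illustration; the key sizes and payload are arbitrary and this is not how cephx actually works:

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # the expensive asymmetric step happens once, at handshake time:
    # wrap a fresh AES session key with the peer's public key
    peer = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    session_key = AESGCM.generate_key(bit_length=128)
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    wrapped_key = peer.public_key().encrypt(session_key, oaep)

    # bulk content then rides on AES-GCM, which AES-NI accelerates in hardware
    nonce = os.urandom(12)
    ciphertext = AESGCM(session_key).encrypt(nonce, b"object payload", None)

    # the receiver unwraps the session key once and decrypts everything after that
    recovered = peer.decrypt(wrapped_key, oaep)
    assert AESGCM(recovered).decrypt(nonce, ciphertext, None) == b"object payload"
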
[0:03] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[0:03] <sjust> rlr219: not at all, thanks for your patience
[0:03] <sjust> rlr219: do let me know if it doesn't recover
[0:03] <rlr219> you too. cheers
[0:03] * rlr219 (43c87e04@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:04] * loicd (~loic@63.133.198.91) Quit (Read error: Connection reset by peer)
[0:07] <gregaf> yehudasa: didn't look deeply, but those all look like appropriate coverity fixes
[0:07] * loicd1 (~loic@z2-8.pokersource.info) Quit (Quit: Leaving.)
[0:07] * loicd (~loic@63.133.198.91) has joined #ceph
[0:09] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:10] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:10] * cblack101 (86868b46@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:12] <scuttlemonkey> rweeks: sorry, missed your question amidst booth insanity
[0:12] <scuttlemonkey> Mark of the Shuttleworth variety
[0:14] <rweeks> oh nice
[0:14] <rweeks> I didn't know he was there doing that demo
[0:16] * loicd (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[0:16] * loicd (~loic@z2-8.pokersource.info) has joined #ceph
[0:20] <Tv_> question is, has this Mark been to space or not
[0:24] * BManojlovic (~steki@212.200.241.182) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:26] <cowbell> I'll go to space when there's a McDonald's and a 7-11 there.
[0:27] <cowbell> hey.. I'm not me fsr
[0:27] * cowbell is now known as pentabular
[0:28] <pentabular> whups. no cowbell today.
[0:29] <rweeks> more cowbell?
[0:29] <pentabular> indeed!! :)
[0:32] <pentabular> so, about network communication wrt cephx+ssh, there was this a-ha moment with Salt when they put together zeromq+msgpack and the benchmarks were simply not believable, in terms of
[0:32] <pentabular> "that can't possibly be right"
[0:32] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[0:33] <pentabular> zmq == pub/sub, asynchronous, non-socket heavy (uses file descriptors instead)
[0:33] <pentabular> msgpack == 'data serialization' + tiny
[0:34] <pentabular> so the result is that you actually can send data/commands to 50,000 hosts or what have you simultaneously
[0:34] <pentabular> this is the kind of stuff they use in high frequency trading
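
A minimal sketch of the pub/sub pattern pentabular is describing, using pyzmq and msgpack-python; the port, topic name, and payload below are made up for illustration and are not anything Salt or Ceph actually ships:

    import time
    import msgpack
    import zmq

    ctx = zmq.Context()

    # one PUB socket fans a message out to every connected subscriber
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://*:4505")

    # each host runs a SUB socket and filters on a topic prefix
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:4505")
    sub.setsockopt(zmq.SUBSCRIBE, b"cmd")
    time.sleep(0.2)   # give the subscription time to propagate (PUB/SUB slow-joiner)

    # msgpack keeps the payload tiny; zmq multiplexes many peers over few sockets
    pub.send_multipart([b"cmd", msgpack.packb({"fun": "status", "args": []})])
    topic, payload = sub.recv_multipart()
    print(topic, msgpack.unpackb(payload))
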
[0:35] * scalability-junk (~stp@188-193-208-44-dynip.superkabel.de) Quit (Quit: Leaving)
[0:36] <pentabular> anyway.. if I had the balls for it I'd see if I could replace cephx with something like that.
[0:36] <pentabular> it seems like ceph might benefit from it.
[0:36] * pentabular does not have the balls for it, as far as I know (searches pants..)
[0:39] <rweeks> why not?
[0:39] <rweeks> sounds awesome
[0:40] * Ryan_Lane (~Adium@63.133.198.91) has joined #ceph
[0:40] <pentabular> I'm still pretty green when it comes to ceph.
[0:41] <pentabular> Hi Ryan_Lane! :)
[0:41] <Ryan_Lane> howdy
[0:41] <AaronSchulz> green?
[0:42] <pentabular> not very familiar with operating ceph and with developing / contributing to ceph
[0:42] <rweeks> hi Ryan
[0:42] <rweeks> how's the conference?
[0:42] <Ryan_Lane> great
[0:48] * loicd (~loic@z2-8.pokersource.info) Quit (Quit: Leaving.)
[0:50] * synapsr (~synapsr@63.133.198.91) Quit (Remote host closed the connection)
[0:50] <nhm> I'll have to branch out from supercomputing some time.
[0:51] <rweeks> are you going to sc12, nhm?
[0:51] <nhm> rweeks: yep
[0:52] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[0:52] <rweeks> I am as well.
[0:52] <nhm> rweeks: excellent
[0:52] <rweeks> there will be 4-5 Inktank folks there as far as I know
[0:52] <nhm> rweeks: A lot of people I know won't be there this year though with the government travel restrictions. :(
[0:52] <rweeks> ugh
[0:53] <nhm> I think Sarp Oral at ORNL barely managed to convince them to send him after we got the OpenSFS benchmark working group BOF session unofficially sanctioned.
[0:54] <rweeks> oh jeez
[0:54] <gregaf> what travel restrictions are these?
[0:54] <nhm> gregaf: http://insidehpc.com/2012/08/14/new-doe-travel-restrictions-could-affect-sc12-and-beyond/
[0:55] * synapsr (~synapsr@63.133.198.91) Quit (Remote host closed the connection)
[0:57] <gregaf> heh
[1:01] * BManojlovic (~steki@212.200.241.182) has joined #ceph
[1:09] * synapsr (~synapsr@63.133.196.10) has joined #ceph
[1:09] * joshd (~joshd@63.133.198.91) has joined #ceph
[1:16] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[1:18] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:20] * lofejndif (~lsqavnbok@659AAA9ZD.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:27] * prometheanfire (~promethea@rrcs-24-173-105-83.sw.biz.rr.com) has joined #ceph
[1:29] * Ryan_Lane (~Adium@63.133.198.91) Quit (Quit: Leaving.)
[1:31] * Tv_ (~tv@38.122.20.226) Quit (Quit: Tv_)
[1:31] * BManojlovic (~steki@212.200.241.182) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:32] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:33] * tryggvil (~tryggvil@62.50.239.173) has joined #ceph
[1:34] * loicd (~loic@63.133.198.91) has joined #ceph
[1:35] * jlogan1 (~Thunderbi@2600:c00:3010:1:4fe:8250:70f9:cd1c) Quit (Ping timeout: 480 seconds)
[1:36] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[1:37] * pentabular is now known as cowbell
[1:37] <cowbell> ceph is awesome! :)
[1:38] <prometheanfire> rbd is awesome
[1:39] <joao> :)
[1:40] * cowbell wants to come for pizza lunch and bang my cowbell
[1:40] <prometheanfire> anyone have a link to the presentation given at the openstack conf about using ceph/rbd on openstack
[1:40] * oxhead (~oxhead@nom0065903.nomadic.ncsu.edu) has joined #ceph
[1:41] <prometheanfire> also, as long as the FS supports xattrs properly, ceph can run on it right?
[1:42] <sjust> prometheanfire: kinda
[1:42] <sjust> prometheanfire: yeah, pretty much
[1:42] <cowbell> ^ that is my impression 'in a perfect world'
[1:42] <sjust> prometheanfire: what fs are you thinking of
[1:42] <sjust> ?
[1:42] <prometheanfire> zfs
[1:43] <prometheanfire> well, zfsonlinux
[1:43] <sjust> ah, it might just work
[1:43] <sjust> also, it would be very interesting to add support in FileStore for zfs's more interesting features
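
A rough way to sanity-check the "supports xattrs properly" point before trusting a new filesystem with an OSD is to probe it with increasingly large xattrs. This is only an illustrative Linux-only snippet with a made-up path and attribute name, not anything the OSD itself runs:

    import os

    path = "/mnt/fs-under-test/xattr-probe"   # a file on the candidate filesystem
    open(path, "w").close()

    for size in (64, 1024, 64 * 1024):
        try:
            os.setxattr(path, "user.ceph_probe", b"x" * size)
            print(f"{size}-byte xattr: ok")
        except OSError as e:
            print(f"{size}-byte xattr: {e}")   # e.g. ext4 rejects xattrs larger than a block
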
[1:43] <prometheanfire> it'll go gentoo (hardened) -> zfs -> ceph -> rbd -> kvm (folsom) hopefully
[1:44] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:9985:29ae:9b49:7247) Quit (Quit: LarsFronius)
[1:44] <cowbell> I thought you could configure osd's to use a separate metadata store on fs's that don't support xattrs/enough
[1:44] <prometheanfire> I think that's what I'm gonna plan for
[1:44] * cowbell is now known as pentabular
[1:44] <pentabular> back to being me
[1:45] <prometheanfire> no, just need to figure out if I want my hosts to be diskless
[1:46] <prometheanfire> heh, run hosts diskless with the source on themselves (bootstrapping would be fun)
[1:47] <pentabular> I'm a fan of diskless. heh.. sort of a pun.
[1:47] <prometheanfire> :P
[1:47] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[1:48] <prometheanfire> it kinda makes it so that it all has a single point of failure though
[1:48] * loicd (~loic@63.133.198.91) has joined #ceph
[1:48] <joshd> sjust: you might want to look at this: http://tracker.newdream.net/issues/3287
[1:49] <pentabular> well, if the 'single point' is actually a storage cluster... then at least having a VM host go down doesn't disrupt that.
[1:50] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit (Quit: This computer has gone to sleep)
[1:52] <prometheanfire> but if the vm host is running the instance that is the source for the pxe boot (or iscsi or whatever)
[1:52] <sjust> joshd: cool
[1:52] * oxhead_ (~oxhead@nom0065903.nomadic.ncsu.edu) has joined #ceph
[1:53] <sjust> it's feeding us a funky xattr
[1:53] * oxhead (~oxhead@nom0065903.nomadic.ncsu.edu) Quit (Quit: oxhead)
[1:53] <dmick> the attr is more x than usual.
[1:53] <dmick> xxxattr
[1:53] <prometheanfire> heh
[1:53] <sjust> hmm, we'll need a live zfs to play with, I suspect
[1:55] <prometheanfire> what version of zfsonlinux was that?
[1:56] <prometheanfire> I'm not familiar with this option 'filestore xattr use omap '
[1:56] <pentabular> prometheanfire: your pxe/boot server is pretty important for sure. I consider that more closely tied to the storage pool.
[1:56] <pentabular> tho you can still have its image come from the cluster
[1:57] <pentabular> that can make for easy replacement
[1:59] <pentabular> I *think* that option says to use an external 'object map' instead of the fs's xattrs and is intended to support lesser filesystems
[1:59] * joshd (~joshd@63.133.198.91) Quit (Ping timeout: 480 seconds)
[1:59] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[1:59] <sjust> yeah, that's the idea
[1:59] <sjust> the xattrs get stuffed into leveldb
[2:00] <sjust> though you still need some xattr support regardless
[2:00] <prometheanfire> ya, I'd back it up to ceph or something, but bootstrapping is hard
[2:00] <prometheanfire> maybe I have one or two nodes that are physical
[2:00] <prometheanfire> that could work
[2:00] <pentabular> would that still be needed on btrfs?
[2:00] <sjust> that's just to allow the large xattrs allowed by rados to work on ext4, but we still use xattrs internally for other purposes
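
For reference, the option being discussed is normally set per OSD in ceph.conf; shown here only as a reminder of the syntax, not as a recommendation for any particular filesystem:

    [osd]
        filestore xattr use omap = true
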
[2:00] <pentabular> prometheanfire: yeah, a chicken-and-egg issue, and not one you want to have to cook up repeatedly
[2:01] <prometheanfire> ya, since this is a home 'cluster' (4 hosts) it's gonna go up and down every couple of months probably
[2:01] * pentabular is confused by breakfast cereals that contain "clusters"
[2:01] <prometheanfire> heh
[2:02] <gregaf> sjust: can't we shunt all the xattrs into leveldb if we set the appropriate config options?
[2:02] <sjust> no, we need xattrs for non-idempotent transaction guards
[2:02] <prometheanfire> here is how zfs does xattrs https://github.com/zfsonlinux/zfs/issues/443
[2:02] <sjust> for now at least
[2:04] * loicd (~loic@63.133.198.91) has joined #ceph
[2:07] * loicd (~loic@63.133.198.91) Quit ()
[2:09] * James259 (~James259@94.199.25.228) has joined #ceph
[2:12] <James259> Hey guys.. strange little problem here. a typo in a script resulted in "rbd resize --size -1 imagename" being executed. the result was that imagename got resized to 15EB. Amusing at first - until I try and resize the image back to 25Gb, or delete it. rbd just hangs. I left it be for about 6 hours to see if it eventually deleted the image but its still sitting there. I tried restarting
[2:12] <James259> ceph on all servers and its still the same. Anyone got any idea how I can remove the image?
[2:13] * loicd (~loic@63.133.198.91) has joined #ceph
[2:15] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[2:16] <prometheanfire> anyone know if live migration has been used with openstack/rbd?
[2:18] * sagelap1 (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[2:18] * rread (~rread@c-98-234-218-55.hsd1.ca.comcast.net) Quit (Quit: rread)
[2:21] * sagelap (~sage@143.sub-70-197-150.myvzw.com) has joined #ceph
[2:24] <dmick> James259: that *is* interesting.
[2:25] <dmick> you could definitely go hack at the object itself, but that's not for the faint of heart
[2:25] <dmick> prometheanfire: yes, I've seen it demoed
[2:25] <dmick> James259: are there other rbd images in that pool you want to save?
[2:26] <dmick> if not you can blow away and recreate the pool
[2:27] <prometheanfire> dmick: nice, at the conference?
[2:28] * loicd (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[2:29] <dmick> prometheanfire: no, here in the office
[2:29] * sagelap (~sage@143.sub-70-197-150.myvzw.com) Quit (Ping timeout: 480 seconds)
[2:29] <dmick> James259: if you do have stuff you want to save, we can try harder
[2:30] <prometheanfire> dmick: know if that's upstream'd? if not I can wait I suppose :D
[2:30] <joao> are we supposed to triage the issues on coverity first, or should we just patch them and somehow mark them fixed?
[2:33] * Ryan_Lane (~Adium@63.133.198.91) has joined #ceph
[2:38] <James259> sorry dmick.. had to go afk a few.
[2:39] <James259> I could wipe and re-create everything stored right now.. this deployment is a live system that is being shipped to the data centre in just a couple of days so i wondered if there was a quick fix for this sort of thing if it happened to happen in a few weeks when we have live customers on it.
[2:41] <James259> is the hacking method a lot of hassle for you to tell me or is it something you can just point me in the direction of.
[2:41] <James259> ?
[2:42] <James259> just to be clear.. wiping the array IS a relatively easy option. Just wondering if its possible for me to learn the hacking route in case I ever need it.
[2:42] * danieagle (~Daniel@177.99.132.23) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[2:44] <prometheanfire> ceph/rbd is considered stable now right?
[2:44] <James259> I heard so. Have been testing it for about 10 months with very little issues.
[2:44] <James259> CephFS I understand is considered beta still.
[2:45] <prometheanfire> well, I'd be hosting on zfs at least for now
[2:45] <dmick> prometheanfire: I'm pretty sure migration is available in the current ceph-kvm, yes, and rbd is considered stable, yes
[2:45] <dmick> ideally I could point you at a presentation, but I can't seem to find it right now
[2:46] <dmick> James259: we can look. What release of Ceph?
[2:46] <dmick> or version, I should say
[2:46] <prometheanfire> 52 with the locking seems nice :D
[2:46] <dmick> (and we don't have to wipe the whole cluster, just the rbd pool)
[2:47] <James259> 0.48.2
[2:47] <James259> am i out of date?!?
[2:47] <dmick> 48.2 is the latest 'stable' release
[2:47] <James259> ah, kk.. phew
[2:47] <dmick> but that's fairly old, yeah
[2:47] <dmick> but no worries, that lets me know you don't have any format 2 images
[2:48] <James259> should we be deploying with a newer version?
[2:48] <James259> there is nothing else stored.. its purely rbd images
[2:48] <dmick> ah ok
[2:48] <dmick> in which case
[2:48] * sagelap1 (~sage@2600:1013:b01e:c695:a8bb:6bc0:8708:f582) has joined #ceph
[2:49] <dmick> you can 'rados -p rbd ls' to see the objects behind the rbd images
[2:49] <dmick> there'll be an <imagename>.rbd file that is the "header" object
[2:50] <James259> yes.
[2:51] <James259> and lots of rb.0.xxx objects below it
[2:51] <dmick> yes; those are the data objects; the huge image probably has none
[2:51] <dmick> if you 'rados -p rbd get <image>.rbd - | xxd'
[2:52] * Ryan_Lane (~Adium@63.133.198.91) Quit (Quit: Leaving.)
[2:52] <dmick> you can see the prefix of the objects involved in a particular image
[2:52] * loicd (~loic@63.133.198.91) has joined #ceph
[2:52] <James259> the huge image does have some. it was previously a working 25Gb image.
[2:52] <dmick> er, uh, oh
[2:52] <dmick> you resized. right.
[2:52] <James259> yes
[2:52] <dmick> well actually then
[2:52] <dmick> it might be most helpful to change the size back to 25?
[2:52] <dmick> then you can dispose of it as you wish
[2:53] <James259> rbd resize will not play. you thinking of another method?
[2:53] <dmick> yes, bitbanging
[2:53] <James259> sounds like fun. :)
[2:54] <James259> you need the output of that rbd get above still?
[2:54] <dmick> getting an argonaut source tree ready
[2:54] <dmick> yeah
[2:56] <James259> root@Node-1:~# rados -p rbd get disk98a.rbd - | xxd
[2:56] <James259> 0000000: 3c3c 3c20 5261 646f 7320 426c 6f63 6b20 <<< Rados Block
[2:56] <James259> 0000010: 4465 7669 6365 2049 6d61 6765 203e 3e3e Device Image >>>
[2:56] <James259> 0000020: 0a00 0000 0000 0000 7262 2e30 2e31 3262 ........rb.0.12b
[2:56] <James259> 0000030: 652e 3336 3235 6462 3064 0000 0000 0000 e.3625db0d......
[2:56] <James259> 0000040: 5242 4400 3030 312e 3030 3500 1600 0000 RBD.001.005.....
[2:56] <James259> 0000050: 0000 00c0 ffff ffff 0000 0000 0000 0000 ................
[2:56] <James259> 0000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
[2:56] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) has joined #ceph
[2:57] <dmick> right. so this is a "rbd_obj_header_ondisk"
[2:57] <dmick> the image size is in that object
[2:57] <dmick> if you "get" to a file, binary-edit it, and then "put" back, we can fix the size
[2:58] <James259> do i need to look for that file in /srv/osd.x on the servers?
[2:58] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[2:58] <dmick> no, you do it with rados operations
[2:58] <dmick> (rados get like you just did)
[3:00] <James259> so i do the above without the | xxd and redirect output to a temp file right?
[3:00] <dmick> right
[3:00] <James259> 112 bytes long - sound right?
[3:02] <dmick> 0x70, yep
[3:02] * miroslavk (~miroslavk@63.133.198.36) Quit (Quit: Leaving.)
[3:02] <Q310> when creating volumes in openstack with rbd is it supposed to be listed in a "cinder list" command? but i can only find them in "rbd -p volumes list"
[3:03] <James259> ok.. im off to find a binary edit tool and will have a go. will let you know how it goes.. many thanks.
[3:03] <prometheanfire> that folsom?
[3:03] <Q310> i should add the volumes are created via the dashboard so not manually
[3:03] <Q310> prometheanfire: yeah
[3:03] <dmick> James259: cheers. you can ship me the file and I'll hack it if you like too
[3:03] <dmick> if you want, dan.mick@inktank.com
[3:03] <prometheanfire> haven't deployed openstack yet, still in planning
[3:03] * sagelap1 (~sage@2600:1013:b01e:c695:a8bb:6bc0:8708:f582) Quit (Ping timeout: 480 seconds)
[3:04] <Q310> prometheanfire: you'll enjoy it :)
[3:05] <prometheanfire> I'm sure, I'm hoping I'll have time to start setting up hosts this weekend
[3:05] <dmick> James259: image_size is at 0x50; currently 0xffffffff c0000000
[3:05] <James259> thank you so much.. I was just trying to find that.
[3:06] * loicd (~loic@63.133.198.91) has joined #ceph
[3:06] * loicd (~loic@63.133.198.91) Quit ()
[3:07] <dmick> 25GB, if it were exactly that, would be 0x6 40000000
[3:07] <dmick> so that's your mission there
[3:08] <James259> I will put calc away then. :P thx
[3:08] <dmick> heh
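
A sketch of the byte-patch dmick is describing, assuming the header object has already been fetched to a local file (the name header.bin is made up) with 'rados -p rbd get disk98a.rbd header.bin' and will afterwards be written back with 'rados -p rbd put disk98a.rbd header.bin':

    import struct

    SIZE_OFFSET = 0x50            # image_size field in the format-1 rbd header, per dmick
    new_size = 25 * 2**30         # 25 GiB == 0x640000000

    with open("header.bin", "r+b") as f:
        f.seek(SIZE_OFFSET)
        f.write(struct.pack("<Q", new_size))   # little-endian 64-bit, replacing 0xffffffffc0000000
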
[3:08] <dmick> I will file a bug not to accept negative size values
[3:10] * loicd (~loic@63.133.198.91) has joined #ceph
[3:10] * synapsr (~synapsr@63.133.196.10) Quit (Remote host closed the connection)
[3:12] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) has joined #ceph
[3:13] <dmick> might be nice also to maintain a bitmap of used objects, perhaps
[3:13] <dmick> to speed deletion
[3:13] <prometheanfire> so if it gets too big it goes negative?
[3:13] * prometheanfire wonders what the biggest rbd device is
[3:14] <James259> it would help if I did not make a typo in a script and try resizing to -1 in the first place. :P
[3:14] <dmick> well, the size is parsed into a long long, but it should not allow negative input
[3:14] * miroslavk (~miroslavk@63.133.196.10) has joined #ceph
[3:14] <dmick> from the CLI, you're limited to 2^64 atm, and will soon be limited to 2^63 :)
[3:15] <James259> Thanks Dan - it appears to have worked. Just removing the image now.
[3:15] <dmick> excellent
[3:15] <dmick> out of curiosity, what did you use for binary editing? I'm not happy with the ones I've found
[3:16] <James259> vi
[3:16] <James259> :%!xxd
[3:16] <dmick> ah
[3:16] <James259> converts to binary view
[3:16] <prometheanfire> that measured in bytes?
[3:16] <James259> then..
[3:16] <slang> sagewk: pushed a functionality test for chmod and added the permissions checking to the wip-3301 branch
[3:16] <James259> :%!xxd -r followed by :wq
[3:16] <slang> sagewk: as soon as you give me the nod I'll merge to master
[3:16] <James259> converts it back and saves
[3:17] <dmick> prometheanfire: yes
[3:18] <prometheanfire> ok, so good for a while :D
[3:18] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:18] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:19] <James259> so was the problem with my -1 down to the fact that it took it as a signed value initially and then later read it back as unsigned?
[3:19] <James259> or am i just talking rubbish.. lol
[3:20] <dmick> yes, basically
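
For illustration, the same eight bytes shown at offset 0x50 in the dump above, read back both ways; this is just a demonstration of the signed/unsigned reinterpretation James is asking about, not the actual rbd code path:

    import struct

    raw = bytes.fromhex("000000c0ffffffff")   # bytes at 0x50 in the disk98a.rbd dump
    print(struct.unpack("<q", raw)[0])        # -1073741824 when treated as signed
    print(struct.unpack("<Q", raw)[0])        # 18446744072635809792 when read back unsigned
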
[3:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:21] <James259> cool.. well, thanks very much for your help.. very much appreciated. (Im gonna go make sure I have no more typo's to bite me!)
[3:22] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:22] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:22] <dmick> yw. sorry for the trouble; that's a wicked trap. good job fixing it up.
[3:23] <James259> no need to apologise.. You did not charge me for your software and the help you give is second to none.
[3:25] <James259> I just wish I could contribute back more but my C++ is not up to scratch yet.
[3:25] <dmick> well yw. tell all your friends. :)
[3:26] <dmick> https://github.com/ceph/ceph/commit/ee20cd02abd2b0c08f075e89f819eadfaccf7ee2
[3:27] <James259> when we get this product on sale to the public - the ceph project will be credited on the website for sure.
[3:30] <James259> hah.. that was quick. :D
[3:30] <prometheanfire> :D
[3:31] <dmick> knock the easy ones down while they're fresh in your mind...
[3:33] * loicd (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[3:36] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) Quit (Quit: Leaving.)
[3:41] * synapsr (~synapsr@63.133.196.10) has joined #ceph
[3:41] * rread (~rread@c-67-188-114-108.hsd1.ca.comcast.net) has joined #ceph
[3:41] * rturk (~rturk@ps94005.dreamhost.com) Quit (Ping timeout: 480 seconds)
[3:41] * rturk (~rturk@ps94005.dreamhost.com) has joined #ceph
[3:41] * rread (~rread@c-67-188-114-108.hsd1.ca.comcast.net) Quit ()
[3:43] * miroslavk (~miroslavk@63.133.196.10) Quit (Quit: Leaving.)
[3:53] * synapsr (~synapsr@63.133.196.10) Quit (Ping timeout: 480 seconds)
[4:01] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[4:12] * loicd (~loic@63.133.198.91) has joined #ceph
[4:13] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[4:18] * aliguori (~anthony@cpe-70-123-146-246.austin.res.rr.com) has joined #ceph
[4:21] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[4:25] * oxhead_ (~oxhead@nom0065903.nomadic.ncsu.edu) Quit (Remote host closed the connection)
[4:25] * aliguori (~anthony@cpe-70-123-146-246.austin.res.rr.com) Quit (Remote host closed the connection)
[4:37] * deepsa (~deepsa@117.199.126.138) has joined #ceph
[4:42] * loicd (~loic@12.180.144.3) has joined #ceph
[4:57] * The_Bishop (~bishop@2001:470:50b6:0:ac8b:ad5b:26dd:5f13) Quit (Ping timeout: 480 seconds)
[5:01] * The_Bishop (~bishop@2001:470:50b6:0:ac8b:ad5b:26dd:5f13) has joined #ceph
[5:07] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[5:16] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) has joined #ceph
[5:19] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[5:23] * The_Bishop (~bishop@2001:470:50b6:0:ac8b:ad5b:26dd:5f13) Quit (Ping timeout: 480 seconds)
[5:27] <Q310> http://ceph.com/docs/master/rbd/rbd-openstack/ is this the only doco available on cinder at this stage i guess?
[5:29] <dmick> it's the freshest we've created here. I have to believe there's more on the openstack site
[5:29] <prometheanfire> only one I know of
[5:31] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) Quit (Quit: dty)
[5:37] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:37] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:55] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[5:55] * Cube1 (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[6:02] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has joined #ceph
[6:30] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[6:38] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[6:38] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[7:00] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[7:00] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:16] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[7:16] <pentabular> 'sup ceph.
[7:20] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[7:20] * dmick (~dmick@2607:f298:a:607:647d:1f2d:7bab:35f6) Quit (Quit: Leaving.)
[7:52] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:00] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[8:04] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[8:06] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) Quit (Remote host closed the connection)
[8:18] * synapsr (~synapsr@63.133.196.10) has joined #ceph
[8:18] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:18] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:28] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) has joined #ceph
[8:33] * synapsr (~synapsr@63.133.196.10) Quit (Remote host closed the connection)
[8:55] * loicd (~loic@12.180.144.3) Quit (Ping timeout: 480 seconds)
[8:55] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:55] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:05] * gregaf1 (~Adium@2607:f298:a:607:f0f2:1a5d:9a78:9ac7) has joined #ceph
[9:07] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[9:07] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:09] * gregaf (~Adium@2607:f298:a:607:10ac:744:64fe:342b) Quit (Ping timeout: 480 seconds)
[9:12] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:16] <todin> morning ceph
[9:19] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:39] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[9:42] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:45] * gregaf (~Adium@38.122.20.226) has joined #ceph
[9:51] * gregaf1 (~Adium@2607:f298:a:607:f0f2:1a5d:9a78:9ac7) Quit (Ping timeout: 480 seconds)
[9:56] * LarsFronius (~LarsFroni@95-91-242-152-dynip.superkabel.de) has joined #ceph
[9:59] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[9:59] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[10:13] * steki-BLAH (~steki@smile.zis.co.rs) has joined #ceph
[10:14] * BManojlovic (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[10:14] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[10:14] * LarsFronius (~LarsFroni@95-91-242-152-dynip.superkabel.de) Quit (Quit: LarsFronius)
[10:15] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[10:21] * tryggvil (~tryggvil@62.50.239.173) Quit (Ping timeout: 480 seconds)
[10:22] * oliver1 (~oliver@jump.filoo.de) has joined #ceph
[10:29] <oliver1> Good morning... I would vote for new topic: 0.53 is out ;)
[10:30] <oliver1> ... and perhaps a hint, that one should zero-out the journals from 0.52? Or is it only me getting a
[10:30] <oliver1> "filestore(/data/osd1-1) mount failed to open journal /dev/sdb8: (22) Invalid argument"
[10:31] <oliver1> After "ceph... --mkjournal" everything was clean again?!
[10:35] <todin> oliver1: morning, I could upgrade without a problem
[10:36] <oliver1> mhm... congrats ;) it was a running system, just did an apt-get update/upgrade, *sigh*
[10:36] <oliver1> Running again as I write, though...
[10:37] <oliver1> Perhaps s/t to do with me re-doing ceph with btrfs? One never knows...
[10:45] * stomper (ca037809@ircip2.mibbit.com) has joined #ceph
[10:47] <stomper> why are the developer community mails not working? I tried to send some queries to the developer list but it seems the mail is not being delivered
[10:50] <todin> oliver1: the system was running here as well, but after an upgrade, I did a ceph restart
[10:51] <todin> oliver1: and I use btrfs as well
[10:51] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[10:51] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[10:51] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[10:51] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[10:52] <oliver1> todin: sure, so did I. btrfs was just a guess, as nothing else changed whilst upgrading...
[10:53] <oliver1> going on this my testing and tries to break down things :-D
[10:53] <oliver1> _with_ my testing... tz.
[10:53] <todin> oliver1: is your jorunal on a raw device, or on btrfs?
[10:54] <todin> I use a raw partition for the journal
[10:54] <oliver1> Raw device on a SSD-partition.
[10:55] <oliver1> Silly me, was too fast zeroing out all three journals, and not getting some more from debugging...
[11:03] * steki-BLAH (~steki@smile.zis.co.rs) Quit (Ping timeout: 480 seconds)
[11:05] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[11:18] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:20] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[11:20] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:29] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:32] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[11:32] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:33] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[11:38] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[11:39] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:40] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:42] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:42] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:54] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:54] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:57] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:57] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:59] * deepsa (~deepsa@117.199.126.138) Quit (Remote host closed the connection)
[12:01] * deepsa (~deepsa@117.207.89.195) has joined #ceph
[12:01] * scalability-junk (~stp@188-193-208-44-dynip.superkabel.de) has joined #ceph
[12:15] * deepsa (~deepsa@117.207.89.195) Quit (Quit: Computer has gone to sleep.)
[12:23] <stomper> can any1 tell why ceph developer mails are undelivered ?
[12:33] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[12:44] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:49] <todin> stomper: you mean the mailinglist?
[12:49] <stomper> yeah
[12:50] <todin> you get a bounce from vger.kernel.org?
[12:51] <stomper> i used to send my questions to ceph-devel@vger.kernel.org .. gmail says the mail can not be delivered
[12:51] <stomper> it happened this morning
[12:51] <stomper> twice
[12:54] <todin> not sure what happened there
[12:54] <stomper> did u try it now?
[12:56] <todin> stomper: no, I don't want to send a test email to the list, my last email to the list was on monday and everything was fine
[12:56] <todin> is there a reason in the bounce
[12:56] <todin> ?
[12:56] <stomper> Delivery to the following recipient failed permanently: ceph-devel@vger.kernel.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was
[12:57] <stomper> it was rejected by the recipient domain
[12:57] <stomper> it was rejected by the recipient domain
[12:58] <joao> last email I received from the list was at 6h44 UTC
[12:59] <joao> which tells nothing about your problem, but still... at least up until that time mail was being delivered
[13:00] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[13:00] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[13:02] <joao> stomper, it works
[13:03] <joao> just spammed the list and the email was delivered
[13:05] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[13:05] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[13:13] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[13:14] <stomper> :(
[13:14] <stomper> i dont know what happened to my account
[13:17] <stomper> do i need to ask someone at inktank to solve this problem? gmail says it's not being accepted on the dev list side..
[13:18] <joao> a while back, some people at the office got mysteriously unsubscribed from the list
[13:18] <joao> had to resubscribe
[13:18] <joao> have you tried that?
[13:18] <stomper> i did that as well
[13:18] <joao> oh
[13:18] <joao> and you're trying to send a plain-text email?
[13:18] <joao> no html, right?
[13:18] <stomper> i guess i need to use some other account then
[13:19] <stomper> yeah , plain txt
[13:19] <joao> yeah, no idea
[13:19] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[13:19] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[13:19] <joao> that's all I had in my quick-and-dirty-ml-troubleshoot :p
[13:19] <stomper> anywayz thanx
[13:19] <stomper> :)
[13:20] <joao> not at all
[13:27] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[13:27] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[13:29] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[13:29] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[13:30] <ninkotech> you should imho try getting out of google services. you would not like nsa, cia, north korea telecom or google to run your digital life.
[13:30] <ninkotech> please run your own mail list
[13:30] <ninkotech> its not that hard
[13:31] <ninkotech> google is evil...
[13:32] <ninkotech> ah, sorry. i just noticed it's on vger.kernel.org :) it's an outgoing mail problem
[13:33] <ninkotech> google is still evil though
[13:33] <ninkotech> :D
[13:50] * loicd (~loic@12.180.144.3) has joined #ceph
[14:00] * stomper (ca037809@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:14] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[14:14] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[14:23] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[14:23] <tnt> Hi. Any idea why sometime I get /dev/rbdN devices only and sometimes I get the /dev/rbd/pool_name/image_name link as well ?
[14:26] <zynzel> tnt: rbd map
[14:27] <tnt> mmm, strange. I was pretty sure I had the /dev/rbd/pool_name/image_name name on a machine where ceph wasn't installed (i.e. only the kernel module).
[14:37] <tnt> nevermind, it seems somehow librbd1 package was on that machine, which includes the udev rules ...
[14:48] * The_Bishop (~bishop@2001:470:50b6:0:3c0b:fe6b:861a:8c84) has joined #ceph
[14:58] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[14:58] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) has joined #ceph
[15:03] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) Quit (Remote host closed the connection)
[15:08] * deepsa (~deepsa@117.212.21.154) has joined #ceph
[15:10] * tnt just read the ceph performance article and the XFS numbers don't look so hot :(
[15:12] <liiwi> done any figures yet on btrfs?
[15:13] <tnt> yes they have. They're much better (at the cost of much more cpu usage though)
[15:13] <tnt> http://ceph.com/community/ceph-performance-part-1-disk-controller-write-throughput/
[15:16] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) has joined #ceph
[15:27] * oxhead (~oxhead@nom0065903.nomadic.ncsu.edu) has joined #ceph
[15:32] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Ping timeout: 480 seconds)
[15:35] <nhm> tnt: heya, glad you liked the article. :)
[15:36] <nhm> tnt: one thing that the article doesn't show is performance degradation over time. btrfs seems to degrade faster than xfs.
[15:36] <nhm> I'll probably try to write another article focusing on that at some point.
[15:38] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) Quit (Quit: dty)
[15:39] <tnt> nhm: yes :) I'm especially glad to see that my choice to use 1 osd per disk rather than 1 osd per machine with disks in RAID0 was good :p
[15:50] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[15:54] * dty (~derek@129-2-129-154.wireless.umd.edu) has joined #ceph
[15:58] * synapsr (~synapsr@63.133.196.10) has joined #ceph
[15:59] <jks> when I mount cephfs using the kernel driver everything is read-only... how can I mount it read/write?
[15:59] <jks> option rw to mount didn't work... when I use the FUSE driver it is mounted read/write automatically
[16:04] <jks> I read the announcement yesterday about 0.53... but the link for RPM downloads in the announcement has only 0.52... anyone knows if RPMs will be built regularly?
[16:08] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[16:08] * stass (stas@ssh.deglitch.com) has joined #ceph
[16:09] * aliguori (~anthony@32.97.110.59) has joined #ceph
[16:11] * tontsa (~tontsa@solu.fi) has joined #ceph
[16:13] <match> jks: I build from the spec file in with the source - generally builds pretty fast on my desktop box, and the dependencies aren't too unusual to satisfy
[16:13] <jks> match: I'll try that instead - just odd that the announcements link to the 0.52 rpms, so I thought perhaps there was some problem building RPMs for 0.53
[16:14] <jks> is there documentation on the recommended way to upgrade? - or can I just shut down the osd/mon/mds on a server, rpm -U the new RPM, and start the osd/mon/mds again?
[16:15] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[16:15] <match> jks: Don't know the 'official' answer, but for me I upgrade the rpms while the service is running, then just do 'service ceph restart' (obviously one by one, in case of issues!)
[16:15] <jks> simple enough for me ;-)
[16:16] * lofejndif (~lsqavnbok@9KCAACE83.tor-irc.dnsbl.oftc.net) has joined #ceph
[16:19] * pentabular (~sean@70.231.129.172) Quit (Quit: pentabular)
[16:20] * cdblack (c0373626@ircip3.mibbit.com) has joined #ceph
[16:22] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[16:24] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:32] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[16:41] <scheuk>
[16:41] * scheuk (~scheuk@67.110.32.249.ptr.us.xo.net) Quit (Quit: Leaving)
[16:45] * scheuk (~scheuk@67.110.32.249.ptr.us.xo.net) has joined #ceph
[16:55] <scheuk> I have some good ceph osd failure scenario questions
[16:56] * dty_ (~derek@129-2-129-154.wireless.umd.edu) has joined #ceph
[16:56] * dty (~derek@129-2-129-154.wireless.umd.edu) Quit (Read error: Connection reset by peer)
[16:56] * dty_ is now known as dty
[16:57] <scheuk> If you had separate disks for the OSD and Journal, and you lose either one of them (the disk), would it be best to just wipe clean and re-fill the whole node
[16:57] <scheuk> or can you specifically recover if the OSD drive or Journal Drive fails
[17:02] * James259 (~James259@94.199.25.228) Quit (Ping timeout: 480 seconds)
[17:04] <tnt> good question, I'm curious to know the answer too :)
[17:10] * jlogan1 (~Thunderbi@2600:c00:3010:1:4fe:8250:70f9:cd1c) has joined #ceph
[17:11] * oxhead (~oxhead@nom0065903.nomadic.ncsu.edu) Quit (Remote host closed the connection)
[17:12] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:17] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:29] * oliver1 (~oliver@jump.filoo.de) has left #ceph
[17:29] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[17:30] <sage> joao: looks like he's sending html formatted email, which vger rejects
[17:31] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) has joined #ceph
[17:33] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Read error: Connection reset by peer)
[17:33] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[17:36] <joao> sage, I asked him about that and he said he was sending plain-text
[17:37] <sage> hmm maybe, his email to me privately was html :)
[17:37] <joao> <joao> and you're trying to send a plain-text email?
[17:37] <joao> <joao> no html, right?
[17:37] <joao> <stomper> i guess i need to use some other account then
[17:37] <joao> <stomper> yeah , plain txt
[17:38] <joao> maybe he's sending html emails without noticing it though
[17:39] <sage> vger is pretty aggressive about filtering spam too, so maybe his isp was naughty recently or something
[17:47] * loicd (~loic@12.180.144.3) Quit (Ping timeout: 480 seconds)
[17:47] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:49] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[17:57] * sagelap (~sage@41.sub-70-197-148.myvzw.com) has joined #ceph
[17:57] <sagelap> elder: that return value fix is only important for format 2 images right?
[17:58] * ChanServ sets mode +o sagelap
[17:58] * sagelap changes topic to 'v0.53 is out'
[17:58] <elder> Yes. Only v2 images have image id's.
[17:58] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[17:58] <sagelap> great thanks!
[17:58] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[17:59] * tnt (~tnt@246.121-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:04] <sagelap> slang: there?
[18:06] * loicd (~loic@63.133.198.91) has joined #ceph
[18:07] <sagelap> slang: wip-client-smb looks good, the one thing missing is a test for the relative symlink fix
[18:09] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[18:09] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[18:12] * joshd (~joshd@63.133.198.91) has joined #ceph
[18:12] * miroslavk (~miroslavk@63.133.196.10) has joined #ceph
[18:13] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[18:13] * synapsr (~synapsr@63.133.196.10) Quit (Remote host closed the connection)
[18:14] * loicd1 (~loic@63.133.198.91) has joined #ceph
[18:14] * loicd (~loic@63.133.198.91) Quit (Read error: Connection reset by peer)
[18:16] <benpol> So the performance degradations people are seeing over time with btrfs, people feel that
[18:16] <benpol> they're related to fragmentation?
[18:16] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Ping timeout: 480 seconds)
[18:16] * Tv_ (~tv@38.122.20.226) has joined #ceph
[18:17] <benpol> if so, have there been any improvements in performance over time with the auto-defrag option enabled for the btrfs mount points?
[18:18] <Tv_> joao: vger is a *huge* mailing list host, and it's quite unlikely to have issues, apart from a bad attitude toward non-old school users
[18:19] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:19] * NaioN (stefan@andor.naion.nl) has joined #ceph
[18:19] <Tv_> http://vger.kernel.org/majordomo-info.html#taboo
[18:24] <nhm> benpol: fragmentation might be part of it. We notice writes get more seeky over time. It may be due to where on the disk file/directory metadata resides vs where data ends up as new data is written.
[18:24] <joao> Tv_, I know, but assuming that, like any other service, it might run into occasional issues is not that far-fetched :p
[18:25] <Tv_> joao: yeah it's just not the Occam's razor approved branch of reality ;)
[18:25] <joao> heh
[18:25] <jamespage> scuttlemonkey: glad you liked it - ping me if you find any issues....
[18:26] <Tv_> srsly i think vger handles more mail than usenet as a whole had posts when i got introduced to it
[18:26] <Tv_> and it's not like some social media vomit, this is long emails & patches etc
[18:26] <joao> I'm sure it does
[18:26] <Tv_> it's like staring at the fury of creation
[18:26] <Tv_> linux-kernel at least feels that way
[18:27] * loicd1 (~loic@63.133.198.91) Quit (Quit: Leaving.)
[18:27] <joao> just by the volume of traffic that linux-kernel imposes on my mail account, I'm assuming that's certainly a possibility
[18:27] * miroslavk (~miroslavk@63.133.196.10) Quit (Quit: Leaving.)
[18:27] <benpol> nhm: interesting, still one of the dark corners, huh? otherwise the initial performance numbers are so tantalizing!
[18:29] <joao> sagewk, pushed mon-coverity-fixes
[18:29] <scuttlemonkey> jamespage: will do, went a lot better after I talked to Mark Ramm et al and got the latest one from you and not the older one in /precise/
[18:30] <joao> tackled down some medium-priority ones, but will check for more
[18:30] <jamespage> scuttlemonkey, yeah - the new ones do things quite differently
[18:30] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[18:30] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) Quit ()
[18:30] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[18:33] <scuttlemonkey> jamespage: yeah, being able to spin the osds out individually was key... now I just need the gui stuff to come out so I can show people how fun it is :)
[18:33] * loicd (~loic@63.133.198.91) has joined #ceph
[18:34] * sagelap1 (~sage@2607:f298:a:607:572:128b:51f4:d83) has joined #ceph
[18:35] * deepsa (~deepsa@117.212.21.154) Quit (Quit: Computer has gone to sleep.)
[18:36] * deepsa (~deepsa@117.212.21.154) has joined #ceph
[18:36] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[18:36] * sagelap (~sage@41.sub-70-197-148.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:41] * loicd1 (~loic@63.133.198.91) has joined #ceph
[18:41] * loicd (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[18:42] <sagewk> joao: mon fixups look good
[18:42] <joao> cool thanks
[18:43] <sagewk> joao: the authmonitor one could probably just call get_auth and use the return value as the condition for the preceding if block
[18:43] <sagewk> instead of calling contains and then asserting the subsequent get succeeds
[18:44] <joao> yeah, that sounds more adequate; fixing it
[18:47] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[18:48] * jlogan1 (~Thunderbi@2600:c00:3010:1:4fe:8250:70f9:cd1c) Quit (Ping timeout: 480 seconds)
[18:48] <joao> sagewk, fixed and pushed (w/ rebase)
[18:52] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) Quit (Quit: Leaving.)
[18:56] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:58] * loicd (~loic@63.133.198.91) has joined #ceph
[18:58] * loicd1 (~loic@63.133.198.91) Quit (Quit: Leaving.)
[19:00] * dty (~derek@129-2-129-154.wireless.umd.edu) Quit (Quit: dty)
[19:01] <sagewk> jamespage: i was about to merge in those libceph-java packaging changes.. any final comments/concerns?
[19:03] * joshd (~joshd@63.133.198.91) Quit (Ping timeout: 480 seconds)
[19:05] <jamespage> sagewk, I've not had time to build test the last set of updates but they looked OK
[19:05] <jamespage> aside from the sparse package descriptions ;-)
[19:06] * joshd (~joshd@63.133.198.91) has joined #ceph
[19:07] * dty (~derek@129-2-129-154.wireless.umd.edu) has joined #ceph
[19:08] * dty_ (~derek@129-2-129-154.wireless.umd.edu) has joined #ceph
[19:08] * dty (~derek@129-2-129-154.wireless.umd.edu) Quit (Read error: Connection reset by peer)
[19:08] * dty_ is now known as dty
[19:13] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[19:13] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[19:14] <jks> how can I mount cephfs read/write with the kernel driver? (I have tried mount option rw)
[19:14] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[19:16] * scalability-junk (~stp@188-193-208-44-dynip.superkabel.de) Quit (Quit: Leaving)
[19:17] * LarsFronius (~LarsFroni@95-91-242-153-dynip.superkabel.de) has joined #ceph
[19:18] <Tv_> jks: it's rw by default -- perhaps you're trying to write a root-owned directory as non-root
[19:19] * oxhead (~oxhead@nom0065863.nomadic.ncsu.edu) has joined #ceph
[19:20] * cowbell (~sean@adsl-70-231-129-172.dsl.snfc21.sbcglobal.net) has joined #ceph
[19:23] * yehudasa_ (~yehudasa@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[19:27] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[19:30] <slang> ../../src/test/libcephfs/test.cc:179: Failure
[19:30] <slang> Value of: ceph_readdir(cmount, ls_dir) == NULL
[19:30] <slang> Actual: false
[19:30] <slang> Expected: true
[19:30] <slang> sagewk: is that the error you're seeing?
[19:30] <sagewk> slang: yeah
[19:30] <sagewk> line 178 in my case, but close enough
[19:31] <slang> just got it for a run I did
[19:32] <sagewk> k. i have a client 20 log for a failed run, if that's useful
[19:32] * loicd (~loic@63.133.198.91) has joined #ceph
[19:34] <sagewk> slang: opened 3339, attached log
[19:35] <slang> sagewk: this might just be the test
[19:35] <slang> rewinddir invalidates the offset cookie that telldir returns
[19:35] <sagewk> yeah
[19:36] <sagewk> yep.
[19:36] <slang> so I can't do that in the test
[19:36] <slang> sagewk: I think we can just whack that part of the test, it was meant to check that we can seekdir to the end, but there's no good way to do that
[19:37] <sagewk> k
[19:38] <sagewk> fwiw the posix seekdir interface is a pile of crap; it may have to be okay to just handle the basic cases of seekdir(telldir()) and rewinddir() and not too much else.
[19:38] * BManojlovic (~steki@212.200.241.182) has joined #ceph
[19:38] <sagewk> i forget exactly how much of it we do correctly now
[19:39] * slang nods
[19:39] <Tv_> oh god telldir()
[19:40] <slang> sagewk: not having to return the same order makes it easier to implement though :-)
[19:40] <Tv_> i've been wondering what would happen if i patched glibc to not support seekdir() or anything but beginning-of-dir
[19:40] <jks> Tv_, nope, I can't write to it even with root privileges
[19:40] <Tv_> i frankly don't think anything significant would even break
[19:40] <jks> Tv_, if I FUSE mount it, it works as expected
[19:40] <Tv_> jks: can you copy-paste?
[19:41] <jks> Tv_, sure what do you want me to copy-paste?
[19:41] <Tv_> jks: the command you tried and any output
[19:41] <Tv_> jks: and perhaps output of "mount" to show it really is mounted, etc
[19:42] <slang> Tv_: yeah I wonder
[19:42] <jks> Tv_, for example: "touch test" gives: touch: cannot touch 'test': Permission denied
[19:42] * yehudasa_ (~yehudasa@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[19:42] <jks> Tv_, mount shows that the file system is indeed mounted... I can write to the mount point fine when cephfs is not mounted
[19:43] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Quit: Leaving.)
[19:43] <jks> Tv_, my mount command: mount -t ceph 10.0.0.1:6789:/ /mnt/osd -vv -o name=admin,secret=(removed)
[19:44] <Tv_> jks: anything in dmesg?
[19:45] <jks> Tv_, not really, no
[19:46] <jks> Tv_, I just reran the commands... one thing, I got this warning message when I run the mount command: mount: error writing /etc/mtab: Invalid argument
[19:46] <jks> the file system mounted fine however, and it is listed in /etc/mtab
[19:47] <jks> perhaps it is some kind of selinux restriction (just guessing here)
[19:47] * synapsr (~synapsr@63.133.198.91) Quit (Remote host closed the connection)
[19:48] * scuttlemonkey_ (~scuttlemo@63.133.198.36) has joined #ceph
[19:49] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit (Read error: Connection reset by peer)
[19:49] * dty (~derek@129-2-129-154.wireless.umd.edu) Quit (Read error: Connection reset by peer)
[19:49] * dty (~derek@129-2-129-154.wireless.umd.edu) has joined #ceph
[19:51] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[19:52] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:55] * dmick (~dmick@2607:f298:a:607:1c99:a06a:7608:e987) has joined #ceph
[19:55] * dmick (~dmick@2607:f298:a:607:1c99:a06a:7608:e987) has left #ceph
[19:55] * dmick (~dmick@2607:f298:a:607:1c99:a06a:7608:e987) has joined #ceph
[19:55] * ChanServ sets mode +o dmick
[19:55] * scuttlemonkey_ (~scuttlemo@63.133.198.36) Quit (Quit: This computer has gone to sleep)
[19:59] * joshd (~joshd@63.133.198.91) Quit (Ping timeout: 480 seconds)
[20:02] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[20:03] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[20:06] * gregaf (~Adium@38.122.20.226) Quit (Quit: Leaving.)
[20:10] * gregaf (~Adium@2607:f298:a:607:f0f2:1a5d:9a78:9ac7) has joined #ceph
[20:15] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[20:16] * renzhi (~xp@180.172.185.154) has joined #ceph
[20:18] <elder> sagewk, if I'm right about the root cause of this rbd bug, it could explain some bad rbd performance (and the fix could remedy that)
[20:18] <elder> Fingers crossed.
[20:18] * joshd (~joshd@63.133.198.91) has joined #ceph
[20:18] <renzhi> hi all
[20:18] <gregaf> scheuk: when you were asking about lost journal disks, if you're using btrfs then you shouldn't have to wipe the data store and recover everything, because it has consistent checkpoints it can move forward from
[20:19] <renzhi> why is one down mon also bring down the whole cluster?
[20:19] <gregaf> if you're using xfs/ext4/anything else, that is unfortunately not the case so losing the journal is generally equivalent to losing the store
[20:19] <gregaf> renzhi: how many monitors do you have in your cluster, and how many are active?
[20:19] <gregaf> and what do you mean "bring down the whole cluster"?
[20:19] <renzhi> gregaf: 3
[20:20] <renzhi> when one mon goes down, basically I can't connect to the cluster anymore
[20:20] <renzhi> ceph command does not work either
[20:20] <gregaf> do you have the monitor up or down right now?
[20:21] <renzhi> they are all up now, and trying to recover, but I think we are in quite deep trouble
[20:21] <lxo> still no 0.53 tarballs?!?
[20:21] <scheuk> gregaf: that's what I thought, we are using XFS
[20:21] <gregaf> renzhi: what's the output of ceph -s?
[20:22] <renzhi> one mon went down, we had to hard-reboot the machine, and now it's trying to recover
[20:22] <gregaf> glowell1: people were asking about RPMs for 0.53 earlier
[20:22] <renzhi> ceph -s takes a long long time to get back
[20:22] <scheuk> gregaf: is the OSD smart enough to go offline if it loses either the journal or osd disk?
[20:23] <gregaf> it'll quit if any write failures get reported to it, yes
[20:24] <joao> renzhi, but does ceph -s end up outputting something?
[20:24] <glowell1> gregaf: there are rpms for centos6 and fedora17 under rpm-testing.
[20:25] <renzhi> joao: finally got something back, which does not look pretty to me
[20:25] <renzhi> root@s200001:~# ceph -s
[20:25] <renzhi> health HEALTH_WARN 3690 pgs backfill; 11287 pgs degraded; 5238 pgs down; 98768 pgs peering; 59313 pgs recovering; 23945 pgs stale; 141020 pgs stuck inactive; 23945 pgs stuck stale; 203939 pgs stuck unclean; recovery 6562890/23041820 degraded (28.483%); 76790/7680621 unfound (1.000%); mds a is laggy; 4/38 in osds are down; 1 mons down, quorum 0,1 a,b
[20:25] <renzhi> monmap e1: 3 mons at {a=10.1.0.11:6789/0,b=10.1.0.13:6789/0,c=10.1.0.15:6789/0}, election epoch 176, quorum 0,1 a,b
[20:25] * synapsr (~synapsr@63.133.198.91) Quit (Remote host closed the connection)
[20:25] <renzhi> osdmap e3205: 76 osds: 34 up, 38 in
[20:25] <renzhi> it said one mon down, but all mons are running
[20:26] * cowbell is now known as pentabular
[20:26] <scheuk> gregaf: that's what I was hoping for
[20:26] <scheuk> :)
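Following up on gregaf's journal point above: on xfs/ext4 a truly lost journal generally means rebuilding that OSD, but if the journal device is only being replaced and the OSD can still be stopped cleanly, the journal itself can be recreated. A minimal sketch, assuming the sysvinit ceph script and an example OSD id of 3:

    service ceph stop osd.3              # stop the daemon so the journal can be flushed
    ceph-osd -i 3 --flush-journal        # write any pending journal entries into the object store
    # edit ceph.conf so "osd journal" points at the new device or file, then:
    ceph-osd -i 3 --mkjournal
    service ceph start osd.3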
[20:26] * pentabular is now known as Guest2100
[20:26] <joao> renzhi, what version are you running?
[20:27] * Guest2100 is now known as pentabular
[20:27] <renzhi> joao: argonaut
[20:27] <joao> on all three monitors?
[20:27] <renzhi> yes
[20:27] <renzhi> we run the same version on all machines
[20:28] <gregaf> well, only two of them are in quorum so if all three are running there's a problem somewhere
[20:28] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[20:28] <renzhi> I checked that all osds are running, but the ceph -s command shows something weird
[20:28] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit (Quit: This computer has gone to sleep)
[20:28] <joao> okay, could you please try './ceph --admin-daemon=/path/to/mon.c.asok mon_status' on mon.c's node?
[20:28] <joao> I can't really recall what's the default path to the admin socket
[20:29] <renzhi> ok, let me try that
[20:29] <sagewk> elder: nice
[20:31] <scheuk> Now I have a question about OSDs flapping and going offline while another OSD is offline and while the PGs are being repaired.
[20:31] <scheuk> we are using ceph .48.2
[20:32] <scheuk> with 6 OSDs
[20:32] <scheuk> 1 on each server
[20:32] <scheuk> While one OSD was offline due to a btrfs problem
[20:33] <renzhi> joao: here's the result:
[20:33] <renzhi> root@s300001:/# ceph --admin-daemon=/var/run/ceph/ceph-mon.c.asok mon-status
[20:33] <scheuk> We noticed that while the PGs were being repaired, 2 of the other OSDs went offline
[20:33] <renzhi> read only got 0 bytes of 4 expected for response length; invalid command?
[20:33] <scheuk> Is there any way to mitigate that problem?
[20:33] <joao> renzhi, 'mon_status' instead of 'mon-status'
[20:33] * joshd1 (~joshd@63.133.198.91) has joined #ceph
[20:33] * joshd (~joshd@63.133.198.91) Quit (Read error: Connection reset by peer)
[20:34] <renzhi> doh
[20:35] <renzhi> joao: here it is
[20:35] <renzhi> root@s300001:/# ceph --admin-daemon=/var/run/ceph/ceph-mon.c.asok mon_status
[20:35] <renzhi> { "name": "c",
[20:35] <renzhi> "rank": 2,
[20:35] <renzhi> "state": "slurping",
[20:35] <renzhi> "election_epoch": 167,
[20:35] <renzhi> "quorum": [],
[20:35] <glowell1> ceph 0.53 tarballs have been pushed to ceph.com/download.
[20:35] <renzhi> "outside_quorum": [
[20:35] <renzhi> "c"],
[20:35] <renzhi> "slurp_source": "mon.0 10.1.0.11:6789\/0",
[20:35] <renzhi> "slurp_version": { "osdmap": 3212,
[20:35] <renzhi> "pgmap": 5022700},
[20:35] <renzhi> "monmap": { "epoch": 1,
[20:35] <renzhi> "fsid": "35c281c2-9997-4775-982e-24810da15e6b",
[20:35] <renzhi> "modified": "2012-07-17 16:11:07.215231",
[20:35] <renzhi> "created": "2012-07-17 16:11:07.215231",
[20:35] <renzhi> "mons": [
[20:35] <renzhi> { "rank": 0,
[20:35] <renzhi> "name": "a",
[20:35] <renzhi> "addr": "10.1.0.11:6789\/0"},
[20:35] <renzhi> { "rank": 1,
[20:35] <renzhi> "name": "b",
[20:35] <renzhi> "addr": "10.1.0.13:6789\/0"},
[20:35] <renzhi> { "rank": 2,
[20:35] <renzhi> "name": "c",
[20:35] <renzhi> "addr": "10.1.0.15:6789\/0"}]}}
[20:36] * loicd (~loic@63.133.198.91) has joined #ceph
[20:36] <joao> renzhi, looks like it's catching up with the other monitors in order to join the quorum
[20:37] <joao> I'm unsure of how long it usually takes; does anyone have an idea?
[20:37] <renzhi> that's normal?
[20:38] <renzhi> because it wasn't mon.c that crashed, it was mon.a
[20:39] <joao> renzhi, somehow mon.a ended up forming a quorum with mon.b, and mon.c ended up with an old diverging state from the remaining quorum
[20:39] <joao> we can check the logs to see if mon.c has been like that for too long; maybe there's another issue there
[20:39] <gregaf> not diverging, just behind
[20:39] <joao> err, yes
[20:39] <renzhi> joao: forgive my ignorance here, which part of the message showed that?
[20:39] <joao> behind
[20:40] <joao> <renzhi> "slurp_source": "mon.0 10.1.0.11:6789\/0",
[20:40] <joao> <renzhi> "slurp_version": { "osdmap": 3212,
[20:40] <joao> <renzhi> "pgmap": 5022700},
[20:40] <gregaf> lunchtime, later folks
[20:41] <joao> it's slurping taking mon.a as its source; and mon.a and mon.b are in the quorum as per your ./ceph -s before
[20:41] <joao> so it's fair to assume that a and b formed a quorum, and c is catching up with 'a'
[20:41] <renzhi> so it's normal that ceph -s command would take a long time to return in this case?
[20:42] <renzhi> because this really affects the application, which connects to ceph via the rados api
[20:42] <joao> renzhi, ceph -s will take some time until the monitors form a quorum
[20:43] <renzhi> ok
[20:44] <renzhi> joao: another ignorant question. could you clarify this line from ceph -s?
[20:44] <joao> but once quorum is formed, I find it odd that it would take so long
[20:44] <joao> unless the monitors are caught in some sort of election cycle
[20:44] <joao> in that case, I'd have to look into the logs from the three monitors
[20:44] <renzhi> osdmap e3222: 76 osds: 23 up, 39 in
[20:44] <joao> if that's what's happening, just upload them somewhere and I will take a look :)
[20:44] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[20:44] <renzhi> joao: normally it didn't take very long, but this time, it's crazy
[20:46] <joao> renzhi, does it still take too long to reply?
[20:46] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:46] <renzhi> yes
[20:46] <renzhi> the mon.c log still show this:
[20:46] <renzhi> 2012-10-18 02:38:11.030299 7f137ba94700 1 mon.c@2(slurping) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
[20:47] <joao> renzhi, that's normal output when a client contacts a monitor that is not in the quorum
[20:48] <joao> I don't know how big your mon store is, but I think it's worth taking a look at the logs and check if there's some other issue there
[20:48] <joao> could you upload them somewhere? pastebin maybe?
[20:48] <renzhi> the whole log file?
[20:49] <joao> from mon.c and mon.a, if that's possible
[20:49] <renzhi> hold on
[20:49] <joao> if they are too big, a considerable portion of, say, the last hour?
[20:49] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[20:50] <joao> last 30 minutes would probably be fine though
[20:50] * Ryan_Lane (~Adium@63.133.198.91) has joined #ceph
[20:50] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) has joined #ceph
[20:55] <renzhi> joao: most interesting part in the last 30 minutes
[20:55] <renzhi> http://pastebin.com/CSeBbMd4
[20:55] <renzhi> for mon.c
[20:55] <joao> thanks
[20:55] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[20:56] <joao> oh...
[20:56] <joao> that's not nearly enough :(
[20:56] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit ()
[20:56] <joao> besides, I have a feeling that you'll need to run the monitor with higher debug levels
[20:56] <renzhi> ok, the log file is over 200MB, I'll grab more
[20:58] <renzhi> how would you recommend I proceed?
[20:59] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[21:01] * dty (~derek@129-2-129-154.wireless.umd.edu) Quit (Remote host closed the connection)
[21:01] <joao> renzhi, any chance you can bzip them and drop them somewhere for me to download?
[21:01] * dty (~derek@testproxy.umiacs.umd.edu) has joined #ceph
[21:01] <renzhi> http://pastebin.com/5L0FGZVP
[21:02] <renzhi> would this give any hint?
[21:02] <joshd1> scheuk: check out http://ceph.com/docs/master/cluster-ops/troubleshooting-osd/#flapping-osds
[21:03] <joao> renzhi, how are you synchronizing the clocks between servers?
[21:03] <joshd1> scheuk: generally this is less of a problem with more osds
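For the flapping scheuk describes, a couple of ceph.conf knobs are commonly discussed alongside the troubleshooting page joshd1 links. The option names below are standard ceph settings, but the values are only illustrative, not recommendations:

    [osd]
        osd heartbeat grace = 35          # default 20s; more slack before peers report an OSD down under heavy recovery load
    [mon]
        mon osd down out interval = 600   # default 300s; wait longer before marking a down OSD out and triggering more data movement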
[21:03] <renzhi> ntp
[21:04] <joao> renzhi, I've never dealt with this before; I've heard of issues caused by clock skewing, and this might be it, but if it is in fact an ntp issue, I have no idea how to deal with it
[21:05] <renzhi> I have an ntp server which syncs with an external source, and all nodes sync to that
[21:05] * BManojlovic (~steki@212.200.241.182) Quit (Quit: Ja odoh a vi sta 'ocete...)
[21:05] <joao> but this appears to happen before we start slurping though
[21:06] <joao> actually, after?
[21:06] <joao> looks like the monitor is no longer slurping
[21:06] <renzhi> I don't think so
[21:06] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[21:06] <joao> could you please run 'ceph --admin-daemon=/var/run/ceph/ceph-mon.c.asok mon_status' again?
[21:06] <joao> don't need to paste the whole thing
[21:07] <joao> just the first 3 or 4 lines
[21:07] <joao> or check for the value of 'state'
[21:07] <renzhi> root@s300001:/var/log/ceph# ceph --admin-daemon=/var/run/ceph/ceph-mon.c.asok mon_status
[21:07] <renzhi> { "name": "c",
[21:07] <renzhi> "rank": 2,
[21:07] <renzhi> "state": "probing",
[21:07] <renzhi> "election_epoch": 365,
[21:07] <renzhi> "quorum": [],
[21:07] <renzhi> "outside_quorum": [
[21:07] <renzhi> "c"],
[21:08] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[21:08] <joao> gregaf, sagewk, any ideas on how to solve the clock skewing shown here? http://pastebin.com/5L0FGZVP
[21:09] <sagewk> joao: if you add a - clock: task to your teuth job, it'll sync up the clocks...
[21:09] <sagewk> joao: that's on plana?
[21:09] <joao> sagewk, no, it's renzhi's log
[21:10] <scheuk> joshd1: that makes sense as there would be more osds handling the load
[21:10] <sagewk> joao: oh, his clocks are way way skewed (30 seconds). the mons won't work in that case.
[21:10] <joao> my guess is that his monitor is unable to join the quorum due to the clock skew
[21:10] <sagewk> renzhi: you need to set up ntpd
[21:10] <renzhi> joao: this state means it's good now?
[21:10] <renzhi> "state": "peon",
[21:11] <renzhi> sagewk: I did, all servers are running ntpd
[21:11] <sagewk> the leases are based on absolute timestamps. there is some slack, but when the lease is 3-5 seconds and the clocks are off by 30 that's not gonna work
[21:11] <joao> renzhi, yeah, but the skew messages popped up on the log after a successful election
[21:11] <sagewk> stop ntpd, run ntpdate manually, and restart.. ntpd isn't very helpful about fixing itself when the clock is way off
[21:11] <sagewk> it's the leases that trigger it
[21:11] <renzhi> I just did that
[21:11] <joao> I suppose that if the clocks are not sync'ed, it will eventually hop out of the quorum
[21:11] * miroslavk (~miroslavk@63.133.198.36) has joined #ceph
[21:12] <joao> sagewk, oh, okay
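A minimal sketch of the resync sagewk suggests, assuming Debian/Ubuntu-style service names and a placeholder ntp server address:

    service ntp stop                  # the service may be called ntpd on other distros
    ntpdate ntp.example.internal      # one-shot clock step against your local ntp server (placeholder name)
    service ntp start
    date                              # compare across the mon nodes; with 3-5 second leases, a 30-second skew cannot work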
[21:12] <renzhi> here's the new status:
[21:12] <renzhi> root@s300001:/var/log/ceph# ceph --admin-daemon=/var/run/ceph/ceph-mon.c.asok mon_status
[21:12] <renzhi> { "name": "c",
[21:12] <renzhi> "rank": 2,
[21:12] <renzhi> "state": "peon",
[21:12] <renzhi> "election_epoch": 400,
[21:12] <renzhi> "quorum": [
[21:12] <renzhi> 0,
[21:12] <renzhi> 1,
[21:12] <renzhi> 2],
[21:12] <renzhi> "outside_quorum": [],
[21:12] <joao> renzhi, cool
[21:12] <renzhi> "monmap": { "epoch": 1,
[21:12] <renzhi> "fsid": "35c281c2-9997-4775-982e-24810da15e6b",
[21:12] <renzhi> "modified": "2012-07-17 16:11:07.215231",
[21:12] <renzhi> "created": "2012-07-17 16:11:07.215231",
[21:12] <renzhi> "mons": [
[21:12] <renzhi> { "rank": 0,
[21:12] <renzhi> "name": "a",
[21:12] <renzhi> "addr": "10.1.0.11:6789\/0"},
[21:12] <renzhi> { "rank": 1,
[21:12] <renzhi> "name": "b",
[21:12] <renzhi> "addr": "10.1.0.13:6789\/0"},
[21:12] <renzhi> { "rank": 2,
[21:13] <renzhi> "name": "c",
[21:13] <renzhi> "addr": "10.1.0.15:6789\/0"}]}}
[21:13] <renzhi> thanks for your help
[21:13] <joao> renzhi, if it remains in that state it should be okay
[21:14] <renzhi> alright
[21:14] <renzhi> I'd like some clarification for this line from ceph -s:
[21:14] <renzhi> osdmap e3261: 76 osds: 22 up, 22 in
[21:14] <renzhi> what are the last two values?
[21:15] <joao> I think it means that 22 osds are running out of the 22 that are in the cluster
[21:15] <joao> let me check
[21:16] <renzhi> weird, I have 76, and all of them are running now
[21:21] <joao> renzhi, can you try 'ceph health' ?
[21:22] <renzhi> I checked that all osd processes are running, but ceph osd dump shows many of them down
[21:22] <renzhi> root@s200001:~# ceph health
[21:22] <renzhi> HEALTH_WARN 2526 pgs backfill; 20074 pgs degraded; 10865 pgs down; 129863 pgs peering; 42683 pgs recovering; 85724 pgs stale; 158302 pgs stuck inactive; 85724 pgs stuck stale; 206972 pgs stuck unclean; recovery 4436454/22831396 degraded (19.431%); 243095/7610472 unfound (3.194%)
[21:23] * BManojlovic (~steki@212.200.241.182) has joined #ceph
[21:23] <sjust> renzhi: it means that there are 76 osds, of which 22 are up and 22 have data assigned to them
[21:23] <joao> renzhi, sorry, dinner is ready; bbiab
[21:24] <renzhi> thanks joao
[21:24] <renzhi> sjust: this is my suspicion, but definitely not good.
[21:24] <renzhi> I verified on all nodes that all osd processes are running though
[21:25] <sjust> ceph osd dump will tell you which osds are down
[21:25] <renzhi> yes, I did that, and it shows many osd down
[21:25] <renzhi> but why?
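For reference, two standard ways to see which OSDs the monitors currently consider down (ceph osd dump is the command used above; ceph osd tree shows the same state arranged by the CRUSH hierarchy):

    ceph osd dump | grep down
    ceph osd tree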
[21:25] <sjust> if you can restart one of those with debug osd = 20 and debug ms = 20, we should get an idea of what's going on
[21:25] <sjust> sorry, debug ms = 1
[21:26] * joshd1 (~joshd@63.133.198.91) Quit (Ping timeout: 480 seconds)
[21:28] <renzhi> how can I do that on the command line?
[21:29] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[21:29] <renzhi> sorry for the ignorance, ceph has been running great so far, this is the first mess up
[21:29] <sjust> you'll want to add the following lines under the osd section of the ceph.conf
[21:29] <sjust> debug osd = 20
[21:29] <sjust> debug ms = 1
[21:29] <sjust> then restart the osd
[21:29] <renzhi> ok
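As sjust describes, the settings go in the [osd] section of ceph.conf on the affected node, followed by a restart of that daemon (osd.N is a placeholder id):

    [osd]
        debug osd = 20
        debug ms = 1

    service ceph restart osd.N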
[21:30] <gregaf> sjust: renzhi: are you using cephx?
[21:30] <renzhi> gregaf: yes
[21:30] <gregaf> that also depends on reasonable clock times; it can be much looser than the monitors but if they're off by >10 minutes or something that might do it
[21:30] <gregaf> (just a thing to check before you go digging through logs)
[21:31] * miroslavk (~miroslavk@63.133.198.36) Quit (Quit: Leaving.)
[21:32] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) has joined #ceph
[21:33] * stp (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[21:33] * stp (~stp@188-193-211-236-dynip.superkabel.de) Quit ()
[21:38] * Ryan_Lane (~Adium@63.133.198.91) Quit (Quit: Leaving.)
[21:40] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[21:40] * synapsr (~synapsr@63.133.198.91) Quit (Remote host closed the connection)
[21:42] <renzhi> I got this in one of the osd log:
[21:42] <renzhi> 2012-10-18 03:33:44.328129 7f9c3ec80780 0 filestore(/disk7/osd.16) lock_fsid failed to lock /disk7/osd.16/fsid, is another ceph-osd still running? (11) Resource temporarily unavailable
[21:42] <renzhi> 2012-10-18 03:33:44.332148 7f9c3ec80780 -1 filestore(/disk7/osd.16) FileStore::mount: lock_fsid failed
[21:42] <renzhi> 2012-10-18 03:33:44.375261 7f9c3ec80780 -1 ** ERROR: error converting store /disk7/osd.16: (16) Device or resource busy
[21:42] <renzhi> 2012-10-18 03:40:58.949920 7f4849b76780 0 filestore(/disk7/osd.16) lock_fsid failed to lock /disk7/osd.16/fsid, is another ceph-osd still running? (11) Resource temporarily unavailable
[21:42] <renzhi> 2012-10-18 03:40:58.957146 7f4849b76780 -1 filestore(/disk7/osd.16) FileStore::mount: lock_fsid failed
[21:42] <renzhi> 2012-10-18 03:40:59.079931 7f4849b76780 -1 ** ERROR: error converting store /disk7/osd.16: (16) Device or resource busy
[21:44] <renzhi> seems to be unable to shut down the osd
[21:44] <renzhi> when I tried to restart
[21:49] <elder> Is there a problem with the ceph master branch at the moment?
[21:50] <renzhi> why can't I kill the osd process, even with kill -9?
[21:50] <elder> I'll try again. I got an unexpected error in a simple teuthology run.
[21:52] <gregaf> renzhi: what are the process states?
[21:52] <gregaf> are they stuck in disk IO, perhaps? that's the most common cause
[21:54] <renzhi> is there a way to force that?
[21:56] <gregaf> umm, sjust? somebody else?
[21:56] <gregaf> (weakest linux-fu in the room, right here)
[21:57] <sjust> renzhi: gregaf: sorry, got sidetracked
[21:58] <sjust> hmm, I'm not sure how to kill them
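The lock_fsid message pasted above means the old ceph-osd still holds the lock on the fsid file, and a process that ignores kill -9 is usually stuck in uninterruptible disk I/O, as gregaf suggests. A hedged sketch for confirming both, using the path from the log:

    fuser -v /disk7/osd.16/fsid                  # which pid still has the lock file open
    ps -o pid,stat,wchan:30,cmd -C ceph-osd      # STAT of "D" means uninterruptible disk sleep; only finishing the I/O (or a reboot) clears it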
[21:58] <scheuk> I would like to ask the community how they would engineer OSDs with this hardware: 6 servers, 2 RAID controllers w/BBWRC, 8 disks per controller, 1 of which is the OS drive, for a total of 15 usable drives. We will be using this mostly for RBDs for write-heavy VMs
[22:00] <sjust> renzhi: are you using btrfs?
[22:00] <renzhi> sjust: no, on xfs
[22:01] <sjust> can you reboot the node?
[22:01] <elder> Yep, there's some sort of issue with the ceph master branch. If anyone wants details let me know.
[22:01] <renzhi> trying to, but it got stuck with the osd processes
[22:01] <renzhi> I'm doing it remotely
[22:01] <gregaf> elder: userspace or kernel?
[22:01] <elder> userspace
[22:01] <joao> elder, what happened?
[22:02] <elder> INFO:teuthology.task.ceph:Adding keys to all mons...
[22:02] <elder> INFO:teuthology.task.ceph:Running mkfs on mon nodes...
[22:02] <elder> INFO:teuthology.orchestra.run.err:2012-10-17 12:52:52.775617 7f43ad06e780 -1 store(/tmp/cephtest/data/mon.a) write_bl_ss_impl failed to create dir /tmp/cephtest/data/mon.a/mkfs: (17) File exists
[22:02] <joao> crap
[22:03] <joao> that must be one of my patches sage applied to master an hour ago or so
[22:03] <joao> I can only wonder *why* it is happening
[22:04] <elder> Well, me too...
[22:04] <gregaf> joao: it's got an unconditional mkdir in put_bl_ssn_map
[22:04] <gregaf> looks like he was shortcutting around the error checking
[22:05] <gregaf> but it needs to check for (r < 0 && r != -EEXIST)
[22:05] <gregaf> looks like the same in the others of that patch
[22:05] <gregaf> (e41caa190...)
[22:05] * houkouonchi-work (~linux@12.248.40.138) Quit (Read error: Connection reset by peer)
[22:05] <joao> yeah
[22:05] <joao> damn
[22:06] <gregaf> naughty joao! naughty sagewk!
[22:06] * Ryan_Lane (~Adium@63.133.198.91) has joined #ceph
[22:06] <gregaf> ;)
[22:06] <joao> it seemed soooo straightforward though :\
[22:06] <dmick> heh. yeah, ignoring error checking is always more straightforward :)
[22:09] <mikeryan> sagewk: sjust: can i get a review on wip_recovery_reserve
[22:09] <elder> There is no shortcut around error checking.
[22:09] <sjust> mikeryan: yeah
[22:09] <sagewk> joao: are you fixing or shall i?
[22:09] <elder> dmick, is there anything special in the stable branch that would prevent me from mapping a newly-created rbd image?
[22:10] <elder> Usage different or something?
[22:10] <sagewk> stable is ancient, yes.
[22:10] <elder> rbd create image1 --size=1024
[22:10] <elder> OK.
[22:10] <sagewk> next will work
[22:10] <elder> I'll try next then.
[22:10] <elder> Thanks sagewk
[22:10] <sagewk> er, well, it wouldn't support format 2 at least. :)
[22:10] <elder> I don't think I need it.
[22:11] <elder> I'm trying to reproduce the bio_merge problem
[22:11] <dmick> shouldn't be, no
[22:11] <joao> sagewk, working on it
[22:11] <sagewk> k
[22:11] <dmick> elder: are you seeing a problem?
[22:12] <joao> will test it this time with --mkfs, something that I forgot to do since I had a working mon setup around
[22:12] <joao> :\
[22:12] <elder> Never mind, I'm switching to "next" rather than "stable"
[22:12] <elder> dmick,
[22:12] <sjust> mikeryan: wip_recovery_reserve or wip_recovery_reserve_clean?
[22:12] <sjust> oh, upstream, d'oh
[22:14] * joshd (~joshd@63.133.198.91) has joined #ceph
[22:19] <mikeryan> sjust: yes, wip_recovery_reserve on ceph github
[22:19] <sjust> yup
[22:23] <joao> sagewk, should I push it directly onto master or would you like to review it?
[22:24] <sagewk> paste the diff to me?
[22:24] <joao> it's on mon-coverity-fixes
[22:24] <joao> okay
[22:24] <sagewk> oh hold on
[22:24] <sagewk> looks good to me
[22:25] <joao> should I push it to master then?
[22:26] <sagewk> yeah
[22:26] <joao> kay
[22:27] <joao> done
[22:29] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) has joined #ceph
[22:31] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:31] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) Quit (Read error: Connection reset by peer)
[22:31] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) has joined #ceph
[22:31] * maelfius1 (~mdrnstm@206.sub-70-197-141.myvzw.com) has joined #ceph
[22:31] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) Quit (Read error: Connection reset by peer)
[22:36] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[22:43] <sjust> mikeryan: I left some comments you'll want to take a look at, I'll keep looking
[22:43] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[22:48] <PerlStalker> What's the "right" way to cleanly shutdown a ceph cluster?
[22:49] <sagewk> killall will do the trick
[22:49] * Ryan_Lane (~Adium@63.133.198.91) Quit (Quit: Leaving.)
[22:50] <nhm> :)
[22:50] * loicd (~loic@63.133.198.91) has joined #ceph
[22:50] <dmick> clean shutdown: "I can promise you'll feel no pain."
[22:50] <PerlStalker> Lovely
[22:50] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[22:51] <dmick> I suppose if everything has to be robust in terms of failure anyway....
[22:51] <gregaf> it's better because it means your recovery systems are tested ;)
[22:51] <PerlStalker> I'm trying to find a procedure to bring every host in the cluster down. If all it takes is shutting down each node, I'm okay with that.
[22:52] <nhm> PerlStalker: having said that, I typically just shut down the services via "sudo service ceph stop"
[22:52] <nhm> PerlStalker: on ubuntu at least.
[22:52] * PerlStalker nods
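A short sketch of the two approaches mentioned above, assuming the Ubuntu-style init script these releases ship with:

    sudo service ceph stop                     # per node: stop every ceph daemon defined on that host
    sudo service ceph stop osd.3               # or stop a single daemon (example id)
    sudo killall ceph-mds ceph-osd ceph-mon    # sagewk's blunt alternative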
[22:54] <mikeryan> sjust: saw those in my inbox, looking
[22:54] <mikeryan> and thanks
[22:54] * joshd (~joshd@63.133.198.91) Quit (Ping timeout: 480 seconds)
[22:56] <renzhi> need help with those osds; been struggling for hours :( lots of osds are running, but ceph reports them as down
[22:57] <renzhi> how can I troubleshoot this issue now?
[22:58] * synapsr (~synapsr@63.133.198.91) Quit (Ping timeout: 480 seconds)
[22:59] * steki-BLAH (~steki@212.200.240.41) has joined #ceph
[23:00] <dmick> ceph osd dump is probably a good first thing
[23:00] <dmick> ^ renzhi
[23:00] <sjust> renzhi: have you been able to bring any of them back up by restarting them?
[23:00] <renzhi> dmick: I did, it reports that the osds are down, but actually the processes are running
[23:00] <renzhi> no
[23:01] <dmick> renzhi: I understand, I was just saying that showing the output of that command is probably useful to start with
[23:01] <renzhi> I even rebooted the machines; some of them came back, but there are still 20-something osds not coming back
[23:01] <dmick> (showing it to the channel, perhaps in a pastebin)
[23:02] <dmick> and you've checked network connectivity
[23:05] * BManojlovic (~steki@212.200.241.182) Quit (Ping timeout: 480 seconds)
[23:06] <sjust> renzhi: can you pick one down and restart it with debug osd = 20, debug filestore = 20
[23:06] <sjust> sorry, debug osd = 20, debug ms = 1
[23:06] <sjust> as you did before
[23:07] <sjust> basically, the error you saw above was due to the old process still running; we need to get an osd past that point to see the error
[23:07] <renzhi> hold on, trying to paste to pastebin again, but it suddenly wants me to log in
[23:07] <sjust> ah
[23:09] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[23:12] <renzhi> from ceph osd dump:
[23:12] <renzhi> http://pastebin.com/xCSrqycX
[23:13] <renzhi> the scary thing is that I keep losing osds as time passes
[23:14] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[23:14] <renzhi> after I rebooted all the servers, at some point, it reports 76 out of 76, but with 66 up. Then gradually, the number of osds decreases
[23:15] <renzhi> it's now only 49
[23:16] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[23:17] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has left #ceph
[23:19] <renzhi> osd log, with debug osd = 20, debug ms = 1
[23:19] <renzhi> http://pastebin.com/pfQ6yKvL
[23:19] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[23:20] * BManojlovic (~steki@212.200.243.180) has joined #ceph
[23:21] * miroslavk (~miroslavk@63.133.198.36) has joined #ceph
[23:21] <gregaf> how many physical nodes are these 76 OSDs on, and how big are the nodes?
[23:21] <renzhi> 8
[23:23] <renzhi> have been trying for hours to restore the service, that's our first big mess :(
[23:25] * steki-BLAH (~steki@212.200.240.41) Quit (Ping timeout: 480 seconds)
[23:26] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[23:27] <sjust> renzhi: is this all the output you got?
[23:27] <sjust> what does dmesg say on that node
[23:27] <sjust> ?
[23:27] * loicd (~loic@63.133.198.91) has joined #ceph
[23:27] <renzhi> I actually have a huge log
[23:27] <sjust> can you compress it and send it to cephdrop@ceph.com (sftp)
[23:27] <renzhi> any particular section you're looking for?
[23:28] <sjust> don't know yet
[23:28] <sjust> did the process eventually crash?
[23:28] <renzhi> the osd log file is over 2GB
[23:28] <sjust> it'll be smaller compressed
[23:29] * steki-BLAH (~steki@212.200.240.42) has joined #ceph
[23:33] * joshd (~joshd@63.133.198.91) has joined #ceph
[23:35] * BManojlovic (~steki@212.200.243.180) Quit (Ping timeout: 480 seconds)
[23:36] <renzhi> sjust: it's asking for a password?
[23:36] <sjust> yeah, I tried to pm you the password, did you see it?
[23:37] <renzhi> no
[23:37] <sjust> meh, it's asdf
[23:37] <renzhi> ok, got it
[23:38] <renzhi> uploading ceph-osd.19.log.gz
[23:38] <renzhi> very slow....
[23:39] <renzhi> it won't be done in half a day... :(
[23:39] <sjust> renzhi: did the process crash, or is it still running?
[23:39] <renzhi> it crashed
[23:39] <sjust> ok, in that case, give me the last 10000 lines
[23:39] <sjust> renzhi: are the osds full?
[23:40] <renzhi> no
[23:41] <sjust> let me know when the file with the last 10000 lines is uploaded
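A minimal sketch of trimming and uploading the log the way sjust asks, with example file names:

    tail -n 10000 /var/log/ceph/ceph-osd.19.log | gzip > log.gz
    sftp cephdrop@ceph.com            # then at the sftp prompt: put log.gz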
[23:41] * oxhead (~oxhead@nom0065863.nomadic.ncsu.edu) Quit (Remote host closed the connection)
[23:42] <sagewk> elder: how's it looking?
[23:43] * rektide (~rektide@deneb.eldergods.com) has joined #ceph
[23:43] <elder> My fix didn't make the problem go away.
[23:43] <elder> Not completely anyway. I'm working with Dan a bit, conferring.
[23:43] <elder> He's trying to learn about it by reproducing it in a VM and with a debugger I think.
[23:44] <elder> I'm pretty sure I have identified *a* problem though.
[23:44] <elder> And now I'm wondering if that bio_pair thing might also be contributing to some combined problem that leads to this issue. I just don't know though. I'm bummed the problem still showed up with my fix.
[23:48] <renzhi> sjust: uploading log.gz, the last 10000 lines of the osd log
[23:48] <renzhi> thanks for your help, I'm desperate :(
[23:49] * dty (~derek@testproxy.umiacs.umd.edu) Quit (Ping timeout: 480 seconds)
[23:49] <sjust> k
[23:49] * calebamiles (~caleb@65-183-128-164-dhcp.burlingtontelecom.net) has joined #ceph
[23:53] * synapsr (~synapsr@63.133.198.91) Quit (Remote host closed the connection)
[23:53] * synapsr (~synapsr@63.133.198.91) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.