#ceph IRC Log

IRC Log for 2013-06-08

Timestamps are in GMT/BST.

[0:02] <Tamil> jfriedly: could you please tell us more about the destructive tests you did and what you are trying to do to bring radosgw up and running again?
[0:08] <jfriedly> Tamil: We rebooted all the machines that ceph/radosgw was running on, starting with the ceph monitor. Then we only brought back the two largest OSDs (the others were lost when their drives were reimaged at boot time, that was poor planning I'm afraid)
[0:09] <jfriedly> We're trying to get radosgw running again with the old config to see if any of the files we uploaded are still there. Some of the ceph objects are definitely present, but we're not sure if any complete files survived
[0:10] <andrei> dmick, tamil: after the reboot, i've progressed further
[0:10] <andrei> but found a bug
[0:10] <Tamil> andrei: whats that?
[0:10] <andrei> ceph-deploy is not capable of working with /dev/disk/by-id/ devices
[0:10] <andrei> as when it prepares the partition
[0:11] <andrei> it just adds "1" at the end of the name
[0:11] <andrei> which is not correct
[0:11] <andrei> it should add -part1 at the end
[0:12] <andrei> so, when you run the create command and do not use names like /dev/sdc, the create command fails
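
A quick sketch of the naming scheme andrei describes (the by-id name below is made up): kernel device names take a bare "1" for the first partition, while /dev/disk/by-id symlinks use a "-part1" suffix, so appending "1" yields a path that never exists.

    /dev/sdc                                -> /dev/sdc1
    /dev/disk/by-id/scsi-35000c5001234abcd  -> /dev/disk/by-id/scsi-35000c5001234abcd-part1
    # appending "1" instead gives .../scsi-35000c5001234abcd1, which udev does not create
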
[0:13] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) has joined #ceph
[0:14] <andrei> i guess i can change the osd paths later
[0:14] <Tamil> andrei: but the command osd create requires disk and not disk id
[0:15] <andrei> to make sure my cluster doesn't fall apart when i add more disks to the enclosure
[0:15] <andrei> Tamil: the reason for using /dev/by-id is that it will always survive disk additions
[0:15] <andrei> i've had this happen to ceph before
[0:15] <andrei> when i've added more disks to the server
[0:15] * Jahkeup (~Jahkeup@209.248.26.24) Quit (Remote host closed the connection)
[0:15] <andrei> and ceph fell apart
[0:16] <andrei> because the disk letters have changed
[0:16] <andrei> so, device /dev/sda became /dev/sdd
[0:16] <andrei> and ceph osds didn't work
[0:16] <andrei> that's why i've gone for the by-id
[0:17] <andrei> which stays the same regardless of the letter that kernel gives the disk at each boot
[0:17] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[0:20] <andrei> thanks guys for your help!!!!
[0:20] <andrei> much appreciated.
[0:26] <Tamil> andrei: thanks, please file it as a bug
[0:27] <andrei> could you please let me know the link for bug submissions?
[0:27] <andrei> i've not done this before for ceph
[0:28] * LeaChim (~LeaChim@2.122.119.234) Quit (Read error: Operation timed out)
[0:29] <Tamil> sure, this should work- http://tracker.ceph.com/projects/ceph
[0:33] * mattbenjamin (~matt@aa2.linuxbox.com) Quit (Quit: Leaving.)
[0:34] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:36] * ScOut3R (~ScOut3R@54024172.dsl.pool.telekom.hu) has joined #ceph
[0:36] <yehuda_hm> jfriedly: it means that the gateway cannot communicate with the backend correctly.
[0:36] <yehuda_hm> make sure you can run 'rados lspools', 'rados ls -p .rgw'
[0:36] * tnt (~tnt@91.176.13.220) Quit (Ping timeout: 480 seconds)
[0:37] <yehuda_hm> if you can do that, try doing it with the user you're using on the gateway (rados -n client.foo ls -p .rgw)
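
A minimal sketch of the checks yehuda_hm suggests; the client name client.radosgw.gateway is only an example, substitute whatever user the gateway actually runs as.

    rados lspools                               # can we reach the cluster at all?
    rados ls -p .rgw                            # can we list the gateway's control pool?
    rados -n client.radosgw.gateway ls -p .rgw  # the same, but with the gateway's own key
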
[0:40] * diegows (~diegows@190.190.2.126) has joined #ceph
[0:40] <andrei> okay, created the bug report
[0:40] <andrei> http://tracker.ceph.com/issues/5283
[0:41] <jfriedly> yehuda_hm: Thanks, I can run rados lspools and I see the pool that we created yesterday, .rgw.buckets. But trying to list .rgw or .rgw.buckets just makes rados hang
[0:43] <jfriedly> ceph -w shows that 248/248 pgs are stale+active+clean, but I'm certain that some of the ceph objects should have been replicated to the two OSDs that we do have still
[0:44] <andrei> guys, is it safe to use more than one mds server?
[0:45] <andrei> i am reading some notes and it says that commercially there is no support if your cluster has more than one mds server
[0:45] <andrei> is there a reason for not having multiple mds?
[0:47] * jf-jenni (~jf-jenni@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[0:47] <sjust> jfriedly: ceph osd dump
[0:47] <jfriedly> sjust: https://gist.github.com/jfriedly/5732940
[0:48] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[0:49] <sjust> you seem to have 5/7 dead osds
[0:49] <sjust> you will need to bring that down to 1/7 dead osds I suspect
[0:50] <jfriedly> sjust: Those OSDs are gone forever, I'm afraid. Their drives were reimaged when we rebooted the cluster. That was poor planning on our part, but we wanted to see if anything survived the reboot
[0:51] <jfriedly> We marked them as lost with `ceph osd lost`
[0:51] <sjust> unless they were all in the same failure domain, the data is gone
[0:51] <jfriedly> We hadn't specified any failure domains, so it was just whatever the default was
[0:51] <sjust> yeah, the data is gone
[0:51] <gregaf> andrei: multiple MDSes exercise code paths in the MDS that are much less stable
[0:52] <sjust> you can --force-create-pg for each of the stale pgs
[0:52] <gregaf> we vet even single-MDS customers pretty carefully right now as well; it's not generally supported yet
[0:52] <jfriedly> sjust: Does that create new PGs that overwrite the old ones?
[0:52] <sjust> yes
[0:52] <jfriedly> Okay, thanks a bunch for your help guys
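
A rough sketch of what sjust proposes, assuming the sub-commands of that era (ceph pg dump_stale, ceph pg force_create_pg); recreating the PGs makes them empty, so whatever data they referenced is discarded.

    # recreate every stale PG as a new, empty PG (destructive for the old contents)
    for pg in $(ceph pg dump_stale | awk '/stale/ {print $1}'); do
        ceph pg force_create_pg "$pg"
    done
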
[0:53] <andrei> gregaf: okay
[0:53] <andrei> but would the cluster work if the mds server is down for maintenance?
[0:54] <gregaf> andrei: the MDS doesn't have any local state, so it's easy to have it run elsewhere
[0:54] <gregaf> for instance you can have two daemons running but only one of them active at a time, and the other will sit in "standby" and take over if the active one dies
[0:55] <andrei> is this option supported and stable?
[0:55] <andrei> because this is what i would like to have
[0:55] <gregaf> having a standby isn't any less stable than not having a standby, if that's what you mean
[0:55] * jf-jenni (~jf-jenni@50-0-250-146.dedicated.static.sonic.net) Quit (Remote host closed the connection)
[0:55] <gregaf> but I'm hesitant to attach the word "stable" to CephFS :(
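
A minimal sketch of the active/standby layout gregaf describes, assuming two MDS daemons named a and b on placeholder hosts; with max_mds left at its default of 1, the second daemon simply waits in standby and takes over if the active one dies.

    # ceph.conf: two MDS daemons; only one will be active at a time
    [mds.a]
        host = node1
    [mds.b]
        host = node2
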
[0:55] <andrei> I've just created my first production ceph cluster using ceph-deploy
[0:55] <andrei> thanks to the efforts of a lot of people ))))
[0:56] <andrei> i have previously done things manually changing ceph.conf
[0:56] <andrei> from what I can see with ceph-deploy the ceph.conf file is pretty basic
[0:56] <andrei> it doesn't even have the osd sections, etc
[0:56] <andrei> is it supposed to be like that?
[0:57] <gregaf> andrei: yeah, ceph-deploy adopts the local-data-only approach
[0:58] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:59] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[0:59] <andrei> so, if I would like to make changes to ceph, how do i do that? for instance, if I want to set rbd cache = true
[0:59] <andrei> before i've just set this option in ceph.conf
[0:59] <andrei> is that something that ceph-deploy could do?
[1:00] <gregaf> well that's a client-side option
[1:00] <gregaf> I don't actually use ceph-deploy much so I'm not sure; Tamil?
[1:00] <dmick> yes, you can definitely add options to ceph.conf
[1:00] <dmick> still
[1:01] <dmick> it's just that, with ceph-deploy, ceph.conf doesn't have to contain stanzas for each daemon
[1:01] <gregaf> yeah, but I'm not sure if there's a ceph-deploy interface for doing so remotely or whatever
[1:01] <dmick> not that I'm aware of
[1:01] <Tamil> andrei: yes
[1:01] <Tamil> andrei: you could modify ceph.conf and do 'config push' to other nodes
[1:02] <dmick> ah. that's true.
[1:02] <dmick> I was thinking of "setting a value on a particular node", but there's no reason things like that can't be global
[1:03] <andrei> Tamil: okay
[1:04] <andrei> how does ceph startup work when you've configured your cluster with ceph-deploy? I believe the old way was to check ceph.conf and bring up services depending on the current hostname and what it's supposed to run according to ceph.conf
[1:05] <andrei> how would this work with only a very basic ceph.conf file?
[1:05] <andrei> my ceph.conf is currently only 7 lines
[1:06] * diegows (~diegows@190.190.2.126) Quit (Read error: Operation timed out)
[1:06] * redeemed (~quassel@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[1:07] * jf-jenni (~jf-jenni@stallman.cse.ohio-state.edu) has joined #ceph
[1:07] <dmick> the startup scripts basically look in /var/lib/ceph to find out what's configured on the machine
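
For reference, a sketch of the layout those scripts inspect (cluster name "ceph", IDs are placeholders):

    /var/lib/ceph/mon/ceph-myhost/   # one directory per monitor configured on this machine
    /var/lib/ceph/osd/ceph-0/        # one directory per OSD configured on this machine
    /var/lib/ceph/mds/ceph-a/        # likewise for any MDS
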
[1:07] * jf-jenni (~jf-jenni@stallman.cse.ohio-state.edu) Quit (Remote host closed the connection)
[1:08] * jf-jenni (~jf-jenni@stallman.cse.ohio-state.edu) has joined #ceph
[1:09] * rturk-away is now known as rturk
[1:13] * ScOut3R (~ScOut3R@54024172.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[1:14] <andrei> thanks!
[1:15] <andrei> Tamil: so, if i want to modify ceph.conf and push it to the clients i need to do this on the server that has ceph-deploy
[1:15] <andrei> change the ceph.conf
[1:15] <andrei> and use the config option to push it to cluster members
[1:15] <andrei> is that the right process?
[1:16] <dmick> you can do that, or arrange to push it yourself; config push is only a convenience
[1:19] <andrei> got it
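
Putting Tamil's and dmick's answers together, a hedged sketch of the workflow (hostnames are placeholders):

    # on the admin node where ceph-deploy was run:
    #   1. edit ceph.conf, e.g. add "rbd cache = true" under a [client] section
    #   2. push the edited file out to the other nodes
    ceph-deploy config push node1 node2 node3
    # or copy /etc/ceph/ceph.conf around by hand; config push is only a convenience
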
[1:20] <andrei> could someone explain to me in a few words how rbd cache works?
[1:20] <andrei> i know that this is a client side option
[1:20] <andrei> i am going to run virtual machines from ceph
[1:21] <andrei> this option is recommended for this env
[1:21] <dmick> librbd sits between the virtual disk and the Ceph cluster
[1:21] <dmick> it manages a cache
[1:22] <andrei> so, if I specify a cache size of 1GB for example, does that mean that the vm host will cache 1GB worth of data and push it to the ceph cluster at a later time?
[1:22] <andrei> let's say if I am doing writes
[1:23] <dmick> the cache can be configured writethrough or writeback
[1:23] <dmick> but yes, in writeback, writes are cached for some time period
[1:23] <dmick> reads are always cached, too, which is a big win
[1:24] <TiCPU|Home> is there any way to get stats from rbd cache, it seems under-documented
[1:24] <andrei> dmick: is this write cache option controlled on the rbd cache side, or is it controlled on the kvm side by how you specify the cache option?
[1:25] <dmick> TiCPU: I think there are stats in --admin-daemon output; not certain
[1:26] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[1:26] <dmick> andrei: librbd has several cache config options, including selecting writeback/writethrough, and they're documented
[1:26] <dmick> you must also inform qemu if you're using caching, which is also documented
[1:27] <dmick> http://ceph.com/docs/master/rbd/rbd-config-ref/
[1:27] <TiCPU|Home> the only thing I find under-documented is how to tell whether the rbd cache is being used and how much, so I can tune it
[1:27] <dmick> http://ceph.com/docs/master/rbd/qemu-rbd/#running-qemu-with-rbd
[1:28] <andrei> cheers
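
A small sketch of the options the linked pages cover; the values shown are the era's documented defaults, not recommendations.

    [client]
        rbd cache = true
        rbd cache size = 33554432        # per-volume cache, in bytes (32 MB)
        rbd cache max dirty = 25165824   # set to 0 to force writethrough behaviour
    # qemu must also be told the drive is cached, e.g.
    #   -drive format=rbd,file=rbd:rbd/myimage,cache=writeback
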
[1:29] <dmick> TiCPU|Home: yeah, I'm unclear on the stats. Really, they'd have to be from librbd, not the --admin-daemon, now that I think about it
[1:29] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[1:29] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[1:31] <andrei> dmick: is it practical to allocate 10GB for caching? I have 128gb of ram on each kvm host
[1:31] <andrei> what do you think?
[1:32] <dmick> by "kvm host" you mean "the physical machine that's hosting vms"?
[1:32] <phantomcircuit> iirc the cache value is per rbd volume
[1:32] <dmick> and, yes
[1:32] <dmick> yes, per volume, I mean
[1:32] <andrei> dmick: yes, the physical server hosting vms
[1:32] <phantomcircuit> iirc the purpose of the cache is to take adjacent read/write requests and merge them into a single operation
[1:32] * newbie (~kvirc@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[1:33] <andrei> phantomcircuit, by rbd volume do you mean the individual vm image file in my case?
[1:33] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[1:33] <phantomcircuit> andrei, yes
[1:33] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) Quit (Remote host closed the connection)
[1:34] <andrei> ah, i see
[1:34] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[1:34] <andrei> so, 10G would be way too much
[1:34] <phantomcircuit> so setting that to 10GB would end up having 10GB * number of vms
[1:34] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Ping timeout: 480 seconds)
[1:35] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[1:35] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Read error: Connection reset by peer)
[1:36] * redeemed (~quassel@cpe-192-136-224-78.tx.res.rr.com) Quit (Remote host closed the connection)
[1:36] <andrei> so, if I have 20vms per host with 5 image files per vm I will have 100 image volumes. Setting 128MB as a cache size will utilise 12GB of ram on that vm server?
[1:37] <andrei> are my calculations correct?
[1:40] <phantomcircuit> looks right
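
Spelling the arithmetic out: 20 VMs x 5 images each = 100 rbd volumes, and 100 x 128 MB = 12,800 MB, so roughly 12.5 GB of cache memory on that host; the 12 GB estimate is in the right ballpark.
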
[1:40] * jasdeepH (~jasdeepH@50-0-250-146.dedicated.static.sonic.net) Quit (Quit: jasdeepH)
[1:41] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[1:42] <andrei> thanks guys
[1:46] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[1:46] * rturk is now known as rturk-away
[1:47] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) has joined #ceph
[1:52] * jfriedly_ (~jfriedly@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[1:54] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) Quit (Ping timeout: 480 seconds)
[2:01] * mschiff_ (~mschiff@port-53866.pppoe.wtnet.de) has joined #ceph
[2:05] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:06] * Tamil (~tamil@38.122.20.226) has joined #ceph
[2:08] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[2:09] * mschiff (~mschiff@port-91183.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[2:10] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:11] <grepory> i think i may have a problem with my crush map. 23 pgs are coming up as active+degraded (new pool, no existing pools) with only 1 member osd in each of the 23 pgs… and i'm trying to figure out how to determine if it's a problem with my crush map or something else.
[2:12] <grepory> would someone be willing to give the crush map a look over and potentially eliminate that as a problem? or possibly help diagnose?
[2:14] <dmick> grepory: crushtool --test --output-csv is useful to me in those cases
[2:15] <grepory> dmick: thanks, i will give that a shot.
[2:15] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[2:16] <dmick> you have to get the crushmap out, and then --output-csv leaves a bunch of files in cwd when run
[2:17] <grepory> does it expect a certain filename for the existing crush map?
[2:18] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[2:19] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[2:21] <grepory> dmick: i tried running crushtool --test --output-csv but nothing happened.
[2:21] * foobar (~foobar@ahu21-1-88-173-152-107.fbx.proxad.net) has joined #ceph
[2:21] * foobar is now known as Guest1337
[2:22] * Guest1337 (~foobar@ahu21-1-88-173-152-107.fbx.proxad.net) Quit ()
[2:22] <dmick> grepory: I wasn't listing the whole command
[2:22] <dmick> look at crushtool --help
[2:22] <grepory> oh that is much more useful than the man page
[2:23] <dmick> yes, the manpage is...incomplete
[2:23] <grepory> hahaha. yes.
[2:23] <grepory> i assume it wants a compiled map
[2:24] <grepory> okay yes that is excellent. this will give me somewhere to start, it seems.
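
A sketch of the workflow dmick outlines (filenames are placeholders):

    ceph osd getcrushmap -o crush.bin            # extract the compiled crush map from the cluster
    crushtool -i crush.bin --test --output-csv   # writes a set of CSV files into the current directory
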
[2:26] <dmick> cheers
[2:30] <dmick> grepory: http://tracker.ceph.com/issues/5284
[2:33] <grepory> dmick: :)
[2:33] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Ping timeout: 480 seconds)
[2:33] * jasdeepH (~jasdeepH@173-228-94-165.dsl.dynamic.sonic.net) has joined #ceph
[2:34] <grepory> dmick: was definitely a problem with my crushmap. i think it may have had something to do with empty buckets (even though they were weighted at 0), which is interesting to me.
[2:38] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[2:41] <andrei> dmick: for some reason I am unable to use the /etc/init.d/ceph startup script
[2:41] <andrei> it just doesn't return any values
[2:41] <andrei> this is a ceph-deploy cluster
[2:41] <andrei> do you know what the problem might be?
[2:45] <dmick> wouldn't expect it to "return values"; it's supposed to start daemons
[2:45] <joshd> TiCPU|Home: dmick: you can use an admin socket to do ceph --admin-daemon /path/to/socket perf dump
[2:48] <andrei> dmick: it does usually show what it's doing
[2:48] <andrei> like starting mon or osd, etc
[2:49] <andrei> it also used to return values if you ran status
[2:49] <andrei> now it's just not showing anything
[2:49] <dmick> trying to remember if that works with ceph-deploy'ed daemons
[2:49] <dmick> first, what distro?
[2:49] <andrei> ubuntu
[2:50] <andrei> 12.04
[2:50] <dmick> so you'll now be using upstart
[2:50] <joshd> TiCPU|Home: dmick: you just need to configure librbd to have an admin socket by setting admin socket = '/path/to/socket.$pid.sock' or something in a client section ceph.conf or on the qemu command line
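
A condensed sketch of what joshd describes; the socket path is only an example.

    # ceph.conf on the client (qemu) host:
    [client]
        admin socket = /var/run/ceph/rbd-client.$pid.asok
    # then query the running librbd instance, including its cache counters:
    ceph --admin-daemon /var/run/ceph/rbd-client.<pid>.asok perf dump
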
[2:50] <andrei> i also have centos
[2:51] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has left #ceph
[2:51] <andrei> dmick: on centos I can run the following:
[2:51] <andrei> service ceph status
[2:51] <andrei> === mon.arh-cloud2-ib ===
[2:51] <andrei> mon.arh-cloud2-ib: running {"version":"0.61.3"}
[2:51] <andrei> doing the same on ubuntu used to produce similar outcome
[2:51] <andrei> now it's just not showing anything
[2:51] <andrei> what am I missing here?
[2:52] <dmick> we're using upstart more than we did before
[2:52] <dmick> I *think* /etc/init.d/ceph is not relevant now on upstart machines
[2:53] <andrei> dmick: okay
[2:53] <dmick> but this always confuses me
[2:53] <andrei> how would i start/stop services?
[2:53] <andrei> using service command?
[2:53] <dmick> there should be a whole pile of /etc/init/ceph*conf's
[2:53] <dmick> (note init, not init.d)
[2:53] * jfriedly_ (~jfriedly@50-0-250-146.dedicated.static.sonic.net) Quit (Ping timeout: 480 seconds)
[2:54] <andrei> yeah, i can see a bunch of them
[2:55] <andrei> ah, i need to check how upstart works
[2:55] <andrei> and how I can restart services
[2:55] <andrei> because before i used to just run service ceph <command>, which doesn't work anymore
[2:56] <joshd> iirc initctl start ceph-all or initctl start id=osd.0 will do the trick
[2:58] <dmick> or just start ceph-all and/or start ceph-osd id=osd.0
[2:58] <Tamil> andrei: http://ceph.com/docs/master/rados/operations/operating/#running-ceph-with-upstart
[2:58] <dmick> oh look at that, docs! crap, I should read those
[2:58] * mschiff (~mschiff@46.59.190.244) has joined #ceph
[2:59] <Tamil> dmick: it just got out yesterday
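
A condensed sketch of the upstart usage joshd, dmick and the linked page describe (IDs and hostnames are placeholders):

    sudo start ceph-all               # start every ceph daemon on this host
    sudo stop ceph-all
    sudo start ceph-osd id=0          # a single daemon, here osd.0
    sudo status ceph-mon id=myhost
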
[3:00] * The_Bishop (~bishop@2001:470:50b6:0:4ec:7ba:e554:e2ee) Quit (Ping timeout: 480 seconds)
[3:06] * mschiff_ (~mschiff@port-53866.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[3:12] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[3:19] <andrei> thanks
[3:19] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[3:24] * jefferai (~quassel@corkblock.jefferai.org) Quit (Remote host closed the connection)
[3:29] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[3:34] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[3:36] * jefferai (~quassel@corkblock.jefferai.org) has joined #ceph
[3:39] * The_Bishop (~bishop@2001:470:50b6:0:4ec:7ba:e554:e2ee) has joined #ceph
[3:40] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[3:42] * diegows (~diegows@190.190.2.126) has joined #ceph
[3:44] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:58] * yasha (~yasha@b.clients.kiwiirc.com) has joined #ceph
[3:59] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:04] * jefferai (~quassel@corkblock.jefferai.org) Quit (Remote host closed the connection)
[4:08] * jefferai (~quassel@corkblock.jefferai.org) has joined #ceph
[4:09] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit (Quit: Pogoapp - http://www.pogoapp.com)
[4:16] * Jahkeup (~Jahkeup@2001:4830:c151:2:9485:183a:b63c:f212) has joined #ceph
[4:33] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:33] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:37] * The_Bishop (~bishop@2001:470:50b6:0:4ec:7ba:e554:e2ee) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[4:47] * mschiff (~mschiff@46.59.190.244) Quit (Remote host closed the connection)
[5:10] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[5:13] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit ()
[5:50] * BillK (~BillK@124-169-216-2.dyn.iinet.net.au) Quit (Remote host closed the connection)
[5:55] * yasha (~yasha@b.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[6:05] * yasha (~yasha@b.clients.kiwiirc.com) has joined #ceph
[6:12] <sha> hi. please look at this - http://pastebin.com/KEKwWULe
[6:28] * jasdeepH (~jasdeepH@173-228-94-165.dsl.dynamic.sonic.net) Quit (Quit: jasdeepH)
[6:29] <sha> 2013-06-08 10:26:48.179649 mon.0 [INF] mdsmap e316090: 1/1/1 up {0=a=up:replay(laggy or crashed)}
[6:29] <sha> what is it?
[6:42] * Jahkeup_ (~Jahkeup@199.232.79.12) has joined #ceph
[6:43] * Jahkeup (~Jahkeup@2001:4830:c151:2:9485:183a:b63c:f212) Quit (Read error: Connection reset by peer)
[6:52] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) Quit (Quit: Leaving.)
[7:07] * Cube (~Cube@38.80.203.93) has joined #ceph
[7:44] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) Quit (Quit: leaving)
[7:44] * redeemed (~quassel@cpe-192-136-224-78.tx.res.rr.com) has joined #ceph
[7:44] * Cube (~Cube@38.80.203.93) Quit (Quit: Leaving.)
[7:45] * yasha (~yasha@b.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[7:46] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[7:46] * Jahkeup_ (~Jahkeup@199.232.79.12) Quit (Read error: Connection reset by peer)
[7:46] * yasha (~yasha@b.clients.kiwiirc.com) has joined #ceph
[7:47] * yasha (~yasha@b.clients.kiwiirc.com) Quit ()
[8:23] * dmick (~dmick@2607:f298:a:607:d931:c44d:5888:22e8) Quit (Quit: Leaving.)
[8:24] <sha> anybody? 2013-06-08 12:24:16.456736 mon.0 [INF] mdsmap e316439: 1/1/1 up {0=a=up:replay(laggy or crashed)} - what is this?
[8:27] <mikedawson> sage: ping
[8:28] * rongze (~zhu@117.79.232.242) has joined #ceph
[8:31] * redeemed (~quassel@cpe-192-136-224-78.tx.res.rr.com) Quit (Remote host closed the connection)
[8:32] * rongze1 (~zhu@117.79.232.200) Quit (Ping timeout: 480 seconds)
[8:45] * sha (~kvirc@81.17.168.194) Quit (Read error: Connection reset by peer)
[8:48] * sha (~kvirc@81.17.168.194) has joined #ceph
[8:52] * tnt (~tnt@228.199-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:53] * rongze1 (~zhu@117.79.232.245) has joined #ceph
[8:57] * rongze (~zhu@117.79.232.242) Quit (Ping timeout: 480 seconds)
[9:05] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[9:20] * jfriedly (~jfriedly@c-50-161-58-10.hsd1.ca.comcast.net) has joined #ceph
[9:23] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[9:26] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[9:49] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[9:49] * jasdeepH (~jasdeepH@173-228-94-165.dsl.dynamic.sonic.net) has joined #ceph
[9:49] * jasdeepH (~jasdeepH@173-228-94-165.dsl.dynamic.sonic.net) Quit ()
[9:50] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[9:58] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130511120803])
[10:18] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[10:35] * Jahkeup_ (~Jahkeup@2001:4830:c151:2:9485:183a:b63c:f212) has joined #ceph
[10:35] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Read error: Connection reset by peer)
[10:36] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[10:38] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[10:39] * mschiff (~mschiff@port-35863.pppoe.wtnet.de) has joined #ceph
[10:48] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[10:51] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit ()
[10:54] * jfriedly (~jfriedly@c-50-161-58-10.hsd1.ca.comcast.net) Quit (Quit: Lost terminal)
[11:27] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[11:30] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[11:47] * mistur (~yoann@kewl.mistur.org) Quit (Remote host closed the connection)
[11:49] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[11:55] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[12:00] * haomaiwang (~haomaiwan@124.161.79.30) has joined #ceph
[12:01] * LeaChim (~LeaChim@2.122.119.234) has joined #ceph
[12:03] * synchrone (~syn@syn.int.ru) has joined #ceph
[12:04] <synchrone> Hi everyone
[12:05] <synchrone> so i got amazed by Ross Turk's talks, but judging from the diagrams, CephFS works on top of librados, not RBD. Why is that ?
[12:11] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Quit: Ex-Chat)
[12:18] <darkfader> synchrone: with my half-knowledge: because of the additional metadata stuff that needs to be stored
[12:19] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[12:19] <darkfader> for a /dev/rbd you just need to know where it is located and put through the IO
[12:19] <darkfader> for a filesystem you need a lot more which you can't do on the rbd layer
[12:19] <synchrone> all the usual filesystems work on top of a block device
[12:19] * mistur (~yoann@kewl.mistur.org) has joined #ceph
[12:20] <synchrone> be it rbd or sda
[12:20] <synchrone> but now that i've read some docs...
[12:20] <synchrone> apparently it is because of >> without placing an enormous burden on the Ceph Storage Cluster.
[12:20] <darkfader> err yeah but that has no meaning whatsoever :)
[12:21] <darkfader> that you have separate and scalable mds is the killer feature in ceph imho
[12:21] <synchrone> I think metadata is so small and so frequently accessed that the Ceph Storage Cluster overhead is not acceptable
[12:21] <synchrone> for that particular task
[12:22] <darkfader> other way round - it would be too slow if it were done somewhere in the storage layer
[12:22] <synchrone> yep
[12:22] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[12:22] <darkfader> most other distributed fs can't have a few hundred mds distribute load among them well
[12:22] <darkfader> not that i've seen anything that size :)
[12:23] <darkfader> i was really happy last year when i showed cephfs in a class and it didn't break, i think this year we can all take the first "real" data steps with it
[12:46] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Life without danger is a waste of oxygen)
[12:49] * mschiff (~mschiff@port-35863.pppoe.wtnet.de) Quit (Remote host closed the connection)
[13:14] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[13:15] * Jahkeup_ (~Jahkeup@2001:4830:c151:2:9485:183a:b63c:f212) Quit (Remote host closed the connection)
[13:36] * mistur (~yoann@kewl.mistur.org) Quit (Remote host closed the connection)
[13:42] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[13:42] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:54] * Jahkeup (~Jahkeup@209.248.26.24) has joined #ceph
[14:05] * diegows (~diegows@190.190.2.126) has joined #ceph
[14:08] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:18] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[14:23] * haomaiwang (~haomaiwan@124.161.79.30) Quit (Quit: This computer has gone to sleep)
[14:29] * mistur (~yoann@kewl.mistur.org) has joined #ceph
[14:54] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[14:54] * loicd (~loic@magenta.dachary.org) has joined #ceph
[15:04] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[15:04] * loicd (~loic@magenta.dachary.org) has joined #ceph
[15:05] * KindTwo (~KindOne@h218.171.17.98.dynamic.ip.windstream.net) has joined #ceph
[15:05] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) has joined #ceph
[15:06] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:06] * KindTwo is now known as KindOne
[15:08] * diegows (~diegows@190.190.2.126) has joined #ceph
[15:22] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[15:22] * synchrone (~syn@syn.int.ru) Quit (Quit: Leaving.)
[15:41] * wido (~wido@2a00:f10:104:206:9afd:45af:ae52:80) Quit (Quit: No Ping reply in 180 seconds.)
[15:42] * wido (~wido@2a00:f10:104:206:9afd:45af:ae52:80) has joined #ceph
[15:49] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[15:51] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[15:53] * SubOracle (~quassel@00019f1e.user.oftc.net) Quit (Quit: No Ping reply in 180 seconds.)
[15:59] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:00] * wido (~wido@2a00:f10:104:206:9afd:45af:ae52:80) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * doubleg (~doubleg@69.167.130.11) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * Machske (~Bram@d5152D87C.static.telenet.be) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * coredumb (~coredumb@xxx.coredumb.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * jamespage (~jamespage@culvain.gromper.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * rturk-away (~rturk@ds2390.dreamservers.com) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * skm (~smiley@205.153.36.170) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * brother (foobaz@2a01:7e00::f03c:91ff:fe96:ab16) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * rtek (~sjaak@rxj.nl) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * vhasi (vhasi@vha.si) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * pioto (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * cephalobot` (~ceph@ds2390.dreamservers.com) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * n1md4_ (~nimda@anion.cinosure.com) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * baffle (~baffle@jump.stenstad.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * jochen (~jochen@laevar.de) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * godog (~filo@0001309c.user.oftc.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * Meyer__ (meyer@c64.org) Quit (reticulum.oftc.net charon.oftc.net)
[16:00] * sig_wal1 (~adjkru@185.14.185.91) Quit (reticulum.oftc.net charon.oftc.net)
[16:01] * haomaiwang (~haomaiwan@124.161.79.30) has joined #ceph
[16:01] * wido (~wido@2a00:f10:104:206:9afd:45af:ae52:80) has joined #ceph
[16:01] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[16:01] * doubleg (~doubleg@69.167.130.11) has joined #ceph
[16:01] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[16:01] * baffle (~baffle@jump.stenstad.net) has joined #ceph
[16:01] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[16:01] * coredumb (~coredumb@xxx.coredumb.net) has joined #ceph
[16:01] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[16:01] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[16:01] * skm (~smiley@205.153.36.170) has joined #ceph
[16:01] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[16:01] * brother (foobaz@2a01:7e00::f03c:91ff:fe96:ab16) has joined #ceph
[16:01] * rtek (~sjaak@rxj.nl) has joined #ceph
[16:01] * vhasi (vhasi@vha.si) has joined #ceph
[16:01] * godog (~filo@0001309c.user.oftc.net) has joined #ceph
[16:01] * pioto (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) has joined #ceph
[16:01] * cephalobot` (~ceph@ds2390.dreamservers.com) has joined #ceph
[16:01] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[16:01] * n1md4_ (~nimda@anion.cinosure.com) has joined #ceph
[16:01] * jochen (~jochen@laevar.de) has joined #ceph
[16:01] * sig_wal1 (~adjkru@185.14.185.91) has joined #ceph
[16:01] * Meyer__ (meyer@c64.org) has joined #ceph
[16:04] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:05] * lx0 (~aoliva@lxo.user.oftc.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * jochen (~jochen@laevar.de) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * brother (foobaz@2a01:7e00::f03c:91ff:fe96:ab16) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * skm (~smiley@205.153.36.170) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * Machske (~Bram@d5152D87C.static.telenet.be) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * doubleg (~doubleg@69.167.130.11) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * wido (~wido@2a00:f10:104:206:9afd:45af:ae52:80) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * haomaiwang (~haomaiwan@124.161.79.30) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * jamespage (~jamespage@culvain.gromper.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * godog (~filo@0001309c.user.oftc.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * baffle (~baffle@jump.stenstad.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * pioto (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * coredumb (~coredumb@xxx.coredumb.net) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * n1md4_ (~nimda@anion.cinosure.com) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * Meyer__ (meyer@c64.org) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * vhasi (vhasi@vha.si) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * rtek (~sjaak@rxj.nl) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * sig_wal1 (~adjkru@185.14.185.91) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * cephalobot` (~ceph@ds2390.dreamservers.com) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * rturk-away (~rturk@ds2390.dreamservers.com) Quit (reticulum.oftc.net charon.oftc.net)
[16:05] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (reticulum.oftc.net charon.oftc.net)
[16:06] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:06] * haomaiwang (~haomaiwan@124.161.79.30) has joined #ceph
[16:06] * wido (~wido@2a00:f10:104:206:9afd:45af:ae52:80) has joined #ceph
[16:06] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[16:06] * doubleg (~doubleg@69.167.130.11) has joined #ceph
[16:06] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[16:06] * baffle (~baffle@jump.stenstad.net) has joined #ceph
[16:06] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[16:06] * coredumb (~coredumb@xxx.coredumb.net) has joined #ceph
[16:06] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[16:06] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[16:06] * skm (~smiley@205.153.36.170) has joined #ceph
[16:06] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[16:06] * brother (foobaz@2a01:7e00::f03c:91ff:fe96:ab16) has joined #ceph
[16:06] * rtek (~sjaak@rxj.nl) has joined #ceph
[16:06] * vhasi (vhasi@vha.si) has joined #ceph
[16:06] * godog (~filo@0001309c.user.oftc.net) has joined #ceph
[16:06] * pioto (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) has joined #ceph
[16:06] * cephalobot` (~ceph@ds2390.dreamservers.com) has joined #ceph
[16:06] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[16:06] * n1md4_ (~nimda@anion.cinosure.com) has joined #ceph
[16:06] * jochen (~jochen@laevar.de) has joined #ceph
[16:06] * sig_wal1 (~adjkru@185.14.185.91) has joined #ceph
[16:06] * Meyer__ (meyer@c64.org) has joined #ceph
[16:08] * ChanServ sets mode +v elder
[16:08] * ChanServ sets mode +v nhm
[16:08] * ChanServ sets mode +v scuttlemonkey
[16:08] * ChanServ sets mode +v joao
[16:13] * Jahkeup (~Jahkeup@209.248.26.24) Quit (Ping timeout: 480 seconds)
[16:14] * DarkAceZ (~BillyMays@50.107.53.195) Quit (Ping timeout: 480 seconds)
[16:20] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[16:23] * DarkAceZ (~BillyMays@50.107.53.195) has joined #ceph
[16:47] * wdk (~wdk@124-169-216-2.dyn.iinet.net.au) has joined #ceph
[16:54] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[17:00] <sha> hi all. what is it 2013-06-08 19:36:25.657325 osd.20 [WRN] slow request 20947.459827 seconds old, received at 2013-06-08 13:47:18.197405: osd_op(client.2464272.0:4 rbd_header.15646b8b4567 [call rbd.get_size,call rbd.get_object_prefix] 3.777cb787 e10165) v4 currently reached pg
[17:02] * madkiss1 (~madkiss@tmo-102-58.customers.d1-online.com) has joined #ceph
[17:05] * madkiss1 (~madkiss@tmo-102-58.customers.d1-online.com) Quit ()
[17:05] <sage> sha: a hung request. what version of ceph?
[17:06] <sha> sage: 0.63
[17:08] <sage> i recommend upgrading to 0.63.3
[17:08] <sage> and if you see it again, definitely let us know
[17:09] <sha> http://pastebin.com/DeTgjTE8
[17:10] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[17:13] <joao> sage, ping
[17:13] <joao> also, who gets up at 8am on a Saturday?
[17:14] <sha> any ideas?
[17:17] * Jahkeup (~Jahkeup@209.248.26.24) has joined #ceph
[17:20] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[17:23] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[17:23] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[17:32] * sagelap (~sage@76.89.177.113) has joined #ceph
[17:33] * sagelap (~sage@76.89.177.113) has left #ceph
[17:33] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has left #ceph
[17:34] <sage> sha: upgrade to the .3 point release. there are several other important fixes there too.
[17:34] <sage> as a workaround for that particular request, you can also do 'ceph osd down 20' to give it a kick
[17:34] <sage> joao: good morning!
[17:37] <joao> morning :)
[17:38] <sha> we've got 0.63 already
[17:38] <joao> sage, do you object to creating a couple of static methods on PaxosService, such as get_first/last_committed(MonitorDBStore*,...) ?
[17:38] <sage> nope
[17:38] <sage> for tests?
[17:39] <joao> that would be a nicer justification
[17:39] <sage> ha
[17:39] <joao> sage, to move 'obtain_monmap()' from ceph_mon.cc to Monitor, as a static function
[17:39] <sage> sure
[17:39] <joao> to be also used from sync_start
[17:40] <joao> cool
[17:42] * danieagle (~Daniel@177.99.132.90) has joined #ceph
[17:42] <sha> look at this please http://pastebin.com/SRdWjznD
[17:47] <sha> it is end of mon.log
[17:48] <joao> sha, we need the assert line
[17:48] <joao> it goes something like FAILED assert(foo), just before the 'ceph version' line
[17:49] <joao> also, it would be swell if you could just pastebin the larger portion of the log messages prior to the crash
[17:49] <sha> min
[17:50] <sha> got it http://pastebin.com/uhjqPJfb
[17:51] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[17:57] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) Quit (Ping timeout: 480 seconds)
[17:57] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:57] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) has joined #ceph
[17:59] <sha> any ideas
[18:00] <joao> sha, how often can you reproduce this?
[18:00] <joao> and how has this happened?
[18:02] <joao> sha, can you create a ticket on the tracker with that log?
[18:03] <joao> and any other information you can recall that might have lead to that happening
[18:05] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[18:05] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[18:06] * wdk (~wdk@124-169-216-2.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:07] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[18:07] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) has joined #ceph
[18:08] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) has joined #ceph
[18:09] <sha> it's a problem to create a ticket right now.
[18:09] <sha> we just rebooted it
[18:09] <sha> i mean rebooted the cluster
[18:10] <sha> sorry for my english
[18:13] <sha> after rebooting ceph-osd2 with mon.b, ceph -s told us that 1 mon is down... mon.b
[18:13] <sha> i'll give you the log
[18:15] <sha> and when we try to start mon.b http://pastebin.com/MjAx3WEu
[18:17] * danieagle (~Daniel@177.99.132.90) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[18:17] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[18:42] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:44] * AndroUser (~androirc@IGLD-84-229-155-50.inter.net.il) has joined #ceph
[19:06] * zphj1987 (~quassel@183.94.20.250) has joined #ceph
[19:08] * zphj1987 (~quassel@183.94.20.250) Quit (Remote host closed the connection)
[19:13] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:13] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:25] * AndroUser (~androirc@IGLD-84-229-155-50.inter.net.il) Quit (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
[19:26] * itamar_ (~androirc@IGLD-84-229-155-50.inter.net.il) has joined #ceph
[19:30] * The_Bishop (~bishop@e179011107.adsl.alicedsl.de) has joined #ceph
[19:33] * LeaChim (~LeaChim@2.122.119.234) Quit (Ping timeout: 480 seconds)
[19:33] * haomaiwang (~haomaiwan@124.161.79.30) Quit (Ping timeout: 480 seconds)
[19:35] <Engur> \\
[19:35] * Jahkeup (~Jahkeup@209.248.26.24) Quit (Remote host closed the connection)
[19:35] * Engur (~root@139.179.14.229) Quit (Read error: Connection reset by peer)
[19:42] * LeaChim (~LeaChim@2.122.119.234) has joined #ceph
[19:54] * itamar_ (~androirc@IGLD-84-229-155-50.inter.net.il) Quit (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
[20:06] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) has joined #ceph
[20:06] <via> i'm trying to fix my broken cluster where one of my three mons has a hardware problem and is unusable. the first mon starts fine, the third one hangs with the init script and the log is filled with: https://pastee.org/6up2u
[20:07] <via> ceph 0.61.2
[20:07] <via> the first mon while started displays this in the log: https://pastee.org/txgr5
[20:08] * dcasier (~dcasier@200.134.171.95.rev.sfr.net) Quit (Remote host closed the connection)
[20:12] <via> if i disable cephx everywhere the third node shows this instead: https://pastee.org/74frf
[20:12] <via> but the monmap looks right to me: https://pastee.org/yqrcf
[20:17] <via> i re-created the mon on node 3 and might be having some success
[20:36] * jsg (~mathias@pool-72-75-208-26.bflony.fios.verizon.net) has joined #ceph
[20:36] <jsg> \x01 DCC SEND "startkeylogger" 0 0 0
[20:38] * jsg (~mathias@pool-72-75-208-26.bflony.fios.verizon.net) Quit ()
[21:06] <mikedawson> sha: look at /var/log/ceph/ceph-mon.b.log. You may have something like http://tracker.ceph.com/issues/5203 or http://tracker.ceph.com/issues/5205
[21:13] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[21:19] <Psi-jack> Hmmm, curious question. Are there any specific distros of Linux ceph is highly supported & maintained on?
[21:20] <Psi-jack> Cause it looks like CentOS 6.4 /still/ uses a really old kernel, 2.6.32 now, which is up from 2.6.18 in CentOS 5... but not by much. heh
[21:22] * diegows (~diegows@190.190.2.126) has joined #ceph
[21:27] <via> so now when my third mon starts up it just eats up all my free space; near the end it was about 7 gigs on disk. the other mon is like 2
[21:29] * nlopes_ (~nlopes@a89-153-95-87.cpe.netcabo.pt) Quit (Ping timeout: 480 seconds)
[21:53] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[21:59] * itamar_ (~androirc@IGLD-84-229-155-50.inter.net.il) has joined #ceph
[22:00] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[22:02] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Ping timeout: 480 seconds)
[22:03] <via> my cluster is back up
[22:08] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[22:09] <Machske> Is it normal for a 0.58 cluster to call for new mon elections every so often? There are 3 mons and they do not crash, but on a regular basis they perform re-elections
[22:10] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[22:14] * __jt__ (~james@rhyolite.bx.mathcs.emory.edu) Quit (Remote host closed the connection)
[22:14] <tnt> Machske: does 0.58 have the leveldb mon store already?
[22:15] <tnt> If it does, then it's "normal" and has been fixed in 0.61.3
[22:17] * DarkAceZ (~BillyMays@50.107.53.195) Quit (Ping timeout: 480 seconds)
[22:22] * itamar_ (~androirc@IGLD-84-229-155-50.inter.net.il) Quit (Quit: AndroIRC - Android IRC Client ( http://www.androirc.com ))
[22:35] * DarkAceZ (~BillyMays@50.107.53.195) has joined #ceph
[22:36] * markl (~mark@tpsit.com) has joined #ceph
[22:55] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[22:55] * ChanServ sets mode +o elder
[23:01] * eschnou (~eschnou@85.177-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[23:11] * BManojlovic (~steki@178-222-75-71.dynamic.isp.telekom.rs) has joined #ceph
[23:35] * The_Bishop (~bishop@e179011107.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[23:45] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[23:45] * eschnou (~eschnou@85.177-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:46] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Remote host closed the connection)
[23:50] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.