#ceph IRC Log


IRC Log for 2012-12-08

Timestamps are in GMT/BST.

[0:01] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[0:02] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[0:02] * ChanServ sets mode +o scuttlemonkey
[0:16] * roald (~Roald@87.209.150.214) Quit (Read error: Connection reset by peer)
[0:19] * Machske (~bram@d5152D87C.static.telenet.be) has joined #ceph
[0:22] <Machske> hi guys, I'm trying v0.55. Setup seems to go OK, mkcephfs seems fine, but the osd's won't start. I'm failing to see the problem here. It keeps telling 'No filesystem type defined!' for every defined osd. I have defined 'osd fstype = xfs' but that does not seem to make much difference. Any clues on where to look ?
[0:23] <infernix> got my hardware pricing
[0:23] <infernix> with 2 way replica, i'm looking at $0.30/GB
[0:23] <infernix> for 12 disk nodes
[0:24] <infernix> lowest I can go is $0.19/GB with the 4U 36 disk boxes, but the aggregate disk BW at 80MB/sec comes to 2700MB/sec which is 21GBit
[0:24] <infernix> maybe i can get infiniband to do that. maybe.
[0:25] <rweeks> plus I know nhm has very specific concerns about disk numbers that high in a single node
[0:25] <infernix> got 6 DL380s prepped for the weekend, 7 OSD disks each but 10k SAS. will see what that yields
[0:25] <infernix> well yeh the main concern in a fully populated 36 disk 3TB box is the rebuild time
[0:25] <infernix> assuming 1GB/sec, that's a whopping 29 hours
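[Editor's note: rough arithmetic behind that figure: 36 disks x 3 TB = 108 TB; at ~1 GB/s aggregate rebuild speed that is ~108,000 seconds, i.e. roughly 30 hours, in line with the ~29 hours quoted above.]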
[0:26] <rweeks> yep
[0:26] <infernix> but that's only when the box is completely filled
[0:26] <rweeks> plus can that box actually sustain full throughput from 36 disks over PCI and the wire?
[0:26] <infernix> and i'm not sure how often it would happen
[0:26] <infernix> pcie 3.0, yes; infiniband, maybe.
[0:26] <infernix> with RDMA no problem. but there's no RDMA in ceph yet
[0:26] <rweeks> yes.
[0:26] <rweeks> yet.
[0:27] <infernix> so it's down to ipoib, and perhaps with the offload libraries it can do 25gbit
[0:27] <infernix> which is what i'll test this weekend
[0:27] <rweeks> neat
[0:27] <rweeks> love to see your results
[0:27] <tontsa> in those supermicros you can install multiple x8 pcie NICs though if you need throughput
[0:27] <joshd> Machske: check the init script and mkcephfs - those are the places where that error comes from
[0:27] <tontsa> then you just need to take into consideration switch costs too
[0:28] <infernix> with 12 disks, and a 24gbit sas backplane, 1x pcie 3.0 is plenty
[0:28] <infernix> 36 disk box takes 2 HBAs
[0:28] <infernix> 1gbit on the 24 port backplane per disk, 2gbit on the 12 port
[0:28] <infernix> a 40gbit QDR HBA is about $600; an unmanaged 36 port IB switch is about $10k
[0:29] <infernix> 10gbit is what, $4k for 24 ports?
[0:29] <dmick> Machske: are you seeing http://tracker.newdream.net/issues/3581 ?
[0:29] <tontsa> infernix, depends how Chinese you go
[0:29] <infernix> i still have to run the numbers on the network costs
[0:30] <infernix> if you want it redundant though it gets costly
[0:30] <infernix> $1200 for ports in the server, $20k for the switches
[0:31] <Machske> dmick: no /etc/init.d/ceph -a start osd gives === osd.11 ===
[0:31] <Machske> No filesystem type defined!
[0:31] <Machske> joshd: checking the init script
[0:31] <tontsa> it adds up quickly especially if you need manpower to manage the "unmanaged" switches
[0:31] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:31] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:32] <Machske> looks like this is the problem: "if [ $dofsmount -eq 1 ] && [ -n "$fs_devs" ]; then", the osd's are already mounted, can I define dofsmount to be 0 somewhere ?
[0:33] <Machske> changing -eq 1 to 10 for fun :)
[0:33] <joshd> Machske: actually I think the option should just be 'fs type = xfs' instead of 'osd fstype = xfs'
[0:34] <Machske> under [osd] ?
[0:35] <joshd> although if you aren't using that option for mkcephfs, and the osd data directories exist, the init script shouldn't need it
[0:35] <mythzib> I don't understand how I can have two clients hosts mounting the same rbd image
[0:35] <mythzib> and have same files on both hosts
[0:35] <dmick> mythzib: generally you don't
[0:35] <mythzib> :(
[0:35] <dmick> that's like mounting one disk drive from two hosts; it's just a bad idea
[0:36] <dmick> do you really want a shared filesystem?
[0:36] <mythzib> as cephfs is not ready... yep :(
[0:36] <rweeks> You wouldn't do that with iSCSI or FC, why would you do it with RBD?
[0:36] <rweeks> You don't share block devices natively, that's just a horrible idea
[0:37] <rweeks> mythzib: in the interim, why not export the RBD via nfs or samba
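[Editor's note: a rough sketch of the NFS-export workaround suggested above, with a single host doing the exporting; the image name, size and export path are invented for illustration.]
    rbd create shared --size 102400          # 100 GB image, hypothetical name
    rbd map shared                           # shows up as /dev/rbd0
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /export/shared
    echo '/export/shared *(rw,sync,no_subtree_check)' >> /etc/exports
    exportfs -ra                             # NFS clients then mount this export, never the rbd device directly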
[0:37] <dmick> cephfs may be 'ready' for some definitions of ready
[0:37] <mythzib> I have strange behavior
[0:38] <mythzib> when doing write+read, speeds are horrible
[0:38] <mythzib> on rbd it's ok
[0:38] <dmick> when doing write+read where?
[0:39] <Machske> joshd: changing -eq 1 to 10, seems to start the osd's, so the init script wants to mount the osd's while they are already mounted
[0:39] <Machske> ceph -s gives: health HEALTH_OK
[0:39] <Machske> so it looks to be up and running
[0:40] <mythzib> dmick: typically dd if=/dev/zero of=file2 bs=1M count=1024 and from an other client doing "wget -O /dev/null http://host/file" at the same time
[0:40] <mythzib> dmick: with rbd, both speeds are OK, on cephfs very slow
[0:40] <dmick> *on cephfs*
[0:40] <dmick> ok
[0:42] <mythzib> i'm using ceph 0.55, ubuntu 12.04 kernel 3.6.9
[0:42] <joshd> multiclient access is slower because the distributed filesystem is doing its job and coordinating access. rbd is totally unsafe with multiclient stuff
[0:42] * benpol (~benp@garage.reed.edu) has left #ceph
[0:43] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[0:44] <Machske> got a general question. How "up2date" is the ceph fs driver in the linux source distribution ? For example, is it ok to use the ceph fs driver from kernel source 3.6.1 with ceph 0.55 ?
[0:45] <joshd> yeah, upstream 3.6+ is ideal
[0:46] <Machske> kewl thx
[0:48] <joshd> Machske: what was your ceph.conf when you hit that init script problem?
[0:49] * jlogan2 (~Thunderbi@2600:c00:3010:1:8474:22a3:9709:26c3) Quit (Ping timeout: 480 seconds)
[0:49] * scalability-junk (~stp@dslb-084-056-037-229.pools.arcor-ip.net) has joined #ceph
[0:50] <joshd> Machske: seems that would only happen if you defined 'fs devs', but not 'fs type'
[0:50] <Machske> should I just paste it completely here ?
[0:50] <Machske> I did define fs devs
[0:50] <dmick> pastebin would be better
[0:50] <Machske> is that my mistake ?
[0:51] <joshd> yeah
[0:51] <Machske> k I should read more documentation :s
[0:51] <joshd> if you're mounting and mkfsing them yourself, you don't need to define it at all
[0:51] <joshd> I'm not sure this particular change was well documented
[0:52] <Machske> so in case the osd's are already mounted, fs type and devs can be left out ?
[0:52] <joshd> yeah
[0:52] <Machske> lesson learned
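[Editor's note: a minimal ceph.conf sketch of the two setups discussed above, as understood from this conversation; hostnames, paths and devices are placeholders, not from the log.]
    ; Option A: let mkcephfs/the init script mkfs and mount the OSD disks
    [osd]
        osd data = /data/osd.$id
        fs type = xfs                ; note: 'fs type', not 'osd fstype'
    [osd.11]
        host = nodeA                 ; hypothetical host
        fs devs = /dev/sdb           ; hypothetical device
    ; Option B: mkfs and mount the OSD data directories yourself,
    ; and leave 'fs type' and 'fs devs' out entirely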
[0:53] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[0:59] <Machske> hmm, this sounds not that great: [408237.872070] libceph: tid 298 timed out on osd21, will reset osd
[0:59] <Machske> although a bad test, I did a simple dd test to see all osd's starting to work
[1:00] <Machske> but after 2.4GB, it freezes up
[1:02] <infernix> can I have delayed replicas?
[1:02] <infernix> e.g. have 2 replicas synchronous, and a 3rd asynchronous
[1:03] <rweeks> not today
[1:04] <joshd> Machske: is the cluster still healthy? and you're using cephfs, rbd?
[1:04] <Machske> cluster healthy, using cephfs
[1:05] <dmick> joshd: wip-rbd-export-progress
[1:05] <Machske> note, I did mount it on one of the cluster nodes on a separate mountpoint of course, I've read that this should actually not be done
[1:06] <joshd> yeah, you can get a deadlock
[1:06] <dmick> ls
[1:06] <dmick> oops
[1:06] <Machske> how come ?
[1:09] <joshd> dmick: s/size_t/uint64_t/ to match the types
[1:09] <dmick> grr. ok
[1:09] <joshd> dmick: otherwise looks good, probably just go to next, not testing though
[1:09] <dmick> ok
[1:10] <joshd> Machske: it's the same issue as with nfs and loopback mounts - if you want to dig deeper, http://tracker.newdream.net/issues/3076
[1:12] <Machske> thx joshd
[1:12] <joshd> np. as that bug says, it should be in a faq somewhere
[1:15] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has left #ceph
[1:15] <Machske> I'm a little disappointed by this though, I wanted to build a cheapo Xen cluster with 4 machines using ceph to store the vm images. Using all disks from the 4 nodes while running the vm's on those same machines. Looks like I'll have to find another way :)
[1:16] <gregaf> you can do that with QEMU since it'll do the RBD mount itself
[1:16] <gregaf> Xen support for that will happen eventually...
[1:16] <rweeks> right. so don't use the kernel RBD but use QEMU instead
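[Editor's note: a sketch of the QEMU/librbd route mentioned above, assuming a QEMU built with rbd support; pool and image names are invented.]
    qemu-img create -f rbd rbd:rbd/vm1-disk 20G
    qemu-system-x86_64 -m 2048 \
        -drive file=rbd:rbd/vm1-disk,if=virtio,cache=writeback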
[1:19] * KindOne (KindOne@h183.63.186.173.dynamic.ip.windstream.net) Quit (Remote host closed the connection)
[1:19] <Machske> thx for the tip! we'll test this next, first need to recover the machines from deadlock :)
[1:21] <dmick> ARGH APT FORGET ABOUT THIS PACKAGE SOURCE
[1:28] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[1:28] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:28] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:30] * ircolle2 (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:30] <rweeks> settle down, beavis
[1:35] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[1:39] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:47] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:48] <jmlowe> any of devs on?
[1:48] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:49] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:52] <gregaf> we're mostly still here for at least a little while; what's up jmlowe?
[1:52] <jmlowe> I think there is a cephx bug
[1:54] <jmlowe> keep getting "libceph: osd5 xxx.xxx.xxx.xxx:6809 socket closed" and "libceph: osd5 xxx.xxx.xxx.xxx:6809 connect authorization failure"
[1:55] <jmlowe> I don't think I had trouble before 0.55, and I had cephx enabled
[1:55] <jmlowe> on another node it's another osd that it complains about
[1:56] <gregaf> okay, there are some issues in v0.55 but yehudasa can make sure this is a known one
[1:56] <gregaf> jmlowe: are you seeing any other symptoms, or just those libceph messages?
[1:56] <jmlowe> just those messages, afaik, there are corresponding errors in the osd logs
[1:57] <gregaf> can you read and write from the cluster, though?
[1:57] <jmlowe> yep, seems to be functioning
[1:57] <jmlowe> health ok
[1:58] <jmlowe> ceph-osd: 2012-12-07 19:53:40.856596 7f6f894de700 0 auth: could not find secret_id=0
[1:58] <jmlowe> ceph-osd: 2012-12-07 19:53:40.856600 7f6f894de700 0 cephx: verify_authorizer could not get service secret for service osd secret_id=0
[1:59] <yehudasa> ah, that looks like a different issue
[1:59] <yehudasa> I think that's the stuff dmick was looking at
[1:59] <jmlowe> ceph-osd: 2012-12-07 19:53:40.856604 7f6f894de700 0 -- xxx.xxx.xxx.xxx:6809/16426 >> yyy.yyy.yyy.yyy:0/2246996149 pipe(0x7df8240 sd=28 :6809 pgs=0 cs=0 l=1).accept: got bad authorizer
[1:59] <yehudasa> jmlowe: does the client otherwise function?
[2:00] <jmlowe> yes
[2:00] <jmlowe> just rsync'ed about 388GB to a rbd device while it complained the whole time
[2:01] <yehudasa> well, we'll need to understand where that comes from, I think we already have an open bug for it
[2:01] <yehudasa> jmlowe: how frequent are these message?
[2:01] <yehudasa> messages
[2:01] <jmlowe> ok, that's what I want to hear, you are winston wolf'ing it
[2:02] <jmlowe> 5-10 minutes apart I think
[2:02] <yehudasa> always to the same osd?
[2:02] <jmlowe> not at all regular in period
[2:02] <dmick> yehudasa: I'm not looking at it
[2:03] <yehudasa> dmick: ok
[2:03] <dmick> I was removing it from the rbd project and trying to stick it on someone else :)
[2:03] <jmlowe> always the same osd for a client, change client change osd
[2:03] <yehudasa> dmick: probably peter
[2:03] <jmlowe> http://pastebin.com/X2DWWvdL
[2:04] <jmlowe> I believe that covers an entire transaction
[2:06] <dmick> yehudasa: I tried
[2:12] * nwat (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[2:25] <yehudasa> jmlowe: opened issue #3591
[2:28] * The_Bishop (~bishop@2001:470:50b6:0:24dc:9482:b62d:3434) Quit (Ping timeout: 480 seconds)
[2:28] <infernix> so once i cobblerize ubuntu 12.04 on these 6 boxes - best way of deployment? chef?
[2:29] <infernix> or manual to get a feel for how it works?
[2:33] <gregaf> I don't think it's packaged for Ubuntu yet, but I'd probably recommend ceph-deploy for anybody who's starting something new on Ubuntu
[2:33] <gregaf> though if you're already familiar with Chef and are planning to use it that would also certainly work
[2:33] <gregaf> gotta run, though!
[2:34] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:48] * jlogan (~Thunderbi@72.5.59.176) Quit (Read error: Connection reset by peer)
[2:48] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[2:50] * The_Bishop (~bishop@2001:470:50b6:0:24dc:9482:b62d:3434) has joined #ceph
[2:51] * plut0 (~cory@pool-96-236-43-69.albyny.fios.verizon.net) has joined #ceph
[3:05] * mythzib (52e7d4bf@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[3:15] <wer> ceph osd create 1 gives me (22) Invalid argument. Run with no arguments I received 1... and the osd was created.
[3:15] <wer> Now ceph osd tree doesn't have any hostnames or pools associated with osd.1.... Seems like I am missing the other nodes hostname in the crushmap?
[3:16] <wer> When I attempted to edit the crushmap to what I thought it should look like.... crushtool literally exploded with errors and traces.
[3:16] * aliguori (~anthony@cpe-70-113-5-4.austin.res.rr.com) Quit (Quit: Ex-Chat)
[3:17] * Cube1 (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[3:17] <wer> is there a way to use ceph to add the rack info.. and the weight of the host node? Cause it seems every time I use an osd id as an argument... I get Invalid argument
[3:18] <wer> I am guessing things went wrong on step 9 of adding an osd.... http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[3:19] <wer> ceph osd crush set {id} {name} {weight} pool={pool-name} I tried...
[3:20] <wer> ceph osd crush set 1 ceph .2 pool=default --> (22) Invalid argument
[3:21] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:21] <wer> So all my osd's are up and in... and the space occasionally reported in the logs reflects correctly.... but there is no hostname and total weight value in ceph osd tree like I would expect to see.
[3:21] <wer> I am wondering if v.55 adding osd's requires a slightly different methodology.
[3:25] <dmick> I would not have expected ceph osd create 1 to fail; I'd want to know why that happened I think
[3:25] <wer> yeah, it caught me off guard :(
[3:26] <wer> with no args it started at 0... and I just kept adding them that way until I got 0-23 added like I wanted.....
[3:27] <dmick> was osd.1 noted in ceph.conf at the time it failed?
[3:27] <wer> yes
[3:28] <wer> and check out this tree.... http://pastebin.com/EBTsiy0B
[3:29] <wer> The total weight is wrong... and there is no host entry for the second host/node. 0-23 are just there magically.
[3:30] * LeaChim (~LeaChim@b0fafb7d.bb.sky.com) Quit (Remote host closed the connection)
[3:30] <dmick> well the weight matches the nodes on root, right?
[3:31] <dmick> the problem is 0-23 aren't really 'in' the map
[3:31] <wer> yes it does... but all those others are "in" but not part of anything. They are in.
[3:31] <dmick> they're in the cluster, but not in the crushmap
[3:31] <dmick> I mean, not in any useful way
[3:31] <wer> dmick: right. They don't look like they are in the correct map. right.
[3:31] <dmick> so, you said ceph osd crush set 1 ceph .2 pool=default
[3:32] <dmick> what is ceph.2?
[3:32] <wer> I tried yes. but invalid argument....
[3:32] <wer> .2 was the weight.
[3:32] <dmick> oh oh there's a space there
[3:32] <dmick> so what is 'ceph'? the cluster name?
[3:32] <wer> And I was guessing the {name} was the cluster... which I think is ceph..... I dunno though :)
[3:32] <dmick> I don't think that's what name means there
[3:32] <wer> ok good :)
[3:33] <dmick> if you go to http://ceph.com/docs/master/rados/operations/crush-map/#addosd for details
[3:33] <dmick> it shows that's the full name of the osd (i.e. osd.{id})
[3:33] <wer> ok
[3:34] <wer> ceph osd crush set 1 osd.1 .2 pool=default invalid argument....
[3:35] <wer> again... guessing at some stuff.
[3:35] <wer> I think I have to specify pool :)
[3:35] <dmick> yep
[3:35] <dmick> but you did
[3:36] <wer> yeah, so that command yield invalid argument
[3:36] <dmick> sure would be nice if it would tell us why
[3:36] <dmick> you can add debug log config opts to the command
[3:36] <dmick> that might show more info
[3:36] <wer> to the command itself?
[3:36] <dmick> how about, say, ceph --debug_osd 20 <the rest>
[3:36] <dmick> yes
[3:38] <wer> (22) Invalid argument
[3:38] <wer> :)
[3:38] <joao> oh... you won't get anything from the ceph tool with extra debugging
[3:38] <dmick> awesome!
[3:38] <dmick> oh good, joao just worked on this code. help wer out joao :)
[3:38] <joao> those return values come directly from the monitor
[3:38] <dmick> and I'll buy you a burrito next time you're in the states
[3:38] <wer> k. I was just digging in the mon logs....
[3:39] <joao> wer, your problem is that you need to specify a placement according to your crush map
[3:39] <joao> that being a 'root', a 'row', a 'rack', or any combination of them
[3:39] <wer> ok.
[3:40] <joao> say, 'ceph osd crush set 1 osd.1 1.0 root=default'
[3:40] <joao> that would work
[3:40] <joao> dmick, would you mail me a burrito?
[3:40] <dmick> wait...root=default?
[3:40] <joao> that would be swell
[3:40] <joao> dmick, just an example
[3:40] <dmick> docs say "pool=", and say that the root is named 'default' by default
[3:40] <dmick> and imply that pool= is required
[3:40] <joao> docs are confusing
[3:40] <wer> agree
[3:41] <dmick> perhaps we could fix them :)
[3:41] <joao> we should
[3:41] <wer> well that totally worked
[3:41] <joao> dmick, talked about that with someone a while ago, and got the idea that 'pool' is the new 'root', or something like that
[3:42] <wer> it moved the osd up in the tree.... so can I specify host and create a new rack somehow?
[3:42] <joao> unfortunately, I tend to work out the commands I use by reading the code, so I might have missed on some convention where 'pool' was all that was needed
[3:43] <wer> :) nw. As long as I know what goes where.... I don't really care.
[3:43] <wer> though I hear mkcephfs may go away... which means nothing will be looking correct for me....
[3:43] <joao> wer, iirc, if you use something like 'root=default rack=foo', it will create rack foo under default (I think) if it didn't exist before
[3:43] <joao> but I'm not at all used to mess with crushmaps
[3:43] <joao> just the one patch I think
[3:44] <wer> so, why doesn't any of this get taken from the config? Cause I am having a hard time figuring out what ceph.conf is for other than to give the human something to keep up to date?
[3:46] <wer> joao: that totally worked btw...
[3:47] <joao> cool
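[Editor's note: the placement syntax that ended up working, generalized into a hedged sketch; the rack and host names are illustrative.]
    # ceph osd crush set {id} {name} {weight} {bucket-type}={bucket-name} [...]
    ceph osd crush set 1 osd.1 0.2 root=default rack=rack1 host=nodeB
    # per the discussion above, buckets named here (rack1, nodeB) should be created under 'default' if they don't exist yet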
[3:48] <dmick> default crushmaps are made when you make the cluster; when you add osd's later, you have to sorta patch them in. just the way it goes I spose
[3:49] <wer> ok. I will have to diff this new crush against the one I tried to make.... and see why there was such an explosion when attempting to compile...
[3:50] <wer> so when I reweight an osd, it doesn't change the value we added when first putting it in the map... it updates the value all the way to the right on the tree. I don't understand the difference.
[3:50] <joao> well, I'm off to bed
[3:50] <wer> damnit. Well thanks for the help joao .
[3:51] <joao> np
[3:51] <dmick> wer: I think there's 'reweight' and 'change weight' and they're different
[3:51] <dmick> I don't remember how
[3:52] <dmick> I'd google for osd reweight because I think it was discussed on the list
[3:52] <wer> ok. I actually just moved them again with the new values.... and I can see the network traffic. So I consider the one the weight for the map.... and the other for the osd's themselves.... Maybe?
[3:55] <wer> thanks dmick! This is enough progress to call it a day :) Have a good weekend.
[3:55] <dmick> yw; wish I had more authoritative answers but I'm just not a CRUSH expert
[3:56] <dmick> gl, gw to you too
[3:56] <wer> ty
[3:56] * wer is now known as wer_gone
[3:58] <dmick> wer_gone, wer: http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/8904
[4:00] <wer_gone> heh. I don't understand why he has one full osd :)
[4:00] <wer_gone> ty. I will have to read through that thread. k later!
[4:00] <dmick> a key thing is "reweight 1 is ideal". I don't understand it, but there you go. clearly we need better docs
[4:01] <wer_gone> lol yeah. I am learning to speak cephese :)
[4:02] <wer_gone> host= is also scary.... cause it is hard coded in the config..... which bugs me.
[4:03] <dmick> you mean in ceph.conf?
[4:03] <dmick> I'm fairly sure that's used almost exclusively for mkcephfs; if you deploy with ceph-deploy, for example, it's not necessary
[4:04] <wer_gone> seriously? Cause I hate ceph.conf :)
[4:04] <dmick> no I mean host=
[4:04] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[4:04] <dmick> but ceph.conf gets much smaller in other deployment techniques
[4:05] <dmick> and you don't *absolutely* have to have one; it's just convenient lots of times. As I say, mkcephfs requires it, but that's only one way to deploy
[4:05] <dmick> basically the only thing really 'fixed' is the mon addresses. Everything can be discovered dynamically through the mons, pretty much. (well, and keys of course.)
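[Editor's note: a sketch of the kind of stripped-down ceph.conf dmick is describing, with little more than the monitor addresses pinned down; IDs and addresses are placeholders.]
    [mon.a]
        host = mon-a
        mon addr = 192.0.2.11:6789
    [mon.b]
        host = mon-b
        mon addr = 192.0.2.12:6789
    [mon.c]
        host = mon-c
        mon addr = 192.0.2.13:6789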
[4:06] <wer_gone> yeah, I don't know any of them yet.... I see chef... never heard of ceph-deploy.... but I should probably start looking into them once I have a suitable test platform. Getting stability has actually been a little hit or miss..... But things appear stable atm. Right. The mons are making more sense to me now :)
[4:06] <wer_gone> ok. really gone. thanks again!
[4:06] <dmick> night
[4:14] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[4:15] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[4:22] * plut0 (~cory@pool-96-236-43-69.albyny.fios.verizon.net) has left #ceph
[4:24] <jefferai> gregaf: still around?
[4:25] <jefferai> dmick: joao: gregaf: I'm trying to do the upgrade with the checked-over methodology from here: http://paste.kde.org/621956/raw/
[4:25] <jefferai> only problem -- when I mark an osd down, it's being marked up again automatically
[4:25] <jefferai> is that normal? do I need to stop the daemon before I mark it out?
[4:25] <jefferai> er, down
[4:25] <jefferai> not out
[4:26] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:26] <jefferai> on some blogs I saw suggestions that you in fact need to mark the osd out, not down
[4:26] <jefferai> but greg indicated that that wasn't what I wanted to do
[4:28] <dmick> no you don't want out
[4:28] <jefferai> is there a "noup" command?
[4:28] <dmick> good question
[4:29] <jefferai> well
[4:29] <jefferai> it appears to work
[4:29] <jefferai> at least, doesn't error when I run that
[4:29] <dmick> no, that's no indication sadly
[4:30] <dmick> however
[4:30] <dmick> ceph osd set <noout|noin|nodown|noup>
[4:30] <dmick> is in the usage
[4:30] <dmick> so you may have something there
[4:30] <jefferai> seems to work
[4:30] <jefferai> cool
[4:31] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:32] <dmick> ceph osd dump should show you flags
[4:32] <dmick> to confirm
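[Editor's note: the rolling-restart flag dance pieced together above, as a hedged sketch; osd.0 is just an example.]
    ceph osd set noup            # stop restarted OSDs from being marked up automatically
    ceph osd down 0              # mark osd.0 down; it stays 'in', so no rebalancing starts
    # ... upgrade / restart the ceph-osd daemon ...
    ceph osd unset noup          # let the daemon be marked up again when it reports in
    ceph osd dump                # the flags line should confirm noup was set/cleared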
[4:33] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[4:42] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:44] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[4:49] <jefferai> um
[4:49] <jefferai> stupid question
[4:49] <jefferai> what's the opposite of "ceph osd down 0"?
[4:49] <jefferai> because it ain't "ceph osd up 0"
[4:50] <jefferai> since that -- at least on 0.55 -- is telling me that up is an unknown command
[4:51] <dmick> yet another good question
[4:52] <dmick> guessing, I'd say try resetting the noup
[4:52] <dmick> and if that doesn't work, kick it with osd in
[4:52] <jefferai> hm, ok
[4:53] <dmick> (I suspect the reason there's no "up" is that it's just supposed to happen whenever the daemon comes back, if it was a network interruption, crash, etc., whatever)
[4:54] <jefferai> I see
[4:54] <jefferai> that seems odd though
[4:54] <jefferai> since the documentation shows you how you can mark it down
[4:54] * jlogan (~Thunderbi@72.5.59.176) Quit (Ping timeout: 480 seconds)
[4:55] * wubo (~wubo3@nat-ind-inet.jhuapl.edu) Quit (Quit: Leaving)
[5:00] <dmick> it's not what you'd call obvious or symmetric, no
[5:00] <dmick> did it work?
[5:07] <jefferai> well
[5:07] <jefferai> one of my osds keeps going down
[5:07] <jefferai> seems like it goes down every time it tries to sync
[5:08] <dmick> ok but the 'reset noup' worked, so this is a second problem?
[5:08] <jefferai> actually, all of my 0.55 osds keep dying
[5:09] * stp (~stp@dslb-084-056-063-242.pools.arcor-ip.net) has joined #ceph
[5:09] <dmick> logs?
[5:11] * Ryan_Lane (~Adium@216.38.130.167) Quit (Quit: Leaving.)
[5:14] <jefferai> dmick: ok, so some of the crashes looked like this: http://paste.kde.org/622016/raw
[5:14] <jefferai> actually
[5:14] <jefferai> skip the /raw at the end
[5:14] <jefferai> or it's hard to read
[5:15] <dmick> that looks like the end of a coredump
[5:16] * scalability-junk (~stp@dslb-084-056-037-229.pools.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[5:16] <jefferai> the rest are like this:
[5:16] <jefferai> http://paste.kde.org/622022
[5:17] <dmick> look for Caught signal
[5:17] <dmick> that should be near the stack trace
[5:17] <dmick> and may have the assert
[5:17] <dmick> ah, there's one
[5:18] <dmick> osd/PG.cc: 4533: FAILED assert(oinfo.last_epoch_started == info.last_epoch_started) is key
[5:18] <dmick> unfortunately I can't tell you what it means
[5:18] <jefferai> me neither
[5:18] <jefferai> I can revert back to 0.54
[5:19] <jefferai> I keep wanting to get off of 0.54 because I get tens of gigabytes of logs in a couple days from the "wrong node!" issue
[5:19] <jefferai> can you put that in whatever the appropriate queue is for reported issues? :-)
[5:19] <jefferai> any other debugging you can think of for me to gather?
[5:20] <dmick> well I can look for it in http://tracker.newdream.net/projects/ceph
[5:20] <dmick> as can you :)
[5:20] <dmick> nothing for that signature
[5:20] <jefferai> hah, I forgot about that
[5:20] <jefferai> I'll file a bug
[5:21] <dmick> interestingly, that code has changed recently
[5:21] <jefferai> post-0.55 you mean?
[5:22] <jefferai> perhaps it was a found bug
[5:22] <dmick> hm
[5:22] <dmick> do you speak git?
[5:22] <jefferai> yes
[5:22] <dmick> 0756052cff542ab02d653b40c37a645b395f31b3
[5:23] <jefferai> um
[5:23] <jefferai> let me fetch the ceph repo :-)
[5:23] <dmick> can just look on github if you wish
[5:24] <dmick> seems likely to be your problem, however
[5:24] <jefferai> which is more canonical -- github or ceph.newdream.net/git ?
[5:24] <dmick> github, for sure
[5:24] <dmick> but I meant just browse there; if you clone clone from github as well
[5:25] <jefferai> yeah, that *does* look quite like my problem
[5:25] <jefferai> by "replica running a version without the info.last_epoch_started" patch I assume it's meaning "since my entire cluster isn't yet 0.55..."
[5:26] <jefferai> so it's not getting some info it needs from 0.54
[5:26] <jefferai> and hence can never finish replicating
[5:26] <jefferai> perhaps
[5:26] <jefferai> but if that's true that suggests that it's basically impossible to update to 0.55 in a sane manner, without shutting down the entire cluster
[5:27] <dmick> it appears as though the patch Sam refers to is e2c4e2f63bfc138f1d122012ddf9731c8fa04758 from Nov 21
[5:27] <dmick> so yeah, that's almost certainly not in 0.54
[5:28] <jefferai> yeah
[5:28] <jefferai> okay, so, back to 0.54 with me until 0.56
[5:28] <jefferai> slash bobtail
[5:28] <jefferai> whichever it ends up being (or both)
[5:28] <dmick> there'll be a 0.55.1 soon
[5:28] <jefferai> ah, yeah?
[5:28] <dmick> a few other semi-serious things
[5:29] <jefferai> I see
[5:29] <dmick> sending email about this issue now for a decision; I think we want 0756052cff542ab02d653b40c37a645b395f31b3 put into the next point release of 55
[5:29] <jefferai> that would make sense
[5:29] <dmick> thanks for the debug; sorry about the problems
[5:29] <jefferai> I filed a bug http://tracker.newdream.net/issues/3592
[5:29] <jefferai> (before you found the issue)
[5:29] <jefferai> dmick: it's okay
[5:29] <dmick> ok, I'll mention that in the email, thanks
[5:29] <jefferai> I've been finding that ceph has bugs, but it is quite good about keeping data
[5:29] <jefferai> that's the most important thing
[5:30] <jefferai> and, it's only right to try to help debug open source projects that you are making use of
[5:30] <jefferai> :-)
[5:30] <dmick> I commend your attitude good sir
[5:33] <jefferai> :-)
[5:37] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[5:39] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:51] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.89 [Firefox 17.0.1/20121128204232])
[5:52] * gaveen (~gaveen@112.134.112.126) has joined #ceph
[5:58] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:28] * davidz2 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[6:29] * davidz3 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[6:34] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Ping timeout: 480 seconds)
[6:34] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Ping timeout: 480 seconds)
[6:37] <via> is it normal, if you have one active out of three total mds's, for the active one to be replaced by another active one every once in a while?
[6:38] <via> and then for it to take a few minutes for it to show there being 3 mds's
[6:40] <via> oh, this time it didn't come back, and produced this: https://pastee.org/xnxtg
[6:53] * deepsa (~deepsa@122.172.210.248) has joined #ceph
[7:07] <via> and this is what's happening when it switches: https://pastee.org/fuqzd
[7:11] * DonaHolmberg (~Adium@76.91.162.233) has joined #ceph
[7:11] * DonaHolmberg (~Adium@76.91.162.233) has left #ceph
[7:30] * absynth (~absynth@irc.absynth.de) Quit (Ping timeout: 480 seconds)
[7:34] * KindOne (KindOne@h183.63.186.173.dynamic.ip.windstream.net) has joined #ceph
[7:57] * davidz2 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[7:59] * dmick (~dmick@2607:f298:a:607:2899:7889:ba56:8cb3) Quit (Quit: Leaving.)
[8:00] * davidz3 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[8:15] * absynth (~absynth@irc.absynth.de) has joined #ceph
[9:20] * tnt (~tnt@207.171-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[9:41] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:42] * deepsa_ (~deepsa@122.166.160.212) has joined #ceph
[9:45] * deepsa (~deepsa@122.172.210.248) Quit (Ping timeout: 480 seconds)
[9:45] * deepsa_ is now known as deepsa
[9:48] * deepsa_ (~deepsa@115.241.70.58) has joined #ceph
[9:53] * deepsa (~deepsa@122.166.160.212) Quit (Ping timeout: 480 seconds)
[9:53] * deepsa_ is now known as deepsa
[9:57] * absynth (~absynth@irc.absynth.de) Quit (Read error: Connection reset by peer)
[9:57] * absynth (~absynth@irc.absynth.de) has joined #ceph
[10:01] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[10:09] * deepsa (~deepsa@115.241.70.58) Quit (Ping timeout: 480 seconds)
[10:09] * gaveen (~gaveen@112.134.112.126) Quit (Remote host closed the connection)
[10:10] * deepsa (~deepsa@122.172.167.125) has joined #ceph
[10:11] * gaveen (~gaveen@112.134.112.126) has joined #ceph
[10:20] * maxiz (~pfliu@221.216.135.23) has joined #ceph
[10:34] * BManojlovic (~steki@242-174-222-85.adsl.verat.net) has joined #ceph
[10:46] * deepsa (~deepsa@122.172.167.125) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[10:51] * deepsa (~deepsa@122.172.167.125) has joined #ceph
[11:10] * KindOne (KindOne@h183.63.186.173.dynamic.ip.windstream.net) Quit (Remote host closed the connection)
[11:18] * KindOne (KindOne@h183.63.186.173.dynamic.ip.windstream.net) has joined #ceph
[11:28] * LeaChim (~LeaChim@b0fafb7d.bb.sky.com) has joined #ceph
[11:31] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[11:31] * alexxy[home] (~alexxy@2001:470:1f14:106::2) Quit (Read error: Connection reset by peer)
[11:35] * BillK (~billk@58-7-230-51.dyn.iinet.net.au) has joined #ceph
[11:44] * LeaChim (~LeaChim@b0fafb7d.bb.sky.com) Quit (Remote host closed the connection)
[11:44] * LeaChim (~LeaChim@b0fafb7d.bb.sky.com) has joined #ceph
[12:08] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[12:11] * guigouz (~guigouz@177.33.216.27) Quit (Quit: Computer has gone to sleep.)
[12:20] * gaveen (~gaveen@112.134.112.126) Quit (Remote host closed the connection)
[12:26] * gaveen (~gaveen@112.134.112.126) has joined #ceph
[12:33] * BillK (~billk@58-7-230-51.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[12:42] * BillK (~billk@203-59-91-90.dyn.iinet.net.au) has joined #ceph
[13:00] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:01] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[13:02] * Machske (~bram@d5152D87C.static.telenet.be) Quit (Quit: Ik ga weg)
[13:29] * absynth (~absynth@irc.absynth.de) Quit (Read error: Connection reset by peer)
[13:42] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[14:19] * BillK (~billk@203-59-91-90.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[14:24] * loicd (~loic@mon75-3-81-56-38-131.fbx.proxad.net) has joined #ceph
[14:27] * BillK (~billk@203-59-15-230.dyn.iinet.net.au) has joined #ceph
[14:54] * gaveen (~gaveen@112.134.112.126) Quit (Ping timeout: 480 seconds)
[15:00] * maxiz (~pfliu@221.216.135.23) Quit (Ping timeout: 480 seconds)
[15:03] * gaveen (~gaveen@112.134.113.227) has joined #ceph
[15:05] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[15:17] <jtang> are there plans to build an set of snmp mibs for ceph monitoring ?
[15:18] <jtang> a set
[15:23] * tnt (~tnt@207.171-67-87.adsl-dyn.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[15:25] * tnt (~tnt@24.31-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[15:33] * loicd (~loic@mon75-3-81-56-38-131.fbx.proxad.net) Quit (Quit: Leaving.)
[15:57] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[16:09] * BillK (~billk@203-59-15-230.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:23] * BillK (~billk@58-7-146-104.dyn.iinet.net.au) has joined #ceph
[16:28] <joao> jtang, do you think that would be more useful than using the ceph tool?
[16:28] * BillK (~billk@58-7-146-104.dyn.iinet.net.au) Quit (Read error: Connection reset by peer)
[16:45] * Zethrok (~martin@95.154.26.34) has joined #ceph
[16:46] * BillK (~billk@58-7-69-122.dyn.iinet.net.au) has joined #ceph
[16:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[16:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[17:04] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[17:29] * loicd (~loic@mon75-3-81-56-38-131.fbx.proxad.net) has joined #ceph
[17:42] <denken> hmm. will noatime cause any problems for mon data?
[17:43] <joao> none I can think of
[17:46] <joao> denken, why would you ask? ran into some kind of trouble with the monitor?
[17:47] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[17:52] * loicd (~loic@mon75-3-81-56-38-131.fbx.proxad.net) Quit (Quit: Leaving.)
[18:00] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[18:02] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[18:05] <denken> nah just setting up a new one... im a fan of noatime, usually
[18:09] <tnt> who isn't :)
[18:55] <jtang> joao: if i can get my nagios install to poll a snmp backend from ceph then yeah, probably
[18:56] <jtang> otherwise i'd probably just look at writing a munin plugin or something like that
[18:56] <jtang> though i do wonder how many people out there that run a large enough system would care?
[18:57] <jtang> would they write their own custom nagios plugins etc...?
[18:57] <jmlowe> joao: snmp makes zenoss integration very easy
[18:58] <jmlowe> joao: you would just need to plug in the oid's for things you want to monitor
[19:01] <jtang> man its cold outside
[19:01] * tnt (~tnt@24.31-67-87.adsl-dyn.isp.belgacom.be) has left #ceph
[19:05] <joao> jmlowe, jtang, I'll talk with the guys next week to see what they think about that
[19:05] * joao takes a note in a random piece of paper
[19:06] * flash (~user1@host86-164-217-4.range86-164.btcentralplus.com) has joined #ceph
[19:06] <jmlowe> joao: it's pretty low on my priority list
[19:06] * flash is now known as Guest843
[19:12] <via> so, i still have some fairly regular mds crashes that thankfully don't seem to be taking down the cluster
[19:13] <via> https://pastee.org/94e7h
[19:15] <joao> via, slang is probably the one to talk to about the mds
[19:15] <via> ok
[19:15] <via> do you know what debugging options would be ideal?
[19:15] <joao> or just drop an email on ml or file a bug on the tracker
[19:15] <via> i can produce a lot of backgraces
[19:15] <via> traces even
[19:15] <via> ok
[19:15] <joao> via, 'debug mds = 20' might be a good option
[19:16] <joao> and 'debug ms = 10'
[19:16] <joao> oh, maybe 'debug ms = 10' might not be needed
[19:16] <joao> but couldn't hurt
[19:16] <joao> :)
[19:16] <via> yeah
[19:17] <via> i've just run into issues with running out of space if i leave logging on
[19:17] <via> at high levels
[19:17] <joao> oh, yeah, that might happen
[19:17] <joao> don't use ms = 10 then
[19:17] <joao> try debug ms = 1
[19:17] <joao> should generate just the required amount of info; 10 would totally spam your logs
[19:18] <joao> and as for the mds, I'm not familiar with the code so I don't know how you could best leverage the debug levels, hence the 20
[19:18] <joao> but if they grow out of proportion, try 10 instead of 20
[19:19] <joao> should generate some useful stuff
[19:19] <via> okay, i've set all mds's to 20, and next time it crashes i'll do something about it
[19:19] <joao> cool :)
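[Editor's note: the debug levels suggested above, written as a ceph.conf snippet; whether a daemon restart or injectargs is needed to pick them up is left as an assumption here.]
    [mds]
        debug mds = 20    ; very verbose; drop to 10 if the logs grow too fast
        debug ms = 1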
[19:20] <joao> jmlowe, I've been wondering what would be the best way to monitor the monitors
[19:20] <joao> last week or something, someone said that they might hook their monitoring tools onto the monitor's admin socket
[19:21] <joao> and now that you guys were wondering about snmp, it got me thinking if that might actually be something worth putting some effort into
[19:21] <joao> then again, might require way too much effort vs the obtained rewards
[19:35] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:54] <slang> via: did you upgrade to 0.55?
[19:57] <via> slang: this cluster was from scratch with .55
[19:58] <via> with the patch from wip-3495 merged in
[19:58] <via> and cephx disabled since that crashes everything in about 5 minutes
[20:03] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:04] <joao> via, which components crash with cephx enabled?
[20:06] <via> all three
[20:06] <via> somewhat randomly
[20:06] <joao> monitors too?
[20:06] <via> yeah
[20:07] <via> but, i can't try to reproduce those now
[20:07] <joao> do you have a backtrace of the crash?
[20:07] <via> i'm just running sans cephx until i finish backing up everything to cephfs
[20:07] <via> once its done i'll happily debug that
[20:07] <joao> oh, can you recall if the monitor was crashing on a mds-related function?
[20:07] <via> its possible
[20:08] <via> i can look in the backlog of the channel, i might have posted it
[20:08] <joao> http://tracker.newdream.net/issues/3495
[20:09] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Connection reset by peer)
[20:09] <joao> oh
[20:09] <joao> yeah, sorry
[20:09] <joao> <via> with the patch from wip-3495 merged in
[20:09] <joao> that should have fixed the monitor side
[20:09] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:09] <joao> but if you happen to trigger anything else let me know
[20:10] <via> joao: this is the closest i can find from earlier this week, but its probably not helpful: http://pastebin.com/xtpFvQNt
[20:10] <via> yeah, that patch made ceph the most stable i've ever seen it
[20:11] <via> at least it's all working, the mds's crashing in rotation is mainly just a large annoyance
[20:11] <via> oddly though, it was doing fine for about 30 hours of rsyncing, and now it does it about every 20-30 minutes
[20:11] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:11] <via> as if the more metadata there is, the more it wants to crash
[20:14] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:14] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Connection reset by peer)
[20:14] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[20:14] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:17] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[20:22] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Connection reset by peer)
[20:22] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:24] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:24] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:26] <slang> via: it happens when an mds appears to be laggy and tries to reconnect
[20:26] <slang> via: that's what triggers the bug
[20:27] <via> sounds right
[20:27] <slang> via: you can probably avoid hitting the bug (until we get it fixed) by explicitly setting the config parameter mds_beacon_grace to 30 or 40
[20:27] <slang> (it defaults to 15)
[20:27] <via> okay, i'll try that
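[Editor's note: one way to set the suggested option; putting it in [global] so both the mds daemons and the monitors see it is the editor's assumption.]
    [global]
        mds beacon grace = 40    ; default is 15; a laggy mds gets more slack before being replaced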
[20:28] <slang> via: I think the bug might be related to recent pipe changes
[20:29] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[20:29] <via> yeah, it makes sense, cause i often see it think the mds is laggy before something dies
[20:29] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:29] <via> anyway, change is made, i'll report back when something happens
[20:31] <slang> via: cool, I've updated the bug with what I think is happening - GregF or Sage will probably have to take a look
[20:33] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[20:33] <via> thank you
[20:40] * Guest843 (~user1@host86-164-217-4.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[20:48] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[21:00] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:01] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[21:02] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:02] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[21:11] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has joined #ceph
[21:12] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has left #ceph
[21:16] <via> slang: this happens occasionally with an mds, when it doesn't die but stops being active. about 95% of the way through the paste: https://pastee.org/2c97h
[21:37] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[23:25] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[23:30] * jlogan1 (~Thunderbi@2600:c00:3010:1:14a3:ca45:3136:669a) has joined #ceph
[23:35] * mr_pants (52e7d4bf@ircip2.mibbit.com) has joined #ceph
[23:36] <mr_pants> hey
[23:36] * mr_pants is now known as mythzib
[23:36] <mythzib> :)
[23:37] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[23:40] <mythzib> is it possible to limit write speed on the cluster?
[23:43] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[23:43] <mythzib> or set priority between write/read
[23:44] * BillK (~billk@58-7-69-122.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[23:47] <mythzib> nhm: can you do read AND write performance test at the same time?
[23:47] <mythzib> because during write, read is horrible :(
[23:47] <mythzib> (rbd)
[23:54] * guigouz (~guigouz@177.33.216.27) has joined #ceph
[23:55] * BillK (~billk@124-148-122-172.dyn.iinet.net.au) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.