#ceph IRC Log


IRC Log for 2013-03-06

Timestamps are in GMT/BST.

[0:00] <Qten> mmmm replicas...
[0:00] <Vjarjadian> plus what if you want more VMs...you'd need to raid 1 all of them, when if you have a WAN that can hack it... 1 ceph cluster would work out of the box for new VMs
[0:01] <Qten> true, however you have no resil against the cluster going ass overhead
[0:01] <Vjarjadian> you also have no backup in case of supernova
[0:01] <Qten> so true
[0:01] <Qten> or zombie attack
[0:01] <MrNPP> nhm: ok, so at bs=1M, 35.1 MB/s, and at bs=4M it was 34.9MB/s
[0:02] <Vjarjadian> cluster goes down permanently, use one of your regular backups....
[0:02] <Qten> regular backups! bahh
[0:04] * dmner (~tra26@tux64-10.cs.drexel.edu) Quit (Quit: leaving)
[0:04] <Qten> thats the idea of having multiple clusters in different zones
[0:05] <Vjarjadian> ceph's design is for no single point of failure... but each to his own
[0:06] <Qten> true true
[0:09] * rinkusk (~Thunderbi@CPE00259c467789-CM00222d6c26a5.cpe.net.cable.rogers.com) Quit (Quit: rinkusk)
[0:11] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[0:20] <nhm> MrNPP: hrm, not much change then. Is this kernel rbd or qemu/kvm?
[0:20] <nhm> MrNPP: you may see better performance with multiple VMs
[0:20] <nhm> aggregate performance that is.
[0:20] <MrNPP> qemu
[0:21] <nhm> Ok. I'm working on kernel RBD testing right now but will be moving on to QEMU/KVM soon.
[0:21] <MrNPP> ok, i'll do some additional testing
[0:40] * rinkusk (~Thunderbi@ has joined #ceph
[0:43] * Cube (~Cube@ Quit (Quit: Leaving.)
[0:43] * Cube (~Cube@ has joined #ceph
[0:44] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[0:46] * Cube (~Cube@ Quit ()
[0:47] * miroslav (~miroslav@ Quit (Quit: Leaving.)
[0:47] * nick5 (~nick@ Quit (Remote host closed the connection)
[0:47] * jjgalvez (~jjgalvez@ Quit (Quit: Leaving.)
[0:47] * nick5 (~nick@ has joined #ceph
[0:51] * rinkusk (~Thunderbi@ Quit (Quit: rinkusk)
[0:51] * rinkusk (~Thunderbi@ has joined #ceph
[0:51] * rinkusk (~Thunderbi@ Quit ()
[0:52] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:52] * rinkusk (~Thunderbi@ has joined #ceph
[0:54] * Philip__ (~Philip@hnvr-4d07bfa9.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[0:57] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:57] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:07] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:17] * xmltok (~xmltok@pool101.bizrate.com) Quit (Ping timeout: 480 seconds)
[1:17] * sagelap (~sage@2600:1010:b11f:163c:492:e6c9:797:ff5b) Quit (Quit: Leaving.)
[1:18] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[1:21] * rinkusk (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[1:28] * xiaoxi (~xiaoxiche@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[1:29] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) Quit (Quit: noahmehl)
[1:29] * jlogan1 (~Thunderbi@2600:c00:3010:1:a073:d626:ddc8:2b2b) Quit (Ping timeout: 480 seconds)
[1:31] * rinkusk (~Thunderbi@ has joined #ceph
[1:33] * jlogan1 (~Thunderbi@ has joined #ceph
[1:33] * rturk is now known as rturk-away
[1:34] * vata (~vata@2607:fad8:4:6:c5f3:e0ab:5c04:da5a) Quit (Quit: Leaving.)
[1:38] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[1:38] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Remote host closed the connection)
[1:39] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[1:47] <nhm> ooh, bus error on the mon
[1:48] * rinkusk (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[1:49] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[1:52] <ShaunR> nhm: don't buy any of those old refurb cards... no linux support :(
[1:52] <nhm> ShaunR: oooh, good to know!
[1:52] <nhm> ShaunR: Maybe that's why they are so cheap.
[1:52] <ShaunR> the newer cards have support, but they are around $500
[1:53] <ShaunR> well, that and they are EOL
[1:53] <ShaunR> I didnt care so much about the EOL
[1:54] * andrew_ (~andrew@ip68-231-33-29.ph.ph.cox.net) Quit (Quit: andrew_)
[1:55] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[1:57] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[1:58] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[2:00] * bstillwell (~bryan@ Quit (Quit: leaving)
[2:00] * esammy (~esamuels@host-2-103-102-78.as13285.net) Quit (Quit: esammy)
[2:01] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has left #ceph
[2:05] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:15] * alram (~alram@ Quit (Quit: leaving)
[2:17] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[2:18] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) Quit ()
[2:25] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[2:31] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[2:32] * sagelap (~sage@2600:1010:b02f:6a48:f0b2:ae73:3082:73bc) has joined #ceph
[2:34] * rinkusk (~Thunderbi@CPEbc14015a7093-CMbc14015a7090.cpe.net.cable.rogers.com) has joined #ceph
[2:42] <jmlowe> ugh, looks like I'm stuck active+remapped+backfilling again
[2:49] * Kioob (~kioob@2a01:e35:2432:58a0:21a:92ff:fe90:42c5) has joined #ceph
[2:49] <Kioob> Hi
[2:50] <Kioob> after changing ruleset on an empty pool, I have some PG which are in «active+remapped» state
[2:50] <Kioob> but that pool was empty...
[2:50] <Kioob> how can I see what's happening ?
[2:52] <dmick> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#troubleshooting-pg-errors may help
[2:53] * esammy (~esamuels@host-2-103-102-78.as13285.net) has joined #ceph
[2:54] <Kioob> mm maybe dmick. If there was data in my pool, PG would be in «stale+active+remapped» ?
[2:55] * esammy (~esamuels@host-2-103-102-78.as13285.net) Quit ()
[2:58] * sagelap (~sage@2600:1010:b02f:6a48:f0b2:ae73:3082:73bc) Quit (Ping timeout: 480 seconds)
[3:04] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[3:06] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Bye!)
[3:07] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[3:09] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit ()
[3:12] <infernix> huh, there isn't any minimum size to objects is there?
[3:12] <elder> 0
[3:12] <dmick> Kioob: don't understand the question
[3:12] <infernix> doh
[3:13] <nhm> elder: that'd be fun to benchmark
[3:13] <infernix> can rbd images have custom metadata?
[3:13] <elder> What does that mean?
[3:13] <dmick> not in any kind of legislated way
[3:14] <dmick> RADOS objects can hold attributes, and the RADOS objects that make up rbd images can too, but there's no useful interface to deal with them at the rbd-image level
[3:14] <infernix> not from librbd?
[3:14] <infernix> i'm already in python
[3:14] <dmick> no
[3:14] <infernix> oh but i see, many rados objects is one rbd image
[3:14] <infernix> that kinda defeats the whole point then. ok
[3:14] * rinkusk (~Thunderbi@CPEbc14015a7093-CMbc14015a7090.cpe.net.cable.rogers.com) Quit (Ping timeout: 480 seconds)
[3:14] <dmick> right
[3:15] <infernix> any limitations on rbd image names then? length or character wise?
[3:16] <jmlowe> any ideas on how to gently kick my cluster in the ass, enough to get recovery going again but not enough to knock it over
[3:16] <jmlowe> I'm going 15 minutes between recovery operations again
[3:16] <nhm> elder: btw, figured out my rbd problem. It looks like fio does not handle doing IO directly on rbd block devices well. Josh thinks maybe the non 512b block size.
[3:16] <dmick> infernix: you will have a hard time if you put @ or / in their names, since the cmdline tool parses those as separators
[3:17] * rinkusk (~Thunderbi@CPEbc14015a7093-CMbc14015a7090.cpe.net.cable.rogers.com) has joined #ceph
[3:17] <dmick> and I couldn't guarantee you they're 100% clean otherwise. (clearly one ought to avoid whitespace)
[3:17] <elder> I don't get it nhm
[3:17] <Kioob> dmick: in fact I didn't see any "error", so I was asking if my "active+remapped" PG are "stale+active+remapped"
[3:17] <infernix> will stick with a-zA-Z0-9_-
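
A minimal sketch of the conservative naming policy discussed above, assuming the whitelist is letters, digits, "_" and "-". The names SAFE_IMAGE_NAME and is_safe_image_name are hypothetical, and this is a local policy check, not something rbd is known to enforce. Note that a character range written A-z instead of A-Z would also admit a few punctuation characters.

```python
import re

# Conservative whitelist taken from the chat: letters, digits, "_" and "-".
# This is a local policy, not a rule enforced by rbd itself.
SAFE_IMAGE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_image_name(name):
    """Return True if name uses only the conservative character set."""
    return SAFE_IMAGE_NAME.fullmatch(name) is not None
```

Names containing "@" or "/" (the separators dmick mentions) or whitespace are rejected, as is the empty string.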
[3:17] <Kioob> but... I don't think it's the case
[3:18] <elder> nhm I guess I wasn't up to speed on the problems you were having with rbd.
[3:18] <Kioob> with a "ceph pg 7.61 query", I don't see any problem... but that PG still marked as "remapped"
[3:18] <dmick> AFAIK, active+remapped means that the PGs were on one set of OSDs, but now are temporarily on a different set (because something is wrong with one or more of the original set)
[3:18] <Kioob> yes I change the ruleset for that pool. But it's a pool without any data :S
[3:19] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[3:20] <nhm> elder: very high read throughput claims with little to no disk or network activity despite direct IO. The big numbers Josh thinks may be due to it getting confused about block size.
[3:20] <elder> Hmm. I'd be interested in looking at that. Not tonight though...
[3:21] <elder> It doesn't really make sense. Does the read data get validated?
[3:22] <Kioob> for example, I obtain that : http://pastebin.com/hKa8V6MP. This PG is marked as "active+remapped", but it have "empty": 1 ; so there is no data to "remap"
[3:23] <Kioob> oh. maybe the problem is that there is only one OSD in the "up" section
[3:24] * sagelap (~sage@2600:1010:b000:6b8d:987e:1e0b:bb44:ce9e) has joined #ceph
[3:25] <Kioob> yes, on valid PG there is multiple OSD in the "up" section
[3:28] <dmick> Kioob: yeah, things aren't going to be normal until that OSD comes back
[3:28] <dmick> (1, I guess)
[3:28] <dmick> I'm a little surprised that it's "remapped", but, that's definitely keeping the PG unhealthy
[3:29] <Kioob> I'm trying to add some OSD reserved for a specific pool (SSD)
[3:30] <Kioob> so, 42 is a new OSD, reserved for SSD
[3:30] <Kioob> and 1 is a "normal" OSD
[3:35] <Kioob> so I suppose that my ruleset is wrong... I take it from the doc, but maybe I didn't understand. I have :
[3:35] <Kioob> step take rootSSD
[3:35] <Kioob> step chooseleaf firstn 0 type net
[3:36] <Kioob> and my «tree» is http://pastebin.com/Z0AbfLF5
[3:37] <Kioob> so, data from osd.42 should be replicate on osd.40 or osd.41
[3:43] <Kioob> mmm
[3:43] <infernix> Cloning with 8 processes took 24.387226s at 839.783909822 MB/s
[3:43] <infernix> oooh yeah
[3:43] <infernix> rbdbackup.py is a fact
[3:44] * infernix rejoices
[3:44] <infernix> that is some faaast backup
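
As a quick arithmetic check on the figures infernix quoted, multiplying the elapsed time by the reported rate gives the amount of data moved. The log does not state the image size, so the interpretation below (that "MB" means 1024 * 1024 bytes and the result is about 20 GiB) is an assumption.

```python
# Figures quoted in the "Cloning with 8 processes" line of the log.
elapsed_s = 24.387226
rate_mb_per_s = 839.783909822

# Total data moved is rate multiplied by elapsed time.
total_mb = elapsed_s * rate_mb_per_s
total_gib = total_mb / 1024  # assumes "MB" here means 1024 * 1024 bytes

print(round(total_mb), round(total_gib, 2))
```

The product works out to roughly 20480 MB.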
[3:45] <Kioob> So, first time I tried to move from "full platter" to "full ssd" in one step. Probably not a good idea.
[3:45] <Kioob> But if I use the "ssd primary" example, it works : HEALTH_OK and all PG are "active+clean"
[3:46] <Kioob> Then, from "ssd primary" I switch to "full ssd", then cluster stay in HEALTH_OK, but I have again 111 PG in "active+remapped" state
[3:48] <Kioob> and after a timeout (5 minutes ?) cluster switches to HEALTH_WARN, of course
[3:51] * jlogan1 (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[3:55] <Kioob> great. It's just a CRUSH rule problem : if I use "choose .... osd", it works
[3:55] <Kioob> but "chooseleaf ... host" doesn't work
[3:56] <Kioob> maybe 2 hosts is not enough ?
[3:57] * rinkusk (~Thunderbi@CPEbc14015a7093-CMbc14015a7090.cpe.net.cable.rogers.com) Quit (Ping timeout: 480 seconds)
[4:01] * dpippenger (~riven@ Quit (Remote host closed the connection)
[4:11] * nwat1 (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:12] <ShaunR> infernix: is this something you wrote?
[4:17] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[4:17] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:19] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Connection reset by peer)
[4:19] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[4:19] * ChanServ sets mode +o scuttlemonkey
[4:22] <Kioob> so dmick : the problem is "just" that I really misunderstood the "chooseleaf" behavior. Now it's ok. :)
[4:22] <dmick> cool
[4:24] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[4:26] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[4:30] * sagelap (~sage@2600:1010:b000:6b8d:987e:1e0b:bb44:ce9e) Quit (Ping timeout: 480 seconds)
[4:35] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[4:39] * sagelap (~sage@2600:1010:b000:6b8d:d0ae:fa6d:1146:f6bc) has joined #ceph
[4:43] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[4:53] <nwat1> any devs around?
[4:53] <dmick> I play one on tv
[4:53] <nwat1> heh...
[4:53] <dmick> sup?
[4:53] <nwat1> I'm seeing a bunch of assertion failures in gitbuilder in the bufferptr tests
[4:54] <dmick> ok..
[4:54] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[4:54] <nwat1> is there any known errors?
[4:54] <dmick> beats me, but would surprise me
[4:55] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[4:55] <dmick> the tests definitely uncovered some errors, so if there's some way the tests are running against older buffer.cc, that might be an issue
[4:56] <nwat1> ok. i'm only seeing them in a couple gitbuilders, but it looks like they are showing up in other people's branches…
[4:56] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:57] <nwat1> thanks. i was little concerned i was in over my head :)
[5:00] <infernix> ShaunR: yea
[5:00] <infernix> not quite doing what i want yet though
[5:02] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[5:03] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[5:03] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Read error: Connection reset by peer)
[5:04] <dmick> nwat1: I'm seeing encode/decode errors
[5:04] <dmick> 3 clone_info /tmp/typ-dbOkn2N0u /tmp/typ-qYB2RI9mG differ: byte 23, line 1 **** clone_info test 1 binary reencode check failed ****
[5:05] <dmick> I see the bufferlist error too but that doesn't seem to be fatal; not sure why
[5:05] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[5:05] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[5:10] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[5:10] <infernix> "Note that data cannot be :type:unicode - Librbd does not know how to deal with characters wider than a :c:type:char."
[5:11] <nwat1> ahh, yeh i see them locally too… i wonder how well ceph.git bisects
[5:11] <dmick> infernix: yeah
[5:12] <infernix> dmick: i'm confused
[5:12] <infernix> i'm reading data from files and disks and trying to write it to ceph but am not getting the same md5sums
[5:13] * dmick waits for detail of investigation
[5:13] <infernix> no failures to write or anything but i don't know how to correctly handle this type stuff
[5:13] <dmick> I'm sure that means in the parameter interface, not the data path
[5:13] <dmick> data is just a bag of bytes
[5:15] <infernix> so http://bpaste.net/show/3g4h92dLIlO20bqa0X5L/
[5:16] <infernix> i open a file or disk with mysource = open(options.source,'r'). I read 64MB off of it with mydata = mysource.read(mybuffer). and i write that to myrbdimage.write(mydata,myoffset)
[5:16] <infernix> but md5 checksums on that first 64mb don't match
[5:17] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[5:17] <dmick> the first thing I'd do is simplify, and binary-compare the chunks
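
One way to follow dmick's suggestion to simplify and binary-compare the chunks, as a rough sketch. The helper names first_difference and read_prefix are hypothetical. Files are opened in binary mode here so the bytes are compared exactly as stored.

```python
def first_difference(a, b):
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One buffer is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def read_prefix(path, nbytes):
    """Read up to nbytes from the start of path, in binary mode."""
    with open(path, "rb") as f:
        return f.read(nbytes)
```

Comparing read_prefix(source, n) against the same number of bytes exported from the image would show where the two first diverge, if they do.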
[5:17] <infernix> i had it working fine with a benchmark tool that generated random data, but now that i'm reading files it goes haywire somewhere
[5:20] <dmick> you're getting the other md5 externally, then?
[5:21] <dmick> (and why are you printing the digest before the update?)
[5:21] <infernix> i'm just printing tons of digests now
[5:21] <infernix> to figure out wtf is going on
[5:22] <infernix> dd if=4gbtestfile bs=67108864 count=1|md5sum yields f809caba93df95581f71760c71313b2f
[5:22] <infernix> but the first 64mb through read yields d41d8cd98f00b204e9800998ecf8427e
[5:22] <infernix> and here's the kicker
[5:22] <infernix> dd if=thisisanerror | md5sum also yields d41d8cd98f00b204e9800998ecf8427e
[5:23] <infernix> so d41d8cd98f00b204e9800998ecf8427e means zero bytes
[5:23] <infernix> oh wait
[5:23] <infernix> i got that one
[5:24] <infernix> hah, that was an order error
[5:24] <dmick> I concur that the md5sum of nothing is d4...
[5:26] <dmick> an 'order error' you say
[5:27] <infernix> printed the md5sum before the md5.update
[5:27] <dmick> o
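
The ordering mistake described above can be reproduced with hashlib alone: a digest taken before any update() call is the MD5 of zero bytes, which is the d41d8cd9... value seen in the chat. This is a minimal illustration, not infernix's actual script.

```python
import hashlib

EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"  # MD5 of zero bytes

h = hashlib.md5()
before = h.hexdigest()   # nothing hashed yet, so this equals EMPTY_MD5
h.update(b"some chunk of data")
after = h.hexdigest()    # now reflects the chunk that was fed in

print(before == EMPTY_MD5, after != EMPTY_MD5)
```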
[5:27] <infernix> but the next 64mb doesn't match
[5:28] <dmick> are you sure the data is going to the rbd image?
[5:28] <dmick> I'd try a very small chunk, and then rbd export it to a second file and compare that way
[5:28] <dmick> you may visually spot the error quickly
[5:30] <infernix> dmick: no, the rbd device is empty
[5:31] <infernix> first 64MB has nothing, e.g. dd bs=64M | strings yields nada
[5:31] <infernix> but that seems only to be the case when i'm reading it on another host with kernel rbd
[5:31] <dmick> dd? strings?
[5:31] <dmick> oh kernel rbd
[5:31] <dmick> but I'd still be using something like xxd | less
[5:32] <dmick> who knows wtf strings will show you
[5:32] <infernix> alll 0s
[5:32] <dmick> but I don't know what you mean by "only to be the case when reading on another host"; you mean there's some situation where it's different?
[5:32] <infernix> so, now i know it reads, but it doesn't write
[5:32] <infernix> i'm writing to an rbd image on host A with librbd
[5:33] <infernix> reading it on host B with kernel rbd (rbd unmap; rbd map blah)
[5:33] <dmick> you could read it on host A with rbd export <image> - | xxd | less
[5:34] <dmick> so the writes are going wrong
[5:34] <dmick> the thing is...your program isn't reading back the image either
[5:34] <infernix> export looks like its valid
[5:35] <infernix> yep. lets try that on host B too
[5:35] <dmick> it's only summing what it read from the file
[5:35] <infernix> also valid
[5:35] <infernix> so kernel rbd is off
[5:35] <infernix> ok.. let me see then
[5:36] <dmick> you could export | md5sum to show the data is all correct; don't know what's up with krbd
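
dmick's point that the program only sums what it read from the file suggests reading the image back and hashing that too. The sketch below assumes an object with size() and read(offset, length) methods, which is the shape the Python rbd.Image class is believed to have. FakeImage is a hypothetical in-memory stand-in so the example runs without a cluster.

```python
import hashlib


def md5_of_image(image, chunk_size=4 * 1024 * 1024):
    """Hash an image by reading it back in chunks.

    `image` is anything with size() and read(offset, length); the
    rbd.Image class in the Python bindings is assumed to fit this shape.
    """
    digest = hashlib.md5()
    offset = 0
    total = image.size()
    while offset < total:
        length = min(chunk_size, total - offset)
        data = image.read(offset, length)
        if not data:
            break  # short read: stop rather than loop forever
        digest.update(data)
        offset += len(data)
    return digest.hexdigest()


class FakeImage:
    """Hypothetical in-memory stand-in so the sketch runs without a cluster."""

    def __init__(self, payload):
        self._payload = payload

    def size(self):
        return len(self._payload)

    def read(self, offset, length):
        return self._payload[offset:offset + length]
```

Comparing this digest with one computed over the source file would verify the write path through librbd, independent of kernel rbd.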
[5:36] <infernix> yep, its running
[5:37] <dmick> only the one image? no chance of mapping the wrong one/wrong pool?
[5:43] <infernix> hah!
[5:43] <infernix> it works fine, it's kernel rbd thats throwing me off
[5:43] <infernix> that's returning all zeroes and is how i was verifying
[5:43] <dmick> that shouldn't be happening, clearly
[5:44] <infernix> Linux ha-lvs-002 3.8.0-ceph #1 SMP Mon Feb 18 16:06:35 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
[5:45] <infernix> now the real test, a 20gb lvm disk
[5:51] <infernix> \o/
[5:51] <infernix> victory
[5:51] <dmick> so I'm interested in what happened to your krbd, but cool
[5:52] <infernix> kernel rbd is phase 2 for me
[5:52] * nwat1 (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[5:52] <infernix> phase 1 is dumping 40-60TB of backups into ceph daily
[5:52] <infernix> will look at kernel later. at least now i can finish up and deploy this tomorrow
[5:53] <infernix> thanks :> need sleep now
[6:01] <dmick> yw. I need to go as well. gnite
[6:01] * dmick (~dmick@2607:f298:a:607:5416:89:4816:3512) has left #ceph
[6:17] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:31] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[7:36] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[7:37] * Kioob (~kioob@2a01:e35:2432:58a0:21a:92ff:fe90:42c5) Quit (Quit: Leaving.)
[7:47] * esammy (~esamuels@host-2-103-102-78.as13285.net) has joined #ceph
[8:08] * andrew_ (~andrew@ip68-231-33-29.ph.ph.cox.net) has joined #ceph
[8:08] * andrew_ (~andrew@ip68-231-33-29.ph.ph.cox.net) has left #ceph
[8:12] * Philip__ (~Philip@hnvr-4d07bfa9.pool.mediaWays.net) has joined #ceph
[8:40] * Philip__ (~Philip@hnvr-4d07bfa9.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[8:40] * triaklodis (~triaklodi@ has joined #ceph
[8:44] * wer (~wer@wer.youfarted.net) Quit (Ping timeout: 480 seconds)
[8:49] * leseb (~leseb@ has joined #ceph
[8:53] * gerard_dethier (~Thunderbi@ has joined #ceph
[8:54] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[9:02] * eschnou (~eschnou@ has joined #ceph
[9:11] * triaklodis_ (~triaklodi@ has joined #ceph
[9:11] * triaklodis (~triaklodi@ Quit (Read error: Connection reset by peer)
[9:19] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[9:23] <Kdecherf> oh god, a corrupted osd
[9:47] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: The early bird may get the worm, but the second mouse gets the cheese)
[9:47] * l0nk (~alex@ has joined #ceph
[9:51] * ScOut3R (~ScOut3R@ has joined #ceph
[9:52] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[9:53] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[9:56] * triaklodis_ (~triaklodi@ Quit (Quit: Konversation terminated!)
[9:59] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[10:00] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[10:02] * dosaboy (~user1@host86-164-227-220.range86-164.btcentralplus.com) has joined #ceph
[10:06] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[10:06] * xiaoxi (~xiaoxiche@jfdmzpr06-ext.jf.intel.com) Quit (Ping timeout: 480 seconds)
[10:08] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Remote host closed the connection)
[10:10] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Quit: Bye)
[10:10] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[10:16] * tryggvil_ (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[10:20] * tryggvil_ (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[10:22] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Ping timeout: 480 seconds)
[10:32] <joelio> Pretty chuffed, showed my boss the fruits of the past few weeks testing and he's keen as mustard. We'll be building a production cluster over the next few weeks
[10:32] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[10:37] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[10:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[10:44] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[10:44] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[10:48] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[10:57] * tryggvil_ (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[10:59] * LeaChim (~LeaChim@b0faa0c8.bb.sky.com) has joined #ceph
[11:02] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Ping timeout: 480 seconds)
[11:08] * tryggvil_ (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[11:09] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[11:14] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[11:25] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[11:26] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[11:42] * tryggvil_ (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[11:48] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Ping timeout: 480 seconds)
[11:48] * tryggvil_ is now known as tryggvil
[11:51] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[12:00] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[12:00] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[12:12] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[12:12] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[12:13] * yanzheng (~zhyan@ has joined #ceph
[12:19] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[12:20] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[12:23] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[12:25] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Remote host closed the connection)
[12:27] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[12:27] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[12:39] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[12:39] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[12:50] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[12:50] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[12:55] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[12:55] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[12:56] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[13:01] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Quit: tryggvil)
[13:04] * rinkusk (~Thunderbi@CPEbc14015a7093-CMbc14015a7090.cpe.net.cable.rogers.com) has joined #ceph
[13:07] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[13:11] <nhm> good morning #ceph
[13:13] <infernix> nhm: morning
[13:13] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[13:22] * lurbs (user@uber.geek.nz) Quit (Ping timeout: 480 seconds)
[13:24] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[13:25] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[13:25] <jtang> morning!
[13:27] * lurbs (~lurbs@uber.geek.nz) has joined #ceph
[13:31] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Read error: Connection reset by peer)
[13:31] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[13:38] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:38] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[13:38] * The_Bishop (~bishop@2001:470:50b6:0:5026:e6f0:2177:24d5) Quit (Ping timeout: 480 seconds)
[13:47] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Quit: tryggvil)
[13:47] * The_Bishop (~bishop@2001:470:50b6:0:830:3aa8:f78b:205) has joined #ceph
[13:50] * yanzheng (~zhyan@ has joined #ceph
[13:53] <joelio> afternoon!
[13:55] * rinkusk (~Thunderbi@CPEbc14015a7093-CMbc14015a7090.cpe.net.cable.rogers.com) Quit (Ping timeout: 480 seconds)
[13:56] * lurbs_ (user@uber.geek.nz) has joined #ceph
[13:56] * lurbs (~lurbs@uber.geek.nz) Quit (Read error: Connection reset by peer)
[13:58] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[13:59] * junglebells (~junglebel@0001b1b9.user.oftc.net) Quit (Quit: *sigh* off to go babysit a client)
[14:03] * lurbs_ (user@uber.geek.nz) Quit (Read error: Connection reset by peer)
[14:03] * lurbs (user@ has joined #ceph
[14:08] * Morg (b2f95a11@ircip4.mibbit.com) has joined #ceph
[14:09] <scuttlemonkey> joelio: thats great (re: cluster)
[14:09] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[14:12] * BillK (~BillK@58-7-246-238.dyn.iinet.net.au) has joined #ceph
[14:14] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[14:15] * markbby (~Adium@ has joined #ceph
[14:15] * rinkusk (~Thunderbi@ has joined #ceph
[14:16] * lofejndif (~lsqavnbok@thoreau.guilhem.org) has joined #ceph
[14:18] <Kioob`Taff> question, I have a cosmetic bug with Ceph : since I compile my own bobtail packages for Debian, "/etc/init.d/ceph status" doesn't show versions of binaries
[14:18] <Kioob`Taff> any idea where can be the problem ?
[14:19] <scuttlemonkey> you're looking for the ceph version?
[14:19] <scuttlemonkey> a la 'ceph -v' ?
[14:19] <scuttlemonkey> or the status of the proc?
[14:21] <Kioob`Taff> # ceph -v
[14:21] <Kioob`Taff> ceph version ()
[14:21] <Kioob`Taff> *
[14:21] <Kioob`Taff> :=
[14:21] <Kioob`Taff> :)
[14:21] <Kioob`Taff> so, "ceph -v" doesn't work yes
[14:24] <scuttlemonkey> hah, interesting
[14:25] <scuttlemonkey> how did you snag the source?
[14:25] <scuttlemonkey> just clone it from the git repo?
[14:25] <Kioob`Taff> git checkout origin/bobtail
[14:26] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:26] <Kioob`Taff> then «dch -i», «dpkg-source -b ceph», and finally «cowbuilder --build ceph*dsc»
[14:34] * lurbs_ (user@uber.geek.nz) has joined #ceph
[14:34] * lurbs (user@ Quit (Read error: Connection reset by peer)
[14:35] <scuttlemonkey> Kioob`Taff: can you email that to ceph-devel?
[14:35] <Kioob`Taff> of course
[14:35] <scuttlemonkey> I am not an accomplished deb pkg builder...Sage or someone will probably have a better answer than I
[14:44] <fghaas> Kioob`Taff: running dpkg-buildpackage straight from the git checkout didn't do it?
[14:44] * mjevans (~mje@ Quit (Ping timeout: 480 seconds)
[14:44] <Kioob`Taff> I don't know fghaas, I don't have environment for that
[14:45] <Kioob`Taff> that system is sid, and I need to compile for squeeze
[14:45] <Kioob`Taff> (with non standard kernel)
[14:45] <fghaas> Kioob`Taff: should be simple to set up in a squeeze chroot. debootstrap, apt-get install build-essential, then run dpkg-buildpackage
[14:46] <fghaas> which will tell you about all unsatisfied build deps
[14:46] <fghaas> then you install those, and off you go
[14:46] <fghaas> that's how I recall building
[14:46] <fghaas> and the non-default kernel doesn't matter, there isn't any kernel code in the ceph git repo
[14:46] <Kioob`Taff> for syncfs
[14:47] <Kioob`Taff> for syncfs support, it matter
[14:47] <fghaas> iirc installing linux-headers for your desired kernel in the chroot should do the trick
[14:48] <fghaas> and the squeeze-backports kernel (3.2.0) should include syncfs, so install the kernel image and headers for that in the chroot, and you ought to be fine
[14:48] <fghaas> or, spin up squeeze in a vm
[14:49] <Kioob`Taff> yes... but «cowbuilder» already give me a squeeze chroot, which works... I don't really want to duplicate that for each builds
[14:52] * leseb__ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[14:53] * rinkusk (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[14:55] * netsrob (~thorsten@office-at.first-colo.net) has joined #ceph
[14:55] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:56] <netsrob> how many server nodes are the bare minimum for redundant use of ceph?
[14:57] <janos> with a default crush map, 2
[14:57] <Kioob`Taff> 3, if you want redundant monitors
[14:57] * tryggvil (~tryggvil@Router086.inet1.messe.de) has joined #ceph
[14:57] <jmlowe> you want redundant monitors
[14:57] <janos> right. the baseline example is 1 but you want 3 at least
[14:57] <janos> (monitors)
[14:58] <netsrob> ok, i have a 2 node setup atm and have the problem that when one host is down (with osd, mon etc.) the other node is not usable anymore
[14:59] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[14:59] <fghaas> netsrob: entirely expected
[14:59] <fghaas> that other node no longer has quorum
[14:59] <fghaas> you want a third node that's running at least a mon
[15:00] <netsrob> ok, then i'll try with 3 mon setup, thx :)
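
The quorum behaviour fghaas describes comes down to needing a strict majority of monitors. A small sketch of that arithmetic (the function names are hypothetical) shows why two monitors tolerate no failures while three tolerate one:

```python
def quorum_size(num_monitors):
    """Smallest strict majority of num_monitors."""
    return num_monitors // 2 + 1


def has_quorum(num_monitors, num_alive):
    """True if the surviving monitors still form a majority."""
    return num_alive >= quorum_size(num_monitors)


for mons in (1, 2, 3, 5):
    tolerated = mons - quorum_size(mons)
    print(mons, "monitors tolerate", tolerated, "failure(s)")
```

With two monitors the majority is two, so losing either node leaves the survivor without quorum, which matches what netsrob observed.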
[15:00] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[15:00] <jmlowe> think of yourself as a mob boss and the mon is your mob accountant, you may have the account numbers but only the accountant knows which accounts are at which banks, if you or somebody else whacks your accountant you can't access your money
[15:01] <jmlowe> also your accountants may lie to you to save themselves so it's best to just trust the majority
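The quorum rule described above (a strict majority of monitors must be reachable) comes down to simple arithmetic. A minimal Python sketch, assuming only that "more than half of the mons must be up"; the function names are made up for illustration and are not part of any Ceph API:

```python
def quorum_size(num_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if num_mons < 1:
        raise ValueError("need at least one monitor")
    return num_mons // 2 + 1


def tolerated_failures(num_mons: int) -> int:
    """How many monitors can be down while a majority is still up."""
    return num_mons - quorum_size(num_mons)


# Columns: monitors, quorum size, failures tolerated
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With two mons the majority is still two, so losing either node loses quorum, which matches what netsrob saw; three mons is the smallest setup that survives one failure.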
[15:02] <jmlowe> <- has been watching the dark knight on cable a lot recently
[15:03] <netsrob> xD
[15:03] * BillK (~BillK@58-7-246-238.dyn.iinet.net.au) Quit (Quit: Leaving)
[15:03] * diegows (~diegows@ has joined #ceph
[15:03] <netsrob> nice comparison ^^
[15:06] <Kioob`Taff> bobtail have "long term support", like argonaut, right ?
[15:07] <scuttlemonkey> for some definition of "long term"
[15:07] <scuttlemonkey> all of the major (named) releases will get some amount of backports and the like
[15:07] <Kioob`Taff> ok, great
[15:08] <scuttlemonkey> but I don't think they have nailed down how long that will last
[15:08] <scuttlemonkey> for any given release
[15:08] <Kioob`Taff> for production usage, I suppose I should stay on bobtail until the next major release
[15:09] * fghaas works jmlowe's suggestion into his next talk. Now the billion-room Ceph hotel is run by the mob
[15:10] <jmlowe> :)
[15:11] * tryggvil (~tryggvil@Router086.inet1.messe.de) Quit (Quit: tryggvil)
[15:15] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:21] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[15:28] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[15:30] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 19.0/20130215130331])
[15:52] * rinkusk (~Thunderbi@CPE00259c467789-CM00222d6c26a5.cpe.net.cable.rogers.com) has joined #ceph
[15:59] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:59] * Teduardo (~DW-10297@dhcp92.cmh.ee.net) has joined #ceph
[15:59] <Teduardo> Howdy, is anyone from newdream or dreamhost in here, PM me please =)
[16:02] <jmlowe> that's -l 64k on mkfs.xfs ?
[16:02] <jmlowe> or -l size=64k rather
[16:02] * PerlStalker (~PerlStalk@ has joined #ceph
[16:04] * markl_ (~mark@tpsit.com) has joined #ceph
[16:04] * markl_ (~mark@tpsit.com) Quit ()
[16:06] * verwilst (~verwilst@ has joined #ceph
[16:06] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[16:08] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:21] * drokita (~drokita@ has joined #ceph
[16:23] * The_Bishop (~bishop@2001:470:50b6:0:830:3aa8:f78b:205) Quit (Ping timeout: 480 seconds)
[16:23] * lurbs (user@uber.geek.nz) has joined #ceph
[16:24] * lurbs_ (user@uber.geek.nz) Quit (Read error: Network is unreachable)
[16:31] * Philip__ (~Philip@hnvr-4d079fe4.pool.mediaWays.net) has joined #ceph
[16:33] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[16:33] * Morg (b2f95a11@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[16:36] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[16:38] * verwilst (~verwilst@ Quit (Ping timeout: 480 seconds)
[16:45] * vata (~vata@2607:fad8:4:6:7c6f:a43a:3c1f:85ab) has joined #ceph
[16:47] <sstan> jmlowe : I did that (suse SLES) ... and it makes the filesystem read-only .. did anyone have similar issues?
[16:49] <joelio> hmm, slow request from OSD's - all in sync (time wise)
[16:49] <joelio> any idea?
[16:49] <joelio> 2013-03-06 15:49:24.039388 osd.306 [WRN] slow request 7401.895548 seconds old, received at 2013-03-06 13:46:02.143764: osd_op(client.4179.0:7109802 rb.0.1051.238e1f29.000000006073 [write 0~4194304] 3.c83311ee) v4 currently reached pg
[16:49] <joelio> etc..
[16:50] <joelio> https://gist.github.com/anonymous/5100271
[16:51] <fghaas> joelio: fairly obvious, right?
[16:51] <fghaas> osd.306 is slow, fix that bugger :)
[16:51] <fghaas> probably a slow (or dying) disk
[16:52] <joelio> I'm seeing no utilization at all
[16:52] <jmlowe> I've had trouble with that too
[16:52] <fghaas> well _all_ of your stuck PGs have a least one replica on osd.306, *and* it's being marked with slow requests
[16:53] <joelio> I'll mark as down and see what happens
[16:53] <fghaas> check dmesg, SMART logs, kernel logs
[16:53] <jmlowe> an osd that is being marked with slow requests that from what I can tell isn't slow
[16:53] <fghaas> just killing the 306 ceph-osd daemon ought to do it
[16:54] <fghaas> and then if you immediately want to rebalance, mark it out
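Spotting which OSD the slow requests pile up on can be done by tallying the warning lines. A rough Python sketch, assuming the log format matches the single line joelio pasted above (other Ceph versions may word it differently); `slow_requests_by_osd` is a hypothetical helper, not a Ceph tool:

```python
import re
from collections import Counter

# Pattern derived from the one pasted log line; an assumption, not a spec.
SLOW_RE = re.compile(r"\b(osd\.\d+) \[WRN\] slow request ([\d.]+) seconds old")


def slow_requests_by_osd(lines):
    """Return (count per OSD, oldest request age in seconds per OSD)."""
    counts = Counter()
    worst = {}
    for line in lines:
        match = SLOW_RE.search(line)
        if not match:
            continue
        osd, age = match.group(1), float(match.group(2))
        counts[osd] += 1
        worst[osd] = max(age, worst.get(osd, 0.0))
    return counts, worst


sample = [
    "2013-03-06 15:49:24.039388 osd.306 [WRN] slow request 7401.895548 "
    "seconds old, received at 2013-03-06 13:46:02.143764: "
    "osd_op(client.4179.0:7109802 rb.0.1051.238e1f29.000000006073 "
    "[write 0~4194304] 3.c83311ee) v4 currently reached pg",
]
counts, worst = slow_requests_by_osd(sample)
print(counts.most_common(), worst)
```

If one OSD dominates the tally, that host is the place to check dmesg, SMART and memory first.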
[16:55] * leseb__ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Remote host closed the connection)
[16:59] * gerard_dethier (~Thunderbi@ Quit (Quit: gerard_dethier)
[17:00] <joelio> ahh, I think I've maxxed memory on the 306 host and it's started to swap
[17:00] <joelio> (only 4Gb in that box)
[17:01] <joelio> fixing itself now :)
[17:01] <fghaas> see?
[17:01] <joelio> +1
[17:01] <fghaas> do I get to say "toldjaso"? :)
[17:02] <joelio> you can bask in your awesomeness for a little longer :)
[17:02] <fghaas> joelio: sysctl -w vm.swappiness=10
[17:02] <joelio> swapoff !
[17:02] <joelio> :)
[17:02] <joelio> swap suck
[17:02] <fghaas> that may be a little extreme :)
[17:02] <joelio> yea, maybe
[17:04] <madkiss> I wonder how many people in fact *did* disable swap because they have no SSDs and have ran into scenarios where a system was swapping and thus unusable
[17:07] <rinkusk> In our test setup (3 nodes with 3 OSDs each) one OSD was taking up so much memory, it filled up the entire 8GB swap. I will be testing by turning swap off.
[17:10] <fghaas> rinkusk: again, the vm.swappiness sysctl is very helpful here
[17:10] <fghaas> setting it to 0 will have the system swap _only_ to avoid an OOM condition
[17:10] <fghaas> which means you won't be swapping unless absolutely necessary
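To see what a box is currently set to before changing it, the value can be read from procfs. A small Python sketch, assuming a Linux host where `/proc/sys/vm/swappiness` exists; both helper names are invented for this example, and the `sysctl.conf` line is just the persistent form of the `sysctl -w` command quoted above:

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def sysctl_conf_line(value):
    """Format a line suitable for /etc/sysctl.conf (value must be >= 0)."""
    if value < 0:
        raise ValueError("swappiness cannot be negative")
    return f"vm.swappiness = {value}"


# Example use on a Linux host:
#   print(read_swappiness())
#   print(sysctl_conf_line(10))
```

This only inspects and formats; applying the setting still needs `sysctl -w` (or a reboot with the line in `/etc/sysctl.conf`) as root.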
[17:11] <fghaas> of course, as madkiss says, if your swap is on ssd you likely won't care
[17:11] <fghaas> I run a 6-node ceph cluster on this 4GB laptop of mine because I just don't feel it when the system is swapping
[17:12] <rinkusk> fghaas: Hmm.. I did not know about swappiness. I will give this a shot. But I have been reading some mails on forum where people are trying to debug some memory leak issue. So my problem might be related to that.
[17:13] <fghaas> sure, if you have something that leaks memory like crazy, *that's* the problem you want to solve :)
[17:13] <rinkusk> :) indeed
[17:18] <jtang> i dont suppose anyone from inktank/ceph are interested in some experiences with backblaze pods and ceph + el6 ?
[17:20] <scuttlemonkey> jtang: you built a ceph cluster on the backblaze?
[17:20] * rinkusk (~Thunderbi@CPE00259c467789-CM00222d6c26a5.cpe.net.cable.rogers.com) Quit (Quit: rinkusk)
[17:20] <jtang> scuttlemonkey: we tried, and failed
[17:20] <scuttlemonkey> oh?
[17:20] <scuttlemonkey> what went wrong?
[17:20] <jtang> we've come to the conclusion that backblaze pods are bad for ceph
[17:21] <jtang> firstly pods are just crap for data that needs to be in use, they are single points of failure
[17:21] <madkiss> btw. has anyone ever tested CephFS on top of ZFS?
[17:21] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[17:21] <jtang> the controller cards need to be picked carefully; with the ones that they recommend/ship if you buy them pre-constructed (like we did), disks crap out/disappear randomly
[17:21] <madkiss> At this years GUUG Frühjahrsfachgespraech, I learned that ZFS on Linux is not quite as bad[™] as I had originally thought, so that may be worth a try
[17:22] <scuttlemonkey> jtang: yikes
[17:22] <madkiss> Of course, I mean Ceph, not CephFS, sorry.
[17:22] <scuttlemonkey> madkiss: I talked to someone in France at Cloud Expo Europe who had built a cluster on ZFS
[17:22] <jtang> and a fully loaded pod has 135tb of space, so putting data on/off it over a 1gb link kinda sucks, unless you manage to source a decent mobo that will let you have enough space for a 10gb ethernet or ib card
[17:23] <scuttlemonkey> they had some performance problems iirc (but I don't remember the exact nature of them) and switched to xfs
[17:23] <jtang> since there are 45disks inside the pod, dealing with failures and a down'd pod is just unpleasant
[17:23] <jtang> we have only 2pods
[17:23] <joelio> why use it on Linux anyway when there's FreeBSD, Illumos, SmartOS... Solaris
[17:23] <joelio> if you really need ZFS that is
[17:23] <madkiss> scuttlemonkey: ouch. thanks for the heads-up; in fact, running OSDs on top of ZFS would only ever make sense to get better performance.
[17:24] <jtang> scuttlemonkey: we tried btrfs on sl6 (rhel6), that didn't really work out
[17:24] * joelio kisses XFS in the face
[17:24] <jtang> we also tried 45 single osds per pod, that didnt work out either as el6 doesnt have syncfs() in libc
[17:24] * lofejndif (~lsqavnbok@82VAAAFWS.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[17:24] <scuttlemonkey> jtang: yeah, the pods looked....awkward to get at
[17:24] <jtang> ubuntu wasnt a runner as it didnt pick up all 45disks
[17:25] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Remote host closed the connection)
[17:25] <jtang> scuttlemonkey: we suspect that the expander cards are the problem
[17:25] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[17:26] <jtang> the chipsets on the sata cards that we got have a bug with timing out disks on the expanders
[17:26] <jtang> currently we're running two storage pods with i think ~40tb per pod, with no expander cards, it seems stable enough
[17:26] <scuttlemonkey> jtang: doh
[17:27] <scuttlemonkey> I have heard horror stories about backblaze setups
[17:27] <jtang> we're going to re-provision the disks in other machines and slowly build up the system
[17:27] <scuttlemonkey> but never really looked into myself
[17:27] <jtang> scuttlemonkey: well, we tried and failed at using ceph on the pods
[17:27] <scuttlemonkey> biggest complaint I have heard though is from folks who had to use it
[17:27] <jtang> i suspect if you had about 10 of them, and 10gb/infiniband it would work better
[17:27] * jlogan1 (~Thunderbi@2600:c00:3010:1:d431:8b06:8e11:1828) has joined #ceph
[17:27] <scuttlemonkey> 12 screws to replace 1 drive == sad panda
[17:28] <jtang> yea i forgot to mention that
[17:28] <jtang> they arent nice to service
[17:28] <jtang> the rails on them are horrid
[17:28] <jtang> they dont slide out very well
[17:28] <scuttlemonkey> hehe
[17:28] <jtang> and they are heavy
[17:29] <jtang> if btrfs was more stable (i.e. software raid worked) then it would be a runner
[17:29] <jtang> well btrfs on el6 anyway
[17:29] <scuttlemonkey> I bet, 45 drives stacked in a 4u case would tend to be a bit on the heavy side
[17:29] * dmner (~tra26@tux64-13.cs.drexel.edu) has joined #ceph
[17:29] * rinkusk (~Thunderbi@CPE00259c467789-CM00222d6c26a5.cpe.net.cable.rogers.com) has joined #ceph
[17:29] <scuttlemonkey> yeah, I'm really hoping btrfs is the future
[17:29] <scuttlemonkey> but it has some work to get there
[17:30] * joelio pulls a host's IEC leads and watches as VMs stay up and magic rebalancing awesomeness takes place
[17:30] <janos> urg, didn't think about how heavy a pod would be
[17:30] <joelio> me likey
[17:30] <janos> cripes
[17:30] <scuttlemonkey> joelio: awesome
[17:30] <jtang> well, if anyone has the bright idea of using pods in a ceph deployment
[17:30] <jtang> my only advice is don't bother
[17:31] <scuttlemonkey> jtang: haha, good to know
[17:31] <jtang> unless you are feeling really lucky, or can get *lots* of them
[17:31] <jtang> so you treat an entire node as a failure unit
[17:31] <jtang> which not a lot of people can afford to do if there is 135tb of space on each pod
[17:31] <scuttlemonkey> yeah, thankfully that's what CRUSH excels at
[17:31] <scuttlemonkey> but yes...most folks will probably just get a bunch of cheap dell 1u's and go nuts
[17:31] <janos> jtang - that sounds like a nutty SPoF
[17:32] <scuttlemonkey> (if they want cheap that it)
[17:32] <jtang> we're looking at two deployments here at our site
[17:32] <scuttlemonkey> er..."if they want cheap, that is"
[17:32] <jtang> one is in testing
[17:32] <jtang> and the other one is kinda in limbo at the moment
[17:32] <jtang> but it looks likely we'll pick ceph for it
[17:32] * netsrob (~thorsten@office-at.first-colo.net) Quit (Quit: Lost terminal)
[17:32] <jtang> *sigh* i need to re-hire*
[17:34] <jtang> also we're deploying with ansible (not puppet/chef or juju)
[17:34] <jtang> and we're having good experiences with it
[17:35] <fghaas> scuttlemonkey: trouble with btrfs is it has been "the future" for a while
[17:35] <scuttlemonkey> jtang: yeah? You like it? Ansible is on my list of stuff to play with
[17:35] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:35] <scuttlemonkey> fghaas; I know :(
[17:35] <scuttlemonkey> "surely OP will deliver......."
[17:35] <jtang> scuttlemonkey: yea its funky, i even cooked up a ceph_facts module for it
[17:35] <jtang> so i can collect information from a running cluster to *do stuff*
[17:36] <scuttlemonkey> awesome
[17:36] <jtang> admittedly the module is pretty basic and isnt smart
[17:36] <jtang> but its good enough for what im testing and doing
[17:36] <scuttlemonkey> don't suppose you would be willing to write up a blog post on it for ceph.com?
[17:36] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[17:36] <scuttlemonkey> the orchestration stuff has been getting a decent amount of traffic, so it seems people like to read about it
[17:37] <jtang> the orchestration side of things is pretty adhoc right now
[17:37] <scuttlemonkey> or, if nothing else, just email me a quick overview of what you did and I can use it as a place to start when I play :)
[17:37] <scuttlemonkey> patrick at inktank
[17:37] <jtang> its far from polished on our end, but its good enough for our documentation and "possible need for redeployment" if needed
[17:37] <scuttlemonkey> nice
[17:37] <jtang> so we dont spend hours figuring things out, its mostly automated
[17:37] <scuttlemonkey> hehe
[17:38] <jtang> with the exception of manually making a ceph.conf file
[17:38] <jtang> scuttlemonkey: i reckon i could do
[17:38] <scuttlemonkey> that would be awesome, thanks
[17:38] <jtang> we have two sets of playbooks for doing ceph right now
[17:38] <jtang> my own one which i test in a vagrant vm
[17:39] <jtang> and the one thats in use by the other team here
[17:39] <jtang> scuttlemonkey: i just need to check with the guys if its okay to do so
[17:40] <jtang> the university/projects hate stuff being published with out them checking it, especially if its paid for out of a research grant
[17:40] <scuttlemonkey> yeah, I can see that
[17:40] <jtang> university/funding agencies
[17:41] <scuttlemonkey> no rush...I'm heading to Germany soon for World Hosting Days and I'm pretty much redlined until after that anyway
[17:41] <jtang> heh, which part of germany?
[17:41] <scuttlemonkey> it's in Rust
[17:42] <scuttlemonkey> but I'll end up spending a few days in Frankfurt as well
[17:42] <jtang> europa park?
[17:42] <scuttlemonkey> yeah, right near there
[17:42] * lx0 is now known as lxo
[17:43] <scuttlemonkey> actually I think the hotel might be in the middle of it
[17:43] <jtang> patrick at inktank.com is your email address?
[17:44] <jluis> scuttlemonkey, you're gonna have a blast :p
[17:45] <jtang> frankfurt is cool
[17:45] <scuttlemonkey> jtang: yep, that's me
[17:45] <jtang> the dates dont seem to co-incide with anything that i might be at to be near there
[17:45] <jtang> :P
[17:45] <jluis> I've only been to the airport really, but Europa-Park is amazing
[17:45] <scuttlemonkey> jluis: yeah?
[17:46] <jluis> last year's WHD kicked ass
[17:46] <jtang> its st. patricks day around then in ireland
[17:46] <jtang> which is tempting for me to go to germany around then
[17:46] * Cube1 (~Cube@ has joined #ceph
[17:46] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Remote host closed the connection)
[17:46] <scuttlemonkey> hehe
[17:46] * jtang cant deal with people
[17:46] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[17:46] <jtang> the wife is german, so i visit germany often
[17:47] <scuttlemonkey> nice
[17:47] <scuttlemonkey> yeah, looks like someone from Inktank is gonna be on the keynote panel and I'm gonna help give part of an openstack talk
[17:47] <scuttlemonkey> (ceph/openstack)
[17:47] <jtang> there is talk of an openstack set of playbooks for ansible
[17:48] <jtang> sparked off from the ceph_facts module ;)
[17:48] <scuttlemonkey> I hope they work famously
[17:48] <scuttlemonkey> the juju charms can be a little...problematic
[17:49] <jtang> i reckon with a week or two worth of work its possible to write some pretty polished playbooks with ansible to deploy and manage a ceph cluster
[17:49] <jtang> the template generation would be the hardest to do
[17:50] <jtang> and i reckon running adhoc tasks (with a playbook) its possible to add an osd and "update" the ceph.conf file everywhere
[17:50] <jtang> you'd probably need to poll the ceph_facts module for information on the cluster though
[17:51] <jtang> juju looks like ubuntu just needed their own thing, redhat has cobbler/spacewalk
[17:51] <jtang> so i guess ubuntu made juju to compete with redhat
[17:51] <scuttlemonkey> could be
[17:51] <scuttlemonkey> although juju is nice
[17:52] <scuttlemonkey> it's just young
[17:52] <jtang> debian/ubuntu pre-seed still looks painful to me
[17:52] <jtang> and juju seems rough
[17:52] <jtang> then again i havent really looked at it
[17:52] <jtang> at least not in detail
[17:57] <jtang> the maas stuff from ubuntu looks interesting though
[17:57] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[17:58] <scuttlemonkey> yeah, especially with the rumblings of virtualized version of maas
[17:58] <joelio> Is there no drive on the puppet front? I'll be writing some modules if not, by the looks :)
[17:59] <jtang> joelio: someone started
[17:59] <jtang> but it never really took off
[17:59] <jtang> i had a stab at it, then i migrated our devel/test environment to ceph
[17:59] <jtang> i mean ansible
[17:59] * jtang is deploying a single node, 2osd test system in a vm
[18:00] <joelio> n/p we're heavy (and happy) puppet people, so better the devil you know.. I suppose
[18:00] <jtang> heh yea, if puppet works then stick with it
[18:01] <joelio> oh, it does, save us masses of time with projects
[18:01] <jtang> i just found the learning curve to be too high for new devs joining the team
[18:01] <joelio> erm, ok.. it's just like, well, ruby
[18:01] <joelio> with a DSL on top
[18:01] <jtang> and modules generally suck
[18:01] <joelio> ?
[18:01] <jtang> i usually ended up writing my own
[18:01] <joelio> oh, I'd rather do that anyway
[18:02] <joelio> you know what's 'in the box' so to speak then
[18:02] <jtang> heh, well i like to re-use
[18:02] <jtang> rather than re-invent
[18:02] <jtang> we're not in the business of writing puppet modules here at our site
[18:02] <jtang> so it doesn't make much sense for us
[18:03] <joelio> fair enough
[18:03] <jtang> i'd imagine trying to deploy a fresh ceph cluster from puppet will have its issues of getting the order right
[18:03] <joelio> stages
[18:04] <joelio> http://docs.puppetlabs.com/puppet/2.7/reference/lang_run_stages.html
[18:04] <jtang> yea, im never convinced of them
[18:04] <joelio> haha, fair do's - they work for us at least :)
[18:05] <joelio> I'd be looking at writing a custom lib for the ceph puppet module anyway so it can gain some introspection
[18:05] <jtang> seems kinda odd putting in stages into puppet modules/manifests since the original design didnt seem to do that
[18:05] <jtang> its just 'hard' compared to ansible ;)
[18:06] * joelio never used
[18:07] <jtang> try it ;)
[18:07] <jtang> right must go back to doing work, this is enough of a late lunch/break
[18:08] <jtang> be back later!
[18:08] <jtang> or tomorrow
[18:08] <joelio> aye, I will - I have a full puppet/foreman custom ENC behemoth setup - taken me months to build it, so can't see us moving any time soon :)
[18:08] <joelio> speak late :)
[18:10] <drokita> Anyone running libceph on RedHat?
[18:14] <infernix> drokita: i've built on centos 5.9
[18:15] <drokita> infernix: What kernel is that using?
[18:16] <infernix> no kernel support
[18:16] <infernix> just userspace
[18:17] * eschnou (~eschnou@ Quit (Quit: Leaving)
[18:23] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[18:23] <drokita> Gotcha.... that was my guess. Somebody should tell them to update the kernel.
[18:26] * l0nk (~alex@ Quit (Quit: Leaving.)
[18:27] <joelio> Redhat generally just backport fixes
[18:30] <joelio> Can you not just download vanilla sources and `make rpm` though?
[18:30] <joelio> the 3.8 kernel I'm using is just built with make-dpkg (I assume similar capabilities for rpm)?
[18:32] <drokita> I think it can be done, but if I can't get RedHat to support it, I'd rather just stick with Debian
[18:33] <joelio> very true
[18:39] <joelio> Are there any references to the RBD v2 format I've seen people talking about in here, wouldn't mind reading up
[18:42] * leseb_ (~leseb@3.46-14-84.ripe.coltfrance.com) Quit (Remote host closed the connection)
[18:47] <infernix> joelio: some here http://eu.ceph.com/docs/wip-msgauth/dev/rbd-layering/
[18:50] * alram (~alram@ has joined #ceph
[18:51] * sagelap (~sage@2600:1010:b000:6b8d:d0ae:fa6d:1146:f6bc) Quit (Ping timeout: 480 seconds)
[18:57] <joelio> infernix: cheers pal, will go nicely with a brew and some cake I think :)
[18:58] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:02] * sagelap (~sage@2600:1010:b00e:2916:d0ae:fa6d:1146:f6bc) has joined #ceph
[19:03] * chutzpah (~chutz@ has joined #ceph
[19:03] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[19:04] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:05] * xmltok (~xmltok@pool101.bizrate.com) Quit ()
[19:06] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:08] * Kioob (~kioob@2a01:e35:2432:58a0:21a:92ff:fe90:42c5) has joined #ceph
[19:10] * sagelap (~sage@2600:1010:b00e:2916:d0ae:fa6d:1146:f6bc) Quit (Ping timeout: 480 seconds)
[19:12] <Kioob> does Kernel RBD now support snapshots ?
[19:13] <Kioob> also, in the doc we can read "STOP I/O BEFORE snapshotting an image. If the image contains a filesystem, the filesystem must be in a consistent state BEFORE snapshotting." ; so RBD snapshots are not atomic ?
[19:14] <fghaas> Kioob: there was a bug in there that was fixed just before 0.56.2
[19:15] <fghaas> of course, kernel rbd has v1 image support only, so while you do get snapshots, no joy for clones
[19:15] <Kioob> ok, thanks fghaas ! :)
[19:17] <fghaas> and on the atomic bit, this is relatively common. lvm snapshots have the same issue. you want a snapshot of an LVM LV that hosts xfs, you run xfs_freeze if you want to guarantee consistency. it's just that lvm has a vfs hook that can do this for you automagically, not sure if that has been implemented for kernel rbd -- probably a question for joshd
[19:17] * noob21 (~cjh@ has joined #ceph
[19:18] <jmlowe> freezefs works too
[19:18] <Kioob> mm ok. I was wondering if I can I use that snapshot for automatic VM backup
[19:19] <Kioob> but if it need FS synchronization, it's not a solution
[19:19] <jmlowe> using libvirt?
[19:19] <Kioob> no, xen
[19:19] <jmlowe> libvirt also does xen
[19:20] <Kioob> yes, you're right :/
[19:20] <Kioob> I will look for that
[19:21] <Kioob> is it working with the paravirtualized mode ? (not via QEMU)
[19:21] <jmlowe> kvm and xen I believe have guest agents, libvirt has hooks into those agents to freezefs;snapshot;un freezefs with a single command
[19:22] <jmlowe> if you use the quiesce option for snapshot create in virsh it will freeze and unfreeze the guest filesystem provided the agent is working
[19:22] <Kioob> in fact I can «pause» a VM, do a snapshot, then «unpause» it
[19:23] <fghaas> Kioob: re xen in PV mode, you can use RBD as a mapped kernel block dev, sure. there was talk about a blktap2 driver at some point, which would save you the kernel roundtrip, but I believe that hasn't seen that much progress
[19:23] * sagelap (~sage@2600:1010:b00e:3f18:d0ae:fa6d:1146:f6bc) has joined #ceph
[19:24] <jmlowe> how are you going to get the contents of the guest's block device cache onto the disk if you pause the vm?
[19:24] <fghaas> sagelap's ipv6 address is ... phreaky :)
[19:24] <Kioob> I like the kernel way, because it make tools like iostats usable.
[19:26] <Kioob> jmlowe: I can throw sys-req events, but not sure to want that
[19:26] <fghaas> Kioob: can't you get IO stats out of xentop?
[19:26] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[19:26] <Kioob> for me, not taking data from writeback is not a problem : it's exactly like if there was an electric shutdown or a kernel panic
[19:26] <fghaas> (been a while since I last used xen in earnest, mind you)
[19:27] <jmlowe> this is where I think kvm/qemu wins out a little bit over xen, when it comes to snapshoting you can snapshot the running vm and get the complete machine state
[19:27] <Kioob> as far as I know, there is not latency metrics in xentop
[19:28] <Kioob> well jmlowe, it's probably possible. Since it's paravirtualized, both kernels can «collaborate», which is not possible in full virtualization
[19:28] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[19:29] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[19:29] <Kioob> I don't really know the «snapshot» stuff of Xen, because without Ceph I was not really able to use that
[19:36] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:37] <darkfader> the basic xl save can be used with a disk snapshot to get it "atomic" but you can also be lax and use xl save -c where the vm keeps running. etc...
[19:37] <darkfader> thats all 2005 stuff :)
[19:38] <Kioob> but "xl save" is for memory backup, it's not what I want
[19:39] <darkfader> reading the whole thing ftw ;)
[19:39] * rturk-away is now known as rturk
[19:39] <darkfader> jmlowe just said he likes that kvm does a "complete" snapshot as in ram+disk, but you can pick either
[19:39] <Kioob> ok ;)
[19:41] <Kioob> I think I will start with sysrq→pause→snapshot→unpause
[19:41] <Kioob> it's not atomic, but it's a good starting point
[19:41] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[19:41] <Kioob> except that the sysrq is not blocking :/
[19:42] <darkfader> that's good enough, but note the sysrq sync will show up in dmesg as "emergency sync"
[19:42] <Kioob> mmm yes :/
[19:43] <darkfader> the linux vfs code can also "quiesce" filesystems these days
[19:43] <darkfader> but i didnt find out how
[19:43] <darkfader> do you know when you're trying to post something and you just cant read the captchas...
[19:45] <Kioob> well, XenServer have the «vm-snapshot-with-quiesce» command
[19:46] <Kioob> but of course, I use XenSource, not XenServer
[19:46] <darkfader> obviously, you couldn't type after stabbing your eyes out
[19:47] <Kioob> and I also hate captchas :D I'm often unable to answer
[19:47] <Kioob> (at least without an error)
[19:49] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[19:52] <janos> despise them. i wonder each time "did the developers who did this page do it well enough that i won't lose any other fields if i fail the captcha?"
[19:53] * sagelap (~sage@2600:1010:b00e:3f18:d0ae:fa6d:1146:f6bc) Quit (Quit: Leaving.)
[19:54] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:58] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[20:00] <joshd1> Kioob: rbd snapshots are atomic, but that doesn't mean your fs can't have a bunch of data in memory that rbd doesn't know about yet.
[20:00] <Kioob> ok, great joshd1
[20:01] <joshd1> Kioob: if you don't do the fsfreeze/thaw dance, you just might need to fsck after you restore
[20:01] <Kioob> exactly
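The freeze/thaw dance around a snapshot can be scripted so the filesystem is always thawed again, even if the snapshot step fails. A hedged Python sketch, assuming the filesystem and the `rbd` client live on the same machine (for example a kernel-mapped RBD mounted locally; for a guest VM the freeze would have to happen inside the guest instead). It uses the `sync`, `fsfreeze` and `rbd snap create` commands; the function name, mount point and image names are placeholders:

```python
import subprocess


def consistent_snapshot(mountpoint, image_spec, snap_name,
                        run=subprocess.check_call):
    """sync, freeze, snapshot, then always unfreeze.

    `run` is injectable so the sequence can be dry-run or tested
    without touching a real filesystem.
    """
    # Flush most dirty data first so the frozen window stays short.
    run(["sync"])
    run(["fsfreeze", "--freeze", mountpoint])
    try:
        run(["rbd", "snap", "create", f"{image_spec}@{snap_name}"])
    finally:
        # Thaw even if the snapshot command raised.
        run(["fsfreeze", "--unfreeze", mountpoint])


# Dry run: print the commands instead of executing them.
consistent_snapshot("/mnt/data", "rbd/vm1", "nightly", run=print)
```

Skipping the freeze still yields an atomic RBD snapshot, but, as noted above, the restored filesystem may then need an fsck.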
[20:01] <jmlowe> <- hasn't had good luck with fsck'ing a dirty snapshot
[20:01] <jmlowe> wow, I didn't think about the phrasing there
[20:03] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[20:03] * markbby (~Adium@ Quit (Quit: Leaving.)
[20:05] * dmick (~dmick@2607:f298:a:607:514b:1518:5845:bb5f) has joined #ceph
[20:05] * markbby (~Adium@ has joined #ceph
[20:06] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[20:09] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[20:11] <darkfader> Kioob: so basically we'd just love a sysrq for fsfreeze, right? :))
[20:18] * tryggvil (~tryggvil@95-91-243-238-dynip.superkabel.de) has joined #ceph
[20:18] <darkfader> does one of you know if http://comments.gmane.org/gmane.linux.utilities.util-linux-ng/6860 still applies
[20:18] <darkfader> it seems stuff normally doesn't sync before freezing
[20:21] * dpippenger (~riven@ has joined #ceph
[20:26] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[20:27] * Tiger (~kvirc@ has joined #ceph
[20:28] <jmlowe> darkfader: I don't see how it wouldn't still apply
[20:29] <darkfader> i just tried to not be negative about it from the start
[20:29] <jmlowe> one way or another if you want any hope of a consistent filesystem you'll have to flush the cache
[20:30] <darkfader> i mean, if you'd ask 100 non-it people on a street they'd all guess you need a sync before snapshotting and that this should be done in the same call as the freeze
[20:30] <darkfader> so it's safe to assume the devs will refuse that :)
[20:31] <jmlowe> that guy is just complaining that sync takes a long time
[20:32] <jmlowe> joke is on him because it will also cause the system to choke if you call it before you freeze
[20:32] <darkfader> ok
[20:32] <darkfader> then i didn't interpret it correctly
[20:32] <darkfader> reading another time
[20:33] <darkfader> oh, yes, now i see the large amount of dirty data piece
[20:33] <darkfader> sorry.
[20:33] <darkfader> i got it wrong and thought it's not syncing and thats what he complained about
[20:33] <jmlowe> his disks just can't keep up that's all
[20:33] <darkfader> so he should just lower his commit interval
[20:34] * markbby (~Adium@ Quit (Quit: Leaving.)
[20:35] <Kioob> jmlowe: for me, the problem is that there is forbid writes → sync → continue. If the sync is long, writes will be forbid for a long time. Instead if you do sync → forbid writes → sync → continue, the «locked time» will be smaller
[20:35] <Kioob> no ?
[20:36] * markbby (~Adium@ has joined #ceph
[20:37] <Kioob> it's why I use ext4 in ordered mode for MySQL instead of the default writeback mode : here more frequent «flush» avoid occasional latency
[20:37] <darkfader> yes of course... but that's like the author of that post. basically you don't want ext to be idiotic and keep gig's of valuable data in ram just b/c it makes someone's laptop faster
[20:37] <darkfader> use commit=5 or so
[20:38] <darkfader> then you're looking at a much shorter spike and get more stable performace anyway
[20:38] <jmlowe> yep, pdflush is going to kill you any way you cut it if you keep too much in memory
[20:38] <darkfader> jmlowe: is there any way where you can see a certain fs is locked? i looked in tune2fs -l and it seems it's not there (ok makes sense since it's in vfs layer now)
[20:38] <darkfader> but nothing in dmesg either
[20:42] <jmlowe> never thought about checking to see if it is locked
[20:43] <jmlowe> ooh, there is a sysrq to unfreeze, I've shot myself in the foot that way once or twice
[20:44] <darkfader> is?
[20:44] <darkfader> that is interesting to know
[20:44] <jmlowe> 'j'
[20:45] <darkfader> but yeah, thats why i'm asking. kinda sucky if you log on a box with issues and it would be in middle of a snapshot and you start poking and can't see there is a freeze atm
[20:45] * mech422 (~steve@ip68-98-107-102.ph.ph.cox.net) has joined #ceph
[20:45] * brambles_ is now known as brambles
[20:46] <mech422> Hello - I just wanted to check, there is no problem running the full ceph stack (including storage) on every server in a 'cluster', right?
[20:46] <mech422> we only have a few machines, so I want to run mon + storage on all of them to get a decent sized quorum
[20:49] <ShaunR> is it not possible to shut ceph down cleanly on a test cluster so that when it comes back up i dont always see a HEALTH_WARNING
[20:49] <dmick> mech422: should be fine
[20:49] <mech422> thank ya much :-)
[20:52] * mech422 (~steve@ip68-98-107-102.ph.ph.cox.net) has left #ceph
[20:55] <dmick> ShaunR: as far as I can tell there's always a period when the OSDs talk to one another and make sure they're all on the same page
[20:55] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[20:56] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[21:02] * Cube (~Cube@ has joined #ceph
[21:02] * Cube1 (~Cube@ Quit (Read error: Connection reset by peer)
[21:15] <ShaunR> hmm, it's really pisd this time..
[21:16] <ShaunR> showing a HEALTH_WARN 1029 pgs peering; 1029 pgs stuck inactive; 1029 pgs stuck unclean
[21:16] <ShaunR> it's normally clean by now
[21:16] <ShaunR> also a rbd list is taking forever too
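Editor's sketch: a health summary line like the one ShaunR pasted can be split into a status and per-state PG counts, which makes it easier to watch whether peering is progressing. This only parses the text format shown in this log; the function name is hypothetical and other Ceph versions may word the summary differently.

```python
import re


def parse_health(summary):
    """Split 'HEALTH_X n pgs state; n pgs state' into (status, {state: count})."""
    status, _, rest = summary.strip().partition(" ")
    counts = {}
    for part in rest.split(";"):
        match = re.match(r"\s*(\d+) pgs (.+?)\s*$", part)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return status, counts


line = "HEALTH_WARN 1029 pgs peering; 1029 pgs stuck inactive; 1029 pgs stuck unclean"
status, counts = parse_health(line)
print(status, counts)
```

If the "peering" count does not drop between successive runs, that matches the stuck state discussed in the lines that follow.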
[21:25] <joelio> what does 'ceph health detail' sayeth?
[21:27] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[21:28] * eschnou (~eschnou@157.68-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:30] <ShaunR> umm, alot!
[21:30] <ShaunR> :)
[21:31] <ShaunR> http://pastebin.ca/2328845
[21:35] <ShaunR> rbd list is just stuck so i obviously broke this thing.
[21:39] <phantomcircuit> ShaunR, did you try bringing the OSDs back up one at a time
[21:39] <phantomcircuit> cause you clearly have a cluster fuck going on right now
[21:39] <phantomcircuit> i assume it will eventually sort itself out though
[21:40] * sstan (~chatzilla@dmzgw2.cbnco.com) Quit (Read error: Operation timed out)
[21:41] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:42] <jmlowe> I do like to see the proper use of the term cluster fuck
[21:43] <phantomcircuit> jmlowe, hehe i was wondering if anybody would notice that
[21:44] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[21:45] * markbby1 (~Adium@ has joined #ceph
[21:45] * markbby (~Adium@ Quit (Remote host closed the connection)
[21:47] <ShaunR> simply ran a service ceph start on each cluster
[21:47] * sstan (~chatzilla@dmzgw2.cbnco.com) has joined #ceph
[21:48] <ShaunR> each server*
[21:51] * madkiss (~madkiss@ has joined #ceph
[21:53] <ShaunR> i'm seeing slow requests in the logs too
[21:53] * bstaz_ (~bstaz@ext-itdev.tech-corps.com) Quit (Remote host closed the connection)
[21:57] <ShaunR> I was just reading something. I know you're not supposed to run OSDs on the same drive as the OS and all, but for this test one of my OSDs has about 100G taken out of it to use for the OS (3 servers, 4 OSDs, 100G taken from 1 OSD on each server). I read something saying that the monitors do fsync and can cause performance issues on a single OSD. Is that the case even if the drive was partitioned for separate
[21:57] <ShaunR> OS and OSD mounts?
[21:58] <ShaunR> phantomcircuit: stopping and restarting ceph on all servers resolved my health issue btw too... which seems a bit odd to me
[21:59] <ShaunR> if a simple restart fixed it, it seems odd that ceph couldn't figure out the problem to begin with.
[22:01] <phantomcircuit> ShaunR, the peering state has a tendency to get stuck
[22:01] <phantomcircuit> i have no idea why
[22:02] * leseb_ (~leseb@ has joined #ceph
[22:03] <nhm> ShaunR: you might want to check the osd admin sockets during heavy loads and see if the OSD on the system disk consistently has operations backing up on it.
[22:03] <nhm> ShaunR: if it does, you can reweight that OSD to get fewer writes.
[22:03] <gregaf> sjustlaptop might want to know about peering getting stuck — I'm not up on if there are any known issues in releases that have fixes in a branch
[22:03] * esammy_ (~esamuels@host-2-99-4-21.as13285.net) has joined #ceph
[22:03] <nhm> It may help.
[22:03] <sjustlaptop> hi
[22:04] <ShaunR> gregaf: I already fixed it now unfortunately
[22:04] <sjustlaptop> ShaunR: that kind of state is never supposed to happen :)
[22:04] <sjustlaptop> version?
[22:04] <ShaunR> 0.56.3
[22:05] <ShaunR> RPM version from ceph, CentOS 6
[22:05] <sjustlaptop> is it reproducible?
[22:05] <ShaunR> i might be able to, the only thing i did differently when starting my cluster this time was that i started them backwards from the way i normally do.
[22:05] <ShaunR> normally i do service ceph start on storage 1, then 2, then 3
[22:05] <ShaunR> this time i believe i went 3, 2, 1
[22:06] <gregaf> fghaas: glad to see you guys are growing :)
[22:06] <fghaas> gregaf: thanks!
[22:07] <gregaf> is Noah somebody I should recognize from around here? (I see you mention him and Ceph)
[22:07] * esammy (~esamuels@host-2-103-102-78.as13285.net) Quit (Ping timeout: 480 seconds)
[22:07] * esammy_ is now known as esammy
[22:07] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[22:07] <gregaf> haha
[22:07] <fghaas> gregaf: there he is :)
[22:07] <noahmehl> hi?
[22:08] <noahmehl> :)
[22:08] <gregaf> hello
[22:08] <noahmehl> hello gregaf
[22:08] * leseb (~leseb@ Quit (Ping timeout: 480 seconds)
[22:08] <fghaas> gregaf: he's been a polite and quiet lurker here, much unlike a loudmouth like me :)
[22:08] <noahmehl> I'm sure i'll have something to say soon :)
[22:08] <dmick> ah yes, noahmehl was on a video call the other day, hi noah
[22:08] <noahmehl> dmick: hello again
[22:09] <ShaunR> sjustlaptop: just tried to replicate it and the cluster health is ok.
[22:09] <sjustlaptop> mm
[22:09] <sjustlaptop> that's unfortunate, I had thought we had fixed it in 56.3
[22:10] * bstaz (~bstaz@ext-itdev.tech-corps.com) has joined #ceph
[22:10] <sjustlaptop> can you describe the steps leading up to the stuck peering?
[22:10] <sjustlaptop> and do you happen to have ceph -s from when it was stuck?
[22:10] <sjustlaptop> and even better ceph pg dump/ceph osd dump
[22:11] <ShaunR> i have a ceph pg dump
[22:15] <ShaunR> http://pastebin.ca/2328862
[22:15] <ShaunR> thats all i got
[22:15] <dmick> 404 ?
[22:16] <gregaf> it just worked for me (although the earlier one didn't) — pastebin troubles, maybe?
[22:16] <dmick> weird.
[22:17] <janos> ARG hurrrr. someone smack me with a fish. one of my smaller osd nodes is an AMD APU unit. i just realized why it was being starved for memory (was borderline already) it pulls a pretty decent chunk for video even though this is runlevel 3
[22:17] <janos> *in ghost voice* learn this lesson from meeeeeee. don't be a victim
[22:18] <dmick> http://www.youtube.com/watch?v=IhJQp-q1Y1s
[22:18] <janos> hahah
[22:18] <fghaas> janos: which type of fish would you prefer?
[22:19] <fghaas> I say you deserve a smack with a Great White :) (although it would be a pain to lug)
[22:19] <janos> i do deserve it, but have some mercy - this is a home cluster!!!
[22:19] <janos> would not do this for work
[22:20] <fghaas> janos: alright, goldfish then
[22:20] <ShaunR> sjustlaptop: you get it?
[22:20] <fghaas> don't anyone tell PETA about this conversation, though
[22:20] <sjustlaptop> cool
[22:20] <janos> i dont tell them anything ever
[22:21] <sjustlaptop> I'll have to look at it later
[22:21] <Kioob> one other question about snapshots : with LVM, snapshots slow down IO a lot, and with BTRFS, snapshots hurt stability a lot. How is it with RBD ? :D
[22:21] <ShaunR> sorry i dont have much more, if i see it again i wont fix it this time :)
[22:21] <fghaas> for lvm that's actually no longer true
[22:21] <sjustlaptop> no worries, might be enough
[22:21] <Kioob> fghaas: yes, since thin support ?
[22:22] <fghaas> yeah, well since they rebuilt snapshots on thinp or whatever they did
[22:23] <Kioob> and is it stable enough with RBD ? I can use it in production ?
[22:24] * eschnou (~eschnou@157.68-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:25] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:28] <joshd1> Kioob: it's stable and fast, for kernel rbd you'll want 3.6+ (for snapshot and networking bugs)
[22:29] <Kioob> perfect !
[22:29] <fghaas> joshd1: do you have any numbers on impact of snapshots on rbd performance (if any)?
[22:30] <Kioob> maybe fragmentation, if the backend is not btrfs ?
[22:31] <fghaas> well your rbd is _definitely_ gonna be fragmented across tons of objects and dozens of OSD, that's built into it's design (and quite rightly so)
[22:31] <joshd1> fghaas: no, all it takes for the client doing i/o is re-reading the rbd header on the next operation
[22:31] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[22:32] <fghaas> "its design"
[22:32] <Kioob> ok :)
[22:33] <fghaas> joshd1: thanks
[22:33] <Kioob> so... take a snapshot of each RBD every hour is ok ? :p
[22:34] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[22:37] <joshd1> sure, just don't forget to delete them
[22:37] <Kioob> of course !
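Editor's sketch of the "don't forget to delete them" part of an hourly snapshot schedule. It only decides which snapshot names have expired; the naming scheme (`hourly-YYYY-MM-DDTHH`), the 24-hour retention, and the function name are assumptions for illustration, not anything the channel specified. The actual removal would be done with the rbd tooling.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snap_names, now, keep_hours=24, prefix="hourly-"):
    """Return the hourly snapshot names that are older than keep_hours.

    Names that do not start with the (hypothetical) prefix are never touched,
    so manually created snapshots survive the rotation.
    """
    cutoff = now - timedelta(hours=keep_hours)
    expired = []
    for name in snap_names:
        if not name.startswith(prefix):
            continue
        taken = datetime.strptime(name[len(prefix):], "%Y-%m-%dT%H")
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["hourly-2013-03-05T21", "hourly-2013-03-05T22",
         "hourly-2013-03-06T21", "manual-backup"]
print(snapshots_to_delete(names, now=datetime(2013, 3, 6, 22)))
```

Keeping the selection logic separate from the deletion makes it easy to dry-run the rotation before anything is removed.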
[22:40] * diegows (~diegows@ has joined #ceph
[22:48] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:48] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:49] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[22:54] * Philip__ (~Philip@hnvr-4d079fe4.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[22:56] * madkiss (~madkiss@ Quit (Quit: Leaving.)
[23:09] <janos> arg, even worse on slow OSD. dmesg showing it's only seeing 2GB of the ram too, which explains my "wtf" when looking at htop
[23:09] <janos> weee!
[23:10] <Kioob> so, if it can help someone : http://pastebin.com/UJki5L0h
[23:11] <Kioob> outch... I have scrub errors :(
[23:13] * Cube1 (~Cube@ has joined #ceph
[23:13] * Cube (~Cube@ Quit (Write error: connection closed)
[23:15] <Kioob> oh !
[23:15] <Kioob> I had 3 PGs in error, I forced a new scrub on them and now they are fine
[23:15] * yanzheng (~zhyan@ has joined #ceph
[23:22] * yanzheng (~zhyan@ Quit (Remote host closed the connection)
[23:23] <Kioob> osd.40 is near full at 88%
[23:23] <Kioob> ...
[23:23] <Kioob> great :D
[23:26] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:26] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:27] * esammy (~esamuels@host-2-99-4-21.as13285.net) has left #ceph
[23:31] <janos> arg. another motherboard is going to get stabbed and thrown outside. i can feel it
[23:32] * janos will remove items on it he likes first ;)
[23:36] * ntranger (~ntranger@proxy2.wolfram.com) has joined #ceph
[23:36] <ntranger> hey all!
[23:38] <rturk> Hi :)
[23:38] <ntranger> I had a quick question (or atleast I hope its quick) with an issue trying to start ceph. It keeps giving me a "no filesystem type defined", and I'm not finding much help on google.
[23:39] * flepied (~fredl@2a00:1a48:7803:107:8532:c238:ff08:354) Quit (Ping timeout: 480 seconds)
[23:40] <ntranger> I've formatted out the drives I'm using as ext4, and this I have in the conf "filestore xattr use omap = true", which the destructions I've been reading says should be right
[23:40] <infernix> dmick: that kernel rbd issue i have
[23:40] <infernix> could it be that old_format=false is the culprit?
[23:41] <infernix> that's how i make them with librbd
[23:41] <dmick> certainly the kernel can't deal with new format images
[23:41] <infernix> tadaa
[23:41] <dmick> I am surprised that the map doesn't fail, honestly, though
[23:41] <infernix> it doesn't, it just happily returns a device the size of the rbd device.
[23:41] <dmick> although error reporting there is...tough, since it's just "write to a /sys file"
[23:41] <infernix> with all zeroes
[23:41] <infernix> it should actually fail
[23:41] <dmick> I don't know for sure if that's expected behavior or not
[23:42] <infernix> so what am i missing if i don't use the new format, just snapshots?
[23:42] <dmick> not snapshots, but clones
[23:42] <infernix> no performance difference of any kind i presume?
[23:42] <dmick> and fancy striping
[23:43] <dmick> I'm not immediately aware of any
[23:43] <infernix> aha. well speed is my main concern, so how fancy is that fancy striping?
[23:44] * ScOut3R (~scout3r@1F2EAE22.dsl.pool.telekom.hu) has joined #ceph
[23:45] <dmick> don't know that we've studied performance a lot.
[23:45] <dmick> looking for docs
[23:45] * fred1 (~fredl@2a00:1a48:7803:107:8532:c238:ff08:354) has joined #ceph
[23:46] <ntranger> hey rturk, mind if I message you?
[23:46] <dmick> this is for cephfs, but the striping mech is the same for format 2 rbd devices
[23:46] <dmick> http://ceph.com/docs/master/dev/file-striping/?highlight=striping
[23:47] <rturk> ntranger: sure, although I'm probably not the right person for troubleshooting :)
[23:47] <ntranger> :D
[23:48] <dmick> ntranger: you say It keeps giving me a "no filesystem type defined"
[23:48] <dmick> what is "it"?
[23:49] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[23:49] <ShaunR> rturk: why not combine CENTOS/RHEL? -> http://ceph.com/community/results-from-the-ceph-census/
[23:49] <ntranger> sorry, when I go to start ceph, I get this
[23:49] <ntranger> [root@storage1 ceph-a]# service ceph -a start
[23:49] <ntranger> === mon.a ===
[23:49] <ntranger> Starting Ceph mon.a on storage1...
[23:49] <ntranger> starting mon.a rank 0 at mon_data /var/lib/ceph/mon/ceph-a fsid 1e4f24e7-acc2-4298-a0cb-f34eccc1b177
[23:49] <ntranger> === mds.a ===
[23:49] <ntranger> Starting Ceph mds.a on storage1...
[23:49] <ntranger> starting mds.a at :/0
[23:49] <ntranger> === osd.0 ===
[23:49] <ntranger> No filesystem type defined!
[23:50] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[23:50] <dmick> so, /etc/init.d/ceph gives you that when starting osd.0
[23:50] <rturk> ShaunR: hmmm..why?
[23:50] <ntranger> yes
[23:51] <ntranger> and this is what I have in the conf under osd.0
[23:51] <ntranger> [osd.0]
[23:51] <ntranger> host = storage1
[23:51] <ntranger> devs = /ceph1
[23:51] <ShaunR> rturk: because CentOS is just a RHEL rebranded
[23:51] <rturk> sort of
[23:52] <ntranger> and under osd, I have this "filestore xattr use omap = true" (without quotes)
[23:52] <dmick> 1) /ceph1 isn't a dev, it's a path, so you probably don't want to say that
[23:52] <ntranger> ok
[23:52] <ntranger> so instead of the path, I should have /dev/sdb
[23:52] <ntranger> ?
[23:52] <rturk> ShaunR: I want to rethink the OS questions entirely - CentOS wasn't even one of the multiple choice, enough people put it in for "other"
[23:53] <dmick> 2) if that's a path where you've mounted a formatted filesystem for the OSD to use, you want "osd data = /ceph1" (and you'll need something for journal too)
[23:53] <dmick> but you should probably read and digest http://ceph.com/docs/master/rados/configuration/ceph-conf/#osds and then ask more questions if it's unclear
[23:53] <ntranger> awesome. Thanks so much for your assistance. :)
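Editor's sketch of dmick's two points: `devs` expects a block device rather than a mount path, and a mounted, formatted filesystem is given via `osd data`, with a journal location as well. The checker below is a hypothetical helper, not part of Ceph; it reads a ceph.conf-style fragment with Python's `configparser` and reports what looks off in one `[osd.N]` section. The journal path in the second fragment is an assumed example, and settings inherited from a global `[osd]` section are not considered.

```python
import configparser


def check_osd_section(conf_text, section):
    """Return a list of notes about one [osd.N] section of a ceph.conf fragment."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    opts = parser[section]
    notes = []
    devs = opts.get("devs")
    if devs is not None and not devs.startswith("/dev/"):
        notes.append(f"devs = {devs} is a path, not a block device")
    for key in ("osd data", "osd journal"):
        if key not in opts:
            notes.append(f"{key} is not set in [{section}]")
    return notes


as_pasted = "[osd.0]\nhost = storage1\ndevs = /ceph1\n"
# /ceph1/journal is an assumed journal location, chosen only for illustration
adjusted = "[osd.0]\nhost = storage1\nosd data = /ceph1\nosd journal = /ceph1/journal\n"

print(check_osd_section(as_pasted, "osd.0"))
print(check_osd_section(adjusted, "osd.0"))
```

The first fragment mirrors what ntranger pasted; the second follows dmick's suggestion and produces no notes.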
[23:54] <rturk> dmick: hmm, maybe that doc page should be more clear about the difference between devs and path..i.e., you want to use one or the other?
[23:54] <ShaunR> rturk: CentOS is definitely a popular distro
[23:55] <dmick> devs are ... more complex
[23:55] <rturk> ShaunR: yes, for sure. It'll be there next time.
[23:55] <rturk> maybe I should do a "call for questions" before the next one
[23:55] <rturk> I bet there are better survey questions out there than the ones I can come up with
[23:55] <rturk> :)
[23:56] <ShaunR> My test deployment is on CentOS, and i was planning on my production cluster being on CentOS, however i'm wondering from a newer glib/kernel standpoint if i should be running ubuntu for performance reasons.
[23:57] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[23:57] * ChanServ sets mode +o scuttlemonkey

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.