#ceph IRC Log


IRC Log for 2013-02-14

Timestamps are in GMT/BST.

[0:03] <wer> nhm_: hmm. then what :)
[0:03] <slang1> dilemma: I think you should upgrade those two osds to bobtail and see if that fixes your problem, but let me check with sjust to be sure that will work for you
[0:06] * ricksos (c0373727@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:07] <wer> nhm_: so should I be gathering these stats when I am running throughput tests to see if there is a trend? Right now, during a 12% degraded situation, they are bouncing from 0 to as high as 74....
[0:08] <nhm_> wer: that's not necessarily terrible. What sometimes can happen is if there is a consistently slow OSD, everything else is idle while that 1 OSD has all of the outstanding operations waiting on it.
[0:08] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[0:09] * sleinen (~Adium@2001:620:0:26:d51e:8c9b:8b6f:6815) Quit (Quit: Leaving.)
[0:10] <wer> ahhh. hmm. yeah have one osd with higher than normal memory... I can check that guy, and I had two freak out on a mon and cause it to abort (which I owe logs on).... other than that, I seem to just have an expectations issue. In that ceph is not performing to my expectations :) And I am so sad, and so out of ideas to tweak it better.
[0:11] <wer> I keep looking at network, but I don't believe network is the issue.
[0:11] <nhm_> wer: hrm, what are you trying to achieve and with what hardware?
[0:11] <wer> I have 95 osd's on four nodes. Each with 10gig links.
[0:12] * jjgalvez1 (~jjgalvez@ has joined #ceph
[0:12] <wer> I need 4+ gigs on read. And I am around 1.6gbps or around 200MBps.
[0:12] <nhm_> wer: what size reads?
[0:12] <wer> am I asking too much of the hardware? This is a 261TB test.
[0:13] <wer> 20MB each. large files.
[0:13] <nhm_> what program to test the reads?
[0:13] <wer> Actually, I am not sure... cause I am reading old populated objects so they could be !mb or 17 or 20 actually.
[0:13] <wer> *1MB.
[0:14] <wer> oh. using rest-bench and tsung.
[0:14] * jjgalvez1 (~jjgalvez@ Quit ()
[0:14] <nhm_> What kind of write/read throughput do you get with rados bench?
[0:15] <wer> 366MB quite a bit more.
[0:15] <mjevans> Some of that is probably cached IO
[0:15] * jjgalvez (~jjgalvez@ Quit (Ping timeout: 480 seconds)
[0:15] <gregaf> have you tried adding extra gateways and sticking a load balancer in front of them? and how many requests are you giving it at a time?
[0:16] <gregaf> you'd need to give it quite a lot to get that kind of bandwidth since the per-request latency is pretty high with 20MB objects
[0:17] <nhm_> wer: You may want to try increasing read_ahead_kb to 512 for each OSD device as well.
[0:17] <wer> so if I run rest-bench at 40 requests... I get 200MB on writes.... it round robins using a hwlb.
[0:17] <wer> is that an osd setting? or disk cache type setting?
[0:17] <nhm_> wer: I'd suggest starting out with just trying to optimize object creation/read performance first, then move on to RGW.
[0:18] * mauilion (~dcooley@crawford.dreamhost.com) has joined #ceph
[0:18] <nhm_> wer: that's a linux block device setting in /sys/block/<device>/queue
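A sketch of the sysfs tweak nhm_ is describing; sdb stands in for whichever device backs an OSD, and the setting does not persist across reboots:

```shell
# Show the current readahead (in KB) for one OSD data disk
cat /sys/block/sdb/queue/read_ahead_kb

# Raise it to 512 KB as suggested; repeat for every OSD device
echo 512 > /sys/block/sdb/queue/read_ahead_kb
```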
[0:18] <slang1> dilemma: ok, so if you don't want to upgrade to bobtail right now, your best option is to try restarting all the osds together
[0:18] <slang1> dilemma: instead of just the two that keep failing
[0:18] <wer> yeah. I have ditched radosgw... to see the effects of things with rados bench first.
[0:18] <slang1> dilemma: if that doesn't work, your best bet is to upgrade everything to bobtail
[0:19] <nhm_> wer: Are you using a raid controller or JBOD?
[0:19] <wer> nhm_: ok I will check that. um not a raid controller. Just a multiport sata... I can get the exact model.
[0:19] <wer> AFAIK.
[0:20] <wer> Hell I have exact machine specs if you want those :)
[0:20] <wer> I am just trying to set my expectation.... cause after changing over to ten gig links... I didn't see the need for them. And that makes santa cry.
[0:20] <dilemma> slang1: thanks for the help
[0:21] <nhm_> wer: just fyi, I can do 2GB/s reads from a single node.
[0:21] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[0:21] <wer> good. how many osd's?
[0:21] <nhm_> wer: so it is possible, just takes some work. :)
[0:21] <nhm_> wer: 24 OSDs
[0:21] <mjevans> wer: those are called 'HBA's or Host Bus Adapters
[0:21] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:21] <nhm_> wer: this is using bonded 10GbE to the client.
[0:22] <wer> ok. I am not bonded. Just one link per node.
[0:22] <nhm_> wer: the machine has 4 SAS9207-8i controllers with 24 spinning disks and 8 SSDs for journals.
[0:22] <wer> ahh see.
[0:22] <wer> I am not using ssd's for the journals and I expect that may be part of the throughput issue.
[0:22] <nhm_> wer: shouldn't matter for reads though.
[0:22] <wer> ok.
[0:22] <mjevans> Nice setup; your hardware budget
[0:23] <nhm_> mjevans: that's Inktank's performance test node. :)
[0:23] <mauilion> hey all! I am having great difficulty getting glance to use rbd. here is my ceph.conf http://nopaste.linux-dev.org/?67849 my glance-api.conf http://nopaste.linux-dev.org/?67850 the interesting part of my log file http://nopaste.linux-dev.org/?67851
[0:23] <wer> yeah pretty much. That is what I have.
[0:23] <wer> so nhm_ you think I should go adjust the cache first?
[0:24] <nhm_> wer: tough to say. The way I usually go about this is to first test concurrent fios to the disks without ceph involved at all.
[0:24] <nhm_> wer: Just to make sure they can each sustain reasonable throughput when written to concurrently.
[0:24] <nhm_> (or read from)
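One plausible shape for that kind of per-disk baseline, assuming fio is installed and /dev/sdb is a raw OSD disk holding no data you care about (the exact flags are an illustration, not nhm_'s script):

```shell
# Large sequential reads straight off the raw device, bypassing the page
# cache, so the number reflects the disk rather than memory
fio --name=baseline --filename=/dev/sdb --direct=1 --rw=read \
    --bs=4M --ioengine=libaio --iodepth=16 \
    --runtime=60 --time_based --group_reporting
```

Running one such job against every disk at once shows whether aggregate throughput scales, which is the point of the exercise.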
[0:25] <nhm_> If rados bench performance is bad, I look at the osd admin sockets to see if I'm backing up on particular OSDs.
[0:25] <nhm_> If not, I try to identify other causes. Cache? lack of syncfs support? Strange build? networking issues? etc.
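The admin-socket check looks roughly like this per OSD; the socket path below is the default for a cluster named ceph, and dump_ops_in_flight may not be present on builds older than bobtail:

```shell
# Perf counters for osd.0, useful for spotting queues backing up
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump

# Operations currently outstanding on this OSD
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
```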
[0:26] <wer> we have syncfs... that required a kernel to get working correctly :)
[0:26] <nmartin> so is the normal way to do OSDs 1:1 OSD:PhysDev?, so one OSD per physical disk, assuming jbod?
[0:27] <nhm_> nmartin: that tends to work best in a lot of cases. If you have something like 60 drives per node it may not be optimal.
[0:27] <nhm_> Another option is to do a single-drive raid0 array which on LSI controllers gets you WB cache if you have it.
[0:27] <nhm_> For other controllers like Areca it will use the WB cache even in JBOD mode.
[0:28] <wer> ok, I am going to see what my cache is set for... maybe santa will quit crying.
[0:29] <scuttlemonkey> mauilion: was just on my way out, but I could probably help with that tomorrow if you're around...alternately you can see if joshd is around
[0:29] <nhm_> wer: oh, you also may want to try just a single OSD node.
[0:29] <scuttlemonkey> alternately, this may help slightly (although it's the most basic crap that you probably already know very well by now) http://ceph.com/howto/building-a-public-ami-with-ceph-and-openstack/
[0:29] <nhm_> wer: even possibly running rados bench locally without any network involved.
[0:30] <wer> nhm_: so I think we ran some test... a single osd test... but I was not privy. I will check on that.
[0:30] <wer> on rados
[0:30] <wer> would I need to create a pool that only spans one node or something?
[0:31] <nhm_> wer: You can do that, I typically just recreate the cluster for 1 node, but I've got scripts that do all of that for me.
[0:31] <nhm_> wer: IE reformat the drives, rerun mkcephfs, etc.
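If wer takes the pool route instead of reformatting, the bobtail-era workflow is to hand-edit the CRUSH map so a rule only draws OSDs from one host; node1 and ruleset id 3 are placeholders for his actual layout:

```shell
# Pull and decompile the live CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Edit crushmap.txt: copy an existing rule, change "step take default"
# to "step take node1", give it a new ruleset id, then recompile/inject
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new

# New pool pinned to the single-host rule
ceph osd pool create singlenode 256
ceph osd pool set singlenode crush_ruleset 3
```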
[0:32] <wer> yikes. I have gotten pretty good at that.... but have shtuff on this thing. hmm.
[0:32] <mauilion> scuttlemonkey: thanks
[0:32] <wer> nhm_: I'll see about making a new pool though. ok. well I have a few things to look at. Thanks!
[0:32] <mauilion> scuttlemonkey: I think maybe this is going to help. http://irclogs.ceph.widodh.nl/index.php?date=2012-10-26
[0:33] <nhm_> wer: np. Good luck!
[0:33] <mauilion> scuttlemonkey: if I am still stuck after chasing that I will come back
[0:50] <mauilion> EUREKA
[0:50] <mauilion> :)
[0:50] <mauilion> [client.images] key = AQBYJRxRCPXOKRAAihHBQJFYLmNebr96zuk00A==
[0:51] <mauilion> my problem was with caps
[0:51] <mauilion> in this doc https://github.com/ceph/ceph/blob/master/doc/rbd/rbd-openstack.rst
[0:51] <wer> nhm_: writes are going to be suckish if I am not putting my journals on SSD's?
[0:51] <mauilion> there is a line ceph auth get-or-create client.images mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'
[0:52] <mauilion> and that won't work
[0:52] <mauilion> it needs to be ceph auth get-or-create client.images mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool images'
[0:52] <mauilion> arg
[0:53] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[0:53] <mjevans> mauilion: yes that line is bad. allow pool images rwx OR allow pool=images rwx
[0:53] <mjevans> The rwx first is incorrect
[0:53] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[0:54] <joshd> mauilion: which version of ceph are you running?
[0:55] <mauilion> bobtail
[0:55] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:55] <mauilion> specifically 0.56.2-1precise
[0:55] <nhm_> wer: RAID controllers with WB cache tend to do better than JBOD controllers if journals are on the disks.
[0:56] <mauilion> joshd: actually got this working per your work in this log! http://irclogs.ceph.widodh.nl/index.php?date=2012-10-26
[0:56] <mauilion> :)
[0:56] <mjevans> BBU ram?
[0:57] <mauilion> mjevans: so I should change it again to make rwx the last arg?
[0:57] <mauilion> mjevans: just nuking the = sign seems to have moved me forward
[0:57] <mauilion> :)
[0:58] <joshd> mauilion: mjevans: both of you and someone else had odd errors related to caps recently. those caps continue to work for me though (just tested rwx pool=images again). I suspect there's a bug with osd cap construction somewhere
[0:59] <joshd> like uninitialized memory being read at some point
[1:01] <sagewk> any rhel/centos users in here?
[1:03] * jtang1 (~jtang@ has joined #ceph
[1:03] * ScOut3R (~ScOut3R@5400CAE0.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[1:05] <wer> nhm_: yeah I have LSISAS2008. That is just plain old jbod I think. hmm.
[1:06] <nhm_> Just 1 controller?
[1:06] <wer> I think so :) I am looking.
[1:07] <nhm_> Haven't tried that controller with expanders, but I've had 8 drives connected to one and I think it was around 800MB/s reads with 1 node.
[1:07] <nhm_> got to go, bbl
[1:07] <wer> ok thanks
[1:18] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) Quit (Quit: Leaving)
[1:18] * diegows (~diegows@ has joined #ceph
[1:21] * jlogan (~Thunderbi@2600:c00:3010:1:f10b:fe00:c3e7:1d31) Quit (Quit: jlogan)
[1:25] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[1:25] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:25] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:29] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[1:31] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[1:37] * jtang1 (~jtang@ has joined #ceph
[1:41] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[1:50] * al (d@niel.cx) Quit (Remote host closed the connection)
[1:51] <joshd> mjevans: could you try creating another user with caps to a libvirt-pool-test again to see if you'll get the same problem as before? I wondering if it's reproducible with a new user (also filed http://tracker.ceph.com/issues/4122 about it)
[1:53] * LeaChim (~LeaChim@5e0d73fe.bb.sky.com) Quit (Ping timeout: 480 seconds)
[1:53] * al (d@niel.cx) has joined #ceph
[1:54] * aliguori (~anthony@cpe-70-112-157-4.austin.res.rr.com) Quit (Remote host closed the connection)
[2:27] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:32] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[2:40] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[2:43] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[3:02] * Cube (~Cube@ Quit (Ping timeout: 480 seconds)
[3:04] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[3:07] <lurbs_> Have I just noticed it, or is the bytes read/written and ops/sec output on ceph -w new in 0.56.3?
[3:07] * lurbs_ is now known as lurbs
[3:19] * The_Bishop (~bishop@2001:470:50b6:0:e8cb:1adf:280:a93d) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[3:23] * JohansGlock (~quassel@kantoor.transip.nl) Quit (Read error: Connection reset by peer)
[3:29] * phillipp (~phil@p5B3AFF20.dip.t-dialin.net) Quit (Quit: Leaving.)
[4:03] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[4:03] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[4:05] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:18] * rturk is now known as rturk-away
[4:19] * KindOne (KindOne@h174.186.130.174.dynamic.ip.windstream.net) Quit (Remote host closed the connection)
[4:26] * KindOne (KindOne@h174.186.130.174.dynamic.ip.windstream.net) has joined #ceph
[4:30] <dmick> lurbs: yes
[4:30] <dmick> 3f6837e022176ec4b530219043cf12e009d1ed6e Fri Jan 25
[4:34] * The_Bishop (~bishop@e179010244.adsl.alicedsl.de) has joined #ceph
[4:36] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:48] * jvanb (~jvanb@c-76-20-150-129.hsd1.mi.comcast.net) has joined #ceph
[4:55] * jvanb (~jvanb@c-76-20-150-129.hsd1.mi.comcast.net) Quit (Quit: Leaving)
[4:59] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[5:02] * terje_ (~joey@97-118-121-147.hlrn.qwest.net) Quit (Remote host closed the connection)
[5:02] * terje_ (~joey@97-118-121-147.hlrn.qwest.net) has joined #ceph
[5:04] * mauilion (~dcooley@crawford.dreamhost.com) Quit (Quit: Lost terminal)
[5:15] * themgt (~themgt@97-95-235-55.dhcp.sffl.va.charter.com) has joined #ceph
[5:21] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[5:34] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[5:41] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) has joined #ceph
[5:43] <Karcaw> someone should update the motd for the channel...
[5:45] * scuttlemonkey changes topic to 'v0.56.3 has been released -- http://goo.gl/f3k3U || argonaut v0.48.3 released -- http://goo.gl/80aGP || performance tuning overview http://goo.gl/1ti5A'
[5:45] * dmick (~dmick@2607:f298:a:607:e4bc:3c7b:ef1d:1a2f) Quit (Quit: Leaving.)
[5:46] * dmick (~dmick@2607:f298:a:607:87f:9c2e:9564:d22c) has joined #ceph
[5:47] * ChanServ sets mode +o dmick
[5:47] <dmick> looks like someone just did :)
[5:47] <scuttlemonkey> :)
[5:47] <scuttlemonkey> did we rev argonaut too?
[5:47] <Karcaw> does the rpm repo info need to be updated to include the new rpms? or am i just bad at grabbing a mirror of it
[5:47] <dmick> (I wedged Pidgin trying ot mess with it)
[5:58] <dmick> not sure Karcaw, it looks like the metainfo was updated today
[5:58] <dmick> I'm no rpm expert
[6:01] <Karcaw> it looks correct, but i think our cobbler server is not getting the new list..
[6:02] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:06] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:10] * lx0 is now known as lxo
[6:15] * rturk-away is now known as rturk
[6:18] <dmick> scuttlemonkey: sorry I didn't answer. I do not believe argonaut was updated
[6:28] * ananthan_RnD (~ananthan@ has joined #ceph
[6:30] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 18.0.2/20130201065344])
[6:36] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (Ping timeout: 480 seconds)
[6:36] * rturk is now known as rturk-away
[7:25] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[7:27] * sleinen1 (~Adium@2001:620:0:25:48bd:aa72:ca76:3b5) has joined #ceph
[7:30] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[7:32] * sleinen1 (~Adium@2001:620:0:25:48bd:aa72:ca76:3b5) Quit ()
[7:32] * sleinen1 (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[7:34] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:38] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[7:40] * sleinen1 (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:57] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) has joined #ceph
[8:07] * bstaz_ (~bstaz@ext-itdev.tech-corps.com) Quit (Ping timeout: 480 seconds)
[8:07] * bstaz (~bstaz@ext-itdev.tech-corps.com) has joined #ceph
[8:29] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[8:33] * ScOut3R (~scout3r@5400CAE0.dsl.pool.telekom.hu) has joined #ceph
[8:35] * themgt (~themgt@97-95-235-55.dhcp.sffl.va.charter.com) Quit (Quit: themgt)
[8:40] * gaveen (~gaveen@ has joined #ceph
[8:46] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[8:55] * gerard_dethier (~Thunderbi@ has joined #ceph
[8:56] * sleinen (~Adium@2001:620:0:25:199c:bf39:aeb9:e76d) has joined #ceph
[9:02] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[9:03] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[9:03] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Depression is merely anger without enthusiasm)
[9:04] * low (~low@ has joined #ceph
[9:05] * jtang1 (~jtang@ has joined #ceph
[9:06] * sleinen (~Adium@2001:620:0:25:199c:bf39:aeb9:e76d) Quit (Quit: Leaving.)
[9:06] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[9:06] * themgt (~themgt@97-95-235-55.dhcp.sffl.va.charter.com) has joined #ceph
[9:07] * themgt (~themgt@97-95-235-55.dhcp.sffl.va.charter.com) Quit (Remote host closed the connection)
[9:08] * loicd (~loic@lvs-gateway1.teclib.net) has joined #ceph
[9:14] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[9:21] * bstaz_ (~bstaz@ext-itdev.tech-corps.com) has joined #ceph
[9:21] * bstaz (~bstaz@ext-itdev.tech-corps.com) Quit (Read error: Connection reset by peer)
[9:22] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[9:37] * jtang1 (~jtang@ Quit (Quit: Leaving.)
[9:39] * sleinen (~Adium@2001:620:0:26:dd2e:c974:ba65:7b) has joined #ceph
[9:41] * ScOut3R (~scout3r@5400CAE0.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[9:45] * l0nk (~alex@ has joined #ceph
[9:50] * leseb (~leseb@2001:980:759b:1:3d67:d80a:6ada:9700) has joined #ceph
[10:06] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:17] * LeaChim (~LeaChim@5e0d73fe.bb.sky.com) has joined #ceph
[10:21] * ScOut3R (~ScOut3R@ has joined #ceph
[10:23] * leseb_ (~leseb@mx00.stone-it.com) has joined #ceph
[10:30] * jtangwk (~Adium@2001:770:10:500:d97f:2952:4d27:da7f) Quit (Quit: Leaving.)
[10:30] * jtang1 (~jtang@2001:770:10:500:c040:95e5:4a16:7ae4) has joined #ceph
[10:30] * jtang1 (~jtang@2001:770:10:500:c040:95e5:4a16:7ae4) Quit ()
[10:30] * leseb (~leseb@2001:980:759b:1:3d67:d80a:6ada:9700) Quit (Ping timeout: 480 seconds)
[10:31] * jtangwk (~Adium@2001:770:10:500:d97f:2952:4d27:da7f) has joined #ceph
[10:31] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[10:55] * ScOut3R_ (~ScOut3R@ has joined #ceph
[11:01] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[11:19] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[11:32] * ScOut3R_ (~ScOut3R@ Quit (Remote host closed the connection)
[11:32] * ScOut3R (~ScOut3R@ has joined #ceph
[11:33] * eschnou (~eschnou@ has joined #ceph
[11:39] <Gugge-47527> Whats the difference in OSD reweights and CRUSH weights?
[11:42] <loicd> https://github.com/ceph/ceph I get a oops 500 :-D
[11:43] <loicd> is it just me ?
[11:45] <Gugge-47527> Nope, not just you :)
[11:45] * ananthan_RnD (~ananthan@ has left #ceph
[11:59] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[12:02] <loicd> Gugge-47527: ceph is getting too big for github ? :-D
[12:07] <absynth_> maybe github runs on ceph? :D
[12:07] <joao> absynth_, what are you implying? :p
[12:08] <absynth_> me? im not plying anything!
[12:08] <absynth_> sprechen sie deutsch? ich verstehe nicht english!
[12:09] <joao> I can recognize 'deutsch' and 'english' in that phrase
[12:09] <joao> and 'sprechen'
[12:09] <absynth_> smörebröd römpömpöm
[12:10] <joao> now I think you're just mocking me :p
[12:10] <absynth_> http://www.youtube.com/watch?v=nw-z_FAyIVc
[12:16] <liiwi> http://www.youtube.com/watch?v=tCHKIdup5Lo <- consultants
[12:22] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[12:28] <joao> wth
[12:28] <joao> AUTHORS
[12:28] <joao> Something went wrong.
[12:28] <joao> this from gh
[12:29] * jksM (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[12:32] <dwm37_> Wait, github is ROR-based, isn't it?
[12:32] * jks (~jks@4810ds1-ns.0.fullrate.dk) Quit (Ping timeout: 480 seconds)
[12:32] <dwm37_> I wouldn't be shocked if they're having another emergency security upgrade..
[12:51] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[12:54] <absynth_> it's only ceph/ceph
[12:54] <absynth_> ceph/ceph-client works
[13:19] * leseb_ (~leseb@mx00.stone-it.com) Quit (Remote host closed the connection)
[13:28] * leseb (~leseb@mx00.stone-it.com) has joined #ceph
[13:31] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[13:35] * alexxy[home] (~alexxy@2001:470:1f14:106::2) has joined #ceph
[13:35] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Read error: Connection reset by peer)
[13:36] * BillK (~BillK@124-168-232-158.dyn.iinet.net.au) has joined #ceph
[13:53] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[13:54] * ScOut3R (~ScOut3R@ has joined #ceph
[13:55] <loicd> Would someone be willing to review https://github.com/dachary/ceph/commit/02a353e5940e003cfcdffc77920a6b518d581d95 in the context of http://tracker.ceph.com/issues/4123 ?
[13:56] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[13:59] * sleinen (~Adium@2001:620:0:26:dd2e:c974:ba65:7b) Quit (Quit: Leaving.)
[13:59] * sleinen (~Adium@ has joined #ceph
[14:00] <loicd> leseb: hi
[14:01] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:06] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:07] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[14:12] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[14:12] * ScOut3R (~ScOut3R@ has joined #ceph
[14:20] * sleinen (~Adium@ has joined #ceph
[14:21] * sleinen1 (~Adium@2001:620:0:26:856:9884:2fbe:a2ec) has joined #ceph
[14:23] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:28] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[14:33] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[14:34] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:34] <TMM> has anyone ever considered building a datastore with SQL-like semantics on top of rados?
[14:34] <TMM> I was considering it
[14:35] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[14:40] <absynth_> nhm_: awake?
[14:40] * leseb (~leseb@mx00.stone-it.com) Quit (Remote host closed the connection)
[14:45] <scuttlemonkey> another deployment/orchestration example up on the Ceph blog for those who enjoy such things: http://ceph.com/community/deploying-ceph-with-comodit/
[14:45] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:46] <scuttlemonkey> will probably push a link to ceph-user list in a few as well
[14:46] <Robe> anything on puppet yet? ;)
[14:46] <scuttlemonkey> it's coming...I know puppet the least, so it fell to the bottom of my personal list
[14:47] <Robe> *nods*
[14:47] <Robe> and probably a huge pain in the ass
[14:47] <scuttlemonkey> Unless you want to write one up! I'd certainly publish a guest puppet walkthrough
[14:47] <scuttlemonkey> =D
[14:47] <Robe> I've dabbled a bit in it
[14:47] <scuttlemonkey> hey, if you get so inspired feel free to just send the raw line-by-line to community@inktank
[14:47] <Robe> the concept that you need to manage explizit identifiers for OSDs and mon servers makes it hard to deal with it in puppet terms
[14:47] <scuttlemonkey> I can write the prose
[14:47] <scuttlemonkey> yeah
[14:48] <Robe> explicit even..
[14:48] <scuttlemonkey> my next one is Juju, but there were some issues deploying it on ec2 that I had to fix
[14:48] <scuttlemonkey> once those fixes find their way to mainline I'll push the juju blog
[14:48] <Robe> nice
[14:49] <Robe> is there anything on the roadmap to remove explicit osd/mds/mon identifiers?
[14:49] <scuttlemonkey> yeah, it's a testament to juju I think...I'm _not_ a developer, but I could jump in and edit the python
[14:49] <scuttlemonkey> that I don't know
[14:49] <scuttlemonkey> but I haven't heard anything about that, so my guess would be no...but take with the requisite grain of salt
[14:50] <Robe> heh, yeah
[14:50] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:57] * leseb (~leseb@mx00.stone-it.com) has joined #ceph
[14:59] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[15:24] * sleinen1 (~Adium@2001:620:0:26:856:9884:2fbe:a2ec) Quit (Quit: Leaving.)
[15:24] * sleinen (~Adium@ has joined #ceph
[15:29] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[15:29] * ScOut3R (~ScOut3R@ has joined #ceph
[15:31] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:32] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[15:34] * fghaas1 (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[15:34] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Read error: Connection reset by peer)
[15:38] * fghaas1 is now known as fghaas
[15:40] * sleinen (~Adium@ has joined #ceph
[15:41] * sleinen1 (~Adium@2001:620:0:25:f140:a5a0:41c4:38e1) has joined #ceph
[15:42] * sleinen1 (~Adium@2001:620:0:25:f140:a5a0:41c4:38e1) Quit ()
[15:42] * sleinen (~Adium@ Quit (Read error: Connection reset by peer)
[15:44] * fghaas1 (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[15:44] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Read error: Connection reset by peer)
[15:45] * rturk-away is now known as rturk
[15:45] * drokita (~drokita@ has joined #ceph
[15:48] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:51] * fghaas1 (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[15:55] * aliguori (~anthony@ has joined #ceph
[15:55] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[15:56] * ScOut3R_ (~ScOut3R@ has joined #ceph
[15:57] * PerlStalker (~PerlStalk@ has joined #ceph
[15:57] * scuttlemonkey changes topic to 'v0.56.3 has been released -- http://goo.gl/f3k3U || argonaut v0.48.3 released -- http://goo.gl/80aGP || tell us about your Ceph use at http://ceph.com/census'
[16:03] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[16:07] * vata (~vata@2607:fad8:4:6:cd9f:cce3:722:7795) has joined #ceph
[16:12] * diegows (~diegows@ has joined #ceph
[16:17] * schamane (~tbo@barriere.frankfurter-softwarefabrik.de) has joined #ceph
[16:17] <schamane> hi
[16:18] <schamane> i'm trying to set up ceph, but on my hosts there is no permission for root login, is it possible to give the username to the mkcephfs -a option?
[16:21] <scuttlemonkey> schamane: you can't just use sudo?
[16:22] <schamane> scuttlemonkey: no, will not work, have to use pub key and other username
[16:22] * sleinen (~Adium@ has joined #ceph
[16:24] * sleinen1 (~Adium@2001:620:0:26:7c71:93a0:ef72:7eec) has joined #ceph
[16:24] * rturk is now known as rturk-away
[16:25] * jskinner (~jskinner@ has joined #ceph
[16:26] <scuttlemonkey> to the best of my knowledge you need a user w/ root permissions...but lemme check and see if any of the wizards know an incantation I don't
[16:27] * Ul (~Thunderbi@ip-83-101-40-231.customer.schedom-europe.net) has joined #ceph
[16:28] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[16:28] <noob2> is it safe to delete the metadata pool if you're not using ceph fs?
[16:30] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[16:31] <scuttlemonkey> schamane: nope, no option...however if you feel like getting your hands dirty you can modify ceph_common.sh (mkcephfs includes ceph_common.sh)
[16:32] <schamane> yeah, just trying to add it to /sbin/mkcephfs
[16:32] <scuttlemonkey> there is a command there do_root_cmd(), and you could specify a different user if you wish
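The edit scuttlemonkey is pointing at would look something like this inside ceph_common.sh; the ceph-admin user and the sudo wrapping are guesses at schamane's setup, not the stock code:

```shell
# Hypothetical rework of do_root_cmd: ssh in as an unprivileged pubkey
# user and escalate with sudo remotely, instead of logging in as root
do_root_cmd() {
    ssh ceph-admin@$host "sudo sh -c '$1'"
}
```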
[16:32] <scuttlemonkey> noob2: yeah, that should be fine as long as you didn't have mds servers configured
[16:33] <scuttlemonkey> if you did, gotta do a bit of cleanup first
[16:33] <scuttlemonkey> http://www.sebastien-han.fr/blog/2012/07/04/remove-a-mds-server-from-a-ceph-cluster/
[16:36] <noob2> yeah i don't have any mds servers at the moment. i don't think we'll be using them for some time
[16:36] <noob2> i suppose i could just leave it alone
[16:36] <noob2> the tinkerer in me just has to change things :D
[16:36] <scuttlemonkey> hehe...but that would defy the ocd desire for tidy :)
[16:37] <scuttlemonkey> right
[16:37] <noob2> haha
[16:37] <noob2> true
[16:38] <noob2> i saw a question on the ceph users list asking about deleting the metadata pool to save pg's
[16:39] <slang1> noob2: leaving the metadata pool alone doesn't take up space, really
[16:39] <slang1> noob2: most users don't run out of pgs
[16:40] <slang1> noob2: also, you can now increase the number of pgs after deployment
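The bump slang1 mentions is a one-liner per pool; data and 256 are placeholders, and since pg splitting was still fresh around 0.56.x it is worth trying on a throwaway pool first:

```shell
# Raise the placement group count, then the placement count to match
ceph osd pool set data pg_num 256
ceph osd pool set data pgp_num 256
```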
[16:41] <noob2> cool
[16:41] <noob2> i thought that was still being tested
[16:46] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:49] * Ul (~Thunderbi@ip-83-101-40-231.customer.schedom-europe.net) has left #ceph
[16:49] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[16:54] * gerard_dethier (~Thunderbi@ Quit (Quit: gerard_dethier)
[16:56] * low (~low@ Quit (Quit: Leaving)
[16:56] <madkiss1> after deleting all my MDSes and doing "ceph mds newfs", in "ceph -w" i still see "mdsmap e34980: 0/0/1 up" — is that expected or should the mdsmap entry disappear completely?
[16:59] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[17:00] * gaveen (~gaveen@ has joined #ceph
[17:01] * ScOut3R_ (~ScOut3R@ Quit (Remote host closed the connection)
[17:01] * ScOut3R (~ScOut3R@ has joined #ceph
[17:07] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[17:19] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[17:23] <ShaunR> scuttlemonkey: that blog is a bit outdated, most of the commands don't work without modification.
[17:24] <scuttlemonkey> which blog?
[17:24] <scuttlemonkey> oh, sebastien's?
[17:24] <ShaunR> http://www.sebastien-han.fr/blog/2012/07/04/remove-a-mds-server-from-a-ceph-cluster/
[17:24] <ShaunR> ya
[17:25] * sleinen1 (~Adium@2001:620:0:26:7c71:93a0:ef72:7eec) Quit (Quit: Leaving.)
[17:25] <scuttlemonkey> yeah, I knew my target audience though...noob2 is relatively seasoned
[17:25] <ShaunR> still a good article though, it does work once you modify the commands
[17:25] * sleinen (~Adium@ has joined #ceph
[17:25] <scuttlemonkey> nod
[17:25] <ShaunR> i used it a few days ago
[17:25] <scuttlemonkey> a good thing to point out for the general pop
[17:26] <noob2> scuttlemonkey: haha which makes my handle even funnier
[17:26] <scuttlemonkey> I enjoy the irony, yes :)
[17:26] <noob2> :D
[17:26] <nhm_> noob2: it's the 2, better than 1.
[17:26] <nhm_> noob2: implies seniority
[17:26] <noob2> hehe
[17:27] <noob2> someone snagged my noob1 name and registered it
[17:27] <scuttlemonkey> afk a bit to grab some lunch
[17:28] * nhm_ is now known as nhm
[17:29] <absynth_> just so i don't misunderstand
[17:29] <absynth_> if i have 3 mons and 2 die, my cluster halts.
[17:29] <absynth_> right?
[17:29] <noob2> no it shouldn't
[17:29] <ShaunR> noob2: noob1 looks free
[17:29] <absynth_> noob2: why? no quorum
[17:29] <noob2> oh i guess the person gave up on it
[17:30] <noob2> 1 is a quorum no? nobody else can argue with it
[17:30] <absynth_> nope
[17:30] <absynth_> by ceph's definition, 1 is not a quorum
[17:30] <ShaunR> maybe a different network? i dont even show it being registered with nickserv
[17:30] <noob2> oh..
[17:30] <noob2> ShaunR: oh maybe i'm thinking freenode
[17:30] <ShaunR> not registered their either :)
[17:30] <noob2> lol
[17:31] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[17:31] <ShaunR> i prefer the number 2 though..
[17:31] <noob2> i donno then
[17:31] <ShaunR> always have
[17:31] <absynth_> noob2: i think the train of thought is as follows "hey, wait, i am alone. we used to be three. i cannot continue without a second opinion"
[17:31] <noob2> absynth: so i need to add some more monitors to my cluster then. i only have 3
[17:31] <noob2> yeah good point
[17:31] <noob2> if you had a network split nobody would know who is correct
[17:31] <absynth_> http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
[17:33] <noob2> hmm
[17:33] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[17:33] <noob2> it says you can run a cluster with 1 mon
[17:33] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit ()
[17:33] <absynth_> yes, but not with ONE OUT OF THREE
[17:33] <noob2> but they recommend 3 or more
[17:33] <noob2> oh..
[17:33] <noob2> i see what you're saying
[17:33] <absynth_> if you have one mon, it will always be in quorum
[17:33] <noob2> right
[17:33] <absynth_> since it never had anyone else to ask
[17:33] <noob2> if 2 others die now it doesn't know what to do
[17:34] <noob2> good point
[17:34] <noob2> i'll tell the guys at work to add 2 more mons just in case
[17:34] <absynth_> so, now consider this question
[17:34] <noob2> shoot
[17:34] <absynth_> is it possible to set up monitors in a way that network splits do *not* result in a non-responsive cluster on the smaller part?
[17:35] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[17:35] <absynth_> i think: no, and i think that is by design. right?
[17:35] <noob2> i'm thinking no
[17:36] <mjevans> absynth_: I believe by definition that's a no and the whole 'small' segment is failed. The good news is that when you reconnect it the 'write intent' resync is close to the actual size
[17:36] <noob2> yeah i would think that is by design so that you don't have to merge changes upon the network being restored
[17:36] <noob2> that could be a real mess
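The majority rule absynth_ is describing (a lone mon in a 1-mon map is always in quorum, but 1 surviving mon out of a 3-mon map is not, and an even split on either side of a network partition never is) comes down to simple integer math. A minimal sketch of the idea, not Ceph's actual implementation:

```python
def has_quorum(reachable: int, monmap_size: int) -> bool:
    """A Paxos-style quorum needs a strict majority of the monmap."""
    return reachable > monmap_size // 2

# 1 mon out of a 1-mon map is always a majority...
print(has_quorum(1, 1))  # True
# ...but 1 of 3 is not, so the cluster halts, as discussed above.
print(has_quorum(1, 3))  # False
print(has_quorum(2, 3))  # True
# an even 2/2 split means NEITHER side has quorum (no split-brain)
print(has_quorum(2, 4))  # False
```

This is also why the smaller side of a network split goes unresponsive by design: it can never hold a strict majority.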
[17:37] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) has joined #ceph
[17:39] * BillK (~BillK@124-168-232-158.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:39] * __jt__ (~james@rhyolite.bx.mathcs.emory.edu) Quit (Remote host closed the connection)
[17:39] <noob2> absynth: is your network setup in such a way that you can envision this happening?
[17:40] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[17:54] * jlogan (~Thunderbi@2600:c00:3010:1:2484:8d84:a41e:1d27) has joined #ceph
[17:54] * aliguori (~anthony@ has joined #ceph
[17:56] * BillK (~BillK@58-7-215-75.dyn.iinet.net.au) has joined #ceph
[17:58] <noob2> good news: work gave me permission to open source my rados block device mounting and fibre channel exporting code. It's here: https://github.com/cholcombe973/rbdmount
[18:00] * leseb (~leseb@mx00.stone-it.com) Quit (Remote host closed the connection)
[18:02] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[18:09] * sleinen (~Adium@2001:620:0:25:35b5:bd79:7e1c:d05c) has joined #ceph
[18:10] <fghaas> noob2: um, isn't that a bit of a misnomer? you typically only "mount" filesystems, which is not what you do, you export
[18:11] <noob2> yeah it needs the name tweaked for sure
[18:11] * markw (0c9d41c2@ircip1.mibbit.com) has joined #ceph
[18:11] <fghaas> also, wasn't there a python librbd at some point?
[18:12] <noob2> you can tell i'm not very creative with naming stuff
[18:12] <noob2> there is yes
[18:12] * markw is now known as Guest1738
[18:12] <fghaas> I was wondering why you're Popen'ing rbd then
[18:12] <noob2> I suppose i could do this with the python librbd also. I wanted to use the kernel rbd though
[18:12] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[18:12] <fghaas> ah, does librbd not expose the map interface? never checker
[18:12] <fghaas> checked
[18:13] <noob2> i'm not sure
[18:13] <noob2> i didn't even think about using that
[18:13] <noob2> might have made it easier
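Shelling out to the kernel `rbd` tool, as the rbdmount code apparently does via Popen, roughly amounts to building a command like the one below. This is a hedged sketch only — the pool/image names, the `--id` default, and the `dry_run` switch are illustrative, not taken from the repo:

```python
import subprocess

def rbd_map_cmd(pool: str, image: str, id_: str = "admin") -> list:
    """Build the `rbd map` invocation that creates a kernel-rbd mapping."""
    return ["rbd", "map", "%s/%s" % (pool, image), "--id", id_]

def rbd_map(pool, image, dry_run=True):
    cmd = rbd_map_cmd(pool, image)
    if dry_run:
        # sketch mode: just return the command instead of touching the kernel
        return cmd
    # on a real node this maps the image to a /dev/rbd* device (needs root)
    return subprocess.run(cmd, check=True, capture_output=True)

print(rbd_map("rbd", "vol1"))  # ['rbd', 'map', 'rbd/vol1', '--id', 'admin']
```

Note that librbd (and its Python binding) speaks to the cluster in userspace and does not expose the kernel map interface, which is why a tool wanting `/dev/rbd*` devices ends up exec'ing the CLI.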
[18:13] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:13] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[18:14] * schamane (~tbo@barriere.frankfurter-softwarefabrik.de) has left #ceph
[18:17] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[18:18] * gaveen (~gaveen@ has joined #ceph
[18:25] * The_Bishop_ (~bishop@f052099210.adsl.alicedsl.de) has joined #ceph
[18:26] * loicd (~loic@lvs-gateway1.teclib.net) Quit (Ping timeout: 480 seconds)
[18:28] * l0nk (~alex@ Quit (Quit: Leaving.)
[18:32] * The_Bishop (~bishop@e179010244.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[18:39] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:40] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[18:45] * BillK (~BillK@58-7-215-75.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:52] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[18:55] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:58] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: A day without sunshine is like .... night)
[19:00] <ShaunR> anybody in here ever compare the performance of say a raid edition drive (ex: WD RE) vs a cheaper workstation drive (ex: Seagate Barracuda)
[19:02] <ShaunR> We always buy RE drives because we've run raid10 arrays, now that it looks like we wont need to do that with ceph and we'll run them directly attached i'm curious if it's worth it to continue to buy enterprise drives (at least for SATA; SSDs would be enterprise)
[19:02] <Gugge-47527> Well, you will still have the shorter timeouts on errors on RE disks
[19:03] <nhm> ShaunR: I've always been more inclined toward enterprise class drives for the warranty and supposed MTBF rather than for performance.
[19:03] <Gugge-47527> Which is a good thing for ceph too
[19:03] <Gugge-47527> And your disks will die some time, with a 5 year warranty there's a bigger chance you can get a new one :)
[19:03] <mjevans> Yes, the 'time limited error recovery' is one of the main features of the RE drives.
[19:04] * Ryan_Lane (~Adium@ has joined #ceph
[19:06] <nmartin> If I'm looking at the quad node supermicro chassis that have 3 3.5" drives per node, would a single RAID0 VD per node be the best bet? I'd create a 10 GB OS partition, and the rest would be for an OSD. The other option would be JBOD with 1 OSD per disk. Does it matter from a reliability/performance standpoint?
[19:07] <fghaas> nmartin: in that case if one disk fails and you're in raid0, that means _all_ the data on that box is now shuffled elsewhere
[19:07] <nhm> nmartin: ceph tends to perform best with 1 drive per OSD, especially in configurations with few drives per node. WB cache does help performance when journals are not on SSDs, but that adds considerable cost.
[19:08] <nmartin> I'll consider SSDs as well for the journal storage, I need to read up to see how much I need
[19:08] <fghaas> nmartin: jbod, 1 osd per spindle: 1 disk dies, only 1/3 of your data on that node gets shuffled around
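fghaas's point can be made concrete: with one RAID0 OSD spanning the node's disks, a single disk failure invalidates all of the node's data and Ceph must re-replicate every byte of it, while one OSD per spindle loses only that disk's share. A back-of-the-envelope sketch:

```python
def data_reshuffled_fraction(disks_per_node: int, raid0: bool) -> float:
    """Fraction of a node's data Ceph re-replicates after one disk dies."""
    # RAID0 has no redundancy: one dead member kills the whole volume.
    return 1.0 if raid0 else 1.0 / disks_per_node

print(data_reshuffled_fraction(3, raid0=True))   # 1.0  -> all 3 disks' data moves
print(data_reshuffled_fraction(3, raid0=False))  # ~0.33 -> only the dead disk's share
```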
[19:08] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[19:09] <ShaunR> How about SAS vs SATA when everything else being equal (rpm, speed, cache)
[19:09] <nhm> nmartin: It's kind of a big balancing act where you have to figure out the right ratio of drives, cpu, controllers, and whether or not to use SSDs and/or battery backed write-back cache.
[19:09] <nhm> oh, and network.
[19:09] <noob2> agreed
[19:10] <nhm> ShaunR: we've successfully used both SATA and SAS drives.
[19:10] <ShaunR> some of the 7200 RPM SAS disks these days are not that much more...
[19:10] <noob2> i skipped the sas because they were quite a bit more expensive per GB
[19:10] <nmartin> yeah, I'm doing an initial design for a Cloudstack (KVM + Ceph on each node) 8 way cluster
[19:10] <ShaunR> SAS really only gets expensive when you want 15k RPM... reason i was asking about drives being equal
[19:11] <noob2> right
[19:11] <nhm> ShaunR: The test node I have uses enterprise SATA, but if you have expanders in the backplane you may want to stick entirely with SAS.
[19:11] <ShaunR> curious whether, everything else being equal, SAS would be a better choice
[19:12] <ShaunR> It looks like most of Supermicro's have expanders; we'd most likely be looking at their 24-disk servers
[19:12] <nhm> ShaunR: I have the 36 drive chassis with the direct connect backplane.
[19:12] <noob2> i'd prob say if you can afford it and don't care about keeping things cheap then sure go for sas
[19:13] <noob2> the iops are a little better
[19:13] <ShaunR> nhm: I'm not a big fan of the drives being in the back... i feel like those suckers are going to get hot back there considering the hot/cold row config in a DC... how have they been for you
[19:14] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:15] <nhm> ShaunR: I have the chassis sitting on a table, so I haven't gotten to see how it does in a real setup. Having said that, the fans they have in that chassis are insane.
[19:15] <ShaunR> nhm: are you the blog buy?
[19:15] <ShaunR> guy*
[19:15] <ShaunR> god i cant type today
[19:16] <ShaunR> mark nelson?
[19:16] <nhm> ShaunR: I do the performance articles on the ceph blog. :)
[19:16] <nhm> That's me
[19:16] <ShaunR> I've been reading alot of those
[19:16] <nhm> glad to hear it!
[19:17] <nhm> meeting, be back in 15
[19:18] <nhm> actually, back now (sort of)
[19:18] <ShaunR> haha
[19:18] <ShaunR> nhm: Have you tried just hooking those drives up directly to the controller?
[19:19] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:20] <ShaunR> I'm doing testing now, so far i've done some benches on a raid10 with a LSI 9266-4i controller... Next i'm going to test single raid0, but i'm curious if these controllers are even needed, wondering how performance will be using the onboard controller.
[19:26] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[19:26] <nhm> ShaunR: they are still going through the backplane, but theoretically it shouldn't be interfering. I get really nice performance out of it.
[19:27] <nhm> ShaunR: my current setup is using 4 9207-8i controllers which are cheaper LSI ones. The much cheaper Highpoint Rocket 2720SGL does about as well, but I only have 1 of them to test.
[19:36] * chutzpah (~chutz@ has joined #ceph
[19:36] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[19:37] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[19:39] * rturk-away is now known as rturk
[19:40] * rturk is now known as rturk-away
[19:40] * rturk-away is now known as rturk
[19:44] * yehuda_hm (~yehuda@2602:306:330b:a40:29a8:3ee1:e94f:c893) Quit (Read error: Connection timed out)
[19:44] <nmartin> i was looking at just using the 9240 8i or 4i for jbod if i was using a full multi-drive chassis
[19:45] <nmartin> but if i go with the supermicro quad node chassis, I'll just use the onboard sata controller
[19:45] * yehuda_hm (~yehuda@2602:306:330b:a40:29a8:3ee1:e94f:c893) has joined #ceph
[19:46] <nhm> nmartin: I don't have any evidence, but I imagine the on-board is probably going to be ok if it's just a couple of drives.
[19:47] <noob2> can you upgrade a ceph cluster all at once and then restart the osd's one by one?
[19:47] <noob2> or is it best practice to just go one by one
[19:52] <nmartin> nhm: yeah, it's 3 drives per controller, i figured that would be ok
[19:56] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:57] * gaveen (~gaveen@ Quit (Quit: Leaving)
[20:01] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[20:03] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) has joined #ceph
[20:06] <wer> what does getting a lot of slow request 67395.462269 seconds old mean. I have a degraded cluster that has been rebuilding since yesterday...
[20:06] <Karcaw> what version of ceph are you running?
[20:07] <wer> 0.55.1
[20:07] <wer> 0.55-1~bpo70+1
[20:08] <wer> I restarted one of the complaining OSD's and now it is incrementing slow requests in it's logs. IE it is around 121.671973 seconds old now.
[20:08] <wer> seems like something is stuck.
[20:09] <Karcaw> i'm experiencing an issue right now on the 0.56 series where i have long operations almost constantly, with osd processes crashing at times, which just start rebuilds again. The strangest thing is that this system has been handling my IO load for months now, and out of the blue on Tuesday it started having issues.
[20:12] * nmartin (~nmartin@adsl-98-90-198-125.mob.bellsouth.net) Quit (Ping timeout: 480 seconds)
[20:14] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[20:14] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[20:15] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:15] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:18] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[20:20] * schlitzer (~schlitzer@ip-109-90-143-216.unitymediagroup.de) has joined #ceph
[20:29] <schlitzer> hey all
[20:29] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[20:29] <schlitzer> when creating a pool, is there also a way to limit the size of this pool?
[20:31] <schlitzer> i also have trouble understanding the meaning of pg_num and pgp_num
[20:31] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) has joined #ceph
[20:31] <schlitzer> from pgp-num: "This should be equal to the total number of placement groups"
[20:32] <schlitzer> but what happens when i add some OSDs to the cluster?
[20:32] <dmick> schlitzer: currently there is no quota system, no
[20:32] <schlitzer> wouldn't this increase the total number of PGs?
[20:32] <schlitzer> dmick, ok, thx
[20:32] <dmick> schlitzer: number of pg's is currently fixed at pool creation time, even when OSDs are added
[20:33] <schlitzer> so a pool cannot grow?
[20:33] <dmick> there is new code to expand that number if it becomes clear that it's too small; it's probably useful to consider that beta code at the moment
[20:33] <dmick> I'm not exactly certain of the difference between pg_num and pgp_num; sjust/sjustlaptop?
[20:33] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[20:33] <dmick> hah
[20:34] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[20:34] <schlitzer> i also found a howto where one was doing "rados mkpool $poolname"
[20:34] <schlitzer> and there one does not have to provide this information & it works
[20:35] <schlitzer> it is from here: http://wiki.debian.org/OpenStackCephHowto
[20:36] <dmick> yes, you can't specify pg_num with rados mkpool, but you can with ceph osd pool create <pool> <pgnum> <pgpnum>
[20:36] <dmick> http://ceph.com/docs/master/rados/operations/pools/#createpool
[20:37] <schlitzer> yes, but i think under the hood both will do the same, so maybe rados mkpool will use the defaults (8)?
[20:37] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[20:37] <dmick> oh they both definitely create pools, yes. and indeed, mkpool uses 8
[20:37] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[20:38] <schlitzer> well, this is irritating me, because the doc says that 8 is not recommended :-D
[20:40] <dmick> it actually uses osd pool default size, which can be set
[20:40] <dmick> but you have the option to create pools with osd create pool
[20:41] <schlitzer> another question: the maximum number of PGs is fixed for each osd. when i have an osd with 1.5TB and 100 PGs, this would mean that every pg is 15GB big?
[20:41] <dmick> I'm sorry, I mean pool default pg num
[20:41] <dmick> anyway
[20:42] <schlitzer> or can one pg within a pool have 100GB of data, while others have less than 1GB or something like this?
[20:43] <dmick> pgs are relative to a pool
[20:43] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[20:43] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[20:43] <dmick> think of the pg as a first sharding of the object space, so that it's a smaller unit of placement distribution
[20:43] <schlitzer> so when i delete all pools, there will be no PGs anymore?
[20:43] * jskinner (~jskinner@ has joined #ceph
[20:44] <dmick> the amount of data it contains is less important than 'having enough for a good distribution across OSDs'
[20:44] <dmick> well when you delete all pools there will be no objects to occupy PGs
[20:44] <schlitzer> so if i start with a 100-osd cluster, and it would be possible that the cluster will grow to 1000, or even 10000 osd instances
[20:45] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[20:45] <schlitzer> then i should create a pool with at least 10000 PGs so i can distribute the data over all OSDs?
[20:45] <schlitzer> or is it impossible for a pool to grow to new OSDs?
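The numbers schlitzer is working through follow from two simple relations: average PG size is roughly the data on an OSD divided by the number of PGs it holds, and a common sizing rule of thumb (an assumption here, not something quoted in the channel) targets on the order of 100 PGs per OSD, divided by the replica count and rounded up to a power of two. A sketch:

```python
def avg_pg_size_gb(osd_capacity_gb: float, pgs_on_osd: int) -> float:
    """Average data per PG if an OSD's capacity is spread evenly over its PGs."""
    return osd_capacity_gb / pgs_on_osd

def suggested_pg_num(num_osds: int, replicas: int, target_per_osd: int = 100) -> int:
    """Round (osds * target_per_osd) / replicas up to the next power of two."""
    raw = num_osds * target_per_osd // replicas
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num

print(avg_pg_size_gb(1500, 100))  # 15.0 GB, matching the 1.5TB/100PG example above
print(suggested_pg_num(100, 2))   # 8192 for a 100-OSD, 2-replica pool
```

Since (at this point) pg_num is fixed at pool creation, sizing for the cluster you expect to grow into is exactly the trade-off being discussed — though note that PGs do get remapped onto new OSDs as they are added.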
[20:46] * nmartin (~nmartin@adsl-98-90-194-253.mob.bellsouth.net) has joined #ceph
[20:46] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[20:47] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit ()
[20:47] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[20:47] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit ()
[20:48] * Cube (~Cube@ has joined #ceph
[20:49] * Ul (~Thunderbi@ip-83-101-40-231.customer.schedom-europe.net) has joined #ceph
[20:50] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[20:50] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit ()
[20:51] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[20:53] <wer> Karcaw: I had an osd crash too
[20:56] <Ul> hi everybody! i just installed a small cluster on debian wheezy with kernel 3.7.6. ceph/rdb is working fine. facing problem with qemu-kvm running on rbd though. installed qemu-kvm from package repository, then recompiled qemu from source as indicated on the wiki to obtain the rdb format in qemu-image. i've now ended up with a new set of qemu-* executables in /usr/local/bin while the old executables are still around in /usr/bin. is
[20:57] * eschnou (~eschnou@154.180-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:57] <fghaas> pet peeve: it's rbd, it ain't no database™ :)
[20:58] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[20:58] <fghaas> Ul: what's your question?
[21:00] <Ul> how do I make sure that the kvm stack is using the my recompiled binaries to obtain the qemu rbd driver. as far as I can tell I only recompiled qemu and not kvm
[21:01] * nmartin (~nmartin@adsl-98-90-194-253.mob.bellsouth.net) has left #ceph
[21:01] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[21:02] <Ul> a possible solution would be to reinstall the node with ubuntu 12.04 LTS instead which has the rdb patch included in qemu
[21:02] <Ul> i'd rather stick to wheezy if i can
[21:04] * jskinner (~jskinner@ has joined #ceph
[21:04] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:06] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[21:06] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) has joined #ceph
[21:07] <fghaas> Ul: well why don't you build your own debian packages with the rbd configure flags enabled?
[21:08] <fghaas> and then replace the upstream packages with your build?
[21:10] * yehuda_hm (~yehuda@2602:306:330b:a40:29a8:3ee1:e94f:c893) Quit (Ping timeout: 480 seconds)
[21:10] <Ul> ok, sounds good. any pointers where I can find some documentation how to do that? do i have to recompile only qemu-kvm?
[21:11] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[21:11] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[21:13] <Ul> thx I think I found an introduction to package building ... sorry to bother you with newbie questions
[21:16] * jlogan (~Thunderbi@2600:c00:3010:1:2484:8d84:a41e:1d27) Quit (Ping timeout: 480 seconds)
[21:17] * yehuda_hm (~yehuda@2602:306:330b:a40:133:7b35:e199:cb6e) has joined #ceph
[21:18] * jlogan (~Thunderbi@ has joined #ceph
[21:31] <Ul> thx for your help! ... gotta go
[21:31] <Ul> bye
[21:31] * Ul (~Thunderbi@ip-83-101-40-231.customer.schedom-europe.net) has left #ceph
[21:33] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[21:34] <dmick> schlitzer: pools and their existing pgs will use new OSDs as they are added
[21:40] * schlitzer (~schlitzer@ip-109-90-143-216.unitymediagroup.de) Quit (Ping timeout: 480 seconds)
[21:43] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[21:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:44] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[21:48] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[21:52] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[21:53] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:53] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) Quit (Quit: Leaving)
[21:55] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:55] * Guest1738 (0c9d41c2@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[22:02] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[22:02] * noob2 (~noob2@ext.cscinfo.com) Quit (Ping timeout: 480 seconds)
[22:02] * jskinner (~jskinner@ has joined #ceph
[22:04] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[22:07] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:07] * jjgalvez (~jjgalvez@ has joined #ceph
[22:10] * noahmehl (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[22:12] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) Quit (Quit: Leaving)
[22:13] * MarkN (~nathan@ has joined #ceph
[22:14] * MarkN (~nathan@ has left #ceph
[22:18] * mjevans (~mje@ Quit (Ping timeout: 480 seconds)
[22:22] * fghaas (~florian@91-119-74-57.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[22:27] <nz_monkey> nhm: Last week you mentioned there were some performance improvements coming for OSD's on ext4/xfs . No pressure, but I was wondering if there was a timeframe for these being available for testing ?
[22:34] <Karcaw> if i mark an osd as 'lost', is there a way to bring it back online later, say after i re-format the disks?
[22:38] <dwm37_> That sounds less like bringing the original OSD online, and more like adding a new thingy.
[22:38] <dwm37_> Uh, OSD.
[22:38] <dwm37_> (Sorry, brains.)
[22:39] <nhm> nz_monkey: you can test them now if you like. Check out wip-pginfo-rebase
[22:39] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:40] <nhm> nz_monkey: they aren't even in the master branch yet though, so don't use it for anything other than alpha testing.
[22:41] <dmick> Karcaw: that's why marking it "lost" requires "--yes-i-really-mean-it". It's one way.
[22:41] <Karcaw> i was just considering if i lost an external raid system to a node, then once i get disks replaced it would be nice to build it back up on the same server.
[22:41] <dmick> after that, it's just a new OSD, and will need to be repopulated from scratch
[22:41] <nhm> nz_monkey: at least, I don't think it's in master yet (watch as someone proves me wrong)
[22:41] <dmick> but sure, you can start a new OSD on the same server
[22:41] <Karcaw> so it will need a new name then?
[22:42] <dmick> that's a good question. I don't know if you can reuse the id. I think if your ids are contiguous, and you remove the lost osd, then when you create a new one it will use that same id because it's unallocated.
[22:42] <dmick> we're currently debating allowing specifying the id (again, really, this time)
[22:43] <nz_monkey> nhm: Thanks, we will run a performance comparison on our test environment
[22:46] <nhm> nz_monkey: you'll likely notice the biggest improvements with XFS and at small IO sizes. EXT4 performance seemed to improve in some cases but was more variable overall. BTRFS performance didn't seem to change much at all (but was already higher for these tests)
[22:47] <nz_monkey> nhm: Great, we are running XFS on our OSD's. Our small IO performance is already satisfactory, but more is always better. Sequential IO is currently not that great, but we only have L3+L4 hashed bonding on 4x Intel e1000's, we have 10gbit nic's on order so will test 0.56-3 and wip-pginfo-rebase once they arrive
[22:49] <nhm> nz_monkey: what is your read_ahead_kb set at on the OSD disks?
[22:50] <nz_monkey> nhm: default ;)
[22:50] <nhm> nz_monkey: ok, Xiaoxi from Intel saw a really big improvement in sequential read performance if I remember right by increasing it for each OSD disk from 128 to 512.
[22:50] <Karcaw> ok, so here is my real issue, I've got processes stuck trying to complete an operation (in this case a set xattr), it seemed to connect to one osd, but it seems to me that the other osd it needs to write to is not responding. now, i know it's not responding, since it keeps crashing (bug 4116), and it will not stay up. is there a clean way to tell ceph to forget about an OSD in a temporary way for a while so that these simple xattr's can co
[22:51] <nz_monkey> nhm: Thanks, will test that now.
[22:52] <nz_monkey> nhm: I am trying to find documentation on read_ahead_kb but cannot spot anything obvious. Is this configured per OSD in ceph.conf ?
[22:54] <nhm> nz_monkey: it's not a ceph setting, it's in /sys/block/<device>/queue/read_ahead_kb
[22:54] * itamar (~itamar@ has joined #ceph
[22:54] <nz_monkey> nhm: I was just about to say the only reference I can find to it is in sysfs ;) Thanks for that
[22:54] <nhm> np
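The sysfs knob nhm is pointing at is per block device, not a ceph.conf setting. A minimal sketch of reading and bumping it — the device name is illustrative, and the actual write obviously needs root on a real OSD node:

```python
import os

def read_ahead_path(device: str) -> str:
    """sysfs path for a block device's readahead setting, in KB."""
    return os.path.join("/sys/block", device, "queue", "read_ahead_kb")

def set_read_ahead(device: str, kb: int, dry_run: bool = True):
    path = read_ahead_path(device)
    if dry_run:
        # sketch mode: show the equivalent shell command instead of writing
        return "echo %d > %s" % (kb, path)
    with open(path, "w") as f:  # needs root; takes effect immediately
        f.write(str(kb))

# the 128 -> 512 bump discussed above, for one OSD disk
print(set_read_ahead("sdb", 512))
```

In practice this would be looped over every OSD data disk (and made persistent via a udev rule or boot script, since sysfs resets on reboot).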
[22:58] <mikedawson> sjustlaptop: similar assert on _get_map_bl on two osds after upgrade from 0.56.2 to 0.56.3 http://pastebin.com/JSrMg7Nj
[22:58] <dmick> Karcaw: if the OSD is down, that shouldn't stop responses; that's what replication is for
[22:59] <dmick> unless you've lost all the OSDs for a particular PG
[22:59] <sjustlaptop> mikedawson: for how long had those osds been running?
[22:59] <dmick> (lost is a bad term. Unless all the OSDs for a PG are nonresponsive/down/out)
[22:59] * mauilion (~dcooley@crawford.dreamhost.com) has joined #ceph
[22:59] <sjustlaptop> mikedawson: do you have an unused pool?
[23:00] <mikedawson> sjustlaptop: both are on the same node, one did it in just a few seconds, the other died after about a minute
[23:00] <mauilion> quick question. If I have a deployment running 56.2 and I want to upgrade to 56.3. Can I just apt-get update ; apt-get upgrade and expect everything to work?
[23:01] <mauilion> or is there some restarting of daemons to do?
[23:01] <sjustlaptop> having trouble finding the bug number, but I think it was probably caused by a bug fixed in 56.3
[23:01] <mikedawson> sjustlaptop: I'm not using two of the pools for anything at the moment
[23:01] <sjustlaptop> yeah, that's the trigger
[23:01] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[23:01] <dmick> mauilion: upgrade doesn't restart the daemons AFAIK
[23:02] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:02] <mauilion> dmick: is it required that I do as part of the upgrade?
[23:02] <sjustlaptop> if there is no IO on a particular PG, we were not persisting the current pg epoch, and on restart the PG would attempt to read a really old map
[23:02] <Karcaw> dmick: Thats what i thought, but i'm at 7000 seconds waiting on a set xattr at this point..
[23:02] <dmick> it depends on whether you want to run the new daemons :)
[23:02] <mauilion> dmick: gotcha
[23:03] <dmick> Karcaw: clearly something's wrong, but perhaps it's deeper than just "one OSD died"
[23:03] <dmick> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/ is probably good to step through
[23:04] <mikedawson> sjustlaptop: this node had issues during the reboot and sat on a bootup screen for a few hours while the other nodes came back up quickly and re-shuffled data.
[23:04] <sjustlaptop> yeah, it was churning through old osdmap epochs
[23:04] <Karcaw> the pg i believe the object is in is in the down+peering state, and it says that the peering is blocked by the two osd's that are down.
[23:05] <Karcaw> numbers 5 and 8 are down, but 12 is complaining about the request being blocked.
[23:05] <mikedawson> sjustlaptop: it was off for several hours last night while the other nodes moved to 0.56.3. Today it came up with 0.56.2, then I updated, rebooted, was off for a few hours, then came up with this issue on 0.56.3
[23:05] <wer> I have an osd that hits a "hit suicide timeout" and dies...
[23:06] <Karcaw> wer: i have two right now doing that..
[23:07] <sjustlaptop> how long was it up on 56.2 prior to the restart?
[23:07] <mikedawson> sjustlaptop: 2 weeks? (since the day 0.56.2 was released).
[23:08] <sjustlaptop> sorry, between when you brought it back up in 56.2 and restarted it into 56.3 and hit the crash
[23:08] <sjustlaptop> ?
[23:08] <mikedawson> sjustlaptop: 10 min?
[23:08] <sjustlaptop> did it join the cluster?
[23:08] <sjustlaptop> (all pgs active+clean)?
[23:08] <mikedawson> not sure
[23:09] <sjustlaptop> ok, still consistent with my explanation, I think
[23:09] <sjustlaptop> if you want to recover the osds, I can push a branch that will allow them to boot
[23:09] <wer> heh. yeah, I am slowly pulling out one of my nodes so I can do more testing on that guy. But the degraded system is taking a long time to rebuild.... I am sure I didn't pull the node out the correct way or something.... but oh man it sucks being so unsure all the time. lost a few days.
[23:09] <mikedawson> gotcha. so to be clear, were there two bugs that ended with the _get_map_bl (one fixed for 0.56.2 and another fixed for 0.56.3)? Or is this just the same issue?
[23:10] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[23:10] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Connection reset by peer)
[23:10] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[23:10] <dmick> Karcaw: if you have two OSDs down, it's entirely possible those are the only two containing that PG
[23:10] <nz_monkey> nhm: I just set the read ahead on all 20 of our OSD's to 512 from 128, and no difference. I didn't expect one as the bottleneck seems to be the bonded 1gbit
[23:10] <dmick> the URL will allow you to show that
[23:11] <dmick> s/URL/instructions at the URL above/
[23:11] <nhm> nz_monkey: ah, ok
[23:11] * eschnou (~eschnou@154.180-201-80.adsl-dyn.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[23:11] <sjustlaptop> yeah, there were two different bugs, one fixed in 56.2 the other in 56.3
[23:12] <nhm> nz_monkey: Yeah, should be interesting to see how things go for you with the 10GbE
[23:12] <nhm> nz_monkey: I'm doing bonded 10GbE with round-robin pretty successfully right now.
[23:13] <Karcaw> how can i tell exactly where a file is located: osd map says: osdmap e2007 pool 'chinook.points' (4) object '2013-02-14/cpuinfo.user.cpu7' -> pg 4.17b99420 (4.0) -> up [0] acting [0]
[23:13] <mikedawson> sjustlaptop: thx. looks like I'll be ok wiping these two osds and re-adding them, so I don't think I need the other branch. I'll let you know if I do
[23:13] <nz_monkey> nhm: We are doing quad port Intel I350 with L3+L4 hashing to Extreme switches. We are planning on doing the same with 2x10gbit on Intel X520s to Extreme x670 switches but I don't think we can do round robin to the Extremes
[23:13] * itamar (~itamar@ Quit (Quit: Ex-Chat)
[23:13] <nz_monkey> nhm: what switches are you using ?
[23:13] <sjustlaptop> mikedawson: ok, thanks for the report, let me know if you see anything like that again
[23:13] <Karcaw> do i look at the pg number, or the up/acting numbers?
[23:14] * aliguori (~anthony@ Quit (Remote host closed the connection)
[23:14] <sjustlaptop> mikedawson: if you want to increase the odds of triggering it, you can just restart an osd after a week or so
[23:14] <sjustlaptop> mikedawson: though I think we've got it now
[23:14] <wer> nz_monkey: what controller are you using? for the spinny disks?
[23:15] * BManojlovic (~steki@business-89-135-166-219.business.broadband.hu) has joined #ceph
[23:15] <nz_monkey> wer: Our POC machines are just using the Intel Q77 onboard SATA in AHCI
[23:15] <wer> how many osd's?
[23:15] <wer> per node?
[23:15] <nz_monkey> wer: 4 per node
[23:15] <mikedawson> sjustlaptop: just in case, is the branch on gitbuilder? name?
[23:16] <wer> ok. And you are going to Ten gig?!
[23:16] <sjustlaptop> wip-bobtail-load_pgs-workaround
[23:16] <sjustlaptop> but it would need to be rebased on 56.3
[23:16] <nhm> nz_monkey: I'm just doing two X520s directly connected via SFP+
[23:16] <nz_monkey> wer: Each spinning disk is a backing device for a RAID0 of SSDs using bcache
[23:17] <nz_monkey> nhm: Ahhh, the luxuries of linux to linux :)
[23:17] <mikedawson> sjustlaptop: i see. With the first of two osds killed and re-added I'm at 4800 pgs: 4461 active+clean, 196 active+remapped+wait_backfill, 129 active+recovery_wait, 9 active+remapped+backfilling, 5 active+recovering;
[23:17] <wer> gotcha.
[23:17] <nz_monkey> wer: we can clearly see the NICs maxing out though
[23:17] <sjustlaptop> mikedawson: just repushed it
[23:17] <dmick> Karcaw: that says pg 4.0 is currently available on osd 0
[23:17] <sjustlaptop> mikedawson: looks like it's recovering
[23:17] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[23:17] <mikedawson> sjustlaptop: so I think it's safe to kill the other (no missing PGs), right?
[23:18] <sjustlaptop> the other is already not running, right?
[23:18] <mikedawson> yes
[23:18] <wer> ok. I was unable to max out much more than 1 gig with my 24 OSDs (which are actually spinny).... So the ten gig seemed a bust. But I don't know yet why it sucks so bad.
[23:18] <nhm> nz_monkey: Indeed. :) I actually am kicking myself for not buying Mellanox ConnectX-3 cards with QSFP+ cables.
[23:18] <sjustlaptop> yeah, you should be good to go
[23:18] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:18] <mikedawson> thx Sam
[23:18] <sjustlaptop> sure
[23:19] <Karcaw> osd 0 is up, and active.
[23:19] <nhm> nz_monkey: Back when I started with this box I was just hoping to be able to hit bonded 10GbE speeds. Now I'm wishing I could do 40GbE or QDR/FDR IB with IPoIB.
[23:19] <nz_monkey> nhm: we looked at that but 40gbit switches from Force10 or Extreme are not cheap
[23:19] <dmick> right, so operations to that object should be succeeding
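[To answer Karcaw's question above: in the `ceph osd map` output, the `up`/`acting` lists are what name the OSDs currently serving the object; the pg number identifies the placement group itself. A sketch using the pool and object names from the paste (output shape is from the log, not reproduced exactly):]

```shell
# Map an object to its PG and OSDs. The output ends with something like:
#   ... -> pg 4.17b99420 (4.0) -> up [0] acting [0]
# "up [0] acting [0]" means the object's PG is served by osd.0.
ceph osd map chinook.points 2013-02-14/cpuinfo.user.cpu7

# Inspect that PG's mapping directly:
ceph pg map 4.0
```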
[23:19] <nz_monkey> nhm: is that your 60OSD monster ?
[23:19] <nhm> nz_monkey: 36 bays
[23:20] <nhm> nz_monkey: right now I'm using 32 of them, 24 OSDs with 8 SSDs for journals.
[23:20] <wer> well what the hell did I do to make my nodes suck so bad? You guys are pushing waaaay more bandwidth...
[23:21] <Karcaw> right, which is why i'm confused.
[23:21] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:21] <nhm> wer: It's really easy for things to go wrong.
[23:21] <wer> apparently :) We are close to throwing in the towel... too many things...
[23:21] <nhm> wer: did any of the stuff we talked about yesterday help?
[23:22] <wer> nhm: I am at degraded (0.012%) and waiting for happiness... then I will have a whole node out to do some testing on.
[23:23] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[23:23] <wer> going to start with some good old bonnie++ probably...
[23:23] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[23:23] <nz_monkey> wer: Most of the issues I have had were because I assumed Ceph worked in a particular way, but was completely wrong :)
[23:24] <nz_monkey> wer: and unclean PGs have usually been due to the crush map being wrong for my current setup
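[For the crush-map mistakes nz_monkey mentions, decompiling the map to text is the usual way to audit it; a minimal sketch (the file names are arbitrary):]

```shell
# Pull the current CRUSH map, decompile it, and inspect the
# buckets and rules for mistakes:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
less crushmap.txt

# After editing the text version, recompile and inject it back:
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
</imports>
```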
[23:24] <wer> nz_monkey: exactly. It's the day to day that I am worried about. Like doing things, and doing them correctly. I got a lot of things wrong in the beginning too.
[23:25] <wer> I have been running clean for weeks... but performance is just terrible.
[23:26] <wer> I assumed from looking at the available io that it was network... but I was waay wrong. It is something else.
[23:26] <nz_monkey> wer: we purchased 5 "nodes" which are dedicated to testing, so not too worried about borking them. When we buy our production nodes we will still test every change on our test cluster before doing them on production, and we are also planning to write some management tools with django
[23:26] <wer> yep... already scripted the strangest bash ever :)
[23:27] <nhm> wer: Are you doing RBD in addition to RGW?
[23:27] <wer> This is all test as well, however I do have some test systems on it. Plus I don't like to lose data so I treat it like production... sort of. No, just radosgw.
[23:29] <wer> My biggest mistake to date (besides this performance issue) was getting the radosgw pool spread across all the OSDs...
[23:29] <nhm> wer: that's also a big part. The performance most of us are talking about are RBD or straight RADOS performance. RGW potentially can be a lot slower, especially if you are testing against a single bucket.
[23:29] <dmick> what's the format of a CephFS inum, say as used in 'mds tell <name> issue caps <inum>'?
[23:29] * BManojlovic (~steki@business-89-135-166-219.business.broadband.hu) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:29] <wer> yeah we are running rados bench right now.... and starting from zero on an empty single node.
[23:31] <wer> ditching the middleman... and I guess I am going to run something on all the disks to measure the controller performance... cause it is suspect too at this point.
[23:31] <dmick> (sorry that's issue_caps)
[23:32] <wer> 19 pgs incomplete, is that my dead osd or something or is that a bug?
[23:33] * BillK (~BillK@124-168-248-119.dyn.iinet.net.au) has joined #ceph
[23:35] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[23:37] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:37] <nz_monkey> nhm: It will be interesting to see how you find performance on IPoIB at 40gbit with no TOE
[23:40] <nz_monkey> nhm: If you need higher density 10gbit you can always try http://www.hotlavasystems.com/
[23:40] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[23:41] <wer> hmm. It is done... but I have health HEALTH_WARN 19 pgs incomplete; 19 pgs stuck inactive; 19 pgs stuck unclean
[23:41] <wer> What do I do about that?
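[A usual first step for stuck/incomplete PGs like wer's is to list them and query one for the reason it is blocked; a sketch (the PG id 4.d is a placeholder for one of the 19 reported PGs):]

```shell
# List the PGs that are stuck inactive / unclean:
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# Query one of the reported PGs to see why it is incomplete
# and which OSDs it is waiting on (4.d is a placeholder id):
ceph pg 4.d query

# Health detail also names the offending PGs:
ceph health detail
```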
[23:42] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[23:44] * vata (~vata@2607:fad8:4:6:cd9f:cce3:722:7795) Quit (Quit: Leaving.)
[23:44] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[23:45] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[23:45] * __jt__ (~james@rhyolite.bx.mathcs.emory.edu) has joined #ceph
[23:45] <wer> so incomplete pgs say to file a bug, and try to start any osd's that may be missing.... but my one dead osd seems to hit that suicide timeout and abort... and that OSD is not in the crushmap or weighted..... hmm.
[23:48] <wer> actually this is different.... filestore(/var/lib/ceph/osd/ceph-14) error getting collection index for FORREMOVAL_1_12.ff8: (2) No such file or directory
[23:48] <sjustlaptop> that's usually harmless
[23:48] <sjustlaptop> what version are you on?
[23:49] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[23:49] <wer> 0.5501
[23:49] <wer> 0.55-1
[23:51] <sjustlaptop> is that the error that kills the osd?
[23:52] <wer> sjustlaptop: no, before it was hitting a suicide timeout :)
[23:52] <sjustlaptop> ok, is it alive now?
[23:53] <wer> yes. This is my third restart today.... and it appears to be living.... shedding all its disk space.
[23:54] <sjustlaptop> is it "up" yet?
[23:54] <sjustlaptop> for the record, a lot of important bugs were fixed in 56.2 and 56.3, you probably want to upgrade
[23:54] <wer> sjustlaptop: no, it is not "up" yet.
[23:54] <sjustlaptop> ok
[23:55] <sjustlaptop> the suicide timeout part should be fixed in 56.3
[23:57] <wer> Well, hmm. I can upgrade I guess. ok it is "up" and just read its journal....
[23:58] <wer> so, these OSDs are "out". They still show up in the tree with zeros.... I was trying to get this whole node out. And basically it has been trying to work its way back to healthy since yesterday. But it finished... and remains unhealthy.
[23:59] <wer> health HEALTH_WARN 19 pgs incomplete; 19 pgs stuck inactive; 19 pgs stuck unclean
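[For the record, fully removing an "out" OSD so it stops showing up in the tree generally takes the following steps (osd.14 is a placeholder id; the crush removal is what triggers the final data migration, so let the cluster rebalance before the last steps):]

```shell
# Mark the OSD out so data starts migrating off it:
ceph osd out 14

# Once rebalancing is done, stop the daemon, then remove the
# OSD from the CRUSH map, its auth key, and the OSD map:
ceph osd crush remove osd.14
ceph auth del osd.14
ceph osd rm 14
```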

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.