#ceph IRC Log


IRC Log for 2013-04-25

Timestamps are in GMT/BST.

[0:03] * shardul_man (~shardul@174-17-80-182.phnx.qwest.net) has joined #ceph
[0:04] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[0:08] * lofejndif (~lsqavnbok@83TAAAT52.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[0:08] * gmason (~gmason@host-67-59-38-227.host.ussignalcom.net) has joined #ceph
[0:09] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[0:18] <joelio> Not been about for a couple of weeks, come back and there's a new wiki, call for blueprints and loads of good stuff. Nice!
[0:20] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[0:20] <joelio> plus the production kit has arrived now.. the fun begins :)
[0:20] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[0:21] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[0:21] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[0:22] * Q310 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[0:22] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[0:22] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: Connection reset by peer)
[0:22] * flabbergaster (5b09cafb@ircip4.mibbit.com) has joined #ceph
[0:23] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[0:29] * flabbergaster (5b09cafb@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:31] * yehuda_hm (~yehuda@2602:306:330b:1410:c54d:84c4:231a:4ca6) has joined #ceph
[0:31] <joelio> +1 for rgw http standalone
[0:35] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Operation timed out)
[0:35] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[0:35] * ChanServ sets mode +o scuttlemonkey
[0:39] * dontalton (~dwt@128-107-239-234.cisco.com) Quit (Quit: Leaving)
[0:41] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[0:42] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[0:49] * BManojlovic (~steki@fo-d- Quit (Quit: I'm off, and you do whatever you like...)
[0:49] * amichel (~amichel@saint.uits.arizona.edu) Quit ()
[0:53] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[0:53] * gmason (~gmason@host-67-59-38-227.host.ussignalcom.net) Quit (Quit: Computer has gone to sleep.)
[0:54] * BillK (~BillK@58-7-145-25.dyn.iinet.net.au) has joined #ceph
[0:55] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[1:03] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[1:07] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[1:09] * gmason (~gmason@host-67-59-38-227.host.ussignalcom.net) has joined #ceph
[1:10] * gmason (~gmason@host-67-59-38-227.host.ussignalcom.net) Quit ()
[1:10] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[1:11] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[1:12] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[1:18] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[1:19] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[1:31] <mikedawson> gregaf: no problems to report so far with ceph version 0.60-641-gc7a0477 (c7a0477bad6bfbec4ef325295ca0489ec1977926). Thanks for working through the bugs!
[1:31] <gregaf> yay
[1:36] * rustam (~rustam@ has joined #ceph
[1:42] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[1:43] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[1:43] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[1:46] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[1:54] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[1:55] <mikedawson> sagewk: are you guys planning to build packages for Raring prior to Cuttlefish?
[1:55] <mrjack> will 0.60 be a new stable release?
[1:56] <mikedawson> mrjack: 0.60 is not stable. I believe 0.61 will be deemed stable and called Cuttlefish
[1:57] <dmick> mikedawson: I'm only guessing, but my guess would be that looking at raring will come after cuttlefish is in the can
[1:58] <mikedawson> dmick: ok. for reference installing quantal packages on raring *seems* to work
[2:00] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[2:00] * noob2 (~cjh@ has joined #ceph
[2:00] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:00] <jmlowe1> mikedawson: that's good to know, I was going to have a dilemma tomorrow
[2:01] <mikedawson> jmlowe1: did you get your driver issue worked out?
[2:01] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[2:02] <jmlowe1> mikedawson: I think it was irqbalance interacting badly; I removed it and everything was ok
[2:02] <mikedawson> jmlowe1: I also recommend next over 0.60 if you are moving beyond bobtail
[2:03] <jmlowe1> mikedawson: I'm holding out for cuttlefish
[2:04] <mikedawson> jmlowe1: good call, I'm testing my way towards cuttlefish
[2:05] <jmlowe1> mikedawson: thanks for that, makes me feel a lot better when I know people have been kicking the tires on all the intermediate releases
[2:06] <jmlowe1> raring looking good so far?
[2:06] * alram (~alram@ Quit (Quit: leaving)
[2:06] <jmlowe1> I'm looking forward to qemu and libvirt version bumps along with a 3.8 kernel
[2:08] <mikedawson> jmlowe1: had several raring issues over the past month, but all have been fixed. 3.8, qemu, and libvirt have been solid
[2:09] <jmlowe1> I gave it a go with one of the dailies a few weeks ago, too broken to finish the install
[2:10] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[2:11] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[2:19] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[2:20] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[2:21] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[2:26] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[2:27] * rustam (~rustam@ Quit (Remote host closed the connection)
[2:27] * Cube (~Cube@ Quit (Ping timeout: 480 seconds)
[2:28] * rustam (~rustam@ has joined #ceph
[2:29] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[2:29] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:31] * rustam (~rustam@ Quit (Remote host closed the connection)
[2:37] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:38] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[2:46] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[2:47] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[2:57] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[2:58] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) has joined #ceph
[3:01] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[3:04] <jmlowe1> mikedawson: you still around?
[3:04] <mikedawson> jmlowe1: yes
[3:04] <jmlowe1> ever run into this with raring "error: unsupported configuration: unknown driver format value 'rbd'"
[3:06] <mikedawson> jmlowe1: sorry, no
[3:07] <mikedawson> are you using kernel rbd?
[3:07] <jmlowe1> nope, qemu driver
[3:10] <jmlowe1> *grumble*, xml format has changed
[3:15] * lerrie (~Larry@remote.compukos.nl) Quit ()
[3:16] * frank9999 (~frank@kantoor.transip.nl) Quit ()
[3:18] <jmlowe1> now <driver name="qemu" type="raw" cache="writeback"/> was <driver name='qemu' type='rbd' cache='writeback'/>
[3:18] <jmlowe1> also refused migration of a pc-1.2 vm, wouldn't start copying the memory
[3:22] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) has joined #ceph
[3:24] <dmick> jmlowe1: srsly? from ' to "?
[3:25] <dmick> or do you mean rbd -> raw? That doesn't seem right somehow
[3:26] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[3:34] <BillK> I am using .58; is .60 an improvement, or would I be better off staying on .58?
[3:41] * dwt (~dwt@ip68-226-26-79.tc.ph.cox.net) has joined #ceph
[3:46] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Quit: ...)
[3:49] * dragonfly (~Li@ has joined #ceph
[3:49] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Connection reset by peer)
[3:49] * mikedawson_ (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:49] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[3:51] * jmlowe1 (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Write error: connection closed)
[3:54] <jmlowe> dmick: you can't use the type="rbd" you have to use type="raw" with raring's libvirt
[3:54] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[3:54] * mikedawson_ is now known as mikedawson
[3:54] <dmick> but....how does it select rbd?...
[3:55] <jmlowe> protocol="rbd"
[3:55] <dmick> ah
[3:56] <jmlowe> they eliminated the redundant "rbd" but broke compatibility
[3:56] <jmlowe> <driver name="qemu" type="raw"/>
[3:56] <jmlowe> <source protocol="rbd" name="image_name2">
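The libvirt change jmlowe describes can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration, not taken from the log: the `<target>` element, the `vda`/`virtio` values and the `rbd/image_name2` pool/image name are assumptions, and the monitor `<host>` and `<auth>` children a real deployment usually needs are left out. What it shows is the split described above: the driver `type` names the image format (`raw`), while `protocol="rbd"` on `<source>` is what selects the QEMU RBD backend.

```python
import xml.etree.ElementTree as ET


def rbd_disk_xml(image_name: str, target_dev: str = "vda", cache: str = "writeback") -> str:
    """Build a libvirt <disk> element for a QEMU-native (librbd) RBD image.

    Illustrative sketch: <host>/<auth> children are omitted, and the
    target device and bus are assumed values.
    """
    disk = ET.Element("disk", type="network", device="disk")
    # Newer libvirt (as shipped in raring) rejects type="rbd" here; the image
    # format is "raw", and RBD is chosen by the <source protocol="rbd"> element.
    ET.SubElement(disk, "driver", name="qemu", type="raw", cache=cache)
    ET.SubElement(disk, "source", protocol="rbd", name=image_name)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    return ET.tostring(disk, encoding="unicode")


print(rbd_disk_xml("rbd/image_name2"))
```

Generating the element rather than hand-editing it makes it easy to switch every guest definition in one place when libvirt's expectations change again.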
[4:02] * treaki (44be59d4da@p4FF4BAE8.dip0.t-ipconnect.de) has joined #ceph
[4:04] * treaki_ (0ad1dc3bc4@p4FDF7BEF.dip0.t-ipconnect.de) Quit (Read error: Operation timed out)
[4:07] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[4:18] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[4:22] <mikedawson> BillK: I would wait a few days for cuttlefish (0.61). If you must upgrade, the gitbuilder next may be a better choice than 0.60
[4:22] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[4:25] <BillK> mikedawson: thx, will stay with .58 until .61 if it's only a few days.
[4:26] <mikedawson> BillK: sure thing
[4:27] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Remote host closed the connection)
[4:28] * noob2 (~cjh@ Quit (Quit: Leaving.)
[4:29] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:30] * wen (~chatzilla@ has joined #ceph
[4:40] * wen (~chatzilla@ Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[4:51] * dragonfly (~Li@ Quit (Remote host closed the connection)
[4:57] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[4:58] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[5:01] * john_barbee (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[5:01] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[5:07] * dwt (~dwt@ip68-226-26-79.tc.ph.cox.net) Quit (Read error: Connection reset by peer)
[5:59] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:17] * hflai (~hflai@alumni.cs.nctu.edu.tw) Quit (Remote host closed the connection)
[6:18] * hflai (~hflai@alumni.cs.nctu.edu.tw) has joined #ceph
[6:44] * yehuda_hm (~yehuda@2602:306:330b:1410:c54d:84c4:231a:4ca6) Quit (Ping timeout: 480 seconds)
[6:48] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[6:48] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[6:49] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[6:50] * yehuda_hm (~yehuda@2602:306:330b:1410:c54d:84c4:231a:4ca6) has joined #ceph
[6:56] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[6:59] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[6:59] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[7:00] * loicd (~loic@2a01:e35:2eba:db10:f1a5:86bb:839b:2795) has joined #ceph
[7:00] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[7:02] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[7:03] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[7:03] * yehuda_hm (~yehuda@2602:306:330b:1410:c54d:84c4:231a:4ca6) Quit (Ping timeout: 480 seconds)
[7:09] * tkensiski1 (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[7:09] * tkensiski1 (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[7:20] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[7:21] * yehuda_hm (~yehuda@2602:306:330b:1410:c54d:84c4:231a:4ca6) has joined #ceph
[7:23] * rustam (~rustam@ has joined #ceph
[7:24] * rustam (~rustam@ Quit (Remote host closed the connection)
[7:24] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[7:27] * sjusthm1 (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:30] * trond (~trond@trh.betradar.com) Quit (Remote host closed the connection)
[7:30] * trond (~trond@trh.betradar.com) has joined #ceph
[7:30] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[7:31] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[7:41] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[7:42] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[7:43] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[7:45] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[7:51] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[7:53] * loicd (~loic@2a01:e35:2eba:db10:f1a5:86bb:839b:2795) Quit (Quit: Leaving.)
[7:53] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[7:57] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[7:57] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[7:57] * tnt (~tnt@ has joined #ceph
[8:02] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[8:06] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[8:11] * virsibl (~virsibl@ has joined #ceph
[8:25] * mortisha (d4af59a2@ircip2.mibbit.com) has joined #ceph
[8:25] * virsibl (~virsibl@ has left #ceph
[8:25] * Vjarjadian (~IceChat77@ Quit (Quit: Do fish get thirsty?)
[8:26] * mortisha (d4af59a2@ircip2.mibbit.com) Quit ()
[8:30] * shardul_man (~shardul@174-17-80-182.phnx.qwest.net) Quit (Quit: Leaving)
[8:37] * rawsik|2 (~kvirc@ has joined #ceph
[8:46] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[8:50] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[8:55] * LeaChim (~LeaChim@ has joined #ceph
[9:05] * loicd (~loic@ has joined #ceph
[9:11] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[9:11] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[9:16] * john_barbee (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[9:18] * mnash (~chatzilla@vpn.expressionanalysis.com) Quit (Remote host closed the connection)
[9:24] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:30] * BManojlovic (~steki@ has joined #ceph
[9:32] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:37] * frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[9:38] * eschnou (~eschnou@ has joined #ceph
[9:39] * ScOut3R (~ScOut3R@ has joined #ceph
[9:39] * leseb (~Adium@ has joined #ceph
[9:41] * l0nk (~alex@ has joined #ceph
[9:46] * capri (~capri@pd95c3283.dip0.t-ipconnect.de) has joined #ceph
[10:16] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:28] * stacker666 (~stacker66@33.pool85-58-181.dynamic.orange.es) has joined #ceph
[10:41] * tziOm (~bjornar@ has joined #ceph
[10:48] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[10:50] * vo1d (~v0@193-83-55-200.adsl.highway.telekom.at) has joined #ceph
[10:57] * v0id (~v0@212-183-101-130.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[11:07] <Kioob`Taff> is there any performance improvement in the kernel RBD client between Linux 3.6 and Linux 3.8?
[11:07] <Kioob`Taff> (hi)
[11:14] * dxd828 (~dxd828@ Quit (Read error: No route to host)
[11:17] * dxd828 (~dxd828@ has joined #ceph
[12:01] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[12:02] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[12:02] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[12:03] <LeaChim> Hi, I'm having a problem with my crushmap. The output of osd tree just shows the osds, with none of the buckets I've specified in the crush map.
[12:07] * bergerx_ (~bekir@ has joined #ceph
[12:10] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[12:12] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[12:29] * NightDog (~Karl@dhcp-110-216.idi.ntnu.no) has joined #ceph
[12:31] * NightDog (~Karl@dhcp-110-216.idi.ntnu.no) Quit ()
[12:36] * calebamiles (~caleb@c-50-138-218-203.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[12:38] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[12:41] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[12:55] * sleinen (~Adium@2001:620:0:25:31c2:243a:5736:51bb) has joined #ceph
[12:58] <madkiss> hm.
[12:59] * BillK (~BillK@58-7-145-25.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[13:07] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[13:09] * steki (~steki@79-101-172-174.dynamic.isp.telekom.rs) has joined #ceph
[13:10] * pixel (~pixel@ has joined #ceph
[13:13] * BManojlovic (~steki@ Quit (Ping timeout: 480 seconds)
[13:14] * BillK (~BillK@124-149-73-192.dyn.iinet.net.au) has joined #ceph
[13:21] * loicd wonders what a PGPool auid is, in the context of https://github.com/ceph/ceph/blob/master/src/osd/PG.h#L142
[13:39] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[13:40] * treaki (44be59d4da@p4FF4BAE8.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[13:44] <Kioob`Taff> nhm: I reworked my config based on your article http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/, and I was indeed able to reduce and stabilize the latency on my cluster. Thanks a lot!
[13:56] <nhm> Kioob`Taff: excellent! what did you change?
[14:04] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:10] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:14] <Kioob`Taff> nhm: I removed "filestore journal writeahead = true", used your "big_ops" parameters, removed my "journal max write bytes", and fixed the number of threads (disk = 4, op = 8)
[14:14] <Kioob`Taff> (I use XFS and have mainly random small writes)
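The thread-count part of Kioob's change can be written out as a `ceph.conf` fragment. This sketch assumes that "disk = 4, op = 8" refers to the `osd disk threads` and `osd op threads` options; the log does not spell out the option names, and the "big_ops" settings from nhm's article are not reproduced because their exact values are not given here. It uses Python's `configparser` only to render and re-check the section.

```python
import configparser
import io


def osd_thread_section(disk_threads: int = 4, op_threads: int = 8) -> str:
    """Render a ceph.conf [osd] section with the thread counts from the chat.

    Assumed mapping: "disk = 4" -> "osd disk threads", "op = 8" -> "osd op threads".
    """
    cfg = configparser.ConfigParser()
    cfg["osd"] = {
        "osd disk threads": str(disk_threads),
        "osd op threads": str(op_threads),
    }
    # "filestore journal writeahead" and "journal max write bytes" are left
    # unset on purpose, so the OSD uses its built-in defaults for them.
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(osd_thread_section())
```

Leaving removed options out entirely, rather than setting them to some guessed value, matches what Kioob describes: the defaults take over.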
[14:18] <Kioob`Taff> https://daevel.fr/lamp-response-time.png <== on the left, my previous config; on the right, the new config. The huge spike is from restarting all the OSDs (to be sure the new conf was loaded)
[14:19] <Kioob`Taff> the new one is really stable
[14:19] <Kioob`Taff> (and faster)
[14:26] <Kioob`Taff> is any work planned for RBD usage with Xen? (in PV mode)
[14:27] <Kioob`Taff> current performance is far lower than from the host
[14:27] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[14:29] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[14:29] <Kioob`Taff> I see 2 main problems: Xen doesn't expose the device information to the DomU kernel, and Xen splits all writes into 44KB chunks
[14:30] <Kioob`Taff> but it's probably more a Xen bug than a Ceph problem :/
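The 44KB figure is consistent with the classic Xen blkfront request limit, assumed here to be 11 segments of one 4 KiB page each. Under that assumption, the arithmetic below shows how many backend requests a single guest write is split into; the constant and function names are illustrative.

```python
import math

# Assumption: classic Xen blkfront allows 11 segments of one 4 KiB page
# per request, which gives the 44 KiB ceiling mentioned above.
SEGMENTS_PER_REQUEST = 11
PAGE_SIZE = 4096
MAX_REQUEST_BYTES = SEGMENTS_PER_REQUEST * PAGE_SIZE  # 45056 bytes = 44 KiB


def requests_needed(io_bytes: int, max_request: int = MAX_REQUEST_BYTES) -> int:
    """Number of block requests one guest I/O of io_bytes is split into."""
    return math.ceil(io_bytes / max_request)


for size_kib in (4, 64, 1024, 4096):
    print(f"{size_kib:>5} KiB write -> {requests_needed(size_kib * 1024)} request(s)")
```

A 4 MiB write thus becomes 94 separate requests, each of which becomes its own small operation on the RBD side, which helps explain why throughput inside the DomU trails the host.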
[14:32] <nhm> Kioob`Taff: there was some talk on the mailing list about Xen recently
[14:33] <nhm> see the discussion about "Xen blktap driver"
[14:34] <Kioob`Taff> thanks, I'll look at that
[14:35] <Kioob`Taff> the ceph mailing list, or the xen mailing list?
[14:35] <Kioob`Taff> oh, ceph-devel
[14:35] <darkfader> Kioob`Taff: how much better does it get if you do the standard things like using noop scheduler?
[14:36] <Kioob`Taff> darkfader: I was already using deadline
[14:39] * mtk (~mtk@44c35983.test.dnsbl.oftc.net) Quit (Remote host closed the connection)
[14:41] <Kioob`Taff> "you need a working blktap setup"... and of course, blktap is not compiled in Debian
[14:42] <darkfader> debian will by unwritten policy never deliver a fully working domU ;)
[14:42] <darkfader> Kioob`Taff: but try noop some time if you can; it does no request reordering and generally tries fewer "clever" things that don't work out anyway
[14:43] <darkfader> but it can't help with the cut-off io sizes of course
[14:43] <darkfader> just less cutting-off
[14:44] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[14:48] * gaveen (~gaveen@ has joined #ceph
[15:03] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:03] * Vjarjadian (~IceChat77@ has joined #ceph
[15:06] * rawsik|2 (~kvirc@ Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[15:14] * pixel (~pixel@ Quit (Ping timeout: 480 seconds)
[15:17] * jskinner (~jskinner@ has joined #ceph
[15:19] * Havre (~Havre@2a01:e35:8a2c:b230:e8a8:e15:1197:808c) Quit (Ping timeout: 480 seconds)
[15:22] * pixel (~pixel@ has joined #ceph
[15:27] * Havre (~Havre@2a01:e35:8a2c:b230:3579:bf14:69ef:822b) has joined #ceph
[15:31] <jerker> cool: 12 PB raw, 4 PB usable on 3000 HDDs on the mailing list, way to go
[15:31] * doubleg (~doubleg@ Quit (Quit: Lost terminal)
[15:32] <jerker> beats me with 16 TB raw, 5 TB usable on 4 HDDs :-)
[15:33] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[15:37] * pixel (~pixel@ Quit (Quit: I'm leaving you (xchat 2.4.5 or later))
[15:38] * doubleg (~doubleg@ has joined #ceph
[15:50] * treaki (330a8b1619@p4FDF603C.dip0.t-ipconnect.de) has joined #ceph
[15:54] * ggreg (~ggreg@int.0x80.net) has joined #ceph
[15:54] <ggreg> hi all
[15:58] * Vjarjadian (~IceChat77@ Quit (Quit: I cna ytpe 300 wrods pre mniuet!!!)
[16:03] * PerlStalker (~PerlStalk@ has joined #ceph
[16:03] <jefferai> so if Ubuntu 13.04 includes latest Ceph Bobtail, does this mean that Ubuntu 13.04 will be a fully-supported release? Or is 12.04 + Ceph-provided packages still the most recommended solution?
[16:05] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Quit: Leaving)
[16:08] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[16:22] * lofejndif (~lsqavnbok@09GAAB6V8.tor-irc.dnsbl.oftc.net) has joined #ceph
[16:23] <matt_> jefferai, I would say that the 13.04 packages are fine to use until cuttlefish is released
[16:23] <matt_> then you will probably need to use a repository
[16:24] <jefferai> I see
[16:25] <jefferai> matt_: so I have 12.04 machines now, and I'm just trying to figure out whether I should upgrade to 13.04 (and stick with Ceph packages) or stay on 12.04 (and stick with Ceph packages)
[16:25] <jefferai> 13.04 brings a higher likelihood of using btrfs successfully with Ceph, for instance
[16:25] <matt_> I'm upgrading a 12.10 machine to 13.04 right now, I'll let you know how I go
[16:26] <matt_> I'm hoping the 3.8 kernel will fix some BTRFS slowness
[16:28] <jmlowe> if you are using rbd with qemu and libvirt, they changed the xml syntax slightly
[16:29] <matt_> jmlowe, thanks for the heads up. This is just an OSD server so it should be all good
[16:29] <jmlowe> now <driver name="qemu" type="raw"/>, was <driver name="qemu" type="rbd"/>
[16:29] <jmlowe> caused me a little grief last night
[16:30] <matt_> jmlowe, do any of your VM's run Windows 2008r2?
[16:30] <jmlowe> all linux, mostly centos 5 and centos 6
[16:31] <jefferai> jmlowe: yeah, I'll be upgrading my compute hosts to 13.04 to take advantage of qemu 1.4, so I'm trying to decide whether to upgrade the storage nodes at the same time
[16:31] <matt_> ah ok, I found an RTC timer bug a while back when playing with the 3.8 kernel and just wanted to know if you had seen it too, but it doesn't affect linux guests
[16:31] <jmlowe> I wasn't successful at doing a live migration from a vm on quantal to raring
[16:32] <jmlowe> I didn't dig into it too much,
[16:32] <matt_> IIRC, there are a heap of live migration changes that break QEMU 1.4 compatibility with earlier releases
[16:32] <jmlowe> jefferai: if you manage to do it please let me know
[16:33] <jmlowe> !#@$
[16:33] <matt_> I read about it on the proxmox forum, seems to be a well known issue
[16:35] <jmlowe> there is also a new machine type, q35, I couldn't get that to work either
[16:36] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[16:36] <matt_> I haven't tried that one yet
[16:36] <jefferai> jmlowe: I see -- so my interest is that qemu is supposed to fix some slowness problems with I/O
[16:36] <jefferai> and some other changes
[16:36] <jefferai> but also, I want to use the native librados qemu stuff
[16:36] <jefferai> because I've been using ganeti and it does kernel RBD
[16:36] <matt_> jefferai, do you mean IO slowness with Ceph in particular?
[16:37] <jefferai> and that's been annoying and potentially causing issues that I've seen
[16:37] <jefferai> and requires using fairly untested kernels, or using older and known buggy kernels (known buggy w.r.t. RBD)
[16:37] <jefferai> so my thought was to use native qemu/libvirt migration, I don't *really* need a cluster manager for four nodes
[16:38] <jefferai> especially given that ganeti doesn't do automatic failover
[16:38] <jefferai> and in doing so I can use non-kernel RBD, which I have been hearing is a better option when possible
[16:38] * drokita (~drokita@ has joined #ceph
[16:38] <jefferai> so I'm fine testing this out on the compute side, but wondering if I should also upgrade on the storage side
[16:39] <jefferai> matt_: no, something odd -- I posted on ceph-devel but didn't hear back, let me dig it up
[16:39] <matt_> jefferai, just switching to the native RBD driver in qemu should give you a big jump in performance
[16:39] <jefferai> nice
[16:39] <matt_> and there are a heap of changes in cuttlefish that fix the rbd cache and add async IO
[16:39] * dgbaley27 (~matt@mrct45-133-dhcp.resnet.colorado.edu) has joined #ceph
[16:39] <jefferai> matt_: look at the "poor write performance" thread
[16:41] <jefferai> matt_: which kind of cache does it fix? Write-back or write-through?
[16:41] <jefferai> (or both)?
[16:42] <matt_> both; there was a problem with the cache that caused the VM to get high latency under heavy disk IO
[16:42] <matt_> pretty sure it's been fixed for cuttlefish
[16:43] <jefferai> ah
[16:45] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[16:51] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[16:51] <jefferai> matt_: http://www.mail-archive.com/ceph-devel@vger.kernel.org/msg13893.html
[16:51] <jefferai> that's my message, but it ended up killing that part of the thread
[16:51] <jefferai> :-)
[16:53] <matt_> I'll have a look in a sec, just rebooted my server into 13.04
[16:56] <nhm> yes, rbd cache behavior should be much better in cuttlefish
[16:57] <jefferai> ok
[16:57] <nhm> btw guys, I may be seeing some performance regressions with kernel 3.8 vs 3.6.
[16:57] <nhm> doing more tests now.
[16:58] <jmlowe> jefferai: is that virtio-blk or virtio-scsi?
[16:58] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Read error: Operation timed out)
[16:58] <matt_> nhm, of the BTRFS kind or just in general?
[16:59] <nhm> matt_: In general.
[16:59] <jmlowe> nhm: what order of magnitude are we talking about?
[17:00] <jefferai> jmlowe: sadly, I don't remember and can't look now as I'm still cut off from the testbed
[17:00] <nhm> jmlowe: not sure how much of this is due to 3.8. I may be simultaneously having another issue. I'm getting about 50-60% of the performance I was getting 2 months ago.
[17:00] <jefferai> (I am changing from full time employee there to part time hourly, and while that is in place I am not an employee at all)
[17:01] <nhm> jmlowe: though it's primarily write performance that's a problem.
[17:02] <matt_> nhm, I just tested my 4MB write after the upgrade and it appears to be the same as before using 0.60
[17:02] * aliguori (~anthony@ has joined #ceph
[17:02] <jefferai> matt_: how about 4k write?
[17:02] <matt_> Just doing that now
[17:03] <nhm> matt_: yeah, this seems to be a hardware/kernel issue
[17:04] <matt_> hmm... 4kb seems crappy. 1.5MB/s average, which is pretty bad
[17:04] * tziOm (~bjornar@ Quit (Remote host closed the connection)
[17:04] <jefferai> still better than 150kb/s
[17:04] <nhm> average for what?
[17:05] <matt_> 48 osd's over 2 servers, connected via infiniband, ssd journals, replica 3
[17:05] <nhm> 1.5MB/s aggregate throughput for the whole cluster?
[17:06] <matt_> at 4kb and 64 concurrent run from a single server, yes
[17:06] <nhm> :(
[17:06] <nhm> rados bench or something else?
[17:06] <matt_> This was rados bench
[17:07] <nhm> do more concurrent ops help?
[17:07] <matt_> one of my servers is BTRFS and the other is XFS... I'm thinking I might change everything to XFS
[17:07] <jefferai> oh yeah, that was another reason I was thinking of upgrading my storage boxes to 13.04, I'm sick of XFS killing itself every reboot
[17:07] <jefferai> even when cleanly shut down
[17:07] <jefferai> some OSD or another fails to come up and XFS is corrupt :-(
[17:07] <jefferai> I know there were some kernel fixes put in XFS past 3.2...
[17:09] <matt_> nhm, I also have a pool comprised of 48 SSDs over 10 hosts. Rep 3 again. Average is around 10MB/s for 4kb IO
[17:10] <jmlowe> matt_: I would switch to xfs, btrfs has caused me lots and lots of pain
[17:11] <nhm> matt_: writes and reads?
[17:11] <matt_> nhm, just writes. I haven't benched reads yet
[17:12] <matt_> 4MB writes are 450+ MB/s though :D
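matt_'s rados bench figures are easier to compare as operations per second. The sketch below converts them, assuming "MB" in the log means MiB (1 MiB = 1024 KiB) and using the replica count of 3 stated above; filestore journal double-writes are deliberately ignored, so the real disk load is higher still. The function names are illustrative.

```python
def ops_per_second(throughput_mib_s: float, io_size_kib: float) -> float:
    """Convert an aggregate throughput into client operations per second."""
    return throughput_mib_s * 1024 / io_size_kib


def replicated_writes(client_ops: float, replicas: int = 3) -> float:
    """Object writes the OSDs must absorb (journal double-writes ignored)."""
    return client_ops * replicas


cases = [
    ("48 OSDs on 2 hosts, 4 KiB writes", 1.5, 4),
    ("48 SSDs on 10 hosts, 4 KiB writes", 10, 4),
    ("4 MiB writes", 450, 4096),
]
for label, mib_s, size_kib in cases:
    ops = ops_per_second(mib_s, size_kib)
    print(f"{label}: {ops:.1f} client ops/s, {replicated_writes(ops):.1f} replica writes/s")
```

At about 384 client writes/s, the first cluster is doing roughly 1152 replica writes/s across 48 OSDs, i.e. about 24 per OSD per second, whereas the 4 MiB case needs only about 112 large operations/s to reach 450 MB/s.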
[17:12] <nhm> matt_: that's good at least!
[17:12] <nhm> matt_: I'm annoyed. btrfs on this node used to be good for 2GB/s+ and with kernel 3.8 I was hitting 1.1GB/s.
[17:13] <nhm> so now I have to go backtrack with older kernels and older ceph releases.
[17:13] * dxd828 (~dxd828@ Quit (Remote host closed the connection)
[17:13] <matt_> nhm, that's a bit odd. Did you keep your old kernel?
[17:14] <nhm> matt_: yeah, I've got a couple of old ones I'm trying out. I suspect maybe one of my drives is running a bit slower.
[17:14] <matt_> nhm, how are you benching reads? rados bench doesn't appear to do it
[17:15] <nhm> matt_: you have to do a write run with the --no-cleanup flag and then instead of write, use seq for the 2nd test.
[17:16] <jefferai> jmlowe: I thought in recent Ceph that ext4 was actually not a bad option these days
[17:16] <matt_> nhm, ah ok. That sounds a little too much work for tonight... I should probably get back to studying for my Google interview :/
[17:16] <nhm> I also do a sync and echo 3 | sudo tee /proc/sys/vm/drop_caches before the read test on all the nodes.
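nhm's read-benchmark recipe (a write pass with `--no-cleanup`, a cache drop, then a `seq` pass over the same objects) can be captured as a command plan. This sketch only builds and prints the commands; the pool name `bench-pool`, the 60-second duration, the 4 KiB block size and 64 concurrent ops are assumptions chosen to mirror matt_'s test, and the cache-drop step has to be run on every node, not just the client.

```python
import shlex


def rados_bench_plan(pool: str, seconds: int = 60, block_bytes: int = 4096,
                     concurrent: int = 64) -> list[list[str]]:
    """Command sequence for a write-then-sequential-read rados bench run."""
    # Keep the benchmark objects so the seq pass has something to read.
    write = ["rados", "-p", pool, "bench", str(seconds), "write",
             "-b", str(block_bytes), "-t", str(concurrent), "--no-cleanup"]
    # Run on every node before the read pass so reads hit disk, not page cache.
    drop_caches = ["sh", "-c", "sync && echo 3 | sudo tee /proc/sys/vm/drop_caches"]
    read = ["rados", "-p", pool, "bench", str(seconds), "seq",
            "-t", str(concurrent)]
    return [write, drop_caches, read]


for cmd in rados_bench_plan("bench-pool"):
    print(shlex.join(cmd))
```

The objects left behind by `--no-cleanup` occupy space in the pool, so remove them (or the pool) after testing.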
[17:16] <nhm> matt_: ah, good luck!
[17:17] <matt_> nhm, thanks! I'd happily trade some luck for a computer science degree right now though!
[17:18] <nhm> matt_: what kind of position are you interviewing for?
[17:18] <matt_> I think it's a software engineer position they had in mind for me. They tracked me down via linked-in so I didn't really apply for a certain one
[17:19] <jefferai> matt_: ah, cool -- most people that get jobs there actually get it from someone inside knowing them
[17:19] <jefferai> so if someone tracked you down on LinkedIn then you're partway in the door already
[17:20] <matt_> I'm hoping so
[17:20] <wido> nhm: I tried with the wip aio branch, that seems rather nice
[17:21] <wido> I'm thinking about upgrading to 0.61 right now to have all the fixes
[17:21] <wido> I'm just still trying to figure out why the Qemu instance isn't "snappy", for example a simple "df -h" took 5 seconds just now
[17:22] <nhm> wido: does turning off rbd cache help?
[17:22] <wido> nhm: I'll give that a try
[17:23] <wido> nhm: Observed any read issues with rbd cache on?
[17:23] * gmason (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[17:23] <nhm> wido: reads were fine in all of my tests, but I didn't really use the VMs interactively.
[17:24] <wido> nhm: Reads are fine in benchmarks, well, could be a bit better, but the VM isn't snappy
[17:24] <wido> seems like sometimes it's waiting
[17:24] <nhm> wido: We've definitely seen reports like that. We're hoping that Josh's patches from wip-aio fixed it.
[17:25] <matt_> wido, if you ping the VM whilst you're running commands do the ping times increase?
[17:25] <nhm> Are you still seeing it on 0.60?
[17:25] <wido> nhm: Have to try 0.60, but I'm running the wip-aio branch already on the client, with the Qemu fixes
[17:25] <wido> matt_: No, not seeing #3737
[17:28] <wido> nhm: I'll try the next branch to see what that does
[17:31] <nhm> wido: ok, cool
[17:31] <wido> nhm: What I do observe for example during a read is that the client isn't reading constantly
[17:31] <wido> so bwm-ng shows me a peak of 80MB/sec, nothing for 2 seconds and suddenly 80MB/sec again
[17:31] <wido> And in the end the VM reads with 40MB/sec with a simple dd read
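One way to read those numbers is as a duty cycle: if the client only moves data at the 80MB/sec peak part of the time, the average is the peak scaled by the fraction of time spent transferring. The sketch assumes, since the log does not say, that each burst lasts about as long as the 2-second gap.

```python
def average_throughput(peak_mb_s, burst_s, idle_s):
    """Average MB/s when transfers alternate between bursts at peak rate and idle gaps."""
    duty_cycle = burst_s / (burst_s + idle_s)
    return peak_mb_s * duty_cycle


# 80MB/sec bursts with 2 s idle gaps; the 2 s burst length is an assumption.
print(average_throughput(80, 2, 2))  # 40.0
```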
[17:32] <nhm> wido: how big are the reads?
[17:32] <wido> nhm: 1M reads
[17:32] <nhm> wido: does increasing read_ahead_kb on the OSDs help at all?
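On Linux the per-device read-ahead lives in `/sys/block/<dev>/queue/read_ahead_kb`. A minimal sketch for setting it on the OSD data disks follows; the device names and the 4096 KB value are hypothetical, and `root` is a parameter only so the function can be exercised against a scratch directory instead of the real sysfs. Writing the real file needs root privileges, and the value does not survive a reboot.

```python
import os


def set_read_ahead_kb(devices, kb, root="/"):
    """Write `kb` into queue/read_ahead_kb for each block device name in `devices`."""
    for dev in devices:
        path = os.path.join(root, "sys", "block", dev, "queue", "read_ahead_kb")
        with open(path, "w") as f:
            f.write(f"{kb}\n")


# Hypothetical OSD data disks; 4096 KB is an example value, not a recommendation.
# set_read_ahead_kb(["sdb", "sdc", "sdd"], 4096)
```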
[17:33] <nhm> also, on one cluster I was working on, tcp autotuning was causing all kinds of problems, but strangely only for reads.
[17:33] <nhm> It may not do anything, but you could try disabling it.
[17:35] <wido> nhm: I'll disable it on the cluster, just see what it does
[17:35] <nhm> make sure to do it on the clients and servers
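The log does not say exactly which knob wido changed. On Linux, TCP receive-buffer autotuning is controlled by the `net.ipv4.tcp_moderate_rcvbuf` sysctl (`/proc/sys/net/ipv4/tcp_moderate_rcvbuf`, 1 = on, 0 = off), so a sketch assuming that is the setting meant looks like the one below. Per nhm, it must be applied on both clients and servers. `root` exists only so the function can be tried against a scratch directory, and the change does not persist across reboots.

```python
import os

# Path of the autotuning switch, relative to the filesystem root.
SYSCTL = os.path.join("proc", "sys", "net", "ipv4", "tcp_moderate_rcvbuf")


def set_tcp_rcvbuf_autotuning(enabled, root="/"):
    """Turn Linux TCP receive-buffer autotuning on (1) or off (0); return the old value."""
    path = os.path.join(root, SYSCTL)
    with open(path) as f:
        previous = f.read().strip()
    with open(path, "w") as f:
        f.write("1\n" if enabled else "0\n")
    return previous


# Run on every client AND every OSD host (needs root):
# set_tcp_rcvbuf_autotuning(False)
```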
[17:35] * dxd828 (~dxd828@ has joined #ceph
[17:36] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[17:37] <wido> nhm: That indeed changed a lot. Saw a 50% increase. Went from 41MB/sec to about 69MB/sec
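For reference, the relative gain from 41MB/sec to 69MB/sec works out to roughly 68%, somewhat more than the 50% quoted:

```python
def percent_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


print(round(percent_increase(41, 69)))  # 68
```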
[17:38] <wido> No more peaks in bandwidth, but now a sustained throughput
[17:38] * loicd (~loic@magenta.dachary.org) has joined #ceph
[17:38] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:39] <nhm> wido: the tcp autotuning or read_ahead_kb setting?
[17:39] <wido> nhm: the tcp autotuning
[17:39] <nhm> wido: ok, that's very good to know. We put a patch in recent versions of ceph to make the buffer size configurable that should theoretically fix it too.
[17:39] <nhm> Did you just turn it off in proc?
[17:43] <stacker666> hi there!
[17:43] <stacker666> the kernels of http://gitbuilder.ceph.com/ are stable?
[17:43] <nhm> stacker666: yes, though I'm investigating a potential performance regression with the 3.8 kernel.
[17:44] <stacker666> I'm trying one of the 3.8 kernels and a kernel panic appears when I create a 1T file
[17:44] <nhm> stacker666: In the middle of it right now actually.
[17:44] <stacker666> :S
[17:44] <nhm> stacker666: with cephfs?
[17:44] <stacker666> yes
[17:44] <nhm> stacker666: does it happen with a different kernel?
[17:44] * aliguori_ (~anthony@ has joined #ceph
[17:44] <stacker666> im trying now
[17:45] <stacker666> to do the same with a normal kernel
[17:45] <nhm> stacker666: You might want to file a bug in the tracker with the stacktrace
[17:45] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[17:45] <nhm> stacker666: unfortunately cephfs is lower priority right now since we've got a lot of paying customers that want work on RBD/RGW.
[17:47] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[17:48] <stacker666> oh
[17:49] <stacker666> nhm: better with rbd map?
[17:50] <Gugge-47527> stacker666: they're two different use cases; rbd is considered stable, though
[17:50] <stacker666> nhm: ooook thx
[17:51] <nhm> stacker666: rbd is the block device layer, so only 1 node can mount a volume at once.
[17:51] <nhm> stacker666: it's simpler but considered stable at this point.
[17:52] <wido> nhm: So disabling autotuning and enabling the RBD cache again with wip-aio seems to make the VM a lot snappier
[17:53] <wido> I haven't run any benchmarks yet, but it makes a big difference just working with the VM
[17:53] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[17:53] <nhm> wido: Jim Schutt had a big long email thread last year about this. Basically all kinds of tcp retransmits get sent causing all sorts of delays.
[17:54] <wido> nhm: I'll look that one up
[17:57] <stacker666> nhm: I have exported it using iscsitarget and it works fine with different images
[17:57] <stacker666> nhm: at the same time
[17:58] <nhm> stacker666: if you have a normal filesystem on a block device and try to mount it on multiple clients it will lead to trouble.
[18:06] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[18:08] * tnt (~tnt@212-166-48-236.win.be) Quit (Read error: Operation timed out)
[18:10] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[18:11] <stacker666> nhm: im trying to use it with several ESXi as a datastore
[18:13] * paravoid_ is now known as paravoid
[18:14] * steki (~steki@79-101-172-174.dynamic.isp.telekom.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[18:17] * sleinen (~Adium@2001:620:0:25:31c2:243a:5736:51bb) Quit (Quit: Leaving.)
[18:17] * sleinen (~Adium@ has joined #ceph
[18:17] <stacker666> nhm: with a low latency kernel in ubuntu repositories it seems that works without a problem (no kernel panic)
[18:18] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[18:21] * alram (~alram@ has joined #ceph
[18:25] * gaveen (~gaveen@ has joined #ceph
[18:25] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[18:26] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[18:31] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:33] * Vjarjadian (~IceChat77@ has joined #ceph
[18:33] <imjustmatthew> greaf: around?
[18:34] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[18:38] <nhm> stacker666: good to know. FWIW, I just found that our 3.8 kernel also is dramatically slowing down throughput relative to a kernel-ppa 3.8 raring kernel.
[18:39] <nhm> stacker666: I suspect there is some debugging enabled that is causing issues.
[18:39] * diegows (~diegows@ has joined #ceph
[18:39] * tnt (~tnt@ has joined #ceph
[18:40] <gregaf> imjustmatthew: perhaps you meant gregaf? :p
[18:40] <imjustmatthew> gregaf: most def :)
[18:40] <imjustmatthew> did you per chance find out which gitbuilder builds have tcmalloc?
[18:41] <gregaf> I think dmick looked at it, the only thing I heard was "hmm, that should have had it wtf"
[18:44] <nhm> gregaf: missing tcmalloc?
[18:45] * l0nk (~alex@ Quit (Quit: Leaving.)
[18:45] <gregaf> yeah
[18:45] <imjustmatthew> okay. Is there a build somewhere that you know I can use to track down this memory issue while it's still happening?
[18:46] <imjustmatthew> It's also weird bc it seems to be associated with unusually high CPU usage by the mons
[18:47] * portante` (~user@ has joined #ceph
[18:48] * portante` (~user@ Quit (Remote host closed the connection)
[18:49] * portante` (~user@ has joined #ceph
[18:56] * tkensiski (~tkensiski@ has joined #ceph
[18:56] * tkensiski (~tkensiski@ has left #ceph
[18:58] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[18:59] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[19:05] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:06] * davidzlap (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit ()
[19:10] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:10] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[19:11] * BillK (~BillK@124-149-73-192.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[19:13] * BillK (~BillK@124-149-73-192.dyn.iinet.net.au) has joined #ceph
[19:17] * noob2 (~cjh@ has joined #ceph
[19:18] * Vjarjadian (~IceChat77@ Quit (Quit: OUCH!!!)
[19:19] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[19:22] * dwt (~dwt@128-107-239-234.cisco.com) has joined #ceph
[19:23] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[19:24] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:36] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[19:38] * lofejndif (~lsqavnbok@09GAAB6V8.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[19:41] * mib_tpw2p2 (5b09cd2c@ircip4.mibbit.com) has joined #ceph
[19:55] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[19:59] <mikedawson> gregaf: ceph-mon.a appears to have died at 11:11 utc this morning. No core dump. Nothing in the logs past that point. ceph version 0.60-641-gc7a0477 (c7a0477bad6bfbec4ef325295ca0489ec1977926). Any idea how that happens?
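The version string `0.60-641-gc7a0477` follows the `git describe` convention: the nearest tag (`0.60`), the number of commits on top of it (`641`), and `g` plus an abbreviated commit hash. In other words this is a development build 641 commits past 0.60, not the 0.60 release itself. A small parser for that shape:

```python
import re

# tag, commits since tag, "g" + abbreviated sha, as produced by `git describe`
DESCRIBE = re.compile(r"^(?P<tag>[\d.]+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(version):
    """Split a `git describe`-style string into (tag, commits since tag, short sha)."""
    m = DESCRIBE.match(version)
    if m is None:
        return (version, 0, None)  # a plain release string such as "0.60"
    return (m["tag"], int(m["commits"]), m["sha"])


print(parse_describe("0.60-641-gc7a0477"))  # ('0.60', 641, 'c7a0477')
```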
[20:00] <gregaf> mikedawson: dmesg show it getting killed? is the process actually gone or just the log stopped?
[20:02] * Cube (~Cube@ has joined #ceph
[20:03] <mikedawson> gregaf: can't find anything in dmesg about ceph. Process is gone and logging shows nothing indicating it went away
[20:03] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Remote host closed the connection)
[20:03] <gregaf> I dunno what to tell you then; you sure nobody came in and turned it off?
[20:03] * portante` (~user@ Quit (Ping timeout: 480 seconds)
[20:04] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[20:10] * rustam (~rustam@ has joined #ceph
[20:10] <mikedawson> gregaf: that was my thought too, but the others have alibis (and logs confirm them)
[20:11] <gregaf> I've got no magic introspection that you don't, sorry :(
[20:11] <gregaf> I guess you could check the other monitor logs and see if they've got something at that time other than "oh, it disappeared"
[20:11] <mikedawson> gregaf: bigger problem is mon.a hasn't rejoined a happy quorum
[20:11] * rustam (~rustam@ Quit (Remote host closed the connection)
[20:13] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:13] * noob2 (~cjh@ Quit (Quit: Leaving.)
[20:19] <mikedawson> gregaf: with logging turned up: http://pastebin.com/raw.php?i=AkbNA5PP
[20:19] * dwt (~dwt@128-107-239-234.cisco.com) Quit (Read error: Connection reset by peer)
[20:20] * BillK (~BillK@124-149-73-192.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[20:21] * barryo1 (~barry@host86-146-83-151.range86-146.btcentralplus.com) has joined #ceph
[20:24] <gregaf> I've got to work on something else right now, but thanks for the log
[20:24] <gregaf> it looks like it's got messenger but not monitor though?
[20:24] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[20:25] <mikedawson> gregaf: sure. would you like this to be entered as a bug? on the other question... mon.b and mon.c had an election right after mon.a went away. That seems like a reasonable response
[20:26] <gregaf> oh, sure — hoping it's a dupe but dunno
[20:28] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:31] <imjustmatthew> Quick check before I report it, does this mon crash look like a duplicate? http://pastebin.com/K7q71xPL
[20:32] <gregaf> imjustmatthew: not precisely, but we should flag sagewk about it as that could be related to a lost message bug he's working on
[20:32] * LeaChim (~LeaChim@ has joined #ceph
[20:34] <imjustmatthew> gregaf: #4810?
[20:34] <gregaf> don't remember, I just want sagewk to see it
[20:42] * DLange (~DLange@dlange.user.oftc.net) Quit (Server closed connection)
[20:43] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[20:43] * janisg (~troll@ Quit (Server closed connection)
[20:43] * lurbs (user@uber.geek.nz) Quit (Server closed connection)
[20:43] * lurbs (user@uber.geek.nz) has joined #ceph
[20:43] * sleinen (~Adium@2001:620:0:25:1424:ea6b:a127:4a70) has joined #ceph
[20:43] * janisg (~troll@ has joined #ceph
[20:43] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[20:44] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[20:44] * Tribaal (uid3081@id-3081.hillingdon.irccloud.com) Quit (Server closed connection)
[20:44] * scalability-junk (uid6422@id-6422.tooting.irccloud.com) Quit (Server closed connection)
[20:44] * scalability-junk (uid6422@tooting.irccloud.com) has joined #ceph
[20:44] * Tribaal (uid3081@hillingdon.irccloud.com) has joined #ceph
[20:44] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Server closed connection)
[20:45] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[20:51] * dwt (~dwt@wsip-70-166-104-226.ph.ph.cox.net) has joined #ceph
[20:52] * dontalton (~dwt@128-107-239-235.cisco.com) has joined #ceph
[20:54] <mikedawson> imjustmatthew: I think I'm suffering the mon tcmalloc issue with ceph version 0.60-641-gc7a0477 (c7a0477bad6bfbec4ef325295ca0489ec1977926). High mem + cpu
[20:56] * BManojlovic (~steki@fo-d- has joined #ceph
[20:56] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[20:57] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[20:59] * eschnou (~eschnou@131.165-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:59] * dwt (~dwt@wsip-70-166-104-226.ph.ph.cox.net) Quit (Ping timeout: 480 seconds)
[21:00] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) Quit (Server closed connection)
[21:00] * mynameisbruce (~mynameisb@tjure.netzquadrat.de) has joined #ceph
[21:02] * madkiss1 (~madkiss@2001:6f8:12c3:f00f:506:58ff:da81:533b) has joined #ceph
[21:03] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[21:03] * mikedawson_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[21:03] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[21:04] * b1tbkt_ (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) has joined #ceph
[21:05] * mistur_ (~yoann@kewl.mistur.org) has joined #ceph
[21:05] * trond_ (~trond@trh.betradar.com) has joined #ceph
[21:05] * dosaboy_ (~dosaboy@host86-161-164-218.range86-161.btcentralplus.com) has joined #ceph
[21:05] * ggreg_ (~ggreg@int.0x80.net) has joined #ceph
[21:06] * liiwi_ (liiwi@idle.fi) has joined #ceph
[21:06] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * ggreg (~ggreg@int.0x80.net) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * jbd_ (~jbd_@34322hpv162162.ikoula.com) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * trond (~trond@trh.betradar.com) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * Q310 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * dosaboy (~dosaboy@host86-161-164-218.range86-161.btcentralplus.com) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * madkiss (~madkiss@2001:6f8:12c3:f00f:75ae:96dd:448f:746b) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * mrjack (mrjack@office.smart-weblications.net) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * houkouonchi-work (~linux@ Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * dmick (~dmick@2607:f298:a:607:a034:ace:685b:f0b2) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * mistur (~yoann@kewl.mistur.org) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * liiwi (liiwi@idle.fi) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * jamespage (~jamespage@culvain.gromper.net) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * sileht (~sileht@sileht.net) Quit (reticulum.oftc.net magnet.oftc.net)
[21:06] * mikedawson_ is now known as mikedawson
[21:08] * jamespage (~jamespage@culvain.gromper.net) has joined #ceph
[21:09] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[21:11] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[21:12] * houkouonchi-work (~linux@ has joined #ceph
[21:12] * dmick (~dmick@2607:f298:a:607:b872:b2ac:376e:1053) has joined #ceph
[21:13] * MarkN (~nathan@ has joined #ceph
[21:14] * sileht (~sileht@sileht.net) has joined #ceph
[21:15] * jmlowe (~Adium@2001:18e8:2:28cf:f000::3ab8) has joined #ceph
[21:18] <benner> what's the purpose of the standard ceph osd pools? I suppose rbd is for rbd. How about metadata and data? Are they for cephfs?
[21:19] <tnt> yes
[21:22] <benner> so is it safe to delete them if I'm not using cephfs?
[21:23] * noob2 (~cjh@ has joined #ceph
[21:24] <tnt> AFAIK
[21:24] * leseb (~Adium@ Quit (Quit: Leaving.)
[21:24] * vata (~vata@2607:fad8:4:6:6d6f:921c:d389:bb97) has joined #ceph
[21:25] <benner> ok
[21:27] <imjustmatthew> sagewk: Issue is reported as #4816
[21:29] <imjustmatthew> mikedawson: Interesting, is the CPU load related directly to one of your bugs? Also, we're showing great talent at breaking mons :)
[21:29] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[21:31] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[21:32] <mikedawson> imjustmatthew: no, CPU doesn't appear to be directly related to bugs, but I did have oom-killer kill off a ceph-mon process, and it currently will not rejoin quorum
[21:33] <mikedawson> imjustmatthew: http://tracker.ceph.com/issues/4815
[21:33] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[21:34] <Elbandi_> I'm trying to disable readahead on a cephfs mount (rsize=0,rasize=0), but it still transfers more bytes than it should
[21:34] <Elbandi_> http://pastebin.com/zS1fzJwd
[21:34] <Elbandi_> aio_read 0~4
[21:35] <Elbandi_> 4 bytes from the beginning
[21:35] <Elbandi_> start_read 0~16384
[21:35] <Elbandi_> :(
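The `aio_read 0~4` and `start_read 0~16384` lines use Ceph's `offset~length` extent notation: the application asked for 4 bytes at offset 0, while the log shows a 16384-byte read being started, 4096 times as much. A small sketch of that arithmetic:

```python
def parse_extent(extent):
    """Parse an 'offset~length' extent string into (offset, length)."""
    offset, length = extent.split("~")
    return int(offset), int(length)


requested = parse_extent("0~4")      # from the aio_read line
fetched = parse_extent("0~16384")    # from the start_read line
print(fetched[1] // requested[1])    # 4096
```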
[21:39] * MarkN (~nathan@ Quit (Ping timeout: 480 seconds)
[21:43] <sagewk> mikedawson: is there any output in the mon log while it is eating up ram like this?
[21:43] * kylehutson (~kylehutso@dhcp231-11.user.cis.ksu.edu) has joined #ceph
[21:44] <gregaf> are the monitors running with tcmalloc or has something gone horribly wrong with them?
[21:45] <mikedawson> sagewk: Mon log is attached at http://tracker.ceph.com/attachments/download/797/ceph-mon.a.log
[21:45] <mikedawson> gregaf: how do I check?
[21:45] <gregaf> try getting heap stats out of them
[21:45] <gregaf> "ceph -m <mon-ip> heap stats", I think?
[21:45] <gregaf> while running ceph -w in another window
[21:47] <kylehutson> I recently expanded my ceph cluster and ended up with the problem mentioned at http://www.spinics.net/lists/ceph-devel/msg08361.html , but when I run "ceph osd crush tunables bobtail" (per the documentation), I get "unknown command crush"
[21:47] <kylehutson> What's the proper way to implement tunables now?
[21:47] <mikedawson> gregaf: the mon that is borked doesn't respond. The others say "tcmalloc not enabled, can't use heap profiler commands"
[21:48] <gregaf> dammit dammit dammit what is going on here
[21:48] <gregaf> what repository are you pulling from mikedawson?
[21:49] <mikedawson> gregaf: I am on a raring nightly installing from deb http://gitbuilder.ceph.com/ceph-deb-quantal-x86_64-basic/ref/next quantal main
[21:49] <mikedawson> perhaps the raring/quantal mismatch is the issue
[21:52] <gregaf> no, it wouldn't be failing politely if that were the issue; it's somehow being built with tcmalloc disabled
[21:52] * glowell1 (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[21:52] <gregaf> unless I'm very much misremembering what's built versus linked
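A quick way to tell whether a daemon binary is dynamically linked against tcmalloc is to look for `libtcmalloc` in its `ldd` output (for example `ldd /usr/bin/ceph-mon`). A minimal sketch of that check follows; the default binary path and the sample output below are illustrative assumptions, not taken from mikedawson's system.

```python
import subprocess


def links_tcmalloc(ldd_output):
    """True if any shared library listed in `ldd` output is a tcmalloc library."""
    for line in ldd_output.splitlines():
        name = line.strip().split(" ", 1)[0]
        if name.startswith("libtcmalloc"):
            return True
    return False


def check_binary(path="/usr/bin/ceph-mon"):
    """Run ldd on a binary (hypothetical default path) and report tcmalloc linkage."""
    out = subprocess.run(["ldd", path], capture_output=True, text=True, check=True).stdout
    return links_tcmalloc(out)


# Illustrative ldd output, not captured from a real host.
sample = (
    "\tlibtcmalloc.so.4 => /usr/lib/libtcmalloc.so.4 (0x...)\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)\n"
)
print(links_tcmalloc(sample))  # True
```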
[21:54] * liiwi_ is now known as liiwi
[21:55] <mikedawson> gregaf: ok. I'll happily test a new build whenever you need me to
[21:57] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:01] * dontalton (~dwt@128-107-239-235.cisco.com) Quit (Ping timeout: 480 seconds)
[22:03] <mikedawson> gregaf: in case it matters, ceph-osd looks much more normal " 8044 root 20 0 9392m 613m 5848 S 2 1.3 29:20.35 ceph-osd"
[22:05] <mikedawson> actually that probably seems high, too
[22:05] <gregaf> mikedawson: hrm, can you do "ceph osd tell 0 heap stats" and see what that output is?
[22:06] <mikedawson> gregaf: it just returns "ok"
[22:06] <gregaf> what's ceph -w show?
[22:06] <gregaf> I want to see if it generates the heap stats output :)
[22:07] <mikedawson> gregaf: http://pastebin.com/raw.php?i=UWn9Hnqc
[22:08] <mikedawson> that covers the time I did ceph osd tell 0 heap stats
[22:08] <gregaf> okay
[22:08] <gregaf> so not enabled
[22:08] <gregaf> (the tell command has some pretty stupid routing, so the monitor is returning the "ok" without the OSD having said any such thing)
[22:08] <mikedawson> gotcha
[22:09] * dwt (~dwt@128-107-239-233.cisco.com) has joined #ceph
[22:12] * BManojlovic (~steki@fo-d- Quit (Read error: Operation timed out)
[22:12] <gregaf> well, I made a ticket (http://tracker.ceph.com/issues/4818) and that certainly explains why it's using so much memory
[22:18] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:18] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:19] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:20] <mikedawson> gregaf: is that bug hidden or something? I get a 403 "You are not authorized to access this page."
[22:20] <gregaf> hrm, it's in our sepia project which often concerns internals, so maybe the project is
[22:20] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:21] <mikedawson> ok
[22:21] <gregaf> ah, yep
[22:21] <gregaf> sorry
[22:21] <gregaf> didn't realize that
[22:30] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[22:34] <barryo1> does anyone have any experience using Dell's "value" SSD drives as journals?
[22:35] <barryo1> They're really cheap so I'm quite wary of using them
[22:36] * eschnou (~eschnou@131.165-201-80.adsl-dyn.isp.belgacom.be) Quit (Quit: Leaving)
[22:37] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[22:38] <nhm> barryo1: I don't remember for sure, but I think those things are pretty slow.
[22:38] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has left #ceph
[22:38] <nhm> barryo1: I think I remember seeing 120MB/s for the small capacity ones.
[22:38] <barryo1> it's between that and nearline SAS
[22:40] <barryo1> I remember reading that with a decent controller you didn't need to worry about separate journals; is a Dell H700 with 1GB NV Cache decent enough?
[22:41] <nhm> barryo1: For dell nodes, it may be better to just stick with SAS disks, throw the journals on the disks.
[22:41] <nhm> barryo1: I've had trouble getting good performance out of our R515s, but I think the R720xds performed a bit better. Sadly we don't have any in-house.
[22:42] <barryo1> It's 515's I'm looking at buying
[22:42] * Vjarjadian (~IceChat77@ has joined #ceph
[22:42] <nhm> barryo1: They work well enough for cheap bulk storage but they aren't speed demons.
[22:44] <barryo1> it'll mostly be used to host low I/O VMs so that should be ok
[22:47] <nhm> barryo1: ours have 8 disks in them and get about 300MB/s to the drives.
[22:47] <nhm> It's possible with some additional tuning we could get that up a bit.
[22:48] <barryo1> is that with journals on the osds?
[22:48] <nhm> yeah
[22:48] <nhm> and only 7 drives for OSDs
[22:48] <barryo1> nearline or real SAS?
[22:48] <nhm> nearline I think
[22:49] <nhm> it's been a while since I looked at them.
[22:49] <barryo1> that's not bad at all
[22:49] <nhm> barryo1: that's under very ideal testing.
[22:49] * sleinen (~Adium@2001:620:0:25:1424:ea6b:a127:4a70) Quit (Quit: Leaving.)
[22:50] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[22:50] <nhm> barryo1: basically 1 node, 7 OSDs, no replication.
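A rough per-disk view of those numbers follows. It assumes the ~300MB/s figure is client write throughput spread evenly over the 7 OSD disks, and that with the journal on the same disk every byte is written twice (journal, then filestore). Both are assumptions, since the log only says "about 300MB/s to the drives". Replication multiplies the disk writes by the replica count, which is one reason real-world numbers come in lower than this ideal test.

```python
def per_disk_write_load(client_mb_s, osd_disks, replicas=1, colocated_journal=True):
    """Approximate MB/s each OSD disk must absorb for a given client write rate."""
    # Each client byte lands on `replicas` OSDs; a co-located journal doubles it again.
    writes_per_byte = replicas * (2 if colocated_journal else 1)
    return client_mb_s * writes_per_byte / osd_disks


# nhm's R515 test: 1 node, 7 OSDs, no replication, journals on the OSD disks.
print(round(per_disk_write_load(300, 7), 1))  # 85.7
```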
[22:53] * jmlowe (~Adium@2001:18e8:2:28cf:f000::3ab8) Quit (Quit: Leaving.)
[22:55] * sleinen1 (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[22:58] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[22:58] * sleinen (~Adium@2001:620:0:26:4dec:c931:8746:c018) has joined #ceph
[23:03] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[23:03] <athrift> The R720XD's perform quite a bit better than the R51X series
[23:03] * sleinen1 (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:04] <athrift> better SAS routing, and 12 drives
[23:04] <athrift> also you get the two rear 2.5" for SSD
[23:08] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[23:10] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:10] * rustam (~rustam@ has joined #ceph
[23:11] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[23:12] <sjusthm> sagewk: having now actually looked at it, your scenario is probably correct for 3904
[23:12] <sjusthm> just pushed wip_3904
[23:15] <barryo1> athrift: sadly, the 720s won't meet my budget
[23:17] <athrift> barryo1: we managed to get ours down to around $7200 USD
[23:17] <paravoid> sjusthm: enjoying my bugs? :)
[23:17] <sjusthm> paravoid: oh, certainly
[23:17] <sjusthm> oh, you were 3904 as well
[23:17] <sjusthm> heh
[23:17] <athrift> with 12x 3TB NL-SAS, H310, x520 NDC, 1x Xeon 2670, 32GB ram
[23:18] <paravoid> yeah :)
[23:18] <athrift> with 2TB drives it was about $1000 less
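Working athrift's quote through as raw cost per terabyte, treating "about $1000 less" as exactly $6200 and ignoring replication (which divides usable capacity by the replica count):

```python
def cost_per_raw_tb(price_usd, drives, tb_per_drive):
    """Node price divided by total raw (pre-replication) capacity in TB."""
    return price_usd / (drives * tb_per_drive)


print(cost_per_raw_tb(7200, 12, 3))            # 200.0  (12x 3TB)
print(round(cost_per_raw_tb(6200, 12, 2), 2))  # 258.33 (12x 2TB)
```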
[23:18] <paravoid> athrift: STAY AWAY FROM THE H310
[23:18] <paravoid> seriously
[23:18] <athrift> paravoid: why is that, we have had no issues with them
[23:19] <nhm> paravoid: isn't it just a SAS2008?
[23:19] <paravoid> no
[23:19] <paravoid> it's a piece of shit
[23:19] <athrift> it is a SAS2008
[23:19] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Quit: noahmehl)
[23:19] <paravoid> it is a SAS2008, it isn't "just" that
[23:19] <paravoid> sec.
[23:19] <athrift> you can reflash them with LSI firmware quite easily, but we have never needed to they work well for us
[23:20] <nhm> athrift: I'd stick with the H700 if you aren't using fast SSDs though.
[23:20] <nhm> athrift: actually, H710 is probably better.
[23:21] <nhm> athrift: not that it probably matters that much in the R515.
[23:21] <paravoid> problem no1 is http://en.community.dell.com/support-forums/servers/f/906/t/19480834.aspx
[23:21] <athrift> nhm: Why is the H710 better ? We only use R720XD's
[23:21] <paravoid> problem no2 is that reads on disk A block writes on disk B and vice versa
[23:22] <paravoid> try it
[23:22] <nhm> athrift: I think the H710 is a SAS2208 instead of a SAS2108.
[23:22] <paravoid> try writing sequentially to a random disk and reading from an entirely different disk
[23:22] <paravoid> you'll get reads ranging in the kilobytes per second
[23:22] <athrift> ok, luckily we have H710's sitting around
[23:23] <paravoid> r720xd is our platform too
[23:23] <paravoid> note that there's no migration path from H310 JBOD -> H710
[23:23] <paravoid> you need to reformat the box
[23:23] <paravoid> we've lost months doing that
[23:24] <paravoid> I even benchmarked it H310 vs. H710 in write-through
[23:24] <paravoid> there's no comparison really
[23:24] <nhm> btw, since you guys are interested in R515s and the H700, this is my post: http://lists.us.dell.com/pipermail/linux-poweredge/2012-July/046694.html
[23:26] <nhm> paravoid: that post you linked about reads blocking writes makes me curious about the situation in my post, where as soon as I have two writers going to the RAID the performance tanks.
[23:26] <paravoid> h310?
[23:26] <nhm> paravoid: H700
[23:28] <nhm> btw, the iodepth=16 doesn't matter in those fio runs, not sure why I had it there.
[23:28] <paravoid> so I'm reading the mail I've written back then
[23:28] * sleinen (~Adium@2001:620:0:26:4dec:c931:8746:c018) Quit (Ping timeout: 480 seconds)
[23:29] <nhm> Interestingly they are selling the C8000 gear with standard LSI controllers.
[23:29] <nhm> I'm hoping to get some testing in on some C8220X nodes at some point here.
[23:29] <paravoid> so, busy-looping seq reads on 12 disks and trying to write to a 2-drive SSD RAID0 resulted in a write capacity of 30KB/s for the SSDs
[23:30] <paravoid> reading from 7 disks had 400-500KB/s, reading from 6 resulted in a jump to 30MB/s
[23:30] <nhm> wow
[23:30] <paravoid> no reads was 50MB/s
[23:30] <paravoid> that was basically a write(100 bytes); fsync workload
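paravoid's experiment (busy-looping sequential reads on some disks while timing a `write(100 bytes); fsync` loop on another) can be sketched in Python with `os.write`/`os.fsync` for the writer and background threads for the readers. The file paths and iteration count are assumptions. For the measurement to mean anything, each read file should sit on a different disk behind the same controller and be much larger than RAM (or have the page cache dropped first), and the write file should sit on the disk under test.

```python
import os
import threading
import time


def fsync_write_rate(path, iterations=1000, record=b"x" * 100):
    """Time a write(100 bytes); fsync() loop and return the achieved KB/s."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        start = time.monotonic()
        for _ in range(iterations):
            os.write(fd, record)
            os.fsync(fd)  # force each small write to stable storage
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.close(fd)
    return len(record) * iterations / elapsed / 1024


def busy_seq_reader(path, stop, block=1 << 20):
    """Read `path` sequentially in 1 MiB blocks, wrapping around, until `stop` is set."""
    with open(path, "rb", buffering=0) as f:
        while not stop.is_set():
            if not f.read(block):
                f.seek(0)  # reached EOF: start over


def write_rate_under_reads(write_path, read_paths, iterations=1000):
    """Measure the fsync write rate while threads stream reads from `read_paths`."""
    stop = threading.Event()
    readers = [threading.Thread(target=busy_seq_reader, args=(p, stop)) for p in read_paths]
    for t in readers:
        t.start()
    try:
        return fsync_write_rate(write_path, iterations)
    finally:
        stop.set()
        for t in readers:
            t.join()
```

Comparing `write_rate_under_reads(w, [])` with the same call given one or more read paths shows whether reads on one disk starve writes on another.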
[23:31] <athrift> we are getting some C8220's soon as well, but for compute.
[23:31] * sleinen (~Adium@2001:620:0:26:edec:c0fd:9048:d23a) has joined #ceph
[23:31] <paravoid> H710 was 85MB/s consistently, independent of reads to other disks
[23:31] <paravoid> so, the LSI specsheet shows the controller having 8 ports
[23:31] <athrift> We thought of using them for Ceph, but don't want the mess of SAS cables going between the slots
[23:31] <nhm> Can you flash the H310 into a stock LSI card? It'd be interesting to see if you get the same results.
[23:31] <paravoid> the R720xd has 12 external bays + 2 internal
[23:31] <athrift> nhm: yes you can
[23:31] <paravoid> so there's probably a SAS expander in between
[23:31] <paravoid> that may be the culprit
[23:32] <athrift> paravoid: there is, check the Technical Guide
[23:32] <paravoid> so that may be why it sucks so much
[23:32] <nhm> paravoid: yes, both the R515 and the R720XD have SAS expanders, and I'm very suspicious that Ceph in general hates them.
[23:32] <paravoid> or firmware, who knows
[23:32] <athrift> but that wouldn't explain the difference in performance between the H310 and H710....
[23:33] <nhm> paravoid: though I've tested nodes with SAS expanders that don't suck, so it may come down to brand, or the drives being used, or some other crazy thing.
[23:33] <nhm> I wonder if Dell is using LSI expanders
[23:34] <nhm> athrift: yeah, the cables look to be a pain.
[23:35] <athrift> lsscsi shows the expander as BP12G+EXP
[23:35] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[23:36] <athrift> This discussion is making me reconsider SuperMicro even though they are more expensive
[23:36] * mib_tpw2p2 (5b09cd2c@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:37] <barryo1> We're a Dell shop so thats my only real choice
[23:38] <nhm> athrift: I have had very good performance with an SC847A chassis and multiple controllers. I imagine the 12-bay 2U node would perform very well with a pair of SAS9207-8is, 12 spinning disks, and 2 S3700 SSDs in the 2.5" bays.
[23:38] <nhm> or alternately no SSDs and a pair of 9265s.
[23:38] <nhm> with the WB cache module.
[23:39] <nhm> er, BBU for wb rather.
[23:39] <athrift> Thanks Mark, I will try and get hold of one to test
[23:39] * kyle_ (~kyle@ip03.foxyf.simplybits.net) has joined #ceph
[23:40] <nhm> athrift: fwiw on the SC847a I can hit 2GB/s with rados bench using 24 spinning disks and 8 Intel 520 SSDs.
[23:40] <paravoid> I'm fairly happy with R720xd + H710s
[23:41] <paravoid> we had C2100s previously and boy, they sucked a lot
[23:41] <athrift> So for standard deployment, this sort of thing http://www.supermicro.com/products/chassis/2U/826/SC826BA-R1K28W.cfm ?
[23:43] <nhm> athrift: yeah, I don't have one those exact chassis, but I expect it'd be like a scaled down version of what I've got.
[23:44] <nhm> paravoid: what kind of performance can you get out of the R720xd? I haven't been able to test on one, just seen some numbers from someone else.
[23:45] * rustam (~rustam@ Quit (Remote host closed the connection)
[23:48] <barryo1> maybe the 720xd is more affordable than i thought
[23:49] * sleinen (~Adium@2001:620:0:26:edec:c0fd:9048:d23a) Quit (Quit: Leaving.)
[23:49] <nhm> barryo1: even just a single E5-2620 should be enough if it's just for OSDs.
[23:49] <nhm> Or a pair of E5-2403s if that's cheaper.
[23:50] <barryo1> It'll be osds and mon
[23:50] <barryo1> we have no need for mds at the moment
[23:51] <athrift> nhm: it looks like the 826 chassis has a SAS expander similar to the R720XD
[23:51] <nhm> athrift: the one you linked me claims to be a "direct attached" backplane.
[23:53] * drokita (~drokita@ has left #ceph
[23:53] <nhm> which is what the one I've got also claims (and has the appropriate number of SFF8087 ports for such claim)
[23:53] <athrift> nhm: you are right, the manual just doesn't correlate to it :)
[23:53] <athrift> nhm: teach me for RTFM
[23:53] <nhm> athrift: somewhere in there they may list both backplanes
[23:54] <nhm> athrift: I remember seeing the E16, E26, and A backplanes in the manual for my chassis.
[23:55] <athrift> nhm: Yes I found it, it's the E5 backplane in this case
[23:59] <barryo1> I was hoping to finalise my spec tonight, now I have even more options to consider :s

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.