#ceph IRC Log

Index

IRC Log for 2013-04-18

Timestamps are in GMT/BST.

[0:00] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[0:04] * Yen (~Yen@ip-83-134-92-50.dsl.scarlet.be) has joined #ceph
[0:06] * drokita1 (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[0:07] * leseb (~Adium@67.23.204.150) Quit (Quit: Leaving.)
[0:09] * rustam (~rustam@94.15.91.30) has joined #ceph
[0:10] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[0:10] * smeven (~diffuse@110.151.97.93) Quit (Ping timeout: 480 seconds)
[0:12] * loicd (~loic@67.23.204.150) has joined #ceph
[0:19] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:21] * BillK (~BillK@124.150.41.38) has joined #ceph
[0:22] * dosaboy (~dosaboy@67.23.204.150) Quit (Ping timeout: 480 seconds)
[0:29] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[0:44] * leseb (~Adium@67.23.204.150) has joined #ceph
[0:44] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[0:46] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:48] * diegows (~diegows@200.68.116.185) Quit (Ping timeout: 480 seconds)
[0:49] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[0:51] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[0:54] * rustam (~rustam@94.15.91.30) has joined #ceph
[0:56] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[0:56] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Read error: Connection reset by peer)
[0:57] * loicd (~loic@67.23.204.150) has joined #ceph
[0:59] * dosaboy (~dosaboy@67.23.204.150) has joined #ceph
[1:00] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[1:08] * leseb (~Adium@67.23.204.150) Quit (Quit: Leaving.)
[1:18] * mattm__ (~matt@108-95-148-196.lightspeed.austtx.sbcglobal.net) has left #ceph
[1:24] * leseb (~Adium@67.23.204.150) has joined #ceph
[1:26] * gmason (~gmason@12.139.57.253) Quit (Quit: Computer has gone to sleep.)
[1:27] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:36] * tnt (~tnt@91.177.247.88) Quit (Ping timeout: 480 seconds)
[1:41] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[1:54] <loicd> Hi. Is it possible to create N RBD snapshots in such a way that they are done atomically ? ( in the context of "Consistency groups" as described in http://publib.boulder.ibm.com/infocenter/dsichelp/ds8000ic/index.jsp?topic=%2Fcom.ibm.storage.ssic.help.doc%2Ff2c_pprcconsgroupover_2jp3ky.html )
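For reference, librbd at this point only snapshots one image at a time; a consistency-group style snapshot of N images would have to be coordinated by the caller (for example by quiescing writers first). A minimal sketch using the python rbd bindings, with made-up pool and image names:

    # Sketch only: per-image RBD snapshots via the python "rbd" bindings.
    # RBD snapshots one image at a time; making a set of images
    # crash-consistent as a group is left to the caller.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')             # assumed pool name
        try:
            for name in ('vol-a', 'vol-b', 'vol-c'):  # hypothetical image names
                image = rbd.Image(ioctx, name)
                try:
                    image.create_snap('group-snap-1')  # one snapshot per image
                finally:
                    image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()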
[1:56] * leseb (~Adium@67.23.204.150) Quit (Quit: Leaving.)
[1:58] * kfox1111 (bob@leary.csoft.net) Quit (Quit: Lost terminal)
[2:09] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[2:12] * leseb (~Adium@67.23.204.201) has joined #ceph
[2:18] * xiaoxi (~xiaoxi@shzdmzpr02-ext.sh.intel.com) has joined #ceph
[2:26] <pioto> hm. i'm trying out the fuse implementation of cephfs... and i seem to get odd behavior. as root, i'm getting EACCES at open, for a mode 0660 file, owned by another uid. if i su to that uid, i can open it, though...
[2:26] <pioto> so, maybe it's not handling "root is magic" properly?
[2:27] <gregaf> known bug; fixed in current next and master; I'm on a conf call so you'll have to search the tracker yourself :)
[2:31] <pioto> ok, thanks.
[2:32] * imjustmatthew (~imjustmat@pool-173-53-54-223.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[2:34] <pioto> gregaf: so far, i just see this, which seems to not have a fix... http://tracker.ceph.com/issues/3597
[2:34] * imjustmatthew (~imjustmat@pool-173-53-54-223.rcmdva.fios.verizon.net) has joined #ceph
[2:35] <pioto> and seems like you're saying it's "not a bug" when, well, it is. at least according to usual unix rules that i'm familiar with.
[2:35] <gregaf> commit d87035c0c4ff, included in v0.60
[2:35] <gregaf> "client: root can write to any file"
[2:36] <gregaf> maybe it didn't actually show up in the tracker, that'd be odd
[2:36] <pioto> well, i have 0.60...
[2:36] <pioto> and this is open for reading, not writing
[2:36] <pioto> but lemme see
[2:36] <pioto> huh
[2:36] <pioto> i didn't upgrade ceph-fuse when i did everything else, i guess
[2:36] <pioto> mybad
[2:37] <pioto> k, i'll upgrade and try again. sorry for noise
[2:37] <gregaf> :)
[2:37] <pioto> i'm curious what the change would be
[2:37] <pioto> when i checked sshfs code, it doesn't seem to have any special case for this
[2:38] <pioto> i think it just answered the GETATTR calls and let the kernel sort it out
[2:38] <gregaf> depends on how the fs does its protections
[2:38] <gregaf> in our case, we needed to add, essentially, "if (!root) {"
[2:38] <pioto> well. i didn't know cephfs had any real protections?
[2:38] <pioto> i thought that, basically, anyone could access the whole fs
[2:39] <pioto> and while you could, say, set some subtree to use a different pool, and give someone access to only write that pool, if they have MDS access, they can still ruin metadata for the whole tree
[2:39] <gregaf> it doesn't but it can't do pass-through to the local fs like sshfs can, since it doesn't have a local fs to pass through to
[2:39] <pioto> sshfs isn't passing through to the local fs either
[2:39] <pioto> from what i recall
[2:39] <gregaf> anyway, gotta go; I can't carry on an irc conversation while listening to a conference presentation on a phone call :)
[2:39] <pioto> i haven't worked on it in over a year
[2:39] <pioto> ok
[2:40] <pioto> well, i mean, the server is involved, yes
[2:40] <pioto> but the client-side can reject things too
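For context, the fix gregaf refers to (commit d87035c0c4ff, "client: root can write to any file") lives in the Ceph client's own permission check. Purely as an illustration of the "root is magic" rule being discussed, and not Ceph's actual code, a client-side POSIX-style mode-bit check with a root bypass might look like this:

    # Illustrative only -- not Ceph's actual client code. A client-side
    # POSIX-style mode-bit check with the "root is magic" bypass that the
    # ceph-fuse fix discussed above is about.
    def may_open(uid, gid, st_uid, st_gid, st_mode, want):
        """want is a bitmask of the rwx bits, e.g. 0o4 for read."""
        if uid == 0:
            # root bypasses read/write mode-bit checks (exec is stricter
            # on real kernels, but that detail is skipped here)
            return True
        if uid == st_uid:
            bits = (st_mode >> 6) & 0o7   # owner bits
        elif gid == st_gid:
            bits = (st_mode >> 3) & 0o7   # group bits
        else:
            bits = st_mode & 0o7          # other bits
        return (bits & want) == want

    # a mode 0660 file owned by uid 1000: root can still open it for reading
    assert may_open(0, 0, 1000, 1000, 0o660, 0o4)
    assert not may_open(2000, 2000, 1000, 1000, 0o660, 0o4)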
[2:41] * LeaChim (~LeaChim@176.250.202.138) Quit (Ping timeout: 480 seconds)
[2:43] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[2:47] <davidz> After removing all snapshots this is the last duplicate section that looks interesting to me:
[2:47] <davidz> 2013-04-17 17:42:33.880718 7f873a57b700 10 osd.0 pg_epoch: 218 pg[3.2( v 98'220 (0'0,98'220] local-les=118 n=10 ec=11 les/c 118/118 117/117/117) [0] r=0 lpr=117 lcod 0'0 mlcod 0'0 active+clean snaptrimq=[64~1,c8~1]] snap_trimmer entry
[2:47] <davidz> 2013-04-17 17:42:33.880739 7f873a57b700 10 osd.0 pg_epoch: 218 pg[3.2( v 98'220 (0'0,98'220] local-les=118 n=10 ec=11 les/c 118/118 117/117/117) [0] r=0 lpr=117 lcod 0'0 mlcod 0'0 active+clean snaptrimq=[64~1,c8~1]] snap_trimmer posting
[2:47] <davidz> 2013-04-17 17:42:33.880761 7f873a57b700 10 osd.0 pg_epoch: 218 pg[3.2( v 98'220 (0'0,98'220] local-les=118 n=10 ec=11 les/c 118/118 117/117/117) [0] r=0 lpr=117 lcod 0'0 mlcod 0'0 active+clean snaptrimq=[64~1,c8~1]] SnapTrimmer state<NotTrimming>: NotTrimming react
[2:47] <davidz> 2013-04-17 17:42:33.880782 7f873a57b700 10 osd.0 pg_epoch: 218 pg[3.2( v 98'220 (0'0,98'220] local-les=118 n=10 ec=11 les/c 118/118 117/117/117) [0] r=0 lpr=117 lcod 0'0 mlcod 0'0 active+clean snaptrimq=[64~1,c8~1]] SnapTrimmer state<NotTrimming>: NotTrimming: trimming 64
[2:47] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Excess Flood)
[2:47] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[2:48] <davidz> Ignore my last message
[2:52] * dosaboy (~dosaboy@67.23.204.150) Quit (Ping timeout: 480 seconds)
[2:59] * leseb (~Adium@67.23.204.201) Quit (Quit: Leaving.)
[3:09] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[3:12] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[3:13] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[3:15] * Cube (~Cube@12.248.40.138) Quit (Read error: Operation timed out)
[3:28] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[3:31] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[3:32] * rustam (~rustam@94.15.91.30) has joined #ceph
[3:34] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[3:38] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[3:42] * imjustmatthew (~imjustmat@pool-173-53-54-223.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[3:52] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Quit: Leaving.)
[3:57] * coyo|2 (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[3:57] <coyo|2> good evening.
[3:57] * coyo|2 is now known as coyo
[4:01] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Remote host closed the connection)
[4:02] * treaki_ (109ce9e7cd@p4FF4BB8F.dip.t-dialin.net) has joined #ceph
[4:06] * treaki (05145164c4@p4FDF76E3.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[4:29] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Read error: Connection reset by peer)
[4:30] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[4:30] * gregaf1 (~Adium@2607:f298:a:607:7c9a:6ba6:7f70:222) has joined #ceph
[4:32] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[4:32] * joelio (~Joel@88.198.107.214) Quit (Remote host closed the connection)
[4:32] * trond (~trond@trh.betradar.com) Quit (Remote host closed the connection)
[4:32] * joelio (~Joel@88.198.107.214) has joined #ceph
[4:32] * trond (~trond@trh.betradar.com) has joined #ceph
[4:34] * gregaf (~Adium@2607:f298:a:607:114a:6960:bfaa:e904) Quit (Ping timeout: 480 seconds)
[4:40] * rustam (~rustam@94.15.91.30) has joined #ceph
[4:47] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[4:51] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[4:55] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[4:58] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[5:10] * dpippenger (~riven@216.103.134.250) Quit (Remote host closed the connection)
[5:22] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[5:23] * rustam (~rustam@94.15.91.30) has joined #ceph
[5:27] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[5:41] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[5:48] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Read error: Operation timed out)
[5:56] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[6:01] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:01] * stefunel (~stefunel@static.38.162.46.78.clients.your-server.de) Quit (Ping timeout: 480 seconds)
[6:02] * stefunel (~stefunel@static.38.162.46.78.clients.your-server.de) has joined #ceph
[6:02] * joelio_ (~Joel@88.198.107.214) has joined #ceph
[6:02] * joelio (~Joel@88.198.107.214) Quit (Read error: Connection reset by peer)
[6:31] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[6:35] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[6:38] * rustam (~rustam@94.15.91.30) has joined #ceph
[6:39] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[6:42] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:43] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[6:49] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[6:58] * rekby (~Adium@2.93.58.253) has joined #ceph
[6:58] * leseb1 (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[7:00] * leseb2 (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[7:00] * leseb1 (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Read error: Connection reset by peer)
[7:04] * rekby1 (~Adium@2.93.58.253) has joined #ceph
[7:04] * rekby (~Adium@2.93.58.253) Quit (Read error: Connection reset by peer)
[7:25] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[7:25] * leseb2 (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Read error: Connection reset by peer)
[7:25] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Read error: Connection reset by peer)
[7:26] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[7:27] * leseb1 (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[7:27] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Read error: Connection reset by peer)
[7:37] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[7:51] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[7:52] * leseb1 (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Quit: Leaving.)
[7:57] * norbi (~nonline@buerogw01.ispgateway.de) has joined #ceph
[8:10] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[8:10] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit ()
[8:10] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:10] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[8:12] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit (Quit: Pogoapp - http://www.pogoapp.com)
[8:12] * _69mb_ (~bmuita@41.87.105.66) has joined #ceph
[8:13] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:14] * norbi (~nonline@buerogw01.ispgateway.de) Quit (Read error: Connection reset by peer)
[8:14] * _69mb_ (~bmuita@41.87.105.66) Quit ()
[8:15] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: He who laughs last, thinks slowest)
[8:18] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[8:37] * tnt (~tnt@91.177.247.88) has joined #ceph
[8:39] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[8:41] * sleinen (~Adium@2001:620:0:25:39a6:942d:3da2:cc63) has joined #ceph
[8:46] * ferminter (50bb67ab@ircip2.mibbit.com) has joined #ceph
[8:52] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[9:00] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[9:02] * norbi (~nonline@buerogw01.ispgateway.de) has joined #ceph
[9:04] * l0nk (~alex@83.167.43.235) has joined #ceph
[9:11] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:11] * ferminter (50bb67ab@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[9:19] * panitaliemom (d4af59a2@ircip3.mibbit.com) has joined #ceph
[9:20] * panitaliemom (d4af59a2@ircip3.mibbit.com) has left #ceph
[9:25] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:31] * athrift (~nz_monkey@222.47.255.123.static.snap.net.nz) Quit (Remote host closed the connection)
[9:32] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:32] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:36] * athrift (~nz_monkey@222.47.255.123.static.snap.net.nz) has joined #ceph
[9:38] * tnt (~tnt@91.177.247.88) Quit (Ping timeout: 480 seconds)
[9:41] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:41] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[9:43] * sleinen1 (~Adium@user-23-16.vpn.switch.ch) has joined #ceph
[9:46] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[9:47] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:49] * sleinen (~Adium@2001:620:0:25:39a6:942d:3da2:cc63) Quit (Ping timeout: 480 seconds)
[9:51] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:54] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[9:56] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:58] * sivanov (~sivanov@gw2.maxtelecom.bg) has joined #ceph
[10:01] * stiller (~Adium@2001:980:87b9:1:b0c7:b3ab:1726:52d5) Quit (Quit: Leaving.)
[10:02] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:07] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[10:07] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[10:07] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[10:18] * LeaChim (~LeaChim@176.250.202.138) has joined #ceph
[10:21] * smeven (~diffuse@1.129.235.142) has joined #ceph
[10:26] * rustam (~rustam@94.15.91.30) has joined #ceph
[10:29] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[10:36] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:37] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:40] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[10:44] * sivanov (~sivanov@gw2.maxtelecom.bg) Quit (Ping timeout: 480 seconds)
[10:49] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[10:49] * v0id (~v0@91-115-224-148.adsl.highway.telekom.at) has joined #ceph
[10:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:53] * sivanov (~sivanov@gw2.maxtelecom.bg) has joined #ceph
[10:56] * vo1d (~v0@193-83-48-168.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[11:00] <coyo> i apologize in advance for the extremely naive question, but i cant seem to figure out where to google this or where the documentation for this might exist.
[11:00] <coyo> what is the difference in terms of use case between a relational database such as postgresql, a cassandra key-value cluster, and something like ceph?
[11:01] <coyo> wouldnt ceph be more like memcached, or am i completely misunderstanding this?
[11:01] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:01] <coyo> again, i apologize for the extremely naive question, and thank you in advance for any protips <3
[11:01] * Morg (b2f95a11@ircip3.mibbit.com) has joined #ceph
[11:01] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:01] <wogri_risc> coyo - wtf?
[11:02] <wogri_risc> :)
[11:02] <wogri_risc> coyo: ceph is not a database whatsoever.
[11:02] <wogri_risc> it's an object store. you store data in there
[11:02] <wogri_risc> like files
[11:02] <coyo> wogri_risc: i'm trying to figure out exactly what ceph is supposed to be used for.. and more importantly, what it ISNT used for
[11:02] <coyo> okay, so, an object store
[11:02] <coyo> so comparable to redis or memcached, just more flexible?
[11:02] <wogri_risc> yeah. on top of that it provides you with a filesystem
[11:03] <wogri_risc> memcached is a key value store, volatile. you can not compare it to this stuff.
[11:03] <coyo> okay
[11:03] <coyo> then i fail to understand.
[11:03] <wogri_risc> redis is a nosql database
[11:03] <coyo> sorry.
[11:03] <wogri_risc> so you also store keys and values
[11:03] <coyo> then what IS ceph comparable to?
[11:03] <wogri_risc> ceph does 3 things: 1) object store
[11:03] <coyo> because i dont think i understand at all what it's supposed to replace
[11:03] <wogri_risc> comparable to amazon s3.
[11:03] <coyo> oh really??
[11:03] <wogri_risc> totally
[11:03] <coyo> well that's awesome
[11:04] <wogri_risc> ceph does RBD, this is comparable to iSCSI but in a great way
[11:04] <coyo> so wait, amazon stores objects now? doesnt openstack provide a lot of that functionality now?
[11:04] <wogri_risc> yeah, openstack tried to copy amazon.
[11:04] <coyo> though admittedly, openstack is not trivial to configure or run
[11:04] <wogri_risc> surely not.
[11:04] <coyo> i have tried it
[11:04] <wogri_risc> openstack also provides a filesystem, like nfs
[11:04] <wogri_risc> sorry
[11:04] <coyo> i have suffered night terrors ever since
[11:05] <wogri_risc> ceph also provides a FS like NFS
[11:05] <wogri_risc> but without any single point of failure
[11:05] <coyo> what does ceph use as the underlying networking? i mean over ethernet or whatever
[11:05] <wogri_risc> ceph doesn't care. it uses IP
[11:05] <coyo> i mean, over that
[11:05] <wogri_risc> ?
[11:05] <wogri_risc> TCP?
[11:06] <coyo> what component does ceph use to communicate with iscsi or fcoe?
[11:06] <coyo> okay, i'm not that dense XD
[11:06] * xiaoxi (~xiaoxi@shzdmzpr02-ext.sh.intel.com) Quit (Remote host closed the connection)
[11:06] <coyo> hmm.
[11:06] <wogri_risc> coyo - it doesn't speak iscsi
[11:06] <wogri_risc> it has a different approach but it can be somehow compared to iscsi
[11:06] <coyo> i think i mean does it have a distinct protocol to speak amongst itself, from control nodes to storage nodes?
[11:06] <coyo> sorry
[11:06] <coyo> i'm sure the storage nodes handle that
[11:07] <wogri_risc> yeah. it's a "proprietary" (in the sense of open source proprietary) protocol
[11:07] <coyo> i mean, is the protocol an integral part of ceph, or is it considered a distinct thing
[11:07] <coyo> oh, okay
[11:07] <wogri_risc> it's called librados
[11:07] <coyo> thank you, that answered my question
[11:07] <coyo> oh? awesome
[11:07] <wogri_risc> and you have bindings to all major languages
[11:07] <coyo> thank you, that was helpful
[11:07] <coyo> i apologize for being incapable to asking properly Dx
[11:07] <coyo> *of
[11:07] <coyo> fff
[11:08] <coyo> okay, cool. so, by bindings, you mean i can in theory interface directly with a librados network or cluster with a python glue script, say an automation script?
[11:08] <coyo> what sort of things could i automate?
[11:09] <wogri_risc> yeah. you could.
[11:09] <wogri_risc> you could write a python program that analyzes 10 petabytes of data.
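A minimal sketch of such a python glue script, talking to the cluster through the python-rados bindings; the pool and object names are made up:

    # Minimal sketch of a python "glue script" talking to RADOS directly
    # via the python-rados bindings. Pool and object names are assumptions.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('data')            # assumed pool name
        try:
            ioctx.write_full('hello-object', b'hello from python')
            print(ioctx.read('hello-object'))
            print(ioctx.get_stats())                  # per-pool usage statistics
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()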
[11:09] <coyo> i assume that's the case, because it was mentioned on the site (yes, i do rtfm) that ceph is fully compatible with s3 or openstack object storage protocols
[11:10] <wogri_risc> that's one part of ceph
[11:10] <coyo> that's.. impressive. does the network handle the scaling, or would the python script poll each object?
[11:10] <wogri_risc> it has more. it's also the future for virtual machine storage
[11:10] <coyo> ooh
[11:10] <coyo> so you are envisioning this as the evolution of storage on top of TRILL switching fabrics?
[11:11] <wogri_risc> I don't know what TRILL switching is
[11:11] <coyo> that's very ambitious. i approve
[11:11] <wogri_risc> but ceph has the great advantage that it scales more or less endlessly
[11:11] * Morg (b2f95a11@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[11:11] <coyo> TRILL is a replacement for the ancient spanning tree protocol that has limited Ethernet in the past
[11:11] <coyo> with TRILL, you can connect any switch to any other in completely arbitrary ways
[11:12] * Morg (b2f95a11@ircip2.mibbit.com) has joined #ceph
[11:12] <coyo> and TRILL will use all available bandwidth
[11:12] <wogri_risc> you could do that with RSTP before
[11:12] <wogri_risc> oh
[11:12] <wogri_risc> you couldn't do this with RSTP before :)
[11:12] <jerker> jefferai: ZFS with log/cache on SSD would be interesting to compare to Btfs with Bcache/Flashcache/etc for SSD.
[11:12] <jerker> s/Btfs/Btrfs/
[11:12] <coyo> if i read the specification correctly (it took forever), it even load balances between ports if they both end up at the same mac
[11:13] <wogri_risc> that's what I considered bonding to be for.
[11:13] <coyo> in other words, you do not have to carefully structure your ethernet network in a stupid hierarchy anymore
[11:13] <wogri_risc> let's put it like that: ceph scales well, whatever ethernet protocol is under it.
[11:13] <coyo> alright
[11:13] <coyo> but it would work a lot better with more bandwidth beneath it, presumably?
[11:13] <coyo> doesnt hurt to check
[11:14] <wogri_risc> sure it would.
[11:14] <coyo> okay, cool
[11:14] <coyo> i am very pleased you clarified that for me
[11:14] <joao> might be relevant to mention that, currently, ceph requires low latencies to properly function
[11:15] <wogri_risc> furthermore it also doesn't like multi-site, due to the nature of quorums.
[11:15] <joao> and it's taken as being datacenter bound, unless you have amazing links between locations
[11:15] <coyo> another question, if i were developing a new software project, and have not yet decided on whether to use a key-value store, a relational database, be it a nosql or sql database, or an object store, how would you suggest i approach this?
[11:15] <coyo> i need to scale
[11:15] <coyo> ridiculously huge
[11:15] <joao> there you go
[11:15] <wogri_risc> :)
[11:16] <coyo> joao: i think you will want to see what i do with this :D
[11:16] <coyo> *what i will
[11:17] <joao> coyo, it's a matter of use case, obviously; if you have a lot of structured data, going with something a bit more structured would probably be best
[11:17] <coyo> wogri_risc: generally speaking, where is the upper bound in terms of latency, assuming several hundred nodes at each datacenter?
[11:17] <joao> my rule of thumb: if you would consider using s3, then ceph would be just fine
[11:17] <coyo> should i ensure it's under 100ms, under 40ms, under 8ms, or measured in us?
[11:18] <coyo> again, i apologize for my ignorance, but in what cases would i use s3 as opposed to something local?
[11:18] <Morg> joao: does cephfs often have problems with uploads of large quantities of small files?
[11:19] <joao> coyo, the latency issues would affect mostly the monitors, and the default timeouts are around 50ms or so
[11:19] <coyo> also, in my naive approach to storing data, i am inclined to write msgpack in flat files across cephfs
[11:19] <joao> those are readjustable, but I don't think testing has been done on that
[11:19] <coyo> would there be a better way to model my data?
[11:19] <coyo> 50ms or less?
[11:20] <coyo> i can work with that ;)
[11:20] <joao> yes
[11:20] <joao> considerably less preferably
[11:20] <coyo> i assume that monitors can load balance, or are they restricted to one primary and one secondary like a certain other project i could name?
[11:21] <joao> monitors implement Paxos, so you'll have a leader and several peons
[11:21] <coyo> if monitor load balancing is not a feature, is there any technical design reason why i could not submit a patch to add such a feature?
[11:21] <coyo> what happens if a leader fails?
[11:21] <joao> client accesses are 'load balanced' in the sense that a client can connect to any monitor
[11:21] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:22] <joao> coyo, a new leader shall be elected
[11:22] <joao> Morg, not sure; slang should know better than I do
[11:22] <joao> coyo, what sort of load balancing are you thinking about?
[11:22] <coyo> would it be possible to use haproxy or similar to load balance client connections between monitors, and avoiding client connections directly to the leader?
[11:23] <coyo> how long does leader election take, generally speaking?
[11:23] <joao> coyo, yes
[11:23] <wogri_risc> coyo: don't worry about monitors in ceph. they are very light-weight, and just communicate which hosts are up and which are down. if one fails, another one jumps in.
[11:23] <wogri_risc> you don't need or want haproxy
[11:23] <coyo> well, i have to know this, wogri_risc
[11:23] <Morg> joao: i have tested my cluster as backup of backup, it was ok with large files ( up to 118mb/sec. ) but when upload of small files started speed dropped down to like 10-20mb/sec
[11:23] <wogri_risc> the failover is handled by librados
[11:23] <joao> coyo, probably would need to embed some monitor-specific awareness on the proxy though
[11:23] <coyo> how long does leader election take if multiple leaders fail?
[11:23] <coyo> i mean monitors, including the leader?
[11:23] <coyo> hmm
[11:23] <coyo> that shouldnt be too hard
[11:24] <coyo> wogri_risc: librados handles the failover?
[11:24] <coyo> does it handle load balancing as well?
[11:24] <joao> librados handles monitors failovers
[11:25] <coyo> well, i guess i could always put together a few thousand nodes and yank the cables from monitors and see what happens with wireshake
[11:25] <joao> the client will choose another monitor as soon as it loses connection to the current monitor, or the current monitor tells him to
[11:25] <coyo> *wireshark
[11:25] <wogri_risc> there is no concern about load balancing, as there is no load on monitors. they don't need to talk much. even if you had a lot of clients. I mean, a LOT.
[11:25] <coyo> okay
[11:25] <joao> yeah
[11:25] <joao> basically, the client will only connect to the monitor to obtain maps, and the monitors will subsequently push maps to the client as they are updated
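As an illustration of that client/monitor relationship, newer python-rados bindings also expose mon_command, which lets a librados client explicitly ask the monitors for the current monitor map; a rough sketch (config path is an assumption):

    # Rough sketch: asking the monitors for the current monmap from a
    # librados client, via the python bindings' mon_command wrapper
    # around "ceph mon dump".
    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ret, outbuf, outs = cluster.mon_command(
            json.dumps({"prefix": "mon dump", "format": "json"}), b'')
        if ret == 0:
            for mon in json.loads(outbuf)['mons']:
                print(mon['rank'], mon['name'], mon['addr'])
    finally:
        cluster.shutdown()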
[11:26] <coyo> i just want to know how much lag or possible discontinuities to expect (and compensate for) in the case of a deliberate attack.
[11:26] <jerker> Does anyone know how big the largest setup of Ceph is currently?
[11:26] <joao> clients communicate directly with osds and mds so no worries there
[11:26] <coyo> okay, so the monitors really play an extremely minor role, then?
[11:26] <joao> jerker, afaik, dh's should probably be the biggest
[11:26] <wogri_risc> coyo: maybe this explains all of your questions: https://www.risc-software.at/dirgrid/
[11:26] <coyo> ah, i didnt see that. thank you, wogri_risc
[11:27] <coyo> uh
[11:27] <coyo> i'm getting a username:password prompt
[11:27] <jerker> joao: how large is it? We have a HPC site here which is very interested in the CephFS part. I test a little, but well...
[11:27] <coyo> am i supposed to see this?
[11:27] <wogri_risc> coyo: in terms of communication overhead, yes. my setup is rather small, but the mon process on the mon hosts only uses 12 MB of RAM. You see, very lightweight.
[11:27] <coyo> ah
[11:27] <wogri_risc> damn it :)
[11:27] <coyo> well, that is very good to know
[11:28] <wogri_risc> wrong URL in pasteboard coyo
[11:28] <wogri_risc> just a sec
[11:28] <joao> coyo, regarding leader election, assuming you maintain a majority of monitors able to form quorum, election is usually fast, but don't have any figures for you
[11:28] <jerker> joao: We might be interested in running 2000 HDDs.
[11:28] <coyo> however, what happens if the monitors are attacked and disabled for a few hours. do the storage nodes and client requests rely on the monitor nodes to remain constantly available?
[11:28] <joao> jerker, wow
[11:29] <joao> jerker, DH's doesn't use cephfs afaik
[11:29] <coyo> or can the cluster tolerate a three to four hour failure of monitor nodes?
[11:29] <joao> it's a 3PT raw install iirc
[11:29] <coyo> joao: well, i can experiment :3
[11:29] <joao> with replication 3
[11:29] <wogri_risc> coyo: https://www.youtube.com/watch?v=k8E-hNxuQk4
[11:29] <joao> there's some more info on that on the interwebs, not sure where
[11:29] <coyo> i just wanted to see you if did have a ballpark figure i can start with. no biggie :)
[11:29] <coyo> oohhh
[11:30] <coyo> uh...
[11:30] <jerker> joao: its a university HPC site. looking at lustre etc too. but ceph has some nice features obviously otherwise i wouldn't be here. :)
[11:30] <coyo> am i supposed to see a 1 second blank video?
[11:31] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:31] <wogri_risc> you're supposed to see a 49 minute video of the ceph creator
[11:31] <joao> coyo, the monitors play an extremely important role on the cluster; but client communication-wise is not what would usually bring the monitors down
[11:31] <joao> coyo, monitors mostly interact with the other daemons (osd, mds)
[11:31] * wogri_risc wants coyo to look at the video and then contiune asking
[11:32] <coyo> the video isnt there, wogri_risc.
[11:32] <jerker> joao: just three racks of servers
[11:32] <jerker> :)
[11:33] <joao> jerker, I'm assuming those 3 racks are just for a test setup? I'm not much of a datacenter guy, but I am having a hard time seeing how you'd get 2k HDDs on 3 racks worth of space :p
[11:33] <wogri_risc> coyo: youtube-search for "sage weil storage converence"
[11:33] <coyo> okay
[11:33] <joao> *conference
[11:33] <joao> :)
[11:34] <coyo> got it
[11:34] <coyo> thanks
[11:34] <wogri_risc> true joao :)
[11:34] <joao> ceph.com or inktank.com should have those talks on blog posts or something
[11:34] <coyo> O.O
[11:34] <wogri_risc> that's where I got it from
[11:34] <coyo> Sage Wiel is surprisingly cute
[11:34] <wogri_risc> but it's hard to link
[11:34] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[11:35] <joao> coyo, http://insidehpc.com/2013/04/17/sage-weil-presents-an-intro-to-ceph-for-hpc/
[11:35] <wogri_risc> coyo: what did you expect? a 150 kg guy with greasy hair? :)
[11:35] <coyo> *Weil
[11:35] <coyo> you never know, hon
[11:35] <jerker> joao: 2000/72*4/42=2.6
[11:35] <joao> ah
[11:35] <joao> math
[11:35] <coyo> sage weil could very well have been a 60kg 16 year old.
[11:35] <coyo> there are geniuses like that
[11:35] <joao> jerker, kay :x
[11:36] <coyo> okay, give me a moment to get through the video
[11:36] <coyo> bbl
[11:43] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[11:44] <coyo> found it!!
[11:44] <coyo> http://ceph.com/docs/master/rados/api/librados/#asychronous-io this is awesome
[11:46] <coyo> kind of hard to follow, since it's not always recorded clearly, so i'm sorry, i have to play through several times and rewind every now and then. brb
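A rough sketch of the asynchronous I/O that page describes, using the python-rados aio_write callbacks; pool and object names are made up:

    # Sketch of librados asynchronous I/O through the python bindings:
    # queue a write, get notified via callbacks, and wait for it to be
    # safe on disk. Pool and object names are assumptions.
    import threading
    import rados

    done = threading.Event()

    def on_complete(completion):
        print("write acked by the OSDs")

    def on_safe(completion):
        print("write committed to disk")
        done.set()

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('data')            # assumed pool name
        try:
            ioctx.aio_write('async-object', b'payload',
                            oncomplete=on_complete, onsafe=on_safe)
            done.wait()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()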
[11:52] * norbi (~nonline@buerogw01.ispgateway.de) Quit (Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org)
[11:52] * scuttlemonkey (~scuttlemo@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[11:52] * ChanServ sets mode +o scuttlemonkey
[12:06] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[12:09] <coyo> if i am following this and understanding this correctly
[12:10] <coyo> you mean 'object' as in object oriented programming sense?
[12:10] <coyo> as in a collection of methods and attributes in memory?
[12:10] <coyo> because that would be extremely awesome
[12:10] <coyo> and would solve ALL of my problems
[12:11] * Morg (b2f95a11@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[12:12] <coyo> because the application i have been developing for years now was basically going to use zerorpc to pass around serialized message objects and i was having trouble figuring out an easy way to store billions and billions of messages like this, and keep the message store extremely reliable, stable, while being relatively fast and responsive
[12:13] <coyo> postgresql has some sharding features, but didnt scale up as high as i need my storage to scale, and none of the postgres devs took my absolute demand for data immortality seriously
[12:13] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[12:14] <coyo> joao, wogri_risc: so, by 'object' do you mean an instance of a class? i just want to make sure i'm understanding this youtube video correctly.
[12:17] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[12:19] <coyo> oh, nvm
[12:19] <coyo> https://en.wikipedia.org/wiki/Object_storage_device
[12:19] * coyo sighs.
[12:21] * scuttlemonkey (~scuttlemo@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[12:23] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[12:25] <joao> coyo, fwiw, there's some effort in "embedding" some computational effort in osds, but I don't really know the details on that
[12:26] * diegows (~diegows@190.190.2.126) has joined #ceph
[12:26] <coyo> joao: well, it's fine, i'm reading s3 documentation and documentation on the object storage as ceph sees it
[12:26] <coyo> basically, i do need to serialize it to a file
[12:26] <coyo> but that's not really a problem
[12:26] <coyo> i was going to do that anyway
[12:27] <coyo> so this will still solve my problem
[12:27] <coyo> ceph is so much more straightforward, intuitive, and seems to scale pretty much infinitely
[12:29] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[12:31] * rekby1 (~Adium@2.93.58.253) Quit (Quit: Leaving.)
[12:31] <rtek> mon/PaxosService.cc: 127: FAILED assert(have_pending)
[12:31] <rtek> is this a known issue with the monitors?
[12:31] <joao> 3495, probably
[12:32] <rtek> ah, great, I'll look into that
[12:32] <rtek> thanks
[12:32] <rtek> we're seeing a lot of these crashes
[12:32] <joao> using cephfs?
[12:32] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[12:32] <rtek> nope
[12:33] <joao> at least one mds?
[12:33] <rtek> yes, 3 MDS, I will sometimes perform a few performance test through CephFS
[12:33] <joao> yeah, 3495 most likely
[12:36] <coyo> joao: so, a ceph cluster is comprised of monitor nodes, including a leader node, and a ton of storage nodes? am i supposed to connect a nas iscsi storage system to these storage nodes, or what?
[12:36] <rtek> joao: temporarily stopping the MDS would prevent this crash from occurring?
[12:36] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:36] <joao> rtek, most likely, yes
[12:36] <joao> this has only been triggered in clusters using mds
[12:37] <coyo> ugh, i'm so stupid.
[12:37] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[12:37] <rtek> joao: great, I'll try that for now. Thanks!
[12:37] <joao> there's only a small window in which this could happen, but it's fairly easy to hit that window
[12:38] <joao> coyo, the usual setup is to keep the osds running on the nodes where the disks live
[12:38] * BillK (~BillK@124.150.41.38) Quit (Ping timeout: 480 seconds)
[12:38] <coyo> osds? object storage devices?
[12:38] <joao> nhm (or maybe someone else even) may have a better idea how the whole setup goes
[12:39] <joao> coyo, osds as in the daemons
[12:39] <coyo> well, i'm just wondering how i should design the structure of my datacenters
[12:39] <coyo> to ensure the ceph cluster scales and runs optimally
[12:40] <coyo> uh
[12:40] <coyo> crap crap cra
[12:40] <joao> coyo, btw, here's the thing about the monitors: you will want them in odd numbers (1, 3, 5...); most setups won't need more than 3 or 5; using 1 doesn't provide any redundancy -- monitor is out, cluster is down
[12:40] <coyo> i cant find the darned docs for this
[12:41] <coyo> ah, good tip
[12:41] <coyo> thanks
[12:41] <joao> the leader will be picked from the available monitors taking into consideration their ranks, and ranks are currently calculated on an IP:PORT basis
[12:41] <coyo> alright
[12:42] <coyo> so to influence rank priority, you want to configure loopback addresses in the same manner you would with cisco routers when dealing with ospf and eigrp?
[12:42] <joao> i.e., mon.a on 10.0.0.1:6789 has a higher rank than mon.b on 10.0.0.2:6789
[12:42] <joao> higher ranks mean lower numerical value
[12:42] <joao> if you want to configure an off-site monitor, for instance, and want that one to be the leader, this is one thing to consider
[12:43] <coyo> alright
[12:43] <joao> coyo, to influence rank you just need to guarantee that a given monitor has the lowest ip:port combination among the remaining monitors
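A toy illustration of that rank rule, with made-up addresses; monitors sort by ip:port, lowest first, and the lowest-ranked monitor in the quorum is preferred as leader:

    # Toy illustration of the rank rule described above. Addresses are
    # made up; a plain string sort is close enough for this toy example.
    mons = {'mon.a': ('10.0.0.1', 6789),
            'mon.b': ('10.0.0.2', 6789),
            'mon.c': ('10.0.0.2', 6790)}

    for rank, (name, addr) in enumerate(sorted(mons.items(), key=lambda kv: kv[1])):
        print(rank, name, '%s:%d' % addr)
    # rank 0 -> mon.a, so mon.a wins the election while it is reachable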
[12:43] <coyo> if a node has many bound ips, is the ip:port used the one it reports, or the one the others see as the source ip, or what?
[12:44] <joao> eh, no idea about cisco routers, nor what ospf or eigrp are :)
[12:44] <coyo> sure
[12:44] <joao> the only cisco-ish router I ever touches is my home wrt-54gl
[12:44] <joao> :p
[12:44] <joao> *touched
[12:44] <coyo> in other words, can a node report a bound ip, or do the other nodes simply use the one THEY see?
[12:44] <coyo> sure
[12:45] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Remote host closed the connection)
[12:45] <coyo> well, i'm constructing a fttm isp, which is why i'm interested in ceph
[12:45] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[12:45] <coyo> i am hosting specialized apps, and they need to scale ridiculously huge
[12:45] <joao> monitors use whatever ip:port they have on the monmap; you just have to guarantee that the monitors will communicate using the same ip:port they are assigned on the monmap
[12:46] <coyo> *ftth (optic fiber directly to your computer in your home, which i term fttd, fiber-to-the-device)
[12:46] <coyo> alright
[12:46] <coyo> fair enough
[12:46] <joao> coyo, define 'directly to your computer'?
[12:47] <joao> I had a salesman trying to convince me he would connect a fiber optic cable to my desktop computer, when what he really meant was 'fiber to the home and ethernet/wifi to computer' :)
[12:47] <coyo> well, the fiber broadband network extends the ont (basically an optical modem) not to the node, not to your curb, not to the outside of your house's wall, but into your home, directly to each major device in your home
[12:47] * BillK (~BillK@124-169-250-222.dyn.iinet.net.au) has joined #ceph
[12:47] <joao> yeah, that's pretty much what I got here
[12:48] <coyo> in other words, a dedicated 1GigE active ethernet cable and a small ont (optical modem) to every major device
[12:48] <coyo> it will be ridiculously expensive
[12:48] <coyo> but very worth it
[12:48] <joao> fiber to the living room; ethernet network throughout the house
[12:48] <coyo> mine is a little more than that
[12:48] <joao> oh
[12:49] <joao> I see
[12:49] <coyo> you probably get one passive optical ont to your living room, and even then, i doubt the actual fiber optic cable (a stiff bulky plastic-covered GLASS cable) goes into your living room; chances are, the stiff bulky glass cable attaches to the outside of your house
[12:49] * ScOut3R_ (~ScOut3R@212.96.47.215) has joined #ceph
[12:49] <coyo> and then it's gigabit ethernet (copper twisted pair) or maybe a cable tv coaxial cable, depending on which part of the world you are from
[12:50] <joao> coyo, no idea; I saw them pulling a dedicated fiber channel from the building's mainline and extending it into my apartment
[12:50] <coyo> into your apartment?
[12:50] <coyo> interesting
[12:50] * ScOut3R__ (~ScOut3R@212.96.47.215) has joined #ceph
[12:50] <joao> and going all berserk when they were utterly unable to go around the corners, afraid the cable would break or something
[12:51] <joao> had to make a hole on two of the walls just to get that cable to the living room
[12:51] <coyo> okay, yes, that's a glass cable. all glass cables have a finite bend radius. they are also very expensive.
[12:51] <coyo> although, to be fair, fiber cable construction has gotten a lot better
[12:52] <coyo> only the cheaper older cables have to be treated like that, anymore
[12:52] <coyo> the older cheaper ones fracture if you bend them too much
[12:52] <coyo> and they are still very expensive, even the older ones
[12:52] <joao> fwiw, this was some 3 years ago
[12:52] <coyo> true
[12:52] <coyo> that makes more sense
[12:53] <coyo> i'm still impressed that your isp had the balls to extend the fiber optic cable into your home and place the ont inside your living room.
[12:54] <joao> a friend just finished building his house a couple months back, and made sure they would get him fiber not only to the living room but two other rooms on the top floor, just for future expansion purposes
[12:54] <joao> coyo, most ISPs here are providing fiber to the home
[12:54] <joao> and by most I mean all
[12:54] <coyo> however, my idea is to extend multiple glass cables from the node (a greenish box usually located at the end of a residential street) to every home, one dedicated active ethernet cable to every major device, every tv, every desktop, every phone.
[12:54] <coyo> where are you?
[12:54] <joao> Lisbon, Portugal
[12:55] <coyo> oh
[12:55] <coyo> huh.
[12:55] <coyo> portugal is fairly large, is it just in your city or country-wide?
[12:55] <joao> country-wide; some country areas don't get fiber yet though
[12:55] <coyo> huh
[12:56] <coyo> lucky you
[12:56] <coyo> most of the planet is lucky to even get 6 megabit data forced through crappy telephone wire
[12:56] <joao> some don't even get much more than plain ADSL, but to be fair, you must be living either in the middle of the woods or in a place with just a dozen people
[12:56] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[12:57] <coyo> pairs of thin copper wire twisted around each other in a cheap attempt to cancel out noise
[12:57] <coyo> i live in the middle of one of the largest metropolises in one of the largest countries in the world, only russia, china, and canada beats us, iirc
[12:57] * ScOut3R_ (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[12:58] <coyo> then again, my knowledge of geography sucks
[12:58] <coyo> i live in dallas, texas, united states of america. we have some of the largest and most modern datacenters in the country.
[12:58] <coyo> and most of the residents here are charged over 300 usd for crappy unreliable connections
[12:59] <coyo> not because we dont have the capacity, but because the internet service providers are greedy, myopic and power-hungry
[12:59] <coyo> and they strangle any attempts to increase capacity to normal everyday home users
[13:00] <coyo> i am forced to start my own isp to fight this
[13:00] <joao> last time I talked about this with the guys from DH and inktank, the thing in the US appeared to be just too much crap from ISPs in sharing fiber connections or something (not that I understood most of the explanation)
[13:00] <coyo> well, that's not my fault
[13:00] <coyo> that's the isps' incompetence
[13:00] <coyo> which works in my favor, really
[13:00] <joao> yeah, I suppose :)
[13:00] <coyo> you have any idea just how much raw capacity is just sitting there in the ground, already buried and ready, and is simply being ignored?
[13:01] <coyo> it's disgusting.
[13:01] <coyo> thankfully, much of that is for sale
[13:01] <coyo> it's called 'dark fiber' and i intend on buying a very large amount of it
[13:02] <wogri_risc> you only need one, and a lot of DWDM equipment
[13:02] <joao> to be honest, the only thing the ISPs keep on being greedy around here is on 3G data service, which is ridiculously expensive compared to the home packages (and most of the home ISPs are also mobile ISPs)
[13:02] <coyo> so long as it's on the same continent, with this dark fiber and modern carrier equipment such as dwdm road multiplexers, trill fabric switches, and core routers for peering with other carriers
[13:03] <coyo> i could easily get latency within my network well below 6ms
[13:03] <coyo> joao: i cant imagine what they'll charge for lte-advanced
[13:03] <coyo> maybe your first-born child?
[13:03] <coyo> ;)
[13:04] <joao> oh, I mean 3G and whatever comes after that
[13:04] <joao> they don't really differentiate that much
[13:04] <joao> afaict at least
[13:04] <coyo> lte is called '4g' but that's a blatant lie
[13:04] <coyo> i call it fauxg
[13:04] <coyo> real 4g is lte-advanced or wimax2
[13:04] <coyo> those are gigabit wwan protocols
[13:05] <joao> I'll just mostly ignore whatever their mobile data packages are for as long as they charge me 30 Eur for a 1GB data plan
[13:05] <coyo> theoretical maximum of 1,000 megabits up, 100 megabits down, and that's without offloading to special wifi access points with 802.11u enabled, microcells and picocells operated by malls, schools, storefronts or libraries
[13:06] <coyo> what the hell
[13:06] <coyo> you mean they hardcap you at 1gb transfer?
[13:06] <joao> considering I have a "unlimited" data, 100Mbps down/10Mbps up fiber connection at home for just 40Eur
[13:06] <coyo> i've seen people here in texas murder for less
[13:06] <joao> coyo, for 30 Eur, yeah
[13:07] <coyo> i'm surprised the ceos and executive teams for those companies have not been murdered by enraged subscribers already
[13:07] <joao> if you're on a monthly data plan, you'll probably get away with "unlimited" data transfer for the same price though
[13:07] <coyo> 100mbs down? that's decent, i guess
[13:08] <joao> err, s/monthly data plan/two-year subscription/
[13:08] <coyo> i'm guessing that's passive optical
[13:08] <wogri_risc> we switch to austria, the cell-phone-eldorado: 1gb transfer for 4 euros per month, unlimited for about 8e/month
[13:08] <coyo> lol
[13:08] <wogri_risc> the 4 euro thing is even without contract or subscription
[13:08] <coyo> austria, where you do not connect to the internet, you ARE the internet, and others connect to you
[13:08] <joao> wogri, I'm paying 5 Eur/mo for 150MB
[13:08] <wogri_risc> muhahahah
[13:08] <coyo> in soviet russia, you do not connect to the internet
[13:09] <coyo> the internet connects to YOU
[13:09] <coyo> :D
[13:09] <joao> well, I gotta run
[13:09] <wogri_risc> servus
[13:10] <joao> lunch; be back soon
[13:11] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[13:12] * BillK (~BillK@124-169-250-222.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[13:15] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[13:20] * BillK (~BillK@124-169-188-80.dyn.iinet.net.au) has joined #ceph
[13:20] <trond> Does anyone know if running osd's and rbd kernel client on the same host is still an issue? (0.56.4 + Kernel 3.8.8 )
[13:23] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[13:24] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:28] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:30] <wogri_risc> trond: I hope not, I have a machine in production with exactly that.
[13:31] <wogri_risc> sorry, no rbd kernel client
[13:31] <wogri_risc> but qemu-librbd binding
[13:31] * rekby (~Adium@2.93.58.253) has joined #ceph
[13:32] <trond> Okay, I was wondering since we're having issues with our setup. The entire ceph cluster freezes occasionally.
[13:45] * Morg (b2f95a11@ircip2.mibbit.com) has joined #ceph
[13:46] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[13:49] * BillK (~BillK@124-169-188-80.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[13:57] * BillK (~BillK@124-169-230-23.dyn.iinet.net.au) has joined #ceph
[13:58] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[13:59] * rustam (~rustam@94.15.91.30) has joined #ceph
[14:02] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[14:04] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[14:08] * BillK (~BillK@124-169-230-23.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[14:17] * BillK (~BillK@124-169-120-115.dyn.iinet.net.au) has joined #ceph
[14:17] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[14:18] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[14:25] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:25] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: No route to host)
[14:25] * Q310 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[14:26] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[14:26] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[14:28] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Read error: Operation timed out)
[14:29] * Q310 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: No route to host)
[14:31] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[14:32] * calebamiles (~caleb@c-50-138-218-203.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[14:40] * rekby1 (~Adium@2.93.58.253) has joined #ceph
[14:40] * rekby (~Adium@2.93.58.253) Quit (Read error: Connection reset by peer)
[14:44] <jefferai> jerker: yeah
[15:00] * rekby1 (~Adium@2.93.58.253) Quit (Quit: Leaving.)
[15:06] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Remote host closed the connection)
[15:13] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[15:15] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[15:15] * sivanov (~sivanov@gw2.maxtelecom.bg) Quit (Quit: Leaving)
[15:16] * rekby (~Adium@2.93.58.253) has joined #ceph
[15:22] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[15:34] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[15:35] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[15:38] * RH-fred (~fred@95.130.8.50) has joined #ceph
[15:38] <RH-fred> Hi !
[15:40] * Morg (b2f95a11@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[15:40] <imjustmatthew> RH-fred: Hola!
[15:40] <RH-fred> I experience poor performance on a ceph test cluster : A huge amount of iowait and write speeds less than 20Mbit/s on a cluster with 4 OSDs and a Gbit Network
[15:41] <RH-fred> the 4 OSD are on 2 separate hosts and are running XFS
[15:41] <RH-fred> Is there a way to find the bottleneck ?
[15:41] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[15:43] <mattch> RH-fred: You can test each constituent component with things like iperf, hdparm, rados bench etc
[15:44] <RH-fred> mattch : Ok I will do this
[15:44] <RH-fred> Disks are already tested : ~ 120MB/s write speed
[15:45] <mattch> RH-fred: 20Mbit/s does seem quite slow though... I get 20MB/s running on test kit based on old desktop boxes with no special tweaking.
[15:45] <imjustmatthew> RH-fred: Dumb question, are you using a journal disk?
[15:47] <mattch> RH-fred: I'd be interested to know the numbers from 'ceph osd tell $id bench' (http://ceph.com/w/index.php?title=Benchmark)
[15:51] * thorus (~jonas@pf01.intranet.centron.de) has joined #ceph
[15:52] <RH-fred> imjustmatthew : No, is this recommended for better performance ?
[15:53] <thorus> is it possible to convert a vmdk to rbd image?
[15:54] <janos> thorus: not sure. though i think qemu-img convert can operate on vmdk's
[15:54] <janos> i've investigated that
[15:55] <thorus> janos: thanks:)
[15:55] <janos> np
[15:58] <imjustmatthew> RH-fred: yes, but you might want to try what mattch suggested using the benchmark tool to see what that looks like before throwing hardware at the problem
[16:01] <mattch> thorus: 'rbd convert' can read in from a variety of sources and write out to an rbd image
[16:02] <mattch> thorus: Sorry - 'rbd import' not convert
[16:02] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:05] <thorus> mattch: thanks will try that
[16:06] <mattch> thorus: Depending on the format, you might need to go through qemu-img first though as janos suggests
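A minimal sketch of doing the import through the python rbd bindings instead of the CLI, assuming the vmdk has already been converted to a raw file (e.g. with qemu-img convert); pool, image and file names are made up:

    # Sketch: importing an already-converted raw disk file into a new RBD
    # image via the python rbd bindings (the CLI equivalent is "rbd import").
    # Pool, image and file names are assumptions.
    import os
    import rados
    import rbd

    SRC = '/tmp/disk.raw'                         # hypothetical raw file
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')         # assumed pool name
        try:
            size = os.path.getsize(SRC)
            rbd.RBD().create(ioctx, 'imported-disk', size)
            image = rbd.Image(ioctx, 'imported-disk')
            try:
                with open(SRC, 'rb') as f:
                    offset = 0
                    while True:
                        chunk = f.read(4 * 1024 * 1024)   # 4 MB chunks
                        if not chunk:
                            break
                        image.write(chunk, offset)
                        offset += len(chunk)
            finally:
                image.close()
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()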
[16:06] * drokita (~drokita@199.255.228.128) has joined #ceph
[16:06] <tnt> wido: ping
[16:07] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[16:10] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Say What?)
[16:11] <Kdecherf> gregaf1: /b 8
[16:11] <Kdecherf> hm, fail.
[16:11] <Kdecherf> hi all
[16:13] <Kdecherf> gregaf1: do you have any news about our cephfs latency issue?
[16:17] <RH-fred> So my tests are done
[16:18] <RH-fred> Iperf : [ 4] 0.0-10.0 sec 862 MBytes 723 Mbits/sec
[16:18] <RH-fred> hdparm
[16:18] <RH-fred> Timing cached reads: 1062 MB in 2.00 seconds = 530.35 MB/sec
[16:18] <RH-fred> Timing buffered disk reads: 300 MB in 3.02 seconds = 99.42 MB/sec
[16:18] <RH-fred> (almost the same on all disks)
[16:18] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[16:18] <RH-fred> root@ceph1:~# dd if=/dev/zero of=ddfile bs=1M count=1024
[16:19] <RH-fred> 1073741824 bytes (1.1 GB) copied, 10.3512 s, 104 MB/s
[16:19] <RH-fred> osd bench :
[16:19] <RH-fred> osd.0 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 32.517473 sec at 32246 KB/sec
[16:20] <RH-fred> osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 29.628486 sec at 35390KB/sec
[16:20] <RH-fred> osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 31.790094 sec at 32984KB/s
[16:20] <RH-fred> So, I do not need to make a RADOS test which will be slower than that...
[16:21] <RH-fred> why are my OSDs writing so slowly ?
[16:22] <mattch> RH-fred: Can you confirm you're using the latest ceph version, and paste your config somewhere (e.g. pastebin)
[16:22] <RH-fred> ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
[16:23] <RH-fred> http://pastebin.com/WyncjYk0
[16:27] <mattch> RH-fred: How are the OSDs set up? Are they all on the os disk? Or do you have 2 separate disks mounted at /var/lib/ceph/$cluster-$id ?
[16:31] <RH-fred> Osd.0 and osd.2 are on a system disk partition
[16:32] <RH-fred> but the others are dedicated
[16:32] <RH-fred> and OSD1 which is dedicated isn't working faster according to the previous bench
[16:33] <mattch> RH-fred: Can you pastebin the output of 'ceph osd dump' too?
[16:34] * Ramonskie (ab15507e@ircip2.mibbit.com) has joined #ceph
[16:35] <RH-fred> http://pastebin.com/U0vY8zeH
[16:35] <Ramonskie> anyone here successfully deployed ceph with crowbar-barclamp?
[16:40] <mattch> RH-fred: I noticed that one osd is down-out - does 'ceph status' show any problems from this?
[16:43] <RH-fred> no
[16:43] <RH-fred> I did a lot of data integrity tests
[16:44] <RH-fred> I noticed no performance difference with 4 or 3 disks
[16:44] <RH-fred> root@ceph1:~# ceph status
[16:44] <RH-fred> health HEALTH_OK
[16:44] <RH-fred> osdmap e942: 4 osds: 3 up, 3 in
[16:44] * rickstok (d4b24efa@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[16:44] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[16:45] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[16:45] <mattch> RH-fred: I'm not seeing any obvious problems anywhere here I'm afraid... but hopefully someone else will be able to comment on the above - sorry I couldn't be of more use
[16:47] <RH-fred> just one thing mattch :
[16:48] <RH-fred> I noticed that there are a lot of osd processes running on my servers : http://img.hilpert.me/images/capturedu2.png
[16:49] <RH-fred> maybe they all do I/O together? and therefore my performance is poor?
[16:49] <mattch> RH-fred: I have of course just scanned it all again, and noticed you're seeing around 35MB/sec write speeds... which if your read speed from hdparm is around 99MB/sec might not be that far off what you'd expect...
[16:50] <mattch> Rh-fred: I'd run something like 'ps -efH |less' to see if they're child processes, or separate ones - though I have a suspicion there should only be one for each id
[16:51] <wido> tnt: pong
[16:51] <RH-fred> mattch : ok but with dd i had more speed :
[16:51] <RH-fred> RH-fred> root@ceph1:~# dd if=/dev/zero of=ddfile bs=1M count=1024
[16:51] <RH-fred> <RH-fred> 1073741824 bytes (1.1 GB) copied, 10.3512 s, 104 MB/s
[16:51] <RH-fred> 104MB/s ...
[16:51] * scuttlemonkey (~scuttlemo@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[16:51] * ChanServ sets mode +o scuttlemonkey
[16:52] <mattch> Rh-fred: That'll be in part writing to the pagecache - try using direct io or similar and try again
[16:52] <RH-fred> There is a huge difference between 104MB/s on one disk and 20 or 30MB/s on 4 disks...
[16:52] <tnt> wido: I saw a post by you I think about a blktap driver for RBD.
[16:52] <RH-fred> mattch : ok
[16:53] <wido> tnt: Don't know? I just know some people at Citrix are looking at it, or working on it
[16:53] <tnt> wido: I actually just started coding that this week and got a working prototype.
[16:53] <mattch> RH-fred: Add 'oflag=direct' to the dd command I think
[16:53] <wido> tnt: That is awesome! If you have something to share, let me know, I'll ping people inside Citrix
[16:53] <tnt> wido: I'll put the repo on github and send you a link.
[16:54] <wido> tnt: great, would be very cool to see RBD in blktap
[16:54] <wido> blktap or blktap2?
[16:54] <tnt> blktap2
[16:54] <RH-fred> dd if=/dev/zero of=ddfile bs=4M count=1024 oflag=direct
[16:54] <RH-fred> 4294967296 bytes (4.3 GB) copied, 48.8911 s, 87.8 MB/s
[16:55] <tnt> (I think at least ...) I used git://github.com/xen-org/blktap.git as a base.
[16:56] <tnt> I mostly coded it because I have OSDs running as domUs on the same dom0 where I want to put RBD clients, and I think this was causing issues.
[16:57] <mattch> Rh-fred: Next step is probably to restart each osd one by one, killing off any spurious processes that remain
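One way to do what mattch describes, sketched with osd.0 as an example id; the service integration varies by distro, so treat the restart line as an assumption:

    ps -efH | grep [c]eph-osd     # each osd id should appear once, with its threads nested under it
    service ceph restart osd.0    # sysvinit-style restart of a single osd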
[16:57] <joao> Ramonskie, I think gregaf1 worked with crowbar some time ago
[16:57] <joao> I recall him complaining about it a few times at least
[16:57] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[16:58] <Ramonskie> joao: any idea how i can contact him
[16:59] <Ramonskie> i can find some of the complaining with google but not much more
[17:00] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[17:02] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[17:02] <joao> Ramonskie, he should be around soon
[17:02] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Read error: Connection reset by peer)
[17:03] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[17:03] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[17:03] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[17:03] <Ramonskie> joao: thanks will keep a eye out here.
[17:03] <tnt> wido: https://github.com/smunaut/blktap/tree/rbd
[17:05] <mattch> RH-fred: Let us know if the ceph osd bench numbers improve with only one ceph-osd instance for each osd :)
[17:08] * rustam (~rustam@94.15.91.30) has joined #ceph
[17:08] * sleinen1 (~Adium@user-23-16.vpn.switch.ch) Quit (Quit: Leaving.)
[17:08] * sleinen (~Adium@130.59.94.40) has joined #ceph
[17:10] <RH-fred> It came back again with one process but more threads
[17:10] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[17:10] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[17:11] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[17:11] <RH-fred> It came back with one process but multiple threads again
[17:12] <RH-fred> I was thinking about making other tests with btrfs
[17:12] <RH-fred> what do you think about this idea ?
[17:13] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[17:13] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit ()
[17:16] <janos> RH-fred: my main performance concern is the multiple OSDs defined on one physical disk
[17:16] <janos> that's what mattch was hinting at
[17:16] <RH-fred> Janos: In my cluster there is only one OSD per disk....
[17:16] * scuttlemonkey (~scuttlemo@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[17:16] <janos> ah i thought you said earlier there were two on partitions of a disk
[17:16] <janos> a single disk
[17:16] <janos> my apologies
[17:16] <RH-fred> yes
[17:16] * sleinen (~Adium@130.59.94.40) Quit (Ping timeout: 480 seconds)
[17:17] <RH-fred> a system partition and a big partition for an OSD
[17:17] <RH-fred> but I have dedicated disks too and there is no performance improvement on those OSDs
[17:17] <janos> still somewhat of a concern
[17:18] <janos> yeah i'm not experienced enough on that front to know why the overall perf is hit so hard
[17:18] <janos> i'll go back to lurking ;)
[17:19] <mattch> RH-fred: for info, ideally you want your osd on its own disk, separate from the os, else os load will cause osd slow-down. Then (in order of preference) a journal on: own ssd, shared ssd, own disk, partition on osd disk, file on osd disk.
[17:19] <RH-fred> Yes I understand that
[17:19] <RH-fred> this is only a test lab
[17:20] * gmason (~gmason@12.139.57.253) has joined #ceph
[17:20] <RH-fred> before putting ceph on production
[17:20] <RH-fred> and I have a dedicated disk for osd.2
[17:20] <RH-fred> but
[17:20] <RH-fred> osd.2 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 29.540981 sec at 35495 KB/sec
[17:21] <RH-fred> it isn't better :(
[17:21] <mattch> RH-fred: No, not now, but under high load you'd probably see a difference
[17:21] <RH-fred> Yes
[17:21] <RH-fred> the journals will be separated on a production cluster don't worry
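For illustration only, a hypothetical ceph.conf fragment that puts one osd's journal on a dedicated partition, following the preference order mattch listed (the device path is made up):

    [osd.2]
        host = ceph1
        osd journal = /dev/sdc1    ; dedicated ssd/disk partition used as the journal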
[17:22] <mattch> RH-fred: To be honest, if your 'best case write speed' with dd is 90MB/s and with ceph osd bench you're seeing 35MB/s, you probably haven't got any fundamental issues - just the potential to start tweaking things
[17:22] <RH-fred> yes ! :)
[17:22] <RH-fred> what parameter should I tweak ?
[17:22] <RH-fred> I didn't find anything interesting in the doc :(
[17:23] * dxd828_ (~dxd828@195.191.107.205) has joined #ceph
[17:24] <mattch> RH-fred: So, for a start, change what you're mounting the fs with to turn off things like atime etc. Turn on journal aio too. Also, try using xfs instead of ext4 and see if that makes a difference. Then have a read of http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/ and see what tweaks might apply to your end use-case
[17:25] <mattch> RH-fred: There are other things that can help, like changing the elevator between cfq and deadline, but a lot of these are dependent on the underlying os and local variables
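Hedged, illustrative versions of those tweaks; the device name, mount point and ceph.conf placement are assumptions:

    mount -o remount,noatime /var/lib/ceph/osd/ceph-1   # drop atime updates on the osd filesystem
    echo deadline > /sys/block/sdb/queue/scheduler      # switch the elevator from cfq to deadline
    # and in ceph.conf, under [osd]:
    #   journal aio = true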
[17:25] * dxd828_ (~dxd828@195.191.107.205) Quit ()
[17:26] <janos> does anyone have any idea if those osd benches clean up after themselves?
[17:27] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:27] <mattch> janos: Not sure... the rados bench does, so you have to explicitly tell it not to in some cases to test e.g. read speed on an empty pool
[17:27] * loicd (~loic@67.23.204.150) has joined #ceph
[17:28] <janos> i haven't run all those tests. itching to now ;)
[17:30] * Ramonskie (ab15507e@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[17:30] <mattch> janos: http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/ is a good walkthrough performance on different hardware setups, which also includes some benchmarks and the reasoning behind them
[17:31] <mattch> janos: given how often people are advised to run the osd benches here/on forums etc, I don't think there are any negative impacts from running it (other than hammering your pool while running)
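For reference, a sketch of the cleanup behaviour mattch mentions, using a throwaway pool name ("test") as an assumption:

    rados bench -p test 60 write --no-cleanup   # write for 60s and keep the objects around...
    rados bench -p test 60 seq                  # ...so the sequential-read bench has data to read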
[17:32] <janos> i've done all the hardware and network tests, just haven't really done much rados bench, ceph bench specifically
[17:33] <janos> i'm definitely not getting full potential right now, so i'll test when i can
[17:33] <janos> other priorities sadly
[17:33] <janos> ok, i'll admit i don't use bonnie++ much because the output makes my eyes cross
[17:34] <mattch> janos: have a look at phoronix - it launches lots of different tools with the 'same' test parameters and summarises them all for you - no need to learn individual tools' commands or options etc
[17:35] * janos looks
[17:36] <janos> neat
[17:36] * loicd (~loic@67.23.204.150) Quit (Read error: Connection reset by peer)
[17:36] * loicd (~loic@67.23.204.150) has joined #ceph
[17:36] <janos> and lookee there, found via yum as well
[17:37] * janos feels like a moronix
[17:37] <janos> thanks, mattch
[17:38] <mattch> janos: No problem - only discovered it a few weeks back - been trying to read the bonnie man page until then... still none the wiser :)
[17:38] <janos> yeah, bonnie and my brain just don't jive very well yet
[17:39] * loicd1 (~loic@67.23.204.150) has joined #ceph
[17:39] * loicd (~loic@67.23.204.150) Quit (Read error: Connection reset by peer)
[17:44] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:45] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:45] * sleinen (~Adium@2001:620:0:26:b1a7:11bd:7fe1:63c8) has joined #ceph
[17:52] * loicd1 (~loic@67.23.204.150) Quit (Quit: Leaving.)
[17:55] * tnt (~tnt@91.177.247.88) has joined #ceph
[17:55] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[17:56] * ScOut3R__ (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[17:58] * loicd (~loic@67.23.204.150) has joined #ceph
[17:58] <pioto> cephfs is just sooooo much slower than rbd, orders and orders of magnitude.... is there maybe some tunable or cluster setup thing that i'm doing wrong? or is that just how it is for now?
[17:59] <darkfaded> i fail at backlog-scanning. did anyone reply about whether OSD + /dev/rbd on the same box is still considered evil?
[18:00] <mattch> darkfaded: I can't confirm (ENOTADEV), but I believe it still is... you could potentially block the osd device and lock up the pool
[18:04] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[18:04] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:05] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:05] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:06] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:06] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:07] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:07] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:07] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:07] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:08] * BillK (~BillK@124-169-120-115.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:08] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:08] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:09] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:09] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:10] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:10] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:10] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[18:10] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:11] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:12] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:12] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:13] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:13] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:13] * Cube (~Cube@12.248.40.138) has joined #ceph
[18:13] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:13] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:14] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:14] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:15] * RH-fred (~fred@95.130.8.50) Quit (Quit: Quitte)
[18:15] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:15] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:16] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:16] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:16] * dosaboy (~dosaboy@67.23.204.150) has joined #ceph
[18:17] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:17] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:18] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:18] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:19] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:19] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:20] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:20] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:20] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:20] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:21] <gregaf1> why are we talking about crowbar, joao?
[18:21] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:21] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:21] <joao> <Ramonskie> anyone here successfully deployed ceph with crowbar-barclamp?
[18:21] <joao> gregaf1, ^
[18:22] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:22] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:23] <gregaf1> ah
[18:23] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:23] <gregaf1> well, yes, but I think we've addressed it on the mailing list now
[18:23] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:24] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:24] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:24] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:25] <joao> gregaf1, I see that now :) thanks
[18:25] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:25] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[18:25] * loicd (~loic@67.23.204.150) has joined #ceph
[18:25] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:26] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:26] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:26] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:27] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:27] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:28] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:28] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:29] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:29] * l0nk (~alex@83.167.43.235) Quit (Quit: Leaving.)
[18:29] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:29] * rustam (~rustam@94.15.91.30) has joined #ceph
[18:30] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:30] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:31] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:31] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:32] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[18:32] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:32] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:33] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[18:33] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:33] * alram (~alram@38.122.20.226) has joined #ceph
[18:33] * jskinner_ (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[18:33] * ScOut3R (~ScOut3R@1F2EBE60.dsl.pool.telekom.hu) has joined #ceph
[18:33] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[18:38] * loicd1 (~loic@67.23.204.150) has joined #ceph
[18:38] * loicd (~loic@67.23.204.150) Quit (Read error: Connection reset by peer)
[18:41] * sleinen1 (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[18:42] * jskinner (~jskinner@69.170.148.179) Quit (Ping timeout: 480 seconds)
[18:45] * loicd1 (~loic@67.23.204.150) Quit (Quit: Leaving.)
[18:47] * sleinen (~Adium@2001:620:0:26:b1a7:11bd:7fe1:63c8) Quit (Ping timeout: 480 seconds)
[18:49] * sleinen1 (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[18:52] * infernix (nix@cl-1404.ams-04.nl.sixxs.net) Quit (Quit: ZNC - http://znc.sourceforge.net)
[18:54] * Svedrin (svedrin@2a01:4f8:121:3a8:0:1:0:2) Quit (Read error: No route to host)
[18:55] * infernix (nix@cl-1404.ams-04.nl.sixxs.net) has joined #ceph
[19:01] * loicd (~loic@67.23.204.150) has joined #ceph
[19:01] * gmason (~gmason@12.139.57.253) Quit (Quit: Computer has gone to sleep.)
[19:02] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[19:04] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[19:05] * loicd (~loic@67.23.204.150) Quit ()
[19:11] * ScOut3R (~ScOut3R@1F2EBE60.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[19:12] * jskinner (~jskinner@69.170.148.179) Quit (Ping timeout: 480 seconds)
[19:16] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:17] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:19] * loicd (~loic@67.23.204.150) has joined #ceph
[19:19] * loicd (~loic@67.23.204.150) Quit ()
[19:23] * leseb (~Adium@67.23.204.150) has joined #ceph
[19:27] * leseb (~Adium@67.23.204.150) Quit ()
[19:28] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[19:28] * leseb (~Adium@67.23.204.150) has joined #ceph
[19:29] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: The early bird may get the worm, but the second mouse gets the cheese)
[19:30] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) has joined #ceph
[19:32] * leseb (~Adium@67.23.204.150) Quit ()
[19:32] * Svedrin (svedrin@ketos.funzt-halt.net) has joined #ceph
[19:35] * gmason (~gmason@12.139.57.253) has joined #ceph
[19:36] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[19:36] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[19:42] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[19:49] * xmltok (~xmltok@pool101.bizrate.com) Quit (Read error: Connection reset by peer)
[19:50] * portante (~user@67.23.204.150) has joined #ceph
[19:54] * portante (~user@67.23.204.150) Quit (Read error: Operation timed out)
[19:55] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[19:55] * calebamiles (~caleb@c-50-138-218-203.hsd1.vt.comcast.net) has joined #ceph
[19:56] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[19:56] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[19:59] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[20:02] * rustam (~rustam@94.15.91.30) has joined #ceph
[20:03] <sjusthm> paravoid: are you there?
[20:03] <paravoid> I am
[20:03] <sjusthm> paravoid: if you do a normal scrub, on one of those pgs, what happens?
[20:03] <sjusthm> paravoid: oh, and did you get a chance to generate logs?
[20:04] <paravoid> I didn't yet
[20:04] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[20:04] <paravoid> so, deep scrubbing is done, it went up to 9 inconsistent and it's now down to 7, without me running a repair
[20:05] <sjusthm> mm, that suggests a problem with scrub
[20:05] <paravoid> I re-run deep scrub on all 7 of them and they remained inconsistent
[20:05] <sjusthm> yeah
[20:05] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) Quit (Quit: Leaving.)
[20:05] <paravoid> so what are these omaps used for?
[20:05] <sjusthm> probably bucket indices
[20:05] <sjusthm> what pool are they in?
[20:07] <paravoid> .rgw.buckets
[20:07] <sjusthm> yeah, I think that's the bucket index pool
[20:08] <paravoid> what would that be?
[20:08] * LeaChim (~LeaChim@176.250.202.138) Quit (Ping timeout: 480 seconds)
[20:08] <paravoid> bucket listings?
[20:08] <sjusthm> yeah
[20:08] <sjusthm> yeah, rescrubbing with logging is the next step
[20:09] * portante (~user@67.23.204.150) has joined #ceph
[20:09] <paravoid> so isn't it possible that those 3 pgs were buckets that were updated with new files
[20:09] <sjusthm> it is
[20:09] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[20:09] * rustam (~rustam@94.15.91.30) has joined #ceph
[20:09] <paravoid> and that a new omap was written to all three replicas?
[20:09] * lyncos (~chatzilla@208.71.184.41) has joined #ceph
[20:10] <sjusthm> it's possible, but it won't overwrite the entire omap
[20:10] <paravoid> what would be the real world consequence of these inconsistencies btw?
[20:10] <paravoid> listing containers and missing files?
[20:10] <paravoid> or stuck requests?
[20:10] <sjusthm> a bucket might lose an object
[20:10] <sjusthm> or gain one I guess
[20:10] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[20:10] <sjusthm> possibly other things, I'd have to ask yehuda
[20:10] <sjusthm> at any rate, it's not good
[20:11] * loicd (~loic@67.23.204.150) has joined #ceph
[20:11] <paravoid> yeah, I'm just trying to realize how bad is it :)
[20:11] <lyncos> Hi Everyone.. I need help to troubleshoot my ceph cluster... I did break it (it's a test cluster) but I would like to bring it back to a functional state.. I don't care about losing objects... now my mds is not working and I don't know why ... See the status http://pastebin.com/KeFPhP3P
[20:11] * leseb (~Adium@67.23.204.150) has joined #ceph
[20:13] <lyncos> Any idea ?
[20:14] <lyncos> How can I flush all these stuck stale entries ?
[20:14] <dmick> are all osds up and healthy?
[20:15] <dmick> what's the replication level on all the pools (apparently you've added several)?
[20:15] <lyncos> let me check
[20:16] <lyncos> osdmap e565: 6 osds: 5 up, 5 in
[20:16] <dmick> ceph osd dump will tell you both those
[20:16] <lyncos> ah thanks
[20:16] <dmick> what's wrong with the sixth osd
[20:16] <lyncos> let me check
[20:16] <lyncos> http://pastebin.com/khkPSMkz
[20:17] <lyncos> i'm checking the osd.5
[20:18] <lyncos> one network interface is down :-)
[20:18] <paravoid> sjusthm: ceph osd tell NNN injectargs '--debug-filestore 20 --debug-osd 20 --debug-ms 1'
[20:18] <paravoid> correct?
[20:18] <sjusthm> yeah
[20:19] * LeaChim (~LeaChim@176.250.150.147) has joined #ceph
[20:19] <paravoid> oh god, it's very noisy
[20:19] <paravoid> even without the deep scrub
[20:19] <sjusthm> yeah
[20:19] <sjusthm> it is
[20:20] * rekby (~Adium@2.93.58.253) Quit (Quit: Leaving.)
[20:20] <lyncos> Ok now osd.5 is up
[20:20] <paravoid> and to disable? 0/0/0?
[20:21] <lyncos> dmick http://pastebin.com/9QVJW1tt
[20:21] <sjusthm> yeah
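Spelled out, the toggling being discussed (NNN is the osd id placeholder from above; 0/0 is a common quiet setting, though the stock defaults differ per subsystem):

    # raise logging for the scrub...
    ceph osd tell NNN injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
    # ...and drop it back down afterwards
    ceph osd tell NNN injectargs '--debug-osd 0/0 --debug-filestore 0/0 --debug-ms 0/0'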
[20:22] * rustam (~rustam@94.15.91.30) has joined #ceph
[20:23] <lyncos> is it possible to reset the whole thing ?
[20:23] <dmick> lyncos: ok, so all pools are size 2, so probably all the pgs that have problems involve that one down OSD
[20:24] <lyncos> dmick the osd is up now
[20:24] <dmick> involve*d*. Is health coming back now?
[20:24] <lyncos> nope, exact same thing
[20:24] <dmick> have you been changing the crush map?
[20:25] <lyncos> not really .. i'm just starting with ceph
[20:25] <lyncos> maybe I done it without knowing it
[20:25] <dmick> that doesn't mean you haven't been in there, it just rules out some problems. No, you'd know.
[20:25] <dmick> you would have issued commands including "crush"
[20:25] <lyncos> ok no I did not
[20:25] <dmick> how about ceph osd tree, and if that looks symmetric, pastebin ceph pg dump
[20:27] <lyncos> http://pastebin.com/9mqjgfWn
[20:27] <darkfaded> §
[20:27] <lyncos> Are you really able to read this info? or are you using a tool to 'parse' it
[20:28] * madalynn (5b09c468@ircip2.mibbit.com) has joined #ceph
[20:29] <dmick> there are things to look for
[20:29] <lyncos> I guess I'll learn using ceph
[20:30] <dmick> so all the pgs with problems are showing active set of osd 2 only
[20:30] <dmick> except 1.26, which is recovering, back onto 5, it looks like
[20:30] <dmick> so something is happening
[20:31] <lyncos> yeah I did do some bad things on the osds ... like trashing the osd.X folder content
[20:31] <lyncos> I know it's bad
[20:31] <lyncos> but I did it anyway
[20:31] <dmick> well, that would have been useful information to know ...
[20:31] <lyncos> I'm sorry
[20:31] <lyncos> I thought I told you
[20:31] <paravoid> sjusthm: so, it's done. ~400M of logs
[20:32] <paravoid> sjusthm: anything in particular you'd like me to grep for :)
[20:32] <sjusthm> cool
[20:32] <lyncos> dmick is there any way to just flush these stuff ?
[20:32] <sjusthm> I'm going to trace the scrub process on both replicas
[20:32] <dmick> lyncos: well you can always destroy the cluster, sure
[20:32] <sjusthm> can you post the logs to cephdrop?
[20:32] <lyncos> dmick is there any way to do it quickly ?
[20:32] <sjusthm> it's the parts around '_scan_list scanning.*'
[20:32] <sjusthm> but I'd like the whole log if possible
[20:32] <lyncos> just shut down all the osds and remove them ?
[20:33] <paravoid> remind me of cephdrop?
[20:33] * leseb (~Adium@67.23.204.150) Quit (Quit: Leaving.)
[20:35] <dmick> lyncos: kill all ceph- daemons, wipe /var/*/ceph, probably enough
[20:35] <dmick> maybe /etc/ceph if you're really starting over
[20:35] <dmick> but starting completely over is overkill
[20:35] <dmick> whatever you want.
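A very rough (and destructive) version of that recipe, for a throwaway test cluster only; service and path details are assumptions that vary by distro:

    service ceph stop                        # stop whatever daemons the init script knows about
    killall ceph-osd ceph-mon ceph-mds       # make sure nothing is left running
    rm -rf /var/lib/ceph/* /var/log/ceph/*   # the "/var/*/ceph" wipe dmick mentions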
[20:36] <lyncos> ok thanks I'll test that in a minute
[20:38] * loicd (~loic@67.23.204.150) Quit (Read error: Connection reset by peer)
[20:40] * coyo (~unf@00017955.user.oftc.net) Quit (Quit: F*ck you, I'm a daemon.)
[20:43] <sjusthm> paravoid: which pg?
[20:43] <paravoid> 2f2
[20:43] <paravoid> er
[20:43] <paravoid> 3.2f2
[20:46] * eschnou (~eschnou@42.165-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:51] <sjusthm> paravoid: I apologize
[20:51] <sjusthm> I need debug osd = 30
[20:51] <paravoid> hah
[20:52] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[20:52] <sjusthm> sorry, apparently the part where _scan_list prints out the omap entries doesn't happen at 20
[20:54] * pioto_ (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) has joined #ceph
[20:55] * dpippenger (~riven@216.103.134.250) has joined #ceph
[20:58] * rustam (~rustam@94.15.91.30) has joined #ceph
[20:59] * pioto (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[21:01] * loicd (~loic@67.23.204.150) has joined #ceph
[21:02] * leseb (~Adium@67.23.204.150) has joined #ceph
[21:06] * gmason (~gmason@12.139.57.253) Quit (Quit: Computer has gone to sleep.)
[21:08] <paravoid> sjusthm: I messed up and reran it with 20; now I'm waiting for it to finish so I can start a clean one with 30
[21:08] <paravoid> sjusthm: it'd be nice to be able to abort a scrub
[21:14] <madalynn> what is the reason to abort a scrub?
[21:16] <paravoid> troubleshooting
[21:19] <madalynn> ah ok
[21:22] * leseb (~Adium@67.23.204.150) Quit (Quit: Leaving.)
[21:25] * portante (~user@67.23.204.150) Quit (Ping timeout: 480 seconds)
[21:32] * dosaboy (~dosaboy@67.23.204.150) Quit (Quit: leaving)
[21:34] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[21:41] <madalynn> Does anybody know whether pg splitting and pg merging will be supported by cuttlefish?
[21:42] <dmick> madalynn: they're actually in the source now experimentally; I'm not certain how testing is going
[21:43] <sjusthm> merging is not implemented
[21:43] <dmick> ah
[21:45] <madalynn> i've created a cluster in the past using argonaut and set the PGs way too high (8192 for 24 osds with replication 3) to be able to upgrade. But right now it seems i don't need to upgrade soon.
[21:45] <madalynn> So i wanted to lower the PGs to the correct value
[21:46] * calebamiles (~caleb@c-50-138-218-203.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[21:55] <madalynn> so no way to lower pgs?
[21:58] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[22:02] * gmason (~gmason@12.139.57.253) has joined #ceph
[22:06] <dmick> madalynn: none I'm aware of, other than "create a new pool and copy objects"
[22:06] <dmick> but that doesn't strike me as so many PGs that it'll be a problem
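A sketch of that workaround with illustrative pool names and pg count; rados cppool copies object data into the new pool, it does not shrink anything in place:

    ceph osd pool create data-new 2048     # replacement pool with the lower pg count
    rados cppool data data-new             # copy the objects across
    ceph osd pool rename data data-old     # then swap the names once the copy is verified
    ceph osd pool rename data-new data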
[22:14] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[22:18] <lyncos> dmick I did flush everything but I missed one part about the MDS .. when I start the service I get 'failed to authenticate' .. I tried to give the mds access but I can't find any help in the documentation
[22:20] <dmick> gonna need more details than that
[22:20] <lyncos> ok hold on
[22:21] <lyncos> http://pastebin.com/RbvQASH4
[22:22] <lyncos> I get the mds log when I start the mds service .. (it doesn't start)
[22:25] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:27] <sjusthm> paravoid: osd.133 appears to be missing 3 keys
[22:27] <sjusthm> very odd
[22:27] <sjusthm> out of 570k
[22:27] <sjusthm> **750k
[22:27] <paravoid> keys?
[22:28] <sjusthm> in the omap
[22:28] <sjusthm> sorry, on that particular object
[22:28] <sjusthm> there are 750k keys
[22:29] <sjusthm> but the copy on 133 is missing 3 of them
[22:29] <paravoid> you mean keys in the omap?
[22:30] <sjusthm> yeah
[22:30] <dmick> lyncos: are you using the mds? (are you using cephfs?) One answer is "so what"
[22:30] <dmick> but how did you create the cluster, and thus create the auth for the mds?
[22:30] <paravoid> uh, okay
[22:31] <dmick> I mean, I'm just doing leading questions around "there's an X problem, investigate X"
[22:32] <paravoid> sjusthm: do you want me to find which objects?
[22:32] <sjusthm> paravoid: sorry?
[22:33] <paravoid> er, which keys
[22:33] <sjusthm> the inconsistent object is https://github.com/zfsonlinux/zfs/issues/443'
[22:33] <sjusthm> oops
[22:33] <sjusthm> .dir.10267.612/head//3 is the inconsistent object
[22:33] <sjusthm> and I've got the exact keys here
[22:33] <paravoid> okay
[22:33] <sjusthm> not contiguous
[22:33] <sjusthm> you weren't doing snapshots/
[22:33] <sjusthm> ?
[22:34] <paravoid> I'm not
[22:36] <sjusthm> osd.133 isn't anywhere near full, right?
[22:36] <paravoid> so, not a deep scrub bug
[22:36] <sjusthm> no, it's not
[22:36] <paravoid> 55%
[22:37] <sjusthm> I'm going to guess that you haven't been removing very many objects (by which I mean radosgw objects in this context) from the cluster?
[22:37] <paravoid> removing? a few this week
[22:37] <paravoid> not last week
[22:38] <sjusthm> how many radosgw buckets do you have/
[22:38] <sjusthm> ?
[22:38] * leseb (~Adium@67.23.204.150) has joined #ceph
[22:39] * madalynn (5b09c468@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[22:41] * leseb (~Adium@67.23.204.150) Quit ()
[22:41] <paravoid> (still counting)
[22:41] * leseb (~Adium@67.23.204.220) has joined #ceph
[22:41] <sjusthm> so many
[22:41] <sjusthm> ok
[22:41] <paravoid> no
[22:42] <paravoid> I did a "stat" which also counts files
[22:42] <paravoid> I did a list now too, let's see
[22:42] <sjusthm> ah
[22:42] <lyncos> dmick Yes I want to use cephfs and I'm trying to find the documentation on how to authorize my mds
[22:42] <paravoid> it's waaay slower than swift in this regard btw
[22:42] <paravoid> heh, stat timed out
[22:43] * leseb (~Adium@67.23.204.220) Quit ()
[22:45] * loicd (~loic@67.23.204.150) has joined #ceph
[22:45] <dmick> lyncos: I'll ask again: how did you set up the cluster?
[22:45] <paravoid> hrm
[22:45] <paravoid> "list" takes quite a while too
[22:45] <paravoid> that's worrying
[22:46] <lyncos> dmick : with the ceph puppet module
[22:46] <dmick> oh. no idea how that works
[22:46] <paravoid> sjusthm: something close to 30-40k
[22:47] <sjusthm> ok
[22:47] <lyncos> dmick .. is there any documentation about setting a mds on an existing cluster?
[22:47] <lyncos> especially the authentication ?
[22:47] <dmick> I doubt it, but it's not really unlike adding an OSD
[22:47] <dmick> wrt auth
[22:48] <dmick> at least in a general sense
[22:48] <lyncos> dmick ok.. my osd uses client.admin key
[22:48] * dosaboy (~dosaboy@67.23.204.150) has joined #ceph
[22:48] <dmick> ? no, I doubt it
[22:48] <paravoid> is yehudasa_ around?
[22:48] <lyncos> dmick ok hold on I'll try to figure this out
[22:48] <dmick> client.admin is the key for clients, like, the ceph tool, or the rados tool
[22:48] <lyncos> dmick you want to see my ceph.conf file ?
[22:49] <dmick> http://ceph.com/docs/master/rados/operations/authentication/#daemon-keyrings is probably relevant
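Roughly what that page boils down to for an mds: create a key with mds caps and put it where the daemon expects its keyring. mds.a and the keyring path here are assumptions, not taken from lyncos' setup:

    ceph auth get-or-create mds.a mon 'allow rwx' osd 'allow *' mds 'allow' \
        -o /var/lib/ceph/mds/ceph-a/keyring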
[22:49] * lyncos (~chatzilla@208.71.184.41) Quit (Remote host closed the connection)
[22:56] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:01] <paravoid> sjusthm: anything else I can do?
[23:01] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[23:01] <sjusthm> paravoid: not at the moment
[23:01] <paravoid> ok
[23:01] <sjusthm> paravoid: we are sure this pg was clean a week ago and that there was no cluster recovery in the meantime?
[23:01] <paravoid> yes and yes
[23:02] <sjusthm> kernel/xfs version
[23:02] <sjusthm> ?
[23:02] <paravoid> 2013-04-14 19:19:10.827868 osd.56 10.64.32.10:6857/17945 342 : [INF] 3.2f2 scrub ok
[23:02] <paravoid> 2013-04-16 16:42:40.219647 osd.56 10.64.32.10:6857/17945 409 : [ERR] 3.2f2 osd.133: soid d340c2f2/.dir.10267.612/head//3 omap_digest 4263226353 != known omap_digest 3162369895
[23:02] <sjusthm> scrub ok I think doesn't mean it was a deep scrub
[23:03] <paravoid> hm, you are correct
[23:03] <sjusthm> yeah, we need to confirm it was a deep scrub
[23:03] <paravoid> it wasn't
[23:04] <sjusthm> sorry, we need to confirm that last time it had a clean deep scrub
[23:04] <paravoid> hm
[23:05] <paravoid> I can't find a deep-scrub ok
[23:05] <sjusthm> oh, this is pretty easy to explain if it hadn't been deep scrubbed since the upgrade to 56.4
[23:05] <paravoid> that's strange, I ran a ceph osd deep-scrub for all osds
[23:06] * jskinner_ (~jskinner@69.170.148.179) has joined #ceph
[23:06] * jskinner (~jskinner@69.170.148.179) Quit (Read error: Connection reset by peer)
[23:08] <paravoid> no deep-scrubs in the logs for all of these pgs
[23:09] <sjusthm> ok, I think the answer is to go ahead and repair them
[23:09] * eschnou (~eschnou@42.165-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:09] <sjusthm> and remain vigilant for more examples
[23:09] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[23:09] <sjusthm> if you can keep your central logs, we'll be able to confirm it if it pops up again
[23:10] <paravoid> sorry for the wild goose chase
[23:10] <sjusthm> no problem at all
[23:10] <sjusthm> these kinds of bugs will only ever show up like this
[23:10] <paravoid> I'm 100% sure I ran a for ...; do ceph osd deep-scrub $i; done
[23:10] <paravoid> no idea why those were skipped
[23:10] <sjusthm> I believe you, it's more likely that the pg scrub scheduling was flaky
[23:11] <sjusthm> odd though, I thought bobtail nailed that behavior
[23:11] <sjusthm> **56.4
[23:11] <paravoid> so this omap corruption(?) is a known now fixed bug?
[23:11] <sjusthm> I am pretty sure it's a replay bug
[23:12] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[23:12] <sjusthm> at some point in the past, osd.133 or a previous owner of this object restarted and replayed its journal slightly wrong
[23:12] <sjusthm> we fixed a few examples in 56.2 and 56.3
[23:12] <sjusthm> and 56.4 I think
[23:12] <paravoid> oh, this is the same cluster as the crazy/very old journal replaying
[23:12] <sjusthm> yeah, that's my thinking
[23:13] <sjusthm> for what it's worth, our testing has gotten much much better in that area
[23:13] <paravoid> oh?
[23:13] <paravoid> how come?
[23:13] * leseb (~Adium@67.23.204.220) has joined #ceph
[23:13] <sjusthm> the exercise of chasing down those bugs also yielded some much nicer testing
[23:13] <sjusthm> of the journal replay correctness
[23:14] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) has joined #ceph
[23:14] <paravoid> that's very nice to hear
[23:14] * leseb (~Adium@67.23.204.220) Quit ()
[23:15] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[23:15] <paravoid> I'll run a repair on those 7 and rerun a deep scrub everywhere just to be sure
[23:15] <sjusthm> ok
[23:15] <paravoid> it takes about 2 days with max scrubs per osd = 4
[23:15] <sjusthm> yeah
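The repair-and-rescrub pass paravoid describes, sketched with the pg id from this log (3.2f2) standing in for each of the seven inconsistent pgs:

    ceph pg repair 3.2f2                                         # repeat for each inconsistent pg
    for id in $(ceph osd ls); do ceph osd deep-scrub $id; done   # queue a deep scrub on every osd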
[23:16] <paravoid> thanks again :)
[23:16] <sjusthm> no problem
[23:16] <sjusthm> keep turning up bugs!
[23:16] <paravoid> heh
[23:16] <paravoid> I think I'm over 20 now :)
[23:16] <paravoid> speaking of bugs, any updates on #4552?
[23:16] <paravoid> that was really nasty
[23:17] <sjusthm> not yet
[23:17] <sjusthm> it's getting close to the top of my list though
[23:18] <paravoid> must be a long list!
[23:18] * loicd (~loic@67.23.204.150) has joined #ceph
[23:21] * loicd1 (~loic@67.23.204.150) has joined #ceph
[23:21] * loicd (~loic@67.23.204.150) Quit (Read error: Connection reset by peer)
[23:21] * leseb (~Adium@67.23.204.220) has joined #ceph
[23:22] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) Quit (Ping timeout: 480 seconds)
[23:22] * leseb (~Adium@67.23.204.220) Quit ()
[23:29] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Read error: Operation timed out)
[23:30] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[23:38] * loicd1 (~loic@67.23.204.150) Quit (Quit: Leaving.)
[23:41] * leseb (~Adium@67.23.204.220) has joined #ceph
[23:41] * esammy (~esamuels@host-2-103-102-192.as13285.net) has joined #ceph
[23:44] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) has joined #ceph
[23:47] * verwilst (~verwilst@dD576962F.access.telenet.be) has joined #ceph
[23:49] * leseb (~Adium@67.23.204.220) Quit (Quit: Leaving.)
[23:52] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) Quit (Ping timeout: 480 seconds)
[23:53] * jskinner_ (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[23:55] * gmason (~gmason@12.139.57.253) Quit (Quit: Computer has gone to sleep.)
[23:57] * ashleyx42 (ashleyx42@c-76-108-138-141.hsd1.fl.comcast.net) has joined #ceph
[23:57] * loicd (~loic@67.23.204.150) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.