#ceph IRC Log


IRC Log for 2012-12-03

Timestamps are in GMT/BST.

[0:02] * KindOne (KindOne@h162.25.131.174.dynamic.ip.windstream.net) has joined #ceph
[0:19] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[0:30] * maxiz_ (~pfliu@ Quit (Ping timeout: 480 seconds)
[0:35] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[0:47] * maxiz_ (~pfliu@ has joined #ceph
[0:52] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[0:58] * gucki (~smuxi@80-218-32-162.dclient.hispeed.ch) Quit (Remote host closed the connection)
[1:12] * deepsa_ (~deepsa@ has joined #ceph
[1:12] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[1:12] * deepsa (~deepsa@ Quit (Read error: Connection reset by peer)
[1:12] * deepsa_ is now known as deepsa
[1:12] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[1:17] * mythzib (52e7d4bf@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[1:17] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:26] * maxiz_ (~pfliu@ Quit (Quit: Ex-Chat)
[1:40] * BManojlovic (~steki@242-174-222-85.adsl.verat.net) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:42] * mooperd (~andrew@dslb-188-103-068-079.pools.arcor-ip.net) Quit (Quit: mooperd)
[1:44] * joao (~JL@ has joined #ceph
[1:44] * ChanServ sets mode +o joao
[2:08] * joao sets mode -o joao
[2:17] * Qten (Q@qten.qnet.net.au) Quit (Remote host closed the connection)
[2:30] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[2:33] * tnt (~tnt@83.164-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[2:34] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:34] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:38] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[2:39] * Qten (Q@qten.qnet.net.au) has joined #ceph
[2:41] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[2:46] * Qten (Q@qten.qnet.net.au) Quit ()
[2:48] * gohko_ (~gohko@natter.interq.or.jp) has joined #ceph
[2:48] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[2:48] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[2:48] * gohko_ (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[2:48] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:57] * maxiz (~pfliu@ has joined #ceph
[2:58] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[3:23] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[3:32] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:05] * plut0 (~cory@pool-96-236-43-69.albyny.fios.verizon.net) has left #ceph
[4:44] * df_ (~danielfar@c-98-253-7-127.hsd1.in.comcast.net) has joined #ceph
[4:44] * df_ (~danielfar@c-98-253-7-127.hsd1.in.comcast.net) has left #ceph
[4:44] * df117 (~danielfar@c-98-253-7-127.hsd1.in.comcast.net) has joined #ceph
[5:15] * df117 (~danielfar@c-98-253-7-127.hsd1.in.comcast.net) Quit (Quit: df117)
[5:18] * df117 (~danielfar@c-98-253-7-127.hsd1.in.comcast.net) has joined #ceph
[5:19] * df117 (~danielfar@c-98-253-7-127.hsd1.in.comcast.net) Quit ()
[5:22] * MooingLemur (~troy@phx-pnap.pinchaser.com) Quit (Remote host closed the connection)
[5:23] * MooingLemur (~troy@phx-pnap.pinchaser.com) has joined #ceph
[6:14] * jlogan1 (~Thunderbi@2600:c00:3010:1:e12a:776f:2a6d:8a8) has joined #ceph
[7:10] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[7:13] * boll (~boll@00012a62.user.oftc.net) Quit (Quit: boll)
[7:36] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[7:41] * The_Bishop__ (~bishop@e179016085.adsl.alicedsl.de) has joined #ceph
[7:48] * The_Bishop_ (~bishop@e179016085.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[7:51] * Psi-Jack_ (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[7:51] * tnt (~tnt@83.164-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:57] * Psi-jack- (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[7:58] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:58] * Psi-jack- is now known as Psi-jack
[8:03] * Psi-jack- (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[8:04] * Psi-Jack_ (~psi-jack@psi-jack.user.oftc.net) Quit (Read error: Connection reset by peer)
[8:04] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) Quit (Remote host closed the connection)
[8:04] * Psi-jack- is now known as Psi-jack
[8:07] * yoshi (~yoshi@p4105-ipngn4301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[8:20] * jlogan1 (~Thunderbi@2600:c00:3010:1:e12a:776f:2a6d:8a8) Quit (Ping timeout: 480 seconds)
[8:48] * nosebleedkt (~kostas@kotama.dataways.gr) has joined #ceph
[8:51] <nosebleedkt> hello everybody!
[8:53] <nosebleedkt> can somebody give me a clue about this error:
[8:53] <nosebleedkt> [ 497.018924] libceph: client0 fsid 275e3f6e-96b2-4894-84db-c56e7f9db7b5
[8:53] <nosebleedkt> [ 497.019249] libceph: no secret set (for auth_x protocol)
[8:53] <nosebleedkt> [ 497.020257] libceph: error -22 on auth protocol 2 init
[8:53] <nosebleedkt> Google has some stuff but I can't really figure out what's going on
[8:54] <nosebleedkt> It happens when I map an rbd device
[8:54] <nosebleedkt> with cephx enabled
[9:00] * gaveen (~gaveen@ has joined #ceph
[9:27] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:28] <nosebleedkt> from ceph monitor i get
[9:28] <nosebleedkt> 2012-12-03 10:27:26.097790 b2928b70 0 -- >> pipe(0x91ab040 sd=21 pgs=0 cs=0 l=0).accept peer addr is really (socket is
[9:28] <nosebleedkt> 2012-12-03 10:27:26.099167 b392ab70 1 -- <== unknown.0 1 ==== auth(proto 0 34 bytes epoch 0) ==== 60+0+0 (3679117932 0 0) 0x9450000 con 0x91adef0
[9:28] <nosebleedkt> 2012-12-03 10:27:26.099288 b392ab70 1 -- --> -- mon_map v1 -- ?+0 0x91ce280 con 0x91adef0
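The "libceph: no secret set (for auth_x protocol)" line nosebleedkt pasted earlier means the kernel RBD client was never handed a cephx key, so auth init fails with -22 (EINVAL). A minimal sketch of supplying one — the client name `admin` and image `rbd/myimage` are placeholders, and option names vary slightly between rbd releases:

```shell
# Fetch the cephx secret for the user the kernel client will authenticate as
ceph auth get-key client.admin > /tmp/admin.secret

# Pass it explicitly when mapping; with cephx enabled and no key, the kernel
# logs "no secret set (for auth_x protocol)" and "error -22 on auth protocol 2 init"
rbd map rbd/myimage --id admin --keyfile /tmp/admin.secret
```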
[9:34] * tnt (~tnt@83.164-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:34] * BManojlovic (~steki@ has joined #ceph
[9:39] * loicd (~loic@ has joined #ceph
[9:42] * ScOut3R (~ScOut3R@ has joined #ceph
[9:43] * ScOut3R_ (~ScOut3R@ has joined #ceph
[9:43] * The_Bishop__ (~bishop@e179016085.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[9:46] * Leseb (~Leseb@ has joined #ceph
[9:50] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:50] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[9:51] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[9:55] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:56] * The_Bishop__ (~bishop@e179016085.adsl.alicedsl.de) has joined #ceph
[10:34] * mooperd (~andrew@dslb-188-103-067-049.pools.arcor-ip.net) has joined #ceph
[10:51] * maxiz (~pfliu@ Quit (Quit: Ex-Chat)
[10:54] <nosebleedkt> I found it !
[11:04] * roald (~Roald@ has joined #ceph
[11:19] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[11:20] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:21] * boll (~boll@daimi-pat.daimi.au.dk) has joined #ceph
[11:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:23] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[11:37] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:37] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:43] <nosebleedkt> how can i list all the osds in cluster?
[11:46] <tnt> ceph osd tree
[11:47] <nosebleedkt> tnt, I have added an osd from a 2nd machine
[11:47] <nosebleedkt> but still ceph tells me
[11:47] <nosebleedkt> root@cluster:~# ceph -w
[11:47] <nosebleedkt> health HEALTH_OK
[11:47] <nosebleedkt> monmap e1: 1 mons at {a=}, election epoch 0, quorum 0 a
[11:47] <nosebleedkt> osdmap e243: 4 osds: 3 up, 3 in
[11:47] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[11:47] <nosebleedkt> why can't it find the 4th osd?
[11:48] <nosebleedkt> which is in my 2nd machine
[11:50] <tnt> no idea ... check the logs of that new osd
[11:51] <agh> hello
[11:53] <agh> I've a question: if I have, let's say, 12 disks per OSD host, am I sure that with the default crush map all my data will be replicated on several OSD hosts (2, for instance) and not only on 2 disks of the same OSD host? Put differently: if my whole OSD host (so 12 disks) goes down, am I sure not to lose any data?
[11:54] <nosebleedkt> tnt, I have enabled cephx. Do I need to do something to the 2nd machine so it can connect to cluster?
[11:56] <tnt> The documentation http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ lists all the steps including all the auth keys stuff
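The cephx-relevant part of those docs boils down to generating a keyring for the new daemon and registering it with the monitors; until that `ceph auth add` step is done, the new OSD cannot authenticate and will never report in. A sketch of the sequence — the osd id 3, hostname node2, and weight are assumptions, and the exact `crush set` syntax varies by release:

```shell
# On the new host: initialize the OSD data directory and generate its keyring
ceph-osd -i 3 --mkfs --mkkey

# Register the key with the cluster so cephx lets osd.3 authenticate
ceph auth add osd.3 osd 'allow *' mon 'allow rwx' \
    -i /var/lib/ceph/osd/ceph-3/keyring

# Place it in the CRUSH map and start it; it should then show "up" in "ceph osd tree"
ceph osd crush set osd.3 1.0 root=default host=node2
service ceph start osd.3
```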
[11:57] <morpheus__> agh: you can define this via the crushmap
[11:58] <morpheus__> agh: replication on other osd / other host / other rack
[11:58] <agh> morpheus__: yes, but what is the default behaviour?
[11:58] <tnt> default behavior is just across osd, not across host
[11:58] <morpheus__> unsure, my cluster is running since ~6 months
[11:59] <tnt> the basic crushmap doesn't know which osd is on what machine.
[11:59] <agh> tnt: ok, so by default, if a whole machine goes down, It is possible to lose data.
[12:00] <tnt> yes
[12:00] <agh> ok, thanks
[12:00] <tnt> But it depends how you created your cluster I guess ... I did most stuff manually and so maybe there is some automatic stuff ...
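The across-osd versus across-host distinction tnt describes is encoded in the CRUSH rule's choose step. A sketch of what a host-spreading rule looks like in a decompiled map (`ceph osd getcrushmap -o map; crushtool -d map -o map.txt`); the names shown are the stock defaults:

```
rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    # "type host" forces each replica under a different host bucket;
    # a map built without host buckets can only "chooseleaf ... type osd",
    # which may place two copies on disks in the same machine
    step chooseleaf firstn 0 type host
    step emit
}
```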
[12:01] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:02] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:16] * yoshi (~yoshi@p4105-ipngn4301marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:16] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[12:16] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[12:18] <nosebleedkt> tnt,
[12:18] <nosebleedkt> how to resolve this?
[12:19] <nosebleedkt> health HEALTH_WARN 3 pgs stuck unclean
[12:19] <nosebleedkt> i did something stupid with osd.0 and now i can't get HEALTH_OK
[12:19] * ScOut3R (~ScOut3R@ has joined #ceph
[12:19] <nosebleedkt> i removed osd.0 from cluster
[12:19] <nosebleedkt> and those pages left in there
[12:23] <roald> did you remove it from the crushmap?
[12:24] <nosebleedkt> yes
[12:25] * ScOut3R_ (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[12:25] <nosebleedkt> isn't there any command that 'deletes' those 3 pages?
[12:25] <tnt> pgs != pages .... pgs = placement groups
[12:26] <roald> you don't want to delete those pg's, you'll lose data
[12:26] <nosebleedkt> ok
[12:26] <nosebleedkt> so what do i do now?
[12:26] <roald> you can do a ceph health detail and see which pg's are unclean, and then do a ceph pg [PG ID] query to see more details
[12:27] <nosebleedkt> http://pastebin.com/WYaTJgEs
[12:27] <nosebleedkt> this is what is shows
[12:27] <nosebleedkt> root@cluster:~# ceph -s
[12:27] <nosebleedkt> health HEALTH_WARN 15 pgs stuck unclean
[12:29] <roald> can you do a 'ceph pg 2.5d query'?
[12:30] <roald> (because that's the only one which actually has data in it, from the looks of it)
[12:30] <roald> (not that it really matters, because you need all pg's to be clean for a fully operational cluster)
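roald's diagnosis flow, as commands (the PG id 2.5d is taken from the pastebin above):

```shell
ceph health detail          # expands the HEALTH_WARN summary into the individual stuck PGs
ceph pg dump_stuck unclean  # just the stuck PGs, with their acting OSDs
ceph pg 2.5d query          # full state of one PG: acting set, peering/recovery info
```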
[12:32] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has left #ceph
[12:34] * joao (~JL@ has joined #ceph
[12:34] * ChanServ sets mode +o joao
[12:37] * joao sets mode -o joao
[12:50] * loicd (~loic@ Quit (Quit: Leaving.)
[12:53] * gaveen (~gaveen@ has joined #ceph
[13:02] <nosebleedkt> tnt, ok i formatted the cluster and made a new one
[13:02] <nosebleedkt> now from the start..
[13:03] <nosebleedkt> i have another machine/another ip
[13:03] <nosebleedkt> which it will have osd to provide to the cluster
[13:04] * `gregorg` (~Greg@ Quit (Read error: Connection reset by peer)
[13:06] <nosebleedkt> ping joao
[13:13] * boll (~boll@daimi-pat.daimi.au.dk) Quit (Read error: Connection reset by peer)
[13:13] * boll (~boll@daimi-pat.daimi.au.dk) has joined #ceph
[13:21] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[13:23] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[13:36] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[13:39] * maxiz (~pfliu@ has joined #ceph
[13:56] <nosebleedkt> ping tnt
[13:56] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[13:58] * loicd (~loic@ has joined #ceph
[14:03] * BManojlovic (~steki@ has joined #ceph
[14:19] * loicd (~loic@ Quit (Quit: Leaving.)
[14:24] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[14:25] * mib_4kh17m (17192e61@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:32] * boll_ (~boll@daimi-pat.daimi.au.dk) has joined #ceph
[14:32] * boll_ (~boll@daimi-pat.daimi.au.dk) Quit ()
[14:39] * boll (~boll@daimi-pat.daimi.au.dk) Quit (Ping timeout: 480 seconds)
[14:46] * illuminatis (~illuminat@0001adba.user.oftc.net) has joined #ceph
[14:46] * illuminatis (~illuminat@0001adba.user.oftc.net) Quit ()
[14:57] * loicd (~loic@ has joined #ceph
[15:00] * illuminatis (~illuminat@0001adba.user.oftc.net) has joined #ceph
[15:01] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:01] * boll (~boll@daimi-pat.daimi.au.dk) has joined #ceph
[15:02] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) has joined #ceph
[15:02] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[15:07] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[15:07] * tryggvil_ is now known as tryggvil
[15:15] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:22] * boll (~boll@daimi-pat.daimi.au.dk) Quit (Quit: boll)
[15:27] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[15:28] * The_Bishop_ (~bishop@f052103079.adsl.alicedsl.de) has joined #ceph
[15:29] * markl (~mark@tpsit.com) has joined #ceph
[15:32] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[15:35] * The_Bishop__ (~bishop@e179016085.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[15:37] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[15:40] * nosebleedkt (~kostas@kotama.dataways.gr) Quit (Quit: Leaving)
[16:01] * ScOut3R (~ScOut3R@ has joined #ceph
[16:07] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[16:20] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:37] * Psi-Jack_ (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[16:44] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) Quit (Remote host closed the connection)
[16:44] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:44] * Psi-Jack_ is now known as Psi-jack
[16:44] * Kioob`Taff1 (~plug-oliv@local.plusdinfo.com) has joined #ceph
[16:51] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[16:51] * vata (~vata@ has joined #ceph
[17:04] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[17:09] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) has joined #ceph
[17:19] * mdawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:19] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:25] * jbarbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:34] * jlogan1 (~Thunderbi@2600:c00:3010:1:e12a:776f:2a6d:8a8) has joined #ceph
[17:37] * calebamiles (~caleb@c-98-197-128-251.hsd1.tx.comcast.net) has joined #ceph
[17:37] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:46] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[17:48] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:54] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[17:56] * asadpand- is now known as asadpanda
[18:01] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[18:01] * ChanServ sets mode +o elder
[18:02] * tnt (~tnt@251.163-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:08] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[18:12] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[18:14] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:14] * drokita (~drokita@ has joined #ceph
[18:16] * Kioob`Taff1 (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[18:21] * gaveen (~gaveen@ has joined #ceph
[18:24] * gaveen (~gaveen@ Quit ()
[18:25] <elder> nhm, are you around?
[18:27] <nhm> elder: heya
[18:28] <elder> I just got my computer back after a <100% successful upgrade attempt. I just saw your e-mail. It works better for me to not meet for lunch also.
[18:28] <elder> Maybe Friday.
[18:28] <elder> Give you plenty of time to recover.
[18:32] * yehudasa (~yehudasa@2607:f298:a:607:45bd:9a9d:83a3:4164) Quit (Ping timeout: 480 seconds)
[18:33] * drokita (~drokita@ Quit (Read error: Connection reset by peer)
[18:33] <nhm> sounds good
[18:39] * boll (~boll@home.overgaard.org) has joined #ceph
[18:41] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:41] * yehudasa (~yehudasa@2607:f298:a:607:65d2:38a:861e:163e) has joined #ceph
[18:43] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[18:52] * boll (~boll@00012a62.user.oftc.net) Quit (Quit: boll)
[18:55] * drokita (~drokita@ has joined #ceph
[19:01] * dshea (~dshea@masamune.med.harvard.edu) has joined #ceph
[19:03] <dshea> Hi, I've run into an issue with re-mounting my test cluster, mount gives me the following output 'mount error 5 = Input/output error'
[19:03] <dshea> ceph -s shows me the following
[19:03] <dshea> health HEALTH_WARN 417 pgs backfill; 417 pgs degraded; 417 pgs recovering; 417 pgs stuck unclean; recovery 3231534/15374107 degraded (21.019%)
[19:03] <dshea> monmap e1: 5 mons at {a=,b=,c=,d=,e=}, election epoch 4, quorum 0,1,2,3,4 a,b,c,d,e
[19:03] <dshea> osdmap e74: 10 osds: 10 up, 10 in
[19:03] <dshea> pgmap v145064: 1920 pgs: 1502 active+clean, 1 active+clean+scrubbing, 417 active+recovering+degraded+backfill; 2255 GB data, 5498 GB used, 31743 GB / 37242 GB avail; 3231534/15374107 degraded (21.019%)
[19:03] <dshea> mdsmap e22: 1/1/1 up {0=b=up:replay}
[19:04] <sjust> dshea: what version are you running?
[19:04] <sjust> it looks like your cluster is in the process of backfilling, which is fine
[19:04] <sjust> but the osd in active+clean+scrubbing might be stuck
[19:05] <sjust> a fix for a bug which causes that just went into next, are you running 0.54+?
[19:05] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:05] <dshea> Ah, let me double check my version number
[19:05] <dshea> one moment
[19:05] * gregaf1 (~Adium@2607:f298:a:607:54b4:3d88:b69b:826e) has joined #ceph
[19:06] <dshea> ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
[19:06] * chutzpah (~chutz@ has joined #ceph
[19:06] <sjust> hmm, the bug I mentioned wasn't introduced until 0.54
[19:06] <sjust> is it still scrubbing?
[19:07] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Quit: Leaving.)
[19:07] <dshea> sjust: sorry, I'm still coming up to speed with ceph, how do I check that?
[19:07] <sjust> ceph pg dump | grep scrubbing
[19:07] * chutzpah (~chutz@ Quit ()
[19:07] <sjust> if it stays in that state, that might be the problem
[19:07] <dshea> sjust: thanks, let me check
[19:08] * chutzpah (~chutz@ has joined #ceph
[19:08] <sjust> that is, if the same pgid stays in scrubbing
[19:08] <dshea> 0.1d2 10094 0 0 0 4725459364 136136 136136 active+clean+scrubbing 2012-12-03 13:05:03.513550 25'11280 57'19689 [4,3,0] [4,3,0] 21'2241 2012-12-01 20:35:13.120091
[19:08] <sjust> ok, if 0.1d2 stays in scrubbing for a few minutes, that's likely the problem
[19:09] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[19:09] <dshea> ok, I will monitor it, if it is stuck, is there a way for me to recover?
[19:09] <sjust> osd 4 out and then in should bandaid it
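sjust's out-then-in workaround, spelled out (osd id 4 comes from the acting set `[4,3,0]` in the pg dump line below); marking the OSD out forces its PGs to re-peer, which clears the wedged scrub state:

```shell
ceph osd out 4   # PGs remap away from osd.4 and the stuck scrub resets
# wait for recovery in "ceph -s" to settle, then bring it back:
ceph osd in 4
```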
[19:09] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[19:10] <dshea> sjust: thanks, I'll keep an eye on it and try that if it is stuck.
[19:12] * gregaf (~Adium@2607:f298:a:607:fd99:359:95ec:8287) Quit (Ping timeout: 480 seconds)
[19:14] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:15] <dshea> It looks like scrub moved on to a different pg, still receiving i/o error when trying to mount
[19:16] <sjust> what is ceph -s showing?
[19:16] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:16] <dshea> health HEALTH_WARN 416 pgs backfill; 416 pgs degraded; 416 pgs recovering; 416 pgs stuck unclean; recovery 3211101/15374107 degraded (20.886%)
[19:16] <dshea> monmap e1: 5 mons at {a=,b=,c=,d=,e=}, election epoch 4, quorum 0,1,2,3,4 a,b,c,d,e
[19:16] <dshea> osdmap e75: 10 osds: 10 up, 10 in
[19:16] <dshea> pgmap v145744: 1920 pgs: 1504 active+clean, 416 active+recovering+degraded+backfill; 2255 GB data, 5506 GB used, 31735 GB / 37242 GB avail; 3211101/15374107 degraded (20.886%)
[19:16] <dshea> mdsmap e22: 1/1/1 up {0=b=up:replay}
[19:18] <tnt> mmm, "bad crc in data" that doesn't sound too good.
[19:23] <dshea> I'm not entirely sure where I should look, no interface errors, no errors mounting any of the disks on the local nodes.
[19:24] <dshea> the local storage is LVM of 2 drives with XFS as the filesystem, this is then passed to ceph as the osd
[19:24] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:26] * The_Bishop_ (~bishop@f052103079.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[19:26] <dshea> I left some i/o testing on over the weekend and noticed updatedb on the client was trying to index the mount. I killed updatedb and stopped the i/o test. After I cleanly umount'ed the ceph mount I was unable to remount it
[19:27] <dshea> Should also mention that I have since added the mount to the PRUNE path for updatedb.
[19:28] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:30] <dshea> All three clients get the same error when attempting to mount, so I am guessing I need to fix something on the cluster's side to make it available. I already tried bringing down the cluster cleanly via 'service ceph -a stop' and then brought it back up, but I'm guessing the internal state is hosed up somehow, although I am not sure why
[19:34] <dshea> hmmmm...."{0=b=up:replay(laggy or crashed)}" now showing in ceph -s
[19:35] * MapspaM is now known as SpamapS
[19:35] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[19:35] * ScOut3R (~scout3r@1F2E942C.dsl.pool.telekom.hu) has joined #ceph
[19:42] <gregaf1> dshea: is your MDS running? and do you have any logs from it?
[19:42] <gregaf1> my guess is that with all the PGs rebuilding everything's just going too slowly, or maybe they aren't responding at all, which could prevent replay
[19:44] <dshea> gregaf1: they seem to have crashed
[19:44] <dshea> all of them...I had 5 running
[19:44] <dshea> let me check their logs
[19:47] <dshea> http://pastebin.com/NTbRpT01
[19:48] <sjust> dshea: can you post the output of ceph pg dump?
[19:48] <dshea> sure
[19:48] <gregaf1> and a few more lines backwards in the log? I'd like to see the assert if I can
[19:50] <dshea> ok
[19:52] <dshea> http://pastebin.com/ctahTUjK <--is the mds
[19:52] <dshea> the pg dump is huge, need to up my scrollback buffer, 1 sec
[19:57] <dshea> sjust: it's several thousand lines long, do you want the entire thing?
[19:57] <sjust> yeah, you can upload it via sftp if you want to cephdrop@ceph.com
[19:59] <gregaf1> dshea: can you see if one of your MDS boxes has a core file? (if the ulimits and stuff are set up to allow that, they will)
[20:00] <gregaf1> if yes, run "gdb ceph-mds /path/to/core/file" and then bt and pastebin the output for me; it often includes a bit more than the log does
[20:00] <dshea> okie dokie, will do
[20:01] * boll (~boll@00012a62.user.oftc.net) has joined #ceph
[20:01] * joey__ is now known as terje
[20:01] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:01] <gregaf1> if it doesn't we'll need to install the debug symbol packages and do this again, maybe with a bit more logging on
[20:03] <dshea> no core files in /root/ and none in /usr/bin would it dump core anywhere else?
[20:04] <gregaf1> damn, probably not
[20:05] <joshd> check / too
[20:05] * The_Bishop (~bishop@f052103079.adsl.alicedsl.de) has joined #ceph
[20:05] <gregaf1> *blink* actually that's most common, isn't it?
[20:05] <gregaf1> I'm scanning too fast
[20:05] <dshea> bingo
[20:05] <dshea> sorry forgot it would dump there as well :/
[20:06] <dshea> one sec, need to install gdb
[20:08] <dshea> okie...here it is http://pastebin.com/W3sYeQWt
[20:09] <dshea> sjust: sftp cephdrop asked me for a password
[20:09] <sjust> asdf
[20:09] <rweeks> qwerty
[20:10] <dshea> pg_dump uploaded
[20:10] * scuttlemonkey_ is now known as scuttlemonkey
[20:11] <sjust> is the number of degraded objects in ceph -s going down?
[20:11] <dshea> yes
[20:11] <dshea> it is decreasing
[20:12] <sjust> ok, I think the osd side is ok
[20:12] <gregaf1> yeah, I'm pretty sure there's bad data in the mds log
[20:12] <gregaf1> but I can't tell because all I've got right now is that it's asserting in a very large function :(
[20:16] <roald> ... "large" is an understatement here, it's a mind-blowing 400 lines long :-)
[20:16] <gregaf1> a lot of our older code is inappropriately sized like that
[20:18] * boll (~boll@00012a62.user.oftc.net) Quit (Quit: boll)
[20:25] * dmick (~dmick@2607:f298:a:607:7992:e07e:4bb4:6742) has joined #ceph
[20:34] * BManojlovic (~steki@242-174-222-85.adsl.verat.net) has joined #ceph
[20:37] * dshea is now known as dshea_afk
[20:42] <todin> joshd: hi, do you remember where in cinder can I activate discard?
[20:43] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[20:45] * dshea_afk is now known as dshea
[20:45] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[20:45] <joshd> todin: no way directly in cinder, since it's determined by how nova runs qemu
[20:49] <joshd> todin: you'd need to modify the xml template for vms, although I don't know exactly how to enable it through libvirt. possibly with the custom command line passthrough
[20:50] <elder> gregaf1, I will (sadly) not be participating in Journal Club again today.
[20:54] <todin> joshd: I know how to do it direct via libvirt xml, but I don't know where the xml is generated in openstack
[20:55] <todin> joshd: You mean I should look into nova?
[20:57] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:58] * loicd (~loic@magenta.dachary.org) Quit ()
[21:04] * boll (~boll@00012a62.user.oftc.net) has joined #ceph
[21:10] <joshd> todin: yes, that's where libvirt is configured
[21:11] <joshd> todin: nothing is aware of discard support out of the box that I've seen
[21:11] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) Quit (Remote host closed the connection)
[21:14] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:15] <slang> gregaf1, joshd, sjust: jc happening?
[21:16] <dmick> slang: yes
[21:16] <dmick> I believe
[21:16] <dmick> although gregaf1 was the only one who'd read them apparently :)
[21:16] <dmick> so maybe not
[21:17] <dmick> (not josh, me, elder, or samf)
[21:17] <dmick> (samf? wth? sjust)
[21:18] <gregaf1> slang: nobody joined in the first 6 minutes
[21:18] <gregaf1> so I left and thought we'd delay until people had had more than 3 days to read
[21:18] <slang> gregaf1: ah ok
[21:18] <elder> Back in half an hour or so..
[21:19] <slang> gregaf1: yeah I should setup a reminder - didn't notice it was time till 10 minutes after
[21:30] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) has joined #ceph
[21:31] * The_Bishop (~bishop@f052103079.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[21:33] * mdawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.89 [Firefox 17.0/20121119183901])
[21:33] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[21:33] * boll (~boll@00012a62.user.oftc.net) Quit (Quit: boll)
[21:37] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[21:37] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[21:38] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[21:39] * The_Bishop (~bishop@f052103079.adsl.alicedsl.de) has joined #ceph
[21:39] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[21:39] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit ()
[21:45] * jbarbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.89 [Firefox 16.0.2/20121024073032])
[21:55] <todin> joshd: I found it, should be easily patched.
[21:55] * jrisch (~jrisch@4505ds2-hi.0.fullrate.dk) has joined #ceph
[21:56] <joshd> todin: cool
[21:56] <joshd> todin: btw, what is the libvirt xml to enable it?
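For reference, the XML joshd is asking about: libvirt of this era had no discard knob (hence the command-line passthrough suggestion above), but later releases (libvirt >= 1.0.6 with qemu >= 1.5) expose it as a driver attribute. A sketch of the latter, with the image name as a placeholder:

```
<disk type='network' device='disk'>
  <!-- discard='unmap' passes guest TRIM/discard through to the rbd image -->
  <driver name='qemu' type='raw' discard='unmap'/>
  <source protocol='rbd' name='rbd/vm-disk-1'/>
  <target dev='vda' bus='virtio'/>
</disk>
```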
[21:57] * miroslav1 (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[21:58] <jrisch> Hi all. Trying out the rados-gw. But I seem to miss some info about linking a user to a pool or bucket, so I get access denied. Does anyone know how to give a swift or s3 user rw rights to a pool/bucket....?
[21:58] <gregaf1> yehudasa does!
[21:59] * boll (~boll@00012a62.user.oftc.net) has joined #ceph
[22:02] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:02] * boll (~boll@00012a62.user.oftc.net) Quit ()
[22:04] * boll (~boll@00012a62.user.oftc.net) has joined #ceph
[22:08] <lurbs_> Are the fixes for the scrub related hung requests likely to be in the dev testing packages?
[22:08] * lurbs_ is now known as lurbs
[22:10] <dmick> jrisch: radosgw-admin
[22:11] <jrisch> dmick: I've tried that, but I can't find any info on adding caps to a user - it's not quite clear from the docs
[22:12] <jrisch> dmick: I've added the pool and specified the user in that same command, but radosgw-admin pools list only lists the pool - not that it's linked to the user.
[22:12] <jrisch> dmick: maybe it's me who's doing it wrong - and grasping it wrong too...
[22:20] <dmick> well, let's start with where you're getting access denied
[22:21] <elder> joshd, reviews!
[22:21] <elder> What, are you getting bored?
[22:21] <joshd> elder: haha, I just thought those ones were important and backportable. still 59 more I'm not reviewing yet
[22:22] <elder> They're important but probably less than it might seem. Only "special" requests have those issues and they shouldn't be happening all that often.
[22:24] <dmick> http://s3.amazonaws.com/rapgenius/snl-church-lady.jpg
[22:24] <elder> SATAN?
[22:25] <rweeks> perhaps it could be…. oh, I don't know...
[22:25] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:25] <dmick> I'm certain he's involved in this bug I'm looking at
[22:25] <Psi-jack> When setting up Ceph to use Btrfs, is it better to have Btrfs utilize its own native raid10, or another option? Also considering that the main storage will be platter drives with the journal on a high-speed SSD with synchronous MLC NAND, for use with CephFS.
[22:26] <elder> http://www.hark.com/clips/kprcrcxptt-could-it-be-satan
[22:28] <jrisch> dmick: Here's the log from the request: http://pastebin.com/Ks99Zz3d
[22:30] <Psi-jack> Hmmm.
[22:30] <rweeks> Psi-jack: what's the benefit of RAID10 in that scenario?
[22:30] <dmick> jrisch: ok, so we're talking about the s3 request across HTTPS; that's a start
[22:30] <dmick> presumably you gave your S3 client the proper secret and access keys? I don't see them in the log, but perhaps they're not logged, not sure
[22:31] <sjust> rweeks: decreases probability of osd failure, so potentially decreases necessary number of pg replicas
[22:31] <jrisch> dmick: Well, it should be the swift protocol…
[22:31] <rweeks> would you set up one OSD per RAID10?
[22:31] <rweeks> and how big would that raid10 be?
[22:31] <jrisch> dmick: Yes they are: x-auth-user and x-auth-key
[22:32] <Psi-jack> rweeks: Speed and performance. The setup I'm building is 3 storage servers, 4 HDD's with 1 SSD, for use for housing VM Rados disk images, as well as some servers will utilize cephfs for shared content across multiple computers, such as website content, mail content, etc.
[22:32] <sjust> Psi-jack: haven't heard much about people trying that, it's usually nicer to use the disks to increase the number of osds to exploit many-to-many replication
[22:32] <dmick> jrisch: ok. I just barely know S3 and am even less familiar with Swift, so I'm going to page yehudasa again
[22:32] <rweeks> so you would treat the 4 HDDs as a single RAID10 and a single OSD?
[22:32] <jrisch> dmick: Ok. Thank you..! :-)
[22:32] <Psi-jack> rweeks: That would be the idea, yes.
[22:32] <rweeks> because if you do that, you end up with 3 OSDs total
[22:33] <Psi-jack> rweeks: Right. Which is why I'm asking this question. :)
[22:33] <rweeks> that seems to me to be less than optimal
[22:34] <jrisch> dmick: But AFAICT, I'm getting in, but can't list the buckets (because there is none) and I can't create one with radosgw-admin
[22:34] <Psi-jack> So Ceph could better benefit from splitting up the disks as individual devices, doing its own form of replication and integrity, than relying on underlying replication and redundancy technology?
[22:34] <dmick> jrisch: can you create one with the Swift client?
[22:35] <rweeks> that is what we recommend, yes, Psi-jack
[22:35] <sjust> Psi-jack: that's the idea
[22:35] <Psi-jack> Interesting.
[22:35] <jrisch> dmick: I haven't tried yet
[22:35] <jrisch> dmick: I haven't got it installed.
[22:35] <Psi-jack> But, adding and removing OSD's, how well does that work with Ceph?
[22:35] <sjust> Psi-jack: consider a single disk failure, with the raid10 scenario, you replace the drive and there is a direct drive to drive copy of a (let's say, 2TB) disk
[22:36] <sjust> with ceph's replication, you instead have the remaining osds re-replicate data
[22:36] <sjust> which should be faster
[22:36] * Psi-jack nods.
[22:36] <sjust> if the dead osd had 100 PGs, you have potentially 100 osds participating
[22:36] <jrisch> I think I'll try that tomorrow. It's getting late here. Thank you for your help so far. I might get on one of the coming days to talk to yehudasa about it....
[22:36] <sjust> anyway, that's the conventional wisdom
[22:36] <Psi-jack> sjust: Interesting.
[22:36] <yehudasa> jrisch: not sure I really understood the question
[22:37] <yehudasa> .. what's linking a user to a pool?
[22:37] <Psi-jack> sjust: heh. End of next week, I should have all 4 HDD's in my hand ready to start working on the new SAN deployment. :)
[22:37] <jrisch> yehudasa: I want to use a specific pool with a swift user
[22:37] <sjust> cool
[22:38] <jrisch> yehudasa: But I can't seem to link it to the user
[22:38] <Psi-jack> Refactoring my other two storage servers is going to be.... Fun.... On its own..
[22:38] <yehudasa> jrisch: just so that we're on the same page.. what's linking a user to a pool?
[22:38] <Psi-jack> But, SSD's are already in place, plus the 4x1TB HDD's except on the last server. Started it off with just the SSD. ;)
[22:39] <jrisch> yehudasa: I've tried radosgw-admin pool add --uid=username --pool=foo
[22:39] <yehudasa> jrisch: that command is global, so it affects all users
[22:39] <Psi-jack> So, hmmm.. 4x3 main storage OSD's backed by 3 SSD's for journal. Should be pretty fricken fast, no? ;)
[22:39] <yehudasa> jrisch: what it does is adding a rados pool to the data placement set
[22:39] <jrisch> yehudasa: Ok.
[22:40] <yehudasa> so new buckets that are created would be written to that pool
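[Editor's note: a sketch of the radosgw-admin commands discussed above. The pool name "foo" comes from the conversation; as yehudasa explains, the command is global (it adds a rados pool to the data placement set), not a per-user link, so the `--uid` flag jrisch tried does not scope it to one user.]

```shell
# Add the rados pool "foo" to radosgw's data placement set.
# Global effect: new buckets created afterwards may land in this pool.
radosgw-admin pool add --pool=foo

# Show the pools currently in the placement set (what jrisch inspected).
radosgw-admin pools list
```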
[22:40] <rweeks> yes, Psi-jack, that should be fast
[22:40] <jrisch> yehudasa: But if you take a look at the error I get: http://pastebin.com/Ks99Zz3d
[22:40] <rweeks> and potentially more resilient than raid10, really.
[22:40] <Psi-jack> Hehe
[22:40] <Psi-jack> I can hope.
[22:41] <jrisch> yehudasa: It looks like I get in but get an error because there is no bucket, right?
[22:41] <Psi-jack> And also hope it'll be faster than my previous attempts, doing CephFS on 2x4-Raid10 ZFS OSD's.
[22:41] <yehudasa> jrisch: your url should be /swift/v1.0 probably
[22:42] <jrisch> yehudasa: Ahr, ok. I'll try that - just a sec...
[22:44] <jrisch> yehudasa: It seems like it uses SCRIPT_URL=/v1.0 for the URL.
[22:45] <jrisch> yehudasa: could it be that I have to create a bucket with the swift client first..?
[22:45] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:46] <jrisch> yehudasa: it seems like it tries to list all buckets: /v1.0:list_bucket:authorizing
[22:46] <yehudasa> jrisch: that's actually trying to list a single bucket named v1.0
[22:47] <Psi-jack> Heh
[22:47] <jrisch> yehudasa: OK. Hmm...
[22:47] <Psi-jack> I think the only concern I'll have with the new Ceph-based SAN is bottlenecking the 2x1Gb multipath SAN networks. ;)
[22:48] * vata (~vata@ Quit (Quit: Leaving.)
[22:48] <yehudasa> jrisch: what url did you put in cyberduck?
[22:48] * vata (~vata@ has joined #ceph
[22:48] <Psi-jack> Oh, wow, CyberDuck can interface with CephFS's radosgw?
[22:49] <yehudasa> Psi-jack: yes
[22:49] * vata (~vata@ Quit ()
[22:49] <Psi-jack> using the S3 protocol approach or something?
[22:49] <jrisch> yehudasa: The IP address of the server, the port number and then I tried to use /v1.0 and /swift/v1.0 but to no avail....
[22:49] <yehudasa> Psi-jack: s3/swift
[22:49] * vata (~vata@ has joined #ceph
[22:49] <Psi-jack> Heh, Interesting.
[22:50] <yehudasa> jrisch: put ip:port/auth
[22:50] <Psi-jack> Hmmm, lastly, would you think BtrFS would perform better with CephFS, or XFS?
[22:50] <jrisch> yehudasa: That's not working either...
[22:51] <yehudasa> jrisch: what does the log say?
[22:53] * calebamiles (~caleb@c-98-197-128-251.hsd1.tx.comcast.net) Quit (Quit: Leaving.)
[22:53] * calebamiles (~caleb@c-98-197-128-251.hsd1.tx.comcast.net) has joined #ceph
[22:54] <jrisch> yehudasa: Still the same. I actually now think there's a bug in Cyberduck: It doesn't change the path - The log says /v1.0 regardless of what I type in...
[22:55] <yehudasa> I see
[22:55] <jrisch> yehudasa: I've just visited the site with a browser, and then the log file says /auth....
[22:55] <jrisch> yehudasa: So it must be Cyberduck. I'm sorry
[22:56] <rweeks> Psi-jack: you're mixing questions there
[22:56] <jrisch> yehudasa: I'm sorry to have bothered you. I'll try the swift client tomorrow and see if I can get it to work. Thank you for your patience!
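[Editor's note: a hypothetical invocation of the swift command-line client jrisch plans to try, using the ip:port/auth endpoint yehudasa suggested. The user, subuser, and key values are placeholders, not from the log.]

```shell
# Authenticate against radosgw's swift-compatible auth endpoint and
# list containers (buckets). "johndoe:swift" and the key are placeholders.
swift -A -U johndoe:swift -K '<secret_key>' list

# Create a container first if listing returns nothing - there are no
# buckets yet in jrisch's setup.
swift -A -U johndoe:swift -K '<secret_key>' post mybucket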
[22:56] <Psi-jack> Ahhh.. Ooops.
[22:57] <rweeks> Psi-jack: you need an underlying FS for each OSD. That can be btrfs, xfs or ext4
[22:57] <Psi-jack> Would you think CephFS would perform better with BtrFS, or XFS?
[22:57] <rweeks> nhm: you want to answer that? :)
[22:58] <Psi-jack> Not so concerned about ext4 in this case. I already know XFS would rip ext4 to shreds in performance in this environment.
[22:59] <rweeks> nhm is the best guy to answer that, but I suspect the answer is "it depends"
[22:59] <Psi-jack> But, BtrFS vs XFS on platter drives does pose a good question. The wear-leveling of BtrFS is a lot different than the way XFS handles its journal location.
[22:59] * vata (~vata@ Quit (Quit: Leaving.)
[22:59] <Psi-jack> I'm using BtrFS on the SSD's that'll provide the journalling for the Ceph, though.
[23:00] <jrisch> yehudasa: Hmm. Using S3 instead seemed to work....
[23:00] <Psi-jack> Err, for the OSD's. ;)
[23:00] * vata (~vata@ has joined #ceph
[23:00] <yehudasa> jrisch: awesome
[23:00] <lurbs> Why not put the journals directly onto a block device, instead of into a filesystem?
[23:01] <Psi-jack> lurbs: If Ceph can do that, that'll be done.
[23:01] <Psi-jack> I didn't know it could.
[23:01] <jrisch> yehudasa: yes, that's awesome. Now I run into DNS related errors, but I won't bother you with those! :-) Once again thank you...!
[23:02] <dmick> Psi-jack: absolutely
[23:03] <Psi-jack> Nice. :)
[23:03] <Psi-jack> Less filesystem overhead the better. ;)
[23:04] <Psi-jack> So, Ceph's journal could utilize 1 partition of an SSD, unformatted, for 4 OSD's in the same physical box?
[23:04] * jrisch (~jrisch@4505ds2-hi.0.fullrate.dk) Quit (Quit: jrisch)
[23:04] <lurbs> You'd need a separate partition for each OSD.
[23:04] <Psi-jack> Okay
[23:05] <sjust> and the ssd would have to have as high a sequential write speed as all 4 spinning disks combined in order to not be a bottleneck
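[Editor's note: a back-of-the-envelope check of sjust's point that a shared journal SSD becomes the bottleneck once its sequential write speed falls below the combined write rate of the OSD disks behind it. The MB/s figures are illustrative assumptions, not measurements.]

```python
def journal_is_bottleneck(ssd_write_mbps, disk_write_mbps, num_disks):
    """True if the SSD cannot absorb the combined write rate of the disks."""
    return ssd_write_mbps < disk_write_mbps * num_disks

# Four platter drives at ~120 MB/s each need ~480 MB/s of journal bandwidth;
# a SATA SSD topping out at ~450 MB/s would fall just short.
print(journal_is_bottleneck(450, 120, 4))  # True -> journal is the bottleneck
print(journal_is_bottleneck(550, 120, 4))  # False -> SSD keeps up
```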
[23:05] <lurbs> That's the downside, it's a little more fiddly to set up - you're more likely to need to change the default value of 'osd journal' from '/var/lib/ceph/osd/$cluster-$id/journal' to something specific for each OSD.
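[Editor's note: a hypothetical ceph.conf fragment for what lurbs describes - one raw SSD partition per OSD, overriding the default file-backed journal path. The partition device names are illustrative assumptions.]

```ini
; One journal partition per OSD on the shared SSD.
[osd.0]
    osd journal = /dev/sda1

[osd.1]
    osd journal = /dev/sda2

; With a raw block device the journal bypasses the filesystem entirely,
; which is the "less filesystem overhead" benefit discussed above.
```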
[23:05] <Psi-jack> sjust: It's an OCZ Vertex 3, which is pretty darned fast.
[23:06] <lurbs> Yeah. Which, for me, is an argument against RAID for the OSD. Means your journal is more likely to be a bottleneck.
[23:06] <Psi-jack> lurbs: *nods* Yeaah.. I kind of expected the setup to be a littttttle more difficult than normal. I'm okay with that.
[23:09] * miroslav1 (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[23:10] <nhm> Psi-jack: I do 3 OSDs with 1 Intel 520 SSD pretty successfully.
[23:10] <nhm> Psi-jack: the new S3700 looks very appealing.
[23:10] <lurbs> My personal compromise is with six OSDs in a server, with single XFS formatted drive for each, and two SSDs for journals split so that each is a journal for half the OSDs.
[23:10] <Psi-jack> Heh.
[23:11] <Psi-jack> I don't do Intel SSD. I sent back more of those than I can count, until they stopped sending me new ones to fry. Killed so many in ~2 weeks or less with a simple script I wrote to hammer the crap out of it with writes and verified reads.
[23:12] <nhm> Interesting
[23:12] <nhm> I haven't had any of mine fail yet.
[23:12] <lurbs> Odd, I wouldn't do anything but Intel for servers. Probably Samsung for workstations, etc.
[23:12] <Psi-jack> My goal was literally to test how well they could hold up to extremely heavy utilization non-stop.
[23:16] <Psi-jack> nhm: So, for the OSD's, for performance, BtrFS or XFS for the platters?
[23:17] <nhm> Psi-jack: btrfs starts off better. It used to degrade much faster than XFS, but I haven't tested how things age on more recent kernels.
[23:17] <Psi-jack> Yeaah. I'd be using Arch Linux for my Ceph clusters, so that's preeeeettty current in terms of the kernel. Currently at 3.6.8
[23:18] <nhm> Psi-jack: Ext4 actually puts in a pretty good showing, but I have no idea how it degrades.
[23:19] <Psi-jack> Hmmm.
[23:20] <Psi-jack> Yeah. That was my concern of the BtrFS on platter drives. Great for SSD, but their wear-leveling technique moves the drive head all over the place, unlike XFS/ext4 which keeps the internal journal around the center so the head has the least movement ratio to/from the journal spot.
[23:20] <Psi-jack> Granted, that journal could also be offloaded to the SSD as well. heh
[23:21] <nhm> Psi-jack: if you haven't seen them, I've got some articles up that look at write performance with SSD journals on a bunch of different controllers.
[23:21] <Psi-jack> I'd like to see that. :)
[23:21] <nhm> actually, article, the other one is for journals on the spinning disks.
[23:21] <nhm> http://ceph.com/community/ceph-performance-part-1-disk-controller-write-throughput/
[23:22] <Psi-jack> Mmmmmm, shiny things!
[23:22] <nhm> newer kernel and newer ceph do a bit better for small writes now.
[23:24] <lurbs> Psi-jack: I was just thinking that. I wonder if having both the Ceph and BTRFS journals on SSDs was worthwhile, or if it was pretty much redundant to split off the BTRFS journal too.
[23:24] <Psi-jack> lurbs: Heh
[23:25] <Psi-jack> Well, the SSD writes are generally WAAAAAY faster than platter writes.
[23:26] <darkfaded> Psi-jack: do you have a smaller-than-glibc libc on the arch box
[23:26] <darkfaded> ?
[23:26] * jlogan1 (~Thunderbi@2600:c00:3010:1:e12a:776f:2a6d:8a8) Quit (Ping timeout: 480 seconds)
[23:26] <lurbs> The chance of me actually getting enough SSDs to test that is .. slim.
[23:26] <Psi-jack> darkfaded: Eh?
[23:26] <darkfaded> uclibc instead of glibc for example
[23:26] <Psi-jack> lurbs: Heh.
[23:26] <darkfaded> i know some arch linux users do that
[23:26] <Psi-jack> darkfaded: Heck no! I don't want that garbage.
[23:26] <darkfaded> ok :)
[23:26] <Psi-jack> heh
[23:27] <nhm> I need to check with the btrfs guys again. There was a google summer of code project to allow you to put btrfs metadata on an external disk (ie SSD).
[23:27] <Psi-jack> glibc or spiked pit trap. Take your pick. :)
[23:27] <dmick> well you know ubuntu is using eglibc now
[23:27] <dmick> as is Debian IIRC
[23:28] <Psi-jack> Hmmmm...
[23:30] <Psi-jack> Hmmmm
[23:30] <Psi-jack> Yeah., you are correct. Interesting.
[23:30] <Psi-jack> Ironically, I'm slowly moving away from ubuntu. ;)
[23:31] <wer> btrfs or xfs for ceph?
[23:31] <Psi-jack> wer: Yes, which!?
[23:31] <wer> xfs for me.
[23:32] <wer> I thought I saw some weird throughput issues on here the other day.... but I lost my scrollback... is btrfs ready? Cause there are a ton of features.
[23:32] <Psi-jack> wer: It's not ready for prime time, yet.. But it's looking pretty solid, thus far.
[23:32] <lurbs> Personally I wouldn't touch BTRFS yet. One new scary technology at a time.
[23:33] <Psi-jack> I tested BtrFS using the very same script I wrote to kill SSD's, and it held up pretty good. :)
[23:33] <wer> lol right? That is what I felt lurbs!
[23:34] <wer> k. Hmmm. Well I am building a multinode ceph deployment now... but I chose xfs.... but only cause I am like lurbs in that regard.
[23:34] <Psi-jack> heh
[23:34] <nhm> we recommend xfs right now, but I think btrfs will be the future. Ext4 might be a good option.
[23:34] <Psi-jack> Well, I'll be maintaining secondary backups of everything on my primary storage cluster anyway.
[23:34] <wer> :)
[23:35] <nhm> ZFS would be fun to play with.
[23:35] * PerlStalker (~PerlStalk@ has joined #ceph
[23:35] <Psi-jack> nhm: heh.
[23:35] <Psi-jack> Not so much.
[23:35] <Psi-jack> Well.
[23:35] <Psi-jack> ZFS for what IT is, is pretty amazing.
[23:35] <Psi-jack> that's what I'm moving away from. :)
[23:35] <nhm> Psi-jack: ah, interesting
[23:35] <nhm> Psi-jack: ZOL?
[23:35] <Psi-jack> Yes
[23:36] <wer> well thanks nhm, I think that is what I was looking for. no need for filestore xattr use omap = true with xfs right.
[23:36] <wer> ?
[23:36] <Psi-jack> It's a bit on the slower side, but it maintains well longevity-wise, and the subvolume support is effing amazing.
[23:36] <Psi-jack> Could never get dedup to work on ZoL though.
[23:37] <nhm> wer: nope. I tend to use it anyway, but I've heard mixed reports on whether or not it's faster.
[23:37] <wer> ok. But it is just a speed issue not a safety one?
[23:37] <Psi-jack> And Ceph, rather much hated TRYING to use ZFS.. Heck, even had some major issues with an XFS block subvolume, just trying to initialize the OSDs.
[23:38] <nhm> wer: I'm not familiar enough with leveldb vs filesystem xattrs to know how much to be concerned about there.
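[Editor's note: the [osd] option wer asks about, for reference. As nhm says, it is not required with xfs; historically it mattered for ext4, whose xattr size limits are too small for some radosgw objects, pushing the xattrs into leveldb (omap) instead.]

```ini
; Optional with xfs/btrfs; effectively required on ext4 of this era.
[osd]
    filestore xattr use omap = true
```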
[23:38] <wer> k. No worries. ty!
[23:39] <wer> Psi-jack: stop scaring me with words like trying....
[23:39] <wer> :)
[23:39] <Psi-jack> heh
[23:39] <Psi-jack> wer: But, I am the WARRIOR that says, TRYING!! ;)
[23:39] <wer> Lets all HOPE they go away!!!!!!
[23:39] <Psi-jack> heh
[23:40] <wer> lol
[23:41] <wer> so I generated this gigantic conf file.... with 96 disks in it on 4 nodes.... I am wondering if there were any of those $name $id variables that would make the config more portable.... or smaller. Or those variables are only for the global parts... like [osd]?
[23:42] <wer> So few example configs floating around out there....
[23:43] <wer> And I will grow to 12 nodes shortly...... midtest.... hmmm. It seems like a lot of redundant config to me...
[23:47] <gregaf1> wer: you can use those variables anywhere, and you don't need to have the full cluster contents on every node, just the ones that matter for local daemons
[23:47] <wer> thank you gregaf1, so the mons handle all the magic of knowing all the nodes?
[23:48] <gregaf1> essentially yes, depending on how you do your daemon start-up
[23:49] <wer> Do the mon boxes have to have the entire cluster contents though? Cause I just want a unified config....
[23:49] <gregaf1> there are a lot of different ways you can set it up depending on what your approach is
[23:52] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:52] <wer> gregaf1: hmmm. ok. Well that clarifies things :) I think I will leave all the specifics in a unified config and let every host ignore what it doesn't need :)
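[Editor's note: a sketch of how the metavariables gregaf1 mentions can shrink a 96-disk ceph.conf like wer's. Host and path names are illustrative; $id and $name expand per daemon, so per-OSD sections can often collapse to a host line.]

```ini
; Shared settings expanded per daemon: for osd.12, $id becomes "12"
; and $name becomes "osd.12".
[osd]
    osd data = /var/lib/ceph/osd/ceph-$id
    osd journal = /var/lib/ceph/osd/ceph-$id/journal

; Each OSD section then only needs what is unique to it.
[osd.12]
    host = node03
```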
[23:52] <gregaf1> sorry, I've gotta go, later
[23:52] <wer> me too, no worries!
[23:52] <wer> ty
[23:54] * roald (~Roald@ Quit (Read error: Connection reset by peer)
[23:55] * ScOut3R (~scout3r@1F2E942C.dsl.pool.telekom.hu) Quit (Quit: Lost terminal)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.