#ceph IRC Log

IRC Log for 2012-10-22

Timestamps are in GMT/BST.

[0:04] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[0:07] * gregorg (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[0:07] * gregorg (~Greg@78.155.152.6) has joined #ceph
[0:07] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:07] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:13] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[0:20] <sage> imjustmatthew: i'd rather leave it undocumented at this point.
[0:21] <sage> eventually we want to support multiple mds clusters in the same ceph cluster, for example, and the current newfs requires pool numerical ids instead of names, etc.
[0:21] * The_Bishop_ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[0:28] * The_Bishop (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[0:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[0:34] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:59] * lofejndif (~lsqavnbok@9KCAACIIF.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:07] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) Quit (Remote host closed the connection)
[1:07] * nhm_ (~nh@184-97-251-146.mpls.qwest.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * todin (tuxadero@kudu.in-berlin.de) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * Meyer___ (meyer@c64.org) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * tontsa (~tontsa@solu.fi) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * hijacker (~hijacker@213.91.163.5) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * rz (~root@ns1.waib.com) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * wonko_be (bernard@november.openminds.be) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * coredumb (~coredumb@ns.coredumb.net) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * spaceman-39642 (l@89.184.139.88) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * jmcdice_ (~root@135.13.255.151) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * MarkS (~mark@irssi.mscholten.eu) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * liiwi (liiwi@idle.fi) Quit (reticulum.oftc.net charon.oftc.net)
[1:07] * nhm_ (~nh@184-97-251-146.mpls.qwest.net) has joined #ceph
[1:07] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[1:07] * Meyer___ (meyer@c64.org) has joined #ceph
[1:07] * tontsa (~tontsa@solu.fi) has joined #ceph
[1:07] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[1:07] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[1:07] * rz (~root@ns1.waib.com) has joined #ceph
[1:07] * wonko_be (bernard@november.openminds.be) has joined #ceph
[1:07] * coredumb (~coredumb@ns.coredumb.net) has joined #ceph
[1:07] * spaceman-39642 (l@89.184.139.88) has joined #ceph
[1:07] * jmcdice_ (~root@135.13.255.151) has joined #ceph
[1:07] * MarkS (~mark@irssi.mscholten.eu) has joined #ceph
[1:07] * liiwi (liiwi@idle.fi) has joined #ceph
[1:09] * nwatkins2 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[1:23] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:24] * nwatkins1 (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[1:27] * guilhem_ (~guilhem@2a01:e35:2e13:acd0:9c4f:8e55:417f:443b) Quit (Quit: Quitte)
[1:31] <phantomcircuit> so i have nodes which are massively different in terms of latency and disk throughput
[1:31] <phantomcircuit> am i right in assuming if i set the weight to be higher on the faster nodes in the CRUSH map that data will automagically migrate around in an optimal way?
[1:35] * grant (~grant@202-173-147-27.mach.com.au) has joined #ceph
[1:39] * Cube (~Adium@c-50-140-113-114.hsd1.fl.comcast.net) has joined #ceph
[1:39] * Cube (~Adium@c-50-140-113-114.hsd1.fl.comcast.net) has left #ceph
[1:59] * imjustmatthew (~imjustmat@pool-74-110-201-156.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[2:02] * grant (~grant@202-173-147-27.mach.com.au) Quit (Quit: Leaving)
[2:06] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[2:09] * nwatkins1 (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:10] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:23] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[2:27] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[2:34] * The_Bishop_ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[2:35] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[2:37] * The_Bishop_ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[2:44] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[2:44] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[2:59] * nwatkins1 (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[3:19] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:21] * maelfius (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) Quit (Quit: Leaving.)
[3:21] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:21] * morganfainberg (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[3:22] * morganfainberg is now known as Guest2657
[3:23] * Guest2657 (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) Quit ()
[3:23] * darkfaded (~floh@188.40.175.2) Quit (Remote host closed the connection)
[3:24] * mdrnstm (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[3:25] * darkfader (~floh@188.40.175.2) has joined #ceph
[3:26] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[3:28] * mdrnstm is now known as morgangainberg
[3:31] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[3:31] * morgangainberg is now known as mfainberg
[3:31] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[3:32] * mfainberg is now known as morganfainberg
[3:34] * The_Bishop_ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[3:46] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[3:46] * mdrnstm (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[3:46] * mdrnstm (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) Quit ()
[3:47] * morganfainberg (~mdrnstm@pool-71-160-33-115.lsanca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[4:04] * nwatkins1 (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:20] * deepsa (~deepsa@122.172.35.88) has joined #ceph
[4:22] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[4:45] * Cube (~Cube@12.248.40.138) has joined #ceph
[5:07] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[5:15] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[5:26] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:26] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:33] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[6:29] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[6:39] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[6:47] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[6:47] * calebamiles1 (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) has joined #ceph
[6:48] * calebamiles (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) Quit (Read error: Connection reset by peer)
[7:01] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[7:03] * f4m8 (f4m8@kudu.in-berlin.de) has joined #ceph
[7:10] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[7:22] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:33] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:41] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[7:42] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[7:47] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[7:47] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[8:05] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[8:14] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[8:17] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:22] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:22] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:29] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[8:38] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[8:50] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[8:56] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:57] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:10] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[9:19] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:21] * loicd (~loic@178.20.50.225) has joined #ceph
[9:25] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:27] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:41] <phantomcircuit> trying to mkcephfs
[9:41] <phantomcircuit> it gets to mon.a and fails
[9:41] <phantomcircuit> pastebin.com/raw.php?i=EaYdruZj
[9:41] <phantomcircuit> any idea what that error is about?
[9:42] <phantomcircuit> also why is it trying to rm /var/lib/ceph/mon/ceph-a ?
[9:47] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:55] <todin> phantomcircuit: do you have the folder /var/lib/ceph/mon?
[10:18] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:31] * The_Bishop_ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[10:31] <phantomcircuit> todin, yes
[10:32] <phantomcircuit> /var/lib/ceph/mon/ceph-a is a partition actually
[10:32] <phantomcircuit> is that not the right way to do that?
[10:35] * andret (~andre@pcandre.nine.ch) has joined #ceph
[10:37] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[11:01] <todin> phantomcircuit: it should be a filesystem
[11:02] <phantomcircuit> todin, i meant it's a btrfs fs with the root of the fs mounted at /var/lib/ceph/mon/ceph-a
[11:02] <phantomcircuit> which is presumably why the rm fails
[11:02] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) Quit (Quit: tryggvil)
[11:09] <Fruit> hrm, if I add filestore xattr use omap = true to existing osd nodes, will that break things?
[11:20] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:21] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Remote host closed the connection)
[11:21] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[11:34] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:34] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:44] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[11:45] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:45] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:49] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:50] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[11:50] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[12:00] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:00] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:13] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[12:19] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[12:20] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:20] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:40] * ballysta (0xff@111.223.255.22) Quit (Remote host closed the connection)
[12:42] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:42] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[12:46] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit (Read error: Connection reset by peer)
[12:49] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[12:49] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[12:49] * tryggvil_ is now known as tryggvil
[12:51] * MikeMcClurg (~mike@62.200.22.2) has joined #ceph
[12:58] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[13:11] * deepsa_ (~deepsa@101.63.207.210) has joined #ceph
[13:11] * deepsa (~deepsa@122.172.35.88) Quit (Ping timeout: 480 seconds)
[13:11] * deepsa_ is now known as deepsa
[13:33] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[13:41] * The_Bishop_ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[13:50] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:50] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Read error: Connection reset by peer)
[13:51] * samia (~samia@102.212.15.37.dynamic.jazztel.es) has joined #ceph
[13:53] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[13:53] * MikeMcClurg (~mike@62.200.22.2) Quit (Remote host closed the connection)
[13:54] * samia (~samia@102.212.15.37.dynamic.jazztel.es) Quit ()
[13:56] * BManojlovic (~steki@147.32.97.12) has joined #ceph
[13:58] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[14:07] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[14:11] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[14:12] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[14:13] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit (Remote host closed the connection)
[14:13] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[14:13] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Read error: No route to host)
[14:13] * tryggvil_ is now known as tryggvil
[14:14] * loicd (~loic@178.20.50.225) Quit (Ping timeout: 480 seconds)
[14:16] * steki-BLAH (~steki@bojanka.net) has joined #ceph
[14:20] * BManojlovic (~steki@147.32.97.12) Quit (Ping timeout: 480 seconds)
[14:26] * nwatkins_ (~nwatkins@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[14:27] * nwatkins_ (~nwatkins@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[14:32] * loicd (~loic@82.233.234.24) has joined #ceph
[14:37] * Leseb (~Leseb@193.172.124.196) Quit (Read error: Connection reset by peer)
[14:37] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[14:51] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[14:59] * lofejndif (~lsqavnbok@82VAAHDC9.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:09] * mgalkiewicz (~mgalkiewi@staticline-31-182-149-180.toya.net.pl) has joined #ceph
[15:11] * aliguori (~anthony@cpe-70-123-146-246.austin.res.rr.com) has joined #ceph
[15:16] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[15:18] * long (~chatzilla@118.195.65.95) has joined #ceph
[15:28] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[15:30] * alphe (~alphe@200.111.172.138) has joined #ceph
[15:30] <alphe> hello !
[15:31] <alphe> I still have the sizing problem, but now it is not shown in df -h; it shows up in the windows properties dialog
[15:31] <alphe> it's not really a problem since we can still browse the data perfectly
[15:32] <alphe> but it is "surprising" for the end user.
[15:35] <alphe> as a side note I reached 230 Mb/s through windows -> samba/ubuntu -> ceph cluster
[15:37] * mikegrb (~michael@mikegrb.netop.oftc.net) Quit (Remote host closed the connection)
[15:40] <match> alphe: is that smb sitting on top of cephfs?
[15:52] * jpds (~jpds@faun.canonical.com) has joined #ceph
[15:53] * gregaf (~Adium@2607:f298:a:607:3d35:1293:755a:2ee9) Quit (Quit: Leaving.)
[15:57] * gregaf (~Adium@2607:f298:a:607:e8dc:95dd:2ab0:ecb5) has joined #ceph
[15:57] * joao (~JL@89.181.150.224) Quit (Remote host closed the connection)
[15:59] <gregaf> that's weird…Sage suggested that maybe the wrong sizes are because we use unusual block sizes in order to report large enough disk sizes
[16:01] <mgalkiewicz> is ceph replication smart enough to replicate data to osds running on different servers? I have 2 osd instances per server.
[16:01] <gregaf> depends on how you set up the cluster
[16:02] <gregaf> if you used mkcephfs and specified the host fields correctly, and have more than two hosts, then yes
[16:02] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[16:02] <gregaf> you can see the failure domains semi-visually if you run "ceph osd tree"
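
A minimal sketch of the kind of ceph.conf layout gregaf is describing, with a host field per OSD so mkcephfs can build separate failure domains (the hostnames and paths here are invented for illustration):

    [osd.0]
        host = node-a
        osd data = /var/lib/ceph/osd/ceph-0
    [osd.1]
        host = node-a
        osd data = /var/lib/ceph/osd/ceph-1
    [osd.2]
        host = node-b
        osd data = /var/lib/ceph/osd/ceph-2

    # then inspect the resulting hierarchy:
    ceph osd tree
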
[16:03] * deepsa (~deepsa@101.63.207.210) Quit (Ping timeout: 480 seconds)
[16:04] * deepsa (~deepsa@122.167.173.220) has joined #ceph
[16:04] <gregaf> also
[16:04] <gregaf> morning ceph!
[16:04] <gregaf> (haha I beat nhm_ and joao!)
[16:06] * joao (~JL@89.181.150.224) has joined #ceph
[16:07] <gregaf> phantomcircuit: yes, OSDs are allocated data (and thus IO) proportional to their weight, so if you have bigger and faster disks then you should increase their weight
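
As a rough illustration of gregaf's point about weights, a decompiled CRUSH map carries a per-device weight inside each host bucket; giving the faster/bigger nodes a larger weight makes CRUSH place proportionally more data (and therefore IO) on them. The bucket names, ids and weights below are made up:

    host node-a {
        id -2
        alg straw
        hash 0
        item osd.0 weight 2.000
    }
    host node-b {
        id -3
        alg straw
        hash 0
        item osd.1 weight 1.000
    }
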
[16:07] <gregaf> I believe your mkcephfs fails because it's a filesystem root, as you said; mkcephfs is trying to ensure that it's got a clean starting point
[16:10] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[16:19] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[16:20] <mgalkiewicz> gregaf: I have just added another osd to ceph.conf with the same host name and different path to data, created filesystem and started
[16:21] <gregaf> then no, probably not
[16:21] <gregaf> You can adjust the map so that it is isolating them properly, though
[16:22] <mgalkiewicz> ok
[16:22] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:22] <gregaf> http://ceph.com/docs/master/cluster-ops/crush-map/#addosd
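
For the two-OSDs-per-server case mgalkiewicz describes, the linked page covers moving each OSD under its own host bucket so replicas land on different servers. The exact command syntax has changed between releases, so treat this as a sketch and defer to the docs; the id, name, weight and host below are invented:

    ceph osd crush set 1 osd.1 1.0 pool=default host=server-b
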
[16:23] <mgalkiewicz> and what are the pros and cons of storing the journal on tmpfs? what happens when the server crashes?
[16:24] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[16:25] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:27] <gregaf> you lose the journal
[16:27] <gregaf> that's bad
[16:27] <gregaf> don't do that unless you really don't care
[16:27] <gregaf> if you're using xfs or ext3 for your backing store, losing the journal means you've lost the OSD
[16:28] <mgalkiewicz> I am using btrfs
[16:28] <gregaf> if you're using btrfs it'll just kick the OSD back in time a bit, but you've still lost writes
[16:28] <NaioN> it defeats the purpose of a journal :)
[16:28] <mgalkiewicz> this is what I thought however I've seen that some people do this
[16:29] <NaioN> it's better to have no journal in that case
[16:29] <NaioN> well it will be slower
[16:29] <mgalkiewicz> is it better to keep the data or journal on ssd disk (compare to normal disks)?
[16:30] <mgalkiewicz> in terms of performance
[16:30] <NaioN> using tmpfs as journal gets you performance but maybe for a huge price
[16:30] <NaioN> yes
[16:30] <NaioN> ideally you have a small fast journal so you can cope with a lot of writes
[16:32] <mgalkiewicz> but is it a good idea to put data on ssd or is it quite fast with just journal on such disk?
[16:33] <gregaf> sure, it's great to run everything on SSDs
[16:33] <gregaf> or battery-backed RAM would be even better :D
[16:34] <gregaf> but most people want a bit more storage than they can get out of SSDs so they just use the SSD for a journal (fast commits) and then let it flush out to a regular disk
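
A minimal, hypothetical ceph.conf fragment for the layout gregaf describes (journal on a small fast SSD partition, filestore on a big spinning disk); device names and paths are invented:

    [osd.0]
        osd data = /var/lib/ceph/osd/ceph-0   # filestore on a regular disk
        osd journal = /dev/sdb1               # small raw partition on the SSD

With a file-based journal you would instead point "osd journal" at a file and set "osd journal size" (in MB).
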
[16:34] <todin> gregaf: do you know any recent battery-backed ram modules?
[16:34] <Fruit> gregaf: http://www.ramsan.com/products/rackmount-flash-storage/ramsan-630
[16:34] <gregaf> lol, no, I don't know anything about them beyond lusting after the idea ;)
[16:35] * noob2 (a5a00214@ircip4.mibbit.com) has joined #ceph
[16:35] * steki-BLAH (~steki@bojanka.net) Quit (Ping timeout: 480 seconds)
[16:35] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[16:35] <gregaf> I suppose that if you were doing a lot of random writes you might be happier with a spinning rust journal (since those writes are all sequential anyway) and an SSD backing store (doesn't care about random versus sequential IO) in some cases
[16:36] <noob2> sagewk: i saw recently that inktank started offering professional support contracts for Ceph. Would it be possible to get a quote for what support would look like? We're over in delaware here. Would that be a problem?
[16:38] <gregaf> noob2: you'll want to email dona@inktank.com to discuss that
[16:39] <noob2> thanks.
[16:39] <gregaf> I know she can do quotes but I doubt Sage knows the numbers ;)
[16:39] <noob2> gotcha :)
[16:39] <gregaf> and location isn't a problem
[16:39] <noob2> ok
[16:39] <noob2> yeah i'm thinking of putting together a ceph cluster
[16:39] <noob2> i have traction here
[16:39] <noob2> i was able to push a 150TB gluster cluster through so I think why not thin provisioned block storage
[16:40] <gregaf> heh
[16:40] <noob2> they're open minded here but i doubt they'd let me build block storage without some kind of support backing
[16:41] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[16:48] <noob2> gregaf: do you have a large ceph cluster ?
[16:48] <gregaf> me?
[16:48] <gregaf> not unless I get to count DreamObjects next door ;)
[16:48] <noob2> yeah just wondering
[16:48] <gregaf> (I work at Inktank)
[16:48] <noob2> oh ok
[16:49] <noob2> i did have a question about the ceph protocol
[16:49] <noob2> with rados block devices and the kernel mount options, what protocol does that use to talk to the backend storage?
[16:49] <noob2> i'm wondering what a failover scenario would look like should one of my osd's go down
[16:50] <gregaf> it's using the custom Ceph protocol
[16:50] <noob2> i see
[16:50] <gregaf> so, same as with anything else in Ceph
[16:50] <gregaf> an OSD dies, it gets timed out and the client gets a new map and talks to a different OSD for that data
[16:51] <noob2> interesting
[16:51] <noob2> i saw an article recently on creating a ceph proxy server and then serving up block devices over fibre channel
[16:51] <gregaf> there are some people doing that to get an iscsi block device
[16:52] <gregaf> (or similar)
[16:52] <noob2> right
[16:52] <gregaf> but it's just re-exporting
[16:52] <noob2> serving it over LIO?
[16:52] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[16:52] <noob2> right
[16:53] <Fruit> noob2: I'm experimenting with exactly that (ceph-over-fc)
[16:54] <noob2> Fruit: can you detail your build setup a little?
[16:55] <Fruit> 3 debian wheezy vm's for ceph, one wheezy physical machine with a fc hba as a proxy, another physical with debian squeeze that is the client
[16:57] <Fruit> of course, because this wasn't experimental enough, I'm now trying to run an osd off zfs-on-linux
[16:57] <alphe> match yes smbd sitting on top of cephfs
[16:58] <alphe> meaning you mount cephfs the regular way, then tell samba that the directory where the cephfs partition is mounted is a share
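
A hedged sketch of the setup alphe describes (the monitor address, mount point and share name are placeholders): mount cephfs with the kernel client, then export the mount point through smb.conf:

    mount -t ceph 192.168.0.1:6789:/ /mnt/ceph -o name=admin,secret=<admin key>

    # smb.conf
    [ceph]
        path = /mnt/ceph
        read only = no
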
[16:58] <alphe> gregaf hello :)
[16:59] <gregaf> hey
[16:59] <alphe> I still have problems with the size evaluation but only on the windows side
[16:59] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:59] <alphe> I don't know why I see only 176G as disk size in the disk properties instead of 44 TB
[17:00] <gregaf> yeah
[17:00] <Fruit> windows probably stops counting at 640k or something
[17:00] <alphe> but on the linux side now it displays ok when doing df -h
[17:00] <alphe> fruit hehehe
[17:00] <alphe> fruit hum I have samba shared zfs and see the size properly
[17:01] <alphe> can be samba that has a limitation don't know how I can track down that ...
[17:01] <gregaf> haha, I bet Sage was right
[17:01] <alphe> hum maybe mounting the partition over cifs from a linux box and seeing if the size is properly displayed
[17:01] <gregaf> 44TB/176GB = 256
[17:02] <gregaf> default block size is usually 4KB, but we use 1MB for ours in order to be able to count high enough to report proper sizes
[17:02] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:02] <gregaf> 1MB/4KB = 256
[17:02] <alphe> ok so is it windows' evaluation that is lacking, or is it samba that doesn't broadcast the proper information?
[17:03] <alphe> if it's on windows side I fear that will never be solved :P
[17:03] <gregaf> heh
[17:03] <gregaf> I don't know — I'm not familiar enough with Samba or Windows internals
[17:03] <alphe> if that is on the broadcasted info by samba then we can cheat with is
[17:03] <gregaf> I'd guess/hope that it's Samba, but there could be a protocol limitation too
[17:04] <alphe> if that is on the broadcasted info by samba then we can cheat with it and tell it it's a 256 times bigger space to get the proper display no ?
[17:04] <gregaf> and I imagine that Arch was doing the same thing
[17:04] <gregaf> alphe: yeah
[17:04] <alphe> yeah probably !
[17:04] * steki-BLAH (~steki@bojanka.net) has joined #ceph
[17:05] <alphe> I'm doing that virtualbox samba sharing because our initial idea, to make a dokanFS based client for the ceph cluster, was too long to implement ...
[17:05] <gregaf> so, I guess submit a bug report saying that Windows shares don't show the proper sizing information if the underlying FS has a non-4KB block size
[17:05] <alphe> compared to two years of specialised development versus a bunch of hours once in a while, my boss and investors selected the fast and easy way
[17:05] <gregaf> (to Samba, that is)
[17:06] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[17:06] <alphe> gregaf you understand that bug better than me I guess you will explain it better with the solve proposition :)
[17:07] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[17:07] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[17:07] <gregaf> heh
[17:08] <gregaf> alphe: do you know where their bug system is? I don't use Samba
[17:08] <alphe> no
[17:09] <gregaf> k, guess I'll try and dig it up at some later point
[17:09] <gregaf> bbiab
[17:09] <alphe> and I guess they will need me to register ... which is always a tremendous loss of time
[17:10] <alphe> as it is a problem with arch linux too (though on the archlinux side you can't browse more than 176GB of the data ... which is more problematic: the displayed size limits what you are allowed to see ...)
[17:11] <alphe> on windows it is no more than a cosmetic problem since you can browse the ceph store perfectly, well past the displayed limit, and fast
[17:11] <alphe> on archlinux you experience a tremendous lag ...
[17:11] <alphe> with kernel 3.6
[17:13] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:17] * morse (~morse@supercomputing.univpm.it) Quit (Read error: Connection reset by peer)
[17:18] * sagelap (~sage@167.sub-70-197-140.myvzw.com) has joined #ceph
[17:19] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:21] <buck> I have yet another teuthology question if anyone is about that can field it
[17:22] <buck> I'm getting a BadHostKeyException. From the interwebs, it seems that paramiko defaults to host-based authentication rather than user-based authentication
[17:22] <buck> i'm wondering if the expectation is that people setup host-based certs for using teuthology
[17:22] <buck> or I may be paddling up the wrong creek on this
[17:27] <gregaf> buck: I think that just means that you locally don't like what the remote server is giving you
[17:27] <gregaf> when trying to ssh into it
[17:27] <noob2> would it be possible to layer encryption under rados block devices?
[17:28] <gregaf> noob2: depends on the kind of encryption — if it's transparent to users, then sure
[17:28] <noob2> yeah i was thinking transparent to the users
[17:28] <gregaf> there's a feature in the tracker to support dmcrypt or something as a plug-and-play option, but we haven't started on it yet
[17:28] <noob2> any tips on how that might work?
[17:28] <noob2> interesting
[17:28] <noob2> so would that be implemented on the backend osd devices?
[17:29] * pentabular (~sean@70.231.129.172) has joined #ceph
[17:29] <gregaf> noob2: not really sure — http://tracker.newdream.net/issues/3273 is about as much as I know
[17:29] <noob2> ok thanks i'll check it out :)
[17:32] <buck> gregaf: thanks for the help. turns out I needed to use the hosts public key and not the user I'd specified in the yaml file. I'm thinking this (and a few other small things) that I've run into should be added to the README.
[17:32] <gregaf> yep, go for it :)
[17:33] * prometheanfire (~promethea@rrcs-24-173-105-83.sw.biz.rr.com) has left #ceph
[17:41] * Tv_ (~tv@2607:f298:a:607:190f:ecf7:102b:da8f) has joined #ceph
[17:42] * cdblack (8686894b@ircip3.mibbit.com) has joined #ceph
[17:43] * sagelap1 (~sage@52.sub-70-197-146.myvzw.com) has joined #ceph
[17:43] * vata (~vata@208.88.110.46) has joined #ceph
[17:46] * sagelap (~sage@167.sub-70-197-140.myvzw.com) Quit (Ping timeout: 480 seconds)
[17:49] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[17:55] * sagelap2 (~sage@167.sub-70-197-140.myvzw.com) has joined #ceph
[17:56] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[17:57] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) has joined #ceph
[17:57] * sagelap3 (~sage@2600:1013:b02a:84df:c0d5:5838:5b8d:e88a) has joined #ceph
[17:59] * sagelap1 (~sage@52.sub-70-197-146.myvzw.com) Quit (Ping timeout: 480 seconds)
[17:59] * long (~chatzilla@118.195.65.95) Quit (Quit: ChatZilla 0.9.89 [Firefox 15.0.1/20120905151427])
[18:03] * sagelap2 (~sage@167.sub-70-197-140.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:05] * sagelap3 (~sage@2600:1013:b02a:84df:c0d5:5838:5b8d:e88a) Quit (Ping timeout: 480 seconds)
[18:09] * rweeks (~rweeks@12.25.190.226) has joined #ceph
[18:09] * sagelap (~sage@2607:f298:a:607:1d17:77a3:d8f5:26d2) has joined #ceph
[18:10] * steki-BLAH is now known as BManojlovic
[18:11] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[18:11] * gregaf (~Adium@2607:f298:a:607:e8dc:95dd:2ab0:ecb5) Quit (Quit: Leaving.)
[18:12] * gregaf (~Adium@2607:f298:a:607:20e0:1784:3237:9cd4) has joined #ceph
[18:16] <noob2> gregaf: if you were using ceph just for rbd could you skip the metadata servers?
[18:16] <gregaf> yep
[18:16] <noob2> awesome :D
[18:16] <noob2> i had a feeling that was the case
[18:17] <gregaf> they're only for POSIX
[18:17] <gregaf> that's kind of the point ;)
[18:17] <noob2> right
[18:23] * yehudasa (~yehudasa@2607:f298:a:607:6534:f1b7:6a0b:6dfb) Quit (Ping timeout: 480 seconds)
[18:25] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) has joined #ceph
[18:25] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[18:31] * yehudasa (~yehudasa@2607:f298:a:607:dcb7:393:6594:a865) has joined #ceph
[18:37] * BManojlovic (~steki@bojanka.net) Quit (Ping timeout: 480 seconds)
[18:41] * BManojlovic (~steki@bojanka.net) has joined #ceph
[18:41] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[18:41] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[18:57] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:58] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:03] * loicd (~loic@82.233.234.24) Quit (Quit: Leaving.)
[19:10] * BManojlovic (~steki@bojanka.net) Quit (Ping timeout: 480 seconds)
[19:11] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:11] * mgalkiewicz (~mgalkiewi@staticline-31-182-149-180.toya.net.pl) Quit (Remote host closed the connection)
[19:11] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[19:11] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:25] <dmick> buck: note there's an 'ssh-keyscan' to update your local known_hosts file, too, which can be helpful
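
A hedged sketch of what dmick and buck are getting at: a teuthology targets entry pairs a user@host with that host's public key (not the user's), and ssh-keyscan can pre-populate known_hosts. The hostname and key below are placeholders:

    ssh-keyscan testnode1 >> ~/.ssh/known_hosts

    # targets stanza in the job yaml
    targets:
      ubuntu@testnode1: ssh-rsa AAAAB3Nza...   # the host's public key
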
[19:27] <alphe> gregaf I did more tests on the 176G issue on windows: if I transfer a big directory, 2 TB for example, the copy process crashes without an error message during the initial data evaluation
[19:27] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:27] * The_Bishop_ (~bishop@e179020233.adsl.alicedsl.de) has joined #ceph
[19:27] <alphe> you know, the stage where it tries to evaluate how much is to be sent and how much time it will take to send that much data
[19:32] * danieagle (~Daniel@186.214.57.151) has joined #ceph
[19:32] * The_Bishop__ (~bishop@e179004000.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[19:33] * loicd (~loic@90.84.144.146) has joined #ceph
[19:33] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:34] <sagewk> sjust: can you check wip-3142?
[19:34] <sjust> sagewk: yeah
[19:34] <sagewk> just repushed
[19:34] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit ()
[19:36] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:37] <buck> dmick: cool. I'll check that out. Thanks.
[19:37] * Tamil (~Adium@2607:f298:a:607:9f4:3279:3ca8:c8ff) has joined #ceph
[19:41] <sjust> sagewk: that looks good
[19:41] <sagewk> sjust: can you repush wip_journal_perf with the latest fixes?
[19:42] <sjust> yes
[19:45] <sjust> pushed
[19:45] <sagewk> tnx
[19:45] <sjust> I still need to clean up the finisher instrumentation
[19:46] <sjust> I'll do that now
[19:46] * rweeks (~rweeks@12.25.190.226) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[19:48] * Tamil (~Adium@2607:f298:a:607:9f4:3279:3ca8:c8ff) has left #ceph
[19:48] <gregaf> alphe: do you know if that works in samba to begin with?
[19:48] <gregaf> sounds to me like you're exceeding a 32-bit counter limitation
[19:49] * Tamil (~Adium@2607:f298:a:607:9f4:3279:3ca8:c8ff) has joined #ceph
[19:49] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:51] * Tamil1 (~Adium@38.122.20.226) has joined #ceph
[19:53] * Tamil (~Adium@2607:f298:a:607:9f4:3279:3ca8:c8ff) Quit (Read error: Connection reset by peer)
[20:00] <alphe> gregaf I will connect a linux box to that VM and see if samba displays the data
[20:00] <gregaf> basically that's all in the Samba pipeline anyway so unless you're seeing Ceph crash, or Samba is crashing and complaining about the FS, then it's Not Our Problem
[20:00] <gregaf> ;)
[20:00] <gregaf> :(
[20:01] <alphe> :)
[20:02] <joao> :(
[20:04] <alphe> don't know; in fact if kernel 3.6 changed the way it interprets the cephfs size, it can be your future problem
[20:04] <alphe> the linux world will not stay on 3.4 forever ...
[20:07] * danieagle (~Daniel@186.214.57.151) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[20:09] <alphe> hum this is fun
[20:10] <alphe> samba cifs mounted on linux works fine
[20:10] <alphe> //192.168.0.250/ceph 33T 5.5T 28T 17% /mnt/ceph
[20:10] <alphe> so it is a problem in windows ?
[20:10] <alphe> (I lost 11TB cause a osd is having network interface troubles )
[20:11] * noob2 (a5a00214@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[20:12] <gregaf> alphe: I think the ArchLinux thing is the same size reporting problem in a different code base; it's certainly not related to kernel 3.6
[20:12] <alphe> I hope so
[20:12] <gregaf> and yep, sounds like a problem with Windows
[20:12] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[20:12] <gregaf> Samba really isn't anything I know about though, so...*shrug*
[20:13] * BManojlovic (~steki@bojanka.net) has joined #ceph
[20:13] <alphe> me neither :) I'm just happy to be able to play with it without breaking my company's netbios network :P
[20:13] <alphe> ahahahaha
[20:13] * rweeks (~rweeks@12.25.190.226) has joined #ceph
[20:15] <tontsa> alphe, try changing the "block size" parameter under the share in smb.conf.. i think it defaults to 1024.. so if you have 4096-byte blocks you see the wrong free size
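
tontsa's suggestion as an smb.conf fragment: if the underlying cephfs reports 1MB blocks (as gregaf worked out above), the per-share value would presumably be 1048576 bytes; the share name and path are placeholders:

    [ceph]
        path = /mnt/ceph
        block size = 1048576
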
[20:15] * rturk (~rturk@ps94005.dreamhost.com) Quit (Quit: Coyote finally caught me)
[20:15] * rturk (~rturk@ps94005.dreamhost.com) has joined #ceph
[20:18] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[20:24] * BManojlovic (~steki@bojanka.net) Quit (Ping timeout: 480 seconds)
[20:24] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:28] <buck> I have a small (4 line) change to the README in teuthology in my branch (wip-buck). Could someone take a look at it and verify that it seems reasonable?
[20:28] <buck> 9841805b386775c271e35b9ec20ab3ad578af998
[20:29] <joshd> buck: looks good to me
[20:29] <buck> joshd: thanks.
[20:30] <dmick> buck: I'd change "this is located" to "these are located" to make number match in the paragraph, but sure otherwise
[20:30] <dmick> (and "the ssh key" to "the ssh keys")
[20:31] <buck> dmick: ahh, right on. I wrote it on my host (with one key) and then tried to line it up with the 3-node example without changing singulars to plural. Thanks for catching that.
[20:32] <dmick> I understand completely and yw :)
[20:34] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[20:34] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit ()
[20:38] * loicd (~loic@90.84.144.146) Quit (Quit: Leaving.)
[20:39] * Tamil1 (~Adium@38.122.20.226) has left #ceph
[20:40] * Tamil1 (~Adium@38.122.20.226) has joined #ceph
[20:41] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[20:45] * noob2 (a5a00214@ircip4.mibbit.com) has joined #ceph
[20:45] * rweeks (~rweeks@12.25.190.226) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[20:46] * justinwarner (~Thunderbi@130.108.232.145) Quit (Quit: justinwarner)
[20:48] * justinwarner (~Thunderbi@130.108.232.145) has joined #ceph
[20:49] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[21:03] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[21:12] <AaronSchulz> how does one prune an osd journal file?
[21:12] <todin> nhm_: hi, with the four 520 Intel SSDs for the journal I get a throughput of 1100MB/s
[21:12] <noob2> sweet
[21:12] <noob2> i'm looking into buying intel 910's for our setup
[21:12] <noob2> pci express is the way to go :D
[21:13] <todin> noob2: I am going to test the 910 at the end of the week, but I don't think it will be faster
[21:14] <alphe> tontsa I don't have the block size parameter set; thanks for the suggestion, I will try it
[21:14] <noob2> todin: what OS are you going to use for the 910's?
[21:14] <noob2> i'm trying to figure out which would be best for it
[21:14] <todin> noob2: ubuntu 12.04LTS
[21:14] <noob2> they have rpm's for redhat5/6 but i dnono
[21:14] <noob2> yeah i was thinking the same
[21:14] <noob2> so you're going to build from source ?
[21:15] <todin> noob2: why do I need software for the 910 Intel ssd? it is a normal sas device which the kernel supports
[21:16] <noob2> oh
[21:16] <noob2> i didn't know that
[21:16] <noob2> i thought i needed a kernel driver for it
[21:16] <noob2> that's even better news :D
[21:16] <noob2> you're talking about the intel pci express boards right?
[21:16] <todin> noob2: but be careful, they are displayed as four devices
[21:16] <todin> noob2: yes
[21:16] <alphe> tontsa > the right block size should be 10240 ? (1MB)
[21:16] <noob2> yeah i heard i need to raid0 them through software
[21:17] <noob2> which is why i'm thinking of upgrading the processors on the boxes i want to build.
[21:17] <todin> noob2: this one http://www.intel.com/content/www/us/en/solid-state-drives/solid-state-drives-910-series.html
[21:17] <noob2> that's the one :)
[21:17] <noob2> the specs are impressive on it
[21:18] <todin> I use the card a bit diffrently, I have for osd's on the box, and every journal comes on one of the devices
[21:18] <noob2> todin: so you just plugged it in, raid0'd it and flew?
[21:18] <noob2> yeah
[21:18] <noob2> so you're using xfs then?
[21:19] <todin> I mean four osds
[21:19] <todin> noob2: no, raw partition
[21:19] <noob2> oh ok
[21:19] <PerlStalker> ceph -s is reporting 'X pgs stuck unclean'. What do I do to resolve this?
[21:20] <todin> noob2: but four 520 will be faster.
[21:20] <noob2> oh i believe it
[21:20] <noob2> i'm getting a quote on an hp 585 to plug 9 intel 910's into it
[21:21] <todin> are the 910 just for the journal or for the storage as well?
[21:21] <noob2> i'm planning on using them for storage at this point
[21:21] <noob2> with another machine running some spinning disks
[21:21] <noob2> tiering it sorta
[21:22] <todin> noob2: sounds quite fast, and you have enough pci-bus bandwidth for the 9 ssds?
[21:22] <noob2> yeah the dl585 from HP is a monster
[21:22] <noob2> 11 pci x8 slots
[21:23] <noob2> todin: what kinda speedup did you see from moving the journal onto ssd's?
[21:23] <noob2> i have a gluster cluster that could benefit from that i think
[21:24] <todin> noob2: with 15k sas disk I got around 700MB/s
[21:24] <noob2> wow
[21:24] <noob2> not bad
[21:25] <todin> and atm the 10GbE link is the bottleneck
[21:25] <noob2> i thought sas disks topped out at 500MB/s
[21:25] * sage (~sage@76.89.177.113) Quit (Ping timeout: 480 seconds)
[21:25] <noob2> heh
[21:25] <noob2> yeah my 8Gb fiber is going to be a huge bottleneck
[21:25] <noob2> same with 10Gb ethernet
[21:26] <todin> noob2: I used 4 15k sas disks, so it's around 175MB/s per disk
[21:26] <noob2> oh ok
[21:26] <noob2> raid'd them?
[21:27] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[21:28] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:28] <todin> each of the 15k disks is for a journal; for the filestore I have 12 7.2k sas disks, in raid0 sets of 3
[21:28] <noob2> interesting
[21:29] <todin> and today I replaced the 15k disks with 520 ssds
[21:29] <noob2> sweet :D
[21:29] <noob2> sounds like that is going to be a really killer setup
[21:29] <todin> I tried 710 as well, but they are sata-2 only
[21:29] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[21:30] <noob2> didn't like the sata?
[21:31] <noob2> brb i have to head into the datacenter to kick off an install
[21:31] <todin> sata-2 goes only up to 300MB/s, the 520 which are sata-3 push around 450MB/S
[21:47] <alphe> the samba block size parameter doesn't change anything
[21:48] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:48] * The_Bishop_ (~bishop@e179020233.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[21:49] <dmick> AaronSchulz: not sure you do; why do you believe you want to?
[21:50] <AaronSchulz> I just have a tiny testing setup with a small partition, and those journals are killing it :)
[21:51] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[21:53] * rweeks (~rweeks@12.25.190.226) has joined #ceph
[21:54] <dmick> PerlStalker: have you had a look at http://ceph.com/docs/master/cluster-ops/troubleshooting-osd/#stuck-placement-groups ?
[21:55] <dmick> AaronSchulz: killing it as in "they're too large"? They're essentially circular buffers, so allocated at startup time and don't grow.
[21:55] <PerlStalker> dmick: I hadn't seen that page.
[21:55] <dmick> if you want to just make them smaller, you can shut down the OSDs, change the size, remove the journal, and restart the OSDs
[21:56] <AaronSchulz> so deleting them won't break anything?
[21:56] <dmick> let me doublecheck, but I believe once the OSD is down, its journals have been consumed
[21:59] <dmick> AaronSchulz: actually, bring the OSD down, then start it with --flush-journal, which will only flush the journal and then stop again. see ceph-osd(8)
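
Putting dmick's advice together, a hedged sketch of shrinking a file-based journal (the osd id, size and paths are examples only):

    service ceph stop osd.0
    ceph-osd -i 0 --flush-journal        # flush the journal and stop, per ceph-osd(8)
    # set "osd journal size = 256" (MB) in ceph.conf, then:
    rm /var/lib/ceph/osd/ceph-0/journal
    ceph-osd -i 0 --mkjournal
    service ceph start osd.0
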
[22:01] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[22:02] <dmick> PerlStalker: let us know if that provides any clues
[22:04] <PerlStalker> dmick: I'm seeing a bunch of "stuck peering" messages when I run ceph health detail.
[22:06] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[22:07] <AaronSchulz> dmick: it is still 1.1G, hmm
[22:07] <dmick> PerlStalker: http://ceph.com/docs/master/cluster-ops/troubleshooting-osd/#placement-group-down-peering-failure may be useful
[22:07] <dmick> AaronSchulz: you have to change the size in the ceph.conf; did you?
[22:07] <AaronSchulz> yeah, I did 256
[22:08] <dmick> and you deleted the file (I assume it must be a file, and not a device) after the --flush-journal?
[22:09] <PerlStalker> dmick: The list in 'peering_blocked_by' is empty on every pg I've looked at so far. Perhaps I'm not waiting long enough.
[22:09] <dmick> PerlStalker: what's the history here? Did one or more OSDs go down, and you restarted them?
[22:11] <PerlStalker> dmick: I'm testing recovery on a test cluster. I stopped 2 of 3 osds and brought them back up.
[22:11] <dmick> ok. so we expect repeering at least
[22:11] <dmick> ceph -w can be handy to watch what the cluster is doing
[22:12] <dmick> maybe you can see that it's making progress on recovering the pgs
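
For reference, the commands being used in this exchange (the pgid is a placeholder):

    ceph -w                  # watch cluster events and recovery progress
    ceph health detail       # list the stuck pgs
    ceph pg <pgid> query     # peering state, including peering_blocked_by
    ceph osd dump            # osd up/in state and addresses
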
[22:13] <PerlStalker> I'm seeing "[WRN] slow request" messages with a bunch of numbers then 'v4 currently delayed'
[22:15] <AaronSchulz> dmick: ok I forgot to delete it, then it started hanging and stopped working until I ran --mkjournal
[22:15] <dmick> PerlStalker: can you paste 'ceph osd dump' to your favorite pastebin?
[22:15] * AaronSchulz also enlarged the partition a bit
[22:16] <dmick> AaronSchulz: ah, yes, I should have said
[22:16] <dmick> good
[22:16] <AaronSchulz> what does "x pgs degraded" mean for ceph health?
[22:17] <dmick> AaronSchulz: basically that those PGs are currently going through recovery with the OSD active set
[22:17] <dmick> it's semi-normal after some sort of cluster event
[22:17] <AaronSchulz> pg=placement groups?
[22:17] <dmick> yes
[22:18] <dmick> partitions of the object namespace, basically
[22:18] <PerlStalker> dmick: http://pastebin.com/ysKASth0
[22:20] <dmick> PerlStalker: so the OSDs are all up/in, and presumably are making progress toward peering and updating. The "slow request" warnings are a little worrisome but can happen under heavy recovery load sometimes; if they don't clear up we'll have to look more closely
[22:20] <dmick> you might check the general health of the OSD machines; make sure they're not swapping or having network troubles
[22:21] <PerlStalker> dmick: Will do
[22:25] * justinwarner1 (~ceg442049@osis111.cs.wright.edu) has joined #ceph
[22:25] <justinwarner1> How do you test whether or not Ceph is running correctly?
[22:28] <dmick> justinwarner1: a quick test is "ceph -s"
[22:28] <dmick> to see the cluster status
[22:30] <justinwarner1> dmick, if that doesn't return anything though, it just stalls, I'm guessing that means it didn't create correctly?
[22:30] <dmick> right, something's not running if that fails
[22:30] <justinwarner1> Alright, sweet, thanks.
[22:30] <dmick> pgrep for ceph- processes
[22:30] <dmick> maybe check the logs to see if one of them died, and why
[22:31] <joshd> did you run 'service ceph start -a' or similar?
[22:32] <justinwarner1> No ceph processes on the machine. Honestly I am just going off the wiki to set this up. I'm doing this in a school lab, so I can't use the normal setup where ssh puts the files in place, so I'm trying to do it manually.
[22:36] <justinwarner1> @ service ceph start -a, it says -a not found (/etc/ceph/ceph.conf defines "")
[22:36] <cephalobot`> justinwarner1: Error: "service" is not a valid command.
[22:36] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:37] <dmick> yeah, I think it's service ceph -a start
[22:37] <dmick> I can never remember
[22:38] <dmick> but the point is, how did you try to start the procs, I guess
[22:39] <justinwarner1> Couldn't start it because it says I need root, and I only have a su account my professor gave me.
[22:40] <dmick> well....su will give you root, right?...and do you have sudo?
[22:41] <dmick> so what exactly are you typing, I guess is a question
[22:43] <alphe> bye all
[22:43] <justinwarner1> I don't think I ever did anything to try to start the service; I did the mkcephfs for the mon, osd, and mds. And yeah, it does, but when it does the ssh (in setup too, but also when running the command you told me to actually start the service) it asks for a password for root@hostname, which I don't have access to.
[22:44] * alphe (~alphe@200.111.172.138) has left #ceph
[22:49] * sagelap (~sage@2607:f298:a:607:1d17:77a3:d8f5:26d2) Quit (Ping timeout: 480 seconds)
[22:52] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[22:53] <iggy> I don't think the init scripts are expected to be used in that type of situation, you'll probably have to login to each host as your user then su/sudo to start the services individually
[22:55] <todin> hi, is there a video of the talks Josh gave at the openstack summit?
[22:56] <rweeks> not sure if those were recorded or not
[22:56] <justinwarner1> iggy: That's what I'm working on now. Nearly done.
[22:57] * The_Bishop (~bishop@p4FCDE74C.dip.t-dialin.net) has joined #ceph
[22:57] <todin> rweeks: ok, that's probably why I couldn't find a link to a video
[22:58] * sagelap (~sage@2607:f298:a:607:c0d5:5838:5b8d:e88a) has joined #ceph
[22:59] * sagelap (~sage@2607:f298:a:607:c0d5:5838:5b8d:e88a) Quit ()
[22:59] * sagelap (~sage@38.122.20.226) has joined #ceph
[22:59] * adjohn (~adjohn@69.170.166.146) has joined #ceph
[22:59] <rweeks> joshd: was your talk recorded?
[23:01] <joshd> rweeks: yes, it should go up under http://www.youtube.com/user/OpenStackFoundation this week
[23:01] <rweeks> ah cool, good to know
[23:02] <joshd> slides are at http://www.slideshare.net/openstack/storing-vms-with-cinder-and-ceph-rbdpdf
[23:02] <todin> joshd: great thanks, I just read your doc at http://ceph.com/docs/master/rbd/rbd-openstack/, can I do the cinder create via the dashboard as well?
[23:03] <joshd> todin: not yet, but that should happen in grizzly
[23:04] <todin> joshd: grizzly is the next release?
[23:04] <joshd> lots of people are interested in making booting from a volume much easier (like nova boot [src] [dest], and it auto-clones or copies for you, and the equivalent in the dashboard)
[23:04] <joshd> yeah, grizzly is the next release (in april)
[23:04] <dmick> justinwarner1: if you have su on all the hosts, you can give yourself passwordless root ssh on all the hosts
[23:04] <dmick> that makes it easier
[23:05] <todin> joshd: and it should work with devstack as well?
[23:06] <justinwarner1> dmick, I have tried that, the problem is, my su account is my professors, when Ceph does its thing, it logs in to root@hostname, not myuser@hostname
[23:06] <joshd> todin: yup
[23:06] <justinwarner1> If I run the script as my student user, it does ssh in to the other machine using student@hostname, but then it won't accept my password, I don't really know why, just assumed it was because I wasn't a su.
[23:07] <todin> joshd: great, I will give it a try tomorrow, the slideshare links doesn't load,
[23:08] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Read error: Connection reset by peer)
[23:08] <joshd> todin: cool, let me know if the instructions could be improved
[23:08] <todin> joshd: I wll
[23:08] <todin> I will
[23:10] <dmick> justinwarner1: yes, but with su you can fix all that.
[23:10] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) has joined #ceph
[23:11] <iggy> justinwarner1: read up on passwordless login with ssh keys (which I think is what dmick was talking about)
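
A sketch of the passwordless-root-ssh setup dmick and iggy are describing, assuming su/sudo is available on each host (paths are the usual OpenSSH defaults):

    # on the admin node, as the user running mkcephfs / "service ceph -a start":
    ssh-keygen
    # then on each target host, via su or sudo, append the admin user's
    # ~/.ssh/id_rsa.pub to /root/.ssh/authorized_keys
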
[23:12] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[23:12] * rweeks (~rweeks@12.25.190.226) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[23:13] <todin> joshd: by default the openstack vms don't support trim?
[23:15] <sagelap> gregaf: wip-msgr-connect-backoff?
[23:15] * aliguori (~anthony@cpe-70-123-146-246.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[23:15] <joshd> todin: no, you'd have to change openstack to pass the right options through to qemu. I'm actually not familiar with the libvirt syntax for that, just the qemu command line
[23:16] <justinwarner1> iggy && dmick, I'll try doing the passwordless login, I only did it from the osd to the mon, I thought I did it the other way around. Thanks for your help.
[23:17] <todin> joshd: I know the libvirt syntax, do you know the place where to put it?
[23:17] * Leseb_ (~Leseb@62.233.37.28) has joined #ceph
[23:17] <dmick> justinwarner1: sure. it can be a hassle to set up
[23:18] <dmick> joshd: I just tweaked the qemu commandline options for a different reason (to enable gdb server)
[23:18] <dmick> so it's probably similar
[23:19] <dmick> like Symbolic kernel debugging with KVM in http://ceph.com/wiki/Debugging
[23:19] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[23:20] <joshd> todin: modify the format_dom method in something like nova.virt.libvirt.config.LibvirtConfigGuestDisk
[23:20] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:21] <Leseb> joshd: hi! are you familiar with this error while booting from volume? http://pastebin.com/egTz93UK
[23:21] <Leseb> (rbd of course :))
[23:22] <joshd> Leseb: that's trying to use direct file injection to e.g. store ssh keys and hostname, which is horribly insecure
[23:22] <joshd> Leseb: you should configure it to use a metadata service and cloud-init, or a config drive (also works with cloud-init)
[23:23] <Leseb> joshd: I use cents, so no cloud-init :(
[23:23] <joshd> there is cloud-init for centos these days, actually
[23:23] <Leseb> oh really? cool I'm gonna check sorry
[23:24] <Leseb> but the thing is the injection seems to be a default behavior from nova, did you turn it off?
[23:24] <joshd> it wasn't the default when I was setting it up
[23:25] <joshd> maybe because I wasn't using centos packages
[23:25] * vata (~vata@208.88.110.46) Quit (Quit: Leaving.)
[23:26] * Leseb_ (~Leseb@62.233.37.28) Quit (Ping timeout: 480 seconds)
[23:26] <Leseb> hum I'm gonna check that
[23:28] <Leseb> joshd: ok, disabling the key injection did the trick, but now the boot from volume fails silently
[23:28] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[23:28] <Leseb> the instance end up with a "no bootable device"
[23:30] * noob2 (a5a00214@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:33] <Leseb> joshd: any idea? qemu seems to be able to reach the volume
[23:35] <elder> sagewk, joshd dmick Finally running xfstests. A single fsstress run is enough to cause xfs FS corruption in this test.
[23:36] <elder> Now I'll start digging deeper...
[23:36] * justinwarner1 (~ceg442049@osis111.cs.wright.edu) has left #ceph
[23:36] <sagewk> elder: directly on top of xfs+rbd, without the xfstests harness? or via 13?
[23:36] <elder> So far, via 13, limited to just one run.
[23:37] <elder> Now I'll zoom into simpler command line.
[23:39] <gregaf> sagewk: wip-msgr-connect-backoff looks good to me
[23:43] <joao> sagewk, gregaf, still have the time to meet? (either vidyo or irc is fine with me)
[23:44] <joshd> Leseb: is it crashing, or hanging?
[23:45] <sagewk> gregaf: k thanks
[23:45] <sagewk> joao: yeah let's skype
[23:46] <Leseb> there is no crash, nor hanging, the instance boot successfully but the system can't find the device
[23:46] <joshd> Leseb: how did you create the volume - did you create from and image?
[23:46] <joshd> s/and/an/
[23:46] <Leseb> yes I used cinder for that
[23:46] <joao> sagewk, great!
[23:46] <joshd> Leseb: what's the qemu command line in the end?
[23:47] <joshd> Leseb: that kind of failure usually means the data on the disk isn't bootable, and can happen if it's blank or the wrong format
[23:49] <Leseb> joshd: qemu: http://pastebin.com/kjxkrdVs
[23:50] <Leseb> joshd: my image is bootable, disk format is qcow2 and container format is ovf
[23:51] <joshd> Leseb: you'll need to convert it to raw before putting it on rbd
[23:54] <Leseb> joshd: art ok one sec :)
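
The conversion joshd is suggesting can be done with qemu-img before importing the image (filenames are examples):

    qemu-img convert -f qcow2 -O raw disk.qcow2 disk.raw
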
[23:57] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.