#ceph IRC Log


IRC Log for 2012-11-26

Timestamps are in GMT/BST.

[0:09] * CristianDM (~CristianD@host214.201-252-48.telecom.net.ar) has joined #ceph
[0:09] <CristianDM> Hi
[0:09] <CristianDM> I tried to run rados bench but it shows this error
[0:09] <CristianDM> Must write data before running a read benchmark!
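
rados bench refuses to run a read benchmark until there are benchmark objects left over from a write run. A minimal sketch of the usual sequence, assuming a pool named rbd and a build whose write bench accepts --no-cleanup to keep its objects around:

    rados -p rbd bench 60 write --no-cleanup   # write phase, leave the objects in place
    rados -p rbd bench 60 seq                  # sequential read of the objects written above
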
[0:18] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[0:18] * ChanServ sets mode +o scuttlemonkey
[0:23] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[0:24] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[0:29] * maxiz_ (~pfliu@111.192.251.57) has joined #ceph
[0:35] * jluis (~JL@89.181.144.105) has joined #ceph
[0:36] * illuminatis (~illuminat@89-76-193-235.dynamic.chello.pl) Quit (Quit: WeeChat 0.3.9.2)
[0:38] * CristianDM (~CristianD@host214.201-252-48.telecom.net.ar) Quit ()
[0:41] * joao (~JL@89-181-153-24.net.novis.pt) Quit (Ping timeout: 480 seconds)
[0:42] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:55] * gaveen (~gaveen@112.134.113.254) Quit (Remote host closed the connection)
[0:57] * joao (~JL@89.181.144.216) has joined #ceph
[0:57] * ChanServ sets mode +o joao
[1:03] * jluis (~JL@89.181.144.105) Quit (Ping timeout: 480 seconds)
[1:13] * infinity_ (~sonu@nas2.meghbelabroadband.in) has joined #ceph
[1:14] * infinity_ (~sonu@nas2.meghbelabroadband.in) Quit (Quit: Leaving)
[1:21] * BManojlovic (~steki@212.69.21.174) Quit (Remote host closed the connection)
[1:29] * maxiz_ (~pfliu@111.192.251.57) Quit (Quit: Ex-Chat)
[1:35] * tnt (~tnt@55.188-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:49] * topro (~quassel@2.215.102.219) Quit (Ping timeout: 480 seconds)
[2:02] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[2:11] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[2:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:28] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:28] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:04] * darkfaded (~floh@188.40.175.2) has joined #ceph
[3:09] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:09] * darkfader (~floh@188.40.175.2) Quit (Ping timeout: 480 seconds)
[3:09] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:34] * deepsa (~deepsa@122.172.213.104) has joined #ceph
[3:38] * maxiz (~pfliu@202.108.130.138) has joined #ceph
[4:04] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:04] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:13] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[4:14] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:04] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:04] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:24] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:24] * loicd (~loic@magenta.dachary.org) has joined #ceph
[6:33] <tore_> sata is fine for non-raided setups. if you use raid underneath, SAS is the way to go
[6:34] <tore_> also, it's perfectly reasonable to mix drives from different vendors with CEPH
[6:34] <tore_> each drive can be a separate OSD if you prefer
[7:20] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[7:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:23] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[7:23] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:49] * ctrl (~Nrg3tik@78.25.73.250) has joined #ceph
[7:51] * tnt (~tnt@55.188-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:45] * tnt (~tnt@55.188-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:47] * illuminatis (~illuminat@62.148.91.68) has joined #ceph
[9:04] * maxiz (~pfliu@202.108.130.138) Quit (Ping timeout: 480 seconds)
[9:08] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:15] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:15] * maxiz (~pfliu@202.108.130.138) has joined #ceph
[9:26] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[9:28] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:32] <dweazle> tore_: even with raid you could use sata (but do use raid edition sata disks)
[9:32] <dweazle> only reason imo to use sas is for multipathing, which you don't need with ceph
[9:33] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:36] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[9:39] <tore_> yes you can raid sata, but no that's not a good idea for anything enterprise quality
[9:41] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:42] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:45] * topro (~quassel@host-62-245-142-50.customer.m-online.net) has joined #ceph
[9:47] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:48] * nosebleedkt (~kostas@kotama.dataways.gr) has joined #ceph
[9:48] <nosebleedkt> hi all
[9:49] <nosebleedkt> joao, goodmorning !
[9:49] <nosebleedkt> joao, if you have some time I can ask something simple, I think :D
[9:51] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:54] <ScOut3R> morning nosebleedkt :)
[9:54] <nosebleedkt> HI scooter :D
[10:04] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[10:05] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:05] * loicd (~loic@2a01:e35:2eba:db10:b179:4d0e:4bfa:6c96) has joined #ceph
[10:08] * Kioob`Taff1 (~plug-oliv@local.plusdinfo.com) has joined #ceph
[10:14] <dweazle> tore_: i beg to differ, enterprise sata is just as reliable as sas these days
[10:14] <tontsa> well, they're the same physical discs in the 7200rpm range. they just run different firmware. so from a reliability point of view they are equal
[10:27] * tpeb (~tdesaules@90.84.144.138) has joined #ceph
[10:27] <tpeb> hello guys !
[10:28] <tpeb> when I run mkcephfs I get an error
[10:28] <tpeb> ERROR: error creating empty object store in /data/ceph/cinder/osd-0: (21) Is a directory
[10:28] <tpeb> any idea about it ?
[10:34] * loicd (~loic@2a01:e35:2eba:db10:b179:4d0e:4bfa:6c96) Quit (Quit: Leaving.)
[10:34] * loicd (~loic@magenta.dachary.org) has joined #ceph
[10:36] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Quit: Leaving.)
[10:40] * maxiz (~pfliu@202.108.130.138) Quit (Quit: Ex-Chat)
[10:40] * allsystemsarego (~allsystem@188.27.167.129) has joined #ceph
[10:48] <tpeb> I get this warn : HEALTH_WARN 533 pgs peering; 533 pgs stuck inactive; 714 pgs stuck unclean
[10:56] <tpeb> was me, all is good
[11:21] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[11:23] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:24] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:36] * mib_h558gp (ca037809@ircip4.mibbit.com) has joined #ceph
[11:36] <nosebleedkt> The OSD directory under /var/lib/ceph/osd/ceph-1 must be mounted as BTRFS ?
[11:37] * kspr (~barbe@cse35-1-82-236-141-76.fbx.proxad.net) has joined #ceph
[11:37] <tnt> or xfs, or ext4 (for the latter you need some additional config)
[11:40] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[11:40] * loicd (~loic@magenta.dachary.org) has joined #ceph
[11:41] <mib_h558gp> Can anyone guide me on how to mount a specific pool onto a dir in user space?
[11:41] * kspr (~barbe@cse35-1-82-236-141-76.fbx.proxad.net) Quit (Read error: No route to host)
[11:41] <mib_h558gp> sage sir told me to use "cephfs /path --pool <poolid> " but its not working
[11:42] <tnt> how is that not working ?
[11:43] <joao> morning #ceph
[11:43] <mib_h558gp> it gives error as follows : invalid command usage: cephfs path command [options]*
[11:43] <tnt> but this command will not "mount" anything AFAIK. It will just instruct cephfs to use this pool for all directory/files under /path
[11:43] <joao> nosebleedkt, it must be mounted, but the fs does not need to be btrfs per se
[11:44] <tnt> try 'cephfs /path set_layout --pool <poolid>'
[11:44] <mib_h558gp> i didn't get you
[11:44] <mib_h558gp> Error setting layout: Inappropriate ioctl for device
[11:44] <nosebleedkt> joao, Hello :D
[11:44] <joao> mib_h558gp, are you using btrfs?
[11:44] <tnt> mib_h558gp: path need to be a path to an already mounted cephfs
[11:45] <nosebleedkt> joao, currently I have /var/lib/ceph/osd/ceph-{1,2,3} on the root fs
[11:45] <mib_h558gp> means i will first mount with "mount.ceph" cmd then use "cephfs" ?
[11:45] <tnt> yes
[11:45] <nosebleedkt> joao, not on separete mount
[11:45] <joao> nosebleedkt, that works too
[11:46] <nosebleedkt> joao, which is more efficient ?
[11:46] <joao> separate mount
[11:46] <nosebleedkt> hmm
[11:46] <nosebleedkt> and that can be ext4 ?
[11:46] <joao> you won't be competing for IO with the rest of the system
[11:46] <joao> sure
[11:46] <tnt> mib_h558gp: There is only 1 single filesystem in cephfs, but you can assign subdirectories to specific pools and you can mount subdirectories rather than the root.
[11:46] <nosebleedkt> and the size of those mounts will be the available storage?
[11:46] <mib_h558gp> tnt : ohhk. first use "mount.ceph mon:port:/ /path" then use " cephfs /mnt/ceph/a set_layout --pool 3" ??
[11:47] <joao> nosebleedkt, yes
[11:47] <nosebleedkt> aha
[11:47] <nosebleedkt> now things get clearer
[11:47] <tnt> mib_h558gp: huh well /path needs to me /mnt/ceph then
[11:47] <tnt> s/me/be/
[11:48] <mib_h558gp> tnt : ok. I will try it out and then report back to you. :)
[11:51] <mib_h558gp> not working .... :(
[11:51] <mib_h558gp> root@hemantsec-virtual-machine:~# cephfs /home/hemant/a set_layout --pool 3 Error setting layout: Invalid argument
[11:54] <mib_h558gp> tnt : even after i mounted using the "mount.ceph" cmd and then tried to execute "cephfs", it's still not working
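
The sequence tnt is describing, as a sketch; the monitor address, mount point, directory and pool id 3 are placeholders:

    mount -t ceph mon1:6789:/ /mnt/ceph              # mount the (single) ceph filesystem from a monitor
    mkdir -p /mnt/ceph/fastpool                      # directory whose new files should land in pool 3
    cephfs /mnt/ceph/fastpool set_layout --pool 3    # affects files created after this point

If set_layout still returns an error after mounting, two likely causes of that era are worth checking: the pool may need to be registered as an MDS data pool first (ceph mds add_data_pool <poolid>), and some cephfs builds required the other layout fields (stripe unit/count, object size) to be given along with the pool.
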
[12:00] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[12:08] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[12:10] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[12:11] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:20] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[12:21] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:25] <mib_h558gp> tnt : even after i mounted using the "mount.ceph" cmd and then tried to execute "cephfs", it's still not working
[12:41] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[12:59] * cobz (~coby@p579D3134.dip.t-dialin.net) has joined #ceph
[13:03] * maxiz (~pfliu@111.192.241.244) has joined #ceph
[13:04] <cobz> Hi, i've got a simple ceph setup with one osd on a separate node. If i try "mount.ceph head:/ /mnt/cephfs" i get the error: mount error 5 = Input/output error. Does anyone have a hint?
[13:05] <cobz> ceph -s output seems okay: http://pastebin.com/08k2JEm8
[13:05] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[13:26] * deepsa_ (~deepsa@115.241.4.187) has joined #ceph
[13:27] * deepsa (~deepsa@122.172.213.104) Quit (Ping timeout: 480 seconds)
[13:27] * deepsa_ is now known as deepsa
[13:35] * mdekkers (~mdekkers@87-194-160-154.bethere.co.uk) has joined #ceph
[13:36] <mdekkers> Hi all, I hope somebody can help with some questions :)
[13:37] * mib_h558gp (ca037809@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[13:37] <mdekkers> I am building a new ceph cluster, and was wondering if all OSD's need to be of the same size, or if indiscriminate mixing of drive sizes is ok?
[13:38] <mdekkers> any help would be much appreciated :)
[13:44] <tpeb> hi ! what does "1 slow requests, 1 included below; oldest blocked for > 480.780784 secs" mean?
[14:03] * cobz (~coby@p579D3134.dip.t-dialin.net) Quit (Remote host closed the connection)
[14:08] <nosebleedkt> joao, are you there?
[14:18] * mtk (cMfTUEYZdt@panix2.panix.com) has joined #ceph
[14:28] <nosebleedkt> why do i have to 'rbd map foo --pool rbd' every time i want to mount an RBD device?
[14:28] <nosebleedkt> I mean before mounting I have to map it.
[14:29] <tnt> well yeah ... obviously
[14:29] <nosebleedkt> shouldn't this happen just once ?
[14:29] <tnt> once every boot
[14:29] <nosebleedkt> ahh
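
A sketch of the per-boot sequence being discussed, with placeholder image, pool and mount point names:

    rbd map foo --pool rbd       # creates a /dev/rbdN block device for the image
    rbd showmapped               # lists which /dev/rbdN each mapped image received
    mount /dev/rbd0 /mnt/foo     # then mount it like any other block device

The mapping is kernel state, so it has to be redone after every reboot (for example from an init script) before the filesystem on it can be mounted.
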
[14:38] * timmclaughlin (~timmclaug@69.170.148.179) has joined #ceph
[14:42] * mdekkers (~mdekkers@87-194-160-154.bethere.co.uk) Quit (Read error: Connection reset by peer)
[14:44] * ssedov (stas@ssh.deglitch.com) has joined #ceph
[14:49] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[14:54] * slang (~slang@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[14:56] * loicd (~loic@magenta.dachary.org) has joined #ceph
[14:57] <joao> wido, around?
[15:03] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) has joined #ceph
[15:03] * joao sets mode -o joao
[15:05] * topro (~quassel@host-62-245-142-50.customer.m-online.net) Quit (Ping timeout: 480 seconds)
[15:05] * weber (~he@219.85.243.242) has joined #ceph
[15:09] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) has joined #ceph
[15:13] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[15:17] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[15:17] * loicd (~loic@magenta.dachary.org) has joined #ceph
[15:22] * anon (~chatzilla@hippo2.bbaw.de) has joined #ceph
[15:23] * anon (~chatzilla@hippo2.bbaw.de) Quit ()
[15:25] * anon (~chatzilla@hippo2.bbaw.de) has joined #ceph
[15:25] * todin_ is now known as todin
[15:31] * illuminatis (~illuminat@62.148.91.68) Quit (Quit: WeeChat 0.3.9.2)
[15:31] <nosebleedkt> can someone explain this
[15:31] <nosebleedkt> root@cephfs:~# mkfs.ext3 -m0 /dev/rbd1 /mnt/myrbd2/
[15:31] <nosebleedkt> mke2fs 1.42.5 (29-Jul-2012)
[15:31] <nosebleedkt> mkfs.ext3: invalid blocks '/mnt/myrbd2/' on device '/dev/rbd1'
[15:38] <elder> nosebleedkt mkfs.ext3 does not take a mount point.
[15:38] <elder> So presumably if you're trying to make a new ext3 file system with a 0% reserved block percentage, you might just use:
[15:38] <elder> mke2fs -m0 /dev/rbd1
[15:39] <elder> (I guess that's an ext2 file system)
[15:39] <nosebleedkt> oh lol
[15:40] <nosebleedkt> the 8th hour at work is always dumb
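
The corrected invocation, following elder's explanation (mkfs takes only the device; mounting is a separate step):

    mkfs.ext3 -m0 /dev/rbd1        # 0% reserved blocks, device argument only
    mount /dev/rbd1 /mnt/myrbd2    # attach it to the mount point afterwards
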
[15:42] <anon> Hi there, I'm new to ceph. Is this MDS really needed for an rbd only setup?
[15:43] <nosebleedkt> no
[15:43] <anon> thanks
[15:43] <nosebleedkt> MDS is for cephfs
[15:47] <anon> next question: when there are multiple osds on one node (server), how does ceph know that replicas shouldn't all end up within this node but be spread over the other nodes?
[15:51] <ScOut3R> anon: you can define that using the crushmap
[15:51] <ScOut3R> in your rulefile change "step choose firstn 0 type osd" to "step chooseleaf firstn 0 type host" (fixme, it's working for me:))
[15:52] <ScOut3R> of course this definition is inside a rule block for a given pool
[15:53] <anon> ah, crushmap, a word to search for.
[15:53] <anon> thanks for the hint
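
ScOut3R's suggestion shown in a complete rule, in the same form that appears later in this log; the bucket and rule names are the stock ones and may differ in a given map:

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host   # pick N distinct hosts, then one osd from each
            step emit
    }
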
[15:54] <nosebleedkt> hi
[15:54] <nosebleedkt> is it ceph's action that remounts my rados device read-only when it's full ?
[15:55] <nosebleedkt> /dev/rbd0 on /mnt/myrbd type ext3 (rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered)
[15:55] <nosebleedkt> /dev/rbd1 on /mnt/myrbd2 type ext3 (ro,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered)
[15:55] <via> hi, i have a ceph cluster where an osd died and i replaced it, and after taking all day it has finally settled on working, but won't get past pgmap v49256: 384 pgs: 370 active+clean, 1 active+remapped, 13 active+degraded;
[15:55] <via> i've tried repair'ing and scrub'ing all osd's, it makes no active progress towards becoming fully nondegraded
[15:56] <via> none of the pg's list any unfound objects
[15:56] <tpeb> hey guys, is it possible (using the crushmap) to store data on two different disks according to pools? for example pool_test on sda and pool_test2 on sdb on a single test machine?
[15:59] <via> one of the pg's that won't recover looks like this: http://pastebin.com/sBVg8L1q
[16:06] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:10] * noob2 (a5a00214@ircip3.mibbit.com) has joined #ceph
[16:12] <noob2> wow btrfs doesn't deal well with disks flaking out and almost dying
[16:12] <noob2> my load on the server spiked to 115
[16:13] * nosebleedkt (~kostas@kotama.dataways.gr) Quit (Quit: Leaving)
[16:14] * mtk (cMfTUEYZdt@panix2.panix.com) Quit (Remote host closed the connection)
[16:15] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[16:16] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit ()
[16:16] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[16:17] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[16:42] <tpeb> i always get "health HEALTH_WARN 640 pgs peering; 640 pgs stuck inactive; 640 pgs stuck unclean", is the cluster dead ?
[16:43] <noob2> not dead but def not happy
[16:45] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[16:47] <tpeb> any idea to fix it ?
[16:47] <tpeb> (any rbd tool fails)
[16:48] <noob2> i'm in a similar situation where the cluster is stuck because i have drives that are almost dead but not actually failed yet
[16:49] <noob2> all commands just hang
[16:49] <noob2> i'm not sure what the fix is yet
[16:49] <tpeb> ok thx ! I will search a solution too !
[16:49] * tpeb (~tdesaules@90.84.144.138) Quit (Quit: leaving)
[16:50] <darkfaded> you could try ejecting the drive using /sys/block
[16:50] <darkfaded> but i dont know if that is a route that makes things better, it could be making it worse just as likely
[16:56] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[17:05] * wer (~wer@wer.youfarted.net) Quit (Remote host closed the connection)
[17:06] * wer (~wer@wer.youfarted.net) has joined #ceph
[17:06] * wer_ (~wer@wer.youfarted.net) has joined #ceph
[17:06] * wer_ (~wer@wer.youfarted.net) Quit (Remote host closed the connection)
[17:08] * vata (~vata@208.88.110.46) has joined #ceph
[17:15] * anon (~chatzilla@hippo2.bbaw.de) Quit (Quit: I'm not here right now.)
[17:15] * Kioob`Taff1 (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[17:21] * calebamiles (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) has joined #ceph
[17:22] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:23] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[17:31] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[17:36] * deepsa (~deepsa@115.241.4.187) Quit (Ping timeout: 480 seconds)
[17:36] * deepsa (~deepsa@122.167.171.217) has joined #ceph
[17:36] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:49] * sagelap (~sage@46.sub-70-197-142.myvzw.com) has joined #ceph
[17:49] <sagelap> wido: delighted to see we have an upstart user. do you mind looking at the upstart changes and letting us know if anything is not right/good?
[17:50] <sagelap> wido: there is now a 'ceph' master job, and 'ceph-osd-all'. so things like 'start ceph' should now work...
[17:53] * deepsa (~deepsa@122.167.171.217) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[17:53] * wer (~wer@wer.youfarted.net) Quit (Remote host closed the connection)
[17:53] * wer (~wer@wer.youfarted.net) has joined #ceph
[17:57] * gregaf1 (~Adium@2607:f298:a:607:15eb:d1d7:7645:a03c) has joined #ceph
[17:57] * jlogan1 (~Thunderbi@2600:c00:3010:1:852f:a2dd:c540:fa16) has joined #ceph
[18:01] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:03] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:04] * gregaf (~Adium@2607:f298:a:607:b4be:20f0:4787:7dfc) Quit (Ping timeout: 480 seconds)
[18:05] * wer (~wer@wer.youfarted.net) Quit (Remote host closed the connection)
[18:06] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[18:06] * wer (~wer@wer.youfarted.net) has joined #ceph
[18:07] * adjohn (~adjohn@69.170.166.146) has joined #ceph
[18:07] * Dr_O (~owen@heppc049.ph.qmul.ac.uk) has joined #ceph
[18:10] * tnt (~tnt@55.188-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:14] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[18:14] * sagelap (~sage@46.sub-70-197-142.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:21] * sagelap (~sage@46.sub-70-197-142.myvzw.com) has joined #ceph
[18:22] * yehudasa (~yehudasa@2607:f298:a:607:c4f0:a32d:8103:5c98) Quit (Quit: Ex-Chat)
[18:25] * yehudasa (~yehudasa@2607:f298:a:607:45bd:9a9d:83a3:4164) has joined #ceph
[18:25] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[18:28] * allsystemsarego (~allsystem@188.27.167.129) Quit (Quit: Leaving)
[18:29] * sagelap (~sage@46.sub-70-197-142.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:34] <noob2> is there a way to force slow requests to quit?
[18:34] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[18:34] * tpeb (~tdesaules@dan75-10-83-157-22-200.fbx.proxad.net) has joined #ceph
[18:34] <noob2> i see a bunch of these: 053525 osd.1 [WRN] slow request 107157.881513 seconds old
[18:35] <tpeb> hi guys
[18:35] <noob2> btrfs seems to misbehave when disks are about to die. i might have to go back to xfs
[18:37] <noob2> docs seem to point to a disk dying but hpacucli doesn't indicate a smart error yet
[18:37] <noob2> kinda curious
[18:38] <Robe> noob2: which kernel?
[18:38] <noob2> 3.50-17
[18:38] <Robe> wat?
[18:38] <Robe> linux?
[18:39] * sagelap (~sage@2607:f298:a:607:7463:bf6a:b3fa:74f8) has joined #ceph
[18:39] <noob2> yup
[18:39] <Robe> 3.5.something?
[18:39] <noob2> yup
[18:39] <Robe> ok
[18:39] <noob2> 3.5.0-17 ubuntu 12.10
[18:39] <Robe> good to know
[18:43] * vata (~vata@208.88.110.46) Quit (Quit: Leaving.)
[18:43] <noob2> these slow requests seem to have hosed up the cluster
[18:44] <noob2> i can't remove any rbd luns now
[18:44] <Robe> yep
[18:44] <Robe> that's to be expected
[18:44] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:44] <gregaf1> noob2: what version of Ceph are you running?
[18:45] <noob2> 0.54-1quantal
[18:46] <gregaf1> there's a bug in 0.54 that we're still tracking down between scrub and that; restarting the OSDs one at a time will complete them
[18:47] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[18:48] * jlogan1 (~Thunderbi@2600:c00:3010:1:852f:a2dd:c540:fa16) Quit (Read error: Connection reset by peer)
[18:51] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:52] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:52] <noob2> ok i'll do that
[18:52] <noob2> should i go through and restart all of them one at a time?
[18:52] <noob2> or just the slow one
[18:57] * jlogan1 (~Thunderbi@2600:c00:3010:1:245d:471e:f78a:41f7) has joined #ceph
[18:58] <gregaf1> just the slow one should work
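
A sketch of restarting a single osd; the exact command depends on how the daemons are managed (the sysvinit script, or the upstart jobs mentioned later in this log), and osd.1 is just the id from the warning above:

    sudo /etc/init.d/ceph restart osd.1    # sysvinit script
    sudo restart ceph-osd id=1             # upstart jobs, where installed
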
[19:03] <tpeb> maybe someone can help me !
[19:04] * gregaf1 (~Adium@2607:f298:a:607:15eb:d1d7:7645:a03c) Quit (Quit: Leaving.)
[19:04] <tpeb> i'm trying to install a ceph test server with two dedicated disks for ceph
[19:04] <tpeb> I want two osd daemons, each using one disk
[19:05] <tpeb> but I want to assign two pools, one per osd (finally one pool per disk), without replicating data (just a test for a future prod)
[19:06] <tpeb> but I fail with the crush map
[19:07] <tpeb> how can I create a simple crush map for the data, metadata and rbd pools with this setup ? for example one disk for the data pool and a test pool on the second disk ?
[19:08] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[19:11] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:11] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:11] * gregaf (~Adium@2607:f298:a:607:fd99:359:95ec:8287) has joined #ceph
[19:12] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[19:14] * vata (~vata@208.88.110.46) has joined #ceph
[19:15] * illuminatis (~illuminat@89-76-193-235.dynamic.chello.pl) has joined #ceph
[19:16] <yehudasa> sagewk, gregaf: a quick peek at wip-3516 (pretty trivial)?
[19:21] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[19:24] * ChanServ sets mode +o joao
[19:25] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[19:27] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[19:27] * ChanServ sets mode +o elder
[19:27] * drokita (~drokita@199.255.228.10) has joined #ceph
[19:30] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) Quit (Remote host closed the connection)
[19:30] <gregaf> lurbs: you should connect with sjust about your hung OSD requests :)
[19:30] <gregaf> sjust: look at http://uber.geek.nz/ceph/
[19:35] <via> does anyone have a clue how i can resolve my pgs stuck unclean issue?
[19:37] <drokita> via: Is it not currently resolving itself?
[19:37] <via> there is no obvious reason it isnt working
[19:37] <via> one of the pgs that is failing:
[19:38] <via> http://pastebin.com/sBVg8L1q
[19:40] * firaxis (~vortex@unnum-91-196-193-107.domashka.kiev.ua) has joined #ceph
[19:40] <via> drokita: its been in this state for about 20 hours
[19:42] <via> that pastebin is one of 14 like that, all osds are active, rep size is 2... so that pg looks not degraded to me
[19:42] * danieagle (~Daniel@186.214.58.104) has joined #ceph
[19:46] <gregaf> yehudasa: that looks okay to me, assuming the surrounding setup is doing what I think it must be...
[19:46] <sjust> via: try marking osd2 out and then back in
[19:46] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[19:46] <sjust> ceph osd out 2; ceph osd in 2
[19:47] <via> ok, stand by
[19:47] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) has joined #ceph
[19:47] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[19:47] <yehudasa> gregaf: thanks
[19:48] <via> so that lowered the number, i guess ill try that on all 4
[19:49] <via> why does that fix it?
[19:49] <sjust> it should not have, means the previous state was a bug
[19:49] <via> ok
[19:49] <sjust> did you by any chance have logs with debug osd = 20?
[19:49] <sjust> I'd need about 24hours of them
[19:49] * via is waiting for .55
[19:50] <via> no, i dont sorry
[19:50] <sjust> k
[19:50] <sjust> you are on argonaut/
[19:50] <sjust> ?
[19:51] <sjust> lurbs: looks like the request actually did complete, fwiw
[19:53] <via> .48.2 yes
[19:53] <sjust> k
[19:54] <sjust> lurbs: what ceph version are you running?
[19:54] <tpeb> Hey ! I can't remove an rbd image using rbd rm foo
[19:55] <tpeb> Removing image: 99% complete...failed.
[19:55] <tpeb> delete error: image still has watchers
[19:55] <tpeb> This means the image is still open or the client using it crashed. Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
[19:55] <tpeb> 2012-11-26 19:50:59.742343 7f05a3bd6780 -1 librbd: error removing header: (16) Device or resource busy
[19:55] <tpeb> any idea ?
[19:56] <elder> Try again after closing/unmapping it or waiting 30s for the crashed client to timeout.
[19:57] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:58] <tpeb> how can I unmap device ?
[19:58] <elder> tpeb, did you map it with "rbd map ..."
[19:58] <elder> ?
[19:58] <noob2> tpeb: i just had the exact same error haha
[19:59] <noob2> tpeb: what version are you running?
[20:00] <elder> If so, then you would typically do "rbd unmap /dev/rbd1" (for example)
[20:00] <tpeb> ceph -v
[20:00] <tpeb> ceph version (commit:)
[20:00] <tpeb> lol
[20:00] <tpeb> @elder, yep I do that
[20:00] <cephalobot`> tpeb: Error: "elder," is not a valid command.
[20:00] <tpeb> elder, yep I map it using that
[20:00] <via> sjust: anyway, its 100% now, thank you for helping
[20:01] <sjust> via: sure, sorry I couldn't be more enlightening
[20:02] <tpeb> using this command rbd map foo --pool volumes --name client.admin
[20:03] <elder> tpeb, did you "rbd unmap" the device before you tried deleting the image with "rbd rm"?
[20:03] <tpeb> nope ^^
[20:03] <tpeb> thx
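
The order elder is pointing at, sketched with the names from this exchange (the /dev/rbdN node will vary):

    rbd showmapped                                    # find which /dev/rbdN "foo" is mapped to
    rbd unmap /dev/rbd0                               # drop the kernel mapping, which releases the watcher
    rbd rm foo --pool volumes --name client.admin     # now the image can be removed
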
[20:03] <buck> Does anyone know how to add gitbuilders to the public website? (http://ceph.com/gitbuilder.cgi)
[20:03] <buck> I have one I'd like to add.
[20:04] <elder> dmick will be your best bet, buck.
[20:04] <elder> When he's online.
[20:04] <buck> elder: cool. I'll keep an eye out for him.
[20:09] * wer (~wer@wer.youfarted.net) Quit (Remote host closed the connection)
[20:09] * wer (~wer@wer.youfarted.net) has joined #ceph
[20:11] * wubo (80f42605@ircip4.mibbit.com) has joined #ceph
[20:12] * The_Bishop (~bishop@2001:470:50b6:0:6d2f:26f9:3892:c040) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[20:12] <wubo> I had an unfortunate sequence of drive failures that left me with pg's with unfound chunks. I'm trying to clean up the mess
[20:13] <wubo> When I run the revert: (ceph pg 0.159 mark_unfound_lost revert) the command hangs forever. Where should I look to find out why?
[20:13] <wubo> nothing shows up in dmesg or ceph -w
[20:13] <wubo> queries against pg's with unfound chunks also hang forever
[20:13] <sjust> wubo: version?
[20:13] * BManojlovic (~steki@212.69.21.174) has joined #ceph
[20:13] <wubo> ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
[20:13] <sjust> can you post ceph pg dump, ceph osd dump, ceph osd tree/
[20:13] <sjust> ?
[20:14] <wubo> osd tree: http://pastebin.com/WZThDiSD
[20:15] <wubo> pg dump: http://pastebin.com/uPDuqzW0
[20:16] <wubo> osd dump: http://pastebin.com/jF9tFBDs
[20:17] * nhm (~nh@184-97-251-146.mpls.qwest.net) has joined #ceph
[20:17] <lurbs> sjust: That's 0.54 from the debian-testing repository, on 12.04 LTS.
[20:17] <sjust> lurbs: thanks
[20:17] <lurbs> I can swap to the bobtail autobuild packages, if required.
[20:17] <sjust> just getting perspective, I think it's still a bug in the bobtail packages
[20:17] <sjust> lurbs: did you see vms start to hang
[20:17] <sjust> ?
[20:18] <lurbs> And can replicate the fault pretty easily. My incredibly ghetto solution has been to drop 'osd scrub load threshold' and induce a little bit of load on the boxes, to prevent the autoscrub.
[20:18] <sjust> hmm
[20:18] <lurbs> Yeah, the IO hangs pretty hard. Even pings fail.
[20:18] <sjust> to the vms?
[20:19] <lurbs> Yup.
[20:19] <sjust> are you seeing excessive memory usage on the osds?
[20:19] <lurbs> Didn't look, to be honest. They've got 96 GB so it wouldn't be immediately obvious.
[20:20] <sjust> how did you get IO to resume?
[20:20] <lurbs> Restart the affected OSD(s).
[20:20] <sjust> ok
[20:21] <sjust> does IO start to hang as soon as the delayed op messages start? or does it take a while?
[20:21] <lurbs> Don't recall, sorry.
[20:21] <lurbs> Takes a little while to detect the fault and respond.
[20:21] <sjust> from the log I saw, the op did complete, but was still reported incomplete
[20:22] <sjust> which suggests that we leaked a reference
[20:22] <sjust> that would result in a delayed op message and a memory leak
[20:22] <sjust> but not hung io
[20:22] <sjust> ...until the osd starts to swap
[20:22] <lurbs> The IO hang is pretty immediate. Certainly not delayed by how long it'd take the machine to go into swap.
[20:23] <sjust> ok
[20:23] <sjust> in that case, that log doesn't have a real example
[20:23] <sjust> ugh
[20:23] <lurbs> Bother. I can induce it pretty easily, and capture more logs, if that's helpful.
[20:23] <sjust> yeah, that would be good
[20:23] <sjust> i need debug osd = 20, debug optracker = 20, debug ms = 1
[20:26] <lurbs> Didn't have optracker, will add.
[20:26] <sjust> k
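
The logging levels sjust is asking for, as a ceph.conf sketch (they can also be injected at runtime, but the injectargs syntax varied between releases of that era, so the config-file form is shown):

    [osd]
            debug osd = 20
            debug optracker = 20
            debug ms = 1
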
[20:29] * madphoenix (~btimm@128.104.79.82) has joined #ceph
[20:29] * dmick (~dmick@2607:f298:a:607:19e4:51dc:d444:6009) has joined #ceph
[20:29] <madphoenix> I have a few questions about deployment that I hope somebody can help with. First, is the maximum file size that Ceph can store limited by the size of OSDs?
[20:29] * ChanServ sets mode +o dmick
[20:30] <sjust> madphoenix: cephfs stripes files over multiple rados objects, so no
[20:30] <sjust> now, the maximum size of a rados object is limited by the size of the osd
[20:31] <madphoenix> sjust: what is the practical implication of that (rados object size limitation)?
[20:31] <sjust> are you using cephfs?
[20:31] <lurbs> A watched pot never boils, and a daemon in debug mode never crashes. :-/
[20:31] <sjust> lurbs: heh
[20:31] <madphoenix> I'm not using anything yet, but yes will probably be using CephFS as well as radosgw
[20:31] <madphoenix> As well as rbd, but I saw in the documentation that it stripes/replicates among OSDs
[20:32] <sjust> madphoenix: radosgw limits you to 5GB with normal upload, but no limit with multipart
[20:32] <sjust> there are no implications for cephfs
[20:33] <sjust> madphoenix: same with rbd, no such limitation
[20:33] <madphoenix> sorry I'm not extremely familiar with the swift protocol, does it support multipart easily?
[20:33] <sjust> the only implication would be if you were trying to use rados objects directly
[20:33] <madphoenix> ok
[20:34] <sjust> yehudasa would know more, apparently swift has something analogous
[20:35] <madphoenix> i'll look into it more, cheers
[20:36] <yehudasa> http://docs.openstack.org/api/openstack-object-storage/1.0/content/large-object-creation.html
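
On the client side, the linked Swift large-object mechanism is driven by the uploader; a sketch with a swift CLI that supports segmented uploads (-S), using a 1 GB segment size and placeholder container/object names. Whether a given radosgw release honours the resulting manifest is worth confirming separately:

    # auth comes from the usual ST_AUTH/ST_USER/ST_KEY environment or -A/-U/-K flags
    swift upload -S 1073741824 mycontainer bigfile.img   # uploads segments plus a manifest object
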
[20:36] <wubo> sjust: any clues? am I running a reasonable version?
[20:40] * The_Bishop (~bishop@p4FCDEF25.dip.t-dialin.net) has joined #ceph
[20:43] <lurbs> sjust: Think I've caught it again, but dump_ops_in_flight is showing nothing.
[20:43] <lurbs> http://paste.nothing.net.nz/e4e5cd
[20:43] <lurbs> I should be checking OSD 6, based on that, right?
[20:46] <madphoenix> yehudasa: thx, i'll give that a look
[21:01] <wubo> sjust: apparently the osd.1 daemon was down (in spite of the osd tree saying otherwise). bringing that back up let me revert the pg's
[21:04] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:05] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:13] * nwatkins (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[21:17] * BManojlovic (~steki@212.69.21.174) Quit (Quit: Ja odoh a vi sta 'ocete...)
[21:19] <via> sjust: so, i just increased the size of the metadata pool and now the same thing is happening, but restarting osds doesnt resolve it
[21:24] <via> for one of the pgs that is stuck, up lists two osds and active lists 3
[21:26] * BManojlovic (~steki@212.69.21.174) has joined #ceph
[21:39] * doug (doug@breakout.horph.com) has joined #ceph
[21:41] * rlr219 (43c87e04@ircip1.mibbit.com) has joined #ceph
[21:42] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:45] <rlr219> I have a crush map question. I have a ceph cluster set up in a single cabinet in a data center. Currently I have 8 servers with 2 OSD's per server. In a different cabinet, I am looking to bring up 2 more servers with 2 OSD's each. I want those 2 servers to basically be a "back-up" or mirror of the current cluster. What is the best way to do this in the crush?
[21:45] <rlr219> Do I put the 2 clusters in a separate pools?
[21:50] <sjust> lurbs: so IO hung, but no delayed op messages?
[21:51] <lurbs> Yup.
[21:51] <lurbs> Well, a scrub hung.
[21:51] <sjust> via: can you post ceph osd tree?
[21:52] <sjust> lurbs: how long did it stay in scrub?
[21:52] <sjust> I'm not completely sure that hung scrubs are the problem, tbh
[21:56] <via> sjust: http://pastebin.com/Ui7bFySN
[21:57] <sjust> via: how ceph osd getcrushmap -o /tmp/map; crushtool -d /tmp/map
[21:57] <sjust> **how about: ceph osd getcrushmap -o /tmp/map; crushtool -d /tmp/map
[21:57] <dmick> sage: 3527
[21:57] <gregaf> rlr219: depends on what kind of backup you want…if you want a copy of everything on them, you could do it by putting them in a "DR" bucket and adding them to each of the CRUSH rules
[21:57] <via> okay, stand by, although i just created this crush map earlier today
[21:57] <dmick> er, sagewk ^
[21:58] <via> http://pastebin.com/ZKGTHm4f
[21:59] <sjust> ok, ceph -s and ceph pg dump
[21:59] <sjust> I suspect you are hitting a crush bug which has been more recently fixed
[22:00] <rlr219> gregaf: is there any documentation on that?
[22:02] <gregaf> not as much as there should be, but check out http://ceph.com/docs/master/rados/operations/crush-map/
[22:03] <via> sjust: http://pastebin.com/0Z0MPADp
[22:04] <sjust> via: your current crush map will happily place two replicas on the same host; if that's ok, you should remove the host level from the crush hierarchy
[22:05] <gregaf> rlr219: basically you would take advantage of the choose options described there and add a second "take" stanza that selects a replica from the "DR" bucket, which would be your new ones
[22:05] <gregaf> the problem with this approach is that if these are bigger-but-slower disks (or something) they're now in the data path
[22:05] <gregaf> but Ceph doesn't have async replication or DR features yet
[22:06] * mdawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[22:06] <mdawson> Could someone help me?
[22:06] <mdawson> HEALTH_WARN 2928 pgs peering; 2928 pgs stuck inactive; 2928 pgs stuck unclean
[22:07] <mdawson> ceph osd dump
[22:07] <via> sjust: what would be the crush-y way to not do that?
[22:07] <mdawson> *snip* osd.0 up in weight 1 up_from 288 up_thru 288 down_at 287 last_clean_interval [258,287) 172.16.1.2:6800/7414 172.17.1.2:6801/7414 172.17.1.2:6803/7414 exists,up d9f67661-3b1b-4cf6-b6f6-9c053ced6c9c
[22:07] <sjust> via: just have the default pool contain all 4 osds directly
[22:08] <via> why would that prevent two copies from being on teh same host?
[22:08] <via> i thought the whole point of the hierarchy was for that
[22:08] <sjust> it won't
[22:08] <mdawson> all OSDs are shown as up, but the cluster is stuck inactive
[22:08] <sjust> oh, this heirarchy doesn't prevent that
[22:08] <sjust>         step choose firstn 0 type osd
[22:08] <sjust> that will descend to the bottom and choose any N osds where N is the number of replicas
[22:08] <via> so what choose line would prevent it?
[22:08] <sjust> if you want to prevent two from being on the same host, you need:
[22:09] <sjust> step chooseleaf firstn 0 type host
[22:09] <via> ah
[22:09] * timmclaughlin (~timmclaug@69.170.148.179) Quit (Remote host closed the connection)
[22:09] <sjust> that will choose N hosts and then one osd from each
[22:09] <sjust> however, you can't have a replication level of 3 with that rule
[22:09] <via> okay, i'll try that, and i guess it'll take a long time to shuffle things around
[22:09] <via> oh
[22:09] <sjust> or rather, you only have two hosts
[22:09] <sjust> so it won't be able to find 3 hosts
[22:09] <via> but
[22:10] <via> with 3 replications and my current method, it would always have one copy on more than one node
[22:10] <sjust> yes
[22:10] <via> either way, thats not the problem, is it?
[22:10] <via> i will probably increase my data replication to 3 as well, but so far i can't seem to change it without causing problems <<_<
[22:10] <sjust> the trouble is that because of the particular way you have created your hierarchy, you are hitting an annoying crush bug
[22:11] <via> which particular way, so i know how to avoid it?
[22:11] <via> also, is this something fixed in later dev releases?
[22:11] * tpeb (~tdesaules@dan75-10-83-157-22-200.fbx.proxad.net) Quit (Quit: Lost terminal)
[22:11] <sjust> basically, if you are going to use a replication level of 3 with 2 hosts and 4 osds, you should eliminate the host level of the hierarchy
[22:11] <sjust> since you aren't currently using it
[22:11] <via> ok
[22:11] <sjust> and yeah, this one I think is fixed in bobtail
[22:12] <via> so, could i use that all-in-one hierarchy for metadata and a different one for data (at 2 replications)?
[22:12] <sjust> actually, yes, you could
[22:12] <sjust> you would have a second "pool" root node
[22:13] <sjust> pool default-flat
[22:13] <sjust> or something
[22:13] <sjust> and have the metadata refer to that node rather than default
[22:13] <via> okay
[22:13] <sjust> step take default-flat
[22:13] <via> i'm gonna try this
[22:13] <sjust> etc
[22:13] <via> i'm mostly just experimenting with this before i start putting real crap on it
[22:13] <sjust> you can post the crushmap before you inject it and I'll take a look
[22:14] <via> ok
[22:14] <via> thanks
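
A sketch of the two-root layout sjust is describing: the existing host-based tree keeps the data rule, while a second, flat root lets a 3-replica metadata rule draw from all four osds directly. Ids, names and weights here are placeholders, and the map should be checked with crushtool before injecting:

    pool default-flat {
            id -10                     # any unused negative id
            alg straw
            hash 0                     # rjenkins1
            item osd.0 weight 1.000
            item osd.1 weight 1.000
            item osd.2 weight 1.000
            item osd.3 weight 1.000
    }

    rule metadata {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default-flat
            step choose firstn 0 type osd    # any N osds, host placement not enforced
            step emit
    }
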
[22:14] <sjust> mdawson: please post the complete output of ceph osd tree, ceph osd dump, and ceph pg dump
[22:14] <rlr219> gregaf: not sure I follow you...
[22:15] * danieagle (~Daniel@186.214.58.104) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[22:15] <gregaf> rlr219: you said you had two OSDs you wanted to back up 8 others
[22:15] <mdawson> sjust: Looks like my storage network was down after some network changes. Working on a resolution now. Could that be the culprit?
[22:15] <gregaf> it sounds like what you really want is to periodically dump new changes to the backup ones; Ceph doesn't provide any tools for doing that
[22:15] <sjust> yes
[22:16] <gregaf> rlr219: so the other option for inside Ceph is to make those OSDs be part of every single PG, which is what the CRUSH rule/map manipulation gets you
[22:17] <via> sjust: play. o
[22:17] <via> er
[22:17] <via> okay, i'm gonna try this: http://pastebin.com/maP7Lv81
[22:17] <rlr219> I dont want to periodically dump changes, i am looking for real time changes to be captured. So what is changed in the primary cluster goes to the back-up as well.
[22:17] <gregaf> rlr219: then yes, in that case you want to change the CRUSH rules and map so that the "backup" nodes get every write
[22:17] <gregaf> as I described initially :)
[22:18] <rlr219> ok, like replicating onto those osd's as well.
[22:18] <gregaf> yes
[22:24] <rlr219> gregaf: when you say the "take" stanza, are you referring to the items listed at the end of the bucket defs?
[22:24] <sjust> via: that crushmap looks good
[22:24] <via> okay, cool
[22:24] <via> i guess this will take another 24ish hours to rebuild <_<
[22:25] <via> i can't help but feel like ceph rebuilds take way too long... a raid1 rebuild on this same hardware took about 8 minutes
[22:26] <gregaf> rule data {
[22:26] <gregaf> ruleset 0
[22:26] <gregaf> type replicated
[22:26] <gregaf> min_size 1
[22:26] <gregaf> max_size 10
[22:26] <gregaf> step take default
[22:26] <gregaf> step choose firstn -1 type osd
[22:26] <gregaf> step emit
[22:26] <gregaf> }
[22:26] <gregaf> err
[22:27] <gregaf> rule data {
[22:27] <gregaf> ruleset 0
[22:27] <gregaf> type replicated
[22:27] <gregaf> min_size 1
[22:27] <gregaf> max_size 10
[22:27] <gregaf> step take default
[22:27] <gregaf> step choose firstn -1 type osd
[22:27] <gregaf> step take DR
[22:27] <gregaf> step choose firstn 1 type osd
[22:27] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) Quit (Quit: Leaving)
[22:27] <gregaf> step emit
[22:27] <gregaf> }
[22:27] <gregaf> rlr219: something like that ^
[22:28] <rlr219> ok.
[22:31] * sagelap1 (~sage@124.sub-70-197-142.myvzw.com) has joined #ceph
[22:32] <rlr219> so in this example, my new OSDs would be in the pool DR, right?
[22:33] <gregaf> rlr219: the CRUSH pool, yes
[22:34] <rlr219> OK. and the step choose firstn would be just a "1", not a negative?
[22:34] <gregaf> a negative number means "requested replicas minus this value"; a 0 means "requested replicas"; a positive number means "this many copies"
[22:35] * sagelap (~sage@2607:f298:a:607:7463:bf6a:b3fa:74f8) Quit (Ping timeout: 480 seconds)
[22:35] <gregaf> so that rule is saying "take all but one replica from the regular OSDs, and then take the last one from the DR osds"
[22:36] <rlr219> ok.
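
gregaf's two pieces put together as one sketch: the new servers' osds go into their own CRUSH bucket (here a pool named DR, with hypothetical osd ids), and each rule takes all but one replica from the main tree and the last one from DR. This version emits after each take/choose pair so the first set of choices is kept, and it is worth dry-running with crushtool before injecting:

    pool DR {
            id -20                       # any unused negative id
            alg straw
            hash 0                       # rjenkins1
            item osd.16 weight 1.000     # hypothetical ids for the four new osds
            item osd.17 weight 1.000
            item osd.18 weight 1.000
            item osd.19 weight 1.000
    }

    rule data {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step choose firstn -1 type osd   # all but one replica from the main cluster
            step emit
            step take DR
            step choose firstn 1 type osd    # the remaining replica from the backup cabinet
            step emit
    }
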
[22:37] <rlr219> When is the new manual coming out? ;-)
[22:37] <sjust> when did the last one come out?
[22:38] <rlr219> point taken sjust. LOl
[22:39] <dmick> about 90 seconds after the last push that affects the manual :)
[22:44] * sagelap1 (~sage@124.sub-70-197-142.myvzw.com) Quit (Ping timeout: 480 seconds)
[22:49] * jlogan1 (~Thunderbi@2600:c00:3010:1:245d:471e:f78a:41f7) Quit (Quit: jlogan1)
[22:50] * via upgraded to .52 in hopes of performance improvements
[22:52] <rlr219> Thanks gregaf. I appreciate your help today.
[22:58] * jlogan1 (~Thunderbi@2600:c00:3010:1:c411:8052:9a4c:99a) has joined #ceph
[22:59] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[22:59] * rlr219 (43c87e04@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:00] * jtang1 (~jtang@79.97.135.214) Quit ()
[23:01] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[23:03] * sagelap (~sage@131.sub-70-197-150.myvzw.com) has joined #ceph
[23:18] <mdawson> sjust: it was a network issue. Thanks for the response.
[23:22] <sjust> mdawson: sure, good luck
[23:27] <elder> joshd, is there really no way for user space to reliably determine which rbd device got mapped?
[23:27] * mdawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:29] <elder> Maybe we should add an option for the user to supply an opaque identifier that can be shown under /sys/bus/rbd/devices/X so it's at least possible.
[23:40] * sagelap1 (~sage@61.sub-70-197-128.myvzw.com) has joined #ceph
[23:42] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[23:43] * sagelap (~sage@131.sub-70-197-150.myvzw.com) Quit (Ping timeout: 480 seconds)
[23:43] * noob2 (a5a00214@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:49] <via> how terrible is it to run journals on tmpfs? i assume it means recently committed data to the fs will be lost, but what about fs consistency?
[23:50] <dmick> elder: by reliably determine, you mean to ask the kernel module somehow? isn't that 'presence of files in /sys/bus/rbd'?
[23:51] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[23:52] <lurbs> via: My understanding is that if you're using ext4 or xfs to back an OSD losing the journal means you need to rebuild the OSD from scratch.
[23:52] <lurbs> via: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/10021
[23:53] <yehudasa> gregaf, sagewk: pushed wip-3528, pretty trivial
[23:53] * nwatkins (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[23:54] <via> oh wow
[23:54] <via> thats scary
[23:54] <via> so having the journal on a separate device even means you're screwed if either device dies?
[23:56] <elder> dmick, I'm almost done writing an e-mail on the issue.
[23:56] <lurbs> Which is why your data placement rules, via the CRUSH map, are important - to make sure you have a replica in a different failure zone.
[23:56] <via> well, right now i have journals on the same disk as the OSD, and its slow as crap for rebuilds... so i was thinking of moving to a shared SSD for the two OSD's on one node
[23:57] <via> but then that'll mean the SSD is a single point of failure that could lose both OSDs?
[23:58] <via> should i just do the former and disable the journal?
[23:59] <dmick> if you lose the journal, I don't think you lose the entire OSD; you lose only the transactions in the journal, I'm almost sure
[23:59] <via> okay, then thats tolerable for me at least
[23:59] <via> there just seemed to be some back and forth about it on that list
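
For reference, the journal location and size are ordinary ceph.conf settings; a sketch of pointing the two osds on one host at partitions of a shared SSD (device names are placeholders, and as discussed above, losing that SSD on a non-btrfs osd means rebuilding both osds from their replicas):

    [osd]
            osd journal size = 1024    # MB, used when the journal is a plain file

    [osd.0]
            osd journal = /dev/sdc1    # first partition of the shared SSD

    [osd.1]
            osd journal = /dev/sdc2    # second partition of the shared SSD
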

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.