#ceph IRC Log


IRC Log for 2013-04-09

Timestamps are in GMT/BST.

[0:05] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:10] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[0:13] <Elbandi_> gregaf1: did you find something wrong in the log?
[0:14] <gregaf1> haven't gotten to it
[0:18] * diegows (~diegows@ has joined #ceph
[0:19] <Elbandi_> ok
[0:19] <Elbandi_> how can i enable debug logging for the kernel module?
[0:20] * tnt_ (~tnt@228.204-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[0:22] * BillK (~BillK@58-7-53-210.dyn.iinet.net.au) has joined #ceph
[0:23] <dmick> Elbandi_: it's more difficult. You need a kernel compiled with CONFIG_DYNAMIC_DEBUG, and that's not super-common
[0:23] <dmick> http://ceph.com/w/index.php?title=Debugging&redirect=no#Kernel_Client_Debugging is still correct, I think
[0:24] <Elbandi_> uh
[0:24] <Elbandi_> ok, i try it
[0:25] <Elbandi_> hope, i hope that i find a bug :P
[0:27] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[0:29] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[0:34] * nz_monkey is now known as athrift
[0:39] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[0:46] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[0:47] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[0:47] * yanzheng1 (~zhyan@ has joined #ceph
[0:49] <Elbandi_> i got a "design" question: if there is a problem with mon/mds, why do syscalls freeze instead of returning an error code?
[0:49] <dmick> kinda like nfs hardmounts: the presumption is that the cluster will be back shortly
[0:50] <dmick> failing would cause you to handle retry in the client, which may not even be aware it's talking to ceph
[0:53] <Elbandi_> i think almost all programs have error handling
[0:53] <dmick> you'd be surprised
[0:53] <Elbandi_> currently, this "freezing" puts the process into D state
[0:54] <dmick> most programs don't even handle running out of memory
[0:54] <Elbandi_> unable to kill, unable to do anything, except reboot :(
[0:54] <dmick> and yes, I know, it can be frustrating when the filesystem is really in a hole
[0:55] <dmick> but while "cat" might be easy to see how failure might be acceptable, "qemu" is not quite the same
[0:55] <dmick> and everything in between
[0:57] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[1:00] <Elbandi_> i quite often hit this: the mds crashes, and after the mds is restarted, the mountpoint is still inaccessible
[1:01] <Elbandi_> umount is not working because of "resource already in use", and the process is in D state, so i'm unable to kill it
[1:01] <Elbandi_> i have to reset the server :(
[1:04] <dmick> yeah. the design is not really set up to handle that sort of failure. We expect the mds to come back and allow the filesystem to come back
[1:05] <dmick> if it did, this would all be much less painful
[1:06] <dmick> but there are such things as soft NFS mounts. I don't know if we've explored the ramifications of an operation timeout, at least in the MDS. It might be useful for development if nothing else
[1:06] <dmick> slang1: gregaf1: any opinions on a "timeout and error on call" cephfs client mode?
[1:07] <gregaf1> it's a lot more expensive for us to handle than it is for NFS, due to our shared coherent caches
[1:07] <gregaf1> this might come up in the future but for now we'd rather not
[1:08] <gregaf1> right now it'd be a band-aid solution to more serious problems
[1:08] <Elbandi_> the mds is coming back, because i can access the files with the other libcephfs clients
[1:09] <Elbandi_> the old mount remains in a bad state :(
[1:10] <dmick> have you tried umount -f?
[1:11] <Elbandi_> yes, no luck, umount is frozen too
[1:11] <dmick> getting that working might be the path of least resistance. Not sure.
[1:14] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[1:19] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[1:20] <joao> imjustmatthew, still around?
[1:21] * vanham (~vanham@ has joined #ceph
[1:21] <vanham> Hello yall
[1:22] <dmick> Hey everybody! It's vanham! :)
[1:22] <vanham> I'm getting a bunch of "connect claims to be ... not ... - wrong node" messages here and everything is stopped
[1:22] <vanham> Is there a place for me to start troubleshooting it?
[1:22] <vanham> version 0.56.4-1~bpo70+1
[1:22] <dmick> it is my belief that that message itself is not necessarily a sign of a problem
[1:23] <vanham> Debian 7 with kernel 3.8.5
[1:23] <dmick> it just means a peer restarted since the last time I looked
[1:23] <dmick> so I'd probably start with ceph -s?
[1:23] <vanham> the next message is "fault with nothing to send, going to standby"
[1:24] * yanzheng1 (~zhyan@ Quit (Ping timeout: 480 seconds)
[1:24] <vanham> ceph -s?
[1:24] <vanham> let me take a look at it
[1:24] <vanham> 1 sec
[1:24] <vanham> Oh, ceph -s, right. health HEALTH_OK
[1:24] <vanham> But rbd and cephfs are not responding
[1:25] <dmick> rbd as in qemu-kvm vms, or kernel driver mounts, or?...
[1:25] <vanham> health HEALTH_OK
[1:25] <vanham> monmap e1: 3 mons at {a=,b=,c=}, election epoch 44, quorum 0,1,2 a,b,c
[1:25] <vanham> osdmap e41: 4 osds: 4 up, 4 in
[1:25] <vanham> pgmap v41886: 960 pgs: 960 active+clean; 74801 MB data, 168 GB used, 1825 GB / 2000 GB avail
[1:25] <vanham> mdsmap e35: 1/1/1 up {0=1=up:replay}, 1 up:standby
[1:25] <dmick> what about things like rbd ls or rados -p <pool> ls
[1:25] <vanham> rbd as in kernel device
[1:25] <vanham> root@mia1-node4:/var/lib/ceph/osd/ceph-4# rbd ls
[1:25] <vanham> 2013-04-08 20:25:51.412296 7f6c20624700 0 -- >> pipe(0x7f6c1c001c50 sd=5 :34289 s=1 pgs=0 cs=0 l=1).connect claims to be not - wrong node!
[1:26] <vanham> I use rbd with /dev/rbd/data/mia1-puppet
[1:26] <vanham> why are my public ips here?
[1:26] <vanham> hummm
[1:27] <imjustmatthew> joao: mostly :) what do you need?
[1:28] <vanham> dmick, rbd ls works on one node but not on all the other ones
[1:28] <dmick> hmm
[1:28] <dmick> was this previously working and something happened to break it, or is this a new setup?
[1:29] <vanham> is there a way to stop everything, clean unsent messages and restart or something like that?
[1:29] <vanham> everything was working yesterday, CephFS, RBD with one VM
[1:29] <vanham> I was trying to change the config and restarted all the nodes (one at a time)
[1:30] <vanham> Then it broke, I think
[1:30] <dmick> what was the nature of your config changes?
[1:31] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[1:31] <vanham> I forgot to use my SSDs for journal so I tried to move to it. It didn't work (complained about a wrong journal) so I moved it back
[1:34] <joao> imjustmatthew, are you using the release deb?
[1:34] <vanham> Ok, let me s
[1:34] <vanham> 2013-04-08 20:33:31.003578 7fc110050700 0 -- >> pipe(0x18c4c80 sd=28 :0 s=1 pgs=0 cs=1 l=0).fault
[1:34] <vanham> 2013-04-08 20:33:31.003799 7fc10ff4f700 0 -- >> pipe(0x18c4f00 sd=29 :0 s=1 pgs=0 cs=1 l=0).fault
[1:34] <vanham> 2013-04-08 20:33:31.003895 7fc10fe4e700 0 -- >> pipe(0x18c5400 sd=30 :0 s=1 pgs=0 cs=1 l=0).fault
[1:34] <vanham> 2013-04-08 20:33:31.006434 7fc10fd4d700 0 -- :/11406 >> pipe(0x18c5680 sd=33 :0 s=1 pgs=0 cs=0 l=1).fault
[1:34] <vanham> 2013-04-08 20:33:31.006519 7fc10fc4c700 0 -- :/11406 >> pipe(0x18c5b80 sd=34 :0 s=1 pgs=0 cs=0 l=1).fault
[1:34] <vanham> 2013-04-08 20:33:31.006593 7fc10fb4b700 0 -- :/11406 >> pipe(0x18c5900 sd=30 :0 s=1 pgs=0 cs=0 l=1).fault
[1:35] <imjustmatthew> joao: yes, from ceph.com/debian-testing/ precise main
[1:35] <joao> kay thanks
[1:35] <vanham> I stopped everything then started everything, now I have "with nothing to send, going to standby" on all my osd nodes
[1:35] <imjustmatthew> np
[1:38] * diegows (~diegows@ Quit (Read error: Operation timed out)
[1:39] <dmick> vanham: how did you stop/start?
[1:39] <vanham> /etc/init.d/ceph restart
[1:42] <Elbandi_> i found the bug: http://tracker.ceph.com/issues/4685 :>
[1:43] <vanham> Ok, starting over, I have a non-responsive ceph cluster. Where do I start testing?
[1:45] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[1:46] <joao> imjustmatthew, thanks; just figured what the issue is
[1:46] <dmick> vanham: yeah, sorry, distracted here
[1:46] <joao> I'll attempt a fix tomorrow
[1:47] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[1:47] <vanham> dmick, where would you start, I think I started at the wrong place
[1:47] <imjustmatthew> joao: Awesome, e-mail me if you need anything else: mroy@sandbox-ed.com
[1:48] <joao> sure, thanks
[1:48] <dmick> not sure I've had a cluster in that state. I'd start poking with ceph and see what responds. Things like ceph mon stat
[1:48] <dmick> ceph osd tree
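The first-pass checks suggested here (ceph -s, ceph mon stat, ceph osd tree) can be run in one go. A minimal sketch, assuming a POSIX shell and a node with a working admin keyring; the function name ceph_triage and its optional argument are made up for illustration and are not part of ceph:

```shell
# Hypothetical helper: run the read-only status commands mentioned in the
# channel, one after another, and keep going if one of them fails.
# $1 (optional) is the binary to call; it defaults to "ceph".
ceph_triage() {
    ceph_bin="${1:-ceph}"
    for args in "-s" "mon stat" "osd tree"; do
        echo "== ceph $args"
        # $args is left unquoted on purpose so "mon stat" becomes two words.
        $ceph_bin $args || echo "(command failed: ceph $args)"
    done
}
```

Calling ceph_triage with no argument on a cluster node prints each command's output under its own header, which makes it easier to see which subsystem stops responding first.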
[1:49] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[1:49] <dmick> there's no chance you've managed to update to different versions on different nodes, is there?
[1:49] <dmick> (that didn't take effect until reboot?)
[1:49] <dmick> *er, restart?
[1:51] <vanham> no... i started with 0.56.4 and have been with that since
[1:53] <vanham> hummmm
[1:53] <vanham> I think I found one thing that would make things bad
[1:54] <vanham> my public network has a few unique things because of my load balancer
[1:54] <vanham> is there a way for me to tell ceph not to use my public ips?
[1:56] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[1:57] <vanham> ok, the public IP thing is my problem
[1:58] <vanham> how do I restrict ceph to a specific range?
[1:58] * imjustmatthew (~imjustmat@pool-72-84-255-246.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[2:00] * tristanz (~tristanz@75-101-52-104.dsl.static.sonic.net) Quit (Quit: tristanz)
[2:02] <vanham> I'm back!!!
[2:02] <vanham> cool!
[2:04] <joao> well, I'm off for the night
[2:05] <vanham> My problem was that my public IPs are used for Direct Routing load balancing. So I have a couple of IPs that all the machines on the cluster have. ceph was trying to use those
[2:05] <Elbandi_> dmick: i setup dynamic_debug for kernel module, but i'm not seeing debug msg in dmesg
[2:06] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[2:08] <dmick> vanham: ah
[2:08] <dmick> so did you give all the daemons their own specific addresses in ceph.conf?
[2:09] <dmick> Elbandi_: did you also do the stuff in that wiki page?
[2:09] <vanham> no
[2:09] <vanham> can you do that to daemons other than monitors?
[2:10] <Elbandi_> dmick: yes
[2:10] <Elbandi_> # grep ceph /sys/kernel/debug/dynamic_debug/control |wc -l
[2:10] <Elbandi_> 891
[2:10] <dmick> vanham: yes
[2:10] <vanham> what config? for monitors it is "mon addr"
[2:11] <vanham> "osd addr"?
[2:11] <dmick> http://ceph.com/docs/master/rados/configuration/network-config-ref/#ceph-daemons
[2:11] <dmick> "You do not have to..."
[2:13] <dmick> Elbandi_: it's been a long time since I experimented there. elder, have you done this recently? any caveats?
[2:15] <yanzheng> echo module ceph +p >/sys/kernel/debug/dynamic_debug/control
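The one-liner above can be wrapped so that several modules are enabled at once. This is only a sketch: it assumes a kernel built with CONFIG_DYNAMIC_DEBUG, debugfs mounted at /sys/kernel/debug, and root privileges. The helper name enable_dyndbg is hypothetical, and including libceph and rbd alongside ceph is an assumption about which modules are of interest:

```shell
# Hypothetical helper: turn on pr_debug() output for the given kernel modules.
# $1 is the dynamic debug control file; remaining arguments are module names.
enable_dyndbg() {
    control="$1"; shift
    for mod in "$@"; do
        # Each write is one command to the dynamic debug control file.
        echo "module $mod +p" > "$control"
    done
}

# Example (as root, after: mount -t debugfs none /sys/kernel/debug):
# enable_dyndbg /sys/kernel/debug/dynamic_debug/control ceph libceph rbd
```

If it takes effect, the extra messages should show up in the kernel log (dmesg); replacing +p with -p in the same command turns them off again.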
[2:15] <vanham> Thanks! I'll do that
[2:15] <vanham> dmick, thanks very much on this one
[2:15] <dmick> vanham: no worries, I think you were the one that did everything :)
[2:16] <vanham> What helped was that you told me to start with the ceph mon stat, ceph osd tree, etc commands
[2:17] <dmick> I don't know the nature of your config, but be aware that you can configure specific subnets to use in the [global] public network variable too
[2:17] <vanham> Then I saw that ceph was using the same IP for all of the nodes
[2:17] <vanham> osd nodes
[2:17] <dmick> good show
[2:17] <vanham> I solved the problem with the public addr on the [global] part
[2:17] <vanham> I misunderstood "public addr" in the manual
[2:18] <vanham> I thought "cluster addr" was the address ceph would use and "public addr" was the address ceph wouldn't
[2:18] <dmick> public network is [global]; public addr is daemon-specific. and, ah, yeah.
[2:20] <vanham> yeah, public network and cluster network. right
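The distinction discussed above (public network / cluster network in [global], public addr / cluster addr per daemon) can be sketched as a ceph.conf fragment. The helper below only writes an illustrative fragment to a file for review; the function name, the subnets, the osd.0 section and the addresses are all made-up examples, not values from this log:

```shell
# Hypothetical helper: write an example network fragment to the file in $1.
# Nothing here touches a live /etc/ceph/ceph.conf.
write_network_fragment() {
    out="$1"
    cat > "$out" <<'EOF'
[global]
    ; subnet that clients and monitors use to reach the daemons
    public network = 192.168.1.0/24
    ; optional: separate subnet for OSD replication and heartbeats
    cluster network = 10.0.0.0/24

[osd.0]
    ; optional per-daemon override that pins this daemon to one IP
    public addr = 192.168.1.10
    cluster addr = 10.0.0.10
EOF
}

# Example: write_network_fragment /tmp/ceph-network-fragment.conf
```

Restricting the subnets in [global] is usually enough to keep daemons off shared load-balancer IPs; the per-daemon settings are only needed when a single daemon has to be pinned explicitly.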
[2:23] * rustam (~rustam@ Quit (Remote host closed the connection)
[2:23] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[2:23] <vanham> Man, ceph is awesome!
[2:27] <dmick> we like to think so; glad you think so too!
[2:33] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[2:41] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[2:42] * alram (~alram@ Quit (Quit: leaving)
[2:42] * winston-d (~Miranda@ has joined #ceph
[2:53] * winston-d (~Miranda@ Quit (Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org)
[2:53] * winston-d (~Miranda@ has joined #ceph
[2:54] * dpippenger (~riven@ Quit (Remote host closed the connection)
[2:56] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Remote host closed the connection)
[3:02] * rturk is now known as rturk-away
[3:06] * zhiteng (~Miranda@ has joined #ceph
[3:08] * winston-d (~Miranda@ Quit (Ping timeout: 480 seconds)
[3:09] * ivotron (~ivo@dhcp-59-157.cse.ucsc.edu) has joined #ceph
[3:09] * ivotron (~ivo@dhcp-59-157.cse.ucsc.edu) Quit ()
[3:09] * ivotron (~ivo@dhcp-59-157.cse.ucsc.edu) has joined #ceph
[3:12] * tristanz (~tristanz@c-71-204-142-149.hsd1.ca.comcast.net) has joined #ceph
[3:12] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[3:37] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[3:42] * zhiteng (~Miranda@ Quit (Ping timeout: 480 seconds)
[3:54] <elder> dmick, is it possible to simply kill off a running build on gitbuilder?
[3:54] <dmick> not in any sort of automated way
[3:54] <dmick> you can log in and find the process(es)
[3:55] <elder> It failed, and I have since deleted the branch, and I couldn't get it to go away. I tried "force rebuild" hoping it would notice it was no longer around, but instead, it, well, forced it to rebuild.
[3:55] * winston-d (~Miranda@ has joined #ceph
[3:55] <dmick> if the branch is gone it shouldn't be rebuilding
[3:56] <dmick> which branch and which gitbuilder?
[3:56] * winston-d (~Miranda@ has left #ceph
[3:56] * winston-d (~Miranda@ has joined #ceph
[3:56] <elder> http://gitbuilder.sepia.ceph.com/gitbuilder-precise-kernel-amd64/
[3:56] <elder> review/wip-3761-5
[3:57] <dmick> ....how does a branch have a / in it? boggle
[3:59] <elder> Neat, huh?
[3:59] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[3:59] <elder> If I want to refer to it later, the '/' is replaced with '_'
[3:59] <dmick> is that just for grouping the listing, or are there other things you can do with the group of branches?
[3:59] <elder> No, it's arbitrary I think.
[3:59] <elder> I think '/' is just another character to git.
[3:59] <dmick> wacky
[3:59] <dmick> anyway
[4:00] <elder> Except in certain cases where it is used to separate repository_name/branch_name
[4:00] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[4:02] <dmick> weird. it shows the top commit in master as sage but you're the author. maybe he committed?
[4:02] <dmick> yeah
[4:02] <dmick> anyway
[4:02] <dmick> yeah, all kinds of branches are gone there it seems
[4:04] <dmick> I know it does a relist of branches to build in between builds; I *assume* it purges nonexistent branches then, but I don't know for sure
[4:04] <elder> Well by the time we figure it out it'll be done building.
[4:05] <dmick> by the time my ssh completes it'll be done building :/
[4:05] <elder> On the other hand it's taking a long time to respond to my query on my browser
[4:05] <elder> Yeah, a really, really long time.
[4:05] * tristanz (~tristanz@c-71-204-142-149.hsd1.ca.comcast.net) Quit (Quit: tristanz)
[4:05] <dmick> "She's dead, Jim."
[4:05] <elder> There it goes (I think)
[4:06] <elder> ssh: connect to host gitbuilder port 22: No route to host
[4:06] <elder> Maybe not.
[4:07] <elder> Well I don't think I can get on it anyway. But I can ping it now anyway.
[4:08] <dmick> lessee, that's vercoi07
[4:09] <elder> I'll take your word for it.
[4:09] <dmick> there's a google doc that explains that
[4:10] * ivotron (~ivo@dhcp-59-157.cse.ucsc.edu) Quit (Ping timeout: 480 seconds)
[4:11] <dmick> it's not very busy right now
[4:11] <dmick> weirdly
[4:11] <dmick> blocked task messages tho
[4:12] <dmick> from ccache procs
[4:12] <dmick> dog-slow console login
[4:12] <dmick> something ain't right
[4:12] <elder> I think I agree.
[4:13] <elder> It looked funny to me. But I can't get in to explore.
[4:13] <dmick> System information disabled due to load higher than 16.0
[4:13] <dmick> still waiting for a prompt
[4:13] <elder> Apparently the load spikes up in the 30's when there's a build going on.
[4:14] <elder> Sandon said that's not a problem because it's a big machine. I'm not convinced, but he may be right.
[4:14] <dmick> neither the host nor the vm seem particularly starved
[4:14] <elder> dmesg?
[4:14] <dmick> from the graphs from virt-manager
[4:14] <dmick> still no prompt
[4:14] <dmick> looking at vercoi07 from ssh
[4:15] <dmick> not swapping, but something in some vm is burying the cpu
[4:15] <elder> I saved the link to one set of graphs but I don't know how to get at the ones for, e.g., vercoi07. (Maybe I've never seen that)
[4:16] <dmick> if you connect with virt-manager you can see graphs for all the vms on a host
[4:22] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[4:35] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[4:43] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[4:44] * tristanz (~tristanz@c-71-204-142-149.hsd1.ca.comcast.net) has joined #ceph
[4:50] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:57] * Cube (~Cube@ Quit (Quit: Leaving.)
[5:02] * vanham (~vanham@ Quit (Ping timeout: 480 seconds)
[5:37] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:42] * chutzpah (~chutz@ Quit (Quit: Leaving)
[6:02] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[6:03] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit ()
[7:01] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:05] * danieagle (~Daniel@ has joined #ceph
[7:20] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) has joined #ceph
[7:31] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:37] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) Quit (Read error: Operation timed out)
[7:54] * tnt (~tnt@228.204-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:58] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[8:06] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:10] * norbi (~nonline@buerogw01.ispgateway.de) has joined #ceph
[8:12] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:18] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[8:18] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[8:21] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:28] * loicd (~loic@ has joined #ceph
[8:28] * jlogan (~Thunderbi@2600:c00:3010:1:64ea:852f:5756:f4bf) Quit (Read error: Connection reset by peer)
[8:33] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:40] * sleinen (~Adium@2001:620:0:25:6def:b44b:fa6a:b63c) has joined #ceph
[8:41] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[8:41] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit ()
[8:42] * tristanz (~tristanz@c-71-204-142-149.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[9:09] * loicd (~loic@ Quit (Quit: Leaving.)
[9:16] * gerard_dethier (~Thunderbi@ has joined #ceph
[9:16] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:17] * eschnou (~eschnou@ has joined #ceph
[9:18] * tnt (~tnt@228.204-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:20] * Vjarjadian (~IceChat77@ Quit (Quit: Now if you will excuse me, I have a giant ball of oil to throw out my window)
[9:21] * gerard_dethier (~Thunderbi@ has left #ceph
[9:23] * itamar_ (~itamar@IGLD-84-228-64-202.inter.net.il) has joined #ceph
[9:24] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[9:27] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[9:28] * ScOut3R (~ScOut3R@ has joined #ceph
[9:32] * leseb (~Adium@ has joined #ceph
[9:33] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) has joined #ceph
[9:33] * Morg (b2f95a11@ircip3.mibbit.com) has joined #ceph
[9:33] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[9:33] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:36] * BManojlovic (~steki@ has joined #ceph
[9:37] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:43] * l0nk (~alex@ has joined #ceph
[9:43] * capri (~capri@ Quit (Read error: Connection reset by peer)
[9:43] * capri (~capri@ has joined #ceph
[9:45] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[9:48] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[10:04] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[10:14] <Kioob`Taff> Hi
[10:14] <Kioob`Taff> I see that the «PG split» feature is still experimental
[10:14] <Kioob`Taff> any idea when it will be «safe» ? :p
[10:16] <Morg> prolly in 1.0 ver ;]
[10:18] * LeaChim (~LeaChim@ has joined #ceph
[10:21] <Morg> btw. anyone checked new openstack grizzly?
[10:23] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:25] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) Quit (Quit: Leaving.)
[10:28] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) has joined #ceph
[10:29] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) Quit ()
[10:34] * mattch (~mattch@pcw3047.see.ed.ac.uk) has left #ceph
[10:35] * mega_au (~chatzilla@ Quit (Remote host closed the connection)
[10:38] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[10:44] * smeven (~diffuse@ has joined #ceph
[10:44] * smeven is now known as diffuse
[10:45] <diffuse> hi
[10:49] <alexxy> joao: how can i register on bugzie
[10:49] <alexxy> i still cannot get link
[10:52] <alexxy> joao: i cannot comment on tracker
[10:52] <alexxy> joao: but patch from http://tracker.ceph.com/issues/4644 doesnt help
[10:57] * winston-d (~Miranda@ Quit (Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org)
[11:10] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:10] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[11:10] <mrjack> hi
[11:11] <mrjack> is it a problem to mix different fs-types on different hosts? eg. having some hosts osds with ext4 and some hosts osds xfs?
[11:11] <absynth> not sure why you would want to do this?
[11:11] <mrjack> absynth: i want to migrate from ext4 to xfs
[11:18] <mrjack> so will it be a problem?
[11:18] * ramsay_za (~ramsay_za@ has joined #ceph
[11:18] * ramsay_za (~ramsay_za@ Quit ()
[11:21] <absynth> mrjack: not sure, probably worth waiting for joao
[11:28] * diegows (~diegows@ has joined #ceph
[11:33] <soren> Is there any way to limit how much space someone can consume in the distributed cephfs? Like if I only want to allocate up to 1 TB of space for it, even though my cluster has much more space than that?
[11:36] * Yen (~Yen@ip-81-11-198-39.dsl.scarlet.be) Quit (Ping timeout: 480 seconds)
[11:41] * Yen (~Yen@ip-83-134-98-4.dsl.scarlet.be) has joined #ceph
[12:25] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[13:16] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:20] <joao> sorry, overslept; clearly still suffering the effects of all the rock climbing from last weekend
[13:20] <joao> here now
[13:21] <joao> alexxy, not sure why that's still happening to you, I'll mention that again later today
[13:22] <joao> mrjack, I don't think that would be a problem; the osds would work regardless of the fs they are using
[13:23] <joao> mrjack, I'd advise you to send an email to ceph-users though, or to hang around until some of the LA guys arrive; they should know better than me
[13:23] * vanham (~vanham@ has joined #ceph
[13:36] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[13:41] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[13:52] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Ping timeout: 480 seconds)
[13:59] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[14:02] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:07] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Ping timeout: 480 seconds)
[14:13] * hybrid5121 (~walid@106-171-static.pacwan.net) has joined #ceph
[14:14] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Read error: Connection reset by peer)
[14:18] * checka (~v0@91-115-228-64.adsl.highway.telekom.at) has joined #ceph
[14:20] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[14:21] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[14:23] * The_Bishop (~bishop@2001:470:50b6:0:18a9:d14e:1316:a12f) has joined #ceph
[14:25] <matt_> joao, probably a random question but... can you recommend any books or otherwise that might help a rookie developer transition from ruby/python to C?
[14:26] <joao> oh, not the best person to answer that
[14:26] <Matt> I don't even remember what I originally learned C from
[14:26] <joao> I learned C using 'C in 21 days' some 10 years back; don't think I ever read any other book on that, except for K&R
[14:27] <Matt> I think I've had an assortment of random books on C
[14:27] <Matt> I used to swear by the O'Reilly stuff, not sure what their books on C are like tho
[14:29] <matt_> I'd like to start contributing to open source projects but most are out of my league at the moment. Ceph's source largely looks like gibberish to me and I'd like to fix that :)
[14:32] * barryo (~borourke@cumberdale.ph.ed.ac.uk) Quit (Quit: Leaving.)
[14:34] <joao> matt_, the transition to c++ might be easier
[14:34] <joao> not by much though
[14:36] <absynth> joao: is 0.61 still slated for early may?
[14:36] <joao> absynth, iirc, approx. 3 weeks away
[14:37] <joao> so I guess May is a fair estimate
[14:38] <joao> brb
[14:39] <Matt> I forget my path to C :)
[14:40] <Matt> if you don't count messing around with a few other languages on the way, it was probably right from several variants of BASIC to C
[14:42] * vanham (~vanham@ Quit (Ping timeout: 480 seconds)
[14:48] * barryo (~borourke@cumberdale.ph.ed.ac.uk) has joined #ceph
[14:48] <alexxy> joao: btw patch from bugzie doesnt solve problem
[14:52] <matt_> I mean, I can program. I just don't know C well enough
[14:52] * sivanov (~sivanov@gw2.maxtelecom.bg) has joined #ceph
[14:52] * itamar_ (~itamar@IGLD-84-228-64-202.inter.net.il) Quit (Remote host closed the connection)
[14:52] <matt_> templates and classes and whatnot look so foreign when they aren't in Ruby
[14:53] <sivanov> hi all. Is it possible to allow other linux users to execute ceph commands?
[14:53] <sivanov> thanks in advance
[14:53] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[14:54] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[14:55] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit ()
[14:56] <scuttlemonkey> sivanov: how do you mean? I usually do everything with sudo on ubuntu...so it would be as easy as adding them to sudoers
[14:57] <sivanov> i don't have sudo. The problem is between opennebula and ceph. user oneadmin cannot execute rbd commands
[14:57] <sivanov> is it possible to add user oneadmin to the keyring?
[15:01] <scuttlemonkey> you can create keys for whomever you like really
[15:01] <scuttlemonkey> http://ceph.com/docs/master/rados/operations/authentication/
[15:01] <sivanov> many thanks
[15:01] <scuttlemonkey> for instance, when setting up openstack I often create a 'volumes' user and key
[15:02] <scuttlemonkey> ceph auth get-or-create client.volumes mon 'allow r' osd 'allow rwx pool=volumes, allow rx pool=images'
[15:02] <scuttlemonkey> or you can just do 'allow *' on mon, osd, mds
[15:03] <scuttlemonkey> depending on what permissions you want to give
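Applied to the oneadmin case, the same pattern might look like the sketch below. The helper name create_pool_client, the pool name "one", and the CEPH / KEYRING_DIR variables are assumptions for illustration; the capabilities mirror the read-on-mon, rwx-on-one-pool shape of the example above and should be adjusted to whatever access is really needed:

```shell
# Hypothetical helper: create (or fetch) a cephx key for a client that is
# limited to one pool, and write it to a keyring file.
# Usage: create_pool_client <client-name> <pool>
# Set CEPH=echo to print the ceph arguments instead of running the command.
create_pool_client() {
    name="$1"; pool="$2"
    ${CEPH:-ceph} auth get-or-create "client.$name" \
        mon 'allow r' \
        osd "allow rwx pool=$pool" \
        -o "${KEYRING_DIR:-/etc/ceph}/ceph.client.$name.keyring"
}

# Example (run where an admin keyring is available):
# create_pool_client oneadmin one
```

The resulting keyring file has to be readable by the Linux user that runs the commands; after that, something like rbd --id oneadmin -p one ls should work without sudo, assuming the pool exists.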
[15:03] <sivanov> okay, i will try
[15:03] <scuttlemonkey> also make sure you have either the latest driver, or have run the sed fix that jaime pushed
[15:03] <scuttlemonkey> http://opennebula.org/documentation:rel4.0:ceph_ds
[15:06] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[15:14] <alexxy> joao: can you mark on bugzie that patch doesnt work?
[15:16] * vanham (~vanham@load1-1.supramail.com.br) has joined #ceph
[15:16] <liiwi> matt_: http://c.learncodethehardway.org
[15:20] <joao> alexxy, updated
[15:21] <joao> lunch; I'll be around but may have delays answering
[15:21] <leseb> #info leseb qemu and kvm are not compiled with the RBD support
[15:22] <leseb> #info leseb I'm stuck and I need to ask a packager
[15:23] <leseb> #info leseb I considered to switch to another task, like the benchmark methods or the RDW benchmark
[15:23] <leseb> #info leseb out!
[15:25] <leseb> sorry wrong channel...
[15:32] * Morg (b2f95a11@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[15:44] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Ping timeout: 480 seconds)
[15:50] * stiller1 (~Adium@2001:980:87b9:1:30f7:19d2:ed14:a775) has joined #ceph
[15:50] <absynth> mrjack: still there?
[15:51] * goldfish (~goldfish@ Quit (Remote host closed the connection)
[15:54] * stiller (~Adium@2001:980:87b9:1:7cc2:7ceb:9e0f:228f) Quit (Ping timeout: 480 seconds)
[15:54] * sivanov (~sivanov@gw2.maxtelecom.bg) Quit (Quit: Leaving)
[16:00] * mikedawson (~chatzilla@ has joined #ceph
[16:03] * rahmu (~rahmu@ has joined #ceph
[16:13] * hannes_ (~hannes@bristol.ins.cwi.nl) has joined #ceph
[16:14] <hannes_> Hello everyone!
[16:14] <vanham> hello!
[16:15] <hannes_> I have been starting to use ceph for our database use cases and am very impressed. One question: The snapshot-layering documentation says that COW snapshots are not yet supported by the rbd kernel module. Any news on when this might be? Alternatively, is there another way to access a COW snapshot as a block device from linux?
[16:16] <hannes_> (currently using 0.56.4)
[16:22] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[16:23] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[16:24] <hannes_> Oh, I see a github commit with the comment "rbd: activate v2 image support" - Do I just have to upgrade then?
[16:24] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[16:25] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[16:26] <scuttlemonkey> hannes_: as of a couple weeks ago the userland stuff still hadn't made it to the kernel
[16:26] <scuttlemonkey> lemme ask one of the rbd guys and see if they can give me a more definite answer on status
[16:26] <hannes_> cool, thanks!
[16:26] <scuttlemonkey> (although they may not be in for a couple of hours)
[16:27] <hannes_> should i ask on the mailing list?
[16:28] <scuttlemonkey> you are certainly welcome to...although I just dropped them a line and can poke you when they answer
[16:28] <scuttlemonkey> whichever is your preference
[16:28] <hannes_> if you poke me via mail at hannes@cwi.nl, that would be a+, i am not constantly on IRC (used to, those were the days...)
[16:29] <hannes_> in the meantime, i will try whether this is in the latest development version
[16:29] <scuttlemonkey> sure thing
[16:29] <hannes_> wow, that is impressive support
[16:30] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[16:30] <scuttlemonkey> hehe, happy we could help
[16:37] <Azrael> has anybody successfully deployed ceph with the ceph chef cookbooks?
[16:37] <Azrael> hi scuttlemonkey
[16:37] <Azrael> scuttlemonkey: i'm getting close, i think
[16:38] * hannes_ (~hannes@bristol.ins.cwi.nl) Quit (Quit: Konversation terminated!)
[16:40] <Azrael> scuttlemonkey: once my changes are finished, i'll post on github and submit a pull request
[16:44] * norbi (~nonline@buerogw01.ispgateway.de) Quit (Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org)
[16:51] <scuttlemonkey> azrael: awesome! I know dreamhost is using them...but they are also the ones who are tearing down the public ones for the new stuff
[16:52] <Azrael> ahh ok. and thats due for cuttlefish iirc.
[16:52] <scuttlemonkey> yeah
[16:52] <Azrael> i can haz sneak peek? :-)
[16:52] <scuttlemonkey> hehe
[16:52] <scuttlemonkey> they are actually going to do the whole RC thing
[16:53] <mrjack> joao: ok
[16:53] <scuttlemonkey> and we'll make some noise when the RC is available
[16:53] <Azrael> what do you mean by RC thing?
[16:54] <mrjack> joao: i had the first osds installed with ext4, but i read that xfs would be better supported.. now i am expanding the cluster and would like to use xfs on the new osds.. if that is no problem when other hosts osds are ext4, i'll now continue to integrate the new nodes in my cluster ;)
[16:54] <scuttlemonkey> a release candidate
[16:54] <scuttlemonkey> in advance of the actual release
[16:54] <mrjack> absynth: yes
[16:54] <janos> mrjack: mixed osd's are fine as far as i know
[16:55] <janos> mrjack: i have half btrfs and half xfs
[16:55] <absynth> mrjack: do you have kvm-qemu vms?
[16:55] <absynth> do your VMs see the "aes" cpu feature?
[16:55] <mrjack> yes
[16:55] <joao> mrjack, I don't believe having a mixed cluster of ext4 and xfs is a problem, but you should check with someone else in case there's some downside I'm not aware of
[16:55] <mrjack> well
[16:55] * aliguori (~anthony@ has joined #ceph
[16:55] <Azrael> scuttlemonkey: nice
[16:55] <mrjack> currently my vms see the "vmx" flag to use kvm inside kvm but i could also support aes
[16:56] <absynth> hmm
[16:56] <absynth> is that a start parameter for qemu or how do you set the flags that are exported to the guests?
[16:57] <elder> slang1, can you tell me what a workunit bash script will see when it runs "pwd"?
[16:57] <mrjack> i use libvirt
[16:58] <slang1> elder: I think its /home/ubuntu/cephtest/
[16:58] <mrjack> absynth: try something like: <cpu match='exact'><model>Nehalem</model><feature policy='require' name='vmx'/></cpu>
[16:58] <slang1> elder: let me check
[16:58] <elder> Or whatever it's configured as in the .teuthology.yaml file?
[16:58] <slang1> yes
[16:58] <mrjack> absynth: -cpu Nehalem,+vmx -enable-kvm
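
The two forms mrjack pasted (the libvirt `<cpu>` element and the raw `-cpu Nehalem,+vmx` qemu flag) express the same request. A minimal sketch of that correspondence, assuming only the `require` and `disable` feature policies matter; the function name and the policy table are hypothetical, and this is not how libvirt itself generates the command line:

```python
import xml.etree.ElementTree as ET

# Assumed mapping from libvirt feature policies to qemu "-cpu" flag prefixes.
# Only the two policies needed for this illustration are covered.
POLICY_PREFIX = {"require": "+", "disable": "-"}


def cpu_xml_to_qemu_args(cpu_xml):
    """Turn a libvirt <cpu> element into a rough qemu '-cpu' argument list."""
    cpu = ET.fromstring(cpu_xml)
    model = cpu.findtext("model")
    flags = []
    for feature in cpu.findall("feature"):
        prefix = POLICY_PREFIX.get(feature.get("policy"))
        if prefix is not None:
            flags.append(prefix + feature.get("name"))
    return ["-cpu", ",".join([model] + flags)]


# The XML snippet from the log, unchanged.
cpu_xml = (
    "<cpu match='exact'><model>Nehalem</model>"
    "<feature policy='require' name='vmx'/></cpu>"
)
print(" ".join(cpu_xml_to_qemu_args(cpu_xml)))
```

Adding a second `<feature policy='require' name='aes'/>` element would append `+aes` in the same way, which is the flag absynth was asking about.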
[16:59] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Read error: Connection reset by peer)
[16:59] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[16:59] <elder> What about... I'm interested in having a bash library script so I don't have to keep repeating code in multiple workunits. Is that possible?
[17:00] <absynth> hmm, i think we don't use xml config files
[17:01] <slang1> elder: looks like its: /home/ubuntu/cephtest/mnt.0/client.0/tmp
[17:02] <elder> If I wanted to do something like: "source rbd_lib.sh" at the top of more than one workunit script, could I do that, or will just the named script be available? And if I could do that, would I need to specify a path?
[17:02] <elder> OK, I think I've seen a path like that before.
[17:02] <mrjack> absynth: well then just try -cpu parameter
[17:03] <slang1> elder: you can do that, but you'll need to specify a path
[17:03] <slang1> elder: which is...
[17:04] <absynth> mrjack: will do, thanks
[17:08] <slang1> elder: looks like its: /home/ubuntu/cephtest/workunit.client.0/
[17:09] * EWDurbin (~ernestd@ewd3do.ernest.ly) has joined #ceph
[17:09] <elder> Splendid. I'll give it a try. Thanks a lot slang1
[17:09] <slang1> elder: you'll want something like source $PWD/workunit.*/rbd_lib.sh I guess
[17:09] <elder> Oh, and is /home/ubuntu/cephtest
[17:09] <elder> OK
[17:09] <elder> That was my next quest. $PWD will be the directory set in .teuthology.yaml?
[17:09] <elder> Or will I have to get that somehow myself?
[17:10] <slang1> elder: it will be a test subdir within that dir, yes
[17:10] <elder> OK, that gives me plenty to work with. I'll figure out the details.
[17:10] <slang1> elder: actually, that's not quite right
[17:11] <slang1> elder: the process is to create a workunit subdir within <testdir> (e.g. /home/ubuntu/cephtest)
[17:12] <slang1> elder: and do git archive ... to pull the latest workunit source into that subdir, so those files go in <testdir>/workunit.<role>
[17:12] <slang1> elder: then the workunit script is actually run from a subdir within the mountpoint
[17:13] <slang1> elder: so for a relative path, it would need to be: $PWD/../../../workunit.*/rbd_lib.sh
[17:13] <slang1> (I think)
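
The relative path slang1 gives can be checked against the two directories mentioned earlier in the conversation. A minimal sketch, assuming the working directory is `<testdir>/mnt.0/client.0/tmp` and the workunit sources live in `<testdir>/workunit.client.0`; the helper name is hypothetical, and the resolution is purely lexical (it ignores symlinks and mount points):

```python
import posixpath


def lib_path_from_cwd(cwd, role="client.0", lib="rbd_lib.sh"):
    """Resolve ../../../workunit.<role>/<lib> relative to the script's cwd."""
    relative = posixpath.join("..", "..", "..", "workunit." + role, lib)
    return posixpath.normpath(posixpath.join(cwd, relative))


# Working directory reported in the log for a workunit script.
cwd = "/home/ubuntu/cephtest/mnt.0/client.0/tmp"
print(lib_path_from_cwd(cwd))
```

Three `..` components climb from `tmp` past `client.0` and `mnt.0` back to the test directory, so the result lands in the `workunit.client.0` directory, consistent with the "(I think)" above.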
[17:13] <elder> It seems like we might want to run *outside* the subject of the test.
[17:14] <elder> Unless we want to explicitly do so within the test.
[17:14] <elder> But hey, I'm not going to change it.
[17:14] <slang1> elder: subject of the test? ..not following..
[17:14] <elder> I created a task to do this because it seems like it's going to be a little work. I'll document what you've described though so I can get to it.
[17:15] <elder> I mean, we should create the scripts that run in /tmp, not in /tmp/<mounted ceph directory> if we're testing the <mounted ceph directory>
[17:15] <elder> Is that clearer?
[17:15] <slang1> elder: yeah it does that
[17:15] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[17:15] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[17:16] <slang1> elder: scripts go in <testdir>/workunit.foo
[17:16] <slang1> elder: run from <testdir>/mnt.0/client.0/tmp
[17:16] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[17:16] <elder> Oh, so you mean the script is run from there, but more like (cd <mountpoint>; run <path-to-script>)
[17:16] <elder> I see
[17:16] <slang1> elder: yep
[17:16] <elder> OK, that's good.
[17:17] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[17:18] <slang1> elder: hmm..could you use:
[17:18] <slang1> wd=$(dirname $0)
[17:18] <slang1> source ${wd}/rbd_lib.sh
[17:19] <elder> That sounds pretty good...
[17:20] <elder> If someone doesn't specify test_path in ~/.teuthology.yaml is it just /home/ubuntu/cephtest/?
[17:20] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Remote host closed the connection)
[17:21] <slang1> elder: /tmp/cephtest/
[17:21] <elder> OK
[17:21] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:21] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[17:22] <slang1> elder: its actually /tmp/cephtest/<jobname> or /tmp/cephtest/<user>-<tstamp>
[17:23] <slang1> (avoids clobbering old tests and cleanup issues)
[17:23] <elder> OK, just fleshing out my understanding, it's not critical.
[17:24] * jlogan (~Thunderbi@2600:c00:3010:1:64ea:852f:5756:f4bf) has joined #ceph
[17:24] <slang1> elder: btw, that mds crash yesterday looks like the same one from before
[17:24] <elder> But that was fixed, or does that mean it was not?
[17:25] <slang1> elder: yeah must not be
[17:25] <slang1> elder: what I fixed was a bug, but there must be others in that path
[17:25] <alexxy> joao: any news?
[17:26] <elder> Do I need to open a new bug for it? Or should I add to the existing one with some information about this instance?
[17:26] <slang1> elder: I was going to add something from that pastebin you sent a link for
[17:26] <elder> OK, great.
[17:27] <mrjack> ok, so i set up the new osds with xfs like this, and i modified the ceph.conf osd section. is that okay? could someone please take a look at http://pastebin.com/cZBCLbzn ?
[17:30] * Vjarjadian (~IceChat77@ has joined #ceph
[17:32] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[17:33] <mrjack> do i now just have to make ceph osd create 5,6,7?
[17:33] * portante|afk is now known as portante
[17:38] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:42] * drokita (~drokita@ has joined #ceph
[17:45] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:45] * hybrid5121 (~walid@106-171-static.pacwan.net) Quit (Quit: Leaving.)
[17:47] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[17:48] * sleinen (~Adium@2001:620:0:25:6def:b44b:fa6a:b63c) Quit (Quit: Leaving.)
[17:48] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit ()
[17:49] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[17:49] * tnt (~tnt@228.204-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[17:51] <mrjack> ok i now added the osds, created osd data dir with --mkfs and --mkkey..
[17:52] <mrjack> but when i try to add the osd to crush i get this
[17:52] <mrjack> # ceph osd crush
[17:52] <mrjack> unknown command crush
[17:52] * mikedawson (~chatzilla@ Quit (Ping timeout: 480 seconds)
[17:55] <mrjack> i try like explained here:
[17:55] <mrjack> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[17:55] <mrjack> ceph osd crush set {id-or-name} {weight} pool={pool-name} [{bucket-type}={bucket-name} ...]
[17:55] <mrjack> which pool name should i choose?
[17:57] <scuttlemonkey> mrjack: if you would prefer to see the crushmap and add the new osd in using the others as a template you can do it that way instead
[17:57] <scuttlemonkey> http://ceph.com/docs/master/rados/operations/crush-map/#tuning-crush-the-hard-way
[17:58] <mrjack> no i want the easy way ;)
[17:58] <scuttlemonkey> hah
[17:58] <mrjack> without set and get crushmap etc
[17:58] <scuttlemonkey> it says the hard way...but honestly I prefer that way :P
[17:59] <scuttlemonkey> to answer the pool question you can look at this example: http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds
[17:59] <mrjack> i guess i will have all 3 pools on these new osds
[17:59] <mrjack> so parameter would be pools=0,1,2 or what?
[18:00] <topro> mrjack: for me something like this works
[18:00] <topro> ceph osd crush set [id] osd.[id] 1.0 pool=default rack=unknownrack host=[HOSTNAME]
[18:01] <mrjack> hm i read that it would be wise to set weight to some small value (0.2) and reweight later? does that matter?
[18:01] <topro> mrjack: only for ceph older than bobtail
[18:01] <topro> its written there, too
[18:01] <mrjack> i use ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)
[18:01] <mrjack> yes but when i try:
[18:01] <mrjack> root@node06:~# ceph osd crush
[18:01] <mrjack> unknown command crush
[18:01] <topro> that is bobtail, so you should be fine
[18:01] <mrjack> !?
[18:02] <mrjack> yeah should ;)
[18:03] <topro> i get unknown command with "ceph osd crush" as well. you have to specify the full command with all parameters. error message is misleading!
[18:03] <absynth> yep
[18:03] <mrjack> ah ok
[18:03] <absynth> i was just about to say that
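
Since the bare `ceph osd crush` prefix is rejected, it can help to assemble the whole command before running it. A minimal sketch that only builds the argument list in the form topro quoted for bobtail (0.56.x); the function name and defaults are hypothetical, nothing is executed, and other Ceph releases may expect a different argument order:

```python
def crush_set_command(osd_id, weight, host, pool="default", rack="unknownrack"):
    """Build the full 'ceph osd crush set' argv in the form quoted in the log."""
    if osd_id < 0:
        raise ValueError("osd id must be non-negative")
    if weight < 0:
        raise ValueError("weight must be non-negative")
    return [
        "ceph", "osd", "crush", "set",
        str(osd_id),            # numeric id
        "osd.%d" % osd_id,      # osd name
        str(weight),            # CRUSH weight
        "pool=" + pool,
        "rack=" + rack,
        "host=" + host,
    ]


print(" ".join(crush_set_command(5, 1.0, "node06")))
```

Building the list first makes it easy to double-check the `host=` value before the command reaches the cluster.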
[18:03] <loicd> In https://objects.dreamhost.com/inktankcom/DreamCompute%20Architecture%20Blueprint.pdf the nodes have 32GB of RAM each. Does that mean dream compute cannot provide virtual machines larger than 16GB ( I assume 32GB means there would be no more ram for the OSD and be bad ) ?
[18:03] <topro> faster ;)
[18:04] <loicd> And good day :-)
[18:04] <mrjack> ah ok
[18:04] <absynth> loicd: as far as i remember, dreamhost does not colocate storage and compute nodes
[18:04] <mrjack> oh i made a mistake
[18:04] <absynth> but the document says otherwise... or is at least a bit inconclusive
[18:04] <mrjack> ceph osd crush set 4 1.0 pool=default rack=unknownrack host=node06 - but i wanted node05, what now?
[18:05] <absynth> loicd: i think the config on page 2 is the baseline config for a storage XOR compute node
[18:05] <topro> mrjack: edit the crushmap ;)
[18:05] <mrjack> can i just do:
[18:05] <mrjack> ceph osd crush set 4 1.0 pool=default rack=unknownrack host=node05
[18:05] <absynth> "If these machines are also to host virtual machines, additional memory (and perhaps cores) will need to be added, according to the expected needs of those VMs."
[18:05] <absynth> sorry for the shitty paste
[18:05] <loicd> absynth: "This architecture has been optimized for density (vs. speed, throughput, or capacity) because they expect to be doing a huge amount of computation (running VMs)." led me to think that it is about running VMs on those nodes.
[18:06] <absynth> that sentence pertains to the overall architecture
[18:06] <absynth> ie. the pod concept
[18:06] <topro> mrjack: don't know
[18:07] <topro> maybe first remove the wrong one, then add the right one
[18:07] <topro> as in http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[18:07] <absynth> i think i pretty clearly remember someone high up saying they decided against colocating storage and compute on the same machine
[18:08] * BillK (~BillK@58-7-53-210.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:08] <loicd> absynth: thanks for the hint. Reading again.
[18:08] <loicd> it changes the perspective quite a bit
[18:10] * gucki (~smuxi@84-73-190-117.dclient.hispeed.ch) has joined #ceph
[18:10] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Remote host closed the connection)
[18:11] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[18:12] <loicd> absynth: do you remember the rationale for not colocating ?
[18:14] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[18:14] <matt_> colocating would make a lot of sense when you factor in how much bandwidth the VM hosts need to the OSD's
[18:15] * vata (~vata@2607:fad8:4:6:dcad:e235:1569:718a) has joined #ceph
[18:16] <pioto> if you have your OSD journal on a separate device, and that device fails, does the OSD fail too, then, and have to be rebuilt? or, what?
[18:16] <pioto> can you just replace the journal device and the OSD will just catch up?
[18:17] <matt_> pioto, you can replace the journal and the osd doesn't know. You run the risk of having corrupt objects though, because you lose anything that was in the journal
[18:17] <pioto> yeah, that's what i suspected
[18:17] <pioto> it'd have some partial commit or whatever
[18:18] <pioto> that it ends up assuming is complete
[18:18] <loicd> matt_: that would involve the ability for the openstack scheduler to spawn the VM near the primary osd. I don't think openstack knows how to do that yet, does it ?
[18:18] <pioto> but i guess that'd eventually get corrected by the scrubs?
[18:18] <pioto> or, caught at read by a checksum fail?
[18:19] <matt_> pioto, a deep scrub should catch it
[18:19] <matt_> I probably wouldn't take the risk but if you can't rebuild for whatever reason it would probably carry on
[18:20] <matt_> loicd, I don't think there is a single primary OSD for an RBD image unless you specified one in the crush map
[18:21] <loicd> oh, that's right
[18:21] <infernix> i know ceph can't do replication across datacenters
[18:21] <infernix> but is there a way to make that work just for radosgw somehow?
[18:22] <matt_> loicd, I meant more that they have 2x 10G per server already and having the VMs in a different pod means you have to push all the traffic over the spine
[18:22] <loicd> understood
[18:23] * winston-d (~Miranda@ has joined #ceph
[18:23] <loicd> is there a way to instruct ceph to keep all OSDs for a given RBD in the same node ( pod in this case ) ?
[18:24] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[18:24] <matt_> Look here : http://ceph.com/docs/master/rados/operations/crush-map/#placing-different-pools-on-different-osds
[18:24] * loicd looking
[18:24] <mattch> infernix: You could do this easily enough with the crushmap surely (assuming of course your connection between datacentres is reliable enough?)
[18:24] <matt_> you can use crush to stick data on a certain primary and then have the secondaries elsewhere
[18:25] <matt_> you would just need to have the primary as a host instead of the SSD pool in the example
[18:25] <matt_> and a pool per host
[18:25] <infernix> but can i do that per pool?
[18:25] <infernix> i already have pools per datacenter
[18:25] <infernix> and i do have an 80ms 1gbit link between the two, though there's maintenance downtime on it from time to time
[18:26] * rahmu is now known as joehrahme
[18:26] <infernix> crushmaps are per cluster, not per pool afaik
[18:26] <matt_> infernix, do you mean you just want the gateway in the different DC accessing the pool from the other DC?
[18:26] * loicd thinking
[18:27] <infernix> well i know ceph itself can't do async replication yet, so i'm trying to think of an easy way to do it for radosgw
[18:27] <infernix> the radosgw api isn't as latency sensitive as rbd would be
[18:28] <matt_> infernix, technically all ceph needs is layer 3 comms between osds/mons/gateways. It would probably work with performance limitations
[18:28] <infernix> yes but again, a crushmap goes for the entire cluster, not for one pool
[18:29] <infernix> i could buy another set of nodes and set up a new cluster but that would cost me $60k
[18:29] <mattch> I was wondering if anyone had done any work with ceph and redundant routes with multipathing etc for reliability between datacentres
[18:29] <gregaf1> you can set different rules for different pools; sounds like you want to do that
[18:30] <matt_> You can't really cover a split brain scenario between the two DC's if the link was to fail, one DC would go down
[18:31] * slang1 (~slang@c-71-239-8-58.hsd1.il.comcast.net) Quit (Read error: Connection reset by peer)
[18:31] * slang1 (~slang@c-71-239-8-58.hsd1.il.comcast.net) has joined #ceph
[18:31] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:33] <ron-slc> can somebody confirm. are MDS directory layouts to alternative pools non-functional in current bobtail?
[18:34] <loicd> I don't get it
[18:34] <loicd> :-D
[18:34] * joehrahme is now known as rahmu
[18:34] <mattch> matt_: there are enough failure domains even with all the nodes in one room on one switch that if you solved those, you'd get datacentre-level redundancy for free I think
[18:35] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[18:35] <mattch> (though I agree that split-brain, fencing and redundant routes aren't an easy task to begin working on!)
[18:37] <loicd> a given RBD can have many primary OSDs if the underlying objects have different primary OSDs ? Is that what you're saying matt_ ?
[18:38] <matt_> loicd, the key is the rules. Look at the SSD-primary rule
[18:38] <matt_> RBD objects are mapped to many PG's. PG's have different primaries
[18:38] <loicd> ok
[18:39] <matt_> You would need to make a rule that makes the primary the same (or on the same host) then apply that to a pool
[18:39] * leseb (~Adium@ Quit (Quit: Leaving.)
[18:40] * loicd looking again
[18:42] <winston-d> joshd: ping
[18:43] <loicd> matt_: does that mean I would have to create one pool per host ?
[18:43] <matt_> loicd, yep
[18:44] * rahmu (~rahmu@ Quit (Remote host closed the connection)
[18:44] <loicd> and given that openstack requires one pool for cinder, I would only be able to create volumes on a single host ?
[18:44] <matt_> well you could access the volumes from another host but you would lose data locality
[18:45] <loicd> now I get it :-)
[18:45] <loicd> matt_: thanks for your patience
[18:45] <matt_> No worries, I'm still learning too
[18:47] <loicd> it would be more convenient to be able to say : all objects for this RBD block use a single PG and therefore have a single primary
[18:48] <matt_> remember also that data locality only matters for reads, writes still need to hit every OSD
[18:49] <matt_> I think there is work going on for CephFS that lets you read from the local copy if there is one instead of the primary
[18:49] * chutzpah (~chutz@ has joined #ceph
[18:49] <matt_> no idea if it applies to rbd
[18:49] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[18:50] * l0nk (~alex@ Quit (Quit: Leaving.)
[18:57] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:58] <matt_> I'm off to get some sleep, night all
[19:00] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:00] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:00] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) has joined #ceph
[19:03] * winston-d (~Miranda@ Quit (Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org)
[19:06] * winston-d (~Miranda@ has joined #ceph
[19:06] * winston-d (~Miranda@ Quit ()
[19:07] * winston-d (~Miranda@ has joined #ceph
[19:09] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[19:18] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[19:22] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:23] * slang1 (~slang@c-71-239-8-58.hsd1.il.comcast.net) Quit (Ping timeout: 480 seconds)
[19:29] * slang1 (~slang@c-71-239-8-58.hsd1.il.comcast.net) has joined #ceph
[19:30] * yehudasa (~yehudasa@2607:f298:a:607:2981:dee6:99f7:e854) Quit (Quit: Ex-Chat)
[19:30] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:34] * ivotron (~ivo@dhcp-59-159.cse.ucsc.edu) has joined #ceph
[19:34] <infernix> ok so the thought would be to use obsync
[19:35] <infernix> implement that as a hook inside radosgw and just sync on writes to other radosgws that are running on their own ceph cluster
[19:35] <infernix> it needs some intelligence but it allows to have separate ceph clusters that can still be kept in sync
[19:37] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[19:41] <elder> slang1, any reason I need to hang onto this cluster that had the error I posted yesterday?
[19:41] <elder> I'd like to reset it and do some more tests with it.
[19:41] <slang1> elder: nope
[19:41] <elder> OK thanks.
[19:43] * noob2 (~cjh@ has joined #ceph
[19:43] * drokita (~drokita@ has joined #ceph
[19:43] * jskinner (~jskinner@ has joined #ceph
[19:45] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[19:49] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[19:51] * gucki (~smuxi@84-73-190-117.dclient.hispeed.ch) Quit (Remote host closed the connection)
[19:51] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[19:51] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[19:52] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[19:53] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[19:53] * ChanServ sets mode +o elder
[19:53] * ljonsson (~ljonsson@ext.cscinfo.com) has joined #ceph
[20:00] <mrjack> hm
[20:00] <mrjack> once i did ceph osd crush set 5 osd.5 1.0 pool=default rack=unknownrack host=node06
[20:00] <mrjack> ceph starts backfilling
[20:01] <mrjack> but what is it backfilling? i did not mark the osd up?
[20:03] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[20:04] <mrjack> 2013-04-09 20:04:47.866929 mon.0 [INF] pgmap v14410545: 768 pgs: 718 active+clean, 14 active+remapped+wait_backfill, 2 active+recovery_wait, 34 active+remapped+backfilling; 773 GB data, 1551 GB used, 1929 GB / 3667 GB avail; 1042KB/s wr, 104op/s; 35650/448076 degraded (7.956%); recovering 39 o/s, 150MB/s
[20:05] <mrjack> http://pastebin.com/3t5TDYyX
[20:05] <mrjack> but i wonder what ceph is backfilling right now?!
[20:06] <mrjack> or is that expected behavior?
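
The status line mrjack pasted already carries the numbers needed to see how much data is being moved. A minimal sketch that pulls the PG state counts and the degraded ratio out of that exact line; the regular expressions assume the bobtail-era `pgmap` text format shown above and may not match other releases:

```python
import re


def parse_pgmap(line):
    """Extract total PGs, per-state counts and degraded object counts."""
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # The state list sits between "pgs:" and the first semicolon.
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {}
    for item in states_part.split(","):
        count, state = item.split()
        states[state] = int(count)
    match = re.search(r"(\d+)/(\d+) degraded", line)
    degraded = (int(match.group(1)), int(match.group(2))) if match else None
    return total, states, degraded


# The pgmap line from the log, without the timestamp prefix.
line = (
    "pgmap v14410545: 768 pgs: 718 active+clean, "
    "14 active+remapped+wait_backfill, 2 active+recovery_wait, "
    "34 active+remapped+backfilling; 773 GB data, 1551 GB used, "
    "1929 GB / 3667 GB avail; 1042KB/s wr, 104op/s; "
    "35650/448076 degraded (7.956%); recovering 39 o/s, 150MB/s"
)

total, states, degraded = parse_pgmap(line)
not_clean = total - states.get("active+clean", 0)
percent = 100.0 * degraded[0] / degraded[1]
print(total, not_clean, round(percent, 3))
```

The per-state counts add up to the 768 PGs reported, and the ratio 35650/448076 reproduces the 7.956% shown in the line.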
[20:09] * rustam (~rustam@ has joined #ceph
[20:11] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[20:12] * rustam (~rustam@ Quit (Remote host closed the connection)
[20:17] * dpippenger (~riven@ has joined #ceph
[20:18] * vanham (~vanham@load1-1.supramail.com.br) Quit (Ping timeout: 480 seconds)
[20:18] <noob2> is there an isi cluster i can try my auto cataloging python script on that won't cause mayhem?
[20:19] * vanham (~vanham@load1-2.supramail.com.br) has joined #ceph
[20:19] <noob2> nvm
[20:19] * noob2 (~cjh@ has left #ceph
[20:21] * vanham (~vanham@load1-2.supramail.com.br) Quit ()
[20:23] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[20:25] * Cube (~Cube@cpe-76-172-67-97.socal.res.rr.com) Quit (Quit: Leaving.)
[20:25] * gluster (1b6a30cc@ircip1.mibbit.com) has joined #ceph
[20:28] <gluster> hi guys my name is Vincent and i am new to the ceph irc
[20:32] <nhm__> gluster: hello!
[20:33] <gluster> hi
[20:38] * yasu` (~yasu`@dhcp-59-149.cse.ucsc.edu) has joined #ceph
[20:41] * gluster (1b6a30cc@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[20:43] <yasu`> Does anybody know where in the mds/CInode the filepath is stored ?
[20:43] * jskinner_ (~jskinner@ has joined #ceph
[20:47] * Cube (~Cube@ has joined #ceph
[20:47] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[20:49] <Kioob> Removing a snapshot of a 400GB RBD image is very slow, and makes the cluster unavailable for some minutes. I hope it's not the normal behavior, so how can I find what is wrong in my setup ?
[20:50] * jskinner (~jskinner@ Quit (Ping timeout: 480 seconds)
[20:51] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:52] <fghaas> Kioob: removing an rbd snapshot ought to be rather snappy, and should definitely not make your cluster unresponsive, no. my suggestion would be to check out ceph -w and look if you see any slow requests, stuck PGs, recovery, deep scrub, or similar that coincides with the snapshot removal
[20:52] * yehudasa (~yehudasa@2607:f298:a:607:3076:3947:2032:e52e) has joined #ceph
[20:55] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[21:01] * ljonsson (~ljonsson@ext.cscinfo.com) Quit (Quit: Leaving.)
[21:02] <yasu`> I found make_path_string()...
[21:06] <fghaas> yasu`: your question is probably one for gregaf1, slang1 or sage -- you may just have to be patient until they have time to reply, they can't spend _all_ their time on irc :)
[21:07] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[21:07] * winston-d (~Miranda@ Quit (Quit: Miranda IM! Smaller, Faster, Easier. http://miranda-im.org)
[21:08] <yasu`> fghaas: Thanks. I know :) I bet if somebody happens to know. I think I can find myself. Thanks again
[21:09] <yasu`> (I mean I wondered if. :)
[21:11] <yasu`> I can find it by myself (broken English, sigh)
[21:12] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:14] <fghaas> yasu`: don't worry about your English, we have plenty of non-native speakers here -- and you're making yourself perfectly understandable
[21:15] <yasu`> thanks :)
[21:18] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[21:23] <Kioob> ok fghaas, thanks
[21:24] <Kioob> about "ceph -w", I have never seen it work
[21:24] <iggy> that could be a problem
[21:25] <Kioob> I get the same output as "ceph status", followed by this error: "2012-12-10 11:45:35.407734 unknown.0 [INF] mkfs de035250-323d-4cf6-8c4b-cf0faf6296b1"
[21:25] <Kioob> then, nothing happens
[21:27] <Kioob> (of course, it's the date of the ceph deployment)
[21:32] <fghaas> like iggy says, if ceph -w doesn't work at all, that would be a problem worth looking into
[21:34] <Kioob> I understand.
[21:35] <Kioob> I will try to trace that problem, thanks !
[21:36] * jskinner_ (~jskinner@ Quit (Remote host closed the connection)
[21:36] * jskinner (~jskinner@ has joined #ceph
[21:39] * SvenPHX1 (~scarter@wsip-174-79-34-244.ph.ph.cox.net) Quit (Remote host closed the connection)
[21:41] * rturk-away is now known as rturk
[21:48] * BManojlovic (~steki@fo-d- has joined #ceph
[21:49] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:49] <elder> Apparently @ is a reserved character in bash, so I need to backslash it if I want to use it for specifying snapshot names...
[21:54] <rturk> fghaas: you're going to be at the openstack summit?
[21:56] <fghaas> rturk: barring any unexpected travel issues, yes. yourself?
[21:57] * danieagle (~Daniel@ has joined #ceph
[21:57] <rturk> fghaas: yes! perhaps we should set aside some time for you, me, scuttlemonkey to meet?
[21:59] <fghaas> rturk, most certainly. when are you flying in? right now, monday morning looks best for me as far as meeting slots are concerned, but you can also just pop by our booth and grab me whenever convenient
[21:59] <rturk> Monday morning should work for us
[22:00] * rturk takes a look through the sessions
[22:01] <rturk> I can do just about any time that morning before the 11:50 sessions start
[22:02] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[22:02] * tnt (~tnt@228.204-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:08] <rturk> fghaas: any time in particular work for you? 10:30 break?
[22:11] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:16] <fghaas> rturk: sec, checking
[22:17] * alram (~alram@ has joined #ceph
[22:20] <fghaas> rturk: in your inbox :)
[22:20] <rturk> got it!
[22:26] <scuttlemonkey> hooray for ODS
[22:29] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[22:29] * sleinen (~Adium@2001:620:0:25:e837:233:a5dc:729b) has joined #ceph
[22:31] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:34] * noahmehl_away (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) has joined #ceph
[22:34] * noahmehl_away (~noahmehl@cpe-75-186-45-161.cinci.res.rr.com) Quit ()
[22:34] * verwilst (~verwilst@dD576962F.access.telenet.be) has joined #ceph
[22:35] * slang1 (~slang@c-71-239-8-58.hsd1.il.comcast.net) Quit (Ping timeout: 480 seconds)
[22:38] * LeaChim (~LeaChim@ has joined #ceph
[22:43] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[22:43] * verwilst (~verwilst@dD576962F.access.telenet.be) Quit (Quit: Ex-Chat)
[22:50] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[22:53] <ron-slc> it looks like using cephfs to define a different "directory default storage pool" doesn't work in Bobtail?
[22:54] <ron-slc> via the set_layout option
[22:54] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:56] <gregaf1> it's finicky and I don't remember for sure but .56.4 might have the virtual xattrs letting you do it instead — see if "ceph.layout" exists
[23:03] * ivotron (~ivo@dhcp-59-159.cse.ucsc.edu) Quit (Ping timeout: 480 seconds)
[23:05] * slang (~slang@c-71-239-8-58.hsd1.il.comcast.net) has joined #ceph
[23:06] * ivotron (~ivo@eduroam-238-87.ucsc.edu) has joined #ceph
[23:08] * Vjarjadian_ (~IceChat77@ has joined #ceph
[23:09] <ron-slc> gregaf1: I only see a handful of ceph.dir.xxx entries on the directory. (no ceph.layout) when I run ceph mds dump, I see this line: compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object}
[23:09] * Vjarjadian_ (~IceChat77@ Quit ()
[23:09] * Vjarjadian_ (~IceChat77@ has joined #ceph
[23:09] <ron-slc> with 3=default file layouts on dirs in the incompat={} stanza. Not sure if that has a bearing.
[23:10] <gregaf1> it's invisible; you need to query it explicitly (I really don't remember if it exists there or not)
[23:10] <gregaf1> otherwise you'll need to use the cephfs tool, and yes, it's annoying
[23:10] <gregaf1> I think you'll find you need to fully specify every parameter and use pool number IDs, not names
[23:13] * Vjarjadian (~IceChat77@ Quit (Ping timeout: 480 seconds)
[23:14] <ron-slc> yea, that was my procedure: use the pool ID number with the "cephfs" tool. Also, querying explicitly with xattr doesn't show "ceph.layout" on the affected parent dir.
[23:14] <ron-slc> The layout looks like it should work: when doing "show_layout": layout.data_pool: 6
[23:14] <ron-slc> layout.object_size: 4194304
[23:14] <ron-slc> layout.stripe_unit: 4194304
[23:14] <ron-slc> layout.stripe_count: 1
[23:15] <ron-slc> Any files written beneath it do get created, but the file contents get thrown in the bit-bucket.
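Note: a hedged sketch of the "fully specify every parameter and use pool number IDs" advice, using the values from the show_layout output above. It only builds the argument list; the short flags (-p, -s, -u, -c) are given as recalled from the cephfs(8) man page and should be checked against the installed version. The directory path and the function name build_set_layout_cmd are hypothetical.

```python
def build_set_layout_cmd(directory, pool_id, object_size, stripe_unit, stripe_count):
    """Build the argv list for `cephfs <dir> set_layout` with every field given."""
    if not isinstance(pool_id, int):
        # The tool is reported to want the numeric pool ID, not the pool name.
        raise TypeError("pool must be a numeric pool ID, not a name")
    return [
        "cephfs", directory, "set_layout",
        "-p", str(pool_id),        # data pool ID
        "-s", str(object_size),    # object size in bytes
        "-u", str(stripe_unit),    # stripe unit in bytes
        "-c", str(stripe_count),   # stripe count
    ]


# Hypothetical mount point; values mirror the layout pasted in the chat.
cmd = build_set_layout_cmd("/mnt/ceph/somedir", 6, 4194304, 4194304, 1)
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it on a host where the cephfs tool is installed.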
[23:20] <gregaf1> you mean you've successfully set that layout, where pool 6 is where you want them, and then when you create a new file its contents are in a different pool?
[23:20] <gregaf1> or you're querying the layout on new files created underneath that and they look different?
[23:22] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[23:24] <ron-slc> Well, the file seems to be created in the mds (metadata): I can see it with an ls, and the cephfs "show_location" also shows an OSD related to the layout.data_pool. But a hexdump of a small created file shows it is filled with zeros.
[23:24] <ron-slc> "rados df" also shows zero objects in pool #6
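Note: a small sketch of the symptom check ron-slc describes (a file that exists but reads back as zeros), written in Python rather than with hexdump. The function name is_all_zeros is hypothetical; it only inspects file contents and says nothing about why the data is missing.

```python
def is_all_zeros(path, chunk_size=65536):
    """Return True if the file is non-empty and every byte is 0x00."""
    seen_data = False
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                # End of file: all-zero only if we actually read something.
                return seen_data
            seen_data = True
            if chunk.count(0) != len(chunk):
                return False
```

Running it on a file just written under the affected directory would reproduce the observation without reading hexdump output by eye.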
[23:25] * eschnou (~eschnou@173.213-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:26] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[23:29] <ron-slc> So my goal is to have a few particular sub-directories be set to layout.data_pool: 6, and all subsequent files and dirs also be stored in this pool # 6
[23:32] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[23:33] <gregaf1> right, and that's what should happen
[23:33] <gregaf1> oh, it's possible that you need to grant your users write permissions to your extra pools
[23:33] <gregaf1> that could be your hangup
[23:34] <gregaf1> I believe this is described in the docs
[23:38] * chutzpah (~chutz@ Quit (Remote host closed the connection)
[23:51] * sleinen (~Adium@2001:620:0:25:e837:233:a5dc:729b) Quit (Quit: Leaving.)
[23:51] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[23:51] * chutzpah (~chutz@ has joined #ceph
[23:57] * portante is now known as portante|afk
[23:57] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:59] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:59] * aliguori (~anthony@ Quit (Quit: Ex-Chat)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.