#ceph IRC Log

IRC Log for 2012-04-11

Timestamps are in GMT/BST.

[0:03] <yehudasa> elder: nhm: optional meeting
[0:03] <sagewk> elder: you can listen :)
[0:03] <sagewk> maybe
[0:03] <joao> is it a meeting with everybody?
[0:04] <sagewk> if you want to join!
[0:04] <joao> sure
[0:05] <nhm> sure, vidyo?
[0:24] * LarsFronius (~LarsFroni@g231137245.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[0:32] <sagewk> joshd: want to look at wip-guard?
[0:42] * gregorg (~Greg@78.155.152.6) Quit (Ping timeout: 480 seconds)
[0:46] * gregorg (~Greg@78.155.152.6) has joined #ceph
[0:48] <sagewk> W: Possible missing firmware /lib/firmware/bnx2/bnx2-mips-06-6.2.3.fw for module bnx2
[0:48] <sagewk> is that bad?
[0:48] <sagewk> from update-initramfs?
[0:52] <dmick> erm
[0:52] <nhm> sagewk: naw, it always does that.
[0:52] <dmick> I'm a little surprised that firmware is in the initrd
[0:52] <nhm> sagewk: it hasn't been a problem yet for me.
[0:53] <nhm> sagewk: especially since it (hopefully) should be loading from /lib/firmware/updates
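If that warning ever did need fixing, the usual route on Ubuntu would be something along these lines (the package name is an assumption here; the bnx2 firmware normally ships in linux-firmware):

    sudo apt-get install linux-firmware
    sudo update-initramfs -u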
[1:03] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[1:03] <sagewk> hrm, ubuntu user needs to be in fuse group..
[1:03] <sagewk> not sure why that wasn't a problem before..
[1:05] <sjust> sagewk: those patches look right
[1:05] <gregaf> sagewk: can you give me a quickie review? http://pastebin.com/Rs7KJVYX
[1:08] <sagewk> gregaf: looks good. i would vote for 'osd heartbeat addr' tho (no server)
[1:08] <gregaf> okay
[1:08] <sagewk> but i'm the one who finds the hb stuff intuitive, so you guys decide :)
[1:09] <gregaf> I just had the server in there because I was going to let users specify a client addr too
[1:10] <gregaf> as long as you don't think we'll want that specificity anytime in the foreseeable future I'm fine just having "osd heartbeat addr"
[1:10] <sjust> in order to separate sending from receiving HBs?
[1:10] <joshd> gregaf: do clients bind? that'd be weird
[1:10] <gregaf> joshd: no, they don't
[1:10] <gregaf> I was thinking that some security-conscious users might want them to
[1:10] <sagewk> yeah, shouldn't be needed (for the same reason that a web browser doesn't initiate connections from port 80 etc.)
[1:10] <gregaf> but that's just because I'm confused ;)
[1:11] <sagewk> it might be worth digging a bit into what the user is doing.. i suspect that if they have a firewall between osds they might be doing something wrong
[1:12] * lofejndif (~lsqavnbok@83TAAEVOT.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:13] <gregaf> it's Piston; I assume they're trying to segregate OSDs from co-located public VMs or something?
[1:13] <gregaf> and presumably they've got a firewall on the top of every rack but OSDs can cross racks
[1:15] <sagewk> ah ok
[1:33] * Qten1 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[1:40] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Remote host closed the connection)
[1:42] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[1:48] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:09] * jlkinsel (~jlk@shell1.dal5.proind.us) has joined #ceph
[2:09] <- *jlkinsel* help
[2:14] * cattelan is now known as cattelan_away
[2:33] * joao (~JL@89.181.153.140) Quit (Quit: Leaving)
[2:37] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:51] * cattelan_away is now known as cattelan
[3:38] * rturk (~rturk@aon.hq.newdream.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[3:46] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[3:59] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:36] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[4:45] <Qten1> anyone have any links to a howto for RBD on swift?
[4:46] * jlkinsel (~jlk@shell1.dal5.proind.us) has left #ceph
[5:05] * chutzpah (~chutz@216.174.109.254) Quit (Quit: Leaving)
[5:25] <Qten1> might be asking for something that doesn't quite exist yet :), was reading into the http://ceph.newdream.net/openstack/, which talks about "Ceph's RADOS Block Device (RBD) to fill the block storage void in the cloud software stack. They have also expressed interest in Ceph's distributed object store as a potential alternative to Swift."
[5:49] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) Quit (Quit: ERC Version 5.3 (IRC client for Emacs))
[5:49] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) has joined #ceph
[6:06] * Qten1 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Ping timeout: 480 seconds)
[6:08] <sboyette> \quit
[6:08] <sboyette> oops :)
[6:08] * sboyette (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Quit: leaving)
[6:25] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has left #ceph
[6:29] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[6:29] * mdxi is now known as sboyette
[6:30] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[6:36] * joshd (~joshd@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[6:38] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Ping timeout: 480 seconds)
[6:43] * f4m8_ is now known as f4m8
[6:43] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:45] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[6:52] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:20] * cattelan is now known as cattelan_away
[7:26] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:22] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[8:23] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:28] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[8:29] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Read error: No route to host)
[8:29] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:33] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Read error: Connection reset by peer)
[8:33] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:34] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[8:38] * adjohn is now known as Guest1529
[8:38] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:38] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit ()
[8:38] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:38] * adjohn is now known as Guest1530
[8:38] * Guest1530 (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Read error: Connection reset by peer)
[8:38] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:38] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Read error: Connection reset by peer)
[8:38] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:39] * adjohn is now known as Guest1531
[8:39] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:39] * Guest1531 (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Read error: Connection reset by peer)
[8:39] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Read error: Connection reset by peer)
[8:39] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[8:41] * Guest1529 (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Ping timeout: 480 seconds)
[8:46] * ivan` (~ivan`@li125-242.members.linode.com) Quit (Quit: ERC Version 5.3 (IRC client for Emacs))
[8:47] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Ping timeout: 480 seconds)
[8:50] * ivan` (~ivan`@li125-242.members.linode.com) has joined #ceph
[9:16] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[9:27] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:32] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[9:51] * loicd (~loic@83.167.43.235) has joined #ceph
[10:21] <Qten> Hey all, Does anyone happen to know which version is going to be considered stable? any ideas at this stage???
[10:22] <Qten> I ask because we're close to deploying our cloud and i'd really like to use ceph
[10:25] * rz (~root@ns1.waib.com) Quit (Read error: Connection reset by peer)
[10:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:31] * rz (~root@ns1.waib.com) has joined #ceph
[10:33] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has joined #ceph
[11:05] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[11:05] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[11:25] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:39] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[12:53] * lofejndif (~lsqavnbok@1RDAAAQ7L.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:16] * mgalkiewicz (~mgalkiewi@85.89.186.247) has joined #ceph
[13:30] * lofejndif (~lsqavnbok@1RDAAAQ7L.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[13:45] <nhm> Qten: kind of depends on what you want to use. Rados and RGW are top priorities, the filesystem layer will come later.
[13:48] <exel> nhm: in your vision, where does rbd stand in terms of priorities?
[13:49] <nhm> exel: Alex has been doing some testing of XFS on top of rbd lately, so we are doing some work on it. rgw is definitely first priority though because that's what our initial customers are deploying.
[13:50] <nhm> exel: The nice thing though is that everything needs rados, so the rados work we do will make rgw, rbd, and ceph better.
[13:51] * oliver1 (~oliver@p4FD0610E.dip.t-dialin.net) has joined #ceph
[13:52] <exel> nhm: rados works fine for me, but krbd is a nightmare.
[13:53] <exel> and qemu-rbd is not an option for us
[13:53] <exel> if needed we wouldn't mind sponsoring development if there's someone working on that.
[13:58] <nhm> exel: Sage should be around in 3-4 hours and can likely point you to the right folks to speak with.
[13:58] <exel> nhm: cool, I'll stick around.
[13:59] <nhm> exel: cool. Also, please submit bug reports for the krbd problems if you haven't...
[14:00] <exel> yeah I'm still gathering data
[14:00] <exel> just to make sure I'm not smokin crack :)
[14:00] <nhm> cool, thanks! :)
[14:11] <elder> Crack is whack.
[14:13] <elder> exel, rbd is a priority, bulletproofing it is going to be my main focus in the coming weeks.
[14:14] <elder> I would very much like to know what nightmares you have with kernel RBD so I can make sure they go away.
[14:15] <exel> elder: basically, if I kill an osd node, all i/o stops (no failover). If the osd restarts, the kernel crashes.
[14:15] <elder> Sponsorship would be very helpful.
[14:15] <exel> tried a number of kernels, including the one in your git repo for krbd
[14:15] <elder> I think I hit that one or something like it last night, and am going to be looking at it. I think it's in messaging. Do you have a stack trace or log messages you could send me?
[14:15] <exel> yeah, hang on
[14:16] <elder> pastebin or something similar preferably
[14:16] <exel> http://pastebin.com/iK5a70SR
[14:16] <exel> had a couple more, but always the same trace as far as I could tell.
[14:17] <elder> OK, you're getting something in try_read(), I'm getting it in try_write(), but they may be related.
[14:17] * joao (~JL@89.181.153.140) has joined #ceph
[14:17] <joao> hi #ceph
[14:17] <exel> it seems like a klibceph issue, by the way, not just rbd
[14:17] <elder> If you could, please file a bug on this with that log info here: http://tracker.newdream.net/projects/ceph-kclient/issues/new
[14:17] <exel> I managed a similar crash using a ceph mount
[14:17] <exel> yeah, on it.
[14:18] <nhm> good morning (afternoon?) joao
[14:18] <elder> libceph includes the messaging code.
[14:18] <elder> So I think I agree...
[14:18] <joao> nhm, afternoon, but feels just like early morning :)
[14:19] <exel> elder: unrelated, is there any way to lower the timeout on failing osds? The /sys interface for rbd offers a couple of timeout parameters, but they don't seem to do anything.
[14:19] <exel> it seems like currently it relies on tcp timeouts only
[14:19] <elder> I don't know the answer to that exel, but will see if I can find out this morning.
[14:20] <exel> elder: I may be daft, but I don't see an option for creating a new account on your redmine?
[14:20] <exel> oh
[14:20] <exel> found it, daft indeed :)
[14:21] <elder> Just impatient.
[14:21] <exel> nice euphemism :)
[14:22] <elder> I guess I'll start using that all the time now.
[14:22] <nhm> elder: that your impatient, not daft? ;)
[14:23] <nhm> rather, maybe I'm daft for mixing up your/you're. ;)
[14:23] <elder> yore.
[14:23] <nhm> words too hard morning....
[14:27] <exel> elder: #2261
[14:29] <elder> Thank you.
[14:47] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) has joined #ceph
[15:05] <elder> nhm, joao, any info about this sort of thing: INFO:teuthology.task.ceph.osd.1.out:starting osd.1 at :/0 osd_data /tmp/cephtest/data/osd.1.data /tmp/cephtest/data/osd.1.journal
[15:05] <elder> INFO:teuthology.task.ceph.osd.1.err:./msg/msg_types.h: In function 'void entity_addr_t::set_port(int)' thread 7f4f25f3e7a0 time 2012-04-10 21:21:40.691627
[15:05] <elder> INFO:teuthology.task.ceph.osd.1.err:./msg/msg_types.h: 252: FAILED assert(0)
[15:06] <elder> It started happening last night for me, when the test setup had worked for me the previous run.
[15:06] <joao> no idea
[15:06] <joao> never seen it
[15:06] <joao> maybe some lingering process holding the port?
[15:07] <elder> Well I tried rebooting the systems. Will try again, and this time will make sure it actually happened.
[15:08] * BManojlovic (~steki@212.200.243.246) has joined #ceph
[15:21] <elder> Has anyone else tried to run a simple teuthology test this morning?
[15:25] <elder> I'm going to try again with an older version of the ceph code.
[15:31] * mgalkiewicz (~mgalkiewi@85.89.186.247) Quit (Ping timeout: 480 seconds)
[15:36] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[15:38] <joao> elder, not really
[15:38] <joao> haven't used teuthology since last week
[15:40] * mgalkiewicz (~mgalkiewi@85.89.186.247) has joined #ceph
[15:43] * lofejndif (~lsqavnbok@83TAAEWKB.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:44] * f4m8 is now known as f4m8_
[15:49] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[15:50] <elder> OK, there is a problem with something updated in the ceph tree last night. I got past my problems by using "sha1: 965f83d4bdedd0bcb0497100e9cca8f476920d45" for my ceph task. That's a commit from two nights ago.
[15:50] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[15:50] <elder> sage, gregaf, sjust (and yehudasa), see above.
[15:50] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[15:52] <joao> elder, have you tried any of the other commits that follow that one?
[15:52] <elder> No.
[15:54] <joao> elder, I think the problem is probably with this one: 6fbac10dc68e67d1c700421f311cf5e26991d39c
[15:54] <elder> Could be.
[15:54] <elder> I'm not going to chase it though.
[16:03] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:17] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[16:17] <nhm> elder: sorry, was having breakfast with the kids.
[16:17] <nhm> elder: haven't tried anything in teuth since earlier in the week.
[16:27] * filoo_absynth (~absynth@mail.absynth.de) has joined #ceph
[16:27] <filoo_absynth> goodday
[16:28] <nhm> filoo_absynth: hello!
[16:28] <elder> Is it normal to see things like this at 5 minute intervals in syslog:
[16:28] <elder> Apr 9 15:49:01 plana28 kernel: [ 8592.080853] libceph: osd1 10.214.133.32:6800
[16:28] <elder> socket closed
[16:28] <nhm> elder: I don't recall seeing that, but I haven't been looking at syslog much.
[16:29] <filoo_absynth> nhm: did we meet at WHD? I'm bad with nick->name mapping :)
[16:29] <joao> can't say I've seen it either
[16:29] <joao> nhm wasn't at WHD
[16:29] <nhm> filoo_absynth: nope, I didn't make it out there.
[16:30] <nhm> filoo_absynth: I think both Joao and Sage were there though...
[16:31] <filoo_absynth> yes, i remember talking about pasteleria belem to joao ;)
[16:31] <joao> I was there only for a couple of days though
[16:31] <joao> oh, that was you... :)
[16:34] <filoo_absynth> aye
[16:38] * oliver1 (~oliver@p4FD0610E.dip.t-dialin.net) has left #ceph
[16:38] * oliver1 (~oliver@p4FD0610E.dip.t-dialin.net) has joined #ceph
[16:43] * lofejndif (~lsqavnbok@83TAAEWKB.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[16:44] * rosco (~r.nap@188.205.52.204) Quit (Quit: leaving)
[16:44] * rosco (~r.nap@188.205.52.204) has joined #ceph
[16:45] * krisk (~kap@rndsec.net) has left #ceph
[16:48] <nhm> elder: just to confirm, I did run into the same problem with master.
[16:57] * cattelan_away is now known as cattelan
[17:31] * oliver1 (~oliver@p4FD0610E.dip.t-dialin.net) has left #ceph
[17:54] * loicd1 (~loic@83.167.43.235) has joined #ceph
[17:54] * loicd (~loic@83.167.43.235) Quit (Read error: No route to host)
[18:04] <sagewk> cd4a760e9b22047fa5a45d0211ec4130809d725e should fix it
[18:05] <wonko_be> can someone help me with the "ceph osd crush add" syntax?
[18:06] <wonko_be> 2012-04-11 18:05:14.593038 mon <- [osd,crush,add,3,osd.3,1,host=ceph-002,rack=1]
[18:06] <wonko_be> 2012-04-11 18:05:14.593715 mon.0 -> '(22) Invalid argument' (-22)
[18:06] <wonko_be> what should exactly go in host= and rack=
[18:09] <sagewk> it may be that they need to be alphanumeric. also, you may need to add pool=default in there.. if that rack or host don't exist yet it doesn't know where to attach it to the tree
[18:10] <elder> sage, I don't believe it did.
[18:10] <sagewk> e.g., host=whatever rack=rack1 pool=default
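Putting sagewk's suggestions together, the command under discussion would look roughly like this; the osd id, weight, and bucket names are illustrative, the key points being alphanumeric bucket names and an explicit pool=default:

    ceph osd crush add 3 osd.3 1.0 pool=default rack=rack1 host=ceph-002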
[18:10] <sagewk> k
[18:10] <elder> I ran a test this morning that failed, meaning your checkin from last night would have been in place.
[18:10] <wonko_be> sagewk: what is the default "weight"? 1?
[18:11] <elder> nhm verified it as well
[18:15] <nhm> yeah, master was broken this morning for me as well. I just switched to stable as I don't need anything so recent.
[18:15] <sagewk> yeah
[18:15] <sagewk> in real clusters i've been using size of disk in TB
[18:17] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) has joined #ceph
[18:28] <sagewk> joao: there?
[18:28] <joao> I am
[18:28] <elder> nhm, any recollection about how you got vidyo! to start on Ubuntu after installing the package?
[18:29] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) Quit (Ping timeout: 480 seconds)
[18:29] <nhm> elder: I ran /opt/vidyo/VidyoDesktop/VidyoDesktop I think
[18:30] <nhm> elder: I've more or less given up on the linux version. After a couple of days something breaks and I have to reinstall.
[18:30] <elder> libblkid What the hell do they need that for?
[18:30] <elder> Now I remember...
[18:32] * lofejndif (~lsqavnbok@09GAAETPN.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:35] <elder> OK, well I got it working on my phone and I'm content with that for now.
[18:36] <nhm> maybe I'll try that too. I'd like to move my laptop over to linux.
[18:37] <sagewk> i had to install libblkid1:i386 and kludge the LD_LIBRARY_PATH in the /usr/bin/VidyoDesktop script to make it go on my laptop (precise)
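A rough sketch of that workaround, assuming a 64-bit Precise install (the exact 32-bit library path may differ):

    sudo apt-get install libblkid1:i386
    # then, in /usr/bin/VidyoDesktop, prepend the i386 library directory, e.g.:
    export LD_LIBRARY_PATH=/usr/lib/i386-linux-gnu:$LD_LIBRARY_PATH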
[18:38] <elder> That's what I figured I had to do too, but I didn't have the patience to research the specifics you just provided.
[18:38] <sagewk> nhm: did carl's raid config script work out?
[18:39] <nhm> sagewk: sort of. The biggest problem is that we can't clear the raid config at the beginning like he does without losing access to the system drive.
[18:39] * perplexed (~ncampbell@216.113.168.141) has joined #ceph
[18:39] <sagewk> yeah
[18:39] <nhm> sagewk: so if we've changed things from default that he hasn't changed, we could have a slightly different setup.
[18:40] <nhm> I haven't dealt with it yet.
[18:41] <nhm> I'll probably be doing that this afternoon.
[18:41] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[18:43] <sagewk> nhm: ah, the bug is fixed, but gitbuild is behind
[18:43] <nhm> sagewk: btw, meant to ask you, is metropolis backed up? ;)
[18:44] <perplexed> Hi All. Question for you on some odd rados bench results I'm seeing. If I do a long-running bench for writes, the performance starts off well (~75MBps), but over time I see periods of "stalls" where the current write MB/s reports as zero before starting up again at a lower MB/s rate. Anything I should be checking for to understand this?
[18:44] <nhm> perplexed: btrfs?
[18:44] <perplexed> ext4
[18:44] <joao> nhm, those questions are never a good omen...
[18:44] <joao> what are you going to do on metropolis?
[18:45] <sagewk> nhm: it is not :)
[18:45] <sagewk> there is a second disk in tehre, tho.. feel free to rsync whatever you want to that
[18:45] <nhm> joao: that's where I've been writing out many gigabytes of debug data.
[18:46] <sagewk> nhm: in fact there is an untouched sdc you can mount on /backup
[18:46] <nhm> sagewk: Ok. I was just thinking that it might be good to back up some of this debug data.
[18:46] <elder> sagewk, I tried what you suggested on VidyoDesktop above and... ClientManagerCallBack: Join Progress for conference: Dummy data
[18:46] <elder> Segmentation fault (core dumped)
[18:46] <elder> Done.
[18:46] <elder> Will use my phone.
[18:46] <sagewk> i saw that once, and then it worked the second time.
[18:46] <sagewk> yeah, that's safer :)
[18:46] <nhm> sagewk: it can all be recreated easily enough and won't matter as much as time goes on, but it would be annoying to lose.
[18:46] <elder> Oh, just run it a couple times and it might work?
[18:47] <elder> What a great plan!
[18:47] <nhm> elder: vidyo segfaults constantly for me in linux.
[18:47] <sagewk> that's my method :)
[18:47] <elder> Did you try it a couple times?
[18:47] <sagewk> it worked the second time..
[18:47] <nhm> elder: ... yes, which is how it segfaults for me constantly. ;)
[18:47] <elder> Maybe you need to be more diligent.
[18:48] <elder> You just need to try one more time than the number of times it fails.
[18:51] <nhm> elder: ah, the skinner box model of software development.
[18:53] <sagewk> another one of the ceph kvm machines crashed. gitbuilder vms are back up and catching up...
[18:53] <sagewk> balancing sucks, so it'll be slow
[18:55] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[18:59] <sagewk> if only these vms were backed by some sort of reliable, shared storage system so we could easily migrate them between hosts...
[18:59] <yehudasa> sagewk: nah, that'll never work
[19:00] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) has joined #ceph
[19:00] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[19:00] <sagewk> heh, maverick EOL just announced.. no wonder things started crashing! http://www.h-online.com/open/news/item/Ubuntu-10-10-Maverick-Meerkat-reaches-end-of-life-1518744.html
[19:01] <nhm> sagewk: that coincidently is the most recent version of ubuntu supported by vidyo. ;)
[19:01] <dmick> sagewk: sounds like a tools and infrastructure nightmare to me :)
[19:06] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:18] <elder> Do we have a call today? Or am I still just a victim of Vidyo?
[19:18] <nhm> elder: we are all on
[19:19] <elder> I called the wrong room.
[19:19] <elder> Nobody there.
[19:30] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[19:34] * adjohn (~adjohn@50-0-164-119.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[19:35] <perplexed> I know it's possible to increase the replication level on a pool, but are there any known issues reducing the replication level. I'm assuming ceph will take either in its stride. The reason I ask is that ceph -w shows activity related to increases, but decrease activity seems silent in the output.
[19:36] <dmick> elder: so it seems that the echo is a function of having your own voice feed back from the speaker into your own mic. I saw you had headphones but maybe the audio was still coming out the speaker? (echo cancellation is designed to solve this, but apparently it sucks in vidyo)
[19:36] <elder> There is no option to turn on or off echo cancellation on the Android version of Vidyo.
[19:36] <gregaf> perplexed: reducing replication should work fine
[19:36] <elder> It's possible the speaker was still going, I don't know for sure.
[19:36] <elder> I had my headphones on.
[19:37] <dmick> elder: yeah, a thing to check for next time
[19:37] <elder> Maybe next year at this time we'll have a call without any Vidyo distractions.
[19:37] <gregaf> it probably doesn't produce output to the central log (that's what you're seeing via ceph -w) because it's not very interesting compared to what happens when you increase replication
[19:37] <dmick> asymptotically approaching goodness
[19:37] <elder> One would hope so, but I'm not yet confident of that.
[19:39] <joao> dmick, I think most of the echo comes from your side
[19:39] <joao> usually, when we're alone in the room, there is no echo to speak of
[19:41] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[19:41] <dmick> we tried turning down volume on our end; didn't help. It does vary, true.
[19:42] <joao> would having echo cancellation set on your side help? this is assuming it's not already set of course :)
[19:43] <dmick> echo cancellation just seems like a lose (although Polycom has had it down for years). No problems on our end with anyone but Alex today
[19:45] <gregaf> and we tried muting our mic and it didn't help on our end (did it help on yours?)
[19:45] <elder> nhm, can you please connect with Danger Room now?
[19:45] <joao> dmick, just saying because I do hear echo, although it doesn't affect the call
[19:45] <nhm> elder: sure
[19:47] <perplexed> Apologies if an obvious Q... On my 4 server cluster, 10 OSD's/disks per server... I see that if I crank replication on a pool up to something > the number of servers, ceph -w reports degraded with a fixed percentage... the degraded state does not improve. Just to confirm... is there a limit that should not be crossed... replication setting must be <= # servers, etc?
[19:50] <wido> perplexed: So, 40 OSD's in total?
[19:50] <elder> nhm I'm OK with how well it is working right now.
[19:51] <perplexed> Y, 40 OSD's. Pool was defined with 2000 PG's also (50x # OSD's)
[19:51] <wido> perplexed: Yes, I think it's due to the crushmap
[19:51] <elder> I still get an echo because it seems the input feeds back to the output even if I'm on a headset (i.e., seems to be going inside the phone/software, without the speaker or microphone involved)
[19:51] <gregaf> perplexed: with the default CRUSH map being generated from a config like that, Ceph won't allow more than one replica per host
[19:51] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[19:51] <wido> You haven't got enough hosts in the crushmap for that
[19:51] <wido> like gregaf says
[19:51] <gregaf> so if you're asking for 5 it can't satisfy those constraints and will remain degraded
[19:51] <nhm> elder: strange. Well, let me know if you need me to test again.
[19:52] <perplexed> Thx. Is there any way to get CRUSH to treat different OSD's (spindles) as valid locations for a replica.... one server being able to host more than one copy of an object, as long as on a different OSD/spindle?
[19:52] <dmick> sniff. you guys left before I got there
[19:53] <elder> OK.
[19:53] <elder> dmick, we saw you coming.
[19:54] <yehudasa> gregaf: pushed to wip-oc-perf
[19:54] <gregaf> yehudasa: thanks!
[19:55] <gregaf> perplexed: you can do one of two things; either pretend that each host is actually two different hosts (in the config), or else maintain your own crush map with custom rules
[19:55] <perplexed> thx
[19:55] <sagewk> gregaf: actually, just need to modify the crushmap to not separate replicas across hosts
[19:55] <gregaf> but I wouldn't recommend either one of those, since while multiple replicas in a single host will protect you from disk failures they won't protect you from any of the other failures you might experience
[19:55] <sagewk> the default one probably does, depending on how many osds you told mkcephfs about
[19:56] <sagewk> (er, hosts)
[19:56] <gregaf> sagewk: that would be "your own crush map with custom rules" :)
[19:57] <sagewk> heh yeah :)
[19:57] <sagewk> fwiw, when mkcephfs/osdmaptool creates the default osdmap, it separates across hosts only if there are > 2 (>=? i can't remember) hosts in the initial ceph.conf file.
[20:01] <perplexed> Thx all. That clears it up. Makes sense... if we have sufficient servers the need for replicas on different spindles/OSD's on one host isn't there.
[20:04] <perplexed> I assume the same limitation would be present if the crush map used a rack placement approach. If my 4 servers were split across 2 racks (2 per rack), by default I shouldn't use a replication value >2 for any pool in that config.
[20:05] <nhm> ooh, sandisk extreme ssd for $120 at newegg.
[20:08] <elder> Is that good? $120GB? They seem to be hovering around $1/GB.
[20:08] <gregaf> perplexed: you'll have to ask sagewk how the defaults work in that situation; I don't recall
[20:09] <gregaf> I think the defaults are that any designation will be ignored for replication if there aren't at least 3, but if you do have at least 3 you are limited to the total number of them
[20:09] <gregaf> that's not something hardcoded in CRUSH itself, though; it's just how the auto-generated maps work
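For reference, the "custom rules" route mentioned above amounts to a rule that picks leaves by osd rather than by host. A sketch in decompiled crushmap syntax, with the rule name and ruleset number purely illustrative:

    rule replicate-by-osd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step choose firstn 0 type osd
        step emit
    }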
[20:10] <elder> OCZ Vertex 3 120GB for $110 shipped after $30 off and $20 rebate too if that's any good.
[20:10] <elder> Lunch.
[20:12] <nhm> elder: not sure about the vertex 3, but the sandisk extreme has the new sandforce controller and toggle-mode nand chips (though fewer of them than some of the really high end drives). For the price it's quite fast.
[20:13] <nhm> that, and I *hate* mail in rebates.
[20:19] <joao> would it be a problem if the same program mounted multiple filestores?
[20:19] <joao> and I mean, having multiple FileStore instances (say, 'a' and 'b') and calling 'a.mount()' and 'b.mount()'.
[20:20] <joao> (trying to exclude this as the cause for a couple of stack traces I'm getting)
[20:23] <gregaf> joao: sjust is out today and sagewk is in a meeting...
[20:24] <joao> okay, np
[20:24] <gregaf> if grepping for "static" doesn't turn up anything (it shouldn't) it shouldn't be a problem (and I don't think it should be a problem), but I don't know the implementation or interface well enough to be sure
[20:24] <joao> I have some other work to do and can come back to this at a later time :)
[20:25] * lofejndif (~lsqavnbok@09GAAETPN.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[20:26] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[20:26] <joao> gregaf, I would think no problem should arise from this, but I'm running into some issues with the PerfCountersCollection class, so I'm probably wrong :)
[20:26] <gregaf> aaaaaahhhhh... that actually might be a conflict with static data members in PerfCounters
[20:27] <joao> well, it shouldn't be a problem anyway
[20:27] * ceph-test (~Runner@mail.lexinter-sa.COM) has joined #ceph
[20:27] * adjohn (~adjohn@15.sub-166-250-46.myvzw.com) has joined #ceph
[20:27] <joao> I'll just umount() one filestore and mount the other
[20:28] <joao> :)
[20:28] <joao> that should take care of that, I suppose
[20:28] <ceph-test> hi2all
[20:28] * adjohn (~adjohn@15.sub-166-250-46.myvzw.com) Quit ()
[20:28] <joao> hello
[20:28] <ceph-test> need help
[20:29] <ceph-test> may be found bug
[20:29] <ceph-test> after trying some adding -removing osd
[20:30] <ceph-test> all osd and mds crashed
[20:30] <ceph-test> ./osd/OSDMap.h: 475: FAILED assert(get_max_osd() >= crush.get_max_devices())
[20:31] <joshd> ceph-test: what version are you running?
[20:31] <ceph-test> After cleaning up the crush and osd map
[20:31] <ceph-test> mds is up and running
[20:32] <ceph-test> But osd still crashed
[20:33] <ceph-test> with the same messages
[20:33] <perplexed> Just to confirm, if I'm using rados import to write files to a pool, and I used the --debug_objecter 10 option, what is the significance of the op_submit events output? I've been assuming they are an indication of which OSD received the client (rados client) write.
[20:34] <joao> dinner, brb
[20:34] <joshd> perplexed: yeah, it shows which osd is getting the write
[20:35] <joshd> ceph-test: that sounds like a bug in osd removal - could you create an issue at http://tracker.newdream.net and paste the full backtrace?
[20:36] <perplexed> What I see when I analyze the events is that only two of the 10 OSD's on any of my 4 hosts get the client writes... the other 8 OSD's per server are untouched. I'd assumed I'd see rados import spraying writes across all OSD's equally. My pool was defined with 2000 PG's, 4 servers, 10 OSD's per server... 40 total.
[20:36] <joshd> perplexed: you can do --debug-ms 1 and grep for 'osd_op(.*write' as well
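Applied to the client side, joshd's suggestion would look something like this (the log file name is illustrative, and --debug-ms 1 is simply appended to whatever rados import invocation is already being used; depending on logging config the debug output may land in a log file rather than stderr):

    rados import ... --debug-ms 1 2> client.log
    grep 'osd_op(.*write' client.log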
[20:36] <perplexed> I'll try that too. Thx
[20:37] <perplexed> In the test I import ~220MB of data (in 16k files.. so ~13KB per file).
[20:37] <perplexed> Filenames are unique
[20:37] <gregaf> perplexed: you've seen issues like this a few times with different configurations, right?
[20:37] <perplexed> It's been a constant for me since testing last week.
[20:38] <perplexed> Just finally have time to dig a little deeper :)
[20:38] <gregaf> I'm not an expert on our CRUSH stuff but if you could post your crush map and pg map somewhere that'd be helpful for whoever ends up diagnosing it
[20:38] <perplexed> Thx. Will do.
[20:44] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[21:01] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[21:03] <todin> hi, is the wip-librbd-caching branch in a state where this feature is testable?
[21:07] <lxo> ceph hardlink farm, lesson #2) removing a directory tree after cp -lR'ing it elsewhere seems to cause the space taken up by the tree to go unaccounted for in ls output
[21:07] * harry (~harry@host-92-23-249-249.as13285.net) has joined #ceph
[21:17] <joao> that moment when you find the source of an error after drinking a glass of wine and assume there is a relationship between the two events
[21:17] <joshd> todin: yeah - the option is rbd_cache_enabled - I'm doing testing now
[21:18] <joshd> todin: the size is set by client_oc_size - might change that later
[21:18] <gregaf> lxo: details?
[21:21] <mgalkiewicz> hi guys I have problems with repairing my cluster https://gist.github.com/2361627
[21:22] <mgalkiewicz> I am using ceph 0.44.1-1~bpo70+1 on both osds
[21:23] <lxo> gregaf: ls -l foo; mkdir bar; cp -lR foo/x/. bar/y; rm -rf foo/x; ls -l foo bar => bar's size is 0
[21:24] <gregaf> ah
[21:24] <todin> joshd: ok, where do I set the enable option? in the ceph.conf, or in the rdb option in the libvirt xml?
[21:24] <gregaf> and presumably foo no longer includes x in its calculations?
[21:25] <lxo> that's what it looks like to me
[21:25] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[21:25] <gregaf> lxo: okay
[21:26] <gregaf> can you make a bug for it?
[21:26] <gregaf> I don't think we're going to have time for this in the near future :(
[21:26] <lxo> what I haven't checked is whether foo and bar's parent does. I'm not sure it should. but that's yet another fishy bit in hard-link handling that I hadn't noticed before. it explains some zero- and perhaps negative-sized dirs I ran into before
[21:26] <gregaf> but I can say that it's a bug that makes sense to me and I think I even remember being confused about it once upon a time when going over that code for other reasons
[21:26] <lxo> gregaf, will do. not right now. I'll gather some more info before I file a bunch of bugs, if you don't mind, since I understand there's no rush
[21:27] <gregaf> works for me, just as long as there's a bug sometime so we don't lose it
[21:27] <lxo> *nod*
[21:32] <joshd> todin: depends on how you run it, but wherever you put rbd_writeback_window before
[21:35] <todin> joshd: ok, does it take arguments? is the working principle of the cache documented somewhere?
[21:36] <joshd> todin: no, rbd_cache_enabled is just a boolean, and it's not documented yet, but it's a writeback cache that grows up to client_oc_size bytes
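A minimal ceph.conf sketch of the two options joshd names (the section placement and the size value are assumptions; only the option names come from the discussion):

    [client]
        rbd_cache_enabled = true
        client_oc_size = 33554432    ; writeback cache grows up to this many bytes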
[21:37] <joshd> mgalkiewicz: there's no easy way to repair internal inconsistencies like that atm - but you can shutdown the osds holding that object and manually change the object info to match (it's an xattr on the file) - I'm guessing 4mb is the correct size
[21:37] <joshd> mgalkiewicz: are you using xfs?
[21:38] <mgalkiewicz> xfs is on rbd device
[21:38] <mgalkiewicz> ceph is on btrfs
[21:39] <joshd> mgalkiewicz: which ceph version? if it's recent we'd really want figure out how it became inconsistent
[21:39] <mgalkiewicz> 0.44.1-1~bpo70+1
[21:40] <mgalkiewicz> on osd.0 the file has 8192 blocks and on osd.1 only 2056
[21:41] <mgalkiewicz> do you suggest to increase this to 8192?
[21:42] <joshd> I'd check which is newer, but probably copy the one from osd.0 to osd.1
[21:43] <mgalkiewicz> th bigger is newer
[21:43] <joshd> if it's been written with a larger size, nothing would have made it smaller (no discard yet)
[21:43] * lofejndif (~lsqavnbok@09GAAETW5.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:44] <mgalkiewicz> is it safe to copy the file with both osds running? is it safe for my clients?
[21:45] <joshd> you'll want to stop the osds first - they might have cached info that will overwrite the xattrs on files you manually copy (namely the object metadata, including size)
[21:46] <mgalkiewicz> hmm it is not enough to just stop osd.1?
[21:47] <joshd> yeah, that should be enough
[21:47] <mgalkiewicz> ok and what do you need to determine what caused the problem? some logs?
[21:48] <joshd> unfortunately we'd need logs of how it happened - logs from this point won't tell us that
[21:48] <mgalkiewicz> I have recently upgraded from 0.44-1~bpo60+1 if it makes any difference
[21:48] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[21:49] <mgalkiewicz> well I am not sure when the problem appeared
[21:49] <mgalkiewicz> but I still have logs from last week
[21:51] <joshd> if they've got debug osd >= 10 they might be useful - could you compress them and attach them to a bug? logs from both osd 0 and 1 would help (and possibly others, but the logs will tell us if that's needed)
[21:52] <sagewk> joshd: wanna pull the discard stuff into your branch so it gets "tested" with the rest?
[21:52] <mgalkiewicz> well I have debug osd = 1 in configuration file so it is probably not enough
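For comparison, the level joshd is asking for would be captured by something like this in ceph.conf (placement under [osd] assumed):

    [osd]
        debug osd = 10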
[21:53] <joshd> sagewk: sure
[21:53] <todin> sagewk: joshd: the discard stuff is the trim functionality?
[21:54] <mgalkiewicz> lol it looks like just restarting osd.1 fixed the problem
[21:55] <sagewk> todin: yeah
[21:55] <mgalkiewicz> joshd: thx for help
[21:55] <sagewk> not wired up to anything yet, tho
[21:55] <joshd> mgalkiewicz: you're welcome
[21:55] <todin> sagewk: though it is not testable right now?
[21:55] <sagewk> there is a simple unit/functional test, but that's it
[21:57] <todin> sagewk: I wanted to test it from the rbd layer; if I delete huge amounts of data in my vms the space should be reclaimed on the osd side?
[21:57] <joshd> mgalkiewicz: wait a minute, did you restart without copying? that might just be http://tracker.newdream.net/issues/1197
[21:58] <sagewk> todin: yeah, once you wire it up to qemu.
[21:58] <sagewk> todin: it isn't efficient (yet) for huge trims (it's O(n)), but it'll work... once the qemu code is updated.
[21:59] <todin> sagewk: do you have any idea when the qemu rbd driver will be updated?
[22:07] <elder> dmick, I am unable to log in to plana46 and 84 using user name "ubuntu". Is that something you can fix? Is that something that should have been set up by chef or something?
[22:07] <dmick> doubtless it can be fixed. I think some of the plana's have never been cheffed
[22:08] <dmick> can you ssh into them?
[22:08] <elder> I notice your proper use of the double-f.
[22:08] <elder> Yes.
[22:08] * dmick blushes why thank you
[22:08] <elder> Just not login to the console.
[22:09] <dmick> I should say...they've been cheffed, but they haven't been reinstalled with the 'accepted Ceph install'
[22:09] <dmick> which would have set the passwords differently
[22:09] <dmick> if you're willing to wait 20-30 min for a reinstall I can make that happen too
[22:09] <dmick> eventually we're going to need to do it on all the machines
[22:09] <elder> Not urgent.
[22:10] <dmick> otherwise you can ssh in and hack /etc/passwd to the 'right' thing
[22:10] <elder> I don't truly need it, just wanted to note the anomaly.
[22:10] <dmick> yeah
[22:11] <nhm> dmick: please just avoid the ones for congress testing
[22:11] <dmick> wouldn't do it without locking
[22:11] <nhm> dmick: excellent, thanks
[22:11] <dmick> I am doing other sneaky things without locking, but very carefully
[22:12] <dmick> in fact it's the case that *most* of the plana are set up that way
[22:12] * lofejndif (~lsqavnbok@09GAAETW5.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[22:21] * BManojlovic (~steki@212.200.243.246) Quit (Quit: Ja odoh a vi sta 'ocete...)
[22:21] * nhm pokes at megacli
[22:22] <todin> nhm: megacli is a nice tool ;-)
[22:25] <nhm> todin: their error codes are so efficient. Every byte saved is a byte earned...
[22:25] * mgalkiewicz (~mgalkiewi@85.89.186.247) Quit (Ping timeout: 480 seconds)
[22:25] <dmick> todin: this is a definition of the word "nice" with which I had not been previously acquainted
[22:27] <todin> nhm: yep, and the documentation is well done, another nice piece of mission critical software
[22:27] * lofejndif (~lsqavnbok@9YYAAE9DL.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:27] <elder> "Anyone who has ever had to work with the LSI RAID controllers knows that the MegaCLI provided by LSI is the most cryptic command line utility in existence."
[22:27] <elder> http://www.5dollarwhitebox.org/drupal/node/82
[22:29] <sagewk> nhm: still need some plana?
[22:29] <gregaf> hey, if you had guaranteed employment because you knew megacli you'd think it was a nice tool too ;)
[22:30] <todin> elder: you are lucky if you can use the megacli util; I spent the last few days in the server room using the webbios and the lsi bios command line
[22:30] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:31] <nhm> sagewk: haven't tried to lock anything yet today. I need 5 more nodes.
[22:33] <elder> nhm how long do you need them?
[22:33] <elder> I could probably let 3 go; I have two sets of 3 locked at the moment.
[22:34] * mgalkiewicz (~mgalkiewi@85.89.186.247) has joined #ceph
[22:34] <nhm> elder: Sage got me some
[22:59] <sagewk> joshd: http://fpaste.org/0EjG/
[23:16] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:19] <mgalkiewicz> joshd: I have copied one file but there were two with inconsistent state and both were fixed after restart
[23:28] * harry (~harry@host-92-23-249-249.as13285.net) Quit (Remote host closed the connection)
[23:34] * aliguori (~anthony@nat-pool-3-rdu.redhat.com) Quit (Quit: Ex-Chat)
[23:41] <sagewk> gregaf: can you look at wip-osd-boot?
[23:41] <sagewk> i think this is what was causing the flapping carl was seeing on congress
[23:44] <sagewk> gregaf: also, can you fix the pool create command to set pgp_num too? :) pool 3 '.rgw.data.1' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 65536 pgp_num 8
[23:44] <gregaf> yeah
[23:45] <gregaf> I don't think that fix works... there's already a similar one in prepare_boot
[23:46] <sagewk> yeah, the order of the check is wrong, tho.. it looks at osdmap.is_up() first.
[23:46] <sagewk> should probably remove the other one.
[23:46] <gregaf> the pending_inc is cleared at the same time the osdmap is updated :)
[23:47] <gregaf> so to close the race we'll need to do epoch comparisons or similar
[23:47] <gregaf> not just blind looks at current and pending osdmap
[23:48] <sagewk> it's not that, it's that there are two duplicate osd_boot messages coming in
[23:49] <gregaf> yes, and if they're coming in without the monitor committing a new map it'll notice... osdmap.is_up(osdnum) will be false and pending_inc.new_up_client will contain the OSD
[23:49] <gregaf> right?
[23:50] <gregaf> but if it does commit a new map then osdmap.is_up(osdnum) will be true, AND pending_inc.new_up_client will not contain the OSD
[23:50] <sagewk> not if it was up before.... up.... down -> up from first map ... ->down->up from second map
[23:50] <sagewk> oh, right.
[23:51] <perplexed> How accurate are rados bench stats? I'm doing a seq bench test from a server on a GigE network connection, and it's reporting a read rate of 146MB/s... which would be >1Gbps by my math.
[23:51] <gregaf> there's already a facility for setting the pgp_num in that create command, it just doesn't set it to automatically match the pg_num, so stick it in front of the pg_num = line
[23:51] <gregaf> ie
[23:52] <gregaf> diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
[23:52] <gregaf> index 3f0db37..c284319 100644
[23:52] <gregaf> --- a/src/mon/OSDMonitor.cc
[23:52] <gregaf> +++ b/src/mon/OSDMonitor.cc
[23:52] <gregaf> @@ -1974,7 +1974,7 @@ bool OSDMonitor::prepare_command(MMonCommand *m)
[23:52] <gregaf> if (m->cmd.size() > 4) { // try to parse out pg_num and pgp_num
[23:52] <gregaf> const char *start = m->cmd[4].c_str();
[23:52] <gregaf> char *end = (char*)start;
[23:52] <gregaf> - pg_num = strtol(start, &end, 10);
[23:52] <gregaf> + pgp_num = pg_num = strtol(start, &end, 10);
[23:52] <gregaf> if (*end != '\0') { // failed to parse
[23:52] <gregaf> err = -EINVAL;
[23:52] <gregaf> ss << "usage: osd pool create <poolname> [pg_num [pgp_num]]";
[23:52] <gregaf> since you're already working there :)
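With that one-line change, the usage quoted in the diff would behave as follows (pool name and counts illustrative):

    ceph osd pool create mypool 2048         # pgp_num now defaults to 2048 as well
    ceph osd pool create mypool 2048 1024    # an explicit pgp_num still wins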
[23:55] <gregaf> is that what perplexed's problem was?

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.