#ceph IRC Log


IRC Log for 2013-05-14

Timestamps are in GMT/BST.

[0:03] <ay> Hm.
[0:04] <ay> Three node cluster. Run service ceph -a start on node01. That starts up flawlessly except osd.3 on node03. That one says "global_init: unable to open config file from search list /tmp/ceph.conf.1882486b3857880ff99dc1475ba92064"
[0:04] <ay> Starting it manually with the /etc/ceph/ceph.conf config works.
[0:04] <ay> Any thoughts why?
[0:06] <davidz> cjh_: I'm told yes.
[0:17] <cjh_> i see
[0:17] <cjh_> i'll have to figure out some other way to stop them from writing then
[0:18] * Volture (~Volture@office.meganet.ru) has joined #ceph
[0:19] <Volture> Hi all
[0:19] * BillK (~BillK@124-169-231-135.dyn.iinet.net.au) has joined #ceph
[0:19] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:21] <Volture> Tell me, how do I fix this problem? http://pastebin.com/bkqfKxSj - Ceph version 0.56.4
[0:25] <davidz> cjh_: I meant that yes it works for both.
[0:26] * rturk is now known as rturk-away
[0:28] <Volture> Please tell me, how do I fix this problem? http://pastebin.com/bkqfKxSj - Ceph version 0.56.4
[0:32] <Volture> Please tell me, how do I fix this problem? http://pastebin.com/bkqfKxSj - Ceph version 0.56.4
[0:33] <dmick> I think Volture wants someone to tell him how to fix this problem
[0:39] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[0:39] <cjh_> davidz: oh ok i'll have to try it out tonight and see if i can get it working
[0:45] <sjust> sagewk: wip_5020 looks ok?
[0:46] * dwt (~dwt@128-107-239-235.cisco.com) Quit (Quit: Leaving)
[0:50] * coyo (~unf@00017955.user.oftc.net) Quit (Quit: F*ck you, I'm a daemon.)
[0:54] * themgt (~themgt@24-177-232-33.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[0:55] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:58] * Jakdaw (~chris@puma-mxisp.mxtelecom.com) has joined #ceph
[0:59] <loicd> Volture: 2013-05-14 02:12:32.766758 7ff2c62ad780  2 journal read_entry 169975808 : bad header magic, end of journal
[0:59] <loicd> did the file system supporting /srv/ceph/osd0/journal have troubles recently ?
[1:01] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[1:03] <Volture> loicd: I replaced the hard disk with a larger one. Data was copied by rsync. The osd crashes after start ((
[1:04] <iggy> I don't think rsync copies xattrs by default (dunno if that's the problem, but something to maybe look at)
[1:05] <Fetch> trying to configure radosgw on a cluster running CentOS. radosgw daemon starts (used init script from new branch), Apache is configured with fcgid per ceph.com docs, but I'm getting "Connection reset by peer: mod_fcgid: error reading data from FastCGI server" and the radosgw command in s3gw.fcgi is fast exiting
[1:06] <loicd> rsync -avHSX --numeric-ids does ( -X is what copies xattrs ), but if Volture rsync'd a live osd, it may have copied a partial journal
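A minimal sketch of copying an OSD data directory to a new disk while preserving xattrs, along the lines loicd describes; the osd id and destination mount point are examples, and the osd should be stopped first so a partial journal isn't copied:

    service ceph stop osd.0                                    # stop the osd before copying
    rsync -avHSX --numeric-ids /srv/ceph/osd0/ /mnt/newdisk/   # -X preserves xattrs
    # remount so the new disk is at /srv/ceph/osd0, then:
    service ceph start osd.0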
[1:06] <Fetch> no errors in radosgw.log, any ideas on places to look?
[1:07] <Fetch> is the fast exit from the radosgw client as called by fcgi a problem?
[1:07] <Volture> loicd: All right, I'll try it now
[1:08] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Connection reset by peer)
[1:11] <loicd> http://ceph.com/docs/master/dev/osd_internals/map_message_handling/ says : "OSD::enqueue_op calls PG::queue_op which checks can_discard_request before queueing the op in the op_queue and the PG in the OpWQ." but it looks like OSD::handle_op does it https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L6244 and OSD::enqueue_op queues it regardless https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L6461
[1:13] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:16] * tnt (~tnt@91.177.214.32) Quit (Ping timeout: 480 seconds)
[1:17] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:17] <Jakdaw> The docs suggest that using libvirt+QEMU+RBD is possible, and that you can create snapshots in that way. Has this ever worked or is it an outright lie?
[1:17] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:17] * sjustlaptop (~sam@38.122.20.226) Quit (Ping timeout: 480 seconds)
[1:17] <Jakdaw> Doesn't look like libvirt supports using RBD for the snapshots of storage, but a regular file for the snapshot of memory
[1:18] <Jakdaw> whereas qemu doesn't appear to support using RBD for the snapshot of memory (fair enough)
[1:19] * rustam (~rustam@94.15.91.30) has joined #ceph
[1:21] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[1:24] <Volture> loicd: Thank you
[1:24] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[1:24] <loicd> Volture: yw
[1:25] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:26] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[1:28] * LeaChim (~LeaChim@176.250.188.136) Quit (Ping timeout: 480 seconds)
[1:36] <loicd> "See OSD::handle_pg_(notify|info|log|query)" relates to a function that has apparently been replaced by react() methods such as https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L6040
[1:37] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Read error: Connection reset by peer)
[1:37] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[1:39] <Fetch> actually, rgw is showing (with -d flag) RGWGC::process() failed to acquire lock on gc.22
[1:39] <Fetch> is that sufficient to crash radosgw?
[1:48] <sjust> loicd: hmm, looks like it's a bit out of date, feel free to update the internals docs
[1:49] <loicd> sjust: thanks for the confirmation, trying to fix it will be a good exercise ;-)
[1:49] <sjust> yeah, sorry about that, should have been more careful
[1:50] <loicd> I appreciate your help, keeps me going. Thanks :-)
[1:58] * berant (~blemmenes@24-236-240-247.dhcp.trcy.mi.charter.com) has joined #ceph
[2:04] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Remote host closed the connection)
[2:06] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[2:15] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[2:25] * sagelap (~sage@2600:1012:b00a:d492:598c:d480:4af:b6ce) has joined #ceph
[2:34] * esammy (~esamuels@host-2-102-68-228.as13285.net) has joined #ceph
[2:39] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: Connection reset by peer)
[2:39] * Qten (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[2:46] * esammy (~esamuels@host-2-102-68-228.as13285.net) Quit (Ping timeout: 480 seconds)
[2:47] <FroMaster> I've been reading about Ceph for the past few days and am stuck at step 0 - Discussion about hardware (virtual machine) configuration - CPU/Ram/Disk Layout, etc.
[2:47] <FroMaster> All I want to do is setup a 1-3 node minimal virtual instance (on vmware esx 5.1) and check out the end user interfaces (gateways, etc.)
[2:52] <FroMaster> Anyone got a suggestion on the CPU/RAM/Disk Size/Disk Layout for a small site?
[2:53] <iggy> Jakdaw: disk snapshots can be created on ver2 RBD images (unfortunately, that is not the default yet (afaik))
[2:53] <iggy> FroMaster: information like that has been covered pretty thoroughly on the mailing list, ceph blogs, and the docs
[2:54] <iggy> but honestly, if you're just "checking out" end user interfaces and running in VMs, it's probably not going to make a big difference
[2:55] <FroMaster> iggy: almost everything i've read talks about installing the software and not much about hardware configurations
[2:55] <Fetch> iggy: any ideas on why s3gw.fcgi would quick exit whenever called by fastcgi? does /var/lib/ceph/radosgw/* need to be owned by apache user, for instance?
[2:56] <iggy> FroMaster: because it helps to have a general knowledge of the software before you start planning hardware... so go setup a test cluster (maybe more than one), by that time you will probably have stumbled upon the hardware info you seek
[2:57] <iggy> Fetch: not off the top of my head, but look for the #ceph archives, I know others have had similar problems before and they may have mentioned how they fixed them
[2:57] <Fetch> thanks
[3:01] <FroMaster> iggy: I'm off to install ubuntu 12.04.2 in a 1CPU/1GB/100GB Virtual machine and follow the Ceph quick deploy guide... Hopefully i'll stumble my way into what I'm looking for :)
[3:02] <iggy> FroMaster: that's probably a good start, I know there are hardware recommendations in the docs somewhere, I just don't know off the top of my head
[3:03] <Fetch> iggy: is there typically only one radosgw process spawned when started from /etc/init.d ?
[3:04] <iggy> not sure, I'd think there would be multiple to handle more than one request at a time
[3:05] <Fetch> I'd think so too. IRC logs weren't helpful, unfortunately, so I was checking to see if everything's conformant up to the hairy Apache bit of the chain
[3:05] * rustam (~rustam@94.15.91.30) has joined #ceph
[3:07] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[3:07] * sagelap (~sage@2600:1012:b00a:d492:598c:d480:4af:b6ce) Quit (Ping timeout: 480 seconds)
[3:17] <FroMaster> I know I need to upgrade my ubuntu 12.04.2 default kernel (have 3.5.0 currently). How far do I need to go? 3.6.9, 3.6.11, 3.9?
[3:18] * berant (~blemmenes@24-236-240-247.dhcp.trcy.mi.charter.com) Quit (Quit: berant)
[3:18] <Fetch> FroMaster: for what, the kernel fs driver?
[3:18] <Fetch> I wouldn't bother with it, you can use the fuse driver with your kernel version
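For reference, mounting through the fuse client looks roughly like this; the monitor host and mount point are placeholders:

    sudo mkdir -p /mnt/cephfs
    sudo ceph-fuse -m mon-host:6789 /mnt/cephfs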
[3:20] <FroMaster> alright.. moving along :)
[3:21] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[3:22] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[3:23] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[3:24] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving.)
[3:25] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[3:27] <iggy> fuse is okay for testing, but it's going to be less stable and less performant
[3:28] <Fetch> iggy: kernel driver doesn't even currently (afaik) support format 2 images
[3:28] <iggy> Fetch: I don't know if you were limiting your searches to apache, but try other webservers (lighttpd, nginx, etc... iirc, the last time I heard someone with that problem, it was with lighttpd)
[3:28] <Fetch> if he's testing everything on a single node ceph install *shrug*
[3:29] <iggy> Fetch: for rbd, no... but that's not the kernel fs driver
[3:30] <Fetch> thanks...I wasn't specifically limiting to Apache, but I'm composing an email to the list anyway
[3:30] <Fetch> hopefully someone will be able to point out in what particular way I'm being an idiot while I sleep :)
[3:30] <iggy> where were you searching the irc archives?
[3:30] <Fetch> google
[3:30] <Fetch> it was working for results, just nothing that was a perfect match
[3:30] <iggy> look for cephlogbot's logs
[3:31] <Fetch> the ones that get posted at irclogs.ceph.widodh.nl ?
[3:32] * eternaleye (~eternaley@2607:f878:fe00:802a::1) Quit (Quit: ZNC - http://znc.in)
[3:33] * eternaleye (~eternaley@2607:f878:fe00:802a::1) has joined #ceph
[3:38] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[3:41] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[3:52] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[4:10] <via> joao: so, once i finish compiling wip-4974, i'll back up my mon directories and try upgrading to that -- anything i should know?
[4:19] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[4:25] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[4:34] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:36] <FroMaster> ceph-deploy gatherkeys ceph02 | Unable to find /etc/ceph/ceph.client.admin.keyring
[4:36] <FroMaster> What are the steps to troubleshoot?
[4:41] * ken1 (~quanta@117.7.237.74) has joined #ceph
[4:43] * brian_appscale (~brian@74-93-27-41-Minnesota.hfc.comcastbusiness.net) has joined #ceph
[4:43] * brian_appscale (~brian@74-93-27-41-Minnesota.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[4:57] <iggy> Fetch: that's it i think
[4:57] * rustam (~rustam@94.15.91.30) has joined #ceph
[4:58] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[4:59] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[5:10] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[5:20] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[5:20] * loicd (~loic@magenta.dachary.org) has joined #ceph
[5:37] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[5:51] * Muhlemmer (~kvirc@86.127.208.243) Quit (Quit: KVIrc 4.3.1 Aria http://www.kvirc.net/)
[5:53] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[6:16] * athrift (~nz_monkey@222.47.255.123.static.snap.net.nz) Quit (Ping timeout: 480 seconds)
[6:18] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[6:20] * Volture (~Volture@office.meganet.ru) Quit (Remote host closed the connection)
[6:20] * rustam (~rustam@94.15.91.30) has joined #ceph
[6:35] <via> joao: nevermind, i updated to .61.1 successfully. thanks for the help
[6:35] <via> .2*
[7:02] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[7:06] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (Read error: Connection reset by peer)
[7:06] * jks (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[7:06] * Romeo_ (~romeo@198.144.195.85) Quit (Read error: Connection reset by peer)
[7:06] * Romeo (~romeo@198.144.195.85) has joined #ceph
[7:06] * Tamil (~tamil@38.122.20.226) Quit (Read error: Connection reset by peer)
[7:07] * pioto_ (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) Quit (Remote host closed the connection)
[7:07] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:07] * pioto (~pioto@pool-96-235-30-25.pitbpa.fios.verizon.net) has joined #ceph
[7:07] * Tamil (~tamil@38.122.20.226) has joined #ceph
[7:10] * iggy2 (~iggy@theiggy.com) has joined #ceph
[7:10] * iggy___ (~iggy@theiggy.com) has joined #ceph
[7:11] * dignus (~dignus@bastion.jkit.nl) Quit (Remote host closed the connection)
[7:11] * todin (tuxadero@kudu.in-berlin.de) Quit (Remote host closed the connection)
[7:12] * iggy_ (~iggy@theiggy.com) Quit (Ping timeout: 480 seconds)
[7:12] * ken1 (~quanta@117.7.237.74) Quit (Quit: Leaving.)
[7:13] * iggy (~iggy@theiggy.com) Quit (Ping timeout: 480 seconds)
[7:13] * jpieper (~josh@209-6-205-161.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[7:16] * dignus (~dignus@bastion.jkit.nl) has joined #ceph
[7:17] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[7:18] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[7:18] * iggy2 is now known as iggy
[7:20] * jpieper (~josh@209-6-205-161.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[7:22] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[7:31] <Qten> lo, anyone know of any current patches/flags for grizzly so that when creating VMs it will use the root disk in the flavor and image the rbd/glance image onto that disk? (ie cinder create 10 --image-id xx-yy-zz)
[7:32] * br1 (~br1@static-217-133-175-104.clienti.tiscali.it) has joined #ceph
[7:33] * br1 (~br1@static-217-133-175-104.clienti.tiscali.it) Quit ()
[7:33] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[7:33] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[7:52] * tnt (~tnt@91.177.214.32) has joined #ceph
[7:59] * agh (~oftc-webi@gw-to-666.outscale.net) Quit (Quit: Page closed)
[8:00] * sachindesai (~Adium@173.228.107.95) has joined #ceph
[8:01] <Kioob> Hi
[8:02] <Kioob> you can't do a "snap purge" on a full OSD ???
[8:03] * sachindesai (~Adium@173.228.107.95) has left #ceph
[8:11] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:13] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[8:15] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[8:15] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[8:17] <Kioob> cd houkouonchi
[8:17] <Kioob> rofl
[8:17] <Kioob> bad window, sorry
[8:22] * rustam (~rustam@94.15.91.30) has joined #ceph
[8:24] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[8:29] * loicd (~loic@90.84.144.84) has joined #ceph
[8:32] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[8:33] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Copywight 2007 Elmer Fudd. All wights wesewved.)
[8:36] * fridudad (~oftc-webi@fw-office.allied-internet.ag) Quit (Quit: Page closed)
[8:36] * fridudad (~oftc-webi@fw-office.allied-internet.ag) has joined #ceph
[8:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[8:41] * loicd1 (~loic@90.84.144.93) has joined #ceph
[8:45] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[8:46] * loicd (~loic@90.84.144.84) Quit (Ping timeout: 480 seconds)
[8:46] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[8:56] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:02] <loicd1> ccourtaut: good morning sir :-)
[9:02] * loicd1 is now known as loicd
[9:16] <ccourtaut> loicd1: good morning!
[9:17] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:17] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:22] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) has joined #ceph
[9:22] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[9:23] * tnt (~tnt@91.177.214.32) Quit (Ping timeout: 480 seconds)
[9:27] * loicd (~loic@90.84.144.93) Quit (Quit: Leaving.)
[9:27] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:31] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:40] * leseb (~Adium@83.167.43.235) has joined #ceph
[9:40] * vipr (~vipr@78-23-113-37.access.telenet.be) has joined #ceph
[9:51] * fabioFVZ (~fabiofvz@213.187.20.119) has joined #ceph
[9:54] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[9:54] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[9:59] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:00] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[10:00] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[10:01] * LeaChim (~LeaChim@176.250.188.136) has joined #ceph
[10:25] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[10:25] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[10:35] * rustam (~rustam@94.15.91.30) has joined #ceph
[10:38] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[10:39] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[10:41] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[10:42] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[10:48] <tnt> Is there a way to simulate the effect of changing the crush tunables ?
[10:52] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Quit: foo)
[10:52] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[10:59] <joao> via, nothing I can think of
[11:01] <matt_> joao, have you guys had any further luck getting to the bottom of the mon store growth issue?
[11:02] <joao> matt_, not yet, sorry
[11:02] <matt_> no worries
[11:03] <joao> had to put that aside to work on other urgent bugs, but intend to go back to that asap
[11:03] <mrjack> matt_: did you try 0.61.2?
[11:04] <matt_> mrjack, not yet. I didn't realise it was out
[11:04] <mrjack> matt_: look at the ceph-user ML
[11:04] <mrjack> matt_: it fixes two bugs
[11:05] <matt_> mrjack, thanks for the heads up. I'll have a look in a second
[11:39] <Azrael> when an OSD dies, what's the procedure one should follow to replace it? get a new disk and somehow fudge the old osd's ID onto that new disk?
[11:39] <Azrael> or give a new osd id and leave the old id alone?
[11:40] <wogri_risc> Azrael, AFAIK OSDs want to be numbered consecutively. if OSD X dies, replace the disk, format it and initialize it again as X.
[11:43] <Azrael> ok
[11:43] <Azrael> so
[11:43] <Azrael> thats where i'm having trouble
[11:43] <Azrael> initializing a device with the proper osd id
[11:43] <wogri_risc> hm. I've never actually done this.
[11:43] <wogri_risc> but maybe you also need to delete the old OSD ID.
[11:44] <wogri_risc> and re-create it.
[11:44] <wogri_risc> I assume you've read the documentation?
[11:47] <tnt> I thought I had seen a 'replace failed osd' guide at some point in the doc but it doesn't seem to be there anymore.
[11:47] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[11:48] <tnt> That should be a fairly routine operation so it should have a well documented process.
[11:48] <Gugge-47527> i think you should just remove the broken one, and add a new one
[11:48] <Gugge-47527> both steps are documented as far as i remember :)
[11:49] * bergerx_ (~bekir@78.188.101.175) has joined #ceph
[11:50] <tnt> yes, that will work but I think there is an easier way, to avoid as much data movement as possible.
[11:50] <tnt> Azrael: doesn't ceph-osd -i {osd-num} --mkfs work to recreate a new fs with the old id ?
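A rough sketch of that approach, keeping the old id; the paths assume the default osd data layout and the replacement disk already mounted there, and this is an illustration rather than an official procedure:

    ceph osd out 3                       # optional: let data rebalance off the dead osd first
    service ceph stop osd.3              # if the daemon is still running
    # replace the disk, mkfs, and mount it at /var/lib/ceph/osd/ceph-3, then:
    ceph-osd -i 3 --mkfs --mkkey
    ceph auth del osd.3
    ceph auth add osd.3 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-3/keyring
    service ceph start osd.3
    ceph osd in 3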
[11:59] * Dark-Ace-Z (~BillyMays@50.107.54.92) has joined #ceph
[12:00] <Gugge-47527> tnt, you haven't marked the "broken" osd down yet?
[12:01] <Gugge-47527> out, not down :)
[12:02] * DarkAce-Z (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[12:03] * esammy (~esamuels@host-2-102-68-228.as13285.net) has joined #ceph
[12:05] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[12:05] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[12:10] * rustam (~rustam@94.15.91.30) has joined #ceph
[12:13] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[12:14] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[12:15] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[12:26] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[12:26] * rustam (~rustam@94.15.91.30) has joined #ceph
[12:29] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[12:29] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[12:36] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[12:36] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:37] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit ()
[12:37] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[12:37] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[12:38] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:39] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Remote host closed the connection)
[12:40] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit ()
[12:40] * rustam (~rustam@94.15.91.30) has joined #ceph
[12:41] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[12:42] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:42] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Quit: Leaving.)
[12:43] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[12:43] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[12:56] * nigwil (~idontknow@174.143.209.84) Quit (Remote host closed the connection)
[12:57] * nigwil (~idontknow@174.143.209.84) has joined #ceph
[12:57] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[13:13] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:13] * nigwil (~idontknow@174.143.209.84) Quit (Remote host closed the connection)
[13:13] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[13:14] * nigwil (~idontknow@174.143.209.84) has joined #ceph
[13:18] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[13:18] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[13:24] <tnt> Gugge-47527: huh, I'm not the one with the issue, Azrael is :p
[13:25] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[13:25] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[13:26] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[13:31] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130506154904])
[13:39] <mrjack_> hm
[13:40] <mrjack_> what does it mean: mon.1 [WRN] reached concerning levels of available space on data store (25% free)
[13:41] <tnt> it means mon.1 only has 25% of free disk space on the data disk.
[13:41] <mrjack_> is that a problem?
[13:41] <tnt> it's concerning :p
[13:42] <mrjack_> well... ;)
[13:42] <absynth> i'd start freeing space up
[13:42] <tnt> basically you really don't want to reach full disk on a mon ... or very bad things will happen.
[13:42] <absynth> ceph doesn't usually complain about anything that is not worth complaining about
[13:42] <nhm_> mrjack_: see the mailing list post about v0.61.2
[13:42] <mrjack_> hm
[13:42] <mrjack_> i think it does not calculate correctly
[13:42] * diegows (~diegows@190.190.2.126) has joined #ceph
[13:42] <mrjack_> . /dev/md6 771G 448G 284G 62% /data there is 38% free
[13:43] <absynth> check df -i
[13:46] <mrjack_> nope
[13:46] * jtaguinerd_ (~jtaguiner@124.6.182.55) has joined #ceph
[13:46] <mrjack_> 4% inodes in use
[13:46] <jtaguinerd_> hi all
[13:47] * dcasier (~dcasier@LVelizy-156-44-40-164.w217-128.abo.wanadoo.fr) has joined #ceph
[13:47] <jtaguinerd_> i think i have recreated this bug http://tracker.ceph.com/issues/3905
[13:48] <jtaguinerd_> i tried to restart but stale pgs won't go away
[13:48] <tnt> mrjack_: and you're sure the monitor data are on /data and not /var/lib/ceph/mon/ceph-a ?
[13:49] <mrjack_> tnt: i defined it in ceph.conf to use that path, yep
[13:50] <tnt> and /data is only the mon ?
[13:50] <tnt> and if not, can you do a du -sh on the mon directory
[13:50] <mrjack_> hm
[13:51] <tnt> because 448G of mon data would be concerning.
[13:51] <mrjack_> no
[13:51] <mrjack_> there is other data on data
[13:51] <mrjack_> mon directory is 86M
[13:51] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[13:52] <tnt> ok, so ... not really a problem, there is some margin.
[13:52] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[13:52] <mrjack_> yes
[13:52] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:52] <mrjack_> but monitoring kicked in, because ceph is now state WARN instead of HEALTH_OK ;)
[13:53] <tnt> ah yes :p Well, if that's an option, I would put the mon data on a separate partition so that you're sure it can't be filled accidentally with some other data.
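The mon data path is set per monitor in ceph.conf, so pointing it at a dedicated partition only needs something like the following (the path is an example):

    [mon.a]
        mon data = /var/lib/ceph/mon/ceph-a    ; mount a dedicated partition here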
[13:58] * rustam (~rustam@94.15.91.30) has joined #ceph
[13:59] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[14:00] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[14:00] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[14:00] <mrjack_> tnt: yes, but i still think there is an error in calculating the 25%
[14:02] <absynth> hrm... is the ceph-deploy in the debian/ubuntu repositories broken?
[14:02] <absynth> SyntaxError: ('invalid syntax', ('/usr/lib/pymodules/python2.6/ceph_deploy/test/test_cli.py', 44, 26, ' assert {p.basename for p in tmpdir.listdir()} == set()\n'))
[14:03] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[14:03] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[14:04] * berant (~blemmenes@vpn-main.ussignalcom.com) has joined #ceph
[14:19] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[14:19] <tnt> Ok, mon/mds/rgw upgraded to 0.61.2 without troubles ... let see how it goes for the osds now.
[14:19] * ScOut3R_ (~ScOut3R@212.96.47.215) has joined #ceph
[14:20] <mrjack_> tnt: me too, mons had no problems, osds also flawless
[14:21] * Dark-Ace-Z is now known as DarkAceZ
[14:21] <tnt> mmm, the mons are generating new log messages in ceph-mon.a.log that they were not generating before.
[14:22] <tnt> or maybe not ... I think it was just because it was the master for a brief period until they were all updated.
[14:23] <mrjack_> tnt: how long did it take until your monitors were converted?
[14:24] <tnt> Not sure ... 20 sec or so ?
[14:24] <mrjack_> hm
[14:24] <mrjack_> ok here it takes minutes...
[14:25] <tnt> Well, my cluster is really small ... 12 osds. I need to add the other 14 ones this week.
[14:25] <mrjack_> tnt: mine is even smaller ;)
[14:26] <mrjack_> tnt: but i had many monitor elections ( > 10k or so...)
[14:26] <mrjack_> maybe this takes time to convert
[14:27] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) Quit (Ping timeout: 480 seconds)
[14:28] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[14:28] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[14:32] <tnt> mrjack_: upgrading the PGs seems to take longer ...
[14:32] <mrjack_> tnt: well, that i think is expected to take longer, depending on the number of PGs you have..
[14:33] <mrjack_> i have mon quorum again *yeah*
[14:34] <absynth> anyone running 0.56.6 here?
[14:34] <paravoid> yes
[14:34] <mrjack_> absynth: i did, but currently upgrading
[14:34] <absynth> did scrubbing work?
[14:35] <absynth> as in, did you have the pre-0.56.6-memleaks or was it OK?
[14:35] <mrjack_> absynth: i had no noticable memleaks
[14:35] <tnt> absynth: I had the memleak ... even with 0.56.6
[14:35] <tnt> I'm upgrading now to 0.61.2 ... we'll see how memory behave after the update
[14:35] <paravoid> what kind of memleaks?
[14:36] <tnt> OSD growing in size.
[14:36] <paravoid> I did have a few OSDs suddenly grow a lot in size
[14:36] <paravoid> like tens of GB of RAM in a few minutes
[14:36] <absynth> sounds like the usual scrubbing memleak alright
[14:36] <paravoid> bug #?
[14:37] <tnt> http://tracker.ceph.com/issues/3883
[14:37] <absynth> no idea, we submitted one and i think it got shot down
[14:37] * wschulze (~wschulze@70.42.157.32) has joined #ceph
[14:37] <tnt> although the conclusion was completely wrong, it's not argonaut specific at all.
[14:38] <mrjack_> tnt: how long did PGs convert take? did you do ceph osd noout before restarting the osd?
[14:38] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:39] <tnt> mrjack_: I did set the noout flag. PG conversion took a few minutes. ( like maybe a bit less than 5 )
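The noout dance referred to here is roughly:

    ceph osd set noout           # keep restarting osds from being marked out
    service ceph restart osd.0   # restart each osd in turn (exact command depends on init system)
    ceph osd unset noout         # restore normal down/out handling once all osds are upgraded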
[14:41] * loicd trying to figure out "DeletingState allows you to register a callback to be called when the deletion is finally complete. See PG::start_flush. "
[14:42] <mrjack_> tnt: ok PGs converting took seconds here .. ;)
[14:44] <tnt> each osd here has ~ 6k PGs ... I have a lot of pools (to allow setting rep size) and the # PGs formula wasn't all that clear back when I created them :p
[14:44] <mrjack_> tnt: i have ~ 4k PGs .. but it seems that it is not optimal to put my mon on a raid10 where there is other disk-io going on...
[14:45] <tnt> actually that's still surprising ... 12k PGs total and it says "osd.2 6611 PGs are upgrading" but it shouldn't have that many.
[14:49] <mrjack_> 15EB/s rd
[14:49] <mrjack_> :)
[14:51] * rustam (~rustam@94.15.91.30) has joined #ceph
[14:53] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[14:54] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[15:02] <via> one of my monitors newly upgraded to .61.2 crashed overnight: https://pastee.org/sznze
[15:04] <tnt> via: what's the available freespace ?
[15:05] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[15:05] <jmlowe> Anybody ever see this error? "osd.9 [ERR] 2.1c0 caller_ops.size 3002 > log size 3001"
[15:05] <via> tnt: like, disk space?
[15:05] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[15:05] <via> on the drive the monitor is on, almost 2 gigs
[15:06] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[15:09] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) has joined #ceph
[15:10] * wschulze (~wschulze@70.42.157.32) Quit (Quit: Leaving.)
[15:11] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[15:13] * wschulze (~wschulze@70.42.157.32) has joined #ceph
[15:15] <joao> via, I'm pretty certain I've seen that assert before, but don't seem to find the ticket nor any other email regarding that; would you mind opening a ticket for it?
[15:16] * ScOut3R_ (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[15:17] <tnt> Mmm, my mon data directory seems to be growing ...
[15:18] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[15:19] <joao> tnt, upgraded to cuttlefish?
[15:19] <tnt> yes ...
[15:20] <tnt> it does seem to go back down in size from time to time, but 1) it doesn't go down as much as it goes up, hence a global rise 2) it seems to grow more quickly over time.
[15:21] <joao> tnt, matt_ has been suffering from that too
[15:21] <joao> it's some leveldb issue we haven't been able to figure out yet
[15:21] <joao> we tried compacting the store from time to time, but although it seems to make it better it hasn't solved the issue
[15:21] <tnt> I'm in the middle of the osd rolling upgrade so I'll see if it quiets down afterwards, but I can't remember the mon ever growing that large.
[15:22] * wschulze (~wschulze@70.42.157.32) Quit (Quit: Leaving.)
[15:22] * dfanciola (~dfanciola@212.147.27.221) has joined #ceph
[15:22] <joao> fwiw, the data used by the monitor is usually a couple hundred MB, although the store can grow a lot
[15:22] <tnt> (it's still only 300M on a 10G disk, but it was like 20M before the update 1h ago)
[15:22] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[15:24] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Ping timeout: 480 seconds)
[15:26] * wschulze (~wschulze@70.42.157.32) has joined #ceph
[15:26] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[15:27] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[15:28] <matt_> joao, have you ever looked at the data inside the sstables to see what data it isn't cleaning up?
[15:28] <joao> that didn't occur to me :x
[15:29] <joao> I wonder how feasible that is
[15:30] <matt_> it's basically just an append on key-value store, if the keys the mon stores are ascii you should be able to decode a bit of it just by opening it in a text editor
[15:30] <matt_> only*
[15:31] * esammy (~esamuels@host-2-102-68-228.as13285.net) has left #ceph
[15:32] <matt_> ha, just checked and it's very far from ascii. I'll find the doc that has the on-disk format specs, I was reading it today
[15:34] <mrjack_> how can i use syncfs with debian squeeze? i am running my own kernel (3.4.36 with syncfs)...
[15:36] <tnt> and it doesn't detect it by itself ?
[15:36] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:37] <mrjack_> tnt: no, the osd says there is no syncfs support - do i need a newer glibc?
[15:38] <mrjack_> oh and on one other node i got this: http://pastebin.com/wsCUCYjR
[15:38] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[15:40] * rustam (~rustam@94.15.91.30) has joined #ceph
[15:40] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) Quit (Ping timeout: 480 seconds)
[15:41] <tnt> mrjack_: I think that a custom version of the package should be enough.
[15:41] <mrjack_> hm?
[15:42] <tnt> http://pastebin.com/60AExcj5
[15:42] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[15:42] * yehuda_hm (~yehuda@2602:306:330b:1410:7849:6691:3662:529c) has joined #ceph
[15:42] <tnt> seems that if you have SYS_syncfs or __NR_syncfs defined during build, it will bypass the glibc and do a direct system call.
[15:43] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Now if you will excuse me, I have a giant ball of oil to throw out my window)
[15:44] <dfanciola> hi there, im having problems integrating the radosgw and keystone - i configured the NSS db for radosgw as in the docs but still getting "SigningCertNotTrusted" in the logs
[15:44] <mrjack_> tnt: i don't get it.. during which build? i use the prebuilt packages from ceph.com?
[15:44] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[15:46] <tnt> mrjack_: well, they build the packages using the default packages in debian and so syncfs support isn't even built in. But if you rebuild it yourself using newer kernel headers from your custom kernel, it would detect it.
[15:47] <mrjack_> so i have to grab current git, co 0.61.2, run dpkg-buildpackage and install that?
[15:48] <tnt> yes.
[15:48] <tnt> but you need to make sure that the installed kernel headers are the ones from your custom kernel ...
[15:48] <mrjack_> yo np
[15:49] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:50] * jtaguinerd_ (~jtaguiner@124.6.182.55) Quit (Quit: jtaguinerd_)
[15:56] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Quit: Ex-Chat)
[15:59] <jmlowe> any inktank guys around?
[16:01] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:02] * rustam (~rustam@94.15.91.30) has joined #ceph
[16:03] <joao> matt_, yeah, I'll give it a shot
[16:03] <joao> thanks
[16:03] <joao> jmlowe, here, what's up?
[16:04] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[16:04] <jmlowe> joao: any idea what this means? osd.7 149.165.228.11:6812/16051 1194 : [ERR] 2.1c0 caller_ops.size 3002 > log size 3001
[16:04] * BillK (~BillK@124-169-231-135.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:04] <joao> jmlowe, not really, sorry :\
[16:04] <mrjack_> tnt: hm it seems i cannot build it myself... - it requires default-jdk javahelper junit4 libleveldb-dev libsnappy-dev, but says libleveldb-dev is not found on squeeze...
[16:05] <tnt> mrjack_: huh ... well they have to build it somehow.
[16:06] <mrjack_> hmhm
[16:08] * humbolt (~elias@91-113-100-118.adsl.highway.telekom.at) has joined #ceph
[16:09] <humbolt> Anyone fit with ceph RBD usage in openstack?
[16:09] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[16:09] * ChanServ sets mode +o elder
[16:12] <Fetch> humbolt: I've recently been working on it, at about 85% complete
[16:16] * BillK (~BillK@124-169-186-145.dyn.iinet.net.au) has joined #ceph
[16:19] <humbolt> I can not get glance to work with it
[16:19] <humbolt> Fetch: I am getting a permission error, even when I am NOT using cephx
[16:21] <Fetch> glance has filesystem permissions to read ceph.conf and the keyfile ?
[16:21] <Fetch> (they were locked down by default on my system, had to open them up)
[16:22] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Read error: Connection reset by peer)
[16:25] <humbolt> Fetch: they are world readable and the keys are owned by glance and cinder
[16:25] * loicd trying to figure out "Things to Note" in https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/recovery_reservation.rst
[16:26] <humbolt> Fetch: what is the best way to test this?
[16:26] <humbolt> I am doing this: glance --os-tenant-id 8f7c04016bf6404cb09650e4193cbe5d image-create --name="precise-ceph" --is-public=True --disk-format=raw --container-format=ovf --file stackimages/precise-server-cloudimg-amd64-disk1.img
[16:26] <humbolt> Request returned failure status.
[16:26] <humbolt> 500 Internal Server Error
[16:26] <humbolt> The server has either erred or is incapable of performing the requested operation.
[16:26] <humbolt> (HTTP 500)
[16:27] <Fetch> what are you seeing in your glance api log?
[16:27] <humbolt> 2013-05-14 16:23:59.399 46662 TRACE glance.api.v1.images PermissionError: error calling connect
[16:27] <humbolt> 2013-05-14 16:23:59.399 46662 TRACE glance.api.v1.images
[16:27] <humbolt> 2013-05-14 16:26:42.518 ERROR glance.api.v1.images [3ae51120-83da-41d1-9290-4da2e7be66b3 4c015bf46a304e41beadd76c49bae883 8f7c04016bf6404cb09650e4193cbe5d] Failed to upload image
[16:28] * aliguori (~anthony@32.97.110.51) has joined #ceph
[16:28] <humbolt> Fetch: that is from glance log
[16:29] <Fetch> *nod* I had same error, trying to remember fix
[16:29] <humbolt> I love IRC, it is like a collective mind!
[16:30] * Teduardo (~DW-10297@dhcp92.cmh.ee.net) has joined #ceph
[16:30] <Teduardo> is radosgw multi-user/multi-tenant and does it keep track of usage?
[16:30] <Fetch> humbolt: heh. I used cephx, never turned it off. Can you use rbd to create an image in the pool in question?
[16:30] <tnt> Teduardo: it's multi-user.
[16:31] <tnt> Teduardo: Not sure if it keeps per-user usage yet.
[16:31] <tnt> you can see usage per bucket at least.
[16:32] * wschulze (~wschulze@70.42.157.32) Quit (Quit: Leaving.)
[16:36] <humbolt> Fetch: well, rbd is working fine
[16:37] <Fetch> tnt: I'm configuring rgw to plug into keystone. You have any ballpark numbers for token cache size and revocation interval for a small (6 node) test cluster?
[16:37] <humbolt> Fetch: I also tried to turn off cephx, that does not seem to be the problem.
[16:37] <humbolt> is glance looking for the keyring in a specific location under a specific filename?
[16:38] <tnt> Fetch: sorry, never used keystone, I only have a couple of users used for privilege separation inside our app ... no external users.
[16:38] <dfanciola> fetch: i took the values i found in the juju charm
[16:38] <Fetch> humbolt: I'd leave cephx on, but anyway moving on: have you verified that the rbd pool in the glance api config is set properly? In the default config file, it's defined pretty far down. I ended up accidentally defining it twice, and it's the second definition that was in effect
[16:38] <dfanciola> which are 500 for cache size and 600 for the interval
[16:38] <Fetch> dfanciola: thanks, I'll snag those :)
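In ceph.conf those values go into the radosgw client section, roughly as below; the section name, keystone endpoint, and paths are examples following the radosgw keystone docs rather than anyone's actual config:

    [client.radosgw.gateway]
        rgw keystone url = http://keystone-host:35357
        rgw keystone admin token = <admin token>
        rgw keystone accepted roles = Member, admin
        rgw keystone token cache size = 500
        rgw keystone revocation interval = 600
        nss db path = /var/lib/ceph/nss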
[16:39] <Fetch> dfanciola: how're you liking Juju? I just heard about it for the first time the other day
[16:39] <humbolt> Fetch: that part is correct, but I will recheck.
[16:39] <dfanciola> but im stuck at error HTTP 405 on the gateway
[16:40] <humbolt> Fetch: what about known_stores, do I have to uncomment this?
[16:40] <Fetch> mine's uncommented, but it shouldn't be necessary
[16:40] <Fetch> but it won't hurt
[16:40] <dfanciola> fetch: im not using juju for the moment, only taking what i miss :)
[16:40] <humbolt> Fetch: I think I found the problem!
[16:40] <Fetch> humbolt: what was it?
[16:41] <Fetch> (since IRC is logged, I'm a big believer in putting your solution out there :)
[16:41] <humbolt> Fetch: If using cephx authentication, this file should include a reference to the right keyring in a client.<USER> section
[16:41] <humbolt> and it does not!
[16:43] <humbolt> any idea, do they mean a [client.username] section? or is this added under [global]?
[16:43] <Fetch> client.username
[16:44] <Fetch> in the key file
[16:44] <Fetch> for instance, in /etc/ceph/keyring I have a section
[16:44] <humbolt> well there are several keyfiles, one for each user
[16:44] <Fetch> [client.glance] key = AQDTd4lRkLwuKhAA4B7MvwEgpWMss
[16:45] <humbolt> and the comment I just pasted was referring to ceph.conf
[16:45] <humbolt> and ceph auth list, lists the keys
[16:46] <Fetch> so glance reads ceph.conf to figure out what keyring to read for its client key
[16:46] <Fetch> I just shoved everything in /etc/ceph/keyring because that's how I roll
[16:46] * ShaunR (ShaunR@ip72-211-231-130.oc.oc.cox.net) has joined #ceph
[16:46] <Fetch> but otherwise yes, I believe you'd need a client.glance section with a keyring statement pointing to the client keyring
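Putting that together, a plausible sketch (not the exact config being debugged here) is a client.glance section in ceph.conf pointing at the keyring, plus matching rbd settings in glance-api.conf; pool and file names are examples:

    # /etc/ceph/ceph.conf
    [client.glance]
        keyring = /etc/ceph/keyring

    # /etc/glance/glance-api.conf
    default_store = rbd
    rbd_store_ceph_conf = /etc/ceph/ceph.conf
    rbd_store_user = glance
    rbd_store_pool = images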
[16:47] * tnt hopes that's only a test cluster :p
[16:48] <Fetch> tnt: yeah.
[16:48] <absynth> baaah
[16:49] <absynth> found a bug in owncloud, spent 3 hours circumventing other bugs to reproduce it
[16:49] <absynth> need holiday.
[16:49] <loicd> absynth: :-)
[16:50] <absynth> or maybe beer.
[16:50] <Fetch> a beer holiday
[16:51] <absynth> we just had that
[16:51] <absynth> father's day...
[16:51] * loicd hands a beer to absynth
[16:52] <absynth> wait... are you french?
[16:56] <joao> absynth, is that what you guys are not calling oktoberfest ?
[16:56] <joao> *now
[16:56] <nhm_> I suppose so long as it's afternoon in europe it wouldn't be so bad so long as I had European beer.
[17:03] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[17:03] <absynth> joao: no, father's day is the fun equivalent to mother's day
[17:04] <absynth> it's always on some christian holiday (i keep confusing them, something about jesus and god and the heaven)
[17:04] <absynth> in may
[17:04] <joao> absynth, we also have that here, at some point during april
[17:04] <absynth> so people get drunk. fathers.
[17:04] <absynth> non-fathers, too.
[17:04] <joao> yeah, no drinking here though
[17:04] <joao> maybe that's why I think it's so boring
[17:11] <mrjack_> lol
[17:11] <mrjack_> :)
[17:11] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[17:13] <nhm_> joao: interesting, isn't catholicism pretty big there? At least in the US they are the ones more well known for holiday drinking. ;)
[17:14] * sagelap (~sage@2600:1012:b00a:7d5b:598c:d480:4af:b6ce) has joined #ceph
[17:14] <joao> yeah, most of the population is either catholic or of catholic descent
[17:14] <absynth> yeah, the irish...
[17:14] <joao> but I don't think we have drinking holidays
[17:15] <joao> that's mostly an Irish thing I guess
[17:15] <absynth> and German!
[17:15] <absynth> the only holiday that does not include drinking occasions is probably x-mas
[17:15] <absynth> s/occasions/traditions/
[17:16] <nhm_> absynth: my family has little alcohol on holidays, but my wife's family has beer and wine at every holiday. It's much better that way. ;)
[17:16] <joao> okay, I mean, in the country side, small villages and all that, there is a big tradition in drinking (mostly custom made beverages prepared in the town square during special holidays)
[17:17] <joao> but not so much in the cities
[17:17] <absynth> nah, i don't mean that in a "unbearable sober" way, but more or less nationally accepted customs
[17:17] <joao> err
[17:18] <absynth> for new years, we have, well, new year's eve. then we have "carnival" which is massive in some areas. then easter (with bonfires from pagan tradition -> lots of alcohol), raising of the "may tree" on 1st of may, father's day... then there's a long drought period until Oktoberfest :)
[17:18] <joao> nhm_, if we were to have beer and wine only at holidays, then I guess we would consider the whole year round as a big holiday :p
[17:18] <joao> I rarely drink though
[17:19] <absynth> oh, and there's world hosting day
[17:19] <absynth> almost forgot about the one occasion where i saw joao drink :)
[17:19] <absynth> so, if you factor in IT trade shows, we have about a half-dozen more occasions
[17:20] <humbolt> health HEALTH_WARN 665 pgs stuck unclean; 1 mons down, quorum 1,2 1,2
[17:20] <humbolt> but all mons are running?!
[17:20] <absynth> humbolt: here, have a beer.
[17:20] * rustam (~rustam@94.15.91.30) has joined #ceph
[17:20] <humbolt> absynth: means what?
[17:20] <humbolt> absynth: to wait?
[17:22] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[17:22] <Fetch> humbolt: if all mons are running and 1 shows down, often a network/firewall issue
[17:22] <joao> humbolt, are you sure all monitors are running on the same version?
[17:23] <humbolt> joao: no, right. They are not, I only updated one of them! Thanks!
[17:23] <joao> humbolt, pre-0.59 monitors are unable to speak to 0.59+ monitors
[17:24] * dfanciola (~dfanciola@212.147.27.221) Quit (Quit: Leaving)
[17:25] <humbolt> Fetch: no firewall
[17:26] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[17:26] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[17:29] <joao> ah
[17:30] <fabioFVZ> :(
[17:30] <joao> via, found the bug that had that assert
[17:30] <joao> it's #4999
[17:30] <joao> doh
[17:30] <joao> it so happens to be the bug I've been working on today -_-
[17:31] <via> oh
[17:31] <via> well, cool, let me know if there's anything i can do
[17:31] <joao> got thrown off due to another segfault also on that ticket
[17:33] <joao> via, will do, thanks! :)
[17:34] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[17:34] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[17:35] * BManojlovic (~steki@91.195.39.5) Quit (Remote host closed the connection)
[17:36] <mrjack_> :/
[17:36] <mrjack_> one of my mon crashed and won't start up again after upgrading to 0.61.2
[17:37] <joao> mrjack_, crash dump?
[17:38] <joao> mrjack_, which version are you coming from?
[17:38] <mrjack_> joao: 0.56.6
[17:39] <joao> mrjack_, is this reproducible?
[17:40] <mrjack_> yes, fails now, and unable to start up mon
[17:42] <joao> mrjack_, can you run the monitor again with debug mon = 20 and debug paxos = 20 and post the log?
[17:43] <mrjack_> yes, how can i do it?
[17:43] <mrjack_> just put it in the global section?
[17:43] <joao> mrjack_, add 'debug mon = 20' and 'debug paxos = 20' to your [mon] on ceph.conf
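i.e. the [mon] section would look something like:

    [mon]
        debug mon = 20
        debug paxos = 20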
[17:45] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) Quit (Ping timeout: 480 seconds)
[17:46] * rustam (~rustam@94.15.91.30) has joined #ceph
[17:46] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[17:46] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[17:47] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[17:48] * masACC is now known as maswan
[17:48] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:50] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:58] * dcasier (~dcasier@LVelizy-156-44-40-164.w217-128.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[17:58] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:59] * fabioFVZ (~fabiofvz@213.187.20.119) Quit (Remote host closed the connection)
[18:02] * sagelap1 (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[18:03] * ChanServ sets mode +v wogri
[18:05] * sagelap (~sage@2600:1012:b00a:7d5b:598c:d480:4af:b6ce) Quit (Ping timeout: 480 seconds)
[18:08] * rustam (~rustam@94.15.91.30) has joined #ceph
[18:09] * tnt (~tnt@91.177.214.32) has joined #ceph
[18:10] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[18:11] * FroMaster (~DM@static-98-119-19-146.lsanca.fios.verizon.net) Quit ()
[18:11] <joao> via, do you happen to have the full log for that crash?
[18:11] <humbolt> health HEALTH_WARN 5 pgs degraded; 1280 pgs stuck unclean; recovery 154/29865 degraded (0.516%)
[18:11] <humbolt> what now?
[18:12] <humbolt> pgmap v36712: 1280 pgs: 6 active, 1269 active+remapped, 5 active+degraded; 39222 MB data, 122 GB used, 11259 GB / 11382 GB avail; 154/29865 degraded (0.516%)
[18:12] <sagewk> humbolt: you probably have a down (but not out) osd, or not enough osds preventing ceph from replicating
[18:13] <humbolt> osdmap e322: 7 osds: 7 up, 7 in
[18:13] <humbolt> no
[18:13] <sagewk> what does 'ceph osd tree' say?
[18:13] <humbolt> does not look like it
[18:14] <humbolt> Think I might know the reason now
[18:15] <humbolt> All OSDs are up. But I renamed the hosts in ceph.conf, as it was complaining about the dots in the names paris3.san
[18:18] * alram (~alram@cpe-75-83-127-87.socal.res.rr.com) has joined #ceph
[18:18] <humbolt> why do my hosts have a weight of 0? -2 0 host paris3.san
[18:19] <sagewk> not sure, but that explains the degraded.
[18:19] <sagewk> the host weight is the sum of the osds beneath it
[18:20] <sagewk> so 'ceph osd crush reweight osd.something 1' (or however you are doing your weights)
[18:20] <sagewk> (we recommend units of TB)
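For example, for a 3 TB disk that would be roughly:

    ceph osd crush reweight osd.1 3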
[18:21] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[18:22] <sagewk> matt_: around?
[18:22] <humbolt> sagewk: but my osds have the right weight
[18:22] <humbolt> 1 3 osd.1 up 1
[18:22] <humbolt> 3TB
[18:23] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[18:25] <sagewk> can you pastebin the whole ceph osd tree output?
[18:27] <via> joao: when you say full log
[18:27] <via> you mean more than the last few hundred lines? or at a different debug level
[18:29] * dikkjo (~dikkjo@46-126-128-50.dynamic.hispeed.ch) has joined #ceph
[18:29] <joao> via, more than the last few hundred lines
[18:29] * dwt (~dwt@rtp-isp-nat1.cisco.com) has joined #ceph
[18:29] <joao> at least a considerable portion of it
[18:30] <joao> if you were to be able to reproduce it with a higher debug level, that would be awesome too :)
[18:30] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Quit: Ex-Chat)
[18:36] <via> joao: http://mirror.ece.vt.edu/pub/mon.2.log
[18:36] <humbolt> Can somebody tell me, if there is something wrong with this?
[18:36] <humbolt> http://pastebin.com/Jd0CEMCG
[18:37] <humbolt> Updated pastebin http://pastebin.com/ZM3gMq8V
[18:39] * al (d@niel.cx) Quit (Remote host closed the connection)
[18:39] <joao> via, thanks
[18:40] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:41] <joao> via, got it; thanks!
[18:45] <via> sure
[18:46] * joao-laptop (~Adium@89-181-145-151.net.novis.pt) has joined #ceph
[18:47] * dcasier (~dcasier@80.215.8.248) has joined #ceph
[18:48] <jmlowe> I think I need some help here
[18:48] <joao> wth
[18:48] <joao> how did I login on the laptop?
[18:49] <jmlowe> 2700 pgs: 2216 active+clean, 18 active+remapped+wait_backfill, 414 active+remapped, 31 active+remapped+backfilling, 12 active+degraded, 1 active+degraded+backfilling, 5 active+degraded+remapped+wait_backfill, 2 active+degraded+remapped+backfilling, 1 active+clean+inconsistent; 4152 GB data, 8348 GB used, 74526 GB / 82874 GB avail; 1023KB/s wr, 103op/s; -8984/5392261 degraded (-0.167%)
[18:49] * joao-laptop (~Adium@89-181-145-151.net.novis.pt) Quit ()
[18:49] * rustam (~rustam@94.15.91.30) has joined #ceph
[18:50] <jmlowe> http://pastebin.com/cbbues2m
[18:51] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[18:53] <cjh_> i think the stripe width of 64K is a little small. shouldn't it be more like 1MB if you want fast throughput? The downside is more wasted space right?
[18:54] <jmlowe> any inktank guys around?
[18:55] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[18:56] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[18:58] <jmlowe> sagelap1: you around?
[19:02] <jmlowe> joao: I think I need some help with my crushmap, can you help me?
[19:02] <joao> jmlowe, I can certainly try
[19:02] <jmlowe> http://pastebin.com/cbbues2m
[19:03] <joao> everything looks okay
[19:03] <joao> on that front at least
[19:03] <jmlowe> I restarted osd.17 and it seemed to hop outside of my tree and gwioss2.iu.xsede.org was truncated to gwioss2
[19:03] <joao> hmm
[19:04] <joao> there were some users mentioning something like that last week
[19:04] <jmlowe> I dumped my crushmap and it doesn't look like my tree
[19:04] <joao> didn't follow the conversation though
[19:04] <joao> sagewk, does this seem familiar to you? ^
[19:04] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:04] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[19:04] <joao> look, there's gregaf
[19:05] <joao> he ought to know something about this :p
[19:05] <jmlowe> gregaf: help!
[19:05] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:05] <gregaf1> sorry, my laptop appears to be breaking; what's up?
[19:06] <joao> gregaf1, jmlowe restarted an osd and it came back outside his tree
[19:06] <jmlowe> I restarted an osd and it hopped outside of my tree and created a new host
[19:06] <jmlowe> http://pastebin.com/cbbues2m
[19:06] <joao> see http://pastebin.com/cbbues2m
[19:06] <joao> lol
[19:06] <jmlowe> my current state is now 2700 pgs: 2222 active+clean, 16 active+remapped+wait_backfill, 414 active+remapped, 28 active+remapped+backfilling, 12 active+degraded, 5 active+degraded+remapped+wait_backfill, 2 active+degraded+remapped+backfilling, 1 active+clean+inconsistent; 4152 GB data, 8323 GB used, 74551 GB / 82874 GB avail; 552KB/s wr, 36op/s; -14954/5382605 degraded (-0.278%)
[19:07] <joao> gregaf1, I recall some people mentioning something of the sorts last week, but didn't follow the conversation
[19:07] <joao> so I don't really know if something came out of it
[19:07] * The_Bishop (~bishop@2001:470:50b6:0:80d2:31ad:4852:4a37) has joined #ceph
[19:07] <joao> thought you might have an inkling of what may be happening
[19:08] <gregaf1> hmmm, looks like the upstart scripts (? if that's your init system, jmlowe) are setting the CRUSH location a little differently from how you had it before
[19:08] <sagewk> yeah back in 5 min
[19:08] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit ()
[19:08] <gregaf1> they (optionally, I think) do that now, based on the output of hostname -s or something
[19:08] <jmlowe> ubuntu 12.10, so yes upstart
[19:09] <jmlowe> how do I get things happy again?
[19:09] <gregaf1> I believe there's a way to disable it if you want to
[19:10] <gregaf1> you can set "osd crush update on start = false" in the [osd] section of your ceph.conf
[19:10] <gregaf1> and just move the daemon back into the right location
[19:10] * coyo (~unf@pool-71-170-191-140.dllstx.fios.verizon.net) has joined #ceph
[19:10] <gregaf1> alternatively, accept that short hostname-based naming scheme and put them in the tree instead of the long hostnames, then restart everybody *shrug*
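A minimal sketch of the ceph.conf change gregaf1 describes (the option name is the one he quotes; the surrounding file layout is just illustrative):

    [osd]
        # keep the init/upstart scripts from re-parenting OSDs under `hostname -s` buckets
        osd crush update on start = false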
[19:12] <jmlowe> ceph osd crush move 17 -7 ?
[19:12] <sagewk> jmlowe: cuttlefish now updates osd crush weights on start, but it assumes that the osd lives under a node named after `hostname -s`
[19:13] <gregaf1> jmlowe: I believe you can use names, not IDs
[19:13] <jmlowe> guess I shouldn't have ignored those warnings
[19:13] <gregaf1> and you might need to specify bucket types but I'm not sure; this part should be doc'ed pretty well
[19:13] <sagewk> jmlowe: you can disable that behavior with 'osd update crush on start = false' (or something similar).. but i would recommend structuring your crush map using hostname -s named nodes so that things are more 'normal' moving forward
[19:13] <sagewk> sorry, just catching up :)
[19:14] <jmlowe> so for me ceph osd crush move 17 gwioss2.iu.xsede.org
[19:14] <jmlowe> http://ceph.com/docs/master/rados/operations/control/ specifies the first argument as {id}
[19:16] <jmlowe> ok, this makes it more clear http://ceph.com/docs/master/rados/operations/crush-map/
[19:17] <cjh_> can the rados command write to a specific rbd image?
[19:17] <jmlowe> making it ceph osd crush move osd.17 host=gwioss2.iu.xsede.org
[19:17] * coyo (~unf@00017955.user.oftc.net) Quit (Quit: F*ck you, I'm a daemon.)
[19:18] * BillK (~BillK@124-169-186-145.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[19:19] <jmlowe> hmm, invalid argument
[19:19] <jmlowe> can I just inject a correct crushmap to fix this?
[19:21] <gregaf1> yeah, that'd work, but the CLI should be easier to work with :/
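For reference, the dump/edit/inject cycle gregaf1 is ok'ing looks roughly like this (file names are placeholders; getcrushmap/setcrushmap and crushtool are the tools referenced later in this log):

    ceph osd getcrushmap -o crushmap.bin        # grab the current compiled map
    crushtool -d crushmap.bin -o crushmap.txt   # decompile to editable text
    # edit crushmap.txt: put osd.17 back under the intended host bucket
    crushtool -c crushmap.txt -o crushmap.new   # recompile
    ceph osd setcrushmap -i crushmap.new        # inject the corrected map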
[19:22] <jmlowe> also note that when I dumped my crushmap it did not seem to be updated by the osd upstart script
[19:25] * al (quassel@niel.cx) has joined #ceph
[19:27] <jmlowe> ok, well I moved the host into the same rack with itself
[19:29] <jmlowe> I'm happier with my status I think
[19:31] * dcasier (~dcasier@80.215.8.248) Quit (Ping timeout: 480 seconds)
[19:32] * eegiks (~quassel@2a01:e35:8a2c:b230:6d37:4c4c:b170:8dff) Quit (Ping timeout: 480 seconds)
[19:40] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[19:42] * eschnou (~eschnou@50.205-240-81.adsl-dyn.isp.belgacom.be) has joined #ceph
[19:44] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[19:44] * eegiks (~quassel@2a01:e35:8a2c:b230:61ec:6b94:d956:267e) has joined #ceph
[19:46] <cjh_> does anyone know the stripe width that rados bench uses?
[19:49] * sagelap1 (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[19:50] <gregaf1> it defaults to 4MB objects; you can adjust that with -b (see the rados help text)
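For example, something along these lines (pool name and runtime are placeholders; -b sets the per-object write size in bytes):

    rados bench -p testpool 60 write -b 1048576   # 60 second write test with 1MB objects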
[19:50] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:53] <sjust> loicd: next has a patch changing the rules slightly on osd op submission
[19:53] <sjust> haven't merged next into master since that went in
[19:55] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[19:56] <cjh_> gregaf1: thanks :)
[19:57] <cjh_> those are still broken up into 64K chunks though when they're striped to the rbd object right?
[19:57] <gregaf1> rados bench and rbd don't have anything to do with each other...
[19:57] <cjh_> ok
[19:57] <cjh_> i r confused then
[19:58] <dmick> and striping is not the default for rbd images, benching aside
[19:58] * eschnou (~eschnou@50.205-240-81.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[19:59] <cjh_> i've been playing around a little with rbd format 2 and the stripe width/count and it seems everything i change from the default lowers the write speed haha
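The knobs being poked at there are set at image-creation time; a rough sketch, assuming a cuttlefish-era rbd client with format 2 striping support (pool, image name, and sizes are illustrative):

    rbd create mypool/myimage --size 102400 --format 2 \
        --stripe-unit 1048576 --stripe-count 8   # 1MB stripe unit spread across 8 objects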
[19:59] * dcasier (~dcasier@223.103.120.78.rev.sfr.net) has joined #ceph
[19:59] <humbolt> My cluster is in a weird condition. Can anybody have a look? http://pastebin.com/ZM3gMq8V
[20:02] <humbolt> how can I find out more about the condition my cluster is in?
[20:03] * ifur (~osm@hornbill.csc.warwick.ac.uk) has joined #ceph
[20:03] <cjh_> humbolt: ceph health detail
[20:04] <humbolt> cjh_: and how do I fix the many "pg 7.40 is stuck unclean for 79507.204658, current state active+remapped, last acting [5,4,3]"
[20:04] <humbolt> and some pg 2.42 is stuck unclean since forever, current state active+remapped, last acting [0,1,2]
[20:04] <cjh_> you probably have a node that is down or needs to be restarted
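Typical commands for digging into stuck PGs like the ones quoted above (7.40 is just the pg id from humbolt's output):

    ceph health detail           # lists each stuck/inconsistent pg
    ceph pg dump_stuck unclean   # only the stuck-unclean ones
    ceph pg 7.40 query           # detailed peering/backfill state for a single pg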
[20:05] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[20:05] <humbolt> cjh_: but all nodes seem to be up, have a look: http://pastebin.com/ZM3gMq8V
[20:05] <cjh_> the weights are all messed up it looks like
[20:06] <cjh_> why are some 3 and some .06?
[20:06] <cjh_> dmick: is the striping option mostly for large qemu images?
[20:07] <humbolt> striping?
[20:07] <dmick> humbolt: the "dmick:" there means it's from me
[20:07] <humbolt> the 0.069 are 70GB SSDs, the weight 3 are 3TB disks
[20:08] <cjh_> humbolt: i see. that makes sense
[20:08] <cjh_> why does paris4/5 have no osds?
[20:08] <dmick> cjh_: the idea is to spread out the load across multiple objects so that multiple accesses to the same image don't have to block on the same OSD/object
[20:08] <humbolt> the only entry there that does not actually make any sense is -11 paris3. This does not exist
[20:09] <cjh_> dmick: ok i'm following
[20:10] <ifur> gotten some play money for testing cephfs (or i get to play with cephfs before using the hardware for something else), and would like to check with a dev with insight into cephfs state atm and what's worthwhile to test
[20:10] <cjh_> dmick: i'm searching for options to configure my cluster for nothing but maximum write throughput. my use case is many many servers just constantly writing to this thing as fast as possible. Large GB files
[20:11] <ifur> useful data is very likely to result from this, getting to test it on top of dual-port infiniband FDR, meaning 100Gbits, basically...
[20:11] <jmlowe> can somebody tell me what this means "osd.9 [ERR] 2.1c0 caller_ops.size 3002 > log size 3001"
[20:13] * fridudad_ (~oftc-webi@p4FC2DD1A.dip0.t-ipconnect.de) has joined #ceph
[20:14] <humbolt> how can I gracefully change the hostnames of my osd hosts?
[20:14] <cjh_> humbolt: there's a crush command for that. check the wiki :)
[20:16] <dmick> jmlowe: that's officially "log_weirdness". No, honestly, that's the function. :)
[20:16] <humbolt> crush move does not sound like the right tool for the job
[20:16] <dmick> sorry I don't know what it implies. sjust?
[20:16] <jmlowe> dmick: so do I need to worry?
[20:17] <humbolt> let me rephrase my question: can I stop the cluster, change the hostnames in my ceph.conf, start the cluster and then apply an updated crushmap? will this work without data loss?
[20:17] <humbolt> the OSDs will stay the same, just the hostnames change.
[20:18] <cjh_> humbolt: you need to change the crushmap
[20:18] <cjh_> the ceph.conf won't do it
[20:19] <humbolt> cjh_: so I dump the crushmap, change it and load it back, and then I will change ceph.conf afterwards.
[20:19] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:19] <cjh_> humbolt: yeah that sounds good
[20:20] <humbolt> cjh_: how come ceph does not become all confused in the process? I guess it does not care much, as long as the UUIDs on the OSDs stay the same, is that so?
[20:21] <sjust> jmlowe: probably not something to worry about, checking
[20:21] * tnt (~tnt@91.177.214.32) Quit (Read error: Connection reset by peer)
[20:22] * rturk-away is now known as rturk
[20:23] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[20:23] <dmick> cjh_: humbolt: I could be confused, but I don't think Ceph cares about the names of the buckets in crush matching hostnames. That's up to you for keeping your records straight, but I'm pretty sure they're just bucket names
[20:23] <sjust> jmlowe: make a bug, that shouldn't happen
[20:24] <dmick> once you get to an osd name, then you have to communicate to the right host, but that's taken care of by the cluster map
[20:24] <sjust> jmlowe: won't cause trouble, but we should fix it
[20:26] * rustam (~rustam@94.15.91.30) has joined #ceph
[20:26] <cjh_> dmick: i see. i thought it was more than that
[20:26] * tnt (~tnt@91.176.27.204) has joined #ceph
[20:27] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[20:37] * rturk (~rturk@ds2390.dreamservers.com) Quit (Quit: Coyote finally caught me)
[20:37] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[20:37] * rturk-away is now known as rturk
[20:37] * ChanServ sets mode +o rturk
[20:38] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: my troubles seem so far away, now yours are too...)
[20:39] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[20:39] * ChanServ sets mode +o scuttlemonkey
[20:41] <ifur> what is most important for MDS and OSD for cephs, single core clock speed or number of cores?
[20:42] <ifur> *cephfs
[20:42] <ifur> unrealistic to combine the two...
[20:47] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[20:49] <humbolt> when I stop my cluster and then start it again, the crushmap is gone.
[20:50] <ifur> which sounds very sane to me, because the machines shouldn't assume they haven't moved, if you ask me.
[20:50] * eschnou (~eschnou@50.205-240-81.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:50] <dmick> humbolt: what do you mean the crushmap is "gone"?
[20:51] <humbolt> dmick: Well, I altered the crushmap and loaded it into the system with ceph osd setcrushmap -i
[20:51] <humbolt> but when I restart the cluster, this crushmap is not there
[20:52] <humbolt> and service ceph -a start gives me lines like this on start: create-or-move updating item id 2 name 'osd.2' weight 1 at location {host=paris3,root=default} to crush map
[20:52] <dmick> oh, your changes appear to be gone. What are you using to examine the map?
[20:52] <humbolt> while the host in the new crushmap and in ceph.conf is now named ceph3 not paris3.
[20:52] <humbolt> ceph osd tree
[20:53] <humbolt> dmick: and ceph osd getcrushmap -o ... to get an editable copy
[20:55] <sjust> humbolt: the cuttlefish release notes mention something about this
[20:55] <sjust> there's a config you need to change
[20:55] <humbolt> dmick: I did alter the hostnames in the crushmap and I threw hosts out, as I was using two different hostnames for each host, to differentiate between ssd and hdd osd groups. I am doing this on another level now.
[20:56] <dmick> doh.
[20:57] <dmick> so, this is a new feature, and is the place where hostname actually *does* matter to crush
[20:57] <dmick> in that
[20:57] <dmick> the startup script is trying to be helpful and put osds under the right hosts
[20:57] <dmick> you can disable this autocorrection by setting osd crush update on start to false
[20:57] * ChanServ sets mode +v leseb
[20:57] <dmick> sorry for the misdirection; I'd missed this feature
[20:57] <humbolt> sjust: osd crush update on start = false??
[20:58] <sjust> looks right
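Putting sjust's and dmick's pointers together, a rough sequence for keeping a hand-edited map across restarts (a sketch only; the option name is the one quoted above and in the cuttlefish release notes, and the file name is a placeholder):

    # ceph.conf, on every osd host
    [osd]
        osd crush update on start = false

    # then re-inject the customized map and restart the daemons
    ceph osd setcrushmap -i crushmap.new
    service ceph -a restart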
[20:58] <sjust> humbolt: hmm, might be nice if we could detect whether the crushmap has been customized
[20:58] <sjust> something to consider for later
[20:59] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:00] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:02] * b1tbkt (~quassel@24-216-67-250.dhcp.stls.mo.charter.com) has joined #ceph
[21:02] <humbolt> the hostname on my system maps to 127.0.1.1, not my storage VLAN
[21:03] <humbolt> so that does not make a lot of sense for me here
[21:03] <b1tbkt> what is this telling me: "mdsmap e1: 0/0/1 up" ?
[21:06] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[21:06] <ghartz> hello
[21:07] <ghartz> does someone know when this bug will be fixed? http://tracker.ceph.com/issues/3601
[21:10] <humbolt> now it did put all osds under the host I was running service ceph -a start from
[21:10] <PerlStalker> Is there any reason mons would hold a new election besides a mon losing network connectivity or otherwise going down?
[21:10] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:10] <humbolt> create-or-move updating item id 5 name 'osd.5' weight 1 at location {host=ceph3,root=default} to crush map
[21:10] <humbolt> that does not make much sense to me
[21:11] <humbolt> Am I starting this from the wrong machine?
[21:12] <humbolt> do I need to start the OSDs from the machine they are running on? Or can I do that from a central location?
[21:13] * bergerx_ (~bekir@78.188.101.175) Quit (Quit: Leaving.)
[21:14] <humbolt> This does not make much sense to me: http://pastebin.com/JrKRkkAy
[21:14] <humbolt> BTW, don't get a wrong impression from all my questions. Ceph is awesome and I love it!
[21:15] <humbolt> I just need to get to know it a little better yet.
[21:16] <humbolt> Why is it trying to mount this OSD locally? I am running this on ceph3 and it seems to believe it is on ceph5.
[21:16] <humbolt> Mounting xfs on ceph5:/srv/ceph/osd5
[21:16] <humbolt> df: `/srv/ceph/osd5/.': No such file or directory
[21:21] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[21:27] <humbolt> That's not nice: pgid currently maps to no osd
[21:38] <humbolt> I have osd.1 assigned directly to a host as an item, but also directly assigned as an item to a rack. Reason being, I have a virtual rack for SSDs and a virtual rack for HDDs, so I can assign pools to different HDD types via crushmap ruleset. Is this even allowed? Before I did have two hostnames for each server to make the differentiation on this level (host.ssd, host.hdd).
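For comparison, the pattern usually suggested for splitting SSD and HDD pools keeps each osd under exactly one host bucket per hierarchy and uses separate roots plus a rule per root, roughly like this (a hedged sketch; names, ids, and weights are made up):

    host ceph1-ssd {
            id -5
            alg straw
            hash 0
            item osd.1 weight 0.069
    }
    root ssd {
            id -20
            alg straw
            hash 0
            item ceph1-ssd weight 0.069
    }
    rule ssd {
            ruleset 3
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type host
            step emit
    }
    # ...plus a mirror-image hdd hierarchy and rule, with each pool pointed at the matching ruleset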
[21:38] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Quit: jlogan)
[21:38] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[21:39] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[21:41] <humbolt> Please help me with this last question.
[21:41] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[21:44] <mrjack> re
[21:48] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[21:54] <fridudad_> Are all leveldb issues solved? So is it safe to upgrade from 0.56.6 to 0.61.2?
[21:55] <mrjack> fridudad hi
[21:59] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:01] * drokita1 (~drokita@199.255.228.128) has joined #ceph
[22:01] <mrjack> fridudad_: i upgraded today...
[22:01] <mrjack> fridudad_: but what issues do you refer to? bug-id?
[22:02] <loicd> sjust: I'll take a look at next, thanks for the hint :-D
[22:02] <ifur> you guys need to start showing up at HPC conferences, AND get proper RDMA using verbs.
[22:03] <drokita1> Question: I just noticed that the default location for my keyrings is on my root volume. It seems that a better place for those would be on the OSD volume like the documentation suggests. Is there an easy way to move them, or can they be deleted and recreated without issue?
[22:04] <fridudad_> mrjack: hi not sure i had to search for the bug id
[22:05] <jmlowe> ifur: how is rdma not going to be negated by amdahl's law?
[22:07] <ifur> jmlowe: by Intel's addition to the law saying that data parallelism makes it irrelevant because of more computing power the more data
[22:07] <ifur> i work on actual computing problems, not idealised theoretical cases
[22:08] <nhm_> ifur: we were at SC12
[22:08] <nhm_> ifur: and Sage just presented at LUG
[22:09] <nhm_> ifur: I actually was there too.
[22:09] <ifur> nhm_: sorry then, SC is BIG!!! :P
[22:10] <ifur> nhm_: but wanted to talk to you guys about any possible bottlenecks or hardware gotchas
[22:10] <nhm_> ifur: our booth was stuck in the back by the whisper suites because we don't have seniority for booth preferences. :)
[22:10] <ifur> we are buying some gear for another purpose, but basically i get to test cephfs on FDR IB and high end sandy bridge xeons
[22:11] <ifur> i want to see what i can squeeze out of it
[22:11] <ifur> nhm_: aww, that's too bad... will you be at ISC in Leipzig?
[22:11] <nhm_> ifur: what kind of controller(s) and how much expander oversubscription?
[22:12] <ifur> nhm_: i was going with supermicro dual expander jbods (45 disks) and 3 controllers per OSD (controllers not determined)
[22:12] <nhm_> ifur: We were talking about it but I think we're going to end up being too busy. We'll see.
[22:12] <ifur> and looking like 1 MDS and 3 OSD (yeah i know, small) one jbod per osd
[22:12] <Vjarjadian> anyone here using Ceph over 'normal' WAN? i'm coming to the point where the only thing left to do is test... just wondering if anyone has some tips for making it work better over the WAN... and if the WAN fails, will it effectively turn into 2 clusters and rebalance/double the replication, and then return back to normal on reconnect?
[22:13] <humbolt> Finally! glance with RADOS backend working!
[22:13] * alex_ (~chatzilla@d24-141-198-231.home.cgocable.net) has joined #ceph
[22:13] <nhm_> ifur: the fastest setup I've tested is a SC847A with 4 cheap SAS9207-8i controllers, 24 spinning disks, and 8 Intel 520 SSDs.
[22:14] <nhm_> ifur: no expanders at all.
[22:14] <humbolt> What I like the most about ceph so far, I screwed around quite a lot, misconfigured crushmap, ... but ceph recovered in the end!
[22:14] <ifur> if that's from the blog, wasn't that for RGW only?
[22:14] <nhm_> ifur: arguably the DDN SFA10k we tested with ORNL was faster overall, but not per-node.
[22:15] <alex_> if using btrfs there's no need for a separate journal partition right?
[22:15] <nhm_> ifur: all of the blog posts so far are just rados bench, and using a single controller with 8 drives.
[22:15] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[22:15] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:16] <nhm_> alex_: You still need* one, but it can do the journal and disk writes concurrently.
[22:16] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[22:16] <humbolt> Can you guys tell me what is the recommended way to roll out a ceph cluster?! ceph-deploy?
[22:16] <nhm_> *technically you may still be able to not use the journal, but it's not recommended/tested afaik.
[22:16] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:16] * eschnou (~eschnou@50.205-240-81.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:16] <nhm_> humbolt: I think that's what we are pushing now. I still more or less just build them by hand.
[22:17] <nhm_> it's probably nicer than the old stuff I use. :)
[22:18] <ifur> nhm_: is it also possible to enable OSD caching, so it does copy-on-write before writing to disk, and/or sends the ack before data is committed to physical disk?
[22:18] <alex_> i got two big boxes of disks and two 1U boxes of SSDs, so the journal configuration has got my brain stuck; can I put the journal on a separate node from the storage device?
[22:18] <jmlowe> ifur: I was referring to diminishing returns, my ethernet is sub millisecond, disk latency is 8ms, you are going to get less than a 12.5% improvement even if you drive your network latency down to zero
[22:18] <ifur> some big finished storage from DDN isn't an option here unfortunately :/
[22:19] <nhm_> ifur: we've done some work looking into using COW for the journal<->disk when you are using btrfs with the journal on the same drive.
[22:19] <nhm_> ifur: not implemented yet though.
[22:19] <ifur> jmlowe: so you're a factor of 1000 slower than IB
[22:19] * eschnou (~eschnou@50.205-240-81.adsl-dyn.isp.belgacom.be) has joined #ceph
[22:19] <nhm_> alex_: journals on the same node
[22:20] <ifur> jmlowe: FYI, IB is sub-microsecond -- faster than SSD....
[22:20] <jmlowe> doesn't matter, I'll never see it because spinning disks will mask any improvements in the networking
[22:21] <ifur> jmlowe: and with nodes on UPS, the storage node memory IS the disk cache
[22:22] <ifur> today there is little to no reason to trust spindle disks when they say data has been written, because it's not always the case.
[22:22] <ifur> nhm_: what performance should i get on each node you think, hopefully?
[22:22] <ifur> OSD i mean
[22:23] <alex_> that's not very possible, SSDs are on a separate node; i can't do something like ceph storage1:{/dev/mapper/LUN1} [cache1:/dev/mapper/LUN1] or something
[22:23] <nhm_> ifur: what are you using it for?
[22:23] <alex_> where storage one is where the big disks are, and cache1 is node of ssds
[22:24] <ifur> nhm_: cephfs, test setup, may end up with using ceph object storage afterwards, all depending on how second test goes.
[22:24] <ifur> nhm_: buying some hardware for desktop storage, and i got to decide the hardware and use it for testing and playing with beforehand
[22:25] <ifur> but if the performance is good and it turns out to be reliable, who knows :-)
[22:25] <nhm_> alex_: ultimately the journal is allowing data to hit the node and guarantee atomic operations without having to do direct IO to the OSD itself. see: http://ceph.com/docs/next/rados/configuration/journal-ref/
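The journal settings that page covers look roughly like this in ceph.conf (paths and size are illustrative; $cluster and $id are the usual substitution variables):

    [osd]
        osd journal = /var/lib/ceph/osd/$cluster-$id/journal   # a file, or a raw SSD partition
        osd journal size = 10240                               # MB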
[22:26] <ifur> nhm_: but yes, will hook it up to compute nodes to try and punish it with heavy IO
[22:26] <nhm_> ifur: CephFS performance isn't as good as the block layer right now. There may be some low hanging fruit, but we haven't had enough incoming funding from the HPC side to spend a lot of time looking at it. :/
[22:27] <ifur> nhm_: ball park? :P
[22:27] <ifur> nhm_: just so i have something to try and beat
[22:27] <nhm_> ifur: On a node that can do about 1.8GB/s with rados bench, I can get about 1.6GB/s with the block layer and around 700MB/s with cephfs. No real tuning.
[22:28] <Vjarjadian> nhm, is the reliability there in CephFS? for some, slow may be OK... as long as it's reliable
[22:29] <nhm_> Vjarjadian: Single MDS reliability is better than multi-MDS. Again though, a lot of our focus has been on cloud since that's where the funding is coming from right now. We've been working on trying to get some more funding for CephFS development though.
[22:29] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:30] <ifur> nhm_: ok, to ask it a different way then, what performance per spindle disk is feasible, and back to my questions, where are the likely hardware bottlenecks, or are they in software mostly?
[22:30] <Vjarjadian> i'm surprised the money hasn't got a windows client built yet...
[22:30] <ifur> nhm_: and assume multiple clients if it makes it easier
[22:31] <ifur> if i get the hardware i want, theoretical maximum network bandwidth will be 8GB/s, so the bottleneck won't be there for sure.
[22:31] <nhm_> ifur: On the hardware side, assuming you are using QDR IB with IPoIB, my guess is that the big issues you will run into are related to expanders and controllers.
[22:31] * berant (~blemmenes@vpn-main.ussignalcom.com) Quit (Quit: berant)
[22:32] <ifur> nhm_: that was my guess as well... but would you go with software raid and tune that, or with btrfs?
[22:32] <nhm_> ifur: For some reason it seems like the io patterns ceph causes makes many expanders behave badly.
[22:32] <ifur> nhm_: LSI expanders no good for this? would using sas2 disks and setting up proper zoning help you think?
[22:32] * ChanServ sets mode +v andreask
[22:33] <nhm_> ifur: We did RAID5 on the DDN and that worked pretty well so long as cache mirroring was disabled. On my test node, I'm not using RAID at all, but if you want to do single-replica + RAID, you are best off making lots of small RAID5s (or RAID1, but then why not just do 2x replication?)
[22:33] <jmlowe> ifur: I'm pretty sure you are talking about building the equivalent of this http://www.blogcdn.com/japanese.engadget.com/media/2006/03/jet-beetle.jpg
[22:35] <ifur> nhm_: raid6 has better performance in these types of applications, i either go with raid10 or raid6... raid5 is never divisible by 2, so you end up with a float and not very well aligned raids
[22:36] <nhm_> ifur: Don't know for sure. I haven't been able to do extensive testing yet. What I've seen though is that systems with expanders tend to have more problems reaching good per-drive speeds (maybe 50-60MB/s per drive, sometimes worse!) while systems with disks directly connected to the controllers can reach north of 100MB/s depending on the filesystem being used.
[22:36] <Vjarjadian> here comes capacity vs speed argument :)
[22:37] <ifur> nhm_: my expander experience is not very extensive... however, i've seen what you are talking of there with the difference between raid5 and raid6
[22:37] <nhm_> Vjarjadian: nope, I'm not getting in this argument, I hate raid controllers in general. ;)
[22:37] <alex_> anyone using btrfs in production?
[22:37] <ifur> even with raid6 it craps itself if it's not aligned... advantage of raid6 is double performance on parity and metadata
[22:37] <Vjarjadian> i'm only planning to use my first cluster as archive for backups... should keep things nice and simple without being expensive if it fails. dedup might be interesting on IO tho over WAN
[22:37] <ifur> raid5 should never be used outside a desktop environment
[22:37] <nhm_> alex_: most people that do end up deciding to go with XFS instead. It's just not quite there yet.
[22:38] <nhm_> alex_: it'll be fantastic when it is.
[22:38] <ifur> Vjarjadian: raid6 is ideal with 10 disks, rule of thumb is divisible by 2, so for 8 disks you can use both raid10 and raid5, but with 10 disks raid6 is going to be faster than raid10!!
[22:39] <ifur> so, did not mean raid5 there
[22:39] <Vjarjadian> ifur, but with ceph... wouldn't 10 OSDs be better than raid 10/6?
[22:39] <alex_> xfs doesn't seem very interesting
[22:39] <nhm_> ifur: ceph tends to prefer more OSDs. You'll likely see better performance with 2 5 disk RAID5 vs 1 10 disk RAID6.
[22:39] <ifur> Vjarjadian: we're talking of the underlying storage if not using btrfs
[22:40] <nhm_> Vjarjadian: 10 OSDs generally is better if you are going to do replication, but some folks would rather do RAID5/6 with no replication.
[22:40] <ifur> nhm_: what about partitioning raid6? :) nah im just being silly, hehe
[22:40] <alex_> i use ZFS currently, but it burns through my ssds
[22:41] <nhm_> Vjarjadian: there are limits too. Say if you have 60 drives in 1 node, you may not want 60 OSDs because of the CPU/Memory overhead.
[22:41] <ifur> nhm_: i got lots of stuff to test out!
[22:41] <Vjarjadian> nhm, you have access to much bigger hardware than i do :)
[22:42] <nhm_> Vjarjadian: Too bad I don't get to keep it! Usually it's a 1-off thing for a vendor or a customer. :)
[22:43] <Vjarjadian> you know much about running over WAN? i asked a question earlier about the one i'm planning, but probably nobody in the know saw it
[22:43] <nhm_> ifur: just to warn you, high performance ceph seems to be pretty tricky to get right currently. The simpler designs seem to be the better ones at the moment.
[22:44] <nhm_> Vjarjadian: We basically only recommend it if you can treat it like a big local network.
[22:44] <nhm_> Vjarjadian: so like dark fiber
[22:44] <ifur> nhm_: so, lots of controllers, lots of nodes and the least amount of complexity on the storage infrastructure?
[22:45] <Vjarjadian> i'm running site to site routed VPN, so different subnets but all able to contact each other
[22:45] <nhm_> ifur: so far that seems to be the way to go.
[22:45] <nhm_> ifur: there are a couple of platforms we'll be testing going forward that might change that, but I'm not sure when we'll be able to test them and what exactly we'll see.
[22:46] <nhm_> ifur: it's possible bcache or flashcache could really help too, but I haven't had time to try those either.
[22:48] <ifur> nhm_: would love to see cephfs mature
[22:48] <ifur> otherwise HDF5 on top of ceph object storage wouldn't be that bad either
[22:49] <nhm_> ifur: We all do. :) It's too bad the US sequester hit HPC so hard.
[22:50] <alex_> anyone using opennebula 4 with ceph?
[22:50] <ifur> intel may be interested, could probably sell a proposal to them saying you would like to work with ceph against intel truescale or something :P
[22:50] <sagewk> davidz,sjust: where are we at with #4967, the osd min down reporters thing?
[22:51] <sjust> sagewk: I think david determined that it resulted from the configs being in the osd section
[22:51] <sjust> I think I saw a commit to master with the config names changed?
[22:52] <sagewk> yeah... wasn't sure if there was anything else. can we close the bug then?
[22:52] <davidz> sagewk, sjust: It is also a documentation issue for older releases. Does John need to fix and then close the bug?
[22:53] <sjust> do we update the docs for older versions, or are the online docs always generated from master?
[22:54] <sagewk> always master.
[22:54] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[22:55] <sagewk> it should just mention the config names are different for 0.62 and earlier.
[22:55] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:59] <sagewk> i'll make the change and close the bug
[22:59] <davidz> sagewk: Oh, I didn't know the docs were generated so I assigned #5044 to John to correct them.
[23:00] <davidz> sagewk: I was leaving #4967 to deal with an issue of OSDs getting marked down. But we might close that "Can't reproduce"
[23:02] <sagewk> k
[23:05] <mrjack> i can say, cuttlefish dropped my load on my servers from ~ 4 to ~2
[23:06] <mrjack> where can i get libleveldb-dev for debian squeeze if i want to build squeeze packages with syncfs support myself?
[23:07] * danieagle (~Daniel@186.214.76.12) has joined #ceph
[23:07] <davidz> sagewk: bug 4967 closed
[23:08] <sagewk> mrjack: yehudasa yehuda_hm asked me that very thing this morning..
[23:09] <sagewk> glowell should know
[23:09] <yehuda_hm> heh
[23:09] <yehuda_hm> yehuda_hm
[23:09] <yehuda_hm> add "deb http://ceph.com/debian-leveldb squeeze main" to your /etc/apt/sources.list
[23:09] <yehuda_hm> mrjack: ^^^
[23:10] <glowell> Our leveldb backports are at ceph.com/debian-leveldb. There are leveldb and libsnappy packages for squeeze, natty, and oneiric
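Spelled out, that comes down to something like this (the repo line is the one yehuda_hm quoted; the exact dev package names are a guess based on what mrjack asked for):

    echo "deb http://ceph.com/debian-leveldb squeeze main" >> /etc/apt/sources.list
    apt-get update
    apt-get install libleveldb-dev libsnappy-dev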
[23:14] <mrjack> ceph mon add 0
[23:14] <mrjack> unknown command add
[23:14] <mrjack> hmhmhm..
[23:15] <mrjack> glowell oh thx
[23:18] <mrjack> ah
[23:18] <mrjack> it is missing arguments...
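For reference, mon add expects a monitor name plus an address, roughly (name and IP are placeholders):

    ceph mon add c 192.168.0.10:6789   # ceph mon add <name> <ip>[:<port>]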
[23:19] * DarkAce-Z is now known as DarkAceZ
[23:21] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[23:21] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[23:24] * alex__ (~chatzilla@d24-141-198-231.home.cgocable.net) has joined #ceph
[23:28] * alex_ (~chatzilla@d24-141-198-231.home.cgocable.net) Quit (Ping timeout: 480 seconds)
[23:30] * eschnou (~eschnou@50.205-240-81.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:32] * dikkjo (~dikkjo@46-126-128-50.dynamic.hispeed.ch) Quit (Quit: Leaving)
[23:33] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[23:36] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[23:46] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:55] * lerrie (~Larry@remote.compukos.nl) has joined #ceph
[23:55] * lerrie2 (~Larry@remote.compukos.nl) Quit (Read error: Connection reset by peer)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.