#ceph IRC Log

IRC Log for 2013-04-19

Timestamps are in GMT/BST.

[0:00] * Forced (~Forced@205.132.255.75) Quit ()
[0:01] <ashleyx42> http://youtu.be/5SoJCUROykU
[0:03] * ashleyx42 (ashleyx42@c-76-108-138-141.hsd1.fl.comcast.net) Quit (Killed (tjfontaine (No reason)))
[0:05] * BillK (~BillK@58-7-240-102.dyn.iinet.net.au) has joined #ceph
[0:06] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[0:09] * DarkAce-Z is now known as DarkAceZ
[0:10] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[0:15] * jimyeh (~Adium@112.104.142.211) has joined #ceph
[0:16] * leseb (~Adium@67.23.204.150) has joined #ceph
[0:18] * Cube (~Cube@12.248.40.138) Quit (Read error: Operation timed out)
[0:23] * jimyeh (~Adium@112.104.142.211) Quit (Ping timeout: 480 seconds)
[0:25] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[0:28] * leseb (~Adium@67.23.204.150) Quit (Quit: Leaving.)
[0:30] * gmason (~gmason@12.139.57.253) has joined #ceph
[0:31] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[0:34] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[0:36] * leseb (~Adium@67.23.204.185) has joined #ceph
[0:37] * leseb (~Adium@67.23.204.185) Quit ()
[0:43] * verwilst (~verwilst@dD576962F.access.telenet.be) Quit (Quit: Ex-Chat)
[0:43] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[0:43] * LeaChim (~LeaChim@176.250.150.147) Quit (Read error: Connection reset by peer)
[0:46] * jimyeh (~Adium@112.104.142.211) has joined #ceph
[0:49] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[0:50] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:54] * jimyeh (~Adium@112.104.142.211) Quit (Ping timeout: 480 seconds)
[1:01] * rustam (~rustam@94.15.91.30) has joined #ceph
[1:01] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[1:07] * leseb (~Adium@67.23.204.185) has joined #ceph
[1:10] * loicd (~loic@67.23.204.150) has joined #ceph
[1:13] * dpippenger (~riven@216.103.134.250) Quit (Ping timeout: 480 seconds)
[1:13] * leseb (~Adium@67.23.204.185) Quit (Quit: Leaving.)
[1:14] * fghaas (~florian@67.23.204.150) has joined #ceph
[1:16] <loicd> joshd: does ceph support consistency groups ? ( in the context of https://etherpad.openstack.org/havana-cinder-volume-migration ;-)
[1:16] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) has joined #ceph
[1:16] * fghaas (~florian@67.23.204.150) has left #ceph
[1:17] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) Quit (Read error: Connection reset by peer)
[1:17] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) has joined #ceph
[1:19] * tnt (~tnt@91.177.247.88) Quit (Ping timeout: 480 seconds)
[1:24] * smeven (~diffuse@1.129.235.142) Quit (Ping timeout: 480 seconds)
[1:25] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) Quit (Ping timeout: 480 seconds)
[1:27] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:27] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:27] <gregaf1> loicd: I'm pretty sure he's talking to people; he hasn't been online all week that I've seen
[1:28] <gregaf1> but I'm not sure what RBD would need to do to support that which it doesn't already provide for standard migrations?
[1:28] <loicd> gregaf1: we are in the same room but I can't talk to him because that would disrupt the session ;-)
[1:31] <loicd> gregaf1: the idea is to be able to say snapshot(/dev/rdb0, /dev/rdb1, /dev/rdb2) so that they are all snapshotted at the same point in time, so that an instance that had these three rbd mounted would have a better chance to recover the content, as if a power failure had occurred.
[1:31] <gregaf1> ah, right
[1:31] <loicd> s/rdb/rbd/g
[1:31] <gregaf1> don't you need to quiesce all three drives to do that properly anyway though?
[1:33] * loicd looks up http://en.wiktionary.org/wiki/quiesce and nods "yes"
[1:35] * esammy (~esamuels@host-2-103-102-192.as13285.net) Quit (Quit: esammy)
[1:37] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[1:37] <gregaf1> loicd: so I don't think Ceph needs to do anything; you just quiesce the activity, then do the migrate? (or, if you really want to, snapshot and mount a new clone of the snapshot)
[1:38] * esammy (~esamuels@host-2-103-102-192.as13285.net) has joined #ceph
[1:39] <loicd> gregaf1: that's an option indeed. That would require some coordination between nova & cinder. Unless I'm mistaken, at the moment you can ask cinder for a snapshot of a volume and it does not required to coordinate with the attached instance. The upside is that it would implement consistency groups for all backends ( including LVM ) that do not support this notion.
[1:39] <loicd> s/required/require/
[1:40] <gregaf1> loicd: I ask because I'm just not sure, even if you could snapshot each volume at the exact same time, that doing so is more likely to be successful if you haven't shut down all activity to them
[1:41] <loicd> right
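(A rough sketch of the crash-consistent "group snapshot" being discussed, assuming three RBD-backed filesystems mounted in the guest at hypothetical paths /mnt/vol0-2 and images vol0-2 in the default rbd pool; fsfreeze quiesces each filesystem before the snapshots are taken, as gregaf1 suggests:)

    # quiesce writes inside the instance
    fsfreeze -f /mnt/vol0; fsfreeze -f /mnt/vol1; fsfreeze -f /mnt/vol2
    # take the snapshots back-to-back so they share one point in time
    rbd snap create rbd/vol0@group-1
    rbd snap create rbd/vol1@group-1
    rbd snap create rbd/vol2@group-1
    # resume I/O
    fsfreeze -u /mnt/vol0; fsfreeze -u /mnt/vol1; fsfreeze -u /mnt/vol2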
[1:42] <paravoid> is yehudasa_ around perhaps?
[1:47] * jimyeh (~Adium@112.104.142.211) has joined #ceph
[1:47] * jimyeh1 (~Adium@112.104.142.211) has joined #ceph
[1:47] * jimyeh (~Adium@112.104.142.211) Quit (Read error: Connection reset by peer)
[1:49] * esammy (~esamuels@host-2-103-102-192.as13285.net) has left #ceph
[1:51] * KevinPerks (~Adium@ip-64-134-125-133.public.wayport.net) has joined #ceph
[1:53] <pioto_> so, i'm only getting, like bursts of maybe 2MB/s to cephfs (while doing a mysql benchmark, at least)... that just seems way too low, especially when compared to rbd. is there maybe something i'm missing in my setup?
[1:54] <pioto_> i think i only have 1 mds right now, but since i think you can't have > 1 active mds reliably right now, not sure that should matter?
[1:54] * dosaboy (~dosaboy@67.23.204.150) Quit (Read error: Connection reset by peer)
[1:54] * pioto_ is now known as pioto
[1:54] <gregaf1> yeah, not likely to be an MDS issue
[1:55] <gregaf1> what's your backing system look like, what's the mysql benchmark, and what RBD setup do you get what results on?
[1:55] * jimyeh1 (~Adium@112.104.142.211) Quit (Ping timeout: 480 seconds)
[1:56] <pioto> backing is... some sorta standard spinning hard disks, total of 4 of 'em, on 3 hosts (4 osds)
[1:56] <pioto> this is just a small scale testing cluster
[1:56] <pioto> think they have a 5GB journal for each osd, just in the same btrfs filesystem that each osd uses
[1:57] <pioto> and with mysql pointed to ext4 on rbd running through qemu... i get orders of magnitude faster response
[1:57] <pioto> well, only when i have rbd set with a writethrough cache
[1:57] <pioto> otherwise it's not as great i guess
[1:57] <gregaf1> I think you mean writeback cache there?
[1:58] <pioto> maybe. confirming
[1:58] <pioto> "whatever the default is when you turn on rbd cache"
[1:58] <gregaf1> yeah
[1:58] <pioto> yeah. writeback
[1:58] <pioto> that got it from being "bursty" like cephfs still is being, to "smooth and fast"
[1:59] <gregaf1> so given normal mysql disk access habits, 2MB on 4 spindles (replicated!) sounds about right
[1:59] <pioto> (faster than iscsi to a comparable amount of drives in single zfs pool)
[1:59] <gregaf1> I'm not sure why it's significantly slower than rbd with caching though; that seems odd
[2:00] <gregaf1> this is ceph-fuse or the kernel client?
[2:00] <gregaf1> and I assume you're mounted from inside the VM?
[2:00] <pioto> i've tried both, they both perform about the same
[2:00] <pioto> and yes, mounted inside the vm
[2:00] <pioto> i previously mounted outside the vm, and mounted it in the vm using 9p
[2:00] <pioto> and that was twice as slow, iirc
[2:00] <pioto> i still will likely want an "external" mount in production
[2:00] <pioto> because i can't reliably restrict a given vm to a given subset of the whole cephfs filesystem
[2:01] <pioto> if i could say "client.guest01 can only touch /guests/01 and below', that'd be awesome. but... yeah
[2:01] <pioto> figure that that's far off, if it's planned at all
[2:03] <gregaf1> this is just odd; I think rbd and cephfs should be sending about the same workloads out under this use case then, but I'm just not that familiar with mysql so I don't know how it's ensuring consistency
[2:03] <gregaf1> with the rbd tests I assume you were booting from an RBD image and testing mysql against the root fs?
[2:03] <pioto> both rbd and cephfs are using "the defaults" for things like block sizes and such
[2:04] <pioto> yes, testing against the root fs
[2:04] <pioto> and that wasn't even told about the 4MB block size on the block device
[2:04] <pioto> because my libvirt is too old for me to pass those io tuning settings
[2:05] <lurbs> pioto: Which OS are you running such that it's too old?
[2:05] <pioto> ubuntu 12.04
[2:05] <lurbs> Try the Ubuntu Cloud Archive.
[2:05] <lurbs> https://wiki.ubuntu.com/ServerTeam/CloudArchive
[2:05] <pioto> it has 0.9.8 iirc, and <blockio> is from 0.10.2
[2:05] <pioto> hm
[2:06] * leseb (~Adium@67.23.204.155) has joined #ceph
[2:06] <lurbs> It's for OpenStack, but contains a bunch of supporting components (including Ceph, actually) and is officially supported.
[2:06] <pioto> well, i'm not using openstack, but i assume that still includes libvirt, qemu, etc
[2:06] <pioto> huh
[2:06] <pioto> thanks
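(On 12.04 the cloud archive is an extra apt source; roughly, with the grizzly pocket as an example:)

    apt-get install ubuntu-cloud-keyring
    echo "deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/grizzly main" \
        > /etc/apt/sources.list.d/cloud-archive.list
    apt-get update && apt-get install libvirt-bin qemu-kvm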
[2:06] <gregaf1> lxo: just looking at http://tracker.ceph.com/issues/4601; you have any updates?
[2:06] <pioto> anyways, w/o that tuning, rbd still performs better
[2:07] <pioto> the issue is cephfs. so i don't think libvirt is in play at all here
[2:07] <pioto> i have virtio network interfaces, etc
[2:07] <pioto> so i don't think that's the bottleneck
[2:08] <gregaf1> pioto: what's this mysql benchmark doing?
[2:08] <pioto> lemme see if i can find a doc to point you to.
[2:08] <pioto> it's "the sql-benchmark in the mysql dist", but that doesn't mean anything to you i bet
[2:08] <pioto> http://dev.mysql.com/doc/refman/5.0/en/mysql-benchmarks.html
[2:09] <pioto> it does "stuff"
[2:09] <pioto> basically, creates tables, alters them, inserts, updates, ...
[2:09] <gregaf1> yeah
[2:09] <gregaf1> and did you mention what numbers you saw on RBD?
[2:10] <pioto> "better"... lemme see if i can dig that up
[2:10] <pioto> or, well, just rerun it... one sec
[2:11] <pioto> at least 10MB/s it seems
[2:11] <pioto> eyeballing the graph in virt-manager
[2:11] <pioto> so "~ 5 times faster, at least"
[2:12] <pioto> i feel like it was even faster before, but maybe not
[2:13] * dosaboy (~dosaboy@67.23.204.150) has joined #ceph
[2:18] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) has joined #ceph
[2:18] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[2:19] <paravoid> sjusthm: definitely something wrong with deep scrub scheduling
[2:20] <sjusthm> quite possibly
[2:20] <paravoid> sjusthm: I'm doing a third pass, already found another inconsistent pg
[2:20] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[2:20] <sjusthm> which had not previously been deep scrubbed?
[2:20] <gregaf1> pioto: sorry; I'm just not finding enough technical detail on it
[2:20] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[2:20] <paravoid> correct
[2:20] <gregaf1> if you want to gather up a picture of the operations that are being sent to the OSD we can compare and maybe pick up some of what's happening, but that's all I can think of to check right now
[2:21] <pioto> gregaf1: "do a bunch of inserts in a row" is probably a large part of it
[2:21] <gregaf1> (this involves setting "debug optracker = 20" and then gathering up some statistics on them, iirc — sjust will know)
[2:21] <pioto> hm
[2:21] * dosaboy (~dosaboy@67.23.204.150) Quit (Ping timeout: 480 seconds)
[2:21] <pioto> ok, i think i can '--tell' something that?
[2:21] <pioto> and not have to restart the cluster?
[2:22] <gregaf1> pioto: look up either injectargs or the admin socket "config set"
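(For reference, the two ways gregaf1 mentions to raise that setting on a running OSD, assuming osd.0 and the default admin-socket path; a sketch, not verified against this exact version:)

    # via the monitors
    ceph osd tell 0 injectargs '--debug-optracker 20'
    # or via the OSD's local admin socket
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set debug_optracker 20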
[2:22] <gregaf1> the thing is that from my knowledge of sql workloads I expect them to be random IO 4KB writes and reads, for which 2MB/s on 4 spindles (with two copies) sounds pretty good
[2:22] <pioto> yeah. but how is rbd doing so much better?
[2:22] <pioto> and also, isn't the 2nd copy async?
[2:22] <pioto> but, well, i guess that still ties up the spindle
[2:23] <sjusthm> pioto: no, all copies are sync
[2:23] <gregaf1> pioto: that's what I'm wondering; I want to get joshd in here when he's back
[2:23] <sjusthm> could rbd caching allow it a larger queue depth?
[2:23] <sjusthm> seems like cephfs should have more of an advantage though, not less
[2:23] <gregaf1> sjusthm: I wouldn't really expect so
[2:23] <gregaf1> exactly
[2:24] <gregaf1> I wonder if maybe librbd isn't getting passed all the flushes it should
[2:24] <sjusthm> how many ios will the kernel keep outstanding to the device?
[2:24] <sjusthm> oh, actually
[2:24] <pioto> well, fwiw, this approximate difference is similar to what i've observed before with iscsi vs. nfs
[2:24] <sjusthm> I mean
[2:24] <gregaf1> and so has incorrectly higher performance from QEMU; I know that's a problem on some older systems
[2:24] <sjusthm> oops
[2:24] <sjusthm> nvm
[2:24] <pioto> with them both backed by zfs (a zfs volume or filesystem, respectively)
[2:24] <gregaf1> pioto: same hardware backing them?
[2:24] * leseb (~Adium@67.23.204.155) Quit (Quit: Leaving.)
[2:24] <pioto> approximately
[2:25] <pioto> i mean, that'd be 4 data drives in a "raid 10" like setup, but a single host
[2:25] <pioto> rbd edged out iscsi a bit here
[2:25] <gregaf1> hmm, maybe this is about mysql itself doing different things, then
[2:25] <pioto> when i tested them all with the same libvirt host
[2:25] <pioto> well. mysql is made aware of things differently with cephfs...
[2:25] <pioto> the 'preferred io size'
[2:26] <gregaf1> yeah, but I'm thinking more about whether it's doing directIO or running fsync gratuitously
[2:26] <pioto> on rbd: IO Block: 4096
[2:26] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) Quit (Read error: Operation timed out)
[2:26] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[2:26] <pioto> on cephfs: IO Block: 4194304
[2:26] <pioto> for a log file i happened to pick
[2:26] <pioto> hm
[2:26] <pioto> it has options to do directio or not
[2:26] <pioto> lemme see if we have it set one way or the other
[2:27] <pioto> we don't
[2:27] <pioto> so it'd be "whatever the default" is...
[2:28] <pioto> http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_flush_method
[2:28] <pioto> well, that's innodb. this test suite may be using myisam instead...
[2:29] <gregaf1> okay, I've gotta go, sorry
[2:29] <gregaf1> good evening everybody
[2:30] <pioto> well. i'll try tweaking that and see what happens.
[2:30] <pioto> thanks for the pointers
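(If the benchmark is hitting InnoDB, the flush method pioto plans to tweak lives in my.cnf; a hedged example, not a recommendation:)

    [mysqld]
    # bypass the OS page cache for data files; the Linux default is fdatasync
    innodb_flush_method = O_DIRECT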
[2:35] * rustam (~rustam@94.15.91.30) has joined #ceph
[2:48] * loicd (~loic@67.23.204.150) Quit (Quit: Leaving.)
[2:49] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) has joined #ceph
[2:55] * KevinPerks (~Adium@ip-64-134-125-133.public.wayport.net) Quit (Quit: Leaving.)
[2:56] * gmason (~gmason@12.139.57.253) Quit (Quit: Computer has gone to sleep.)
[2:57] * jimyeh (~Adium@112-104-142-211.adsl.dynamic.seed.net.tw) Quit (Ping timeout: 480 seconds)
[3:44] * jtang1 (~jtang@14.0.144.22) has joined #ceph
[3:45] * jimyeh (~Adium@60-250-129-63.HINET-IP.hinet.net) has joined #ceph
[3:49] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[3:53] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[3:54] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) has joined #ceph
[3:57] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit ()
[3:58] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[4:01] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[4:02] * treaki__ (220eaa3120@p4FDF7CC1.dip0.t-ipconnect.de) has joined #ceph
[4:06] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Quit: Leaving.)
[4:06] * treaki_ (109ce9e7cd@p4FF4BB8F.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[4:27] * xiaoxi (~xiaoxi@shzdmzpr01-ext.sh.intel.com) has joined #ceph
[4:28] <xiaoxi> excuse me, I want to ask: can one pg only have a single outstanding IO (being processed by an OSD tp thread) at a time?
[4:37] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[5:06] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[5:16] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[5:18] * rekby (~Adium@2.93.58.253) has joined #ceph
[5:49] * portante (~user@75-150-32-73-Oregon.hfc.comcastbusiness.net) has joined #ceph
[5:56] * senner1 (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) has joined #ceph
[5:57] * senner1 (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) Quit ()
[5:57] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) Quit (Remote host closed the connection)
[6:01] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[6:01] * rekby (~Adium@2.93.58.253) Quit (Quit: Leaving.)
[6:07] * jtang1 (~jtang@14.0.144.22) Quit (Ping timeout: 480 seconds)
[6:09] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Ping timeout: 480 seconds)
[6:33] * coyo (~unf@00017955.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:42] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[6:55] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) has joined #ceph
[7:44] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[7:50] * tnt (~tnt@91.177.247.88) has joined #ceph
[7:59] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:00] * hflai (~hflai@alumni.cs.nctu.edu.tw) Quit (Ping timeout: 480 seconds)
[8:02] * hflai (~hflai@alumni.cs.nctu.edu.tw) has joined #ceph
[8:02] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[8:20] * madkiss (~madkiss@2001:6f8:12c3:f00f:c010:c230:ca7c:5b09) has joined #ceph
[8:25] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[8:26] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Man who run behind car get exhausted)
[8:32] * xiaoxi (~xiaoxi@shzdmzpr01-ext.sh.intel.com) Quit (Ping timeout: 480 seconds)
[8:35] * Nuxr0 (~nux@85.13.211.140) has joined #ceph
[8:36] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (Ping timeout: 480 seconds)
[8:36] * NuxRo (~nux@85.13.211.140) Quit (Ping timeout: 480 seconds)
[8:37] * masterpe (~masterpe@2001:990:0:1674::1:82) Quit (Remote host closed the connection)
[8:37] * masterpe (~masterpe@2001:990:0:1674::1:82) has joined #ceph
[8:37] * absynth (~absynth@irc.absynth.de) Quit (Remote host closed the connection)
[8:37] * absynth (~absynth@irc.absynth.de) has joined #ceph
[8:41] * piti (~piti@82.246.190.142) Quit (Ping timeout: 480 seconds)
[8:43] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[8:44] * Anticimex (anticimex@netforce.csbnet.se) Quit (Remote host closed the connection)
[8:44] * Anticimex (anticimex@netforce.csbnet.se) has joined #ceph
[8:46] * shiny (~shiny@office.bgservice.net) has joined #ceph
[8:46] * shiny is now known as sh1ny
[8:47] * sh1ny is now known as Guest2787
[8:48] * Guest2787 is now known as bgshiny
[8:48] <bgshiny> hello, can anyone help me with setting up radosgw?
[8:49] <bgshiny> i have a ceph cluster running, just the authentication to keystone is not working
[8:49] * piti (~piti@82.246.190.142) has joined #ceph
[8:50] * jimyeh (~Adium@60-250-129-63.HINET-IP.hinet.net) Quit (Quit: Leaving.)
[8:55] <bgshiny> i am getting those
[8:55] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Remote host closed the connection)
[8:55] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:56] <bgshiny> ERROR: signer 0 status = SigningCertNotTrusted
[8:56] <bgshiny> ERROR: problem decoding
[8:56] <bgshiny> ceph_decode_cms returned -22
[8:56] <bgshiny> ERROR: keystone revocation processing returned error r=-22
[8:57] <bgshiny> any idea what i am missing? it's ceph 0.56.4 on ubuntu 12.04
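(Those SigningCertNotTrusted / ceph_decode_cms / revocation errors come from radosgw trying to check keystone's token revocation list; the usual fix is to import keystone's CA and signing certificates into an NSS db and point ceph.conf at it. A sketch with example paths and section names only:)

    mkdir -p /var/ceph/nss
    openssl x509 -in /etc/keystone/ssl/certs/ca.pem -pubkey | \
        certutil -d /var/ceph/nss -A -n ca -t "TCu,Cu,Tuw"
    openssl x509 -in /etc/keystone/ssl/certs/signing_cert.pem -pubkey | \
        certutil -A -d /var/ceph/nss -n signing_cert -t "P,P,P"

    # ceph.conf, in the radosgw client section:
    [client.radosgw.gateway]
        rgw keystone url = http://keystone-host:35357
        rgw keystone admin token = {keystone admin token}
        rgw keystone accepted roles = Member, admin
        nss db path = /var/ceph/nss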
[8:58] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[8:58] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit ()
[9:00] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:01] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit ()
[9:07] * verwilst (~verwilst@dD576962F.access.telenet.be) has joined #ceph
[9:17] * LeaChim (~LeaChim@176.250.150.147) has joined #ceph
[9:21] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:31] * bgshiny (~shiny@office.bgservice.net) Quit (Ping timeout: 480 seconds)
[9:32] * tnt (~tnt@91.177.247.88) Quit (Ping timeout: 480 seconds)
[9:33] * bgshiny (~shiny@office.bgservice.net) has joined #ceph
[9:34] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:35] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:36] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[9:37] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:38] * jtang1 (~jtang@14.0.144.18) has joined #ceph
[9:44] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:45] * coyo (~unf@00017955.user.oftc.net) Quit (Remote host closed the connection)
[9:48] * l0nk (~alex@83.167.43.235) has joined #ceph
[9:55] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:56] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:56] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit ()
[9:59] * coyo (~unf@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[10:04] * esammy (~esamuels@host-2-103-102-192.as13285.net) has joined #ceph
[10:05] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[10:07] * mib_l73cmr (d4af59a2@ircip2.mibbit.com) has joined #ceph
[10:08] * mib_l73cmr (d4af59a2@ircip2.mibbit.com) has left #ceph
[10:09] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[10:24] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[10:28] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[10:31] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[10:38] * jtang2 (~jtang@124.217.186.140) has joined #ceph
[10:39] * jtang1 (~jtang@14.0.144.18) Quit (Ping timeout: 480 seconds)
[10:46] * jtang2 (~jtang@124.217.186.140) Quit (Ping timeout: 480 seconds)
[10:49] * vo1d (~v0@91-115-226-177.adsl.highway.telekom.at) has joined #ceph
[10:56] * tnt (~tnt@91.177.247.88) has joined #ceph
[10:56] * v0id (~v0@91-115-224-148.adsl.highway.telekom.at) Quit (Ping timeout: 480 seconds)
[11:11] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[11:20] <alexxy> hi all
[11:20] <alexxy> seems there is a memory leak in mon and mds in ceph 0.60
[11:20] * rekby (~Adium@2.93.58.253) has joined #ceph
[11:20] <alexxy> mon eats up to 6Gb per day
[11:21] <alexxy> mds up to 1G
[11:21] <joao> alexxy, we've seen that happening not long ago, but have been unable to find a cause for the monitor's memleak so far
[11:22] <alexxy> well it works somehow after restart
[11:22] <alexxy> but after a day it eats again ~6G
[11:22] <joao> alexxy, can you please run a 'ceph -m IP:PORT heap stats', with IP and PORT being the ip and port of the monitor leaking mem?
[11:24] <alexxy> ghmmm
[11:24] <alexxy> seems it's dead and not responding
[11:25] <joao> but the monitor is running right?
[11:25] <joao> is it in the quorum?
[11:25] <joao> ceph -s
[11:28] <alexxy> 2013-04-19 13:09:07.776655 7f2a33fff700 1 mds.-1.0 suicide. wanted down:dne, now up:boot
[11:28] <alexxy> 2013-04-19 13:22:50.570703 7f238e8bb700 -1 mds.-1.0 *** got signal Terminated ***
[11:28] <alexxy> 2013-04-19 13:22:50.570742 7f238e8bb700 1 mds.-1.0 suicide. wanted down:dne, now up:boot
[11:28] <alexxy> 2013-04-19 13:26:28.248541 7f3790be8700 -1 mds.-1.0 *** got signal Terminated ***
[11:28] <alexxy> 2013-04-19 13:26:28.248579 7f3790be8700 1 mds.-1.0 suicide. wanted down:dne, now up:boot
[11:28] <alexxy> it's in the mds log
[11:28] <alexxy> i tried restarting them
[11:28] <alexxy> config has 18 osd
[11:28] <alexxy> 1 mon
[11:28] <alexxy> 1 mds
[11:28] <joao> err, I meant to run that command on the monitor
[11:29] <alexxy> i did that
[11:29] <joao> taking it one step at a time, starting with what I might have more insight to share: in this case, the monitor
[11:29] <alexxy> but the mon doesn't respond
[11:29] <joao> not even to 'ceph -s'?
[11:29] <joao> can you pastebin the mon's log?
[11:30] <alexxy> ceph -s hangs
[11:30] <alexxy> http://bpaste.net/show/92511/
[11:30] <alexxy> mon log
[11:30] * athrift (~nz_monkey@222.47.255.123.static.snap.net.nz) Quit (Remote host closed the connection)
[11:30] <alexxy> mds log
[11:30] <alexxy> http://bpaste.net/show/92512/
[11:31] <alexxy> ceph version is 0.60
[11:32] <joao> those 'Got Signal Terminated' messages at the bottom of the monitor's log, did you induce them?
[11:32] <joao> did you kill the monitor?
[11:33] * athrift (~nz_monkey@222.47.255.123.static.snap.net.nz) has joined #ceph
[11:37] <alexxy> i tried restarting them
[11:37] <alexxy> since they didn't respond, the init script killed them
[11:38] <joao> right
[11:38] <joao> the mds appears to be busted for some reason I cannot fathom
[11:38] <joao> alexxy, you might want to increase the debug levels on the monitor and mds
[11:39] <joao> debug mon = 10 , debug ms = 1 , debug mds = 10 should suffice
[11:41] <alexxy> http://bpaste.net/show/92514/
[11:41] <alexxy> mds
[11:42] <alexxy> http://bpaste.net/show/92515/
[11:42] <alexxy> mon
[11:44] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:44] <joao> alexxy, run 'ceph -s' again?
[11:44] <joao> can't see any reason why that shouldn't go through
[11:44] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:45] <alexxy> it only shows http://bpaste.net/show/92517/
[11:47] <joao> crank up debug with 'debug auth = 20' and 'debug mon = 20'
[11:51] <alexxy> http://bpaste.net/show/92520/
[11:51] <alexxy> mds
[11:51] <alexxy> http://bpaste.net/show/92521/
[11:51] <alexxy> mon
[11:52] <alexxy> http://bpaste.net/show/92522/
[11:52] <alexxy> ceph.conf
[11:54] * rekby (~Adium@2.93.58.253) Quit (Quit: Leaving.)
[11:55] <joao> alexxy, tail -f your mon log, and then issue a 'ceph -s'
[11:56] <joao> also, run instead 'ceph -s --debug-monc 20'
[11:56] <joao> there's just nothing apparently wrong with the monitor
[11:56] <alexxy> mon log shows http://bpaste.net/show/92524/
[11:56] <alexxy> http://bpaste.net/show/92525/
[11:56] <alexxy> ceph -s
[11:57] <alexxy> http://bpaste.net/show/92526/
[11:57] <alexxy> ceph -s --debug-monc 20
[11:59] * RH-fred (~fred@95.130.8.50) has joined #ceph
[11:59] <RH-fred> Hi !
[11:59] <joao> alexxy, can you set 'debug ms = 10'?
[11:59] <joao> hello RH-fred
[12:00] <alexxy> sure
[12:00] <alexxy> should i restart ceph-mon and mds?
[12:00] <joao> just the mon is fine
[12:01] <alexxy> http://bpaste.net/show/92527/
[12:01] <alexxy> mds
[12:01] <alexxy> http://bpaste.net/show/92528/
[12:01] <alexxy> mon
[12:06] * Yen (~Yen@ip-83-134-92-50.dsl.scarlet.be) Quit (Ping timeout: 480 seconds)
[12:06] <RH-fred> i have performance issues with ceph and could not find the bottleneck.
[12:07] <RH-fred> with DD I can write a bit less than 100MB/s on my disks
[12:07] <RH-fred> and ceph OSD only writes at 30/35MB/s
[12:08] <andreask> how do you test?
[12:08] <alexxy> joao: any ideas about how to reset ceph to normal operations?
[12:08] <joao> alexxy, still trying to figure out what is happening
[12:08] <joao> the monitor seems to be working, but doesn't follow up on received messages
[12:09] <RH-fred> with iftop I see that there are two OSD threads that are writing in parallel :
[12:09] <RH-fred> http://img.hilpert.me/images/capturojo.png
[12:09] <alexxy> seems mds cannot auth
[12:09] <RH-fred> "iotop" Sorry
[12:09] <joao> alexxy, are all the daemons the same version?
[12:09] <alexxy> yep
[12:09] <alexxy> they were built from the same source
[12:09] <joao> master? next?
[12:09] <alexxy> 0.60
[12:10] <joao> right
[12:10] <alexxy> + patch to fix mds crash
[12:10] <joao> this particular message is throwing me off: ".reader couldn't read tag, Success"
[12:11] * Yen (~Yen@ip-83-134-92-184.dsl.scarlet.be) has joined #ceph
[12:11] <joao> it seems the messenger is having issues reading the message's tag, and then it appears to close the connection
[12:11] <joao> why that happens is beyond me
[12:12] <alexxy> heh
[12:13] <alexxy> why didn't it happen before?
[12:13] <joao> did it just start happening after you applied the mds patch or something?
[12:14] <andreask> RH-fred: only one server with 2 osds?
[12:15] <RH-fred> I currently am testing ceph on a little lab
[12:15] <RH-fred> so I have 3 servers with 2 disks each (all the same)
[12:15] <RH-fred> I was testing with all the disks on xfs yesterday
[12:16] <RH-fred> but only had about 30MB/s write speed
[12:16] <RH-fred> (remotely, from my guest computer)
[12:16] <RH-fred> so today
[12:16] <RH-fred> i am trying to find out why it is so slow
[12:17] <RH-fred> and have rebuilt a cluster with only two dedicated disks on btrfs
[12:17] <alexxy> joao: it worked fine for about a week
[12:17] <alexxy> but eats memory
[12:17] <RH-fred> and I see that running a write bench on a particular OSD
[12:18] <alexxy> after the restart i get this situation
[12:18] <andreask> RH-fred: had a look at iostat?
[12:18] <RH-fred> it only reaches 30MB/s write speed
[12:18] <RH-fred> http://img.hilpert.me/images/capturojo.png
[12:18] <RH-fred> I think this problem comes from the fact that there are two threads writing at the same time
[12:19] <joao> alexxy, regarding memory consumption: http://tracker.ceph.com/issues/3609
[12:19] <joao> this also works for the mds with 'ceph mds tell 0 ...'
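(The heap commands that ticket refers to, roughly; the daemons are linked against tcmalloc, and 'heap release' asks it to hand freed memory back to the OS. MONIP is a placeholder for the monitor's address:)

    ceph -m MONIP:6789 heap stats
    ceph -m MONIP:6789 heap release
    # same idea for the mds
    ceph mds tell 0 heap stats
    ceph mds tell 0 heap release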
[12:19] <andreask> RH-fred: I expect you have the journal on the osd-disk?
[12:19] <RH-fred> non
[12:19] <RH-fred> no
[12:19] <joao> but you'd have the monitor replying to requests first
[12:19] <joao> and I don't see what's happening there at first glance
[12:20] <RH-fred> for these tests i put the journal on tmpfs
[12:20] * diegows (~diegows@190.190.2.126) has joined #ceph
[12:20] <andreask> I see
[12:20] <andreask> how did you verify the 100MB/s?
[12:20] <RH-fred> journal dio = false
[12:20] <RH-fred> with dd
[12:21] <andreask> using directio or similar?
[12:21] <RH-fred> yes
[12:21] <RH-fred> dd if=/dev/zero of=ddfile bs=4M count=1024 oflag=direct
[12:21] <RH-fred> 1924874240 bytes (1.9 GB) copied, 20.2316 s, 95.1 MB/s
[12:22] <RH-fred> ls
[12:22] <andreask> and your current configuration for ceph?
[12:23] <RH-fred> http://pastebin.com/ysxxRUQ5
[12:23] <BillK> I have an upgrade window ... any real advantage going with .60 if I am on .58?
[12:38] <andreask> RH-fred: where is the journal on tmpfs?
[12:38] <RH-fred> no idea !
[12:42] <andreask> RH-fred: if you don't specify it it defaults to /var/lib/ceph/osd/$cluster-$id/journal
[12:45] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[12:45] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[12:48] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Remote host closed the connection)
[13:02] * smerft (559eb342@ircip4.mibbit.com) has joined #ceph
[13:18] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[13:19] <mattch> RH-fred: didn't we do all this yesterday? :)
[13:25] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:27] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:43] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[13:43] <alexxy> joao: any ideas if the issue will be fixed by updating to master?
[13:43] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[13:44] <joao> next might be worth a try, there have been some bug fixes coming in
[13:44] <alexxy> ok is it safe?
[13:45] <joao> you're running on 0.60, right?
[13:45] <alexxy> yep
[13:45] <joao> afaik, next is basically a frozen 0.60 in which we're pouring bug fixes, so it should be safe
[13:46] <joao> *feature frozen 0.60
[13:46] <alexxy> so i should just set git branch to next?
[13:47] <joao> git remote update ; git checkout next (or git checkout next -b foo, whichever you prefer)
[13:48] <alexxy> ok
[13:48] <alexxy> ok
[13:49] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[13:49] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Read error: Operation timed out)
[13:54] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:01] <smerft> What is the correct PG value for 24 OSDs with replication 3? 24*100/3 => 800 and then 1024 to be 2^X?
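(That is the usual rule of thumb: 24 OSDs x 100 / 3 replicas = 800, rounded up to the next power of two gives 1024; it gets applied when the pool is created, e.g. with a made-up pool name:)

    ceph osd pool create mypool 1024 1024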
[14:02] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[14:04] * RH-fred (~fred@95.130.8.50) Quit (Read error: Connection timed out)
[14:04] * RH-fred (~fred@95.130.8.50) has joined #ceph
[14:04] <alexxy> joao: seems the issue is resolved with the next branch
[14:10] * capri_on (~capri@212.218.127.222) Quit (Ping timeout: 480 seconds)
[14:11] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[14:13] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:16] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:19] * capri (~capri@212.218.127.222) has joined #ceph
[14:20] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:24] <RH-fred> mattch: I am doing more tests with btrfs today :)
[14:25] <RH-fred> And today I set journal dio = false to use a tmpfs journal
[14:26] <mattch> RH-fred: Just to check - you set that option /and/ made a tmpfs and mounted it in the default journal path/set a new journal path to its mount point, right? :)
[14:26] <RH-fred> no
[14:26] <RH-fred> didn't know that I have to do this...
[14:29] * ScOut3R_ (~ScOut3R@212.96.47.215) has joined #ceph
[14:29] * capri (~capri@212.218.127.222) Quit (Quit: Verlassend)
[14:30] <mattch> RH-fred: turning off direct io to the journal is just something you have to do /if/ you set up a tmpfs journal - it doesn't make it happen
[14:32] * mega_au (~chatzilla@94.137.199.2) Quit (Ping timeout: 480 seconds)
[14:32] <mattch> RH-fred: I realised on the way out last night I had said that 35MB/s through ceph osd bench and 90MB/s through dd are probably equivalent, but I didn't explain why... that each osd write is actually 2 writes - one to the journal, then one to the osd - and if they're on the same disk, you'd expect to see half (or less) the performance. Hopefully using a tmpfs journal in testing will pull the osd number up a bit (though be aware not to use tmpfs journals in production)
[14:33] * thorus (~jonas@pf01.intranet.centron.de) has left #ceph
[14:36] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[14:37] <RH-fred> ok
[14:38] <RH-fred> So if I write a file of 1GB, I will have 1GB in the OSD and 1GB in the journal?
[14:38] <RH-fred> (this would require a huge amount of ram :( )
[14:39] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:40] * portante (~user@75-150-32-73-Oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[14:48] <RH-fred> mattch : is this correct :
[14:49] <RH-fred> osd journal /var/lib/ceph/osd/$cluster-$id/journal/journalold
[14:49] <RH-fred> ?
[14:51] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[14:51] <mattch> RH-fred: if you've mounted your tmpfs fs at /var/lib/ceph/osd/$cluster-$id/journal/ then yes
[14:51] <mattch> RH-fred: Be aware to set journal size to no bigger than the tmpfs size too
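(Putting that together, a rough sequence for a throwaway tmpfs journal on one OSD, assuming osd.1 and a 1 GB journal; the journal disappears on reboot, so this is for benchmarking only:)

    service ceph stop osd.1
    mkdir -p /mnt/osd1-journal
    mount -t tmpfs -o size=1100m tmpfs /mnt/osd1-journal
    # ceph.conf, under [osd.1]:
    #   osd journal = /mnt/osd1-journal/journal
    #   osd journal size = 1024
    #   journal dio = false        # tmpfs has no O_DIRECT support
    ceph-osd -i 1 --mkjournal
    service ceph start osd.1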
[14:52] <RH-fred> root@ceph1:/# ceph osd.0 restart
[14:52] <RH-fred> 2013-04-19 14:51:43.380823 7f151a9ef760 -1 Errors while parsing config file!
[14:52] <RH-fred> 2013-04-19 14:51:43.380846 7f151a9ef760 -1 unexpected character while parsing putative key value, at char 63, line 37
[14:52] <RH-fred> unrecognized command
[14:52] <RH-fred> :'(
[14:52] <RH-fred> yes I understood
[14:52] <RH-fred> the stuff with tmpfs
[14:53] <mattch> RH-fred: Pastebin your config
[14:53] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[14:53] * ChanServ sets mode +o scuttlemonkey
[14:53] <RH-fred> http://pastebin.com/c6cZQAZF
[14:53] <RH-fred> it is [osd.1]
[14:54] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:54] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[14:56] <mattch> missing equals signs
[14:57] <RH-fred> loool
[14:57] <RH-fred> i'm so sorry :(
[15:00] <alexxy> joao: ceph failed again
[15:00] <alexxy> http://bpaste.net/show/92565/
[15:00] <alexxy> mds
[15:01] <alexxy> http://bpaste.net/show/92566/
[15:01] <alexxy> mon
[15:12] <RH-fred> mattch: yes! With the journal in tmpfs:
[15:13] <RH-fred> osd.1 [INF] bench: wrote 1024 MB in blocks of 4096 KB in 10.087065 sec at 101MB/sec
[15:14] <mattch> RH-fred: If you want that kind of performance then you're going to need a second disk to journal for you, potentially SSD
[15:14] <mattch> (in your production cluster that is)
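(In ceph.conf that usually means pointing each OSD's journal at its own partition on the SSD instead of a file on the data disk; a sketch with invented device names:)

    [osd.0]
        osd journal = /dev/sdc1
    [osd.1]
        osd journal = /dev/sdc2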
[15:15] <RH-fred> that is annoying
[15:15] <RH-fred> I don't understand why...
[15:15] <RH-fred> I thought that the journal was only metadata...
[15:17] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[15:40] <imjustmatthew> joao: around?
[15:50] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[15:51] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[15:52] * smerft (559eb342@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[15:55] * drokita (~drokita@199.255.228.128) has joined #ceph
[16:01] * Cotolez (~aroldi@81.88.224.110) has joined #ceph
[16:01] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[16:09] <Cotolez> hi everybody
[16:10] <Cotolez> I have a big problem: I have a 4-node cluster (3mon+mds and 4 storage nodes)
[16:11] <Cotolez> I have added 2 nodes, updated the crushmap and reinjected it
[16:12] <Cotolez> The cluster started to spread the data onto the 2 new nodes, but due to a hardware failure on a disk, i had to reboot 1 of the new nodes
[16:13] <Cotolez> this caused the remaining dev names to change, and ceph started to flap osds
[16:13] * mikedawson (~chatzilla@206.246.156.8) has joined #ceph
[16:13] <Cotolez> (on all nodes, new and old)
[16:15] <Cotolez> Now I'm trying to exclude the new nodes, so I set their weight to 0 and marked the osds out
[16:15] <Cotolez> I see network traffic from the new nodes back to the old
[16:16] <Cotolez> But i can't mount the cephfs anymore
[16:16] <Cotolez> mount error 5 = Input/output error
[16:17] <Cotolez> This is my crushmap http://pastebin.com/fLy5nmbz
[16:17] <Cotolez> and this is the output from 'ceph health detail' (it is stripped down): http://pastebin.com/HvCqWQXW
[16:18] <Cotolez> Please, can anyone help me to at least regain access to the data?
[16:23] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[16:23] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[16:27] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[16:30] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[16:33] <joao> imjustmatthew, yeah
[16:35] <Cotolez> update: if I query a pg in 'stale' status, I see the message "pgid currently maps to no osd"
[16:38] <imjustmatthew> joao: are those commits on wip-3495 more fixes I should test or something else?
[16:38] <joao> imjustmatthew, just cleanup
[16:40] * BMDan (~BMDan@74.121.199.170) has joined #ceph
[16:42] * leseb (~Adium@ip-64-134-128-29.public.wayport.net) Quit (Quit: Leaving.)
[16:43] <paravoid> yehudasa_: ping?
[16:43] <yehuda_hm> paravoid: ?
[16:43] <paravoid> hey
[16:43] <paravoid> I'm Faidon
[16:43] <yehuda_hm> hey
[16:43] <paravoid> reporter of #4754 :)
[16:44] <yehuda_hm> are you by any chance compiling from source?
[16:44] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[16:45] <imjustmatthew> joao: k, thanks. So far that branch (minus the commits from this morning) is stable and has resolved the bug for me. Thanks again for your help!
[16:45] <paravoid> I'm not
[16:45] <joao> imjustmatthew, thanks for testing
[16:45] <yehuda_hm> paravoid: which version are you using
[16:45] <paravoid> but I could...
[16:45] <paravoid> 0.56.4
[16:47] <mikedawson> imjustmatthew and joao: just got on... did you resolve the mon crash or the piling up of ceph-create-keys?
[16:47] <joao> mikedawson, a mon crash (#3495)
[16:47] <BMDan> I know that RBD cache is implemented in the [client] section. But to enable it, I only need to define it on the client's (as this is OpenStack, the Nova machine's) ceph.conf, right?
[16:47] <BMDan> Asking because I did that and saw very little benefit.
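(For what it's worth, a minimal sketch of that client-side setting; it goes in /etc/ceph/ceph.conf on the compute node running qemu. With the qemu 1.0 shipped in 12.04 the librbd cache is controlled from here rather than by the disk's cache= attribute, which only started driving librbd around qemu 1.2, if I recall correctly:)

    [client]
        rbd cache = true
        # optional; 32 MB is believed to be the shipped default
        rbd cache size = 33554432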
[16:48] <paravoid> yehuda_hm: 0.56.4
[16:48] <imjustmatthew> mikedawson: just the mon crash, but that symptomatically helped with the piling up of mon keys for me
[16:48] <yehuda_hm> paravoid: I'm looking at it now, see how hard it would be to change it, though I don't think we're going to push any such change to bobtail
[16:49] <yehuda_hm> paravoid: maybe an option to turn of container stats for container listing
[16:49] <yehuda_hm> paravoid: which should be trivial
[16:49] <paravoid> to do what?
[16:49] <imjustmatthew> mikedawson: I've been wondering about just disabling that ceph-create-keys script, but haven't had time to read the script through to make sure I understand what it does
[16:50] <yehuda_hm> paravoid: the reason it's so slow is that it needs to go and check the stats for every container
[16:50] <paravoid> and by check you mean basically list the container, right?
[16:50] <mikedawson> imjustmatthew: gotcha. each crash / restart another ceph-create-keys. I get it.
[16:50] <paravoid> or is there some object count stored in each container?
[16:50] <yehuda_hm> paravoid: not really
[16:51] <mikedawson> imjustmatthew: I commented it out of the init script. IIRC it screwed up something else
[16:51] <yehuda_hm> paravoid: just getting that info, it's stored somewhere, but it's stored somewhere differently for each container
[16:51] <yehuda_hm> paravoid: so basically we send like 30,000 different rados requests just to get the list of your containers
[16:52] <paravoid> heh
[16:52] <yehuda_hm> paravoid: do you actually need the container stats when you're listing them?
[16:52] <paravoid> I have two uses
[16:52] <paravoid> one is basic troubleshooting
[16:52] <paravoid> like actually seeing which containers I have :)
[16:53] <paravoid> the other one is stats
[16:53] <paravoid> http://ganglia.wikimedia.org/latest/graph_all_periods.php?m=swift_object_count&z=small&h=Swift+pmtpa+prod&c=Swift+pmtpa&r=hour & http://ganglia.wikimedia.org/latest/graph_all_periods.php?m=swift_object_change&z=small&h=Swift+pmtpa+prod&c=Swift+pmtpa&r=hour
[16:53] <paravoid> are quite useful
[16:54] <Cotolez> excuse me, anyone has a little hint for my screwed situation?
[16:57] <yehuda_hm> paravoid: there's still a command to list stats for each container by itself
[16:58] <paravoid> that's less useful
[16:58] <yehuda_hm> paravoid: also making it stream the response internally would make it so that it doesn't timeout, but that's a bigger change
[16:58] <paravoid> that's also less useful to me
[16:58] <paravoid> it times out at 5 minutes now
[16:59] <paravoid> I don't think I could have stats pulling taking more than 5 minutes, or even 5 minutes
[16:59] <paravoid> or issuing 30k requests :)
[16:59] <paravoid> and this is on a relatively idle cluster
[17:00] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:00] <paravoid> couldn't you store account stats the same way you store container stats?
[17:04] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:05] <yehuda_hm> paravoid: nope.. unless you want every write operation in every account to go through the same contention point, which will affect performance severely
[17:05] <yehuda_hm> paravoid: we can make the requests async, so that can be faster, but again, it's not a trivial change
[17:05] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:19] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[17:19] * yehudasa_ (~yehudasa@2602:306:330b:1410:695d:9bd8:d757:d68a) Quit (Ping timeout: 480 seconds)
[17:21] * ScOut3R_ (~ScOut3R@212.96.47.215) Quit (Remote host closed the connection)
[17:22] * bgshiny (~shiny@office.bgservice.net) Quit (Ping timeout: 480 seconds)
[17:24] * lofejndif (~lsqavnbok@foto.ro1.torservers.net) has joined #ceph
[17:25] * wido__ (~wido@2a00:f10:104:206:9afd:45af:ae52:80) has joined #ceph
[17:25] * wido (~wido@2a00:f10:104:206:9afd:45af:ae52:80) Quit (Write error: connection closed)
[17:36] <BMDan> Cotolez: ?
[17:36] <Cotolez> BMDan: i'm in big trouble.... i've added a couple of storage nodes
[17:37] <Cotolez> but a hardware failure occurred after a few hours during the rebalance
[17:37] <Cotolez> I had to reboot one node, and the device naming was screwed
[17:38] <Cotolez> the osd started flapping up and down
[17:39] <mikedawson> joao: For the past couple days, whenever I look at ceph -w one of my three mons has dropped out of quorum, but the daemon continues running. This is a new install of 0.60
[17:39] <Cotolez> so I wanted to take the new nodes out and try to revert to the old, good situation
[17:39] <Cotolez> one by one i took the new osds out
[17:40] <Cotolez> but the worst thing is that i can't mount the cephfs anymore, from any client (mount error 5 = Input/output error)
[17:41] <mikedawson> joao: whichever one is out calls for a new monitor election every 30 seconds, but it never seems to rejoin until I restart one or both of the other monitors
[17:41] <BMDan> Cotolez: What's the output of ceph -s?
[17:41] <BMDan> My guess is you have some lost updates.
[17:41] <Cotolez> BMDan: this is the output from 'ceph health detail' (it is stripped down): http://pastebin.com/HvCqWQXW
[17:42] <Cotolez> BMDan: definitely, i have a lot of lost updates!
[17:43] <joao> mikedawson, can you drop the logs for the leader and that monitor somewhere?
[17:43] <joao> preferably with debug mon = 20
[17:43] <Cotolez> BMDan: if I query a pg in 'stale' status, I see the message "pgid currently maps to no osd"
[17:44] <mikedawson> joao: health HEALTH_WARN 1 mons down, quorum 0,2 a,c ... Is the leader the first listed (in this case a)?
[17:45] <joao> yes
[17:45] * l0nk (~alex@83.167.43.235) Quit (Read error: Operation timed out)
[17:49] <Cotolez> BMDan: I can accept some data loss, but not losing all of the data
[17:53] <Karcaw> joao: re: wip4521, i went to try and restart the mon for debugging, and am getting 'mon fs missing 'monmap/latest' and 'mkfs/monmap'' now
[17:54] <joao> funny enough, I'm just fixing that
[17:54] * alram (~alram@38.122.20.226) has joined #ceph
[17:54] <Karcaw> nice
[17:54] <joao> regarding something I stumbled upon
[17:56] <BMDan> Cotolez: You need to evacuate off of the OSDs that are down. Issue appears to be that you're short of space, no?
[17:57] <mikedawson> joao: logs from leader: http://www.gammacode.com/ceph-mon.a.log
[17:57] <BMDan> Cotolez: http://eu.ceph.com/docs/wip-3060/ops/manage/failures/osd/ should cover your needs.
[17:57] <joao> mikedawson, ty; can't promise I'll get to them today or before Monday though
[17:57] <joao> I'll stash them locally though
[17:58] <mikedawson> joao: it is piped output ( | grep -v get-or-create)
[17:58] <joao> cool, thanks
[17:58] <joao> btw
[17:59] <joao> mikedawson, could you just run a 'ceph health detail' on the cluster and see if there's any mentions to clock skews?
[17:59] <joao> or latency even
[18:01] <joao> Karcaw, just to make sure, does 'ceph_test_store_tool <path-to-mon-data>/store.db list monmap' output anything?
[18:01] <mikedawson> joao: no skew. HEALTH_WARN 1 mons down, quorum 0,2 a,c mon.b (rank 1) addr 10.1.0.67:6789/0 is down (out of quorum)
[18:02] <joao> yeah, now that I think of it, it wouldn't show skews nor latency for that mon as it is outside of quorum
[18:02] <joao> mikedawson, clock skews would lead to a scenario like the one you described
[18:03] <mikedawson> yeah, we're in sync
[18:03] <joao> okay
[18:03] <Cotolez> BMDan: yes, I was short of space, near 85% full
[18:03] * diegows (~diegows@190.190.2.126) has joined #ceph
[18:04] <Karcaw> joao: no output from that command
[18:04] <joao> Karcaw, did you kill the monitor soon after the restart or something?
[18:05] <Cotolez> does anyone have an idea why the cephfs mount is impossible in my situation?
[18:05] <Karcaw> it's possible. it was not joining the quorum after the restart, so i tried to restart it again
[18:05] <joao> yeah, it was recovering
[18:05] <joao> synchronizing the store, more likely
[18:05] <Karcaw> doh.
[18:05] <joao> that's actually the bug I'm fixing right now
[18:06] <joao> http://tracker.ceph.com/issues/4543
[18:06] <mikedawson> joao: on mon.b with debugging turned up, a whole bunch of mon.b@1(synchronizing sync( requester state chunks )) e1 ms_verify_authorizer 10.1.0.3:0/25335 client protocol 0
[18:06] <joao> Karcaw, if you still have a quorum in place, you could obtain the monmap and inject it into that monitor
[18:06] <joao> the docs explain how to do that
[18:06] <joao> mikedawson, that monitor is synchronizing
[18:07] * BillK (~BillK@58-7-240-102.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[18:07] <joao> my guess is that it is taking too long to synchronize and by the time it finishes, the cluster has moved on and he'll restart the whole thing again
[18:07] <joao> hope to be wrong
[18:07] <Karcaw> i'll work on that
[18:08] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[18:08] <joao> mikedawson, I'll take a look later today if I can or on Monday; I really have to finish as much as I can today as I'm not having the weekend to work on stuff :
[18:08] <joao> :\
[18:08] <mikedawson> joao: sure. thanks!
[18:13] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[18:20] * BillK (~BillK@124-148-243-188.dyn.iinet.net.au) has joined #ceph
[18:22] <BMDan> Cotolez: Sounds like you need to bring online additional OSDs sufficient to get you <90% usage, then force-down the errant OSDs so they rebalance onto the new ones.
[18:22] <BMDan> That said, there may be a faster route, in terms of declaring missing data lost and ensuring MDS, etc. are all in good working order, whereupon a mount should work fine.
[18:22] <BMDan> But you'll have lost an amount of data that is difficult and/or impossible to identify.
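(For the record, the destructive variant looks roughly like this: once an OSD's data is written off you can tell the cluster to stop waiting for it and revert unfound objects. The osd id and pgid are placeholders, and the data-loss implications are exactly what the names suggest:)

    ceph osd lost 12 --yes-i-really-mean-it
    ceph pg 2.5 mark_unfound_lost revert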
[18:23] <Karcaw> joao: is there a tool to get the monmap out of the store.db in a format suitable for injecting? the ceph_test_store_tool dumps hex.. but i don't see a way to extract the binary
[18:23] * BMDan (~BMDan@74.121.199.170) Quit (Quit: Leaving.)
[18:24] <joao> Karcaw, do you really need it?
[18:24] <joao> I mean, don't you have a quorum with the other monitors by chance?
[18:24] <Karcaw> i have quorum with the remaining two
[18:25] <Karcaw> i just haven't found how to get the monmap.. i'm assuming i need to inject it into mon.a
[18:25] <joao> then you should be able to get the monmap with 'ceph mon getmap -o /tmp/monmap'
[18:25] <Cotolez> I'm under 90%, really. I didn't hit the full osd scenario... now the cluster is readjusting all the data: i see that the active+clean number is increasing.
[18:26] <Cotolez> but it is still far from the total..
[18:26] <Karcaw> ahh
[18:27] <Karcaw> the ceph --help outpur dosent talk about that command.. :(
[18:28] * l0nk (~alex@83.167.43.235) has joined #ceph
[18:30] * l0nk (~alex@83.167.43.235) Quit ()
[18:37] <Karcaw> joao: it running now.. thanks
[18:38] <joao> Karcaw, cool
[18:40] <joao> Karcaw, did the store fix work for you? was this issue post fixing the store?
[18:40] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[18:41] <Karcaw> post 'fixing'.. just crashed it for bug 4521, log is in the bug
[18:43] * mikedawson_ (~chatzilla@206.246.156.8) has joined #ceph
[18:44] * yehudasa_ (~yehudasa@2607:f298:a:607:4116:c14e:51bb:c1a4) has joined #ceph
[18:44] * sleinen1 (~Adium@2001:620:0:26:15ff:86d2:2a45:3ad) has joined #ceph
[18:44] * Cotolez (~aroldi@81.88.224.110) Quit (Quit: Sto andando via)
[18:45] <joao> ah, Karcaw, Sage also stumbled on that one, and it's possibly related: http://tracker.ceph.com/issues/4748
[18:46] <Karcaw> ah...
[18:47] * coyo (~unf@00017955.user.oftc.net) Quit (Quit: F*ck you, I'm a daemon.)
[18:48] * mikedawson (~chatzilla@206.246.156.8) Quit (Ping timeout: 480 seconds)
[18:48] * mikedawson_ is now known as mikedawson
[18:48] * wido__ is now known as wido
[18:50] <joao> ah, no. Sage's bug has a nuance that makes it different
[18:51] <joao> well, I'll update both bugs as soon as I am sure they're different
[18:51] <Karcaw> i'll watch patiently...
[18:52] <joao> sorry about that
[18:52] <mikedawson> joao: is it unusual for the mon store.db files to be 117M, 115M, and 93M, where the 115M one is the one that cannot join the quorum?
[18:53] <mrjack_> is there a way to continue with IO if the monitors start a new election?
[18:54] <joao> mikedawson, different sizes are not uncommon, I also get them on my test clusters, although I cannot explain why at the moment (never looked into it)
[18:55] <mikedawson> joao: is there anything in those dirs I can examine to find out why mon.b seems to be in permanent sync mode
[18:57] <mikedawson> joao: I think I just lost mon.c (it is now probing)
[18:58] <joao> mikedawson, you could check if that monitor is drifting from the rest of the cluster, but you'd have to shutdown all monitors and use 'ceph_test_store_tool' to list the store's contents
[19:01] * sleinen (~Adium@2001:620:0:26:e56d:7bc7:6c8f:7d0f) has joined #ceph
[19:02] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Quit: Leaving)
[19:03] * portante (~user@12.130.126.67) has joined #ceph
[19:06] <mikedawson> joao: which package has ceph_test_store_tool?
[19:06] * sleinen1 (~Adium@2001:620:0:26:15ff:86d2:2a45:3ad) Quit (Ping timeout: 480 seconds)
[19:06] <joao> ceph-test or something of the sorts iirc
[19:08] <mikedawson> that's it, thanks
[19:10] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:13] * BillK (~BillK@124-148-243-188.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[19:21] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[19:29] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[19:29] <mikedawson> joao: stopped everything, ran ceph_test_store_tool, saw a and c had full_723, full_committed, and full_last. mon.b only had up to full_456. Decided it really was behind. Started up the 3 mons without any osds and all 3 were in quorum. Started OSDs and a dropped out
[19:31] <mikedawson> joao: it seems this cluster wants to have a leader, a peon, and someone probing... Have seen all three mons get stuck in the probing state
[19:31] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[19:32] <mikedawson> at different times
[19:40] <imjustmatthew> mikedawson: question for you: is one of your monitors an odd-man-out with higher latency or significantly lower performance?
[19:41] <mikedawson> imjustmatthew: identical hardware connected via identical network
[19:43] <imjustmatthew> mikedawson: nm then, thanks. I was hoping you'd managed to recreate a problem I had at one point that I couldn't report or reproduce :)
[19:50] * portante (~user@12.130.126.67) Quit (Remote host closed the connection)
[19:51] * diegows (~diegows@190.190.2.126) has joined #ceph
[19:56] * lofejndif (~lsqavnbok@82VAABQYD.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[20:01] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:02] * portante (~user@12.130.126.67) has joined #ceph
[20:06] * mr_evil (4fc2de74@ircip4.mibbit.com) has joined #ceph
[20:12] * lofejndif (~lsqavnbok@50.7.184.58) has joined #ceph
[20:19] * eschnou (~eschnou@42.165-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:22] * lofejndif (~lsqavnbok@09GAAB0CM.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[20:24] * jskinner (~jskinner@69.170.148.179) has joined #ceph
[20:24] <pioto> so, with libvirt... i'd wanna have <blockio logical_block_size='4194304' physical_block_size='4194304'/> on my rbd devices, right? to match the default 'order' setting
[20:24] <pioto> but that seems to make qemu crap out
[20:25] <pioto> '4096' works, but that seems too small still
[20:25] <pioto> a drive w/o that setting reports:
[20:25] <pioto> Sector size (logical/physical): 512 bytes / 512 bytes
[20:25] <pioto> from fdisk -l
[20:25] <pioto> with it set to '4096' for both, it reports
[20:25] <pioto> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
[20:26] * sleinen1 (~Adium@2001:620:0:26:e56d:7bc7:6c8f:7d0f) has joined #ceph
[20:27] <pioto> hm. well. when i check what the kernel rbd driver reports if i directly map an image...
[20:27] <pioto> Sector size (logical/physical): 512 bytes / 512 bytes
[20:27] <pioto> I/O size (minimum/optimal): 4194304 bytes / 4194304 bytes
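(For reference, a minimal librbd sketch of how one might read back the object size pioto is trying to match; the pool name "rbd", the image name "myimage", and the example as a whole are illustrative assumptions, not something taken from the discussion above.)

    #include <rados/librados.hpp>
    #include <rbd/librbd.hpp>
    #include <iostream>

    int main() {
      librados::Rados rados;
      // Connect as client.admin using the default ceph.conf locations.
      if (rados.init("admin") < 0 || rados.conf_read_file(NULL) < 0 ||
          rados.connect() < 0) {
        std::cerr << "could not connect to the cluster" << std::endl;
        return 1;
      }

      librados::IoCtx ioctx;
      if (rados.ioctx_create("rbd", ioctx) < 0) {   // "rbd" pool is an assumption
        std::cerr << "could not open pool" << std::endl;
        return 1;
      }

      librbd::RBD rbd;
      librbd::Image image;
      if (rbd.open(ioctx, image, "myimage") < 0) {  // "myimage" is an assumption
        std::cerr << "could not open image" << std::endl;
        return 1;
      }

      librbd::image_info_t info;
      if (image.stat(info, sizeof(info)) == 0) {
        // obj_size is 2^order bytes; with the default order of 22 this prints 4194304.
        std::cout << "order=" << info.order
                  << " obj_size=" << info.obj_size << std::endl;
      }
      return 0;
    }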
[20:28] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[20:31] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[20:33] * sleinen (~Adium@2001:620:0:26:e56d:7bc7:6c8f:7d0f) Quit (Ping timeout: 480 seconds)
[20:33] * kfox1111 (~kfox@96-41-208-2.dhcp.elbg.wa.charter.com) has joined #ceph
[20:35] <kfox1111> Question regarding the C++ librados binding: when you add stuff to an ObjectReadOperation, you can add a .read, which expects a bufferlist ptr. When you go to operate the ObjectReadOperation, operate expects a bufferlist ptr too. What is that one for?
[20:35] * sleinen1 (~Adium@2001:620:0:26:e56d:7bc7:6c8f:7d0f) Quit (Ping timeout: 480 seconds)
[20:40] <dmick> without looking, the returned data, probably?...
[20:42] <kfox1111> but why the other one? do I use the same bufferlist? different ones? where does the data actually come out?
[20:47] * joshd1 (~jdurgin@2602:306:c5db:310:4516:1eaf:44de:415) has joined #ceph
[20:47] <kfox1111> hmm... librados::IoCtx::omap_get_vals just creates a temp bl and ignores it completely. odd.
[20:49] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[20:52] <kfox1111> hmm... rgw is actually passing null to the op bufferlist arg and just using the one passed to read. I'll assume that is safe then. :)
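(A minimal sketch of the pattern kfox1111 settles on: the bufferlist passed to ObjectReadOperation::read receives that read's data, and the trailing bufferlist pointer to IoCtx::operate is left NULL, as rgw appears to do. The function name, the object handling, and the 4096-byte read length are illustrative assumptions.)

    #include <rados/librados.hpp>
    #include <iostream>
    #include <string>

    // Read the first 4096 bytes of an object via an ObjectReadOperation.
    // The bufferlist handed to read() receives the data for that individual read;
    // the trailing bufferlist* on operate() is left NULL, mirroring what rgw does.
    int read_head(librados::IoCtx& ioctx, const std::string& oid) {
      librados::ObjectReadOperation op;
      librados::bufferlist bl;
      int read_rval = 0;
      op.read(0, 4096, &bl, &read_rval);       // per-read output buffer and return code

      int r = ioctx.operate(oid, &op, NULL);   // NULL: no separate whole-op buffer
      if (r < 0)
        return r;
      if (read_rval < 0)
        return read_rval;

      std::cout << oid << ": got " << bl.length() << " bytes" << std::endl;
      return 0;
    }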
[20:56] * rekby (~Adium@2.93.58.253) has joined #ceph
[20:58] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[20:58] <alexxy> joao: seems my issues are related to ceph-fuse
[21:02] * mikedawson (~chatzilla@206.246.156.8) Quit (Ping timeout: 480 seconds)
[21:02] * BMDan (~BMDan@74.121.199.170) has joined #ceph
[21:04] <BMDan> I have reached an interesting point in my performance struggles. I cannot now find what resource is constraining further performance.
[21:05] <BMDan> Total throughput with 4 sequential write clients is 800 MB/s; with one, it's 200-250 MB/s. But in neither case do any of the constituent parts (host CPU, network, OSD disk throughput, journal disk throughput, etc.) appear to be the bottleneck.
[21:05] <BMDan> Any thoughts on how to identify what is holding me back from additional performance, especially in the single-consumer case?
[21:13] * mikedawson (~chatzilla@206.246.156.8) has joined #ceph
[21:22] <rekby> Hello. I have been trying to get pricing for paid support for about two weeks. I have sent emails to alex at inktank.com, info at inktank.com, and elder at inktank.com, and submitted the form on the website.
[21:22] <rekby> I have only received an answer from elder - he forwarded my letter to Inktank about 3 days ago, but I still have not received any reply.
[21:22] * portante (~user@12.130.126.67) Quit (Ping timeout: 480 seconds)
[21:33] * mikedawson (~chatzilla@206.246.156.8) Quit (Ping timeout: 480 seconds)
[21:36] * mikedawson (~chatzilla@206.246.156.8) has joined #ceph
[21:37] * loicd (~loic@67.139.65.163) has joined #ceph
[21:38] <mikedawson> rekby: I think most of the Inktank sales people were at the OpenStack summit this week
[21:41] <kfox1111> api question: if you get a short read, is that a guarantee of no more data in librados, or must you get a size-0 read back?
[21:43] <rekby> mikedawson, thanks.
[21:43] <rekby> I was surprised by this situation - I can get free help in realtime on IRC, but can't get an answer about paid help :)
[21:46] <benner> is there any better explanation of caps? i'm reading http://ceph.com/docs/master/rados/operations/auth-intro/ and still can't get the whole picture of how things work.
[21:47] <benner> for example "Gives the user the capability to call class read methods. Subset of x." is not really clear
[21:48] <joshd1> kfox1111: you'll only get a short read if there's no more data past that point in the object
[21:50] <joshd1> benner: is this clearer: http://ceph.com/docs/master/man/8/ceph-authtool/#osd-capabilities
[21:52] <kfox1111> joshd1: ok, thanks. I'll put a short read shortcut in then.
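(A minimal sketch of the short-read shortcut being discussed, assuming a plain IoCtx::read loop: a read that returns fewer bytes than requested is treated as end-of-object, per joshd1's answer above. The function name and chunk size are illustrative assumptions.)

    #include <rados/librados.hpp>
    #include <cstdint>
    #include <string>

    // Read an entire object in fixed-size chunks, treating a short read as
    // end-of-object rather than issuing one more read to get a 0-byte result.
    int read_whole_object(librados::IoCtx& ioctx, const std::string& oid,
                          librados::bufferlist& out) {
      const size_t chunk = 4 * 1024 * 1024;       // hypothetical chunk size
      uint64_t off = 0;
      while (true) {
        librados::bufferlist bl;
        int r = ioctx.read(oid, bl, chunk, off);  // returns bytes read, or <0 on error
        if (r < 0)
          return r;
        out.append(bl);
        off += r;
        if ((size_t)r < chunk)                    // short (or empty) read: no more data
          break;
      }
      return 0;
    }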
[21:55] * rekby (~Adium@2.93.58.253) Quit (Quit: Leaving.)
[21:57] * loicd (~loic@67.139.65.163) Quit (Quit: Leaving.)
[22:01] <benner> joshd1: it's the same. what does "class methods" mean?
[22:02] <dmick> an "osd class" is a loadable module that the OSDs load on demand when requested by an application
[22:03] <dmick> since they live on the OSDs, they can extend the OSD functionality efficiently (they live next to the object storage)
[22:03] <dmick> we use them for things in the rados gateway, for rbd image manipulation, for object locking, and other things
[22:03] <benner> ok, class means a type of daemon?
[22:03] <dmick> no, OSD means type of daemon
[22:03] <dmick> class means what I said above
[22:05] <benner> ok, so is it some kind of addon?
[22:06] <dmick> Maybe you could think of it that way?...it's a loadable module; I'm not sure how to be clearer
[22:07] <benner> or i'll rephrase the question. i have a user: http://p.defau.lt/?v4FQxYIURwWxi6ppi3jAUQ and i'm not happy that this user can do "ceph auth list"
[22:07] <benner> how to solve this?
[22:10] <dmick> Not sure you can. I'm pretty sure that just requires mon r, and you need that to connect to the cluster for anything
[22:11] <pioto> oh, huh.
[22:11] <pioto> well, another reason to not run ceph within untrusted guests, i guess...
[22:11] <pioto> well, no, i can't reproduce that
[22:12] <pioto> ceph --id something-not-admin auth list gives 'access denied'
[22:12] <pioto> but, that same id can still do 'ceph -s'
[22:12] <benner> mhm
[22:12] <benner> i did this: ceph auth get-or-create client.benner osd 'allow rw' mon 'allow r' > keyring.benner
[22:12] <pioto> note: i'm running ceph 0.60, so maybe it's a recent change?
[22:12] <benner> and copied the file to the client's /etc/ceph/keyring
[22:13] <benner> in my env ceph is:
[22:13] <benner> # dpkg -l | grep ceph | tail -1
[22:13] <benner> ii libcephfs1 0.56.4-1precise Ceph distributed file system client library
[22:13] <pioto> the auth i used only has:
[22:13] <pioto> caps: [mds] allow
[22:13] <pioto> caps: [mon] allow r
[22:13] <pioto> caps: [osd] allow rw pol data
[22:13] <pioto> oh. well. that's a typo that'll cause problems ;)
[22:15] <benner> so it's normal in bobtail ?
[22:15] <pioto> could be. i don't have bobtail running anywhere
[22:16] <dmick> ok. it seems rwx doesn't allow auth list. I'm not sure why, from the code, but it seems it does not, whereas * does
[22:17] <dmick> oh. sorry, yes, I was misreading that. you need '*' to do auth most-anything
[22:17] <dmick> it looks like, at least.
[22:18] <dmick> benner, are you saying you see mon 'allow r' is enough to do auth list?
[22:19] <pioto> on the subject, a slight sidebar... can i tweak the caps of an existing auth w/o regenerating its key? maybe i'm just missing the right answer...
[22:19] <benner> i think, yes: http://p.defau.lt/?kdq_6kHm5ZCsM0Z6tNZzIQ
[22:19] <dmick> pioto: yes, auth caps
[22:20] <dmick> you have to supply all the caps
[22:20] <pioto> ah, that doesn't seem to show in `ceph -h`
[22:20] <pioto> but does with ceph auth -h
[22:20] <pioto> err, `ceph auth`.
[22:20] <dmick> the help and doc really need work; I'm a large part of the way there. but we know.
[20:20] <kfox1111> is there a way, after formatting a ceph file system, to block in a shell script until all of the syncing and such is finished, so I can capture a vm with a clean fs?
[22:21] <pioto> ok. well. when i get things working more, maybe i can send some patches
[22:21] <dmick> benner: you're doing ceph auth list as client.admin; in the list output you can see client.admin has mon allow *
[22:21] <dmick> and it should; client.admin is the superuser, basically
[22:21] <pioto> or are you planning on reworking how commands work, to be more automatically self-documenting?
[22:21] <dmick> pioto: yes
[22:21] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:22] <pioto> ok. then i'll just wait :) (that sounds tedious to do, but good to do)
[22:22] <dmick> http://wiki.ceph.com/01Planning/02Blueprints/Dumpling/Ceph_management_API
[22:24] <pioto> oh, cool. so, that's for the 'd' release, which i guess is for, what, some time this fall?
[22:24] <pioto> Dumpling is due for release on August 1st 2013.
[22:24] <pioto> cool
[22:25] <dmick> yeah
[22:26] <benner> fsck, sorry guys, i missed that then.
[22:26] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[22:27] <dmick> no worries
[22:27] <dmick> you made me learn more about what's required for caps for auth commands
[22:27] <dmick> so it's all good
[22:27] <benner> :)
[22:46] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[22:53] <liiwi> in 12
[22:54] <liiwi> erp
[23:00] * BMDan (~BMDan@74.121.199.170) Quit (Quit: Leaving.)
[23:01] <yehudasa> paravoid: I pushed a branch (wip-4760) that fixes the streaming of the response. It'll still be overall slow, but at least apache isn't going to time out.
[23:01] <paravoid> ooh
[23:01] <yehudasa> paravoid: it's on top of next, which means that it's what's going to be cuttlefish
[23:02] <paravoid> I guess I need apache with 100 Continue for that?
[23:02] <yehudasa> still needs some more beating up before merging it in
[23:02] <yehudasa> not necessarily, no
[23:03] <paravoid> I'm confused then
[23:03] <paravoid> what does this do?
[23:03] <paravoid> what do you mean by streaming?
[23:04] * mikedawson (~chatzilla@206.246.156.8) Quit (Ping timeout: 480 seconds)
[23:04] <paravoid> the problem is that you have to calculate X-Account-Object-Count & X-Account-Bytes-Used, isn't it?
[23:05] <paravoid> that's part of the headers, how are you able to incrementally stream the response?
[23:07] <yehudasa> paravoid: that's a different request ...
[23:07] <yehudasa> you're talking about HEAD account, whereas I was referring to the list containers
[23:08] <paravoid> that's GET, isn't it?
[23:08] <yehudasa> yeah
[23:08] <paravoid> GET in Swift has those headers
[23:08] <paravoid> but if you don't need those, then why is it so slow?
[23:08] <paravoid> there are no per-container stats in GET
[23:08] <yehudasa> ahmm.. yes there are
[23:09] <yehudasa> unless they changed the api
[23:10] <paravoid> oh, hrm
[23:10] <paravoid> the text/plain output doesn't have it
[23:10] <paravoid> the json does
[23:10] <paravoid> (swift still responds instantly with json)
[23:12] <yehudasa> well, they have a different data model probably
[23:13] <yehudasa> either they're paying for it on each request, or they propagate the info lazily
[23:14] <yehudasa> we can later make it go faster, or even do a lazy account stats
[23:15] <yehudasa> so that it doesn't go through all the buckets for this info
[23:15] <yehudasa> ok, I'm afk for now
[23:18] * yehudasa_ (~yehudasa@2607:f298:a:607:4116:c14e:51bb:c1a4) Quit (Ping timeout: 480 seconds)
[23:20] * mr_evil (4fc2de74@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[23:27] <benner> is ceph better than swift?
[23:28] * eschnou (~eschnou@42.165-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:28] <gregaf1> *looks at room title*
[23:28] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[23:29] <gregaf1> I'm gonna go with "yes" :)
[23:31] <dmick> it's two letters shorter. That's better right there.
[23:31] <dmick> er, one. one letter shorter.
[23:31] <dmick> I'll come in again.
[23:31] <joao> still counts
[23:34] <benner> what i meant then asking: performance is better i think. what about the api part?
[23:35] <benner> *when
[23:36] * treaki_ (34f19c8684@p4FF4A7C5.dip0.t-ipconnect.de) has joined #ceph
[23:40] * treaki__ (220eaa3120@p4FDF7CC1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[23:42] * madkiss (~madkiss@2001:6f8:12c3:f00f:c010:c230:ca7c:5b09) Quit (Quit: Leaving.)
[23:45] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[23:53] * jluis (~JL@89.181.154.215) has joined #ceph
[23:55] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[23:59] * joao (~JL@89.181.149.4) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.