#ceph IRC Log


IRC Log for 2012-07-25

Timestamps are in GMT/BST.

[0:06] <joshd> elder: yes, that's correct about the reason for the redundancy
[0:07] <elder> I'm just putting that aside. I'll rip the whole thing out instead, after a requisite waiting period.
[0:07] <joshd> that sounds good
[0:08] <joshd> maybe a week to see if there are any responses? we can wait to put it upstream until later in any case
[0:08] <elder> Is my statement about the notification coming back as a result of "my" own notify request?
[0:08] <joshd> yes, that's right
[0:08] <elder> OK.
[0:08] <elder> Good.
[0:10] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:10] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[0:10] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:13] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:18] * BManojlovic (~steki@212.200.241.106) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:19] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[0:28] <sagewk> joshd: wanna look at wip-asok?
[0:28] <joshd> sure
[0:36] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:36] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:37] <joshd> the docstring in the config_show test wasn't changed
[0:37] <joshd> otherwise looks good, assuming it's tested
[0:43] * aliguori (~anthony@32.97.110.59) Quit (Remote host closed the connection)
[1:10] <mikeryan> sagewk: can i get a review on wip_rados_bench_98765 ?
[1:10] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:26] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) has joined #ceph
[1:33] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) Quit (Quit: LarsFronius)
[1:35] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:39] * andrewbogott_ (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[1:48] <sagewk> mikeryan: sticking it on my list!
[1:48] * andrewbogott_ (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Ping timeout: 480 seconds)
[1:53] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[2:13] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[2:14] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:15] * Leseb_ (~Leseb@62.233.37.89) has joined #ceph
[2:15] * Leseb_ (~Leseb@62.233.37.89) has left #ceph
[2:15] * andrewbogott_ (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[2:21] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[2:26] <mikeryan> sagewk: thanks, it's a pretty large chunk of code
[3:04] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[3:13] * Ryan_Lane (~Adium@216.38.130.164) has joined #ceph
[3:13] * andrewbogott_ (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Ping timeout: 480 seconds)
[3:13] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[3:21] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[3:59] * joao (~JL@89.181.148.137) Quit (Remote host closed the connection)
[4:02] * chutzpah (~chutz@100.42.98.5) Quit (Quit: Leaving)
[4:15] * Ryan_Lane (~Adium@216.38.130.164) Quit (Quit: Leaving.)
[4:16] * adjohn (~adjohn@69.170.166.146) Quit (Quit: adjohn)
[4:39] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[5:33] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[5:44] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[6:05] * ryann (~chatzilla@216.81.130.180) has joined #ceph
[6:10] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:39] * Qten (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: Connection reset by peer)
[6:39] * Qten (~qgrasso@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[6:40] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[6:46] * deepsa (~deepsa@122.172.21.41) Quit ()
[6:46] * deepsa (~deepsa@122.172.21.41) has joined #ceph
[7:30] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:49] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[8:03] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[8:07] * ssedov (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[8:07] * stass (~stas@ssh.deglitch.com) has joined #ceph
[8:17] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) has joined #ceph
[8:19] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:29] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:29] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:50] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:51] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[8:52] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit ()
[9:11] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:15] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:28] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:30] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[9:30] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[9:30] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:31] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[9:31] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:32] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:38] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Ping timeout: 480 seconds)
[9:42] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) Quit (Quit: LarsFronius)
[9:58] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:05] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:05] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[10:06] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:11] <tnt> For the RBD client, is it important to use the absolute latest version or is the one shipped in the kernel fine ?
[10:14] <tnt> ceph.com looks down as well ...
[10:17] * joao (~JL@89.181.148.137) has joined #ceph
[10:18] * loicd (~loic@83.167.43.235) has joined #ceph
[10:27] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[10:54] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[10:58] <tnt> mmm, the kernel panic would suggest maybe the version shipped with the kernel isn't all that great
[11:00] * benner_ (~benner@193.200.124.63) Quit (Read error: Connection reset by peer)
[11:00] * benner (~benner@193.200.124.63) has joined #ceph
[11:07] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:30] * themgt_ (~themgt@24-181-215-214.dhcp.hckr.nc.charter.com) has joined #ceph
[11:30] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[11:30] <gadago> hi, when I try to start radosgw service with service radosgw start I get 'stop: Unknown parameter: id'
[11:30] <gadago> any ideas?
[11:31] <gadago> actually 'start: Unknown parameter: id'
[11:34] <tnt> yeah got the same here ... I use /etc/init.d/radosgw start in the meantime.
[11:35] <gadago> tnt: thanks
[11:36] * themgt (~themgt@24-181-215-214.dhcp.hckr.nc.charter.com) Quit (Ping timeout: 480 seconds)
[11:36] * themgt_ is now known as themgt
[11:36] <loicd> Hi, the ceph logs show "WARNING: multiple ceph-osd daemons on the same host will be slow" however, I was under the impression that running one osd per disk on a given machine was recommended.
[11:38] <joao> I guess that's a preventive warning, in case your hardware doesn't match the workload put on by the multiple osds
[11:39] * Meths_ (rift@2.27.72.112) has joined #ceph
[11:40] <loicd> joao: that makes me want to find reference material recommending multiple osd per host
[11:40] <loicd> ;-)
[11:43] <tnt> Does the kernel rbd client support caching?
[11:45] * Meths (rift@2.27.73.166) Quit (Ping timeout: 480 seconds)
[11:50] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[11:51] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[12:13] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (Quit: ZNC - http://znc.in)
[12:14] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[12:38] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[12:45] <tnt> yehudasa: Any comments on the status of caching for rbd kernel ?
[12:50] * themgt_ (~themgt@24-181-215-214.dhcp.hckr.nc.charter.com) has joined #ceph
[12:52] <joao> tnt, he should only be available in five hours or so
[12:53] * loicd1 (~loic@83.167.43.235) has joined #ceph
[12:53] * loicd (~loic@83.167.43.235) Quit (Read error: Connection reset by peer)
[12:55] * tnt_ (~tnt@212-166-48-236.win.be) has joined #ceph
[12:55] * themgt (~themgt@24-181-215-214.dhcp.hckr.nc.charter.com) Quit (Ping timeout: 480 seconds)
[12:55] * themgt_ is now known as themgt
[12:59] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[13:08] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:35] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (Ping timeout: 480 seconds)
[14:39] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[14:39] * tnt_ is considering writing an NBD server using RBD as a backend to work around the limitation of in-kernel rbd ...
[14:59] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[15:01] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: No route to host)
[15:02] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[15:16] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[15:47] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[15:50] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[16:37] <gadago> hi, anyone know what 'radosgw-admin: symbol lookup error: radosgw-admin: undefined symbol' means when trying to add a new rados user?
[16:40] <tnt_> you probably have mixed version of libraries installed
[16:40] <gadago> yeah, it looks that way, just apt-get upgrading now
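For reference, a quick way to confirm the mixed-library theory before (or after) upgrading; the grep patterns are only examples:

  ldd "$(which radosgw-admin)" | grep -i rados   # see which librados/librgw builds actually get loaded
  dpkg -l | grep -E 'ceph|rados'                 # compare installed package versions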
[16:58] <sileht> hi, anyone know if I can select the network interface that ceph-osd has to use?
[16:59] <tnt_> sileht: AFAICT it uses whatever interface is used to route to the mon (and the ip distributed to the client is the one as seen by the mon)
[17:07] <sileht> tnt_, thx
[17:10] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:14] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:22] <Tv_> elder, joao, nhm_, anyone else potentially working early: teuthology vm downtime starts now
[17:22] <elder> OK. That's too bad. But now I know why my test will fail...
[17:22] <elder> Thanks for saying so.
[17:25] <nhm_> Tv_: ok, good to know.
[17:26] <sage> elder: if you already have machines locked, just add 'check-locks: false' to your yaml and you can continue to use them
[17:27] <elder> OK.
[17:27] <elder> Good to know.
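sage's tweak is a single top-level key in the teuthology job yaml; a minimal sketch, with the rest of the job file left unchanged:

  check-locks: false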
[17:28] <sileht> tnt_, so is it possible to separate the network traffic between osd and the one from client ?
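On sileht's follow-up: ceph.conf does let you split client-facing and inter-OSD traffic; a hedged sketch using the option names documented around this release (the subnets are placeholders):

  [global]
      public network = 192.0.2.0/24      ; client <-> mon/osd traffic
      cluster network = 198.51.100.0/24  ; osd <-> osd replication and heartbeats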
[17:28] <joao> Tv_, thanks for the heads up
[17:28] <nhm_> sage: I played this morning with more 4k file write tests. I can't get fio any faster than about 4MB/s no matter if I'm using spinning disks, spinning disks with an external xfs log on ssd, or ssds. Also tried barrier=0 with little difference. I'm going to try something other than fio.
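A sketch of the sort of fio run nhm_ is describing; whether he meant many small files or 4k random writes to a larger file isn't clear from the log, and every parameter below is a guess rather than his actual job:

  fio --name=smallwrites --directory=/mnt/osd-test --rw=randwrite \
      --bs=4k --size=1g --numjobs=4 --direct=1 --group_reporting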
[17:28] <joao> nhm_, did that patch work for you?
[17:29] <nhm_> joao: haven't tried it since I changed the iterator increment to pos. After I did that it worked without segfaulting though.
[17:29] <joao> cool
[17:29] <nhm_> joao: and the benchmark is much faster using the vector. I sent out some results in email yesterday.
[17:30] <nhm_> joao: well, by results, I mean poor man's profiling results.
[17:31] <joao> nice
[17:31] <nhm_> joao: I don't know enough about all of the threads being spawned to know how often we should reasonably be hitting samples waiting on locks
[17:33] <gadago> hi guys, trying to access my rados gateway using swift with http://ceph1/auth/1.0 -U username:swift -K "key_goes_here" list but get an 'Auth GET failed: http://ceph1:80/auth/1.0/ 403 Forbidden'
[17:33] <gadago> any ideas?
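A hedged note on that 403: with radosgw it often means the Swift subuser or its secret key hasn't been created; a sketch per the radosgw docs of roughly this era (uid and secret values are placeholders):

  radosgw-admin subuser create --uid=username --subuser=username:swift --access=full
  radosgw-admin key create --subuser=username:swift --key-type=swift --gen-secret
  swift -A http://ceph1/auth/1.0 -U username:swift -K '<swift_secret_key>' list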
[17:35] <joao> nhm_, are you referring to threads on the workload generator or on the filestore?
[17:35] <joao> iirc, the workloadgen does not spawn threads
[17:36] <nhm_> joao: filestore
[17:36] <joao> and the filestore, if I'm not mistaken, will create a thread pool in the beginning and then just reuse them
[17:36] <nhm_> joao: If you look at that output I sent, it shows counts of samples that were taken and what was being done.
[17:37] <joao> I see
[17:38] <nhm_> joao: The samples are being taken across all threads, so it's entirely reasonable that some threads are idle, while others should be hammering away and hopefully minimally waiting on locks.
[17:39] <joao> by the looks of it, the first one should be due to waiting for transaction completion
[17:39] <joao> we do queue a whole lot of transactions
[17:41] <joao> is whatever is on the right a backtrace from the lock up to the first caller?
[17:41] <nhm_> yep
[17:42] <nhm_> basically there's a script that launches gdb over and over, gets stacktraces, and groups similar things. It's a total hack, but interesting.
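The script nhm_ describes sounds like the well-known "poor man's profiler" pattern; a minimal sketch of that pattern (not his actual script), assuming a single ceph-osd process on the host:

  for i in $(seq 1 100); do
      gdb -batch -ex 'thread apply all bt' -p "$(pidof ceph-osd)" 2>/dev/null
      sleep 0.1
  done |
  awk '/^Thread/ { if (stack) print stack; stack = "" }
       /^#/      { stack = stack (stack ? ";" : "") $4 }
       END       { if (stack) print stack }' |
  sort | uniq -c | sort -rn | head -20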
[17:42] <joao> so, if we reduced the number of in-flight transactions on the workloadgen, we should be able to lower the numbers on those waiting for transaction completion and pinpoint those
[17:42] * BManojlovic (~steki@212.200.241.106) has joined #ceph
[17:42] <nhm_> there's not a lot out there right now that does this kind of profiling effectively.
[17:42] <joao> nhm_, I wonder if we could obtain those results with systemtap
[17:43] <joao> systemtap is less intrusive than gdb by miles, I'm just not sure if we can leverage it at the user level
[17:44] <nhm_> joao: I think we might. There has been some rather scattered work on high performance stacktrace generation for exactly this kind of profiling too.
[17:44] <nhm_> Also, now that I'm using fno-omit-frame-pointer properly, perf is giving me symbols and I'm able to do some basic wallclock profiling there too.
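The perf workflow nhm_ mentions, roughly; this assumes ceph was compiled with -g -fno-omit-frame-pointer so the call graphs resolve, and a single ceph-osd process:

  perf record -g -p "$(pidof ceph-osd)" -- sleep 30   # sample the running osd for 30s
  perf report --sort comm,dso,symbol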
[17:45] <joao> you should poke Dan about dtrace for linux; not sure if he was able to make some progress on using it reliably, and when we were trying to use it to poke on btrfs, it did some amazing stuff
[17:45] <nhm_> yeah, I've heard dtrace is pretty nice.
[17:46] <nhm_> there's also some 3rd party patches for the google profiling tools that let you do wall clock profiling that might be worth trying.
[17:46] <joao> nhm_, anyway, if you try --test-max-in-flight VALUE, you should be able to reduce the number of in-flight transactions the workloadgen queues
[17:47] <joao> by default, it queues up to 50 and then waits before queuing additional transactions
[17:47] <nhm_> joao: Ok. Right now I'm working on looking at underlying filesystem metadata performance.
[17:47] <joao> okay, just letting you know the option is there ;)
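joao's option, roughly as it would be passed on the command line; the binary name and the data/journal flags here are assumptions, only --test-max-in-flight comes from the conversation:

  ./test_filestore_workloadgen --osd-data /srv/wg-data --osd-journal /srv/wg-journal \
      --test-max-in-flight 10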
[17:48] * tnt_ (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:48] <nhm_> joao: Yeah, that's good to know. Maybe it'll shed some light on things.
[17:49] <nhm_> joao: I'm concerned about metadata performance though. I didn't expect to see it so bad even on SSDs.
[17:49] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[17:50] <joao> how bad is it on SSDs?
[17:52] <nhm_> joao: 4MB/s according to fio.
[17:52] <nhm_> joao: I think something might be wrong with the benchmark though.
[17:58] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[18:01] * themgt (~themgt@24-181-215-214.dhcp.hckr.nc.charter.com) Quit (Quit: themgt)
[18:06] <gregaf> gadago: it means you need to specify the id for the radosgw instance you want to start :) "service radosgw start id=radosgw.a" or whatever name you're using
[18:06] * MarkDude (~MT@c-50-137-1-13.hsd1.wa.comcast.net) Quit (Read error: Connection reset by peer)
[18:06] <gadago> gregaf: thank you
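For the record, the two invocations that came up for starting radosgw, side by side (the id value is whatever instance name is in your ceph.conf):

  service radosgw start id=radosgw.a   # upstart job wants an explicit id, per gregaf
  /etc/init.d/radosgw start            # sysvinit fallback tnt mentioned earlier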
[18:06] <gregaf> loicd1: you get that warning about multiple OSDs being slow if your system doesn't support syncfs and you aren't using btrfs, because the OSDs have to fall back to using sync() which runs across all disks
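A quick way to check whether gregaf's syncfs fallback applies on a given host; the version thresholds (syscall in kernel 2.6.39+, glibc wrapper in 2.14+) are approximate:

  uname -r                   # kernel version
  getconf GNU_LIBC_VERSION   # glibc version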
[18:10] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[18:11] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[18:14] * MarkDude (~MT@c-50-137-1-13.hsd1.wa.comcast.net) has joined #ceph
[18:18] <gadago> can anybody help me with the s3 php example here please? http://ceph.com/docs/master/radosgw/s3/php/
[18:19] <gadago> I'm just trying to connect to my rados gateway and list some objects
[18:19] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:20] <gadago> it doesn't seem to pass the credentials properly
[18:20] <loicd1> gregaf: thanks !
[18:22] <gregaf> tnt: now that you're back in the room, there's no caching support in the kernel rbd module and I don't think you should expect it anytime soon... the userspace modules got to take advantage of pre-existing caching components and right now elder is just trying to keep the kernel stuff sane and up-to-date compatibility-wise
[18:24] <tnt> gregaf: yes I figured :( That's why I started hacking together an NBD server using the userspace lib as a backend, to see what kind of gain I get from that.
[18:27] <tnt> Basically I'd like to use ceph for image storage on an array of xen servers and so I need a block device AFAIK ...
[18:29] <gregaf> ah, yeah... Xen :(
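A related shortcut worth noting (not what tnt is building, and it assumes a qemu built with the rbd block driver): qemu-nbd can already expose an rbd image as a local block device:

  modprobe nbd
  qemu-nbd --connect=/dev/nbd0 rbd:rbd/myimage   # pool 'rbd' and image 'myimage' are examples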
[18:29] <mikeryan> tnt: small world, just watched your 27c3 talk last night
[18:30] <tnt> mikeryan: hehe :)
[18:30] <gregaf> I wanted fghaas to be back in the channel so I could rib him about the 2.5 hour Ceph training I hear he did at OSCon
[18:38] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[18:40] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[19:05] <sagewk> joshd, yehudasa: pushed 1 fix
[19:18] * Leseb (~Leseb@62.233.37.36) has joined #ceph
[19:23] * loicd1 (~loic@83.167.43.235) Quit (Quit: Leaving.)
[19:30] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[19:38] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:38] * chutzpah (~chutz@100.42.98.5) has joined #ceph
[19:42] <Tv_> new teuthology vm in service, check your email for instructions, please let me know if anything does not work
[19:54] * deepsa (~deepsa@122.172.21.41) Quit (Ping timeout: 480 seconds)
[20:04] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:06] * Leseb (~Leseb@62.233.37.36) Quit (Ping timeout: 480 seconds)
[20:12] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[20:13] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[20:14] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[20:17] * Meths_ is now known as Meths
[20:37] * andrewbogott (~andrewbog@74-94-80-158-Minnesota.hfc.comcastbusiness.net) has joined #ceph
[20:45] <andrewbogott> I've set up a 2-node cephfs volume for testing/benchmarking purposes, and it is quite a bit slower than I was hoping. Can anyone suggest ways I might tune for performance? Or, is there a guide about that someplace?
[20:46] <andrewbogott> I already have my journal pointed at a separate raid.
[20:46] <nhm_> andrewbogott: what kind of performance are you seeing?
[20:46] <nhm_> andrewbogott: and what kind of hardware are you testing on?
[20:46] * nhm_ is now known as nhm
[20:47] <andrewbogott> nhm: benchmarks (gluster vs. cephfs) are here: http://pastebin.com/GHAzkEAC
[20:47] * andrewbogott digs for hardware stats...
[20:49] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:50] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:50] <andrewbogott> nhm_: You're wondering about cpu, memory, drive speeds, what? All of the above?
[20:50] <nhm> andrewbogott: mostly network, data disk, journal disk
[20:50] <andrewbogott> nhm_: for what it's worth, local storage benchmarks run in about 1/3 the time.
[20:51] <nhm> andrewbogott: also, number of osds
[20:51] <andrewbogott> Each server has 8 300g drives. 6 drives make up a raid-10 used for data, 2 drives make a raid-1 used for the journal.
[20:51] <andrewbogott> Two osds, replication 2.
[20:52] <andrewbogott> The servers are on the same rack, I can only assume that networking isn't the bottleneck.
[20:54] <nhm> andrewbogott: I'm not really familiar with sysbench. This is buffered IO? those first tests are 2KB block size writes?
[20:57] <andrewbogott> nhm: The read/writes are at random positions, so I expect that buffering is moot. The benchmark says that 'block size' is 16k but I don't know quite what that means.
[20:59] <nhm> andrewbogott: you may want to use a tool like iostat or collectl to see what the request sizes look like to the underlying osd data disk and how much IO wait time there is.
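The kind of view nhm is suggesting, with either tool:

  iostat -x 1      # watch avgrq-sz, await and %util for the osd data disk
  collectl -sD     # per-disk detail view in collectl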
[21:00] <nhm> andrewbogott: you may also want to try with a single OSD and no replication and see how that affects the results.
[21:01] <andrewbogott> nhm: If it's faster without replication, does that tell me how to make it faster with replication, subsequently?
[21:02] <andrewbogott> (Sorry, I don't know enough to understand how these pieces fit together.)
[21:04] <nhm> andrewbogott: It might provide a clue. One of the things that can happen with ceph is that if a request on 1 OSD is slow the other osd(s) can end up paralyzed and slow the whole write process down. This might not be happening here (less likely with 2 OSDs), but with really small random IOs there could be some bad behavior going on.
[21:04] <nhm> andrewbogott: That looks like a pretty brutal test for just about any distributed filesystem.
[21:05] <andrewbogott> nhm: Yep, it's brutal :) Hoping to use ceph as VM storage, although probably RBD is a better choice there.
[21:06] <nhm> andrewbogott: we do have a lot of tunables regarding queue sizes and maximum queued bytes at various different levels that could help if that's what's going on.
[21:07] <nhm> andrewbogott: if it turns out that a lot of really small IOs are making their way to the OSDs, then the underlying FS may be doing a lot of metadata work and can't keep up.
[21:08] <nhm> That's actually something I'm in the middle of investigating right now.
[21:09] <nhm> if you are on XFS, increasing the journal size may help. Oh, also, are you mounting the underlying filesystem with noatime?
[21:11] <andrewbogott> nhm: Yes, using XFS. Right now journal size is '1000'... what do you suggest as an example of 'increasing'?
[21:12] <nhm> andrewbogott: sorry, I meant the XFS journal
[21:12] <andrewbogott> Ah, ok.
[21:12] <andrewbogott> I will try noatime first.
[21:12] <nhm> andrewbogott: Increasing the ceph journal can help, but if the OSD disks are slow eventually the journal will get too far ahead of the OSD data disk and you'll see a bunch of stalled writes from the client.
[21:13] <nhm> andrewbogott: I think the max XFS journal size is a couple of GB, but I have no idea yet what the optimal size is if the workload results in a lot of metadata updates. I'm still working that out.
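The two XFS-side knobs from this exchange, as a sketch; device, mount point, and log size are examples, and the log size can only be set at mkfs time:

  mkfs.xfs -f -l size=1024m /dev/sdb1
  mount -o noatime /dev/sdb1 /srv/osd.0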
[21:18] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[21:18] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[21:21] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit ()
[21:22] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[21:50] <jefferai> You guys seem to heavily promote Chef for installing Ceph, but (and I say this having heavily used Chef in the past) it seems like for setting Ceph up on, say, 12 nodes it's more trouble than it's worth
[21:50] <jefferai> or am I missings omething?
[21:50] <jefferai> *missing something
[22:00] <iggy> jefferai: depends, if you've got chef deployed already it's not bad, but yeah for a one off system it could be overkill
[22:01] <iggy> they are also planning on trying to get support for other tools in tree, but since they are using chef internally, that came first
[22:02] <jefferai> Sure
[22:03] <iggy> for smaller installs, it's probably just as easy to do keys and use the "init" script
[22:08] <Tv_> iggy: more like, chef came first because we saw that was where the significant userbase was, at that moment
[22:09] <Tv_> iggy: i expect Juju to come strong later, because Canonical is actually putting resources into that one
[22:09] <Tv_> iggy: i personally want to write a "just scripts" thing that lets you trigger all the relevant actions manually, over ssh, and replace mkcephfs with that
[22:10] * izdubar (~MT@c-50-137-1-13.hsd1.wa.comcast.net) has joined #ceph
[22:10] <Tv_> iggy: the rest, puppet etc, will be community contributions, as far as i can tell
[22:10] <Tv_> to me, this is more about the underlying product features that enable good deployments, than writing specifically chef cookbooks
[22:11] <Tv_> jefferai: chef is one of those things where learning it is painful, but from there on it can help; if you haven't learned it, ceph might not be the best thing to start with
[22:11] <Tv_> a traditional load balancer -> web app -> database deployment has way more examples out there
[22:15] <jefferai> Tv_: I know, as I said I've heavily used Chef in the past
[22:16] * MarkDude (~MT@c-50-137-1-13.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[22:16] <jefferai> But that also means I know how much work it is to set up :-)
[22:18] <iggy> damn, i was just thinking about rolling out chef :/
[22:19] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:19] <jefferai> iggy: by all means, do it
[22:20] <jefferai> if you have a lot of machines and they are basically heterogenous, it can be very useful
[22:20] <jefferai> erm
[22:20] <jefferai> basically homogeneous, I mean
[22:21] <jefferai> When you have 100 web servers each with a different IP balancing load with identical configurations, it's great
[22:21] <jefferai> it's a little less useful if every one of your machines needs slight tweaks
[22:21] <iggy> it makes sense for us, we have 4 clusters
[22:22] <ryann> So I'm on the same page; Say I have the journal for an OSD on an SSD. I decide to stop the entire cluster for some reason. Does ceph keep any data in that journal, or has that data been written to disk? In the event that I wish to replace that SSD, do I have to back up that journal?
[22:22] <iggy> but yeah, we have about 10 other systems that are pretty random
[22:23] <iggy> those could be a pita
[22:24] <jefferai> iggy: yeah, for those I've found Chef to be less useful, because you have to write all the custom stuff anyways
[22:24] <gregaf> ryann: you can't just toss out the journal; its whole purpose is persisting data in a controlled way so that Ceph can maintain data consistency across reboots.
[22:24] <iggy> or I have to clue-by-four people who change stuff randomly without telling people
[22:24] <gregaf> but if the OSD is turned off, you can flush the journal out and then move it to a different place and replace the disk or whatever.
[22:25] <ryann> gregaf: Thanks. Looking up how to flush the journal now...
[22:28] <ryann> gregaf: Got it. Thanks!
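The flush gregaf describes maps onto a ceph-osd flag; a sketch for a single OSD (id 0 and the init invocation are examples), with the daemon stopped first:

  service ceph stop osd.0
  ceph-osd -i 0 --flush-journal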
[22:32] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) has joined #ceph
[22:33] <fghaas> gregaf: I hear from andreask you asked for some sparring re my ceph bof?
[22:33] <gregaf> lol
[22:33] <gregaf> I just heard you sat them all through a 2.5 hour training session :p
[22:33] <gregaf> apparently it went well, but I was surprised you had anything so organized!
[22:33] <gregaf> all the BoFs I've been to were 30-60 minute things without a real agenda
[22:34] <fghaas> it was a 1 hour bof, it's just that people didn't want to leave for 3 hours
[22:34] <gregaf> nice
[22:34] <fghaas> dude, you callin me disorganized? you messin with me?
[22:34] <gregaf> I've been trying to fact-check rturk's OSCon reports since I don't believe him ;)
[22:35] <sagewk> gregaf: can you look at patches 5 and 6 at https://github.com/ceph/ceph/commits/wip-osd ?
[22:35] <gregaf> you have a nice crowd?
[22:35] <gregaf> sagewk: yeah
[22:35] <sagewk> thanks!
[22:35] <gregaf> ermm, which ones are 5 and 6?
[22:35] <fghaas> why, because rturk talks about hotels with a billion rooms and 100k of them on fire?
[22:35] <sagewk> the mon ones, 5th and 6th on the list
[22:35] <gregaf> the mon ones?
[22:35] <gregaf> ah, from the top
[22:36] <gregaf> I was trying to count the wrong way and wasn't sure what the start point was
[22:39] <gregaf> sagewk: well, I was wondering how they accidentally didn't ignore boot messages; that would do it...
[22:39] <gregaf> both patches look good to me
[22:40] <gregaf> fghaas: yeah; I think a hotel with rooms on fire all the time counts as a failure no matter how big it gets
[22:40] <sagewk> great, can you look at #4 (osd tick()) too?
[22:40] <fghaas> gregaf: build one with 1e+9 rooms and then we'll talk
[22:41] <gregaf> "osd: avoid misc work"?
[22:41] <mikeryan> "statistically insignificant number of rooms on fire"
[22:41] <mikeryan> i'd stay in a hotel that listed that as a feature
[22:41] <sagewk> yeah
[22:41] <fghaas> see, there ya go
[22:42] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:43] <gregaf> well, the magic of gitk "Ignore space change" says you added an if (is_active()) block and moved the tick interrupt out of that block... let me check what tick actually does
[22:44] <ryann> gregaf: To future-proof my design a little bit more: Say I stopped the cluster, and flushed the journal on one of my osd's, replaced faulty journal disk. Do I need to execute a --mkjournal once again or will the daemon automatically produce the journal file again?
[22:46] <gregaf> sagewk: do we have anything besides tick() keeping mon sessions alive?
[22:46] <sagewk> the monclient sends keepalives
[22:46] <sagewk> osd is just piggybacking on the ms_handle_connect() to send its own stuff in each session
[22:47] <gregaf> are they turned on for the OSD? I thought this OSD timeout code was fairly recent
[22:51] <gregaf> sagewk: yeah, looks good; I was just feeling a little paranoid about the monitor connection but I don't think that'll be a problem
[22:53] <sagewk> gregaf: k thanks
[22:54] <gregaf> ryann: I'm not sure; it shouldn't be too hard to test though or somebody else in the channel might know
[22:55] <ryann> gregaf: Thanks, I got it. I trashed it once, and got it working the second time. I'm out of your hair...
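And a hedged sketch of how that last step is usually finished, with the same example id; after the old journal has been flushed and the SSD swapped (update 'osd journal' in ceph.conf if the path changed):

  ceph-osd -i 0 --mkjournal    # recreate the journal on the new device
  service ceph start osd.0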
[22:57] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[22:58] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:06] * fghaas (~florian@91-119-129-178.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[23:16] * adjohn is now known as Guest703
[23:16] * Guest703 (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Read error: Connection reset by peer)
[23:16] * _adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[23:16] * _adjohn is now known as adjohn
[23:18] <wido> for the record, we did remove the console proxy didn't we?
[23:18] <wido> I still see it in the tree and in a couple of files, but I don't see it getting packaged anymore
[23:18] <wido> not in the RPM's or DEB's
[23:20] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[23:21] <wido> uh, wrong channel :) Heading over to #cloudstack ....
[23:22] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[23:48] * andrewbogott (~andrewbog@74-94-80-158-Minnesota.hfc.comcastbusiness.net) Quit (Quit: andrewbogott)
[23:58] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.