#ceph IRC Log

IRC Log for 2012-01-11

Timestamps are in GMT/BST.

[0:25] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:26] * jmlowe (~Adium@mobile-198-228-227-250.mycingular.net) has joined #ceph
[0:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:38] * spadaccio (~spadaccio@213-155-151-233.customer.teliacarrier.com) Quit (Quit: WeeChat 0.3.7-dev)
[0:38] * jmlowe (~Adium@mobile-198-228-227-250.mycingular.net) has left #ceph
[0:41] * adjohn is now known as Guest23623
[0:41] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[0:42] * Guest23623 (~adjohn@208.90.214.43) Quit (Read error: Operation timed out)
[0:51] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:56] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:03] * adjohn (~adjohn@208.90.214.43) Quit (Remote host closed the connection)
[1:04] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[1:50] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:09] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[2:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[3:00] * adjohn (~adjohn@208.90.214.43) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * jojy (~jvarghese@108.60.121.114) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * joshd (~joshd@aon.hq.newdream.net) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * votz (~votz@pool-108-52-121-248.phlapa.fios.verizon.net) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * gohko (~gohko@natter.interq.or.jp) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * nolan (~nolan@phong.sigbus.net) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * grape (~grape@216.24.166.226) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * darkfaded (~floh@188.40.175.2) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * acaos_ (~zac@209-99-103-42.fwd.datafoundry.com) Quit (synthon.oftc.net larich.oftc.net)
[3:00] * yehudasa_ (~yehudasa@aon.hq.newdream.net) Quit (synthon.oftc.net larich.oftc.net)
[3:01] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[3:01] * jojy (~jvarghese@108.60.121.114) has joined #ceph
[3:01] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[3:01] * votz (~votz@pool-108-52-121-248.phlapa.fios.verizon.net) has joined #ceph
[3:01] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[3:01] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[3:01] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[3:01] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[3:01] * grape (~grape@216.24.166.226) has joined #ceph
[3:01] * darkfaded (~floh@188.40.175.2) has joined #ceph
[3:01] * yehudasa_ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[3:01] * acaos_ (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[3:02] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:03] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[3:10] * jojy (~jvarghese@108.60.121.114) Quit (Quit: jojy)
[3:11] * adjohn (~adjohn@208.90.214.43) Quit (Quit: adjohn)
[3:20] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[3:20] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:21] * jojy (~jvarghese@108.60.121.114) has joined #ceph
[3:22] * jojy (~jvarghese@108.60.121.114) Quit ()
[3:30] * lxo (~aoliva@lxo.user.oftc.net) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * elder (~elder@aon.hq.newdream.net) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * nhm (~nh@68.168.168.19) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * ottod (~ANONYMOUS@li127-75.members.linode.com) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * jclendenan_ (~jclendena@204.244.194.20) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * ajm (adam@adam.gs) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * Sargun (~sargun@208-106-98-2.static.sonic.net) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * sagewk (~sage@aon.hq.newdream.net) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * __jt__ (~james@jamestaylor.org) Quit (synthon.oftc.net charm.oftc.net)
[3:30] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:30] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[3:30] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[3:30] * nhm (~nh@68.168.168.19) has joined #ceph
[3:30] * ottod (~ANONYMOUS@li127-75.members.linode.com) has joined #ceph
[3:30] * jclendenan_ (~jclendena@204.244.194.20) has joined #ceph
[3:30] * ajm (adam@adam.gs) has joined #ceph
[3:30] * Sargun (~sargun@208-106-98-2.static.sonic.net) has joined #ceph
[3:30] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[3:30] * __jt__ (~james@jamestaylor.org) has joined #ceph
[3:43] * jmlowe (~Adium@mobile-198-228-227-250.mycingular.net) has joined #ceph
[3:47] <jmlowe> I have an off topic problem with linux 3.2 and the cciss driver for hp raid controllers, moving from 3.0 to 3.2 I take a 265000% performance hit
[3:48] <ajm> 265000%
[3:48] <ajm> i have a lot of 3.2 issues
[3:48] <jmlowe> that is correct
[3:49] <ajm> its not even stable on my desktop yet (where I test things)
[3:49] <ajm> so I won't even try any servers at all
[3:49] <jmlowe> 212 MB/s to 80 KB/s as measured by hdparm
[3:49] <ajm> wow
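The 265000% figure follows from the two hdparm readings quoted above. A minimal sketch of the arithmetic, assuming decimal units (1 MB = 1000 KB), which the log does not state; the variable names are illustrative only.

```python
# Throughput figures quoted in the log (hdparm readings).
before_kb_per_s = 212 * 1000  # 212 MB/s, assuming 1 MB = 1000 KB
after_kb_per_s = 80           # 80 KB/s

# How many times slower the second reading is than the first.
slowdown_factor = before_kb_per_s / after_kb_per_s

# The same ratio expressed as a percentage.
slowdown_percent = slowdown_factor * 100

print(f"{slowdown_factor:.0f}x slower ({slowdown_percent:.0f}%)")
```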
[3:50] <Kioob> btrfs on 3.2 throw a lot of errors on my setup
[3:51] <Kioob> I add to switch to 3.1.*
[3:51] <Kioob> s/add/had/
[3:52] <jmlowe> my btrfs file systems are going corrupt on me for no good reason, hence the test of 3.2
[3:58] <jmlowe> I'm not missing anything obvious am I?
[4:00] <jmlowe> only thing I can see that is drastically different is an option to turn off marshaling of interrupts, before you couldn't turn it off but it is on by default so that behavior shouldn't have changed
[4:06] <ajm> do you get odd load during the tests
[4:07] <ajm> like system load / cpu usage way higher than it should be
[4:08] <jmlowe> let me check
[4:14] <jmlowe> WTF! problem seemingly went away, terminal history tells me I'm not insane
[4:15] <ajm> was the battery on the adapter charging or something
[4:16] <jmlowe> I don't think so, it's been powered on for weeks with a healthy battery status
[4:23] <jmlowe> that should only affect writes, no bbu will turn off write back
[4:25] <jmlowe> maybe the card defaults to safe and the driver to performant so you have to modprobe with the safe off to set the card and reboot to make it take effect?
[4:43] <jmlowe> So has anybody ever seen this "ERROR: current/ volume data version is not equal to snapshotted version."?
[5:12] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[5:24] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[5:24] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[5:26] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit ()
[5:29] * mike3 (~mike@S01060023bee96928.vs.shawcable.net) has joined #ceph
[5:29] * mike3 (~mike@S01060023bee96928.vs.shawcable.net) Quit (Excess Flood)
[5:29] * mike3 (~mike@S01060023bee96928.vs.shawcable.net) has joined #ceph
[5:35] * mike3 (~mike@S01060023bee96928.vs.shawcable.net) Quit (autokilled: Do not spam. Mail support@oftc.net with questions (2012-01-11 04:35:02))
[6:00] <jmlowe> *sigh* performance problem is back
[6:02] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:38] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[7:45] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[8:45] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:51] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:43] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[9:43] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[9:43] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[10:01] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:48] * andreask (~andreas@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[10:53] * spadaccio (~spadaccio@213-155-151-233.customer.teliacarrier.com) has joined #ceph
[11:07] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[11:07] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:13] * andreask1 (~andreas@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[11:13] * andreask (~andreas@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Read error: Connection reset by peer)
[11:16] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[11:16] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:37] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[11:38] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[12:14] * jmlowe (~Adium@mobile-198-228-227-250.mycingular.net) Quit (Quit: Leaving.)
[12:16] * jmlowe (~Adium@mobile-198-228-227-250.mycingular.net) has joined #ceph
[12:17] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[12:38] * andreask1 (~andreas@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[13:27] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[13:28] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[13:28] <jmlowe> found something significant "sched: RT throttling activated"
[13:31] <dwm_> As in real-time processes?
[13:31] <jmlowe> that's what's in the logs
[13:32] <jmlowe> leads me to trying to figure out what is here https://www.kernel.org/doc/Documentation/scheduler/sched-rt-group.txt
[13:33] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:34] <dwm_> "when the rt_runtime budget is exceeded, the kernel silently stops
[13:34] <dwm_> scheduling RT tasks. there is no way to distinguish this from
[13:34] <dwm_> a task taking very long to complete."
[13:34] <jmlowe> and then things like disk access times go through the roof
[13:35] <jmlowe> which is what I'm seeing
[13:35] <dwm_> Many kernel threads are RT-priority..
[13:36] <jmlowe> btrfs threads?
[13:38] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:38] <dwm_> 'migration' and watchdogs.
[13:39] <dwm_> Migration/N are N real-time workers to migrate processes between CPUs.
[13:42] <jmlowe> if I read this correctly setting the runtime to -1 eliminates throttling?
[13:55] <darkfaded> how can a kernel stop serving realtime processes? (thanks for the link)
[13:56] <darkfaded> ah, just per group of processes. that's actually a nice document
[13:56] <jmlowe> hmm, for me that may just be a symptom
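The throttling discussed here is governed by two sysctls described in the linked sched-rt-group.txt document: sched_rt_period_us and sched_rt_runtime_us, where a runtime of -1 means no limit. The following is a rough sketch for inspecting the current setting on a Linux host; the helper names rt_budget and read_int are hypothetical, and the script only reads the values, it does not change them.

```python
from pathlib import Path

PROC = Path("/proc/sys/kernel")


def rt_budget(runtime_us: int, period_us: int) -> str:
    """Describe the share of each period available to real-time tasks."""
    if runtime_us == -1:
        return "RT throttling disabled (runtime is -1)"
    share = runtime_us / period_us
    return f"RT tasks may use {share:.0%} of every {period_us} us period"


def read_int(name: str) -> int:
    """Read one integer sysctl from /proc/sys/kernel."""
    return int((PROC / name).read_text().strip())


if __name__ == "__main__":
    try:
        runtime = read_int("sched_rt_runtime_us")
        period = read_int("sched_rt_period_us")
    except (OSError, ValueError):
        # Non-Linux hosts or restricted containers may not expose these files.
        print("sched_rt_* sysctls not readable on this system")
    else:
        print(rt_budget(runtime, period))
```

As jmlowe notes, the throttling message may only be a symptom, so lifting the limit would not necessarily address whatever is consuming the real-time budget.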
[14:01] * andreask (~andreas@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[14:30] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[14:31] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[14:36] * raso (~raso@debian-multimedia.org) Quit (Quit: WeeChat 0.3.6)
[14:36] * jmlowe (~Adium@mobile-198-228-227-250.mycingular.net) Quit (Quit: Leaving.)
[14:41] * andreask (~andreas@85-127-93-41.dynamic.xdsl-line.inode.at) has left #ceph
[14:49] <spadaccio> hi, can anybody tell me what's the expected release date for 0.40?
[14:58] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Quit: Leaving)
[14:59] * BManojlovic (~steki@93-87-148-183.dynamic.isp.telekom.rs) has joined #ceph
[14:59] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[15:02] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:03] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:29] * todin (tuxadero@kudu.in-berlin.de) Quit (Read error: Operation timed out)
[15:29] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[16:09] * rosco_ (~r.nap@188.205.52.204) Quit (Quit: leaving)
[16:09] * rosco (~r.nap@188.205.52.204) has joined #ceph
[16:11] * _Tassadar (~tassadar@tassadar.xs4all.nl) Quit (Remote host closed the connection)
[16:11] * _Tassadar (~tassadar@tassadar.xs4all.nl) has joined #ceph
[16:16] * rosco (~r.nap@188.205.52.204) Quit (Quit: leaving)
[16:16] * rosco (~r.nap@188.205.52.204) has joined #ceph
[16:39] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:39] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:42] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[16:52] <wido> spadaccio: I think it will be within two weeks or so?
[16:53] <spadaccio> wido: great, thanks!
[16:54] <wido> spadaccio: It might be earlier actually, just checked the tracker and all issues linked to 0.40 are resolved
[16:54] <wido> http://tracker.newdream.net/projects/ceph/roadmap
[16:55] <wido> If you filter for "Completed versions" you'll see 0.40 as 'done'
[16:55] <wido> 0.41 is the next sprint which has open issues
[16:55] <spadaccio> I see.. thanks!
[16:59] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[17:16] * gregaf (~Adium@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[17:19] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[17:19] * sjust (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[17:19] * elder (~elder@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[17:19] * yehudasa_ (~yehudasa@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[17:19] * sagewk (~sage@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[17:22] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) has joined #ceph
[17:23] * adjohn (~adjohn@70-36-197-80.dsl.dynamic.sonic.net) Quit ()
[17:39] <jmlowe> I've moved my replication up to 4 from 2, (I'm about to do dangerous things), and I'm stuck in a degraded state, is this because I need to adjust my crush map?
[17:41] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:45] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:56] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:58] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[17:59] * alexk (~alexk@cadlab.kiev.ua) has joined #ceph
[17:59] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[17:59] * fronlius (~fronlius@testing78.jimdo-server.com) Quit ()
[17:59] <alexk> Hi All
[18:00] <alexk> Who can I ask some questions about stability of RADOS for production use?
[18:00] <jmlowe> probably not me, I've been having some trouble for the past week
[18:02] <alexk> So, does anyone here have real-world experience deploying Ceph cluster?
[18:03] <jmlowe> I've been using rbd to back kvm virtual machines
[18:03] <alexk> I am looking for a solution and RADOS/libradosgw seem to be a perfect fit.
[18:03] <jmlowe> haven't used the gateway
[18:04] <alexk> do you think it is even worth trying? I am looking for backend for my system to store 100+TB of data and 2+ billion files
[18:04] <alexk> I do not need Ceph frontend...
[18:04] <wonko_be> can you lose your data?
[18:04] <alexk> no, absolutely not :)
[18:05] <wonko_be> then step away from ceph
[18:05] <alexk> what about RADOS?
[18:05] <wonko_be> all part of ceph, afaik
[18:05] <alexk> I mean without MDS and all that stuff?
[18:05] <wonko_be> yes, I know what rados is
[18:05] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[18:06] <jmlowe> it's better with mds, but ultimately you are still relying on btrfs (bitten twice this week by it)
[18:06] <wonko_be> yeah, I had to reinitiate my ceph today, btrfs didn't really survive my kernel upgrade
[18:06] <wonko_be> *poof* data gone :-)
[18:06] <darkfaded> seems his irc was on btrfs too
[18:07] <wonko_be> lol
[18:07] <jmlowe> looks like the 3.0 kernel shits the bed as my old boss used to say
[18:07] <jmlowe> in terms of btrfs that is
[18:07] <darkfaded> unstable filesystem, period
[18:08] <wonko_be> I had problems (performance and stability) using multiple devices
[18:08] <wonko_be> striping the devices with md, and then feeding it to btrfs as one device seems to solve a lot
[18:08] <jmlowe> interesting
[18:09] <alexk> I would assume that RADOS (being ground for ceph for a long time) should be pretty stable if you don't use metadata services?
[18:09] <darkfaded> #should i happen to find a bigger apartment where there is "server noise isolation" i'm moving my home stuff off linux onto some unix box
[18:09] <darkfaded> oh yes
[18:09] <darkfaded> this isn't a shell
[18:09] <darkfaded> i decided to not write that into channel and uncommented it
[18:09] <darkfaded> time to go home
[18:10] <alexk> would you guys give me any hint how to contact developers/support? I would like to find out if I can use anything from this project for my problem.
[18:10] <wonko_be> just sit idle in this channel for a while
[18:10] <darkfaded> alexk: just wait here until sage or greg are there, for example
[18:10] <wonko_be> ask your question when you see some discussions going on
[18:11] <jmlowe> well yes, but ultimately your data sits on a filesystem (btrfs) that doesn't have a working fsck, if you lose power, have a deadlock and have to push the button, or there is some bug in the kernel (there seem to be a lot of them), then your data is gone
[18:11] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[18:11] <darkfaded> but if it's production data you want to put on it and the website has this big note saying experimental, then this is problematic
[18:11] <wonko_be> jmlowe: you can use ext4 with xattrs enabled afaik
[18:11] <wonko_be> that is my next option
[18:11] <alexk> ok, thanks for the hint... will be waiting for sage or greg
[18:11] <darkfaded> like... you know, there is no scenario that doesn't spell "fully liable"
[18:11] <jmlowe> can you? I thought that wasn't really usable
[18:12] <wonko_be> it's beta how beta is meant to be used, not the google-beta
[18:12] <wonko_be> jmlowe: afaik it is possible
[18:12] <darkfaded> jmlowe: i used to run it on ext4, was fine
[18:12] <wonko_be> it is in the docs
[18:12] <darkfaded> no snapshots of course
[18:12] <alexk> This is sad as the project sounds very promising.
[18:12] * sjust (~sam@99.139.62.62) has joined #ceph
[18:13] <wonko_be> alexk: it is a work in progress
[18:13] <darkfaded> alexk: yes, it IS the perfect designed filesystem, but it's in your own interest for it to be finished when you use it :)
[18:13] <alexk> I understand. My project has been a work in progress for the last 6 years :)
[18:13] <darkfaded> hehe
[18:13] <alexk> but that does not stop us from getting some customers
[18:15] <alexk> I thought if I only needed RADOS for pure distributed and scalable BLOB storage, there is a good chance it could work for me.
[18:15] <wonko_be> it is the lowest level, so most work on it should be done, and most bugs should be fixed
[18:15] <jmlowe> It kicks ass, best thing on the market (I've run lustre and gpfs as well as a bastard drbd solution) when everything is working
[18:15] <wonko_be> but most is not "all"
[18:16] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:16] <Kioob> is there a way to limit access to a RADOS block device to only one client at a time ?
[18:17] <jmlowe> hmm, but I could whack each osd and let it replicate back onto ext4 then later when btrfs doesn't ship with Maalox I could reverse the process
[18:18] <wonko_be> jmlowe: yes, I have done this (to move from btrfs-multi-dev to btrfs-on-md)
[18:18] <wonko_be> stop your osd, get your monmap, save the keyring, reformat, and create the fs again... the instructions are in the wiki under "recovering from a failed osd"
[18:19] * gregaf (~Adium@99.139.62.62) has joined #ceph
[18:19] <jmlowe> oh, I'm well aware, I think I only have 1/12 btrfs backing partitions that pass a fsck
[18:19] <wonko_be> jmlowe: http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD
[18:21] * elder (~elder@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:21] <jmlowe> I just keep whacking osd's and rebuilding, wasn't aware it was bugs in 3.0 that were causing all of my problems, I'm a little bitter this morning after lying in bed awake worrying about how many corrupt filesystems I would have when I got up
[18:22] * fronlius (~fronlius@d217151.adsl.hansenet.de) has joined #ceph
[18:25] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:25] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:25] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:25] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:26] * Tv (~Tv|work@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:26] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:26] <alexk> is there an official email address of the ceph developers? I cannot see any working links on the website.
[18:26] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:27] <Tv> alexk: the mailing list
[18:27] <Tv> on the website under the "Mailing Lists and IRC" tab
[18:27] <jmlowe> so let me ask my question again since there are some new people in here, I'm about to do some dangerous things and I bumped my replication up from 2 to 4, but the replication doesn't seem to be taking place, what am I missing?
[18:28] * gregaf (~Adium@99.139.62.62) has joined #ceph
[18:28] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:28] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:30] * sjust (~sam@99.139.62.62) has joined #ceph
[18:30] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:30] * Tv (~Tv|work@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:30] <alexk> Tv: Thanks, but that is a public list. I can also see personal email addresses of the guys at dreamhost, but not sure that is the best way to get the information I need.
[18:30] * gregaf (~Adium@99.139.62.62) has joined #ceph
[18:30] * Tv (~Tv|work@99.139.62.62) has joined #ceph
[18:31] <Tv> jmlowe: can you pastebin "ceph pg dump" output please?
[18:33] <jmlowe> sure
[18:33] * Tv (~Tv|work@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:33] <jmlowe> doh
[18:35] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:35] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:35] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:35] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:36] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:37] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:37] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:37] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:38] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[18:38] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:38] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:38] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:38] * elder (~elder@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:38] * elder (~elder@99.139.62.62) has joined #ceph
[18:38] * sjust (~sam@99.139.62.62) has joined #ceph
[18:39] * gregaf (~Adium@99.139.62.62) has joined #ceph
[18:39] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:44] <jmlowe> well it's too big for pastebin, how about this https://slashtmp.iu.edu/files/download?FILE=jomlowe%2F76293bUD7EF
[18:44] <jmlowe> password: cephpgdump
[18:45] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:46] <jmlowe> I can't believe they make us use passwords now, really doesn't do much for a service that got by for a decade without it just fine
[18:47] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:47] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:47] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:47] * sjust (~sam@99.139.62.62) has joined #ceph
[18:47] <sjust> jmlowe: what command did you use to up the replication level?
[18:47] <sjust> also, could you post ceph osd dump?
[18:48] <jmlowe> ceph osd pool set data size 4
[18:49] * BManojlovic (~steki@93-87-148-183.dynamic.isp.telekom.rs) Quit (Remote host closed the connection)
[18:50] * gregaf (~Adium@99.139.62.62) has joined #ceph
[18:50] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:50] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:50] * sjust (~sam@99.139.62.62) has joined #ceph
[18:52] * sjust1 (~sam@99.139.62.62) has joined #ceph
[18:52] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:52] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:52] * gregaf1 (~Adium@99.139.62.62) has joined #ceph
[18:52] <jmlowe> http://pastebin.com/BEmjMMgV
[18:52] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:53] * gregaf (~Adium@99.139.62.62) has joined #ceph
[18:53] * sjust1 (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:53] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:53] * gregaf1 (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:54] * sjust (~sam@99.139.62.62) has joined #ceph
[18:54] <sjust> jmlowe: you have them arranged into two racks, right?
[18:54] <jmlowe> one rack
[18:54] <jmlowe> two hosts, 6 disks per host
[18:54] <sjust> could you pastebin your crushmap?
[18:54] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:56] * sagewk1 (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:56] * elder (~elder@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:56] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:56] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:56] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:56] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:56] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:56] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:56] * sjustlaptop (~sam@mc32736d0.tmodns.net) has joined #ceph
[18:57] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[18:57] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:57] * sagewk1 (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:57] * gregaf (~Adium@99.139.62.62) has joined #ceph
[18:58] <jmlowe> http://pastebin.com/9igWiNS2
[18:58] * sagewk (~sage@99.139.62.62) has joined #ceph
[18:59] * joshd (~joshd@99.139.62.62) has joined #ceph
[18:59] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[18:59] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:59] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[18:59] <sjustlaptop> jmlowe: you may have hit a crush issue, looking
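One thing worth ruling out in a layout like the one described (two hosts, six disks each, pool size 4) is a placement rule that takes each replica from a different host: with only two hosts, such a rule can never find four targets. The log does not confirm that this was the cause. The toy model below is not CRUSH and does not use any Ceph API; the function place_replicas and the host and OSD names are hypothetical.

```python
from itertools import chain


def place_replicas(hosts: dict, size: int, distinct_hosts: bool) -> list:
    """Pick up to `size` OSDs, optionally allowing at most one per host."""
    if distinct_hosts:
        # One candidate per host: the replica count is capped by the host count.
        candidates = [osds[0] for osds in hosts.values() if osds]
    else:
        # Any OSD is a candidate, regardless of which host it lives on.
        candidates = list(chain.from_iterable(hosts.values()))
    return candidates[:size]


# Two hosts with six OSDs each, as described in the log.
hosts = {
    "host-a": [f"osd.{i}" for i in range(0, 6)],
    "host-b": [f"osd.{i}" for i in range(6, 12)],
}

per_host = place_replicas(hosts, 4, distinct_hosts=True)
per_osd = place_replicas(hosts, 4, distinct_hosts=False)

print(f"one replica per host: {len(per_host)} of 4 placed")
print(f"any OSD allowed:      {len(per_osd)} of 4 placed")
```

In the toy model the per-host variant stops at two replicas, which would leave a size-4 pool permanently short; whether the real crush map behaves this way depends on its rules.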
[19:00] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:00] * joshd (~joshd@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:00] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:00] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:00] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:01] * joshd (~joshd@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:01] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:01] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:01] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:01] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:02] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:02] * joshd (~joshd@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:02] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:03] <jmlowe> Finally, I'm getting a bit tired of the kernel developers' bugs, any change is refreshing
[19:03] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:03] * joshd (~joshd@99.139.62.62) has joined #ceph
[19:03] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:03] <gregaf> whee, the internet is broken!
[19:04] <jmlowe> it's like the early nineties in here
[19:04] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[19:04] <gregaf> darkfaded: all: actually you can use Ceph snapshots without using btrfs as the backing store — it's not as efficient since Ceph needs to do manual copy-on-write, but it does work!
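To illustrate what "manual copy-on-write" means in general terms: taking a snapshot is cheap, and the old contents are copied aside only when the object is next overwritten. The class below is a toy in-memory model under that assumption, not Ceph's implementation; CowObject and its methods are hypothetical names.

```python
class CowObject:
    """Toy object that preserves old contents on the first write after a snapshot."""

    def __init__(self, data: bytes = b""):
        self.head = data     # current contents
        self.clones = {}     # snapshot id -> contents preserved for that snapshot
        self.pending = []    # snapshots taken since the last write

    def snapshot(self, snap_id: int) -> None:
        # Nothing is copied at snapshot time.
        self.pending.append(snap_id)

    def write(self, data: bytes) -> None:
        # Preserve the old contents for every snapshot that still shares them.
        # On a real disk this is the extra data copy that costs time and space.
        for snap_id in self.pending:
            self.clones[snap_id] = self.head
        self.pending.clear()
        self.head = data

    def read(self, snap_id=None) -> bytes:
        if snap_id is None:
            return self.head
        # A snapshot with no clone yet still shares the current contents.
        return self.clones.get(snap_id, self.head)


obj = CowObject(b"v1")
obj.snapshot(1)
obj.write(b"v2")
print(obj.read(), obj.read(1))
```

A filesystem with native snapshots can share the unchanged data instead of copying it, which is the efficiency difference gregaf refers to.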
[19:04] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:05] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:05] * joshd (~joshd@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:05] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:05] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:05] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:05] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:05] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:05] <gregaf> the problems you're going to run into with ext4 involve the use of xattrs, which means it's not a very good choice for radosgw — but if you're not using that and you don't do too many snapshots or have super-deep directory trees it's okay (and xfs shouldn't have any problems with the xattrs, but I dunno how its stability is under Ceph)
[19:05] <jmlowe> gregaf: is there anything else I should know about using ext4, I'm wedged between 3.0 btrfs bugs and 3.2 driver bugs?
[19:06] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:06] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:06] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:06] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:07] <gregaf1> and sorry if we miss messages — our proper internet link is down here and there seem to be some issues with our backup and this channel; I'm hitting refresh on wido's irc logs just to find out if my messages made it out or not
[19:07] * joshd (~joshd@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:07] <sjustlaptop> jmlowe: there is a limit to the total size of xattrs on an object, this can pose a problem for ceph bookkeeping
[19:08] <sjustlaptop> for ext4 that is
[19:08] <sjustlaptop> if you avoid snapshots and long object names I think you won't hit it though
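A rough way to reason about the limit sjustlaptop mentions is to add up the name and value sizes of an object's xattrs and compare the total against a budget. The sketch below assumes a 4096-byte budget purely for illustration; the real ext4 limit depends on inode and block size and is not given in the log. The attribute names are hypothetical.

```python
# Assumed budget in bytes, for illustration only; not a measured ext4 limit.
ASSUMED_XATTR_BUDGET = 4096


def xattr_bytes(attrs: dict) -> int:
    """Total bytes used by xattr names and values (per-entry overhead ignored)."""
    return sum(len(name.encode()) + len(value) for name, value in attrs.items())


def fits(attrs: dict, budget: int = ASSUMED_XATTR_BUDGET) -> bool:
    """Report whether the xattrs stay within the assumed budget."""
    return xattr_bytes(attrs) <= budget


# A small bookkeeping attribute versus one that has grown large.
small = {"user.example.meta": b"x" * 200}
large = {"user.example.meta": b"x" * 200, "user.example.snapinfo": b"y" * 5000}

print(fits(small), fits(large))
```

On Linux, Python's os.listxattr and os.getxattr can supply real names and values for an existing file if an actual measurement is wanted.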
[19:08] * sjust (~sam@99.139.62.62) has joined #ceph
[19:08] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:09] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:09] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:09] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:09] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:09] * joshd (~joshd@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:10] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:10] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:10] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:11] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:11] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:11] <jmlowe> Anybody know Chris Mason? His last promise for a working fsck.btrfs was back in August; is it ever going to really happen?
[19:12] * yehudasa__ (~yehudasa@99.139.62.62) has joined #ceph
[19:12] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:12] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:12] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Write error: connection closed)
[19:12] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:14] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:14] * gregaf1 (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:14] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:14] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:16] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:16] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:16] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:16] * yehudasa__ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:16] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Write error: connection closed)
[19:16] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:16] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:16] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:17] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:17] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:17] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:18] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:18] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:18] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:18] * joshd (~joshd@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:19] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:19] * joshd (~joshd@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:19] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:19] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:19] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:19] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:20] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:20] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:20] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:21] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:21] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:22] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:22] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:22] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:22] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:22] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:23] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:23] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:23] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:23] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:24] <jmlowe> sjustlaptop: so does it look like a crush problem?
[19:24] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:25] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:25] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:25] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:25] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[19:25] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:26] <alexk> what is the best target OS for ceph? was it ever built and tested on centos?
[19:26] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:26] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:26] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:26] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:27] <dwm_> alexk: By OS, the current implementation targets Linux. By distribution, it's pretty agnostic. There are high-quality .deb packages available for Debian-derived distributions; RPMs also exist, but I understand that having One True Spec File that works well across the different variants is non-trivial.
[19:27] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:27] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:27] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:27] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:27] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:28] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:28] <jmlowe> alexk: I'm real happy with ubuntu 11.10 fwiw
[19:28] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:29] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:29] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:29] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:29] <alexk> ok. I tried centos just now (since I have it at hand), but it would not build right away (probably due to outdated automake version)
[19:30] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:30] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:30] <alexk> for testing and production I would try RHEL first of all
[19:30] * sjustlaptop (~sam@mc32736d0.tmodns.net) Quit (Read error: Connection reset by peer)
[19:31] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:31] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:31] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:32] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:32] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:32] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:32] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[19:32] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:33] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:33] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:34] * sjustlaptop (~sam@mc32736d0.tmodns.net) has joined #ceph
[19:34] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:34] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:34] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:34] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:35] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[19:35] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:35] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:35] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:36] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:36] <sjustlaptop> jmlowe: ok, it's not actually a bug, the problem is that your crushmap specifies that there should be at most one replica per host and you only have two hosts
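The constraint described above can be illustrated with a toy model. This is not the CRUSH algorithm; it only shows that a rule allowing one replica per host cannot place more replicas than there are hosts. The host names, OSD names, and the replica count of 3 are assumptions for illustration.

```python
def one_per_host(cluster, replicas):
    """Pick at most one OSD per host, as a host-level separation rule would."""
    return [osds[0] for osds in list(cluster.values())[:replicas]]

# Hypothetical two-host cluster.
cluster = {"host-a": ["osd.0", "osd.1"], "host-b": ["osd.2", "osd.3"]}
wanted = 3  # assumed replica count; the log does not state it
placed = one_per_host(cluster, wanted)
print(f"wanted {wanted}, placed {len(placed)}: {placed}")
```

With only two hosts the third replica has nowhere to go, which would match the symptom being discussed.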
[19:36] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:36] <jmlowe> that's what I was thinking but I'm not sure how to say you can have up to two copies on one host
[19:37] <jmlowe> do I need an extra step in the rule?
[19:37] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:37] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:37] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:37] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[19:38] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:39] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:39] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:39] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:39] <sjustlaptop> jmlowe: I'm also not sure how to restrict it to at most two replicas per host, looking
[19:39] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:39] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:40] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[19:40] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:40] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:40] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:40] * fronlius (~fronlius@d217151.adsl.hansenet.de) Quit (Quit: fronlius)
[19:40] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:41] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:41] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:41] <jmlowe> or would it be more like 'step chooseleaf firstn 2 type host'
[19:41] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:42] <jmlowe> to grab replicas two at a time and put them on a host
[19:42] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:42] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:42] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:42] * sjust (~sam@99.139.62.62) has joined #ceph
[19:43] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:43] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:43] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:43] <sjustlaptop> it should be closer to 'step choose firstn 2 type host' followed by 'step choose firstn 2 type device'
[19:43] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[19:44] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:44] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:44] <jmlowe> or should that be a 'step choose firstn 2 type host; step choose firstn 0 type device'
[19:44] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[19:45] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:45] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:45] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:45] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:46] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:46] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:46] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:46] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:46] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:47] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:47] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:47] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:47] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:47] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:48] * sjust (~sam@99.139.62.62) has joined #ceph
[19:48] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:48] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:48] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:48] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:49] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:49] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:49] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:49] <sjustlaptop> jmlowe: newline between the steps
[19:49] <sjustlaptop> not sure it'll work, testing it here also
[19:50] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:50] * sagewk (~sage@99.139.62.62) has joined #ceph
[19:51] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:51] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:51] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:51] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:51] * gregaf (~Adium@99.139.62.62) has joined #ceph
[19:51] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:52] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[19:52] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[19:53] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:53] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:53] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:53] <nhm> Are you guys having some networking issues?
[19:53] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:53] <dwm_> nhm: Yeah, apparently their main netlink is down and their backup is flakey as hell.
[19:53] <dwm_> ... as can be observed!
[19:54] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:54] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:54] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[19:54] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:55] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:56] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:56] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:56] <nhm> dwm_: indeed!
[19:56] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:58] * gregaf1 (~Adium@99.139.62.62) has joined #ceph
[19:58] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:58] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Write error: connection closed)
[19:58] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:58] * fronlius (~fronlius@d217151.adsl.hansenet.de) has joined #ceph
[19:58] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:59] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[19:59] * gregaf1 (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:00] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:01] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:01] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:01] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:01] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:02] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:02] * sagewk (~sage@99.139.62.62) has joined #ceph
[20:03] * yehudasa_ (~yehudasa@99.139.62.62) has joined #ceph
[20:03] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:03] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:03] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:04] * gregaf (~Adium@99.139.62.62) has joined #ceph
[20:05] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:05] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:05] * yehudasa_ (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:06] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:06] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:07] * gregaf1 (~Adium@99.139.62.62) has joined #ceph
[20:07] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:07] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:08] * sjust (~sam@99.139.62.62) has joined #ceph
[20:08] * gregaf1 (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:09] * sagewk (~sage@99.139.62.62) has joined #ceph
[20:09] * gregaf (~Adium@99.139.62.62) has joined #ceph
[20:11] * gregaf1 (~Adium@99.139.62.62) has joined #ceph
[20:11] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:11] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:11] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:11] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:11] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:11] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:11] * gregaf1 (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:12] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[20:12] * yehudasa__ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:12] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:13] * gregaf (~Adium@99.139.62.62) has joined #ceph
[20:13] * sagewk (~sage@99.139.62.62) has joined #ceph
[20:14] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:15] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:15] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:15] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:16] * gregaf (~Adium@99.139.62.62) has joined #ceph
[20:17] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[20:17] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:17] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:17] * yehudasa (~yehudasa@99.139.62.62) has joined #ceph
[20:17] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:18] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:18] * yehudasa (~yehudasa@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:18] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:19] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:19] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:19] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:19] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:20] * sagewk (~sage@99.139.62.62) has joined #ceph
[20:20] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:20] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:21] * gregaf (~Adium@99.139.62.62) has joined #ceph
[20:23] * sagewk1 (~sage@99.139.62.62) has joined #ceph
[20:23] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:23] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:23] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:23] * sjust (~sam@99.139.62.62) has joined #ceph
[20:23] * sagewk1 (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:23] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:24] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:24] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:24] * sjust (~sam@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:24] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:24] * gregaf (~Adium@99.139.62.62) has joined #ceph
[20:25] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:25] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:25] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:26] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:26] * yehudasa_ (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:26] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Write error: connection closed)
[20:26] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:26] * gregaf (~Adium@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:27] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:28] * sagewk (~sage@99.139.62.62) has joined #ceph
[20:28] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[20:28] * gregaf (~Adium@99.139.62.62) has joined #ceph
[20:29] * sagewk (~sage@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:29] * gregaf (~Adium@99.139.62.62) Quit (Read error: Connection reset by peer)
[20:29] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:29] * gregaf (~Adium@99.139.62.62) has joined #ceph
[20:30] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) has joined #ceph
[20:30] * gregaf (~Adium@99.139.62.62) Quit ()
[20:30] * yehudasa_ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[20:31] <nhm> tv: I like your presentation from the openstack conference. On the graph on page 49 with all of the MDSes, what was happening at the valleys labelled "many directories" and "same directory"?
[20:31] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[20:33] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[20:34] * sagewk1 (~sage@aon.hq.newdream.net) has joined #ceph
[20:35] * sjust1 (~sam@aon.hq.newdream.net) has joined #ceph
[20:38] <sjustlaptop> jmlowe: "step choose firstn 2 type host\nstep choose firstn 2 type osd" seems to work
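A rough sketch of what the two-step rule above asks for: pick two hosts, then two OSDs inside each, so no host holds more than two copies. The ranking here is a simple hash-sort stand-in, not the real CRUSH selection, and the cluster layout, the pool size of 3, and the truncation to pool size are assumptions of the sketch.

```python
import hashlib

def rank(pg, items):
    """Order items pseudo-randomly but deterministically for a placement group."""
    def score(item):
        return hashlib.sha256(f"{pg}:{item}".encode()).hexdigest()
    return sorted(items, key=score)

def two_step_choose(pg, cluster, hosts_wanted=2, osds_per_host=2, pool_size=3):
    """Mimic 'choose firstn 2 type host' followed by 'choose firstn 2 type osd'."""
    chosen = []
    for host in rank(pg, list(cluster))[:hosts_wanted]:
        chosen.extend(rank(pg, cluster[host])[:osds_per_host])
    # Assumed: only as many candidates as the pool needs are kept.
    return chosen[:pool_size]

# Hypothetical layout: two hosts with six OSDs each.
cluster = {
    "host-a": [f"osd.{i}" for i in range(0, 6)],
    "host-b": [f"osd.{i}" for i in range(6, 12)],
}
placement = two_step_choose(pg=7, cluster=cluster)
print(placement)
```

The two steps yield up to four candidates, so a pool wanting three replicas ends up with two on one host and one on the other.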
[20:38] * sjust (~sam@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[20:38] * yehudasa (~yehudasa@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[20:38] * sagewk (~sage@adsl-99-139-62-62.dsl.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[20:39] <jmlowe> I seem to have trashed my OSDs. I accidentally left 'step choose firstn 0 type host' in, and now they crash even with the original crush map
[20:39] * fronlius (~fronlius@d217151.adsl.hansenet.de) Quit (Quit: fronlius)
[20:41] <nhm> jmlowe: btw, what kind of hardware are you testing on?
[20:42] <jmlowe> 2 x (hp dl180G6 with a P800 to a MSA60 with 12 x 1TB SAS)
[20:43] <jmlowe> 10GigE for networking
[20:43] <nhm> Nice. I'm mostly stuck doing my testing on old junk, and it's tough even getting that.
[20:44] <jmlowe> eventually once I get things nailed down this will be a 4 way ceph cluster
[20:44] <jmlowe> identical hardware
[20:45] <jmlowe> I've got 4x HP dl160G6 to run kvm vm's backed by rbd
[20:46] * sjustlaptop (~sam@mc32736d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[20:47] <nhm> We mostly end up buying some kind of vendor solution and then have no testing infrastructure.
[20:47] <jmlowe> the interesting thing is that I've got a second data center. I'll have 2 ceph osd/mds/mon and 4 vm hosts in each data center; they are linked by 40 Gb/s (4x10 Gb/s) and they have the same vlan in both data centers
[20:48] <Tv> nhm: that image is copied straight from Sage's earlier work, and is explained in there
[20:48] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[20:48] <nhm> Tv: ok, thanks
[20:48] <Tv> nhm: i'd re-explain, but i'd need to look it up as a refresher anyway ;)
[20:49] <Tv> nhm: it's basically exploring the scalability of the mds, clients interacting with lots of directories, and then clients interacting with one directory (that gets fragmented across mdses)
[20:50] <nhm> Tv: looks like everything in those valleys ends up getting funneled to MDS0
[20:51] <Tv> nhm: yeah, and then spread out
[20:51] <jmlowe> so any hope of recovery from this? http://pastebin.com/39E6jAfW
[20:51] <Tv> that's how i remember it
[20:51] <Tv> nhm: mds recovering from overload, first across many dirs, then within a single dir
[20:51] <Tv> nhm: by shedding off load to other mdses
[20:52] <nhm> jmlowe: neat setup. Some of the guys at the last lustre conference were talking about wide-area storage. You could write a paper. :)
[20:53] <jmlowe> We already did
[20:53] <nhm> Tv: ah, neat
[20:53] <nhm> jmlowe: good for you! Where do you work?
[20:53] <jmlowe> http://scholar.google.com/scholar?hl=en&q=simms+lustre&btnG=Search&as_sdt=0%2C15&as_ylo=&as_vis=0
[20:53] <jmlowe> Indiana University
[20:54] <nhm> jmlowe: I'm at the University of Minnesota Supercomputing Institute
[20:54] <jmlowe> excellent, a fellow academic
[20:55] <jmlowe> well I'm staff, but I like to pretend
[20:55] <nhm> same here
[20:56] <nhm> jmlowe: you wrote torque log parsing code. :)
[20:57] <jmlowe> at some time
[20:58] <jmlowe> I seem to rewrite that every couple of years
[20:58] <nhm> jmlowe: Yeah, several people here have all written some variant of the same thing.
[20:58] <nhm> including me. :)
[20:59] <jmlowe> it's like an HPC rite of passage
[21:01] <nhm> jmlowe: Must be. One of my backburnered projects is to correlate moab/torque logs, collectl data, and performance counter metrics for every process that runs on our clusters.
[21:01] <jmlowe> same here
[21:02] <jmlowe> I pump ours from the clusters out to an oracle db using rabbitmq
[21:02] <nhm> nice
[21:02] <nhm> nothing so fancy here, I just have cron scping stuff around.
[21:02] <jmlowe> you should look up Joshi Fullop at NCSA, he is doing some amazing correlation engine stuff
[21:03] <jmlowe> if you run into him tell him you know me
[21:04] <nhm> jmlowe: Will do. Sadly we have become rather isolationist and are focusing less on neat projects like that these days.
[21:05] <nhm> jmlowe: also, the guys down at TACC are starting to do some of this kind of stuff too with taccstats.
[21:06] <jmlowe> I also have mdiag pumping out stuff, but nobody's listening
[21:07] <nhm> Do you get anything beyond what's in the moab logs? I just parse them directly like the torque logs.
[21:07] <nhm> I use it to generate graphs like this: http://www.msi.umn.edu/~mark/msica/mirror_20090928-20100927.png
[21:07] <jmlowe> eh, I never found them very useful for the long term
[21:08] <jmlowe> ooh, pretty
[21:09] <jmlowe> https://github.iu.edu/jomlowe/HPC-Stats
[21:09] <nhm> jmlowe: I used the same tools when I was doing lustre filesystem testing with IOR: http://www.msi.umn.edu/~mark/msica/2GB-block_64MB_directIO_posix_nocache.png
[21:10] <nhm> hrm... looks like I need IU credentials to see that...
[21:10] <jmlowe> really?
[21:10] <nhm> jmlowe: Yeah, "Use your IU credentials to log in. If you are not part of Indiana University, contact rtadmin@indiana.edu to have an account created for you."
[21:11] <jmlowe> what's your email address?
[21:11] <nhm> nhm@clusterfaq.org or mark@msi.umn.edu
[21:13] * lxo (~aoliva@exit1.ipredator.se) has joined #ceph
[21:15] <jmlowe> I'll have to update this remote but https://github.com/jmlowe/HPC-Stats
[21:15] <nhm> jmlowe: btw, the stuff that generated those graphs is here: https://github.com/clusterfaq/nhmca
[21:15] * gregorg_taf (~Greg@78.155.152.6) Quit (Ping timeout: 480 seconds)
[21:15] <nhm> jmlowe: it's rather ugly perl that should probably be rewritten, but it's pretty configurable.
[21:16] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[21:18] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:25] <wido> I found what was causing the memory leak
[21:26] <wido> One of my monitors was still running 0.38, while the whole cluster is running 0.39
[21:26] * fronlius (~fronlius@d217151.adsl.hansenet.de) has joined #ceph
[21:26] <wido> The 0.38 monitor was 'advertising' 0000-000-XX as fsid, which of course was the wrong fsid
[21:27] <wido> That was generating a lot of log entries in the other two monitors, and apparently it also caused them to start eating memory.
[21:27] <wido> I upgraded the monitor to 0.39 and now everything is running happily
[21:31] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[21:41] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[21:45] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has joined #ceph
[22:02] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[22:02] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:06] <nhm> looks like the network connection got fixed. :)
[22:14] <sjust1> jmlowe: could you post the crushmap that caused the crashes?
[22:19] <jmlowe> http://pastebin.com/dKuDBSHL
[22:21] <jmlowe> I broke it by having 'step chooseleaf firstn 0 type osd; step chooseleaf firstn 2 host' ( ; added for clarity)
[22:21] <jmlowe> reverted and still broken
[22:22] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:23] <gregaf> sjust1: so that crush map means the PGs don't map to anybody, right?
[22:24] <sjust1> gregaf: unfortunately, I think it means that we hit an assert due to two chooseleaf directives, one moment
[22:24] <gregaf> ah, okay
[22:25] <gregaf> and it's in the OSD Map that all the OSDs have, so they read it on startup and BOOM
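The rule jmlowe quotes contains two chooseleaf steps, which sjust1 suspects (without confirming it in this log) is what trips the OSDs. A minimal Python sketch of a lint that counts chooseleaf steps in a rule body written in the same text form as the quote. The helper name is hypothetical, and this is only a heuristic over text, not how CRUSH itself validates rules.

```python
def count_chooseleaf_steps(rule_text):
    """Count 'step chooseleaf ...' lines in a CRUSH rule body given as text.

    Steps may be separated by newlines or by ';' as in the quoted rule.
    """
    steps = [s.strip() for s in rule_text.replace("\n", ";").split(";")]
    return sum(1 for s in steps if s.split()[:2] == ["step", "chooseleaf"])


# The rule fragment exactly as quoted in the channel.
rule = "step chooseleaf firstn 0 type osd; step chooseleaf firstn 2 host"
if count_chooseleaf_steps(rule) > 1:
    print("warning: more than one chooseleaf step in this rule")
```

A check like this could be run on a decompiled map before injecting it, so that a suspicious rule is noticed before every OSD reads it at startup.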
[22:26] <sjust1> perhaps not, jmlowe: do you have the crash backtrace?
[22:28] <jmlowe> where would I find that?
[22:28] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:29] <sjust1> end of osd log?
[22:29] <jmlowe> 2012-01-11 14:04:33.643377 7f272d49c7a0 journal read_entry 1010008064 : seq 43372 33 bytes
[22:29] <jmlowe> 2012-01-11 14:04:33.643403 7f272d49c7a0 journal read_entry 1010016256 : seq 43373 33 bytes
[22:29] <jmlowe> 2012-01-11 14:04:33.643441 7f272d49c7a0 journal _open /data/osd.7/journal fd 15: 1048576000 bytes, block size 4096 bytes, directio = 1
[22:29] <jmlowe> *** Caught signal (Segmentation fault) **
[22:29] <jmlowe> in thread 7f2721398700
[22:29] <jmlowe> ceph version 0.39 (commit:321ecdaba2ceeddb0789d8f4b7180a8ea5785d83)
[22:29] <jmlowe> 1: /usr/bin/ceph-osd() [0x5fa846]
[22:29] <jmlowe> 2: (()+0x10060) [0x7f272d07f060]
[22:29] <jmlowe> 3: /usr/bin/ceph-osd() [0x5e61db]
[22:29] <jmlowe> 4: (crush_do_rule()+0x326) [0x5e6ac6]
[22:29] <jmlowe> 5: (CrushWrapper::do_rule(int, int, std::vector<int, std::allocator<int> >&, int, int, std::vector<unsigned int, std::allocator<unsigned int> > const&) const+0x50) [0x55e480]
[22:29] <jmlowe> 6: (OSD::advance_map(ObjectStore::Transaction&)+0x17b7) [0x54eec7]
[22:29] <jmlowe> 7: (OSD::handle_osd_map(MOSDMap*)+0x18c7) [0x550ed7]
[22:29] <jmlowe> 8: (OSD::_dispatch(Message*)+0x25b) [0x552d6b]
[22:29] <jmlowe> 9: (OSD::ms_dispatch(Message*)+0x116) [0x553dd6]
[22:29] <jmlowe> 10: (SimpleMessenger::dispatch_entry()+0x84b) [0x5ba37b]
[22:29] <jmlowe> 11: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x4b134c]
[22:29] <jmlowe> 12: (()+0x7efc) [0x7f272d076efc]
[22:29] <jmlowe> 13: (clone()+0x6d) [0x7f272b6a789d]
[22:30] <jmlowe> that what you are looking for?
[22:30] <sjust1> yup
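When reading a backtrace like the one jmlowe pasted, the first frame with a resolved symbol is usually the quickest pointer to where the crash happened. A minimal Python sketch that pulls frames out of such lines, assuming the `N: location [0xaddr]` layout seen above. The `parse_frames` and `first_resolved` helpers are hypothetical, not a Ceph tool.

```python
import re

# One frame per line: "<index>: <location> [<hex address>]"
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s*\[(0x[0-9a-fA-F]+)\]\s*$")
# Resolved frames look like "(symbol+0xoffset)".
SYMBOL_RE = re.compile(r"^\((.+)\+0x[0-9a-fA-F]+\)$")


def parse_frames(lines):
    """Return a list of (index, symbol_or_None, address) tuples."""
    frames = []
    for line in lines:
        m = FRAME_RE.match(line)
        if not m:
            continue  # not a stack frame line
        index, location, address = m.groups()
        sym = SYMBOL_RE.match(location)
        symbol = sym.group(1) if sym else None
        if symbol == "()":
            symbol = None  # "(()+0x10060)" carries no function name
        frames.append((int(index), symbol, address))
    return frames


def first_resolved(frames):
    """Return the symbol of the first frame that has one, or None."""
    return next((s for _, s, _ in frames if s is not None), None)


# The first four frames from the paste above.
sample = [
    "1: /usr/bin/ceph-osd() [0x5fa846]",
    "2: (()+0x10060) [0x7f272d07f060]",
    "3: /usr/bin/ceph-osd() [0x5e61db]",
    "4: (crush_do_rule()+0x326) [0x5e6ac6]",
]
print(first_resolved(parse_frames(sample)))
```

On the sample frames this points at `crush_do_rule()`, which matches the discussion about the CRUSH rule being the trigger.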
[22:31] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[22:34] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:43] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[22:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[22:52] * fghaas (~florian@85-127-93-41.dynamic.xdsl-line.inode.at) has left #ceph
[22:55] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)
[22:55] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[22:56] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:58] * adjohn is now known as Guest23755
[22:58] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[22:58] * Guest23755 (~adjohn@208.90.214.43) Quit (Read error: Connection reset by peer)
[22:59] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) Quit ()
[22:59] * jmlowe (~Adium@140-182-219-86.dhcp-bl.indiana.edu) has joined #ceph
[23:07] * jmlowe (~Adium@140-182-219-86.dhcp-bl.indiana.edu) Quit (Ping timeout: 480 seconds)
[23:14] * sjust1 (~sam@aon.hq.newdream.net) Quit (Quit: Leaving.)
[23:14] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[23:20] * adjohn (~adjohn@208.90.214.43) Quit (Remote host closed the connection)
[23:21] * adjohn (~adjohn@208.90.214.43) has joined #ceph
[23:33] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:39] * lx0 is now known as lxo

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.