#ceph IRC Log


IRC Log for 2012-10-04

Timestamps are in GMT/BST.

[0:05] * scuttlemonkey (~scuttlemo@ Quit (Quit: HALP! U LEF AIRLOK OPEN! http://bit.ly/fuasqd)
[0:06] * MarkN (~nathan@ has joined #ceph
[0:07] * MarkN (~nathan@ has left #ceph
[0:08] <nhm> hrm, that could be confusing.
[0:14] * LarsFronius (~LarsFroni@2a02:8108:3c0:79::2) Quit (Quit: LarsFronius)
[0:14] <joao> nhm, /nick TheRealMarkN
[0:14] <joao> :p
[0:20] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:21] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:22] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:24] <dmick> likely to cause a bit of confusion. Mind if we call you Bruce?
[0:27] * dty (~derek@testproxy.umiacs.umd.edu) Quit (Ping timeout: 480 seconds)
[0:33] * pentabular is now known as cowbell
[0:33] <cowbell> more cowbell!
[0:33] <nhm> awesome! I'll answer to Bruce.
[0:37] * cowbell (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) has left #ceph
[0:37] * slang (~slang@c-24-12-181-11.hsd1.il.comcast.net) has joined #ceph
[0:46] * cblack101 (86868949@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:48] * houkouonchi-work (~linux@ Quit (Ping timeout: 480 seconds)
[0:51] * slang (~slang@c-24-12-181-11.hsd1.il.comcast.net) Quit (Quit: slang)
[0:51] <dmick> http://www.youtube.com/watch?v=_f_p0CgPeyA
[0:52] <dmick> not http://files.sharenator.com/more_cowbell_tshirt-s400x290-104326.jpg
[0:52] <dmick> important to keep the canon straight
[1:01] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[1:09] * adjohn (~adjohn@ Quit (Quit: adjohn)
[1:18] * miroslavk1 (~miroslavk@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:22] * houkouonchi-work (~linux@ has joined #ceph
[1:27] * jlogan1 (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[1:29] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) has joined #ceph
[1:29] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:32] * dty_ (~derek@pool-71-178-175-208.washdc.fios.verizon.net) has joined #ceph
[1:32] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) Quit (Read error: Connection reset by peer)
[1:32] * dty_ is now known as dty
[1:43] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[1:45] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[1:50] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:53] <dmick> elder: you around?
[1:54] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has left #ceph
[2:06] * scuttlemonkey (~scuttlemo@ has joined #ceph
[2:07] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[2:09] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[2:19] * Kioob (~kioob@luuna.daevel.fr) Quit (Ping timeout: 480 seconds)
[2:24] * sagelap1 (~sage@ Quit (Ping timeout: 480 seconds)
[2:25] * sagelap (~sage@41.sub-70-197-139.myvzw.com) has joined #ceph
[2:31] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[2:33] * sagelap (~sage@41.sub-70-197-139.myvzw.com) Quit (Ping timeout: 480 seconds)
[2:36] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[2:40] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[2:42] * tren (~Adium@2001:470:b:2e8:adbb:7082:700c:57c7) has joined #ceph
[2:42] <tren> Hey, is Greg around?
[2:42] <gregaf> yep, hi!
[2:42] <elder> dmick, I am now
[2:42] <tren> Hey! :)
[2:43] <gregaf> I was just writing you a brief email about the stuff from last week that got delayed because I was hoping you'd pop up in irc and spare me the effort. :p
[2:43] <tren> There's an issue with the dumpcache. It doesn't take short path (./)
[2:43] <tren> it's ended up trying to put it into /
[2:43] <tren> is it supposed to do that?
[2:44] <gregaf> hmm
[2:44] <gregaf> probably a parsing bug
[2:44] <gregaf> actually, it's just calling ofstream::open on whatever you give it
[2:44] <tren> and if I give it a full path, it's not actually making the file :/
[2:45] <gregaf> hmm
[2:45] <tren> is it a single shot?
[2:45] <tren> because I had to remove the partial file it was making in my / fs
[2:45] <tren> as it wasn't big enough
[2:45] <gregaf> oh god
[2:45] <gregaf> it's probably still attempting to dump, then
[2:46] <gregaf> it's just streaming until it runs through the cache and then closing the stream
[2:46] <tren> hmm
[2:46] <gregaf> umm
[2:46] <tren> it failed over the mds
[2:46] <gregaf> yeah, that's what I'd expect to see
[2:47] <tren> and now the rsync that was running has died with "Transport endpoint is not connected"
[2:47] <gregaf> it probably blocked on the ENOSPC
[2:47] <gregaf> and then got failed over
[2:47] <tren> which also killed the ceph-fuse client
[2:47] <tren> :/
[2:47] <tren> that shouldn't happen.
[2:47] <gregaf> yeah :/
[2:48] <gregaf> we've started two new people in the last 2.5 weeks who are working on the filesystem
[2:48] <gregaf> and I'll be transitioning back to it Real Soon Now
[2:48] <gregaf> so it should be getting noticeably better shortly
[2:48] <tren> So for now should I abandon any attempts on working with the FS part?
[2:48] <gregaf> but at the moment...http://ceph.com/docs/master/faq/#is-ceph-production-quality :(
[2:49] <tren> lol, I know it's not
[2:49] <gregaf> well, it depends, but you do seem to have a workload that pretty reliably kills it
[2:49] <tren> but it's also not really test worthy either
[2:49] <tren> as it's extremely fragile
[2:49] <tren> it's just an rsync
[2:49] <tren> that's all
[2:49] <gregaf> rsync is *hard*!
[2:49] <gregaf> ;)
[2:50] <tren> I guess :/
[2:50] <gregaf> in all seriousness, it's pretty much a pessimal workload as far as CephFS goes
[2:50] <tren> yeah, I'm getting that :)
[2:50] <tren> the MDS seems extremely fragile
[2:50] <gregaf> there actually are people who run it and succeed with other sorts of workloads
[2:51] <tren> It's more the issue of mds failovers killing fs access
[2:51] <gregaf> yeah
[2:51] <tren> and now, the memory leak in the mds
[2:51] <gregaf> so it appears to not be a "leak" in the normal sense; it is in fact far exceeding the number of inodes and directories it's supposed to be caching
[2:52] <gregaf> and that is also almost certainly why the failover isn't working correctly
[2:52] <gregaf> if you could get us a cache dump we could probably figure out what's going on
[2:52] <tren> k
[2:52] <gregaf> but I don't think we have the bandwidth right now to be setting up and running the tests locally, given our other commitments
[2:52] <tren> I'll re-start the rsync
[2:53] <tren> and I'll do the dump in the morning. MDS should be up to about 10GB by then
[2:53] <tren> that'll at least be a little more sane to dump
[2:53] <gregaf> (the number was >9 million inodes and >1 million directories, which is…ridiculous)
[2:53] <dty> is there any reason you could not run two radosgw to ensure high availability? would obviously run a load balancer in front to arbitrate connections
[2:54] <gregaf> dty: radosgw is in fact designed to be clustered that way
[2:54] <gregaf> you can run as many as you like!
[2:54] <dty> great, thanks
[2:54] <tren> Greg: we're a large mail hosting company
[2:54] <gregaf> (although with radosgw you actually would run into some scaling problems at some point, but it's far past two)
[2:54] <tren> Greg: We deal with lots of ridiculous amounts of everything
[2:54] <gregaf> tren: I mean it's a ridiculous number for the daemon to be caching, not to have in the filesystem :)
[2:55] <tren> oh, gotcha :D
[2:55] <gregaf> our default is 100k
[2:55] <gregaf> so, you know
[2:55] <dty> i just need to have a unique name in the []'s in the config correct. The example has [client.radosgw.gateway]
[2:55] <gregaf> I think so…yeyehudasa?
[2:55] <tren> Greg: We have about 1k servers
[2:55] <gregaf> err, yehudasa*
[2:56] <yehudasa> yey hudasa?
[2:56] <dty> i guess my question is if that name ([client.radosgw.gateway]) is special
[2:56] <dty> eg. [client.radosgw.gateway1] and [client.radosgw.gateway2]
[2:56] <dty> i know the init script looks for a prefix
[2:56] <gregaf> tren: so if they're all going to access one filesystem you may want to have a larger default cache size ;)
[2:56] <yehudasa> the radosgw init script searches for a client.radosgw prefix
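Putting dty's question together with yehudasa's answer: the part after client.radosgw. is free-form, the init script only matches the prefix. A sketch of a two-gateway ceph.conf for the load-balanced setup dty describes (hostnames, paths, and the specific options shown are illustrative, not a canonical config):

```ini
; hypothetical ceph.conf fragment: two radosgw instances behind a
; load balancer; the init script starts any client.radosgw.* section
[client.radosgw.gateway1]
    host = gw1
    keyring = /etc/ceph/keyring.radosgw.gateway1
    rgw socket path = /var/run/ceph/radosgw.gateway1.sock
    log file = /var/log/ceph/radosgw.gateway1.log

[client.radosgw.gateway2]
    host = gw2
    keyring = /etc/ceph/keyring.radosgw.gateway2
    rgw socket path = /var/run/ceph/radosgw.gateway2.sock
    log file = /var/log/ceph/radosgw.gateway2.log
```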
[2:56] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[2:57] * gregorg (~Greg@ has joined #ceph
[2:57] * scuttlemonkey (~scuttlemo@ has joined #ceph
[2:57] <gregaf> but it should not be growing to more than 90 times its specified max size
[2:58] <gregaf> initial guess is that there's a problem with correctly trimming the "capabilities" which let clients access files, and that's holding things up, but we'll need to see more to know for sure and look at what the cause could be
[2:58] <tren> hehe…I agree. Though, the mds fail over actually DID work smoothly…ceph-fuse client dying notwithstanding
[2:58] <tren> I'll dump the cache in the AM. Is it going to be approximately the size of the RAM usage?
[3:01] <tren> Greg: Also, is the ceph-fuse client allowed to take as much ram as it wants? Before it killed itself, it was at 12GB of ram
[3:01] * Cube (~Adium@ Quit (Ping timeout: 480 seconds)
[3:02] <gregaf> tren: I think the cache dump should be smaller, but I'm not actually certain how much info it prints out…I can try and get a more accurate count if you need one
[3:02] <gregaf> the ceph-fuse client is definitely not supposed to take as much RAM as it wants
[3:03] <gregaf> that's another clue to me that the problem probably lies with capabilities
[3:04] <tren> naw, I'll do it in the morning and provide it to you. :)
[3:04] <tren> Are you usually on here in the later part of the day?
[3:04] <tren> like 2PM on?
[3:05] <tren> I'm Pacific timezone
[3:05] <gregaf> I'm generally around at 9-10 Pacific until 5-6:30 Pacific
[3:05] <gregaf> work hours ;)
[3:06] <gregaf> same with sage (he can help you too/better if by luck he's got some spare time) and the rest of the Inktank folks
[3:06] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:06] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:07] <gregaf> if something interesting is going on I'm sometimes available later in the day too
[3:08] <tren> Well, thank you very much for you and your team's help
[3:08] <gregaf> thanks for testing!
[3:08] <tren> no worries. I have a lot of faith in Ceph
[3:08] <tren> we are running moosefs right now
[3:09] <tren> it's stable and reliable but we have serious concerns
[3:09] <gregaf> ah, so not your first distributed fs
[3:09] <tren> nope
[3:09] <tren> 4th
[3:09] <tren> Ceph is definitely the most advanced though
[3:10] <gregaf> :D
[3:10] <tren> just need to make it less fragile
[3:11] <tren> I've noticed that a heavy load on an osd server is enough to cause ceph to self destruct
[3:11] <gregaf> it'll make the OSD self-destruct, but the system should survive that…and there were some changes merged into (I think) .48.2 to improve handling that situation
[3:12] <tren> I'm running .52
[3:12] <tren> on kernel 3.5.3
[3:12] <tren> I try to keep current, though I realize that's been in vain with regards to the fs portion ;)
[3:12] <tren> I just built kernel 3.6 today for another issue we're having
[3:13] <tren> so the ceph nodes will be on 3.6 in the next few days
[3:13] <tren> and once an osd server dies (we have 12 per node, 16 nodes) the repair load causes any other busy servers to go over the edge and it cascades from there
[3:14] <tren> I've managed to tune most of those issues away. Except for mds. I have *no* idea how to tune mds timeouts and there's no documentation
[3:15] <gregaf> hmm, actually, looks like those features weren't merged in to a release yet
[3:15] <gregaf> weird
[3:16] <gregaf> oh, the mds heartbeat controls are mds_beacon_interval and mds_beacon_grace
[3:16] <gregaf> it'll send one every mds_beacon_interval seconds
[3:16] <gregaf> and get marked laggy if the monitor doesn't receive one for mds_beacon_grace seconds
[3:17] <gregaf> if you're brave you can pick up a lot of them (as well as other things you shouldn't change!) in src/common/config_opts.h, or ask on the mailing list
[3:18] <gregaf> and it appears if you change mds_beacon_grace you should change mds_session_timeout and mds_reconnect_timeout to match
[3:18] <gregaf> tren: so, real simple! ;)
[3:18] <tren> oh
[3:19] <tren> that'd explain why when I played with those settings I caused the mds to freak out
[3:19] * rturk_ (~rturk@cpe-76-166-218-169.socal.res.rr.com) has joined #ceph
[3:19] <tren> it was on a fresh install of ceph
[3:19] * rturk_ (~rturk@cpe-76-166-218-169.socal.res.rr.com) Quit ()
[3:19] <tren> the mds kept failing between the 3 nodes
[3:19] <tren> it was kinda funny/scary
[3:19] <gregaf> where by "to match" I mean "make it [mds_reconnect_timeout] (mds_session_timeout - mds_beacon_grace)"
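Collecting gregaf's advice in one place, a sketch of the relevant [mds] settings (the numbers are illustrative; the stated constraint is mds_reconnect_timeout = mds_session_timeout - mds_beacon_grace):

```ini
[mds]
    mds beacon interval = 4      ; beacon sent to the monitors every 4 s
    mds beacon grace = 30        ; marked laggy after 30 s without a beacon
    mds session timeout = 90     ; raise this along with the grace period
    mds reconnect timeout = 60   ; = session timeout (90) - beacon grace (30)
```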
[3:19] <tren> I removed those settings promptly
[3:19] <gregaf> heh
[3:20] <tren> Would it be possible to have that description sent to the list?
[3:20] <tren> I think it'd be valuable information for anyone trying to build a large ceph cluster
[3:20] <tren> we're at 192 osd's currently and wanting to grow
[3:20] <gregaf> sure
[3:21] <tren> Thank you :)
[3:22] <tren> I'm going to head out for the night. Thank you very much. You've given me a lot of useful information!
[3:22] <tren> Have you ever looked at the moosefs list?
[3:22] <tren> Nothing like the ceph list…so quiet…and the devs never talk on it.
[3:23] <tren> cheers!
[3:23] * tren (~Adium@2001:470:b:2e8:adbb:7082:700c:57c7) Quit (Quit: Leaving.)
[3:26] * hk135 (~root@ Quit (Ping timeout: 480 seconds)
[3:28] * miroslavk (~miroslavk@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[3:29] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[3:37] * maelfius (~mdrnstm@ Quit (Quit: Leaving.)
[3:48] * grant (~grant@202-173-147-27.mach.com.au) has joined #ceph
[3:48] <grant> Hi all, is anyone around?
[3:49] <joshd> yeah, what's up?
[3:49] <grant> I have an OSD that I cannot stop see;
[3:49] <grant> /etc/init.d/ceph stop osd.11
[3:49] <grant> === osd.11 ===
[3:49] <grant> Stopping Ceph osd.11 on dsanb2-coy...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 23
[3:49] <grant> 16...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...kill 2316...^C
[3:50] <grant> Have tried to kill the PID, inc kill -9 2316
[3:50] <grant> Any ideas?
[3:50] <joshd> probably it's stuck in the middle of an I/O, and the underlying fs has a problem
[3:50] <ajm> disk wait?
[3:50] * edv (~edjv@ Quit (Quit: Leaving.)
[3:50] <joshd> is there anything in syslog/dmesg?
[3:51] <grant> I can't determine any issues - here is dmesg
[3:51] <slang> ps -u -p 2316 should show you the state of the process
[3:51] <grant> [ 2820.156002] [<ffffffff81118a5b>] __filemap_fdatawrite_range+0x5b/0x60
[3:51] <grant> [ 2820.156002] [<ffffffff8111942c>] filemap_flush+0x1c/0x20
[3:51] <grant> [ 2820.156002] [<ffffffffa0048189>] btrfs_start_delalloc_inodes+0xc9/0x1f0 [btrfs]
[3:51] <grant> [ 2820.156002] [<ffffffff8104dea3>] ? __wake_up+0x53/0x70
[3:51] <grant> [ 2820.156002] [<ffffffffa003a27e>] btrfs_commit_transaction+0x1ae/0x840 [btrfs]
[3:51] <grant> [ 2820.156002] [<ffffffff8108aae0>] ? add_wait_queue+0x60/0x60
[3:51] <grant> [ 2820.156002] [<ffffffff8165a4fe>] ? _raw_spin_lock+0xe/0x20
[3:51] <grant> [ 2820.156002] [<ffffffffa003a910>] ? btrfs_commit_transaction+0x840/0x840 [btrfs]
[3:51] <grant> [ 2820.156002] [<ffffffffa003a92e>] do_async_commit+0x1e/0x30 [btrfs]
[3:51] <grant> [ 2820.156002] [<ffffffff81084a6a>] process_one_work+0x11a/0x480
[3:51] <grant> [ 2820.156002] [<ffffffff81085814>] worker_thread+0x164/0x370
[3:51] <grant> [ 2820.156002] [<ffffffff810856b0>] ? manage_workers.isra.30+0x130/0x130
[3:51] <grant> [ 2820.156002] [<ffffffff8108a03c>] kthread+0x8c/0xa0
[3:52] <grant> [ 2820.156002] [<ffffffff81664c74>] kernel_thread_helper+0x4/0x10
[3:52] <grant> [ 2820.156002] [<ffffffff81089fb0>] ? flush_kthread_worker+0xa0/0xa0
[3:52] <grant> [ 2820.156002] [<ffffffff81664c70>] ? gs_change+0x13/0x13
[3:52] <grant> Sorry for long output - pastebin might work better for this.
[3:52] <grant> ps u -p 2316; root 2316 0.0 0.0 169036 5144 ? Ds 11:04 0:00 /usr/bin/ceph-osd -i 11 --pid-file /var/run/ceph/osd.11.pid -c /etc/ceph/ceph.conf
[3:53] <ajm> Ds disk wait + filesystem frozen for some reason
[3:53] <grant> Any idea as to how I can kill?
[3:53] <grant> Machine was just power cycled due to osd.13 experiencing same issue.
[3:53] <grant> (and do not have LOM on this box)
[3:55] <joshd> umount -l -f (osd_data_dir) might help
[3:56] <grant> Trying now - thanks guys :)
[3:56] <joshd> assuming there's more to the stuff in dmesg, you're hitting a btrfs bug
[3:57] <grant> Any idea which line exactly makes you think that, so I can search for more info on a fix?
[3:58] <joshd> it's a backtrace from btrfs (all the lines with btrfs in them) there should be something before all the lines with <ffffffff81118a5b> saying what the problem is
[3:58] <joshd> what version is your kernel?
[3:59] <grant> 3.2.0-31-generic from Ubuntu 12.04
[4:00] <dmick> that's fairly old in btrfs terms
[4:00] <grant> I have a test rig with 3.5 kernel (Ubuntu 12.10 beta2) but cannot reproduce and not sure if upgrading this "pilot" rig is smart
[4:01] <grant> Any recommendations?
[4:02] <grant> OK, forcing the lazy dismount got it out for the moment.
[4:03] <joshd> if you keep seeing these issues, I'd suggest xfs without upgrading, but if you want to keep btrfs, I'd suggest upgrading to 3.5 (and probably reformatting with new btrfs after the upgrade)
[4:04] <joshd> btrfs also ages much better in 3.5. in 3.2 it fragments quite a bit
[4:05] <grant> Thanks, I'll need to find some specifics as to changes in 3.5 that might help this specific issue before we upgrade.
[4:09] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[4:10] * deepsa (~deepsa@ has joined #ceph
[4:22] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:23] <grant> If upgrade to 12.10 for 3.5 kernel - do we need to format each drive with mkfs.btrfs?
[4:23] <grant> Should we do one disk at a time, let ceph redistribute data to it before doing next drive?
[4:26] <joshd> if you want everything to stay available that's best
[4:27] <grant> Ta
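The rolling, one-disk-at-a-time rebuild joshd recommends could be sketched as a shell helper like this (illustrative only: the device paths, the MKFS override hook, and polling `ceph health` for HEALTH_OK are assumptions, not a documented procedure):

```shell
#!/bin/sh
# Sketch: reformat one OSD's disk, then wait for the cluster to
# return to HEALTH_OK before moving on to the next disk.

wait_healthy() {
    # poll "ceph health" until recovery has finished
    until ceph health | grep -q HEALTH_OK; do
        sleep 30
    done
}

reformat_osd() {
    id=$1 dev=$2 mnt=$3
    service ceph stop "osd.$id"
    umount "$mnt"
    ${MKFS:-mkfs.btrfs} "$dev"      # fresh btrfs from the newer kernel's tools
    mount "$dev" "$mnt"
    ceph-osd -i "$id" --mkfs        # recreate the OSD's data directory
    service ceph start "osd.$id"
    wait_healthy                    # let ceph redistribute data back first
}
```

Run as e.g. `reformat_osd 11 /dev/sdb /var/lib/ceph/osd/ceph-11`, one disk at a time.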
[4:27] * miroslavk (~miroslavk@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:30] <grant> @joshd thanks for all your help. Unfortunately, even now that osd is unmounted, ceph service does not recognise this and is still hanging at service ceph osd.11 stop
[4:30] <cephalobot> grant: Error: "joshd" is not a valid command.
[4:30] <grant> :(
[4:31] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:32] <joshd> hmm, I'm not sure there's anything you can do if it still won't be killed
[4:32] <grant> Problem is that it hangs trying to kill this at reboot - requiring hard powercycle (in DC 30 mins away)
[4:32] <grant> osd.13 experienced this problem earlier today.
[4:33] <grant> travel and hard powercycle fixed, but now osd.11 is exhibiting the same.
[4:33] <grant> :(
[4:34] <joshd> ah, right
[4:35] <joshd> you can kill the rest of the osds and do and unmount their data dirs, then do an unsafe shutdown
[4:36] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) Quit (Quit: dty)
[4:37] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Quit: slang)
[4:37] <grant> Won't it then still hang on the broken pid for osd.11?
[4:40] <joshd> not if you use sysrq, i.e. echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger
[4:47] * grant_ (~grant@202-173-147-27.mach.com.au) has joined #ceph
[4:47] * grant (~grant@202-173-147-27.mach.com.au) Quit (Read error: Connection reset by peer)
[4:47] <grant_> Thanks joshd will try now - have not used that command before :)
[4:49] <iggy> that's assuming sysrq is enabled in the kernel
[4:50] <iggy> i think it is in ubuntu, but not in rhel
[4:50] <iggy> but rhel is probably a horrible choice for ceph anyway
[4:55] * miroslavk (~miroslavk@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[4:56] <grant_> It worked, I have other issue, but was able to reboot machine using command.
[4:58] <iggy> for future reference, i usually do alt-sysrq-s(ync) u(nmount) (re)b(oot)
[4:58] <grant_> Thank you :)
[4:59] <iggy> when things are really wedged, that's about as clean a reboot as you can get
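The keystroke sequence iggy describes has a /proc equivalent, matching the echo commands joshd gave earlier (a sketch; the trigger-path parameter is a hypothetical hook so the snippet can be exercised without rebooting — against the real /proc/sysrq-trigger the final "b" reboots immediately):

```shell
# Sketch of iggy's alt-sysrq s, u, b sequence done via /proc.
# WARNING: against the real trigger this syncs, remounts filesystems
# read-only, and then reboots the machine on the spot.
sysrq_sub() {
    trig=${1:-/proc/sysrq-trigger}   # hypothetical override for dry runs
    # assumes sysrq is enabled, i.e. echo 1 > /proc/sys/kernel/sysrq
    for key in s u b; do             # s = sync, u = remount ro, b = reboot
        echo "$key" > "$trig"
    done
}
```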
[4:59] <joshd> you're welcome :)
[5:00] <joshd> iggy: in this case btrfs was stuck, hence the lack of sync
[5:01] <iggy> well... the other filesystems...
[5:02] <joshd> I warned about those earlier
[5:04] <iggy> oh, i just skimmed the scroll back... my bad
[5:05] <joshd> no worries
[5:12] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[5:13] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[5:23] * grant_ (~grant@202-173-147-27.mach.com.au) Quit (Remote host closed the connection)
[5:35] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Remote host closed the connection)
[5:38] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has left #ceph
[5:38] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[5:40] * scuttlemonkey (~scuttlemo@ Quit (Read error: Connection reset by peer)
[5:40] * scuttlemonkey (~scuttlemo@ has joined #ceph
[5:45] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Quit: Leaving.)
[6:06] * chutzpah (~chutz@ Quit (Quit: Leaving)
[6:28] * miroslavk (~miroslavk@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[6:44] * dmick (~dmick@2607:f298:a:607:1a03:73ff:fedd:c856) Quit (Quit: Leaving.)
[6:50] * grant (~grant@202-173-147-27.mach.com.au) has joined #ceph
[6:51] <grant> Stupid question - I've just upgraded all nodes to kernel 3.5 - after i reformat each OSD - how do I know that it is then OK to format the next?
[6:51] <grant> what should I be looking for?
[6:55] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[7:00] * deepsa_ (~deepsa@ has joined #ceph
[7:01] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[7:01] * deepsa_ is now known as deepsa
[7:03] <iggy> grant: something in ceph health output probably
[7:05] <iggy> or is it ceph status
[7:07] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[7:19] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Ping timeout: 480 seconds)
[7:22] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[7:23] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[7:26] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has joined #ceph
[7:27] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has left #ceph
[7:37] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[8:05] * jtang (~jtang@ has joined #ceph
[8:05] <jtang> will there be a ceph presence at sc2012?
[8:06] <jtang> or is it going to be inktank that will be there?
[8:08] * Kioob (~kioob@luuna.daevel.fr) Quit (Ping timeout: 480 seconds)
[8:12] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:15] * jtang (~jtang@ Quit (Quit: WeeChat 0.3.8)
[8:15] * jtang1 (~jtang@sgenomics.org) has joined #ceph
[8:16] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:16] * deepsa (~deepsa@ Quit (Read error: Connection reset by peer)
[8:16] * jtang1 is now known as jtang
[8:17] * deepsa (~deepsa@ has joined #ceph
[8:44] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) has joined #ceph
[8:50] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:00] * EmilienM (~EmilienM@ has joined #ceph
[9:09] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) has left #ceph
[9:14] * grant (~grant@202-173-147-27.mach.com.au) Quit (Ping timeout: 480 seconds)
[9:26] * verwilst (~verwilst@ has joined #ceph
[9:32] * Leseb (~Leseb@ has joined #ceph
[9:39] * BManojlovic (~steki@ has joined #ceph
[9:50] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:53] * deepsa_ (~deepsa@ has joined #ceph
[9:55] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[9:55] * deepsa_ is now known as deepsa
[9:57] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[9:58] * EmilienM (~EmilienM@ Quit (Quit: kill -9 EmilienM)
[9:57] <exec> is there any chance to have the rbd locking feature in argonaut?
[10:03] * deepsa_ (~deepsa@ has joined #ceph
[10:03] * tziOm (~bjornar@ti0099a340-dhcp0358.bb.online.no) has joined #ceph
[10:03] <tziOm> How can I list all rados users?
[10:05] * EmilienM (~EmilienM@ has joined #ceph
[10:07] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[10:07] * deepsa_ is now known as deepsa
[10:12] * verwilst (~verwilst@ Quit (Ping timeout: 480 seconds)
[10:17] * grant (~grant@60-240-78-43.static.tpgi.com.au) has joined #ceph
[10:22] <tziOm> Anyone here?
[10:22] * yoshi_ (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[10:22] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Read error: Connection reset by peer)
[10:22] <tziOm> I have a radosgw problem.. get 400 on "PUT /1ATXQ3HHA59CYF1CVS02%2Dbackups/DailySet1%2Fslot%2D02special%2Dtapestart HTTP/1.1"
[10:27] * EmilienM (~EmilienM@ Quit (Quit: kill -9 EmilienM)
[10:28] * EmilienM (~EmilienM@ has joined #ceph
[10:29] * mgalkiewicz (~mgalkiewi@staticline-31-182-149-180.toya.net.pl) has joined #ceph
[10:30] * loicd (~loic@ has joined #ceph
[10:41] * EmilienM (~EmilienM@ Quit (Quit: kill -9 EmilienM)
[10:42] * EmilienM (~EmilienM@ has joined #ceph
[10:46] <tziOm> Hmm..
[11:03] * EmilienM (~EmilienM@ Quit (Read error: Connection reset by peer)
[11:05] * EmilienM (~EmilienM@ has joined #ceph
[11:08] * EmilienM (~EmilienM@ Quit ()
[11:18] * gohko_ (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[11:19] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[11:21] * EmilienM (~EmilienM@ has joined #ceph
[11:23] * EmilienM (~EmilienM@ Quit ()
[11:35] * grant (~grant@60-240-78-43.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[11:54] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[11:55] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) Quit (Quit: tryggvil)
[11:58] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[12:04] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Read error: Connection reset by peer)
[12:05] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[12:14] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[12:15] * cowbell (~sean@ has joined #ceph
[12:15] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[12:16] * cowbell is now known as pentabular
[12:16] * pentabular (~sean@ has left #ceph
[12:29] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[12:31] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:31] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:33] <tziOm> I cant make radosgw and apache work (mod_fastcgi)
[12:33] <tziOm> I get either a wrong content length (with rgw print continue = false) or a duplicate status without it
[12:46] * mgalkiewicz (~mgalkiewi@staticline-31-182-149-180.toya.net.pl) Quit (Ping timeout: 480 seconds)
[12:52] * loicd (~loic@ has joined #ceph
[12:55] * mgalkiewicz (~mgalkiewi@staticline-31-182-149-180.toya.net.pl) has joined #ceph
[13:00] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Ping timeout: 480 seconds)
[13:01] <tziOm> I have the impression that radosgw does not work.
[13:01] * gaveen (~gaveen@ has joined #ceph
[13:25] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[13:34] * Leseb (~Leseb@ Quit (Quit: Leseb)
[13:34] * yoshi_ (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:35] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[13:37] * Leseb (~Leseb@ has joined #ceph
[13:48] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[14:18] * LarsFronius (~LarsFroni@net-93-151-166-18.cust.dsl.teletu.it) has joined #ceph
[14:19] * tziOm (~bjornar@ti0099a340-dhcp0358.bb.online.no) Quit (Remote host closed the connection)
[14:33] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[14:35] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) has joined #ceph
[14:49] * loicd (~loic@ Quit (Quit: Leaving.)
[14:58] * aliguori (~anthony@cpe-70-123-130-163.austin.res.rr.com) has joined #ceph
[15:03] * LarsFronius (~LarsFroni@net-93-151-166-18.cust.dsl.teletu.it) Quit (Quit: LarsFronius)
[15:05] * dty (~derek@pool-71-178-175-208.washdc.fios.verizon.net) Quit (Quit: dty)
[15:09] * loicd (~loic@magenta.dachary.org) has joined #ceph
[15:20] * EmilienM (~EmilienM@ has joined #ceph
[15:30] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:34] * dty (~derek@testproxy.umiacs.umd.edu) has joined #ceph
[15:45] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[15:50] * LarsFronius (~LarsFroni@net-93-151-166-18.cust.dsl.teletu.it) has joined #ceph
[15:58] * cblack101 (c0373727@ircip2.mibbit.com) has joined #ceph
[16:03] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[16:15] * scuttlemonkey (~scuttlemo@ has joined #ceph
[16:25] * loicd (~loic@ has joined #ceph
[16:25] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[16:48] * blaphmat (a5a00214@ircip2.mibbit.com) has joined #ceph
[16:48] <blaphmat> hi everyone, is there a recommended way to do backups with ceph? i haven't seen any info in the wiki about it
[16:49] <blaphmat> for instance if i wanted to backup to tape or offsite
[16:53] <nhm> blaphmat: Heya, I honestly don't know if we have any specific recommendations.
[16:54] <nhm> blaphmat: It might depend on if you are using RGW, RBD, CephFS, etc.
[17:00] <blaphmat> ok
[17:00] <blaphmat> lets say i'm just using RBD for the block storage
[17:00] <blaphmat> can rbd work over fiber or is it just iscsi at the moment?
[17:01] <Fruit> you could export it over FC using LIO, I suppose
[17:01] <blaphmat> i'm not sure what LIO is, haven't seen that before
[17:01] <Fruit> http://www.linux-iscsi.org/wiki/Fibre_Channel
[17:02] <blaphmat> wow this is interesting
[17:02] <blaphmat> is the recommended way to do iscsi or use the native kernel client?
[17:02] * Fruit has a working setup exporting a ZFS zvol over FC with LIO
[17:03] <Fruit> I suppose a ceph block device could be exported similarly
[17:03] * mgalkiewicz (~mgalkiewi@staticline-31-182-149-180.toya.net.pl) Quit (Remote host closed the connection)
[17:04] <blaphmat> wow this LIO looks powerful
[17:05] <blaphmat> Fruit: so the usual way this works is you create your thin block devices with rbd and then the clients have a fuse or kernel module to mount them?
[17:05] <blaphmat> sorry for all the noob questions
[17:06] <Fruit> ceph offers filesystems, block devices and generic key/value object stores
[17:06] <Fruit> if you want something mountable, use the filesystem option
[17:06] <blaphmat> i'm mostly interested in the block devices i think
[17:06] <blaphmat> i'm curious if ceph could be used to replace an expensive san fibre device
[17:06] <Fruit> of course you could mkfs a block device, but then you'll only be able to mount it on a single host (unless you put something like gfs on top)
[17:07] <blaphmat> right
[17:07] <blaphmat> it looked like online that ceph was being used to create block devices for remote systems to use as their storage
[17:07] <Fruit> personally I'm most interested in using block devices as backing stores for kvm-qemu vm's
[17:07] <blaphmat> right
[17:07] <blaphmat> that being another use
[17:08] <Fruit> especially since qemu has native rbd support these days
[17:08] <blaphmat> yeah that's pretty awesome
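For reference, the qemu-native rbd usage discussed above looks roughly like this. The pool and image names are made up, and actually running the script requires a reachable cluster and a qemu build with rbd support; here the commands are just written into a helper script:

```shell
# Sketch only: pool/image names are hypothetical, and running this
# needs a live Ceph cluster plus an rbd-enabled qemu.
cat > start-guest1.sh <<'EOF'
#!/bin/sh
# Create a 10G image in the (hypothetical) vmpool pool, then boot a VM from it.
qemu-img create -f rbd rbd:vmpool/guest1 10G
exec qemu-system-x86_64 -m 1024 \
  -drive format=rbd,file=rbd:vmpool/guest1,cache=writeback
EOF
chmod +x start-guest1.sh
```

Because qemu talks to librbd directly, no kernel rbd module is needed on the hypervisor.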
[17:10] * aliguori (~anthony@cpe-70-123-130-163.austin.res.rr.com) Quit (Remote host closed the connection)
[17:12] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:15] <stan_theman> anyone know a preferred place to leave the output of ceph-debugpack for sending to the list?
[17:15] <stan_theman> err, put the tarball*
[17:16] * sagelap (~sage@10.sub-70-197-146.myvzw.com) has joined #ceph
[17:17] * tziOm (~bjornar@ti0099a340-dhcp0358.bb.online.no) has joined #ceph
[17:17] <tziOm> I have huge problems getting rados-gw working. Does this work at all using 48.2 ?
[17:20] <cblack101> Off the wall question: Is RADOS an acronym for something or just a word made up?
[17:22] <scuttlemonkey> cblack101: "Reliable Autonomic Distributed Object Store"
[17:22] <cblack101> Thanks man!
[17:23] <scuttlemonkey> np
[17:25] <tziOm> I keep getting invalid content length with mod_fastcgi and rados-gw!
[17:27] * jlogan1 (~Thunderbi@2600:c00:3010:1:787f:c6f:10bb:bf2) has joined #ceph
[17:28] <sagelap> joao: around?
[17:30] <joao> sagelap, here
[17:34] * loicd (~loic@ Quit (Quit: Leaving.)
[17:36] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[17:37] * scuttlemonkey (~scuttlemo@ has joined #ceph
[17:37] <sagelap> just opened a ticket, noticed mon report is missing the crush map
[17:38] <joao> and thunderbird is acting funny again...
[17:38] <joao> stopped updating the emails from the lists and the tracker
[17:39] <joao> alright, got it
[17:39] * tren (~Adium@ has joined #ceph
[17:40] <joao> looks like there was another ticket from tamilarasi regarding a monitor crash
[17:41] <sagelap> which one?
[17:41] <joao> http://tracker.newdream.net/issues/3260
[17:41] <joao> an assertion on propose_pending()
[17:41] <sagelap> i pushed the fix for that yesterday.. it's the one you reviewed the other day
[17:41] <tren> Sage: Is ceph mds tell <mds> dumpcache <location> supposed to cause the mds to fail over to a standby node?
[17:42] <joao> ooh
[17:42] <joao> yeah
[17:42] <joao> sagelap, nevermind; thunderbird *is* still acting funny
[17:42] <joao> I should check what's happening
[17:42] <sagelap> tren: not normally, but in your case the cache is so huge it will probably take it a while to write out that 50 GB or whatever to a file
[17:43] <tren> Sage: The cache was 29GB and it only wrote out 1.1GB and failed over to a standby mds
[17:43] <tren> Sage: Every time I ask for a dump, it causes a failover. I'm not sure it's supposed to do that.
[17:43] <tren> Sage: I'm bzip'ing the cache to upload somewhere for you guys to peek at
[17:45] <tren> Sage: Though, on a positive note, I haven't gotten stuck in replay or clientreplay on mds failover with 0.52 :)
[17:51] * aliguori (~anthony@ has joined #ceph
[17:52] * sagelap1 (~sage@ has joined #ceph
[17:53] * sagelap (~sage@10.sub-70-197-146.myvzw.com) Quit (Ping timeout: 480 seconds)
[17:53] <nhm> tziOm: It should work, but it can be a bit tricky to setup.
[17:57] <tziOm> hmm.. I have tried with apache, nginx and lighttpd.. I get everything to work more or less, but I get the errors from amanda, and amanda works with s3 .. so..
[17:57] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:02] <nhm> tziOm: Does rest-bench work?
[18:06] * Tv_ (~tv@2607:f298:a:607:5c1e:e9a0:aa30:35e7) has joined #ceph
[18:07] <sagewk> tren: cool
[18:08] <sagewk> joao: also there's a github note on the mon elector fixes.. just needs to be broken into 2 patches
[18:08] <joao> sagewk, okay
[18:08] <joao> sagewk, also, whenever you find the time, please check mon-report-crushmap; topmost commit
[18:08] <joao> I hope it does what you're looking for
[18:09] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:09] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[18:10] <sagewk> joao: perfect, want to commit that to master?
[18:10] <sagewk> thanks!
[18:10] <joao> okay
[18:10] <tziOm> ERROR: failed to create bucket: XmlParseFailure
[18:10] <tziOm> failed initializing benchmark
[18:13] <tziOm> FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: error parsing headers: duplicate header 'Status'
[18:13] <tren> Sage: https://www.dropbox.com/s/0oug50r0izqhn88/sap.cache.bz2 - The dumped cache
[18:17] <nhm> tziOm: we've seen something like that before.
[18:17] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[18:17] <nhm> tziOm: http://tracker.newdream.net/issues/439
[18:19] * benpol (~benp@garage.reed.edu) has joined #ceph
[18:20] <tziOm> I have seen it, but seriously _THIS IS TWO YEARS AGO_
[18:20] <tziOm> and you still advise the same apache.conf and one has to set "rgw print continue = false;" to disable this error
[18:21] <tziOm> but then one other error appears here, at least with amanda: [error] [client] Invalid Content-Length
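For anyone hitting the same duplicate-'Status' error, the workaround being discussed looks roughly like this; the section name is illustrative (the option goes in whatever client section your radosgw instance uses), and this sketch just appends it to a local copy of ceph.conf:

```shell
# Workaround sketch: the [client.radosgw.gateway] section name is illustrative.
cat >> ceph.conf <<'EOF'
[client.radosgw.gateway]
    rgw print continue = false
EOF
```

With mod_fastcgi builds that lack 100-continue support, disabling it avoids the duplicate 'Status' header at the cost of an extra round trip per PUT.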
[18:22] <gregaf> jtang: right now Inktank is using both brandings everywhere it goes; I think we will be at sc12 but I'm not certain..nhm?
[18:22] <tren> Morning Greg :)
[18:23] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:23] <nhm> gregaf: yep, we'll have a booth. Not sure which/if engineers are going to be there yet..
[18:24] <gregaf> exec: I don't think we're going to backport rbd locking, but you could probably do it on your own by grabbing the class files and upgrading the clients...
[18:24] * edv (~edjv@ has joined #ceph
[18:24] <nhm> tziOm: Not sure why it's still a problem. Have you talked to Yehuda about it at all? I think he's kind of the resident RGW expert.
[18:26] <jtang> gregaf: ah okay, well I shall be at SC12
[18:26] <jtang> i was hoping to be able to talk to some inktank/ceph people at the event
[18:27] <gregaf> well, sounds like somebody will be there…I thought you were going nhm; did that get canned?
[18:28] <rweeks> There should be at least 4 Inktank people there.
[18:28] <rweeks> plus we will have a booth
[18:28] <blaphmat> Fruit: I checked out LIO, i believe something like that would work
[18:28] <blaphmat> but I'd have to build a prototype first
[18:29] <nhm> gregaf: No idea, just waiting to find out what the plan is.
[18:29] <jtang> rweeks, nhm I plan on dropping by the booth if there will be one
[18:29] <tziOm> nhm, No, I am quite new here, just exploring if ceph could be for my business... but what I have seen now (except from theory) is unfortunately scaring me away
[18:29] <nhm> jtang: Definitely will be a booth!
[18:29] <tziOm> for example, mount -t ceph .... (works for 20 secs, then crashes)
[18:29] <rweeks> according to this spreadsheet, we definitely will have a booth at SC12.
[18:29] <tziOm> and these are packages from ceph repo
[18:30] <jtang> :)
[18:30] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Ping timeout: 480 seconds)
[18:30] <gregaf> yeah, cephfs isn't stable yet and rgw is annoying to set up since it's still written to work with apache but has to work around a bunch of issues
[18:30] <tziOm> also, setting up radosgw, as a sysadmin at a hosting company I kind of know what I am doing... hardly documented, and not working at all with documented setup (or any setup?)
[18:31] <gregaf> rbd is fairly pleasant to use, though ;)
[18:31] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[18:31] <jtang> out of curiosity will there be HR people at the sc12 event?
[18:31] * miroslavk (~miroslavk@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[18:31] <nhm> tziOm: DreamHost does have a pretty sizable RGW deployment with multiple RGW servers sitting behind haproxy.
[18:31] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[18:31] <rweeks> jtang: I don't think so
[18:32] <nhm> tziOm: Certainly if you wouldn't mind documenting the troubles you've had, we'd like to know how we can do things better.
[18:33] <tziOm> nhm: first of all, dont even distribute the ceph kernel vfs module until it works for more than 60 seconds
[18:33] <tziOm> nhm, fuse module seems working, but I do not dare go that route anymore
[18:33] <tren> tziOm: Actually it works pretty well. I've pushed terabytes of data through both the kernel client and the fuse client
[18:34] <tziOm> tren, with git version, or debian packages from ceph repo?
[18:34] * scuttlemonkey (~scuttlemo@2607:f298:a:607:e19c:2c9e:fa62:c4c4) has joined #ceph
[18:34] <tren> tren: 0.52 that I made into a binary ebuild for gentoo to push out to the cluster
[18:34] <tziOm> talking to yourself again....
[18:34] <tziOm> ;)
[18:34] <tren> yup…I do that from time to time ;)
[18:34] <nhm> tziOm: If you could submit a bug report about the crash with any logs or other information you think would be useful, that would definitely help us.
[18:35] <tziOm> tren, might be 0.52 works, but ceph has a debian repo with 0.48.2
[18:35] <tren> tziOm: Sorry about that. :) But I'm working with their 0.52 source builds
[18:35] <tren> tziOm: 0.48, 0.48.1 and 0.48.2 work as well. I've tested every version from 0.48 -> 0.52 :/
[18:35] <tziOm> And since it is this unstable and nothing really works, I dont see why they dont push newer packages into repo
[18:36] <tren> I'm a fairly new newcomer to using ceph though. Only about 3 months of testing now
[18:36] <rweeks> er, I wouldn't say "nothing really works"
[18:36] <tren> tziOm: try using a source build.
[18:36] <tziOm> tren, first thing I did was mount -t ceph .... ; dd if=/dev/zero of=foo bs=1024k count=2048 ...... crash
[18:37] <tren> tziOm: I've had 11 bonnie++'s running against a cluster using both kernel module and fuse
[18:37] <gregaf> tziOm: that's rather unusual, what kernel are you on?
[18:37] <tziOm> 3.2.0
[18:37] <tren> tziOm: what backing fs are you using?
[18:38] <tziOm> ext4
[18:38] <tren> Try xfs :)
[18:38] <tziOm> basically this setup was a cut and paste of the 5-min quick start guide + dd
[18:38] <blaphmat> btrfs seems to be gaining stability quickly
[18:38] <Tv_> tziOm: what was the crash?
[18:39] <tren> blaphmat: I was finding the btrfs-cleaner's taking more cpu time than the ceph-osd procs. :/ I switched to xfs
[18:39] <blaphmat> i see
[18:39] <tren> that was on kernel 3.5.3
[18:39] <tziOm> libceph: tid 269 timed out on osd0, will reset osd
[18:39] <blaphmat> fairly recent then
[18:40] <tziOm> ..something like that
[18:40] <gregaf> tziOm: oh, that looks like messenger issues, and Alex fixed a whole mess of those issues since then
[18:40] <tziOm> ceph-osd D ffff8801b9ed0040 0 3896 1 0x00000000
[18:40] <nhm> tziOm: Are you mounting ext4 with user_xattrs?
[18:40] <tziOm> yep
[18:40] <tren> Are you using the kernel fs client on a osd node?
[18:41] <gregaf> as with btrfs, you really want to run the latest kernel you can if you're using any of the ceph kernel pieces
[18:41] <tziOm> I was, yes
[18:41] <Tv_> nhm: btw https://github.com/ceph/ceph/commit/767272d3dc556cde0c839b935247eead6a34aa12
[18:41] <tren> tziOm: Shouldn't do that.
[18:41] <tren> tziOm: Probably why it crashed. I think this was gone over on the mailing list last month
[18:41] <nhm> tziOm: Ok, and filestore xattr use omap = true in ceph.conf?
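The ext4 settings nhm is asking about look roughly like this. The device, mount point, and file names are placeholders; the sketch writes the fragments to local example files rather than touching a real system:

```shell
# Sketch: ext4-backed OSDs need user_xattr mounts, plus the omap fallback
# for xattrs too large for ext4. Device and paths below are placeholders.
cat >> ceph.conf <<'EOF'
[osd]
    filestore xattr use omap = true
EOF
cat >> fstab.example <<'EOF'
/dev/sdb1  /var/lib/ceph/osd/ceph-0  ext4  rw,noatime,user_xattr  0 0
EOF
```

xfs and btrfs do not need the omap option, which is part of why xfs is being recommended in the discussion above.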
[18:42] <gregaf> isn't that documented reasonably well, or are we still missing it?
[18:42] <tren> gregaf: It's documented in a few places I've noticed
[18:42] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[18:42] <tziOm> Can someone with rgw please try amanda
[18:42] <tren> gregaf: along with a link to the loopback nfs conversation from a while ago
[18:42] <darkfader> the question is if it's in the beginner docs
[18:43] <tziOm> config here: http://pastebin.com/1aB8M6KH
[18:43] <gregaf> darkfader: you're right, it's not
[18:44] <darkfader> gregaf: i'm too delayed to know what to write, but maybe tziOm can do after he figures? *hintnudge*
[18:44] <tziOm> place this in $amandadir/etc/amanda/DailySet1/amanda.conf, replace the keys, and run: for i in 1 2 3 4 5 6 7 8 9 10; do amlabel DailySet1 DailySet1-$i slot $i; done;
[18:44] <nhm> Tv_: ah, good to know
[18:45] <nhm> gregaf: darkfader: tziOm: We do list it in our Troubleshooting docs here: http://ceph.com/wiki/Troubleshooting
[18:45] <nhm> Probably better to have that in a more visible place though.
[18:45] <jtang> i recently saw few issues and commits logged by sam lang, is this the same sam lang from the pvfs2 (argonne?)
[18:46] <gregaf> http://tracker.newdream.net/issues/3264
[18:46] <gregaf> tziOm: is this amanda the amanda network backup, or something else?
[18:46] <Tv_> gregaf et al: http://tracker.newdream.net/issues/3076
[18:46] <tren> gregaf: I have the cache dump for you
[18:46] <blaphmat> i think it's the amanda backup
[18:47] <tziOm> gregaf, backup, yes
[18:47] <tren> Tv_: That's the one I found :) very good info there
[18:47] <nhm> jtang: Our Sam Lang is from Argonne yep. :)
[18:47] <gregaf> *waves at slang*
[18:47] <slang> *waves*
[18:47] <jtang> I was wondering there for a moment when I first saw them issues being logged
[18:48] * BManojlovic (~steki@ has joined #ceph
[18:48] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:49] <slang> I'm famous!
[18:49] <jtang> heh only cause i've used pvfs2
[18:49] <jtang> ;)
[18:49] <slang> jtang: ah
[18:49] <jtang> I think i may have spoken to you at sc many years ago
[18:49] <slang> I'm not that famous!
[18:50] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[18:50] <blaphmat> which linux OS would you say has the best ceph support at the moment?
[18:51] <gregaf> slang: your name is in Wikipedia; you're famous!
[18:51] <jtang> speaking of which, i may have asked this before, are there plans to port ceph to other platforms and architectures
[18:51] <slang> gregaf: or notorious
[18:52] <gregaf> tziOm: I created a task in the tracker to try out Amanda: http://tracker.newdream.net/issues/3265
[18:52] <gregaf> can you comment on it with any other relevant details? I don't know anything about the software or what will be involved :)
[18:52] <blaphmat> NaioN: any luck on getting LIO to write a module for ceph?
[18:53] * LarsFronius (~LarsFroni@net-93-151-166-18.cust.dsl.teletu.it) Quit (Quit: LarsFronius)
[18:53] <gregaf> jtang: it's not super well-tested, but Ceph should work fine on most archs; new ARM and PowerPC I know for certain are built by some distros and have been run by people in the past
[18:53] <gregaf> (there are other archs built as well; I just don't remember which ones)
[18:54] <iggy> I compiled a very old version for arm at least (I was going to try to run it on one of those little hackable NASes, but it didn't have enough ram)
[18:54] <jtang> gregaf: are these the server side components or the client side components that have been built and known to be functional?
[18:54] <Tv_> jtang: canonical actively compiles ceph for their high-endish arm server arch
[18:55] <darkfader> blaphmat: i know one of the LIO people loves ceph, i'd think he immediately would see the potential of having an rbd backend in lio
[18:55] <gregaf> Inktank is unlikely to port it off of Linux until we're a lot happier with where it is on Linux, but the code is fairly portable; somebody ported it to FreeBSD (I think that one) several months ago
[18:55] <blaphmat> darkfader: i agree :)
[18:55] <Tv_> jtang: wido experimented with intel atoms running osds and ended up concluding they didn't have enough cpu power to handle recovery (for now)
[18:56] <gregaf> both — I mean, the kernel clients are upstream so they build everywhere, right?
[18:56] <tziOm> gregaf, Please include the for loop that creates the "tapesets" in the tracker. My error is I get: "Wrong content length from client"
[18:56] <Tv_> jtang: then again, you might be talking about client-side only..
[18:56] <gregaf> Tv_: actually I think he got that working with newer code and not quite as many disks per cpu
[18:56] <blaphmat> darkfader: I think even without the module it should be possible to use rbd as your LIO backing from what the docs indicate
[18:56] <nhm> gregaf: nice
[18:56] <Tv_> gregaf: yeah and i'd expect the osd "reservation" stuff etc throttling to help a lot
[18:56] <wido> Tv_: Indeed, they run fine under normal load. But as soon as recovery kicks in you're in trouble
[18:56] <nhm> gregaf: also, I've got a big article I'm writing now. BTRFS is definitely higher on the CPU utilization front.
[18:57] <wido> And the 4GB memory limit on the Atom is too low
[18:57] <wido> Still want to look at the AMD Brazos platform with 8GB of memory
[18:57] <gregaf> tziOm: is that something Amanda says or something in the RGW logs?
[18:57] <jtang> Tv_: yes, im more interested in the client
[18:57] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:57] <darkfader> blaphmat: hmm, i have used lio, and you need to use one/more backend devices per lun you'll be making, so you would need to slice up a /dev/rbd - don't know how that would turn out
[18:57] <jtang> servers are pretty common and easy to get going with commodity hardware
[18:57] <darkfader> which part of the docs / which lio feature did you think of?
[18:57] <Tv_> jtang: honestly, i'd expect it to Just Work(tm).. perhaps you'll find an endianness bug at some point, but that should be it
[18:58] <nhm> wido: Does that have the SSE4 crc32c instruction?
[18:58] <tziOm> gregaf, apache error log
[18:58] <gregaf> okay
[18:58] <tziOm> probably from rgw
[18:58] <nhm> wido: I'm bogged down in all kinds of customer stuff but I really want to look into alternate CRC32c implementations.
[18:59] <rweeks> does anyone have a link to the FreeBSD port? I know a BSD dev who would like to look at it
[18:59] <blaphmat> darkfader: so at the moment LIO doesn't understand that ceph does that for you. i see the problem..
[18:59] <Tv_> jtang: as i said, ubuntu builds ceph on armhf all the time: https://launchpad.net/ubuntu/+source/ceph/0.48-1ubuntu3/+build/3723505 etc
[18:59] <nhm> wido: I've noticed there is a performance penalty with our current one at really high throughput levels. It might be an issue on lower power platforms too.
[18:59] <blaphmat> darkfader: with LIO integration I don't see why ceph can't take on traditional SAN's.
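One way around the one-backend-per-LUN constraint darkfader raises is to skip slicing a single /dev/rbd and instead map one image per LUN. Names here are hypothetical, and running the resulting script needs a live cluster plus the rbd kernel module; the sketch just writes it out:

```shell
# Sketch only: pool/image names are hypothetical; requires a running
# cluster and the rbd kernel module to actually execute.
cat > make-lun0.sh <<'EOF'
#!/bin/sh
rbd create vmpool/lun0 --size 10240   # size in MB
rbd map vmpool/lun0                   # appears as e.g. /dev/rbd0
# /dev/rbd0 can then be registered as an iblock backstore in LIO
# and exported over FC or iSCSI.
EOF
chmod +x make-lun0.sh
```

Each LUN gets its own thin-provisioned, replicated image, so no partitioning layer is needed on top of rbd.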
[19:00] <jtang> Tv_: i shall have to fire it up on my rasbpi and play with it!
[19:00] <gregaf> rweeks: it got merged into our codebase; it was just some extra #include stuff :)
[19:00] <darkfader> blaphmat: well
[19:00] <wido> nhm: No idea if it has SSE. I also haven't looked that close to what is happening with the Atoms
[19:00] <rweeks> oh - so 0.52 could likely compile on FreeBSD then?
[19:00] <nhm> wido: you may want to try just disabling crc32 and see how it does.
[19:00] <gregaf> it wouldn't be surprising if we'd broken it, since we don't build it locally
[19:00] <gregaf> but yes
[19:00] <rweeks> interesting
[19:01] <darkfader> issue 1) lio should stop losing targets if it doesnt find a certain device 2) for certain aspects, definitely, there are many things that can work better in ceph/lio and lio scales very well
[19:01] <gregaf> https://github.com/ceph/ceph/pull/2
[19:01] <blaphmat> have you guys seen this slideshare: http://www.slideshare.net/thomasuhl/infinistoretm-scalable-open-source-storage-arhcitecture slide 30 talks about rbd backing LIO
[19:01] <gregaf> will give you some places to look in the code base/git history
[19:02] <darkfader> thats the guy, but from what i gather he is concerned with options, not with whats already working
[19:02] <nhm> wido: on a dual 6core nehalem system I was able to increase from about 1.2GB/s to 1.4GB/s by disabling it.
[19:02] * chutzpah (~chutz@ has joined #ceph
[19:02] <darkfader> he has a very definite and large picture on his mind
[19:02] <nhm> wido: with large IOs that is.
[19:02] <darkfader> but noone's painted it
[19:03] <blaphmat> yeah
[19:03] <blaphmat> i'm sure that'll get the wheels turning in people's heads :D
[19:03] <wido> nhm: Ah, ok. I'll give that a try. But I've been haunted with many btrfs bugs as well. Lately my time went to getting RBD into CloudStack and libvirt. Didn't do that much performance testing
[19:03] <wido> have to go afk
[19:04] <gregaf> tren: right, sorry — where's that cache dump again?
[19:05] <darkfader> blaphmat: in my last class i noticed it took most people almost an hour to really see the potential (san-heavy people will take a few seconds, but for others it's quite hard to grasp/imagine something with as much possibilities)
[19:05] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:05] <rweeks> interesting, darkfader
[19:05] <rweeks> because as a long-time NFS person I grasped the possibilities really fast
[19:06] <blaphmat> darkfader: yeah i picked up on that real quick that it could take down million dollar san's if it could play nice over iscsi or fibre channel
[19:06] <rweeks> even though blocks aren't really my thing, historically
[19:06] <darkfader> i was about to say storage-heavy and err, well
[19:06] <rweeks> hehe
[19:06] <darkfader> rweeks: maybe it's terabyte-heavy in fact
[19:06] <rweeks> FC is dead. Don't tell the FC companies.
[19:06] <darkfader> yeah right
[19:06] <Tv_> blaphmat: haven't seen those slides before, interesting..
[19:06] <rweeks> Expensive interconnects. Meh.
[19:06] <blaphmat> yeah true
[19:07] <blaphmat> fibre isn't cheap
[19:07] <darkfader> imho thats bs
[19:07] <blaphmat> what is?
[19:07] <Tv_> ethernet has a good history of killing everything else
[19:07] <Tv_> just give it time
[19:07] <darkfader> copper people constantly rip out cables and waste shitloads of money
[19:08] <darkfader> and "converged" ethernet rocks if your systems aren't doing anything, ever
[19:08] <blaphmat> that's my issue
[19:08] <darkfader> but see backup windows be exceeded for 6 hours because some idiot switched to 10ge
[19:08] <blaphmat> i don't have the extra bandwidth to let storage go over the copper
[19:08] <nhm> Tv_: I'm very curious what Intel has up their sleeve after buying off qlogic's IB and Cray's Interconnect divisions.
[19:08] <rweeks> fibre is cheap, and so is silicon - but FC companies are used to making lots of money from that infrastructure.
[19:08] * maelfius (~mdrnstm@ has joined #ceph
[19:09] <Tv_> nhm: 40+ GbE on mobo?
[19:09] <blaphmat> yeah i believe that
[19:09] <darkfader> i guess medium term we'll see a war between cisco ucs and whatever intel and broadcom do
[19:09] <jtang> infiniband?
[19:09] <blaphmat> cisco charges a fortune for sfp's
[19:09] <gregaf> Tv_: that's funny, I feel like I've seen almost all those slides before ;)
[19:09] <nhm> Tv_: couldn't they do that anyway with their own network group?
[19:09] <rweeks> infiniband is great, if you want to buy everything from Mellanox.
[19:09] <tziOm> does ceph have a concept of the age/traffic towards a object, and could move objects with less frequent usage to slower storage?
[19:09] <darkfader> gregaf: i've pasted the pres here
[19:09] <Tv_> nhm: perhaps they bought optics knowledge
[19:09] <gregaf> nhm: saw an article where one of the Intel VPs said they were going to put network interconnects on-package or even on-chip
[19:09] <jtang> rweeks: its cheaper than 10gbit
[19:09] <darkfader> some ages ago though
[19:09] <Tv_> tziOm: nope
[19:09] <jtang> well it was when we last bought a cluster of the stuff
[19:10] * EmilienM (~EmilienM@ has left #ceph
[19:10] <rweeks> it is now, jtang, but only because Mellanox aggressively dropped their prices.
[19:10] <Tv_> tziOm: that would imply a lookup table of where the data is now; CRUSH doesn't want that
[19:10] <nhm> gregaf: yeah, I saw something about that. Can't remember if I read the article.
[19:10] <Tv_> gregaf: yup, that's been making the rounds
[19:10] <jtang> rweeks: they bought voltaire!
[19:10] <darkfader> yeah and fc is a lot cheaper than 10ge too, and practically you can do well with 8+8gbit storage + 2*1 gbit lan. but not with 10gbit lan+storage
[19:10] <jtang> i thought that was sort of funny
[19:10] <jtang> and cisco making a mess when they bought topspin
[19:10] <tziOm> Tv_, any chance you will change that..?
[19:10] <jtang> and qlogic got pathscale
[19:11] <jtang> ah the joys of a monopoly
[19:11] <rweeks> hm. is FC cheaper, really, if you don't buy Cisco switches?
[19:11] <Tv_> tziOm: doesn't belong on the RADOS layer; cephfs might do that later, etc higher layers
[19:11] <darkfader> and, final fc bonus is that you keep the storage network to people who don't constantly mess the network up
[19:11] <blaphmat> darkfader: that's basically the setup i have. hence why i keep looking for open sauce fibre storage
[19:11] <nhm> rweeks: I actually am kind of regretting buying intel 10GE cards for the test nodes I've got instead of ConnectX-3 cards.
[19:11] <nhm> rweeks: I could have gotten a lot more mileage out of those.
[19:11] <darkfader> rweeks: compared to 10ge? methinks. but i had filed cisco as quite cheap/lowend.
[19:11] <darkfader> (not nexus though, just the mds stuff)
[19:12] <rweeks> yeah, those connectx cards look neat
[19:12] <Tv_> nhm: you've heard Carl's experience with 10G, right? Broadcom is better than Intel, for him.
[19:12] <rweeks> I would like to get some 40g cards to play with.
[19:12] <darkfader> but really , i think the main point is - people end up buying "10ge san" anyway after they tried running it on a wire
[19:12] <blaphmat> the connectx are infiniband?
[19:12] <darkfader> ib + 10ge
[19:13] <rweeks> both
[19:13] <nhm> Tv_: yeah, I had already grabbed intel cards before I talked to him.
[19:13] <rweeks> ib + 10gbe + 40gbe on the new models
[19:13] <darkfader> ah, ok?
[19:13] <nhm> Tv_: Still, I've used plenty of Intel 10G before without issue.
[19:13] <jtang> infiniband is pretty nice
[19:13] <darkfader> i have connectx2, bet they're too old for that
[19:13] <rweeks> yep
[19:13] <darkfader> for 10ge i finally got one(1) solarflare nic
[19:13] <nhm> yeah, I used to have some connectx2 cards. The new ones look nice and are cheap.
[19:14] <Tv_> nhm: i haven't been hands on with 10gig, but back in the days of 1gig, intel cards were picky on driver & firmware versions, but once you got a good combo, they beat everything
[19:14] <rweeks> but regarding price, what I meant was: is FC cheaper, if you go out and buy FC switches from Brocade, plus all the FC client cards and an expensive SAN from some vendor
[19:14] <blaphmat> my problem with 10Ge has been that I can't get the full 10G haha. i get about 1/3rd to 1/2 max
[19:14] <darkfader> switches and hbas are the san. the other thing is "storage"
[19:14] <jtang> blaphmat: from a single host?
[19:14] <rweeks> versus buying 10gbe switches from someone like Force10 and building your own storage arrays on white boxes with Ceph, for example
[19:15] <blaphmat> jtang: from host to host with 10gig cards on each
[19:15] <rweeks> right darkfader
[19:15] <rweeks> I'm thinking about overall cost.
[19:15] <nhm> blaphmat: I've pulled around 9Gb/s with cards on the same switch.
[19:15] <darkfader> rweeks: if you try to mirror the ram in the whitelabel box
[19:15] <darkfader> then it'll probably turn up even. :>>
[19:15] <darkfader> otherwise it's apples and oranges
[19:15] <blaphmat> nhm: were you using iperf to test?
[19:15] <nhm> blaphmat: about 6Gb/s from Minnesota to Oakridge.
[19:15] <rweeks> but why do you need to do that, if ceph has 3 replicas spread across your cluster?
[19:15] <jtang> it'd be quite hard to push 10gb from a single host, you must have had quite a system
[19:16] <nhm> blaphmat: iperf and netperf.
[19:16] <nhm> blaphmat: along with gridftp
[19:16] <darkfader> you might not have to. but that's a distributed storage vs. big array question, not related to the transport
[19:16] <blaphmat> jtang: i agree, i meant just network to network performance. i can't actually push that much data
[19:16] <rweeks> well, it is sort of related
[19:16] <blaphmat> nhm: yeah i can see gridftp doing it
[19:16] <darkfader> rweeks: i think in the long run distributed, whitebox storage will be better
[19:17] <rweeks> (btw: I just came from one of the "established" vendors of NAS & SAN storage. And I agree, darkfader, which is why I joined Inktank. :)
[19:17] <nhm> jtang: this was a 10GbE storage front-end on a 8k core cluster with a 12GB/s lustre filesystem and QDR IB interconnects.
[19:17] <darkfader> at least if it is like ceph with scalable mds. the other solutions, no, sorry, never
[19:17] <rweeks> exactly.
[19:17] <blaphmat> nhm: when i use iperf i get 5.71Gb/s between 2 hosts on the same switch
[19:17] <rweeks> scalable metadata has been a huge problem for every vendor of scale-out storage.
[19:18] <darkfader> rweeks: you might happen to have read it, it was great. "in cloud, everything is 10x, especially latency"
[19:18] <blaphmat> rweeks: you work for inktank now?
[19:18] <rweeks> and it still is, regardless of what IBM and EMC and NetApp tell you.
[19:18] <jtang> nhm: ah okay
[19:18] <rweeks> yes, blaphmat. I started with them on Monday.
[19:18] <jtang> funky :)
[19:18] <rweeks> haha
[19:18] <rweeks> nice, darkfader
[19:18] <rweeks> where's that quote from?
[19:18] <blaphmat> rweeks: awesome. i believe i put a resume in with them just for fun :)
[19:18] <nhm> blaphmat: Most OSes should have good defaults, but there are some tcp kernel tweaks you can try that might help.
[19:18] <rweeks> sounds like something Simon Wardley would say.
[19:19] * jtang shudders at the thought of gridftp
[19:19] <darkfader> rweeks: i just want a big huge red line separating the comparison ceph<>low/midrange and any distributed storage<>vmax/usp
[19:19] <blaphmat> nhm: are you using jumbo frames? i'm not. don't have support for it yet from the net team
[19:19] <jtang> actually /me just shudders at the though of globus
[19:19] <jtang> thought
[19:19] <darkfader> mostly because of different use cases
[19:19] <nhm> blaphmat: hrm, I think we had jumbo frames on, that was like a year ago before I worked for Inktank.
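The kind of TCP tweaks nhm mentions usually look like the following; the values are illustrative starting points for 10GbE, not recommendations, and the sketch writes them to a local file rather than applying them:

```shell
# Illustrative socket buffer sizes for 10GbE; tune for your own RTT
# and memory budget before applying.
cat >> sysctl-10g.conf <<'EOF'
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
# sysctl -p sysctl-10g.conf   # apply as root
```

Larger maximum buffers let the TCP window grow enough to keep a 10G link full; jumbo frames help separately by cutting per-packet overhead.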
[19:19] <rweeks> oh definitely
[19:19] <rweeks> we're not in the same ballpark, today.
[19:19] <nhm> jtang: I used to do a lot of gridftp/globus stuff. :D
[19:20] <jtang> nhm: were you running lustre across a WAN?
[19:20] <darkfader> rweeks: anyway, cool that there's a new inktank person :>
[19:20] <tziOm> a little more meat on the amanda rgw bone: http://pastebin.com/L005BSQ7
[19:20] <rweeks> in the long term though I see those use cases converging
[19:20] <darkfader> rweeks: that would be cool
[19:20] <nhm> jtang: nope, just on the cluster, with 10GbE front-ends for gridftp.
[19:20] <jtang> or just doing site to site with the teragrid infrastructure over gridftp?
[19:20] <rweeks> there's another inktanker lurking in here: miroslavk
[19:20] <jtang> or whatever internet2 network thats there
[19:20] <nhm> jtang: we weren't actually on teragrid. I tried to get us there but politics and apathy kept getting in the way.
[19:21] <darkfader> rweeks: what i'd really want is a fairy that gives me 20-30 3.5" embedded pc's with 16GB ram, dualport fc in a 40pin sca port and a 10ge to front
[19:21] <jtang> nhm: i've never been on teragrid myself, but i've always wondered about it
[19:21] <rweeks> darkfader: with the way things are going in enterprise IT I see a lot of shakeup in the traditional storage market
[19:21] <darkfader> buy a couple old arrays, drop one disk per shelf, turn into a ceph monster
[19:21] <blaphmat> rweeks: the sys engineer position looked interesting. that's where i'd fit if i lived out on the west coast
[19:21] <nhm> jtang: we mostly used I2/NLR over some dark fiber that connects a bunch of the midwest universities to chicago.
[19:22] <darkfader> rweeks: i have one customer who has a "SAN" consisting of switches, hbas and one msa2000. that is the worst waste of money evar :>
[19:22] <blaphmat> i agree, i just built a gluster cluster that made people's heads explode when they saw the price. it was 1/10th what we normally pay
[19:22] <gregaf> updated, thanks tziOm
[19:22] <blaphmat> haha
[19:22] <blaphmat> that's awesome
[19:22] <blaphmat> darkfader: rolled his own to the max
[19:22] <jtang> nhm: did you ever deal with jlabs by any chance?
[19:22] <darkfader> rweeks: and the quote is from some crazyhead on twitter named devops_borat
[19:22] <rweeks> most FC SAN is a horrible waste of money, as far as I'm concerned
[19:22] <nhm> jtang: don't think so.
[19:23] <jtang> ...this is leading into a question about geolocation and data locality...
[19:23] <rweeks> oh yes, devops_borat. He is rather funny.
[19:23] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[19:23] <jtang> we're sort of looking into using ceph as a storage backend across a metro area network and possibly a wan as well
[19:23] <blaphmat> nhm: did you guys have to mess with the buffer ethernet ring at all ?
[19:23] <darkfader> rweeks: i'll not change my opinion there, let the usual network guys run a storage network and you're going down. plus most SANs I see are well >1k ports and fully make sense
[19:23] <nhm> doh, I just missed our daily stand-up. oops!
[19:24] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:24] <jtang> i was wondering if anyone out there has tried ceph over a wan/man
[19:24] <darkfader> for these small failed deployments there should be a sales guy and a CIO slapped
[19:24] <nhm> jtang: latency could end up being a problem
[19:24] <jtang> and if there are things to watch out for, does latency kill performance?
[19:24] <rweeks> yes
[19:24] <rweeks> latency ALWAYS kills performance.
[19:24] <jtang> you preempted me
[19:24] <rweeks> storage, or not.
[19:25] <darkfader> hah, but on linux we always love to write async
[19:25] <darkfader> let it flush on the weekend
[19:25] <rweeks> but in a MAN you should have reasonably low latency
[19:25] <rweeks> WAN, that _really_ depends on the WAN. There are laws of physics that we can't get around quite yet.
[19:25] <benpol> ah
[19:26] <benpol> (sorry folks wrong window)
[19:26] * dmick (~dmick@2607:f298:a:607:1a03:73ff:fedd:c856) has joined #ceph
[19:26] <darkfader> (you would have just sounded curious)
[19:26] <jtang> i guess there is a possibility of placing data at certain locations
[19:26] <rweeks> darkfader: I didn't say that you should let the networking guys run storage networks, specifically. But aside from Oracle, I can't think of something in corporate data centers that is less cost effective than FC.
[19:27] <jtang> i've not yet played with the rados command too much, but does the cppool command copy objects across in parallel (i assume it does)?
[19:27] * scuttlemonkey (~scuttlemo@2607:f298:a:607:e19c:2c9e:fa62:c4c4) Quit (Read error: Connection reset by peer)
[19:27] <rweeks> yes, jtang. If your MAN has low enough latency, and you had, say 3 data centers, you could set up data placement rules to mirror the geographic placement of nodes in the cluster.
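[The three-datacenter placement rweeks describes is expressed as a rule in the decompiled crushmap. A minimal sketch, assuming the map's hierarchy defines a `datacenter` bucket type with three datacenter buckets under `root` — all names here are hypothetical:]

```
rule replicated_across_dcs {
    ruleset 1
    type replicated
    min_size 3
    max_size 3
    step take root
    step chooseleaf firstn 0 type datacenter
    step emit
}
```

[With `firstn 0`, CRUSH selects as many leaves as the pool's replica count, each under a distinct datacenter bucket, so a 3x pool ends up with one copy per site.]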
[19:27] <jtang> i think one of the use cases that we might have is replicating an entire file system to another pool of disks
[19:27] <darkfader> rweeks: idk, i find it useful, it runs scsi and it is mostly lossless
[19:28] <jtang> at a different site
[19:28] <rweeks> but then, I am starting to feel the same way about NFS, so...
[19:28] * scuttlemonkey (~scuttlemo@ has joined #ceph
[19:28] <jtang> i might save some of these questions and discussions till sc12
[19:28] <darkfader> i've also been having an IDE DVD drive on a scsi bridge on a fc bridge on a iscsi bridge so i didn't need to use PIO. so i see some stuff differently :)
[19:29] <jtang> i need to think more and have a chat with the team im working with
[19:29] <darkfader> (and yes it was faster)
[19:29] <nhm> jtang: what organization are you with btw?
[19:29] <rweeks> happy to talk about them in person. It looks like I will probably be at SC12 as well
[19:29] <jtang> im paid to deliver the project "digital repository of ireland"
[19:29] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:29] <jtang> but i'm based out in trinity college dublin
[19:29] <nhm> jtang: neat
[19:30] <rweeks> wow, very cool
[19:30] <jtang> its a funny contract that i have, im on secondment
[19:30] <jtang> if thats the right term
[19:30] <jtang> storage is just one of those things that im interested in
[19:30] <rweeks> meaning, you work for Trinity College, but your project isn't a Trinity project
[19:30] <rweeks> ?
[19:30] <jtang> i was talking to my line manager earlier on about some ideas with hadoop styled systems for doing genome analysis
[19:31] <jtang> rweeks: i work for Trinity College Dublin, but the project is led by the Royal Irish Academy
[19:31] <rweeks> right
[19:31] <jtang> it's a national project ;) technically i work for the irish state
[19:32] <darkfader> off for another few hours of oktoberfest madness
[19:32] <joao> and Trinity College "lends" you to the Royal Irish Academy for the project?
[19:32] <joao> darkfader, I kind of envy you
[19:32] <jtang> joao: yea something like that
[19:32] <joao> enjoy :p
[19:32] <darkfader> joao: do that it is better than reality :)
[19:32] <jtang> ah oktoberfest! i need to go at some point
[19:32] <rweeks> I'm interested in the hadoop/ceph integration as well, replacing HDFS with Ceph
[19:32] <joao> jtang, that's how the 'secondment' usually works with international missions and the sorts
[19:33] <jtang> rweeks: im more interested in doing the analysis with hadoop/ceph ;)
[19:33] <nhm> rweeks: yeah, I need to put that on my big list of things to do performance tests on.
[19:33] <jtang> rather than replacing hadoop with ceph
[19:34] <rweeks> well, you can't really replace hadoop with ceph.
[19:34] <blaphmat> rweeks: have you looked into disco with python?
[19:34] <rweeks> but you can use ceph as the storage for hadoop instead of HDFS.
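[For the HDFS replacement rweeks mentions, the Hadoop/Ceph bindings of that era plugged CephFS in through Hadoop's FileSystem interface. A sketch of the core-site.xml wiring, assuming the CephFileSystem shim jar is on Hadoop's classpath — class and property names may differ by version, and the monitor address is a placeholder:]

```xml
<!-- core-site.xml: point Hadoop at CephFS instead of HDFS.
     Assumes the Ceph Hadoop bindings (CephFileSystem) are installed;
     mon-host:6789 is a placeholder monitor address. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>ceph://mon-host:6789/</value>
  </property>
  <property>
    <name>fs.ceph.impl</name>
    <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
  </property>
</configuration>
```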
[19:34] <rweeks> I have not, blaphmat
[19:34] <rweeks> but I'm not a python guy. not a dev, really.
[19:34] <blaphmat> it's pretty interesting. it's super easy to setup
[19:35] <blaphmat> i see
[19:35] <jtang> i havent had a look at the radosgw yet, i think i might get some interns to look at it later on in the year
[19:35] <blaphmat> if you can setup hadoop you can easily do disco
[19:35] <jtang> there's a few things i want to try out with ceph in the backend
[19:36] <jtang> ceph seems like it might be a good fit for genome analysis projects
[19:36] <jtang> but i guess i need to convince management that its a good idea to try
[19:37] <jtang> to try different things
[19:37] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[19:37] <nhm> jtang: what tools are you using for genome analysis?
[19:37] <rweeks> that's always the challenge with management, isn't it.
[19:38] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit ()
[19:38] <jtang> nhm: the groups that i deal with use an illumina sequencer
[19:38] <nhm> jtang: I used to be a project manager on a proteomics/genomics data analysis pipeline.
[19:38] <jtang> so the pipeline is whatever they ship
[19:38] <jtang> we're all waiting for the new nanopore tech stuff to come out
[19:40] <nhm> jtang: We wrote (shudder) cagrid services for various protein identification search engines and short read sequence alignment tools.
[19:40] <jtang> nhm: https://github.com/jcftang/cports/tree/develop/packages -- there's some packages there that we use
[19:40] <nhm> jtang: along with a relatively simple data storage service with metadata tagging.
[19:41] <jtang> its not all bio stuff, but a mixture
[19:41] <jtang> of hpc related things
[19:41] <jtang> nhm: we were looking at irods for datamanagement at one point
[19:41] <jtang> nhm: for genomic and proteomic work, as well as drug design
[19:42] <jtang> that didnt work out too well, it became top-heavy with storing too much data in the database and not enough on disk
[19:42] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[19:43] <nhm> jtang: All of our stuff was stored on disk with just the metadata in the DB. File transfer was using cagrid's awful stuff though. Eventually one of our devs wrote a sftp frontend which was much better.
[19:43] <jtang> one of the things i've wanted to do was to manage data being streamed from one site to another site
[19:43] <jtang> as the sequencer that we have is in a hospital
[19:43] <jtang> and the hpc facility is across the city
[19:44] <jtang> (another possible use for ceph with replication turned on)
[19:44] <nhm> jtang: That's actually what our goal was. We had various services that operated on a federated network that each did various things.
[19:44] <nhm> jtang: so data analysis happened on our cluster(s), while storage was on another box, while the frontend was somewhere else.
[19:45] <jtang> nhm: sounds like a mess, much like what we have right now :P
[19:45] <jtang> its going to be a nightmare when the second sequencer gets turned on
[19:45] <jtang> we have a SOLiD as well now
[19:46] * cblack101 (c0373727@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[19:46] <nhm> jtang: Sort of a mess, but a contained one. It let users have a web interface for launching identification analysis and not have to try to log in somewhere and run broken bioinformatics tools, which was a big win.
[19:46] <jtang> right i must go now its too late to still be in the office, i got what i needed!
[19:47] <jtang> i shall meet some inktank people next month at SC12
[19:47] <nhm> alright, good evening!
[19:47] <jtang> i must make a list of things to discuss and ask
[19:48] <jtang> have a good evening to those in europe
[19:48] <jtang> and those that are in the US, have a good day
[19:49] <rweeks> nice to meet you, jtang
[19:49] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[20:03] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[20:04] * cblack101 (86868947@ircip1.mibbit.com) has joined #ceph
[20:07] <tren> Gregaf: You around?
[20:07] <gregaf> yep
[20:08] <gregaf> found that link in the history; looking over it now
[20:08] <tren> sweet :) sorry, I was doing hard drive testing :/
[20:08] <gregaf> np!
[20:08] <tren> we go through lots of hard disks here...
[20:09] <exec> joshd: hi. I've seen your commits about locking support. any chance to have them on argonaut ?
[20:12] <joshd> exec: it would be possible to backport them, but a bit messy. as gregaf mentioned, you could use a newer client side and load new rbd and lock classes in the argonaut osds
[20:13] <gregaf> where "messy" means "unlikely to go into a stable release" ;)
[20:13] <joshd> exec: the next stable release (bobtail) isn't that far away
[20:25] <nwl> joshd: is the release timetable feature or time-based?
[20:27] <joshd> generally time-based, but if important bugs come up we don't mind delaying
[20:27] <Tv_> flexitime
[20:28] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[20:30] <maelfius> joshd: is it (typically) 6mo time release? or you on a different schedule (I ask since most "cool" OSS projects seem to be doing the 6mo time-release)
[20:33] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[20:33] <gregaf> tren: do you have any debug logging enabled for your MDS right now, or did you have to turn it all off?
[20:34] <gregaf> maelfius: we do development releases every 2-4 weeks
[20:34] <gregaf> we've only had one stable release so far and I don't know that we've really set expectations for how often we'll do that, but every six months is a pretty good guess
[20:34] * cowbell (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) has joined #ceph
[20:35] * sagelap1 (~sage@ Quit (Ping timeout: 480 seconds)
[20:36] <maelfius> gregaf: yeah i've seen argonaut. I'm using the dev releases (personally in my POC)
[20:36] <maelfius> cool! :)
[20:37] <gregaf> tren: I can certainly see from the cache dump that for some reason the client still has capabilities (these are leases on access, essentially, which mean the MDS needs to keep the inode in memory) that it's told the MDS it no longer wants
[20:38] <gregaf> at this point, a sample log with debug level 10 should shed light on why that's happening
[20:39] <gregaf> (sample log = does not need to include startup/shutdown; just a time period while running)
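[The logging gregaf asks for is plain ceph.conf configuration on the MDS host; a minimal sketch:]

```ini
; ceph.conf on the MDS host -- takes effect on daemon restart
[mds]
    debug mds = 10
```

[It can usually also be injected at runtime with something like `ceph mds tell 0 injectargs '--debug-mds 10'`, though the exact injectargs spelling varies between versions.]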
[20:41] <tren> okay
[20:41] <exec> joshd: gregaf: thanks.
[20:42] <tren> gregaf: the only workload I have running is rsync
[20:42] <tren> gregaf: you want the mds log?
[20:43] <exec> joshd: i hope seamless upgrade between stable releases will work?
[20:44] <joshd> exec: that's the plan
[20:45] <tren> gregaf: well, I've turned mds logging up to 10
[20:45] <tren> gregaf: how long do you need this enhanced logging?
[20:46] <exec> joshd: real life differs )
[20:46] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) has joined #ceph
[20:47] <joshd> exec: that's what testing is for
[20:48] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[20:48] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:49] <exec> joshd: kk, will see ))
[20:51] * Cube (~Adium@ has joined #ceph
[21:00] * scuttlemonkey (~scuttlemo@2607:f298:a:607:5163:627e:9450:8563) has joined #ceph
[21:00] * yehudasa_ (~yehudasa@ has joined #ceph
[21:06] * blaphmat (a5a00214@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[21:16] * scuttlemob (~scuttlemo@ has joined #ceph
[21:16] * scuttlemonkey (~scuttlemo@2607:f298:a:607:5163:627e:9450:8563) Quit (Read error: Connection reset by peer)
[21:17] * yehudasa_ (~yehudasa@ Quit (Ping timeout: 480 seconds)
[21:42] <exec> perhaps stupid q.: how can I re-init mds daemons after metadata pool recreation?
[21:43] <gregaf> tren: sorry, went out to lunch
[21:43] <gregaf> the longer it runs, the more data we have — but long enough to have rsynced a couple files is probably all I need
[21:48] * scuttlemob (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[21:48] <gregaf> exec: this is deliberately undocumented, but if you're attempting to recreate a filesystem (ie, have wiped out all the data in CephFS!): http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/6996
[21:50] * scuttlemob (~scuttlemo@2607:f298:a:607:d40c:697:5dfd:4f13) has joined #ceph
[21:56] <exec> gregaf: thanks.
[21:58] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) Quit (Quit: tryggvil)
[22:01] <tren> Gregaf: you there?
[22:01] <gregaf> yep!
[22:01] <tren> https://www.dropbox.com/s/1gfc5n53b4ztwup/mds.ocr46.log.bz2
[22:01] <tren> logs you asked for
[22:01] <tren> or rather, log
[22:01] <gregaf> excellent
[22:02] <tren> it's about 600mb uncompressed
[22:02] <tren> debug 10 definitely spews a lot of detail when an rsync is running
[22:02] <gregaf> wait until you see debug 20 ;)
[22:03] <tren> lol, I've sent you debug 20 logs before ;) it's a good thing I have a fast array to handle the logs
[22:10] * gregorg_taf (~Greg@ has joined #ceph
[22:10] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[22:10] * cblack101 (86868947@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[22:19] <nhm> debug 20 on everything is kind of insane
[22:19] * EmilienM (~EmilienM@195-132-228-252.rev.numericable.fr) has joined #ceph
[22:25] * cowbell (~sean@adsl-70-231-128-149.dsl.snfc21.sbcglobal.net) has left #ceph
[22:27] <tren> nhm: good way to test your disk io ;)
[22:28] * lofejndif (~lsqavnbok@83TAABFJ1.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:33] * scuttlemob (~scuttlemo@2607:f298:a:607:d40c:697:5dfd:4f13) Quit (Read error: Connection reset by peer)
[22:33] <gregaf> now it's time to play the "how much of the MDS code can I remember?" game
[22:33] <gregaf> in addition to "why am I falling asleep!??!"
[22:34] <gregaf> but I do see that the client is sending along a handle_client_caps dropping everything
[22:34] <gregaf> and the MDS is recording that, but says the caps are still dirty, and so doesn't seem to want to kick them out of cache
[22:35] <tren> gregaf: Can I just smile and nod?
[22:35] <gregaf> heh, yes
[22:35] <gregaf> just recording it somewhere
[22:35] <gregaf> and making it possible for slang to follow along if he wants or for sagewk to look back at it later
[22:37] <slang> following
[22:37] <gregaf> not sure if this will mean anything to you or not :)
[22:37] <slang> gregaf: you should have a coffee, btw
[22:38] <gregaf> haha; I went and fetched a Coke
[22:38] <tren> gregaf: shotgun it ;) that'll wake you up
[22:38] <gregaf> eww
[22:38] <slang> damn kids and their empty calories
[22:38] <gregaf> it might wake me up, but nobody would appreciate the method
[22:38] <tren> gregaf: that's why you do it alone…to hide your shame ;)
[22:39] <dmick> stir some sugar into that to fortify it
[22:39] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[22:42] * BManojlovic (~steki@ has joined #ceph
[22:45] <tren> gregaf: is there anything else you need from me that would help?
[22:46] <gregaf> not at this time; and hopefully I can just figure it out from what I've got
[22:46] <gregaf> I'll let you know as soon as I find anything
[22:46] <gregaf> thanks!
[22:46] <tren> mds is at 2.6gb ram usage ;)
[22:46] <tren> gregaf: thanks! just pvt me on here if you need to get my attention. or send an email
[22:46] <gregaf> will do
[22:46] <tren> gregaf: and let me know if you need anything else :) rsync is still going
[22:48] <gregaf> slang: have you done anything in the client with max_size yet?
[22:48] <slang> gregaf: no
[22:48] <gregaf> okay
[22:49] <slang> gregaf: por que?
[22:49] <gregaf> one of the things i see here is that the cap update changes the size to 31262 and the max_size to 0
[22:49] <gregaf> I think that that's okay, since max_size is a permission for the client to extend the file up to that size, and the client wants to drop all caps
[22:49] <gregaf> but I don't remember for sure
[22:50] <joao> slang, for a moment there I had a brain freeze and thought I was looking at another channel other than #ceph :p
[22:50] <gregaf> hola!
[22:50] * dty (~derek@testproxy.umiacs.umd.edu) Quit (Ping timeout: 480 seconds)
[22:50] <gregaf> donde esta?
[22:51] <gregaf> :p
[22:51] <joao> lol
[22:51] <Tv_> te ette todellakaan halua alottaa tota..
[22:51] <joao> gregaf, aside from the hola, we actually have matching words for all that
[22:51] <joao> :p
[22:51] <gregaf> really, the hola is the one that's different?
[22:51] <gregaf> yo hablo un poco espanol, pero muy despaciomente y muy malmente
[22:51] <joao> gregaf, yeah, we drop the h
[22:52] <joao> hola -> ola
[22:52] <Tv_> i'm having Breaking Bad flashbacks
[22:52] <gregaf> actually, that's about the only sentence I can still construct in Spanish
[22:52] <gregaf> and it's deliberately wrong
[22:52] <Tv_> gregaf: yeah you should be focusing more on how to order beer in Danish
[22:52] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:52] <gregaf> I don't like beer — teach me how to order whiskey!
[22:53] <nhm> this channel discussion is on the right track!
[22:53] <Tv_> gregaf: en whisky tak
[22:54] <nhm> btw, how do you guys manage to not drain the whiskey table at DH on a regular basis?
[22:54] <slang> gregaf: going to amsterdam?
[22:54] <gregaf> copenhagen first
[22:55] <gregaf> UDS
[22:55] <slang> nice
[22:55] <gregaf> and then wido's party
[22:55] <nhm> gregaf: sweet
[22:55] <Tv_> oh dutch is going to be so much harder than danish
[22:55] <Tv_> and danish is already considered the hardest scandinavian language, i think
[22:55] <gregaf> I'm going to be hung over the whole week, I suspect; since I have ~20 hours from the time our company Halloween party ends until UDS starts and it's a 13.5 hour flight :/
[22:56] <gregaf> (I'm trying to pretend I'm seasoned, but actually my passport just arrived last week — super excited!)
[22:56] <nhm> gregaf: yeah, I bet that's going to be a ton of fun
[22:56] <nhm> gregaf: SC12 is in Salt Lake City. ;(
[22:56] <joao> hey, greg's coming to europe
[22:56] <joao> !
[22:56] <Tv_> gregaf: i think slightly drunk, hungover and jet lagged is actually the closest a foreigner will ever get to dutch pronunciation
[22:57] <gregaf> joao, you should talk to Sage and Mark — we haven't forgotten you, we just haven't made any plans yet ;)
[22:58] * scuttlemonkey (~scuttlemo@ has joined #ceph
[22:59] <joao> yep, sure will :)
[23:00] * aliguori (~anthony@ Quit (Remote host closed the connection)
[23:00] <nhm> joao: I'm personally putting you in charge of making sure gregaf doesn't remember what he's done out there. >:)
[23:00] <gregaf> why would you do that to me?
[23:00] <nhm> gregaf: because I'm your good friend of course!
[23:01] <gregaf> but I…I like to remember what I've done!
[23:01] <gregaf> if I don't remember what I've done I assume it was bad
[23:01] <dmick> even the shameful bits?
[23:01] <gregaf> if you forget the shameful bits you just repeat them later on :p
[23:01] <sjust> to our amusement
[23:01] <tren> I'm glad this is bein' logged for posterity ;)
[23:02] <dmick> I thought that was part of the fun?!
[23:02] <Tv_> gregaf: i guess i was thinking of Monty's French accent: http://www.youtube.com/watch?v=QSo0duY7-9s and here's a brief introduction to some other significant European cultural phenomena: http://www.youtube.com/watch?v=vAaaAVJr9zg
[23:04] <nhm> joao: make sure he eats lots of blueberries to help replace the brain cells that get killed off.
[23:04] <joao> blueberries are good for that?
[23:04] <joao> why only now am I knowing about this?
[23:05] <gregaf> this…is a very strange video
[23:05] <joao> that would have been amazing to know for the last 10 years
[23:08] <nhm> joao: supposedly it helps neuron communication or something. One of the scientists I used to work with ate them every morning.
[23:09] <slang> kernel:
[23:09] <slang> client.1:
[23:09] <slang> kdb: true
[23:09] <dmick> Exception: i_am_not_teuthology
[23:10] * sagelap (~sage@ has joined #ceph
[23:10] <slang> I'm wondering if that does what I think, which is mounts the kernel client on client.1 with kdb
[23:10] <nhm> dmick: sudo kernel: client.1: kdb: true
[23:10] <dmick> sudo: kernel: command not found
[23:10] <slang> heh
[23:11] <slang> is it beer thirty already?
[23:11] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[23:11] <nhm> slang: I do have a half bottle of belgian in the fridge that needs to be drunk...
[23:11] <gregaf> slang: yes, I believe that's what that snippet should do
[23:11] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[23:12] <gregaf> assuming it's properly located in the task list
[23:12] <nhm> I concur
[23:12] <joshd> slang: no, the kernel task installs kernels. you want the kclient task
[23:12] <slang> ah yes
[23:12] <slang> joshd: you win!
[23:13] <dmick> what that will do is force a kernel reinstall, if the 'default' kernel version is not installed
[23:13] <nhm> joshd: Is kdb unique to kclient?
[23:13] <dmick> so you may not want it anyway
[23:13] * Cube (~Adium@ Quit (Quit: Leaving.)
[23:13] <dmick> kdb is a kernel task; just enable kdb in case of panic
[23:13] <gregaf> oops, sorry then
[23:13] <dmick> well a kernel task option
[23:14] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[23:14] <nhm> dmick: was the original snippet correct, just not what he wanted then?
[23:14] <joshd> yes
[23:14] <nhm> ah, I missed what the original question was.
[23:14] <nhm> clearly I do need beer.
[23:16] <slang> joshd: thanks, btw
[23:16] * BManojlovic (~steki@ has joined #ceph
[23:17] <joshd> np
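[The distinction joshd draws can be seen in a job file: the top-level `kernel:` stanza installs a kernel (with `kdb:` as one of its options, enabling the kernel debugger on panic), while the `kclient:` task is what actually mounts the kernel client. A sketch of a teuthology job fragment — surrounding tasks are illustrative and the exact schema varies by teuthology version:]

```yaml
# kernel: installs a kernel for the role and can enable kdb;
# it does NOT mount anything.
kernel:
  client.1:
    kdb: true
tasks:
- ceph:
# kclient is the task that actually mounts the kernel client on client.1
- kclient: [client.1]
```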
[23:18] <gregaf> okay, I think I may have narrowed the problem down to something useful
[23:18] <gregaf> it looks like issue_caps is deciding that the client is a loner, and giving it caps, but then not sending a message telling the client it got anything
[23:25] * EmilienM (~EmilienM@195-132-228-252.rev.numericable.fr) has left #ceph
[23:26] <slang> gregaf: is there a request to do that?
[23:26] <gregaf> actually, I don't think that's where the problem is
[23:26] <gregaf> but it's related to that
[23:27] <slang> i guess stale would cause the client to re-request
[23:27] <gregaf> slang: request to do what?
[23:27] <slang> re-request
[23:27] <gregaf> yeah, it just resends iirc
[23:27] <gregaf> in this case, the problem is that the client has caps that it doesn't want
[23:27] <gregaf> it also may not be getting told it has the caps; not sure about that yet
[23:28] <gregaf> actually, I think it probably does know; that would explain why the client also has growing memory use
[23:28] <slang> gregaf: right - just thinking if it doesn't, it could send a stale msg to the client
[23:30] <gregaf> I'm missing something "if it doesn't" what?
[23:30] <gregaf> err
[23:30] <gregaf> *something — "if it doesn't" what?
[23:31] * Cube (~cube@ has joined #ceph
[23:31] <slang> gregaf: if the mds doesn't tell the client that it issued it caps
[23:32] <gregaf> ah, then the mds could send a stale message to the client on those caps when they aren't refreshed later?
[23:32] <slang> gregaf: right
[23:32] <gregaf> yeah
[23:32] <slang> gregaf: in fact that should happen based on looking at the code
[23:32] <gregaf> I don't remember how the stale detection works
[23:33] <slang> gregaf: looks like the tick marks all sessions stale that haven't sent renewcaps messages
[23:33] <gregaf> that's what I was afraid of
[23:33] <slang> gregaf: within an interval
[23:33] <gregaf> so in this case the client session is still active
[23:33] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[23:34] <Kioob> Hi
[23:34] <gregaf> and since the refresh isn't on explicit caps, then the lost caps never get recovered
[23:34] <gregaf> (note: when I said "afraid of", I really meant "I think it's inconvenient for this purpose, but of course pretty obvious since a per-inode stale check would be expensive")
[23:35] <Kioob> I'm looking again to RBD, to replace LVM+DRBD structures ; since LVM+DRBD over hardware RAID6 doesn't behave very well on random writes (mainly for MySQL InnoDB)
[23:36] <gregaf> Kioob: looking for advice? a sanity check? help on some problem?
[23:37] <Kioob> gregaf: is RBD ready for production ?
[23:37] <gregaf> we consider it to be, yes
[23:38] <Kioob> ok, so, I would like to reuse same hardware. I have 4 servers, each one with 8 SAS 7200rpm disks, 4 SSD, 48GB of RAM and 12 core. I suppose I can put 8 OSD on each server, right ?
[23:38] <gregaf> that would probably make the most sense
[23:39] <Kioob> and I can manage the number of copies of each RBD block ?
[23:39] <gregaf> well, you set how many copies there are on a pool level
[23:39] <gregaf> and you can put different RBD images in different pools
[23:39] <Kioob> great
[23:40] * Cube (~cube@ Quit (Quit: Leaving.)
[23:40] <gregaf> if you're looking to try and use it for backing SQL databases though, you'll want a sanity check from nhm or joshd — I don't remember if anybody's been successful with that use case
[23:40] * Cube (~cube@ has joined #ceph
[23:40] <Kioob> mmm yes
[23:41] <nhm> Kioob: random writes are tough. I'd go into this with low expectations. :)
[23:41] <joshd> rbd caching makes things a bit better, but you're still better off if you can get mysql to do larger I/Os
[23:42] <Kioob> the problem is because of virtualization
[23:42] <joshd> and of course the read/write mix matters too
[23:42] <Kioob> 20 different DB VM, which writes simultaneously...
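[The rbd caching joshd mentions is enabled on the client side; a minimal sketch of the relevant ceph.conf options (sizes are illustrative, and `rbd cache` only affects librbd clients such as qemu, not the kernel rbd driver):]

```ini
; client side, e.g. the virtualization host's ceph.conf
[client]
    rbd cache = true
    rbd cache size = 33554432        ; 32 MB, illustrative
    rbd cache max dirty = 25165824   ; must be smaller than rbd cache size
```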
[23:43] * tziOm (~bjornar@ti0099a340-dhcp0358.bb.online.no) Quit (Remote host closed the connection)
[23:44] * lofejndif (~lsqavnbok@83TAABFJ1.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[23:46] <joshd> consider how many iops that is vs. how many your spinning disks provide / replication count
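[joshd's sizing rule above can be put into back-of-the-envelope arithmetic. The spindle count matches Kioob's 4x8 disks, but the per-disk IOPS figure is an assumed typical value for 7200rpm drives, not a measurement from this setup:]

```python
# Rough client-visible random-write IOPS for a replicated Ceph pool.
# Assumed (hypothetical) numbers: 4 servers x 8 SAS 7200rpm disks,
# ~75 random IOPS per spindle, 2x replication.
def effective_write_iops(servers, disks_per_server, iops_per_disk, replication):
    raw = servers * disks_per_server * iops_per_disk
    # Every client write lands on `replication` OSDs, so the raw
    # spindle IOPS budget is divided by the replica count.
    return raw // replication

print(effective_write_iops(4, 8, 75, 2))  # 4*8*75/2 = 1200
```

[So with these assumptions the 20 database VMs would share on the order of a thousand random write IOPS, before journal overhead.]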
[23:47] <Kioob> for now, because of RAID6 there is way more IO on disks than asked by the system
[23:47] <Kioob> for example, on a software RAID version I have :
[23:47] <joshd> you can put the journals on ssds to help with bursts of writes, but eventually your bottleneck will be the spinning disk
[23:47] <joshd> *disks
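[joshd's journals-on-SSD suggestion is plain per-OSD configuration; a sketch with placeholder device paths, one SSD partition per OSD journal:]

```ini
[osd]
    osd journal size = 1024          ; MB, illustrative
[osd.0]
    osd journal = /dev/sdi1          ; placeholder SSD partition
[osd.1]
    osd journal = /dev/sdi2          ; placeholder SSD partition
```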
[23:47] <nhm> Kioob: do you have both the transaction log files and data files?
[23:47] <Kioob> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
[23:47] <Kioob> sda 986,83 4957,15 2017,59 21196442658 8627068079
[23:47] <Kioob> sdb 989,62 4935,87 2020,27 21105447016 8638520983
[23:47] <Kioob> ...
[23:47] <Kioob> sdl 1038,28 4898,26 2129,74 20944626785 9106648167
[23:47] <Kioob> md2 2015,60 7430,53 11817,50 31772449561 50530851950
[23:48] <nhm> Kioob: sorry, do you plan to put both on rbd?
[23:48] <Kioob> this one is full SSD
[23:48] <Kioob> nhm: yes, I would like to have all the system on RBD
[23:48] <Kioob> (xen vm)
[23:49] <Kioob> as you can see in those stats I have more writes than reads on the "md2" block device, but to compute the RAID6 checksum md throws a lot of read IO
[23:49] <nhm> Kioob: the transaction log files sound kind of nasty. You may want to increase the default OS_FILE_LOG_BLOCK_SIZE. I don't know a whole lot about mysql tuning though, so ymmv.
[23:53] <Kioob> I was supposing the problem comes from 1) the RAID6 overhead, and 2) the fact that there are 20 different VMs running MySQL on the same hardware. So "sequential writes" for each virtual machine are no longer sequential for the physical system
[23:53] <Kioob> by writing all data in a CoW way, I suppose the "2" point will be solved
[23:54] <Kioob> and for the "1" point, I was supposing that RBD can be a solution
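[Kioob's point (1) can be put in numbers: in the textbook model, a sub-stripe RAID6 write is a read-modify-write of the data chunk plus both parity chunks, while replication just multiplies the writes. A sketch (pure arithmetic, not measurements):]

```python
# Back-of-envelope disk I/O cost of small random writes:
# RAID6 read-modify-write vs. plain N-way replication.
def raid6_small_write_ios(client_writes):
    # Read old data + old P parity + old Q parity,
    # then write new data + new P + new Q.
    reads_per_write, writes_per_write = 3, 3
    return client_writes * reads_per_write, client_writes * writes_per_write

def replicated_write_ios(client_writes, replicas=2):
    # Replication does no parity reads; it writes each object N times.
    return 0, client_writes * replicas

print(raid6_small_write_ios(100))    # (300, 300): 600 disk I/Os total
print(replicated_write_ios(100, 2))  # (0, 200): 200 disk I/Os total
```

[This is why the md2 device above shows so much extra read traffic under a write-heavy load, and why replication avoids that particular amplification (Ceph's own journaling still roughly doubles writes).]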
[23:59] <nhm> Kioob: I want to be honest. In practice I'm not sure we are a great solution for databases on RBD right now. I'd love someone to prove me wrong though.
[23:59] <nhm> Kioob: You'll have best luck if you can increase the write sizes.

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.