#ceph IRC Log


IRC Log for 2011-10-17

Timestamps are in GMT/BST.

[1:11] * eternaleye_ (~eternaley@ Quit (Remote host closed the connection)
[1:28] * bencherian (~bencheria@cpe-76-173-232-163.socal.res.rr.com) has joined #ceph
[1:35] * verwilst (~verwilst@d51A5B030.access.telenet.be) Quit (Quit: Ex-Chat)
[1:58] * eternaleye_ (~eternaley@ has joined #ceph
[2:15] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[4:20] * n0de (~ilyanabut@c-24-127-204-190.hsd1.fl.comcast.net) has joined #ceph
[4:56] * n0de (~ilyanabut@c-24-127-204-190.hsd1.fl.comcast.net) Quit (Quit: This computer has gone to sleep)
[4:57] * bencherian (~bencheria@cpe-76-173-232-163.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[4:58] * bencherian (~bencheria@cpe-76-173-232-163.socal.res.rr.com) has joined #ceph
[6:03] * Nadir_Seen_Fire (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[6:08] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[6:29] * bencherian (~bencheria@cpe-76-173-232-163.socal.res.rr.com) Quit (Quit: bencherian)
[6:35] * bencherian (~bencheria@cpe-76-173-232-163.socal.res.rr.com) has joined #ceph
[7:09] * tserong (~tserong@58-6-101-23.dyn.iinet.net.au) has joined #ceph
[7:27] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[7:34] * Nadir_Seen_Fire (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[7:46] * bencherian (~bencheria@cpe-76-173-232-163.socal.res.rr.com) Quit (Quit: bencherian)
[7:51] <tserong> pardon the spammish nature of this: if anyone here is coming to linux.conf.au in January 2012, we would love to have speakers on ceph at the High Availability and Distributed Storage miniconf: http://tinyurl.com/ha-lca2012-cfp - feel free to pass this on, and thanks for listening :)
[8:08] <tserong> (or see http://goo.gl/k9yA7 for a slightly more announce-y looking announcement)
[8:23] * alexxy (~alexxy@ Quit (Quit: No Ping reply in 180 seconds.)
[8:26] * alexxy (~alexxy@ has joined #ceph
[8:43] * MKFG (~MK_FG@ has joined #ceph
[8:46] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Ping timeout: 480 seconds)
[8:46] * MKFG is now known as MK_FG
[10:38] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[11:11] * hijacker (~hijacker@ Quit (Ping timeout: 480 seconds)
[11:11] * hijacker (~hijacker@ has joined #ceph
[12:48] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[14:13] * iribaar (~iribaar@ has joined #ceph
[14:34] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:02] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Remote host closed the connection)
[15:14] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[15:26] * iribaar (~iribaar@ Quit (Read error: Connection reset by peer)
[15:27] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[15:27] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:31] * lxo (~aoliva@lxo.user.oftc.net) Quit ()
[15:45] * slang (~slang@chml01.drwholdings.com) Quit (Remote host closed the connection)
[15:50] * slang (~slang@chml01.drwholdings.com) has joined #ceph
[15:52] * gregorg_taf (~Greg@ has joined #ceph
[15:52] * gregorg (~Greg@ Quit (Ping timeout: 480 seconds)
[16:09] * iribaar (~iribaar@ has joined #ceph
[16:59] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[17:43] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:51] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[17:53] * elder (~elder@cfcafwp.sgi.com) has joined #ceph
[17:56] <damoxc> sagewk: are there still plans to add copy-on-write to rbd?
[17:57] <sagewk> damoxc: yeah. probably start next sprint
[17:58] <damoxc> sagewk: awesome, that's good to hear
[18:00] <df__> from the weekend:
[18:00] <df__> 1051Z < df__> dd: writing `/mnt/ceph/lf.7453.12618.27625': File too large, 1099511627776 bytes (1.1 TB) copied, 12870.9 s, 85.4 MB/s, is that meant to happen?
[18:01] <sagewk> df__: somewhere there is a #define or conf with the max file size.
[18:01] <Tv> if ((uint64_t)(offset+size) > mdsmap->get_max_filesize()) //too large!
[18:01] <sagewk> df__: is a purely artificial limit to bound the amount of recovery work the fs has to do if a client writing to the file crashes
[18:01] <Tv> return -EFBIG;
[18:02] <sagewk> df__: (the mds has to scan all objects to recover mtime, tail objects to get file size)
[18:02] <slang> its a config option: mds max file size = <something large>
[18:02] <Tv> src/common/config_opts.h:139:OPTION(mds_max_file_size, OPT_U64, 1ULL << 40)
[18:02] <Tv> >>> 1<<40
[18:02] <Tv> 1099511627776
[18:02] <Tv> looks like the same number ;)
[18:02] <df__> ah, thanks
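The limit Tv quotes can be checked directly; this is just shell arithmetic reproducing the default from src/common/config_opts.h, matching the byte count at which df__'s dd hit "File too large" (EFBIG). The override comment uses the `mds max file size` option slang mentions; per sagewk later in the log, that value is only read at mkfs time to initialize the mdsmap, so changing it in ceph.conf affects new filesystems only.

```shell
# Default mds_max_file_size from src/common/config_opts.h: 1ULL << 40.
max_file_size=$((1 << 40))
echo "$max_file_size"    # 1099511627776 bytes, i.e. 1 TiB
# To raise it (takes effect at mkfs time), set in ceph.conf:
#   [mds]
#   mds max file size = <something larger>
```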
[18:03] <gregaf> sagewk: isn't the need for something like that obviated with the per-file size limits the MDS maintains?
[18:04] <sagewk> need for which?
[18:04] <gregaf> a global size limit
[18:04] <sagewk> iirc it's only used during mkfs to initialize the value in the mdsmap
[18:05] <sagewk> maybe a better approach would be to put it in the file layout, and make it part of the dir layout policies
[18:05] <sagewk> then you could change it on a per-file basis if you wanted to
[18:05] <df__> btw, the current ceph-client/master is able to completely kill the machine i'm working on -- so badly that i don't get any logging
[18:06] <sagewk> df__: that's not good.. whats the workload?
[18:06] <gregaf> sagewk: no, I mean the MDS already maintains on a per-file basis the max length the client is allowed to extend the file to, and when the client runs up near that limit it asks for more space…right?
[18:07] <gregaf> so the MDS only goes through lots of work on files that have lots of data
[18:07] <gregaf> or do you want to prevent users from making that check take a long time anyway?
[18:08] <sagewk> right.
[18:09] <sagewk> i'm worried about 'truncate(1PB) ; reboot -f -n' until we have a saner approach on the mds side
[18:10] <gregaf> hmm, okay
[18:10] <df__> sagewk, just trying to identify it. the first time i triggered it, i got a Oct 17 15:42:51 vc-macpro kernel: [ 5634.493987] BUG: Bad page state in process kworker/0:0 pfn:1723f2
[18:20] <sagewk> df__ is it easy to reproduce?
[18:21] <df__> so far, i've tried twice and killed it both times, just trying to find the smallest thing that does it
[18:22] <df__> in maybe related news, i've also got quite a lot of the following being reported:
[18:22] <df__> Oct 17 15:42:51 vc-fs3 mds.vc-fs3[6912]: 7f26363d1700 mds0.cache.dir(100000a42a8) mismatch between child accounted_rstats and my rstats!
[18:22] <df__> Oct 17 15:42:51 vc-fs3 mds.vc-fs3[6912]: 7f26363d1700 mds0.cache.dir(100000a42a8) total of child dentrys: n(v0 rc2011-10-17 15:42:03.666432 b139006652705 881=776+105)
[18:22] <df__> Oct 17 15:42:51 vc-fs3 mds.vc-fs3[6912]: 7f26363d1700 mds0.cache.dir(100000a42a8) my rstats: n(v404 rc2011-10-17 15:42:03.666432 b349047412156 1778=1608+170)
[18:22] <df__> and
[18:22] <df__> Oct 17 16:21:41 vc-fs3 mds.vc-fs3[6912]: 7f26363d1700 mds0.cache.dir(100000a42a8) mismatch between head items and fnode.fragstat! printing dentries
[18:22] <df__> Oct 17 16:21:41 vc-fs3 mds.vc-fs3[6912]: 7f26363d1700 mds0.cache.dir(100000a42a8) get_num_head_items() = 21; fnode.fragstat.nfiles=0 fnode.fragstat.nsubdirs=34
[18:23] <sagewk> df__: single mds or clustered?
[18:23] <df__> 3 mds, 1 active
[18:24] * cclien (~cclien@ec2-175-41-146-71.ap-southeast-1.compute.amazonaws.com) Quit (Ping timeout: 480 seconds)
[18:27] <gregaf> he had other troubles on Friday too with the client caps — I didn't get a chance to dig through those log files, though :/
[18:32] <df__> sorry :(
[18:32] <df__> yep, just killed it a third time
[18:33] <df__> all this was doing was two concurrent video decodes (i.e., reading a file and writing a different one)
[18:34] <sagewk> df__: my first guess would be the most recent patch (osdmap thing), since it's seen the least testing.
[18:35] <df__> i had gone to revert that, but the description looked less likely, i'll rerun with that
[18:35] <sagewk> yeah, i don't see any obvious problems with it.. :/
[18:35] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:49] * bchrisman (~Adium@ has joined #ceph
[18:55] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:20] * jojy (~jojyvargh@ has joined #ceph
[19:20] <damoxc> sagewk: is there anything useful in libceph that should/could be exposed via language bindings?
[19:24] * jojy_ (~jojyvargh@ has joined #ceph
[19:24] * jojy (~jojyvargh@ Quit (Read error: Connection reset by peer)
[19:24] * jojy_ is now known as jojy
[19:31] <df__> sagewk, i've got this feeling that the kernel crash might be unrelated to ceph-client
[19:34] <df__> btw, are there any tunables for mds performance?
[19:34] <df__> via ceph: $ time find . | wc
[19:34] <df__> 6892 6892 266411
[19:34] <df__> real 0m38.298s
[19:34] <df__> via nfs: real 0m5.101s
[19:34] <df__> (same dir tree on both systems)
[19:36] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[19:51] <sagewk> df__: is that with debugging etc off?
[19:57] <cp> Question: sometime doing a command like
[19:57] <cp> echo ' name=admin mypool foo' > /sys/bus/rbd/add
[19:58] <cp> says: sys/bus/rbd/add not found
[19:58] <cp> any ideas?
[19:58] <joshd> modprobe rbd
[19:59] * fronlius (~Adium@f054111204.adsl.alicedsl.de) has joined #ceph
[19:59] <joshd> if that doesn't work, upgrade to a kernel with rbd support
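The debugging exchange above and below can be condensed into a sketch of the sysfs mapping flow. The monitor address is left empty because the log elides it, and the pool/image names (`mypool`, `foo`) are cp's placeholders; the actual write requires root and a kernel with rbd support.

```shell
# Compose the line written to /sys/bus/rbd/add:
#   <mon_addr(s)> <options> <pool> <image>
mon=""                  # placeholder; the log elides the monitor address
auth="name=admin"
pool="mypool"
image="foo"
line="$mon $auth $pool $image"
echo "$line"
# To actually map the image (requires root and rbd.ko loaded):
#   modprobe rbd
#   echo "$line" > /sys/bus/rbd/add
# "not found" on the echo means the image (or pool) was not found;
# a malformed line gives a write error instead, per NaioN below.
```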
[20:15] * adjohn (~adjohn@ has joined #ceph
[20:22] <jmlowe> does anybody know about feature #1619 from the roadmap?
[20:22] <jmlowe> http://tracker.newdream.net/issues/1619
[20:23] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:29] * iribaar (~iribaar@ Quit (Ping timeout: 480 seconds)
[20:35] * iribaar (~iribaar@ has joined #ceph
[20:37] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit (Quit: cp)
[20:42] <NaioN> Is there any chance this bug (http://tracker.newdream.net/issues/1573) is fixed in 0.38? It's the bug with the multiple rsync workload and the mds crashes
[20:42] <NaioN> What I see is that the MDS gets killed by the OOM killer...
[20:43] <NaioN> looks like a memory leak?
[20:54] * jojy (~jojyvargh@ Quit (Quit: jojy)
[20:54] * jojy (~jojyvargh@ has joined #ceph
[21:05] * cp (~cp@ has joined #ceph
[21:07] <fronlius> Does anyone of you know if there is a puppet module for ceph? There seems to be a chef cookbook (https://github.com/NewDreamNetwork/ceph-cookbooks) …so I thought maybe one of you uses puppet and has something to share?
[21:11] <cp> joshd: thanks. Wasn't either of those though. Using 11.04 and checked that rbd is loaded according to the kernel logs
[21:12] <NaioN> cp: and /sys/bus/rbd/add exists?
[21:13] <cp> yup
[21:13] <NaioN> is that the exact error?
[21:13] <joshd> cp: it must be the image not being found then
[21:13] <NaioN> because if something is wrong with the line then it gives a write error
[21:13] <cp> Ah. Didn't think of that (I'm remote debugging for someone else)
[21:14] <cp> Yup, that was it
[21:14] <NaioN> it could also be the wrong pool or something
[21:14] <NaioN> cp: then it gives a write error
[21:14] <NaioN> not a not found error
[21:16] <joshd> jmlowe: I think there's a selinux or apparmor check during migration at least that assumes the image is a file - #1619 is to test all the rbd functionality that libvirt exposes and fix any other such checks
[21:16] <cp> Yup wrong object name
[21:17] <NaioN> joshd: i saw you submitted the multiple rsync workload bug
[21:17] <NaioN> did you get it in 0.35 upwards?
[21:18] <NaioN> and did you test 0.36? because I used 0.36 and did not get the exact same errors
[21:18] <joshd> NaioN: it only occurred that one time - it's been tested nightly for a while with no recurrences
[21:19] <NaioN> ok so it could be fixed in 0.36
[21:19] <joshd> possibly
[21:19] <NaioN> the thing I saw is that the mds gets killed by the OOM killer
[21:20] <NaioN> does the MDS require some minimum memory?
[21:21] <joshd> hmm, there were some memory leaks fixed, but that was after 0.36
[21:22] <NaioN> ok...
[21:22] <jmlowe> joshd: I can give you one example of apparmor breaking libvirt, on a stock ubuntu 11.10 if you use the example from the wiki qemu will try to look at /etc/ceph/ceph.conf as the libvirt user but will be blocked
[21:22] <joshd> generally performance is better with more memory since you can increase the MDS' cache size
[21:22] <NaioN> I use the 0.36 at the moment
[21:22] <Tv> fronlius: we are not currently working on puppet deployment; chef, crowbar & juju is plenty to keep us busy right now; the lower level things like "bring up a new osd" will be reusable across deployment mechanisms
[21:22] <NaioN> so that could explain things
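joshd's point that more memory lets you grow the MDS cache translates to a ceph.conf fragment. This is a hedged sketch for the 0.36-era config: the option name `mds cache size` and its semantics (a count of cached inodes, with an assumed default around 100000, not a byte limit) are assumptions, and the value shown is illustrative for a box with plenty of RAM.

```ini
[mds]
    ; assumed option: inode cache size as a count, not bytes
    ; (default believed to be ~100000); raise it only on an MDS
    ; with enough RAM, e.g. after an upgrade like NaioN's 16 GB
    mds cache size = 1000000
```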
[21:24] <jmlowe> joshd: the resulting error indicates the user doesn't have access to the rbd image but actually it is the config file that the libvirt user can't access
[21:25] * adjohn (~adjohn@ Quit (Quit: adjohn)
[21:25] <joshd> jmlowe: thanks, I think we can improve the error message there
[21:27] * __jt__ (~james@jamestaylor.org) has joined #ceph
[21:27] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[21:27] <greglap> NaioN: what are you doing with the MDS when it gets OOM-killed?
[21:28] <NaioN> multiple rsync workload
[21:29] <greglap> hmm, not related to 1573, that's a straight-up segfault
[21:30] <greglap> what's the filesystem look like that you're rsyncing?
[21:30] <greglap> is it very large directories?
[21:30] <NaioN> well nothing special as far as I can tell
[21:30] <NaioN> its a XFS filesystem
[21:30] <NaioN> well multiple filesystems
[21:31] <NaioN> each about 100G
[21:31] <greglap> how much memory does the MDS have?
[21:31] <NaioN> 2G
[21:31] <greglap> and have you tweaked any of its config settings?
[21:31] <NaioN> nop
[21:31] <NaioN> Im waiting for memory for the boxes I want to use for the MDS
[21:32] <NaioN> then they get more mem
[21:32] <NaioN> this is a VM
[21:32] <NaioN> so its not much
[21:34] <greglap> what kind of hardware?
[21:35] <greglap> You might be hitting some bug we don't have but I can't think what it would be off-hand
[21:35] <greglap> and so I wonder if your MDS is just too slow so it's getting backed up on messages
[21:39] <NaioN> its a virtual machine (vmware)
[21:40] <greglap> NaioN: are you using cephx?
[21:40] <greglap> that's the only leak we've fixed in the MDS since v0.36
[21:40] <NaioN> well joshd told there where some memory leaks that got solved after 0.36
[21:40] <NaioN> no no cephx
[21:41] <greglap> hmm, that's all I can think of off-hand
[21:42] <NaioN> I'll try again if I have the hardware
[21:42] <NaioN> then the mds gets 16g
[21:43] <greglap> if you want to look into it you can install the debug symbols and do memory profiling and see what comes up
[21:43] <greglap> http://ceph.newdream.net/wiki/Memory_Profiling
[21:43] <greglap> once you have heap dumps we can walk you through the rest of it
[21:44] <NaioN> ah nice will give it a try tomorrow
[21:45] <NaioN> i can also give that command to the mds?
[21:46] <NaioN> done
[21:47] <NaioN> I'll let it run now
[21:49] * fronlius (~Adium@f054111204.adsl.alicedsl.de) Quit (Quit: Leaving.)
[21:49] * fronlius (~Adium@f054111204.adsl.alicedsl.de) has joined #ceph
[21:54] * greglap (~Adium@aon.hq.newdream.net) has left #ceph
[22:25] * adjohn (~adjohn@ has joined #ceph
[22:33] * adjohn (~adjohn@ Quit (Read error: Connection reset by peer)
[22:33] * adjohn (~adjohn@ has joined #ceph
[22:49] <df__> sagewk, yes
[23:03] * fronlius (~Adium@f054111204.adsl.alicedsl.de) Quit (Quit: Leaving.)
[23:21] * Nightdog (~karl@190.84-48-62.nextgentel.com) has joined #ceph
[23:27] <ajm> hey sagewk, did you get a chance to look at backporting that patch ?
[23:28] <sagewk> not yet
[23:28] <sagewk> v0.35 was it?
[23:31] <ajm> 0.34
[23:31] <ajm> 0.35 was the pg changes that I can't do due to this :)
[23:33] <sagewk> yeah
[23:38] * tserong (~tserong@58-6-101-23.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[23:47] * tserong (~tserong@58-6-103-205.dyn.iinet.net.au) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.