#ceph IRC Log


IRC Log for 2010-11-03

Timestamps are in GMT/BST.

[0:09] * yehudasa_hm (~yehuda@ppp-69-228-129-75.dsl.irvnca.pacbell.net) Quit (Read error: Connection reset by peer)
[0:09] * yehudasa_hm (~yehuda@ppp-69-228-129-75.dsl.irvnca.pacbell.net) has joined #ceph
[0:10] <yehudasa_hm> cd
[0:10] <yehudasa_hm> wrong window :)
[0:52] * terang (~me@ip-66-33-206-8.dreamhost.com) has joined #ceph
[1:04] * terang (~me@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:23] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:31] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) has joined #ceph
[1:36] * greglap (~Adium@cpe-76-90-74-194.socal.res.rr.com) Quit (Quit: Leaving.)
[1:56] * greglap (~Adium@76.90.74.194) has joined #ceph
[2:23] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:23] * cmccabe (~cmccabe@dsl081-243-128.sfo1.dsl.speakeasy.net) has left #ceph
[3:09] * lidongyang (~lidongyan@222.126.194.154) Quit (Remote host closed the connection)
[3:44] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has left #ceph
[4:17] * terang (~me@ip-66-33-206-8.dreamhost.com) has joined #ceph
[7:08] * f4m8_ is now known as f4m8
[7:09] * terang (~me@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[7:51] * terang (~me@pool-173-55-24-140.lsanca.fios.verizon.net) has joined #ceph
[8:04] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:04] * greglap (~Adium@76.90.74.194) Quit (Quit: Leaving.)
[8:09] * lidongyang (~lidongyan@222.126.194.154) has joined #ceph
[8:13] * allsystemsarego (~allsystem@188.27.167.113) has joined #ceph
[9:30] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[9:50] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[10:35] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Read error: No route to host)
[10:50] * tjikkun (~tjikkun@195-240-122-237.ip.telfort.nl) has joined #ceph
[11:15] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[11:26] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[11:30] * growler (growler@dog.thdo.woaf.net) Quit (Quit: leaving)
[11:44] * Yoric (~David@213.144.210.93) has joined #ceph
[11:54] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[11:57] * yehudasa_hm (~yehuda@ppp-69-228-129-75.dsl.irvnca.pacbell.net) Quit (Ping timeout: 480 seconds)
[13:58] * allsystemsarego (~allsystem@188.27.167.113) Quit (Quit: Leaving)
[14:03] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[14:08] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit ()
[14:41] * terang (~me@pool-173-55-24-140.lsanca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[14:49] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[14:51] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit ()
[15:06] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[15:36] * greenail (~greenail@73.47.2d.static.xlhost.com) has joined #ceph
[15:36] <greenail> hello, is anyone running ceph in production on ec2?
[15:41] * corecode (~2@0x2c.org) has joined #ceph
[15:41] <corecode> hi
[15:42] * yehudasa_hm (~yehuda@ppp-69-228-129-75.dsl.irvnca.pacbell.net) has joined #ceph
[15:54] <corecode> so i guess you consider btrfs stable enough to be used for data storage
[16:24] * greenail (~greenail@73.47.2d.static.xlhost.com) Quit (Quit: Leaving)
[16:35] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:51] * greglap (~Adium@166.205.136.105) has joined #ceph
[17:24] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit (Ping timeout: 480 seconds)
[17:43] * greglap (~Adium@166.205.136.105) Quit (Read error: Connection reset by peer)
[17:58] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:11] <sagewk> corecode: stable enough for us.
[18:16] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:19] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:23] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:33] * allsystemsarego (~allsystem@188.27.167.113) has joined #ceph
[18:34] <corecode> uh well
[18:34] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[18:34] <corecode> i just tried a simple ceph single node setup
[18:35] <corecode> with ext3
[18:35] <corecode> but that did not work well at all
[18:35] <corecode> unfortunately
[18:35] <corecode> untar of the linux kernel worked
[18:35] <corecode> but then a make -j24 just didn't do anything
[18:35] <corecode> sadness
[18:35] <sagewk> what client?
[18:35] <corecode> we're searching for a replacement for nfs for our home dirs (research)
[18:36] <corecode> the linux kernel module from the ceph-stable debian packages
[18:36] <corecode> well, ceph kernel + ceph rest from the debian packages
[18:36] <sagewk> argh, those debian packages are way out of date. i need to just delete them.
[18:36] <corecode> ubuntu maverick kernel
[18:36] <corecode> oh?
[18:37] <corecode> i thought they were 0.22.2?
[18:37] <sagewk> the kernel module packages are
[18:37] <sagewk> the userspace/server side is up to date
[18:37] <corecode> ah
[18:37] <sagewk> what kernel version is maverick?
[18:37] <corecode> i think the kernel module came with the maverick kernel
[18:37] <corecode> Linux labospc6.epfl.ch 2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:45:36 UTC 2010 x86_64 GNU/Linux
[18:39] <corecode> well, given that i see warnings all over the place that ceph might lose data, i'm a bit hesitant anyways to use it
[18:40] <corecode> i can't believe it's 2010 and the generic option is still nfs
[18:40] <sagewk> yeah, it's not ready for production use just yet. we're definitely interested in hearing bug reports tho
[18:40] <corecode> no offence
[18:40] <corecode> :)
[18:40] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Quit: bye)
[18:40] <sagewk> (a simple make workload should have no problems.. it's part of our regular testing here)
[18:41] <corecode> for quite some time i considered writing a proper parallel file system myself
[18:41] <corecode> but advisors say "you can't publish that"
[18:41] <sagewk> you might try ceph-client-standalone.git, which has the latest kernel client code as a standalone module.
[18:41] <corecode> i was also wondering why i had to specify the ip address of the monitor
[18:42] <sagewk> if you have /sbin/mount.ceph a hostname will work
[18:42] <corecode> ah
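[Editorial sketch of the exchange above: the main job of /sbin/mount.ceph here is resolving the monitor hostname before handing off to the kernel client. The monitor name and mount point below are illustrative stand-ins, not values from the log; localhost is used only so the lookup step is runnable.]

```shell
# The name-to-address lookup that mount.ceph performs can be done by hand.
# MON_HOST is a stand-in for a real monitor hostname.
MON_HOST=localhost
MON_IP=$(getent hosts "$MON_HOST" | awk '{print $1; exit}')
echo "$MON_IP"
# With the helper installed, a hostname works directly:
#   mount -t ceph "$MON_HOST":/ /mnt/ceph
# Without it, the kernel client needs the raw monitor address:
#   mount -t ceph "$MON_IP":6789:/ /mnt/ceph
```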
[18:42] <corecode> well, i think for now it won't be feasible to use ceph as production system
[18:42] <corecode> and too bad i didn't graduate yet
[18:43] <corecode> or i could consider fixing it :)
[18:43] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit (Quit: This computer has gone to sleep)
[18:46] <corecode> anyways
[18:46] <corecode> good luck
[18:46] <corecode> i enjoyed your osdi paper (as far as i can remember)
[18:46] <sagewk> thanks :)
[18:48] <corecode> too bad it is 4 years later and still "experimental" all over the place :D
[18:48] <sagewk> it's an ambitious feature set. we're getting close!
[18:49] <corecode> ah, you went the "add features, then make it stable" route? :)
[18:50] <corecode> oh god, i'm out of glucose
[18:50] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[18:53] <gregaf> there's sort of a minimum feature set before an fs is useful, and it takes longer to reach than you'd think
[18:54] <corecode> as everything in IT does
[18:58] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit (Quit: This computer has gone to sleep)
[18:59] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[19:09] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit (Quit: This computer has gone to sleep)
[19:18] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[19:19] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[19:20] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit ()
[19:41] * greglap1 (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:41] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[19:43] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[19:43] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit (Remote host closed the connection)
[19:58] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:58] * greglap1 (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[20:00] * Meths_ (rift@91.106.214.122) has joined #ceph
[20:06] * Meths (rift@91.106.196.234) Quit (Ping timeout: 480 seconds)
[20:06] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[20:06] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[20:06] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:06] * Meths_ is now known as Meths
[20:27] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[20:32] <wido> when there is an entry in /sys/kernel/debug/ceph/*/mdsc that just stays there, does that mean something is hung? I'm running an rsync which went into D and doesn't continue
[20:32] <sagewk> yeah
[20:32] <sagewk> what's the operation?
[20:32] <wido> getattr
[20:33] <wido> only one line btw
[20:33] <wido> multi-MDS (2), trying to rsync kernel.org again
[20:33] * corecode (~2@0x2c.org) has left #ceph
[20:34] <sagewk> are mds logs enabled?
[20:34] <wido> no, but I can reproduce it easily, i'll up the logs and retry
[20:35] * greglap1 (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:35] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[20:40] <wido> MDS is still recovering from the restart, will take some time
[20:43] <sagewk> ok thanks
[20:45] <wido> ok, rsync is running again, I see a lot of "lookup" calls in the mdsc file now, rsync is still running fine
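[Editorial sketch of the debugfs check wido describes above; the sample line layout (tid, mds rank, op, inode) is an assumption based on the log, not the exact kernel output.]

```shell
# On a live client, pending MDS requests are listed in
#   /sys/kernel/debug/ceph/*/mdsc
# and an entry that persists across reads suggests a hung request.
# Simulate one line and extract the operation column, as done when
# reporting "getattr" above:
sample=$(printf '194\tmds0\tgetattr\t#10000000000')
op=$(echo "$sample" | awk '{print $3}')
echo "$op"    # -> getattr
```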
[20:47] <wido> btw, I found http://www.canonware.com/jemalloc/ today, some people claim it is better than Google's tcmalloc, might be worth checking out sometime if you hit a tcmalloc issue
[20:49] <gregaf> interesting, but we've been enjoying tcmalloc pretty well :)
[20:50] <wido> thought so, but you never know
[20:56] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit (Quit: This computer has gone to sleep)
[20:57] <sagewk> tcmalloc has some other profiling stuff that is pretty handy. good to know there are other options tho!
[21:16] <wido> sagewk: it seems it was an issue inside the MDS; the problem isn't coming back...
[21:17] <wido> but one thing I did notice over the last few weeks: in locate's config, "ceph" wasn't included in PRUNEFS, so every morning at 06:30 locate started to update its database and would also index /mnt/ceph, and that always failed
[21:17] <wido> the processes would all go into D state, and I had to reboot the client
[21:17] <wido> this morning I had to do the same, but did not reboot the MDS, and the rsync kept stalling all day
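[Editorial sketch of the locate fix wido implies above, shown against a scratch copy of updatedb.conf rather than the real /etc/updatedb.conf; the starting PRUNEFS value is illustrative.]

```shell
# updatedb(8) skips filesystem types listed in PRUNEFS, so adding "ceph"
# keeps the nightly locate run from walking /mnt/ceph and hanging in D state.
conf=$(mktemp)
echo 'PRUNEFS="nfs sshfs"' > "$conf"    # illustrative starting value
grep -q 'ceph' "$conf" || sed -i 's/^PRUNEFS="/PRUNEFS="ceph /' "$conf"
newline=$(cat "$conf")
echo "$newline"    # -> PRUNEFS="ceph nfs sshfs"
rm -f "$conf"
```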
[21:27] <yehudasa_hm> sagewk: snapshot creation on kclient/rbd/kvm seems to interact nicely .. can create a snapshot on one that appears on another. Now all that is left is to fix all the bugs and other misc features
[21:33] <sagewk> wido: the client crashed, or just fs access hung?
[21:33] <sagewk> next time it happens maybe we can correlate mdsc with the mds logs and see what's up
[21:35] <wido> sagewk: nothing crashed, only the rsync hung
[21:35] <wido> fs access still worked fine, on another client I ran a "find /mnt/ceph", which went OK
[21:35] <wido> only that particular rsync stalled
[21:36] <wido> I'm going afk, i'll check if I can reproduce it
[21:38] <sagewk> ok thanks!
[21:56] * terang (~me@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:03] <sagewk> yehudasa_hm: awesome
[22:04] <sagewk> hey, did you see the libvert response?
[22:06] <sagewk> er, libvirt? the udev rules he mentioned sound good. although now they are making me nervous about our sysfs structure and the 'one data item per file' goal. wondering if we should have had /sys/class/rbd/123/name, snaps, etc., that sort of thing, instead of the tab delimited files
[22:20] <yehudasa_hm> yeah, saw that
[22:20] <yehudasa_hm> not sure which is best
[22:23] <sagewk> the current scheme means some grep / awk / cut magic, while a more structured sysfs could just cat the fields out.
[22:24] <yehudasa_hm> yeah, we can add that too
[22:26] <yehudasa_hm> so that we'd have both the current implementation and a structured one
[22:26] <yehudasa_hm> and in the end phase out the current one
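[Editorial sketch contrasting the two sysfs styles being weighed above; the paths and field order are illustrative, not the actual rbd ABI.]

```shell
# Current style: one tab-delimited line per device, so pulling out a single
# field needs grep/awk/cut (id, pool, name, size order is an assumption):
line=$(printf '123\tmypool\tmyimage\t1024')
name=$(printf '%s' "$line" | cut -f3)
echo "$name"    # -> myimage
# Proposed one-value-per-file style: each field is its own node, so a plain
# cat suffices, e.g.:
#   cat /sys/class/rbd/123/name
```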
[22:27] <yehudasa_hm> or maybe go cold turkey now, before .37 is out
[22:27] <sagewk> that's what i'm thinking
[22:27] <sagewk> linus is a stickler about ABI changes
[22:28] <sagewk> having a node per device is pretty straightforward. whether to have dirs to represent snapshots is less clear...
[22:28] <sagewk> depends on whether we already have an in-memory structure for them?
[22:29] <yehudasa_hm> hmm, not sure whether we keep it on all the time or reread on request
[22:30] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:30] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:30] <yehudasa_hm> right, we have everything in memory
[22:32] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:33] <sagewk> do you mind looking at that now? would be nice to have it for -rc2. or maybe even send a description just of the interface out for review, since it likely didn't get much attention before
[22:33] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:33] <yehudasa_hm> yeah, sure
[22:33] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:33] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:34] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:34] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:35] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[22:35] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[22:37] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[22:49] * ghaskins_mobile (~ghaskins_@12.157.84.42) has joined #ceph
[23:17] * ghaskins_mobile (~ghaskins_@12.157.84.42) Quit (Quit: This computer has gone to sleep)
[23:21] * allsystemsarego (~allsystem@188.27.167.113) Quit (Quit: Leaving)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.