#ceph IRC Log


IRC Log for 2011-01-21

Timestamps are in GMT/BST.

[0:00] <bchrisman> and write to new files will be accepted and simply avoid the downed osds?
[0:00] <bchrisman> I'll enable swap and retry the drive pull… I don't think we have any systems with large RAM configs..
[0:01] <gregaf> unfortunately no, unavailable PGs aren't accounted for when mapping (they can't be, really), so it's just random chance whether your new files will be available right away or not
[0:01] <bchrisman> ahh ok
[0:01] <gregaf> sjust has run more tests around this than I have, he rewrote the scrub code I think
[0:01] <bchrisman> is memory required for rebuild proportional to storage size or the pg number?
[0:02] <gregaf> IIRC it does a few PGs at a time, and it's proportional to the number of objects in those PGs
[0:02] <bchrisman> ahh ok.. thanks!
[0:03] <gregaf> order n+m where n is number of PGs and m is number of objects in those n pgs
[0:03] <gregaf> sjust: am I getting that right?
[0:03] <sjust> sounds right
[0:03] <bchrisman> ahh
[0:03] <sjust> no, wait
[0:03] <jantje> i'm currently running 1 mds on a single machine, I see average load going up to 1.00 (it's a quad core machine)
[0:03] <sjust> should be more like order nm where m is the number of objects per pg and n is the number of concurrent recoveries
[0:04] <gregaf> oh, right
[0:04] <jantje> (when I'm stressing it out .. but at that point the mds looks cpu limited - single threaded?)
[0:04] <gregaf> and concurrent recoveries is a tunable, right? I forgot that
[0:04] <bchrisman> concurrent recoveries being # of ongoing repairs from failed osds?
[0:04] <sjust> # of ongoing pg repairs from failed osds
[0:05] <gregaf> bchrisman: so, yes
[0:05] <bchrisman> oh… number of recoveries allowed to be run simultaneously?
[0:05] <bchrisman> ahh ok
[0:05] <sjust> a failed osd will cause some number of pgs to require repairs
[0:05] <bchrisman> yeah… makes sense.
[0:05] <gregaf> sjust: you can tune the number that a single OSD will try and recover simultaneously, right?
[0:05] <sjust> I think so, haven't looked into that myself
[0:06] <gregaf> so you could turn that number down to reduce memory requirements, although the recovery will then take longer
[0:06] <bchrisman> ahh that's good design.. yeah
[0:07] <sjust> osd recovery max active, I think
[0:07] <sjust> so yeah
[0:07] <sjust> and there is a similar tunable for number of concurrent scrubs
[0:07] <sjust> osd max scrubs, I think
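
The scaling sjust describes above (order n*m, with n concurrent recoveries and m objects per PG) can be put into a rough back-of-the-envelope calculation. This is only a sketch: the per-object cost `bytes_per_object_state` and the sample numbers are hypothetical values chosen for illustration, not figures from Ceph, and the function is not part of any Ceph tool.

```python
def estimate_recovery_memory(concurrent_recoveries, objects_per_pg,
                             bytes_per_object_state):
    """Rough estimate of recovery memory, following the n*m scaling.

    concurrent_recoveries  -- n, PGs being recovered at once (a tunable)
    objects_per_pg         -- m, objects in each of those PGs
    bytes_per_object_state -- ASSUMED bookkeeping cost per object
    """
    return concurrent_recoveries * objects_per_pg * bytes_per_object_state


if __name__ == "__main__":
    # Hypothetical numbers: 100,000 objects per PG, 512 bytes of state each.
    for n in (1, 2, 5):
        est = estimate_recovery_memory(n, 100_000, 512)
        print(f"{n} concurrent recoveries: about {est / 2**20:.1f} MiB")
```

Under this model, lowering the number of concurrent recoveries reduces the estimate proportionally, which matches gregaf's point that turning the tunable down trades memory for recovery time.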
[0:08] <gregaf> jantje: the MDS runs a lot of threads but most things require a global lock
[0:08] <jantje> ok, i just was expecting to eat much more memory and cpu :-)
[0:09] <jantje> i'm currently running a chrooted 32bit env on a 64bit machine/OS/kernel
[0:09] <gregaf> well, the amount of memory it'll eat is limited by the tunables and by how much metadata is in the filesystem
[0:10] <gregaf> if you have plenty of extra you can turn up a few things, most importantly the mds_cache_size
[0:11] <jantje> ok, i'll try that later
[0:15] <jantje> yehudasa: are you in ?
[0:23] <jantje> This is what it should look like:
[0:23] <jantje> stat64(".", {st_mode=S_IFDIR|0755, st_size=4600, ...}) = 0
[0:23] <jantje> fstat64(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 17), ...}) = 0
[0:23] <jantje> brk(0x8c2d000) = 0x8c2d000
[0:25] <jantje> and this is what I get on a ceph fs: (mounted with norbytes)
[0:25] <jantje> stat64("panos/", {st_mode=S_IFDIR|0755, st_size=225, ...}) = 0
[0:25] <jantje> write(2, "cc1: ", 5cc1: ) = 5
[0:25] <jantje> write(2, "panos/: Value too large for defi"..., 45panos/: Value too large for defined data type) = 45
[0:25] <jantje> write(2, "\n", 1
[0:25] <jantje> ) = 1
[0:25] <jantje> fstat64(0, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 0), ...}) = 0
[0:25] <jantje> brk(0x9e62000) = 0x9e62000
[0:26] <jantje> hmm
[0:26] <jantje> whats fstat?
[0:26] <jantje> file descriptor stat()
[0:26] <DeHackEd> it's checking what kind of "file" stdin is. perhaps a pipe, a direct file attached, or a terminal
[0:27] <jantje> the issue is so weird, last time I had it yehudasa solved it by making 'norbytes' work, and now inside my chroot environment I get it again
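
The check DeHackEd describes can be reproduced from userspace. The following is a minimal sketch, assuming a POSIX system: it calls fstat on file descriptor 0 and inspects the mode bits, much as the traced program's `fstat64(0, ...)` call reports `S_IFCHR` for a terminal. The helper name `classify_mode` is made up for this illustration.

```python
import os
import stat


def classify_mode(mode):
    """Map st_mode bits to a short description of the file type."""
    if stat.S_ISCHR(mode):
        return "character device"   # e.g. a terminal or /dev/null
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


if __name__ == "__main__":
    # File descriptor 0 is stdin, as in the strace output above.
    st = os.fstat(0)
    print("stdin is a", classify_mode(st.st_mode))
```

Running it interactively, with input piped in, and with input redirected from a file should give three different answers.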
[0:28] <jantje> (yes, I patched the client's kernel)
[0:31] <jantje> ok, the fact that i'm using a chrooted environment has nothing to do with it
[0:33] <jantje> the makedev differs
[0:38] <jantje> yehudasa: it would be great if you could look into this .. again, sorry .... the issue is resolved when using a 32bit client and 64bit server, but now i'm using 64bit server and 64bit client
[0:38] <jantje> thanks for your time already, i'm going to bed, it's very late here
[0:38] <jantje> good night all
[0:51] <Tv|work> woo
[0:52] <Tv|work> so apparently my autotest use was hanging because netpipe the benchmark tool doesn't actually work, just hangs
[0:52] <Tv|work> netperf = pretty pretty success
[0:53] <cmccabe> tv: the network isn't a pipe, that you can just netpipe things through
[0:53] <cmccabe> tv: it's more like a truck
[0:58] <gregaf> a truck carrying poker chips
[0:58] <gregaf> and if you try and put too many trucks through, those poker chips clog up the tubes
[0:59] * sakib (~sakib@ has joined #ceph
[0:59] <cmccabe> I actually have never listened to the "series of tubes" speech
[1:01] <cmccabe> wonder if CSPAN covered it when the senator first delivered it
[1:05] <gregaf> the poker chips weren't actually his
[1:05] <gregaf> that was a Daily Show mashup of some online-gambling bans that happened at the same time
[1:05] <cmccabe> heh
[1:05] <gregaf> but the "series of tubes" is…well, it's just about as dumb as you can imagine
[1:06] <cmccabe> I hope that by the time our generation gets that old, there will be some new technology for us to misunderstand and poorly regulate
[1:06] <Tv|work> gregaf: as long as you add a semi-smart router in every pipe junction ... ;)
[1:07] <Tv|work> more like a pneumatic package delivery system
[1:07] <sakib> hi guys
[1:07] <gregaf> make sure it can buffer 5 minutes worth of chips
[1:07] <gregaf> hi sakib
[1:09] <sakib> i wanted to ask if there's any way for ceph client to change inode mode and its xattr in one request, i.e. do several things in one mds request
[1:10] <sakib> just whether it's possible or not
[1:11] <gregaf> not as it's set up right now, no
[1:16] <sakib> it means e.g. when i wanna create a node with xattr, 2 separate requests must be dispatched, right?
[1:17] <gregaf> sakib: oh, sorry, xattrs can go in with other metadata updates
[1:18] <gregaf> but right now a mknod is always in its own request, and happens synchronously
[1:21] <gregaf> metadata changes on the inode can all get squashed together, but changes that involve moving/creating/deleting the inode generally have to be synchronous through the MDS and get their own op
[1:24] <sakib> gregaf: could you please point to an example routine in the client code that does such metadata changes?
[1:26] <gregaf> sakib: kernel client or userspace?
[1:27] <sakib> kernel client
[1:27] <sagewk> ceph_setattr in fs/ceph/inode.c
[1:31] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:35] <sakib> sagewk: but it's working only with setattr args, not xattr
[1:39] <sakib> am i right?
[1:50] <gregaf> sagewk had to step out for a meeting
[1:55] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[2:01] <sakib> i gotta go now. Thanks for help, anyway :)
[2:01] * sakib (~sakib@ Quit (Quit: leaving)
[2:21] * fzylogic (~fzylogic@ Quit (Quit: DreamHost Web Hosting http://www.dreamhost.com)
[3:07] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:12] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Ping timeout: 480 seconds)
[3:17] * greglap (~Adium@ has joined #ceph
[3:18] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[3:23] * cmccabe (~cmccabe@ has left #ceph
[3:27] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[3:36] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[5:31] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:37] * ijuz_ (~ijuz@p4FFF7E10.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[6:46] * ijuz_ (~ijuz@p4FFF555B.dip.t-dialin.net) has joined #ceph
[6:57] * ijuz_ (~ijuz@p4FFF555B.dip.t-dialin.net) Quit (Ping timeout: 480 seconds)
[7:06] * ijuz_ (~ijuz@p57999B6E.dip.t-dialin.net) has joined #ceph
[8:28] * gregpad (~rooms@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[8:29] * gregpad (~rooms@cpe-76-90-239-202.socal.res.rr.com) Quit ()
[8:36] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:36] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[9:29] * allsystemsarego (~allsystem@ has joined #ceph
[9:47] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:56] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[12:15] * verwilst (~verwilst@router.begen1.office.netnoc.eu) has joined #ceph
[12:34] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[12:48] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[12:48] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[12:48] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[12:48] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[13:00] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[13:02] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[13:03] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[13:06] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[13:19] * hijacker (~hijacker@ Quit (Remote host closed the connection)
[14:08] * shdb (~shdb@gw.ptr-62-65-159-122.customer.ch.netstream.com) Quit (Quit: Lost terminal)
[14:13] * hijacker (~hijacker@ has joined #ceph
[14:43] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[15:02] * shdb (~shdb@gw.ptr-62-65-159-122.customer.ch.netstream.com) has joined #ceph
[15:14] * Yoric (~David@ has joined #ceph
[16:00] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[17:08] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:51] * greglap (~Adium@ has joined #ceph
[18:00] * Tv|work (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:47] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[18:48] * yehuda_hm (~yehuda@adsl-69-228-150-44.dsl.irvnca.pacbell.net) has joined #ceph
[18:49] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has joined #ceph
[18:57] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:08] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:08] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit ()
[19:23] <yehuda_hm> jantje: are you there?
[19:29] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[19:39] * Yoric (~David@ Quit (Quit: Yoric)
[20:18] <Tv|work> gitbuilder is doing its thang already: http://ceph.newdream.net/gitbuilder/#origin/unstable
[20:19] <Tv|work> commit d69e5f5 broke unstable
[20:19] <cmccabe> k
[20:22] <gregaf> nice
[20:23] <cmccabe> it's unfortunate that we have those boost deprecated header warnings in all our builds
[20:23] <gregaf> yeah
[20:23] <cmccabe> so it will never really be clean
[20:23] <gregaf> I bet we could set up the proper includes via autotools
[20:24] <cmccabe> I had a patch that made the errors go away for me... let me see if I can find it
[20:24] <gregaf> but there isn't much point unless we can get a newer version of autotools that doesn't have that "not a literal" regression
[20:24] <cmccabe> I reverted it when I saw that sage had made the same change earlier and later reverted it :)
[20:24] <gregaf> well it's simple enough to make them go away locally, just switch the include names
[20:24] <gregaf> but then anybody with an older version of the headers can't compile
[20:24] <gregaf> at one point that was Sage
[20:24] <gregaf> we can fix all of us, of course
[20:24] <cmccabe> for me, I just defined a certain macro and things just worked
[20:24] <gregaf> but we can't fix all the people on older releases
[20:25] <cmccabe> yeah, I guess we need an autotools solution for that one.
[20:47] <Tv|work> or a way to just hide the warnings afterwards
[20:49] <Tv|work> it seems gitbuilder knows about wrapping things in "--START-IGNORE-WARNINGS" and "--STOP-IGNORE-WARNINGS"
[20:49] <Tv|work> probably just echo those around something that's bad
[20:50] <Tv|work> and don't make the scope too wide
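
The marker idea Tv|work mentions can be illustrated with a small filter. This is a sketch of the general technique only: the marker strings come from the discussion above, but how gitbuilder itself processes them is not shown in this log, so the function `strip_ignored` and its exact matching rule (a marker alone on its line) are assumptions.

```python
START = "--START-IGNORE-WARNINGS"
STOP = "--STOP-IGNORE-WARNINGS"


def strip_ignored(lines):
    """Return build-log lines, dropping everything between the markers.

    The marker lines themselves are dropped as well. Keeping the marked
    region small means real warnings outside it are still reported.
    """
    kept = []
    ignoring = False
    for line in lines:
        text = line.strip()
        if text == START:
            ignoring = True
        elif text == STOP:
            ignoring = False
        elif not ignoring:
            kept.append(line)
    return kept


if __name__ == "__main__":
    log = [
        "compiling a.cc",
        START,
        "warning: deprecated header",
        STOP,
        "warning: unused variable",
    ]
    for line in strip_ignored(log):
        print(line)
```

In a build script, the markers would be echoed immediately before and after the one step known to emit the constant, harmless warnings.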
[20:50] <cmccabe> yeah
[20:51] <Tv|work> you can try that in a branch
[20:51] <Tv|work> keep amending, discard the branch if it just won't work
[20:51] <cmccabe> a lot of projects seem to just ignore the warnings that come out of autotools
[20:52] <Tv|work> sad :(
[20:52] <cmccabe> I don't know if that's a good approach or not, but it certainly seems to be common
[20:52] <Tv|work> other thing to do would be to enhance gitbuilder to ignore specific warnings
[20:52] <Tv|work> those messages are constant
[20:52] <cmccabe> a new version of some autotool might spit out something else though
[20:53] <Tv|work> sure but the "unnecessary warnings" thing would be easy to change
[20:53] <cmccabe> my metric for it is, has it ever warned us about something that mattered?
[20:53] <cmccabe> nearly every time gcc warns, it's because of a real problem
[20:53] <cmccabe> with autotools I've never actually seen it warn about a real problem
[20:54] <cmccabe> and conversely, it lets real problems slip through with no warnings... grumble grumble
[20:54] <cmccabe> anyway, I'm not opposed to the whitelisting errors approach if it works
[20:57] <Tv|work> oh crap ceph.newdream.net is still running etch
[20:57] <Tv|work> that's almost 4 years old
[21:00] <cmccabe> hehe
[21:00] <cmccabe> I wonder if we should periodically do a 32-bit build
[21:01] <Tv|work> easy once i have a stash of kvm hosts to use ;)
[21:01] <cmccabe> it often seems to reveal errors that never show up on the 64-bit build
[21:01] <cmccabe> sometimes they're datatype-size related, but surprisingly, sometimes they're not
[21:01] <cmccabe> another fun gcc thing is that you get different error messages at -O2 than at -O0 or other optimization levels
[21:02] <cmccabe> er, s/error/warning/
[21:09] * cybergirl (~cybergirl@ANantes-552-1-71-81.w92-139.abo.wanadoo.fr) has joined #ceph
[21:11] * cybergirl (~cybergirl@ANantes-552-1-71-81.w92-139.abo.wanadoo.fr) Quit ()
[22:10] <darkfader> what's the most minimal ceph setup I could build? i wanna bring it up and then rebuild&join the old nodes later, like in a month or so
[22:10] <darkfader> 2mds, 1mon, 2osd?
[22:11] <jantje> yehuda_hm: yes, i'm here, but I don't have lots of time, i'm leaving for vacation tonight (less than 4 hours)
[22:13] <sagewk> jantje: were you able to check /proc/mounts for norbytes?
[22:18] <gregaf> darkfader: minimal setup is one of each daemon type
[22:19] <gregaf> there's no data replication there of course, but it'll run
[22:58] <bchrisman> theoretical question: if I had shared storage (the same disk on two nodes having same UUID and pointing at the same physical hdd), and node2 fails, can I pretty much just start up a cosd for that disk on node1 and somehow tell ceph where it is?
[22:59] <bchrisman> (ie, avoid having to repeer PGs?)
[23:00] <sagewk> they will repeer (cosd daemon restarted), but no data will be copied around
[23:02] <gregaf> I don't think they need the same UUID or anything, either
[23:14] <darkfader> that was really interesting
[23:15] <gregaf> what happened?
[23:16] <darkfader> i meant bchrisman's question
[23:16] <darkfader> i never thought about recycling an osd
[23:16] <gregaf> ah, yeah
[23:16] <bchrisman> sagewk: thanks!
[23:17] <gregaf> I'm not sure that it'd be a very common use-case in a Ceph system, having federated storage like that
[23:18] <gregaf> although I guess for people interested in converting Lustre clusters to Ceph that would apply
[23:28] * earth (~earth@2001:470:1f06:687::2) has joined #ceph
[23:30] <bchrisman> thus far, the more I dig into this filesystem, the better it looks… good job to all those working on it. :)
[23:34] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:48] <jantje> sagewk: 'mount' reported rw,norbytes
[23:49] <jantje> I did not explicitly check /proc, but I assume 'mount' just reads that file
[23:50] <cmccabe> jantje: actually mount reads /etc/mtab
[23:54] <sagewk> jantje: yeah, /proc/mounts will tell you what the kernel thinks, which will narrow things down a bit (might be an issue with mount arg parsing or something)
[23:55] <jantje> ok, give me a few minutes, i'll set up my vpn session
[23:56] <sagewk> thanks
[23:58] <jantje> yes, it tells me rw,relatime,norbytes
[23:59] <jantje> and some caps_wanted_delay stuff
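
The check sagewk asks for, whether the kernel's view in /proc/mounts includes a given option, can be sketched as a small parser. The field layout (device, mountpoint, filesystem type, comma-separated options) is the standard /proc/mounts format. The monitor address and the mountpoint `/mnt/ceph` in the sample are hypothetical, and the helper names are invented for this illustration.

```python
def mount_options(mounts_text, mountpoint):
    """Return the option list for a mountpoint, or None if not mounted.

    If the mountpoint appears more than once, the last entry wins,
    since later mounts shadow earlier ones.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = fields[3].split(",")
    return found


def has_option(mounts_text, mountpoint, option):
    """True if the mountpoint is present and carries the exact option."""
    opts = mount_options(mounts_text, mountpoint)
    return opts is not None and option in opts


if __name__ == "__main__":
    # Hypothetical sample; on a real system, read the text from /proc/mounts.
    sample = "10.0.0.1:6789:/ /mnt/ceph ceph rw,relatime,norbytes 0 0\n"
    print(has_option(sample, "/mnt/ceph", "norbytes"))
```

Splitting the option field on commas and comparing whole entries avoids false matches, for example `rbytes` matching inside `norbytes`.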

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.