#ceph IRC Log


IRC Log for 2012-06-19

Timestamps are in GMT/BST.

[0:00] <nhm> I'm far more disappointed that I found out yesterday that a rim joist in my house is rotting out. ;(
[0:07] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:09] <tv_> brief sepia.ceph.com dns blip coming..
[0:14] <joao> nhm, they released 64 bit debs last month or something
[0:14] <joao> I got it working on my desktop, even saw Dan
[0:15] <joao> but didn't try with a webcam
[0:15] <tv_> sooo.. would one of you guys have installed rwhod all over the place?
[0:15] <joao> I have no idea what that is, so my guess is no
[0:16] <elder> I didn't!
[0:16] <elder> (I know you were suspicious though.)
[0:16] <tv_> anyone who's a unix neckbeard is a suspect ;)
[0:17] <joao> unless it is a dependency for building ceph, and by "everywhere" you mean "plana41" and "plana09", I'm pretty sure I did nothing of the sort
[0:18] <nhm> tv_: not I, though I'm going to install blktrace soon.
[0:27] * dmick (~dmick@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[0:27] * sjust (~sam@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[0:31] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[0:33] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[0:34] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[0:39] <sagewk> nhm: you should use the current next branch instead of waiting for 0.48 for those test clusters
[0:41] <elder> sagewk, I have about half of the commits working for 3.4-stable. I've almost completed a bisect to find out what went wrong in the middle of them.
[0:42] <elder> Very time consuming though.
[0:45] <dmick> elder: do you know anything about alleged xfs hangs?
[0:46] <dmick> Florian Haas pointed to some things on 25 May; another user reporting hangs just now on the list
[0:46] <dmick> different stacks, but...
[0:51] <elder> I haven't been paying attention, dmick, sorry.
[0:52] <elder> Now I see that they've sent info to the XFS list but no response.
[0:52] <elder> If I ever managed to finish anything I might offer to take a closer look...
[0:53] <dmick> not requesting help atm, just wondering if your immersion in xfs had flagged any current problems
[0:53] <dmick> _xfs_buf_find seems to be a commonality
[0:54] <elder> That function finds a buffer for a given block range, creating one if it doesn't already exist.
[0:55] <dmick> yeah. I wonder if this correlates with full FS
[0:55] <dmick> I'll ask
[0:55] <elder> Scanning it right now I'm not sure where the hang might be.
[0:56] <dmick> http://www.spinics.net/lists/ceph-devel/msg06271.html is the older thread
[0:56] <dmick> http://www.spinics.net/lists/ceph-devel/msg06905.html
[0:56] <dmick> is the current question. Understand that I'm not begging for your help, but since you expressed interest
[0:56] <dmick> ..
[0:57] <elder> Ahh, that first one tells me more.
[0:57] <dmick> I found the first one; don't know if they're connected
[0:57] <elder> It didn't immediately get the lock on the (found) buffer, so it did a blocking lock acquisition.
[0:58] <elder> So something else holds that buffer locked and it's maybe not letting it go.
[0:58] <dmick> yeah
[0:59] <elder> And it's also reading an AG free block, which might be interesting.
[0:59] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:00] <elder> This is a section of code that had some activity last year now that I'm looking at it.
[1:01] <elder> When you're freeing an extent there's a bunch of stuff that happens to keep the extent free lists at a proper length.
[1:05] <elder> That second one is a very different path.
[1:06] <elder> The first one was freeing an extent. The second one is involved in converting a delayed-allocation extent into a real extent.
[1:07] <elder> It sounds like the problem is reproducible. I won't volunteer to do it right now, but it would be possible (if reproducible) to gather a bit more information, using tracing and/or KDB to poke around.
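elder's description of `_xfs_buf_find` above (try the buffer's lock without blocking, then fall back to a blocking acquire if someone else holds it) can be sketched as a toy pattern. This is illustrative Python threading code, not XFS kernel code; `try_find_buffer` and `buf_lock` are invented names standing in for the buffer lookup and the buffer's semaphore:

```python
import threading

buf_lock = threading.Lock()   # stands in for a cached buffer's lock

def try_find_buffer():
    """Mimic the fast path: a non-blocking trylock attempt, as described above."""
    if buf_lock.acquire(blocking=False):
        buf_lock.release()
        return "uncontended"
    # The real code would now fall back to a blocking acquire; if the
    # current holder never releases, this is where the reported stacks hang.
    return "contended"

print(try_find_buffer())      # uncontended: nobody holds the lock
buf_lock.acquire()            # simulate another thread holding the buffer locked
print(try_find_buffer())      # contended: the blocking fallback would wait here
buf_lock.release()
```

The hang scenario in the thread corresponds to the "contended" branch when the holder never lets go.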
[1:25] <elder> sagewk, con->mutex generally prevents *any* concurrency between the worker thread and the "regular" messenger code, right?
[1:26] <elder> However, since certain things (like allocating a message from try_read()) can drop it, it may be possible for the worker threads to make some progress.
[1:27] <elder> Looking at this now I'm a little afraid we may have (yet) another issue related to data being queued before it's entirely ready.
[1:27] <elder> (for write)
[1:33] * cattelan_away_away_away (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) has joined #ceph
[1:44] * yehudasa (~yehudasa@static-66-14-234-139.bdsl.verizon.net) has joined #ceph
[1:48] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[1:55] * yehudasa (~yehudasa@static-66-14-234-139.bdsl.verizon.net) Quit (Ping timeout: 480 seconds)
[1:58] * tv_ (~tv@2607:f298:a:607:b19a:507b:4b80:3233) Quit (Quit: tv_)
[2:04] * yehudasa (~yehudasa@ace.ops.newdream.net) has joined #ceph
[2:13] * joao (~JL@ Quit (Remote host closed the connection)
[2:29] * yoshi (~yoshi@p37158-ipngn3901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:32] * lofejndif (~lsqavnbok@9KCAAGEDU.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[3:13] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Ping timeout: 480 seconds)
[3:30] * adjohn (~adjohn@ Quit (Quit: adjohn)
[3:49] * yehudasa (~yehudasa@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[4:13] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[4:23] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[4:53] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[4:53] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:54] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit ()
[6:00] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[6:03] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit ()
[6:18] * renzhi (~renzhi@raq2064.uk2.net) has joined #ceph
[6:21] * dmick is now known as dmick_away
[6:21] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:28] <renzhi> Hi, on a system with multiple disks, is it better to run multiple osds (one per disk), or just one osd with hardware RAID?
[6:40] * dmick_away is now known as dmick
[6:41] <dmick> renzhi: in general we recommend letting RADOS do the disk replication. Hardware RAID tends to, at best, be wasted effort, and at worst will lose redundancy (for instance, if
[6:41] <dmick> anything on that machine goes down, the entire set of disks goes down, whereas with RADOS they're going to be replicated 'far' away, less likely to suffer clustered failures)
[6:42] <renzhi> dmick: thanks, that's what I was thinking too
[6:43] <renzhi> besides, when I need to change the disks that failed, sometimes, there is no more disk of the same model, and it's messy
[6:43] <renzhi> but,
[6:43] <dmick> since RADOS only does mirroring at the moment, it's less space-efficient, but the hardening benefits make up for it. Also, replacing a failed disk is faster and less of a hotspot load on the cluster.
[6:44] <renzhi> when I have multiple disks, and the system boots up, isn't it a problem that the disk id is not always the same?
[6:44] <dmick> (tends to be faster because the reads come from many different places to one)
[6:45] <renzhi> with lots of osds, the client apps (using librados) use up more resources as well, no?
[6:45] <renzhi> librados internally seems to have a lot of threads for osds
[6:45] <dmick> not sure which disk id you mean, and who it might be a problem to. Your view of the cluster is through rados, and it isolates you from knowledge of the particular disks involved.
[6:46] <dmick> as for client apps/resources: the client itself only talks to the primary OSD; replication is handled intra-cluster
[6:46] <dmick> (the client sometimes talks to the monitor as well, of course, but mostly the I/O is done by talking to the OSDs directly)
[6:47] <dmick> so for a given object, with a given PG, the client makes one read/write request to the primary OSD for that object/PG, and gets back one response
[6:47] <dmick> there are multiple OSD processes running, but they tend not to use a whole lot of machine resources
[6:48] <dmick> (or there would be multiple OSD processes if you used one per disk on a multidisk system)
[6:48] <renzhi> hmm... ok, my understanding seems to be bit off...
[6:49] <renzhi> because when we did stress testing on our cluster, with lots of client connections (librados) client, the library seems to use up quite a bit of threads, and we had a lot of crashes
[6:50] <renzhi> another estimation question: If we are to have 50 million objects, what kind of RAM should I have for these machines?
[6:50] <dmick> if you have multiple I/Os in flight, of course the library uses multiple threads to handle those, but that's true of whatever storage you use. Sorry to hear about the crashes; are you implying that you think they were related to running out of thread resources on the client?
[6:50] <renzhi> yes
[6:51] <dmick> did it seem like there were many more threads than I/Os in flight?
[6:52] <renzhi> we had a small cluster, 3 osd, 2 mds, 3 mon.
[6:53] <renzhi> we tried to start client apps to do read/write, each app will spawn 20 to 50 threads, each thread connect to the cluster, do a write, then a read, and do hash to compare that the data is still right.
[6:54] <dmick> ok
[6:54] <renzhi> that's it. We ran the client app on machines with 16GB of memory, it ran out of thread resources way before we reached our target
[6:54] <dmick> that certainly doesn't sound right. How many apps were you trying to run? And you say this was using librados?
[6:54] <renzhi> so we had to change our code to share the handle with the ioctx
[6:54] <renzhi> yes, librados
[6:55] <renzhi> we tried different ways to see which one is better, one of them is to have a thread create a handle, connect, do IO, then shutdown.
[6:56] <dmick> how many instances of the app were you trying to run at once?
[6:56] <renzhi> I'm not sure about sharing the cluster handle yet, as Josh said there might be a race condition, but Sage said it's ok
[6:57] <renzhi> we ran 3 on that machine
[6:57] <sage> renzhi: it's supposed to work. if you run into a race, it's a bug we should fix.
[6:57] <renzhi> sage: I haven't yet, hopefully, will never :)
[6:57] <renzhi> thanks
[6:57] <sage> :)
[6:58] <dmick> so 60/150 threads on 16GB doesn't sound unreasonable. Can you reproduce this? If so, please consider filing a bug with the details at http://tracker.newdream.net/projects/ceph
[6:58] <renzhi> I did
[6:58] <dmick> I just ran rados bench with 150 threads with no errors on a similar cluster (3 machines, 3 monitors, 6 OSDs total)
[6:58] <renzhi> I'm still trying to learn the internals of ceph :)
[6:59] <dmick> heh. ah. http://tracker.newdream.net/issues/2524?
[6:59] <renzhi> yeah
[7:00] <renzhi> we wrapped an erlang app over librados, initially, we always thought something was wrong with our erlang app or something
[7:00] <sage> renzhi: was that you who posted erlang bindings a few weeks back?
[7:01] <dmick> sage: yes, I remember the name
[7:01] <renzhi> yes
[7:01] <renzhi> I didn't do the binding for all functions yet, will do more later :)
[7:01] <sage> nice
[7:01] <renzhi> it's on github
[7:02] <dmick> so that bug says you were running 50000 concurrent I/Os? That's... more than 50 :)
[7:03] <renzhi> 50K was our target, we never reached it
[7:03] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[7:03] <renzhi> it crashed on us even at 200 threads (threads in the app, not counting those in librados)
[7:04] <dmick> ok
[7:04] <renzhi> we have an erlang server that provides storage service to other apps, and there might be a lot of client connections
[7:05] <sage> sharing the ioctx is the way to go
[7:05] <renzhi> so we were testing it to see how it behaves on load
[7:05] <renzhi> even sharing ioctx?
[7:05] <renzhi> we just share the handle at this point
[7:05] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit ()
[7:05] <renzhi> ah well, I'll need to read more the code :)
[7:06] <sage> oh, sharing the handle is fine too
[7:07] <renzhi> ok, I'll try another test with ioctx sharing, maybe later in the week.
[7:08] <sage> did you have problems sharing the rados handle?
[7:08] <sage> (sorry, haven't read the full channel history)
[7:08] <renzhi> no, it's fine so far
[7:09] <renzhi> we got to 20K connections, still waiting for more machines to do testing later.
[7:09] <sage> ok, cool :)
[7:09] <sage> don't bother sharing the ioctx in that case, won't change much
[7:09] <renzhi> ok
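The pattern sage recommends above (connect one cluster handle once, then let many worker threads do I/O through it, rather than connect/shutdown per thread) can be sketched with a toy in-memory stand-in. `FakeCluster` below is NOT librados; it is a hypothetical placeholder so the sketch is runnable without a Ceph cluster. The worker loop mirrors renzhi's stress test: write an object, read it back, compare hashes:

```python
import hashlib
import threading

class FakeCluster:
    """In-memory stand-in for one shared cluster handle (a hypothetical
    placeholder, not the librados API)."""
    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def write(self, name, data):
        with self._lock:
            self._objects[name] = data

    def read(self, name):
        with self._lock:
            return self._objects[name]

cluster = FakeCluster()        # "connect" once; every thread shares this handle
failures = []

def worker(i):
    # Each thread writes an object, reads it back, and compares hashes,
    # roughly the loop renzhi described for the stress test.
    payload = ("object-%d" % i).encode()
    cluster.write("obj-%d" % i, payload)
    back = cluster.read("obj-%d" % i)
    if hashlib.sha1(back).hexdigest() != hashlib.sha1(payload).hexdigest():
        failures.append(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("failures:", failures)
```

The point of the design is resource usage: one handle amortizes the connection state across all threads, instead of each thread paying for its own connect/shutdown cycle.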
[7:10] <renzhi> thank you all, Ceph has been cool so far, besides the crashes (but we can get around them), and the documentation :)
[7:10] <sage> hehe :) working on it!
[7:11] <renzhi> thanks
[7:11] <renzhi> you guys provide commercial support to China?
[7:14] <dmick> That's certainly something we can have business people talk to you about, yes. We tend to be developers on the channel
[7:15] <dmick> I have to leave for the night. Good luck renzhi and thanks for stretching the limits of rados!
[7:15] * dmick is now known as dmick_away
[7:17] <renzhi> dmick: thanks a lot
[8:08] * cattelan_away_away_away is now known as cattelan_away_away_away_away
[8:10] * cattelan_away_away_away_away is now known as cattelan_away_away_away
[8:10] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:11] * cattelan_away_away_away is now known as cattelan_away_away_away_away
[8:15] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:56] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:56] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[9:03] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:08] * s[X]_ (~sX]@ppp59-167-154-113.static.internode.on.net) Quit (Remote host closed the connection)
[9:10] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[9:13] * BManojlovic (~steki@ has joined #ceph
[9:36] * Meyer__ (meyer@c64.org) Quit (Read error: Connection reset by peer)
[9:37] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[9:43] * loicd (~loic@cnit.vipnetwork.fr) has joined #ceph
[9:44] * Meyer__ (meyer@c64.org) has joined #ceph
[9:53] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:01] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[10:03] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[10:04] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[10:14] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:20] * loicd (~loic@cnit.vipnetwork.fr) Quit (Read error: No route to host)
[10:20] * loicd (~loic@cnit.vipnetwork.fr) has joined #ceph
[10:35] * nhm_ (~nh@65-128-158-48.mpls.qwest.net) has joined #ceph
[10:37] * nhm (~nh@65-128-190-140.mpls.qwest.net) Quit (Read error: Operation timed out)
[10:57] * Qu310 (~qgrasso@ppp59-167-157-24.static.internode.on.net) has left #ceph
[11:03] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[11:09] * yoshi (~yoshi@p37158-ipngn3901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:25] * Ryan_Lane (~Adium@dslb-088-075-182-072.pools.arcor-ip.net) Quit (Quit: Leaving.)
[11:32] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[11:40] * joao (~JL@89-181-154-187.net.novis.pt) has joined #ceph
[11:40] * Damian_ (~Damian@mountainmorningband.com) Quit (Quit: leaving)
[12:10] * Ryan_Lane (~Adium@p5DDC7315.dip.t-dialin.net) has joined #ceph
[12:36] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:28] * renzhi is now known as renzhi_away
[13:30] * lofejndif (~lsqavnbok@83TAAGMGX.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:39] * lofejndif (~lsqavnbok@83TAAGMGX.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[13:42] * loicd1 (~loic@cnit.vipnetwork.fr) has joined #ceph
[13:42] * loicd (~loic@cnit.vipnetwork.fr) Quit (Read error: No route to host)
[13:45] * loicd1 is now known as loic
[13:45] * loic is now known as loicd
[13:48] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[14:14] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[15:02] * Ryan_Lane (~Adium@p5DDC7315.dip.t-dialin.net) Quit (Remote host closed the connection)
[15:03] * Ryan_Lane (~Adium@p5DDC7315.dip.t-dialin.net) has joined #ceph
[15:03] * loicd (~loic@cnit.vipnetwork.fr) Quit (Quit: Leaving.)
[15:29] * lofejndif (~lsqavnbok@09GAAGCWH.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:40] * lofejndif (~lsqavnbok@09GAAGCWH.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[15:44] * cattelan_away_away_away_away is now known as cattelan_away_away_away
[15:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:51] * OutBackDingo (~quassel@rrcs-71-43-84-222.se.biz.rr.com) has joined #ceph
[15:52] <OutBackDingo> okies.... found irc...... anyone know if this will build/work on FreeBSD ?
[15:59] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:10] <jerker> OutBackDingo: if you ever try on ZFS let us know! :) I'm running ZFSonLinux for my backupmachines and like it a lot..
[16:10] <jerker> OutBackDingo: answer to question: I don't know
[16:11] <OutBackDingo> jerker: thats my objective.... seeing if itll run on FreeBSD with ZFS ...... rumours say itll build but no docs...
[16:34] * loicd (~loic@cnit.vipnetwork.fr) has joined #ceph
[16:44] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[17:06] <iggy> OutBackDingo: I remember someone building it a while back, but it didn't quite work... they never said anything again
[17:07] <iggy> I'm assuming they weren't (enough of) a developer to try to fix the differences
[17:07] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:08] <OutBackDingo> iggy: welp, it seems it wants libuuid which alone is very linux specific
[17:08] <iggy> is it... it's... just... a... lib... for.... generating uuid's
[17:09] <iggy> there's an rfc for that
[17:09] <OutBackDingo> there is this also....... http://ceph.com/2011/12/
[17:10] <OutBackDingo> where in FreeBSD we do have uuidgen
[17:12] <OutBackDingo> though uuid functions are part of libc
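As iggy notes, the UUIDs themselves are standardized (RFC 4122); only where the generation routines live differs between Linux's libuuid and FreeBSD's libc. A portable illustration of the same format using Python's standard library:

```python
import uuid

# A random (version 4) UUID per RFC 4122, regardless of platform.
u = uuid.uuid4()
print(u)                 # e.g. 3f2b0a1e-... (value varies per run)
print(u.version)         # 4
print(len(str(u)))       # 36: 32 hex digits plus 4 hyphens
```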
[17:17] * loicd (~loic@cnit.vipnetwork.fr) Quit (Quit: Leaving.)
[17:24] * loicd (~loic@cnit.vipnetwork.fr) has joined #ceph
[17:38] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[17:40] <OutBackDingo> ./configure --without-fuse --without-tcmalloc --without-libatomic-ops ----- jeeez how linux specific can one be configure: error: libaio not found
[17:48] * widodh_ (~widodh@minotaur.apache.org) has joined #ceph
[17:52] <iggy> it's a LInux filesystem... do you really expect different?
[17:53] * widodh (~widodh@minotaur.apache.org) Quit (Ping timeout: 480 seconds)
[17:59] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[18:02] * loicd (~loic@cnit.vipnetwork.fr) Quit (Quit: Leaving.)
[18:12] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[18:30] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[18:35] <OutBackDingo> iggy: LOL a filesystem's a filesystem.... like ZFS is on solaris, xBSD, and now also linux
[18:35] <iggy> ceph... is not
[18:36] <iggy> port the client to something else and I bet the servers will follow
[18:36] <iggy> although until recently, it wouldn't even run(ish) on !btrfs, so...
[18:38] <iggy> maybe if you get it compiling you can convince the devs to put freebsd into the build testing to try to catch things breaking in the future
[18:38] <iggy> instead of it being broken for probably months and some other poor schmuck having to do porting work in some months
[18:40] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[18:49] <nhm_> ooh, zfs testing. I've been wanting to do that.
[18:50] <nhm_> I wonder how the llnl guys are doing with their zfs port.
[18:56] * yehudasa_ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[19:03] * fghaas (~florian@ has joined #ceph
[19:06] * dmick_away is now known as dmick
[19:07] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:11] * fghaas (~florian@ Quit (Ping timeout: 480 seconds)
[19:12] * lofejndif (~lsqavnbok@82VAAEL7M.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:13] * chutzpah (~chutz@ has joined #ceph
[19:16] * loicd (~loic@gw-wifi.paris.fph.ch) has joined #ceph
[19:16] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[19:18] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) Quit (Quit: LarsFronius)
[19:19] <joao> oh joy
[19:19] <joao> gonna be late for the standup
[19:19] <joao> the router suddenly stopped allowing connections from new devices and a reboot is in order
[19:20] * BManojlovic (~steki@ has joined #ceph
[19:21] * jluis (~JL@ has joined #ceph
[19:23] <jluis> lol
[19:23] * fghaas (~florian@ has joined #ceph
[19:23] <jluis> guys, just a heads up: "All Ports in use"; can't join the vidyo room
[19:24] <jluis> nevermind
[19:27] <jluis> nhm, elder, they're back :P
[19:27] <elder> Well we're gone.
[19:27] <elder> Let them know we missed them.
[19:28] * joao (~JL@89-181-154-187.net.novis.pt) Quit (Ping timeout: 480 seconds)
[19:28] * jluis is now known as joao
[19:38] <dmick> well, that was a bloodbath
[19:38] <dmick> our end kept freezing all video and sound except ours
[19:38] <dmick> so it didn't seem like our client
[19:38] <dmick> the PC had very little load
[19:39] <dmick> I don't know how the connections are made, and whether it could have been the NDN Vidyo server influencing the problem
[19:39] <dmick> but it sure felt like server overload. I suppose "bad networking" could explain it too
[19:39] <dmick> but everything from my desktop seems fine. <shrug>
[19:39] <nhm_> dmick: everyone remote was able to talk to each other fine...
[19:39] <dmick> weird
[19:40] <nhm_> dmick: where is the vidyo server?
[19:41] <dmick> vidyo.newdream.net is its name. I don't know where it is physically
[19:41] <nhm_> hrm... Well, hopefully it'll be better tomorrow. :)
[19:42] * joao (~JL@ Quit (Quit: Leaving)
[19:42] * joao (~JL@ has joined #ceph
[19:50] * JJ (~JJ@ has joined #ceph
[20:01] * fghaas (~florian@ Quit (Ping timeout: 480 seconds)
[20:04] * Ryan_Lane (~Adium@p5DDC7315.dip.t-dialin.net) Quit (Quit: Leaving.)
[20:19] * fghaas (~florian@ has joined #ceph
[20:20] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit (Quit: brb)
[20:21] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[20:21] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit ()
[20:21] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[20:22] * SpamapS (~clint@xencbyrum2.srihosting.com) Quit ()
[20:23] * SpamapS (~clint@xencbyrum2.srihosting.com) has joined #ceph
[20:31] * fghaas (~florian@ Quit (Ping timeout: 480 seconds)
[20:47] <elder> Is there any way to find out what's going on with a running nightly-type job?
[20:54] * LarsFronius (~LarsFroni@95-91-243-252-dynip.superkabel.de) has joined #ceph
[21:10] * The_Bishop (~bishop@2a01:198:2ee:0:705c:2d4e:814c:b265) has joined #ceph
[21:11] * lofejndif (~lsqavnbok@82VAAEL7M.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[21:12] <gregaf> elder: you can go into the machine and look at the status of its individual jobs
[21:12] <gregaf> there's also some script there that's supposed to summarize it, but I've never gotten that to work
[21:23] * loicd (~loic@gw-wifi.paris.fph.ch) Quit (Quit: Leaving.)
[21:30] <jerker> nhm_: isn't IBM Sequoia running Lustre on top of ZoL (ZFS on Linux)? Hmm. http://en.wikipedia.org/wiki/IBM_Sequoia#Filesystem
[21:33] <lxo> wow, ceph is *far* more stable than before with btrfs as in 3.4.*!
[21:33] <lxo> I guess the zero-sized files were causing more corruption to metadata than I could see
[21:35] <lxo> now, I have a question about crush rules. I know how to arrange for ceph to place at most one copy of a file on a server with multiple OSDs, but what if I want it to place at most *two* copies there?
[21:37] <lxo> I've tried naming the server diskset twice in the set of all servers, but that doesn't seem to have had any effect
[21:37] <gregaf> lxo: I don't think you can do that with CRUSH right now, unfortunately
[21:38] <lxo> right now I've split the disks in two separate sets, but that's less than ideal
[21:38] <lxo> (because the actual scenario is a bit more complicated, with one large disk and several smaller ones)
[21:40] <lxo> the smaller disks add up to a few times the size of the larger disk, so I'd set up 3 disk sets on this server, the largest of my 3 servers, and I don't want all 3 copies of any piece of data to be on that one server
[21:40] <lxo> oh well... I guess I'll survive without that then ;-)
[21:41] <jerker> naive question; performance or redundancy reasons?
[21:41] <lxo> redundancy. I want the cluster to survive a failure of any of the 3 servers
[21:42] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:42] <jerker> I mean by putting 2 copies on the large server. Well. Just curious.
[21:42] <lxo> for performance, I think I'd want to make sure every piece of data has its primary on one of the smaller disks. but I haven't got there yet
[21:43] <lxo> it's not that I *want* two copies there. I just want to make sure I don't have all 3 copies there, and the total sizes on the servers pretty much requires at least some data to be present twice on this larger server
[21:44] <jerker> Cant you describe like all the small servers in one "rack" and the big server in one "rack" and then use two copies of data?
[21:44] <jerker> please not, i am not a developer, just quite naive user
[21:44] <jerker> note
[21:44] <lxo> I really want 3 copies, for one
[21:45] <jerker> ok
[21:45] * fghaas (~florian@ has joined #ceph
[21:46] <lxo> so my arrangement right now has two "sets" with a single disk on a middle-sized server, one set with two disks on a smaller server, and one set with the single largest disk and one set with the remaining small disks on the largest server
[21:47] <lxo> crush weights are set up by total size, so the size kind of evens out, but not entirely: I get a feeling that the set with the collection of small disks, that adds up to some 30% of the total cluster size, is getting less than its share
[21:47] <iggy> i think most of the development has targeted pretty homogeneous setups or enough nodes that it won't matter
[21:49] <lxo> now, what I was thinking of doing, for performance, was to force every piece of data to have a copy in the smaller disks, and then have the two other copies randomly placed among to the other 5 disks, which led me to the question of how to allow for two copies in any one crush set
[21:49] <lxo> iggy, that's my feeling as well
[21:50] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:52] <lxo> IIRC I even tried creating different sets with the same disks, but that didn't work, in weird ways
[21:53] <lxo> now, I'm pretty sure crush was not designed for this sort of duplication, so I was a bit surprised crushtool didn't complain in the first place
[21:55] <iggy> it probably doesn't expect people to intentionally try to put their data in jeopardy by playing games in their crush map
[21:59] <lxo> AFAICT it did just that when I played such "games" with the crush map: rather than noticing it'd already used one disk on another set and picking another, it seemed to choose the same disk more than once, leaving a piece of data without enough replication
[22:00] <lxo> that said, I'm not entirely sure that's what happened, but I'm pretty sure some PGs ended up unexpectedly degraded
[22:02] <lxo> I have to get a better grasp of the internals of crush to actually make sense of the observed behavior
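For reference, lxo's "force one copy onto the small disks, place the rest elsewhere" idea is the kind of thing a CRUSH rule with two take/emit passes expresses. This is a hedged sketch, not a tested map: the bucket names (`small-disks`, `root`) are hypothetical, and, as gregaf says above, CRUSH at this point cannot cap a single host at two copies:

```
rule smallfirst {
    ruleset 1
    type replicated
    min_size 3
    max_size 3
    step take small-disks              # hypothetical bucket holding the small OSDs
    step chooseleaf firstn 1 type osd  # primary copy from the small disks
    step emit
    step take root                     # rest of the cluster
    step chooseleaf firstn -1 type host  # remaining (replicas - 1) copies
    step emit
}
```

Note that lxo's later observation still applies: CRUSH does not de-duplicate placements across separate take/emit passes, so the two passes can land on the same OSD and leave a PG under-replicated.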
[22:05] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:06] * Ryan_Lane (~Adium@dslb-088-075-182-072.pools.arcor-ip.net) has joined #ceph
[22:08] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[22:18] * danieagle (~Daniel@ has joined #ceph
[22:18] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[22:24] * fghaas (~florian@ Quit (Read error: Connection reset by peer)
[22:25] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[22:57] * fghaas (~florian@2404:130:0:a000:213:a9ff:fea3:bb7f) has joined #ceph
[22:57] * lofejndif (~lsqavnbok@1RDAACRV0.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:11] * fghaas (~florian@2404:130:0:a000:213:a9ff:fea3:bb7f) Quit (Quit: Leaving.)
[23:22] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[23:26] * brambles (brambles@ Quit (Quit: leaving)
[23:27] * brambles (brambles@ has joined #ceph
[23:34] <sage> joshd: did you want to look at wip-clsrbd before i merge into master?
[23:36] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:37] <joshd> yeah, I'll do that in a little while
[23:37] <gregaf> speaking of, I noticed while testing the rbd cli that right now you can lock old-style header images
[23:38] <gregaf> but you can't unlock them that way via librbd because of the mechanism it's using for that listing
[23:38] <sage> maybe the lock operations should only succeed on new-style images?
[23:38] <gregaf> yeah
[23:38] <gregaf> so I added that check to librbd
[23:38] * s[X]_ (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[23:39] <gregaf> but I wonder if we want to add a feature RBD_FORMAT2 or something
[23:39] <gregaf> and make the class check for it
[23:40] <gregaf> because so far as I recall there isn't a great way for the class to check those things without randomly poking around and seeing what data is where
[23:40] <sage> it can look for the 'size' key... i think that's what a few other things do?
[23:40] <sage> but yeah, a feature might be cleaner
[23:41] <sage> why can't you unlock?
[23:42] <gregaf> most of them try and access data and so will fail out, but the failure will be like -ENOENT
[23:42] <gregaf> which is a lot less helpful than -EWRONGTHINGDUMMY
[23:42] <gregaf> ;)
[23:44] <gregaf> oh, I guess it's the listing, not the unlock
[23:44] <gregaf> because those are stored as part of librbd's "mutable metadata", all of which is left empty on old-style images
[23:51] <sage> joshd, gregaf: maybe that function should return ENOEXEC on old-style images.. that'd capture all such cases?
[23:52] <gregaf> in fact require_feature already does that
[23:52] <gregaf> I'm just going through and setting all of the new functions to run that before they do anything else
[23:52] <gregaf> require_feature(hctx, 0)
[23:56] * s[X]_ (~sX]@ppp59-167-154-113.static.internode.on.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.