#ceph IRC Log


IRC Log for 2011-03-17

Timestamps are in GMT/BST.

[0:02] * darkfader (~floh@ Quit (Server closed connection)
[0:02] * darkfader (~floh@ has joined #ceph
[0:30] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[0:45] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (Server closed connection)
[0:45] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[1:02] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:25] * eternaleye (~eternaley@ Quit (Server closed connection)
[1:25] * eternaleye (~eternaley@ has joined #ceph
[1:38] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[1:57] * [ack] (ANONYMOUS@ Quit (Server closed connection)
[1:57] * [ack] (ANONYMOUS@ has joined #ceph
[2:10] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:18] * iggy (~iggy@theiggy.com) Quit (Server closed connection)
[2:18] * iggy (~iggy@theiggy.com) has joined #ceph
[2:37] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:38] * cmccabe (~cmccabe@ has left #ceph
[2:41] * rajeshr (~Adium@ Quit (Quit: Leaving.)
[3:11] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) has joined #ceph
[3:18] * alexxy (~alexxy@ Quit (Quit: No Ping reply in 180 seconds.)
[3:18] * alexxy (~alexxy@ has joined #ceph
[4:23] * MK_FG (~MK_FG@ Quit (Server closed connection)
[4:23] * MK_FG (~MK_FG@ has joined #ceph
[4:25] * greglap (~Adium@cpe-76-90-239-202.socal.res.rr.com) has joined #ceph
[6:06] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[6:07] * lxo (~aoliva@ has joined #ceph
[6:56] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[7:11] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[7:27] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Ping timeout: 480 seconds)
[7:30] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[7:47] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Ping timeout: 480 seconds)
[8:13] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[8:16] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: neurodrone)
[8:20] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[8:29] * lxo (~aoliva@ has joined #ceph
[8:35] * alexxy[home] (~alexxy@ has joined #ceph
[8:35] * alexxy (~alexxy@ Quit (Read error: Connection reset by peer)
[8:41] * gucki (~gucki@80-218-124-80.dclient.hispeed.ch) has joined #ceph
[8:41] <gucki> hi
[8:42] <wido> hi
[8:42] <gucki> i need to build an s3 like storage environment and currently evaluating which fs would fit best. i came across ceph, but have these questions...maybe someone can help me :)
[8:43] <gucki> What does not ready for production mean?
[8:43] <gucki> just some crashes now and then, memory leaks, ...?
[8:43] <wido> Well, there are some bugs, it could crash from time to time
[8:43] <gucki> OR: changing data layout, data corruption, ...?
[8:44] <gucki> How does ceph deal with node failure? (e.g. split-brain scenarios, power failure of all nodes at once, ..)
[8:44] <wido> Well, no, although, I can't guarantee
[8:44] <wido> It handles that fine, just give it some time to recover, it won't be running in 10 sec
[8:44] <wido> a large cluster may take a while to get back on track
[8:44] <gucki> ok, that wont be a problem. but how about split-brain scenarios....i used to look at glusterfs and they seem to have had big issues with this problem?
[8:45] <wido> But, if you stay away from the POSIX Filesystem and only use the RADOS Gateway, it should be fairly stable
[8:45] <wido> There is no such split-brain like gluster or DRBD has
[8:46] <gucki> do you think ceph would be a good match for an s3 like storage and ready for heavy usage already today? crashes of nodes from time to time (which will be recovered) are not a problem, as ceph should just ignore those nodes until restarted? :)
[8:46] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:46] <wido> I'd love to say Yes, but you should really find out yourself, if you think it's ok for your env
[8:46] <gucki> would you recommend to build my own kernel driver or should the packages delivered with debian 6 (or ubuntu 10.04 lts) be ok too?
[8:46] <wido> You don't need a special kernel if you only use RADOS/S3
[8:47] <wido> You won't be touching the kernel client
[8:47] <wido> you might want to run a recent kernel on your OSD's, just for the newer btrfs code
[8:47] <gucki> ah ok...for rados/s3 everything is done in userland. so just update the userland regularly from git stable is the best option i guess then? :)
[8:47] <wido> Indeed
[8:48] <wido> but I would test it first if I were you, if you're happy with it, great!
[8:48] <gucki> hm....i think i'll be using ceph with xfs or ext4, btrfs seems pretty unstable (even still changing dataformats)
[8:48] <wido> ext4/xfs will give you a lower performance, especially with snapshots
[8:49] <wido> I have to go to the office, meeting, be online again in about 3 hours
[8:49] <gucki> i dont need snapshots. just ha and very good read performance :)
[8:49] <gucki> ok, i'll give it a try. thanks very much for the information. cya later
[8:49] <wido> Got some ideas, but got to go
[8:49] <wido> See the Wiki, about the RADOSGW and Varnish
[8:49] <wido> cya!
[8:50] <gucki> ok:)
[9:01] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[9:24] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:34] * lxo (~aoliva@oliva.athome.lsd.ic.unicamp.br) has joined #ceph
[10:19] * Yoric (~David@ has joined #ceph
[10:23] * lx0 (~aoliva@ has joined #ceph
[10:29] * lxo (~aoliva@oliva.athome.lsd.ic.unicamp.br) Quit (Ping timeout: 480 seconds)
[10:48] * darkfaded (~floh@ has joined #ceph
[10:51] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (Ping timeout: 480 seconds)
[10:52] * Meths (rift@ Quit (Read error: Connection reset by peer)
[10:52] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[10:53] * darkfader (~floh@ Quit (Ping timeout: 480 seconds)
[10:53] * monrad (~mmk@domitian.tdx.dk) has joined #ceph
[10:56] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Read error: Connection reset by peer)
[10:56] * Meths (rift@ has joined #ceph
[10:57] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) has joined #ceph
[10:57] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (Ping timeout: 480 seconds)
[10:57] * allsystemsarego (~allsystem@ has joined #ceph
[11:14] * rajeshr1 (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) has joined #ceph
[11:14] * rajeshr (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) Quit (Read error: Connection reset by peer)
[11:41] <gucki> btw, how does ceph (when only using rados) compare to mongodb grid-fs?
[11:41] * lx0 (~aoliva@ Quit (Read error: Connection reset by peer)
[11:42] * lx0 (~aoliva@ has joined #ceph
[12:06] * lx0 is now known as lxo
[12:20] * Yoric (~David@ Quit (Quit: Yoric)
[12:23] * Yoric (~David@ has joined #ceph
[12:32] * Yoric (~David@ has left #ceph
[12:43] <johnl> hi gucki
[12:43] <gucki> hey
[12:44] <johnl> I don't think mongodb grid-fs has metadata
[12:44] <gucki> what do you exactly mean with meta-data?
[12:45] <johnl> directories, permissions, ownership - the usual filesystem stuff
[12:45] <johnl> oh sorry, you said compared to rados
[12:46] <johnl> rados is designed from the ground up for storing big blobs of data
[12:46] <johnl> mongodb isn't really
[12:46] <johnl> I would not want to have to manage a large mongodb cluster for filestorage
[12:47] <gucki> mh, what would be the difference? in mongo all nodes are equal, same for ceph? would mongo just have lower performance, or...?
[12:48] <johnl> all nodes aren't necessarily equal in mongo
[12:48] <johnl> you can do sharding with it
[12:48] <johnl> if that's what you mean by equal anyway
[12:48] <johnl> not every mongo server necessarily is a full copy of the data
[12:49] <johnl> good place to look on that would be: http://www.mongodb.org/display/DOCS/Sharding+Introduction
[12:50] <johnl> but that is some work you have to do to configure it
[12:50] <gucki> yes, i would do sharding (results are the same as striping in a fs, i guess?)
[12:50] <gucki> ok, so ceph is easier to setup and to maintain.
[12:50] <johnl> basically, rados already gives you the equivalent of sharding
[12:50] <johnl> baked right in
[12:50] <johnl> yeah
[12:51] <johnl> whereas with gridfs, you'd have to configure mongo as you need it
[12:51] <johnl> but ceph does have other complexities
[12:51] <johnl> one important thing is when a server is lost and you add a new blank server in
[12:51] <gucki> wido said there wont be any split-brain issues in ceph. but since ceph doesnt have a master (single) meta server, how does it solve this problem?
[12:52] <johnl> with mongodb, the new server would get a copy of the data from one of the other servers. so the rebuild is limited to the speed of the other replica
[12:52] <johnl> and then that other replica is under load during the rebuild
[12:52] <johnl> with ceph/rados, the rebuild is from many other servers (due to the way the hashing works (placement groups))
[12:53] <gucki> ah ok, that's obviously better :)
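johnl's point about recovery fanning out across the cluster can be illustrated with a toy placement function. This is a hypothetical sketch (a hash over 64 PGs and 8 OSDs), not Ceph's actual CRUSH algorithm: the surviving replicas of a failed OSD's placement groups are scattered over many peers, so the rebuild reads from many sources at once.

```python
# Toy sketch of why recovery fans out across the cluster; the placement
# function here is a made-up hash, NOT Ceph's CRUSH algorithm.
import hashlib

NUM_PGS = 64
NUM_OSDS = 8

def osds_of(pg):
    """Map a placement group to a pseudo-random (primary, replica) pair."""
    h = int(hashlib.md5(str(pg).encode()).hexdigest(), 16)
    primary = h % NUM_OSDS
    replica = (h >> 16) % NUM_OSDS
    if replica == primary:
        replica = (replica + 1) % NUM_OSDS
    return [primary, replica]

failed = 0  # suppose OSD 0 dies and a blank server replaces it
recovery_sources = {
    peer
    for pg in range(NUM_PGS)
    if failed in osds_of(pg)
    for peer in osds_of(pg)
    if peer != failed
}
# The replacement pulls its PGs from many surviving OSDs in parallel,
# instead of loading down the single replica as in the mongodb case above.
print("recovery reads from OSDs:", sorted(recovery_sources))
```

With hash-based placement the set of recovery sources covers most of the cluster, which is exactly the "rebuild from many other servers" behaviour described above.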
[12:53] <johnl> a ceph cluster won't make progress unless it has quorum
[12:53] <johnl> i.e, if a majority of its monitor nodes are available to "vote"
[12:54] <johnl> so as long as you have an odd number of monitors (and metadata servers), you can't ever have a split brain
[12:54] <johnl> well, if you have an even number of monitors, you still can't have a split brain, because both sides would stop doing anything if one ocurred
[12:55] <gucki> mh ok...so if one monitor goes offline the filesystem hangs? (or is readonly?)
[12:56] <gucki> and it only continues to operate when the monitor comes back or another one goes offline too?
[12:57] <johnl> if you have only one monitor, then the filesystem clients block until it comes back (I believe)
[12:57] <johnl> but you don't just have one monitor
[12:58] <johnl> you should have at least 3 if you want high availability
[12:58] <johnl> and you can have many more
[12:58] <johnl> they all speak to each other and handle individual monitors failing and coming back, with no interruption in service
[12:59] <gucki> yes, but if i have 3...then one goes offline so there are only 2 left, which means an even number and so the fs would block?
[12:59] <johnl> as long as you always have a majority of them available (i.e: n/2 + 1), things keep working
[12:59] <johnl> no, the 2 left know that they have quorum. they're the majority
[12:59] <johnl> the 1 that went offline, if it couldn't contact the other 2, it knows it is *not* the majority
[12:59] <gucki> ah ok....i think now i got it.. :)
[13:07] <johnl> :)
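The majority rule johnl outlines above (n/2 + 1 monitors must be reachable) comes down to a one-line check. A sketch of the voting arithmetic only, not Ceph's actual monitor code:

```python
# Illustration of the quorum arithmetic from the discussion above,
# not Ceph's monitor implementation: a cluster of n monitors makes
# progress only while a strict majority of them can reach each other.

def has_quorum(total_monitors, reachable):
    """True if `reachable` monitors form a strict majority (n/2 + 1)."""
    return reachable >= total_monitors // 2 + 1

# 3 monitors, 1 lost: the remaining 2 are a majority -> still working.
assert has_quorum(3, 2)
# The isolated monitor sees only itself, so it knows it is NOT the majority.
assert not has_quorum(3, 1)
# Even 2-vs-2 split of 4 monitors: neither side has quorum, so neither
# side makes progress -> no split brain, as noted above.
assert not has_quorum(4, 2)
```

This is also why 5 monitors tolerate two failures: has_quorum(5, 3) still holds.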
[16:29] * yehuda_wk (~quassel@ip-66-33-206-8.dreamhost.com) has joined #ceph
[16:29] * yehudasa (~quassel@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[16:55] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[16:55] * lxo (~aoliva@ Quit (Read error: Connection reset by peer)
[16:56] * lxo (~aoliva@ has joined #ceph
[17:18] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:42] * joshd (~jdurgin@adsl-75-28-69-238.dsl.irvnca.sbcglobal.net) has joined #ceph
[17:42] <Tv> well, autotest is working nicely -- i've had two ceph clusters crash already :-/
[17:46] <Tv> 2011-03-17 09:46:48.176253 mon0 -> 'HEALTH_WARN osdmonitor: num_osds = 1, num_up_osds = 0, num_in_osds = 0 Some PGs are: degraded' (0)
[17:46] * rajeshr1 (~Adium@99-7-122-114.lightspeed.brbnca.sbcglobal.net) Quit (Quit: Leaving.)
[17:46] <Tv> make that three
[17:59] <sjust> where did the crashes occur?
[18:02] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:06] * cmccabe (~cmccabe@ has joined #ceph
[18:06] <Tv> seemed like a couple of asserts triggered in bufferlist::copy
[18:07] <Tv> but the traceback did not have source+line for the last few frames
[18:11] <Tv> two SIGABORTs, one where osd was still running but not "up"
[18:11] <cmccabe> tv: you can try using gdb to resolve the symbol
[18:11] <Tv> cmccabe: working on other things right now
[18:12] * rajeshr (~Adium@ has joined #ceph
[18:13] * gucki (~gucki@80-218-124-80.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[18:18] <Tv> a hanging ceph mount makes everything sad
[18:22] <Tv> sepia18 refuses to shutdown :(
[18:22] <Tv> because the shutdown sequence runs sync
[18:22] <Tv> which hangs
[18:23] <Tv> gaah
[18:23] <cmccabe> there were a bunch of shutdown commands; there must be one of them that doesn't sync
[18:24] <cmccabe> halt -n ?
[18:24] <Tv> yes well it's wedged too hard for even that now
[18:24] <Tv> powercycle time
[18:28] <cmccabe> when did librados::RadosClient::shutdown become different than librados::RadosClient::~RadosClient ?
[18:28] <cmccabe> that seems wrong
[18:30] <Tv> sjust: sepia{13,14,15,25,26,27} added to autotest
[18:32] <sagewk> skype/meeting!
[18:32] <joshd> cmccabe: I think it's been that way for a while, but shutdown is only called right before deleting the RadosClient
[18:32] <cmccabe> joshd: first of all, one takes the lock and the other doesn't. But they both operate on the same data structures
[18:34] <cmccabe> joshd: I guess maybe holding the lock temporarily is necessary for destroying that one thing
[18:34] <cmccabe> joshd: just because of Timer
[19:02] <Tv> sjust, *: brief autotest outage for upgrade
[19:05] <Tv> done
[19:12] <sagewk> time to invent some syntax for qemu...
[19:12] <Tv> libvirt has a lot of foo=bar:k=v,k2=v2 stuff
[19:13] <sagewk> there is the (old) -hda rbd:... syntax, and the new -drive rbd:...,if=none,id=blah,format=rbd syntax
[19:13] <sagewk> i guess the rbd: prefix is for the benefit of the deprecated -hda only?
[19:13] <sagewk> (since -drive has the format=rbd part?)
[19:13] <sagewk> and qemu uses , to separate its stuff for the -drive syntax
[19:14] <Tv> cyls=c,heads=h,secs=s[,trans=t]
[19:14] <Tv> hahaha
[19:14] <sagewk> so we could use colon...? -drive rbd:pool=p:image=i:conf=/etc/ceph/ceph.conf,format=rbd
[19:15] <sagewk> or -hda rbd:pool=p:image=i:conf=blah:foo=bar
[19:15] <sagewk> would be interesting to see what syntax other storage drivers are using
[19:15] <Tv> usually those things use colon much like http:, to decide a scheme
[19:15] <Tv> and then the rest is passed to handler for that scheme
[19:15] <Tv> not sure about qemu
[19:15] <Tv> but yeah look at sheepdog
[19:17] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[19:17] <joshd> it looks like sheepdog uses colons as separators, but they just give a gateway address, port, and image name
[19:18] <Tv> does qemu use the -drive format=X to decide the right handler?
[19:18] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:18] <Tv> if so, to get to that point, it has already parsed it somewhat
[19:19] <Tv> qemu.org is down :(
[19:25] <Tv> kvm repo on kernel.org is still good.. quick read says it's a wild wild west for how to parse things
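The colon-separated key=value form sage floats above (-drive rbd:pool=p:image=i:conf=/etc/ceph/ceph.conf,format=rbd) would parse along these lines. This is a hypothetical sketch of that proposal only, not qemu's real option parser, and it ignores the escaping a value containing ':' would need:

```python
# Hypothetical parser for the rbd:k=v:k=v syntax proposed above; it
# mirrors the IRC discussion, not qemu's actual -drive parsing, and
# it does not handle ':' escaped inside values (e.g. in a conf path).

def parse_rbd_spec(spec):
    scheme, _, rest = spec.partition(":")
    if scheme != "rbd":
        raise ValueError("not an rbd spec: %r" % spec)
    opts = {}
    for field in rest.split(":"):
        key, sep, value = field.partition("=")
        opts[key] = value if sep else ""
    return opts

print(parse_rbd_spec("rbd:pool=p:image=i:conf=/etc/ceph/ceph.conf"))
# -> {'pool': 'p', 'image': 'i', 'conf': '/etc/ceph/ceph.conf'}
```

The leading "rbd:" acts as the scheme selector Tv mentions (like "http:"), with the remainder handed to the rbd-specific handler.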
[19:29] <Tv> ok making hanging ceph mounts act nice is quickly becoming my #1 priority
[19:30] <Tv> too much pain
[19:31] <Tv> reboot --force hangs too
[19:33] <sagewk> reboot -f -n
[19:34] <sagewk> (-n == don't do a sync)
[19:34] <Tv> that's not on the manpage..
[19:34] <sagewk> tv: also see #206. no good solution.
[19:34] <Tv> but i see it in --help.. gah
[19:35] <sagewk> it's on my man page (lenny or squeeze, not sure)
[19:35] <cmccabe> tv: it's in the manpage on debian 6.0
[19:35] <Tv> hmmph
[19:35] <cmccabe> tv: seems like Ubuntu 10.04.2 does not have it
[19:35] <cmccabe> tv: in the manpage
[19:36] <Tv> sagewk: so umount -f is supposed to disconnect the fs and make all the fds give -EIO from there on
[19:36] <Tv> sagewk: i don't see why it couldn't just rip everything down by force, stop waiting for the remote
[19:36] <Tv> apart from "needs code"
[19:36] <sagewk> yes. provided the mount mutex isn't already held in the kernel
[19:36] <sagewk> which happens if someone does a umount without -f and it hangs
[19:36] <Tv> ohh
[19:37] <sagewk> thought about adding a hacky hook in /sys/kernel/debug/ceph/ to kill a failing umount attempt, not sure if it's worth it
[19:37] <Tv> well in this case, umount is actually already an attempt to fix the underlying problem; the real problem is that sync hangs forever
[19:38] <Tv> and, well, i'd be fine with saying "ceph lets sync continue after n seconds, whether it's complete or not"
[19:38] <Tv> because, well, there's a strong history that says sync doesn't really really guarantee that much
[19:38] <Tv> i still do three syncs before a hard halt ;)
[19:39] <Tv> to clarify: nothing tries to unmount; autotest wants to sync and reboot
[19:40] <cmccabe> tv: the three syncs thing is obsolete; it had to do with some problem that existed on an ancient unix
[19:40] <Tv> BUGS
[19:40] <Tv> According to the standard specification (e.g., POSIX.1-2001), sync() schedules the writes, but may
[19:40] <Tv> return before the actual writing is done. However, since version 1.3.20 Linux does actually wait.
[19:40] <Tv> (This still does not guarantee data integrity: modern disks have large caches.)
[19:40] <cmccabe> tv: yeah... as long as you're not using Linux 1.3.19 or earlier, you're ok :)
[19:40] <Tv> where "ok" means "hang forever" :(
[19:40] <Tv> i don't like this "ok" very much
[19:41] <cmccabe> tv: I'm just pointing out that 3 syncs is unnecessary, that's all.
[19:42] <cmccabe> I am curious how system administrators resolve these problems in practice. When I had a hanging NFS mount I used to power cycle the machine manually, but that doesn't seem very enterprisey
[19:42] <sagewk> tv: that man page is horribly outdated. maybe on non-linux systems that's true, but linux sync has done a real sync for a long time now
[19:42] <darkfaded> cmccabe: hehe
[19:42] <Tv> yeah
[19:43] <darkfaded> idiot admins will fuser
[19:43] <sagewk> so, a umount -f that isn't preceded by sync should work.
[19:43] <sagewk> if it doesn't, that's a bug.
[19:43] <Tv> most of my hard halts were from the 1.3 era anyway ;)
[19:43] <darkfaded> medium admins will use umount -l
[19:43] <darkfaded> clue is knowing when to use soft and hard mounts
[19:43] <sagewk> as for making sync time out, that's what the 'soft' mode would do.. but it's something that should be enabled with caution.. otherwise random timeouts + slow server -> data loss
[19:43] <darkfaded> umount -f never works in practice
[19:44] <Tv> sagewk: well i'm thinking more something like, tell ceph.ko that it's poisoned, make it fail *everything* fast
[19:44] <Tv> sagewk: that avoids the mount mutex issue
[19:44] <sagewk> that's basically umount -f
[19:44] <sagewk> unless you want to add a hook to make existing things that block fail (abort an inprogress sync)
[19:44] <Tv> i'll try umount -f before umount the next time i reproduce the hang
[19:44] <sagewk> that would be useful for cases like this
[19:44] <sagewk> yeah
[19:44] <Tv> yeah that's pretty much needed
[19:45] <darkfaded> Tv: please also try umount -l
[19:45] <Tv> :(
[19:45] <Tv> darkfaded: that only unhooks it from the namespace though
[19:45] <darkfaded> umount -f is sucky with network fs
[19:45] <Tv> in this case, the servers are long gone
[19:45] <Tv> and the test has already been marked failed
[19:45] <darkfaded> Tv: yes, but in practice umount -l does work, -f only if no one issued things like df
[19:45] <darkfaded> ok
[19:45] <Tv> and i just need to tear everything down unconditionally
[19:46] <darkfaded> my bet is 70% of times -f will not work
[19:46] <Tv> literally next step is powercycling
[19:46] <sagewk> in any case, any such force-fail hook should probably be planned/implemented along with -o soft as it's all the same blocking code that will be affected
[19:46] <sagewk> but if you can get a shell, reboot -f -n always works, so i suspect that'll do for autotest
[19:48] <Tv> sagewk: yeah except autotest currently tries to sync, i'll need to whack it
[19:48] <sagewk> and tv,darkfaded: any umount -f failure (that isn't blocking on the umount mutex) is a ceph bug that should go in the tracker
[19:48] <sagewk> whack!
[19:48] <darkfaded> sagewk: interesting. :) i just know this from nfs-abusive environments
[19:48] <darkfaded> i didnt know ceph should handle better
[19:50] <sagewk> if there is a blocking sync it will fail, but that's a vfs problem that'll also affect nfs or whatever else. but barring that, ceph at least is *supposed* to work
[19:50] <darkfaded> (because from my POV umount -f hanging is mostly a OS issue)
[19:50] <darkfaded> ah -yeah VFS issue
[19:50] <darkfaded> hehe
[19:52] <darkfaded> sagewk: i think the standard problem we saw (nfs) was that if anyone got an incident that said "hanging mount" they'll log into the system and hit "df" to check, which will sync to show correct stats
[19:52] <darkfaded> but i am sure ceph wont hang anyway :)
[19:53] <darkfaded> last hang i produced was > 6 months ago
[19:53] <Tv> brief autotest outage
[19:53] <cmccabe> build-fedora13-amd64 just got really, really slow
[19:54] <cmccabe> is this somehow drawing from resources that someone else is using?
[19:54] <Tv> like, by being a virtual machine? sure.
[19:54] <darkfaded> tehe
[19:54] <Tv> i don't see other significant cpu spikes on the machine
[19:55] <cmccabe> are all these VMs running on ceph-kvm1.ceph.dreamhost.com
[19:55] <Tv> that or ceph-kvm2
[19:55] <Tv> but -kvm1 hosts mostly test vms which are idle now
[19:55] <Tv> your cpu graph is the only one spiking
[19:56] <cmccabe> well, I can't control the interface at all now
[19:56] <cmccabe> it takes 30 seconds to move the mouse
[19:56] <Tv> sounds like networking more than cpu
[19:56] <cmccabe> a text interface would help a lot I'm sure
[19:56] <cmccabe> how do I ssh into this?
[19:57] <Tv> i told you when you started using it, it's behind a NAT, it's only reachable from ceph-kvm1
[19:57] <cmccabe> but I should be able to ssh in from that machine then?
[19:58] <Tv> if it has an ssh server running etc
[19:59] <cmccabe> it's strange because the information tab tells me the cpu is only at 5%, but it is completely unusable at the moment
[20:00] <cmccabe> welp, I guess I'll be working on something else until this clears up enough for me to try to enable ssh
[20:01] <Tv> the kvm process for it is spiking heavily in cpu usage
[20:02] <Tv> from ~4% to ~400% all the time
[20:03] <Tv> perhaps giving it 16 vcpus was way too optimistic, without further tuning, and it's getting flushed out of cache too often
[20:13] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[20:16] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[20:18] <Tv> sjust, *: brief autotest outage again
[20:21] <Tv> sjust: hey if you want to see the osd crash, test 234 failed and i haven't cleaned it up yet
[20:24] <sjust> ok
[20:48] <Tv> 12:21:44 INFO | CmdError: Command <grep MemTotal /proc/meminfo> failed, rc=-11, Command returned non-zero exit status
[20:48] <Tv> whee segfault from grep
[20:54] <cmccabe> how can I ssh in to build-fedora13-amd64?
[20:57] <Tv> cmccabe: what more do you need?
[20:58] <Tv> cmccabe: it's in a NATted LAN inside ceph-kvm1, and i don't even know if it has an ssh server running
[20:58] <cmccabe> tv: well, first of all, it seems to completely lock up from my point of view when I hit compile
[20:58] <cmccabe> tv: second of all, I don't understand what IPs are visible from ceph-kvm1
[20:59] <Tv> uhh,
[20:59] <cmccabe> I see a lot of interfaces, but only one actual IP... which is ceph-kvm1 itself
[20:59] <Tv> dev eth0 proto kernel scope link src
[20:59] <Tv> dev virbr0 proto kernel scope link src
[20:59] <Tv> what's hard there? it's somewhere in the 192 network, and i don't know what ip it is either
[21:00] <cmccabe> what command are you running to see that output
[21:00] <Tv> now recover the box, figure out its IP, ssh in if you want, and figure out why it's doing what it is doing.. e.g. are you make -j'ing
[21:00] <Tv> ip ro
[21:00] <cmccabe> actually ifconfig does show the 192.168.122 network as well
[21:01] <Tv> you'll be happier if you stop using ifconfig
[21:01] <cmccabe> yes, it is deprecated...
[21:01] <Tv> and welcome to the 21st century..
[21:01] <Tv> but just get on the box, figure out what it's ip is, and ssh to that from ceph-kvm1
[21:02] <cmccabe> I just don't know if this setup is going to work out
[21:02] <Tv> i don't see any need for ifconfig on ceph-kvm1 in the first place
[21:02] <cmccabe> if it locks up each time load spikes
[21:02] <Tv> all the gitbuilder work just fine
[21:02] <cmccabe> well, I'm trying to ssh in, and I assumed I would be able to do that from ceph-kvm1
[21:03] <Tv> without knowing the ip address?
[21:03] <cmccabe> I have to be honest, I'm feeling a little bit frustrated now
[21:04] <cmccabe> the graphical console is unresponsive again. It seems to have been triggered by my running make -j 16
[21:04] <Tv> then don't
[21:04] <cmccabe> when it becomes responsive, perhaps I can learn the IP address from there
[21:04] <cmccabe> but in order for that to be worthwhile, I have to be convinced that this setup can work
[21:04] <Tv> reboot the box, tune down the vcpus, don't use -j16
[21:04] <Tv> it just isn't that hard
[21:05] <cmccabe> do you think reducing the number cpus will help?
[21:05] <Tv> well it sounds very much like cpu cacheline/icache thrashing
[21:05] <cmccabe> I am going to halt this vm and bring it up with fewer cpus
[21:08] <Tv> sjust: i need more sepia machines, aborting 234
[21:08] * bchrisman (~Adium@sjs-cc-wifi-1-1-lc-int.sjsu.edu) has joined #ceph
[21:10] <cmccabe> tv: it seems to be doing a lot better with 4 cpus
[21:10] <cmccabe> but I still cannot ssh in
[21:10] <cmccabe> I'm getting no route to host
[21:10] <cmccabe> cmccabe@ceph-kvm1:~$ ip ro
[21:10] <cmccabe> dev eth0 proto kernel scope link src
[21:10] <cmccabe> dev virbr0 proto kernel scope link src
[21:10] <cmccabe> default via dev eth0
[21:10] <cmccabe> cmccabe@ceph-kvm1:~$ ssh cmccabe@
[21:10] <cmccabe> ssh: connect to host port 22: No route to host
[21:11] <Tv> firewall
[21:11] <Tv> pings just fine
[21:12] <cmccabe> I suppose I can set up port forwarding or something
[21:12] <Tv> ssh -D is your friend
[21:16] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[21:20] * pombreda (~Administr@186.71-136-217.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:41] * bchrisman (~Adium@sjs-cc-wifi-1-1-lc-int.sjsu.edu) Quit (Ping timeout: 480 seconds)
[21:46] * verwilst_ (~verwilst@dD576FAAE.access.telenet.be) has joined #ceph
[21:56] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:25] <Tv> hmmh i just had an autotest run fail pjd-fstest..
[22:25] <Tv> it seems i can't complete a run without triggering some ceph bug
[22:25] <Tv> the good news and the bad news are the same!
[22:27] <sjust> alright, sepia49 installing, only 48-96 to go
[22:27] <sjust> *50-96
[22:33] <Tv> brief autotest outage, upgrading..
[22:41] * MK_FG (~MK_FG@219.91-157-90.telenet.ru) Quit (Quit: o//)
[22:43] <Tv> sepia22 login: ubuntu
[22:43] <Tv> Last login: Thu Mar 17 09:44:00 PDT 2011 from ip-10-0-1-102.dreamhost.com on pts/0
[22:43] <Tv> Segmentation fault (core dumped)
[22:43] <Tv> Segmentation fault (core dumped)
[22:43] <Tv> Segmentation fault (core dumped)
[22:43] <Tv> Linux sepia22 2.6.35-27-generic #48-Ubuntu SMP Tue Feb 22 20:25:46 UTC 2011 x86_64 GNU/Linux
[22:43] <Tv> gotta love the failures
[22:51] <Tv> and another box with the exact same symptoms
[22:51] <Tv> funky
[22:53] <sjust> sepia49 is up, most of 50-96 will be soon
[22:53] <Tv> sjust: thx i'll do them all at once when they're ready
[22:54] <Tv> also, 96! whoa
[22:54] <sjust> there is a gap from 27 to 49, but still
[22:58] <Tv> 2011-03-17 14:58:13.504307 osd e7: 3 osds: 1 up, 3 in
[22:58] <Tv> these tests are making me sad :(
[22:58] <Tv> many are hitting the same bufferlist::copy assert, but still
[22:59] <sjust> I'll look at it in a bit
[23:04] <Tv> well, regardless of that, the tests themselves seem to work ok.. fyi:
[23:04] <Tv> ceph_blogbench/ ceph_dbench/ ceph_ffsb/ ceph_fsx/ ceph_pjd_fstest/
[23:04] <Tv> ceph_bonnie/ ceph_direct_io_test/ ceph_fsstress/ ceph_iozone/ ceph_tiobench/
[23:04] <Tv> that's what's autotestable easily now
[23:04] <Tv> kernel client, cfuse, cluster size & roles all easily customizable
[23:07] * bchrisman (~Adium@sjs-cc-wifi-1-1-lc-int.sjsu.edu) has joined #ceph
[23:08] <Tv> one thing that is frustrating is that i haven't figured out how to make autotest notice failures.. most failures just cause the test to hang
[23:09] <Tv> but all that means is need to write more custom stuff and depend less on current autotest features, i guess
[23:09] <Tv> the current stuff is too synchronous -- if it's waiting for client to say it's done, it won't notice a daemon crashing in the meanwhile
[23:09] <Tv> or even, if node 1 is waiting for node 2 to say it's ready, and node 2 fails utterly, node 1 will sit there until timeout
[23:10] <Tv> a Simple Matter of Programming(tm)
[23:10] <cmccabe> it really might be nice to periodically poll that the daemons were still alive every, say, 15 seconds
[23:11] <cmccabe> the problem is that some of the timeouts in the code are extremely generous and if you wait for them, every failing test will take ages
[23:11] <Tv> like, 72 hours. you're not telling me anything new.
[23:12] <cmccabe> heh
[23:12] <sjust> 32 new machines ready for autotest!
[23:12] <sjust> now for the stragglers...
[23:54] <Tv> sjust: they're in autotest pool now
[23:54] <sjust> cool
[23:54] <Tv> i think i'll go run a 57-machine test, just for the lulz
[23:54] <sjust> heh
[23:56] * verwilst_ (~verwilst@dD576FAAE.access.telenet.be) Quit (Quit: Ex-Chat)
[23:59] <Tv> "Hostname" "any 57"
[23:59] <Tv> hold on tight, there might be shrapnel in the air soon ;)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.