#ceph IRC Log


IRC Log for 2013-03-01

Timestamps are in GMT/BST.

[10:31] * CephLogBot (~PircBot@rockbox.widodh.nl) has joined #ceph
[10:31] * Topic is 'v0.56.3 has been released -- http://goo.gl/f3k3U || argonaut v0.48.3 released -- http://goo.gl/80aGP || Deploying Ceph with Juju http://t.co/TspsYBeTej'
[10:31] * Set by scuttlemonkey!~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net on Fri Feb 22 02:51:46 CET 2013
[10:31] <wido> and there it is again
[10:32] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[10:47] <fghaas> thanks wido
[11:00] * madkiss (~madkiss@ Quit (Quit: Leaving.)
[11:14] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:20] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[11:21] * dxd828 (~davidmait@dxdservers.com) has joined #ceph
[11:24] * BillK (~BillK@ Quit (Ping timeout: 480 seconds)
[11:34] * BillK (~BillK@124-169-53-48.dyn.iinet.net.au) has joined #ceph
[11:45] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[11:47] * jlogan (~Thunderbi@2600:c00:3010:1:24ed:e619:d0aa:3516) Quit (Ping timeout: 480 seconds)
[12:05] * gerard_dethier (~Thunderbi@ has joined #ceph
[12:15] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[12:28] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[12:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[12:48] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[12:48] * diegows (~diegows@ has joined #ceph
[13:07] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[13:07] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[13:24] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[13:24] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:33] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[13:40] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:46] * MooingLemur (~troy@phx-pnap.pinchaser.com) Quit (Remote host closed the connection)
[13:46] * MooingLemur (~troy@phx-pnap.pinchaser.com) has joined #ceph
[13:59] <joelio> Simple question, but how do I map an rbd device on a host that doesn't have OSDs? Is it sufficient to run a monitor on it?
[13:59] <joelio> or use a g/w of some fashion?
[14:00] <Gugge-47527> rbd?
[14:01] <Gugge-47527> install the ceph stuff, copy ceph.conf and the keyring to the machine
[14:01] <Gugge-47527> use rbd
[14:01] <joelio> cool, that's enough then
[14:03] <joelio> ah, of course, apologies..
[14:03] * joelio needs more sleep
[14:10] <janos> if i have a very old slow request, is there a way to clear it?
[14:10] <absynth> yeah
[14:10] <absynth> mark the osd as down
[14:10] <absynth> ceph osd down X
[14:11] <absynth> it should not interrupt anything and often clears lingering slow requests
[14:11] <janos> is there a way i can tell which osd?
[14:11] <janos> i just saw some spew from ceph -w
[14:11] <janos> duhhhh
[14:11] <janos> nm
[14:11] <janos> i see it
[14:11] <janos> was looking too far into the line
[14:11] <janos> thank you
[14:14] * madkiss (~madkiss@ has joined #ceph
[14:15] <janos> gah, i have some serious flapping
[14:15] <janos> shutting down. time for my day job ;O
[14:17] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:18] * loicd (~loic@ has joined #ceph
[14:21] * l0nk (~alex@ has joined #ceph
[14:25] <janos> i am unable to unmap an rbd. says device or resource busy. how can i go about discerning what the hang up is?
[14:26] <elder> janos, is there anything mounted on it?
[14:26] <janos> unmounted
[14:26] <elder> kernel rbd image?
[14:26] <janos> just double checked mount -l
[14:27] <janos> i honestly have no idea what the difference is
[14:27] <janos> kernel or not. sorry
[14:27] <janos> i had to modprobe rbd to make it
[14:27] <elder> If you're using qemu or fuse or something I'm less help. The rbd kernel module means it's a kernel image.
[14:27] <elder> /dev/rbdX, right?
[14:28] <janos> yes
[14:28] <elder> What version of the kernel?
[14:28] <janos> 3.7.9-201
[14:28] <janos> with 0.56.3
[14:29] <elder> Did you build the kernel?
[14:29] <janos> no it's fedora 18
[14:30] <janos> just checked, there is a 3.7.9-205 available
[14:30] <elder> I have no idea what the difference between -201 and -205 would be.
[14:30] <janos> me either
[14:30] <janos> i dont imagine much to do with this
[14:30] <janos> ;)
[14:30] <elder> Have you done anything other than mount a file system with it?
[14:30] <elder> I.e., on the device?
[14:31] <janos> mounted to a point where samba was aimed at
[14:31] <janos> then tried loading files on
[14:31] <janos> but it unmounted successfully
[14:32] <absynth> uhmmmm
[14:33] <absynth> that sounds vaguely familiar
[14:33] <absynth> that umount issue
[14:33] <absynth> let me check with oliver
[14:33] <elder> I'm sorry, I'm really not sure what's holding things up. It's possible there's something over on the osd that's reporting something's busy.
[14:33] <elder> And in that case someone else would be better to try to help.
[14:33] <janos> i shut down the cluster
[14:33] <janos> it was flapping horribly
[14:34] <janos> this is a home cluster. 2 hosts, 6 osd's
[14:35] <absynth> oliver says, and i quote verbatim, "kernel fucked. reboot."
[14:35] <janos> hahah
[14:35] <janos> ok
[14:35] <janos> is there a preference or recommendation of kernel rbd versus not?
[14:36] <janos> i really just need to expose a big filestore to some windows machines. i figured rbd mounted to a share, exposed by samba would be fine
[14:36] <janos> if there are other recommendations, i'm game
[14:37] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[15:01] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[15:03] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:03] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:06] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:22] * jskinner (~jskinner@ has joined #ceph
[15:22] * timmclaughlin (~timmclaug@ has joined #ceph
[15:25] <jtang> hrm
[15:42] <joelio> holy moly.. running IOZone benches. 1TB ext4 via rbd into non-osd'd box. Read 4kb blocks 512Mb/2Gb/4Gb- Average: 6513.57 MB/s
[15:44] <fghaas> joelio: you can interpret iozone output without tripping acid? I'm impressed :)
[15:44] <loicd> :-D
[15:45] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[15:45] * ChanServ sets mode +o scuttlemonkey
[15:45] * esammy (~esamuels@host-2-102-71-92.as13285.net) Quit (Ping timeout: 480 seconds)
[15:46] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[15:48] * esammy (~esamuels@host-2-103-102-58.as13285.net) has joined #ceph
[15:50] * drokita (~drokita@ has joined #ceph
[15:52] <joelio> fghaas: It's called phoronix test suite ;)
[15:53] <joelio> sit back and relax (pretend you're working) whilst watching iostat and other important looking things :)
[15:53] <joelio> 64Kb blocks even better
[15:53] <joelio> Average: 8335.19 MB/s
[15:59] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[16:00] * jskinner (~jskinner@ has joined #ceph
[16:03] <infernix> joelio: 8 clients?
[16:05] * jlogan (~Thunderbi@2600:c00:3010:1:24ed:e619:d0aa:3516) has joined #ceph
[16:06] <joelio> infernix: I didn't pass any concurrency options via phoronix. I think it does usual multiples
[16:06] <joelio> I'm getting the same blazing read performance across the board
[16:07] <joelio> pts/iozone-1.8.0 [Record Size: 1MB - File Size: 512MB - Disk Test: Read Performance] - Average: 8198.61 MB/s
[16:07] <joelio> etc..
[16:07] <infernix> yeah but how many clients are you running with it
[16:07] <infernix> you can't get that on one client
[16:08] <joelio> I have 4 nodes, 48 OSDs and 10GbE. the box I'm testing from has no OSDs, but is connected via 10GbE. I can get those results via aggregated b/w on 10g
[16:08] <loicd> cephalobot: help
[16:08] <cephalobot> loicd: (help [<plugin>] [<command>]) -- This command gives a useful description of what <command> does. <plugin> is only necessary if the command is in more than one plugin.
[16:08] <joelio> I have 1tb ext mapped via rbd
[16:09] <infernix> joelio: so you have 8 10gbit nics in the test box?
[16:09] <joelio> no, 1
[16:09] <infernix> so how do you exceed the maximum bandwidth of that 10gbit nic by 8 times?
[16:09] <elder> Holy moly!
[16:09] <joelio> caches
[16:10] <infernix> caches on the client negate the whole point of the test
[16:10] <infernix> you should turn those off
[16:10] <joelio> no caches on the client
[16:10] <infernix> so explain to me how you can squeeze 8GByte/s through a NIC that can only do 1GByte/s
[16:10] <infernix> :)
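infernix's objection comes down to simple arithmetic, which can be sketched as follows. This is a minimal illustration, assuming decimal units and ignoring protocol overhead; the helper name `line_rate_mb_per_s` is made up for this note, and the throughput figures are the ones joelio quoted above. The raw line rate of 10GbE works out to 1250 MB/s, so the "1GByte/s" mentioned in the channel is a rounded practical figure.

```python
def line_rate_mb_per_s(gbit_per_s: float) -> float:
    """Convert a link speed in Gbit/s to MB/s (decimal units, no overhead)."""
    return gbit_per_s * 1000 / 8


# One 10GbE port, as described for the test box.
nic_limit = line_rate_mb_per_s(10)

# Read throughput figures (MB/s) quoted from the IOZone runs above.
reported = [6513.57, 8335.19, 8198.61]

for mb_s in reported:
    ratio = mb_s / nic_limit
    print(f"{mb_s:8.2f} MB/s is {ratio:.1f}x the {nic_limit:.0f} MB/s line rate")
```

Every reported figure exceeds what a single port can carry, which is why some of the reads must be served locally (page cache, rbd cache) or the numbers are being computed differently than assumed.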
[16:11] <joelio> no idea, must be some magic
[16:12] <joelio> write perf is as it has always been. Read perf via IOZone giving those results, consistently
[16:12] * jskinner_ (~jskinner@ has joined #ceph
[16:12] * jskinner_ (~jskinner@ Quit (Remote host closed the connection)
[16:13] <elder> We haven't implemented magic in rbd yet.
[16:13] <elder> It's on the roadmap, but it'll be a while.
[16:13] <joelio> hurry up please
[16:13] <joelio> If you could include wizard hats, that would be great
[16:13] * janos signs up for that
[16:14] <elder> You have to earn those.
[16:14] <joelio> I guess these results need to be explained in some way, as happy as I am with them!
[16:14] <loicd> :)
[16:14] <joelio> will wait for this tranche to finish and then dissect
[16:16] <absynth> what does a support contract buy me
[16:16] <absynth> with regards to magic, pointy hats and wands?
[16:17] <joelio> a year's supply of black cats?
[16:17] <absynth> err
[16:17] <absynth> sorry, i need verification by an inktank employee on that
[16:18] <absynth> my minimum requirement is a wizard's robe, a yearly supply of wizard beard wax and at least three distinct magic spells.
[16:18] <nhm> btw, I've legitimately gotten 1.3GB/s out of a single RBD volume.
[16:18] * nhm goes back to lurking
[16:19] * jskinner (~jskinner@ Quit (Ping timeout: 480 seconds)
[16:21] <joelio> absynth: Some people want the moon on a stick hey..
[16:21] <absynth> what would i do with a moon on a stick?
[16:22] <janos> hahah
[16:22] * timmclau_ (~timmclaug@ has joined #ceph
[16:22] <nhm> absynth: ever heard of code monkeys?
[16:26] * markbby (~Adium@ has joined #ceph
[16:27] <absynth> yes, but...?
[16:28] <nhm> absynth: starting around 3:30: http://www.youtube.com/watch?v=OciobvLQtDw
[16:28] * markbby (~Adium@ Quit (Remote host closed the connection)
[16:29] * timmclaughlin (~timmclaug@ Quit (Ping timeout: 480 seconds)
[16:38] * madkiss (~madkiss@ Quit (Quit: Leaving.)
[16:44] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[16:44] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[16:49] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[16:52] <infernix> joelio: my point is that those speeds are off somewhere, or the calculation is off
[16:52] <infernix> you can't possibly get 6 to 8gbyte/s performance on a single node if that node is connected with a NIC that can only ever do 1gbyte/s
[16:52] <joelio> oh, I understand
[16:52] <infernix> the only explanation would be local client-side caching, or skewed math
[16:53] <joelio> I'll be A/B ing with the local FS once the test run is complete
[16:53] <infernix> well not the only but the most plausible
[16:54] <joelio> strange thing is, this is a freshly installed box with only ceph on it, no other caching or anything setup
[16:54] <infernix> for what it's worth i'm still messing with my python librbd benchmark tool; i'm trying to reduce its memory consumption but it's affecting speeds in a bad way
[16:54] <infernix> joelio: try directio if iozone supports it
[16:55] <infernix> and turn off any rbd caching
[16:55] <joelio> yea, I have other benches running. postmark, fsmark, kernel uncompression etc.. that will give me a better idea
[16:55] <joelio> if the results are outliers (by significant margin) then can remove
[16:55] <joelio> or explain and then remove
[16:57] <joelio> infernix: if you're interested in the strangeness.. https://gist.github.com/anonymous/13a97553c147d2e82d94
[16:58] <joelio> the standard deviation is within limits too, could understand if there was a massive hit and then normal speeds
[16:59] <infernix> joelio: how much ram is in your client box?
[17:00] <joelio> total used free shared buffers cached
[17:00] <joelio> Mem: 64485 1196 63289 0 12 514
[17:00] <joelio> bear in mind, these are 8Gb file size
[17:01] <joelio> so you'd expect to see 8Gb cached usage clientside
[17:01] <joelio> (if it was doing that)
[17:02] <infernix> well perhaps if it deletes it afterwards, it gets freed
[17:02] <joelio> I'm watching it live
[17:02] <joelio> it doesn't move
[17:02] <joelio> (well, not vastly)
[17:08] * BillK (~BillK@124-169-53-48.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:09] * gerard_dethier (~Thunderbi@ Quit (Quit: gerard_dethier)
[17:11] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:14] * timmclau_ (~timmclaug@ Quit (Remote host closed the connection)
[17:14] * timmclaughlin (~timmclaug@ has joined #ceph
[17:15] * timmclau_ (~timmclaug@ has joined #ceph
[17:15] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[17:16] * timmcla__ (~timmclaug@ has joined #ceph
[17:18] * timmclaughlin (~timmclaug@ Quit (Read error: Connection reset by peer)
[17:18] * Philip (~Philip@hnvr-4d079d7d.pool.mediaWays.net) has joined #ceph
[17:18] <Philip> Hi
[17:18] * Philip is now known as Guest626
[17:19] <Guest626> I am currently a gluster-fs user and I am deeply frustrated with the huge number of bugs that are making this fs a nightmare for production usage.
[17:19] <Guest626> I am currently checking whether I can switch to ceph. Here is my scenario:
[17:21] * BillK (~BillK@58-7-60-123.dyn.iinet.net.au) has joined #ceph
[17:21] <Guest626> I store millions of files in the range of 1-20 MB. These files are written once to the fs and from that point they are heavily accessed for web-serving by several different apache frontends. There is a constant stream of new files being added to the fs and there are constantly files being deleted. Do you think that ceph is made for such a scenario?
[17:22] <joelio> Test it
[17:23] <joelio> I wouldn't put my trust in cephfs right now though, but you may find RADOS S3 style better, and more modern
[17:23] <joelio> that's what it was invented for, after all
[17:23] * timmclau_ (~timmclaug@ Quit (Ping timeout: 480 seconds)
[17:24] <Guest626> Yeah, I was evaluating openstack storage as well but the fs-style (mount etc.) would save me lots of reprogramming.
[17:24] <joelio> Also, put varnish or some other reverse proxy in front of your apaches (what I would do anyway)
[17:25] <joelio> Well, you'll be doomed to constantly fix leaking holes if you don't want to reprogram
[17:25] <joelio> to put it bluntly
[17:25] <joelio> :)
[17:26] * dosaboy (~gizmo@faun.canonical.com) Quit (Read error: No route to host)
[17:26] <Guest626> What are the issues you'd have with ceph for that scenario exactly? Gluster for instance is bugged as hell crashing at least one every 3 days :-/
[17:26] * Guest626 is now known as Philip_
[17:26] <joelio> The big bit on the website that says cephfs is not production ready, perhaps?
[17:26] <Philip_> :-)
[17:26] <Philip_> Yeah. Ok thats an argument
[17:27] <joelio> really, you'll probably spend more time trying to shoehorn your design into FS than it would do to engineer it properly to scale
[17:27] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[17:28] <joelio> please don't let me put you off in testing though, everything needs to start somewhere
[17:28] * timmclaughlin (~timmclaug@ has joined #ceph
[17:30] <joelio> Philip_: also, don't confuse ceph == cephfs as it's not
[17:32] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[17:32] * timmcla__ (~timmclaug@ Quit (Ping timeout: 480 seconds)
[17:35] <Philip_> I also think that OpenStack Storage might be the best solution but I have to get rid of Gluster otherwise I will lose my hair and won't have enough time for a proper redevelopment :-/
[17:37] <joelio> hmm, ok,. good luck (I speak from previous experience - you're flogging a dead horse)
[17:39] <Philip_> You think OpenStack Storage is a dead horse? O.O
[17:40] <joelio> No, using a filesystem to store millions of files and pull them across multiple hosts in a clustered fashion
[17:41] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:42] <joelio> Unless you correctly partition the filesystem at the application layer of course
[17:43] <joelio> but why bother :)
[17:43] <Philip_> It seemed so nice in theory :-)
[17:43] * timmclaughlin (~timmclaug@ Quit (Remote host closed the connection)
[17:43] * timmclaughlin (~timmclaug@ has joined #ceph
[17:43] <joelio> yea, pretty much most people on clustered filesystem lists seem to be doing the same kinda thing tbh
[17:43] <joelio> booooooom
[17:45] <jmlowe> Philip_: is the directory structure really that important or is the path really just a key to access a blob of data that you mean to sling out over the wire?
[17:45] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[17:46] <Philip_> The directory structure is just a key. Its like (2 char hash)/(1 char hash)/file_id
[17:46] * timmclaughlin (~timmclaug@ Quit (Remote host closed the connection)
[17:47] <jmlowe> I think that's your answer, why add the complexity and overhead of posix semantics when you really just want a key value system
[17:47] <joelio> +1
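The layout Philip_ describes, "(2 char hash)/(1 char hash)/file_id", can be sketched as below to show why jmlowe calls it a key rather than a directory structure. This is only an illustration: the log does not say which hash is used or how the characters are chosen, so MD5 and the slicing here are assumptions, and `object_path` is a hypothetical helper name.

```python
import hashlib


def object_path(file_id: str) -> str:
    """Derive a '<2 hash chars>/<1 hash char>/<file_id>' path from a file id.

    Assumption: MD5 hex digest, first two characters for the top level and
    the third character for the second level. The real scheme is not given.
    """
    digest = hashlib.md5(file_id.encode("utf-8")).hexdigest()
    return f"{digest[:2]}/{digest[2:3]}/{file_id}"


# The same id always yields the same path, so the path carries no information
# beyond the id itself and could be used unchanged as an object-store key.
key = object_path("12345")
print(key)
```

Since the path is a pure function of the file id, nothing is lost by storing the blob under that string in an S3-style bucket instead of a POSIX tree.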
[17:49] <Philip_> :-) Its true but the fs solution seemed to be way more easy to manage.
[17:49] <Philip_> I'd love to simply use aws s3 but their traffic prices are extraordinary high. :-/
[17:50] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[17:50] <jmlowe> so use radosgw, all the s3 and none of the amazon
[17:50] <joelio> use RADOS and your own kit
[17:50] <joelio> hah, snap
[17:51] <jmlowe> and it's ready for prime time afaik
[17:51] <joelio> yes, definitely
[17:51] <Philip_> Did you already compare rados to openstack storage?
[17:51] <jmlowe> I'm going to stop reading joelio's mind and go eat lunch
[17:52] <Philip_> I've heard so many good things about openstack and am currently installing a test cluster.
[17:52] <joelio> depending on what language you're coding in, there's a million compatible eggs, gems, pears whatever to get S3 handlers in there, so it's not actually *loads* of work
[17:53] <joelio> Philip_: They're the same interface from a client presentation perspective
[17:53] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[17:53] <joelio> just the backend arch is different
[17:53] <Philip_> Yep but they have lots of companies committing code.
[17:54] <joelio> I don't see how that is relevant, nor different to any other project tbh
[17:54] <gregaf> not that many companies contributing to Swift AFAIK </bias>
[17:55] <joelio> haha, yea, Ceph's the future, deal with it :)
[17:55] <joelio> anyway, I think I need a beer
[17:55] <Philip_> Might be foolish but I had that formula: more companies using it/contributing to it = more stability
[17:55] <Philip_> :-)
[17:56] <gregaf> well, you should run your own tests ;)
[17:56] <gregaf> and my experience is lopsided
[17:57] <joelio> Philip_: I understand what you're saying, but that's the same for a lot of projects (Ceph included)
[17:58] <gregaf> but I've talked to more than one group using Swift but desperately looking to move off, and fghaas et al ran some benchmarks which indicated…well, I'd try the Rados Gateway if I were trying Swift
[17:58] <joelio> what you need to do is real world tests
[17:58] <joelio> don't trust what somebody says, or what people commit (as they may be just contributing to areas that are important to them and not the overall product roadmap)
[17:58] <joelio> test it for yourself :)
[17:58] <Philip_> Yep I will clearly do lots of tests. One nightmare (GlusterFS) is enough ;-)
[17:59] <fghaas> in addition to what gregaf says, rgw matches swift in terms of performance, it really comes down to a feature / design comparison
[17:59] <joelio> replication, scalability, availability
[17:59] <joelio> etc.. :)
[18:00] <fghaas> such as, do you want an eventually-consistent system that will pretty much guarantee to always serve you data, even if it later turns out to be inconsistent (swift), or do you prefer something that will rather not serve you data if it can't be 100% sure that it's consistent (ceph/rgw)
[18:00] <fghaas> also, do you need cross-datacenter replication (swift container sync) or not
[18:00] <joelio> good point
[18:01] <fghaas> and do you need amazon s3 api support as a first class citizen (radosgw) vs. as a community contribution (swift3, the s3 compat layer that swift ships with)
[18:01] <joelio> ++1
[18:02] * jskinner (~jskinner@ has joined #ceph
[18:03] * gaveen (~gaveen@ has joined #ceph
[18:04] <fghaas> Philip_: note also that "openstack storage" are two very distinct things, openstack object storage (swift) and openstack block storage (cinder)
[18:04] <fghaas> and with rbd and radosgw, ceph kills those two birds with the same stone, which people often find useful
[18:05] <fghaas> gregaf: actually, the swift developer base is pretty diverse, cf. John Dickinson's LCA talk
[18:06] <gregaf> good to know — I'm not really in that community and I know i have some biased filters through the people who do but I'm not sure how strong they are ;)
[18:10] <fghaas> gregaf: coming to portland this time for openstack summit? I have a feeling I've racked up a beer debt with you over the past few months that I should settle
[18:10] * ScOut3R (~scout3r@54007948.dsl.pool.telekom.hu) has joined #ceph
[18:10] <fghaas> Philip_: that glusterfs "nightmare" you mentioned, was that with UFO or the posix filesystem?
[18:11] <Philip_> posix
[18:11] * jlogan (~Thunderbi@2600:c00:3010:1:24ed:e619:d0aa:3516) Quit (Ping timeout: 480 seconds)
[18:12] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) has joined #ceph
[18:14] * alram (~alram@ has joined #ceph
[18:14] <Philip_> The nightmare will most likely repeat when I will remove hardware from the gluster system I tried to add today..
[18:15] * BillK (~BillK@58-7-60-123.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:15] <Philip_> I am using very expensive servers. Each costs as much as an average car and the system has nearly no load. Once I added two new servers to the cluster it was completely overloaded while it was rebalancing itself at a speed of 50Mbit/s and crashed..
[18:16] * Kioob (~kioob@2a01:e35:2432:58a0:21a:92ff:fe90:42c5) Quit (Quit: Leaving.)
[18:19] <gregaf> fghaas: no….I dunno that I'm going to make it to a summit unless they hold one in LA or we do CephFS integration of some kind
[18:19] <gregaf> too far outside my primary areas of responsibility
[18:24] * dalgaaf (~dalgaaf@charybdis-ext.suse.de) has joined #ceph
[18:26] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:27] <ShaunR> HEALTH_WARN 1 pgs peering; 1 pgs stuck inactive; 160 pgs stuck unclean
[18:27] <ShaunR> how do i go about fixing this?
[18:28] <absynth> are all OSDs up?
[18:28] * l0nk (~alex@ Quit (Quit: Leaving.)
[18:28] <ShaunR> yep, looks like it
[18:29] <absynth> hm
[18:29] <absynth> did you recently have an osd crash?
[18:30] <ShaunR> no, this is a test cluster though so i just brought it back up a few hours ago
[18:30] <ShaunR> i've done that plenty of times though and it's always shown good health after a few minutes
[18:31] <fghaas> ShaunR: still, first step to troubleshoot would be doing a ceph pg dump to see which PGs are stuck, and then look into their OSDs to see if there's any problem there
[18:31] * noob21 (~cjh@ has joined #ceph
[18:32] <absynth> yep
[18:32] <fghaas> suppose all of those PGs have one of their replicas in the same OSD, that's usually a good indication of trouble there
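The check fghaas suggests (see which OSD shows up in most of the stuck PGs) can be sketched as a small tally. This is a hedged illustration only: it does not parse real `ceph pg dump` output, the tuples in `sample` are made-up placeholder data, and `osds_shared_by_stuck_pgs` is a hypothetical helper name.

```python
from collections import Counter


def osds_shared_by_stuck_pgs(pgs):
    """Count how often each OSD appears among PGs that are not active+clean.

    pgs: iterable of (pgid, state, acting_osds) tuples, assumed to have been
    extracted from a PG dump by some other means.
    """
    def healthy(state):
        # A state such as "active+clean+scrubbing" still counts as healthy.
        return {"active", "clean"} <= set(state.split("+"))

    stuck = [pg for pg in pgs if not healthy(pg[1])]
    counts = Counter(osd for _, _, acting in stuck for osd in acting)
    return len(stuck), counts.most_common()


# Made-up example data, for illustration only.
sample = [
    ("0.1", "active+clean", [0, 3]),
    ("0.2", "peering", [2, 4]),
    ("0.3", "active+remapped", [2, 5]),
    ("0.4", "active+degraded", [1, 2]),
]

stuck_count, ranking = osds_shared_by_stuck_pgs(sample)
print(stuck_count, ranking)
```

If one OSD tops the ranking with a count close to the number of stuck PGs, that OSD is the first place to look.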
[18:32] * BManojlovic (~steki@ has joined #ceph
[18:32] <absynth> that... and slow requests. do you have any?
[18:34] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[18:36] <ShaunR> crap, i have to run... I am seeing slow requests in the log but now i'm seeing warnings that the clocks are off 3 seconds
[18:36] <ShaunR> so i'm guessing the clock is the issue
[18:37] <absynth> yeah, clock skew is probably really bad for ceph
[18:37] <fghaas> ttbomk unsynchronized clocks bother the mons, but not so much the osd -- gregaf, is that correct?
[18:38] <absynth> the slow requests are probably another symptom, or maybe the cause?
[18:38] <gregaf> fghaas: yep, that should be the case!
[18:38] <gregaf> clock skew issues could "bother" the OSDs by preventing the monitors from accepting data the OSDs need to put through in order to peer, though
[18:39] * markbby (~Adium@ has joined #ceph
[18:39] <fghaas> that might explain peering taking longer than expected, but not stuck PGs, right?
[18:43] <fghaas> come to think of it, how precisely _does_ an osd keep track of slow requests? epoll_wait()?
[18:44] <gregaf> I'd have to check the code paths on whether it could cause stuck PGs, but actually it might
[18:44] <gregaf> the OSDs keep track of all in-flight requests and just look at the old ones periodically
[18:45] <fghaas> I wasn't doubting that, just curious precisely _how_ they kept track
[18:49] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[18:51] <gregaf> every incoming request has a tracking and information object associated with it; those are put on a list ordered by time of arrival and get updated as it moves through the pipeline; during the OSD "tick" it looks at the beginning of that list and warns about every entry older than the configured time (subject to constraints about maximum number of warnings and doing a backoff for individual requests)
[18:51] <fghaas> so it's actually a wall-clock timestamp that's in that list?
[18:51] <gregaf> it's not that sophisticated conceptually (doesn't need to be!), but you can check out the OpRequest structure and its usage if you're interested in more
[18:51] <gregaf> yeah
[18:52] <gregaf> and the OSD submits the warnings to the central log
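The mechanism gregaf describes can be sketched roughly as below. This is not Ceph's OpRequest code; it is a minimal model under stated assumptions: a list ordered by arrival time, a periodic tick that warns about entries older than a threshold (30 seconds, the default mentioned in the channel), and a cap on warnings per tick. The per-request backoff is left out, and the class and method names are hypothetical.

```python
import time
from collections import deque


class SlowRequestTracker:
    """Toy model of in-flight request tracking with wall-clock timestamps."""

    def __init__(self, warn_after=30.0, max_warnings=5, clock=time.time):
        self.warn_after = warn_after      # seconds before a request is "slow"
        self.max_warnings = max_warnings  # cap on warnings emitted per tick
        self.clock = clock                # injectable so tests can fake time
        self.in_flight = deque()          # (arrival_time, description), oldest first

    def start(self, description):
        """Record a new request at the tail of the arrival-ordered list."""
        op = (self.clock(), description)
        self.in_flight.append(op)
        return op

    def finish(self, op):
        """Drop a completed request from the list."""
        self.in_flight.remove(op)

    def tick(self):
        """Scan from the oldest entry and warn about requests past the threshold."""
        now = self.clock()
        warnings = []
        for arrived, description in self.in_flight:
            age = now - arrived
            if age < self.warn_after or len(warnings) >= self.max_warnings:
                break  # list is ordered by arrival, so the rest are younger
            warnings.append(f"slow request {age:.0f}s old: {description}")
        return warnings
```

Because the ages are computed from a wall clock, a clock that jumps forward on the node would make healthy requests look slow, which is the scenario fghaas raises next; as gregaf notes, that would affect only the warning output.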
[18:52] <fghaas> really? surprised. seeing as I've seen boxes that did utterly crazy stuff in their system clock, that seems like a wonderful issue to track down and debug when it happens in a 200-node cluster :)
[18:53] <gregaf> well, it hasn't been an issue yet, it's pretty coarse-grained, and all the logic is entirely local
[18:54] * terje (~joey@184-96-132-37.hlrn.qwest.net) Quit (Ping timeout: 480 seconds)
[18:54] <gregaf> if you've got a node which is regularly changing its clock by 30 seconds (the default) or even 1 then I think you deserve trouble
[18:54] <gregaf> plus it doesn't actually impact the system operation, just warning output
[18:56] <fghaas> gregaf, ShaunR: http://www.quickmeme.com/meme/3t719b/
[18:56] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Quit: Leaving.)
[18:56] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[18:57] * terje (~joey@63-154-130-243.mpls.qwest.net) has joined #ceph
[18:58] * noob22 (~cjh@ has joined #ceph
[19:00] <fghaas> gregaf: yeah, but people do get confused when they see "slow requests" upward of 60 seconds with nothing apparently wrong with the osds, in that case insane clock jitter would be a good thing to have on the checklist to rule out
[19:00] * dalgaaf (~dalgaaf@charybdis-ext.suse.de) Quit (Quit: Konversation terminated!)
[19:01] * dosaboy (~gizmo@faun.canonical.com) Quit (Quit: Leaving.)
[19:01] * Cube (~Cube@ has joined #ceph
[19:02] * noob21 (~cjh@ Quit (Ping timeout: 480 seconds)
[19:05] * terje (~joey@63-154-130-243.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[19:12] * joelio grabs ntp
[19:12] <joelio> if the auth is based as heavily on kerberos as it mentioned, it's vital
[19:13] <jluis> gregaf, around?
[19:14] * jlogan1 (~Thunderbi@2600:c00:3010:1:24ed:e619:d0aa:3516) has joined #ceph
[19:14] * markbby (~Adium@ Quit (Remote host closed the connection)
[19:15] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:15] <gregaf> jluis: yep
[19:15] <jluis> how would I go about to set categories on a pool?
[19:15] <gregaf> honestly I don't even remember
[19:15] <gregaf> I think those are sjust's thing
[19:15] <lxo> sjust, so, it looks like the osd crash within leveldb I brought up yesterday happens most often at the end of a backfill, presumably as the (former) primary cleans up whatever data it retained about what had to be transferred to a replica and leveldb proceeds to garbage-collect that; still no clue on whether it's a leveldb bug or a symptom of database corruption
[19:16] <gregaf> joelio: the authentication introduces the only real clock sync requirement, yes — although even that isn't strong as the keys are valid for an hour at a time and have some overlap
[19:16] <jluis> gregaf, kay, thanks
[19:16] <jluis> sjust, how would I go about to set categories on a pool? :p
[19:18] <sjust> jluis: standup?
[19:21] * ScOut3R (~scout3r@54007948.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[19:21] * dalgaaf (~dalgaaf@nat.nue.novell.com) has joined #ceph
[19:22] <lxo> is there any kind of leveldb consistency verifier?
[19:22] * terje_ (~joey@63-154-130-243.mpls.qwest.net) has joined #ceph
[19:27] * markbby (~Adium@ has joined #ceph
[19:29] <joelio> gregaf: good to know, thanks. Had a few boxen cry about not being in sync when kerberising. Although less now thanks to the gods of automation
[19:30] <joelio> Dear gPXE, Puppet and Foreman. Thanks for being awesome, Yours Faithfully, Joel
[19:30] * terje_ (~joey@63-154-130-243.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[19:31] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[19:32] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[19:33] <infernix> joelio, nhm, i present rbdbench.py: http://pastebin.ca/2327099 - play with buffersize, threads, disksize; would be very interested to hear some of your results with it
[19:33] <joelio> *fanfare*
[19:34] <joelio> infernix: will test it over the weekend mate
[19:34] <infernix> it's still a bit rough but it does the job
[19:34] <infernix> basically a librbd benchmarker for python that can spawn multiple concurrent processes
[19:34] <infernix> you can also play with object order
[19:35] <sjust> lxo: what does the crash look like?
[19:37] <nhm> infernix: cool!
[19:37] <nhm> infernix: I'm doing rbd tests with fio right now.
[19:38] * nwat (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[19:38] <nhm> infernix: I should be able to do some comparisons at some point.
[19:40] <lxo> sjust, what I said yesterday: it crashes within the string ctor, called by FindShortestPath, called by leveldb::TableBuilder::Add, called in the leveldb compaction thread
[19:40] <nwat> Howdy. On 0.56.3 I'm using mkcephfs with the --mkfs option. On the osd nodes the following command is being run "mkfs.ext4 user_xattr,rw,noatime /dev/sdb1", which doesn't look to be a valid invocation of mkfs. The error "mkfs.ext4: invalid blocks '/dev/sdb1' on device 'user_xattr,rw,noatime'" is produced…
[19:40] <sjust> ugh
[19:40] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[19:41] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[19:41] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit ()
[19:42] <nwat> Oh, is it a typo in the documentation? In the quick start guide: "#osd mkfs options ext4 = user_xattr,rw,noatime". Should that be "osd mount options.." ?
[19:42] <lxo> I'm still trying to catch a crash from a live osd (rather than a crash from an osd as it comes back up after the former)
[19:42] <lxo> although I suspect it won't be much different
[19:48] * krnl (~krnl@catv-213-222-155-93.catv.broadband.hu) has joined #ceph
[19:48] <mjevans> nwat: the # is a comment character, it's telling you what they might be if they were used
[19:48] <krnl> hi
[19:48] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[19:49] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Remote host closed the connection)
[19:49] <nwat> mjevans: "osd mkfs options" vs "osd mount options" the docs have "user_xattr,rw,noatime" as options for the former. I think it should be the latter.
[19:50] <krnl> i installed ceph on a local machine for dev purposes. i'm using the s3 interface from my app. i tried to remove a bucket with radosgw-admin using --purge-objects, where some gigabytes were stored, but the disk usage is the same. am i doing something wrong? shouldn't it have removed the data?
[19:50] <mjevans> nwat: that... seems likely
[19:50] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[19:50] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:53] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:59] * noob22 (~cjh@ Quit (Quit: Leaving.)
[20:01] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Quit: Leaving.)
[20:02] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[20:07] * markbby (~Adium@ Quit (Remote host closed the connection)
[20:08] * dalgaaf (~dalgaaf@nat.nue.novell.com) Quit (Quit: Konversation terminated!)
[20:14] * jjgalvez1 (~jjgalvez@ has joined #ceph
[20:15] * nwat (~Adium@soenat3.cse.ucsc.edu) has left #ceph
[20:21] <yehudasa> krnl: if these are large objects, then you'd need to wait for the garbage collector to run its course
[20:21] <krnl> yehudasa: can i force it some way?
[20:22] <yehudasa> krnl: you can try running 'radosgw-admin gc process' .. but it might be that the objects have some ttl that needs to pass before it could actually do anything
[20:23] <krnl> then i'll wait, thanks for the info!
[20:23] * sagewk (~sage@2607:f298:a:607:e898:ce17:f91a:5bdc) Quit (Quit: Leaving.)
[20:23] * dmick (~dmick@2607:f298:a:607:5416:89:4816:3512) has joined #ceph
[20:23] * sagewk (~sage@2607:f298:a:607:acf1:ce74:be94:3445) has joined #ceph
[20:27] <ShaunR> It was the clock, health is ok now... ceph really hates when the clock is off, even if it's only a few seconds
[20:27] * jjgalvez (~jjgalvez@ has joined #ceph
[20:31] * chutzpah (~chutz@ has joined #ceph
[20:33] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[20:33] * jjgalvez1 (~jjgalvez@ Quit (Ping timeout: 480 seconds)
[20:39] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[20:43] * terje_ (~joey@63-154-130-243.mpls.qwest.net) has joined #ceph
[20:44] <krnl> is it safe to run multiple radosgw hosts on different machines but on the same cluster?
[20:45] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[20:45] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[20:51] * terje_ (~joey@63-154-130-243.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[20:51] * zK4k7g (~zK4k7g@digilicious.com) has joined #ceph
[20:53] <mjevans> krnl: define 'safe' and what does the documentation say?
[21:02] <dmick> krnl: yes
[21:11] * eschnou (~eschnou@29.89-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:13] * terje_ (~joey@63-154-144-150.mpls.qwest.net) has joined #ceph
[21:21] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[21:21] * terje_ (~joey@63-154-144-150.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[21:25] * noob21 (~cjh@ has joined #ceph
[21:28] * loicd (~loic@2a01:e35:2eba:db10:120b:a9ff:feb7:cce0) has joined #ceph
[21:33] * jlogan1 (~Thunderbi@2600:c00:3010:1:24ed:e619:d0aa:3516) Quit (Ping timeout: 480 seconds)
[21:41] * jlogan1 (~Thunderbi@ has joined #ceph
[21:44] * senner (~Wildcard@68-113-232-90.dhcp.stpt.wi.charter.com) has left #ceph
[21:49] * sage (~sage@ has left #ceph
[21:59] <infernix> nhm: your fio tests are against kernel rbd, correct?
[22:01] <nhm> infernix: for now, but I'm probably going to be quickly moving into QEMU/KVM too.
[22:05] * jskinner (~jskinner@ has joined #ceph
[22:07] <infernix> nhm: this is my next step, but i have to tie that into libguestfs
[22:07] <infernix> haven't checked how well that's supported there yet, if at all
[22:07] * krnl (~krnl@catv-213-222-155-93.catv.broadband.hu) Quit (Remote host closed the connection)
[22:09] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[22:22] <ShaunR> nhm: are the fio job files you use available on the net? I'd be curious to see what types of jobs you're using and to run the same jobs to compare notes :)
[22:25] <darkfader> so why weren't there any pictures of the awesome view from your offices
[22:25] <darkfader> tourist fail
[22:29] <dmick> heh
[22:30] * rturk-away is now known as rturk
[22:35] * Cube1 (~Cube@ has joined #ceph
[22:35] * Cube (~Cube@ Quit (Read error: Connection reset by peer)
[22:36] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:38] <nhm> ShaunR: I actually have a test script that I run that wraps fio and passes it commandline parameters rather than using job files
[22:38] <nhm> I can then do parametric sweeps over different parameter spaces
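Note: nhm's wrapper script is not shown in this log. The following is only a minimal sketch of the idea described above, assuming a small hypothetical parameter space and a hypothetical target device path. It builds one fio command line per combination of parameters and prints them instead of running them. The helper name build_fio_commands and the values in SWEEP are illustrative assumptions; the fio options used (--name, --filename, --ioengine, --direct, --size, --runtime, --time_based, --rw, --bs, --iodepth) are standard fio command-line options.

```python
import itertools
import shlex

# Hypothetical parameter space; the values are illustrative only.
SWEEP = {
    "rw": ["read", "write", "randread", "randwrite"],
    "bs": ["4k", "128k", "4m"],
    "iodepth": [1, 16],
}


def build_fio_commands(filename, sweep=SWEEP, size="1g", runtime=60):
    """Return one fio argument list per point in the parameter space."""
    keys = list(sweep)
    commands = []
    # itertools.product yields every combination of the swept values.
    for values in itertools.product(*(sweep[k] for k in keys)):
        point = dict(zip(keys, values))
        name = "-".join(f"{k}{v}" for k, v in point.items())
        cmd = [
            "fio",
            f"--name={name}",
            f"--filename={filename}",
            "--ioengine=libaio",
            "--direct=1",
            f"--size={size}",
            f"--runtime={runtime}",
            "--time_based",
        ]
        # Append the swept options for this point, e.g. --rw=read --bs=4k.
        cmd += [f"--{k}={v}" for k, v in point.items()]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Only print the commands; write jobs against a real device are destructive.
    for cmd in build_fio_commands("/dev/rbd0"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review the sweep before handing each list to something like subprocess.run; the device path /dev/rbd0 is a placeholder for whatever mapped rbd device or test file is being measured.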
[22:42] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:48] <ShaunR> nice, would this script be beneficial to others for testing their own ceph clusters or is it too specific to your deployment?
[22:58] * terje (~joey@63-154-128-47.mpls.qwest.net) has joined #ceph
[22:59] * BillK (~BillK@124-169-167-246.dyn.iinet.net.au) has joined #ceph
[23:00] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[23:03] * Philip_ (~Philip@hnvr-4d079d7d.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[23:05] * eschnou (~eschnou@29.89-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:07] * terje (~joey@63-154-128-47.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[23:09] * terje (~joey@63-154-128-47.mpls.qwest.net) has joined #ceph
[23:10] * wer (~wer@wer.youfarted.net) Quit (Ping timeout: 480 seconds)
[23:10] * mcclurmc_laptop (~mcclurmc@client0637.vpn.ox.ac.uk) has joined #ceph
[23:11] * ScOut3R (~ScOut3R@catv-89-133-43-117.catv.broadband.hu) has joined #ceph
[23:17] * terje (~joey@63-154-128-47.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[23:18] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[23:26] * The_Bishop_ (~bishop@f052096075.adsl.alicedsl.de) Quit (Quit: Who the hell is this Peer? If I catch him, I'll reset his connection!)
[23:31] * jskinner (~jskinner@ Quit (Remote host closed the connection)
[23:35] <lxo> hmm, interesting... I was creating a hardlink farm on a ceph-fuse-mounted mountpoint that I expected to be running on a ceph.ko mountpoint, and it was reporting that some files were missing. I checked in the ceph.ko mountpoint and they were there
[23:35] <lxo> then, just as I created the links within the ceph.ko mountpoint, other links failed in the ceph-fuse mountpoint
[23:38] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) Quit (Quit: Leaving)
[23:44] * jlogan1 (~Thunderbi@ Quit (Read error: Connection reset by peer)
[23:44] * jlogan1 (~Thunderbi@2600:c00:3010:1:217f:2c08:a1d4:e762) has joined #ceph
[23:46] * gaveen (~gaveen@ Quit (Quit: Leaving)
[23:47] * wer (~wer@211.sub-70-192-207.myvzw.com) has joined #ceph
[23:48] * wer (~wer@211.sub-70-192-207.myvzw.com) Quit (Remote host closed the connection)
[23:50] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:50] <lxo> how very odd... I filled in the web form for the ceph survey, but I can't find my response in the csv file!
[23:52] * wer (~wer@211.sub-70-192-207.myvzw.com) has joined #ceph
[23:52] <gregaf> lxo: rturk did anonymize the data, are you sure it's not just that?
[23:53] <rturk> hmm
[23:53] * rturk pulls up the responses
[23:53] <rturk> I can check for your response - how will I identify it?
[23:54] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[23:56] <lxo> 12TB cluster used mainly for backups right now, with a mention of future plans to use rbd. running on BLAG/Fedora servers, using that, gNewSense and Parabola as ceph.ko and ceph-fuse clients. I put in my email address @gnu.org IIRC
[23:56] <lxo> (but I realize the email addresses didn't make it to the .csv file :-)
[23:57] * mcclurmc_laptop (~mcclurmc@client0637.vpn.ox.ac.uk) Quit (Ping timeout: 480 seconds)
[23:57] <lxo> anyway, it doesn't seem like my responses were particularly unusual. I wonder if the form failed to be recorded because of NoScript or somesuch. it didn't seem to require JavaScript, but..
[23:57] <rturk> ok, let me look
[23:57] <rturk> ya, that's possible…polldaddy probably does some scripty stuff
[23:57] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[23:58] <lxo> oh well... no significant loss for either of us ;-)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.