#ceph IRC Log


IRC Log for 2012-05-14

Timestamps are in GMT/BST.

[0:27] * lofejndif (~lsqavnbok@9KCAAE45N.tor-irc.dnsbl.oftc.net) has joined #ceph
[0:43] * Theuni (~Theuni@ has joined #ceph
[1:35] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:36] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (Quit: eternaleye)
[1:37] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[1:43] * aa_ (~aa@r190-135-36-218.dialup.adsl.anteldata.net.uy) has joined #ceph
[1:50] * andresambrois (~aa@r186-52-182-108.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[1:50] * brambles (brambles@ Quit (Ping timeout: 480 seconds)
[1:59] * brambles (brambles@ has joined #ceph
[2:23] * lofejndif (~lsqavnbok@9KCAAE45N.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[2:50] * yoshi (~yoshi@p3167-ipngn3601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[3:39] * Qten (~Q@ppp59-167-157-24.static.internode.on.net) Quit (Quit: Leaving)
[5:55] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[6:38] * f4m8_ is now known as f4m8
[6:39] * CristianDM (~CristianD@host217.190-230-240.telecom.net.ar) Quit (Ping timeout: 480 seconds)
[6:40] * CristianDM (~CristianD@201-213-234-191.net.prima.net.ar) has joined #ceph
[7:18] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[7:33] * Theuni (~Theuni@ Quit (Quit: Leaving.)
[7:41] * CristianDM (~CristianD@201-213-234-191.net.prima.net.ar) Quit ()
[8:24] * Theuni (~Theuni@ has joined #ceph
[8:27] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[8:54] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:01] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:03] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[9:29] * gregorg (~Greg@ has joined #ceph
[9:40] * BManojlovic (~steki@ has joined #ceph
[9:59] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Ping timeout: 480 seconds)
[10:00] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:03] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[10:53] * hijacker (~hijacker@ Quit (Remote host closed the connection)
[10:53] * Ryan_Lane (~Adium@c-98-210-205-93.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:55] * kavit (kavit@ has joined #ceph
[11:01] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[11:01] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:45] * brambles (brambles@ Quit (Quit: leaving)
[11:46] * brambles (brambles@ has joined #ceph
[12:20] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[12:29] * dfdsav (dfdsav@ has joined #ceph
[12:42] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:05] * yoshi (~yoshi@p3167-ipngn3601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:11] * hijacker (~hijacker@ has joined #ceph
[13:14] * kavit (kavit@ Quit (Quit: Leaving)
[13:23] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[13:29] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[13:43] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Read error: Connection reset by peer)
[13:45] * aa__ (~aa@r190-135-42-26.dialup.adsl.anteldata.net.uy) has joined #ceph
[13:52] * aa_ (~aa@r190-135-36-218.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[14:03] * hijacker (~hijacker@ Quit (Remote host closed the connection)
[14:05] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[14:08] * eier (~turtle@ Quit ()
[14:20] * hijacker (~hijacker@ has joined #ceph
[14:40] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[14:40] * steki-BLAH (~steki@ has joined #ceph
[14:48] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[14:48] <nhm> good morning #ceph
[14:49] <liiwi> good afternoon
[15:00] * verwilst_ (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[15:00] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Read error: Connection reset by peer)
[15:15] * Theuni (~Theuni@ Quit (Quit: Leaving.)
[15:15] * Theuni (~Theuni@ has joined #ceph
[15:16] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[15:16] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[15:17] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[15:25] * dpejesh (~dholden@ has joined #ceph
[15:32] * Theuni (~Theuni@ Quit (Remote host closed the connection)
[15:32] * Theuni (~Theuni@ has joined #ceph
[15:56] * f4m8 is now known as f4m8_
[16:35] * hijacker (~hijacker@ Quit (Remote host closed the connection)
[16:41] * joao (~JL@aon.hq.newdream.net) has joined #ceph
[16:41] <joao> hi all
[16:44] * cattelan_away is now known as cattelan
[16:45] <nhm> morning Joao
[16:45] <joao> morning nhm
[16:45] <elder> J-Wow I sent you a message about Cscope
[16:45] <nhm> joao: Have a fun weekend?
[16:46] <joao> how was your flight back?
[16:46] <joao> nhm, I did, but I can barely feel my feet
[16:46] <joao> went all the way up and down to the observatory on foot
[16:46] <nhm> joao: heh, I bet! I heard you had lots of plans. Plane flight was good. They gave me a good seat with extra leg room.
[16:47] <joao> elder, thanks :)
[16:47] <elder> At least you were wearing your sneakers.
[16:48] <joao> yeah
[16:48] <joao> wouldn't have tried to pull that off with my boots
[16:51] * Theuni (~Theuni@ Quit (Ping timeout: 480 seconds)
[17:05] * steki-BLAH (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:05] * joao (~JL@aon.hq.newdream.net) Quit (Read error: Connection reset by peer)
[17:07] * joao (~JL@aon.hq.newdream.net) has joined #ceph
[17:16] <joao> elder, it seems one can also use ctags along with cscope in vim
[17:17] * bchrisman1 (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:17] <joao> which kicks ass when it comes to see a single class' methods and symbols
[17:17] <elder> Yes.
[17:18] <elder> Just remember whether to use ^] or ^\
[17:18] <elder> I'm rarely in C++ code so I don't understand the distinction though...
[17:36] <gregaf> why are they wasting legroom on nhm? I could have used it in all my flights :(
[17:42] <elder> He may have saved some.
[17:42] <elder> Ask him if he has any left over next time you fly.
[17:43] <gregaf> heh, I will!
[17:44] <nhm> gregaf: I just asked if they had anything available toward the front of the plane and got a bulkhead seat.
[17:45] <gregaf> I thought most people except Southwest used all that space for the first class cabins
[17:45] <gregaf> and everybody gives you no storage space for it, since you don't get under-the-seat or an accessible overhead compartment
[17:46] <nhm> gregaf: this was on suncountry, and they just had a little curtain for first class.
[17:46] <nhm> I think we had more legroom than most of the first class seats honestly.
[17:46] <nhm> And still could put our stuff under the first class seats (though it was a little weird since there were only 2 storage spots for 3 seats).
[17:47] <gregaf> suncountry?
[17:47] <nhm> gregaf: yeah, it's an airline out of MSP that flies direct to LA.
[17:47] <gregaf> never heard of them, do they just do the one route or something?
[17:47] <nhm> gregaf: they go from MSP to various places, but only like 1-2 flights per day.
[17:48] <gregaf> ah, gotcha
[17:48] <nhm> it's cheap and direct though.
[17:49] <nhm> They fly faster than delta too. Same flight takes like 30 mins less going sun country.
[17:49] * Theuni (~Theuni@ has joined #ceph
[17:49] <nhm> they must need the planes more than they need to save fuel.
[17:50] * BManojlovic (~steki@ has joined #ceph
[17:50] * lofejndif (~lsqavnbok@19NAAISI1.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:50] <gregaf> yeah, or they could conceivably even have different airspace
[17:51] <gregaf> (though that may be a fiction that I just made up; I don't really know domestic flight regulations)
[17:54] <elder> Maybe they use additional dimensions.
[17:57] <nhm> elder: it's the spice
[18:00] * Theuni (~Theuni@ Quit (Remote host closed the connection)
[18:00] * Theuni (~Theuni@ has joined #ceph
[18:02] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:05] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:15] <nhm> sage: with 2 SSD osds (and 2 SSD journals), I just saw one of those 1-2s pauses with no write throughput from the client. Got debugging on, but wasn't running blktrace.
[18:16] <nhm> Otherwise the run was nice. Aggregate of about 300MB/s with 4 OSDs.
[18:17] <nhm> So about a 82% improvement over two OSDs.
[18:18] * joshd (~joshd@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[18:27] * The_Bishop (~bishop@cable-86-56-102-91.cust.telecolumbus.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[18:28] * joshd (~joshd@p4FD06AB7.dip.t-dialin.net) has joined #ceph
[18:30] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[18:41] * aa__ (~aa@r190-135-42-26.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[18:41] * Theuni (~Theuni@ Quit (Quit: Leaving.)
[18:42] * Theuni (~Theuni@ has joined #ceph
[18:43] * bchrisman (~Adium@ has joined #ceph
[18:43] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[18:45] * Oliver1 (~oliver1@ip-176-198-98-219.unitymediagroup.de) has joined #ceph
[18:49] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[18:59] <elder> sagewk, I'm going to push those messenger patches into the "testing" branch. I'm going to then rebase "testing" so the XFS fix is last (again).
[18:59] <sagewk> ok!
[18:59] <elder> The end result should be testing = master + 3 messenger patches + xfs fix
[19:00] <elder> I'll try to identify a couple of for-linus fixes, but we're almost out of time for that.
[19:01] <sagewk> yeah
[19:02] <elder> Actually, you inserted your crush changes into master ahead of my rbd changes.
[19:03] <elder> I think you did the merge into testing a week ago. Any reason not to push those rbd changes into master that you can think of?
[19:04] <sagewk> nope
[19:04] <elder> OK, I'll fix that up too.
[19:04] <sagewk> cool
[19:17] * nhorman_ (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[19:18] * nhorman_ (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Read error: Connection reset by peer)
[19:20] * CristianDM (~CristianD@host217.190-230-240.telecom.net.ar) has joined #ceph
[19:22] * lofejndif (~lsqavnbok@19NAAISI1.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[19:24] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:27] <Tv_> waah people emailing individuals and not the list
[19:27] <nhm> Tv_: about?
[19:30] * Ryan_Lane (~Adium@ has joined #ceph
[19:37] * aa__ (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[19:37] <CristianDM> Hi Tv_
[19:38] <CristianDM> This weekend I setup ceph without cephx and this work fine now.
[19:38] <Tv_> nhm: just for general help
[19:38] <CristianDM> My issue happen when enable cephx
[19:40] * stan_theman (~stan_them@cumulonim.biz) has joined #ceph
[19:45] <yehudasa> nhm: what machines are supposed to run rgw for the big cluster?
[19:57] <sagewk> elder: there?
[19:57] <sagewk> yehudasa: lock a few plana to do it
[20:04] <nhm> yehudasa: don't know for the big cluster. I've got a couple plana nodes locked for the aging cluster.
[20:04] <nhm> I can free up some plana nodes too
[20:20] <nhm> yay for down nodes.
[20:23] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[20:27] * stass (stas@ssh.deglitch.com) has joined #ceph
[20:29] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[20:31] <elder> sage, I'm here now.
[20:31] <elder> Just got lunch.
[20:33] <sagewk> elder: i have a question about the best way to deprecate a field in the ioctl struct..
[20:33] <elder> OK.
[20:33] <elder> What field, for context?
[20:33] <sagewk> the feature is removed, so the value is basically -1 always. should the ioctl EINVAL if you don't specify that? or silently ignore the entire field?
[20:34] <sagewk> struct ceph_ioctl_layout, preferred_osd
[20:35] <elder> Old code will be (potentially) providing specific values, right?
[20:35] <elder> And in that case, they would never expect to get EINVAL. I'd say it would be best to ignore the value
[20:35] <sagewk> 3469ac1aa3a2f1e2586a412923c414779a0af854 did the wrong thing because it no longer initializes it, but still enforces it be -1, which means set + get + set will fail.
[20:36] <sagewk> yeah, ok.
[20:36] <elder> Would there ever have been a case where passing a value to that field would change behavior in a user-visible way?
[20:36] <sagewk> the data would end up somewhere else in the cluster, but they would only notice that by peeking behind the curtain
[20:36] <sagewk> so, no
[20:36] <sagewk> also, there are no known users. :)
[20:36] <elder> If not I think ignoring it is definitely the way to go. If so, there may be an argument for returning an indication to the user that they can't expect it to work.
[20:36] <elder> I think ignoring is best then.
[20:37] <sagewk> ok cool, thanks
[20:37] <elder> But I think you should enforce -1 at the interface if possible.
[20:37] <elder> I.e., always return -1 there (or 0).
[20:37] <sagewk> good idea.
[20:37] <sagewk> cool, already do that.
[20:38] <elder> And pass a well-defined value (-1?) down to the kernel in the process of ignoring what the user provided.
[20:38] <elder> Anyway, sounds like you've got it.
[20:39] <sagewk> elder: now the question is whether you want to squash that into the patch that's already in master :)
[20:39] * danieagle (~Daniel@ has joined #ceph
[20:39] <elder> Nope. Let's just tack it onto the end.
[20:39] <sagewk> see fix-preferred-osd branch
[20:40] <elder> Shouldn't be a big deal. There'll be a small window of commits where the final behavior isn't in place but that's OK.
[20:40] <elder> Next release will have the whole thing.
[20:40] <sagewk> yep
[20:40] <elder> I'll look.
[20:40] <sagewk> if it looks ok it should go in testing branch
[20:41] <elder> Please rebase it on what I have out there as master-test
[20:41] <elder> I'm just doing a sanity test on it before committing it.
[20:41] <elder> And testing will be a few commits on top of that. Actually, it shouldn't matter much, I'm sure a cherry-pick would come out clean.
[20:45] <nhm> poor aging clusters are aging.
[20:54] <elder> nhm Crucial M4 128GB or OCZ Vertex 3 120 for $99.99. Which would you choose?
[20:55] * ssedov (stas@ssh.deglitch.com) has joined #ceph
[20:56] <nhm> elder: I'd take the M4.
[20:56] * stass (stas@ssh.deglitch.com) Quit (Remote host closed the connection)
[20:56] <elder> OK. I'm probably not ready to pounce yet, but I'm sort of getting educated while I decide...
[20:56] <nhm> elder: If you can swing a Vertex4, they should be quite fast.
[20:57] <nhm> elder: I bought the sandisk extreme myself.
[20:58] <elder> I'm not sure whether I want to invest more in that laptop given recent events... But it would be very nice to boot faster.
[20:58] <elder> I could learn much more quickly that X windows isn't working.
[20:58] <elder> (I do think I've solved that, though.)
[20:58] <nhm> that's true. You could reduce your debugging time significantly! ;)
[20:59] <elder> Also contemplating putting one on my speedy machine for my home directory for faster builds.
[20:59] <nhm> yeah, theoretically I'm going to replace the system disk in my desktop with this, whenever I actually have time to do that.
[21:00] <elder> Vertex 4 is about $135 for 128GB
[21:01] <elder> My laptop only has 3Gbit SATA I think. My desktop probably has 6Gbig.
[21:01] <elder> Gbit
[21:01] <nhm> That's pretty nice. It was like $160 when I looked.
[21:01] <nhm> prices just keep coming down.
[21:05] <Oliver1> nhm: "should be quite fast"… OCZ V3 240GiB on a MacBook: 450/380 MB/s after a couple of months, define "fast" ;) (began with 520/480 IIRC)
[21:06] <nhm> Oliver1: If I recall I think the vertex4 was faster without compression, but I could be wrong.
[21:07] <nhm> maybe higher iops too.
[21:07] <Oliver1> nhm: Hope you are wrong… I have no further investment plans :-D
[21:08] <nhm> lol, I think you'll be fine. ;)
[21:08] <Oliver1> Hehe...
[21:09] <Oliver1> Def. satisfied with performance, this thing here is bleeding fast...
[21:11] <nhm> Yeah, pretty crazy how things are changing. I'm really curious how HP will do with their memristor stuff.
[21:16] <Oliver1> Yeah, <sentimental>First computer C64 with a tape-drive… dunno what throughput ;-) </sentimental>
[21:19] <Oliver1> Mhm… thinking of some youngsters now asking well-known search engines for the meaning of "c64"… ;)
[21:19] <joshd> he's a good dj :)
[21:20] <Oliver1> The DJ's my best friend… ;)
[21:20] <nhm> C64? I had a tandy color computer 2. :(
[21:21] <elder> sagewk, ceph-client testing and master branches are now updated.
[21:22] <Oliver1> nhm: wow… uhm… great. ;) So we all have a "past". Cool.
[21:23] <nhm> Oliver1: The only thing I really remember about it was playing some really bad cookie monster game on it.
[21:23] <nhm> And my dad using it to autodial to get world series tickets.
[21:24] <Oliver1> well, hope you don't have the monsters coming back in your nightmares… should we talk? :-D
[21:27] <nhm> Oliver1: That game was probably on the order of "E.T. Lizard Creature and Me".
[21:29] <joshd> Oliver1: could you install the wip-objecter-throttle branch? it should fix the stale vms problem
[21:34] <sagewk> elder: repushed to address your comments and added your reviewed-by
[21:34] <elder> OK.
[21:34] <elder> Want me to take them in and get them into master/testing for you?
[21:49] <nhm> So with 2 Nodes, and SSD backed OSDs/Journals (XFS), we get 163MB/s with 2 OSDs, and around 466MB/s with 5.
[21:50] <nhm> sorry, with 10
[21:50] <nhm> 1 per node vs 5 per node.
[21:50] <nhm> this is with rados bench running 16 concurrent 4MB requests.
[22:01] * lofejndif (~lsqavnbok@28IAAEP5Q.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:04] <gregaf> nhm: you should turn up the concurrency; if you've got 10 nodes and 16 requests you've barely got any pipelining at all going on
[22:07] <nhm> gregaf: you mean 10 OSDs?
[22:07] <gregaf> yeah
[22:08] <gregaf> assuming you're only using 1 bench run, anyway… if you've got a client per OSD then n/m
[22:10] <nhm> gregaf: yeah, I didn't really try much with it yet, just playing with it while I'm waiting for the aging test stuff to go. Bumping that up to 64 concurrent requests from one client pushes it up to around 500MB/s.
[22:10] <nhm> gregaf: I'll probably try multiple clients next.
[22:10] <gregaf> that's odd; I wonder what's holding the scaling down
[22:11] <gregaf> you have a reasonable number of PGs there, right?
[22:12] <nhm> gregaf: looks like there are 640.
[22:13] <gregaf> yep, that'd be reasonable
[22:13] <gregaf> grrr, argh
[22:13] <nhm> gregaf: could be client side. I'll see what 2 clients can do.
[22:15] <nhm> hrm, maybe looks marginally better.
[22:17] <nhm> about 514MB/s aggregate.
[22:18] <gregaf> so we go from 163MB/s/OSD to 102MB/s/OSD
[22:19] <gregaf> which is only about half the network bandwidth, right?
[22:19] <nhm> gregaf: yep
[22:19] <gregaf> oh, wait, no!
[22:19] <gregaf> we're network-limited on the physical nodes at that point, right?
[22:19] <nhm> ah, you are correct!
[22:20] <nhm> I had forgotten that the osds and clients are sharing the back network.
[22:21] <gregaf> each physical node is receiving about 250MB/s straight from clients, plus another 250MB/s from the other physical node, plus sending out 250MB/s, which isn't quite full line-rate but is getting close enough that I bet it's overhead and timing collisions
[22:21] <gregaf> (I dunno if 10Gbe is full- or half-duplex?)
[22:21] <dmick> full
[22:21] <dmick> at least in theory
[22:21] <nhm> gregaf: actually, never mind. that should be full duplex.
[22:21] <elder> Everything above 10Mbit is full I think.
[22:22] <gregaf> oh, bah
[22:22] * Theuni (~Theuni@ Quit (Quit: Leaving.)
[22:22] <gregaf> I don't think 100Mb was either, but I could be wrong
[22:22] * Theuni (~Theuni@ has joined #ceph
[22:22] <darkfader> everything with a switch
[22:22] <darkfader> you could have 100/half and 10/full
[22:22] <dmick> 100M was, yes, with a switch
[22:22] <dmick> 10/full was pretty uncommon in my experience, but possible
[22:23] <dmick> but mostly because no one was using switches for that
[22:23] <gregaf> hah, Wikipedia says 10GbE is the first with only full-duplex operation :p
[22:23] <nhm> gregaf: Later once I have the congress stuff going again I'll do tests with varying numbers of OSDs/node and we can look at what the plots look like.
[22:23] <gregaf> but that doesn't help here :(
[22:23] <elder> So full.
[22:23] <dmick> does this look like a known symptom before I dig more?
[22:24] <dmick> -1> 2012-05-11 08:58:22.950254 7ffb7c0de700 -1 FileStore: sync_entry timed out after 600 seconds.
[22:24] <dmick> ceph version 0.45-329-g05b4fb3 (commit:05b4fb33a1c8727089cf4283a612b94b95981cd1)
[22:24] <nhm> gregaf: I've got lots of debugging on, maybe that's playing a bigger part with such high speeds.
[22:24] <dmick> (I imagine all it means is "some request didn't finish" but maybe there's a common cause that's been fixed)
[22:25] <gregaf> nhm: I assume that the ratio there doesn't change since you're scaling both daemons and disks, and there's plenty of CPU to go around
[22:25] <joshd> dmick: it's a request to the underlying fs that didn't finish, so the common cause is usually btrfs
[22:25] * Theuni1 (~Theuni@ has joined #ceph
[22:25] <gregaf> dmick: that's a call to sync() or similar taking more than 10 minutes
[22:25] <dmick> ext4, if it matters
[22:25] <gregaf> ie bad, but not Ceph's fault
[22:25] <gregaf> it's Ceph asserting out on the assumption that the disk disappeared out from under it or something
[22:25] <nhm> gregaf: yeah. lots of stuff I need to look at. Maybe the controller is angry, or maybe it's debugging.
[22:26] * Theuni1 (~Theuni@ Quit ()
[22:26] <dmick> there was some other stuff before: -12> 2012-05-10 15:39:25.342031 7ffb749c2700 0 -- >> pipe(0x28e9c80 sd=38 pgs=79 cs=71 l=0).fault with nothing to send, going to standby
[22:26] * zykes (~zykes@184.79-161-107.customer.lyse.net) has joined #ceph
[22:26] <dmick> ...accept connect_seq 74 vs existing 73 state 3
[22:26] <dmick> etc.
[22:26] <gregaf> nhm: heh, I was just sad at the scaling numbers, you can do your thing :)
[22:26] <liiwi> Tv_: hola, ltns
[22:26] <zykes> Is it possible for a 2 node Ceph install ?
[22:27] <gregaf> joshd: oh, I forgot to remind you that you're on support watch :p
[22:27] <liiwi> &Greetings folks
[22:27] <joshd> gregaf: that's what the calendar does
[22:27] <gregaf> it seems to work better with some people than others so I like to say so as well
[22:27] <nhm> gregaf: Oh, don't worry. It's good to have you guys getting fired up. ;)
[22:28] <joshd> zykes: you can run everything on one node if you're just testing
[22:28] <zykes> joshd: I got a small OpenStack demo cluster with 2 nodes
[22:28] <zykes> would that be doable ?
[22:28] <joshd> yeah
[22:28] <zykes> any guides ?
[22:29] <joshd> not yet, but soon
[22:29] <zykes> :/
[22:29] <zykes> has any of you done openstack with ceph ?
[22:29] <joshd> it's not too hard if you're running ubuntu 12.04
[22:30] <zykes> ok, how ?
[22:30] <joshd> I wrote the integration
[22:30] * Theuni (~Theuni@ Quit (Ping timeout: 480 seconds)
[22:30] <zykes> wiht openstack ?
[22:30] <joshd> yeah
[22:30] <zykes> howto then ?
[22:32] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:32] <nhm> gregaf: turning off debugging got us up to around 532MB/s.
[22:32] <joshd> first install and configure ceph with no mds (not needed for rbd): http://ceph.com/docs/master/install/
[22:32] <zykes> mds
[22:32] <zykes> ?
[22:32] <joshd> metadata server
[22:33] <zykes> k
[22:33] <joshd> it's for the posix-compliant filesystem layer (ceph dfs)
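The "ceph with no mds" setup joshd describes could look roughly like the sketch below: a hypothetical minimal ceph.conf for an RBD-only cluster across two nodes, with no [mds] section at all. Host names, paths, and the exact option spellings are placeholders in 0.4x-era style; the install docs linked above are authoritative:

```ini
; Hypothetical minimal ceph.conf for RBD-only use: no [mds] section,
; since the metadata server only backs the POSIX filesystem layer.
[global]
    auth supported = none   ; or "cephx" once auth is working
[mon.a]
    host = node1
    mon data = /var/lib/ceph/mon.a
[osd.0]
    host = node1
    osd data = /var/lib/ceph/osd.0
[osd.1]
    host = node2
    osd data = /var/lib/ceph/osd.1
```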
[22:33] <Tv_> liiwi: oh wow. hi.
[22:34] <liiwi> Tv_: heh, rewind 15 years and go y!
[22:36] <joshd> zykes: what's your use case for ceph and openstack?
[22:38] <zykes> joshd: instance storage
[22:39] <zykes> what's that compared to rados then joshd ?
[22:40] <joshd> rados is an object store that provides strong consistency and a bunch of advanced features like custom transactions and snapshots
[22:40] <joshd> rbd is a block device that is striped over rados objects
[22:41] <joshd> rbd is the part that's integrated for nova volume and image storage
[22:41] <joshd> currently there's no way to create a volume from existing data via openstack, but that should change in folsom
[22:42] <zykes> joshd: so with RBD I can't use swift for an image backend ?
[22:43] <zykes> well that kinda sucks
[22:43] <joshd> zykes: you can
[22:43] <zykes> k
[22:43] <zykes> you mean restoring say a snapshot to ceph?
[22:44] <joshd> in the future rbd will have copy-on-write cloning though, so it'll be nice to store your images and instances in the same storage cluster
[22:45] <joshd> rados also has a http gateway (radosgw) that provides the s3 and swift apis
[22:46] <joshd> the thing with volumes in openstack is that currently you have to fill them with content yourself; openstack doesn't know how to read/write to them, only how to connect them to a vm
[22:47] <joshd> you can hack around that for now, by creating a volume and uploading content to it outside of openstack, but by folsom it should be as easy to use volumes as it is to use local storage
[22:49] <dmick> elder: samf thinks you might have some interest in a cephfs crash that ken.franklin experienced
[22:49] <elder> Well, possibly.
[22:49] <dmick> machine is rebooting, but I'll have some log messages in a minute
[22:49] <elder> I have plenty to do. :)
[22:49] <elder> Great.
[22:49] <elder> I'll look.
[22:49] <dmick> yeah, I know. expecting the answer "uh yeah well something went wrong but I don't have time to dig further", but just in case...
[22:49] <elder> I have to leave in about half an hour, but if we can get a bug opened and stash some stuff there maybe that will help.
[22:50] * The_Bishop (~bishop@p4FCDFFCB.dip.t-dialin.net) has joined #ceph
[22:53] <gregaf> dmick: was Ken running the kernel client and OSD on the same node?
[22:53] <gregaf> that would explain the sync timeout
[22:54] * stan_theman (~stan_them@cumulonim.biz) has left #ceph
[22:56] <dmick> yes
[22:57] <dmick> what's the issue there?
[22:57] <dmick> I've heard rumors about locks but I don't know the story from a to z
[22:57] <gregaf> the OSD issues a sync(), which then makes the kernel client try and sync
[22:57] <dmick> sam mentioned this, but also mentioned syncfs
[22:58] <gregaf> and that doesn't work if the client has any dirty data for the local OSD, since it sends out a request that the OSD can't make use of
[22:58] <gregaf> yes, but I'm not sure that glibc actually supports syncfs yet?
[22:58] <dmick> mm
[22:58] <gregaf> it would need to be a new kernel and a new glibc
[22:58] <gregaf> could whip up a tester for it, I assume, but given the evidence that's what my bet is
[22:58] <dmick> man: syncfs() first appeared in Linux 2.6.39.
[22:58] <dmick> hm
[22:59] <dmick> diggin
[23:01] * Theuni (~Theuni@ has joined #ceph
[23:13] <Tv_> sagewk: fyi i'm filing tickets for the things we talked about, and new ones..
[23:13] <Tv_> sagewk: as part of prepping for the sprint planning
[23:14] <sagewk> perfect
[23:17] <zykes> joshd: what you mean ?
[23:17] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[23:17] <zykes> that you could take like a thing from swift and populate a ceph vol with ?
[23:17] <zykes> or what can't I do now with it?
[23:17] <sagewk> gregaf: it does. grep SYNCFS src/acconfig.h to see if it was detected
[23:17] <sagewk> dmick: ^
[23:17] <joshd> zykes: there's no way for openstack to create a volume with the contents of a given image right now
[23:18] <zykes> ok
[23:18] <zykes> but it can snapshot it and so on ?
[23:18] <zykes> and restore
[23:19] <joshd> zykes: yes, but you can't create a volume from a snapshot in openstack yet
[23:20] <dmick> sagewk: yes, was trying to discover from some binary
[23:20] <dmick> trying to figure out where FileStore is packaged
[23:20] <joshd> zykes: you can create a volume, remove the underlying rbd volume, and copy a snapshot to replace it (rbd cp --snap=blah vol1 vol2) or upload content from a local file (rbd import filename imagename)
[23:23] <joshd> zykes: I don't think the openstack volume api has rollback built in either
[23:27] <dmick> sagewk: looks a lot like syncfs is not in the 0.45-329-g05b4fb3 binary
[23:27] <zykes> it has snapshots joshd
[23:27] <dmick> difficult because of inline, but I'm 90%
[23:27] <sagewk> dmick: that means the glibc it built on didn't have syncfs...
[23:27] <dmick> right
[23:27] <dmick> modulo bugs
[23:27] <sagewk> maybe checkyour local build and see if you have it there
[23:27] <sagewk> grep SYNCFS src/acconfig.h
[23:28] <dmick> #define HAVE_SYNCFS 1
[23:28] <dmick> #define HAVE_SYS_SYNCFS 1
[23:28] <sagewk> k
[23:28] <dmick> but I'm now on 12.04
[23:28] <dmick> so that doesn't prove as much as we'd like
[23:28] <sagewk> only that the ./configure check works
[23:28] <sjust> sam@plana20:~/ceph$ grep SYNCFS src/acconfig.h
[23:28] <sjust> /* #undef HAVE_SYNCFS */
[23:28] <sjust> /* #undef HAVE_SYS_SYNCFS */
[23:28] <sjust> on 11.10
[23:28] <zykes> sudo apt-get install ceph < joshd ?
[23:28] <dmick> mm
[23:28] <joshd> zykes: yeah, you can create or delete a snapshot, but I don't think there's a way to rollback in the api
[23:29] <joshd> zykes: yeah
[23:30] <gregaf> nhm: hah, this syncfs conversation explains why the OSDs aren't scaling like we'd like!
[23:30] * BManojlovic (~steki@ has joined #ceph
[23:31] <dmick> it certainly seems suboptimal
[23:31] <Tv_> we need to go move all the things to 12.04 :-/
[23:31] <gregaf> (at least if you're not using btrfs)
[23:31] <dmick> MOVE ALL THE...well, you knwo.
[23:31] <elder> Back in a few hours.
[23:31] <Tv_> (except teuthology as it is today will be hopelessly confused for anything except "everything changes at once" upgrades)
[23:31] <Tv_> (which means downtime)
[23:32] <Tv_> dmick: except there's OVER 9000 things
[23:32] <nhm> gregaf: I've been doing stuff on the congress test stuff. Need to read backlog
[23:32] <dmick> looks like I picked the wrong week to give up smoking crank
[23:34] <nhm> lol
[23:37] <dmick> fwiw, the precise gitbuilder output indeed appears to have calls to syncfs where I'd expect
[23:37] <dmick> (just closing the loop on methodology)
[23:41] * Oliver1 (~oliver1@ip-176-198-98-219.unitymediagroup.de) Quit (Quit: Leaving.)
[23:45] <dmick> what's a good pastebin site?
[23:46] <dmick> pastebin.com, or have the cool kids moved on, I guess is what I'm asking
[23:46] <Tv_> dmick: are there really significant quality differences, at all?
[23:46] <dmick> beats me
[23:48] <dmick> elder: http://pastebin.com/Dj6z5GAs; I'll send email too in case you're not around

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.