#ceph IRC Log

IRC Log for 2013-01-05

Timestamps are in GMT/BST.

[0:00] <gregaf> CEPH_CAP_FILE_WR is the one I meant
[0:00] <gregaf> let me see if I can find the size changing guards
[0:01] * terje_ (~joey@63-154-158-186.mpls.qwest.net) has joined #ceph
[0:02] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[0:02] <mattbenjamin> gregaf: another way of putting it, I believe CEPH_CAP_FILE_WR is non-exclusive (?)
[0:02] <gregaf> yes
[0:03] <mattbenjamin> gregaf: so I could compose it with CEPH_CAP_FILE_EXCL?
[0:03] <mattbenjamin> gregaf: sorry to be dense
[0:03] <gregaf> yes; that's a common case if one client has the file open and is writing to it :)
[0:03] <gregaf> what are you trying to do with the size and updates?
[0:03] <mattbenjamin> gregaf: nfsv4 SYNC4 semantics in ll_write_block
[0:04] <gregaf> heh, that unfortunately doesn't help me a ton
[0:04] <gregaf> but if the client has the caps to do file writes it should be able to set size, right sagewk?
[0:04] * terje_ (~joey@63-154-158-186.mpls.qwest.net) Quit (Read error: Operation timed out)
[0:05] <mattbenjamin> I'm working on the pnfs dataserver write path for our ganesha driver.
[0:05] * houkouonchi-work (~linux@12.248.40.138) Quit (Remote host closed the connection)
[0:05] <gregaf> I just mean I don't know much about the NFS protocol
[0:06] <elder> gregaf is Dan in the office?
[0:06] <elder> Nevermind.
[0:06] <gregaf> heh
[0:06] <mattbenjamin> gregaf: ok, sorry. that is a "write stability" assertion. if the client sends this, it's not going to send a commit op, so the receiving ds has to commit it
[0:09] <gregaf> so you're calling setattr and changing the size, and you want it to be durable before returning? or just for the other end to know that the client isn't going to give it any other commands to make the state durable?
[0:11] * miroslav (~miroslav@ip-64-134-223-194.public.wayport.net) has joined #ceph
[0:11] <mattbenjamin> gregaf: not calling setattr, but emulating it; what I need is to be ensured that another client cannot send this; I'm fine atm if it cannot write either, hence could I compose CEPH_CAP_FILE_EXCL (if that's the right cap)
[0:12] <gregaf> I'm confused; are you writing protocol extensions as well as libcephfs functions?
[0:13] <phantomcircuit> recovery -928/94470 degraded (-0.982%)
[0:13] <phantomcircuit> er
[0:13] <mattbenjamin> gregaf: I didn't think I was writing protocol extensions. _setattr() is a libcephfs function that does things I don't want, so emulating -that-, but as far as I can see, not changing any message
[0:14] <phantomcircuit> degraded being negative seems wrong to me
[0:14] * The_Bishop__ (~bishop@f052100073.adsl.alicedsl.de) has joined #ceph
[0:14] <phantomcircuit> but i guess that could be the result of there being more replicas than intended?
[0:15] <gregaf> mattbenjamin: okay, so you shouldn't need to worry about the capabilities — the client and the MDS will negotiate for the ones required when doing the open and such. If you're trying to use the capabilities to prevent other clients from touching a file…you're barking up the wrong tree
[0:16] <gregaf> what is setattr doing that you don't want, though?
[0:16] <mattbenjamin> gregaf: why is that barking up the wrong tree? it doesn't work?
[0:17] * andret (~andre@pcandre.nine.ch) has joined #ceph
[0:17] <gregaf> capabilities define what a client (foo) is allowed to do, but if a client isn't allowed to do something then the MDS negotiates with the other clients (bar and baz) in order to allow foo to do what it wants
[0:17] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[0:18] <gregaf> bar and baz could hold up the process by not releasing capabilities in a timely fashion, but they're supposed to release them as quickly as possible on request
[0:18] <mattbenjamin> gregaf: so the model isn't 'get capability, should you be granted it, bar and baz are dealt with?'
[0:18] <mattbenjamin> (revocation or whatever on bar and baz)
[0:18] <gregaf> bar and baz get certain caps revoked before foo gets the cap it wants, yes
[0:18] <gregaf> but that revocation is supposed to be fast
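[editor's note] The cap model gregaf describes above can be sketched as a toy. This is only an illustration under stated assumptions: the ToyMds class, the CAP_* bit values, and the conflict rule are all hypothetical and are not Ceph's actual MDS logic or the real CEPH_CAP_FILE_* constants. It shows the one point made in the discussion: a client holding a cap does not lock others out, because the MDS revokes conflicting caps from the holders when another client asks.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical cap bits for illustration only; these are NOT the values
// of Ceph's CEPH_CAP_FILE_* constants.
enum : uint32_t {
    CAP_RD   = 1u << 0,  // may read file data
    CAP_WR   = 1u << 1,  // may write file data (non-exclusive)
    CAP_EXCL = 1u << 2,  // exclusive access to the file
};

// Toy stand-in for the MDS's per-file bookkeeping (assumed design).
class ToyMds {
public:
    // Grant `wanted` to `client`, first revoking conflicting caps from
    // every other client. Returns how many other clients lost caps.
    int request(const std::string& client, uint32_t wanted) {
        int revoked = 0;
        for (auto& entry : caps_) {
            if (entry.first == client) continue;
            uint32_t conflict = conflicts(wanted, entry.second);
            if (conflict != 0) {
                entry.second &= ~conflict;  // revocation happens before the grant
                ++revoked;
            }
        }
        caps_[client] |= wanted;
        return revoked;
    }

    // Caps currently held by `client` (0 if it holds none).
    uint32_t held(const std::string& client) const {
        auto it = caps_.find(client);
        return it == caps_.end() ? 0 : it->second;
    }

private:
    // Assumed conflict rule: a request for EXCL strips everything from
    // other holders; any other request strips only EXCL from them.
    static uint32_t conflicts(uint32_t wanted, uint32_t other_held) {
        if (wanted & CAP_EXCL) return other_held;
        if (wanted != 0) return other_held & CAP_EXCL;
        return 0;
    }

    std::map<std::string, uint32_t> caps_;
};
```

In this toy, if foo is granted CAP_WR | CAP_EXCL and bar then requests CAP_WR, foo is left with CAP_WR only. That is why holding EXCL is not a way to keep other clients away from a file.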
[0:18] * jskinner (~jskinner@69.170.148.179) Quit (Remote host closed the connection)
[0:19] * miroslav (~miroslav@ip-64-134-223-194.public.wayport.net) Quit (Ping timeout: 480 seconds)
[0:20] <gregaf> if you're just trying to hold a size stable for a few hundred milliseconds that would fit inside that mandate, but again I can't tell what you're trying to do
[0:20] <gregaf> davidz might be able to facilitate this since he actually knows things about NFS...
[0:21] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[0:22] * The_Bishop_ (~bishop@e177088127.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[0:22] <gregaf> and cap manipulation is a much deeper change than adding new functions that send synchronous MDS requests or work on file handles instead of paths ;)
[0:25] <mattbenjamin> gregaf: well, I think I need to do cap manipulation, just trying to learn how to do it right.
[0:25] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:26] * terje_ (~joey@63-154-135-247.mpls.qwest.net) has joined #ceph
[0:26] <davidz> mattbenjamin: Are you trying to do a LAYOUTCOMMIT after the NFS client has written to the OSDs directly? I wonder what the cap state should be at the "metadata server/ceph client" while the NFS client is holding the layout and can write to the OSDs? Is that what you are trying to do?
[0:26] * ScOut3R (~ScOut3R@dsl5401A397.pool.t-online.hu) has joined #ceph
[0:27] <mattbenjamin> davidz: what I'm doing is not in layoutcommit, but is equivalent. we hold a layout and we are asserting control over what the mds will permit on the inode while we have it.
[0:28] <mattbenjamin> gregaf,davidz: yes, our ds writes to the osds directly.
[0:29] * terje_ (~joey@63-154-135-247.mpls.qwest.net) Quit (Read error: Operation timed out)
[0:33] <gregaf> mattbenjamin: sorry, I'm getting a quick lesson from David and it all just clicked in my head that we're discussing pnfs and you're trying to stick the protocol on top of Ceph's
[0:34] <mattbenjamin> davidz: It is my belief that there is not one single answer to what the cap state should be, at what location, but that is the problem space. Also, we have a distinction between the MDS which issued the layout, and the DS which will originate the write and the size change. IF we need a novel combination of caps between the two entities, then I guess that would be a protocol "changelet."
[0:35] <mattbenjamin> gregaf: ack
[0:35] <davidz> mattbenjamin: I hope you don't want to use ceph to write at the DS
[0:35] <gregaf> the Ceph protocols are all written so that clients have full knowledge of both metadata and data locations and changes which impact them
[0:35] * ScOut3R (~ScOut3R@dsl5401A397.pool.t-online.hu) Quit (Remote host closed the connection)
[0:37] <gregaf> so the quick and dirty solution that I can imagine would be to run an NFS storage daemon on each OSD server and have it accept incoming writes from clients and then write them out to whichever OSD — this probably wouldn't be a very good solution
[0:37] <davidz> on the other hand….I imagine the "metadata server" as a ceph client that manages the NFS clients. The DS is a thin layer on top of OSDs that allows an NFS client to read/write data as if the ceph client was doing it.
[0:38] <gregaf> anything deeper than that, though, is going to require some pretty serious knowledge and changes to the Ceph protocols at first glance, though
[0:38] <mattbenjamin> davidz, gregaf: our hypothesis (made ~2 years ago, and prototyped) was that we can adapt the Ceph protocols without undue abuse. Would you believe that we imagined and wrote a bunch of this years ago?
[0:38] <gregaf> I could be missing something; I'd like to grab Sage but he's in the tail end of a meeting
[0:38] * miroslav (~miroslav@ip-64-134-223-194.public.wayport.net) has joined #ceph
[0:39] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[0:40] <gregaf> mattbenjamin: what kind of design are you working with? if I can get the shape in my head better that would help since i'm working with a pretty incomplete picture of p/nfs
[0:45] <mattbenjamin> davidz: what david is describing is quite close to what our design does. the pnfs MDS is -mostly- a ceph client that manages nfs clients; the nfs clients get pnfs "layouts" which describe a topology and other details by which they can perform i/o at various data servers (DSes). A DS in our model is colocated with each Ceph OSD we're interested in. It then uses RADOS (atm) to perform i/o on behalf of the clients.
[0:45] <mattbenjamin> er, gregaf
[0:46] <gregaf> okay, so you want the MDS to lock sizes until the nfs client write completes?
[0:46] <gregaf> there's only one pnfs metadata server, right?
[0:47] <gregaf> are you trying to make this be properly coherent with other Ceph clients on the cluster or something?
[0:48] <gregaf> because if not you don't need the Ceph MDS to do any locking at all...
[0:48] <mattbenjamin> gregaf: there may be an arbitrary number of MDS servers, though only one is nominated in the pnfs specs. What we'll say is, we'll assure that only one MDS may issue layouts for a given inode.
[0:50] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:51] <mattbenjamin> gregaf: right, wanting to be properly coherent with other clients, at least we would prefer that. You're right of course, we could bypass more of this, but I'm refining what we have. And let's be honest, since we never really talked in depth to you folks about it, it doesn't surprise me that some of this is from left field.
[0:53] <gregaf> okay, so from what you describe this doesn't require protocol changes, but doing it all is going to require a pretty good understanding of the capabilities and the protocol surrounding them
[0:53] <gregaf> it's a much deeper level of engagement than anything I'd heard about people working on :)
[0:54] <mattbenjamin> gregaf: I am just digging into the caps stuff, but this is just the tip of the iceberg. But I'm hoping that I and the other folks here can join in on the list and ask questions like this?
[0:56] * nwat (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[0:57] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has joined #ceph
[0:57] * noob2 (~noob2@pool-71-244-111-36.phlapa.fios.verizon.net) has left #ceph
[0:57] * miroslav (~miroslav@ip-64-134-223-194.public.wayport.net) Quit (Ping timeout: 480 seconds)
[0:59] * brady (~brady@rrcs-64-183-4-86.west.biz.rr.com) Quit (Quit: Konversation terminated!)
[0:59] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:59] <sagewk> mattbenjamin: can you send a quick email to ceph-devel summarizing the question? just finished up but my backlog of im's and emails is growing :)
[1:00] * gaveen (~gaveen@112.135.136.152) Quit (Remote host closed the connection)
[1:00] <sagewk> mattbenjamin: given my (admittedly limited) understanding of what pnfs needs, i don't think there are any insurmountable hurdles here, but we do need to consider carefully how the interfaces should be extended
[1:02] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:02] <mattbenjamin> sagewk: sure. what I'd be hopeful for is some level of generality (as there appears to be, however many hidden assumptions) such that there's some room for refinement, rather than starting with a baked in 'ceph pnfs model' off the bat. if that makes sense?
[1:03] <sagewk> yeah definitely, that would be ideal
[1:04] <mattbenjamin> What I'm working on right now is basically cleaning up and making useable what we have, because people are asking to try it out, etc. My personal work item currently is the DS data path.
[1:10] * joshd1 (~jdurgin@2602:306:c5db:310:4da1:bdc1:80d7:aea4) Quit (Quit: Leaving.)
[1:16] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[1:20] * terje_ (~terje@63-154-135-247.mpls.qwest.net) has joined #ceph
[1:21] * terje__ (~joey@63-154-135-247.mpls.qwest.net) has joined #ceph
[1:22] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:22] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:22] * terje_ (~terje@63-154-135-247.mpls.qwest.net) Quit (Read error: Operation timed out)
[1:29] * terje__ (~joey@63-154-135-247.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[1:34] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[1:35] * agh (~agh@www.nowhere-else.org) has joined #ceph
[1:35] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:35] * terje_ (~terje@63-154-135-247.mpls.qwest.net) has joined #ceph
[1:37] * terje_ (~terje@63-154-135-247.mpls.qwest.net) Quit (Read error: Operation timed out)
[1:40] * terje (~terje@63-154-135-247.mpls.qwest.net) has joined #ceph
[1:42] * terje (~terje@63-154-135-247.mpls.qwest.net) Quit (Read error: Operation timed out)
[1:47] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[1:50] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[1:50] <phantomcircuit> nhm, btw i can confirm that was a bad disk
[1:50] <phantomcircuit> all the tinkering for nothing :)
[1:58] * mattbenjamin (~matt@aa2.linuxbox.com) Quit (Quit: Leaving.)
[2:02] * korgon (~Peto@isp-korex-15.164.61.37.korex.sk) Quit (Quit: Leaving.)
[2:10] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[2:10] * ChanServ sets mode +o scuttlemonkey
[2:16] * benpol (~benp@garage.reed.edu) has left #ceph
[2:16] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[2:20] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[2:20] * terje_ (~terje@63-154-135-247.mpls.qwest.net) has joined #ceph
[2:21] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[2:21] * sagelap (~sage@2607:f298:a:607:3c5a:2c79:5f87:1f13) Quit (Read error: Operation timed out)
[2:28] * terje_ (~terje@63-154-135-247.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[2:32] * mattbenjamin (~matt@adsl-75-45-227-140.dsl.sfldmi.sbcglobal.net) has joined #ceph
[2:34] * sagelap (~sage@228.sub-70-197-131.myvzw.com) has joined #ceph
[2:38] * joao (~JL@89.181.157.150) Quit (Ping timeout: 480 seconds)
[2:40] <sagelap> dmick, gregaf: pushed wip-3731
[2:40] <sagelap> can you take a look?
[2:40] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[2:40] <dmick> will do
[2:40] * terje_ (~terje@63-154-135-215.mpls.qwest.net) has joined #ceph
[2:41] <dmick> "the RD big" I assume was meant to be "being"?
[2:42] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[2:42] <sagelap> bit
[2:42] <dmick> ah, sure
[2:42] <dmick> and the call to _readd has an extra d
[2:42] <dmick> which I'm sure gcc told you about by now
[2:43] <sagelap> where do you see that?
[2:43] <dmick> - } else if (i->op.op & CEPH_OSD_OP_MODE_RD) {
[2:43] <dmick> + } else if (ceph_osd_op_mode_readd(i->op.op)) {
[2:43] <sagelap> got it
[2:43] <sagelap> thanks
[2:43] <sagelap> repushed
[2:44] <dmick> I'll have to study calc_op_budget etc. to understand or review that
[2:46] <sagelap> it's just deciding the 'cost' for an operation for the librados client-side throttling
[2:46] <sagelap> (on read operations, we budget for the amount of data we'll be reading)
[2:47] <dmick> ...and there is none for a call
[2:47] <sagelap> right
[2:47] <dmick> or is there?
[2:47] <dmick> don't calls do I/O too?
[2:47] <sagelap> well, CALL is RD|EXEC, and there isn't an exec case in that if branch, so it's 0 regardless
[2:47] <sagelap> they can, but that function isn't smart enough to budget for them (before or after)
[2:48] <dmick> well it used to just check the RD bit, not care about EXEC
[2:48] <sagelap> or in general.. the client doesn't know how much data a call op might return.
[2:48] <sagelap> } else if (ceph_osd_op_mode_read(i->op.op)) {
[2:48] <sagelap> if (ceph_osd_op_type_data(i->op.op)) {
[2:48] <sagelap> if ((int64_t)i->op.extent.length > 0)
[2:48] <sagelap> op_budget += (int64_t)i->op.extent.length;
[2:48] <sagelap> } else if (ceph_osd_op_type_attr(i->op.op)) {
[2:48] <sagelap> op_budget += i->op.xattr.name_len + i->op.xattr.value_len;
[2:48] <sagelap> }
[2:48] <sagelap> }
[2:48] <dmick> oh *that* function
[2:49] * terje_ (~terje@63-154-135-215.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[2:49] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[2:52] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[2:53] <dmick> I guess what I'm saying is, the old code did that for CALL ops, the new code will not. I understand you can't predict how much data you read, but I thought calls could both read and write, but I guess this is about "I/O to the filestore"
[2:54] <sagelap> the old code didn't, though, because those other 2 ifs don't match CALL=RD|EXEC
[2:54] <sagelap> only anything|DATA and anything|ATTR
[2:54] <sagelap> those helpers are also checking bits in the op code.
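[editor's note] sagelap's point about the read branch can be sketched in a self-contained form. This is a hedged illustration, not the librados code: ToyOp, read_budget, and the MODE_*/TYPE_* bit values are hypothetical and do not match Ceph's headers. Only the read branch pasted above is modeled. It shows that an op whose type is EXEC (such as a CALL, described above as RD|EXEC) matches neither the data nor the attr check and therefore adds nothing to the budget.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical op-code layout for illustration; the bit values are NOT
// the ones defined in Ceph's headers.
enum : uint32_t {
    MODE_RD   = 0x1000,  // op reads
    TYPE_MASK = 0x0f00,  // field holding the op type
    TYPE_DATA = 0x0200,  // object data op
    TYPE_ATTR = 0x0300,  // xattr op
    TYPE_EXEC = 0x0400,  // class method call
};

// Simplified stand-in for one op in a request (assumed fields).
struct ToyOp {
    uint32_t op;             // mode bits | type field
    int64_t  extent_length;  // bytes covered by a data op
    uint32_t name_len;       // xattr name length
    uint32_t value_len;      // xattr value length
};

inline bool op_mode_read(uint32_t op) { return (op & MODE_RD) != 0; }
inline bool op_type_data(uint32_t op) { return (op & TYPE_MASK) == TYPE_DATA; }
inline bool op_type_attr(uint32_t op) { return (op & TYPE_MASK) == TYPE_ATTR; }

// Sum the client-side throttle budget for the read ops in `ops`,
// mirroring the shape of the read branch quoted in the log.
int64_t read_budget(const std::vector<ToyOp>& ops) {
    int64_t budget = 0;
    for (const ToyOp& o : ops) {
        if (!op_mode_read(o.op)) continue;
        if (op_type_data(o.op)) {
            if (o.extent_length > 0) budget += o.extent_length;
        } else if (op_type_attr(o.op)) {
            budget += o.name_len + o.value_len;
        }
        // RD|EXEC matches neither check, so it contributes 0.
    }
    return budget;
}
```

Under these assumptions a 4096-byte data read plus an xattr read with an 8-byte name and a 16-byte value budgets 4120 bytes, and adding an RD|EXEC op to the same request leaves the total unchanged.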
[2:54] <dmick> right. ok
[2:54] <dmick> (tags were screwed and I just verified that myself)
[2:54] <dmick> with you.
[2:55] <dmick> ...and that was the only caller of op_mode_read. wow.
[2:55] * joao (~JL@89-181-157-150.net.novis.pt) has joined #ceph
[2:55] * ChanServ sets mode +o joao
[2:56] <dmick> I note, btw, that the wireshark plugin code hadn't been changed, but this just obviates the need to change it as well
[3:01] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[3:01] * terje_ (~joey@63-154-135-215.mpls.qwest.net) has joined #ceph
[3:05] <dmick> anyway sagelap, LGTM
[3:06] <mattbenjamin> davidz: still there?
[3:07] <mattbenjamin> davidz: I like get/put, yes. In various systems, we have ref/release for "extra" refs.
[3:08] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[3:09] * terje_ (~joey@63-154-135-215.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[3:10] * fzylogic (~fzylogic@69.170.166.146) Quit (Quit: fzylogic)
[3:13] * LeaChim (~LeaChim@b01bde88.bb.sky.com) Quit (Ping timeout: 480 seconds)
[3:15] <nhm> phantomcircuit: sorry about the bad disk, but glad it wasn't ceph. :)
[3:16] * sagelap (~sage@228.sub-70-197-131.myvzw.com) Quit (Ping timeout: 480 seconds)
[3:26] * terje_ (~terje@63-154-135-215.mpls.qwest.net) has joined #ceph
[3:32] * jlogan1 (~Thunderbi@2600:c00:3010:1:519e:fddc:b274:5689) Quit (Ping timeout: 480 seconds)
[3:34] * terje_ (~terje@63-154-135-215.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[3:34] * mattbenjamin (~matt@adsl-75-45-227-140.dsl.sfldmi.sbcglobal.net) has left #ceph
[3:39] * gucki (~smuxi@46-126-114-222.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[3:39] * gucki_ (~smuxi@46-126-114-222.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[3:44] * Cube1 (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[3:46] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[3:56] * terje_ (~terje@63-154-135-215.mpls.qwest.net) has joined #ceph
[3:58] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[3:59] * agh (~agh@www.nowhere-else.org) has joined #ceph
[4:04] * terje_ (~terje@63-154-135-215.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[4:06] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:06] * jluis (~JL@89.181.159.29) has joined #ceph
[4:12] * joao (~JL@89-181-157-150.net.novis.pt) Quit (Ping timeout: 480 seconds)
[4:22] * terje (~terje@63-154-135-215.mpls.qwest.net) has joined #ceph
[4:26] * BManojlovic (~steki@85.222.220.14) Quit (Ping timeout: 480 seconds)
[4:30] * terje (~terje@63-154-135-215.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[4:42] * terje (~joey@63-154-153-172.mpls.qwest.net) has joined #ceph
[4:49] * sagelap (~sage@76.89.177.113) has joined #ceph
[4:49] * terje (~joey@63-154-153-172.mpls.qwest.net) Quit (Read error: Operation timed out)
[4:51] * Ryan_Lane1 (~Adium@216.38.130.165) Quit (Quit: Leaving.)
[4:53] * KindOne (~KindOne@50.96.224.113) Quit (Remote host closed the connection)
[4:59] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[5:10] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[5:12] * KindOne (KindOne@50.96.224.113) has joined #ceph
[5:15] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[5:19] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[5:24] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:24] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:34] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:01] * dmick (~dmick@2607:f298:a:607:a808:7ad6:87fc:bbed) Quit (Quit: Leaving.)
[6:16] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[6:17] * agh (~agh@www.nowhere-else.org) Quit (Remote host closed the connection)
[6:20] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:02] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[8:07] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[8:17] * terje_ (~joey@63-154-142-153.mpls.qwest.net) has joined #ceph
[8:24] * terje_ (~joey@63-154-142-153.mpls.qwest.net) Quit (Read error: Operation timed out)
[8:33] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[8:40] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[8:51] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[9:07] * terje_ (~joey@63-154-135-83.mpls.qwest.net) has joined #ceph
[9:10] * sleinen1 (~Adium@user-23-11.vpn.switch.ch) Quit (Quit: Leaving.)
[9:15] * terje_ (~joey@63-154-135-83.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[9:20] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:30] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:31] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:32] * terje_ (~terje@63-154-135-83.mpls.qwest.net) has joined #ceph
[9:38] * terje__ (~joey@63-154-135-83.mpls.qwest.net) has joined #ceph
[9:40] * terje_ (~terje@63-154-135-83.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[9:41] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:41] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:46] * terje__ (~joey@63-154-135-83.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[9:47] * terje_ (~terje@63-154-135-83.mpls.qwest.net) has joined #ceph
[9:48] * ScOut3R (~ScOut3R@dsl5401A397.pool.t-online.hu) has joined #ceph
[9:52] * terje_ (~terje@63-154-135-83.mpls.qwest.net) Quit (Read error: Operation timed out)
[10:03] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[10:05] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) Quit (Remote host closed the connection)
[10:18] * tnt (~tnt@112.169-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[10:25] * ScOut3R (~ScOut3R@dsl5401A397.pool.t-online.hu) Quit (Remote host closed the connection)
[10:36] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[10:50] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Quit: WeeChat 0.3.2)
[11:00] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[11:09] * gucki (~smuxi@46-126-114-222.dynamic.hispeed.ch) has joined #ceph
[11:11] * gucki_ (~smuxi@46-126-114-222.dynamic.hispeed.ch) has joined #ceph
[11:29] * Meths_ (~meths@2.27.95.119) has joined #ceph
[11:35] * Meths (~meths@2.25.214.88) Quit (Ping timeout: 480 seconds)
[11:58] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:59] * The_Bishop__ (~bishop@f052100073.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[12:03] * terje_ (~joey@63-154-142-71.mpls.qwest.net) has joined #ceph
[12:04] * LeaChim (~LeaChim@b01bde88.bb.sky.com) has joined #ceph
[12:06] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[12:08] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[12:11] * terje_ (~joey@63-154-142-71.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[12:12] * LeaChim (~LeaChim@b01bde88.bb.sky.com) Quit (Ping timeout: 480 seconds)
[12:21] * LeaChim (~LeaChim@b01bde88.bb.sky.com) has joined #ceph
[12:22] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[12:28] * terje_ (~joey@63-154-154-188.mpls.qwest.net) has joined #ceph
[12:36] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[12:36] * terje_ (~joey@63-154-154-188.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[12:37] * sleinen1 (~Adium@2001:620:0:25:48b6:a681:682e:105d) has joined #ceph
[12:38] * terje_ (~joey@63-154-154-188.mpls.qwest.net) has joined #ceph
[12:44] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[12:45] * terje_ (~joey@63-154-154-188.mpls.qwest.net) Quit (Read error: Operation timed out)
[12:48] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[12:50] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[12:52] * madkiss (~madkiss@p5792CB10.dip.t-dialin.net) has joined #ceph
[12:53] * terje_ (~joey@63-154-154-188.mpls.qwest.net) has joined #ceph
[13:02] * terje_ (~joey@63-154-154-188.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[13:22] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[13:22] * benner_ (~benner@193.200.124.63) Quit (Read error: Connection reset by peer)
[13:55] * madkiss (~madkiss@p5792CB10.dip.t-dialin.net) Quit (Quit: Leaving.)
[13:56] * madkiss (~madkiss@p5792CB10.dip.t-dialin.net) has joined #ceph
[13:57] * madkiss (~madkiss@p5792CB10.dip.t-dialin.net) Quit ()
[13:58] <exec> folks, is it safe or not to do upgrade (argonaut => 0.56) in degraded (recovering) mode?
[13:58] <nhm> exec: I don't know for sure, but why risk it?
[13:59] <exec> from upgrading howto I see that new monitors can work well with old osd. I can even stop full cluster and do upgrade in offline mode.
[14:00] <exec> nhm: If I prepare upgrade, I'd see clean state of cluster
[14:00] <exec> perhaps there are some unexpected behavior.
[14:01] <exec> but right now I don't want to wait )
[14:02] <nhm> btw, you may want to wait for 0.56.1
[14:03] <exec> something is broken in 0.56?
[14:04] <exec> I want to upgrade to bobtail, but can't wait )
[14:05] <nhm> exec: 0.56 has a bug that makes it not interact with pre-0.56 ceph software.
[14:06] * BManojlovic (~steki@85.222.220.14) has joined #ceph
[14:06] <nhm> exec: also, I believe 0.56.1 will have the fix for the xfs/ext4 commit bug mentioned on the mailing list last night as well.
[14:08] * korgon (~Peto@isp-korex-15.164.61.37.korex.sk) has joined #ceph
[14:09] <exec> nhm: thanks. do you know any ETA of release?
[14:09] <nhm> exec: I think it's being released on monday.
[14:10] <nhm> or that's the plan assuming testing goes well over the weekend.
[14:10] <exec> nhm: it's acceptable for me. thanks again.
[14:11] <nhm> exec: no problem. You may want to ask on the mailing list about upgrading in the degraded state too.
[14:11] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[14:12] * terje_ (~terje@63-154-154-188.mpls.qwest.net) has joined #ceph
[14:21] * terje_ (~terje@63-154-154-188.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[14:26] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[15:09] * Meths_ is now known as Meths
[15:10] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Quit: Ex-Chat)
[15:34] * terje_ (~joey@63-154-134-106.mpls.qwest.net) has joined #ceph
[15:42] * terje_ (~joey@63-154-134-106.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[16:05] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[16:05] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:38] * terje_ (~terje@63-154-151-187.mpls.qwest.net) has joined #ceph
[16:41] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[16:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:42] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) has joined #ceph
[16:46] * terje_ (~terje@63-154-151-187.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[16:48] * mattbenjamin (~matt@adsl-75-45-227-140.dsl.sfldmi.sbcglobal.net) has joined #ceph
[16:55] * BManojlovic (~steki@85.222.220.14) Quit (Ping timeout: 480 seconds)
[16:58] * terje_ (~terje@63-154-151-187.mpls.qwest.net) has joined #ceph
[17:01] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[17:02] <Kioob> nhm : I need to reformat all OSD backend (to try BCache). Can I stop OSD, flush journal, move all files to another disk, then restore them ? no inode/hardlink problem ?
[17:02] <Kioob> or should I just format the OSD, then enjoy the recovery of Ceph ?
[17:05] * terje_ (~terje@63-154-151-187.mpls.qwest.net) Quit (Read error: Operation timed out)
[17:15] * sleinen1 (~Adium@2001:620:0:25:48b6:a681:682e:105d) Quit (Quit: Leaving.)
[17:20] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[17:20] * loicd (~loic@2a01:e35:2eba:db10:293c:a746:1d27:a95) has joined #ceph
[17:20] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[17:36] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[17:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[17:48] <sagelap> kioob: there are hardlinks. and xattrs. not sure how well various tools do at preserving those.
[17:49] <sagelap> i would let ceph do the recovery. add new osd, let it recover, remove old one.
[17:51] <Kioob> ok, thanks :)
[17:53] <Kioob> mmm, but I can't add new osd without removing the old one. I need to reformat the hard drive of the OSD
[17:53] <sagelap> mark one osd out, wait for ceph to recover, then reformat that disk and add it back in
[17:53] <sagelap> repeat for the others.
[17:53] <Kioob> ok thanks
[17:53] <Kioob> It will not throw a balance ?
[17:54] <sagelap> can overlap marking out the next osd to offload with letting data migrate back onto the reformatted osd
[17:54] <sagelap> not unless the cluster is already almost full
[17:54] <Kioob> oh no... 25% usage
[17:54] <Kioob> and there is 32 OSD. So removing one is not a big problem
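[editor's note] The headroom reasoning in this exchange can be checked with a little arithmetic. The helper below is a rough sketch, assuming equal-size OSDs and evenly distributed data (neither is stated in the log); usage_after_out is a hypothetical name, not a Ceph tool or API. It estimates the fraction of capacity used on the remaining OSDs while some are marked out.

```cpp
#include <cassert>
#include <stdexcept>

// Estimated usage fraction on the remaining OSDs after `osds_out` of
// `total_osds` are marked out. Assumes equal-size OSDs and an even
// data distribution, which a real cluster only approximates.
double usage_after_out(int total_osds, double usage_fraction, int osds_out) {
    if (total_osds <= 0 || osds_out < 0 || osds_out >= total_osds)
        throw std::invalid_argument("need 0 <= osds_out < total_osds");
    return usage_fraction * total_osds / (total_osds - osds_out);
}
```

With Kioob's figures (32 OSDs at 25% usage), marking one OSD out raises the estimate to 8/31, about 25.8%, so the remaining OSDs are nowhere near full.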
[17:57] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:01] * Oliver1 (~oliver1@p5483BDFE.dip.t-dialin.net) has joined #ceph
[18:14] <paravoid> so, my cluster finally resynced
[18:14] <paravoid> but mostly
[18:14] <paravoid> now there's 0.184% degraded and 93 pgs active+remapped
[18:14] <paravoid> and the degraded ones keep increasing and I have no idea why :)
[18:14] <paravoid> the pg dump | grep remapped output doesn't point to a specific OSD
[18:20] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[18:20] <paravoid> so, stuck unclean, but I see no unfound objects
[18:21] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) Quit (Quit: Leseb)
[18:26] * themgt (~themgt@71-90-234-152.dhcp.gnvl.sc.charter.com) has joined #ceph
[18:30] * Disaster123 (~Disaster1@p5B09DEAC.dip.t-dialin.net) has joined #ceph
[18:30] * gaveen (~gaveen@112.135.132.43) has joined #ceph
[18:31] <Disaster123> hi i just posted to the mailinglist some minutes ago. My whole ceph cluster keeps crashing after changing the crushmap and ceph finished remapping.
[18:32] <paravoid> sagelap: btw, it appears that without wip-3714 + increased heartbeat rate I can't restart OSDs at all, they just never recover
[18:33] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:39] * CloudGuy (~CloudGuy@5356416B.cm-6-7b.dynamic.ziggo.nl) has joined #ceph
[18:40] * Oliver1 (~oliver1@p5483BDFE.dip.t-dialin.net) Quit (Quit: Leaving.)
[18:42] * jlogan1 (~Thunderbi@2600:c00:3010:1:519e:fddc:b274:5689) has joined #ceph
[18:54] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) has joined #ceph
[18:57] <sage> disaster123: stefan?
[18:57] <sage> what was the crush change?
[18:58] * terje_ (~terje@63-154-144-220.mpls.qwest.net) has joined #ceph
[19:02] * gaveen (~gaveen@112.135.132.43) Quit (Remote host closed the connection)
[19:07] * terje_ (~terje@63-154-144-220.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[19:12] <elder> sage, OK if I push the 3.7 based testing and master branches?
[19:13] <sage> if it's only stuff that isn't yet upstream.. nothing new in master right?
[19:13] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:13] <sage> oh, just newly reviewed patches?
[19:14] <elder> master is the same as it was before, just based on 3.7 (and yes, only reviewed patches)
[19:15] <elder> New testing is the rest of what's in current testing, rebased on top of that new master.
[19:15] <elder> It's master-next if you want to glance at it.
[19:15] <sage> sounds good.
[19:15] <elder> OK. I'll double check before the push but it should be fine.
[19:15] <sage> hmm but the stuff that was in master that we sent to linus was based on 3.6, and merged for 3.8-rc1
[19:16] <sage> does that mean those patches are duplicated in the history?
[19:16] <sage> seems like it should be rebased only on top of what was merged upstream, or confusion will ensue
[19:16] <elder> I'll look into that, and will change things around and not push it out for now if that's the case.
[19:17] <elder> Still haven't resolved the 3.8-rc1 + build issue. I don't know if I should be looking at that though. I can make progress on rbd using 3.7 code.
[19:17] <Disaster123> sage: yes
[19:17] <Disaster123> sorry for the late reply
[19:18] * gaveen (~gaveen@112.135.137.137) has joined #ceph
[19:24] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[19:29] * Oliver1 (~oliver1@p5483BDFE.dip.t-dialin.net) has joined #ceph
[19:33] <CloudGuy> hi all
[19:35] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[19:36] * themgt (~themgt@71-90-234-152.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[19:36] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[19:45] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[20:06] * nwat (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[20:31] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[20:39] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[20:44] * jlogan1 (~Thunderbi@2600:c00:3010:1:519e:fddc:b274:5689) Quit (Ping timeout: 480 seconds)
[20:56] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[20:58] * madkiss (~madkiss@p5792CB10.dip.t-dialin.net) has joined #ceph
[20:59] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[21:02] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[21:21] * LeaChim (~LeaChim@b01bde88.bb.sky.com) Quit (Ping timeout: 480 seconds)
[21:22] * jlogan1 (~Thunderbi@2600:c00:3010:1:519e:fddc:b274:5689) has joined #ceph
[21:24] * LeaChim (~LeaChim@b01bde88.bb.sky.com) has joined #ceph
[21:26] * madkiss (~madkiss@p5792CB10.dip.t-dialin.net) Quit (Quit: Leaving.)
[21:33] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[21:43] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:43] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit ()
[21:46] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[21:52] * Oliver1 (~oliver1@p5483BDFE.dip.t-dialin.net) Quit (Quit: Leaving.)
[21:54] * terje (~terje@63-154-133-8.mpls.qwest.net) has joined #ceph
[22:02] * terje (~terje@63-154-133-8.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[22:07] * nwat (~Adium@soenat3.cse.ucsc.edu) has left #ceph
[22:11] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[22:18] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:30] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[22:36] * Disaster123 (~Disaster1@p5B09DEAC.dip.t-dialin.net) Quit ()
[22:44] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[22:48] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[22:48] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[22:53] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[22:54] * terje (~terje@63-154-153-147.mpls.qwest.net) has joined #ceph
[23:01] * terje (~terje@63-154-153-147.mpls.qwest.net) Quit (Read error: Operation timed out)
[23:04] * terje_ (~joey@63-154-153-147.mpls.qwest.net) has joined #ceph
[23:07] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[23:09] * terje__ (~terje@63-154-153-147.mpls.qwest.net) has joined #ceph
[23:12] * terje_ (~joey@63-154-153-147.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[23:13] * danieagle (~Daniel@177.97.248.208) has joined #ceph
[23:14] * madkiss (~madkiss@p5792CB10.dip.t-dialin.net) has joined #ceph
[23:17] * terje__ (~terje@63-154-153-147.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[23:18] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[23:19] * terje_ (~terje@63-154-153-147.mpls.qwest.net) has joined #ceph
[23:24] * jlogan1 (~Thunderbi@2600:c00:3010:1:519e:fddc:b274:5689) Quit (Ping timeout: 480 seconds)
[23:27] * BManojlovic (~steki@15-167-222-85.adsl.verat.net) has joined #ceph
[23:27] * terje_ (~terje@63-154-153-147.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[23:29] * terje_ (~joey@63-154-153-147.mpls.qwest.net) has joined #ceph
[23:37] * terje_ (~joey@63-154-153-147.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[23:40] * terje_ (~joey@63-154-153-147.mpls.qwest.net) has joined #ceph
[23:47] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[23:48] * terje_ (~joey@63-154-153-147.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[23:50] * loicd (~loic@2a01:e35:2eba:db10:293c:a746:1d27:a95) Quit (Quit: Leaving.)
[23:50] * loicd (~loic@2a01:e35:2eba:db10:341e:1de2:f12f:17e4) has joined #ceph
[23:51] * gaveen (~gaveen@112.135.137.137) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.