#ceph IRC Log

IRC Log for 2012-10-16

Timestamps are in GMT/BST.

[0:00] <gregaf> and keep in mind if your journal and store are on the same block device that's twice the writes again
[0:00] <phantomcircuit> yeah it's all on the same ssd
[0:00] <phantomcircuit> which is on top of a dm-crypt layer
[0:01] <phantomcircuit> so pretty much worst case the entire way down the line
[0:03] <phantomcircuit> nhm_, lol 32MB blocks got it up to 10.5 MB/s
[0:03] <phantomcircuit> that's ridiculousness but i guess with the high latency the entire way down the path it makes sense
[0:04] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has left #ceph
[0:04] <elder> sage, I'm a little concerned about consumption of major block numbers. Each mapped device consumes one.
[0:05] <scalability-junk> phantomcircuit, I'm thinking about an even worse "production" setup: a non-controlled network with 1gb/s between nodes, and monitor and control stuff probably under my control...
[0:06] <scalability-junk> actually I'm thinking about encrypted traffic between the nodes for security reasons in the non controlled network...
[0:06] <scalability-junk> but that's probably like hey shoot yourself :)
[0:06] <elder> So if we start having clones and their backing snapshots consuming one each they will disappear fast. I think there are only 255 of them.
[0:07] <sage> hmm.. wouldn't it be one per parent?
[0:07] <elder> One for the mapped device, and one for its parent.
[0:08] <elder> If you start layering deeper then they go deeper. If you set up multiple rbd devices then that all multiplies.
[0:08] <sage> so it goes from 1 per user-visible device to 2.. right?
[0:08] <sage> i wouldn't expect them to clone clones very often
[0:09] <elder> Right. It scales as O(1), but the small 255 is what bothers me...
[0:09] <sage> yeah
[0:09] <dmick> linux majnum is still 1 byte? rly?
[0:09] * vata (~vata@2607:fad8:4:0:f94e:88e5:de00:e387) Quit (Quit: Leaving.)
[0:09] <elder> (I've had this mild concern before, this just puts the issue in starker relief)
[0:09] * loicd (~loic@63.133.198.91) has joined #ceph
[0:10] <sage> pre-cloning you mean
[0:10] <elder> dmick, I think it's not that, it's just a fixed-size array of major device numbers.
[0:10] <elder> Yes.
[0:10] <sage> is the major device still limited at 255 on modern kernels? it seems like lots of machines will have more disks than that...
[0:11] <elder> register_blkdev()
[0:11] * synapsr (~synapsr@ip68-8-15-212.sd.sd.cox.net) has joined #ceph
[0:11] <elder> returns a fixed allocated major number in the range [1..255]
[0:11] <elder> Maybe SCSI does something different.
[0:12] <elder> SCSI devices all have major 8, with minors jumping by 16 per device.
[0:12] <elder> I'll look at what they're doing, we should do the same...
[0:12] <sage> oh i see
[0:12] <dmick> yeah, some indication that minors can be 20 bits
[0:12] <sage> i see
[0:13] <dmick> but unclear; two different spaces, with encoding between?..or something
[0:13] <dmick> legacy vs new maybe
[0:14] <dmick> yeah, 12/20 for 2.6
[0:16] <elder> SCSI has three reserved ranges of major device numbers: 8, 65-71, and 128-135 (inclusive ranges)
[0:16] * rino (~rino@12.250.146.102) Quit (Quit: Will the REAL Slim BitchX please stand up?)
[0:17] <dmick> I would tend toward thinking that carving up minor space is a better move
[0:17] <dmick> (for rbd)
[0:17] <sage> yep
[0:17] <elder> Probably.
[0:17] <elder> I'm not going to worry about it now though. Just thought I'd mention it.
[0:17] <dmick> particularly since partitions are not very important there
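A small illustration (plain Python, not kernel code) of the numbering scheme being discussed: the kernel's internal dev_t packs a 12-bit major with a 20-bit minor, and SCSI disks all share block major 8 while carving the minor space into blocks of 16 (one whole-disk minor plus up to 15 partitions per device), which is the sort of carving dmick suggests for rbd.

    # Illustrative sketch only of SCSI-style static minor carving.
    def mkdev(major, minor):
        # the kernel's internal dev_t: 12 bits of major, 20 bits of minor
        return (major << 20) | minor

    def scsi_disk_dev(disk_index, partition=0):
        major = 8                              # first SCSI disk block major
        minor = disk_index * 16 + partition    # 16 minors reserved per disk
        return major, minor

    print(scsi_disk_dev(0))       # (8, 0)  -> whole disk, e.g. /dev/sda
    print(scsi_disk_dev(1, 2))    # (8, 18) -> second disk, partition 2
    print(mkdev(8, 18))           # 8388626, the packed dev_t value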
[0:18] <phantomcircuit> scalability-junk, lol yes i was going to be running this all over an insecure network with 10 gbps links and ~ 500 usec latency between hosts
[0:18] * loicd (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[0:18] <phantomcircuit> without the 10 gbps links this would just be pure insanity
[0:18] <sage> it sounds like an orthogonal fix to change the device numbering
[0:18] <phantomcircuit> :)
[0:18] * rino (~rino@12.250.146.102) has joined #ceph
[0:18] <elder> Yes.
[0:18] <phantomcircuit> what kind of security does ceph have for an unsecured network?
[0:18] <phantomcircuit> please tell me i dont have to setup like ipsec or something
[0:19] <sage> phantomcircuit: kerberos-like.
[0:19] * synapsr (~synapsr@ip68-8-15-212.sd.sd.cox.net) Quit (Remote host closed the connection)
[0:19] <phantomcircuit> sage, so tokens with a central auth server?
[0:19] <sage> yeah
[0:20] <sage> where 'central' is the monitor cluster (so, fault tolerant and ha etc)
[0:20] <phantomcircuit> ah
[0:20] <phantomcircuit> neat
[0:21] <phantomcircuit> so i setup a threaded (python so only sort of threaded with the GIL) benchmark
[0:21] <phantomcircuit> 7.05772900581 seconds
[0:21] <phantomcircuit> IOPS 10000
[0:21] <phantomcircuit> bytes 10240000
[0:21] <phantomcircuit> 100 threads 100 iterations 1024 bytes/fdatasync
[0:21] <phantomcircuit> 1500 IOPS/s
[0:22] <phantomcircuit> the underlying ssd can do about 3k IOPS
[0:22] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[0:22] <phantomcircuit> so that seems perfectly reasonable for a mirrored volume
[0:22] <phantomcircuit> loverly
[0:22] <dmick> cool
[0:23] <phantomcircuit> where it says IOPS it should say i guess IOP
[0:23] <phantomcircuit> heh
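For reference, a minimal sketch (with names and layout of my own choosing; phantomcircuit's actual script was not posted) of the kind of threaded fdatasync benchmark described above: 100 threads each doing 100 writes of 1024 bytes, every write followed by fdatasync, with aggregate IOPS printed at the end.

    import os
    import threading
    import time

    THREADS = 100        # matches the numbers quoted above
    ITERATIONS = 100
    BLOCK = b"x" * 1024  # 1024 bytes per fdatasync'd write

    def worker(path):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            for _ in range(ITERATIONS):
                os.write(fd, BLOCK)
                os.fdatasync(fd)  # force each write to stable storage
        finally:
            os.close(fd)

    def main():
        threads = [threading.Thread(target=worker, args=("bench_%d.dat" % i,))
                   for i in range(THREADS)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        elapsed = time.time() - start
        ops = THREADS * ITERATIONS
        print("%.2f seconds, %d ops, %.0f IOPS" % (elapsed, ops, ops / elapsed))

    if __name__ == "__main__":
        main()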
[0:23] <scalability-junk> phantomcircuit, so you mean 1gb/s would be insane? damn
[0:23] <phantomcircuit> scalability-junk, well 1 gbps is ~125MB/s
[0:24] <scalability-junk> phantomcircuit, still you would need some kind of encryption. auth is not the whole thing, or at least it shouldn't be
[0:24] <phantomcircuit> which is more or less what a single conventional hdd can do sequential read/write
[0:24] <phantomcircuit> so 1 gbps is insane unless you've got two nics
[0:24] <phantomcircuit> one for internetz one for io
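The arithmetic behind "1 gbps is ~125MB/s", for reference (a rough conversion that ignores protocol overhead):

    # network gigabits per second -> megabytes per second
    def gbps_to_MBps(gbps):
        return gbps * 1000 / 8.0

    print(gbps_to_MBps(1))    # 125.0, roughly one conventional disk's sequential rate
    print(gbps_to_MBps(10))   # 1250.0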
[0:24] * stxShadow (~Jens@ip-178-203-169-190.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[0:25] <scalability-junk> damn when I use the easier solution I only got 1gb/s
[0:25] <scalability-junk> with the more difficult one I have 2 nics...
[0:28] <scalability-junk> so minimum 2gb/s alright I'll try, at least the encryption issue would be gone
[0:30] <phantomcircuit> scalability-junk, well to give you an idea
[0:30] <phantomcircuit> im talking about putting like
[0:30] <phantomcircuit> several hundred vms on a single box
[0:30] <phantomcircuit> so for me it would be insane
[0:30] <phantomcircuit> for you who knows
[0:30] <scalability-junk> you said you are putting the ceph osds on a sort of public network right?
[0:31] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit (Quit: This computer has gone to sleep)
[0:31] <scalability-junk> how do you handle openstack with a public network (or what do you use for virtualisation)?
[0:31] <phantomcircuit> scalability-junk, it's a shitty datacenter whose network i 100% do not trust
[0:31] <phantomcircuit> openstack is mondo too complicated
[0:31] <phantomcircuit> i wrote my own (admittedly shitty) libvirt web controller
[0:32] <scalability-junk> haha ok but how do you secure the traffic between your libvirt hosts?
[0:33] <phantomcircuit> libvirt supports ssl connections with a pki
[0:33] <scalability-junk> ah alright
[0:33] <phantomcircuit> i guess i could run it all over ipsec also if i ahd to
[0:33] <phantomcircuit> had*
[0:36] <scalability-junk> yeah probably
[0:36] * scalability-junk has to rethink his infrastructure vision :P
[0:37] * miroslavk (~miroslavk@63.133.198.36) has joined #ceph
[0:44] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Quit: Leaving.)
[0:48] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[0:50] * loicd (~loic@63.133.198.91) has joined #ceph
[0:50] * loicd (~loic@63.133.198.91) Quit ()
[0:53] * loicd (~loic@63.133.198.91) has joined #ceph
[0:54] * jlogan1 (~Thunderbi@2600:c00:3010:1:3ca8:5928:4097:8bcd) Quit (Quit: jlogan1)
[0:57] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[0:59] * nwatkins (~nwatkins@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[0:59] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: rcirc on GNU Emacs 24.2.1)
[1:00] * nwatkins (~nwatkins@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[1:01] * The_Bishop (~bishop@e179000062.adsl.alicedsl.de) has joined #ceph
[1:05] * BManojlovic (~steki@212.200.241.182) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:08] * Cube1 (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[1:08] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[1:18] * synapsr (~synapsr@63.133.198.91) Quit (Remote host closed the connection)
[1:18] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[1:20] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) has joined #ceph
[1:22] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit ()
[1:23] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[1:26] * miroslavk (~miroslavk@63.133.198.36) Quit (Quit: Leaving.)
[1:27] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[1:35] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[1:53] * Tv_ (~tv@2607:f298:a:607:1859:b94f:f46e:5085) Quit (Quit: Tv_)
[2:00] * lofejndif (~lsqavnbok@04ZAAAROS.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[2:05] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[2:09] * tightwork (~tightwork@142.196.239.240) has joined #ceph
[2:11] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[2:14] * aliguori (~anthony@cpe-70-123-129-122.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:24] * aliguori (~anthony@cpe-70-123-146-246.austin.res.rr.com) has joined #ceph
[2:25] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[2:28] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[2:32] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) has joined #ceph
[2:32] <dty> is there any way to enumerate the users in the radosgw-admin?
[2:35] <gregaf> dty: hmm, I don't think there is — yehudasa isn't around right now but should know if there's a reason for that or not
[2:36] * synapsr (~synapsr@63.133.198.91) Quit (Ping timeout: 480 seconds)
[2:36] * miroslavk (~miroslavk@63.133.198.36) has joined #ceph
[2:37] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) Quit (Read error: No route to host)
[2:40] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) has joined #ceph
[2:45] * miroslavk (~miroslavk@63.133.198.36) Quit (Ping timeout: 480 seconds)
[2:45] * nwatkins (~nwatkins@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:47] * nwatkins (~nwatkins@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[2:57] * miroslavk (~miroslavk@63.133.196.10) has joined #ceph
[2:58] <dmick> dty: no, I don't think there is
[2:58] * aaron looks at http://ceph.com/wiki/Cluster_configuration
[2:58] <aaron> so if you want to use the drive uuid, do you use uuid=x or just x in ceph.conf?
[2:59] <dmick> dty: you can fake it with rados -p .users.uid ls, and some filtering
[3:00] <dmick> this of course is not something you should rely on
[3:00] <dmick> aaron: please avoid the wiki docs; we're trying hard to remove them. good stuff is at ceph.com/docs
[3:01] <aaron> ok
[3:01] <aaron> sometimes they explain things from a different angle, so I tend to check both
[3:01] <dmick> yeah, and sometimes the wiki is good info, but it's sometimes not
[3:03] <dmick> as for drive uuid, I believe the advice there is to use /dev/disk/by-uuid/<uuid>
[3:03] <dty> thanks dmick, i will work around it for now and hopefully I can look into patching this functionality in later
[3:04] <dmick> that is, the path to the device is just based on uuid instead
[3:04] <aaron> ahh, thanks
[3:05] * synapsr (~synapsr@12.180.144.3) has joined #ceph
[3:06] <dmick> but also I believe it's now deprecated to have mkcephfs do the format/mount for you, IIRC
[3:08] <dty> is there any benefit to having radosgw span multiple pools?
[3:09] <gregaf> dty: you mean multiple data storage pools?
[3:10] <gregaf> not presently
[3:10] <dty> yeah, what does 'radosgw-admin pool add' do?
[3:10] <gregaf> it serves a couple of purposes
[3:11] <gregaf> one, working around the fact that RADOS doesn't currently do PG splitting, so you can eg add new pools if your cluster grows too large
[3:11] <gregaf> second, at some point in the future a likely feature is the ability to direct some users or buckets into different pools so you can provide faster or more reliable or cheaper storage to people
[3:12] <gregaf> I think it may also have had some vestigial purposes that don't matter any more, but I don't remember for sure
[3:15] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:15] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:16] <dty> thanks that is helpful, what would you ballpark 'too large' to be?
[3:17] <gregaf> it's relative to the starting size and how many PGs you began with
[3:17] <gregaf> general recommendation is that you should have 50-200 PGs/OSD (though this varies somewhat based on how many pools you have, etc)
[3:18] <gregaf> you can go quite a bit higher without running into trouble on a reasonable node, but if you get much less than that you start to have bad distribution of your data
[3:18] <gregaf> so if you start out with one node and 100 PGs, and then you grow to 50 OSDs, then that's "too large"
[3:19] <dty> oh, ok
[3:19] <gregaf> if you grow up to 800 OSDs and have 80000 PGs in your main pool, you're doing fine
[3:20] <gregaf> PG splitting (and then merging) is being worked on now, so this part will be moot in the foreseeable future anyway (at least if you're running new code; I don't know if it's planned for Bobtail or a development release shortly following it)
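The sizing rule of thumb gregaf gives above, as plain arithmetic (illustrative only; it ignores the per-pool and replication adjustments he mentions):

    # 50-200 placement groups per OSD, per the guideline quoted above.
    def pg_budget(num_osds, low=50, high=200):
        return num_osds * low, num_osds * high

    for osds in (1, 50, 800):
        lo, hi = pg_budget(osds)
        print("%3d OSDs -> roughly %d to %d PGs in total" % (osds, lo, hi))

    # 100 PGs spread over 50 OSDs is only 2 PGs/OSD (the "too large" case above);
    # 80000 PGs over 800 OSDs is 100 PGs/OSD, which is fine.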
[3:26] <dty> sorry for all the questions, is there a way to calculate per user or per bucket usage (I see the usage show commands, though that seems to be aggregate bytes sent/received)
[3:26] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[3:27] <gregaf> there is; I'm afraid I don't know what you can get with and without log scraping, though
[3:27] <gregaf> if you can ask tomorrow during Pacific business hours then yehudasa will be here, or you can email the mailing list about it
[3:28] <dty> thanks, i just found 'bucket stats' so that should get me what I need
[3:28] * synapsr (~synapsr@12.180.144.3) Quit (Remote host closed the connection)
[3:30] * miroslavk (~miroslavk@63.133.196.10) Quit (Quit: Leaving.)
[3:31] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[3:31] <gregaf> okay, I'm off for the evening; later guys
[3:31] * Cube1 (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[3:31] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[3:32] * maelfius (~mdrnstm@113.sub-70-197-139.myvzw.com) has joined #ceph
[3:42] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[3:42] * loicd (~loic@12.180.144.3) has joined #ceph
[3:43] * Cube1 (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[3:44] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit ()
[3:52] * rino (~rino@12.250.146.102) has left #ceph
[4:02] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[4:11] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[4:35] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[4:35] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit ()
[4:52] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[5:14] * aliguori (~anthony@cpe-70-123-146-246.austin.res.rr.com) Quit (Remote host closed the connection)
[5:20] * maelfius (~mdrnstm@113.sub-70-197-139.myvzw.com) Quit (Read error: Connection reset by peer)
[5:26] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:27] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:28] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:28] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:34] * nwatkins (~nwatkins@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:38] * Cube2 (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[5:40] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:40] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:44] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:46] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:46] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:51] * gaveen (~gaveen@112.134.113.174) has joined #ceph
[5:58] * glowell1 (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[5:58] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:59] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:59] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:03] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has joined #ceph
[6:05] * dmick (~dmick@38.122.20.226) Quit (Quit: Leaving.)
[6:06] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) Quit (Quit: dty)
[6:07] * slang (~slang@ace.ops.newdream.net) Quit (Quit: slang)
[6:11] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[6:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[6:21] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:40] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:40] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[6:41] * benner (~benner@193.200.124.63) Quit (Read error: Connection reset by peer)
[6:49] * Q (~Q@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[6:49] * Q is now known as Guest1874
[6:51] * Guest1874 is now known as Q310
[7:05] * benpol (~benp@garage.reed.edu) Quit (Ping timeout: 480 seconds)
[7:08] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[7:08] * loicd (~loic@12.180.144.3) Quit (Ping timeout: 480 seconds)
[7:09] * benpol (~benp@garage.reed.edu) has joined #ceph
[7:15] <Q310> anyone using ceph rbd with openstack here?
[7:15] * nwatkins (~nwatkins@c-50-131-197-174.hsd1.ca.comcast.net) has left #ceph
[7:20] * tightwork (~tightwork@142.196.239.240) Quit (Ping timeout: 480 seconds)
[7:47] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[7:52] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[8:10] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:27] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[8:48] * benpol (~benp@garage.reed.edu) Quit (Ping timeout: 480 seconds)
[8:52] * loicd (~loic@12.180.144.3) has joined #ceph
[8:56] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[9:21] * verwilst (~verwilst@d5152D6B9.static.telenet.be) has joined #ceph
[9:28] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Quit: Leaving.)
[9:29] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[9:31] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:32] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:42] * hdeshev (~hdeshev@92-247-236-146.spectrumnet.bg) has joined #ceph
[9:49] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:56] * synapsr (~synapsr@12.180.144.3) has joined #ceph
[10:03] * maelfius (~mdrnstm@55.sub-70-197-144.myvzw.com) has joined #ceph
[10:16] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[10:24] * deepsa (~deepsa@117.199.127.42) Quit (Remote host closed the connection)
[10:29] * deepsa (~deepsa@117.212.21.91) has joined #ceph
[10:31] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[10:32] <todin> morning
[10:39] <joao> morning
[10:43] * hdeshev (~hdeshev@92-247-236-146.spectrumnet.bg) has left #ceph
[10:54] * miroslavk (~miroslavk@63.133.196.10) has joined #ceph
[11:15] * miroslavk (~miroslavk@63.133.196.10) Quit (Quit: Leaving.)
[11:16] * miroslavk (~miroslavk@63.133.196.10) has joined #ceph
[11:16] * deepsa (~deepsa@117.212.21.91) Quit (Remote host closed the connection)
[11:17] * deepsa (~deepsa@117.212.21.91) has joined #ceph
[11:21] * maelfius (~mdrnstm@55.sub-70-197-144.myvzw.com) Quit (Quit: Leaving.)
[11:34] * miroslavk (~miroslavk@63.133.196.10) Quit (Quit: Leaving.)
[12:09] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:30] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[12:42] <lotia> anyone runnning btrfs in production?
[12:47] <tziOm> lotia: no one
[12:47] <lotia> tziOm: thanks. so currently XFS is the way to go?
[12:47] <todin> lotia: we do, have semi-production systems
[12:48] <tziOm> I dunno, really
[12:48] <tziOm> has crashed for me, if that's what you ask
[12:48] <tziOm> but people are even running windows in production, so its really up to you
[12:49] <todin> ceph on btrfs is stable for me since dec 2011
[13:00] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:01] * benpol (~benp@garage.reed.edu) has joined #ceph
[13:03] <lotia> todin tziOm are either of you using ceph to provide block storage service for openstack. and if so, anyone running centos guests?
[13:08] * gaveen (~gaveen@112.134.113.174) Quit (Ping timeout: 480 seconds)
[13:10] <zynzel> any idea: 2012-10-16 11:04:50.225681 mon.0 [INF] pgmap v991: 960 pgs: 960 active+clean; 503 GB data, 15754 MB used, 4685 MB / 20440 MB avail
[13:10] <zynzel> after rados load-gen ;) 503GB data, total storage 20GB :)
[13:12] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[13:12] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[13:18] * gaveen (~gaveen@112.134.113.19) has joined #ceph
[13:22] <jamespage> lotia, I've been testing ceph block storage with openstack (albeit not with centos guests) - what's your question?
[13:25] * Cube2 (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[13:33] <lotia> jamespage: i'm curious about whether the guests need recent kernels or if they don't care what sort of block device they've been handed
[13:33] <jamespage> lotia, the answer to that is that it depends
[13:33] * mgalkiewicz (~mgalkiewi@staticline-31-182-149-180.toya.net.pl) has joined #ceph
[13:34] <jamespage> lotia, ceph gets access in different ways depending on which bit of openstack we are talking about
[13:34] <lotia> jamespage: do elaborate please
[13:34] <jamespage> lotia, glance uses python-ceph which wraps ceph libraries to talk to ceph
[13:34] <lotia> so my use case here is the block device the guests use.
[13:35] <jamespage> lotia, using libvirt?
[13:35] <lotia> jamespage: yes.
[13:35] <jamespage> lotia, OK - so in fact the kernel is not used at all for libvirt - libvirt uses librbd directly
[13:36] <jamespage> I am of course making the assumption that the libvirt you are using has been linked/built against librbd....
[13:36] <jamespage> (it has in the one that I use)
[13:36] <lotia> jamespage: so the guest machines need know nothing. the hosts will likely have shinier kernels.
[13:37] <lotia> so your compute nodes run what distro?
[13:37] <jamespage> lotia, guests know nothing about where the storage is coming from - thats abstracted by libvirt
[13:37] <jamespage> lotia, Ubuntu 12.04/12.10
[13:37] <jamespage> libvirt for those two releases is built with librbd
[13:37] <lotia> jamespage: thanks
[13:38] <jamespage> lotia, like I said the kernel is irrelevant; I've been testing using a kernel which does not actually have the rbd module....
[13:38] * gaveen (~gaveen@112.134.113.19) Quit (Quit: Leaving)
[13:39] <lotia> happy enough with performance? also what does your ceph cluster look like HW wise?
[13:39] <jamespage> lotia, lol
[13:40] <jamespage> lotia, ATM I'm actually testing stuff virtually (so openstack on top of openstack)
[13:40] <lotia> aha. i was wondering if you were going to tell me you work for inktank!
[13:40] <jamespage> lotia, nope - Canonical
[13:41] <lotia> and i'm guessing with the openstack repo enabled I get the nicest shiniest versions available (folsom)
[13:41] <jamespage> lotia, so ubuntu publishes openstack folsom for Ubuntu 12.10 (released this week) and for 12.04 using the cloud-archive
[13:42] <lotia> where nicest and shiniest need not be mutually exclusive.
[13:42] <jamespage> HOWEVER...
[13:42] <jamespage> that does not include a new version of ceph in the cloud archive
[13:42] <jamespage> yet
[13:42] <jamespage> :-)
[13:42] <lotia> so would I have to build from scratch?
[13:42] * tightwork (~tightwork@142.196.239.240) has joined #ceph
[13:42] <lotia> or can i enable the openstack repo?
[13:43] <jamespage> lotia, for ceph?
[13:43] <lotia> i'm guessing while a dependency it gets pulled from elsewhere?
[13:44] <jamespage> lotia, ceph is in the Ubuntu 12.04 main archive - but it's an older version
[13:44] <lotia> any ceph ppas then?
[13:44] <lotia> i'm okay to package, but would prefer not to replicate work.
[13:44] <jamespage> lotia, yep - ppa:ceph-ubuntu/backports
[13:44] <lotia> standing on the shoulders of giants and all that good stuff.
[13:44] <lotia> fantastic.
[13:45] <jamespage> lotia, thats a no-change backport of whats in Ubuntu 12.10 for 12.04
[13:45] <lotia> i may as well start with 12.10 unless there are gotchas
[13:46] <jamespage> lotia, are you using cephx?
[13:46] <lotia> jamespage: currently researching setting up a ceph cluster. so using not much of anything.
[13:46] <jamespage> lotia, ack - understand
[13:49] <lotia> jamespage: XFS for OSD devices or btrfs?
[13:50] <jamespage> lotia, current recommendation is XFS
[13:51] <jamespage> I've also been doing some testing using ext4 - but that's mainly due to the bits of ceph I'm using to bootstrap/configure my cluster
[13:55] <todin> lotia: I run mostly ubuntu guests in kvm.
[13:58] * tightwork (~tightwork@142.196.239.240) Quit (Ping timeout: 480 seconds)
[14:13] <lotia> todin: thanks. good to know. i'll likely be running centos/rhel guests.
[14:13] <lotia> http://www.sebastien-han.fr/blog/2012/06/10/introducing-ceph-to-openstack/ is this article representative of the current state of things with regards to ceph or has folsom provided new capability. LMK if this is OT
[14:25] <todin> lotia: I think the blog is quite good.
[14:28] <Leseb> lotia: folsom brought some new stuff like boot from volume
[14:30] * aliguori (~anthony@cpe-70-123-146-246.austin.res.rr.com) has joined #ceph
[14:31] <zynzel> why does ceph try to write new data even if all nodes are full? (which crashes all nodes in the cluster...) ;)
[14:32] * deepsa (~deepsa@117.212.21.91) Quit (Quit: Computer has gone to sleep.)
[14:32] <zynzel> not really stable :)
[14:32] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[14:32] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has left #ceph
[14:38] * aaron is now known as Guest1916
[14:45] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[15:13] * dty (~derek@testproxy.umiacs.umd.edu) has joined #ceph
[15:14] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[15:15] <zynzel> if anybody have the same problem: http://tracker.newdream.net/issues/2011
[15:16] * rlr219 (43c87e04@ircip1.mibbit.com) has joined #ceph
[15:17] <rlr219> good morning folks. Had a ceph-osd crash and when I try to restart it I get the following error: filestore(/home/cephbrick/ceph/osd) error (2) No such file or directory not handled on operation 17 (5023 5166.1.0, or op 0, counting from 0)
[15:18] <rlr219> Don't really see a way to fix it. Ideas??
[15:21] <slang> rlr219: morning!
[15:21] <slang> rlr219: does that path /home/cephbrick/ceph/osd exist on the node where the osd crashed?
[15:25] <rlr219> yes.
[15:26] <rlr219> and permissions are correct as well.
[15:27] <slang> rlr219: listing the directory should show this:
[15:27] <slang> ceph_fsid current fsid keyring magic ready store_version whoami
[15:30] <slang> rlr219: do you see those files in the directory?
[15:30] <rlr219> it does, as well as 2 snap_xxxxxxx files
[15:30] <slang> rlr219: ok
[15:34] <lotia> q
[15:36] <slang> rlr219: could you try restarting the osd with strace?
[15:36] <slang> strace -f ceph-osd -i <osd num> -c <ceph config>
[15:36] <slang> rlr219: that should tell us where the ENOENT happens
[15:37] <rlr219> sure. wait one
[15:38] <joao> --debug-osd 20 might prove useful too
[15:43] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[15:50] <rlr219> SLANG: the output (what i could get) is here: http://pastebin.com/u6eKhng7
[15:57] <slang> rlr219: ok it looks like its failing to open an object that it expects to be there
[15:58] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) has joined #ceph
[15:58] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:59] <slang> rlr219: I'm not sure why - I think sjust or joshd are going to be the right people to point you at
[16:00] <slang> rlr219: they should be around in a bit
[16:00] <slang> rlr219: could you try joao's suggestion and try to start the osd with --debug-osd 20?
[16:01] <joao> while you're at it, --debug-filestore 20 and --debug-journal 20
[16:13] <rlr219> OK.
[16:26] * deepsa (~deepsa@117.212.21.91) has joined #ceph
[16:30] * NaioN (stefan@andor.naion.nl) Quit (Quit: Changing server)
[16:33] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[16:37] * mgalkiewicz (~mgalkiewi@staticline-31-182-149-180.toya.net.pl) Quit (Ping timeout: 480 seconds)
[16:37] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[16:38] * maelfius (~mdrnstm@206.sub-70-197-141.myvzw.com) Quit (Ping timeout: 480 seconds)
[16:47] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) has joined #ceph
[16:50] * joao (~JL@89.181.147.186) Quit (Quit: Leaving)
[16:53] <rlr219> Hi, when I run the --debug I don't seem to get any output, so am I running it wrong or does it put the output somewhere?
[16:54] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) has joined #ceph
[16:57] * joao (~JL@89.181.147.186) has joined #ceph
[17:01] * sagelap (~sage@76.89.177.113) Quit (Ping timeout: 480 seconds)
[17:02] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[17:05] <rlr219> in the ceph.log I am seeing this now: ENOENT on clone suggests osd bug
[17:07] <lotia> anyone running 10Gbit cards in their cluster? if so, which nodes can best use them?
[17:07] <slang> rlr219: it writes the output to the osd log
[17:09] <slang> rlr219: can you post the full osd log to pastebin?
[17:10] <rlr219> sure. wait one please.
[17:11] <lotia> looking at http://ceph.com/docs/master/architecture/ suggests that the osds are the best candidates
[17:11] <slang> lotia: the osds handle the bulk of the network i/o, yep
[17:12] <rlr219> slang: the log is here: http://pastebin.com/S0a65XUm
[17:12] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Remote host closed the connection)
[17:13] <lotia> slang: thanks
[17:13] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) has joined #ceph
[17:14] * synapsr (~synapsr@12.180.144.3) Quit (Remote host closed the connection)
[17:15] <slang> rlr219: looks like you just recently upgraded to 0.48.1?
[17:16] <rlr219> No. That was the version we installed
[17:16] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[17:16] <rlr219> this is our first cluster
[17:17] <rlr219> we are running it on Ubuntu precise
[17:18] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[17:21] <rlr219> Now that I think about it, we may have installed 0.48.0 and inadvertently upgraded to 0.48.1
[17:21] <rlr219> I am checking my nodes.
[17:24] * synapsr (~synapsr@12.180.144.3) has joined #ceph
[17:25] * synapsr (~synapsr@12.180.144.3) Quit (Remote host closed the connection)
[17:26] * miroslavk (~miroslavk@63.133.196.10) has joined #ceph
[17:26] * sagelap (~sage@82.sub-70-197-143.myvzw.com) has joined #ceph
[17:28] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:29] <slang> rlr219: maybe it follows the convertfs path no matter what
[17:31] * tryggvil (~tryggvil@62-50-223-253.client.stsn.net) Quit (Quit: tryggvil)
[17:34] <lotia> bare minimum setup. is it at all advisable to have a single large machine with osd, mds and mon on it?
[17:34] <lotia> if i were running it as the backing store for openstack
[17:37] * verwilst (~verwilst@d5152D6B9.static.telenet.be) Quit (Quit: Ex-Chat)
[17:41] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[17:43] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit ()
[17:43] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[17:43] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit ()
[17:46] <jamespage> lotia, that would work for testing; but obviously no resilience
[17:48] <lotia> jamespage: i'm guessing it would likely be quite slow?
[17:48] <jamespage> lotia, depends how big 'single large machine' is
[17:48] <lotia> or would you be able to get speed by running multiple osds
[17:49] <jamespage> lotia, typically you would run an osd per disk; that way you won't get io contention
[17:49] <jamespage> so lots of disks would certainly help
[17:49] <jamespage> lotia, having a separate disk for the journal would also help (esp if SSD :-))
[17:50] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[17:52] <lotia> jamespage: that would still be OSD?
[17:53] <jamespage> lotia, you have a journal per OSD
[17:53] <jamespage> and you can run multiple OSD's on the same box
[17:53] <lotia> and the OSD journal for multiple osds can be on the same block device?
[17:54] <lotia> so a single ssd for journals?
[17:54] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit ()
[17:55] * aliguori (~anthony@cpe-70-123-146-246.austin.res.rr.com) Quit (Remote host closed the connection)
[17:55] <jamespage> lotia, yes
[17:57] * miroslavk (~miroslavk@63.133.196.10) Quit (Quit: Leaving.)
[17:58] <lotia> anyone experimented with running ceph-mon on the compute nodes in an openstack cluster?
[17:59] * sagelap (~sage@82.sub-70-197-143.myvzw.com) Quit (Ping timeout: 480 seconds)
[17:59] * sagelap (~sage@38.122.20.226) has joined #ceph
[18:02] * vata (~vata@2607:fad8:4:0:5995:9bca:be1f:ce1d) has joined #ceph
[18:04] <rlr219> Sorry slang, got busy. not sure I follow your last statement.
[18:05] * jlogan1 (~Thunderbi@2600:c00:3010:1:4fe:8250:70f9:cd1c) has joined #ceph
[18:10] * synapsr (~synapsr@63.133.198.91) has joined #ceph
[18:10] <sagewk> slang: good morning!
[18:11] <slang> morning!
[18:11] <sagewk> slang: do you have a minute to summarize the current status of the samba patch?
[18:11] <slang> sagewk: yep
[18:12] * Tv_ (~tv@38.122.20.226) has joined #ceph
[18:14] <slang> sagewk: the patches you sent me needed to be fixed up a bit to compile/work with the latest samba sources, I did some minor cleanup too. The testing I've done so far is with smbclient, basic file create, listing, etc.
[18:19] <sagewk> any sense of how much is needed to get them in working condition?
[18:20] <sagewk> for some reasonable definition of working?
[18:21] <slang> sagewk: the patch is really just a single source file (the ceph module) and the changes to the Makefile.in to include ceph during build
[18:21] <sagewk> elder: there?
[18:21] <elder> Yes.
[18:22] <sagewk> elder: did you reproduce #3291?
[18:22] <elder> Just a minute.
[18:22] <sagewk> the bio bug from tziom
[18:22] * scalability-junk (~stp@188-193-205-115-dynip.superkabel.de) Quit (Quit: Leaving)
[18:23] <slang> sagewk: you can also build it separately as a shared object and drop it in /usr/local/samba/lib/vfs/ceph.so
[18:23] <elder> No I did not. I'll try it now.
[18:23] <slang> sagewk: to add it to existing samba installs
[18:24] <slang> sagewk: I would say that its "working" now, but mostly untested
[18:24] <slang> and zero doc on how to set it up
[18:25] <sagewk> elder: i think we should try to nail that one down (dup of 2937 too i think.. and possibly what i was seeing under uml a while back)
[18:25] <elder> You think it will reproduce on UML too? Good, I'll try it there first.
[18:25] <sagewk> slang: ok. is there a smbclient-based or -like test suite we can run against it?
[18:25] <elder> I'm trying to identify the kernel version right now.
[18:25] <Tv_> yehudasa: https://github.com/openstack/swift/blob/master/bin/swift-bench
[18:26] <sagewk> elder: maybe. a while back i was making it crash under uml doing dd, but uml would crash in useless ways without a proper stack trace etc...
[18:26] <elder> In any case, I'll try it there first, much quicker.
[18:27] <sagewk> elder: in any case, the bio_clone thing was low-priority as a memory leak, but if there is a larger bug there i think we should prioritize
[18:27] <sagewk> elder: sounds good
[18:27] <elder> Fine with me.
[18:27] <tziOm> sagewk, so you identified the bug?
[18:28] <elder> tziOm can you tell me the specific kernel version (or a version)?
[18:28] <tziOm> 3.6.1
[18:28] <sagewk> tziom: only insofar as we already knew that the code was potentially problematic ..
[18:28] <sagewk> elder: thanks!
[18:28] <elder> Perfect. Thank you.
[18:29] <rlr219> slang: after further review, the servers running my mons and mds's are running version 0.48, i have one server running an osd under 0.48, 5 osds running 0.48.1 and this osd that is crashing is running 0.48.2
[18:30] <Tv_> yehudasa: e.g. http://www.zmanda.com/blogs/?p=641
[18:31] <rlr219> slang: they were brought up over a period of several weeks and I missed the version changes when i did later installs. :(
[18:31] <Tv_> yehudasa: http://docs.openstack.org/developer/swift/howto_installmultinode.html#adding-a-proxy-server
[18:31] <yehudasa> Tv_: thanks
[18:31] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[18:36] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:36] <slang> sagewk: I don't see any tests that use smbclient in there
[18:36] <sagewk> slang: is there some sort of test suite we can reuse?
[18:36] <sagewk> slang: or do we need to build something ourselves?
[18:37] <slang> sagewk: yep there's a test suite, not sure how useful it will be for testing the ceph module..
[18:37] * loicd (~loic@12.180.144.3) Quit (Ping timeout: 480 seconds)
[18:40] <slang> sagewk: there are a bunch of torture tests in their build_farm suite, maybe we can run those
[18:40] <slang> sagewk: (unrelated) still seeing some problems with chmod
[18:41] <sagewk> slang: the end result we want is teuthology starting up samba reexporting ceph, and then a task that runs a bunch of tests on top. either directly against samba, or also mounting via linux cifs (blech) or something else...
[18:41] <slang> sagewk: that basic test works, but if I chmod back to 644, the mode isn't getting set on the client inode
[18:41] <elder> tziOm, for my information, can you tell me any more specifics about the rbd image you reproduce the hang with?
[18:42] <elder> How big it is, for example?
[18:42] <tziOm> 1 gig
[18:42] <slang> sagewk: ok
[18:42] <elder> And how many osds?
[18:42] <sagewk> slang: hmm can you push a branch?
[18:42] <slang> sagewk: it looks like the AUTH_EXCL cap is getting dropped
[18:43] * joshd (~joshd@63.133.198.91) has joined #ceph
[18:43] <tziOm> elder, 4 osds
[18:43] <tziOm> 3 mons
[18:43] <tziOm> 1 mds
[18:43] <elder> OK, thanks.
[18:43] * cblack101 (86868b46@ircip3.mibbit.com) has joined #ceph
[18:45] * scalability-junk (~stp@188-193-208-44-dynip.superkabel.de) has joined #ceph
[18:47] <gregaf> slang: it's perfectly fine to drop capabilities as long as they're reacquired when needed...
[18:47] <slang> sagewk: when the client gets the setattr reply from the mds (or even a getattr reply), the (issued & CEPH_CAP_AUTH_EXCL) == 0 check fails
[18:48] <slang> gregaf: sure
[18:48] <gregaf> I'm missing context here, aren't I
[18:48] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[18:48] <sagewk> i need to see a full log to make sense of the situation, i think
[18:50] <slang> gregaf: the test is create/chmod 400/append/chmod 644
[18:50] <gregaf> oh, that's an odd turn of code
[18:50] <slang> gregaf: we get a failure (correctly) now on the append
[18:50] <slang> but the chmod 644 doesn't actually do its job
[18:51] <slang> sagewk: I'll add a log on that ticket
[18:51] <gregaf> sagewk: is there some reason the client needs AUTH_EXCL in order to update mode/uid/gid provided by the MDS?
[18:51] <gregaf> or is the local cache just not read from if it doesn't have that cap, so they're deliberately not updated
[18:53] * rread (c0373729@ircip2.mibbit.com) has joined #ceph
[18:53] * loicd (~loic@63.133.198.91) has joined #ceph
[18:53] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Quit: Leaving.)
[18:55] * rread (c0373729@ircip2.mibbit.com) has left #ceph
[18:57] <slang> gregaf: we always fill in the getattr results from the inode
[18:58] * rread (~rread@c-98-234-218-55.hsd1.ca.comcast.net) has joined #ceph
[18:58] * loicd1 (~loic@z2-8.pokersource.info) has joined #ceph
[18:58] * loicd (~loic@63.133.198.91) Quit (Read error: Connection reset by peer)
[19:01] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[19:03] <sjust> rlr219: are you still around?
[19:04] <rlr219> yes
[19:04] <sjust> can you restart the osd with debug filestore = 20 and post the log?
[19:04] <sjust> that should tell us what we need to know about the crash
[19:05] <rlr219> sure give me a minute
[19:10] <rlr219> sjust: its here http://pastebin.com/X3xdifxr
[19:11] * synapsr (~synapsr@63.133.198.91) Quit (Remote host closed the connection)
[19:13] <rlr219> sjust: as I posted earlier, I have slightly different versions of ceph running throughout my cluster. they are all 0.48 based. is that going to cause major issues?
[19:17] * vata (~vata@2607:fad8:4:0:5995:9bca:be1f:ce1d) Quit (Ping timeout: 480 seconds)
[19:19] <slang> sagewk: I posted a less confusing log
[19:19] <sagewk> slang; cool
[19:20] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[19:24] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[19:30] <gregaf> slang: yeah, I read that backwards since I was in a hurry :/
[19:31] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Ping timeout: 480 seconds)
[19:31] <slang> gregaf: ?pleh siht seod
[19:31] <gregaf> not really, no :p
[19:31] <sjust> rlr219: did you at any point downgrade the osd?
[19:32] * The_Bishop (~bishop@e179000062.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[19:32] <slang> :-)
[19:32] <rlr219> sjust: no. this is the original install of ceph on that server.
[19:33] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[19:33] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[19:34] * sagelap1 (~sage@38.122.20.226) has joined #ceph
[19:34] * sagelap (~sage@38.122.20.226) Quit (Quit: Leaving.)
[19:35] <elder> tziOm, I have reproduced your problem and will start trying to track it to root cause.
[19:36] <sagewk> slang: re 3301: the client appears to be behaving correctly, but it isn't getting back the updated mode after it sends the setattr request. can you attach an mds log to go with it? debug ms = 1 debug mds = 20
[19:37] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) Quit (Quit: Leaving.)
[19:37] <slang> sagewk: I think it is getting the updated mode
[19:37] * sjust (~sam@2607:f298:a:607:baac:6fff:fe83:5a02) has joined #ceph
[19:37] <slang> (gdb) printf "%o\n", st->mode
[19:37] <slang> 100644
[19:37] <sagewk> slang: i see the 10:42:06.618018 still has mode=100400
[19:38] * loicd1 (~loic@z2-8.pokersource.info) Quit (Quit: Leaving.)
[19:38] <slang> sagewk: that's of the inode though, not the inodestat
[19:38] <sagewk> pretty sure the add_update_inode should have copied it into the inode, tho, since we aren't olding the AUTH_EXCL cap
[19:38] <sagewk> aren't holding
[19:39] <slang> sagewk: I think that's the bug :-)
[19:39] <slang> sagewk: generating the mds log now...
[19:40] <sagewk> we flush the dirty auth caps a bit earlier, at 10:41:57.405663 or so
[19:40] <tziOm> elder - thanks
[19:40] <slang> sagewk: I don't see where we print out the InodeStat struct to the log
[19:41] <sagewk> we probably don't
[19:42] <slang> *nods*
[19:43] <sagewk> but my reading of the code is that in->mode = st->mode should have run, so my guess is the mds
[19:43] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[19:43] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[19:44] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[19:44] <slang> sagewk: that gdb print above is of the InodeStat in add_update_inode
[19:44] <slang> during the ls -la
[19:46] <sagewk> huh. print out ccap_string(issued) in that block to see if that's why it's not taking that path
[19:46] <slang> sagewk: too late :-)
[19:49] <slang> sagewk: uploaded logs, I wasn't able to reproduce it the exact same way
[19:49] <slang> sagewk: i'll try to get ccap_string(issued)
[19:53] <sagewk> slang: looks suspicious: client_request(client.4110:8 setattr mode=064330242 #10000000000)
[19:53] * Cube2 (~Cube@12.248.40.138) has joined #ceph
[19:53] * loicd (~loic@63.133.198.91) has joined #ceph
[19:54] * deepsa_ (~deepsa@117.199.126.138) has joined #ceph
[19:54] <slang> sagewk: yeah I couldn't figure out what kind of output format that is
[19:55] <sagewk> its supposed to be octal. something is wrong.
[19:55] <slang> sagewk: I think its just the way its getting printed though
[19:55] <slang> sagewk: is it because its packed?
[19:55] <slang> since its printing straight from the request structure
[19:56] * deepsa (~deepsa@117.212.21.91) Quit (Ping timeout: 480 seconds)
[19:56] * deepsa_ is now known as deepsa
[19:58] * Cube1 (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[19:59] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[20:00] * joshd (~joshd@63.133.198.91) Quit (Ping timeout: 480 seconds)
[20:01] * deepsa (~deepsa@117.199.126.138) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[20:04] * dmick (~dmick@2607:f298:a:607:647d:1f2d:7bab:35f6) has joined #ceph
[20:08] <sagewk> yeah, formatting problem.
[20:08] <sagewk> ios::oct vs std::oct..
[20:09] <slang> sagewk: what's ios?
[20:09] <slang> ceph runs on iphone?
[20:09] * blufor (~blufor@adm-1.candycloud.eu) Quit (Ping timeout: 480 seconds)
[20:09] <sagewk> :)
[20:09] <elder> That's your next project.
[20:09] <elder> We could truly rule the world.
[20:10] * nhm_ (~nhm@174-20-101-163.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[20:10] <slang> ah std::ios::oct vs. std::oct
[20:12] <sagewk> yeah
[20:14] * lofejndif (~lsqavnbok@659AAA9ZD.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:14] <sagewk> slang: ah, i see the problem.
[20:15] <sagewk> slang: update_inode_file_bits wants to know what issued *was*
[20:15] <slang> sagewk: the ccap_string(issued) is always: pAxLsXsxFsxcrwb
[20:15] <sagewk> and the caller is adding in newly issued caps before calling
[20:15] <sagewk> i think it just needs to be moved down after that block
[20:17] <slang> sagewk: where does update_inode_file_bits get the mode?
[20:18] <sagewk> oh sorry, i mean add_update_inode
[20:18] <sagewk> i'll post a patch in a second..
[20:19] * The_Bishop (~bishop@2001:470:50b6:0:c912:edbf:8cb3:9603) has joined #ceph
[20:20] * aliguori (~anthony@32.97.110.59) has joined #ceph
[20:22] <sagewk> slang: wip-3301
[20:23] * loicd (~loic@63.133.198.91) has joined #ceph
[20:24] <slang> sagewk: could we just bitwise-or implemented onto issued after update_inode_file_bits?
[20:25] <slang> being able to call ccap_string() from gdb makes this a lot easier for me to grok
[20:26] * loicd (~loic@63.133.198.91) Quit ()
[20:26] <gregaf> sagewk: that was my thought as well but looking at your diff I don't see where the caps have been updated in between there?
[20:26] <sagewk> oh, hold up.
[20:28] <sagewk> nm, i misread something.
[20:29] * cowbell (~sean@adsl-70-231-134-186.dsl.snfc21.sbcglobal.net) has joined #ceph
[20:30] * loicd (~loic@63.133.198.91) has joined #ceph
[20:30] * MikeMcClurg (~mike@cpc1-oxfd13-0-0-cust716.4-3.cable.virginmedia.com) has joined #ceph
[20:31] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[20:32] <slang> seems to work
[20:32] <slang> sagewk: can we instead move the issued |= implemented; down to the end of the block after update_inode_file_bits?
[20:33] <sagewk> maybe
[20:33] <sagewk> i'm trying to figure out why issued includes Ax in the first place
[20:34] <slang> sagewk: it definitely gets it from implemented
[20:34] <slang> sagewk: before the |= implemented; it doesn't have Ax
[20:35] <sagewk> i think that' sthe bug
[20:36] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[20:37] * Cube2 (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[20:39] <sagewk> yeah
[20:42] <rlr219> sjust: wondering if you found anything?
[20:42] * blufor (~blufor@adm-1.candycloud.eu) has joined #ceph
[20:43] <sagewk> slang: ok, re-pushed wip-3301.
[20:47] * loicd (~loic@63.133.198.91) Quit (Quit: Leaving.)
[20:47] * joshd (~joshd@63.133.198.91) has joined #ceph
[20:50] <slang> sagewk: why are the caps getting revoked in this scenario?
[20:55] * nhm (~nh@174-20-101-163.mpls.qwest.net) has joined #ceph
[20:58] * cowbell (~sean@adsl-70-231-134-186.dsl.snfc21.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[21:02] * joshd (~joshd@63.133.198.91) Quit (Ping timeout: 480 seconds)
[21:03] * joshd (~joshd@63.133.198.91) has joined #ceph
[21:05] * cowbell (~sean@adsl-70-231-145-252.dsl.snfc21.sbcglobal.net) has joined #ceph
[21:10] * cowbell (~sean@adsl-70-231-145-252.dsl.snfc21.sbcglobal.net) Quit (Read error: Connection reset by peer)
[21:12] <sjust> rlr219: sorry, lunch, looking again
[21:14] <nhm> mikeryan: trying out tmux instead of screen. :)
[21:16] <elder> Is tmux a better alternative to screen?
[21:17] <nhm> elder: that's what mike ryan claims. So far I like the green status bar at the bottom.
[21:17] <elder> Ooh! Colors!
[21:17] <nhm> though it does clash with irssi's blue bar.
[21:18] * Guest1916 is now known as AaronSchulz
[21:19] <slang> the only thing I don't like about tmux is its poor handling when a lot of output is written to a pane
[21:20] <slang> it doesn't send keys (like ctrl-c) to the process right away - I usually have to detach and reattach from the tmux session
[21:20] <nhm> hrm, that sounds annoying.
[21:20] * amatter (~amatter@209.63.136.133) Quit (Ping timeout: 480 seconds)
[21:20] * Ryan_Lane (~Adium@63.133.198.91) has joined #ceph
[21:20] <slang> nhm: it is kind of
[21:21] <rlr219> sjust: on our cluster we have 2 unfound items. Let me preface this by saying that on Sunday, our cabinet that all these servers are in lost power.
[21:21] <sagewk> slang: they're not
[21:21] <sagewk> slang: they're getting released
[21:21] <slang> oh
[21:21] <sagewk> slang: that was part of the problem.. why implemented wasn't getting updated
[21:22] * Ryan_Lane (~Adium@63.133.198.91) Quit ()
[21:22] <sjust> rlr219: anything in dmesg on that node?
[21:22] <rlr219> Since then we have been in a recovery state. So we have 2 unfound items. we want to know if these unfound are critical or not.
[21:22] <sjust> rlr219: these are btrfs?
[21:23] <sagewk> slang: want to add a quick unit test for this and commit it along with the fix to master?
[21:23] <rlr219> nothing in dmesg, yes they are btrfs
[21:24] * vata (~vata@2607:fad8:4:0:78cb:2567:5dd0:ba4a) has joined #ceph
[21:24] <sjust> rlr219: you can list unfound objects with 'ceph pg <pgid> list_missing'
[21:24] * The_Bishop (~bishop@2001:470:50b6:0:c912:edbf:8cb3:9603) Quit (Ping timeout: 480 seconds)
[21:25] <sjust> where <pgid> is the pg with the missing objects (you should be able to get it from ceph health detail)
[21:25] <rlr219> I did that. is there a way to tell what data was from the oids?
[21:25] <slang> sagewk: yeah I'll merge wip-test-chmod in there too
[21:25] <sjust> yeah, what are the oids?
[21:26] * loicd (~loic@63.133.198.91) has joined #ceph
[21:26] <rlr219> rb.0.4.000000011e02 & rb.0.4.000000012202
[21:26] <sjust> those are both data blocks
[21:26] * amatter (~amatter@209.63.136.133) has joined #ceph
[21:26] <sjust> so they are 4MB chunks of an rbd image
[21:26] <sjust> the same image, I think
[21:28] <sjust> what is the pg/
[21:28] <sjust> ?
[21:28] <rlr219> 2.87
[21:28] <rlr219> we only have rbd images on the cluster right now.
[21:28] <sjust> yeah
[21:29] <sjust> can you check to see whether the following file exists on the down node?
[21:29] <sjust> /home/cephbrick/ceph/osd/current/4.2_head/DIR_A/DIR_B/DIR_A/rb.0.9.00000000004e__head_C5976ABA__4
[21:30] * scalability-junk (~stp@188-193-208-44-dynip.superkabel.de) Quit (Read error: Connection reset by peer)
[21:30] * scalability-junk (~stp@188-193-208-44-dynip.superkabel.de) has joined #ceph
[21:32] * The_Bishop (~bishop@2001:470:50b6:0:ac8b:ad5b:26dd:5f13) has joined #ceph
[21:33] <rlr219> no, but there are several others that are close. the "head" part has a 2 character "entry"
[21:34] <sjust> sorry?
[21:34] <rlr219> these are present: rb.0.9.00000000004e__31_C5976ABA__4 rb.0.9.00000000004e__5e_C5976ABA__4 rb.0.9.00000000004e__d6_C5976ABA__4 rb.0.9.00000000004e__54_C5976ABA__4 rb.0.9.00000000004e__a9_C5976ABA__4 (if that is relevant)
[21:34] <sjust> ah, those are snapshots
[21:35] <rlr219> ok
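For reference, the on-disk names quoted above break down as <object name>__<snap id or "head">_<hash>__<pool id>, so the "__31", "__5e", etc. variants are the snapshot clones sjust refers to. A hypothetical parser, with the format inferred purely from the examples in this log:

    def parse_object_filename(fn):
        # <object>__<snap or "head">_<hash>__<pool>, as seen in the paths above
        name, rest = fn.split("__", 1)
        snap_and_hash, pool = rest.rsplit("__", 1)
        snap, obj_hash = snap_and_hash.rsplit("_", 1)
        return {"name": name, "snap": snap, "hash": obj_hash, "pool": pool}

    print(parse_object_filename("rb.0.9.00000000004e__head_C5976ABA__4"))
    print(parse_object_filename("rb.0.9.00000000004e__31_C5976ABA__4"))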
[21:35] <sjust> oh, I suspect I know what happened, can you give me the sha1 for the ceph-osd binary/
[21:35] <sjust> ?
[21:35] <sjust> ceph-osd --version
[21:35] <sjust> probably
[21:36] <rlr219> ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
[21:36] <rlr219> is that what you needed?
[21:36] <sjust> thanks
[21:37] * vata (~vata@2607:fad8:4:0:78cb:2567:5dd0:ba4a) Quit (Remote host closed the connection)
[21:38] * loicd (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[21:39] <sjust> did this osd come back up after the loss of power?
[21:40] <rlr219> yes. I have upgraded it since the power failure. It was running 0.48.1, now it is at 0.48.2
[21:41] <sjust> is pg 4.2 healthy?
[21:42] <rlr219> pg 4.2 is stuck active+recovering+degraded+backfill, last acting [6,12,13]
[21:42] <sjust> what is the output of ceph pg 4.2 query?
[21:42] <rlr219> from ceph health detail
[21:45] <rlr219> http://pastebin.com/NJXmY2qB
[21:46] <sjust> are you using a replication level of 2?
[21:46] <rlr219> 3
[21:48] * joshd (~joshd@63.133.198.91) Quit (Ping timeout: 480 seconds)
[21:51] <rlr219> sorry, rep level of 3
[22:01] * scuttlemonkey (~scuttlemo@63.133.198.36) Quit (Quit: This computer has gone to sleep)
[22:04] * oxhead (~oxhead@nom0065903.nomadic.ncsu.edu) has joined #ceph
[22:08] <sjust> rlr219: kernel version?
[22:13] <rlr219> 3.2.0-29-generic
[22:16] <mikeryan> nhm: haven't used irssi in tmux yet, thanks for charting that territory ;)
[22:16] <mikeryan> took a while for me to retrain my fingers to ctrl-b instead of ctrl-a
[22:17] <nhm> this is the first time I've tried tmux, haven't done anything at all with it other than launch irssi.
[22:19] <sjust> rlr219: I don't seem to have enough information to figure out how that crash happened, it seems that the pg 4.2 state on that osd is corrupted
[22:19] <sagewk> slang: on wip-client-perms: I'm pretty sure O_RDONLY == 0.. you actually want to do switch (rflags & O_ACCMODE) { case O_RDONLY: or similar
[22:19] * vata (~vata@2607:fad8:4:0:d0ef:740b:1c51:6a09) has joined #ceph
[22:20] <sjust> rlr219: might be a btrfs problem
[22:20] <sagewk> slang: also, i think we want (mode & (fmode << ..) == fmode) to know we have access to everything we want, not just something we want
[22:21] <slang> *nods*
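A generic sketch of the pitfall sagewk describes (not the actual Client code): O_RDONLY is 0, so comparing raw open flags against it is meaningless; the access mode has to be masked out with O_ACCMODE first.

    import os

    O_ACCMODE = 0o3  # access-mode mask (value 3 on Linux)

    def access_mode(flags):
        mode = flags & O_ACCMODE
        if mode == os.O_RDONLY:   # O_RDONLY == 0, hence the masking above
            return "read-only"
        if mode == os.O_WRONLY:
            return "write-only"
        return "read-write"

    print(access_mode(os.O_RDONLY))                # read-only
    print(access_mode(os.O_WRONLY | os.O_APPEND))  # write-only
    print(access_mode(os.O_RDWR | os.O_CREAT))     # read-write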
[22:21] <sjust> rlr219: our best bet is probably to bring that node back up without pg4.2
[22:21] <slang> sagewk: I committed that fix but haven't pushed it yet
[22:21] <sagewk> cool.
[22:25] <rlr219> so how do we do that?
[22:25] <sjust> rlr219: hang on
[22:28] <sjust> rlr219: you can use ceph-osd -i <osdid> -c <ceph conf location> --mkjournal to blast the journal
[22:28] * slang tests
[22:29] <sjust> then you need to go to the most recent snapshot (I'll explain that shortly) and remove the
[22:29] <sjust> rlr219: one sec
[22:30] <sjust> remove the 4.2_* collections
[22:30] * loicd (~loic@63.133.198.91) has joined #ceph
[22:32] <sjust> as well as the pginfo and pglog files (find <snap directory> -name '*u4.2_*')
[22:33] <sjust> e.g.
[22:33] <sjust> ./pglog\u0.0__0_103B076E__none
[22:33] <sjust> ./pginfo\u0.0__0_DAE19212__none
[22:33] <sjust> for pg 0.0
[22:34] * miroslavk (~miroslavk@63.133.198.36) has joined #ceph
[22:35] <sjust> can you list the contents of the down osd data directory?
[22:35] <sjust> there should be a directory "current" as well as some directories that look like "snap_*"
[22:35] <rlr219> ceph_fsid current current.remove.me.846930886 fsid keyring magic ready snap_50235072 snap_50235136 store_version trace.txt whoami
[22:36] <sjust> you'll want to do what I described in snap_50235136
[22:37] <sjust> so there should be directories snap_50235136/4.2_*
[22:37] <sjust> find snap_50235136/meta -name '*u4.2_*'
[22:37] <sjust> that should turn up two files, the log and info for pg 4.2
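(Pulling the steps sjust describes above into one sketch, using the directory, snapshot, and pg names from this log; the osd data path, snapshot name, and pg id will differ on other clusters:)

    # inside the down osd's data directory, in the most recent snapshot
    cd /home/cephbrick/ceph/osd/snap_50235136
    # remove the pg 4.2 collections
    rm -rf 4.2_*
    # remove the pg 4.2 pginfo and pglog entries from the meta collection
    find meta -name '*u4.2_*' -exec rm -f {} +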
[22:39] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:41] <rlr219> do i use rm -rf or btrfs commands?
[22:43] * slang (~slang@ace.ops.newdream.net) Quit (Quit: slang)
[22:43] * scuttlemonkey (~scuttlemo@63.133.198.36) has joined #ceph
[22:43] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[22:44] <joao> the btrfs tool is only required to remove subvolume/snapshots
[22:45] <joao> if you're removing directories within a snapshot/subvol, you should use 'rm' instead
[22:45] <sjust> they are just directories, so just rm -rf
[22:45] <elder> biab
[22:45] <sjust> you aren't removing the snap directories, you are removing the 4.2 collection from within the snap directory
[22:45] <sjust> *collections
[22:47] <rlr219> ok
[22:47] <rlr219> I have found them with the find command. I used rm -rf snap_50235136/meta/DIR_2/pginfo\u4.2__0_DAE06C32__none
[22:48] <rlr219> and did it again for the pglog file. But when I re-run the find command they still show up
[22:51] * LarsFronius (~LarsFroni@2a02:8108:3c0:79:9985:29ae:9b49:7247) has joined #ceph
[22:52] * Ryan_Lane (~Adium@63.133.198.91) has joined #ceph
[22:53] * loicd (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[22:55] * Cube (~cube@12.248.40.138) Quit (Quit: Leaving.)
[22:55] <sagewk> joao: there are several coverity issues in the mon that nobody has looked at.. can you take a look?
[22:55] * Cube (~cube@12.248.40.138) has joined #ceph
[22:56] <joao> sagewk, sure
[22:57] <sagewk> joao: thanks!
[23:01] * loicd (~loic@63.133.198.91) has joined #ceph
[23:01] * johnl (~johnl@2a02:1348:14c:1720:a18b:3b9f:2c7:97a9) Quit (Remote host closed the connection)
[23:02] * johnl (~johnl@2a02:1348:14c:1720:c83c:98a3:cc38:a950) has joined #ceph
[23:03] * aliguori (~anthony@32.97.110.59) Quit (Ping timeout: 480 seconds)
[23:03] * MikeMcClurg (~mike@cpc1-oxfd13-0-0-cust716.4-3.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[23:07] * cowbell (~sean@70.231.129.172) has joined #ceph
[23:07] <sjust> under snap_50235136/
[23:08] <sjust> ?
[23:08] <sjust> they should still be in the current and the other snap directory
[23:08] <sjust> but that doesn't matter
[23:09] <rlr219> ok. also removed the 4.2_* in that snap directory
[23:09] * loicd (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[23:09] <sjust> ok
[23:10] <sjust> now you need to blast the journal with ceph-osd -i <osdid> -c <ceph conf location> --mkjournal
[23:10] <sjust> after that, with luck, the osd should come up
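(With the osd id that appears a few lines below and assuming the default /etc/ceph/ceph.conf path, the journal rebuild and restart would look roughly like:)

    ceph-osd -i 10 -c /etc/ceph/ceph.conf --mkjournal   # recreate (blast) the journal
    service ceph start osd                              # then start the osd as usual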
[23:10] <rlr219> ok
[23:10] * loicd (~loic@z2-8.pokersource.info) has joined #ceph
[23:11] <rlr219> no joy
[23:12] <sjust> what did it do?
[23:12] <sjust> you would then need to run the ceph-osd daemon as normal
[23:12] <rlr219> I tried that with service ceph start osd
[23:13] <rlr219> but it didn't start
[23:13] <sjust> ok, did you get useful log output?
[23:14] <rlr219> http://pastebin.com/imRc9qci
[23:15] <rlr219> i only copy and pasted the last part
[23:15] <rlr219> if you need more let me know
[23:15] <sjust> hmm, the mkjournal didn't take
[23:16] <sjust> what output did you get from the ceph-osd -i <osdid> -c <ceph conf location> --mkjournal
[23:16] <sjust> ?
[23:18] * loicd (~loic@z2-8.pokersource.info) Quit (Quit: Leaving.)
[23:19] <rlr219> 7fdc60c4a780 -1 created new journal /var/lib/ceph/osd/ceph-10/journal for object store /home/cephbrick/ceph/osd
[23:19] <sjust> ceph-10 is the down osd?
[23:19] <rlr219> yes
[23:21] <sjust> add debug journal = 20 and try restarting the osd with service ceph start osd
[23:21] <sjust> and debug osd = 20
[23:21] <rlr219> in the ceph.conf file?
[23:21] <sjust> yeah
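(That is, something along these lines in ceph.conf; putting the settings under the affected osd's own section is one reasonable place for them:)

    [osd.10]
        debug osd = 20
        debug journal = 20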
[23:23] * Ryan_Lane (~Adium@63.133.198.91) Quit (Quit: Leaving.)
[23:25] <rlr219> ouput is here: http://pastebin.com/VVX23Ea7
[23:26] <sjust> the journal is on a file, right?
[23:26] <sjust> can you post your ceph.conf?
[23:27] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) Quit (Remote host closed the connection)
[23:27] * AaronSchulz wonders if http://tracker.newdream.net/issues/3080 is high priority
[23:27] * loicd (~loic@63.133.198.91) has joined #ceph
[23:27] <sjust> I get the feeling that it's a medium term goal
[23:29] <yehudasa> gregaf: I pushed a few trivial fixes to master (coverity); if you want to take a look, that would be nice
[23:30] * loicd1 (~loic@63.133.198.91) has joined #ceph
[23:30] * loicd (~loic@63.133.198.91) Quit ()
[23:30] <rlr219> ceph.conf: http://pastebin.com/JRXQ8sX3 -rw-r--r-- 1 root root 10485760000 Oct 15 22:32 journal
[23:30] <rlr219> That doesn't look like the journal was just created...........
[23:34] <sjust> it wouldn't have been just created, but the header would have been redone
[23:34] <sjust> ok, try renaming the journal file out of the way and rerunning the --mkjournal command
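(Using the journal path printed by --mkjournal above, that would be roughly:)

    # move the stale journal file out of the way, then recreate it
    mv /var/lib/ceph/osd/ceph-10/journal /var/lib/ceph/osd/ceph-10/journal.old
    ceph-osd -i 10 -c /etc/ceph/ceph.conf --mkjournal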
[23:36] <rlr219> its running now
[23:36] <rlr219> woot
[23:37] <sjust> cool
[23:37] <sjust> what does ceph health say?
[23:37] * oxhead (~oxhead@nom0065903.nomadic.ncsu.edu) Quit (Quit: oxhead)
[23:37] <rlr219> HEALTH_WARN 1 pgs backfill; 88 pgs peering; 2 pgs recovering; 85 pgs stuck inactive; 283 pgs stuck unclean; recovery 48853/1697299 degraded (2.878%); 2/532764 unfound (0.000%)
[23:38] <sjust> let's give it a few minutes
[23:38] * loicd1 (~loic@63.133.198.91) Quit (Ping timeout: 480 seconds)
[23:38] <rlr219> ok
[23:38] <scuttlemonkey> jamespage: the latest ceph charm work is awesome
[23:39] <scuttlemonkey> just got inspired to play w/ it after Mark's demo here at the OpenStack Summit
[23:39] * rread (~rread@c-98-234-218-55.hsd1.ca.comcast.net) Quit (Quit: rread)
[23:39] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Read error: Connection reset by peer)
[23:41] <rweeks> which Mark?
[23:42] <rlr219> looks to be moving in the right direction: HEALTH_WARN 2 pgs backfill; 5 pgs recovering; 5 pgs stuck unclean; recovery 101797/1697299 degraded (5.998%); 2/532764 unfound (0.000%)
[23:46] * loicd (~loic@63.133.198.91) has joined #ceph
[23:47] * dty (~derek@testproxy.umiacs.umd.edu) Quit (Ping timeout: 480 seconds)
[23:48] * BManojlovic (~steki@212.200.241.182) has joined #ceph
[23:51] * rread (~rread@c-98-234-218-55.hsd1.ca.comcast.net) has joined #ceph
[23:51] * danieagle (~Daniel@177.99.132.23) has joined #ceph
[23:52] * loicd1 (~loic@63.133.198.91) has joined #ceph
[23:52] * loicd (~loic@63.133.198.91) Quit (Read error: Connection reset by peer)
[23:55] <cowbell> does ceph support 3rd party auth (e.g. other than none or cephx)?
[23:56] <elder> Is the default order of an rbd image 22?
[23:56] <gregaf> cowbell: no
[23:56] <dmick> elder: yes
[23:56] <gregaf> elder: I believe so
[23:56] <elder> Thanks.
[23:56] <elder> That's what I thought, just looking for a quick confirmation.
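(For context: an image's order is the log2 of its object size, so the default order of 22 corresponds to 4 MiB objects:)

    echo $((1 << 22))   # 4194304 bytes = 4 MiB per object at order 22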
[23:57] <cowbell> so you would only use ceph on a secured network, then?
[23:57] <gregaf> generally, yes
[23:57] <gregaf> Ceph is designed for use within a single (secured) data center
[23:58] <gregaf> we're slowly improving that security model — there's current work to encrypt the network data as well as to authenticate the initial connection, for instance
[23:58] <gregaf> but right now, Ceph is not a secure system
[23:59] <cowbell> j.k. as it seems like there's a MITM problem with cephx, though ceph-deploy and mkcephfs both use ssh to get around it.
[23:59] <cowbell> thanks. appreciate it.
[23:59] * loicd (~loic@63.133.198.91) has joined #ceph
[23:59] * loicd1 (~loic@63.133.198.91) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.