#ceph IRC Log


IRC Log for 2012-12-13

Timestamps are in GMT/BST.

[0:01] <infernix> doh
[0:01] <infernix> good point :)
[0:08] * drokita (~drokita@ Quit (Quit: Leaving.)
[0:08] * drokita (~drokita@ has joined #ceph
[0:10] * dweazle (~dweazle@tilaa.krul.nu) Quit (Ping timeout: 480 seconds)
[0:11] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[0:14] * dweazle (~dweazle@tilaa.krul.nu) has joined #ceph
[0:16] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[0:20] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[0:35] * wer (~wer@dsl081-246-084.sfo1.dsl.speakeasy.net) Quit (Quit: wer)
[0:46] * l0nk (~alex@ Quit (Quit: Leaving.)
[0:46] * wer (~wer@dsl081-246-084.sfo1.dsl.speakeasy.net) has joined #ceph
[0:57] * maxiz (~pfliu@ has joined #ceph
[1:00] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[1:03] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:06] <elder> Is there anybody out there I can confer with about what's left in a teuthology run?
[1:12] <wer> ceph odd create…. No arguments I have tried will actually create an osd. I get invalid argument. Does anyone know the correct syntax? The docs seem wrong. I am working off of http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[1:12] <wer> ceph odd create osd.48 gives (22) Invalid argument for instance
[1:13] <wer> So currently I can't get any of the nodes I have showing up in ceph auth list.
[1:14] <wer> omg. ceph osd create. Sorry this mac keeps auto-correcting me.
[1:15] <lurbs> What about just 'ceph osd create' itself, with no arguments? Doesn't that return with the ID of the OSD it created?
[1:16] <gregaf1> yeah, docs are wrong — in argonaut it just silently ignored the osd.48 part; in newer dev versions it'll tell you that's bad
[1:17] <gregaf1> you can send along a UUID of the OSD in question to make sure you aren't double-creating
[1:17] <gregaf1> otherwise just send "ceph osd create" and it'll give you back a new ID
[1:17] <wer> lurbs: yeah. no arguments seems to use the next available osd. But I don't want to create that particular osd.
[1:17] <wer> so how can I create osd.48 - 71 in my case?
[1:18] <gregaf1> that is no longer an easy option, I'm afraid
[1:18] <gregaf1> if you really wanted you could just call it 72 times and only place 48-71 in the CRUSH hierarchy, I suppose
[1:21] <wer> gregaf1: So can I no longer expand to more nodes without having lingering osd's that don't really exist hanging around?
[1:21] <gregaf1> wer: that's what was happening in the past anyway; it's just more explicit now
[1:21] <wer> I don't get it.
[1:21] <gregaf1> that's why we always recommended sequentially-numbered OSDs starting from zero
[1:21] <wer> I did
[1:22] <wer> and now I want to bring in 48-71 without having to worry about 46and 47 atm.
[1:22] <gregaf1> then why do you need specific numbers coming back?
[1:22] <wer> well
[1:22] <gregaf1> so just skip adding them to the CRUSH map and they won't ever get assigned a daemon or be up
[1:23] <wer> I have 4 nodes. Each node has 24 predetermined osd's on them in order to keep the config sane…. I am bringing up the third node.
[1:23] <gregaf1> you could I suppose also delete them and that might remove one or two other places they'd pop up but you aren't placing any load on the cluster that wouldn't already exist by doing it this way
[1:23] <wer> gregaf1: ok, so just run create as many times as needed… and let the fake or dead osd's just hang out in the tree doing nothing?
[1:24] <gregaf1> yeah; they won't even be in the CRUSH map
[1:25] <wer> ok. Thanks gregaf1
[1:25] <wer> I never would have done that even though I discovered that was an option. Just seems really clunky. :)
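gregaf1's recipe (call "ceph osd create" repeatedly, then place only the OSDs you actually want into the CRUSH hierarchy) can be sketched as a short script. This is a hedged sketch that only prints the commands rather than running them against a live cluster; the weight 1.0 and host=node3 placement are illustrative assumptions, not values from the conversation.

```shell
#!/bin/sh
# Sketch of the expansion recipe: "ceph osd create" takes no arguments
# and hands back the next free ID, so getting osd.48-osd.71 means
# calling it once per OSD (given 0-47 already exist) and then adding
# only 48-71 to the CRUSH map. Commands are printed, not executed;
# weight 1.0 and host=node3 are placeholder assumptions.
gen_osd_cmds() {
    first=$1; last=$2
    for id in $(seq "$first" "$last"); do
        echo "ceph osd create    # should return $id if IDs are allocated sequentially"
        echo "ceph osd crush set $id osd.$id 1.0 pool=default host=node3"
    done
}
gen_osd_cmds 48 71
```

Any IDs you never add to CRUSH simply stay unmapped, which is what gregaf1 means by the dead entries "hanging out in the tree doing nothing".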
[1:26] <gregaf1> yeah; the OSDs still conflate their ID and their number, which makes a lot of things clunky
[1:26] <gregaf1> I had this conversation once already today, and as I said then we are slowly working on it but splitting them out better is a challenge in many different dimensions ;)
[1:30] <wer> ok gregaf1 As long as I am not the only confused one. I think the config needs to just go away :) And having humans mount things, which assumes they are going to glue them to osd names, needs to be gone too. If it is left up to the humans to maintain all this stuff, the human can currently be veto'd on their stupid decisions…. Their ability to make decisions should either be revoked or allowed, but you can't do both :)
[1:30] <dmick> elder: I might be able to help
[1:31] <gregaf1> wer: it is possible to run the system that way now, but it's very new and so not well-spread
[1:31] <gregaf1> you might find the approach used by the upstart configs and the ceph-deploy tool interesting
[1:31] <gregaf1> if you're still in a phase where you can define your deployment
[1:32] * maxiz (~pfliu@ Quit (Quit: Ex-Chat)
[1:34] * LeaChim (~LeaChim@5ad684ae.bb.sky.com) Quit (Remote host closed the connection)
[1:35] <wer> gregaf1: So I have seen how all the osd's and things are handled through that…. but I don't know how it is applicable with my issue. Would love to know. Basically I am just trying to wrap my head around the most sane and least resistive path to deployment… and maintenance…. but I seem to be chasing ceph a lot. :) And I really want to like it, cause I do like it, but the whole config thing is pretty crazy at times.
[1:38] <wer> imagine adding 4 nodes… and bringing up nodes in this order… 1,2,4,3. It is actually a little crazy to do. Cause things are not explicit as defined in the config…. It's weird. But I will just create all osd's up front… and then adjust everything after the fact…. and before configuring a new node. But it will not be sane :)
[1:38] <wer> And I would love to know more about running without a config. That may be better for me.
[1:40] <wer> and ty. I am up and running at least :)
[1:42] <wer> this is my favorite new toy…
[1:45] * The_Bishop (~bishop@2001:470:50b6:0:24dc:9482:b62d:3434) Quit (Quit: Who the devil is this Peer? When I catch him I'll reset his connection!)
[1:49] <Psi-jack> Hmmm.
[1:49] <Psi-jack> if I start ceph-osd -i n; do I need to specify --osd-path, --osd-journal at all if that's all in the /etc/ceph/ceph.conf?
[1:50] <Psi-jack> And, is osdid the numerical value, or osd.number?
[1:51] <Psi-jack> And lastly, is there a start-order that should be followed? Should I start mon, osd, mds in that order?
[1:54] <dmick> Psi-jack: -i takes the "id", which is also the metavariable $id, and is the thing after the dot
[1:54] <Psi-jack> Okay.
[1:54] <dmick> in osd.0, osd is $type, 0 is $id, and $name is osd.0
[1:54] <gregaf1> wer: indeed, you're discovering some of the troubles — basically Ceph is easy to use with a static manually-managed cluster, and easy-to-use with an automatically-managed dynamic cluster, but anything in-between gets odd…all I can say is we're improving it all the time
[1:55] <Psi-jack> Cool. That's a start. At least I can use an osd@.service, where @ is the id value to start/stop.
[1:55] <dmick> you should start mons before osds or mdses, I believe
[1:55] <gregaf1> Psi-jack: -i means "ID", so you just want the number
[1:56] <gregaf1> the MDSes can't do anything without the OSDs, and neither can do anything without the monitors, so there is an implied startup order there
[1:56] <gregaf1> but it doesn't much matter; daemons will just wait around until they can talk to whoever they need to
[1:56] <Psi-jack> Okay, so yeah, then. mon, osd, and mds.
[1:59] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving.)
[1:59] <Psi-jack> So, really, all there is is starting /usr/bin/ceph-mon -d -i $id; /usr/bin/ceph-osd -d -i $id; /usr/bin/ceph-mds -i $id (which doesn't seem to have a -d option)
[2:00] * jlogan (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[2:00] <Psi-jack> Err -f, not -d
[2:01] <Psi-jack> Cool. systemd-izing this will be MUCH easier. Then I'll be able to systemctl enable mon@a; systemctl enable osd@0; systemctl enable osd@1; systemctl enable osd@2; systemctl enable mds@a
[2:01] <Psi-jack> For node 1, node 2 similarly, but different id's.
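The template unit Psi-jack describes might look roughly like the sketch below. This is a hypothetical minimal example, not a unit file shipped by Ceph: the unit name, ExecStart path, and restart policy are assumptions; %i is systemd's instance specifier (the part after the @), which matches the $id dmick described. The script just writes the file out so the shape is visible.

```shell
#!/bin/sh
# Hypothetical sketch of a ceph-osd@.service template unit; the unit
# name and paths are assumptions, not an official Ceph unit file.
# %i expands to the instance id, e.g. "0" for ceph-osd@0.service.
cat > ceph-osd@.service <<'EOF'
[Unit]
Description=Ceph object storage daemon osd.%i
After=network.target

[Service]
ExecStart=/usr/bin/ceph-osd -f -i %i
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
echo "wrote ceph-osd@.service"
# Per-instance enablement would then be e.g.: systemctl enable ceph-osd@0
```

Analogous mon@ and mds@ templates would swap in ceph-mon/ceph-mds; as gregaf1 notes, explicit ordering between them barely matters since the daemons wait for the monitors anyway.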
[2:01] <dmick> sounds similar to things you can do with upstart, yes
[2:02] <Psi-jack> It's just MUCH simpler. LOL
[2:02] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[2:02] <dmick> I'm not participating in that skirmish :)
[2:03] <gregaf1> yep, as long as you've got ceph.conf in a default location (/etc/ceph/ceph.conf) that should just work
[2:04] <Psi-jack> Yeah. Can't get much easier. Then systemd will fully control it, restart if (if setup to do so), log to the journal, and life is good.
[2:26] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:26] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:35] * Cube (~Cube@ Quit (Ping timeout: 480 seconds)
[2:37] * wer (~wer@dsl081-246-084.sfo1.dsl.speakeasy.net) Quit (Quit: wer)
[2:41] * Cube (~Cube@ has joined #ceph
[2:50] * vata (~vata@ Quit (Quit: Leaving.)
[2:52] * upgrayedd (~achmed@65-130-121-24.slkc.qwest.net) has joined #ceph
[2:52] <upgrayedd> who was the guy doing ceph w/ IPoIB?
[2:55] <slang> upgrayedd: Scott Atchley at oak ridge I think
[2:56] <gregaf1> I assume you mean infernix, who was talking about it yesterday
[2:56] <gregaf1> upgrayedd: ^
[2:57] <upgrayedd> yes
[2:57] <upgrayedd> someone on another channel told me about it
[2:57] <upgrayedd> i was going to see if he was here and ask if he tried connected mode
[2:57] <upgrayedd> ipoib
[2:59] <infernix> yes
[2:59] <infernix> all connected mode
[2:59] <infernix> 65k mtu
[2:59] <infernix> upgrayedd: have you tried it?
[2:59] <upgrayedd> i've done a lot with iser and comstar targets on solaris
[3:00] <upgrayedd> the only time i tried ceph it was putting processes into uninterruptable sleep and i haven't tried again =P
[3:00] <infernix> well our regular storage targets are all srp
[3:00] * wer (~wer@dsl081-246-084.sfo1.dsl.speakeasy.net) has joined #ceph
[3:01] <infernix> no iser here
[3:01] <infernix> i'm exporting some stuff with scst but also srp
[3:01] * wer (~wer@dsl081-246-084.sfo1.dsl.speakeasy.net) Quit ()
[3:01] <upgrayedd> do you know a way to get it to rescan the targets on client side with srp?
[3:01] <upgrayedd> that's the only reason i didn't usei t
[3:01] <upgrayedd> s/i t/it/
[3:01] <infernix> srp_daemon handles it fine here
[3:01] <infernix> though admittedly you have to be patient on recent kernels
[3:02] <infernix> just -HUP it and that's that
[3:02] <upgrayedd> oh- cool
[3:02] <upgrayedd> SRP was faster than iSER when i tested
[3:02] <upgrayedd> as you'd imagine
[3:02] <infernix> i can do 4.5gbyte/s on srp
[3:02] <infernix> need a couple of threads and large blocksizes but it's there
[3:02] <upgrayedd> on QDR?
[3:02] <infernix> yes
[3:03] <infernix> two cards in clients; storage has 4 controllers, all active
[3:03] <upgrayedd> that's higher than theoretical for one port
[3:03] <upgrayedd> oh
[3:03] <infernix> multipath on the clients
[3:03] <upgrayedd> do you do multipathing or anything?
[3:03] <upgrayedd> beat me to it
[3:03] <upgrayedd> dude, i want to party with _you_
[3:03] <upgrayedd> =P
[3:04] <infernix> everyone does
[3:04] <infernix> :)
[3:04] <upgrayedd> i just left the place that i built the iSER rig
[3:04] <upgrayedd> they're going to burn it down because they don't want to learn about IB
[3:04] <infernix> well if you're into cloud stuff, we might be hiring
[3:04] <upgrayedd> i just switched to devops at sungard
[3:05] <upgrayedd> for cloud team
[3:05] <upgrayedd> i didn't jump off the last job without an offer signed =P
[3:05] <infernix> do you have IB there?
[3:06] <upgrayedd> nope
[3:06] * infernix rests his case
[3:06] <upgrayedd> =P
[3:06] <upgrayedd> is this at inktank?
[3:06] <upgrayedd> or somewhere else?
[3:07] <upgrayedd> i almost applied to inktank
[3:07] <infernix> no not inktank
[3:07] <nhm> infernix: btw, I think there are some people that might be poking at rsocket soon.
[3:08] <upgrayedd> for rbd transport?
[3:08] <infernix> nhm: that'd be very welcome. if there's a lack of equipment I could look into providing a dev setup
[3:08] <infernix> but sign me up for testing
[3:08] <upgrayedd> me toooooooooooooooo
[3:08] <tore_> at least at my company, we're more interested in copper gbe going forward and less interested in fiber gbe or infiniband
[3:09] <nhm> infernix: I don't think there will be a lack of equipment, but if you are interested, the near future might be a great time to bring it up on the mailing list and see if anyone bites. :)
[3:09] <upgrayedd> which list so i can second
[3:10] <tore_> i should clarify: 10g copper
[3:10] * The_Bishop (~bishop@e179000202.adsl.alicedsl.de) has joined #ceph
[3:10] <infernix> with ethernet it works as is
[3:10] <nhm> infernix: if you've got time it might be worth trying out on your test cluster. I've heard good things about it.
[3:11] <infernix> you just need to either look at udp or get tcp offloading etc
[3:11] <infernix> but on infiniband it's a totally different beast
[3:12] <infernix> nhm: i'll jump the gun as soon as i can test it, no worries
[3:12] <tore_> port to port latency is much better on infiniband, but not entirely convinced that's needed unless you have huge ssd arrays
[3:12] <upgrayedd> i don't think it's much better if you compare QDR to 10GbE
[3:12] <nhm> infernix: paper is here: https://www.openfabrics.org/ofa-documents/presentations/doc_download/495-rsockets.html
[3:13] <infernix> tore_: the bandwidth with ipoib is only like 15gbit. actual qdr is 40gbit; though effectively 32gbit
[3:13] <infernix> with more native infiniband protocols, 3gbyte/s is not a real problem
[3:13] <nhm> yeah, we were able to get about 3.1-3.3 on our setup at my last job.
[3:13] <infernix> nhm: i can do some python but my C skills are lacking
[3:14] <nhm> I kind of wish I had purchased connectx3 cards instead of intel 10G cards for the test node I've got.
[3:14] <infernix> you don't want me to code up rsockets for ceph
[3:14] <infernix> nhm: on that note i got connectx2 for like $160 a piece off ebay
[3:14] <infernix> the switch is another story, though
[3:14] <tore_> the bandwidth isn't an issue. it's the port to port latency. this is something I deal with regularly since we mostly service financial companies running HFT
[3:15] <tore_> QDR can get as good as 100ns port to port latency on voltaire switches
[3:15] <nhm> infernix: that's ok, this box just has a single client directly connected to it.
[3:16] <nhm> infernix: we've got another cluster for scaling tests, but those boxes don't perform nearly as well.
[3:17] <tore_> will never get that with 10 gbe copper
[3:17] <tore_> we actually deploy a lot of Arista boxes using SFP+
[3:17] <tore_> these can get under 500ns
[3:18] <infernix> nhm: hmm i hacked some rdma stuff together in python a few months back
[3:18] <tore_> for storage applications, I don't really see the need to get 100ns port to port latency at the cost of cable length
[3:19] <upgrayedd> tore_: one time i asked jeffrey birnbaum why they used 10GbE instead of infiniband, and he gave me an earfull =P
[3:19] <infernix> i didn't use rsockets back then
[3:21] <upgrayedd> one of the things he said though is that they'll make it to 100GbE before they'll get 80GBit IB
[3:21] <tore_> honestly, for some of the new virtualization projects we're putting together, I've started moving towards DAS with LSI SAS switches
[3:22] <nhm> upgrayedd: "Make it" is kind of subjective though.
[3:23] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:23] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[3:23] <upgrayedd> there were lots of things that were puzzling about the response, but when you're a 24-year old talking to a managing director at a wall street firm it's hard to make a solid case
[3:23] <infernix> money is one aspect
[3:24] <nhm> upgrayedd: yeah, understood. :)
[3:24] * lerrie2 (~Larry@remote.compukos.nl) Quit ()
[3:24] <infernix> IB tends to be more expensive
[3:24] <nhm> infernix: not sure if I buy that when you are talking about equal speed grades.
[3:24] <infernix> you have a little less flexible network scalability so you have to buy more stuff
[3:25] <infernix> fat tree topology
[3:25] <nhm> ie a QDR IB switch vs 40GBE switch.
[3:25] <infernix> maybe not port for port, but as a whole
[3:25] <upgrayedd> yeah- one thing that's compelling about IB is that you can actually make use of the redundant paths
[3:26] <upgrayedd> instead of just getting things turned off with spanning tree
[3:26] <infernix> i was on a call with mellanox this week, they have some interesting stuff coming out in Q1
[3:26] <upgrayedd> donuts? =P
[3:28] <infernix> basically ib switches where you either use a port as IB or 10ge
[3:28] <infernix> so its ib+ethernet in one
[3:28] <upgrayedd> oh- i thought it was more mesh torus routing talk
[3:29] * deepsa (~deepsa@ Quit (Quit: Computer has gone to sleep.)
[3:29] <upgrayedd> i hadn't looked closely- i thought that's what the SX60xx things could do
[3:29] <infernix> yes but the new models coming out are way nicer
[3:30] * deepsa (~deepsa@ has joined #ceph
[3:30] <infernix> anyway to stay on topic, rsockets is nice but wouldn't ceph benefit from rdma more
[3:30] <infernix> e.g. have OSDs read from disk and write directly into client memory?
[3:31] <infernix> as i understand it, that results in no cpu load on the client
[3:32] <nhm> infernix: I think in the long term rdma is the goal.
[3:33] <infernix> it does show promise though. lots better than sdp
[3:34] <infernix> and by the looks of it, not all that much work
[3:34] <infernix> less than i anticipated
[3:35] <infernix> but wait, can i LD_PRELOAD this
[3:37] <infernix> "We introduce rsocket extensions for supporting direct data placement (also known as zero copy)."
[3:37] <infernix> o_O
[3:37] <infernix> it's already in there
[3:37] <via> has anyone here set up radosgw with centos?
[3:38] <infernix> via: incidentally i built ceph 0.55 on centos 5 yesterday
[3:38] <via> well, the initscript doesn't work, so i'm just running radosgw manually
[3:39] <via> but i can't get the httpd config right
[3:39] <via> i think the instructions are fairly debian specific
[3:41] <dmick> via: I'm sure they are; most of our work is on Debian or even Ubuntu
[3:42] <dmick> but the httpd config shouldn't suffer from that I wouldn't think
[3:42] <via> i wouldn't think so
[3:43] <gregaf1> I assume you're looking at http://ceph.com/docs/master/radosgw/config/?
[3:43] <gregaf1> I ran through that not long ago and it was mostly good, though I was indeed on Ubuntu
[3:44] <via> so, when thats done, i should be able to curl something like /v1/auth/ ?
[3:44] <via> because all those paths just return 404's
[3:45] * vata (~vata@bas2-kanata16-2925341235.dsl.bell.ca) has joined #ceph
[3:45] <via> and the error log just looks like its trying to look in those directories on disk
[3:45] <gregaf1> did you turn on the swift auth?
[3:45] <via> i created a user and keys
[3:45] <via> was there something other than that?
[3:45] <gregaf1> hrm, I think that's it, yeah
[3:45] <via> but regardless, i don't think its getting to the radosgw
[3:46] <gregaf1> I've barely even run the gateway myself, though
[3:46] <via> i don't really understand how fastcgi works, i won't lie
[3:46] <gregaf1> yeah, me neither
[3:46] <via> but the s3gw.fcgi script is in my documentroot
[3:46] <via> if i try to access it i get forbidden
[3:46] <via> 403
[3:46] <gregaf1> yehudasa, tomorrow during work, would be the one to ask
[3:46] <via> but i don't know if thats normal or not
[3:46] <via> ok
[3:47] <gregaf1> sorry I can't help more
[3:47] <gregaf1> but I'm off for the day, later all!
[3:47] <dmick> is radosgw running?
[3:47] <dmick> because it's not started by the request IIRC
[3:47] <via> it is
[3:48] <dmick> one thing that would help narrow it down is to strace that and see if it's ever getting activity from your HTTP
[3:48] <via> ok
[3:49] <via> is there a particular url i should hit?
[3:49] <dmick> not sure
[3:49] <via> if i just hit the root, i get my Scientific Linux welcome page
[3:49] <dmick> I only barely know s3, and I don't know anything about swift
[3:50] <via> well, s3 then, just anything that shows that its taking requests
[3:50] <infernix> nhm: rsockets built :D
[3:50] <via> although i have to add the rewrites first
[3:51] <dmick> so with s3, you'd access a bucket name that you created somehow else, right?...
[3:52] <via> nothings getting through anyway
[3:52] <via> nothing showing in radosgw strace
[3:52] <via> its an apache config problem i'm pretty sure
[3:52] <dmick> you said "I have to add the rewrites first"; have you not?
[3:53] <via> i did
[3:54] <dmick> apache logs show the rewrite happening, maybe?
[3:54] * michaeltchapman (~mxc900@ has joined #ceph
[3:55] <via> no, it doesn't
[3:55] <via> all those redirect to a /s3gw.fcgi script which i can't access anyway
[3:55] <via> the forbidden 403 comes from apache
[3:55] <dmick> well, I could look over your apache config and see if I spot something
[3:55] <dmick> yeah, but that's magic
[3:56] <dmick> the .fcgi really sends the request through the socket to radosgw
[3:56] <via> also of note, the FastCGIExternalServer option doesn't work unless i keep it in the <IfModule> block
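The Apache wiring via is fighting with can be sketched as below. This is a hedged sketch patterned on the ceph.com radosgw config docs gregaf1 linked, not a verified Scientific Linux setup: the ServerName, DocumentRoot, and fastcgi socket path are assumptions. FastCgiExternalServer tells mod_fastcgi that radosgw is already running behind the named socket (which is why the request never spawns it), and the rewrite funnels every request into the s3gw.fcgi stub. The script writes the fragment out so the shape is visible.

```shell
#!/bin/sh
# Hedged sketch of a minimal radosgw vhost; paths, ServerName, and the
# socket location are assumptions based on the ceph.com docs page, not
# a tested config.
cat > rgw.conf <<'EOF'
FastCgiExternalServer /var/www/s3gw.fcgi -socket /var/run/ceph/radosgw.sock
<VirtualHost *:80>
    ServerName gw.example.com
    DocumentRoot /var/www
    RewriteEngine On
    RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
</VirtualHost>
EOF
echo "wrote rgw.conf"
```

If the rewrite is in place and radosgw is listening on the matching socket, requests should show up in an strace of radosgw, which is the check dmick suggests.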
[3:56] * The_Bishop (~bishop@e179000202.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[3:57] <michaeltchapman> Hi all, I just upgraded a test cluster from argonaut to 0.55, and I can't create rbd images using the new format (to allow cloning). I get librbd: error setting image id: (5) Input/output error when using format 2 and no error when using format 1.
[3:59] <via> hmm, so this is fastcgid
[3:59] <dmick> michaeltchapman: what is the rbd command line that fails?
[3:59] <michaeltchapman> rbd create --pool volumes --size 1024 doesntwork --format 2
[4:00] <dmick> and you've already created the pool volumes, and the image doesntwork doesn't already exist there, right?
[4:00] <michaeltchapman> correct
[4:01] <dmick> (EIO is weird)
[4:01] <dmick> you upgraded all the daemons, and restarted them all, right?
[4:01] <dmick> (no chance there's an old OSD in the cluster?)
[4:01] <michaeltchapman> I'll do it again and check
[4:02] <dmick> if you're running with admin sockets by default, you can check each daemon with ceph --admin-daemon <path-to-admin-socket> version
[4:04] * The_Bishop (~bishop@e179000202.adsl.alicedsl.de) has joined #ceph
[4:04] <dmick> http://ceph.com/docs/master/rados/configuration/ceph-conf/#viewing-a-configuration-at-runtime, but instead of config show, version
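dmick's per-daemon version check can be sketched as a loop over the admin sockets. This only prints the commands rather than executing them against live daemons; the socket directory below is the usual default and may differ on a given install.

```shell
#!/bin/sh
# Sketch: ask every local daemon its version over the admin socket,
# to catch any stragglers still running the old binary. Commands are
# printed, not executed; /var/run/ceph is the assumed default.
print_version_cmds() {
    dir=$1
    for sock in "$dir"/ceph-*.asok; do
        echo "ceph --admin-daemon $sock version"
    done
}
print_version_cmds /var/run/ceph
```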
[4:04] <michaeltchapman> actually something odd has happened to upstart so I think your first guess might be right
[4:05] <dmick> ok. well when you think you have it fixed, you can doublecheck that way
[4:09] <michaeltchapman> ah they aren't restarting correctly - === osd.0100 ===
[4:09] <michaeltchapman> No filesystem type defined!
[4:11] <dmick> oh, this isn't that stupid space bug still is it
[4:12] <dmick> that was in /etc/init.d/ceph
[4:15] <dmick> weirdly, I can't seem to find that fix
[4:17] <michaeltchapman> I'm rebooting one of the OSDs to see if that helps. I didn't reboot them after doing the upgrade from 0.44 to 0.55 - would there be an older kernel module causing issues?
[4:17] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[4:17] <dmick> shouldn't be, no
[4:18] <dmick> anyway, there's a place in /etc/init.d/ceph that does fs_type = "btrfs" to (attempt to) set fs_type
[4:18] <dmick> but you must do that without spaces, so it needs to be fs_type="btrfs"
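The whitespace bug dmick describes is a plain shell pitfall and is easy to demonstrate in isolation:

```shell
#!/bin/sh
# In sh, whitespace around '=' turns an assignment into a command
# invocation.
fs_type="btrfs"               # correct: assigns the variable
echo "fs_type is $fs_type"    # -> fs_type is btrfs
# The broken form in the init script, fs_type = "btrfs", is parsed as
# a command named "fs_type" with arguments '=' and 'btrfs'; it fails
# with "command not found", fs_type stays empty, and the script later
# reports "No filesystem type defined!".
```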
[4:18] <michaeltchapman> ah ok, so setting it to "xfs" is probably what I need.
[4:18] <dmick> I *know* I saw a fix committed for this
[4:19] <dmick> but I'm having trouble figuring out where it is
[4:20] <michaeltchapman> I can't see it in any of the ceph-osd* files on my system
[4:21] <dmick> the file is /etc/init.d/ceph
[4:22] <dmick> OH. because it was fixed in mkcephfs
[4:23] <dmick> can I see your ceph.conf? pastebin it somewhere?
[4:23] <michaeltchapman> if I ran mkcephfs with the old version and then boot up with the new will I still see that issue?
[4:23] <dmick> (I think there's a latent bug in init.d/ceph that you may or may not be hitting)
[4:24] <dmick> sigh. and this isn't the first time I was confused about it
[4:25] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[4:25] <dmick> http://tracker.newdream.net/issues/3581 documents the issue for /etc/init.d/ceph; I claimed it was resolved by 0a137d76bd9ee924c43a42abc33f4c6c06a03d5e, which is the commit that fixes mkcephfs. sigh.
[4:25] <dmick> I'll go undo my damage and make a new commit
[4:25] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:25] <michaeltchapman> http://pastebin.com/gExxM4K6
[4:26] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[4:27] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:28] <dmick> ok, yes, you're using devs in the config
[4:29] <dmick> which leads to this place in the startup script
[4:29] <michaeltchapman> what do the devs do? I figured it looks for block device errors or something but I didn't find the info anywhere.
[4:29] <dmick> it's a little unusual I guess if you're managing your own filesystems; generally you'd set osd data = <path to data filesystem>
[4:30] <michaeltchapman> ok so I'm doing something silly like telling the OSD to look at the raw device instead of the xfs partition?
[4:31] <dmick> yyyyeah. I'm not sure but I think devs may be primarily about letting mkcephfs re-mkfs the drives, which you probably don't want
[4:31] * joshd1 (~jdurgin@2602:306:c5db:310:1c6c:c7b2:9650:bc87) has joined #ceph
[4:31] <dmick> I think you can get past this by simply removing the devs lines
[4:31] <dmick> but we still have a lurking buglet
[4:31] <dmick> it just doesn't have to affect you
[4:33] <michaeltchapman> the services are starting now, thanks.
[4:33] <dmick> woot
[4:34] * upgrayedd (~achmed@65-130-121-24.slkc.qwest.net) has left #ceph
[4:41] <infernix> nhm: rsockets preloading library is actually slowing down ceph a lot. looks like it needs to be coded in
[4:42] <infernix> though not really sure that LD_PRELOAD=blah service ceph start is actually working, but i assume so
[4:42] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:43] <dmick> infernix: pmap $(pgrep ceph-xxx) and see if it's in the proc virt space
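dmick's pmap test generalizes: a preloaded library only counts if it actually shows up in the target process's memory map. The sketch below illustrates the idea against the shell's own map using /proc directly (an equivalent source of the same data pmap reads); the rspreload library name in the comment is an assumption about what the rsockets preload object would be called.

```shell
#!/bin/sh
# Verify a library is mapped into a process by inspecting its memory
# map. Here we check the current shell for its C library; for a ceph
# daemon the analogous check would be something like:
#   pmap $(pgrep ceph-osd) | grep rspreload
if grep -qE 'libc|musl' "/proc/$$/maps"; then
    echo "C library is mapped into this shell"
else
    echo "C library is NOT mapped (preload/linkage did not take effect)"
fi
```

If the grep comes up empty for the daemon, the LD_PRELOAD never reached it, which is consistent with infernix's result below.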
[4:44] <nhm> infernix: interesting
[4:45] <dmick> michaeltchapman: is rbd image creation working OK now?
[4:46] <michaeltchapman> dmick: http://pastebin.com/SVxs7M7T
[4:46] <infernix> dmick: it isn't
[4:46] <dmick> dat's a lotta cluster
[4:46] <infernix> but i'm not running with git rsocket, and so no fork support
[4:47] <michaeltchapman> I might let it set for a while - the disks are coming over FC so it can get clogged when I restart a lot of services
[4:47] <dmick> infernix: I'm not sure just what rsocket does but if pmap doesn't show it, it ain't there
[4:47] <dmick> and michaeltchapman: ok
[4:48] <infernix> well the ceph init script is rather cryptic
[4:48] <infernix> not sure where to put that LD_PRELOAD
[4:48] <dmick> you're using /etc/init.d/ceph, I assume?
[4:49] <infernix> yeah
[4:49] <janos> how do i get a report on which osd's are up and which are not?
[4:49] <infernix> probably just as easy to kill processes and start them manually
[4:50] <dmick> I suspect you want to prepend it to $cmd just before do_cmd "$cmd" $runarg
[4:50] <dmick> in the start) clause
[4:52] <infernix> not showing up with pmap
[4:53] <joshd1> janos: ceph osd dump [--format=json]
[4:53] <janos> joshd: thanks
[4:53] <janos> ah ceph osd tree seems handy too
[4:54] <joshd1> yeah, ceph osd tree is the crush map
[4:54] <janos> so i see ceph osd down|in|out but no up
[4:54] <janos> and one osd i have is listed as down
[4:54] <janos> how does that become up?
[4:54] <infernix> tried it on a bunch of $cmd places, no amount of LD_PRELOAD makes it show up in pmap
[4:55] * Machske (~bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[4:55] <janos> i'm on f17 with ceph 0.52. i came in and one of the hosts i have was completely locked up and network unresponsive
[4:55] <janos> this osd is on that machine
[4:55] <janos> i brought everything back up
[4:55] <joshd1> janos: the daemon starts running/responding to heartbeats again
[4:55] <janos> no test data lost that i can tell
[4:56] <joshd1> if you watch ceph -w, you shouldn't see any other osds reporting it down
[4:56] <janos> yeah not a whole lot going on in ceph -w
[4:57] <janos> i'm partially degraded
[4:57] <janos> but not much activity spooling out
[4:58] <joshd1> is the ceph-osd process still running?
[4:58] <janos> good question - what's the right way to check?
[4:58] <janos> sorry i am completely new to this. second day with something functioning
[4:58] <joshd1> just normal unix tools like ps -ef
[4:59] <janos> yep, appears to be
[4:59] <janos> ./usr/bin/ceph-osd pid, etc
[5:03] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) Quit (Remote host closed the connection)
[5:03] * infernix gives up for today
[5:03] <infernix> LD_PRELOAD all the things
[5:03] <infernix> but no
[5:17] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[5:18] * Machske (~bram@d5152D87C.static.telenet.be) has joined #ceph
[5:43] <michaeltchapman> I have four OSDs out of 191 that refuse to change from "down". The processes are running and restart without throwing any errors, what should I look for to determine why they aren't rejoining the cluster?
[5:43] <dmick> are the processes running?
[5:44] <michaeltchapman> yes
[5:44] <dmick> http://ceph.com/docs/master/rados/operations/troubleshooting-osd/ is good to look at
[5:49] <dmick> but otherwise, I'd look at the specific osd logs for clues
[5:54] * dmick (~dmick@2607:f298:a:607:cdcc:47f1:28cc:d2b7) Quit (Quit: Leaving.)
[5:55] * joshd1 (~jdurgin@2602:306:c5db:310:1c6c:c7b2:9650:bc87) Quit (Quit: Leaving.)
[6:05] * PerlStalker (~PerlStalk@ Quit (Remote host closed the connection)
[6:06] * PerlStalker (~PerlStalk@ has joined #ceph
[6:10] * gohko (~gohko@natter.interq.or.jp) Quit (Quit: Leaving...)
[6:11] * Cube (~Cube@ Quit (Read error: Operation timed out)
[6:16] * vata1 (~vata@CPE000024cdec46-CM0026f31f24dd.cpe.net.cable.rogers.com) has joined #ceph
[6:17] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[6:21] * vata (~vata@bas2-kanata16-2925341235.dsl.bell.ca) Quit (Ping timeout: 480 seconds)
[6:23] * The_Bishop_ (~bishop@e179000202.adsl.alicedsl.de) has joined #ceph
[6:25] * The_Bishop (~bishop@e179000202.adsl.alicedsl.de) Quit (Read error: Operation timed out)
[6:31] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[6:34] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:55] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[6:55] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:07] * vata1 (~vata@CPE000024cdec46-CM0026f31f24dd.cpe.net.cable.rogers.com) Quit (Quit: Leaving.)
[7:18] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[7:18] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:40] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[7:53] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[8:00] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:00] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:08] * gucki (~smuxi@HSI-KBW-082-212-034-021.hsi.kabelbw.de) has joined #ceph
[8:14] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:14] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:15] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:15] * low (~low@ has joined #ceph
[8:17] * gaveen (~gaveen@ has joined #ceph
[8:32] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:32] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:47] * yasu` (~yasu`@ Quit (Remote host closed the connection)
[8:55] * Leseb (~Leseb@ has joined #ceph
[9:02] * yasu` (~yasu`@ has joined #ceph
[9:11] * fc (~fc@ has joined #ceph
[9:17] * mikedawson_ (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Remote host closed the connection)
[9:19] * loicd (~loic@soleillescowork-224-94.cnt.nerim.net) has joined #ceph
[9:22] * verwilst (~verwilst@d5152D6B9.static.telenet.be) has joined #ceph
[9:22] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[9:27] * loicd (~loic@soleillescowork-224-94.cnt.nerim.net) Quit (Ping timeout: 480 seconds)
[9:30] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:33] * Morg (d4438402@ircip1.mibbit.com) has joined #ceph
[9:33] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[9:33] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:33] <Morg> mornin'
[9:35] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) Quit (Read error: No route to host)
[9:35] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:36] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[9:40] * loicd (~loic@soleillescowork-224-94.cnt.nerim.net) has joined #ceph
[9:46] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:05] * BManojlovic (~steki@ has joined #ceph
[10:10] * LeaChim (~LeaChim@5ad684ae.bb.sky.com) has joined #ceph
[10:17] * yasu` (~yasu`@ Quit (Remote host closed the connection)
[10:23] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) has joined #ceph
[10:43] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[10:43] * yoshi (~yoshi@ has joined #ceph
[10:53] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[10:56] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[11:00] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[11:05] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[11:11] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[11:11] * BManojlovic (~steki@ has joined #ceph
[11:25] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:27] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[11:42] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:42] * nosebleedkt (~kostas@ has joined #ceph
[11:46] * brambles (lechuck@s0.barwen.ch) Quit (Read error: Connection reset by peer)
[11:48] * brambles (lechuck@s0.barwen.ch) has joined #ceph
[11:48] * brambles (lechuck@s0.barwen.ch) Quit (Read error: Connection reset by peer)
[11:52] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:52] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:06] * todon (~todon@ has joined #ceph
[12:10] * todon (~todon@ Quit ()
[12:22] * loicd (~loic@soleillescowork-224-94.cnt.nerim.net) Quit (Ping timeout: 480 seconds)
[12:27] * loicd (~loic@ has joined #ceph
[12:32] * ivoks_ (~ivoks@jupiter.init.hr) has joined #ceph
[12:33] * ivoks (~ivoks@jupiter.init.hr) Quit (Remote host closed the connection)
[12:34] * dxd (~dxd@ has joined #ceph
[12:36] <dxd> I may be doing something very stupid, but when I try and start my two node cluster using mkcephfs I'm getting "Error: error creating empty object store in /media/ceph/osd/osd0": (95) Operation not supported" I'm root, the directory exists and is empty?
[12:38] * dxd (~dxd@ has left #ceph
[12:39] * dxd828 (~dxd@ has joined #ceph
[12:39] <dxd828> Hi all
[12:40] <dxd828> I may be doing something very stupid, but when I try and start my two node cluster using mkcephfs I'm getting "Error: error creating empty object store in /media/ceph/osd/osd0": (95) Operation not supported" I'm root, the directory exists and is empty?
[12:41] * loicd (~loic@ Quit (Quit: Leaving.)
[12:41] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:41] * ivoks_ is now known as ivoks
[13:01] <jtang> did you define a journal?
[13:02] <jtang> i vaguely remember that error messages like this aren't very informative or correct
[13:05] <dxd828> jtang, I think I found it... I missed the "filestore xattr use omap = true" so the command now works, I'm not sure what that did though :)
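[13:06] <CephLogBot> [editor's note] The "(95) Operation not supported" error above is typical of running the OSD filestore on ext4, whose extended-attribute size limit is too small for Ceph's metadata; the omap setting moves large xattrs into leveldb instead. A minimal ceph.conf fragment (section placement assumed, matching what dxd828 added):

```
[osd]
    ; required when the filestore sits on ext4
    ; (or any filesystem with small xattr limits)
    filestore xattr use omap = true
```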
[13:11] * deepsa (~deepsa@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[13:24] * CristianDM (~CristianD@ has joined #ceph
[13:24] <CristianDM> Hi
[13:25] <dxd828> Hey
[13:25] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:27] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[13:28] * brambles (lechuck@s0.barwen.ch) has joined #ceph
[13:35] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[13:37] * gaveen (~gaveen@ has joined #ceph
[13:42] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[13:50] * nosebleedkt_ (~kostas@ has joined #ceph
[13:53] * maxiz (~pfliu@ has joined #ceph
[13:57] * nosebleedkt (~kostas@ Quit (Ping timeout: 480 seconds)
[13:59] * loicd (~loic@ has joined #ceph
[13:59] * guigouz (~guigouz@ has joined #ceph
[14:04] <CristianDM> I can't start mds in 0.55
[14:09] <Psi-jack> There is an error trying to display the error message?
[14:12] <CristianDM> mmm some issue with cephx
[14:12] <CristianDM> Can I enable cephx node by node?
[14:12] <CristianDM> Or do I need to enable it on the whole cluster?
[14:15] <CristianDM> OSD shows ** ERROR: osd init failed: (95) Operation not supported
[14:18] <dxd828> Quick question about ceph-fuse.. If one of the monitors dies or the host goes down will it use the other monitor to take over?
[14:20] <CristianDM> Done, fix now.
[14:25] * CristianDM (~CristianD@ Quit ()
[14:27] * CristianDM (~CristianD@ has joined #ceph
[14:28] <CristianDM> I don't understand
[14:28] <CristianDM> I run service ceph status
[14:28] <CristianDM> And show
[14:28] <CristianDM> osd.20: running failed: '/usr/bin/ceph --admin-daemon /var/run/ceph/ceph-osd.20.asok version 2>/dev/null'
[14:29] * CristianDM (~CristianD@ Quit ()
[14:29] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[15:00] * guigouz (~guigouz@ Quit (Ping timeout: 480 seconds)
[15:00] * guigouz (~guigouz@ has joined #ceph
[15:02] * Morg (d4438402@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[15:02] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) has joined #ceph
[15:08] <dxd828> Are there ceph-fs packages for the kernel modules for centos ?
[15:08] * The_Bishop_ (~bishop@e179000202.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[15:10] * guigouz (~guigouz@ Quit (Ping timeout: 480 seconds)
[15:15] * guigouz (~guigouz@ has joined #ceph
[15:22] * guigouz (~guigouz@ Quit (Read error: Connection reset by peer)
[15:23] * guigouz (~guigouz@ has joined #ceph
[15:31] <andreask> dxd828: http://ceph.com/docs/master/install/os-recommendations/
[15:31] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[15:31] <Kioob`Taff> Hi
[15:31] * drokita (~drokita@ has joined #ceph
[15:31] <Kioob`Taff> I try to flush OSD journal (to use different device)
[15:31] <Kioob`Taff> but I have errors :
[15:32] <Kioob`Taff> # ceph-osd -i 11 --flush-journal
[15:32] <Kioob`Taff> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0b 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[15:32] <Kioob`Taff> SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0b 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[15:32] <Kioob`Taff> 2012-12-13 15:31:10.450726 7fe5851ef780 -1 flushed journal /dev/sdh3 for object store /var/lib/ceph/osd/ceph-11
[15:32] <Kioob`Taff> what are these SG_IO errors?
[15:34] * francois-pl (c3dc640b@ircip3.mibbit.com) has joined #ceph
[15:36] <francois-pl> Hello everyone
[15:36] <iggy> Kioob`Taff: if i was to guess, bad hardware or at the very least something is in a very bad state
[15:36] <francois-pl> Is it the right place to ask for help in a ceph install ? :)
[15:37] <Kioob`Taff> iggy : yes, I suppose it's a «flush» IO, which doesn't work with that SSD (over hardware RAID card)
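[15:38] <CephLogBot> [editor's note] The journal migration Kioob`Taff is attempting follows a standard pattern: flush, repoint, re-initialize. A hedged sketch (the new journal device /dev/sdi1 is hypothetical, and the init-script names may differ per distro):

```
# stop the OSD before touching its journal
service ceph stop osd.11

# write out any journaled transactions to the object store
ceph-osd -i 11 --flush-journal

# edit ceph.conf to point "osd journal" at the new device
# (e.g. /dev/sdi1 — hypothetical), then create a fresh journal there
ceph-osd -i 11 --mkjournal

service ceph start osd.11
```

The SG_IO warnings above came from the SSD behind a RAID controller rejecting a flush command, not from ceph-osd itself; note the log line confirms the journal flush succeeded.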
[15:41] <francois-pl> Sorry in advance if my question is stupid: after trying a mkcephfs, some processes don't go away... I did a brutal kill myself... which resulted in a Z process ([ceph-osd] <defunct>) and an uninterruptible one (Dl 14:56 0:00 /usr/bin/ceph-osd -c /tmp/mkfs.ceph.7010/conf --monmap /tmp/mkfs.ceph.7010/monmap -i 01 --mkfs --mkkey)
[15:41] <francois-pl> Is there any solution other than rebooting my server?
[15:42] <iggy> francois-pl: usually no, not for zombies
[15:43] * guigouz (~guigouz@ Quit (Ping timeout: 480 seconds)
[15:44] <francois-pl> And for the D state? I've already had problems with zombies and ceph processes... but it's the first time I've seen a D state :/
[15:44] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) Quit (Read error: Operation timed out)
[15:48] <iggy> dunno really, could be anything... is the ceph.conf you used with mkcephfs right?
[15:56] * guigouz (~guigouz@ has joined #ceph
[15:57] <francois-pl> I suppose yes... the first osd was correctly created... the second wasn't... because of a missing directory (my fault)... it's not the first time I've made this mistake... but it is the first time I've seen this result :p
[15:58] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) has joined #ceph
[15:58] <francois-pl> Thx for your help anyway... I just had a little hope that someone had already seen this type of problem ;)
[16:01] * nosebleedkt_ (~kostas@ Quit (Quit: Leaving)
[16:02] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[16:03] * stass (stas@ssh.deglitch.com) has joined #ceph
[16:08] * guigouz1 (~guigouz@ has joined #ceph
[16:08] * guigouz (~guigouz@ Quit (Read error: Connection reset by peer)
[16:08] * guigouz1 is now known as guigouz
[16:09] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[16:15] * l0nk (~alex@ has joined #ceph
[16:22] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:25] * guigouz (~guigouz@ Quit (Quit: Textual IRC Client: http://www.textualapp.com/)
[16:25] * guigouz (~guigouz@ has joined #ceph
[16:26] * aliguori (~anthony@cpe-70-113-5-4.austin.res.rr.com) Quit (Quit: Ex-Chat)
[16:27] * guigouz (~guigouz@ Quit ()
[16:28] * guigouz (~guigouz@ has joined #ceph
[16:29] * vata (~vata@ has joined #ceph
[16:32] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[16:34] <Kioob`Taff> wow
[16:34] <Kioob`Taff> what are emcpowerig and emcpowerhq devices ?
[16:35] <darkfaded> i dont remember ig / hq meaning
[16:35] <darkfaded> try powermt display all for the multipath status
[16:35] <jtang> does anyone here have experience with the DDN WOS scalers?
[16:35] <darkfaded> maybe there you find some hint?
[16:36] <jtang> how do they stack up against ceph?
[16:36] * guigouz (~guigouz@ Quit (Ping timeout: 480 seconds)
[16:37] <terje> hey guys, a former co-worker of mine is working on a photo-hosting service. I recommended ceph as the backend.
[16:37] <terje> I recall seeing something about using a custom librados client for such a service.
[16:38] <terje> but I can't find it now. Anyone familiar with such a client?
[16:40] <jtang> dxd828: ah you were using ext4
[16:47] <Kioob`Taff> darkfaded: in fact it's rbd6 and rbd7 devices, that my OS call «emcpowerig» and «emcpowerhq»
[16:48] <darkfaded> oh that suckss :>
[16:48] <darkfaded> i dont know how to exclude them from powerpath detection
[16:48] <darkfaded> but a funny issue
[16:49] <Kioob`Taff> ah, in fact, it's only «iostat» which calls them that
[16:49] <darkfaded> use sar -dp 10 1 :)
[16:50] <darkfaded> or -d if it then also picks powerpath names
[16:50] <Kioob`Taff> sar has the same "problem"
[16:51] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[16:51] * ircolle2 (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[16:51] <darkfaded> cool
[16:54] * low (~low@ Quit (Quit: Leaving)
[16:54] <Kioob`Taff> shouldn't the rbd kernel module use a specific major id for its devices?
[16:55] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[17:01] * aliguori (~anthony@ has joined #ceph
[17:05] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) Quit (Remote host closed the connection)
[17:07] * fc (~fc@ Quit (Quit: leaving)
[17:12] * verwilst (~verwilst@d5152D6B9.static.telenet.be) Quit (Quit: Ex-Chat)
[17:20] * gaveen (~gaveen@ has joined #ceph
[17:54] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[18:00] <paravoid> I have an OSD aborting reproducibly when trying to recover the journal (happened after a powercycle)
[18:00] <paravoid> I have a backtrace from the logs
[18:00] <paravoid> and I can provide the journal too if needed, nothing private there
[18:04] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:05] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[18:07] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[18:13] * oliver1 (~oliver@jump.filoo.de) has joined #ceph
[18:13] * gaveen (~gaveen@ has joined #ceph
[18:18] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[18:20] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:20] * gucki (~smuxi@HSI-KBW-082-212-034-021.hsi.kabelbw.de) Quit (Ping timeout: 480 seconds)
[18:21] * gucki (~smuxi@HSI-KBW-082-212-034-021.hsi.kabelbw.de) has joined #ceph
[18:24] * jlogan (~Thunderbi@2600:c00:3010:1:e5d8:3402:7d8b:dbea) has joined #ceph
[18:33] * guigouz (~guigouz@ has joined #ceph
[18:37] * oliver1 (~oliver@jump.filoo.de) has left #ceph
[18:41] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[18:57] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[19:03] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[19:03] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[19:04] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) has joined #ceph
[19:15] * Machske (~bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[19:16] * guigouz (~guigouz@ Quit (Read error: Connection reset by peer)
[19:17] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[19:19] * ebo^ (~ebo@ has joined #ceph
[19:19] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Read error: Connection reset by peer)
[19:19] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[19:29] * yoshi_ (~yoshi@ has joined #ceph
[19:29] * yoshi (~yoshi@ Quit (Read error: Connection reset by peer)
[19:47] <ebo^> my cephfs lost some data .. i deleted stuff but it did not get freed correctly
[19:52] <ebo^> is there a way to see everything saved in a cephfs?
[19:54] * dxd828 (~dxd@ Quit (Read error: Operation timed out)
[20:01] * rweeks (~rweeks@ has joined #ceph
[20:11] * Ryan_Lane (~Adium@ has joined #ceph
[20:17] * dmick (~dmick@2607:f298:a:607:b8a0:4e3e:8f14:62ad) has joined #ceph
[20:18] * yoshi_ (~yoshi@ Quit (Remote host closed the connection)
[20:18] <ebo^> why does ceph scrub while recovering ... this does not seem like a very wise investment of resources
[20:19] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Read error: Operation timed out)
[20:21] <gregaf1> ebo^: it can take some time to delete objects because it's a background process on the MDS, not synchronous for clients
[20:21] <gregaf1> and we just discovered an issue where if you have multiple clients and some are idle then the cleanup can take a very long time indeed (the temporary "capabilities" for the file aren't getting dropped, and so the file can't be deleted)
[20:22] <ebo^> it's only 10G or so .. i loaded some extra data on it, deleted it and it vanished nearly instantly
[20:22] <gregaf1> ask sjust about scrubbing, but basically it goes as long as the OSDs aren't too loaded, and the recovery config options are tricky to get at the right level
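[20:23] <CephLogBot> [editor's note] The "tricky to get at the right level" config options gregaf1 mentions can be tuned in ceph.conf; a hedged sketch with illustrative values (option names are real Ceph OSD settings of this era, but the right values are workload-dependent):

```
[osd]
    ; at most one concurrent scrub per OSD (the default)
    osd max scrubs = 1
    ; skip scheduled scrubs when the host load average exceeds this
    osd scrub load threshold = 0.5
    ; throttle how many recovery operations an OSD runs in parallel
    osd recovery max active = 3
```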
[20:30] <ebo^> i'm toying around with ceph on a few of our compute nodes to test if building a ceph cluster for our hpc system is a good idea
[20:32] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[20:42] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[20:46] <ebo^> hmm
[20:46] <ebo^> i get very unsteady performance from my setup
[20:47] <ebo^> n1: mon + mds n2/3: 1x osd n4: mounted cephfs
[20:47] <ebo^> pushing data on n4 gives good performance but stalls every 20 secs until the journals are cleared
[20:48] <ebo^> is this normal
[20:48] <ebo^> ?
[20:48] <ebo^> stalling also means lots of slow requests log entries
[20:49] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) has joined #ceph
[20:49] <joao> sjust, do we even use 'struct ceph_pg' or 'struct old_pg_t' anymore?
[20:53] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[20:53] * rweeks (~rweeks@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[20:59] * Machske (~bram@d5152D87C.static.telenet.be) has joined #ceph
[21:03] * BManojlovic (~steki@ has joined #ceph
[21:09] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:11] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[21:13] * Cube (~Cube@ has joined #ceph
[21:14] * Oliver1 (~oliver1@ip-178-203-175-101.unitymediagroup.de) has joined #ceph
[21:21] <sagewk> joao: it's probably still a mix. the kernel is definitely still limited to 16 bits. i thought userspace could go above 64k, but perhaps not. in any case, let's just put a limit on the pg_num argument for creating pools for now. we can remove it after we test things more carefully.
[21:21] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[21:26] <joao> sagewk, sure, going to push out the patch as soon as I test it
[21:26] <sagewk> k
[21:34] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) Quit (Quit: Leseb)
[21:34] * guigouz1 (~guigouz@ has joined #ceph
[21:39] * stxShadow (~Jens@ip-178-201-147-146.unitymediagroup.de) has joined #ceph
[22:12] * joao (~JL@89-181-151-182.net.novis.pt) Quit (Ping timeout: 480 seconds)
[22:15] * joao (~JL@89-181-151-182.net.novis.pt) has joined #ceph
[22:15] * ChanServ sets mode +o joao
[22:20] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Read error: Connection reset by peer)
[22:20] <lurbs> If you're wanting to do a rolling upgrade, what's the easiest way of cleanly removing an OSD, or set of OSDs, from the cluster?
[22:20] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:21] <lurbs> Marking them as down seems unreliable - I relatively often see "wrongly marked me down" messages.
[22:21] <lurbs> And reweighting immediately causes backfill.
[22:22] * info (~info@ip-178-201-144-23.unitymediagroup.de) has joined #ceph
[22:22] * info is now known as absynth_
[22:22] <absynth_> hey there, anyone around?
[22:22] <absynth_> (from inktank, that is)
[22:25] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[22:27] <dmick> sure absynth_, shoot
[22:29] <Machske> update topic ?
[22:29] <absynth_> dmick: just a precaution, we have a fairly big change today
[22:30] <absynth_> was on the phone with kevin and wolfgang earlier
[22:30] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) has joined #ceph
[22:34] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:35] <Machske> if you want to do a rolling upgrade, isn't it possible to stop 1 osd, upgrade it, and start it again ?
[22:35] * yasu` (~yasu`@ has joined #ceph
[22:36] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Operation timed out)
[22:36] <dmick> it should be, Machske
[22:36] <gregaf1> lurbs was asking about it
[22:36] <gregaf1> if you manually mark something down, it won't consider it to be "wrong"
[22:37] <gregaf1> that's an instruction; the "wrongly marked me down" happens when other OSDs make the choice
[22:37] <Machske> Well, I'm trying right now going from 0.55 to 0.55.1 but don't expect any special problems in this case :)
[22:37] <dmick> gregaf1: something I wasn't clear on: do you have to set noup before marking an OSD down (or else it will just come back)?
[22:38] <gregaf1> nope, don't need to do that
[22:38] <Machske> I'm just going to stop the osd procs, upgrade and start them one by one. Do you need to tell the cluster this, or just let it recover on its own?
[22:38] <dmick> gregaf1: how do you stop it from coming back up on its own then?
[22:39] <gregaf1> actually, wait
[22:40] <gregaf1> this might have changed
[22:41] <gregaf1> urgh; I think it did — wonder when
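[22:42] <CephLogBot> [editor's note] For the rolling-upgrade question lurbs and Machske raise, the usual approach is the "noout" cluster flag rather than manually marking OSDs down, so brief restarts don't trigger backfill. A hedged sketch (osd.20 and the sysvinit service names are illustrative):

```
# prevent the cluster from marking stopped OSDs "out" and re-balancing
ceph osd set noout

# upgrade one OSD at a time: stop, install new packages, restart
service ceph stop osd.20
# (install the upgraded packages here)
service ceph start osd.20

# once every OSD is upgraded and back in, restore normal behavior
ceph osd unset noout
```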
[22:47] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[22:48] * ebo^ (~ebo@ Quit (Ping timeout: 480 seconds)
[22:49] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:49] <Machske> Well I must say, I've worked a year now with gluster and playing with ceph in a test environment and must say I'm really impressed with the architectural design.
[22:54] * andreask1 (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[22:57] * vata (~vata@ Quit (Quit: Leaving.)
[22:59] * aliguori (~anthony@ Quit (Remote host closed the connection)
[22:59] <dmick> Machske: good to hear!
[23:00] <absynth_> it's some very special kind of thrill seeing a cluster go from "all ok" to "21% degraded and there's a couple dozen slow requests" in a matter of seconds
[23:00] <absynth_> and, ideally, back again
[23:00] <Machske> :) indeed
[23:01] * dmick wishes there were ethernet cables that lit up in proportion to their busy-ness
[23:01] <Machske> when we do changes to our gluster setup, there is also the thrill of seeing, "yeah, we did not lose it all :)"
[23:01] <dmick> the data center would be so pretty with cluster ops
[23:02] <absynth_> blindingly bright
[23:12] <Machske> v0.55.1 seems to change something about auth -> auth: fixed default auth settings
[23:12] <Machske> the "upgraded" mon does not seem able to "join" anymore
[23:15] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) has joined #ceph
[23:23] <yehudasa> Machske: did you run 0.55 before?
[23:23] <Machske> yes
[23:23] <yehudasa> do you have auth set up in your ceph.conf?
[23:24] <Machske> under global in ceph.conf
[23:24] <Machske> auth supported = cephx
[23:24] <lurbs> The docs claim that changed recently.
[23:25] <lurbs> http://ceph.com/docs/master/rados/operations/authentication/
[23:25] <lurbs> At '6.'
[23:25] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[23:25] <yehudasa> well, if it was working beforehand, it should have also been working now
[23:25] <yehudasa> what does your mon log say?
[23:26] <Machske> well, it works now. I stopped all monitors, upgraded them all from 0.55 to 0.55.1, started the mons and now ceph -s reports HEALTH_OK
[23:27] <Machske> so it seems that a mon running 0.55 won't work together with a mon running 0.55.1?
[23:27] <yehudasa> Machske: it should, not sure what happened there
[23:27] <Machske> Yeah, I'm not sure either :)
[23:28] <yehudasa> in essence the 0.55 change shouldn't have affected you, as you had cephx configured
[23:28] <Machske> that was what I thought as well
[23:28] <yehudasa> unless you have other auth configurables set in your ceph.conf
[23:29] <Machske> nope, thats the only line
[23:29] <Machske> but maybe the line had no effect in 0.55 and did in 0.55.1
[23:29] <yehudasa> e.g., if you had 'auth service required = none'
[23:29] <Machske> maybe that's the fix, that the auth line now works :)
[23:30] * rweeks (~rweeks@0127ahost2.starwoodbroadband.com) has joined #ceph
[23:30] <yehudasa> well, it actually didn't have any effect if you didn't set up anything as it'd use the other auth defaults instead
[23:30] <yehudasa> but they were all (except for the client one) set as 'cephx'
[23:31] <Machske> hmmm, well the only line I had was "auth supported = cephx", already configured in my v0.55 setup
[23:33] <yehudasa> could be just your client failing to connect though
[23:34] <Machske> actually I have cephfs mounted via fuse, and the mount point did not seem to become unavailable
[23:34] <Machske> for the status, I just used ceph -s
[23:35] <Machske> But I'm fairly new to ceph (like 2 weeks or so :) ), so it could be just me :)
[23:36] <Machske> this actually proves a lot of power with ceph, even in a "broken" state, my data was still available. A situation like this with gluster almost always means trouble
[23:37] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) Quit (Quit: Leseb)
[23:37] * aliguori (~anthony@cpe-70-113-5-4.austin.res.rr.com) has joined #ceph
[23:38] <Machske> can anyone briefly explain what function "pgmap" has?
[23:38] <Machske> does it "divide" the available storage in parts or something?
[23:38] <gregaf1> the pgmap actually just aggregates statistics on the "Placement Groups" for reporting to users
[23:39] <gregaf1> in RADOS then you can have multiple "pools", which are just flat logical namespaces
[23:39] <gregaf1> and those pools are essentially sharded into "Placement Groups", where each PG covers part of the hash space for that particular pool
[23:40] <gregaf1> so when you add/lookup an object, you hash its name to get the PG it's in, and then use CRUSH on the PG to figure out where it's stored
[23:40] <Machske> allright, that makes sense
[23:41] <gregaf1> the indirection exists because recovering from down OSDs requires finding everybody who's owned data in the past, and you don't want to do it for each (of the million+) object(s)
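[23:41] <CephLogBot> [editor's note] The two-step mapping gregaf1 describes (hash the object name into a placement group, then compute the PG's OSDs) can be sketched as below. Both functions are simplified stand-ins, not Ceph's actual rjenkins hash or CRUSH algorithm; the point is that an object's location is computed deterministically, never looked up in a central table.

```python
import hashlib


def object_to_pg(name: str, pg_num: int) -> int:
    """Hash an object name into one of the pool's placement groups."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16)
    return h % pg_num


def pg_to_osds(pg: int, osds: list, replicas: int) -> list:
    """Toy stand-in for CRUSH: deterministically pick distinct OSDs for a PG."""
    start = pg % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(replicas)]


# any client can recompute where an object lives from the maps alone
pg = object_to_pg("photo-123.jpg", pg_num=64)
acting = pg_to_osds(pg, osds=[0, 1, 2, 3], replicas=2)
```

Recovery then only has to track history per PG (a few thousand entries) instead of per object (millions), which is the indirection's payoff.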
[23:41] * jjgalvez (~jjgalvez@ has joined #ceph
[23:42] <Machske> so with those pgs it's more efficient to, for example, recover from a lost osd, so ceph knows what has to be redistributed to keep enough copies available of each object?
[23:43] <gregaf1> yeah
[23:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:46] <Machske> can I modify the "redundancy" for certain objects/files via crush ? for example, if I want my mail to be stored at least 3 times instead of only 2 times in the ceph cluster
[23:47] <gregaf1> you specify the number of copies on a per-pool basis
[23:47] <Machske> thx
[23:48] <gregaf1> if you're using cephfs (not production ready!) you can specify that certain hierarchies should store new files in different pools with the "cephfs" tool (only works on the kernel client, but rules are followed by FUSE as well)
[23:48] <Machske> well the rolling upgrade did work, even with the hickup of the mons, so that's great!
[23:48] <gregaf1> if RBD, you'll have to put the volume in the right pool, and you can specify that when creating it
[23:48] <gregaf1> etc
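[23:49] <CephLogBot> [editor's note] Setting the per-pool copy count gregaf1 mentions is done with the pool commands; a hedged sketch for Machske's mail example (the pool name "mail" and the PG count of 128 are illustrative, and PG sizing depends on cluster size):

```
# create a dedicated pool for mail with 128 placement groups
ceph osd pool create mail 128

# store 3 copies of every object in this pool instead of the default 2
ceph osd pool set mail size 3

# verify the setting
ceph osd pool get mail size
```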
[23:48] <Machske> I'll need to experiment with that
[23:49] <Machske> are there known issues with cephfs ?
[23:49] <Machske> or is it just not tested enough yet
[23:50] <Machske> a lot of questions, I know :s
[23:51] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[23:51] <gregaf1> there are some known issues, although a lot of the single-MDS ones have been dealt with in the last couple months thanks to some external contributors
[23:51] <gregaf1> lots of known issues with multi-MDS setups, but if you've been using gluster you probably don't need that...
[23:52] <gregaf1> mostly it's just we need to do more testing and hardening
[23:52] * Cube (~Cube@ Quit (Quit: Leaving.)
[23:52] <Machske> so the "safest" way to use it right now is with a single active mds ?
[23:53] <gregaf1> yeah
[23:54] <gregaf1> and any number of standbys; those don't impact it
[23:54] * drokita (~drokita@ Quit (Quit: Leaving.)
[23:55] <Machske> mdsmap e109: 1/1/1 up {0=0=up:active}, 1 up:standby
[23:55] <Machske> looks ok then :)
[23:59] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.