#ceph IRC Log

IRC Log for 2013-08-03

Timestamps are in GMT/BST.

[0:10] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[0:13] * sprachgenerator (~sprachgen@130.202.135.202) Quit (Quit: sprachgenerator)
[0:16] <sjust> sagewk: we may want to consider being able to backfill multiple osds concurrently
[0:16] <sjust> sage: ^
[0:16] <sjust> it would reduce the recovery read traffic for erasure coding
[0:17] <sage> yeah
[0:17] <sage> will probably rethink how that all works. like maybe we won't put them in the acting set during backfill...
[0:18] <sjust> well, the other advantages would still apply, the backfill peers might still have a copy of the log
[0:18] <sjust> though they wouldn't count toward the limit mentioned in the Peering section
[0:19] <sjust> mmm
[0:19] <sjust> also, since the acting set position matters, we might want to backfill them for a chunk position which we are currently occupying with a healthy, but old, peer
[0:19] <sjust> yeah
[0:19] <sjust> ok
[0:20] <sjust> so OSDMap will be able to give the acting set and the non-pg-temp-acting set
[0:20] * scuttlemonkey (~scuttlemo@38.122.20.226) Quit (Ping timeout: 480 seconds)
[0:20] <sage> yeah, it does that now i think.. that's what drives the calc_acting() logic?
[0:21] <sjust> I think up is used as the non-pg-temp-acting set
[0:21] <sjust> or something, but yeah, same idea
[0:22] * Vjarjadian (~IceChat77@90.214.208.5) has left #ceph
[0:29] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[0:30] <n1md4> hi. decided to bonnie my 4g ceph block that then filled to capacity and went belly up! http://pastie.org/pastes/8201375/text
[0:31] <n1md4> what does that error mean?
[0:32] * allsystemsarego (~allsystem@188.25.130.190) Quit (Quit: Leaving)
[0:32] * scuttlemonkey (~scuttlemo@64.134.221.128) has joined #ceph
[0:32] * ChanServ sets mode +o scuttlemonkey
[0:36] <dmick> n1md4: means the client can't talk to the monitor(s)
[0:37] <dmick> mons still up?
[0:38] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[0:40] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:44] * terje (~joey@63-154-146-168.mpls.qwest.net) has joined #ceph
[0:51] * gentleben (~sseveranc@216.55.31.102) Quit (Quit: gentleben)
[0:52] * terje (~joey@63-154-146-168.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[0:52] <n1md4> dmick: doesn't look to be
[0:53] <dmick> so, how many nodes, and how many daemons, and which of them seem to have died?
[0:54] <n1md4> I don't know how to find the answer to that
[0:54] * terje (~joey@63-154-146-168.mpls.qwest.net) has joined #ceph
[1:02] * terje (~joey@63-154-146-168.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[1:03] <n1md4> well, 2 nodes, and 3 osds on each node, one mon daemon .. maybe that is the right answer.
[1:05] * LeaChim (~LeaChim@2.122.178.96) Quit (Ping timeout: 480 seconds)
[1:07] <dmick> what I mean is: how did you configure your cluster (nodes and daemons you expect) and what's running now (which of those daemons are alive)
[1:07] <dmick> so did you plan for one mon, or is that the only one left out of, say, 3?
[1:07] * mtk (~mtk@68.195.89.131) Quit (Remote host closed the connection)
[1:09] <n1md4> there was only 1 mon
[1:09] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[1:10] * scuttlemonkey (~scuttlemo@64.134.221.128) Quit (Ping timeout: 480 seconds)
[1:10] <dmick> ok. so on the host where that was, look at /var/log/ceph for its log, and find out why it died
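A minimal sketch of the check dmick is suggesting, assuming the default log location; the mon id in the file name is a placeholder to adjust:

    ps aux | grep ceph-mon                         # is the monitor process still running?
    tail -n 100 /var/log/ceph/ceph-mon.<id>.log    # recent monitor log entries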
[1:11] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:12] * markbby (~Adium@168.94.245.1) has joined #ceph
[1:15] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[1:16] <n1md4> not really telling me anything
[1:17] <n1md4> just mon saying OSD is near full (90%)
[1:17] <sjust> loicd: I updated the blueprint to point at the doc branch I just pushed to ceph.git
[1:18] <dmick> n1md4: can you pastebin the last, say, 100 lines of the mon log
[1:18] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) has joined #ceph
[1:20] * duff_ (~duff@199.181.135.135) has joined #ceph
[1:22] * mtanski (~mtanski@69.193.178.202) Quit (Read error: Operation timed out)
[1:22] <duff_> when running the mon create command, no keys are being created. the process seems to just spin and do nothing, https://friendpaste.com/15mhexpYN6Xmc3xujLylza. there doesn't seem to be anything interesting in the log file, at least nothing that looks like an error. any tips on what to poke?
[1:22] * joao (~JL@89.181.144.108) Quit (Quit: Leaving)
[1:24] <dmick> duff_: what files are in /etc/ceph
[1:25] <duff_> dmick: just the ceph.conf file
[1:26] <n1md4> dmick: http://pastie.org/pastes/8201464/text (thanks)
[1:27] <dmick> n1md4: it doesn't look to me like that monitor is down. Are you *sure* it died?
[1:27] <dmick> duff_: where did you get your ceph-deploy installation
[1:27] <n1md4> no, just 'ceph -s' doesn't output anything
[1:28] <dmick> (03:37:17 PM) dmick: mons still up? (03:52:45 PM) n1md4: dmick: doesn't look to be
[1:28] <dmick> let me ask that again more specifically: is the ceph-mon process still running?
[1:28] <n1md4> ah, well, sorry about that.
[1:28] <duff_> dmick: cloning the github repo, https://github.com/ceph/ceph-deploy.git. Centos ships with python 2.6 which ceph didn't seem to like. so I used pyenv to compile 2.7.5. I got this to work once, dunno why it doesn't anymore.
[1:29] <n1md4> I did ps aux | grep mon and there were no ceph related processes; hence the "doesn't look to be" comment.
[1:29] <dmick> duff_: if you updated the repo today, we know about a new issue.
[1:29] <duff_> dmick: ahh, that would explain it then. what point in the history is good to checkout from?
[1:29] <n1md4> dmick: no, the ceph-mon process is not running.
[1:30] <dmick> but that log also doesn't look like the mon log. which log file did you examine?
[1:30] <bandrus> duff_: The same thing happened to one of mine a week or so ago, turns out the monitors were not in quorum because one of them had 0.0.0.0 in the monmap
[1:30] <bandrus> essentially could not be reached
[1:31] <dmick> duff_: if you have ceph.conf in /etc/ceph, that's not the problem.
[1:31] <n1md4> dmick: Ah! (getting late!) here http://pastie.org/pastes/8201474/text
[1:31] <dmick> duff_: is the ceph-mon process running?
[1:31] * markbby (~Adium@168.94.245.1) Quit (Ping timeout: 480 seconds)
[1:31] <dmick> n1md4: there you go.
[1:31] <n1md4> :)
[1:32] <duff_> dmick: yes, as well as the ceph-create-keys process
[1:32] <n1md4> so, how can I start it up again? the only mon command i know is creating a new one, is that the way to go?
[1:33] <dmick> n1md4: looking at exactly what that error means. no use starting it if it's going to die for lack of space
[1:33] <n1md4> i've cleared the space already
[1:33] <n1md4> the device would still unmount/mount
[1:34] <dmick> you've cleared what space how?
[1:35] <dmick> duff_: what does ceph -n mon. -k ceph.mon.keyring -s say (from the dir where you were running ceph-deploy, that contains that keyring)
[1:37] <n1md4> i'd created the /dev/rbd/rbd/foo 4G device, mounted it on /mnt, then ran bonnie, which killed the mon. i then rm'd the Bonnie.blah file left on /mnt. I was then able to unmount and remount the same device; only the mon had died .. and only now have i realised that's the only problem
[1:37] <n1md4> assuming it is ..
[1:37] <dmick> if the monitor is dead, you can't really be talking to rbd to rm and mount/unmount the device
[1:38] <dmick> did you create the foo device with rbd create and rbd map?
[1:38] <n1md4> yes
[1:39] <n1md4> .. in which case I've found new ways to break ceph! :D
[1:39] <dmick> sjust: does the OSD continue to allow writes when its datastore is close enough to full that the monitor shuts it down?
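For context, the "near full" warnings n1md4 saw are driven by cluster-wide ratios that have long defaulted to roughly 85% (nearfull) and 95% (full), at which point writes to the affected OSDs are blocked; a hedged ceph.conf sketch with those assumed defaults:

    [global]
        mon osd nearfull ratio = 0.85    # warn when an OSD passes this fill level (assumed default)
        mon osd full ratio = 0.95        # block client writes past this level (assumed default)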
[1:40] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:40] <duff_> dmick: 2013-08-02 16:39:46.967201 7fc7ab284760 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
[1:41] <duff_> dmick: 2013-08-02 16:39:46.967239 7fc7ab284760 -1 ceph_tool_common_init failed.
[1:41] <n1md4> dmick: afk .. thanks for the assist! I'm getting there slowly.
[1:41] <dmick> duff_: how do you have a ceph.mon.keyring that doesn't have the right key in it??...
[1:41] <duff_> dmick: gah nm, I'm silly. was in the wrong directory.
[1:42] <dmick> k
[1:42] <duff_> dmick: it's spewing out lines that look like this: 2013-08-02 16:41:02.185678 7f8339fd9700 0 -- 10.42.7.77:0/17178 >> 10.42.21.19:6789/0 pipe(0x7f832c000c00 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[1:42] <duff_> would you like more of them?
[1:42] <dmick> by "it" now we mean the 'ceph -s' command?
[1:42] <duff_> yes
[1:42] <dmick> that sounds like the mon died. Did it?
[1:43] <duff_> i still see ceph-mon in ps aux: /usr/bin/ceph-mon -i n7-z01-0a2a074d --pid-file /var/run/ceph/mon.n7-z01-0a2a074d.pid -c /etc/ceph/ceph.conf
[1:44] <dmick> ok. try (probably as root or sudo) ceph daemon mon.n7-z01-0a2a074 mon_status
[1:44] <duff_> that command gives me the missing keyring error i pasted earlier
[1:45] <duff_> and I'm in the correct directory, promise ;)
[1:45] <dmick> ? that shouldn't even use a keyring
[1:45] <dmick> maybe I'm confused; try it with the -n mon. -k ceph.mon.keyring then
[1:46] <duff_> lines that look like this: 2013-08-02 16:45:30.700927 7f2a23120700 0 -- :/18084 >> 10.42.21.19:6789/0 pipe(0x13e74e0 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[1:46] <dmick> what version of ceph is this?
[1:47] <duff_> ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
[1:47] <duff_> on CentOS 6.4
[1:48] <dmick> ok. instead of ceph daemon mon.n7-z01-0a2a074 mon_status
[1:48] <dmick> replace mon.n7-z01-0a2a074 with the path to the mon's 'asok' socket (it'll be in /var/run/ceph)
[1:49] <duff_> sudo ceph daemon /var/run/ceph/mon.n7-z01-0a2a074d.pid -n mon. -k ceph.mon.keyring gives me the same output
[1:49] <dmick> leave off the -n and -k
[1:49] <dmick> sudo ceph daemon /var/run/ceph/mon.n7-z01-0a2a074d.pid mon_status
[1:50] <duff_> monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
[1:51] <dmick> I absolutely cannot explain this
[1:51] <dmick> but one more try:
[1:51] <dmick> sudo ceph --admin-daemon /var/run/ceph/mon.n7-z01-0a2a074d.pid mon_status
[1:52] <duff_> https://friendpaste.com/15mhexpYN6Xmc3xujMG2H2
[1:52] <duff_> 0.0.0.0 addresses seem bad
[1:52] <dmick> oh there are multiple mons. Are they all up?
[1:53] <bandrus> yeah that's the same thing that happened to me, the monmap had to be fixed with proper IPs before quorum could be reached, and the keys were immediately created afterwards
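A hedged sketch of the monmap repair bandrus is describing, for the case where an entry really does need fixing (mon id, name, and address are placeholders; stop the mon before editing):

    service ceph stop mon.<id>                      # init-system dependent
    ceph-mon -i <id> --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap                  # look for 0.0.0.0 entries
    monmaptool --rm <name> /tmp/monmap
    monmaptool --add <name> <ip>:6789 /tmp/monmap
    ceph-mon -i <id> --inject-monmap /tmp/monmap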
[1:53] <duff_> I never told it to create multiple mons, only on the admin node
[1:53] <duff_> but no, the other two do not have ceph-mon running
[1:54] <dmick> ? what ceph-deploy commands did you use?
[1:55] <duff_> https://friendpaste.com/15mhexpYN6Xmc3xujMG0lV
[1:56] <dmick> ok. you just need to mon create the other two nodes. Your cluster definition says your monitor cluster is three, but there's only one running, which is not a quorum. Start the other two.
[1:57] <duff_> oh? each node needs the mon going? The docs say don't put the mon onto the OSD nodes...
[1:57] <dmick> when you said "ceph-deploy new admin node1 node2", you said "create a new cluster, and its mons will be admin, node1, and node2"
[1:58] <dmick> http://ceph.com/docs/master/rados/deployment/ceph-deploy-new/#usage
[1:58] <dmick> there's no problem with running mons and osds on the same node. Where is this warning? I've heard it quoted several times in the last few days
[1:59] <duff_> http://ceph.com/docs/master/start/quick-ceph-deploy/: Tip In production environments, we recommend running Ceph Monitors on nodes that do not run OSDs.
[1:59] <dmick> yes. this is clearly not a production env
[1:59] <dmick> but we clearly need to explicate that point better
[2:00] <duff_> well sure, but that is an end goal
[2:00] <dmick> all that means is "you'll be beating the OSDs like a rented mule once you really start stressing the cluster, so having the mons there too might overload it".
[2:00] <dmick> it's not something to worry about for a three-node cluster.
[2:01] <dmick> but if you want to have only one mon, just destroy the cluster and do the 'new' with only one hostname
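A hedged ceph-deploy sketch of the two options dmick lays out (hostnames are placeholders):

    # three co-located mons, matching the existing 'new admin node1 node2' definition
    ceph-deploy mon create admin node1 node2

    # or start the cluster definition over with a single mon
    ceph-deploy new admin
    ceph-deploy mon create admin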
[2:02] <duff_> ahh, yes. I don't expect having a super high load in the end with this cluster anywise.
[2:02] <duff_> okay awesome, I'll just put mon's on all the osd nodes and be good.
[2:03] <duff_> but in general then, you would create two clusters, one for mons and one for osds? (and another for mdses?). You then just have to configure where to find each said cluster?
[2:03] <dmick> ceph-deploy mon create node1 node2 should get you rolling quickly
[2:03] <dmick> no. you might just have 3 extra machines in the cluster (scattered across failure domains) on which to run the mons
[2:04] <dmick> they could have much less disk and probably less net and CPU, but need the same connectivity as the OSDs
[2:04] <dmick> basically
[2:04] <duff_> ok cool, thanks
[2:04] <dmick> (although they also don't need the 'cluster' network connection if you've segregated)
[2:04] <dmick> (between 'public' and 'cluster')
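A hedged ceph.conf sketch of that public/cluster split; the subnets are made-up placeholders:

    [global]
        public network = 10.0.0.0/24     # client and mon traffic
        cluster network = 10.0.1.0/24    # OSD replication and backfill traffic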
[2:05] <dmick> http://tracker.ceph.com/issues/5853
[2:06] * jluis (~joao@89.181.144.108) Quit (Quit: Leaving)
[2:07] <dmick> http://tracker.ceph.com/issues/5854
[2:08] <duff_> thanks :)
[2:09] <duff_> another thing that would be helpful is a quick bit about how to add nodes into the cluster. I'm sure it's in the docs, but a little section at the end of the getting started guide about this would be helpful.
[2:10] <duff_> anywise, I'm off for the weekend. Thanks a bunch for the help dmick!
[2:10] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Quit: leaving)
[2:10] <dmick> ok. gl
[2:10] * duff_ (~duff@199.181.135.135) Quit (Quit: Lost terminal)
[2:11] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:11] <sjust> dmick: no
[2:12] <dmick> sjust: then I'm confused how n1md4 managed to consume enough storage to break the mon. My guess is that it's not really an RBD device, but just a file in /dev
[2:13] <dmick> and so it just ate up root
[2:13] <sjust> what was the situation?
[2:15] <dmick> (03:30:55 PM) n1md4: hi. decided to bonnie my 4g ceph block that then filled to capacity and went belly up! http://pastie.org/pastes/8201375/text
[2:15] <dmick> I think this was the cluster log http://pastie.org/pastes/8201464/text
[2:16] <dmick> then again, the cluster log looks like it really was writes to the cluster
[2:16] <sjust> were the osd and mon on the same disk?
[2:16] <dmick> mon log: http://pastie.org/pastes/8201474/text
[2:17] <dmick> I'm not sure.
[2:17] <dmick> probably. 2 nodes, each with 3 osds
[2:17] <dmick> mon on one or the other
[2:18] <sjust> mon on same fs as osd probably isn't great
[2:19] <dmick> I was trying to figure out if "reached critical levels of available space on data store" meant the mon store or the cluster
[2:20] <dmick> DataHealthService in the mon, I'm guessing it's the mon data store itself
[2:23] <sjust> dmick: yeah, the mon was toast
[2:23] <sjust> or rather, was full and didn't want to risk corruption
[2:24] <dmick> ok. so the cluster might have been starting to reject writes, but the mon ran out of private storage before that happened
[2:26] <sjust> possibly
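For reference, the DataHealthService check is on the mon's own store under /var/lib/ceph/mon/, and its thresholds are tunable; the option names and defaults below are an assumption to verify against the running version:

    [mon]
        mon data avail warn = 30    # warn below this percent free (assumed default)
        mon data avail crit = 5     # shut the mon down below this percent free (assumed default)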
[2:37] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) Quit (Quit: gentleben)
[2:54] * devoid (~devoid@130.202.135.213) Quit (Quit: Leaving.)
[2:55] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) has joined #ceph
[3:01] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[3:01] <n1md4> hi, again. any way to clear this http://pastie.org/pastes/8201635/text
[3:01] <n1md4> it was 8 full osds, it's now 6, so does it clear itself?
[3:02] <dmick> no, you need to remove some data
[3:03] <dmick> and those numbers sure are wacky
[3:03] <n1md4> hah - they're pretty good.
[3:03] <dmick> ceph df detail might be interesting too
[3:04] <dmick> did you see the discussion above with me and sjust? That's our theory for what happened to you
[3:04] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) has joined #ceph
[3:04] <n1md4> dmick: I'll have a read .. http://pastie.org/pastes/8201642/text?key=yss3vnucv0hg5dpcqqvpw
[3:04] * acaos_ (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[3:05] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[3:05] <dmick> Size 10E. That is one big pair of computers
[3:06] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) Quit (Ping timeout: 480 seconds)
[3:07] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) has joined #ceph
[3:09] <n1md4> what the heck is E ?
[3:11] <dmick> exabyte
[3:12] <n1md4> oooh shiny .. so 1tb seagate barracudas must really be something :P
[3:13] <dmick> is your cluster state changing? are you deleting data?
[3:15] <n1md4> don't know how to delete data (no doubt I'm missing a concept), but there's nothing on the rbd device..
[3:18] * dpippenger1 (~riven@tenant.pas.idealab.com) Quit (Quit: Leaving.)
[3:19] <dmick> does ceph -s still say what http://pastie.org/pastes/8201635/text said?
[3:20] <n1md4> nope, only 5 full osd now
[3:21] <n1md4> still crazy full % though "3852% full"
[3:22] <dmick> so I would unmount the rbd device in the kernel
[3:22] <dmick> and then use rbd ls to find out what the image name is, and rbd rm to remove it
[3:24] <n1md4> okay.
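A minimal sketch of that cleanup, assuming the image is called foo in the default rbd pool and is mapped at /dev/rbd/rbd/foo:

    umount /mnt                    # stop using the mapped device
    rbd showmapped                 # confirm what is mapped
    rbd unmap /dev/rbd/rbd/foo     # release the kernel mapping
    rbd ls                         # list images in the rbd pool
    rbd rm foo                     # delete the image and free its objects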
[3:25] <n1md4> http://pastie.org/pastes/8201683/text seems a bit stuck now
[3:28] <dmick> awesome. OK, try this:
[3:28] <dmick> rados -p rbd ls
[3:29] <dmick> that should give you 25-ish object names?
[3:30] <dmick> if so, then try: for o in $(rados -p rbd ls); do rados -p rbd rm $o; done
[3:30] <n1md4> "rados returned (22) Invalid argument"
[3:30] <dmick> for the ls?
[3:31] <n1md4> "rados -p rbd ls"
[3:31] <dmick> wth?
[3:31] <dmick> rados lspools?
[3:32] <n1md4> data, metadata, and rbd
[3:33] <dmick> can you do the rados -p rbd ls again and pastebin the command and output?
[3:34] <n1md4> http://pastie.org/pastes/8201717/text?key=xrq7qnqmb67u2k9pfa2xa
[3:37] <dmick> well that is one screwed up pool
[3:37] <n1md4> thank you :)
[3:38] <dmick> how about this: ceph osd pool get rbd pg_num
[3:39] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[3:39] <dmick> ?
[3:41] <n1md4> i think it's far too screwed! best way to blow all the osds away and start again?
[3:41] <dmick> what happened with the osd pool get?
[3:42] <n1md4> pg_num: 64
[3:42] <dmick> ok, so that was easy, right?
[3:42] <dmick> now rados rmpool rbd
[3:43] <dmick> and ceph osd pool create rbd 64
[3:43] <dmick> and you should have an empty accessible rbd pool
[3:45] <n1md4> wooooh! No, that hard crashed the server http://pastie.org/pastes/8201745/text
[3:45] <n1md4> nice!
[3:45] <dmick> uhhhhh
[3:45] <dmick> wow
[3:46] <n1md4> never mind, it's all good fun
[3:46] <dmick> oh shit, you weren't rbd mounting from the same host as the OSD, were you?
[3:46] <n1md4> erm, yes, and now you mention it I read a tip that I shouldn't do that .. cool, learning the hard way ;)
[3:47] <dmick> I mean, you'd expect deadlock, not a panic, but I suspect rbd hadn't quite let go of that pool yet
[3:47] <dmick> (maybe, in fact, you hadn't rbd unmapped; I didn't think to check)
[3:47] <dmick> Not that it should panic the kernel in any event, but...
[3:47] <dmick> yeah.
[3:48] <dmick> anyway, when that host reboots, you may be able to get it back without the krbd in the way
[3:48] <n1md4> laptop battery about to go! thanks for the assist I'll prolly be back next week ;)
[3:48] <dmick> ok. gl
[3:48] <dmick> have a nice weekend
[3:48] <n1md4> ! thanks, you too
[3:59] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) Quit (Quit: gentleben)
[4:02] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[4:04] * markbby (~Adium@168.94.245.2) has joined #ceph
[4:14] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[4:16] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) Quit (Quit: sprachgenerator)
[4:19] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Quit: leaving)
[4:30] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[4:41] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[4:43] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[4:45] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit ()
[4:46] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:54] * joshd1 (~jdurgin@2602:306:c5db:310:11ad:b56a:5b6f:7bca) has joined #ceph
[4:55] * terje (~joey@63-154-145-179.mpls.qwest.net) has joined #ceph
[4:56] * huangjun (~kvirc@58.51.149.211) has joined #ceph
[4:59] * terje (~joey@63-154-145-179.mpls.qwest.net) Quit (Read error: Operation timed out)
[5:05] * fireD (~fireD@93-142-230-25.adsl.net.t-com.hr) has joined #ceph
[5:06] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[5:07] * fireD_ (~fireD@93-139-174-231.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:10] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[5:19] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[5:46] * Isaac (~AndChat19@wireless-nat-2.auckland.ac.nz) has joined #ceph
[5:48] * Isaac (~AndChat19@wireless-nat-2.auckland.ac.nz) Quit ()
[5:50] * terje (~joey@63-154-151-19.mpls.qwest.net) has joined #ceph
[5:52] * terje (~joey@63-154-151-19.mpls.qwest.net) Quit (Read error: Operation timed out)
[6:11] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[6:17] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[6:28] * AfC (~andrew@2001:44b8:31cb:d400:cc05:1c07:9192:74e2) has joined #ceph
[6:55] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[7:06] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[7:16] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:46] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[7:57] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Read error: Operation timed out)
[7:58] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[8:07] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[8:37] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:59] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[9:00] * terje_ (~joey@63-154-148-227.mpls.qwest.net) has joined #ceph
[9:08] * terje_ (~joey@63-154-148-227.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[9:11] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:12] * ScOut3R (~ScOut3R@dsl51B617E0.pool.t-online.hu) has joined #ceph
[9:20] * terje_ (~joey@63-154-148-227.mpls.qwest.net) has joined #ceph
[9:25] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[9:25] * gentleben (~sseveranc@c-98-207-40-73.hsd1.ca.comcast.net) has joined #ceph
[9:25] * joshd1 (~jdurgin@2602:306:c5db:310:11ad:b56a:5b6f:7bca) Quit (Quit: Leaving.)
[9:28] * terje_ (~joey@63-154-148-227.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[9:32] * ScOut3R (~ScOut3R@dsl51B617E0.pool.t-online.hu) Quit (Ping timeout: 480 seconds)
[9:41] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) has joined #ceph
[10:06] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) Quit (Ping timeout: 480 seconds)
[10:40] * wiwengweng (~oftc-webi@111.161.17.68) has joined #ceph
[10:41] <wiwengweng> 2012-08-01 11:37:31.865809 mon.0 [INF] placement groupmap v9715: 384 placement groups: 384 active+clean; 8730 bytes data, 22948 MB used, 264 GB / 302 GB avail. Can anyone tell me what the last 3 parameters mean?
[10:51] <Gugge-47527> the last 3?
[10:55] * odyssey4me (~odyssey4m@41-133-58-101.dsl.mweb.co.za) Quit (Ping timeout: 480 seconds)
[11:02] * LeaChim (~LeaChim@2.122.178.96) has joined #ceph
[11:07] <wiwengweng> 8730 bytes data, 22948 MB used, 264 GB / 302 GB avail
[11:07] <wiwengweng> what is the data? is it objects in osd?
[11:10] <wiwengweng> btw, I have 3 osds on a single host. one is mounted on /dev/sdb1, but the other 2 are not mounted on a hdd. so how is the storage space calculated?
[11:11] <wiwengweng> and the mounted hdd is only 20G
[11:12] <wiwengweng> this is the actual monitor info on my host: 2013-08-03 02:10:24.904631 mon.0 [INF] pgmap v8261: 632 pgs: 632 active+clean; 10236 bytes data, 8251 MB used, 31163 MB / 41360 MB avail
[11:12] * KindTwo (KindOne@50.96.231.69) has joined #ceph
[11:12] <wiwengweng> ceph@ubuntu:~$ df -h
    Filesystem  Size  Used Avail Use% Mounted on
    /dev/sda1    19G  3.6G   15G  20% /
    udev        485M  4.0K  485M   1% /dev
    tmpfs       198M  344K  198M   1% /run
    none        5.0M     0  5.0M   0% /run/lock
    none        495M     0  495M   0% /run/shm
    /dev/sdb1   3.0G  1.1G  2.0G  35% /var/lib/ceph/osd/ceph-0
[11:14] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:14] * KindTwo is now known as KindOne
[11:33] * jamespage (~jamespage@culvain.gromper.net) Quit (Quit: Coyote finally caught me)
[12:10] * s2r2 (uid322@id-322.ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[12:11] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:11] * KindTwo (KindOne@h231.56.186.173.dynamic.ip.windstream.net) has joined #ceph
[12:12] * KindTwo is now known as KindOne
[12:19] * huangjun (~kvirc@58.51.149.211) Quit (Read error: Operation timed out)
[12:19] * huangjun|2 (~kvirc@58.55.124.227) has joined #ceph
[12:28] <Gugge-47527> wiwengweng: you have 10236 bytes of objects in the system
[12:29] <Gugge-47527> wiwengweng: on the filesystems the osds use there are 8251 MB used (the used space on /, counted twice because two osds live there, plus the used space on /var/lib/ceph/osd/ceph-0)
[12:30] <Gugge-47527> wiwengweng: on the filesystems the osds use there are 31163 MB free (again, two times the free space on / plus the free space on /var/lib/ceph/osd/ceph-0)
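As a rough cross-check against the df output pasted above: used is roughly 2 x 3.6 GB (the root fs, counted once per osd living on it) plus 1.1 GB on /var/lib/ceph/osd/ceph-0, about 8.3 GB against the 8251 MB reported; available is roughly 2 x 15 GB + 2.0 GB, about 32 GB against 31163 MB; and the total is roughly 2 x 19 GB + 3.0 GB, about 41 GB against 41360 MB.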
[12:35] * terje (~joey@63-154-145-99.mpls.qwest.net) has joined #ceph
[12:43] * terje (~joey@63-154-145-99.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[13:15] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:24] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:36] * haomaiwang (~haomaiwan@117.79.232.197) Quit (Ping timeout: 480 seconds)
[13:37] * rongze_ (~quassel@notes4.com) Quit (Ping timeout: 480 seconds)
[13:58] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[14:05] * terje (~joey@97-118-167-16.hlrn.qwest.net) has joined #ceph
[14:07] * reiv (~reiv@180.172.136.52) has joined #ceph
[14:10] <reiv> I have a small system with only 1 monitor, after reboot, the monitor refuse to start, are there any way to recover?
[14:19] <Gugge-47527> start by checking why it wont start
[14:19] <Gugge-47527> in the log file
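A minimal sketch of where to look, assuming the default log path and the mon id 'a' used below:

    less /var/log/ceph/ceph-mon.a.log
    # or run it in the foreground with extra verbosity:
    ceph-mon -i a -d --debug-mon 20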
[14:20] * reiv (~reiv@180.172.136.52) Quit (Quit: This computer has gone to sleep)
[14:23] * reiv (~reiv@180.172.136.52) has joined #ceph
[14:24] <reiv> sorry, just away for a moment.
[14:24] <reiv> I try to start with ceph-mon -f -i a
[14:27] * reiv_ (~reiv@180.172.136.52) has joined #ceph
[14:32] * reiv (~reiv@180.172.136.52) Quit (Ping timeout: 480 seconds)
[14:33] * reiv (~user@180.172.136.52) has joined #ceph
[14:33] <reiv> Here is the log of ceph-mon: https://gist.github.com/yuchangyuan/0a0a56a14fa4649ec2c8
[14:35] * reiv_ (~reiv@180.172.136.52) Quit (Ping timeout: 480 seconds)
[14:39] * _Tassadar (~tassadar@tassadar.xs4all.nl) Quit (Ping timeout: 480 seconds)
[14:40] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[14:44] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[14:53] * rongze (~quassel@117.79.232.234) has joined #ceph
[14:59] * haomaiwang (~haomaiwan@li565-182.members.linode.com) has joined #ceph
[15:00] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[15:44] * reiv` (~user@180.172.136.52) has joined #ceph
[15:44] * reiv (~user@180.172.136.52) Quit (Read error: Connection reset by peer)
[15:48] <n1md4> morning. all osds are 'down', how can I set them 'up'?
[16:03] <janos> are they in at least?
[16:03] <janos> you can only set in and out. the system handles the up and down part
[16:03] <janos> all of them down sounds like their services might not be started
[16:04] <n1md4> 8 osds: 0 up, 8 in
[16:04] <janos> yikes
[16:05] <janos> i'm amazed you're able to get any response from it. so the mons are up and in quorum?
[16:06] <n1md4> janos: I'm still beginning with Ceph, and have probably broken it several times over (see the irc logs ;) It looks like this at the moment http://pastie.org/pastes/8202961/text
[16:06] <n1md4> there are 2 nodes, with 3 osds on each, and one of the nodes also has the mon and mds.
[16:07] <janos> i recall breaking mine a lot at first too ;)
[16:07] <janos> never had them all down though
[16:07] <janos> if you run ceph -w a while do you see any communication errors?
[16:08] <janos> like run ceph -w in one terminal and then issue a /etc/init.d/ceph restart osd.X
[16:08] <n1md4> I've just started and will have a coffee, will let you know in a few minutes .. but, nothing is currently spamming the screen.
[16:08] <janos> yeah i need coffee too. but then i need to do some yardwork!
[16:09] <n1md4> hah! the wife has told me that too!!!
[16:09] <janos> haha
[16:09] <janos> been a jungle here this year
[16:10] <n1md4> for the moment, here is the failing restart http://pastie.org/pastes/8202969/text
[16:10] <janos> ah
[16:10] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[16:10] <janos> what does ceph osd tree look like?
[16:11] <n1md4> http://pastie.org/pastes/8202973/text?key=g7gliqpabpwm3yejbxgtaa
[16:12] <janos> that "not found" - I've never run into that, but it makes me wonder about the conf
[16:12] <janos> could you pastebin ceph.conf?
[16:12] <janos> the osd's also have about 0 weight
[16:14] <wiwengweng> Gugge-47527: are you in?
[16:15] <wiwengweng> I don't get the idea of mounted hdd and osd
[16:15] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[16:16] <n1md4> janos: http://pastie.org/pastes/8202975/text
[16:16] <janos> whoa
[16:17] <janos> i assume this is cuttlefish and created with ceph-deploy then? (i still live on bobtail with more manual methods)
[16:17] <n1md4> yes, it's cuttlefish
[16:17] <n1md4> followed the docs
[16:17] <janos> i'm used to my osd's being defined in here
[16:17] <janos> hrmm
[16:18] <n1md4> sure, i've seen that in other tutorials
[16:18] <n1md4> at this early stage, I don't mind removing all traces of osds and starting again.
[16:18] <janos> you may want to
[16:18] <n1md4> of course, the troubleshooting is teaching me a lot too
[16:18] <janos> well, time for mowing the lawn and tearing the deck off
[16:18] <janos> yeah
[16:18] <janos> i found it helpful
[16:19] <janos> later!
[16:19] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[16:19] <wiwengweng> I have a 20G virtual hdd in my workstation. I installed the system on this disk and created two osds. I don't mount any drive for the osds, so how is the data stored if the osds aren't mounted on an actual device?
[16:19] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley)
[16:24] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:27] <yanzheng> wiwengweng, stored in root fs
[16:29] <wiwengweng> means in the / ?
[16:29] <wiwengweng> so it uses the rest of the 20G virtual hard disk?
[16:29] <yanzheng> /var/lib/ceph/osd/
[16:29] <yanzheng> yes
[16:30] <wiwengweng> thanks. yanzheng
[16:31] <wiwengweng> I don't know much about hard disk and fs
[16:31] * Vincent (~Vincent@49.206.158.155) has joined #ceph
[16:32] <wiwengweng> so if I upload an image file, can I find it in / ? or in /var/lib/ceph/osd/
[16:32] <wiwengweng> ?
[16:33] <yanzheng> no, you image will be striped
[16:33] <yanzheng> s/you/your
[16:33] <wiwengweng> ?
[16:33] <wiwengweng> oh,yes. I remember striped
[16:34] <wiwengweng> but what 's/you/your'??
[16:35] <yanzheng> replace my typo you with your
[16:38] <wiwengweng> sorry. I don't understand
[16:38] <wiwengweng> :D
[16:43] <wiwengweng> and I don't know if I am right: /var/lib/ceph/osd/ceph-* is mounted on / ? if it is, why doesn't the 'df -h' command show the mount point?
[16:45] * Vincent_Valentine (Vincent_Va@49.206.158.155) has joined #ceph
[16:47] <yanzheng> they are directories of the /
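A quick way to see that for yourself: df reports whichever filesystem actually backs a path, so an osd directory that is not its own mount resolves to / (paths assume the default layout and the ids from the df paste above):

    df -h /var/lib/ceph/osd/ceph-1    # reports / because ceph-1 is just a directory on the root fs
    df -h /var/lib/ceph/osd/ceph-0    # reports /dev/sdb1, the one osd with its own mount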
[16:48] <wiwengweng> hah~ u r right. how fool :D
[16:50] <wiwengweng> and that's by default , right?
[16:50] <yanzheng> welcome to the unix world
[16:51] <wiwengweng> oh, my god. don't torture me
[16:51] <wiwengweng> that is the last thing I ever want to touch
[16:52] <wiwengweng> :(
[16:54] <wiwengweng> and I don't get your 's/you/your'~show me some awesome commands
[16:57] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[17:04] <bandrus> http://www.brunolinux.com/02-The_Terminal/Find_and%20Replace_with_Sed.html
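For what it's worth, the s/old/new notation comes from sed; a tiny example of the correction yanzheng meant:

    echo "no, you image will be striped" | sed 's/you /your /'
    # prints: no, your image will be striped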
[17:14] * Vincent_Valentine (Vincent_Va@49.206.158.155) Quit (Ping timeout: 480 seconds)
[17:19] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[17:21] * Vincent (~Vincent@49.206.158.155) Quit (Ping timeout: 480 seconds)
[17:25] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[17:33] * markbby (~Adium@168.94.245.1) has joined #ceph
[17:44] * Vincent (~Vincent@49.206.158.155) has joined #ceph
[17:47] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[17:48] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[17:51] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:52] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[17:54] * reiv` (~user@180.172.136.52) has left #ceph
[17:58] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[18:01] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[18:04] * markbby (~Adium@168.94.245.1) Quit (Ping timeout: 480 seconds)
[18:07] * huangjun|2 (~kvirc@58.55.124.227) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[18:16] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[18:16] * Vincent (~Vincent@49.206.158.155) Quit (Ping timeout: 480 seconds)
[18:23] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:35] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[18:40] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[18:41] <lxo> ugh. I'm getting occasional reconnect failures reported by ceph-mds as failures to decode message type 23 (reconnect) from ceph.ko clients running on my yeeloong (mips64el) netbooks with linux 3.10.4
[18:42] <lxo> the error occurred while decoding the last of 2930 inodes in the v2 reconnect message, but my attempt to bypass the failure by returning early from the function without trying to decode the last inode caused a kernel oops on the client, and the mds aborted the reconnect
[18:43] <lxo> the error is repeatable once it starts occurring, so I'll try again next time it occurs
[18:45] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Ping timeout: 480 seconds)
[18:48] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[18:48] * wiwengweng (~oftc-webi@111.161.17.68) Quit (Quit: Page closed)
[18:50] <lxo> oh, the ceph cluster is running 0.61.7
[19:02] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[19:06] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[19:07] * devoid (~devoid@107-219-204-197.lightspeed.cicril.sbcglobal.net) has joined #ceph
[19:07] * devoid (~devoid@107-219-204-197.lightspeed.cicril.sbcglobal.net) Quit ()
[19:36] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[19:47] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[19:58] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[20:11] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[21:10] * NaioN (stefan@andor.naion.nl) has joined #ceph
[21:10] * NaioN_ (stefan@andor.naion.nl) Quit (Read error: Connection reset by peer)
[21:23] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[21:24] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[21:26] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[21:27] * Macheske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[21:30] * Vincent_Valentine (~Vincent_V@49.206.158.155) Quit (Ping timeout: 480 seconds)
[21:30] * Machske (~Bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[22:04] * Vincent_Valentine (~Vincent_V@49.206.158.155) has joined #ceph
[22:08] * grepory (~Adium@209.119.62.120) has joined #ceph
[22:12] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[22:14] * NaioN (stefan@andor.naion.nl) Quit (Remote host closed the connection)
[22:27] * NaioN (stefan@andor.naion.nl) has joined #ceph
[22:41] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:45] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[22:47] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:48] * grepory (~Adium@209.119.62.120) Quit (Quit: Leaving.)
[22:50] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[22:55] * lx0 is now known as lxo
[23:12] * sleinen (~Adium@2001:620:0:25:5927:903d:7065:4f90) has joined #ceph
[23:15] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[23:18] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[23:18] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:21] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[23:21] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit ()
[23:36] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Read error: Operation timed out)
[23:52] * loopy (~torment@pool-96-228-147-185.tampfl.fios.verizon.net) has joined #ceph
[23:58] * BillK (~BillK-OFT@124-148-246-233.dyn.iinet.net.au) has joined #ceph
[23:59] <loopy> if I bring down a mon node, the other says hunting for mon, and it faults

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.