#ceph IRC Log


IRC Log for 2011-12-01

Timestamps are in GMT/BST.

[0:12] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[0:18] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[0:22] * adjohn (~adjohn@ has joined #ceph
[0:33] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[0:40] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[0:41] * fronlius (~fronlius@e176057253.adsl.alicedsl.de) Quit (Quit: fronlius)
[0:41] <nwatkins> buck in the house
[0:42] <buck> I built and installed ceph from the master branch and I'm hitting an issue
[0:42] <buck> I can bring the ceph services up and connect to rados
[0:42] <buck> and put and get data from rados
[0:42] <buck> but when I attempt to mount the file system, the call hangs and eventually fails with a status 5
[0:43] <buck> the same behavior happens when I attempt to use the testceph utility
[0:43] <buck> this is all on a single node. vstart works correctly
[0:44] <joshd> what does 'ceph -s' say? are you using cephx ('auth supported = cephx' in ceph.conf)?
[0:45] <buck> i did not enable auth
[0:45] <buck> buck@ubuntu-ceph-3:~/git/ceph/src$ ceph -s
[0:45] <buck> 2011-11-30 15:45:19.657381 pg v41: 14 pgs: 14 active+clean+degraded; 1 KB data, 167 MB used, 14181 MB / 15116 MB avail; 1/2 degraded (50.000%)
[0:45] <buck> 2011-11-30 15:45:19.657747 mds e9: 1/1/1 up {0=a=up:creating}
[0:45] <buck> 2011-11-30 15:45:19.657975 osd e17: 1 osds: 1 up, 1 in
[0:45] <buck> 2011-11-30 15:45:19.659264 log 2011-11-30 15:35:25.635456 mon.0 3 : [INF] mds.? up:boot
[0:45] <buck> 2011-11-30 15:45:19.659511 mon e1: 1 mons at {a=}
[0:45] <buck> buck@ubuntu-ceph-3:~/git/ceph/src$ that is all on one node, running ubuntu 11.10 in a VM
[0:47] <joshd> the problem is the mds is stuck in the creating state - it should have gone active - are there any errors in your mds log?
[0:50] * cp (~cp@ has joined #ceph
[0:55] <buck> Hmmm, I'm looking through the mds log and nothing is jumping out.
[0:56] <joshd> can you pastebin it?
[0:57] <buck> sure
[0:58] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Read error: Operation timed out)
[1:00] * cp_ (~cp@ has joined #ceph
[1:00] * cp__ (~cp@ has joined #ceph
[1:00] * cp (~cp@ Quit (Read error: No route to host)
[1:00] * cp__ is now known as cp
[1:00] * cp (~cp@ Quit ()
[1:05] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[1:05] <buck> http://pastebin.com/DjyE4NTq
[1:05] <buck> that's the mds log
[1:08] * cp_ (~cp@ Quit (Ping timeout: 480 seconds)
[1:09] <buck> could this be a/the issue?
[1:09] <buck> 2011-11-30 15:56:20.128111 7fef7ef83700 -- >> pipe(0x140ac80 sd=9 pgs=0 cs=0 l=0).connect claims to be not - wrong node!
[1:13] * adjohn (~adjohn@ Quit (Read error: Connection reset by peer)
[1:14] * adjohn (~adjohn@ has joined #ceph
[1:23] * adjohn (~adjohn@ Quit (Read error: Connection reset by peer)
[1:26] <buck> joshd: did that pastebin link work out correctly?
[1:27] <joshd> sorry, got pulled away for a bit - back now
[1:29] <buck> joshd: here's the link in case it got lost in the noise http://pastebin.com/DjyE4NTq
[1:30] <buck> joshd: and thanks for the help
[1:33] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:33] <joshd> I think the 'wrong node' bit may be the problem - the mds needs to talk to the monitors to go active
[1:35] <buck> Hmmmm.... here's my ceph.conf file. Thought it was as bare bones as I could go http://pastebin.com/Fzme3wWy
[1:38] <buck> joshd: the "wrong node" message makes it look like the pids are getting confused, not the IP addresses or ports. Wonder what would cause that
[1:39] <joshd> yeah, your config looks fine
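[Editor's note: buck's actual ceph.conf was only in the pastebin, but a bare-bones single-node configuration of that era looked roughly like the sketch below; the hostname, monitor address, and data paths here are placeholders, not values from the log.]

```ini
; minimal single-node ceph.conf sketch (hypothetical values)
[global]
    auth supported = none        ; buck did not enable cephx

[mon.a]
    host = ubuntu-ceph-3
    mon addr =      ; placeholder address
    mon data = /var/lib/ceph/mon.a

[mds.a]
    host = ubuntu-ceph-3

[osd.0]
    host = ubuntu-ceph-3
    osd data = /var/lib/ceph/osd.0
    osd journal = /var/lib/ceph/osd.0.journal
```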
[1:40] <buck> I'm going to go back and try this on ubuntu 10.04 and see if I can recreate the issue there (pretty easy since I'm doing this all in VMs)
[1:40] <joshd> it may be a bug in the messaging layer
[1:41] <buck> Hmm...I built this from master. Would you suggest I try stable and see if it occurs again?
[1:43] <joshd> master should be working
[1:43] <joshd> if this is reproducible after restarting ceph, there may be something odd on your machine
[1:44] <joshd> buck: is your osd running properly? the mds isn't getting any replies from it
[1:44] <buck> It's reproducible after restarting ceph and the VM. I need to run but I'll try it in 10.04 ubuntu and report back tomorrow.
[1:45] <buck> joshd: I can read and write to rados
[1:45] * adjohn (~adjohn@ has joined #ceph
[1:45] <buck> joshd: and that would require a functioning osd, right?
[1:45] <joshd> yeah, but the question is why can't the mds talk to it?
[1:46] <joshd> if you can turn on osd debugging to 20, then restart the osd and mds, the osd log may have a clue
[1:46] <buck> cool. I'll try that later tonight and see what happens.
[1:46] <joshd> ok
[2:04] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[2:08] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[2:10] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[2:13] * adjohn is now known as Guest18867
[2:13] * Guest18867 (~adjohn@ Quit (Read error: Connection reset by peer)
[2:13] * _adjohn (~adjohn@ has joined #ceph
[2:13] * _adjohn is now known as adjohn
[2:15] * adjohn is now known as Guest18868
[2:15] * Guest18868 (~adjohn@ Quit (Read error: Connection reset by peer)
[2:15] * _adjohn (~adjohn@ has joined #ceph
[2:15] * _adjohn is now known as adjohn
[2:25] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[2:52] * nwatkins (~nwatkins@kyoto.soe.ucsc.edu) Quit (Quit: WeeChat 0.3.5)
[2:53] * adjohn (~adjohn@ Quit (Quit: adjohn)
[2:57] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[3:10] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[3:18] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:24] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:12] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:23] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[4:57] * sjustlaptop (~sam@96-41-121-194.dhcp.mtpk.ca.charter.com) has joined #ceph
[4:57] * sjustlaptop (~sam@96-41-121-194.dhcp.mtpk.ca.charter.com) Quit ()
[4:59] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[5:05] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[6:20] * NightDog (~karl@52.84-48-58.nextgentel.com) Quit (Remote host closed the connection)
[6:30] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:04] * morse_ (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[8:51] <chaos_> sagewk, there is a possible regression - http://tracker.newdream.net/issues/805, i'm double checking everything now, but it looks very similar
[9:00] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[9:08] * verwilst (~verwilst@dD57671DB.access.telenet.be) has joined #ceph
[9:26] <chaos_> sagewk, is it possible that this bug http://tracker.newdream.net/issues/1775 is related to the xattr bug I had earlier? Is it possible to work around this? I still have ~100GB left to copy.
[9:32] <pmjdebruijn> btw the build issue (on ubuntu) I reported earlier (with regard to uuid.h) got fixed
[9:33] <pmjdebruijn> I rebuilt yesterday with a fresh git checkout without issues
[9:41] <chaos_> pmjdebruijn, ceph on ubuntu builds fine.. yesterday.. the day before.. today
[9:43] <chaos_> http://ceph.newdream.net/wiki/Debian
[9:43] <pmjdebruijn> now
[9:43] <pmjdebruijn> the package build broke a few days ago
[9:43] <pmjdebruijn> I'm doing git master package builds
[9:43] <chaos_> oh master, not stable
[9:43] <chaos_> sorry ;)
[9:44] <pmjdebruijn> https://github.com/NewDreamNetwork/ceph/commit/77a62fdce4afb305d5314590c02325b1b221c93f
[9:44] <pmjdebruijn> that probably fixed it
[9:48] <chaos_> yea ;) they are preparing for 0.39
[10:19] * NightDog (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[10:20] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:23] * fronlius_ (~fronlius@testing78.jimdo-server.com) has joined #ceph
[11:23] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[11:23] * fronlius_ is now known as fronlius
[12:35] * NightDog_ (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[13:13] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[15:05] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) Quit (Ping timeout: 480 seconds)
[15:14] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) has joined #ceph
[15:16] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[15:40] * NightDog (~karl@52.84-48-58.nextgentel.com) Quit (Read error: Connection reset by peer)
[15:40] * NightDog (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[16:05] * NightDog_ (~karl@52.84-48-58.nextgentel.com) Quit (Read error: Connection reset by peer)
[16:05] * NightDog_ (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[16:05] * NightDog (~karl@52.84-48-58.nextgentel.com) Quit (Quit: Leaving)
[16:55] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[17:04] <grape> Is there any preferred location for the "secretfile" used to access a block device?
[17:27] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[17:37] * elder (~elder@aon.hq.newdream.net) has joined #ceph
[17:40] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:41] <chaos_> sagewk, i've uploaded journal file
[17:41] * NightDog_ (~karl@52.84-48-58.nextgentel.com) Quit (Read error: Connection reset by peer)
[17:47] <chaos_> http://tracker.newdream.net is down?
[17:48] <chaos_> hmm it's working again
[17:49] <sagewk> chaos_: do you have any osd log from around 2011-12-01 05:01:37.060638 ?
[17:49] <sagewk> chaos_: i'm wondering if there are any network connect error msgs around that time
[17:49] <chaos_> sagewk, just standard log
[17:50] <chaos_> I'll look into
[17:50] <sagewk> those should show up at the default logging levels
[17:50] <sagewk> afaics the osd just dropped a write. wondering if i can blame the messenger
[17:50] <chaos_> 2011-12-01 04:44:58.450060 7f822adac700 osd.0 769 OSD::ms_handle_reset() s=0x16f7a20
[17:50] <chaos_> 2011-12-01 05:05:41.050161 7f822adac700 osd.0 773 OSD::ms_handle_reset()
[17:51] <chaos_> I'll look at second osd
[17:51] <sagewk> chaos_: do you know why the mds originally stopped?
[17:51] <chaos_> originally?
[17:52] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[17:52] <chaos_> second osd looks fine too
[17:52] <sagewk> this crash is during mds restart/replay.. which means the original ceph-mds process stopped for some reason
[17:52] <chaos_> hmm
[17:54] <chaos_> 2011-12-01 05:01:34.437954 7ff59035f700 mds.0.0 replay_done (as standby)
[17:54] <chaos_> 2011-12-01 05:01:35.419865 7ff59035f700 mds.-1.-1 handle_mds_map i ( dne in the mdsmap, respawning myself
[17:54] <chaos_> 2011-12-01 05:01:35.419887 7ff59035f700 mds.-1.-1 respawn
[17:54] <chaos_> it looks like the mds restarted on the second server
[17:57] <sagewk> were there multiple ceph-mds daemons?
[17:57] <chaos_> no
[17:58] <chaos_> not at single server
[17:59] <sagewk> then there were multiple servers?
[18:00] <chaos_> I have 2 mds daemons, each at separate server
[18:01] <chaos_> the second one is set up to replay the first
[18:01] <sagewk> and the backup1 log you posted is the full log?
[18:01] <chaos_> full log from first mds
[18:02] <sagewk> i'm curious what is in both logs ~5am... the one you posted has its first entry at 7:56
[18:02] <chaos_> erm...
[18:02] <chaos_> full i mean done with debug options
[18:02] <chaos_> not full by time frame..
[18:02] <chaos_> i have short log from 5am
[18:02] <chaos_> (without debug)
[18:02] <sagewk> that'll help
[18:03] <chaos_> ok
[18:03] <sagewk> just want to get the sequence of events straight (which mds went down, which took over, when, etc.)
[18:03] <chaos_> ok i'll upload short logs from both mdses
[18:03] <chaos_> sorry for mess
[18:07] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:07] <chaos_> sagewk, uploaded
[18:07] <chaos_> from 30.11.2011 till crash
[18:08] <chaos_> on the timeline, mds.backup1.log comes after mds.backup1.log.1.gz
[18:13] <sagewk> chaos_: btw, do you have a cron job going off every 5 min or something? there's a regular pattern of the mds thinking it's laggy
[18:13] <chaos_> there is munin-node
[18:14] <chaos_> it gathers data every 5 minutes
[18:14] <chaos_> is it somehow affecting ceph?
[18:15] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[18:15] <sagewk> the mds is failing to get its regular monitor beacon/heartbeats
[18:16] <chaos_> munin use osd performance sockets
[18:16] <sagewk> harmless here bc there is only 1 mds, but when you have a standby (that hasn't crashed :) it could trigger a failover to the other one
[18:17] <chaos_> well sometimes i saw switching to other mds.. but hetzner has some network issues from time to time
[18:20] <chaos_> the mds thinks it's laggy because of what? network latency?
[18:35] * aliguori (~anthony@ has joined #ceph
[18:40] * adjohn (~adjohn@ has joined #ceph
[18:46] <sagewk> chaos_: no definitive clues :(
[18:47] <chaos_> ;/
[18:47] <chaos_> shit
[18:47] <damoxc> sjust: did you manage to get anywhere with my pg's stuck creating issue?
[18:47] <chaos_> sagewk, so.. everything looks ok, but nothing is working?:p
[18:48] <sagewk> chaos_: http://fpaste.org/YDne/ is a kludge that will try to skip over the bad region in the journal... can try that and hope the mds can cope
[18:48] <chaos_> i'll try
[18:48] <chaos_> don't have anything to lose ;p
[18:54] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:11] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:12] <chaos_> sagewk, keep fingers crossed
[19:12] <chaos_> trying to boot mds
[19:13] <chaos_> and dead
[19:14] <chaos_> sagewk, did you write down the proper offset?
[19:14] <chaos_> 2011-12-01 19:13:14.049268 7ff0212fd700 mds.0.journaler(ro) try_read_entry got 0 len entry at offset 132399619
[19:15] <chaos_> and your hack skips 132398281
[19:16] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[19:22] <sagewk> whoops, yeah just change that value
[19:23] <chaos_> and other one is right?
[19:23] <chaos_> 0x07e4e943 is 132442435
[19:27] <sagewk> yeah, that's the next valid event i see in the journal
[19:27] <chaos_> ok
[19:27] <chaos_> trying again
[19:27] <sagewk> oh wait, hold on
[19:27] <chaos_> why?
[19:27] <sagewk> 0x07e4decf is also valid and precedes it
[19:28] <chaos_> ok
[19:28] <chaos_> i'll change it
[19:48] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[20:00] <grape> I was watching the mon log and rebooting machines to see what happened with the cluster when a node dropped out. All of a sudden the fast-moving log came to a stop and had a number of lines like these:
[20:00] <grape> 13: (SafeTimerThread::entry()+0xd) [0x57da1d]
[20:00] <grape> 14: (()+0x7efc) [0x7fe6d8d60efc]
[20:00] <grape> 15: (clone()+0x6d) [0x7fe6d779a89d]
[20:00] <grape> What is going on there?
[20:02] <yehudasa_> grape: we'll need larger backtrace to analyze that
[20:03] <yehudasa_> grape: what exactly were you doing with the mon that dumped that?
[20:03] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[20:04] <grape> yehudasa_: node three was coming back online. this log came from node1
[20:04] <yehudasa_> grape: were you taking down node1?
[20:04] <grape> yehudasa_: node2 may have also not completed its boot-up
[20:05] <grape> yehudasa_: nah I was watching the logs on node1
[20:05] <yehudasa_> grape: what did the rest of the backtrace say?
[20:06] <grape> it was similar stuff. I just restarted it and it ran like normal, but it was hanging on that
[20:07] <grape> yehudasa_: I'll let you know if I can duplicate it. I was merely curious.
[20:07] <yehudasa_> grape: that's what a stack dump looks like; it happens when a node crashes
[20:08] <yehudasa_> grape: usually we'd want the full crash as it might have better hints as to why the crash happened
[20:09] * bchrisman (~Adium@ has joined #ceph
[20:09] <grape> yehudasa_: good to know. I'll be sure to log heavily :-)
[20:10] * adjohn (~adjohn@ Quit (Quit: adjohn)
[20:15] * aliguori (~anthony@ has joined #ceph
[20:15] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: WeeChat 0.3.5)
[20:16] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[20:29] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[20:31] <chaos_> sagewk, http://wklej.org/id/638898/
[20:31] <chaos_> this hack doesn't help
[20:32] <chaos_> i've even added a line to check whether the application reaches this code
[20:32] <chaos_> dout(0) << "skip 132399619" << dendl;
[20:39] <NaioN> I noticed some change in cephfs between 0.38 and git master
[20:40] <NaioN> before you could see with ls -lah the size of the contents of a directory
[20:40] <NaioN> now it looks like it displays the size of the directory "inode" itself
[20:40] <NaioN> is this correct?
[20:46] <NaioN> sorry i see it takes a lot longer to update those sizes :)
[20:46] <NaioN> before it updated more frequently
[20:48] <buck> joshd: I was going to take another look at that mds -> osd communication issue I was seeing yesterday
[20:49] <buck> joshd: I was able to get rid of that "wrong address" issue by specifying the 'pid file = ...' in ceph.conf
[20:49] <buck> joshd: but the issue of the mds staying in "up:creating" mode persists
[20:50] <joshd> buck: are there any osd_op_reply messages in the mds log?
[20:51] <buck> joshd: no
[20:52] <joshd> buck: can you enable debugging on the osd, restart the mds, and post the osd log?
[20:53] <buck> sure thing. When you say enable debug on the osd, that's just 'debug osd = 1' in ceph.conf, yeah?
[20:54] <joshd> debug osd = 20
[20:55] <joshd> the number is a logging level
[20:55] <joshd> not much gets logged at 1
[20:55] <buck> ahh, okay. I'd thought that was a binary flag. Glad I asked
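[Editor's note: the debug setting joshd describes goes in the daemon's section of ceph.conf; per-subsystem debug values are verbosity levels from 0 up to 20, not on/off flags. A sketch of what buck would add (the extra messenger line is an optional addition, not something joshd asked for):]

```ini
[osd]
    debug osd = 20    ; subsystem verbosity level, 0-20; not a boolean
    debug ms = 1      ; optionally, light messenger-layer logging too
```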
[20:59] <buck> joshd: http://pastebin.com/gytSkC7R
[21:00] <buck> this line looks like it could be symptomatic of an issue
[21:00] <buck> 2011-12-01 11:57:03.090326 7f999ed40700 osd.0 7 hit non-existent pg 1.0, waiting
[21:04] <joshd> yeah, that would explain why the mds never got replies
[21:05] <buck> With this file system, I did a "mkcephfs" and that's it. I haven't added any data to it
[21:06] <buck> given that, is there a configuration level issue that could cause this rogue pg?
[21:06] * slang (~slang@chml01.drwholdings.com) Quit (Quit: Leaving.)
[21:07] <chaos_> sagewk, if you come up with another solution how to get my ceph running and copy missing data, please post it here or at bug page. I'll have to go now. It's 9pm here and it was long day. Anyway thanks for help.
[21:08] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[21:09] * bchrisman (~Adium@ has joined #ceph
[21:09] <joshd> buck: it's finding other pgs just fine, so I don't think it's a configuration problem
[21:09] <sagewk> chaos_: stick a continue; after that set_read_pos() call
[21:10] * bchrisman (~Adium@ Quit ()
[21:14] * bchrisman (~Adium@ has joined #ceph
[21:14] * bchrisman (~Adium@ Quit ()
[21:17] <sjust> buck: could you send us the file emitted by 'ceph osd getmap'?
[21:17] * aliguori (~anthony@ has joined #ceph
[21:18] <buck> sjust: sure. Will posting it to pastebin work or is there another route I should go, given that it's a binary file
[21:18] <sjust> pastebin probably won't work for a binary file?
[21:18] <sjust> you could probably email me
[21:18] <sjust> sam.just@dreamhost.com
[21:20] <buck> sjust: cool. just sent it your way
[21:22] * nwatkins (~nwatkins@kyoto.soe.ucsc.edu) has joined #ceph
[21:26] * bchrisman (~Adium@ has joined #ceph
[21:29] * grape (~grape@ Quit (Ping timeout: 480 seconds)
[21:32] * failbaitr (~innerheig@ Quit (charon.oftc.net resistance.oftc.net)
[21:32] * sage (~sage@ Quit (charon.oftc.net resistance.oftc.net)
[21:32] * MK_FG (~MK_FG@ Quit (charon.oftc.net resistance.oftc.net)
[21:32] * Bircoph (~Bircoph@nat0.campus.mephi.ru) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * u3q (~ben@uranus.tspigot.net) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * diegows (~diegows@50-57-106-86.static.cloud-ips.com) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * buck (~buck@bender.soe.ucsc.edu) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * eternaleye_ (~eternaley@ Quit (charon.oftc.net resistance.oftc.net)
[21:32] * svenx_ (92744@diamant.ifi.uio.no) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * ajm (adam@adam.gs) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * mfoemmel (~mfoemmel@chml01.drwholdings.com) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * sagewk (~sage@aon.hq.newdream.net) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * yehudasa_ (~yehudasa@aon.hq.newdream.net) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * colon_D (~colon_D@173-165-224-105-minnesota.hfc.comcastbusiness.net) Quit (charon.oftc.net resistance.oftc.net)
[21:32] * aneesh (~aneesh@ Quit (charon.oftc.net resistance.oftc.net)
[21:33] * slang (~slang@chml01.drwholdings.com) has joined #ceph
[21:33] * _adjohn (~adjohn@ has joined #ceph
[21:33] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[21:33] * eternaleye_ (~eternaley@ has joined #ceph
[21:33] * svenx_ (92744@diamant.ifi.uio.no) has joined #ceph
[21:33] * ajm (adam@adam.gs) has joined #ceph
[21:33] * yehudasa_ (~yehudasa@aon.hq.newdream.net) has joined #ceph
[21:33] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[21:33] * colon_D (~colon_D@173-165-224-105-minnesota.hfc.comcastbusiness.net) has joined #ceph
[21:33] * Bircoph (~Bircoph@nat0.campus.mephi.ru) has joined #ceph
[21:33] * MK_FG (~MK_FG@ has joined #ceph
[21:33] * diegows (~diegows@50-57-106-86.static.cloud-ips.com) has joined #ceph
[21:33] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[21:33] * mfoemmel (~mfoemmel@chml01.drwholdings.com) has joined #ceph
[21:33] * sage (~sage@ has joined #ceph
[21:33] * jjchen (~jjchen@lo4.cfw-a-gci.greatamerica.corp.yahoo.com) has joined #ceph
[21:33] * aneesh (~aneesh@ has joined #ceph
[21:33] * failbaitr (~innerheig@ has joined #ceph
[21:33] * u3q (~ben@uranus.tspigot.net) has joined #ceph
[21:33] * acaos (~zac@209-99-103-42.fwd.datafoundry.com) has joined #ceph
[21:53] * buck (~buck@bender.soe.ucsc.edu) Quit (Quit: Leaving)
[21:53] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[21:57] * MKFG (~MK_FG@ has joined #ceph
[21:58] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[21:58] * MKFG is now known as MK_FG
[21:59] <sjust> buck: could you post the output of ceph pg dump?
[22:05] <buck> sjust: http://pastebin.com/X5z5aGBx
[22:07] <sjust> oh... that's *weird*
[22:07] <sjust> ok
[22:08] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[22:11] <sjust> buck: pg_num is set to 0, so pg 1.0 actually doesn't exist
[22:12] <sjust> ./ceph osd pool set <poolname> pg_num <value>
[22:12] <sjust> will allow you to set <poolname>'s pg_num to <value>
[22:12] <sjust> 16 should be a good start
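[Editor's note: putting sjust's suggestion together as a concrete command. The log never names pg 1.0's pool, but under the default pool numbering of that era (0=data, 1=metadata, 2=rbd) pool id 1 would be 'metadata', so the pool name below is an assumption:]

```shell
# pg 1.0 lives in pool id 1 -- assumed to be 'metadata' here
ceph osd pool set metadata pg_num 16

# verify the new pg_num took effect
ceph osd dump | grep pg_num
```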
[22:12] <buck> huh...
[22:13] <buck> are you suggesting I set a pool to a higher pg_num to fix the issue?
[22:13] <sjust> buck: increasing the number of pgs once the pool has been in use, however, does not work yet (it's ok now since there isn't anything in any of the pools)
[22:13] <buck> interesting issue, regardless, given that I didn't set any of that myself
[22:13] <sjust> buck: yeah, I'm looking into that now
[22:13] <sjust> it normally defaults to something sane
[22:13] <buck> sjust: cool, thanks for the help btw
[22:13] <sjust> buck: sure, no problem
[22:14] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[22:55] * aliguori (~anthony@ Quit (Remote host closed the connection)
[23:31] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[23:39] <sjust> damoxc: master should now have a fix for your pgs stuck in creating problem
[23:39] <damoxc> sjust: is it back-portable to 0.38 or shall I wait for 0.39?
[23:40] <sjust> it should work fine if you cherry-pick it
[23:40] <sjust> 6fbab6da6942c238d40a6b4f1680a7e6da463289
[23:48] * _adjohn (~adjohn@ Quit (Ping timeout: 480 seconds)
[23:50] <damoxc> sjust: awesome thanks, I'll test it out tomorrow when I'm back in the office
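[Editor's note: for anyone following along, applying the single fix sjust references to a local 0.38 tree works roughly like this; the tag and branch names are illustrative, only the commit hash comes from the log:]

```shell
# start from the 0.38 release on a throwaway branch,
# then apply just the one fix on top of it
git fetch origin
git checkout -b v0.38-with-pg-fix v0.38
git cherry-pick 6fbab6da6942c238d40a6b4f1680a7e6da463289
```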

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.