#ceph IRC Log


IRC Log for 2013-06-07

Timestamps are in GMT/BST.

[0:00] * eegiks (~quassel@2a01:e35:8a2c:b230:dd7:5938:8fb7:d3b0) Quit (Ping timeout: 480 seconds)
[0:01] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has left #ceph
[0:01] <paravoid> grr I can't reproduce
[0:04] * rturk-away is now known as rturk
[0:04] * DarkAce-Z (~BillyMays@ has joined #ceph
[0:04] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[0:06] <paravoid> sjustlaptop: and yet it is related
[0:06] <paravoid> I restart osd.2 like 10 times without seeing a slow peering issue
[0:06] <paravoid> I did the same with osd.3
[0:06] <paravoid> and it's happening already
[0:06] <paravoid> with the first restart
[0:06] <sjustlaptop> is osd.3 also running the patched version?
[0:07] <paravoid> yes
[0:07] <sjustlaptop> have logs?
[0:07] * bstillwell (~bryan@bokeoa.com) has joined #ceph
[0:07] <paravoid> it's also doing the active but not active+clean
[0:07] <paravoid> I did start it with filestore=5 osd=20, currently waiting for it to settle
[0:07] <sjustlaptop> paravoid: that just means there were writes in the mean time
[0:08] <paravoid> it didn't happen with osd.2
[0:08] <sjustlaptop> ok
[0:08] <paravoid> same amount of writes going on on average
[0:08] <paravoid> also, during this time of NNN active, I'm getting spammed by slow requests
[0:10] <bstillwell> When I run 'ceph --admin-daemon /run/ceph/ceph-mon.a.asok mon_status' it lists a mon that I want to remove because I messed up when I added it.
[0:10] <bstillwell> Any idea how to do that?
[0:11] * eegiks (~quassel@2a01:e35:8a2c:b230:c5f7:246c:9b1:8667) has joined #ceph
[0:12] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[0:12] * rongze (~zhu@ has joined #ceph
[0:13] <paravoid> sjustlaptop: slowpeer3-ceph-osd.3.log.bz2 slowpeer3-osd3-ceph.log
[0:13] <sjustlaptop> thanks
[0:13] <paravoid> just finished
[0:13] <paravoid> thank you
[0:15] <andrei> joao: i am trying to run gdb but it tells me that there are missing debuginfos
[0:15] <andrei> i've got the core file
[0:15] <andrei> not sure if it will help?
[0:16] <andrei> i've uploaded the core file in the same folder
[0:16] <sjustlaptop> paravoid: try restarting osd.3 again and confirm that it repeers quickly
[0:19] <paravoid> sjustlaptop: confirmed
[0:19] <paravoid> less than 5s
[0:19] <sjustlaptop> paravoid: ...that trashes a lot of perfectly good theories
[0:20] <sjustlaptop> when you restarted it just now, how did you do it?
[0:21] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:21] <sjustlaptop> in the most recent log, looks like it took about a minute to peer?
[0:21] <andrei> guys, what is a reasonable rbd cache size that I should use to maximise performance?
[0:21] <andrei> i've got 32gb on the osd servers
[0:21] <andrei> and 8 osds per server at the moment
[0:22] <andrei> upgradable to 16 osds per server
[0:22] <paravoid> sjustlaptop: start "start ceph-osd id=3"
[0:22] <sjustlaptop> how did you insert the command line arguments?
[0:22] <sjustlaptop> the debug options, that is
[0:23] <paravoid> oh I started it with no debug options now
[0:23] <sjustlaptop> right, but the time before?
[0:23] <paravoid> root@ms-be1001:~# /usr/bin/ceph-osd --cluster=ceph -i 3 --debug-ms=1 --debug-filestore=5 --debug-osd=20
[0:23] <sjustlaptop> how long was there between stopping the daemon and starting it?
[0:23] <sjustlaptop> how did you stop the daemon?
[0:23] <paravoid> the debug one with kill
[0:23] <paravoid> the non-debug one with upstart (stop ceph-osd id=3)
[0:24] <sjustlaptop> ok, is id=4 on the same machine?
[0:24] <paravoid> yes
[0:25] <sjustlaptop> run stop ceph-osd id=4; start ceph-osd id=4
[0:25] <paravoid> 0-11 on the same box
[0:25] <sjustlaptop> and confirm that peering takes forever
[0:25] <paravoid> 2013-06-06 22:03:52.659735 mon.0 66131 : [INF] osdmap e185067: 144 osds: 141 up, 141 in
[0:25] <paravoid> 2013-06-06 22:04:31.746624 mon.0 66166 : [INF] osdmap e185068: 144 osds: 140 up, 141 in
[0:25] <paravoid> 2013-06-06 22:04:33.193120 mon.0 66168 : [INF] osdmap e185069: 144 osds: 140 up, 141 in
[0:25] <paravoid> 2013-06-06 22:05:35.536126 mon.0 66223 : [INF] osdmap e185070: 144 osds: 141 up, 141 in
[0:25] <paravoid> re: time delta
[0:25] <paravoid> and second time is
[0:25] <paravoid> 2013-06-06 22:05:36.737439 mon.0 66225 : [INF] osdmap e185071: 144 osds: 141 up, 141 in
[0:25] <paravoid> 2013-06-06 22:09:03.464529 mon.0 66405 : [INF] osdmap e185072: 144 osds: 140 up, 141 in
[0:25] <paravoid> 2013-06-06 22:09:06.179071 mon.0 66408 : [INF] osdmap e185073: 144 osds: 140 up, 141 in
[0:25] <paravoid> 2013-06-06 22:16:27.956069 mon.0 66762 : [INF] osdmap e185074: 144 osds: 141 up, 141 in
[0:25] <sjustlaptop> much longer in the second case
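Editor's sketch: the time deltas being compared above can be read straight off the pasted osdmap lines. The snippet below is a minimal illustration, not a Ceph tool. It assumes each dip in the "up" count is a single OSD going down and coming back, and the names `down_windows` and `PATTERN` are made up for this example. It measures how long the monitor saw the OSD as down, which is not the same as the peering time.

```python
import re
from datetime import datetime

# The eight osdmap lines pasted by paravoid, copied verbatim from the log.
LINES = """\
2013-06-06 22:03:52.659735 mon.0 66131 : [INF] osdmap e185067: 144 osds: 141 up, 141 in
2013-06-06 22:04:31.746624 mon.0 66166 : [INF] osdmap e185068: 144 osds: 140 up, 141 in
2013-06-06 22:04:33.193120 mon.0 66168 : [INF] osdmap e185069: 144 osds: 140 up, 141 in
2013-06-06 22:05:35.536126 mon.0 66223 : [INF] osdmap e185070: 144 osds: 141 up, 141 in
2013-06-06 22:05:36.737439 mon.0 66225 : [INF] osdmap e185071: 144 osds: 141 up, 141 in
2013-06-06 22:09:03.464529 mon.0 66405 : [INF] osdmap e185072: 144 osds: 140 up, 141 in
2013-06-06 22:09:06.179071 mon.0 66408 : [INF] osdmap e185073: 144 osds: 140 up, 141 in
2013-06-06 22:16:27.956069 mon.0 66762 : [INF] osdmap e185074: 144 osds: 141 up, 141 in
""".splitlines()

# Hypothetical pattern for this log format: timestamp, epoch, total, up, in.
PATTERN = re.compile(r"^(\S+ \S+) .*osdmap e(\d+): (\d+) osds: (\d+) up, (\d+) in")


def down_windows(lines):
    """Return (went_down, came_back, seconds) for each dip in the 'up' count."""
    windows = []
    prev_up = None
    down_since = None
    for line in lines:
        m = PATTERN.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        up = int(m.group(4))
        if prev_up is not None:
            if up < prev_up and down_since is None:
                # First map in which the up count dropped.
                down_since = stamp
            elif up > prev_up and down_since is not None:
                # Up count recovered: close the window.
                windows.append((down_since, stamp, (stamp - down_since).total_seconds()))
                down_since = None
        prev_up = up
    return windows


for went_down, came_back, seconds in down_windows(LINES):
    print(f"{went_down.time()} -> {came_back.time()}: {seconds:.1f}s marked down")
```

On these lines the first window is about a minute and the second about seven and a half minutes, which matches the "much longer in the second case" remark.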
[0:26] * jahkeup (~jahkeup@ Quit (Ping timeout: 480 seconds)
[0:26] <paravoid> yes, since 01:16 < sjustlaptop> paravoid: try restarting osd.3 again and confirm that it repeers quickly
[0:26] <paravoid> :)
[0:26] <sjustlaptop> sorry?
[0:26] <paravoid> :16
[0:26] <paravoid> is when you told me
[0:27] <sjustlaptop> wait, just realized those are minutes
[0:27] <sjustlaptop> it was down for 5 minutes?
[0:27] <paravoid> I ran start immediately after you told me, which was 7 mins later than stop
[0:27] <sjustlaptop> oh
[0:27] <sjustlaptop> I assumed that all of the trials were starting from a clean cluster stop osd; start osd
[0:28] <paravoid> the osd.2 ones that I did before and peered quickly were like that though
[0:28] <paravoid> restart ceph-osd id=2 specifically
[0:28] <sjustlaptop> yeah
[0:28] <sjustlaptop> can you do restart ceph-osd id=4?
[0:28] <paravoid> sure
[0:28] <sjustlaptop> is osd.4 running right now?
[0:29] <sjustlaptop> that is, all which should be running are running and all pgs are active+clean?
[0:29] <paravoid> yes
[0:29] <sjustlaptop> ok, go for it
[0:30] <paravoid> peering took 40s
[0:30] <paravoid> still 232 active but not clean and slow req spews
[0:30] <sjustlaptop> ok, once it's clean do it again and confirm that it goes quickly (same command)
[0:31] <paravoid> 150 active...
[0:31] <paravoid> 2013-06-06 22:31:14.319519 osd.4 [WRN] 65 slow requests, 2 included below; oldest blocked for > 117.066943 secs
[0:31] <paravoid> etc.
[0:32] <sagewk> mikedawson: that sounds like the bug that .3 *fixed*.. the problem was in the prerm for the old version of the package that is run during upgrade. i'll add something to the release notes.
[0:32] <paravoid> hah
[0:32] <sagewk> mikedawson: you might confirm that going from 0.61.3 to the current cuttlefish branch (+2 patches) doesn't similarly restart
[0:32] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:32] <paravoid> now I didn't even *see* it go peering
[0:32] <sjustlaptop> you are sure the pid changed?
[0:32] <paravoid> less than 2 seconds
[0:32] <paravoid> I am
[0:32] <sjustlaptop> unbelievable
[0:33] <paravoid> it got to degraded
[0:33] <paravoid> 2013-06-06 22:32:13.103129 mon.0 [INF] pgmap v8358585: 16760 pgs: 16412 active+clean, 348 active+degraded; 44539 GB data, 137 TB used, 118 TB / 256 TB avail; 1292KB/s rd, 753KB/s wr, 238op/s; 5778564/826161386 degraded (0.699%)
[0:33] <paravoid> 2013-06-06 22:32:14.410519 mon.0 [INF] pgmap v8358586: 16760 pgs: 16614 active+clean, 146 active+degraded; 44539 GB data, 137 TB used, 118 TB / 256 TB avail; 3110KB/s rd, 3683KB/s wr, 766op/s; 2435645/826161671 degraded (0.295%); recovering 3 o/s, 252KB/s
[0:33] <paravoid> 2013-06-06 22:32:15.670089 mon.0 [INF] pgmap v8358587: 16760 pgs: 16760 active+clean; 44539 GB data, 137 TB used, 118 TB / 256 TB avail; 3606KB/s rd, 5888KB/s wr, 1197op/s; recovering 3 o/s, 248KB/s
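Editor's sketch: the pgmap lines above carry the PG state counts and the degraded-object ratio that the discussion relies on. The snippet below is a rough illustration of pulling those two pieces out of one such line. The helper names `pg_states` and `degraded_percent` are invented for this example, and the sample line is the first pgmap line above with the read/write rate fields trimmed for brevity.

```python
import re

# First pgmap line from the paste above, with the rd/wr/op rate fields trimmed.
PGMAP = (
    "pgmap v8358585: 16760 pgs: 16412 active+clean, 348 active+degraded; "
    "44539 GB data, 137 TB used, 118 TB / 256 TB avail; "
    "5778564/826161386 degraded (0.699%)"
)


def pg_states(line):
    """Map PG state name -> count, taken from the 'N pgs: ...' section."""
    section = re.search(r"\d+ pgs: ([^;]+);", line).group(1)
    states = {}
    for part in section.split(", "):
        count, state = part.split(" ", 1)
        states[state] = int(count)
    return states


def degraded_percent(line):
    """Recompute the degraded percentage from the 'X/Y degraded' counts."""
    m = re.search(r"(\d+)/(\d+) degraded", line)
    if m is None:
        # A fully clean pgmap line has no 'degraded' field at all.
        return 0.0
    return 100.0 * int(m.group(1)) / int(m.group(2))


states = pg_states(PGMAP)
print(states)
print(f"{degraded_percent(PGMAP):.3f}% degraded")
```

The state counts should sum to the total PG count reported at the start of the line, which is a cheap sanity check when eyeballing pastes like these.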
[0:33] <sjustlaptop> are you seeing slow requests?
[0:33] <paravoid> not at all
[0:33] <paravoid> it all finished in 2s
[0:33] <paravoid> slow req threshold is 30s, so
[0:33] <sjustlaptop> yeah
[0:33] <paravoid> told you!
[0:33] <sjustlaptop> well, I believe you now
[0:34] <sagewk> just catching up.. what's the scoop?
[0:34] <paravoid> that's ceph version 0.61.3-1-gd082cf9 (d082cf91b45604cefa277532a304e82a81abe384)
[0:34] <paravoid> sagewk: my slow peering issue which wasn't bobtail after all
[0:34] <paravoid> bobtail-specific
[0:34] * dcasier (~dcasier@ Quit (Ping timeout: 480 seconds)
[0:34] <sjustlaptop> sagewk: reliable slow peering after initial restart, very quick after second restart
[0:34] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[0:34] <paravoid> and it'll get worse after a while too
[0:34] <paravoid> so these were restarted yesterday
[0:35] <sagewk> i've seen slow osd starts, but i always assumed it was load_pgs and cold caches
[0:35] <paravoid> not so long ago
[0:35] <sjustlaptop> sagewk: patched version without activate() dirty_log on a single node
[0:35] <sjustlaptop> sagewk: not a slow restart, slow peering
[0:35] <sjustlaptop> or not a noticeably slow restart
[0:35] <bstillwell> Any idea why I would be getting this:
[0:35] <bstillwell> ceph-mon -i a --extract-monmap /tmp/monmap
[0:35] <bstillwell> [20199]: (33) Numerical argument out of domain
[0:36] <sjustlaptop> warmer cache on older osdmaps?
[0:36] <sjustlaptop> still 2s vs 40s?
[0:36] <paravoid> I've also seen it take many minutes
[0:36] <paravoid> one of the logs I posted was like that I think
[0:36] <sjustlaptop> more recently written leveldb entries for the logs?
[0:36] <paravoid> 5' or so
[0:36] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[0:37] <sagewk> oh.. what about the osdmap catchup? i guess that happens during load_pgs, tho, when peering_wq is flushed?
[0:37] <gregaf> bstillwell: pretty sure -i is the ID, ie 0,1,2 not a,b,c
[0:37] <mikedawson> sagewk: now I get it. I didn't see the problem going from 0.61.2 to the last wip I tried. So did it only affect people upgrading off of a certain wip?
[0:37] <sagewk> would be interesting to see stack trace profile
[0:37] <sjustlaptop> sagewk: in either case, the osd is only not running for a matter of seconds
[0:37] <sagewk> the fix was v0.61.2-30-g131dca7, so upgrading *from* anything prior to that would do it
[0:38] <bstillwell> gregaf: That's not what the example shows under "REMOVING MONITORS FROM AN UNHEALTHY CLUSTER" on this page:
[0:38] <bstillwell> http://ceph.com/docs/next/rados/operations/add-or-rm-mons/
[0:38] <sagewk> -i a is right...
[0:38] <gregaf> k, guess I'm misremembering then
[0:38] <gregaf> well, really just guessing from the error code
[0:41] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:41] <gregaf> bstillwell: can you add "--debug 20 --debug-mon 20" to that again and pastebin the output?
[0:43] <bstillwell> gregaf: Is that right?: too many arguments: [--debug,20]
[0:45] * mattbenjamin1 (~matt@aa2.linuxbox.com) has joined #ceph
[0:46] <gregaf> bstillwell: it should be fine; maybe stick it in front of the output file stuff, and if that doesn't work replace it with "--debug-none" (it's an oddity of our parser)
[0:47] <paravoid> sjustlaptop: it seems your fix that made it in .3 helped at least someone else :)
[0:48] * mattbenjamin (~matt@aa2.linuxbox.com) Quit (Ping timeout: 480 seconds)
[0:48] * BManojlovic (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:48] <sjustlaptop> paravoid: you are a mystery, I suspect that the problem is log fragmentation
[0:48] <paravoid> I'm always a mystery!
[0:48] <paravoid> :-)
[0:49] <paravoid> so, what does log fragmentation mean?
[0:49] <sjustlaptop> it means that we can't be quite so lazy in how we update the pg log during peering
[0:49] <sjustlaptop> I'll put together a patch today and tomorrow
[0:50] <sagewk> this isn't so different from what we wanted to do with log rewrites anyway, right?
[0:50] <sjustlaptop> sagewk: it is the log rewrites
[0:50] <sagewk> s/so different/at all/
[0:50] <sjustlaptop> I just didn't want to actually do it
[0:50] <sagewk> :) k
[0:50] <paravoid> haha
[0:50] <sjustlaptop> I can see the same effect actually in my own testing
[0:50] <paravoid> \o/
[0:50] <sjustlaptop> I write out 500k objects
[0:50] <paravoid> finally!
[0:50] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[0:50] * ChanServ sets mode +o elder
[0:50] <paravoid> I'm not crazy :)
[0:51] <sjustlaptop> and then 10 times mark an osd in and wait for 16 ios to complete
[0:51] <sjustlaptop> *256 ios
[0:51] <sjustlaptop> the first time, 14s
[0:51] <paravoid> does that also explain the slow req after peering is complete but before pgs are clean?
[0:51] <sjustlaptop> the second time, 16
[0:51] <bstillwell> gregaf: figured it out once I ran with -d
[0:51] <sjustlaptop> the third, 7
[0:51] <sjustlaptop> then 5, 4, 4, 4, 4
[0:51] <bstillwell> I was able to remove the bad mon too (mon.mon.b)
[0:52] <sjustlaptop> something about the state of leveldb immediately after the io stops seems to be much worse than after the first or second round of log rewrites
[0:52] <paravoid> what about the unclean pgs?
[0:52] <paravoid> same bug?
[0:52] <sjustlaptop> paravoid: don't know, probably unrelated
[0:52] <sjustlaptop> well
[0:53] <sjustlaptop> the fact that they are unclean is because we tell the monitor we are dying before shutting down
[0:53] <paravoid> no I mean
[0:53] <sjustlaptop> allowing IO to continue on the other peers while we come back up
[0:53] <paravoid> under some of these tests
[0:53] <sjustlaptop> so when we come up there is recovery to do
[0:53] * mattbenjamin1 (~matt@aa2.linuxbox.com) Quit (Ping timeout: 480 seconds)
[0:53] <paravoid> there were a lot of slow requests after peering was finished
[0:53] <sjustlaptop> right, the fact that there are slow requests is a problem
[0:54] <sjustlaptop> the fact that there are non-clean pgs is to be expected
[0:54] <paravoid> right
[0:54] <paravoid> I don't remember this with bobtail
[0:54] <sjustlaptop> well, in bobtail, the osd probably came back up before it was confirmed dead
[0:54] <sjustlaptop> so there would not have been any IO in the mean time
[0:54] * jasdeepH (~jasdeepH@50-0-250-146.dedicated.static.sonic.net) Quit (Quit: jasdeepH)
[0:54] <sjustlaptop> so that might have hidden the bug
[0:54] <sjustlaptop> or it might be a new bug
[0:55] <paravoid> I certainly did tests where I stopped, waited to be marked down and then started again
[0:55] <sjustlaptop> ah, then it's a new bug
[0:55] <paravoid> anyway, one bug at a time :)
[0:55] <sjustlaptop> right
[0:55] <paravoid> any ideas on what makes my setup so special?
[0:55] <sjustlaptop> not yet
[0:55] <paravoid> I mean, most people haven't observed this
[0:55] <paravoid> too many pgs per osd maybe?
[0:56] <sjustlaptop> that probably doesn't help, but you only have around 500/osd
[0:56] <paravoid> it is a bit inflated as it was sized before pg splitting was implemented
[0:56] <sjustlaptop> which shouldn't be too big a problem
[0:56] <gregaf> five times as much as anybody else could pretty easily be passing a threshold
[0:56] <paravoid> I have 16k pgs with 144 osds
[0:56] <gregaf> and you're a pretty eager upgrader ;)
[0:56] <sjustlaptop> gregaf: yeah, but I've seen larger ratios without trouble
[0:56] <sjustlaptop> up to 1k seems to be fairly ok
[0:57] <sjustlaptop> though it's possible others with larger ratios don't notice slower peering
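Editor's sketch: the PG-per-OSD ratio discussed above depends on the replica count, which the log does not state, and pools in the same cluster may use different sizes. The snippet below is only a back-of-the-envelope estimate under the assumption of one uniform replica count. The function name `pg_copies_per_osd` is invented for this example. The 16760 PG and 144 OSD figures come from the pgmap and osdmap lines pasted earlier in the log.

```python
def pg_copies_per_osd(pg_count, osd_count, replicas):
    """Average number of PG copies each OSD holds, assuming uniform placement."""
    return pg_count * replicas / osd_count


# 16760 PGs and 144 OSDs are taken from the log; the replica counts are guesses.
for replicas in (2, 3, 4):
    estimate = pg_copies_per_osd(16760, 144, replicas)
    print(f"replicas={replicas}: about {estimate:.0f} PG copies per OSD")
```

Running it for a few replica counts gives a feel for where this cluster sits relative to the "up to 1k seems to be fairly ok" remark.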
[0:57] <paravoid> I have a lot of small files/writes, which is also kinda special
[0:57] <paravoid> but I don't see how it could be related?
[0:57] <gregaf> yeah, I think Jim Schutt has different expectations in that area, and he's the only one with something in that range who conducts tests like this that I'm aware of
[0:58] <gregaf> but you probably are more aware of such things than I am
[0:58] <sjustlaptop> paravoid: anyway, I'll put together a patch set implementing smarter pg log dirtying
[0:58] <paravoid> okay
[0:58] <paravoid> I'm guessing not cuttlefish material
[0:59] <sjustlaptop> not sure
[0:59] <paravoid> this is pretty critical for me, so I guess I could live with 0.64
[0:59] <sjustlaptop> paravoid: if it's actually this, you could probably reduce the severity of the problem by running with smaller logs
[1:00] <paravoid> how would I do that?
[1:00] <sjustlaptop> osd_min_pg_log_entries
[1:00] <andrei> joao: are you online?
[1:00] <sjustlaptop> defaults to 3000
[1:00] <sjustlaptop> sagewk: how about more like 300?
[1:00] * jasdeepH (~jasdeepH@ has joined #ceph
[1:00] <paravoid> not in the config reference, heh
[1:01] * jasdeepH (~jasdeepH@ Quit ()
[1:01] <andrei> did you have much luck with determining the issue with mons from the logs that i've sent?
[1:01] <sagewk> yeah
[1:01] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[1:01] <sagewk> the tradeoff there is weaker dup op detection
[1:01] <sjustlaptop> paravoid: if this guess is correct, the work should be cut by 10x
[1:01] <sjustlaptop> sagewk: I'm not sure he's getting enough IO/pg for it to be a problem
[1:01] * loicd (~loic@magenta.dachary.org) Quit (Read error: Connection reset by peer)
[1:01] <sagewk> but worth testing, yeah
[1:01] * loicd (~loic@2a01:e35:2eba:db10:f59f:7722:b1c4:a589) has joined #ceph
[1:02] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[1:02] <sjustlaptop> paravoid: that one you would need to enable throughout the cluster
[1:02] <sagewk> and there are more efficient ways to address that, too (short per-object list of recent reqids, for instance)
[1:02] <sjustlaptop> true
[1:02] <sjustlaptop> paravoid: but you can do it via injectargs I think
[1:03] <paravoid> so, ceph osd tell \* injectargs '--osd-min-pg-log-entries 300'
[1:03] <paravoid> ?
[1:03] <sagewk> yeah
[1:03] <paravoid> and will it have effect immediately or should I wait a little first?
[1:03] <sagewk> and wait for some io so that the logs actually get trimmed
[1:04] <sjustlaptop> it'll take a little while for the effect to propagate
[1:04] <sjustlaptop> yeah
[1:04] <paravoid> right
[1:04] <paravoid> cool, thanks
[1:04] <sjustlaptop> yeah
[1:04] <sjustlaptop> if you can use the same procedure to test and post the observations to the bug, that would be awesome
[1:04] <paravoid> I will
[1:05] <paravoid> I'll wait a bit first, I don't have much I/O atm
[1:05] <sjustlaptop> yep
[1:05] <sjustlaptop> I'll be back in a few hours
[1:05] <paravoid> it's out of production, although I have a periodic sync with swift running
[1:05] * DarkAce-Z (~BillyMays@ Quit (Ping timeout: 480 seconds)
[1:05] <paravoid> thanks a lot
[1:06] <paravoid> I've had some really nasty bugs but on the plus side you guys are awesome :)
[1:07] * aliguori (~anthony@ Quit (Remote host closed the connection)
[1:15] * BillK (~BillK@124-169-216-2.dyn.iinet.net.au) has joined #ceph
[1:22] * DarkAce-Z (~BillyMays@ has joined #ceph
[1:23] <paravoid> so, I thought enough time passed and tried restarting osd.5
[1:23] <paravoid> peering took a while, but it started from "6 peering" and decreased
[1:23] <infernix> nhm, ping
[1:23] <paravoid> instead of a hundred pgs or whatever
[1:24] * loicd (~loic@2a01:e35:2eba:db10:f59f:7722:b1c4:a589) Quit (Quit: Leaving.)
[1:25] <sagewk> sjusthm: osd->store->read(coll_t(), log_oid, 0, 0, map.logbl);
[1:25] <sagewk> in build_scrub_map_chunk()
[1:25] <sagewk> sigh
[1:25] <paravoid> wasn't it sjustlaptop? :)
[1:25] <sagewk> sjustlaptop: ^
[1:26] <paravoid> :-)
[1:26] <andrei> is anyone still awake who could help me with recovering my ceph cluster? I think joao is dozed off
[1:26] <andrei> i am having issues with my monitors
[1:27] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:30] <nhm> infernix: here for a bit, what's up?
[1:33] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[1:34] * bstillwell (~bryan@bokeoa.com) Quit (Quit: leaving)
[1:40] * jjgalvez1 (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[1:42] * eegiks (~quassel@2a01:e35:8a2c:b230:c5f7:246c:9b1:8667) Quit (Remote host closed the connection)
[1:42] <infernix> nhm, hi. we're looking at an all-ssd array
[1:42] <infernix> 48 disks, 6 HBAs, each disk on its own channel (no sas expanders, all passive)
[1:42] <paravoid> sjustlaptop: 16s peering for the restart of osd.6
[1:42] <infernix> i was just wondering if you have done any benchmarking with pure ssd?
[1:43] <nhm> infernix: it's becoming a hot topic. :)
[1:43] <nhm> infernix: Not much yet to be honest.
[1:43] * eegiks (~quassel@2a01:e35:8a2c:b230:c963:a32b:eca6:2292) has joined #ceph
[1:43] <infernix> i have seen this array do 2 million read iops, 4k random. about 1.8 million write
[1:44] <infernix> up to 19gbyte/s exported out over infiniband
[1:44] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:44] <nhm> infernix: with HBAs and 48 SSDs in 1 system you are going to easily start hitting QPI limitations.
[1:44] <nhm> infernix: for throughput that is
[1:44] <infernix> it can do 19gbyte/s. but we don't need it for that throughput much
[1:44] <nhm> infernix: 19GB/s over IB? How many IB links?
[1:44] <infernix> 4x fdr
[1:44] <infernix> qdr can go up to 13gb
[1:45] <infernix> 4x3.2gbyte/s
[1:45] <nhm> infernix: yikes!
[1:45] <nhm> infernix: are you doing lots of processor affinity tuning?
[1:45] <infernix> yeah. i would really prefer to buy a couple and build with ceph, but i am thinking that is going to hold them back
[1:46] <infernix> i've seen it, not hands on yet; probably in a week or two i will have it with 24 ssds
[1:46] <nhm> infernix: Ceph definitely hasn't been tuned for a beast like that. :)
[1:46] <infernix> i will be able to play around with it so i can do some tests, but was mostly wondering if you had tried
[1:47] <nhm> infernix: nothing like that. I tried 10 slow SSDs in a slow 6 core AMD node once. :)
[1:47] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[1:47] <nhm> infernix: with 1 HBA and expanders I don't trust. :)
[1:47] <infernix> this will have 2 8cores
[1:48] <infernix> qpi of 8GT/s
[1:48] <nhm> infernix: I could potentially try 8 Intel 520s hooked up to a single HBA, but I only have bonded 10GbE.
[1:48] <infernix> well even over IB it maxes out at 1.5gb due to tcp/ip
[1:49] <mrjack> thanks, 0.61.3 is well stable compared to 0.61.2 ;)
[1:49] <infernix> so i guess it'll be held back until rdma or rsockets is in
[1:50] <gregaf> right now it's probably more CPU-bound than anything else on a setup like that, right nhm?
[1:50] <nhm> infernix: if you are using replication you can at least have separate front and back networks.
[1:51] <infernix> well if it can do ~2 million random IOPS at 4k
[1:51] <infernix> that's what
[1:51] <nhm> gregaf: Maybe assuming something else doesn't break. ;)
[1:51] <infernix> that's like 7GByte/s
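Editor's sketch: the "2 million random IOPS at 4k" to "like 7GByte/s" conversion above is plain arithmetic. The snippet below reproduces it under the assumption that "4k" means 4096-byte I/Os. The function name `iops_to_bytes_per_sec` is invented for this example.

```python
def iops_to_bytes_per_sec(iops, block_bytes):
    """Throughput implied by an IOPS figure at a fixed I/O size."""
    return iops * block_bytes


# 2 million IOPS is the figure quoted in the log; 4096 bytes per I/O is assumed.
rate = iops_to_bytes_per_sec(2_000_000, 4096)
print(f"{rate / 1e9:.2f} GB/s (decimal)")
print(f"{rate / 2**30:.2f} GiB/s (binary)")
```

The binary figure lands a little above 7.5, so "like 7GByte/s" is a fair round number either way.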
[1:51] <nhm> infernix: there's definitely areas where latency can be introduced. :)
[1:52] <nhm> infernix: I was just talking to Sam about that a couple days ago.
[1:52] <gregaf> an OSD daemon does something like…20k IOPS with an in-memory backing store?
[1:52] <infernix> right. so i'll play a bit with it and see what I get in-box
[1:52] <infernix> but although i'd love to, i think i'll have to resort to something other than ceph for the time being
[1:53] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[1:53] <infernix> gregaf, 20k?
[1:53] <infernix> what's holding it back?
[1:53] <gregaf> last I heard
[1:53] <gregaf> messages have to pass through a lot of queues and undergo a fair bit of processing
[1:54] <gregaf> I'm sure it can be improved, but those numbers aren't the sort of thing you can improve with work that's easy or cheap
[1:54] <infernix> true
[1:54] <gregaf> under any platform
[1:55] <gregaf> I mean, I see people talk about kernel filesystems having trouble with op rates under 100k/sec
[1:55] <gregaf> and those don't need to coordinate with anything except themselves ;)
[1:57] <infernix> i will see if i can free budget and find a smart guy to work on it
[1:57] <infernix> we do save a bunch of dollars with this new array
[1:57] <infernix> it being performance
[1:58] <infernix> or infiniband
[1:58] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[1:58] <infernix> it is still a bit too cost prohibitive to have 3 replicas though
[1:59] <infernix> so erasure coding would be a big help too
[2:00] <gregaf> yeah, loicd and sjusthm are laying some groundwork, but it'll be a while
[2:00] <infernix> i saw that
[2:01] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:04] * zindello (~zindello@ has joined #ceph
[2:06] * mschiff (~mschiff@port-91183.pppoe.wtnet.de) has joined #ceph
[2:10] * mschiff_ (~mschiff@2a02:2028:256:f531:290:f5ff:fec3:eac5) Quit (Ping timeout: 480 seconds)
[2:12] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[2:14] <zindello> Good morning everybody
[2:15] <zindello> I've just built a fresh ceph cluster, 3 nodes, running on CentOS. I've got all the mons in quorum
[2:15] <zindello> However
[2:15] <zindello> I've got a health error
[2:16] <zindello> HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean
[2:16] <zindello> This is a completely fresh cluster, I've not put anything into the cluster at all. This was built using the ceph-deploy tool cloned from the git repo
[2:17] * mschiff (~mschiff@port-91183.pppoe.wtnet.de) Quit (Remote host closed the connection)
[2:17] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[2:20] <zindello> Any suggestions? Bear in mind I'm a complete newbie here
[2:23] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has left #ceph
[2:23] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[2:25] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[2:30] <dmick> zindello: how many OSDs? all up? what's the pool size set to? did you mess with the crushmap?
[2:33] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[2:38] <Tamil> zindello: how long have you been seeing this?
[2:39] <zindello> Haven't touched the crush map. Completely new install. deployed using ceph-deploy so whatever the pool size that sets
[2:39] <zindello> 6 osds
[2:39] <zindello> 3 hosts
[2:39] <zindello> 3 mons
[2:39] <zindello> 2 osds per host
[2:39] <zindello> I just sent a little more detail to the email list
[2:43] <zindello> Tamil: Right from the get go. It's a completely fresh install, and I haven't put any objects into the cluster at all
[2:43] <dmick> are all the osds up and in? (ceph osd dump)
[2:43] <zindello> The end goal is to evaluate cephfs for use within the company
[2:43] <zindello> Yep
[2:44] <dmick> does ceph osd tree look like the topology you expect?
[2:44] <zindello> Actually
[2:44] <zindello> For seemingly no reason at all, 2 of the OSDs have just fallen out
[2:45] <zindello> 0 and 1
[2:45] <zindello> restarting the ceph service seems not to have any effect
[2:45] <zindello> But, should ceph-deploy be creating sections within /etc/ceph/ceph.conf for the osds?
[2:45] <dmick> the ceph-all service? and no
[2:46] <dmick> check the osd logs for clues as to why they died, perhaps
[2:47] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[2:52] <zindello> Hah
[2:52] <zindello> Turned out to be a time sync and mounting issue
[2:52] * sagelap (~sage@2600:1012:b028:a2a5:3954:7de6:f874:d301) has joined #ceph
[2:52] <sagelap> dmick: need to squash 521d4cc4e5ea0e16184b5164314e4918a2e0d6c8 before merge
[2:54] <dmick> only that? :)
[2:54] <sagelap> there are more patches you haven't pushed, right?
[2:55] <dmick> yyyyes.
[2:55] <sagelap> the get_str_vec patch doesn't look like what you were talking about earlier :)
[2:55] <dmick> let me push those now
[2:55] <sagelap> k
[2:55] <dmick> fixing up the previously-commented tests for pg_command and osd_command
[2:56] <zindello> Ok. If I'm looking at a cluster deployed with ceph-deploy, how can I see what "devices" the OSDs reference?
[2:57] <zindello> And, am I right in assuming that if a node needs to be rebooted for whatever reason, that ceph-osd should automatically mount the OSDs on the server?
[2:59] <sagelap> ceph-deploy disk list HOST
[2:59] <dmick> ceph-deploy disk list can help
[2:59] <sagelap> (iirc)
[3:00] <zindello> Cool :) And if I wanted one of my colleagues to look at the cluster using ceph deploy on another machine on the other side of the world. Do they just have to do a ceph-deploy gatherkeys first?
[3:00] <zindello> I appreciate the assistance for a newbie as well. I've been itching to try this since the Friday talk at LCA
[3:05] * rturk is now known as rturk-away
[3:06] <sjusthm> paravoid: 16s for an initial restart?
[3:06] <sjusthm> sagelap: whoops
[3:07] <dmick> zindello: they'd need the client.admin key
[3:08] <dmick> ceph-deploy gatherkeys is probably overkill, and I'm not sure what perms it needs (probably passwordless login and sudo)
[3:10] <dmick> (man I hate writing JSON in C++)
[3:11] <dmick> sagelap: one more test fixup
[3:11] <dmick> now I will attempt to compress commits
[3:12] <dmick> although frankly rebase across merge is frightening
[3:13] <zindello> dmick: Ok no worries :)
[3:14] * Tamil (~tamil@ Quit (Quit: Leaving.)
[3:15] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[3:19] <dmick> sagelap: test merge with master is clean-ish, but I think I just can't deal with reordering commits across the "merge in from master", right?...
[3:20] <dmick> I could always reset the last five, and undo today's master merge, and then clean up back to ee0913c2e6427093a0bf9dbe9cc536968be7667e I guess
[3:20] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[3:25] * eegiks (~quassel@2a01:e35:8a2c:b230:c963:a32b:eca6:2292) has left #ceph
[3:27] <zindello> How well does the cephfs fuse client compare to the kernel client?
[3:28] <zindello> We run mainly CentOS boxes here (Some fedora workstations, but servers are CentOS) so we're running some pretty old kernels
[3:29] <joshd> as long as you don't need posix file locking or 32-bit support the fuse client is fine
[3:30] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[3:31] <zindello> Mmm, everything here except dom0 on Xen (Damn citrix!) is all 64 bit
[3:31] <zindello> Cheers :)
[3:37] * sagelap (~sage@2600:1012:b028:a2a5:3954:7de6:f874:d301) Quit (Ping timeout: 480 seconds)
[3:38] * mrjack_ (mrjack@office.smart-weblications.net) has joined #ceph
[3:42] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[3:47] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[3:55] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) Quit (Ping timeout: 480 seconds)
[3:57] * yanzheng (~zhyan@ has joined #ceph
[4:05] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) has joined #ceph
[4:05] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[4:08] * yanlb (~bean@ has joined #ceph
[4:11] <yanlb> hi, I got a segfault when running the interactive ceph command. It crashed on the argument "-w\" (without quotes)
[4:11] <yanlb> (gdb) r
[4:11] <yanlb> Starting program: /usr/bin/ceph
[4:11] <yanlb> [Thread debugging using libthread_db enabled]
[4:11] <yanlb> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[4:11] <yanlb> [New Thread 0x7ffff5805700 (LWP 2405)]
[4:11] <yanlb> [New Thread 0x7ffff48b5700 (LWP 2406)]
[4:11] <yanlb> [New Thread 0x7ffff40b4700 (LWP 2407)]
[4:11] <yanlb> [New Thread 0x7ffff38b3700 (LWP 2408)]
[4:11] <yanlb> [New Thread 0x7ffff30b2700 (LWP 2409)]
[4:11] <yanlb> [New Thread 0x7ffff28b1700 (LWP 2410)]
[4:11] <yanlb> [New Thread 0x7ffff20b0700 (LWP 2411)]
[4:12] <yanlb> [New Thread 0x7ffff18af700 (LWP 2412)]
[4:12] <yanlb> [New Thread 0x7ffff17ae700 (LWP 2413)]
[4:12] <yanlb> ceph> -w\
[4:12] <yanlb> Program received signal SIGSEGV, Segmentation fault.
[4:12] <yanlb> 0x000000000047356c in run_command(CephToolCtx*, char const*) ()
[4:12] <yanlb> (gdb) bt
[4:12] <yanlb> #0 0x000000000047356c in run_command(CephToolCtx*, char const*) ()
[4:12] <yanlb> #1 0x0000000000473be2 in ceph_tool_do_cli(CephToolCtx*) ()
[4:12] <yanlb> #2 0x000000000046eccc in main ()
[4:13] <yanlb> ceph -v
[4:13] <yanlb> ceph version 0.63 (054e96cf79e960894ef7e33a4d13635d3ad2a1b9)
[4:13] <scuttlemonkey> yanlb: please use pastebin.com or similar
[4:14] <dmick> yanlb: that's not too surprising, although it would be nice if it didn't happen. the ceph cli is about to be very much changed
[4:17] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:17] <yanlb> I guess it fails on "\"
[4:18] <yanlb> and "-w\ " gives me "unrecognized command"
[4:22] <dmick> why are you trying to use -w\?
[4:22] <dmick> ceph -w is to watch the status of the cluster, but -w\ has no meaning
[4:22] <yanlb> "\'
[4:22] <yanlb> sorry
[4:22] <dmick> ?
[4:23] <yanlb> I made the mistake of appending "\" to -w
[4:23] <dmick> ok. I thought maybe you were trying to make it work.
[4:23] <dmick> also, I don't think options (like -w) work from the ceph> prompt
[4:24] <yanlb> yes
[4:24] * KindTwo (~KindOne@h69.17.131.174.dynamic.ip.windstream.net) has joined #ceph
[4:24] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:24] * KindTwo is now known as KindOne
[4:25] <yanlb> but I think the ceph command should not crash on wrong args
[4:26] <dmick> sure, you're right. In this particular case it's weird enough input that we probably won't fix in prior releases, and the ceph CLI is changing so much that the code is completely different there. But I agree in principle.
[4:26] <dmick> thanks for letting us know.
[4:28] <yanlb> my pleasure
[4:32] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[4:33] <nigwil> would it be reasonable for the ceph df [detail] to output the fields right adjusted?
[4:34] <dmick> nigwil: it certainly could
[4:35] <dmick> I guess we tend to have a sorta-convention that numerics are right-adjusted, text are left-
[4:36] <nigwil> I notice the parameter to TextTable are all LEFT for the numeric fields, RIGHT would make them easier to read in a list
[4:36] <dmick> yes, that's what I mean about easy
[4:37] <dmick> wouldn't hurt to file a Feature request
[4:37] <nigwil> ahh, yes right :-)
[4:37] <nigwil> I was going to hack it in and suggest a git pull :-)
[4:37] <nigwil> but I wondered as you were working on the CLI whether you might have already done it
[4:37] <dmick> nope, not that
[4:37] <nigwil> no worries, I can make the change
[4:38] <dmick> but that'll be pretty independent of everything else too so easy to merge wherever
[4:38] <nigwil> Linux 'df' seems to do an interesting trick whereby it sticks to a 3-digit field and adjusts the units so it is only ever outputting 3 digits (at least for size fields)
[4:39] <dmick> see si_t
[4:39] <dmick> and/or kb)t
[4:39] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[4:39] <dmick> er, kb_t
[4:40] <dmick> ...and prettybyte_t
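The right-adjustment and unit-scaling idea discussed above can be sketched as follows. This is only an illustration in Python, not Ceph's TextTable, si_t, kb_t, or prettybyte_t code; the names `scale` and `format_rows` are hypothetical.

```python
# Minimal sketch, assuming byte counts are scaled by powers of 1024.
# scale() and format_rows() are hypothetical helpers, not Ceph APIs.

def scale(n):
    """Return a short human-readable size such as '2K' or '10G'."""
    units = "BKMGTPE"
    value = float(n)
    i = 0
    while value >= 1024 and i < len(units) - 1:
        value /= 1024
        i += 1
    return f"{value:.0f}{units[i]}"


def format_rows(rows):
    """Left-adjust the text column and right-adjust the numeric column."""
    name_w = max(len(name) for name, _ in rows)
    size_w = max(len(scale(size)) for _, size in rows)
    return [f"{name:<{name_w}}  {scale(size):>{size_w}}" for name, size in rows]


if __name__ == "__main__":
    for line in format_rows([("data", 2048), ("metadata", 10 * 1024**3)]):
        print(line)
```

Right-adjusting the numeric column lines up the units, which is what makes a list of sizes easier to scan.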
[4:52] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[5:15] * yanlb (~bean@ Quit (Quit: Konversation terminated!)
[5:22] * haomaiwang (~haomaiwan@ has joined #ceph
[5:31] * zindello (~zindello@ Quit (Remote host closed the connection)
[5:47] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[5:56] * Vanony (~vovo@i59F799B3.versanet.de) has joined #ceph
[6:03] * Vanony_ (~vovo@i59F7A451.versanet.de) Quit (Ping timeout: 480 seconds)
[6:21] <Volture> dmick: Hi. are you here?
[6:22] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[6:23] <Volture> Hi all
[6:24] <Volture> I need a little assistance conditional
[6:24] <Volture> Could you tell
[6:24] <Volture> What is this error
[6:24] <Volture> 2013-06-07 08:19:47.898721 7f4bf1371700 0 -- :/2501 >> pipe(0x1271490 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
[6:24] <Volture> And how to deal with it
[6:24] <Volture> ?
[6:39] <lurbs> Inability to get to a monitor?
[6:42] <lurbs> Is the monitor working and reachable from wherever you're seeing that error?
[6:50] <Volture> lurbs: Thanks, I figured it out: the IP address in the config file was wrong
[6:52] <lurbs> That'd do it.
[6:53] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Ping timeout: 480 seconds)
[6:56] <yehuda_hm> elder / joao: can you update the title (0.61.3)?
[6:56] <yehuda_hm> or scuttlemonkey / nhm ^^^
[6:59] <Volture> lurbs: after changing the configuration file, does the monitor need to be destroyed and created again?
[7:01] <lurbs> Uh, I don't know enough about your setup to make a call there, and I've just started drinking for the evening. You might want to check with one of the devs (I'm not, just a user).
[7:02] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[7:04] * ScOut3R (~ScOut3R@54024172.dsl.pool.telekom.hu) has joined #ceph
[7:04] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Quit: o//)
[7:13] * ScOut3R (~ScOut3R@54024172.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[7:28] * capri (~capri@ has joined #ceph
[7:29] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:30] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:51] * BillK (~BillK@124-169-216-2.dyn.iinet.net.au) Quit (Quit: Leaving)
[7:54] * schlitzer|work (~schlitzer@ Quit (Remote host closed the connection)
[7:54] * schlitzer|work (~schlitzer@ has joined #ceph
[7:59] * Machske (~Bram@d5152D87C.static.telenet.be) Quit ()
[8:03] * wdk (~wdk@124-169-216-2.dyn.iinet.net.au) has joined #ceph
[8:06] * bergerx_ (~bekir@ has joined #ceph
[8:09] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:14] * tnt (~tnt@ has joined #ceph
[8:36] * rongze1 (~zhu@ has joined #ceph
[8:42] * rongze (~zhu@ Quit (Ping timeout: 480 seconds)
[8:42] * Vjarjadian (~IceChat77@ Quit (Quit: On the other hand, you have different fingers.)
[8:46] * Machske (~Bram@d5152D8A3.static.telenet.be) has joined #ceph
[8:48] * Machske (~Bram@d5152D8A3.static.telenet.be) Quit ()
[8:49] * Machske (~Bram@d5152D8A3.static.telenet.be) has joined #ceph
[9:02] * xan (~xan@b.clients.kiwiirc.com) has joined #ceph
[9:03] <xan> I'm reading MDS Locker now, wondering what's rdlocks, wrlocks and xlock? rd=read, wr=write and x=exclusive?
[9:06] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[9:06] * leseb (~Adium@ has joined #ceph
[9:09] * espeer (~espeer@105-236-231-118.access.mtnbusiness.co.za) has joined #ceph
[9:10] <espeer> Hi guys, I'm getting some strange hang-ups with ceph-fuse that I don't know how to further diagnose. Any ideas?
[9:11] <espeer> ceph -s reports healthy cluster
[9:11] <espeer> but the mount point blocks on all accesses
[9:11] <espeer> killing ceph-fuse and remounting brings everything back to life again
[9:12] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[9:21] * madkiss (~madkiss@tmo-103-169.customers.d1-online.com) has joined #ceph
[9:21] * madkiss (~madkiss@tmo-103-169.customers.d1-online.com) Quit (Remote host closed the connection)
[9:22] * mschiff (~mschiff@port-91183.pppoe.wtnet.de) has joined #ceph
[9:24] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:25] * espeer (~espeer@105-236-231-118.access.mtnbusiness.co.za) Quit (Read error: Connection reset by peer)
[9:25] * espeer (~espeer@105-236-231-118.access.mtnbusiness.co.za) has joined #ceph
[9:26] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:26] * ChanServ sets mode +v andreask
[9:30] * leseb (~Adium@ Quit (Quit: Leaving.)
[9:41] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:44] * The_Bishop__ (~bishop@e179013061.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[9:45] * madd (~m@workstation.sauer.ms) has joined #ceph
[9:46] <maximilian> do I need a third mon for a 2 node cluster? both server/nodes have (mon,mds,osd) 1GB for public, 10gb for cluster network...just want to replace drbd with ceph
[9:48] * ScOut3R (~ScOut3R@ has joined #ceph
[9:52] <ofu> maximilian: a 3rd mon is necessary for quorum. When you run into problems, one side has to consist of >50% of all mons
[9:52] <saaby> maximilian: yes, you need a third mon, if you want to be able to run on just one of your osd hosts
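The ">50% of all mons" rule above can be checked with a little arithmetic. A minimal sketch in Python, with hypothetical helper names (`quorum_size`, `has_quorum`); it models only the majority rule stated here, not the monitor election itself.

```python
# Minimal sketch of the strict-majority rule described above.
# quorum_size() and has_quorum() are hypothetical names.

def quorum_size(total_mons):
    """Smallest number of monitors that is more than 50% of the total."""
    return total_mons // 2 + 1


def has_quorum(total_mons, alive_mons):
    """True if the surviving monitors still form a strict majority."""
    return alive_mons >= quorum_size(total_mons)


if __name__ == "__main__":
    # With 2 mons, losing one host leaves 1 of 2, which is not a majority.
    print(has_quorum(2, 1))
    # With 3 mons, losing one host leaves 2 of 3, which is a majority.
    print(has_quorum(3, 2))
```

This is why a two-node cluster with one mon per node cannot keep running on a single node, while adding a third mon elsewhere allows it.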
[10:00] * leseb (~Adium@ has joined #ceph
[10:06] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:06] * loicd (~loic@magenta.dachary.org) has joined #ceph
[10:12] * dcasier (~dcasier@ has joined #ceph
[10:13] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[10:13] * leseb (~Adium@ Quit (Ping timeout: 480 seconds)
[10:13] * sha (~kvirc@ has joined #ceph
[10:14] <sha> hi everybody. look at this http://pastebin.com/vudh1Vsg
[10:15] <sha> any ideas? why is degradation frozen at 1.733%?
[10:19] * darkfaded (~floh@ Quit (Quit: leaving)
[10:20] * darkfader (~floh@ has joined #ceph
[10:23] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:25] * LeaChim (~LeaChim@ has joined #ceph
[10:25] <ofu> can I use wildcards in ceph.conf? as in [osd.0-11] host=ceph0 and osd.[12-23] host=ceph1 ?
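The log does not record an answer to the wildcard question. One workaround that does not depend on wildcard support is to generate the explicit per-OSD sections. A minimal sketch in Python, assuming the `[osd.N]` section and `host = ...` key layout used in the question; `osd_sections` is a hypothetical helper.

```python
# Minimal sketch: expand host/id ranges into explicit [osd.N] sections.
# osd_sections() is a hypothetical helper; the section and key names
# follow the layout quoted in the question above.

def osd_sections(host_ranges):
    """Return ceph.conf-style text with one [osd.N] section per OSD id."""
    lines = []
    for host, ids in host_ranges:
        for osd_id in ids:
            lines.append(f"[osd.{osd_id}]")
            lines.append(f"    host = {host}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(osd_sections([("ceph0", range(0, 12)), ("ceph1", range(12, 24))]))
```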
[10:31] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[10:32] * mabeki (~makiefer@2001:8d8:1fe:301:a2b3:ccff:fef7:21d5) has left #ceph
[10:34] <sha> no ideas
[10:34] * BManojlovic (~steki@fo-d- has joined #ceph
[10:44] * eschnou (~eschnou@ has joined #ceph
[10:54] * The_Bishop (~bishop@e177091030.adsl.alicedsl.de) has joined #ceph
[10:56] * yanzheng (~zhyan@ Quit (Quit: Leaving)
[10:58] * espeer (~espeer@105-236-231-118.access.mtnbusiness.co.za) Quit (Remote host closed the connection)
[10:58] * espeer (~espeer@105-236-231-118.access.mtnbusiness.co.za) has joined #ceph
[11:01] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[11:07] * BManojlovic (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[11:09] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[11:10] * leseb (~Adium@ has joined #ceph
[11:21] * capri (~capri@ Quit (Quit: Verlassend)
[11:33] <topro> recent discussion about memory leaks with processes growing over time, making frequent restarts of processes necessary, was it MON related or MDS related?
[11:33] <topro> ^^ cuttlefish
[11:37] <mikedawson> topro: MON
[11:38] <mikedawson> topro: actually the MON issue was that the leveldb store kept growing; versions 0.58 - 0.61.2 were affected. 0.61.3 seems better
[11:39] <topro> well with 0.61.2 I don't experience MON leaks, but I do experience heavy MDS leaks. does 0.61.3 address that too? I know CEPHFS is not stable yet, so no complaint
[11:44] <mikedawson> topro: i don't use cephfs. not sure
[11:45] <sha> when i try to add mon - failed: 'ulimit -n 8192; /usr/bin/ceph-mon -i d --pid-file /var/run/ceph/mon.d.pid -c /etc/ceph/ceph.conf '
[11:45] <sha>
[11:48] * haomaiwang (~haomaiwan@ Quit (Ping timeout: 480 seconds)
[11:49] <topro> mikedawson: np, thank you. anyone else? ;)
[11:54] <sha> any one have idea about ulimit?
[12:02] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[12:03] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[12:10] * maximilian (~maximilia@ Quit (Ping timeout: 480 seconds)
[12:11] * maximilian (~maximilia@ has joined #ceph
[12:13] <maximilian> sha: first check the public/cluster networks/addresses and hostname resolution. as far as I remember, I had this error once and it was related to hostname resolution
[12:16] * Machske (~Bram@d5152D8A3.static.telenet.be) Quit ()
[12:19] <sha> maximilian: it`s about ulimit?
[12:20] * ssejour (~sebastien@out-chantepie.fr.clara.net) has joined #ceph
[12:21] <ssejour> hello
[12:21] * ssejour (~sebastien@out-chantepie.fr.clara.net) has left #ceph
[12:22] * ssejour (~sebastien@out-chantepie.fr.clara.net) has joined #ceph
[12:23] <ssejour> hello
[12:23] <ssejour> I'm currently deploying a ceph cluster on ubuntu (Cuttlefish)
[12:24] <ssejour> I'm using ceph-deploy
[12:25] <ssejour> my problem is that I can't stop or start osd process with "/etc/init.d/ceph" nor "service ceph stop/start"
[12:25] <ssejour> nothing happens
[12:26] <ssejour> I'm wondering if it's because osd nodes are not specified in ceph.conf
[12:29] <ssejour> but as I understand I do not need to be specified in ceph.conf. do I?
[12:29] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) has joined #ceph
[12:29] <goodbytes> when using service ceph start you need to specify which daemon and instance you want to start, example:
[12:30] <goodbytes> service ceph start osd.0
[12:30] <goodbytes> or
[12:30] <goodbytes> service ceph-osd start id=0
[12:32] * xan (~xan@b.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[12:32] * xan (~xan@b.clients.kiwiirc.com) has joined #ceph
[12:33] * haomaiwang (~haomaiwan@ has joined #ceph
[12:35] * maximilian (~maximilia@ Quit (Remote host closed the connection)
[12:35] * maximilian (~maximilia@ has joined #ceph
[12:36] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[12:38] * leseb (~Adium@ Quit (Quit: Leaving.)
[12:52] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Ping timeout: 480 seconds)
[13:01] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[13:08] * leseb (~Adium@ has joined #ceph
[13:12] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[13:15] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:17] * leseb (~Adium@ Quit (Ping timeout: 480 seconds)
[13:20] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[13:22] * leseb (~Adium@ has joined #ceph
[13:26] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Read error: Operation timed out)
[13:27] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[13:29] * diegows (~diegows@ has joined #ceph
[13:34] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[13:39] * The_Bishop (~bishop@e177091030.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[13:45] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[13:47] * wdk (~wdk@124-169-216-2.dyn.iinet.net.au) Quit (Quit: Leaving)
[13:48] * wdk (~wdk@124-169-216-2.dyn.iinet.net.au) has joined #ceph
[13:48] * The_Bishop (~bishop@e177091030.adsl.alicedsl.de) has joined #ceph
[13:53] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[13:56] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[14:03] * haomaiwang (~haomaiwan@ Quit (Ping timeout: 480 seconds)
[14:05] <joelio> ahh, all back up and running. It's amazing what a good nights sleep and a fresh approach can do
[14:06] <joelio> thanks for listening to my exasperation yesterday - and for the tips :)
[14:06] * leseb (~Adium@ Quit (Quit: Leaving.)
[14:10] * leseb (~Adium@ has joined #ceph
[14:10] <schlitzer|work> hey, how can i increase the loglevel for radosgw?
[14:11] <schlitzer|work> it should log to /var/log/ceph/radosgw.log, but the log is empty :-/
[14:13] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[14:16] * xan (~xan@b.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[14:20] <andrei> hello guys
[14:20] <andrei> i am planning to setup a small production ceph cluster
[14:21] <andrei> made of two servers.
[14:21] <andrei> could you please let me know what is the recommended ceph version that I should install?
[14:21] <andrei> 0.61.3? is this out yet?
[14:21] <andrei> or 0.63?
[14:21] <andrei> or 0.62?
[14:22] <andrei> from what i can see there are a lot of different versions
[14:22] <andrei> joelio: how did your setup go?
[14:22] <andrei> did you manage to get your cluster in place?
[14:22] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[14:22] * ChanServ sets mode +v andreask
[14:29] <Gugge-47527> andrei: i would go for 0.61.x
[14:29] <Gugge-47527> the cuttlefish branch
[14:32] <jtang> hmm, machines have arrived!
[14:32] <jtang> will be setting up a two node ceph cluster!
[14:37] <joelio> andrei: got there in the end, your recommended method of doing it manually was win
[14:37] <joelio> thanks!
[14:40] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 22.0/20130528181031])
[14:55] * DarkAce-Z is now known as DarkAceZ
[14:57] * leseb (~Adium@ Quit (Quit: Leaving.)
[15:08] * leseb (~Adium@ has joined #ceph
[15:18] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:18] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[15:18] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[15:24] * espeer (~espeer@105-236-231-118.access.mtnbusiness.co.za) Quit (Ping timeout: 480 seconds)
[15:28] * PerlStalker (~PerlStalk@ has joined #ceph
[15:28] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[15:30] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[15:32] * The_Bishop (~bishop@e177091030.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[15:36] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[15:37] * JonTheNiceGuy (~oftc-webi@ has joined #ceph
[15:38] <ofu> i am trying to configure 2 osd on one host on centos 6.4, but the startup script only starts osd.0, is this a known bug?
[15:41] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) has joined #ceph
[15:41] * ChanServ sets mode +v leseb
[15:49] * The_Bishop (~bishop@e177091030.adsl.alicedsl.de) has joined #ceph
[15:53] <JonTheNiceGuy> Hi, I'm having issues with ceph-deploy where it's not honoring my proxy details in ceph-deploy install.
[15:53] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[15:54] <JonTheNiceGuy> I've tried running HTTP_PROXY= ceph-deploy install ceph1
[15:54] <JonTheNiceGuy> And export HTTP_PROXY && ceph-deploy install ceph1
[15:55] <JonTheNiceGuy> it looks like it's just when it downloads the cert key
[15:55] <JonTheNiceGuy> But I've already performed that step, as it's part of the preflight documents.
[15:59] * Vanony (~vovo@i59F799B3.versanet.de) Quit (Ping timeout: 480 seconds)
[16:01] <scuttlemonkey> yehuda_hm/yehudasa: I made the title to not need updating
[16:01] <scuttlemonkey> (re your comment last night)
[16:02] * baffle (~baffle@jump.stenstad.net) has joined #ceph
[16:02] <scuttlemonkey> Latest stable will automagically snag the highest point release (ceph.com/get), next section was meant to be current stable series, then announcements
[16:02] <scuttlemonkey> if we don't want the stable series in there we can rip that
[16:04] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[16:04] <baffle> Is there any work in progress for integrating a local ssd-backed bcache with the qemu-kvm/rbd implementation? So that you can have local SSD cache for qemu-kvm VMs that have their block devices on Ceph?
[16:05] <ssejour> goodbytes: thanks for your answer. but even if I do a service ceph -a start it doesn't work. (and I did a service ceph start osd.0 too)
[16:06] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[16:06] <baffle> And has anyone played around with using bcache or similar on the osd?
[16:07] <yehuda_hm> scuttlemonkey: the title though specifies v0.61 and not v0.61.3, so that people may not be aware that there's a new version
[16:08] <andrei> joelio: thanks, i've tried ceph-deploy several times and it didn't work for me ((
[16:08] <andrei> gugge-47527, thanks, that's what i will install
[16:08] <elder> I personally think the title is a bit too long. One link to a page with the latest info might be nice
[16:08] <andrei> tnt, are you around?
[16:08] <tnt> I am
[16:08] <Midnightmyth> What is the stored object size in ceph?
[16:08] <andrei> was wondering if you could share with me how you are managing the growing store.db partition?
[16:09] * scuttlemonkey changes topic to 'Latest stable (v0.61.3 "Cuttlefish) -- http://ceph.com/get || http://wiki.ceph.com Live! || "Geek on Duty" program -- http://goo.gl/f02Dt'
[16:12] <tnt> andrei: I don't have a growing store.db ... I run a patched version. you should update to 0.61.3
[16:13] <andrei> tnt: does 0.61.3 address the growing store.db issue?
[16:13] <tnt> yes
[16:14] <andrei> nice one!
[16:15] <andrei> i will be reinstalling my ceph cluster as it looks like too much trouble to restore my monitor
[16:15] <andrei> and since i've not had much data on it, i could live with losing the cluster and starting fresh
[16:15] <andrei> does anyone know what is the recommended rbd cache size that I should use for production?
[16:16] * haomaiwang (~haomaiwan@ has joined #ceph
[16:16] <andrei> i think the default 32MB is not much nowadays
[16:16] <andrei> scuttlemonkey: hi
[16:18] <andrei> sage: are you online?
[16:18] <andrei> i was hoping to pick your brain a bit on the ceph-deploy
[16:18] * espeer (~espeer@105-236-199-115.access.mtnbusiness.co.za) has joined #ceph
[16:18] <andrei> i am setting up a new ceph cluster and the last time i've used ceph-deploy I was unable to set up a cluster
[16:21] * espeer (~espeer@105-236-199-115.access.mtnbusiness.co.za) has left #ceph
[16:21] <goodbytes> ssejour, did you try "service ceph-osd start id=0" ?
[16:21] <redeemed> andrei, i finally have been able to setup a cluster using ceph-deploy; nothing fancy though. no journal drives, etc. just a bunch of OSDs in three machines.
[16:22] * portante (~user@ has joined #ceph
[16:25] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[16:26] <ssejour> goodbytes: thanks. a lot better... but now I have a "unable to open OSD superblock on /var/lib/ceph/osd/ceph-0" error, which is another problem ;)
[16:27] * JonTheNiceGuy (~oftc-webi@ Quit (Quit: Page closed)
[16:28] <ssejour> ok. ceph-deploy does not modify fstab to mount osd devices after a reboot?
[16:30] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) has joined #ceph
[16:31] <redeemed> it does not on my machines, ssejour
[16:32] <ssejour> ok. I'll do it by hand ;) . I didn't see this information in the documentation
[16:33] <redeemed> ssejour, whoah now. i believe they use "upstart" to do this; i can reboot my nodes and run "mount" to see what all is mounted and i see an entry for each OSD. ex: /dev/sdd1 on /var/lib/ceph/osd/ceph-2 type xfs (rw,noatime)
[16:34] <redeemed> ssejour, what version are you running?
[16:38] <ssejour> redeemed : 0.61.3
[16:40] <redeemed> ah, then in a perfect world yours would function as mine seems to be.
[16:43] <ssejour> redeemed : do you know where the device and mount options for each osd are configured?
[16:43] * haomaiwang (~haomaiwan@ Quit (Read error: Connection reset by peer)
[16:49] <redeemed> ssejour, unfortunately i am not certain. judging by the upstart script (/etc/init.d/ceph), the scripts in /etc/init/ceph-*, and the structure of /var/lib/ceph; i would say that is all completely dynamic, determined by the /var/lib/ceph/osd subtree and configuration files held within.
[16:49] <redeemed> but i do not know
[16:53] <redeemed> ssejour: http://ceph.com/docs/master/rados/configuration/osd-config-ref/ this url contains maybe some useful info regarding your question, for example "osd data" sub-section
[17:05] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[17:06] <ssejour> redeemed: but I need to mount /var/lib/ceph/osd/ceph-X before using it... strange...
[17:07] * athrift (~nz_monkey@ Quit (Ping timeout: 480 seconds)
[17:13] * leseb (~Adium@ Quit (Quit: Leaving.)
[17:13] <andrei> redeemed: how did you manage to do that?
[17:13] <andrei> i've followed the ceph-deploy howto and it didn't work for me
[17:14] <redeemed> ssejour / andrei, what's the paste service y'all use? i can share my little "script"
[17:14] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[17:15] * athrift (~nz_monkey@ has joined #ceph
[17:17] <janos> redeemed: i just use http://fpaste.org/
[17:17] <janos> not sure what others use
[17:18] * sagelap (~sage@2600:1012:b000:b778:3954:7de6:f874:d301) has joined #ceph
[17:19] * leseb (~Adium@ has joined #ceph
[17:20] <redeemed> ok, ty. andrei / ssejour, here's how i get ceph-deploy to work in my environment. http://ur1.ca/e87p6 please know that my ceph nodes are not virtual machines but rather physical nodes. that seems to have helped me with some ceph-deploy issues.
[17:20] <redeemed> i use ubuntu 12.04
[17:21] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[17:21] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[17:22] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:26] <ssejour> thanks redeemed
[17:32] <andrei> redeemed: thanks!
[17:35] <andrei> redeemed: did you have any issues with keys gathering?
[17:35] <andrei> coz this is what I had I think
[17:35] <andrei> i couldn't get the keys
[17:36] <redeemed> yes, it can take a while for the keys to be generated because, as i understand, the MONs have to reach quorum and likely one of the MONs is stalling.
[17:36] <andrei> redeemed, roughly, how long did it take in your case?
[17:37] <redeemed> so i generally reboot the stalled MON when that happens (checked by running: ps aux | grep ceph, and looking for "keyring")
[17:37] <sagelap> it is normally < 5 seconds
[17:37] <andrei> did you start with 3 mons?
[17:37] <andrei> or just one?
[17:37] <redeemed> generally two of my three MONs will not have a keyring program running, so ceph is waiting on the 3rd one to stop though it is stalled.
[17:38] <redeemed> sagelap, my comps are slower or something so it takes 20 seconds.
[17:38] <redeemed> roughly
[17:40] <andrei> redeemed: thanks I will be trying that now
[17:41] <andrei> by the way, do you know if ceph-deploy will automatically edit /etc/fstab to make sure that osds are automatically mounted at boot?
[17:41] <redeemed> i have not seen it do that. you can search the source if you want to see if it should in some context.
[17:42] <andrei> redeemed, so I should manually prepare the osds mountpoints and fstab entries?
[17:42] <redeemed> no, follow my "guide" and you should be set
[17:43] <andrei> thanks for the guide
[17:43] <redeemed> all using ceph-deploy, generate ceph config, install ceph on nodes, setup MONs, gather keys, zap disks, build/activate disks
[17:43] * jjgalvez1 (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[17:43] <andrei> do you know if ceph-deploy could be used for changing journal paths to ssd?
[17:44] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[17:44] <redeemed> you'd have to look over the commands
[17:44] <redeemed> dunno
[17:45] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[17:45] * leseb (~Adium@ Quit (Quit: Leaving.)
[17:52] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[17:54] <nwl> wido: sorry, my browser crashed. many thanks!
[17:55] <wido> nwl: No problem! You're welcome
[17:55] <nwl> wido: also, please email me your address - i'd like to send something over as a thanks for helping out with Synergy
[17:55] <andrei> redeemed: i think it would be very valuable if you could put this howto in the wiki
[17:55] <andrei> wido: hi
[17:56] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:56] <wido> andrei: hi
[17:56] <andrei> wido: do you know if there have been any improvements relating to rbd in cloudstack 4.1 compared with cloudstack 4.0.1?
[17:56] <redeemed> i'll look into it, andrei
[17:56] <andrei> i am still on CS 4.0.1 and planning to upgrade to 4.1
[17:57] <wido> andrei: Yes, it will use RBD snapshotting and cloning when deploying a template
[17:57] <andrei> is there anything I should watch out for?
[17:57] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Ping timeout: 480 seconds)
[17:57] * dxd828 (~dxd828@ Quit (Remote host closed the connection)
[17:57] <andrei> wido: really, in 4.1 i should be able to use rbd snapshotting?
[17:57] <wido> andrei: No, sorry, I was mistaken. It will be in 4.2
[17:58] <wido> No, nothing changed around RBD in 4.1
[17:58] <andrei> ah
[17:58] <andrei> you got me excited there )))
[17:58] <wido> Just got out of a call and confused about the features
[17:58] <wido> andrei: You might want to look at this blog: http://blog.widodh.nl/2013/06/a-quick-note-on-running-cloudstack-with-rbd-on-ubuntu-12-04/
[17:58] <wido> easier install with packages
[17:58] * wido has to go afk
[17:58] <andrei> wido: thanks
[17:59] * The_Bishop (~bishop@e177091030.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[18:03] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[18:03] <andrei> does it make a difference if I use a partition or a block device for the osd?
[18:04] <andrei> what is the preferred way?
[18:04] <andrei> a block device?
[18:05] <sagewk> andrei: a block device.. ceph-deploy will partition it for you with GPT
[18:05] <sagewk> and magic will ensue
[18:05] <Esmil> esmil@badu~/code/stupidterm$ ssh git@labitat.dk help
[18:05] <Esmil> Sorry
[18:06] * ssejour (~sebastien@out-chantepie.fr.clara.net) has left #ceph
[18:06] <andrei> sagewk: thanks
[18:06] <andrei> and what about journal on ssd?
[18:06] <andrei> should I create that myself?
[18:06] <andrei> as i plan to use multiple osd journals with one ssd disk
[18:06] <sagewk> ceph-deploy osd create HOST:DATADEV:JOURNALDEV.. e.g. myhost:sdb:sdi if sdi is your ssd
[18:07] <sagewk> it will by default carve out 10GB per osd for each journal
[18:07] * markl (~mark@tpsit.com) Quit (Ping timeout: 480 seconds)
[18:08] * wdk (~wdk@124-169-216-2.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[18:08] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[18:08] <andrei> sagewk: okay, so if I use the same journaldev for multiple osds ceph-deploy will not overwrite the previous osd journals?
[18:08] <andrei> is there a setting to tell ceph-deploy to use 20gb instead of 10gb per journal?
[18:09] * sagelap (~sage@2600:1012:b000:b778:3954:7de6:f874:d301) Quit (Ping timeout: 480 seconds)
[18:11] <sagewk> correct
[18:11] <sagewk> it will just add partitions
[18:11] <sagewk> --osd-journal-size, iirc
[18:12] <topro> hi, just had a bumpy ride upgrading my cephfs cluster from 0.61.2 to 0.61.3. all my clients refused to mount the fs using linux 3.8.5 kernel client
[18:12] * fmarchand (~fmarchand@a.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[18:12] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[18:12] <sagewk> 'osd journal size = sizeinmegabytes' in your ceph.conf
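Since the setting above is given in megabytes, a 20 GB journal needs a small conversion. A minimal sketch in Python; the helper names are hypothetical, and placing the key under an `[osd]` section is an assumption, since the log only says "in your ceph.conf". The capacity check simply multiplies journal size by the number of OSDs sharing one SSD.

```python
# Minimal sketch, assuming 1 GB = 1024 MB for the conversion.
# journal_size_mb(), conf_fragment() and journals_fit() are hypothetical
# helpers; the [osd] section placement is an assumption.

def journal_size_mb(size_gb):
    """Convert a journal size in GB to the megabyte value the setting expects."""
    return size_gb * 1024


def conf_fragment(size_gb):
    """Render a ceph.conf-style fragment carrying the journal size."""
    return f"[osd]\n    osd journal size = {journal_size_mb(size_gb)}\n"


def journals_fit(ssd_gb, num_osds, journal_gb):
    """Check that one journal partition per OSD fits on a shared SSD."""
    return num_osds * journal_gb <= ssd_gb


if __name__ == "__main__":
    print(conf_fragment(20))
    print(journals_fit(100, 4, 20))
```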
[18:13] <sagewk> topro: daemons restarted?
[18:13] <topro> seems like a "ceph osd crush tunables legacy" fixed it. does 0.61.3 change tunables of an existing cluster?
[18:15] * Wolff_John (~jwolff@ has joined #ceph
[18:15] <topro> as I would need linux 3.9 to use bobtail tunables I kept sticking with legacy values. never had a problem before but after upgrading ceph servers from 0.61.2 to 0.61.3 (debian wheezy, packages from ceph.com/debian-cuttlefish) this seems to have changed
[18:15] <sagewk> not between 0.61.2 and .3...
[18:16] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) has joined #ceph
[18:17] <topro> sagewk: well thats what I just experienced. after the problem arose and couldn't find a quick solution, I first shut down all clients, stopped all ceph daemons on all server nodes, waited for some time, restarted ceph on all server nodes and tried mounting a cephfs on one client, still failing...
[18:18] <topro> I then remembered the tunables and just gave it a try. after that all my clients could magically mount again :/
[18:18] * Jahkeup (~Jahkeup@ has joined #ceph
[18:18] <sagewk> let me double check the changelog
[18:19] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Remote host closed the connection)
[18:19] <sagewk> you're certain it was 0.61.2 before?
[18:20] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[18:20] <sagewk> can you tar up a mon directory so i can look at the osdmap history?
[18:21] <topro> sagewk: for sure, where can I put it?
[18:21] <topro> it was 0.61.2, 100% sure. or whatever I got with packages from "deb ceph.com/debian-cuttlefish wheezy main" called 0.61.2
[18:21] <sagewk> can you open a tracker issue too? thanks!
[18:26] <paravoid> sagewk: so, verdict is that I restarted when I shouldn't have?
[18:27] <paravoid> sagewk: apologies for making you dig through all that if that's the case
[18:27] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[18:27] <paravoid> at least a real bug was fixed/backported
[18:27] <sagewk> paravoid: no worries, it wasn't supposed to break if you did that ;)
[18:27] <sagewk> and joao found another issue in the process
[18:30] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[18:31] * ssejour (~sebastien@out-chantepie.fr.clara.net) has joined #ceph
[18:32] <ssejour> redeemed: can you use ceph-conf?
[18:33] <ssejour> I have no output from ceph-conf...
[18:37] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[18:44] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[18:44] * ssejour (~sebastien@out-chantepie.fr.clara.net) has left #ceph
[18:45] * rturk-away is now known as rturk
[18:48] * Tamil (~tamil@ has joined #ceph
[18:49] <sagewk> jamespage: ping!
[18:50] <sagewk> jamespage: i have an upstart question to pester you with, if you are available... :)
[18:50] <jamespage> sagewk, I am
[18:51] <sagewk> jamespage: http://tracker.ceph.com/issues/5248
[18:51] <sagewk> seems like ceph-all should start when all the net devices have been configured.. is there a way to do that?
[18:52] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[18:52] <jamespage> sagewk, hmm - funny you should ask about this - I've been tinkering with openvswitch bonding and struggling to do much the same thing
[18:53] <jamespage> sagewk, "start on runlevel [2345]" won't always work either
[18:54] <jamespage> sagewk, quoting from http://upstart.ubuntu.com/cookbook/
[18:54] <jamespage> sagewk, If you want your job to start when all network devices are active, specify:
[18:54] <jamespage> start on stopped networking
[18:54] * jamespage goes to try that
[18:55] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) has joined #ceph
[18:55] * The_Bishop (~bishop@2001:470:50b6:0:4ec:7ba:e554:e2ee) has joined #ceph
[18:56] <topro> sagewk: created a tracker issue, hope it contains all necessary information http://tracker.ceph.com/issues/5272
[18:57] <andrei> redeemed: did you have any Traceback errors when doing ceph-deploy
[18:57] <andrei> i am having a bunch of them
[18:57] <andrei> practically with every command from your guide
[18:57] <redeemed> ssejour, i will check after i redeploy our test environment. andrei, i get traceback errors when my environment is not set up properly.
[18:58] <redeemed> just read the traceback error top and bottom of the stack to figure out what is wrong.
[18:58] <andrei> redeemed: i am following your guide to the line
[19:00] <andrei> redeemed: like these: http://ur1.ca/e896u
[19:00] <andrei> it managed to zap majority of disks, on the same host
[19:01] <redeemed> andrei, you will have to read the traceback errors to figure out what you need to do on your nodes. ceph-deploy does not do everything to clean up the nodes. for example, if you deployed to the node then purged and then tried to deploy again, your OSD drives may still be mounted and causing issues.
[19:01] <andrei> but on some osds it gives me the traceback
[19:01] <paravoid> jamespage: oh hey :)
[19:01] <jamespage> hey paravoid
[19:01] <sjust> paravoid: did the smaller logs have an effect?
[19:02] <paravoid> did you see my message yesterday?
[19:02] <jamespage> sagewk, hmm - that does not work so well
[19:02] <sjust> the 14s one?
[19:02] <paravoid> yes
[19:02] <sjust> yeah, was that an improvement?
[19:02] <paravoid> let me try again now
[19:02] <jamespage> sagewk, leave it with me - I'll go pester my friendly upstart developer
[19:02] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[19:02] <jamespage> but that will be next week - he's not around right now
[19:02] <sagewk> jamespage: awesome, thanks!
[19:03] <paravoid> sjust: yes, much better
[19:03] <sjust> paravoid: ok, working on a real patch
[19:04] <paravoid> hm, not much
[19:04] <paravoid> just a bit better
[19:04] <sjust> but no more >1' peering sessions?
[19:04] <paravoid> now it took 40s
[19:04] <sjust> oh
[19:05] <paravoid> it's hard to say 100% since it hasn't always been a deterministic amount of time, but it looks better, yes
[19:05] <sjust> paravoid: hmm, it should be much better, I think we may be missing something
[19:06] <sjust> can you post a ceph pg dump?
[19:06] <sjust> curious about log bounds
[19:06] * loicd (~loic@2a01:e35:2eba:db10:f59f:7722:b1c4:a589) has joined #ceph
[19:06] <paravoid> 185110'82445 184302'4955
[19:07] <paravoid> is that what you're looking for?
[19:07] <paravoid> I can post all of it, I'm just trying to understand what you're looking for exactly to get a better understanding :)
[19:07] <sjust> drat, pg dump doesn't have the log bounds I guess
[19:08] <sjust> ok, can you pick 10 pgs at random and post ceph pg <pgid> query for those pgs?
[19:09] <paravoid> anything specific?
[19:09] <sjust> log_tail and last_update
[19:09] <sjust> under info
[19:10] <sjust> the log extends (log_tail, last_update]
[19:10] <paravoid> "last_update": "185110'29979",
[19:10] <paravoid> "log_tail": "181605'29678",
[19:10] <sjust> that's right
[19:10] <paravoid> that's not 300 now is it
[19:10] <sjust> 29979-29678?
[19:10] <paravoid> what's the two numbers?
[19:10] <sjust> 185110 is the epoch number
[19:10] <sjust> 29979
[19:10] <sjust> is the version number
[19:11] <paravoid> okay
[19:11] <sjust> in this case, you can ignore the epoch number, it's used primarily to detect divergent events (for recovery)
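
The values paravoid pasted have the form `epoch'version`, and sjust describes the log as the half-open range (log_tail, last_update]. A small sketch of how the number of log entries could be read off those two fields, ignoring the epoch as sjust suggests. The names `EVersion`, `parse_eversion` and `log_entry_count` are hypothetical helpers, not part of any Ceph tool:

```python
from typing import NamedTuple


class EVersion(NamedTuple):
    epoch: int
    version: int


def parse_eversion(text: str) -> EVersion:
    """Split a string such as "185110'29979" into (epoch, version)."""
    epoch, version = text.split("'")
    return EVersion(int(epoch), int(version))


def log_entry_count(log_tail: str, last_update: str) -> int:
    """Count entries in (log_tail, last_update], using version numbers only."""
    return parse_eversion(last_update).version - parse_eversion(log_tail).version


# Values taken from the 'ceph pg <pgid> query' output quoted above.
print(log_entry_count("181605'29678", "185110'29979"))
```
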
[19:11] * danieagle (~Daniel@ has joined #ceph
[19:12] <paravoid> 300 in all that I can see
[19:12] <sjust> ok
[19:12] <paravoid> checked a bunch of them
[19:13] <sjust> so the worst you've seen so far with shorter logs is 40s, before the worst you saw was >3'?
[19:13] * loicd very interested in this debug session
[19:14] <paravoid> hard to say
[19:14] <paravoid> I've seen 1-2' when disks failed before
[19:15] <paravoid> a disk fails -> osd dies -> slow requests -> radosgw stuck requests -> lvs outage -> nagios page us
[19:17] <paravoid> so, I've seen "just" 1-2' before, yes
[19:22] <paravoid> sjust: so, do you want me to collect more data somehow?
[19:23] <paravoid> sjust: also, all those tests happen still on that box with the patch you had me install; should I uninstall that or is it going to be part of the patch series that you're preparing?
[19:23] <sjust> paravoid: you probably want to go back to what is on the rest of the cluster for now
[19:23] <paravoid> 0.61.3, okay
[19:26] * loicd (~loic@2a01:e35:2eba:db10:f59f:7722:b1c4:a589) Quit (Quit: Leaving.)
[19:31] <paravoid> sjust: 17:29:36.807247 - 17:31:10.982680
[19:32] <paravoid> that's me restarting osd.0 just now
[19:32] <sjust> so 40s?
[19:32] <paravoid> no
[19:32] <sjust> oh
[19:32] <sjust> 1'40
[19:32] <paravoid> 1:34s
[19:32] <paravoid> yes
[19:33] <paravoid> definitely not 1:10 as you expected
[19:33] <paravoid> it didn't take 15' before
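
The peering time above is just the difference between the two pasted log timestamps. A short sketch of that subtraction, assuming both timestamps fall on the same day (no midnight rollover). The function name `elapsed_seconds` is made up for illustration:

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.ffffff timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# The osd.0 restart window pasted above.
seconds = elapsed_seconds("17:29:36.807247", "17:31:10.982680")
minutes, remainder = divmod(seconds, 60)
print(f"{int(minutes)}m{remainder:.3f}s")
```
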
[19:37] * tkensiski (~tkensiski@ has joined #ceph
[19:37] * tkensiski (~tkensiski@ has left #ceph
[19:37] <sjust> ok, let's try this
[19:37] <sjust> can you kill -9 an osd and restart it instead of sigterming it?
[19:38] <sjust> paravoid: ^
[19:38] <paravoid> okay
[19:38] <paravoid> let's do it with osd.8 this time
[19:39] <sjust> yeah
[19:41] <paravoid> 17:39:56.801927 - 17:41:38.060246
[19:42] <sjust> how long between when the osd went down and when it went up?
[19:42] <paravoid> almost immediately
[19:42] <paravoid> upstart did it automatically
[19:42] <sjust> ok, so probably not divergent objects then
[19:43] <paravoid> no, there's *zero* i/o on the cluster now
[19:43] <paravoid> apart from scrubs
[19:43] <paravoid> no external hits at all
[19:43] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[19:43] <paravoid> didn't even get a slow request, since there were no requests to begin with :)
[19:43] * redeemed_ (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[19:44] * mattbenjamin (~matt@aa2.linuxbox.com) Quit (Quit: Leaving.)
[19:45] * LeaChim (~LeaChim@ has joined #ceph
[19:50] <sjust> paravoid: greg has a cache theory
[19:50] <paravoid> oh?
[19:50] <sjust> paravoid: can you pick another victim osd, go into the current/omap/ directory, cat all of the files to /dev/null, and then restart the osd (just using upstart)
[19:50] * frank9999 (~frank@kantoor.transip.nl) Quit (Remote host closed the connection)
[19:50] <paravoid> right
[19:50] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[19:51] * frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[19:52] <paravoid> hmm
[19:52] <sjust> paravoid: it might take a while
[19:52] <paravoid> # time cat * > /dev/null
[19:52] <paravoid> real 0m38.888s
[19:52] <paravoid> user 0m0.056s
[19:52] <paravoid> sys 0m2.748s
[19:52] <paravoid> 4.0G .
[19:52] <paravoid> let's see peering now
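
The `cat * > /dev/null` step reads every file once so that its contents end up in the kernel page cache before the osd is restarted. A rough equivalent as a sketch, assuming a flat directory of regular files; the function name `read_all_files` is hypothetical and the directory path is whatever the osd's current/omap/ directory is on the node:

```python
import os


def read_all_files(directory: str, chunk_size: int = 1 << 20) -> int:
    """Read every regular file in `directory` and return the bytes read.

    The data is discarded; the point is only to make the kernel
    pull the file contents into its page cache.
    """
    total = 0
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and special files
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total
```

Calling `read_all_files(".")` from inside the omap directory would report roughly the same byte count that `du` showed (4.0G in the run above), if the directory holds only regular files.
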
[19:53] <gregaf> *crosses fingers*
[19:54] <paravoid> 17:53:23.839563 - 17:54:02.799897
[19:54] <paravoid> 40s
[19:54] <sjust> yeah
[19:54] <sjust> try again
[19:54] <sjust> I mean, just the restart
[19:54] <gregaf> so that's probably a no
[19:54] <paravoid> still a lot of them in active
[19:54] <gregaf> so much for me getting to laugh at Sam by getting it on my first try ;)
[19:55] <sjust> paravoid: once it goes clean
[19:55] <paravoid> yep
[19:55] <paravoid> 0s
[19:56] <paravoid> didn't even get to see peering
[19:56] <paravoid> stale -> degraded -> active+clean
[19:56] <paravoid> if it's caches, wouldn't it be dependent on the other replicas' caches for those pgs too?
[19:57] <paravoid> this might sound stupid :)
[19:57] <sjust> paravoid: sort of, but they are in memory
[19:58] <paravoid> okay
[19:58] <sjust> actually, one sec
[19:59] * redeemed (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Quit: bia)
[20:01] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[20:08] * redeemed_ (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Remote host closed the connection)
[20:08] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[20:09] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Remote host closed the connection)
[20:09] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[20:09] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Remote host closed the connection)
[20:09] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[20:13] * tkensiski (~tkensiski@ has joined #ceph
[20:13] * tkensiski (~tkensiski@ has left #ceph
[20:16] <sjust> paravoid: let's try something else, try restarting a victim with filestore_queue_max_ops = 1000000, if possible, put it in the ceph.conf and use the same upstart command you have been using
[20:17] <paravoid> just filestore_queue_max_ops = 1000000 under [osd], right?
[20:17] <sjust> filestore queue max ops = 1000000
[20:17] <sjust> under [osd]
[20:17] <paravoid> right
[20:18] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:18] <paravoid> ha
[20:18] <paravoid> that worked
[20:19] <paravoid> no visible peering at all
[20:19] <gregaf> okay, so it is an entirely local effect of some kind or other...
[20:19] <sjust> ok, that may prove that the issue is on the restarted osd, not the replicas
[20:19] <gregaf> but it's not just about leveldb and pglog caching
[20:19] <sjust> you should remove that setting and restart, it will make the filestore behave badly
[20:20] * AaronSchulz (~chatzilla@ has joined #ceph
[20:21] <paravoid> I figured as much!
[20:21] <paravoid> also very quick now, fwiw
[20:22] <gregaf> yeah, it's some kind of caching effect but we can't figure out *what*
[20:22] <sjust> try with filestore_op_threads = 10
[20:23] <paravoid> 18:23:16.292320 - 18:23:41.930746
[20:24] <paravoid> 25s
[20:29] <paravoid> so, are you sure that it wasn't what you theorized yesterday?
[20:29] <paravoid> logs et al?
[20:29] <elder> sagewk I'm going to slip my 5 patches in before the 3 btrfs ones from Josef when I commit. Let me know if you want me to do something different.
[20:29] <sagewk> elder: perfect
[20:30] <sagewk> best to keep those on top and not lose them in the pile :)
[20:32] <paravoid> sjust: anything else you can think of? I'll be gone in 10-15'
[20:36] <andrei> does anyone know if ceph-deploy would take values for the journal from the ceph.conf file or will it use its default value unless otherwise specified by the cli option?
[20:37] * dosaboy (~dosaboy@eth3.bismuth.canonical.com) Quit (Quit: leaving)
[20:38] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[20:40] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[20:42] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[20:48] * Tamil (~tamil@ Quit (Quit: Leaving.)
[20:51] <paravoid> andrei: I haven't played with ceph-deploy but I think it uses ceph-disk, which doesn't use ceph.conf's journal value
[20:51] <andrei> paravoid: okay thanks
[20:51] <andrei> i am having some issues with the ceph-disk command which is passed on from ceph-deploy
[20:52] <andrei> it just hangs without doing much for ages
[20:52] <andrei> not sure what's it doing
[20:52] * Tamil (~tamil@ has joined #ceph
[20:52] <paravoid> andrei: http://tracker.ceph.com/issues/4031
[20:52] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:54] <andrei> thanks
[20:59] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) has joined #ceph
[21:02] * rturk is now known as rturk-away
[21:19] * Tamil (~tamil@ Quit (Quit: Leaving.)
[21:24] * doubleg (~doubleg@ Quit (Quit: Lost terminal)
[21:27] * jfriedly (~jfriedly@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[21:27] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Remote host closed the connection)
[21:28] * doubleg (~doubleg@ has joined #ceph
[21:31] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Quit: Leaving.)
[21:31] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[21:32] * Meths_ (rift@ has joined #ceph
[21:35] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[21:35] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) has joined #ceph
[21:37] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[21:50] * Meths_ is now known as Meths
[21:52] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[22:04] * Tamil (~tamil@ has joined #ceph
[22:32] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[22:38] * Wolff_John (~jwolff@ Quit (Quit: ChatZilla 0.9.90 [Firefox 21.0/20130511120803])
[22:52] <andrei> i am trying to use ceph-deploy to install my cluster and at the moment I am having issues with adding osd
[22:52] <andrei> here is what i've done: ceph-deploy -v osd create arh-ibstorage1-ib:/dev/sdc:/dev/sda
[22:53] <andrei> the command doesn't finish, just keeps running without producing much output
[22:54] <andrei> any idea how to proceed further?
[22:56] <Tamil> andrei:whats -v for?
[22:56] <dmick> verbose
[22:56] <andrei> yeah
[22:57] <Tamil> dmick: ah ok
[22:57] <andrei> i have previously prepared the disk
[22:58] <andrei> i can see this process running: /usr/bin/python /usr/sbin/ceph-disk prepare -- /dev/sdc /dev/sda
[22:59] <Tamil> andrei: ceph-deploy disk list arh-ibstorage1-ib?
[23:01] <andrei> Tamil: it does show the list of disks including /dev/sdc
[23:01] <dmick> and /dev/sda?
[23:02] <andrei> and from what i can see it created the journal on /dev/sda
[23:02] <andrei> /dev/sda1 ceph journal
[23:02] <andrei> /dev/sdc other, unknown
[23:02] <Tamil> andrei: whats the state of the disks?
[23:02] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[23:02] <andrei> Tamil: what do you mean by the state?
[23:02] <andrei> the disks are plugged in and spinning )))
[23:03] <Tamil> andrei: does it show as "prepared" or "active" ?
[23:03] <andrei> how do i check?
[23:03] <Tamil> andrei: ceph-deploy disk list should say that
[23:03] <andrei> nope, it's showing it as other, unknown
[23:04] <Tamil> andrei: which distro is this?
[23:04] <andrei> ubuntu 12.04
[23:04] <andrei> latest ceph installed today
[23:04] <andrei> 0.61.3
[23:05] <Tamil> andrei: hmm interesting
[23:05] <andrei> i did run zap prior to running the create command
[23:06] <Tamil> andrei: looks like it is stuck during prepare
[23:06] <andrei> let me strace the process
[23:06] <Tamil> andrei: ok
[23:07] <andrei> it's stuck doing this:
[23:07] <andrei> link("/var/lib/ceph/tmp/arh-ibstorage1-ib.MainThread-8066", "/var/lib/ceph/tmp/ceph-disk.prepare.lock.lock") = -1 EEXIST (File exists)
[23:07] <andrei> stat("/var/lib/ceph/tmp/arh-ibstorage1-ib.MainThread-8066", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[23:07] <andrei> select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
[23:07] <andrei> just keeps showing these lines
[23:08] <Tamil> andrei: because you tried prepare followed by create?
[23:08] <andrei> is it trying to link "/var/lib/ceph/tmp/ceph-disk.prepare.lock.lock" to "/var/lib/ceph/tmp/arh-ibstorage1-ib.MainThread-8066", ?
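
On the direction of the call: `link(existing, new)` creates `new` as a second name for `existing`, so the strace shows the process trying to create the `.lock.lock` name as a hard link to its own per-process file. `EEXIST` means the lock name is already there, and the process sleeps 0.1 s and retries. The sketch below shows that general link-based locking pattern under stated assumptions: it is not ceph-disk's actual code, the function names are hypothetical, the `stat` check seen in the trace is omitted, and a timeout is added so the loop cannot spin forever the way the trace above does:

```python
import errno
import os
import time


def acquire_link_lock(unique_path: str, lock_path: str,
                      timeout: float = 5.0, poll_interval: float = 0.1) -> bool:
    """Try to take a lock by hard-linking a unique file to the lock name.

    Returns True once the link succeeds, False if `timeout` seconds
    pass while the lock name still exists.
    """
    # Create the per-process file that the lock name will point at.
    with open(unique_path, "w"):
        pass
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.link(unique_path, lock_path)  # atomic; fails if lock_path exists
            return True
        except OSError as exc:
            if exc.errno != errno.EEXIST:
                raise
        if time.monotonic() >= deadline:
            os.unlink(unique_path)  # give up and clean up our own file
            return False
        time.sleep(poll_interval)


def release_link_lock(unique_path: str, lock_path: str) -> None:
    """Drop the lock by removing both names."""
    os.unlink(lock_path)
    os.unlink(unique_path)
```

With this pattern, a lock name left behind by a process that was killed before releasing it makes every later attempt fail with `EEXIST` until the stale file is removed by hand.
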
[23:09] <andrei> Tamil, sorry, i've zapped the disk and gone with create right away
[23:09] <andrei> You may prepare OSDs, deploy them to the OSD node(s) and activate them in one step with the create command.
[23:09] <andrei> that's from the docs
[23:09] <Tamil> andrei: thought you mentioned you previously prepared the disk?
[23:10] <andrei> so, the create command, as far as I understand does both
[23:10] <Tamil> andrei: thats right, create does prepare+activate
[23:10] <andrei> nope, i've zapped the disk prior to running the create command
[23:10] <Tamil> andrei: ok got it
[23:10] * Oliver1 (~oliver1@p54838FEF.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[23:12] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) has joined #ceph
[23:14] <andrei> so, do you think this is a bug in ceph-deploy
[23:14] <andrei> ?
[23:15] <Tamil> andrei: could you please file a bug with logs
[23:16] <Tamil> andrei: how did you do an install? did you mention "--stable=cuttlefish"?
[23:17] <Tamil> andrei: how many nodes in your cluster?
[23:17] <andrei> nope, i've followed the instructions from here
[23:17] <andrei> <redeemed> ok, ty. andrei / ssejour, here's how i get ceph-deploy to work in my environment. http://ur1.ca/e87p6 please know that my ceph nodes are not virtual machines but rather physical nodes. that seems to have helped me with some ceph-deploy issues.
[23:17] <andrei> i've not mentioned --stable flag
[23:18] <andrei> it installed 0.61.3
[23:18] <Tamil> andrei: thats ok, it will pick cuttlefish by default
[23:19] <Tamil> andrei: how many nodes are you using?
[23:19] <andrei> 3 nodes
[23:19] <andrei> 2 ubuntu and 1 centos
[23:19] <andrei> osds are on ubuntu
[23:19] <andrei> centos is just for a 3rd mon
[23:19] <andrei> from the above link i went as far as creating the osds
[23:19] <andrei> i do have 3 mon processes running at the moment
[23:20] <andrei> ceph -s shows:
[23:20] <andrei> health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
[23:20] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[23:23] * jasdeepH (~jasdeepH@50-0-250-146.dedicated.static.sonic.net) has joined #ceph
[23:24] * Vjarjadian (~IceChat77@ has joined #ceph
[23:27] * portante (~user@ Quit (Ping timeout: 480 seconds)
[23:28] <Tamil> andrei: could you please kill the ceph-disk prepare process, delete /var/lib/ceph/tmp/ceph-disk.prepare.lock.lock file and retry osd create?
[23:28] <Tamil> andrei: passing --zap-disk option to osd create command
[23:29] <andrei> okay, will do in a sec
[23:29] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) has joined #ceph
[23:29] <Tamil> andrei: thanks
[23:30] <andrei> doing it now
[23:31] <andrei> how long does it usually take for this command to finish?
[23:31] <dmick> not long
[23:31] <andrei> the same thing happening with --zap-disk option
[23:32] <dmick> is there only one ceph-disk process?
[23:32] <andrei> i have also tried without the journal and it doesn't make a difference
[23:32] <dmick> that lock file should only be created by ceph-disk
[23:32] <dmick> and if it's there, then either one was interrupted at a really weird time or there's another
[23:33] <andrei> this is what i have running: http://ur1.ca/e8d2s
[23:35] <dmick> and 13220 is looping on the lock file again?
[23:35] <andrei> http://ur1.ca/e8d3q
[23:36] <andrei> this time it has done more in preparing the disk
[23:36] <andrei> let me check if it is looping
[23:36] <dmick> ok, by "same thing" you meant "it hasn't finished yet", not "the same symptom"
[23:37] <andrei> dmick: yeah, sorry
[23:37] <andrei> should have been more specific
[23:37] <andrei> it's not finished yet
[23:37] <andrei> however, it's not looping either
[23:37] <andrei> if i strace the process 13220 it shows:
[23:37] <andrei> strace -p 13220
[23:37] <andrei> Process 13220 attached - interrupt to quit
[23:37] <andrei> wait4(14315,
[23:37] <andrei> and just waits for something
[23:37] <andrei> whereas before it was looping on the lock file
[23:38] <dmick> what's 14315
[23:38] <andrei> root 14315 0.0 0.0 27364 1556 pts/6 S 22:34 0:00 partprobe /dev/disk/by-id/scsi-35000cca01a9acbac
[23:39] <dmick> one wonders why partprobe would hang
[23:39] <andrei> i Ctrl+C the strace on partprobe and it crashed ceph-deploy
[23:40] <dmick> is this disk healthy? throwing any errors in syslog?
[23:40] <andrei> partprobe was waiting for another process
[23:42] <andrei> here is what dmesg shows: http://ur1.ca/e8d6o
[23:43] <jfriedly> Hey guys. I'm having some trouble getting RADOS Gateway setup. I had it running yesterday, uploaded some files, and then we ran some destructive tests and we're trying to get radosgw running again so we can see if any of the files survived.
[23:43] <jfriedly> I'm running it with a bunch of logging options turned on, and I've got a gist of radosgw.log at https://gist.github.com/jfriedly/5732621
[23:43] <andrei> okay, i might know what's going on. I've got enhanceIO doing read caching on the osd drives
[23:43] <andrei> so perhaps it's causing the problem
[23:44] <andrei> i will remove it for the time being and do the create again
[23:44] <andrei> and see if it's caused by enhanceIO stuff
[23:44] <jfriedly> Basically radosgw renews its monmap for five minutes, then it dies with a log message, "-1 Initialization timeout, failed to initialize"
[23:44] * eschnou (~eschnou@220.177-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[23:45] * BillK (~BillK@124-169-216-2.dyn.iinet.net.au) has joined #ceph
[23:45] <jfriedly> Anyone have any ideas?
[23:45] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[23:46] <Tamil> andrei: sounds good
[23:48] <dmick> andrei: libceph? Were you trying to mount a cephfs before you had OSDs up?
[23:50] <andrei> dmick: i had another cluster on this server prior to this one
[23:50] <andrei> i've removed it
[23:50] <andrei> removed all the packages
[23:50] <andrei> cleaned the osds with dd from zero
[23:50] <andrei> i believe i had ceph-fuse mountpoint with the old cluster
[23:51] <andrei> but i think i've unmounted it prior to building this one
[23:51] <andrei> i will restart the arh-ibstorage1-ib server and try again
[23:51] <andrei> the enhanceIO removal didn't fix the problem
[23:56] * madkiss (~madkiss@p5099e2ec.dip0.t-ipconnect.de) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.