#ceph IRC Log


IRC Log for 2013-01-17

Timestamps are in GMT/BST.

[0:03] * al (d@fourrooms.bandsal.at) Quit (Remote host closed the connection)
[0:05] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[0:05] <dmick> jks: stat/unlink on FORREMOVAL seems like normal startup/backfill type stuff
[0:05] * al (d@niel.cx) has joined #ceph
[0:08] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[0:09] * sander (~chatzilla@c-174-62-162-253.hsd1.ct.comcast.net) Quit (Ping timeout: 480 seconds)
[0:14] <dmick> amichel: I ran crushtool -i amichel.map --test --rule 2 --output-csv on your compiled map
[0:14] * noob2 (~noob2@ext.cscinfo.com) has left #ceph
[0:15] <dmick> and the distribution of objects in rbd-device_utilization.csv looks reasonable to me
[0:15] <amichel> Ok, I just don't follow why the 7872 pgs are stuck unclean
[0:16] <amichel> They're listed as "active+remapped" also
[0:17] <dmick> first, you should look at http://ceph.com/docs/master/rados/operations/troubleshooting-osd/#stuck-placement-groups
[0:19] <amichel> I do not have any unfound objects
[0:19] <dmick> ceph osd dump is probably what I'd look at next
[0:20] * nwat (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[0:20] * Leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[0:21] <amichel> I'm sorry, I have to deal with some VMware shenanigans for a bit. I appreciate your help and I'll definitely be back
[0:22] <dmick> ok
[0:44] <amichel> Ok, sorry about that
[0:44] <amichel> bus sharing in VMware: Don't do it.
[0:45] <amichel> So I'm looking at this osd dump but I do not see anything jumping out at me as indicative
[0:48] * mattbenjamin (~matt@ has joined #ceph
[0:50] <dmick> amichel: pastebin?
[0:50] <amichel> http://dpaste.org/Ksj3D/
[0:51] * dosaboy (~gizmo@host86-164-221-235.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[1:03] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:06] * tnt (~tnt@120.194-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:07] <buck> I'm seeing an error when I try to run vstart.sh -d -n -l. It looks like vstart doesn't think user_xattr is set, but it is (underlying FS is ext4 and is the root drive). Has anyone seen this lately?
[1:07] <dmick> amichel: I agree there's nothing really smoking there, other than that everything is remapped atm and I think the OSDs are out of date
[1:07] <buck> It's got to be a configuration issue as another, similar host runs vstart just fine
[1:08] <dmick> buck: does vstart even care about user_xattr?
[1:08] <buck> osd.0.log has this: Extended attributes don't appear to work. Got error (95) Operation not supported. If you are using ext3 or ext4, be sure to mount the underlying file system with the 'user_xattr' option.
[1:08] <buck> 2013-01-16 16:01:48.724497 7fbd0cd1b780 -1 OSD::mkfs: couldn't mount FileStore: error -95
[1:08] <buck> 2013-01-16 16:01:48.724554 7fbd0cd1b780 -1 ** ERROR: error creating empty object store in dev/osd0: (95) Operation not supported
[1:10] <buck> dmick: I think I found something in the log (and not xattr related). Looking into it now
[1:10] <dmick> so it's complaining about the mount of your source dir, presumably, which is probably root
[1:22] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[1:27] <dmick> amichel: could you also share ceph pg dump
[1:28] * jlogan (~Thunderbi@2600:c00:3010:1:a9fc:bead:751e:61d9) Quit (Ping timeout: 480 seconds)
[1:31] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:40] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[1:40] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[1:41] <dmick> amichel: yt?
[1:47] * jlogan (~Thunderbi@ has joined #ceph
[1:49] <dmick> amichel: others with more crush experience have posited that indeed the problem is one host
[1:49] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:49] <dmick> chooseleaf firstn 0 type host means "first choose nrep-0 hosts, then choose a leaf (device/osd) from each"
[1:49] * LeaChim (~LeaChim@b0fadd12.bb.sky.com) Quit (Ping timeout: 480 seconds)
[1:49] <dmick> and since there's only one host, you can only ever choose one osd
[1:50] <dmick> which isn't enough, so all your pgs are unclean
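A rough illustration of dmick's explanation above, as a toy model only. This is not the real CRUSH algorithm; the function name, host names, and OSD ids are hypothetical, and random selection stands in for CRUSH's hashing. It only shows why picking distinct hosts first caps the result at one OSD when the map has a single host.

```python
import random

def chooseleaf_by_bucket(tree, num_rep, seed=0):
    """Toy stand-in for 'chooseleaf firstn 0 type host'.

    Picks up to num_rep distinct buckets (hosts), then one leaf (OSD)
    from each chosen bucket. Hypothetical helper, not CRUSH itself.
    """
    rng = random.Random(seed)
    buckets = sorted(tree)
    # We can never pick more distinct buckets than exist.
    chosen = rng.sample(buckets, min(num_rep, len(buckets)))
    return [rng.choice(tree[b]) for b in chosen]

# Hypothetical layouts: four OSDs on one host vs. split across two hosts.
one_host = {"host0": [0, 1, 2, 3]}
two_hosts = {"host0": [0, 1], "host1": [2, 3]}

print(len(chooseleaf_by_bucket(one_host, 2)))   # 1
print(len(chooseleaf_by_bucket(two_hosts, 2)))  # 2
```

With two replicas requested, the single-host layout can only ever return one OSD, which matches the "not enough, so all pgs are unclean" conclusion.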
[1:51] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[1:52] * dpippenger1 (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[1:57] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:00] <dpippenger1> probably a silly question, but how do I set the osd_max_backfills? Is it a ceph.conf directive under [osd]? Or a runtime setting on the pool?
[2:01] <dpippenger1> and is the config actually osd_max_backfills? or "osd max backfills"
[2:01] <dpippenger1> I can't find any clear docs on it
[2:01] * Leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[2:02] <dmick> config options in general are spelled with spaces in the conf file, and underscores in the variable name
[2:02] * Leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[2:02] <dmick> so they tend to be spelled both ways casually
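The two spellings dmick describes above can be converted mechanically. A minimal sketch, assuming only what is said in the channel (spaces in the conf file, underscores in the variable name); the helper names are hypothetical and are not part of Ceph.

```python
def to_variable_name(option: str) -> str:
    """Return the underscore spelling, e.g. for config_opts.h lookups."""
    return "_".join(option.replace("_", " ").split())

def to_conf_spelling(option: str) -> str:
    """Return the space-separated spelling used casually in ceph.conf."""
    return " ".join(option.replace("_", " ").split())

print(to_variable_name("osd max backfills"))  # osd_max_backfills
print(to_conf_spelling("osd_max_backfills"))  # osd max backfills
```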
[2:04] <dmick> searching on ceph.com/docs is always good
[2:04] <dmick> you can also verify in common/config_opts.h
[2:04] <dpippenger1> I've been doing that
[2:04] <dmick> in this case:
[2:04] <dmick> // Maximum number of backfills to or from a single osd
[2:04] <dmick> OPTION(osd_max_backfills, OPT_U64, 10)
[2:04] <dpippenger1> the docs don't specify the directive exactly
[2:04] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[2:04] <dpippenger1> yeah, that's the new code with 10
[2:04] <dpippenger1> I'm on the old code with 5
[2:04] <dpippenger1> but I'm still not sure how to actually change it
[2:05] <dmick> I just mean that its name is "osd_max_backfills"
[2:05] <dpippenger1> how do I send the option?
[2:05] <dmick> so you would put an entry "osd max backfills" in the config file
[2:05] <dmick> or you could inject it into a live daemon
[2:05] <dpippenger1> tried that... it seems to still be running it at 5
[2:05] <dmick> http://ceph.com/docs/master/rados/configuration/ceph-conf/#runtime-changes
[2:05] <dmick> ok; what did you try specifically?
[2:06] <dpippenger1> I tried putting "osd max backfills = 100" in my ceph.conf and restarting the osd
[2:06] <dmick> where in the ceph.conf
[2:06] <dpippenger1> under the [osd] section
[2:07] <dmick> I agree that ought to have worked
[2:07] <dpippenger1> I'll keep playing with it, thanks for your help
[2:07] <dpippenger1> the runtime changes doc helps a bit
[2:07] <dmick> silly things:
[2:07] * Cube (~Cube@ Quit (Ping timeout: 480 seconds)
[2:07] <dmick> 1) the ceph.conf the daemon was actually using (args to the daemon, if not default, on the right host?)
[2:08] <dpippenger1> yeah I changed the ceph.conf on the actual system running the osd if that's what you mean
[2:09] <dmick> 2) you can verify the setting with the admin socket and "ceph --admin-daemon <asokpath> config show"
[2:09] <dpippenger1> although I'm not sure if the backfill parameter is set by the osd getting the backfill or the ones sending the backfill
[2:09] <dpippenger1> the page here seemed to imply backfills were controlled by the receiver
[2:09] <dpippenger1> http://ceph.com/dev-notes/whats-new-in-the-land-of-osd/
[2:10] <dpippenger1> but let me set it on every osd and restart the whole cluster
[2:10] <dpippenger1> I hate feeling like I'm guessing about things
[2:10] <dmick> yeah, I'm not sure about that, but do keep in mind that the pgs are scattered around, so it's only going to affect PGs on this OSD
[2:12] <dpippenger1> it's a new osd I'm trying to backfill
[2:13] <dpippenger1> I think adding that directive elsewhere along with adding "osd recovery max active = 100" sorted it
[2:13] <dpippenger1> I see 100 running now
[2:13] <dpippenger1> I guess the receiver was allowing 100, but the senders had a cap on max active so they weren't sending enough
[2:14] <dmick> ah, yeah, makes sense; that default was 5 too
[2:14] <dpippenger1> thanks again for your help
[2:14] <dmick> sure
[2:30] <jmlowe> sjust: Is there anything else I can do to help with 3810, more logs, access to the cluster, etc?
[2:30] * zK4k7g (~zK4k7g@digilicious.com) Quit (Quit: Leaving.)
[2:31] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[2:34] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:40] <amichel> dmick: Sorry, my boss walked in, I had to go explain Ceph to her, which was pretty comical cause I don't understand it myself yet :D
[2:40] <amichel> dmick: here's the ceph pg dump: http://dpaste.org/mgggP/
[2:41] <dmick> yeah. later messages show that I don't understand CRUSH well enough :) but I know more now, and that's your issue
[2:41] <amichel> So if I modify the default data rule to choose a lower leaf first, like device or somesuch, then they'll clean themselves up?
[2:42] <amichel> I mean, all the default rules
[2:42] <amichel> Not just data
[2:42] <dmick> uhh
[2:43] <amichel> Oh man, what did I say :D
[2:43] <dmick> I think so probably, like, hba instead of host
[2:43] <amichel> ie: step chooseleaf firstn 0 type hba
[2:43] <amichel> right
[2:43] <amichel> There's no production data on this yet, so I can try anything
[2:43] <dmick> I can't help but believe that crushtool --test should help us here
[2:44] <jmlowe> dmick: if I understood crush well enough I'd write a simulator
[2:45] <dmick> well that's sorta what --test is
[2:45] <dmick> I just don't quite grok it yet
[2:45] <jmlowe> I didn't know it existed until about 30 seconds ago
[2:46] <dmick> (partly because its manpage doesn't even mention --test:
[2:46] <dmick> http://tracker.newdream.net/issues/3827)
[2:46] * rturk is now known as rturk-away
[2:47] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[2:48] <amichel> I set the new crushmap and the unclean pages are cleaning up
[2:48] <dmick> kewl
[2:49] <amichel> Pretty fast too
[2:50] <dmick> pg == placement group, btw
[2:50] <amichel> oh right
[2:50] <amichel> I even knew that, I still said pages
[2:50] <amichel> well, typed
[2:50] <dmick> brains. what are you gonna do.
[2:50] <amichel> Drink, mostly
[2:51] <dmick> ;)
[2:51] <amichel> 1100 left unclean, kickin' along
[2:51] <jmlowe> dmick: since I have you, how much do you know about the internals of file store?
[2:52] <dmick> not a lot off the top of my head, but I know the code a little
[2:52] <sjustlaptop> jmlowe: sorry for the delay, paying attention now
[2:52] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:53] <jmlowe> sjustlaptop: no problem, this bug just seems like a showstopper for bobtail or whatever the next stable release is
[2:53] <sjustlaptop> yeah, I'm looking now
[2:54] <jmlowe> I can get you on the cluster if that would help
[2:54] * alram (~alram@ Quit (Quit: leaving)
[2:54] <dmick> amichel: so --test does help
[2:54] <dmick> if you run it with --output-csv
[2:54] <amichel> Does it?
[2:54] <dmick> you get a *placement*csv file
[2:54] <dmick> that maps 'object id' to 'output'
[2:54] <dmick> with your original map, that's a file of two numbers per line (i.e. one output)
[2:55] <dmick> with the 'type hba', each line has three numbers
[2:55] <dmick> (--test basically runs a simulation)
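One way to check the placement file dmick describes above is to count how many outputs each line carries. This is a sketch under assumptions: it treats the file as plain comma-separated lines of "object id, output, output, ..." with no header row, which is inferred from the description in the channel rather than from crushtool documentation, and the sample data is made up.

```python
import csv
import io
from collections import Counter

def outputs_per_line(csv_text):
    """Count lines by number of outputs (fields after the object id)."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        fields = [f for f in row if f.strip()]
        if not fields:
            continue  # skip blank lines
        counts[len(fields) - 1] += 1
    return counts

# Hypothetical samples: one output per object vs. two outputs per object.
single_output = "0,3\n1,5\n"
double_output = "0,3,7\n1,5,2\n"

print(dict(outputs_per_line(single_output)))  # {1: 2}
print(dict(outputs_per_line(double_output)))  # {2: 2}
```

If every line shows fewer outputs than the pool's replica count, the rule cannot place all replicas.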
[2:55] <dmick> jmlowe: you now have the FileStore expert :)
[2:57] <jmlowe> I was wondering about doing some surgery to effect a manual repair. I think the secondary has good copies of the objects; what would happen if I just manually copied them over to the primary?
[2:59] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[3:00] <jmlowe> sjustlaptop: sorry about the size of the logs, I turned debugging all the way up, created a new cluster, copied about 1.2TB of data
[3:01] <sjustlaptop> not at all, I have ways of dealing with big logs!
[3:03] <dmick> http://www.youtube.com/watch?v=XZQL22xOmUM
[3:04] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[3:10] <jmlowe> ffs, I've been listening to that song on the radio for decades and didn't know the name until tonight
[3:10] <dmick> hah
[3:10] <jmlowe> snow up hill both ways no shazam
[3:11] <dmick> no idea what it has to do with the song, or even what the song means, but it's nice 80s guitar
[3:17] <sjustlaptop> jmlowe: can I get a dump of the xattrs on the d0c18e1d/605.00000000/head//1 object in pg 1.1d on osd 7?
[3:17] <sjustlaptop> and the corresponding object on osd 0?
[3:17] <sjustlaptop> something is very odd
[3:25] <jmlowe> sjustlaptop: you are looking for those via rados or from the filesystem?
[3:25] <sjustlaptop> from the filesystem
[3:25] <sjustlaptop> current/1.1d_head/
[3:26] <sjustlaptop> you can put them on the bug
[3:26] <jmlowe> /data/osd.7/current/1.1d_head# ls -la
[3:26] <jmlowe> total 0
[3:26] <jmlowe> drwxr-xr-x 1 root root 0 Jan 13 15:09 .
[3:26] <jmlowe> drwxr-xr-x 1 root root 5916 Jan 13 15:09 ..
[3:27] <jmlowe> wait you mean from the directory not a file in there?
[3:27] <sjustlaptop> no, should have been a file in there
[3:27] <sjustlaptop> that's odd
[3:28] <jmlowe> /data/osd.0/current/1.1d_head# ls -la
[3:28] <jmlowe> total 0
[3:28] <jmlowe> drwxr-xr-x 1 root root 0 Jan 13 15:09 .
[3:28] <jmlowe> drwxr-xr-x 1 root root 12270 Jan 15 20:12 ..
[3:28] <jmlowe> neither one has any files
[3:29] <sjustlaptop> hmm, must have been deleted, I'll have to repeat with another object
[3:29] <sjustlaptop> I'll get back to you tomorrow probably
[3:30] <sjustlaptop> perhaps tonight
[3:30] <dmick> this might save you a little time jmlowe when you get there:
[3:30] <dmick> c$ cat ~/bin/listxattrs
[3:30] <dmick> #!/bin/bash
[3:30] <dmick> path=$1
[3:30] <dmick> for a in $(attr -lq "$path"); do echo "$a"; attr -qg "$a" "$path" | xxd; done
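For reference, roughly the same dump as dmick's shell helper above could be sketched in Python. This is an assumption-laden alternative, not what was used in the channel: `os.listxattr` and `os.getxattr` are Linux-only, they return full attribute names (including the namespace prefix), and the attribute name `user.example` below is made up for the demonstration.

```python
import os

def format_xattrs(attrs):
    """Render a {name: bytes} mapping as name lines followed by hex values."""
    lines = []
    for name in sorted(attrs):
        lines.append(name)
        lines.append("  " + attrs[name].hex())
    return "\n".join(lines)

def read_xattrs(path):
    """Read all extended attributes of path (Linux only)."""
    return {name: os.getxattr(path, name) for name in os.listxattr(path)}

# Demonstration on made-up data; on a real OSD you would call
# format_xattrs(read_xattrs(object_path)) instead.
print(format_xattrs({"user.example": b"\x01\x02"}))
```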
[3:30] <jmlowe> oh, make sure you are looking at stuff after I recreated the cluster, those logs start before I started from scratch
[3:31] <jmlowe> sjust: what's your timezone, I'm in eastern
[3:31] <sjustlaptop> pacific
[3:33] <jmlowe> ok, duly noted, I'll look for you tomorrow
[3:33] <jmlowe> dmick: thanks, that's handy
[3:33] <dmick> yw, thanks for all your help here
[3:34] <jmlowe> least I could do
[3:35] <dmick> heh, well, no, the least you could do is run into one problem, insult the project and its developers, and stomp off :)
[3:35] <dmick> and we appreciate that you don't
[3:50] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[3:54] * jlogan (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[4:01] * amichel (~amichel@salty.uits.arizona.edu) Quit ()
[4:10] <dmick> AARGH. That stupid CRUSHmap compile problem was '\r' in the file
[4:10] <dmick> I neglected to notice the [dos] down in the status line for vim
[4:13] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:16] <elder> dmick, problem with machine teuthology?
[4:16] <elder> (I can't get to it)
[4:16] <dmick> email?
[4:16] <dmick> (as in read your)
[4:17] <elder> Ahh. Got it.
[4:17] <elder> Do you remember the flag that I set to tell it not to check locks?
[4:17] <dmick> 80min remaining at 6:33
[4:17] <dmick> and, uh
[4:18] <dmick> check-locks: false, it looks like
[4:19] <dmick> def check_lock(ctx, config):
[4:19] <dmick>     if ctx.config.get('check-locks') == False:
[4:19] <dmick>         log.info('Lock checking disabled.')
[4:19] <dmick>         return
[4:19] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[4:40] * drafter (~drafter@ has joined #ceph
[4:46] <elder> I'm running, by the way. Thanks dmick. And now I'm going away for a bit.
[4:46] <dmick> cheers
[4:48] * drafter_ (~drafter@ has joined #ceph
[4:48] * drafter (~drafter@ Quit (Read error: Connection reset by peer)
[4:48] * drafter_ is now known as drafter
[4:59] * chutzpah (~chutz@ Quit (Quit: Leaving)
[5:03] * jlogan (~Thunderbi@2600:c00:3010:1:49d6:5ead:ab1a:61ba) has joined #ceph
[5:18] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[5:19] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[5:29] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:36] * joshd1 (~jdurgin@2602:306:c5db:310:28f6:eba8:a4e6:3810) Quit (Quit: Leaving.)
[6:00] * mattbenjamin (~matt@ Quit (Quit: Leaving.)
[6:42] * dmick (~dmick@ Quit (Quit: Leaving.)
[7:43] * itamar (~itamar@ has joined #ceph
[8:06] * tnt (~tnt@120.194-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:12] * drafter (~drafter@ Quit (Quit: drafter)
[8:12] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:17] * loicd (~loic@magenta.dachary.org) Quit ()
[8:17] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Quit: No Ping reply in 180 seconds.)
[8:17] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[8:19] * low (~low@ has joined #ceph
[8:21] * alexxy (~alexxy@2001:470:1f14:106::2) Quit ()
[8:21] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[8:24] * alexxy (~alexxy@2001:470:1f14:106::2) Quit ()
[8:25] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[8:28] * alexxy (~alexxy@2001:470:1f14:106::2) Quit ()
[8:33] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[8:35] * jlogan (~Thunderbi@2600:c00:3010:1:49d6:5ead:ab1a:61ba) Quit (Ping timeout: 480 seconds)
[8:46] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) has joined #ceph
[8:48] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[8:52] * Leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[8:57] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[9:00] * Leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[9:01] * jrisch (~Adium@ has joined #ceph
[9:18] * sagewk (~sage@2607:f298:a:607:a911:b1:1097:bd96) Quit (Read error: Operation timed out)
[9:21] * tnt (~tnt@120.194-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:29] * jrisch (~Adium@ Quit (Quit: Leaving.)
[9:30] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) has joined #ceph
[9:33] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:37] * benner_ (~benner@ Quit (Read error: Connection reset by peer)
[9:37] * benner (~benner@ has joined #ceph
[9:38] * BManojlovic (~steki@ has joined #ceph
[9:38] * sagewk (~sage@ has joined #ceph
[9:38] * verwilst (~verwilst@d528F423A.access.telenet.be) has joined #ceph
[9:44] <tnt> did 0.48.3 include any memory leak fixes ?
[9:50] * jrisch (~Adium@83-95-19-94-static.dk.customer.tdc.net) has joined #ceph
[9:52] * gaveen (~gaveen@ has joined #ceph
[9:54] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) Quit (Remote host closed the connection)
[9:54] * ScOut3R (~ScOut3R@ has joined #ceph
[9:55] * ScOut3R_ (~ScOut3R@ has joined #ceph
[9:55] * ScOut3R_ (~ScOut3R@ Quit (Remote host closed the connection)
[9:55] * ScOut3R_ (~ScOut3R@ has joined #ceph
[9:57] * ScOut3R (~ScOut3R@ Quit (Read error: Operation timed out)
[9:58] * Leseb (~leseb@ has joined #ceph
[10:00] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Quit: Leaving)
[10:01] * xdeller (~xdeller@broadband-77-37-224-84.nationalcablenetworks.ru) Quit (Quit: Leaving)
[10:05] * xiaoxi (~xiaoxiche@ Quit (Ping timeout: 480 seconds)
[10:05] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[10:06] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[10:08] * dosaboy (~user1@host86-164-221-235.range86-164.btcentralplus.com) has joined #ceph
[10:10] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Read error: Connection reset by peer)
[10:10] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[10:15] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:24] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[10:27] * LeaChim (~LeaChim@b0fadd12.bb.sky.com) has joined #ceph
[10:27] * sleinen (~Adium@2001:620:0:25:8cea:f916:74ec:b30c) has joined #ceph
[10:29] <jks> dmick, regarding our chat yesterday... not looking good. I let it run over night - and all osds crashed apart from the new osd I introduced
[10:31] <jks> same thing for all osds that crashed again... crashed with a stack trace with (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2eb) [0x8462bb] at the top
[10:31] <jks> 2013-01-17 07:06:46.586899 7f63d29cb700 -1 *** Caught signal (Aborted) **
[10:32] <jks> two of the osds that I restarted yesterday ran for about 9 hours before they crashed again
[10:33] * nz_monkey (~quassel@ Quit (Remote host closed the connection)
[10:34] * sleinen1 (~Adium@2001:620:0:46:a1d0:36b2:c4ae:f6fd) Quit (Ping timeout: 480 seconds)
[10:34] * nz_monkey (~nz_monkey@ has joined #ceph
[10:44] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[10:46] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[10:50] * Ryan_Lane1 (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[10:54] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[11:06] * jksM (~jks@3e6b7199.rev.stofanet.dk) has joined #ceph
[11:07] * jks (~jks@3e6b7199.rev.stofanet.dk) Quit (Read error: Connection reset by peer)
[11:13] * frey (~frey@togt-130-208-247-19.ru.is) has joined #ceph
[11:13] <frey> Hello.
[11:13] <jksM> hello
[11:15] * ScOut3R_ (~ScOut3R@ Quit (Remote host closed the connection)
[11:16] <frey> I am interested in setting up a Ceph test environment at my university. My goal is to use Ceph for production data. VMWare virtual machines and research data. - Could you tell me some recommendations? Which Linux distribution, what to look for, etc?
[11:17] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) has joined #ceph
[11:17] <tnt> as in VMWare ESX ?
[11:17] <frey> I am also curious to know if you know of a list of companies that are using Ceph in production today. I know Dreamhost is. Do you know anything about their setup?
[11:18] <frey> tnt: Yes.
[11:18] <frey> tnt: I haven't seen RBD for VMWare ESX, but until that's available, I will be using something like an iSCSI proxy.
[11:19] <tnt> Yes, I guess that's the easiest ...
[11:19] * ScOut3R (~ScOut3R@ has joined #ceph
[11:19] * itamar (~itamar@ Quit (Quit: Ex-Chat)
[11:19] * itamar (~itamar@ has joined #ceph
[11:19] <jksM> I would go to the web site for recommendations on OS versions and hardware
[11:19] <frey> tnt: Too bad it makes it harder to sell Ceph here. ;)
[11:20] <tnt> I personally use ubuntu 12.04 LTS for the ceph nodes. The RBD clients (xen dom0 in my case) are Debian Squeeze but with a custom 3.6.y kernel.
[11:21] * nz_monkey (~nz_monkey@ Quit (Remote host closed the connection)
[11:21] <frey> Do you know how it is to have CentOS 6 clients? Are there any difficulties that you are aware of?
[11:22] <wsmob_705215> hello everyone. i am totally new to ceph and wanted to try this via the getting started guide. but when I type ceph health, i get errors like: "2013-01-17 13:20:58.525020 7f6f48a7a700 0 -- :/5453 >> pipe(0x7f6f380039f0 sd=4 :0 pgs=0 cs=0 l=1).fault" google does not help here, any ideas how to solve this or at least get an idea what happened
[11:23] <tnt> frey: to re-export in iSCSI you most likely need the rbd to appear as a block device, so you need the kernel client and so you need a recent kernel. (3.2 is too old and rbd is buggy in it).
[11:23] * nz_monkey (~nz_monkey@ has joined #ceph
[11:23] <tnt> frey: I don't know what CentOS 6 uses as kernel, but that's the only major point to pay attention to for rbd kernel clients.
[11:24] <low> tnt: 2.6.32 as a base.
[11:24] <tnt> wsmob_705215: pastebin the but log.
[11:24] <frey> tnt: An old 2.6 kernel with lots of backported code from upstream.
[11:24] <tnt> yeah ... don't use that.
[11:24] <tnt> I've never seen ceph being backported by distro.
[11:24] <exec> frey: try to use kernel from elrepo
[11:24] <phantomcircuit> so i was noticing that whenever filestore.committing in perf dump is 1 throughput goes to shit
[11:24] <phantomcircuit> http://i.imgur.com/XFhqQ.png
[11:24] <phantomcircuit> sure enough it's basically 1:1
[11:25] <phantomcircuit> and by goes to shit i mean zero io is happening
[11:25] <exec> frey: there is 3.7.1 at the moment and I had no issues with it
[11:25] <phantomcircuit> im guessing there's a lock on new io while the osd is committing
[11:25] <wsmob_705215> @tnt do you mean the logs from /var/log/ceph ?
[11:25] <cephalobot> wsmob_705215: Error: "tnt" is not a valid command.
[11:25] <wsmob_705215> tnt: do you mean the logs from /var/log/ceph ?
[11:26] <tnt> from wherever you see errors
[11:26] <frey> Well, then my only problem is that I have some machines that I would like to mount CephFS to, but they are running CentOS 6 with a 2.6 kernel that I can't upgrade.
[11:26] <frey> And RHEL 6.3.
[11:27] <frey> Is there something I can do on those machines? Or will I need to wait?
[11:27] <wsmob_705215> well all messages from ceph health look like: 2013-01-17 13:27:25.547406 7f6f42504700 0 -- :/5453 >> pipe(0x7f6f38044220 sd=4 :0 pgs=0 cs=0 l=1).fault 2013-01-17 13:27:28.547619 7f6f48a7a700 0 -- :/5453 >> pipe(0x7f6f38043510 sd=3 :0 pgs=0 cs=0 l=1).fault 2013-01-17 13:27:31.547795 7f6f42504700 0 -- :/5453 >> pipe(0x7f6f38044c10 sd=3 :0 pgs=0 cs=0 l=1).fault 2013-01-17 13:27:34.548002 7f6f48a
[11:28] <wsmob_705215> ok, i will use pastebin for formatting
[11:32] <frey> The main reason why I can't upgrade my CentOS kernel on some machines is that they have a proprietary FiberChannel driver and are using Lustre that works with that particular kernel.
[11:33] <frey> I would have thought about changing Lustre into Ceph, but I don't think Ceph has Infiniband support today.
[11:33] <exec> frey: use fuse or VM with any os on top of baremetal servers. but very likely you won't be satisfied with performance
[11:36] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[11:37] <frey> exec: Hmm, maybe Lustre support is in elrepo. ;)
[11:38] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[11:40] <frey> exec: Have you used CentOS with elrepo for the Ceph nodes? Or would you recommend Ubuntu too?
[11:41] <wsmob_705215> http://pastie.org/5702327
[11:42] <wsmob_705215> (our firewall blocks pastebin ;))
[11:43] <exec> frey: I've used it for osd+mon
[11:43] <exec> not for client
[11:45] <jksM> frey: I'm using CentOS 6 for clients with kvm, but not with CephFS
[11:47] * nz_monkey (~nz_monkey@ Quit (Remote host closed the connection)
[11:48] * nz_monkey (~nz_monkey@ has joined #ceph
[11:53] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) has joined #ceph
[11:56] * xiaoxi (~xiaoxiche@ has joined #ceph
[11:57] <alexxy> hi all
[11:57] <alexxy> can anyone tell me why i get following state
[11:57] <alexxy> health HEALTH_WARN 4 pgs peering; 4 pgs stuck inactive; 87 pgs stuck unclean; recovery -160/3364332 degraded (-0.005%)
[12:05] * Morg (d4438402@ircip2.mibbit.com) has joined #ceph
[12:09] * nz_monkey (~nz_monkey@ Quit (Remote host closed the connection)
[12:10] * nz_monkey (~nz_monkey@ has joined #ceph
[12:21] * nz_monkey (~nz_monkey@ Quit (Remote host closed the connection)
[12:22] * nz_monkey (~nz_monkey@ has joined #ceph
[12:27] * xdeller (~xdeller@ has joined #ceph
[12:29] * nz_monkey (~nz_monkey@ Quit (Remote host closed the connection)
[12:29] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) Quit (Remote host closed the connection)
[12:29] * nz_monkey (~nz_monkey@ has joined #ceph
[12:43] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[12:51] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[12:55] * Leseb (~leseb@ Quit (Ping timeout: 480 seconds)
[13:05] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) has joined #ceph
[13:07] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:20] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[13:22] * mdxi_ (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[13:24] <wsmob_705215> tnt: could you already have a look at the logs? (I am not sure what happened while my client was disconnected)
[13:25] <absynth_47215> he hasn't talked while you were disconnected
[13:26] <tnt> yeah no idea ... are the daemons even running ?
[13:33] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) Quit (Remote host closed the connection)
[13:33] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) has joined #ceph
[13:39] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:41] * Leseb (~leseb@ has joined #ceph
[13:42] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) Quit (Quit: Ex-Chat)
[13:48] <xiaoxi> hi tnt
[13:48] <xiaoxi> I am trying 3.6.7 (rbd) +3.6.7(osd) + xfs,it looks good now
[13:49] <xiaoxi> stable for more than 2.5h
[13:49] <wsmob_705215> well this is embarrassing. server was not running and direct start of ceph-mon gives some more information
[13:50] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:52] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[13:54] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) has joined #ceph
[14:02] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:05] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) Quit (Remote host closed the connection)
[14:06] * Morg (d4438402@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[14:12] * loicd1 (~loic@ has joined #ceph
[14:14] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[14:14] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[14:14] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:16] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[14:20] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) has joined #ceph
[14:26] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[14:31] * gaveen (~gaveen@ has joined #ceph
[14:51] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[14:54] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: This computer has gone to sleep)
[15:01] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[15:08] <mikedawson> nhm: what do you expect to see with rados bench specifying a write size larger than 4MB? Performance seems to drop off quickly. Is that a limitation of the rados bench tool or a performance issue in Ceph itself?
[15:11] <mikedawson> I ask because I can saturate my network with large enough writes (1-8MB) and I can saturate my spindles with small enough writes, which both make sense. But I've seen bottlenecks I cannot explain moving large files around.
[15:13] * Leseb (~leseb@ Quit (Ping timeout: 480 seconds)
[15:16] * loicd1 (~loic@ Quit (Quit: Leaving.)
[15:17] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[15:21] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) Quit (Remote host closed the connection)
[15:22] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[15:23] <rlr219> ok. changed my crush map. seemed to work fine, except won't recover completely. mon.0 [INF] pgmap v433576: 4032 pgs: 2693 active+clean, 1338 active+remapped, 1 active+clean+scrubbing+deep; 1803 GB data, 5462 GB used, 50401 GB / 55863 GB avail; 545/1603584 degraded (0.034%)
[15:24] <rlr219> get down to just a few hundred PGs degraded and stops
[15:24] <rlr219> anyone know how to fix?
[15:27] <mikedawson> rlr219: 0.56.1?
[15:27] <rlr219> yes
[15:28] <mikedawson> rlr219: have you tried tunables?
[15:29] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) Quit (Remote host closed the connection)
[15:29] <rlr219> i did. and maybe I am just being impatient. but it jumped back up to 12%+ degraded.
[15:30] <mikedawson> rlr219: my experience has been if it is moving slowly, but consistently moving, just keep waiting (and read up about tuning). If it stops moving, experiment with tunables
[15:32] <rlr219> mikedawson: ya looked at your post yesterday and the bug you listed. you just added those tunables and yours recovered?
[15:34] * Leseb (~leseb@ has joined #ceph
[15:35] * ScOut3R_ (~ScOut3R@ has joined #ceph
[15:37] <madkiss> Have we heard anything about performance regressions in bobtail when used in conjunction with qemu/rbd?
[15:38] <madkiss> i have a report here stating that while rados bench shows 100mb/s constantly, from within a running VM, 30mb/s or less are available
[15:38] <mikedawson> rlr219: yes. I've had to do it on three different occasions as I've changed replication size or added / removed OSDs to clusters running 0.56.1
[15:39] <mikedawson> madkiss: I've seen a similar lack of performance
[15:42] * ScOut3R (~ScOut3R@dslC3E4E249.fixip.t-online.hu) Quit (Ping timeout: 480 seconds)
[15:42] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) has joined #ceph
[15:43] <mikedawson> madkiss: some benchmarking I've done against 0.56.1 performs great (achieving max throughput or iops depending on the expected limiting factor of the test), but other tests fall well short of my expectations. In those cases, I can't find any limiting factor (CPU is low, plenty of RAM, net utilization low, spindles and ssds not breaking a sweat)
[15:45] * ninkotech (~duplo@ Quit (Ping timeout: 480 seconds)
[15:45] * ninkotech_ (~duplo@ Quit (Ping timeout: 480 seconds)
[15:46] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:47] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:48] * vata (~vata@2607:fad8:4:6:e06d:b51e:cfa2:38bb) has joined #ceph
[15:48] <madkiss> i see
[15:49] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[15:50] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[15:53] * itamar (~itamar@ Quit (Quit: Ex-Chat)
[15:54] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[15:55] * ninkotech (~duplo@ has joined #ceph
[15:56] * ninkotech_ (~duplo@ has joined #ceph
[16:00] * aliguori (~anthony@ has joined #ceph
[16:11] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:12] * PerlStalker (~PerlStalk@ has joined #ceph
[16:15] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[16:25] * xiaoxi (~xiaoxiche@ Quit (Ping timeout: 480 seconds)
[16:35] * Leseb_ (~leseb@ has joined #ceph
[16:35] * Leseb (~leseb@ Quit (Read error: Connection reset by peer)
[16:37] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[16:43] * wsmob_705215 (~wsmob_705@www.nowhere-else.org) Quit (Remote host closed the connection)
[16:43] <absynth_47215> ehlo everyone
[16:44] <absynth_47215> is there a way to see the *current* weight of an OSD during the reweighting process?
[16:44] <absynth_47215> ceph osd tree naturally only shows the *target* weight
[16:46] * Leseb_ is now known as Leseb
[16:46] * Leseb is now known as leseb
[16:52] <tnt> huh ?
[16:54] <tnt> AFAIU, When a new weight is applied, that changes the crushmap and hence the pg mapping and this is pretty much instant. Then of course, it will need to actually move the PG and the status of that should be in ceph -s
[16:55] * jamespage (~jamespage@tobermory.gromper.net) Quit (Quit: Coyote finally caught me)
[16:55] * low (~low@ Quit (Quit: Leaving)
[16:56] * jamespage (~jamespage@tobermory.gromper.net) has joined #ceph
[16:57] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[16:58] * rtek (~sjaak@empfindlichkeit.nl) has joined #ceph
[17:00] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[17:21] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:23] * sander (~chatzilla@c-174-62-162-253.hsd1.ct.comcast.net) has joined #ceph
[17:30] * sleinen1 (~Adium@2001:620:0:46:c8d1:5f82:21f6:d1da) has joined #ceph
[17:35] * sleinen (~Adium@2001:620:0:25:8cea:f916:74ec:b30c) Quit (Ping timeout: 480 seconds)
[17:36] * drokita (~drokita@ has joined #ceph
[17:44] * ScOut3R_ (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[17:46] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[17:51] <drokita> So, all hosts being equal, it seems to follow that an optimal crush map would have all hosts weighted the same?
[17:53] <tnt> yes
[17:53] <tnt> in a perfect world :p
[17:56] <drokita> This world isn't perfect?
[17:56] <drokita> :)
[17:59] * ninkotech (~duplo@ Quit (Ping timeout: 480 seconds)
[17:59] * ninkotech_ (~duplo@ Quit (Ping timeout: 480 seconds)
[17:59] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[18:01] * leseb (~leseb@ Quit (Remote host closed the connection)
[18:01] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:03] * ninkotech (~duplo@ has joined #ceph
[18:04] * ninkotech_ (~duplo@ has joined #ceph
[18:05] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Read error: Operation timed out)
[18:09] * todin (tuxadero@kudu.in-berlin.de) Quit (Remote host closed the connection)
[18:09] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[18:11] * tnt (~tnt@120.194-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:11] * verwilst (~verwilst@d528F423A.access.telenet.be) Quit (Quit: Ex-Chat)
[18:14] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) has joined #ceph
[18:16] * alram (~alram@ has joined #ceph
[18:20] * denken (~denken@dione.pixelchaos.net) Quit (Quit: leaving)
[18:20] * yehuda_hm (~yehuda@2602:306:330b:a40:7438:5485:39e:5b0f) has joined #ceph
[18:25] * ircolle (~ircolle@ has joined #ceph
[18:25] * jlogan1 (~Thunderbi@2600:c00:3010:1:49d6:5ead:ab1a:61ba) has joined #ceph
[18:28] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[18:28] * BManojlovic (~steki@gprswap.mts.telekom.rs) has joined #ceph
[18:34] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:38] <rlr219> mikedawson: you still here?
[18:42] * dpippenger1 (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[18:44] * jtangwk1 (~Adium@2001:770:10:500:4b1:7be0:532e:4e6c) Quit (Quit: Leaving.)
[18:44] * Cube1 (~Cube@ has joined #ceph
[18:44] * xdeller (~xdeller@ Quit (Quit: Leaving)
[18:48] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:49] * denken (~denken@dione.pixelchaos.net) has joined #ceph
[18:50] * Ryan_Lane1 (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:50] * jtangwk (~Adium@2001:770:10:500:8d22:d635:d460:e02) has joined #ceph
[18:57] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:57] * loicd (~loic@2a01:e35:2eba:db10:8922:4401:5db7:42e6) has joined #ceph
[18:58] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:59] * loicd (~loic@2a01:e35:2eba:db10:8922:4401:5db7:42e6) Quit ()
[19:00] * loicd (~loic@2a01:e35:2eba:db10:8922:4401:5db7:42e6) has joined #ceph
[19:01] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:01] <sagewk> bah
[19:01] <sagewk> can one of you jump in also so i can test my mic
[19:01] <gregaf> indeeed
[19:01] * xmltok_ (~xmltok@pool101.bizrate.com) has joined #ceph
[19:01] <gregaf> yeah, okay
[19:01] <slang1> gregaf: the thrasher uses the mds state info to wait till it sees that the mds it killed has reached laggy_or_crashed
[19:02] <slang1> gregaf: it also uses it to check for other states like replay and active
[19:02] <gregaf> ah, and is using that as the action points for things, I see
[19:03] <slang1> I'm arguing that once a named mds enters the mdsmap, it should never leave
[19:03] <gregaf> sagewk:
[19:03] <sagewk> hmm
[19:04] <sagewk> why?
[19:04] * chutzpah (~chutz@ has joined #ceph
[19:05] * xmltok (~xmltok@pool101.bizrate.com) Quit (Ping timeout: 480 seconds)
[19:05] <slang1> in this specific case, it allows me to wait for it to reach a failed/stopped/laggy_or_crashed state
[19:05] <gregaf> in this case, so that his trigger points work — if he's got a standby and it takes over before the thrasher task sees it as laggy or crashed it gets tricky to figure out that it actually happened
[19:06] <slang1> in the general case, it's nice information to have: these are MDSs that you've deployed in this cluster, these are their states
[19:06] <sagewk> can't the test just gracefully handle non-existence and take that to mean it is crashed/down/not running?
[19:07] <sagewk> i'm not sure that in general you want the mdsmap to contain entries for all mds's that ever were running, ever
[19:07] <slang1> gregaf: I think I can actually wait till the mds is just gone from the map, since I have all their names, but ..
[19:07] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[19:07] <gregaf> yeah, I think that's the appropriate action to take right now
[19:07] <sagewk> i guess we could have a 'ceph mds rm <name>' command to clear old cruft out, but it's not clear to me that there is added value in storing information about a daemon that is not running and has no state
[19:08] <gregaf> sagewk: I'm actually not sure it's inappropriate to keep a record of named daemons and what they're doing around, since it takes an admin action to create each one
[19:08] <slang1> sagewk: its state is: this mds crashed, and can be restarted as a standby for the now active mds
[19:09] <gregaf> but I don't remember how easy that is to do in the monitor (is it just a matter of how we clean up the mds_info map?) and I don't think we should be trying to do that at this time
[19:09] <sagewk> which effectively == no state. the only information there is that it once existed
[19:09] <sagewk> that isn't useful to the system itself. is it valuable to the admin?
[19:09] <slang1> sagewk: with many MDSs, this will get really confusing to figure out if/when you should restart an mds
[19:10] <sagewk> can't you just infer if it's not there that it failed? (=~ laggy)?
[19:10] <slang1> sagewk: was it removed intentionally or did it just crash and need to be restarted as a standby for the now active mds that was standby for me?
[19:10] <slang1> sagewk: yes
[19:10] <sagewk> hmm
[19:10] <slang1> sagewk: I'm arguing for the general case now, not for the thrasher
[19:10] <sagewk> yeah. i can see the argument there...
[19:11] <sagewk> if we make sure the 'ceph mds rm <name or gid>' works, i'm okay with that. we could make it an option even (mon remove failed mds) or something, defaulting to true.
[19:12] * dpippenger (~riven@ has joined #ceph
[19:12] <gregaf> what implications does this have on clients? any that you can think of?
[19:12] <sagewk> nope
[19:13] <sagewk> they only look at mds_info's for the up set.. these would just be extra records in the map
[19:13] <gregaf> okay
[19:13] <sagewk> hmm, wait.
[19:13] <gregaf> so we should probably make a monitor bug and let that team prioritize it :)
[19:13] <gregaf> s/bug/feature request
[19:13] <sagewk> currently the unique id is the gid, a per-instance u64. technically you can have multiple daemons in there with the same name.
[19:13] <sagewk> when *would* an entry get removed?
[19:14] <sagewk> a heuristic to remove a failed item with the same name? that seems kind of sloppy
[19:14] <gregaf> yeah, this is why I want to make a ticket so that it can be dealt with when somebody can actually think about it :)
[19:14] <gregaf> but that would pretty much have to be the answer to get the behavior Sam's talking about
[19:15] <sagewk> that, or we enforce that the names have to be unique.
[19:15] <sagewk> we've 90% thought about it, let's finish the thought instead of restarting later
[19:15] <slang1> it doesn't seem sloppy to have two MDSs in the map with the same name?
[19:15] <sagewk> it does :)
[19:15] * slang1 nods
[19:15] <gregaf> enforce that names have to be unique? not sure what you're saying there that's different from what already happens
[19:16] <gregaf> oh god, there's not an explicit check in prepare_beacon
[19:16] <gregaf> but I wouldn't count on multiple uses of the same name to work
[19:16] <slang1> if they have to be unique already, this could just be two functions, mon remove_mds_by_uid and mon remove_mds_by_name
[19:17] <slang1> gregaf: maybe it's enforced in the config?
[19:17] <gregaf> lol, can't enforce anything in the config ;)
[19:18] <gregaf> brb
[19:18] <slang1> looks like a list not a map
[19:19] * slang1 tries it
[19:30] <slang1> mdsmap e8: 2/2/3 up {0=a=up:active,1=b=up:active}
[19:30] <slang1> created with mds a,b,a
[19:30] <slang1> seems to work fine, looks like the second mds.a just gets ignored
[19:32] <slang1> or rather, the first mds.a most likely
[19:32] <gregaf> haha, I forget the output order — that's 2 active, 2 max_mds, 3 running daemons?
[19:32] <slang1> there are a few warnings about it
[19:32] * ircolle (~ircolle@ Quit (Quit: Leaving.)
[19:32] <gregaf> anyway yeah, my concern is that there's functionality like standby-for-name
[19:32] <slang1> max_mds=3
[19:32] <gregaf> oh, so it is in fact not going active?
[19:33] <slang1> it is going active
[19:33] <slang1> it's just setting max_mds to 3 instead of 2
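
The summary line slang1 pasted can be split into its fields mechanically. The following is a minimal sketch, assuming the three slash-separated numbers are up/in/max_mds (only the last is confirmed by slang1 above; the names `up` and `in` for the first two are an assumption) and that each entry in the braces is rank=name=state. The function name `parse_mdsmap_summary` is hypothetical and not part of any Ceph tool.

```python
import re

# Summary line exactly as pasted in the log above.
line = "mdsmap e8: 2/2/3 up {0=a=up:active,1=b=up:active}"

# Assumed layout: epoch, then up/in/max_mds, then rank=name=state entries.
SUMMARY = re.compile(r"mdsmap e(\d+): (\d+)/(\d+)/(\d+) up \{(.*)\}")


def parse_mdsmap_summary(text):
    """Split an mdsmap summary line into a dict of its fields."""
    match = SUMMARY.match(text)
    if match is None:
        raise ValueError("unrecognised mdsmap summary: %r" % text)
    epoch, up, in_, max_mds, body = match.groups()
    ranks = {}
    for entry in filter(None, body.split(",")):
        # "0=a=up:active" -> rank 0, daemon name "a", state "up:active"
        rank, name, state = entry.split("=", 2)
        ranks[int(rank)] = (name, state)
    return {
        "epoch": int(epoch),
        "up": int(up),
        "in": int(in_),
        "max_mds": int(max_mds),
        "ranks": ranks,
    }


summary = parse_mdsmap_summary(line)
print(summary["max_mds"], summary["ranks"])
```

Read this way, the pasted line says two ranks are held by daemons named a and b, with max_mds at 3, which matches slang1's description.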
[19:39] * bstaz (~bstaz@ext-itdev.tech-corps.com) has joined #ceph
[19:43] <sagewk> i think the trick with enforcing unique names is figuring out what to do when you have 2 guys with the same name. last one wins?
[19:44] <slang1> sagewk: abort?
[19:45] * Ryan_Lane (~Adium@ has joined #ceph
[19:45] <slang1> sagewk: last one aborts?
[19:45] <slang1> sagewk: why would we want to allow users to create an mds with the same name as an existing mds?
[19:46] <sagewk> in reality it will be two ceph-mds daemons starting on the same host by accident, or starting and restarting quickly and the messages arriving or being processed out of order
[19:46] <sagewk> i.e. it's the "same" mds, but a different instance of the daemon
[19:46] * Ryan_Lane (~Adium@ Quit ()
[19:47] * Ryan_Lane (~Adium@ has joined #ceph
[19:47] <slang1> sagewk: those both seem like error cases to me
[19:47] <gregaf> or just a mis-configuration issue
[19:47] <gregaf> but yes, I think the monitor should disallow new daemons if they try to claim the name of an MDS that is already running
[19:48] <sagewk> .. but it can't
[19:48] <sagewk> the mon doesn't know if the daemon is running, only if it got a beacon.
[19:48] <sagewk> consider 'service ceph restart mds.foo'
[19:48] <sagewk> mon immediately gets new beacon from "another" mds.a, but afaict the old one is still up and happy.
[19:48] <gregaf> okay, so it prevents them from going active until it considers the old one dead
[19:49] <sagewk> right now, the new mds.a goes into standby, eventually the old one times out, and then new one takes over
[19:49] <gregaf> perhaps notifying them of such
[19:49] <slang1> and we could have the mds tell the monitor that it's getting stopped on shutdown
[19:49] <sagewk> slang1: that would help in one case (though doing anything blocking on shutdown is annoying), but not in general
[19:50] <gregaf> in any case, this is not one of our current priorities and is something that would get mostly implemented by a different team, so can we make whatever tickets are appropriate and focus on priorities instead? :)
[19:50] <sagewk> i see two options... (1) keep what we have. names are convenient, but don't otherwise mean much
[19:50] <sagewk> (2) make a 'new' mds.a that boots implicitly fail the 'old' mds.a
[19:50] <slang1> gregaf: this is useful discussion for me, actually
[19:51] <sagewk> 2 has the advantage that an admin restart doesn't make you wait for the timeout.
[19:51] <sagewk> it has the disadvantage that a misconfiguration will make two daemons fight each other
[19:52] <sagewk> 1 has the advantage that we are already done fixing it, yay! :)
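
The difference between sagewk's two options can be shown with a toy model. This is only a sketch under stated assumptions: `ToyMdsMap`, `boot`, `timeout` and the `fail_old_on_boot` switch are hypothetical names invented for illustration, and the real monitor logic is considerably more involved. Instances are keyed by gid, as in the discussion, and several may share one name.

```python
class ToyMdsMap:
    """Toy model of name handling; not the monitor's real data structures."""

    def __init__(self, fail_old_on_boot):
        # False models option (1), the status quo; True models option (2).
        self.fail_old_on_boot = fail_old_on_boot
        self.daemons = {}  # gid -> {"name": str, "state": str}

    def boot(self, gid, name):
        """Register a new daemon instance and return the state it gets."""
        old = [g for g, d in self.daemons.items()
               if d["name"] == name and d["state"] == "active"]
        if old and self.fail_old_on_boot:
            # Option (2): the new instance implicitly fails the old one.
            for g in old:
                self.daemons[g]["state"] = "failed"
            state = "active"
        elif old:
            # Option (1): the new instance waits as a standby.
            state = "standby"
        else:
            state = "active"
        self.daemons[gid] = {"name": name, "state": state}
        return state

    def timeout(self, gid):
        """The old instance stops sending beacons; a same-name standby takes over."""
        name = self.daemons[gid]["name"]
        self.daemons[gid]["state"] = "failed"
        for daemon in self.daemons.values():
            if daemon["name"] == name and daemon["state"] == "standby":
                daemon["state"] = "active"
                break


status_quo = ToyMdsMap(fail_old_on_boot=False)
status_quo.boot(100, "a")
waiting = status_quo.boot(101, "a")   # restarted mds.a has to wait
status_quo.timeout(100)               # only now does gid 101 go active

kick_old = ToyMdsMap(fail_old_on_boot=True)
kick_old.boot(100, "a")
immediate = kick_old.boot(101, "a")   # restarted mds.a takes over at once
```

In the second model a misconfigured pair of daemons sharing a name would keep failing each other on every boot, which is the drawback sagewk raises for option (2).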
[19:52] * loicd (~loic@2a01:e35:2eba:db10:8922:4401:5db7:42e6) Quit (Quit: Leaving.)
[19:53] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:53] <slang1> sagewk: 1 makes it hard to implement keeping unique names in the mds map even if they've failed
[19:55] <slang1> another option is to get rid of names entirely
[19:57] <sagewk> it's only a problem if you need/want unique names. removing names doesn't actually help anything; they're convenient and useful, even if they don't "enforce" anything.
[19:57] <sagewk> i think 2 is the only thing we should really consider.. otherwise the status quo seems just fine.
[19:57] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[19:57] <slang1> sagewk: convenient for identifying what MDS daemons are running. but they don't serve that purpose fully
[19:59] <sagewk> they just don't tell you what daemons used to be running, right?
[19:59] * slang1 nods
[19:59] <sagewk> well, let's just fix the thrasher for now.
[20:00] <sagewk> i'd like to ponder a bit more what the implications of making a 'new' mds.a kick the 'old' mds.a are.
[20:01] <slang1> ok :-)
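
The short-term fix agreed above (treat a daemon that is missing from the map as crashed or down) amounts to a polling loop. A minimal sketch, assuming a hypothetical callable `get_mds_names` that returns the set of daemon names currently in the mdsmap; this is not the actual thrasher code, and `wait_until_gone` is an invented name.

```python
import time


def wait_until_gone(name, get_mds_names, timeout=60.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until `name` is absent from the map or `timeout` seconds pass.

    get_mds_names is a hypothetical callable returning the set of daemon
    names currently present in the mdsmap. Returns True if the name
    disappeared, False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if name not in get_mds_names():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage with a fake map: mds.a disappears on the third poll.
snapshots = iter([{"a", "b"}, {"a", "b"}, {"b"}])
gone = wait_until_gone("a", lambda: next(snapshots), sleep=lambda s: None)
print(gone)
```

Because the test already knows every daemon name it started, absence from the map is enough to infer that the killed daemon has been noticed, even if a standby took over before a laggy_or_crashed state was ever observed.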
[20:01] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:04] * dty (~derek@testproxy.umiacs.umd.edu) has joined #ceph
[20:06] * xmltok_ (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[20:06] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[20:07] * The_Bishop_ (~bishop@e179012071.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[20:11] <elder> glowell1, can you look at my failed build for wip-rbd-new-2 on ceph-client? It looks like the failure was due to a packaging problem.
[20:12] <elder> In addition, it's reporting "jobserver not available" which I think means slow, non-parallel builds.
[20:12] <glowell1> elder: ok
[20:13] <mikedawson> rlr219: here now
[20:15] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[20:18] * jrisch (~Adium@83-95-19-94-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[20:22] * rturk-away is now known as rturk
[20:24] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:24] * mikey (~mikey@catv-213-222-190-74.catv.broadband.hu) Quit (Read error: Connection reset by peer)
[20:24] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:25] * maswan (maswan@kennedy.acc.umu.se) Quit (Read error: Operation timed out)
[20:29] * mikey (~mikey@catv-213-222-190-74.catv.broadband.hu) has joined #ceph
[20:31] <rlr219> ok. after using crushmap tunables, my ceph cluster has finally finished recovering from the crushmap changes I made. but when I try to mount an rbd " echo ",, name=admin rbd nfsshare" > /sys/bus/rbd/add -bash: echo: write error: Input/output error" it errors out?
[20:32] * al (d@niel.cx) Quit (Remote host closed the connection)
[20:32] <rlr219> I have 4 VMs all using RBDs and they are running just fine
[20:32] * al (d@fourrooms.bandsal.at) has joined #ceph
[20:32] <rlr219> the tunables work Mike.
[20:32] <rlr219> but now I have another issue
[20:33] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[20:33] <joshd> rlr219: your kernel is 3.5+? that's where the crush tunables are supported (and 3.6+ is recommended for kernel clients anyway)
[20:34] <mikedawson> rlr219: glad you got past the first issue.
[20:35] <rlr219> Linux ss141 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
[20:35] <rlr219> yes joshd, running quantal
[20:36] * maswan (maswan@kennedy.acc.umu.se) has joined #ceph
[20:36] <jksM> I was advised on the mailing list to set debug osd to 20 and create a log... I have set it in ceph.conf and restarted the osd... but it hasn't logged more than usual (about 8 log lines over the last hour). Is there a trick?
[20:37] <rlr219> actually, ran that uname on wrong server my kernel is actually 3.5.0-18-generic
[20:37] <mikedawson> joshd: I can saturate 1GbE using rados bench with 4MB writes and a relatively low thread count (maybe 8). But when I add 20GB image to glance (backed by a pool with same # of PGs and same 3x replication), throughput seems to be an order of magnitude lower. Any ideas?
[20:37] <joshd> mikedawson: glance is only using one thread with synchronous requests
[20:38] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[20:39] <mikedawson> joshd: I was doing 2 simultaneously from the same host. If I do rados bench with -t 2, should I expect the same throughput I get uploading to Glance?
[20:42] <mikedawson> joshd: after correcting for thread count, the throughput seemed to be about 10x worse.
[20:43] <dosaboy> hi guys, what is preferred deployment method for ceph atm?
[20:43] <dosaboy> ceph.com lists chef, manual (mkcephfs) and ceph-deploy
[20:44] <dosaboy> some sites say that mkcephfs is deprecated
[20:44] <dosaboy> but ceph-deploy is not documented from what I can see
[20:44] <dosaboy> and the webinar earlier used mkcephfs
[20:44] <dosaboy> so am a little confused
[20:44] <joshd> mikedawson: generally yes, although I haven't tried optimizing glance at all. going through python may be making an extra copy
[20:45] * Oliver1 (~oliver1@ip-178-203-175-61.unitymediagroup.de) has joined #ceph
[20:47] <jmlowe> mikedawson: I would expect very bad performance from glance since it's basically a python http proxy accepting post requests and stuffing that data into rados
[20:47] <joshd> mikedawson: is the rados bench on the same pool glance is using?
[20:47] <mikedawson> joshd: no, I've been benchmarking against another pool
[20:48] <mikedawson> joshd: am I wrong to assume if the pool has the same # of PGs and replication size, they should perform similarly?
[20:48] <mikedawson> jmlowe: right
[20:50] <jmlowe> mikedawson: do I remember correctly that there is a glance cache? if so then it may even be worse, writing everything to local disk then copying that into ceph
[20:50] <mikedawson> dosaboy: mkcephfs is the old standby; it has worked well for me. ceph-deploy reportedly needs work before it's ready.
[20:51] <dosaboy> mikedawson: thanks
[20:52] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[20:52] <joshd> mikedawson: they should be the same, but it doesn't hurt to check
[20:52] <jmlowe> mikedawson: if your local disk writes 30MB/s and your ceph cluster does 300MB/s, right there is your 10x
[20:52] <rlr219> what would keep me from mounting a block device? rbd map nfsshare --pool rbd --name client.admin add failed: (5) Input/output error
[20:53] <mikedawson> jmlowe: I think you are right
[20:54] <jmlowe> mikedawson: so mount cephfs wherever glance is set to cache?
[20:55] <jmlowe> might do something for you
[20:55] * Ryan_Lane (~Adium@ has joined #ceph
[20:57] <scalability-junk> jmlowe, 300MB/s seems like a good network connection
[20:57] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[20:58] <scalability-junk> at least 2.4 Gb/s with the overhead probably 2.6 Gb/s
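
The arithmetic behind the 10x and 2.4 Gb/s figures above can be checked in a few lines. A minimal sketch, assuming decimal units (1 Gb = 1000 Mb) and ignoring protocol overhead; the 30 and 300 MB/s values are jmlowe's illustrative numbers, not measurements, and the helper names are invented.

```python
def mbyte_to_gbit(mb_per_s):
    """Convert MB/s to Gb/s: 8 bits per byte, 1000 Mb per Gb (decimal units)."""
    return mb_per_s * 8 / 1000


def gbit_to_mbyte(gb_per_s):
    """Convert Gb/s to MB/s, the inverse of mbyte_to_gbit."""
    return gb_per_s * 1000 / 8


local_disk = 30    # MB/s, illustrative figure for a local cache disk
cluster = 300      # MB/s, illustrative figure for the ceph cluster

# A write staged through the local disk is bounded by the slower stage.
staged = min(local_disk, cluster)
slowdown = cluster / staged

cluster_gbit = mbyte_to_gbit(cluster)   # link speed needed for 300 MB/s
gige_ceiling = gbit_to_mbyte(1)         # raw payload ceiling of 1GbE in MB/s

print(slowdown, cluster_gbit, gige_ceiling)
```

On these numbers a single 1GbE link tops out at 125 MB/s before overhead, so 300 MB/s does require a faster or bonded network.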
[20:58] * Ryan_Lane1 (~Adium@ has joined #ceph
[20:59] * Ryan_Lane2 (~Adium@ has joined #ceph
[21:00] <jmlowe> scalability-junk: 10GigE is the slowest denomination where I'm from
[21:00] <jmlowe> scalability-junk: my perspective is a little skewed
[21:01] <scalability-junk> jmlowe, great world :) I'm stuck with 1Gb/s at least I got 3 nics public, private, storage
[21:03] * Ryan_Lane (~Adium@ Quit (Ping timeout: 480 seconds)
[21:03] * The_Bishop (~bishop@i59F6C54.versanet.de) has joined #ceph
[21:04] * Ryan_Lane (~Adium@ has joined #ceph
[21:04] * dosaboy1 (~user1@host86-164-229-186.range86-164.btcentralplus.com) has joined #ceph
[21:04] * sagewk (~sage@ Quit (Ping timeout: 480 seconds)
[21:04] * sjust (~sam@ Quit (Ping timeout: 480 seconds)
[21:04] * gregaf (~Adium@ Quit (Ping timeout: 480 seconds)
[21:04] * dosaboy (~user1@host86-164-221-235.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[21:04] * Ryan_Lane3 (~Adium@ has joined #ceph
[21:05] * alram (~alram@ Quit (Read error: Operation timed out)
[21:05] * Ryan_Lane2 (~Adium@ Quit (Read error: Connection reset by peer)
[21:05] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Ping timeout: 480 seconds)
[21:05] * gregaf (~Adium@2607:f298:a:607:3d65:f727:d43b:b7dd) has joined #ceph
[21:06] * Ryan_Lane1 (~Adium@ Quit (Ping timeout: 480 seconds)
[21:07] * Ryan_Lane3 (~Adium@ Quit (Read error: Operation timed out)
[21:08] * gregaf (~Adium@2607:f298:a:607:3d65:f727:d43b:b7dd) Quit (Read error: Operation timed out)
[21:11] * gregaf (~Adium@2607:f298:a:607:3d65:f727:d43b:b7dd) has joined #ceph
[21:11] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) has joined #ceph
[21:11] <sjustlaptop> aon network flap?
[21:12] * Ryan_Lane (~Adium@ Quit (Ping timeout: 480 seconds)
[21:14] * yehudasa (~yehudasa@2607:f298:a:607:39fc:6c7f:2c27:2458) Quit (Ping timeout: 480 seconds)
[21:14] * rturk is now known as rturk-away
[21:15] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[21:15] <joshd> rlr219: anything in dmesg/syslog?
[21:17] * sjust (~sam@ has joined #ceph
[21:23] * yehudasa (~yehudasa@2607:f298:a:607:6969:5d4:541c:de81) has joined #ceph
[21:31] <glowell1> elder: looks like the problem with wip-rbd-new-2 was a transient error. Worked the second time.
[21:32] <rlr219> wait one joshd
[21:32] <elder> Thanks a lot for checking on it. Any idea why such a transient error would occur? (I see it was something like EINVAL from gzip)
[21:32] * alram (~alram@ has joined #ceph
[21:34] <rlr219> joshd: http://pastebin.com/hh01SmhD
[21:34] <glowell1> I didn't find anything that had changed in the packaging piece. I'm guessing it was environmental. We look like we have enough disk space at the moment, and I could see anything else wrong. All the files in the directory were accessible.
[21:35] <glowell1> That s/b "could not"
[21:35] <elder> OK. I'll chalk it up to nondeterministic hardware.
[21:38] <rlr219> looks like gregh may have just "patched' this? http://www.mail-archive.com/stable@vger.kernel.org/msg28285.html
[21:39] <joshd> rlr219: no, that's indicating your kernel doesn't have the crush tunables feature
[21:39] * The_Bishop_ (~bishop@i59F6A14A.versanet.de) has joined #ceph
[21:40] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[21:40] <rlr219> any way to fix this? I can make block devices, just can't map to them.
[21:42] <joshd> rlr219: you'll need to upgrade your kernel to 3.6
[21:43] <joshd> rlr219: or disable crush tunables
[21:45] * The_Bishop (~bishop@i59F6C54.versanet.de) Quit (Ping timeout: 480 seconds)
[21:46] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[21:48] <rlr219> joshd: i had to enable tunables to get the cluster to recover after I changed the crush map. Will this cause data movement again and possibly keep the cluster from recovering again?
[21:48] * xdeller (~xdeller@broadband-77-37-224-84.nationalcablenetworks.ru) has joined #ceph
[21:48] <joshd> rlr219: yes. I'd suggest upgrading the kernel
[21:49] <joshd> rlr219: or if you compile your own, backporting the patch that lets crush tunables be used
[21:53] <rlr219> joshd: may wanna change the docs then: http://ceph.com/docs/master/rados/operations/crush-map/#tunables
[21:55] <joshd> rlr219: noted
[21:58] * infraworx (~infraworx@ has joined #ceph
[22:04] * infraworx is now known as tony
[22:04] * tony (~infraworx@ Quit (Quit: Colloquy for iPad - http://colloquy.mobi)
[22:05] * tony (~infraworx@60-241-229-106.static.tpgi.com.au) has joined #ceph
[22:06] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:07] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:08] * tony (~infraworx@60-241-229-106.static.tpgi.com.au) Quit ()
[22:08] * tony (~tony@60-241-229-106.static.tpgi.com.au) has joined #ceph
[22:11] <loicd> is there a talk about ceph at https://fosdem.org/2013/ ?
[22:11] <tnt> That would have been nice
[22:14] <fghaas> loicd: come to linux.conf.au instead, there's 3 :)
[22:14] * aliguori (~anthony@ has joined #ceph
[22:14] <fghaas> you can still rebook!
[22:15] <tnt> That's like exactly at the other side of the world.
[22:15] <fghaas> actually, there's 4 ceph talks
[22:15] <loicd> fghaas: it's a little far from Paris I'm afraid ;-)
[22:15] <loicd> fghaas: impressive !
[22:15] <fghaas> tnt: I'm doing three of them, and I'm flying 26 hours, so you can too :)
[22:16] <fghaas> well, on the down side, lca is already sold out, so I'm afraid the rebooking option is out :(
[22:16] * aliguori_ (~anthony@ has joined #ceph
[22:16] <loicd> ahhh, too baaaaad.... I would have liked to ...
[22:16] * loicd hypocrit
[22:16] <tnt> flights to australia aren't cheap unfortunately and since I'd probably have to pay for it myself, it kind of matters :(
[22:17] * aliguori (~anthony@ Quit (Read error: Connection reset by peer)
[22:17] <fghaas> there will be recordings.
[22:17] <tnt> it's just not the same :p
[22:17] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:17] <loicd> leseb: are you coming to FOSDEM ?
[22:17] <fghaas> sage is doing a talk too, so youtube will be gloriously cephified two weeks from now
[22:17] <fghaas> well, 4 weeks, realistically
[22:18] <leseb> loicd: yes I am :)
[22:18] <leseb> pretty close from the NL :)
[22:18] <leseb> s/from/to
[22:18] <loicd> leseb: how come there is no ceph talk if you're going ? ;-)
[22:19] <leseb> loicd: haha, it's just guys at the office are used to go there every year
[22:20] <loicd> leseb: we should organize an impromptu meeting to discuss ceph while we're there. What do you think ?
[22:20] <phantomcircuit> http://i.imgur.com/XFhqQ.png
[22:20] <leseb> loicd: absolutely :)
[22:20] <phantomcircuit> so i noticed that when committing is true all writes stop
[22:20] <loicd> it's a deal !
[22:20] * loicd plotting
[22:21] * dmick (~dmick@2607:f298:a:607:1a03:73ff:fedd:c856) has joined #ceph
[22:21] <phantomcircuit> is that intentional or is that just a quirk of the performance characteristics of my setup?
[22:21] <leseb> loicd: it is!
[22:22] <tnt> somehow, when first reading that, I thought you were responding to phantomcircuit ...
[22:22] <phantomcircuit> tnt, like my pretty graph
[22:23] <phantomcircuit> i'd include more data but i can't figure out how to get matplotlib to display more than 2 plots
[22:23] <tnt> yup seen it ...
[22:23] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[22:23] * loicd contemplating https://fosdem.org/2013/schedule/roomtracks/ to find a friendly track willing to indulge a last minute gathering
[22:24] * jrisch (~Adium@4505ds2-hi.0.fullrate.dk) has joined #ceph
[22:26] <madkiss> sju
[22:26] <madkiss> err
[22:26] <madkiss> sjust, gregaf: are you guys about? :)
[22:29] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[22:30] * benpol (~benp@garage.reed.edu) has joined #ceph
[22:32] * ircolle (~ircolle@ has joined #ceph
[22:34] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:37] * gaveen (~gaveen@ has joined #ceph
[22:37] <dmick> jksM: thanks for the log; I'll see if I can get help analyzing
[22:37] <madkiss> i'll take that as a no :(
[22:39] <dmick> madkiss: what's up?
[22:41] <madkiss> i was just wondering about an idea i was confronted with. To reduce random i/o in systems with lots of OSDs per machine, a customer had the idea of turning two OSDs into a RAID0 and then use these within Ceph, in order to reduce random I/O to the journals and OSDs if a whole machine goes down
[22:41] <madkiss> given that we have seen in our tests that there is a big difference between e.g. 6 OSDs failing and 14 OSDs failing at a time, I am tempted to assume that this makes sense, but wanted to make sure
[22:42] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[22:43] * Oliver1 (~oliver1@ip-178-203-175-61.unitymediagroup.de) Quit (Quit: Leaving.)
[22:44] <iggy> madkiss: the drawback there is if one of those drives dies, you have to rebuild 2x the drive size
[22:44] <madkiss> obviously, yeah
[22:45] <buck> autobuild-ceph question: some builds have --enable-cephfs-java and --with-debug turned on so as to trigger the hadoop tests to be built. I'm looking to turn this on for more builds. Other than the compile taking slightly longer, does anyone know of a reason to not build the java bindings on some platforms?
[22:45] <dmick> madkiss: and you double your probability of failure
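
A rough sketch of the trade-off iggy and dmick describe above. It assumes each drive fails independently with the same probability over some period; that is a simplification, and the numbers used are purely illustrative, not measurements. The function names are hypothetical.

```python
def raid0_failure_probability(p_disk: float, n_disks: int = 2) -> float:
    """Probability that a RAID0 set is lost, assuming independent drive failures.

    A stripe set is lost as soon as any one member fails, so it survives
    only if every member survives: P(loss) = 1 - (1 - p)^n.
    """
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be a probability between 0 and 1")
    return 1.0 - (1.0 - p_disk) ** n_disks


def data_to_rebuild_tb(drive_size_tb: float, n_disks: int = 2) -> float:
    """Capacity to re-replicate when one member dies: the whole stripe set."""
    return drive_size_tb * n_disks


if __name__ == "__main__":
    p = 0.05  # illustrative per-drive failure probability, an assumption
    print("single drive:", p)
    print("2-drive RAID0:", raid0_failure_probability(p))
    print("TB to rebuild for 3 TB drives:", data_to_rebuild_tb(3.0))
```

For small p the result is close to 2p, which matches "you double your probability of failure", and the rebuild size matches "2x the drive size".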
[22:45] <iggy> buck: it has more dependencies (that may not be available everywhere)
[22:46] * tony (~tony@60-241-229-106.static.tpgi.com.au) Quit (Quit: Colloquy for iPad - http://colloquy.mobi)
[22:46] <iggy> someone was dealing with java stuff in build recently... there was something weird
[22:47] <buck> iggy: fair point. I was looking to turn it on for deb and deb-native builds. Seems like those platforms should have the requisite dependencies.....hmmm.....
[22:48] <iggy> x86{,_64} should
[22:48] <madkiss> well, i will call it a day, thanks and good night
[22:53] * ebo^ (~ebo@ has joined #ceph
[22:55] <loicd> out of curiosity, did someone already submit a talk about ceph to http://www.openstack.org/summit/portland-2013/ ?
[22:56] * ScOut3R (~ScOut3R@5400A5AF.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[22:58] <sstan> what do you think about exporting RBDs with iSCSI with a high availability resource agent. Then, make servers iSCSI boot from there. That way, the cluster would be totally independent from the hardware
[22:58] <sstan> as long as at least one replica of each object is usable
[22:59] <ebo^> phantomcircuit, assuming you use "subplot"
[22:59] <ebo^> subplot(131) gives you 3 plots
[22:59] * rturk-away is now known as rturk
[22:59] <ebo^> subplot(132) being the second
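
A small sketch of how the three-digit shorthand ebo^ mentions can be read, assuming the matplotlib/MATLAB-style convention where the digits mean rows, columns, and plot index. The helper name decode_subplot_code is hypothetical and is not part of any plotting library.

```python
from typing import Tuple


def decode_subplot_code(code: int) -> Tuple[int, int, int]:
    """Split a three-digit subplot code into (rows, columns, index).

    131 means a grid of 1 row and 3 columns, selecting plot number 1;
    132 selects the second plot in that same grid.
    """
    if not 111 <= code <= 999:
        raise ValueError("expected a three-digit code such as 131")
    nrows, rest = divmod(code, 100)
    ncols, index = divmod(rest, 10)
    if ncols == 0 or index == 0 or index > nrows * ncols:
        raise ValueError("index must be between 1 and rows * columns")
    return nrows, ncols, index


if __name__ == "__main__":
    for code in (131, 132, 133):
        print(code, decode_subplot_code(code))
```

The index counts from 1, left to right, so a 1x3 grid accepts 131, 132 and 133 only.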
[23:00] <dmick> sstan: well sorta. You don't avoid needing the cluster to be healthy, but you do avoid failures of the 'gateway' machine(s)
[23:01] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[23:03] <sstan> when a node fails, some robot could simply replace it with another one that would boot from the iSCSI pool
[23:03] <dmick> when you say "node", you mean "an iSCSI proxy"?
[23:04] <sstan> by server/node I mean the physical machine (like a HP bladeserver)
[23:05] <dmick> yes, but what role is it serving
[23:05] <sstan> each server/node/machine would be configured to boot from some iSCSI location
[23:05] <dmick> oh, you're not talking VMs. I see.
[23:05] <sstan> no exactly :)
[23:05] <sstan> one could run VMs on top of that of course
[23:06] <dmick> yeah. I assume people do this with iSCSI and FC already, but this is a way to have Ceph be the backing store, sure
[23:06] <ebo^> funny things: i had a cephfs mounted via fuse for some time and my users started to miss some files. remounting made them reappear. they were always accessible from other hosts.
[23:06] <sstan> yeah that can be done with FC and iSCSI msa, etc
[23:08] <sstan> ebo, files were accessible and missing respectively on different hosts at the same time?
[23:08] * Cube1 (~Cube@ Quit (Quit: Leaving.)
[23:08] <ebo^> yes
[23:08] <dmick> sstan: you might be interested in http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/12140
[23:08] * Lennie`away is now known as leen
[23:08] <ebo^> the host with missing files used fuse, the others use kernel 3.7
[23:08] * leen is now known as Lennie
[23:09] <sstan> thanks, dmick !
[23:10] <buck> Could someone sanity check 2 1-line changes to build scripts? autobuild-ceph branch wip-buck-more-java
[23:10] * Cube1 (~Cube@ has joined #ceph
[23:10] <buck> we're turning on java bindings and the associated tests for more builds
[23:12] * Lennie is now known as Lennie`away
[23:17] * ebo^ (~ebo@ Quit (Quit: Verlassend)
[23:20] <sstan> dmick: what's funny with such a cluster is that OSD daemons and operating systems are storing themselves.
[23:20] * Ryan_Lane (~Adium@ has joined #ceph
[23:22] <sstan> Consider a user that accesses an object that is stored/replicated on the machine itself. Would there be I/O on the network in that case?
[23:23] <dmick> I now understand you mean the servers that are running the Ceph daemons? booting from the cluster?
[23:23] <dmick> that's particularly ill if so
[23:23] <sstan> exactly
[23:24] <sstan> is it good or bad lol
[23:24] <dmick> I....don't know
[23:24] <dmick> My initial reaction is terror but it may be unfounded
[23:24] <janos> sounds like circular dependency
[23:24] <sstan> it's like inception
[23:24] <sstan> yes it is
[23:25] <janos> usually not good, but i kinda like it!
[23:25] <sstan> but it's not a problem as long as at least one replica is online
[23:25] <janos> yeah
[23:25] <janos> haha
[23:25] <sstan> what's good about that is : 1) you can change each server one at a time
[23:25] <janos> we should not talk about this. it might jinx it
[23:25] <sstan> like an organism ... where each cell is replaced
[23:26] <sstan> but the organism itself keeps surviving
[23:26] <sstan> janos: I don't speak English. What does that mean ?
[23:26] <janos> if this sort of "ecosystem" can be kept stable and alive pretty much forever, i'd say that's really damn cool and quite a feat
[23:26] <sstan> yeah and consider this
[23:26] <janos> "jinx" is sort of like - spoil or curse
[23:27] <sstan> hah
[23:27] <sstan> why?
[23:27] <janos> trigger a problem
[23:27] <sstan> in the future ?
[23:27] <janos> yeah
[23:27] <janos> like the phrase "speak of the devil"
[23:27] <dmick> sstan: superstition
[23:27] <janos> say it and it happens
[23:27] <janos> nod
[23:27] <janos> Murphy's Law plays in there somewhere too
[23:28] <sstan> ah so if we tell ourselves that it could fail ... it actually will
[23:28] <janos> right
[23:28] <janos> sstan: your english is better than many native speakers ;)
[23:30] <sstan> haha thank you! I "speak it" but I don't know expressions and words that have not evident meanings like ... "pull of" "make it" etc.
[23:31] <janos> i'm still informing my own wife os sayings ;)
[23:31] <janos> that takes time no matter what
[23:31] <janos> os/of
[23:31] <sstan> the best about such an ecosystem is that it can grow very easily. When an extra node appears ... it copies keyrings, installs OSDs and voila!
[23:32] <sstan> then starts runnin' virtual machines
[23:32] <sstan> ... that are stored on the same object store
[23:32] <dmick> I encourage you to set up such a system and let us know how it works :)
[23:33] <janos> as time permits!
[23:33] <sstan> will do!
[23:33] <sstan> do you think there is use for a text-based GUI for ceph?
[23:34] <xmltok> are the ceph cookbooks ideal for setting up an entire cluster? it looks like it may need a cluster running to be useful
[23:34] <janos> sstan: i'm sure some people would find it useful
[23:34] <dmick> sstan: certainly more config/monitoring work is needed
[23:34] <dmick> we're thinking REST, but there's no reason the client for such things couldn't be text-based
[23:35] <janos> sstan: trying to make tools like that helps flush out deficiencies in the existing API
[23:35] <dmick> xmltok: they're intended to be
[23:35] <dmick> what do you see that seems to require a preexisting cluster?
[23:36] <xmltok> well i have 4 nodes, im working on the first now, its going to run everything. the cookbook is failing now trying to determine the state of the monitor
[23:36] <xmltok> creating a client.admin keyring
[23:36] <dmick> so it should have installed, configured, and started a mon by now
[23:37] <xmltok> must be my recipe order
[23:37] <sstan> dmick : I was thinking about ncurses style tool
[23:37] <dmick> sstan: sure; what I mean is, you'll want interfaces to sit on to do that (contacting remote machines etc.) but the presentation layer certainly could be ncurses
[23:38] * jrisch (~Adium@4505ds2-hi.0.fullrate.dk) Quit (Ping timeout: 480 seconds)
[23:39] <xmltok> not sure its it, i have ceph::apt, then ceph::mon, ceph::mds, ceph::osd, ceph::radosgw, and its still giving me the same issues
[23:40] <dec> what's the best way to officially report a performance regression in 0.56.1?
[23:40] <sstan> xmltok: firewall issues ?
[23:40] <xmltok> nah, no firewalls
[23:40] <xmltok> i did have the 5 minute install going on this machine before but i killed all of the processes, wiped the disks with ceph-disk-prepare, and nuked /etc/ceph
[23:41] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:41] <sstan> what does ceph-disk-prepare do? It'
[23:41] <sstan> it's undocumented. Why just not format it && mkfs.xfs
[23:42] <sstan> && mount /dev/xyz /var/lib/ceph/....
[23:42] <xmltok> mkfs's and i believe make some kind of file structure. Its in the chef doc, http://ceph.com/docs/master/rados/deployment/chef/
[23:43] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[23:44] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:46] * ircolle (~ircolle@ Quit (Quit: Leaving.)
[23:47] <dmick> dec: file a tracker issue
[23:47] <dmick> and :(
[23:48] <xmltok> looks like the cookbook uses ceph-disk-prepare when its is_crowbar anyway. im going to trigger rebuilds of these nodes and start fresh
[23:48] <dmick> xmltok: can you see it trying and failing to start the mon? I'm not an expert on exactly how those recipes interact but I know the mon should be up
[23:49] <dmick> if it tries and fails, the mon logs might have a clue
[23:50] <xmltok> i can see it created the mon keyring and it runs a chef service called ceph-mon-all-starter but nothing useful in the logs
[23:51] <dec> dmick: in 'Ceph' as a bug? feature? in 'performance' ?
[23:51] <xmltok> i didnt clean out /var/lib/ceph so i imagine its left over junk from my test earlier
[23:51] <dmick> Ceph as a bug; if you can mark performance that's fine (category, I assume?...)
[23:52] <dec> dmick: thanks.
[23:52] <dmick> that's an upstart service I think
[23:52] * rturk (~rturk@ds2390.dreamservers.com) Quit (Quit: Coyote finally caught me)
[23:52] <dmick> xmltok: is this Ubuntu?
[23:52] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[23:53] * rturk-away is now known as rturk
[23:53] <xmltok> yeah
[23:53] <dmick> what Ceph version?
[23:54] <xmltok> 0.56.1
[23:54] <dmick> hum. so did ceph-mon-all-starter get successfully enabled? and is ceph-mon running?
[23:54] <xmltok> no ceph related processes, let me nuke everything, include /var/lib/ceph, and have the cookbook try again
[23:55] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[23:55] <dmick> sure
[23:57] <xmltok> ok, so no ceph processes running and the only thing in the log is the creation of monfs
[23:57] <xmltok> now, my environment has the fqdn for my mons, not the hostname -s, is that my problem?
[23:58] <dmick> when I say "log" I mean stuff in /var/log/ceph, like the mon log there
[23:58] <dmick> is that what you're looking at?
[23:58] <xmltok> yeah
[23:59] * leseb (~leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:59] <dmick> so mon didn't log anything about its shutdown?
[23:59] * dosaboy (~gizmo@host86-164-229-186.range86-164.btcentralplus.com) has joined #ceph
[23:59] * sander (~chatzilla@c-174-62-162-253.hsd1.ct.comcast.net) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.