#ceph IRC Log


IRC Log for 2011-12-02

Timestamps are in GMT/BST.

[0:38] <nwatkins> sjust: i was sorta following your convo with buck, and i'm getting ready to deploy the latest master out to a bunch of nodes tonight. is there something i should wait on? it sounded like the stock config has problems?
[0:39] <sjust> nwatkins: I'm still looking into it, I haven't seen that happen anywhere else
[0:40] <sjust> if you check the generated osdmap, you should be able to tell immediately whether it happened
[0:40] <sjust> that is, pg_num should be appropriate for all osds
[0:40] <sjust> *****for all pools
[0:42] <nwatkins> sjust: maybe a dumb question---what does an inappropriate pg_num look like?
[0:42] <sjust> sorry, should have been more clear. In this case, pg_num was 0
[0:42] <sjust> pg_num is the number of pgs for that pool
[0:42] <sjust> so around 100 per osd or so
[0:43] <sjust> actually, hang on
[0:43] <nwatkins> sjust: ok
[0:45] <sjust> ah, it seems to happen when the only osd is osd.0
[0:45] <nwatkins> that specific name, or a single osd?
[0:45] <sjust> single osd, probably, tracking down the bug now
[0:45] <nwatkins> sjust: cool, what an edge case!
[0:46] <sjust> nwatkins: not exactly :P
[0:47] <nwatkins> sjust: i gotta run. thanks for the info -- i'll let ya know if i see anything weird related to this stuff.
[0:47] <sjust> sure
[0:47] <sjust> heh, it really does have to be osd.0 (probably max_osds related)
[0:47] <nwatkins> woah!
[0:48] <nwatkins> the whole thing confused me because ./vstart worked fine.
[0:59] * verwilst (~verwilst@dD57671DB.access.telenet.be) Quit (Quit: Ex-Chat)
[1:22] * elder (~elder@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[1:26] * adjohn (~adjohn@ has joined #ceph
[1:32] <sjust> buck: 75db1362bff96471480186afb35483305387ea5d in next should fix your problem
[1:32] * monrad (~mmk@domitian.tdx.dk) has joined #ceph
[1:33] <sjust> buck: or rather it would prevent the issue on a new cluster, you'll still need to adjust pg_num directly on the one from earlier
[1:34] * df_ (davidf@dog.thdo.woaf.net) Quit (Remote host closed the connection)
[1:34] * df_ (davidf@dog.thdo.woaf.net) has joined #ceph
[1:34] * _Shiva__ (shiva@whatcha.looking.at) Quit (Remote host closed the connection)
[1:35] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (Ping timeout: 480 seconds)
[1:39] * _Shiva_ (shiva@whatcha.looking.at) has joined #ceph
[1:44] <sagewk> just did first successful backfill recovery, whee. cleanup time
[1:50] * adjohn (~adjohn@ Quit (Quit: adjohn)
[1:53] * adjohn (~adjohn@ has joined #ceph
[1:55] * adjohn (~adjohn@ Quit ()
[2:07] * adjohn (~adjohn@ has joined #ceph
[2:07] * adjohn (~adjohn@ Quit ()
[2:21] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:22] * adjohn (~adjohn@ has joined #ceph
[2:22] * adjohn (~adjohn@ Quit ()
[2:23] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[2:28] * adjohn (~adjohn@ has joined #ceph
[2:28] * adjohn (~adjohn@ Quit ()
[2:38] * adjohn (~adjohn@ has joined #ceph
[2:50] * adjohn is now known as Guest18993
[2:50] * Guest18993 (~adjohn@ Quit (Read error: Connection reset by peer)
[2:50] * adjohn (~adjohn@ has joined #ceph
[3:28] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[3:39] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:44] * df_ (davidf@dog.thdo.woaf.net) Quit (Remote host closed the connection)
[3:52] * df_ (davidf@dog.thdo.woaf.net) has joined #ceph
[3:58] * adjohn (~adjohn@ Quit (Quit: adjohn)
[4:04] * NightDog (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[4:15] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:32] * aa (~aa@r186-52-225-140.dialup.adsl.anteldata.net.uy) has joined #ceph
[4:33] * grape (~grape@ has joined #ceph
[5:03] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[5:05] * ghaskins__ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[5:06] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Read error: Operation timed out)
[5:54] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[5:54] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[5:59] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit ()
[6:24] * bchrisman1 (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[6:28] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:42] * MarkDude (~MT@ has joined #ceph
[7:54] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[8:14] * MarkDude (~MT@ Quit (Ping timeout: 480 seconds)
[8:19] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[8:21] * bchrisman1 (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[8:34] * aneesh (~aneesh@ Quit (Remote host closed the connection)
[8:58] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[9:17] * verwilst (~verwilst@dD576F10D.access.telenet.be) has joined #ceph
[9:19] * peritus (~andreas@h-150-131.a163.priv.bahnhof.se) Quit (Ping timeout: 480 seconds)
[9:24] * aneesh (~aneesh@ has joined #ceph
[9:26] * fronlius (~fronlius@e176056235.adsl.alicedsl.de) has joined #ceph
[9:27] * peritus (~andreas@h-150-131.a163.priv.bahnhof.se) has joined #ceph
[10:01] * pmjdebruijn (~pascal@overlord.pcode.nl) Quit (Quit: leaving)
[10:04] * Meths_ (rift@ has joined #ceph
[10:09] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[10:21] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[10:31] <chaos_> sagewk, I've written my own variation of your hack
[10:31] <chaos_> http://wklej.org/id/639157/ this works
[10:31] <chaos_> mds is running
[10:31] <chaos_> your if was too deep into the loop
[12:08] * NightDog__ (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[12:08] * NightDog (~karl@52.84-48-58.nextgentel.com) Quit (Read error: Connection reset by peer)
[13:13] * constructor (~construct@09GAAAUE4.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:16] * donut (~construct@9YYAACZOX.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:19] * fronlius_ (~fronlius@e176052065.adsl.alicedsl.de) has joined #ceph
[13:21] * constructor (~construct@09GAAAUE4.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[13:24] * fronlius (~fronlius@e176056235.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[13:24] * fronlius_ is now known as fronlius
[15:02] * NightDog__ (~karl@52.84-48-58.nextgentel.com) Quit (Read error: Connection reset by peer)
[15:03] * NightDog__ (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[15:57] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[16:08] * fronlius (~fronlius@e176052065.adsl.alicedsl.de) Quit (Quit: fronlius)
[16:24] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:25] * andresambrois (~aa@r186-52-228-126.dialup.adsl.anteldata.net.uy) has joined #ceph
[16:29] * aa (~aa@r186-52-225-140.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[16:31] * The_Bishop (~bishop@port-92-206-183-175.dynamic.qsc.de) has joined #ceph
[17:01] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[17:50] * NightDog__ (~karl@52.84-48-58.nextgentel.com) Quit (Quit: Leaving)
[17:58] <sagewk> chaos_: great news. i would definitely get your data out and wipe your cluster now, though, as there may be subtle metadata inconsistencies introduced because we skipped over part of the mds journal
[18:03] * donut (~construct@9YYAACZOX.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[18:03] * donut (~construct@exit-01d.noisetor.net) has joined #ceph
[18:04] <chaos_> sagewk, yea data is backed up, I'll wipe the cluster in a few days, there are too many hacks there ;p
[18:04] <chaos_> btw. sagewk when 0.39 is coming out?
[18:04] <sagewk> today
[18:05] <chaos_> and 0.40? do you have any schedule?
[18:05] <failbaitr> shiny :)
[18:06] <sagewk> 3 weeks
[18:07] <sagewk> you can see the release schedule on the tracker roadmap page, http://tracker.newdream.net/projects/ceph/roadmap
[18:07] <chaos_> thanks
[18:08] <sagewk> i branch on that date, and release toward the end of the next week after we shake out any bugs in the new code
[18:08] <chaos_> I'm asking because the fixes for my bugs are marked for 0.40, i'll apply them manually
[18:09] <nhm> sagewk: how many people do you have working on those bugs for 0.40?
[18:10] <sagewk> which bugs?
[18:10] <sagewk> most everything fixed in the last week should be applied to the stable release (the tracker is somewhat misleading there)
[18:11] <chaos_> http://tracker.newdream.net/issues/1756
[18:11] <chaos_> this for example
[18:11] <sagewk> that will be in 0.39
[18:11] <nhm> sagewk: yeah, I think I was confused. I thought everything on the tracker page needed to be fixed for 0.40 in 16 days. ;)
[18:12] <sagewk> they're there so we see them when we look at the task list. it's roughly a priority list (when you click on the v0.40 sprint link on the right)
[18:17] * fronlius (~fronlius@e176052065.adsl.alicedsl.de) has joined #ceph
[18:21] * andresambrois (~aa@r186-52-228-126.dialup.adsl.anteldata.net.uy) Quit (Remote host closed the connection)
[18:33] * bchrisman (~Adium@ has joined #ceph
[18:53] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:54] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[19:10] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:12] * donut (~construct@83TAABTFZ.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[19:57] * jantje (~jan@paranoid.nl) has joined #ceph
[20:03] * jantje_ (~jan@paranoid.nl) Quit (Ping timeout: 480 seconds)
[20:03] * Meths_ is now known as Meths
[20:46] * NightDog (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[21:11] * alexxy (~alexxy@ Quit (Ping timeout: 480 seconds)
[21:12] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[21:14] <cp> what's the current way to set where the logs should go?
[21:15] <joshd> cp: --log-to-stderr (on master it's a bool, on 0.38 I think you need --log-to-stderr=2)
[21:15] <cp> I was thinking more like log dir = /var/log/cephlogging
[21:15] <cp> in the ceph.conf file
[21:15] <joshd> ah, I don't think that changed recently
[21:15] <cp> I'm reading this but not sure if it's up to date
[21:15] <cp> http://ceph.newdream.net/wiki/Cluster_configuration#log_dir
[21:17] * ziazu (~andrea@host172-122-dynamic.47-79-r.retail.telecomitalia.it) has joined #ceph
[21:19] * ziazu (~andrea@host172-122-dynamic.47-79-r.retail.telecomitalia.it) Quit (Read error: Connection reset by peer)
[21:20] <joshd> cp: now I remember - log_dir was removed a while ago to try to make things less confusing
[21:20] <cp> so it's all log_file now?
[21:20] <cp> with the normal $id stuff?
[21:20] <joshd> yeah
[21:21] <cp> thanks josh
[21:21] <joshd> you're welcome
[21:24] <cp> another quick question - if things like "ceph osd create zzz" are working but "rbd xxx" are hanging - any suggestions for how to go about debugging?
[21:26] <joshd> usually that means some pgs are unavailable for some reason - 'ceph pg dump' should show them all active+clean
[21:27] <cp> some of the pgs say "creating"
[21:28] <joshd> in that output there's a column for the acting osds - check the logs for the first acting one (the primary) for the creating pgs
[21:30] <Tv> "Suggested packages: ... fortran-compiler"..
[21:30] <Tv> i don't think i've had fortran suggested for me in a couple of decades ;)
[21:31] <nhm> Tv: it's still a way of life around here. :)
[21:31] <nhm> Though I don't do much personally.
[21:32] <Tv> that just comes from something like automake
[21:44] * alexxy (~alexxy@ has joined #ceph
[21:46] <cp> hmmm... seems somewhere in the crush map code someone is using a signed 32bit int and not checking for overflows.....
[21:46] <cp> (for the weights)
[21:47] <Tv> !#$!@$! blkid won't show me the guid of a partition unless the filesystem contained can be detected to be one of the supported ones
[21:47] <Tv> those concepts are on totally separate levels
[21:48] <Tv> and a ceph osd journal just won't look like an ext4 :(
[21:50] <joshd> cp: did your crushmap get weird weights causing an osd to run out of space?
[21:51] <cp> joshd: No, I got negative numbers in the crush map due to an integer overflow somewhere in the crush map code
[21:51] <cp> which I suspect is causing the pg code to fail
[21:52] <joshd> how did you generate the crushmap? with mkcephfs? did you change it at all after generating it?
[21:53] <cp> It's a custom crush map with some large numbers for the weights
[21:54] <darkfader> Tv: blkid is the thing used by multipath device helpers right?
[21:54] <darkfader> oh wait, ah you're using it to read EFI headers?
[21:54] <Tv> darkfader: it's used by lots of things.. i just want a way to query a gpt partition table without writing it from scratch
[21:55] <Tv> darkfader: i'm thinking of making ceph osd journals get a new gpt GUID type, etc
[21:55] <darkfader> you're on the edge of abusing it :)
[21:55] <darkfader> got you
[21:55] <darkfader> did you try to read it via parted?
[21:55] <Tv> blkid is *built* to be able to respond to queries like "list all partitions with foo=bar"
[21:55] <Tv> except it seems to assume it'll always understand the partition contents
[21:56] <gregaf> cp: oh, the weights should all be values between 0 and 1
[21:56] <Tv> darkfader: i can make it work as long as i mkfs.ext4 the partition :(
[21:56] <darkfader> i agree it's misreacting. but i would love it even more if it really just concerned itself with block devices, not partition stuff
[21:56] <cp> oooooh
[21:56] * josef (~seven@nat-pool-rdu.redhat.com) has joined #ceph
[21:56] <cp> gregaf: that's good to know
[21:56] <darkfader> meh thats useless i agree
[21:56] <gregaf> unless I'm confused and the legal values are different depending on which system you're in
[21:56] <Tv> darkfader: libparted has almost no api for gpt stuff that i can see
[21:56] <darkfader> ah ok
[21:56] <josef> sage: i sent you a patch, i think i finally got that stupid orphan warning nailed down
[21:57] <darkfader> if i find something that helps you i'll let you know
[21:57] <Tv> darkfader: as in, i could get the start/end of the partition.. but i want the metadata
[21:57] <darkfader> i see
[21:57] <gregaf> cp: I guess it might be cool if you could just set the disk size as the weight on each one or something, but right now it's just a multiplier that gets used somewhere IIRC
[21:58] <darkfader> Tv: honestly, the way most of these utilities suck, did you consider just using dd and bc?
[21:59] <cp> gregaf: that's exactly what I was doing
[21:59] <darkfader> thats what i did on hp-ux, although not on EFI labeled disks
[21:59] <darkfader> (gpt)
[21:59] <joshd> gregaf, cp: I think you can set it as anything in the user tools, and it gets normalized
[21:59] <cp> gregaf: but I can make them fractions. Do they need to add up to 1 or something? Got pointers to the relevant code?
[21:59] <cp> joshd: got a pointer to where the normalization happens?
[21:59] <gregaf> oh, maybe I am mistaken about the levels...
[22:02] <joshd> maybe I'm wrong about that, I can't see it in the code
[22:02] <cp> joshd,gregaf: OK, the crush map looks good again, but I still have pgs stuck in 'creating'...
[22:03] <cp> http://ceph.newdream.net/wiki/Custom_data_placement_with_CRUSH
[22:03] <cp> has an example with weights like 4.0
[22:09] <joshd> sagewk: does the crushtool normalize the weights for you?
[22:10] <sagewk> joshd: no
[22:11] <NaioN> I've something weird, before (0.38) I saw with a ls -lah on a cephfs the size of the contents of a directory
[22:11] <sagewk> they should stay around 1.0
[22:11] <NaioN> now it takes a long time to update that
[22:11] <sagewk> on that order of magnitude
[22:11] <sagewk> for dho it's the size of osd, in TB
[22:11] <NaioN> and the ceph -w stops outputting after some time
[22:12] <joshd> sagewk: this came up because cp got a negative weight after adding large weights
[22:15] <sagewk> joshd, cp: do you have the commands/weights that led to badness?
[22:21] <cp> sagewk: weight 11446374.0 becomes weight -22426.000
[22:23] <cp> wait, that's not quite right
[22:23] <cp> 43110 became -22426
[22:24] <cp> ah, no I was right the first time
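Note on the numbers above: cp's two figures agree with each other if the weight is held as 16.16 fixed point in a signed 32-bit integer, which is the overflow cp suspects at [21:46]. The sketch below is only an illustration under that assumption and is not taken from the Ceph source; the names FIXED_ONE, to_int32 and roundtrip_weight are hypothetical.

```python
# Assumption: weights are stored as 16.16 fixed point (0x10000 == 1.0)
# in a signed 32-bit integer. All names here are illustrative only.
FIXED_ONE = 0x10000


def to_int32(value):
    """Wrap an integer to signed 32 bits, the way a C int would hold it."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def roundtrip_weight(weight):
    """Encode a weight as signed 32-bit 16.16 fixed point, then decode it."""
    fixed = to_int32(int(weight * FIXED_ONE))
    return fixed / FIXED_ONE


print(roundtrip_weight(4.0))         # a weight on the order of 1.0 survives
print(roundtrip_weight(11446374.0))  # a very large weight wraps around
print(11446374 % 65536)              # the low 16 bits of the integer part
```

Under this assumption only the low 16 bits of the integer part survive the encoding. For 11446374 those bits are 43110, which reads as -22426 once the sign bit is set, so both "11446374.0 becomes -22426.000" and "43110 became -22426" describe the same wraparound. It also fits sagewk's advice to keep weights around 1.0.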
[22:42] * MK_FG (~MK_FG@ Quit (Ping timeout: 480 seconds)
[22:43] <sagewk> cp: how were you setting the weight? can you send me a copy of the map you were setting it on?
[22:50] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[22:57] <Tv> darkfader: i was hoping not to need to write my own gpt parser.. :-/
[22:57] <Tv> darkfader: blkid is *so* close
[23:07] <todin> what happened to the wiki, why does it have problems so often?
[23:26] <Tv> todin: It's a poorly-maintained mysql instance :( and it's been a low-priority problem lately. The outages seem to be more about not being able to handle spikes than actual outages; it comes back in a few seconds.
[23:29] <todin> Tv: ok, if you need a helping hand for the mysql just shout
[23:32] <cp> sagewk: I've just switched to normalizing everything now.
[23:34] <cp> quick question: 'ceph osd dump' shows "crush_ruleset 0" but I only have rule 1 defined
[23:34] <cp> and setting them with "ceph osd pool set data crush_ruleset 0" complains if I use any number other than 0
[23:34] <cp> Does rule 1 --> crush_ruleset 0 ?
[23:38] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit (Quit: cp)
[23:43] <sagewk> cp: the rule has a property called ruleset that needs to match.. the rule id itself can be anything (and there can be many rules in that set)
[23:44] <sagewk> tv, todin: jordan's moving the site and mysql to a different (non-loaded) vps host.

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.