#ceph IRC Log

IRC Log for 2013-07-19

Timestamps are in GMT/BST.

[0:00] <sagewk> you should see a message about the scrub error that marked it inconsistent. if you grep for inconsistent and then look backwards in the log from there you should see the message about scrub
[0:00] <sagewk> or ceph health detail | grep inconsistent and then grep for that pgid in the log
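(A rough sketch of that triage as shell commands, assuming the default cluster log path on the monitor and using 6.17f, the pg that turns up below, as the pgid:)
    ceph health detail | grep inconsistent     # find the inconsistent pgid(s)
    grep 6.17f /var/log/ceph/ceph.log          # look backwards from there for the scrub error message
    ceph pg scrub 6.17f                        # re-run a shallow scrub
    ceph pg deep-scrub 6.17f                   # or a deep scrub, which is what flagged the error here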
[0:00] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Remote host closed the connection)
[0:02] <lyncos> sagewk .. humm I cannot find that id in the mon log
[0:02] <sagewk> weird. well, you can scrub it again with 'ceph pg scrub <pgid>'
[0:02] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[0:02] <lyncos> ok doing it
[0:02] <lyncos> 2013-07-18 22:02:42.563262 osd.1 [INF] 6.17f scrub ok ( 1 remaining deep scrub error(s) )
[0:03] <sagewk> weird. well, you can scrub it again with 'ceph pg deep-scrub <pgid>'
[0:03] <sagewk> er, 'ceph pg deep-scrub <pgid>'
[0:03] <lyncos> 2013-07-18 22:02:42.563262 osd.1 [INF] 6.17f scrub ok ( 1 remaining deep scrub error(s) )
[0:04] <lyncos> I think it takes time right ?
[0:04] <lyncos> Got it
[0:05] <lyncos> http://pastebin.com/gS9ducF1
[0:06] <lyncos> and: 2013-07-18 22:04:30.927624 7f164cf8e700 1 heartbeat_map reset_timeout 'OSD::disk_tp thread 0x7f164cf8e700' had timed out after 60 on the OSD
[0:06] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:06] <sagewk> sounds like maybe a bad disk? look in dmesg on that host
[0:07] <sagewk> that usually means we got EIO back from the fs
[0:07] <lyncos> man you are right
[0:07] <lyncos> wow
[0:07] <sagewk> yay scrub!
[0:07] <lyncos> what should I do to fix.. just remove that OSD and replace it ?
[0:07] <sagewk> yeah
[0:08] <sagewk> stop the daemon, mark it out (ceph osd out osd.NNN), let the cluster heal, and if all is well then throw the disk away
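(A minimal sketch of that replacement sequence, assuming the bad disk backs osd.12 and sysvinit-style init scripts; the removal steps at the end are the usual follow-up once the cluster is healthy again:)
    service ceph stop osd.12        # stop the daemon
    ceph osd out osd.12             # mark it out so its data re-replicates elsewhere
    # ... wait for the cluster to return to active+clean ...
    ceph osd crush remove osd.12    # then remove it from crush,
    ceph auth del osd.12            # drop its key,
    ceph osd rm 12                  # and delete the osd id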
[0:08] <lyncos> ok thanks ! will have to check what is the problem.. it's a raid array
[0:08] * madkiss1 (~madkiss@2001:6f8:12c3:f00f:c498:42b3:5ccd:5aae) Quit (Remote host closed the connection)
[0:08] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[0:08] <lyncos> thanks for your help :-)
[0:08] <sagewk> np
[0:09] <gregaf> wait, what? disk thread pool timeout means EIO?
[0:10] <paravoid> speaking of that, I noticed that when I out'ed two boxes, the data didn't spread out evenly at all
[0:10] <paravoid> so I had a few osds with > 85% toofull
[0:11] <sagewk> the scrub error does
[0:11] <paravoid> if I remove them from crush altogether will they balance out better?
[0:11] <sagewk> the op_tp timeout is probably the fs wedging writes because of the same underlying disk issue
[0:11] <gregaf> yeah
[0:11] * sleinen1 (~Adium@2001:620:0:26:1475:b1d8:524b:b9d4) has joined #ceph
[0:12] <gregaf> paravoid: often, though if using the optimal tunables I think it's less of a big deal
[0:12] <paravoid> I do use the post-bobtail tunable
[0:13] <paravoid> *tunables
[0:16] <gregaf> have to ask sagewk then
[0:18] <cjh_> i just saw the changelog for 61.5. Was there a param to turn on leveldb cache in versions prior to 61.5?
[0:19] * jmlowe (~Adium@c-50-165-167-63.hsd1.in.comcast.net) has joined #ceph
[0:19] <gregaf> cjh_: yeah, "mon leveldb cache size", .61.5 just changed it from 0 to 512
[0:19] <cjh_> cool :)
[0:20] <cjh_> i didn't realize that should've been on for such a large perf boost :)
[0:20] <gregaf> neither did we ;)
[0:20] <cjh_> lol
[0:20] <cjh_> awesome, i'll change that on my cluster now
[0:21] <cjh_> or i could just upgrade and get it for 'free'
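(For reference, a hedged sketch of setting that by hand in ceph.conf on a pre-0.61.5 monitor; the option name is the one gregaf gives, while the byte units and the need to restart the mon are assumptions:)
    [mon]
        mon leveldb cache size = 536870912    # ~512 MB; picked up when ceph-mon is restarted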
[0:21] * Macheske (~Bram@d5152D87C.static.telenet.be) Quit ()
[0:21] <sagewk> sjusthm1: merged the peering perfcounter
[0:21] <sjusthm1> thanks
[0:22] <sagewk> paravoid: would be interested in seeing the crush map at that point where things were imbalanced
[0:22] <sagewk> paravoid: this was when things were back to active+clean, or while they were remapped?
[0:22] <paravoid> I could never get to purely active+clean as I got to backfill/toofull
[0:23] <paravoid> my hierarchy is basically two rows each having two racks each having three servers each having 12 osds
[0:24] <paravoid> oh heh, maybe it was just what it should have done
[0:24] <paravoid> since I try to have at least one copy per row
[0:25] * sleinen1 (~Adium@2001:620:0:26:1475:b1d8:524b:b9d4) Quit (Quit: Leaving.)
[0:25] <paravoid> okay, this might be me just being stupid
[0:25] <paravoid> but in any case, "out" and removing from the crush map completely should have the same effect?
[0:25] <sagewk> right, same thing
[0:26] <gregaf> isn't there still a difference in retry behavior? or is that just gone now?
[0:27] * jmlowe (~Adium@c-50-165-167-63.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[0:27] <paravoid> http://p.defau.lt/?Ri_wncgTIlDW2iqXUfpKAw
[0:27] <paravoid> that's the hierarchy
[0:27] <paravoid> I now have step chooseleaf firstn 0 type rack
[0:27] <paravoid> I thought it was row :)
[0:27] * BillK (~BillK-OFT@203-59-173-44.dyn.iinet.net.au) has joined #ceph
[0:27] <paravoid> sorry about that, I last modified this back in December
[0:28] <sagewk> gregaf: not with the new default tunables
[0:28] <gregaf> cool
[0:28] <paravoid> so if I make this "type row"
[0:28] <gregaf> err, s/default/optimal/, right?
[0:28] <paravoid> I have two rows and three replicas in that pool
[0:28] <sagewk> ...and lots of data will move :)
[0:28] <paravoid> how can I tell it to put the third replica into a different rack?
[0:28] <sagewk> choose 2 row, then chooseleaf 2 rack. and set nrep = 3
[0:29] <sjustlaptop> sagewk: wip-5655
[0:29] <paravoid> step choose firstn 0 type row, chooseleaf firstn 2 type rack?
[0:31] <paravoid> er, step choose firstn 2 type row, step chooseleaf firstn 2 type rack
[0:31] <sagewk> step choose firstn 2 type row, chooseleaf firstn 2 type rack
[0:31] <sagewk> yeah
[0:32] <sagewk> that will put first 2 replicas in one row, different racks, and 1 replica in another row
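(Roughly what that looks like as a crushmap rule, a sketch; the rule name, ruleset number, and the 'default' root are placeholders:)
    rule two_rows_three_replicas {
            ruleset 3
            type replicated
            min_size 3
            max_size 3
            step take default
            step choose firstn 2 type row
            step chooseleaf firstn 2 type rack
            step emit
    }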
[0:32] <paravoid> ok
[0:32] <paravoid> if I just chooseleaf 0 type row, wouldn't it attempt placing the two replicas in the same row apart in the hierarchy automatically?
[0:33] <sagewk> nope. it either gives you what you ask for, or fewer results
[0:33] <sagewk> oh
[0:33] <paravoid> heh, ok :)
[0:33] <sagewk> right. yeah it would end up giving you 2 replicas in 2 rows, and that's it.
[0:34] <paravoid> so the third one could possibly be on the same rack or even the same box?
[0:34] <paravoid> as one of the other two
[0:37] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[0:37] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[0:38] <gregaf> step chooseleaf 0 type row would only give you 2, since it doesn't have >=3 rows to choose from
[0:38] <paravoid> oh
[0:38] <gregaf> the thing sagewk suggested works here but is a pretty dirty hack ;)
[0:38] <paravoid> now I get it
[0:38] <paravoid> okay
[0:38] <paravoid> okay, I think I just need to read up a bit
[0:39] <paravoid> instead of asking silly questions :)
[0:39] <gregaf> where we're choosing two (ie, both) and then choosing two in each, but simply discarding the second rack from the second row because we only wanted three results
[0:40] <gregaf> ie, CRUSH is generating [1.1, 1.2, 2.1, 2.2] and giving the user [1.1, 1.2, 2.1]
[0:40] <gregaf> this doesn't generalize to getting 4 racks in 3 rows because crush would generate [1.1, 1.2, 2.1, 2.2, 3.1, 3.2] and return [1.1, 1.2, 2.1, 2.2]
[0:41] <gregaf> yehuda_hm: you'd mentioned wip-rgw-next would need review at some point; is that up yet?
[0:42] <yehuda_hm> gregaf: you can review what's there, still figuring out some quircks
[0:42] <yehuda_hm> quirks
[0:42] <gregaf> k, I don't need to do it now
[0:44] <yehuda_hm> gregaf: there are a few unrelated issues
[0:44] <sagewk> gregaf: wip-osd-leaks?
[0:44] <sagewk> sjusthm1: wip-5655: need to at least cover has_feature() too.
[0:45] <sjustlaptop> sagewk: oops
[0:45] <sagewk> but i think it'd be safer to sanitize/translate the features before they are put into the Connection. like maybe in set_features
[0:46] <sagewk> hmm, earlier. probably right after they come over the wire, in Pipe.cc accept() and connect()
[0:46] <sagewk> since there are lots of checks in there for requires vs supported features
[0:46] <gregaf> sagewk: leaks looks good, but if you mean refs then can you clean it up?
[0:46] <sagewk> cool. refs is just around for debugging.
[0:46] <gregaf> not sure which of those debug commits without signoffs you wanted to include, and it's clearly based off leaks but no longer shares commits :)
[0:46] <sagewk> none :)
[0:46] <gregaf> k
[0:49] <sh_t> can anyone explain how the dmcrypt functionality in ceph works? there isn't much in the docs. specifically I'm wondering what to do about the keyfiles sitting on the same server that the osds are on and how I can secure those somewhere and only use them when I start up the osds
[0:49] <sh_t> are they meant to just be put in place on the node at startup only?
[0:51] <gregaf> sh_t: right now they're only meant to satisfy the "we need to get rid of drives without physically destroying them" use case, so it expects the keys to be sitting on the system drive or whatever
[0:51] <gregaf> at some point we'll provide a mechanism for storing them on a different server (probably in the monitor key storage), but there hasn't been much driving user interest in making them safe in-place
[0:52] <SvenPHX> when one gets the user access and secret keys for the rados gateway, how does one hash the secret key to pass to the API? I read somewhere that it's needed (I guess to not pass the secret key in the clear), but I'm not sure how to do that. Passing the secret key in the clear I get an access denied.
[0:52] <sh_t> gregaf: understood, thanks
[0:54] <gregaf> if that's something you're interested in you probably have an orchestration framework that can help you by eg mounting the drives yourself and giving them to Ceph as formatted filesystems
[0:54] <sh_t> another question I have. I just have a basic setup going with 3 nodes, all 3 running mon's, and 2 running osd's. the 2 nodes have 2 osds each on them of equal size. do I need to do anything special with regards to pg's to ensure my default "2 replica" redundancy across the nodes?
[0:54] <gregaf> and of course patches are welcome!
[0:54] <gregaf> I don't actually remember what the defaults look like right now, but I think you're good
[0:55] <sh_t> I guess my question is really just.. does ceph make the data redundant based on just individual osds or will it take into account the osds being on different nodes when there's only 2 nodes with osds
[0:55] <sh_t> I'd assume the latter but just want to be sure
[0:56] <gregaf> you can check out the crush rules the system is using with "ceph osd crush dump"
[0:58] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Read error: Operation timed out)
[0:58] <sagewk> sjusthm1: sjustlaptop: should the tmp collection maybe be completely emptied on any peering event?
[0:58] <sagewk> peering reset, rather?
[0:58] <sjustlaptop> sagewk: that's another option, but this is trivialler
[0:58] <gregaf> (the failure domains it uses for separating are very configurable, and there's some attempt at doing the right thing depending on how many failure domains you have depending on the tool you're using and when the rules are set up)
[0:58] <sjustlaptop> sagewk: it would require that we track what's in the temp collection
[0:58] <sagewk> maybe an assert later then that it is empty?
[0:58] <sjustlaptop> not a difficult task, but annoying
[0:59] <sjustlaptop> in this case, it may not be
[0:59] <sagewk> just worried about similar bugs
[0:59] <sjustlaptop> yeah
[0:59] <paravoid> is this my bug?
[0:59] <sjustlaptop> paravoid: no, working on that one too
[0:59] <paravoid> no, I meant the peering one
[0:59] <sjustlaptop> which peering one?
[0:59] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[0:59] <paravoid> what was it after all?
[0:59] <paravoid> heh, the latest one I guess
[1:00] <paravoid> the one that is (probably) fixed
[1:00] <sjustlaptop> oh, that went into next, we were doing a huge amount of work we didn't need to do
[1:00] <sjustlaptop> so we stopped doing it :)
[1:00] <sjustlaptop> you were special because your pgs have a lot of deletes
[1:00] <paravoid> ah!
[1:00] <sjustlaptop> presumably because of radosgw overwrites and garbage collection
[1:00] <paravoid> also because I have a lot of deletes from user traffic
[1:00] <sjustlaptop> oh, even easier
[1:01] <gregaf> yehuda_hm: are there any tracker tickets associated with the wip-rgw-next patches?
[1:01] <paravoid> not much compared to writes (so the number of files tends to increase monotonically)
[1:01] * diegows (~diegows@190.190.2.126) Quit (Read error: Operation timed out)
[1:02] <sjustlaptop> paravoid: we were doing a filestore operation for every delete in the log
[1:02] <sjustlaptop> during peering
[1:02] <sjustlaptop> although, we were doing exactly the same thing in bobtail, so that's a bit odd
[1:03] <paravoid> counted them, about 300-350 deletes per minute
[1:03] <paravoid> it never worked with bobtail either
[1:03] <sjustlaptop> oh, well there you go then
[1:03] <sjustlaptop> sagewk: just repushed wip-5655
[1:03] <paravoid> #5084 still says "(bobtail)" on the title :)
[1:04] <sjustlaptop> oh yeah
[1:04] <paravoid> I've been through a lot, heh
[1:05] <paravoid> perfect, I finally know how my cluster is special
[1:05] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[1:05] <paravoid> well, apart from being cursed that is
[1:05] <sagewk> sjustlaptop: oops, i think you missed my follow-on comments
[1:06] <sagewk> (03:44:34 PM) sagewk: but i think it'd be safer to sanitize/translate the features before they are put into the Connection. like maybe in set_features
[1:06] <sagewk> (03:45:23 PM) sagewk: hmm, earlier. probably right after they come over the wire, in Pipe.cc accept() and connect()
[1:06] <sagewk> (03:45:38 PM) sagewk: since there are lots of checks in there for requires vs supported features
[1:06] <sjustlaptop> sagewk: you are right, we need to track the temp directory contents
[1:06] <sjustlaptop> sagewk: ah
[1:06] <sagewk> sjustlaptop: i can pick this one up tho
[1:06] <sagewk> (the features thing)
[1:06] <sjustlaptop> sagewk: ok
[1:06] <sjustlaptop> I'll be back in a couple of hours
[1:06] <paravoid> I can test :)
[1:07] <paravoid> I didn't upgrade, thought it might be worth it to just wait and confirm if it works (or not)
[1:09] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Ping timeout: 480 seconds)
[1:13] * themgt (~themgt@pc-56-219-86-200.cm.vtr.net) has joined #ceph
[1:16] * jeff-YF (~jeffyf@67.23.117.122) Quit (Ping timeout: 480 seconds)
[1:18] * diegows (~diegows@190.190.2.126) has joined #ceph
[1:18] <sagewk> sjustlaptop: repushed wip-5655
[1:19] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) has joined #ceph
[1:19] <paravoid> shall I try it?
[1:21] <sagewk> hmm not yet
[1:22] <paravoid> k
[1:22] <paravoid> "are we there yet?" :)
[1:23] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) Quit (Read error: Operation timed out)
[1:23] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) has joined #ceph
[1:23] <sagewk> i think we're off by one bit
[1:24] <sagewk> well, it won't matter in your case, please test
[1:24] <sagewk> sjustlaptop: when you get back can you look at the updated patch? i think we actually want to infer they have 33 bits of 1's (which includes SNAPMAPPER)
[1:26] <paravoid> ok, waiting for gitbuilder
[1:26] <sagewk> join the club :)
[1:26] <paravoid> incredibly useful btw
[1:26] <sagewk> we have a love/hate relationship
[1:27] <paravoid> hate why?
[1:27] <sagewk> i seem to spend a lot of my day waiting for it
[1:27] <paravoid> haha :)
[1:27] <sagewk> we need a way to pay to jump to the front of the line
[1:28] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Read error: Connection reset by peer)
[1:28] <paravoid> haha
[1:28] <sagewk> for when someone is waiting on a build
[1:28] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[1:28] <paravoid> yeah that'd be nice :)
[1:28] <paravoid> so, while waiting
[1:28] <paravoid> what's the release plans?
[1:28] <paravoid> is there going to be a 0.67 next week? or a 0.67-rc?
[1:31] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[1:32] * mozg (~andrei@host109-151-35-94.range109-151.btcentralplus.com) Quit (Quit: Ex-Chat)
[1:39] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[1:39] <sagewk> -rc tomorrow i think
[1:43] <paravoid> oh!
[1:47] <paravoid> seems to work so far
[1:52] * LeaChim (~LeaChim@97e00a48.skybroadband.com) Quit (Ping timeout: 480 seconds)
[1:54] * jmlowe (~Adium@2601:d:a800:97:3d55:7a08:d981:1cf) has joined #ceph
[1:54] <paravoid> wow, it's amazing how much better this is peering-wise
[1:55] <sagewk> awesome
[1:55] <paravoid> yes
[1:55] <paravoid> this has been a problem since forever
[1:55] <sagewk> can't wait to close some bugs :)
[1:57] <paravoid> hm
[1:58] <paravoid> 2013-07-18 23:57:57.533795 mon.0 [INF] pgmap v10134893: 16760 pgs: 16268 active+clean, 448 active+recovery_wait, 4 active+clean+scrubbing+deep, 40 active+recovering; 46827 GB data, 144 TB used, 116 TB / 261 TB avail; 0B/s rd, 0B/s wr, 0op/s; 1389/860715494 degraded (0.000%)
[1:58] <paravoid> I may be impatient but it's been at 448 recovery_wait/40 recovering for quite a while
[1:58] <paravoid> always with 0B rd/wr, 0op
[1:59] <paravoid> and 1389, now it's actually 1391
[2:00] <paravoid> nothing unusual in logs
[2:02] * jmlowe (~Adium@2601:d:a800:97:3d55:7a08:d981:1cf) Quit (Ping timeout: 480 seconds)
[2:03] <paravoid> still the case, something's wrong I think
[2:04] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[2:04] <paravoid> pg query: http://p.defau.lt/?49iZYON9_8RNnFSnwdDhOA
[2:04] <paravoid> sagewk: ^
[2:08] <sagewk> what if you query one of the pgs that is recovering?
[2:08] <sagewk> does it look to be making progress?
[2:09] <paravoid> the query above was from such a pg
[2:09] <paravoid> http://p.defau.lt/?nu_qxxH7FcHaGYGXf2GsRw
[2:09] <paravoid> and here's a second query now
[2:10] <paravoid> look exactly the same to me
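(One way to check whether a recovering pg is actually moving, a sketch with 3.75 as a placeholder pgid:)
    ceph pg 3.75 query > /tmp/q1
    sleep 60
    ceph pg 3.75 query > /tmp/q2
    diff /tmp/q1 /tmp/q2     # identical output a minute apart suggests recovery really is stuck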
[2:12] <paravoid> shall I file a bug?
[2:13] * mschiff_ (~mschiff@port-49070.pppoe.wtnet.de) has joined #ceph
[2:13] <sagewk> yeah, can you reproduce it with full osd logging?
[2:14] <paravoid> osd 20? filestore too?
[2:16] <paravoid> ok, so all of those 40 have one of the 4 restarted OSDs that were also previously restarted as their primary osd
[2:17] <sagewk> oh, interesting
[2:18] <sagewk> they're the ones running the feature fix branch?
[2:18] <paravoid> yes
[2:18] <paravoid> I have 448 at recovery_wait too, but some of them are not primary
[2:18] <paravoid> I could increase max recoveries
[2:18] <paravoid> and sieve the hanging ones from the rest
[2:19] <sagewk> hmm whatever you can do to get some logs leading up to the hang.
[2:19] <sagewk> switching computers, back in 5
[2:21] * mschiff (~mschiff@port-57062.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[2:24] * SvenPHX (~scarter@wsip-174-79-34-244.ph.ph.cox.net) has left #ceph
[2:25] * jmlowe (~Adium@2601:d:a800:97:e80f:2b89:43fe:c7b7) has joined #ceph
[2:26] * sagelap (~sage@87.sub-70-197-74.myvzw.com) has joined #ceph
[2:27] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[2:33] * jmlowe (~Adium@2601:d:a800:97:e80f:2b89:43fe:c7b7) Quit (Ping timeout: 480 seconds)
[2:33] * orium (~fulano@a79-169-47-110.cpe.netcabo.pt) has joined #ceph
[2:38] <paravoid> #5671
[2:38] <paravoid> each osd seems to cap at 10 recovery pgs
[2:38] <paravoid> so I had 40 with 4 up, killing each drops this by 10
[2:38] <paravoid> i set osd_recovery_max_active = '20' across but it doesn't seem to make a difference
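(For reference, how that setting is normally changed at runtime, a sketch; the wildcard form is the older 'ceph osd tell' syntax of that era and is an assumption here:)
    ceph tell osd.0 injectargs '--osd_recovery_max_active 20'       # one osd at a time
    ceph osd tell \* injectargs '--osd_recovery_max_active 20'      # or all osds at once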
[2:43] <paravoid> huh, gets weirder
[2:43] <paravoid> stop all 4
[2:44] <paravoid> start osd.0
[2:44] <paravoid> 2013-07-19 00:40:45.008725 mon.0 [INF] pgmap v10135884: 16760 pgs: 15347 active+clean, 1400 active+degraded, 13 active+clean+scrubbing+deep; 46827 GB data, 144 TB used, 116 TB / 261 TB avail; 24083601/860716166 degraded (2.798%)
[2:44] <paravoid> 2013-07-19 00:41:52.262000 mon.0 [INF] pgmap v10135903: 16760 pgs: 15582 active+clean, 109 active+recovery_wait, 1056 active+degraded, 13 active+clean+scrubbing+deep; 46827 GB data, 144 TB used, 116 TB / 261 TB avail; 0B/s rd, 0op/s; 18113426/860716169 degraded (2.104%)
[2:44] <paravoid> recovery_wait, no recovering at all
[2:44] <paravoid> start osd.1
[2:44] * yanzheng (~zhyan@jfdmzpr04-ext.jf.intel.com) has joined #ceph
[2:44] <paravoid> 2013-07-19 00:42:25.512513 mon.0 [INF] pgmap v10135922: 16760 pgs: 15829 active+clean, 214 active+recovery_wait, 693 active+degraded, 14 active+clean+scrubbing+deep, 10 active+recovering; 46827 GB data, 144 TB used, 116 TB / 261 TB avail; 0B/s rd, 0B/s wr, 0op/s; 11939210/860716211 degraded (1.387%)
[2:44] <paravoid> 10 recovering, all of them osd.0
[2:44] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[2:47] * huangjun (~huangjun@111.173.96.119) has joined #ceph
[2:49] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Remote host closed the connection)
[2:51] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[2:52] <paravoid> okay, sleep now
[2:53] <paravoid> updated http://tracker.ceph.com/issues/5671
[2:53] <paravoid> bye
[2:53] <sagelap> paravoid: bye!
[2:56] * jmlowe (~Adium@c-98-223-198-138.hsd1.in.comcast.net) has joined #ceph
[2:58] * yy-nm (~chatzilla@218.74.37.82) has joined #ceph
[3:01] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[3:03] * mschiff_ (~mschiff@port-49070.pppoe.wtnet.de) Quit (Remote host closed the connection)
[3:08] * sagelap (~sage@87.sub-70-197-74.myvzw.com) Quit (Read error: No route to host)
[3:10] * rturk is now known as rturk-away
[3:15] * sagelap (~sage@76.89.177.113) has joined #ceph
[3:23] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:28] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[3:31] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[3:39] * themgt_ (~themgt@pc-56-219-86-200.cm.vtr.net) has joined #ceph
[3:43] * themgt (~themgt@pc-56-219-86-200.cm.vtr.net) Quit (Ping timeout: 480 seconds)
[3:43] * themgt_ is now known as themgt
[3:44] * markbby (~Adium@168.94.245.4) has joined #ceph
[3:45] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[3:52] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:55] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[3:56] <huangjun> i tried the radosgw and it succeeded, but when i use the client (cyberduck) to access it, it fails; the radosgw log shows the auth failed, even though i entered the access_key and secret_key
[3:57] <huangjun> that i generated from radosgw-admin user create --uid="huangjun" --display-name="huangjun"
[3:59] <huangjun> so what should i try? i see yehuda suggested the 'rgw dns name' option, which isn't set in my cluster conf; does that matter?
[4:03] * jeandanielbussy__ (~jeandanie@124x35x46x15.ap124.ftth.ucom.ne.jp) has joined #ceph
[4:06] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[4:07] * jeandanielbussy_ (~jeandanie@124x35x46x12.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[4:08] * jeandanielbussy__ is now known as jeandanielbussy
[4:09] * jeandanielbussy is now known as silversurfer
[4:14] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[4:33] <huangjun> resolved: it was because the secret key contains an escape character that cyberduck cannot interpret.
[4:36] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[4:38] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[4:46] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[4:49] <sage> huangjun: what was the character?
[4:49] <sage> we should avoid it if we can
[4:57] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[5:03] <huangjun> sage: "secret_key": "4FutcaRvO6\/ddwcu1zfcvIuVCXS2JduJ1jyNToNu"}]
[5:03] <huangjun> there is a \ in the secret key
[5:05] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:05] * fireD (~fireD@93-142-204-59.adsl.net.t-com.hr) Quit (Read error: Operation timed out)
[5:05] * fireD (~fireD@93-142-255-56.adsl.net.t-com.hr) has joined #ceph
[5:06] * silversurfer972 (~jeandanie@124x35x46x12.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:10] * silversurfer (~jeandanie@124x35x46x15.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[5:14] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[5:17] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[5:17] <dmick> huangjun: ah, so, the problem is that JSON escapes '/'
[5:17] <dmick> so that '\/' really represents '/'
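(A quick way to see the unescaped key, a sketch using the string huangjun pasted above:)
    printf '%s' '"4FutcaRvO6\/ddwcu1zfcvIuVCXS2JduJ1jyNToNu"' \
        | python -c 'import json,sys; print(json.load(sys.stdin))'
    # prints: 4FutcaRvO6/ddwcu1zfcvIuVCXS2JduJ1jyNToNu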
[5:20] <huangjun> yes, i see this in the doc,
[5:24] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[5:27] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[5:30] * sjusthm1 (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Remote host closed the connection)
[5:33] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:38] <mikedawson> dmick, sage: after upgrade from 0.61.4 to 0.61.5, restarted my lowest rank mon and get "cephx: verify_reply coudln't decrypt with error: error decoding block for decryption"
[5:39] * madkiss (~madkiss@2001:6f8:12c3:f00f:b58c:7c0d:92e6:1c0e) has joined #ceph
[5:39] <lurbs_> I got the same thing, until I'd upgraded the other mons too.
[5:39] * lurbs_ is now known as lurbs
[5:40] <mikedawson> lurbs_: So rolling update is out, I guess
[5:40] <mikedawson> lurbs_: any other issues?
[5:41] <mikedawson> lurbs: ^
[5:42] <lurbs> Other than that, nope. Only did the upgrade this morning, a few hours ago.
[5:43] <mikedawson> lurbs: Thanks!
[5:43] <lurbs> There's nothing production on that cluster, so I had the luxury of just going for it.
[5:43] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[5:48] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[6:02] <sage> mikedawson: 32-bit or 64-bit host?
[6:02] <mikedawson> sage: 64-bit
[6:03] <sage> *oh*
[6:03] <mikedawson> I am now fully updated to 0.61.5. As lurbs experienced, I had to break quorum to get it done
[6:03] <sage> i know what it is. sigh.
[6:03] <dmick> ooo what is it what is it
[6:03] <sage> mabye
[6:03] <sage> i suspect there is a feature bit or protocol change with the mon scrub backport
[6:04] <sage> hrm, no.. :/
[6:05] <sage> will need to reproduce.
[6:05] <mikedawson> sage: need any logs or a tracker issue?
[6:08] <sage> no, too late now that you've done the upgrade
[6:09] <lurbs> I'm midway through building a new test cluster, ostensibly to test out ceph-deploy, but I could try to duplicate it on that.
[6:20] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[6:30] <huangjun> very interesting, at the beginning we had 3 mons, then we rebuilt the cluster using only one mon; after we started the cluster, the mon failed and the log shows:
[6:30] <huangjun> verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8023
[6:31] <sage> lurbs: http://tracker.ceph.com/issues/5673
[6:31] <sage> huangjun: are you also talking abou the 0.61.4->5 upgrade?
[6:31] <huangjun> and the mon tries to connect to the old mons, but the auth fails many times
[6:31] <sage> logs with debug mon = 20 and debug ms = 20 should give us what we need
[6:31] <huangjun> no, i'm using 0.61.3
[6:33] <huangjun> sage, our cluster is ok, i was just describing the phenomenon
[6:36] <yy-nm> what's the difference between debug ms = 1/5 and debug ms = 20?
[6:37] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[6:41] * john_barbee (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[6:43] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[6:46] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: leaving)
[6:50] * Cube1 (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[6:54] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[6:57] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Ping timeout: 480 seconds)
[7:06] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[7:24] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[7:50] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[7:51] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit ()
[8:04] * Cube1 (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[8:04] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[8:09] * Cube1 (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[8:09] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[8:21] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[8:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:35] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:38] * illya (~illya_hav@13-6-133-95.pool.ukrtel.net) has joined #ceph
[8:45] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:45] <huangjun> continuing to think about what i said earlier: is there some mechanism to prevent a mon that doesn't belong to a cluster from connecting to it?
[8:47] <huangjun> because if it connects to an unrelated cluster, that cluster's mons will try to get the foreign mon's auth, which will certainly fail.
[8:50] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Ping timeout: 480 seconds)
[8:51] * leseb1 (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has joined #ceph
[8:52] * mschiff (~mschiff@port-49070.pppoe.wtnet.de) has joined #ceph
[8:54] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Ping timeout: 480 seconds)
[8:56] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[9:04] * yy (~michealyx@218.74.37.82) has joined #ceph
[9:05] <yy-nm> huangjun, is it the mon keeping the paxos and pgmap etc. updated?
[9:06] * fridudad (~oftc-webi@fw-office.allied-internet.ag) has joined #ceph
[9:07] * dosaboy_ (~dosaboy@host86-164-85-49.range86-164.btcentralplus.com) Quit (Quit: leaving)
[9:09] * haomaiwa_ (~haomaiwan@notes4.com) has joined #ceph
[9:13] * _Tass4dar (~tassadar@D57DEE42.static.ziggozakelijk.nl) has joined #ceph
[9:15] * haomaiwang (~haomaiwan@117.79.232.209) Quit (Ping timeout: 480 seconds)
[9:15] * sleinen (~Adium@2001:620:0:2d:c88c:deeb:5946:3132) has joined #ceph
[9:16] <fridudad> 0.61.5 has crashed all my monitors and i can't get them back to work [09:07] <fridudad> They all crash with mon/OSDMonitor.cc: 132: FAILED assert(latest_bl.length() != 0)
[9:18] * _Tass4da1 (~tassadar@tassadar.xs4all.nl) has joined #ceph
[9:18] * _Tass4dar (~tassadar@D57DEE42.static.ziggozakelijk.nl) Quit (Read error: Operation timed out)
[9:19] * LeaChim (~LeaChim@97e00a48.skybroadband.com) has joined #ceph
[9:19] * _Tassadar (~tassadar@tassadar.xs4all.nl) Quit (Ping timeout: 480 seconds)
[9:21] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[9:21] <fridudad> sjust sage sagewk sagelap 0.61.5 has crashed all my monitors and i can't get them back to work. They all crash with mon/OSDMonitor.cc: 132: FAILED assert(latest_bl.length() != 0)
[9:22] * wiwengweng (~oftc-webi@202.108.100.231) has joined #ceph
[9:23] <wiwengweng> hello, any one here to help??:)
[9:23] * sleinen (~Adium@2001:620:0:2d:c88c:deeb:5946:3132) Quit (Ping timeout: 480 seconds)
[9:24] <wiwengweng> I've run into some problems using curl to test ceph
[9:25] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:25] * ChanServ sets mode +v andreask
[9:26] <yy-nm> about RADOS gateway?
[9:26] * s2r2 (~s2r2@185.27.182.16) has joined #ceph
[9:26] <wiwengweng> if I have created a bucket named "mybucket" and uploaded an object with key="my.txt", and the server ip is 192.168.213.12, how do I list the buckets and the objects using curl?
[9:26] <huangjun> yy-nm:i can't remeber it.
[9:26] <wiwengweng> yes, use the radosgw
[9:27] <wiwengweng> but I just started using curl, and I cannot connect to the gateway
[9:28] * sleinen (~Adium@2001:620:0:26:a180:fe3d:1701:712c) has joined #ceph
[9:28] <yy-nm> wiwengweng, do you try http://tracker.ceph.com/issues/3296
[9:28] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:29] <wiwengweng> i'll read first~
[9:30] <wiwengweng> my apache is using a virtual host on port 80, so I can use that command to test first
[9:34] <wiwengweng> but how can I know the url and the parameters?
[9:34] <wiwengweng> :D
[9:36] <wiwengweng> I do know I need to form this command, but can you tell me the actual meaning of the parameters in the following command?? curl -i http://localhost:8080/auth/v1.0 -H "X-Auth-User: mediawiki:swift" -H "X-Auth-Key:RD2jq3CkZddVuriSUhvD7ZOHLLHl5+RHg2uSzxej"
[9:39] <yy-nm> i guess it means the auth user and auth secret: -H "X-Auth-User: mediawiki:swift" -H "X-Auth-Key:RD2jq3CkZddVuriSUhvD7ZOHLLHl5+RHg2uSzxej"
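(The swift-style flow against radosgw looks roughly like this, a sketch; it assumes a swift subuser and secret have been created with radosgw-admin, and the angle-bracket placeholders come from the auth response:)
    # 1) authenticate; the reply carries X-Storage-Url and X-Auth-Token headers
    curl -i http://192.168.213.12/auth/v1.0 \
         -H "X-Auth-User: myuser:swift" -H "X-Auth-Key: <swift secret>"
    # 2) list the containers (buckets), then the objects in one of them
    curl -i <X-Storage-Url> -H "X-Auth-Token: <token>"
    curl -i <X-Storage-Url>/mybucket -H "X-Auth-Token: <token>"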
[9:42] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[9:47] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[9:56] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[9:59] <wiwengweng> yes. can you help me form a command to meet my requirement? if I have created a bucket named "mybucket" and uploaded an object with key="my.txt", and the server ip is 192.168.213.12, how do I list the buckets and the objects using curl?
[10:00] <paravoid> mediawiki? ceph?
[10:00] <paravoid> swift?
[10:00] * paravoid is listening :)
[10:02] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) Quit (Read error: Connection reset by peer)
[10:02] * stacker666 (~stacker66@85.58.195.104) has joined #ceph
[10:03] <huangjun> did you try "curl 192.168.213.12/mybucket"?
[10:06] <wiwengweng> yes. access denied
[10:07] <wiwengweng> so I think I should add an access key or something, but I don't know the parameter name
[10:07] <wiwengweng> might it be something like '-H AccessKey=xxxxxxxxxx'
[10:08] <wiwengweng> I don't know. Do you have an idea?
[10:09] <huangjun> uhh, but i tried it and it works fine,
[10:09] <huangjun> i'm not familiar with curl yet, sorry
[10:09] <wiwengweng> eh. Did you disable cephx auth??
[10:11] <huangjun> yes,
[10:15] <wiwengweng> oh, thanks. but what is the output when you exec the command
[10:15] * allsystemsarego (~allsystem@188.27.166.68) has joined #ceph
[10:16] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[10:17] <huangjun> <?xml version="1.0" encoding="UTF-8"?><ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>huangjuntest1</Name><Marker></Marker><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>Beat It.mp3</Key><LastModified>2013-07-19T02:42:56.000Z</LastModified><ETag>&quot;dd261a6a860d606158a8e1bef467d31d&quot;</ETag><Size>4148997</Size><StorageClass>STANDARD</StorageClass>
[10:17] <huangjun> <Owner><ID>huangjun2</ID><DisplayName>huangjun2</DisplayName></Owner></Contents></ListBucketResult>
[10:18] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[10:18] <huangjun> just like what you see in a web browser
[10:21] <wiwengweng> but not in my case. even when I access it from the browser, it shows 'access denied'
[10:24] <huangjun> uhh, that may be an auth problem. did you ask google?
[10:24] <wiwengweng> but when I use python to test the s3 api, it works fine...
[10:25] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:25] <wiwengweng> googled a lot, but got nothing
[10:25] <wiwengweng> fine. I will continue to check that
[10:29] <huangjun> good luck
[10:30] <wiwengweng> T_T
[10:30] <wiwengweng> can you throw a hint?
[10:31] * deadsimple (~infinityt@134.95.27.132) has joined #ceph
[10:31] <wiwengweng> about which part may cause this denial
[10:32] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[10:35] <huangjun> so you can check the logs in these two places: /var/log/ceph/radosgw.log and /var/log/httpd/error.log
[10:39] * deadsimple (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[10:46] * dosaboy (~dosaboy@faun.canonical.com) has joined #ceph
[10:47] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[10:50] <soren> ceph.com looks down?
[10:50] <paravoid> works for me
[10:51] <soren> Really?
[10:52] <soren> http://www.downforeveryoneorjustme.com/ceph.com agrees with me.
[10:53] * huangjun (~huangjun@111.173.96.119) Quit (Read error: Connection reset by peer)
[10:53] * fridudad (~oftc-webi@fw-office.allied-internet.ag) Quit (Quit: Page closed)
[10:54] * Volture (~quassel@office.meganet.ru) has joined #ceph
[10:54] <yy-nm> just tracker?? ceph.com is available
[10:54] <jcfischer> down for me
[10:57] * yy (~michealyx@218.74.37.82) has left #ceph
[11:10] * huangjun (~huangjun@111.173.96.119) has joined #ceph
[11:11] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[11:14] * yanzheng (~zhyan@jfdmzpr04-ext.jf.intel.com) Quit (Quit: Leaving)
[11:22] <leseb1> up for me (EU)
[11:22] * leseb1 (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has left #ceph
[11:23] * leseb1 (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has joined #ceph
[11:23] * waxzce (~waxzce@glo44-2-82-225-224-38.fbx.proxad.net) Quit (Remote host closed the connection)
[11:29] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[11:30] * haomaiwa_ (~haomaiwan@notes4.com) Quit (Remote host closed the connection)
[11:33] <sleinen> leseb: It may only be up for you because your name servers still have it cached.
[11:33] <sleinen> The .COM nameservers tell me that ceph.com doesn't exist (is not delegated).
[11:33] * error27_ (~dcarpente@41.202.233.179) has joined #ceph
[11:34] <sleinen> …although whois says that there are three nameservers, ns{1,2,3}.dreamhost.com.
[11:34] <madkiss> Domain Name: CEPH.COM
[11:34] <madkiss> Registrar: NEW DREAM NETWORK, LLC
[11:40] <paravoid> indeed, the domain has expired
[11:40] <paravoid> it's on clientHold
[11:54] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[12:02] <madkiss> ouh?
[12:03] <madkiss> Expiration Date: 18-jul-2014
[12:03] <madkiss> hm
[12:05] <madkiss> For gTLDs the registry automatically extends the renewal for another year even when the name isn't renewed by the registrant. The domain enters the 'registrar grace' period where it can be renewed by the registrar for no additional fee (~30-45 days).
[12:05] <madkiss> well then, inktank, get going :)
[12:06] * huangjun (~huangjun@111.173.96.119) Quit (Quit: HydraIRC -> http://www.hydrairc.com <- Now with extra fish!)
[12:08] <joao> sent an email to people with that info
[12:08] <joao> thanks for digging that up
[12:13] <error27_> my static checker complains that in handle_session() we could receive a CEPH_SESSION_CLOSE message and release the last reference, leading to a use-after-free bug
[12:14] * wiwengweng (~oftc-webi@202.108.100.231) Quit (Remote host closed the connection)
[12:15] <joao> error27_, where?
[12:16] <error27_> joao: the call to __unregister_session() frees the session if it's the last reference
[12:16] <joao> is this on the kernel client?
[12:17] <error27_> joao: yes
[12:18] * leseb1 (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) Quit (Quit: Leaving.)
[12:18] * Volture (~quassel@office.meganet.ru) Quit (Remote host closed the connection)
[12:21] * matt__ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[12:23] <joao> error27_, mind filing a ticket for that?
[12:24] <paravoid> if he can :)
[12:24] <error27_> paravoid: heh. good point. :P
[12:25] * houkouonchi-home (~linux@pool-108-38-63-48.lsanca.fios.verizon.net) has joined #ceph
[12:25] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[12:25] <houkouonchi-home> joao: thanks for emailing the list about that
[12:26] <joao> houkouonchi-home, thank madkiss and paravoid, they were the ones who noticed it!
[12:26] <houkouonchi-home> it's times like this I am really glad I still have all my DH access lol
[12:27] <joao> so are we :p
[12:27] <paravoid> DH?
[12:27] <houkouonchi-home> DreamHost
[12:27] <paravoid> oh
[12:27] <houkouonchi-home> I am an ex-Dreamhost gone Inktank employee =)
[12:27] <joao> who just fixed ceph.com btw
[12:27] <houkouonchi-home> but i still leech office space from them =)
[12:28] <joao> houkouonchi-home, isn't it like bed time for you?
[12:28] <joao> I'd say even some 3 or 4 hours ago?
[12:28] <joao> :p
[12:29] <houkouonchi-home> heh yeah its 3:24 AM
[12:29] <houkouonchi-home> i have problems sleeping though. I am usually up at least this late
[12:29] <houkouonchi-home> even if I go to bed now I gotta get up for work in 6 hours =)
[12:30] * dobber (~dobber@89.190.199.210) has joined #ceph
[12:30] <houkouonchi-home> surprisingly I actually had less problems sleeping when I worked nights (10 PM -8 AM)
[12:33] <joao> eh, I know the feeling
[12:33] <joao> I have trouble falling asleep, not so much sleeping; as soon as I get going I then have problems waking up, but hey...
[12:33] <houkouonchi-home> My ideal shift would be a 12-8 PM or 1-9 PM even
[12:33] <houkouonchi-home> same here
[12:33] <houkouonchi-home> I usually can sleep just fine... once I get to sleep
[12:34] <joao> that's why this arrangement kind of works out for me
[12:34] <houkouonchi-home> and yeah getting 4-5 hours of sleep per day every work day makes waking up not fun lol
[12:34] <joao> if I sleep in the morning I'm still in time to beat everyone by 6 hours :p
[12:35] <houkouonchi-home> my new 39 inch $500 4k display is due to arrive tomorrow. I am excited. I will bring it into Brea next week
[12:36] <joao> lol
[12:36] <joao> that's wasted on vim
[12:37] <houkouonchi-home> well honestly I use it more for desktop real-estate than games or actual 4k video
[12:37] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[12:37] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Operation timed out)
[12:37] <houkouonchi-home> so I can see more windows at once. a lot of the time they are text editors too =)
[12:40] * haomaiwang (~haomaiwan@220.181.73.125) has joined #ceph
[12:43] <joao> I remember upgrading from a 17" to a 22" (my current main monitor) and the joy it was being able to see twice the text on vim; I bet it would feel like christmas with a 39" :p
[12:44] <houkouonchi-home> I actually have used 4k displays since 2005 but until recently it was 22 inches
[12:44] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:44] <houkouonchi-home> and about the only thing it had going for it was resolution and in all other aspects it was kinda crappy cause it was an old LCD panel (2002)
[12:45] <houkouonchi-home> it excites me a great deal that high resolution TV's/display's/etc are coming out
[12:45] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has joined #ceph
[12:45] <houkouonchi-home> even before that 22 inch I used a 30 inch dell (2560x1600) and before that a 22 inch CRT (2560x1920)
[12:45] <houkouonchi-home> i actually haven't run lower than 2048x1536 since 1999
[12:47] <houkouonchi-home> my dad is a developer (been programming since high school) and he was into high resolution so me and my sister got into high resolution at a young age
[12:47] <houkouonchi-home> once you're acclimated to smaller text and more desktop real-estate you just can't go back.
[12:47] <houkouonchi-home> My dad and sister also have 4k displays
[12:48] <joao> yeah, I guess that makes a big difference
[12:48] <joao> I honestly never cared enough to spend the money :p
[12:49] * xdeller (~xdeller@91.218.144.129) Quit (Read error: Connection reset by peer)
[12:49] <houkouonchi-home> hehe, well I guess I should be glad my dad was into high resolution displays as it is what allowed me to enjoy them as well
[12:49] <houkouonchi-home> i dont know many other people back in the 90s with $900 CRT's hehe
[12:51] * haomaiwang (~haomaiwan@220.181.73.125) Quit (Ping timeout: 480 seconds)
[12:52] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[12:56] * deadsimple (~infinityt@134.95.27.132) has joined #ceph
[12:56] * yy-nm (~chatzilla@218.74.37.82) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 22.0/20130618035212])
[12:58] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[13:01] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[13:03] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[13:11] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[13:15] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:15] * ChanServ sets mode +v andreask
[13:18] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:20] * xdeller (~xdeller@91.218.144.129) has joined #ceph
[13:20] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[13:29] * deadsimple (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[13:30] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[13:46] * error27_ (~dcarpente@41.202.233.179) Quit (Remote host closed the connection)
[13:51] * diegows (~diegows@190.190.2.126) has joined #ceph
[13:59] * keithjasper (~keithjasp@host81-133-229-76.in-addr.btopenworld.com) has joined #ceph
[13:59] <keithjasper> hey all. Does the recovery time for restarting ceph (recover from degraded) speed up the more host nodes you have? same with data speeds?
[13:59] <keithjasper> obviously pending Specs and Network
[14:00] <Gugge-47527> well, if you add a single disk, it takes the time it takes to fill that disk with data
[14:00] <jksM> keithjasper, yes, if you have the same amount of data in both cases
[14:00] <Gugge-47527> if a disk dies, recovery from that is faster if you have more disks
[14:01] * Cube1 (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[14:01] <keithjasper> :) cool thanks solves that question :)
[14:01] <keithjasper> just added my third node to an 8TB array
[14:02] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[14:09] * themgt_ (~themgt@pc-56-219-86-200.cm.vtr.net) has joined #ceph
[14:10] * waxzce (~waxzce@office.clever-cloud.com) has joined #ceph
[14:11] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[14:12] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:13] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[14:13] * haomaiwang (~haomaiwan@117.79.232.209) has joined #ceph
[14:13] <keithjasper> is there a way to MOVE the journal from Spinning (with osd) to SSD, just removing journal?
[14:13] * themgt (~themgt@pc-56-219-86-200.cm.vtr.net) Quit (Ping timeout: 480 seconds)
[14:13] * themgt_ is now known as themgt
[14:14] <keithjasper> moving** not removing
[14:14] <keithjasper> is it as simple as stop OSD, move journal to SSD Partition, mount in same place as old journal, and start OSD?
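(A hedged sketch of the journal-move sequence that is commonly described, assuming osd.3, the default data path, and /dev/sdg1 as the new SSD journal partition:)
    ceph osd set noout                            # optional: avoid rebalancing during the move
    service ceph stop osd.3
    ceph-osd -i 3 --flush-journal                 # flush the old journal into the object store
    rm /var/lib/ceph/osd/ceph-3/journal
    ln -s /dev/sdg1 /var/lib/ceph/osd/ceph-3/journal
    ceph-osd -i 3 --mkjournal                     # initialize the journal on the SSD partition
    service ceph start osd.3
    ceph osd unset noout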
[14:23] * allsystemsarego_ (~allsystem@188.27.167.90) has joined #ceph
[14:29] * allsystemsarego (~allsystem@188.27.166.68) Quit (Ping timeout: 480 seconds)
[14:32] * bergerx_ (~bekir@78.188.204.182) Quit (Ping timeout: 480 seconds)
[14:37] * illya (~illya_hav@13-6-133-95.pool.ukrtel.net) Quit (Ping timeout: 480 seconds)
[14:38] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[14:38] * madkiss (~madkiss@2001:6f8:12c3:f00f:b58c:7c0d:92e6:1c0e) Quit (Quit: Leaving.)
[14:41] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) has joined #ceph
[14:42] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:42] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[14:43] * bergerx_ (~bekir@78.188.204.182) has joined #ceph
[14:44] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:47] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[14:47] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Read error: Connection reset by peer)
[14:52] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[15:04] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[15:04] * markbby (~Adium@168.94.245.1) has joined #ceph
[15:12] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[15:18] * BillK (~BillK-OFT@203-59-173-44.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:18] * s2r2 (~s2r2@185.27.182.16) Quit (Read error: No route to host)
[15:23] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[15:23] * drokita1 (~drokita@97-92-254-72.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[15:24] * drokita (~drokita@97-92-254-72.dhcp.stls.mo.charter.com) has joined #ceph
[15:29] * orium (~fulano@a79-169-47-110.cpe.netcabo.pt) Quit (Ping timeout: 480 seconds)
[15:32] * drokita (~drokita@97-92-254-72.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[15:32] <mozg> hello guys
[15:32] <mozg> i was wondering if anyone has already upgraded to 0.61.5?
[15:32] * DarkAce-Z (~BillyMays@50.107.55.36) has joined #ceph
[15:33] <mozg> if so, how did you address the monitor upgrades? it seems that pre-0.61.5 mons can't form a quorum with the 0.61.5 versions
[15:33] <mozg> i would like to upgrade without downtime
[15:34] <soren> I was just looking at http://wiki.ceph.com/01Planning/02Blueprints/Emperor/Erasure_coded_storage_backend_(step_2) I don't understand that initial sentence: "As of Dumpling Ceph keeps copies of objects to not loose them when hardware fails." Ceph has done that since the very beginning, surely?
[15:36] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[15:36] <mikedawson> mozg: I have. You will have to break quorum for 0.61.4 -> 0.61.5, but the monitor downtime is short (<1 minute)
[15:37] <mozg> i see
[15:37] <mozg> did you upgrade them one by one?
[15:37] <mozg> or all at once?
[15:37] <mozg> coz i've got 3 mons
[15:37] <mikedawson> mozg: Install it everywhere then restart the mons at the same time. They reform quorum quickly.
[15:37] <mozg> nice
[15:37] <mozg> will try now
[15:38] <mikedawson> Then restart OSDs however you typically do a rolling update
[15:38] <mozg> i am going to do apt-get upgrade
[15:38] <mozg> not sure if it will automatically restart all the services
[15:39] <mikedawson> mozg: It didn't automatically restart daemons for me (13.04)
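(Roughly what that upgrade looks like on Ubuntu with upstart, a sketch; the job names are assumed to be the ones cuttlefish shipped, and the ordering follows mikedawson's advice above:)
    apt-get update && apt-get install ceph     # on every node first
    restart ceph-mon-all                       # on each mon host, at roughly the same time
    restart ceph-osd id=0                      # then roll through the osds one at a time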
[15:40] <mozg> mikedawson: i see so many people running ceph on 13.04 instead of the 12.04 lts
[15:40] <mozg> is there a reason for this?
[15:43] <mikedawson> mozg: We wanted newer kernels and various packages
[15:44] <mozg> i see
[15:44] <mozg> relating to ceph or in general?
[15:45] <jmlowe> btrfs is broken and will truncate sparse files (ceph uses sparse files and will lose data) with a kernel < 3.8, 13.04 has a 3.8 kernel
[15:45] <jmlowe> that would be my primary reason for 13.04
[15:45] <mozg> the reason for asking is that i am on 12.04 but with backported kernel
[15:45] <mikedawson> mozg: not-related to ceph
[15:46] <mozg> so, i wonder if i should do a upgrade to 13.04 or just use the 12.04 with backports
[15:49] * drokita (~drokita@199.255.228.128) has joined #ceph
[15:50] <mikedawson> mozg: Ceph moves fast... http://ceph.com/community/ceph-settles-in-to-aggressive-release-cadence/
[15:50] <mozg> mikedawson: thanks for the upgrade tip. went smoothly )))
[15:51] <mikedawson> mozg: glad to help
[15:53] * illya (~illya_hav@147-152-133-95.pool.ukrtel.net) has joined #ceph
[15:54] * jeff-YF (~jeffyf@64.191.222.109) has joined #ceph
[16:03] * jeff-YF_ (~jeffyf@67.23.123.228) has joined #ceph
[16:05] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[16:06] * orium (~fulano@di18.di.fct.unl.pt) has joined #ceph
[16:08] * jeff-YF_ (~jeffyf@67.23.123.228) Quit (Quit: jeff-YF_)
[16:08] * jeff-YF (~jeffyf@64.191.222.109) Quit (Ping timeout: 480 seconds)
[16:12] * markbby (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[16:12] * markbby (~Adium@168.94.245.1) has joined #ceph
[16:16] * yanzheng (~zhyan@134.134.139.70) has joined #ceph
[16:16] * jeff-YF (~jeffyf@64.191.222.109) has joined #ceph
[16:19] * jeff-YF_ (~jeffyf@67.23.123.228) has joined #ceph
[16:21] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) has joined #ceph
[16:22] * lyncos (~chatzilla@208.71.184.41) Quit (Remote host closed the connection)
[16:24] * jeff-YF (~jeffyf@64.191.222.109) Quit (Ping timeout: 480 seconds)
[16:24] * jeff-YF_ is now known as jeff-YF
[16:29] * jeff-YF (~jeffyf@67.23.123.228) Quit (Quit: jeff-YF)
[16:30] * Cube (~Cube@173-8-221-113-Oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[16:31] * yanzheng (~zhyan@134.134.139.70) Quit (Remote host closed the connection)
[16:39] <paravoid> yehudasa: hey
[16:39] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[16:39] * jefferai (~quassel@corkblock.jefferai.org) has joined #ceph
[16:54] * mikedawson (~chatzilla@23-25-19-9-static.hfc.comcastbusiness.net) has joined #ceph
[16:56] * yanzheng (~zhyan@101.83.52.186) has joined #ceph
[16:59] * madkiss (~madkiss@089144192063.atnat0001.highway.a1.net) has joined #ceph
[16:59] * stacker666 (~stacker66@85.58.195.104) Quit (Read error: Operation timed out)
[17:00] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:02] <sage> paravoid: you saw sam's comment on that bug?
[17:02] <paravoid> I did
[17:02] <paravoid> I replied too
[17:03] <paravoid> Ugh. This (9fe7806d4f1fe67fa10906df20cbed017321effe) isn't actually 2 commits behind wip-5655, it's just an earlier incarnation of wip-5655 (your very first attempt). I didn't see the "unknown message type" in the other OSD logs so I assumed it had worked. I installed wip-5655 just as it appeared on gitbuilder, not accounting for its backlog, and foolishly didn't check the sha1. Sorry about that...
[17:03] <paravoid> I installed c0e77c91b6c39998ef4e19a726db87b66850cf2c and confirmed that it did the trick now. Thanks!
[17:03] <paravoid> (copying as I'm not sure if you can access tracker.ceph.com now...)
[17:03] <sage> oh cool.
[17:03] <sage> nice
[17:03] <sage> ok, i'll pull that into next then; thanks for testing!
[17:04] <paravoid> :)
[17:04] <paravoid> I found two serious radosgw issues though
[17:04] <paravoid> http://tracker.ceph.com/issues/5674 & http://tracker.ceph.com/issues/5675
[17:08] * yanzheng (~zhyan@101.83.52.186) Quit (Ping timeout: 480 seconds)
[17:11] <sage> k
[17:12] <paravoid> I'm mentioning because you might care about that -rc
[17:12] <paravoid> with regards to that -rc I mean
[17:15] * diegows (~diegows@200.68.116.185) has joined #ceph
[17:16] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:18] * rudolfsteiner (~federicon@200.68.116.185) has joined #ceph
[17:26] <yehudasa__> paravoid: hey
[17:26] <paravoid> hello
[17:27] <paravoid> did you see my bug reports?
[17:27] <paravoid> the first I worked around, the second I couldn't find how to
[17:28] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Read error: Operation timed out)
[17:29] <yehudasa__> paravoid: I think the first is really a rados issue
[17:30] <yehudasa__> paravoid: on what request are you getting the 301?
[17:30] <yehudasa__> I guess you didn't set up multiple regions?
[17:30] * sagelap (~sage@76.89.177.113) Quit (Read error: Operation timed out)
[17:31] * sagelap (~sage@2600:1012:b02f:726d:8081:d1dd:f8ab:a9e1) has joined #ceph
[17:32] <paravoid> all requests
[17:32] <paravoid> no, I didn't set up anything
[17:32] <paravoid> I just restarted radosgw
[17:34] * madkiss (~madkiss@089144192063.atnat0001.highway.a1.net) Quit (Quit: Leaving.)
[17:36] <yehudasa__> my guess is that it fails to identify the current region as the master region
[17:36] <yehudasa__> that is, the region where all your old buckets were created
[17:36] <yehudasa__> I'll try to reproduce
[17:37] <yehudasa__> paravoid: were you running on cuttlefish before?
[17:37] <paravoid> no
[17:37] <paravoid> 0.66
[17:37] <yehudasa__> hmm.. I'll try to do the same from cuttlefish, should be similar
[17:38] * dobber (~dobber@89.190.199.210) Quit (Remote host closed the connection)
[17:47] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[17:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:51] * sagelap (~sage@2600:1012:b02f:726d:8081:d1dd:f8ab:a9e1) Quit (Read error: Connection reset by peer)
[17:53] * loicd is done moving ObjectContext / SnapSetContext out of ReplicatedPG and contemplates the mess he made ...
[17:54] * waxzce (~waxzce@office.clever-cloud.com) Quit (Remote host closed the connection)
[17:55] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) Quit (Quit: Leaving.)
[17:56] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) Quit (Quit: smiley)
[17:56] <loicd> ccourtaut: ^
[17:57] <loicd> I'm running make check. I'd be very surprised if it passes ;-)
[17:59] <loicd> I now need to sort https://github.com/dachary/ceph/blob/wip-5487/src/osd/ObjectRegistry.h into something sensible
[18:02] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:07] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Quit: jlogan1)
[18:08] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[18:12] * mikedawson (~chatzilla@23-25-19-9-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[18:16] * sagelap (~sage@38.122.20.226) has joined #ceph
[18:20] <sagewk> paravoid: ping
[18:21] <sagewk> actually, nm
[18:21] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[18:22] * jcfischer (~fischer@user-23-14.vpn.switch.ch) Quit (Quit: jcfischer)
[18:24] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[18:25] * keithjasper (~keithjasp@host81-133-229-76.in-addr.btopenworld.com) Quit (Quit: keithjasper)
[18:27] <sagewk> yehudasa: wip-5674 has the radosgw caps doc update
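For context, caps on a radosgw client key are granted or adjusted with the ceph auth tooling; the client name and exact caps below are only an illustration, not a quote of the wip-5674 doc change:

    # illustrative only -- client name and caps are example values
    ceph auth get-or-create client.radosgw.gateway \
        mon 'allow rw' osd 'allow rwx' \
        -o /etc/ceph/keyring.radosgw.gateway
    # or adjust an existing key in place
    ceph auth caps client.radosgw.gateway mon 'allow rw' osd 'allow rwx'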
[18:29] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[18:33] <loicd> make check passes on https://github.com/dachary/ceph/tree/wip-5487 which is proof that there should be more unit tests rather than a sign that my work actually does what's expected ;-)
[18:34] <yehuda_hm> paravoid: I pushed a fix to another branch, gregaf will review everything and we'll merge it soon
[18:35] <yehuda_hm> note that there are a few other fixes there that you'd want to have
[18:40] * virsibl (~virsibl@193.34.8.232) has joined #ceph
[18:42] * illya (~illya_hav@147-152-133-95.pool.ukrtel.net) has left #ceph
[18:44] * morse_ (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[18:46] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:48] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[18:50] <sagewk> alfredodeza: opened a ceph-deploy bug for you, 5678.. can you take a look?
[18:51] <sagewk> finally figured out this is the source of the nightly failures
[18:51] * sleinen (~Adium@2001:620:0:26:a180:fe3d:1701:712c) Quit (Quit: Leaving.)
[18:51] * sleinen (~Adium@130.59.94.187) has joined #ceph
[18:52] <paravoid> yehuda_hm: which branch is that?
[18:53] <sagewk> wip-rgw-next
[18:53] <paravoid> okay
[18:53] <paravoid> thanks :)
[18:53] * dosaboy (~dosaboy@faun.canonical.com) Quit (Quit: leaving)
[18:54] * alexbligh (~alexbligh@89-16-176-215.no-reverse-dns-set.bytemark.co.uk) Quit (Quit: Terminated with extreme prejudice - dircproxy 1.0.5)
[18:55] * alexbligh (~alexbligh@89-16-176-215.no-reverse-dns-set.bytemark.co.uk) has joined #ceph
[18:56] * sleinen (~Adium@130.59.94.187) Quit (Read error: Operation timed out)
[18:58] <alfredodeza> sagewk: I sure can
[18:59] * X3NQ (~X3NQ@195.191.107.205) Quit (Remote host closed the connection)
[19:04] <yehuda_hm> sagewk: wip-5674 looks good
[19:04] <sagewk> k
[19:06] <gregaf> yehuda_hm: I know that much of wip-rgw-next is bugfixes, including some that users have reported, but it doesn't reference any bugs in the commit messages?
[19:07] <yehuda_hm> gregaf: the only reported bug fix is in the top commit (5675)
[19:07] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Read error: Connection reset by peer)
[19:10] * orium (~fulano@di18.di.fct.unl.pt) Quit (Read error: Operation timed out)
[19:10] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Remote host closed the connection)
[19:11] * virsibl (~virsibl@193.34.8.232) Quit (Read error: Operation timed out)
[19:12] * LeaChim (~LeaChim@97e00a48.skybroadband.com) Quit (Ping timeout: 480 seconds)
[19:12] * LeaChim (~LeaChim@97e00a48.skybroadband.com) has joined #ceph
[19:13] * bergerx_ (~bekir@78.188.204.182) Quit (Quit: Leaving.)
[19:24] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:33] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[19:35] * Tamil (~tamil@38.122.20.226) has joined #ceph
[19:38] * jluis (~JL@89.181.157.135) has joined #ceph
[19:39] * joao (~JL@89.181.157.135) Quit (Remote host closed the connection)
[19:39] * jluis is now known as joao
[19:39] <sagewk> sjust: wip-stats
[19:39] <sjust> k
[19:39] <sagewk> the second to last patch is redundant but seems worthwhile
[19:41] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Read error: Operation timed out)
[19:42] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) has joined #ceph
[19:43] <sjust> ah, going backwards would explain the negative stats
[19:44] <sagewk> sjust: https://github.com/ceph/ceph/commit/f61b52879efba2da5fad322de560d9d26cf25f12
[19:44] <sagewk> yeah, altho i'm not sure that's what really is happening. could be.
[19:44] <sjust> that looks good
[19:44] <sjust> the debugging, that is
[19:45] <sjust> doesn't explain the stuck at 0 case though
[19:45] <sagewk> nope. do you see that on your test cluster?
[19:45] <sjust> yeah, haven't looked into it
[19:46] <sagewk> that's the 2 iterations of values, then 0, then 2 more, 0, etc.?
[19:46] <sagewk> what is the mon for that cluster?
[19:46] <sjust> no, just 0s
[19:46] <sjust> it's actually happening now
[19:46] <sjust> mira103
[19:46] * jcfischer (~fischer@130.59.94.162) has joined #ceph
[19:46] <sjust> there are apparently inconsistent pgs
[19:46] <sjust> also
[19:47] <sagewk> k
[19:48] <sjust> and that is because osd 49 is spewing EIOs
[19:48] <sjust> or rather, the fs is hiding eios
[19:48] * jcfischer_ (~fischer@user-28-9.vpn.switch.ch) has joined #ceph
[19:48] <sagewk> oh, the 0 thing is just that it's < 1 and it's integer arithmetic.
[19:49] <sjust> ah
[19:49] <sjust> well, at the moment it's recovering at a pretty good clip
[19:49] <sjust> since I killed osd 49
[19:49] <sagewk> maybe i'll make a float version of si_t that does a few significant digits
[19:49] <sjust> and is still 0
[19:50] <sagewk> oh, still 0 tho, yeah something else is wrong.
[19:50] * jcfischer (~fischer@130.59.94.162) Quit (Read error: Operation timed out)
[19:50] * jcfischer_ is now known as jcfischer
[19:51] <sagewk> yeah
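To make the integer-arithmetic point concrete: a recovery rate below 1 MB/s truncates to 0 when it is divided down with integer math before formatting. The snippet below is only an illustration of the effect, not the actual si_t code:

    # ~800 KB/s of recovery traffic
    bytes_per_sec=819200
    echo "$(( bytes_per_sec / 1048576 ))MB/s"     # integer division -> 0MB/s
    # a float formatter with a couple of significant digits keeps it visible
    awk -v b="$bytes_per_sec" 'BEGIN { printf "%.2fMB/s\n", b / 1048576 }'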
[19:52] <paravoid> sjust: so, I did a bit more testing today
[19:52] <paravoid> while peering is *much* much better
[19:52] <paravoid> I still see a big difference between the first and subsequent restarts
[19:53] <paravoid> so I restarted all twelve OSDs in a box (simulating e.g. a server reboot); first time it took about 60-70s, second time it took a split second
[19:53] <paravoid> I didn't even see a pgmap that mentioned "peering"
[19:55] <sagewk> paravoid: can you try to separate that into the down stage vs the up stage?
[19:56] <paravoid> I did so at one point, let me do it again
[19:56] <paravoid> I'm not complaining much btw, 1 minute is far, far better than I'm used to
[19:56] * waxzce (~waxzce@2a01:e35:2e1e:260:155e:8a04:8c10:cb57) has joined #ceph
[19:56] <sagewk> i'm guessing it is all the down stage. the up part is probably what we have the most options to improve tho :/
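One way to separate the two stages is to time each half of the restart against the pg state summary. This is only a sketch: it assumes Upstart-managed OSDs on the host (hence 'stop/start ceph-osd-all'), treats the disappearance of "peering" from 'ceph pg stat' as a good-enough proxy for the stage being done, and uses 'osd-host' as a placeholder hostname:

    t0=$(date +%s)
    ssh osd-host 'sudo stop ceph-osd-all'       # take the host's OSDs down
    sleep 5                                     # let the map change show up
    while ceph pg stat | grep -q peering; do sleep 1; done
    echo "down stage: $(( $(date +%s) - t0 ))s"
    t1=$(date +%s)
    ssh osd-host 'sudo start ceph-osd-all'      # bring them back up
    sleep 5
    while ceph pg stat | grep -q peering; do sleep 1; done
    echo "up stage: $(( $(date +%s) - t1 ))s"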
[19:56] * guppy_ (~quassel@guppy.xxx) Quit (Quit: No Ping reply in 180 seconds.)
[19:57] * guppy (~quassel@guppy.xxx) has joined #ceph
[19:59] <paravoid> nope!
[19:59] <paravoid> http://p.defau.lt/?2JoZcvQrIFq6mjzunoLGqg
[20:00] <paravoid> that was a bit better than the other box
[20:00] <paravoid> let's do a third one
[20:01] <paravoid> meh, that didn't take any time at all
[20:02] <paravoid> okay this is kinda pointless, the rgw bug has prevented me from having any traffic
[20:03] <paravoid> so this isn't a good test
[20:05] <sjust> paravoid: in that run you just posted, you only restarted the nodes on 1 host?
[20:05] <sjust> I notice you have stale pgs at the end of the <stop> portion
[20:05] <sjust> does your crushmap prevent a pg from having all replicas on 1 host?
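The stale pgs at the end of the stop phase are what that question is getting at: if the crush rule chooses OSDs rather than hosts, a pg can have every replica on the one host that was stopped. A quick way to check, using the standard ceph/crushtool commands (the paths are placeholders):

    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
    grep -B2 -A6 '^rule' /tmp/crushmap.txt
    # the replicated rule should spread replicas across hosts, e.g.
    #   step chooseleaf firstn 0 type host
    # 'type osd' here would allow all replicas of a pg to land on one host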
[20:07] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[20:14] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[20:16] * orium (~fulano@a79-169-47-110.cpe.netcabo.pt) has joined #ceph
[20:25] * jluis (~joao@89.181.157.135) has joined #ceph
[20:28] * grifferz (~andy@specialbrew.392abl.bitfolk.com) Quit (Remote host closed the connection)
[20:28] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Never put off till tomorrow, what you can do the day after tomorrow)
[20:43] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[20:50] * grifferz (~andy@specialbrew.392abl.bitfolk.com) has joined #ceph
[21:02] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) Quit (Quit: smiley)
[21:08] * diegows (~diegows@200.68.116.185) Quit (Ping timeout: 480 seconds)
[21:09] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[21:25] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[21:27] * joao (~JL@89.181.157.135) Quit (Quit: Leaving)
[21:40] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[21:41] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit ()
[21:52] * rudolfsteiner (~federicon@200.68.116.185) Quit (Quit: rudolfsteiner)
[21:52] * Tamil (~tamil@38.122.20.226) has joined #ceph
[21:52] <sagewk> yehuda_hm: are you still tracking down 5676?
[21:53] <sagewk> davidz: killing off that ~50 machine job so that you get the whole cluster
[21:54] <sagewk> sjust: wip-stats?
[21:54] <houkouonchi-home> sagewk: btw, one of the things I did was make a new 'killsuite' script, /home/ubuntu/killsuite.py. If you're in a test directory it will grab the name from that, or you can pass it as an argument; it will actually connect to the beanstalk queue, delete the jobs (if they are queued), kill the processes, and nuke the machines
[21:54] <yehuda_hm> sagewk: yeah
[21:55] <dmick> Killsuite Engage!
[21:55] <yehuda_hm> originally I thought that pool creation was taking too long, but I misread the log
[21:55] <sagewk> nice
[21:55] <yehuda_hm> sagewk: I vaguely remember that we needed to do some operation after bucket creation to get the maps updated
[21:56] <yehuda_hm> might be something that I removed recently, and it could have affected that
[21:56] <sagewk> in librados you mean?
[21:56] <yehuda_hm> yeah
[21:57] <sagewk> the pool create command waits for the client to get the map reflecting the pool creation, so any subsequent requests will see it
[21:57] <sagewk> should be handled for you.
[21:57] <sagewk> if it's reproducible tho we can just turn up debug objecter and rados and see what is happening?
[21:58] <yehuda_hm> yeah, I'm trying that, just some quirks with teuthology
[21:58] <yehuda_hm> the swift test wouldn't bootstrap now
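For reference, the debug knobs sage mentions are client-side (the objecter and librados run inside the radosgw process), so they go in the gateway's own ceph.conf section or on its command line. The section name below is just an example:

    # append client-side debug settings to ceph.conf on the radosgw host;
    # the section name is an example, not a required value
    printf '%s\n' '[client.radosgw.gateway]' \
        '    debug objecter = 20' \
        '    debug rados = 20' \
        '    debug ms = 1' >> /etc/ceph/ceph.conf
    # then restart radosgw so the settings take effect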
[21:59] <yehuda_hm> gregaf: I pushed updated branch to wip-rgw-next-2
[21:59] <gregaf> k
[21:59] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:59] * ChanServ sets mode +v andreask
[22:01] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[22:01] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[22:01] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[22:03] * Meths (rift@2.25.211.188) has joined #ceph
[22:06] * bandrus (~Adium@12.248.40.138) has joined #ceph
[22:11] * allsystemsarego_ (~allsystem@188.27.167.90) Quit (Quit: Leaving)
[22:21] * diegows (~diegows@host63.186-108-72.telecom.net.ar) has joined #ceph
[22:24] <sjust> sagewk: wip-stats looks good
[22:24] <sagewk> k
[22:28] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:28] * markl (~mark@tpsit.com) Quit (Quit: leaving)
[22:28] * markl (~mark@tpsit.com) has joined #ceph
[22:38] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) has joined #ceph
[22:40] <gregaf> yehuda_hm: you've tested all of wip-rgw-next, right?
[22:40] <yehuda_hm> gregaf: yes
[22:42] * rudolfsteiner (~federicon@200.69.33.194) has joined #ceph
[22:48] * rudolfsteiner (~federicon@200.69.33.194) Quit (Quit: rudolfsteiner)
[22:50] <gregaf> yehuda_hm: hmm, I'm getting a spurious warning about uninit variables on the time parsing code, squashing initialization into that patch and merging
[22:53] <yehuda_hm> gregaf: cool, thanks
[22:57] * rudolfsteiner (~federicon@200.69.33.194) has joined #ceph
[22:57] <gregaf> yay happy clean branch
[22:57] <dmick> do not taunt happy clean branch
[23:03] * jakes (~oftc-webi@128-107-239-233.cisco.com) has joined #ceph
[23:03] <gregaf> I was just talking about the rgw branch I merged; yehuda_hm made it a lot easier to read than those branches have been lately :)
[23:03] <jakes> I need some help. What can be the reasons that all OSDs are up, but health detail says that 192 pgs are stuck stale?
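A sketch of the usual first steps for stuck-stale pgs: stale means the monitors have not received a recent status update from the pg's primary, so with all OSDs up it is worth checking which OSDs the stale pgs map to (standard ceph CLI; <pgid> is a placeholder such as 0.3f):

    ceph health detail | grep stale | head
    ceph pg dump_stuck stale
    # pick one of the listed pgs and see where it maps
    ceph pg map <pgid>
    # query the pg directly (this talks to the primary OSD and can hang
    # if the primary really is unreachable)
    ceph pg <pgid> query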
[23:13] * BManojlovic (~steki@mail.sansalvatore.ch) has joined #ceph
[23:19] * DarkAce-Z is now known as DarkAceZ
[23:28] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[23:29] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[23:29] * BillK (~BillK-OFT@58-7-149-55.dyn.iinet.net.au) has joined #ceph
[23:31] * orium (~fulano@a79-169-47-110.cpe.netcabo.pt) Quit (Quit: orium)
[23:31] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Remote host closed the connection)
[23:32] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[23:37] <gregaf> yehuda_hm: would still like some review of wip-rgw-versionchecks, especially the last commit (it returns the current tag and version for metadata put ops)
[23:40] <yehuda_hm> gregaf: sure
[23:40] <yehuda_hm> gregaf: we'll want to get it in soon, but it can probably be post -rc
[23:43] * smiley (~smiley@c-71-200-71-128.hsd1.md.comcast.net) Quit (Quit: smiley)
[23:44] <gregaf> k
[23:45] <gregaf> wait, don't we want it especially for the metadata sync?
[23:45] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[23:53] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[23:54] * mozg (~andrei@82.150.98.65) has joined #ceph
[23:55] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[23:55] * sjustlaptop (~sam@38.122.20.226) Quit (Remote host closed the connection)
[23:56] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[23:59] <sjust> sagewk: wip-cuttlefish-next has the patches which should probably be backported if you'd like to take a look

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.