#ceph IRC Log

Index

IRC Log for 2013-07-02

Timestamps are in GMT/BST.

[0:00] <dmick> set logging on
[0:00] <dmick> will go to './gdb.txt'
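
For reference, a minimal sketch of the gdb workflow being described here: attach to a running ceph-osd and dump every thread's backtrace to ./gdb.txt. The pid lookup is illustrative; if several osds run on the host, pass the specific pid instead.

    # attach, log all thread backtraces to ./gdb.txt, then detach
    gdb -p $(pidof ceph-osd) \
        -ex 'set logging file gdb.txt' \
        -ex 'set logging on' \
        -ex 'thread apply all bt' \
        -ex 'detach' -ex 'quit'
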
[0:01] * gaveen (~gaveen@175.157.192.219) Quit (Remote host closed the connection)
[0:04] <andrei> here is the output: http://ur1.ca/ei157
[0:04] <andrei> i might have missed several lines as i was manually copy/pasting
[0:06] <andrei> i also had this in dmesg 2 mins ago
[0:06] <andrei> [432364.926090] init: ceph-osd (ceph/7) main process (9823) killed by ABRT signal
[0:06] <andrei> [432364.926147] init: ceph-osd (ceph/7) main process ended, respawning
[0:07] <andrei> after that, the ceph osd.7 process is running, but ceph osd tree shows it as down
[0:10] <andrei> any idea what could be wrong with osd.7?
[0:13] <Psi-jack> heh, so I'm curious, is anyone here running Ceph with CentOS 6.4?
[0:13] <davidz> andrei: After you are in gdb on the process it is going to be declared down. Not surprising.
[0:18] <andrei> okay
[0:19] <jluis> sagewk, repushed
[0:19] <andrei> but i'm referring more to the general problem, not this particular instance of the process being killed
[0:19] <jluis> now, off to the nearest window for 5 minutes or so
[0:19] <Psi-jack> Hmm, wow... All this time... I thought my Ceph servers were using Deadline scheduler, but no... Apparently CFQ.. Ack!
[0:19] <andrei> my ceph cluster is down when i stop an osd server
[0:20] <sagewk> jluis: thanks, looks good! i'll pull it into master
[0:23] <dmick> andrei: I'm not sure from that trace, no. sjust, does anything look suspicious to you?
[0:23] <sjust> catching up
[0:23] <andrei> at the moment, osd.7 is having a bunch of slow requests
[0:23] <andrei> and ceph tell osd.7 version just hangs
[0:24] <davidz> andrei, sjust, dmick: Thread 42 is interesting… leveldb
[0:25] <dmick> yeah, and 49
[0:25] * sagelap1 (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[0:25] <dmick> and 55
[0:25] <dmick> but I don't know what it means
[0:26] <sjust> 42 appears to be reading a file for leveldb
[0:27] <sjust> 55 is waiting presumably on 42 to release the index lock
[0:27] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:27] <sjust> 49 is waiting, presumably on 42, to release the pg lock, not obviously a problem
[0:27] <sjust> so 42 seems to be the issue
[0:27] <sjust> filesystem?
[0:28] <andrei> i am using xfs
[0:28] * john_ (~john@astound-64-85-225-33.ca.astound.net) has joined #ceph
[0:28] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[0:28] <sjust> anything in dmesg?
[0:29] <sjust> if you let it continue and then interrupt it again, do the backtraces change?
[0:29] <andrei> i've noticed that after i've started the second osd server it took about 20-30 minutes for the ceph -s to become OK
[0:29] <andrei> now, osd.7 seems to come back to life
[0:29] <andrei> but for how long?
[0:30] <andrei> shall I do the trace once again?
[0:30] <andrei> i've done trace twice while osd.7 has been stuck
[0:30] <andrei> the second trace has been saved to a text file
[0:32] <andrei> thread 42 looks similar
[0:33] * drokita1 (~drokita@199.255.228.128) has joined #ceph
[0:33] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Connection reset by peer)
[0:33] <andrei> yeah, it looks pretty similar
[0:33] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[0:33] * ChanServ sets mode +o scuttlemonkey
[0:34] <andrei> dmesg doesn't show anything useful
[0:34] <andrei> this thing started when i stopped osd processes on the second osd server
[0:34] <andrei> i have 2 osd servers
[0:34] <andrei> with 8 osds each
[0:35] <andrei> and failure domain set to host
[0:37] <andrei> sjust: i've now done the same thing - stopped the second osd server
[0:37] <andrei> and osd.7 started hanging and doing similar thing
[0:37] <andrei> will do the trace once again
[0:38] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[0:38] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Read error: Connection reset by peer)
[0:38] * LeaChim (~LeaChim@90.221.247.164) Quit (Ping timeout: 480 seconds)
[0:39] * sebastiandeutsch (~sebastian@p57A07A27.dip0.t-ipconnect.de) has joined #ceph
[0:40] * danieagle (~Daniel@186.214.56.159) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[0:40] <andrei> sjust: yeah, same thing, but thread numbering is different
[0:40] <sjust> andrei: that suggests that they are making progress
[0:41] <sjust> let it run?
[0:41] <sjust> if you don't hit a thread pool timeout, it's just slow
[0:42] <andrei> sjust: the trouble is my ceph cluster is down while this is happening
[0:42] <andrei> all vms hang
[0:42] <andrei> and i can't afford this to happen every time i restart one of the osd servers
[0:43] <andrei> ceph -s shows: health HEALTH_WARN 1633 pgs degraded; 159 pgs peering; 159 pgs stuck inactive; 1792 pgs stuck unclean; recovery 418253/904350 degraded (46.249%); 8/16 in osds are down; noout flag(s) set
[0:43] <andrei> it seems that peering takes ages
[0:44] <andrei> or something else is playing up
[0:45] * drokita1 (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[0:45] * sebastiandeutsch (~sebastian@p57A07A27.dip0.t-ipconnect.de) Quit (Quit: sebastiandeutsch)
[0:45] <andrei> sjust: and I do not have activity on osd.7
[0:45] <sjust> how many of the peering pgs have a primary on osd 7?
[0:45] <andrei> sjust: how can I check?
[0:45] <sjust> ceph pg dump
[0:45] <sjust> acting set [2,4,5] has 2 as primary and 4,5 as replicas
[0:46] <andrei> ceph pg dump |grep -e "\[7\]" |wc -l
[0:46] <andrei> 323
[0:47] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:47] <andrei> is that what you've asked for?
[0:47] <andrei> or did I misunderstand you?
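
A hedged one-liner for what sjust is asking, assuming the cuttlefish-era ceph pg dump layout where the pg state is one column and the acting set prints as a bracketed list whose first entry is the primary:

    # count peering pgs whose acting set has osd.7 as primary, e.g. [7,14]
    ceph pg dump | grep peering | grep -c '\[7,'
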
[0:49] <andrei> sjust: the cluster just came back to life
[0:49] * mxmln3 (~maximilia@212.79.49.65) has joined #ceph
[0:50] <andrei> and osd.7 became responsive again and answers to the version request
[0:50] <sjust> andrei: I meant, how many of the peering pgs are on osd7
[0:51] <andrei> it took just over 10 minutes from the time i've stopped all osds on one of the servers until ceph cluster became available again
[0:51] <sjust> what version are you running?
[0:51] <andrei> 0.61.4
[0:51] <andrei> on ubuntu 12.04 with 3.8 kernel from backports
[0:53] * mxmln (~maximilia@212.79.49.65) Quit (Ping timeout: 480 seconds)
[0:53] <andrei> andrei: I meant, how many of the peering pgs are on osd7 <- sorry, the peering has finished. Not sure how to check now
[0:54] <sjust> http://tracker.ceph.com/issues/5084
[0:54] <sjust> I think your problem is related to that one
[0:54] <sjust> we have some work in master which may help, but it can't be backported to cuttlefish
[0:55] * markl (~mark@tpsit.com) Quit (Read error: Connection reset by peer)
[0:56] <andrei> sjust: thanks
[0:56] <andrei> it does look like my issues
[0:56] <andrei> do you know what version would fix this issue?
[0:56] <sjust> andrei: it would mean upgrading to master, you can't really downgrade to cuttlefish
[0:57] <sjust> andrei: you should not do that unless your cluster is expendable
[0:57] <andrei> sjust: to answer your earlier question, it looks like all peering is happening on osd.7
[0:57] <sjust> andrei: that sounds like osd.7 may simply be slow
[0:57] <sjust> you might consider replacing the disk
[0:57] <andrei> sjust: nope, sorry, got some valuable vms on it
[0:57] <andrei> can't kill it
[0:57] <sjust> or running diagnostics
[0:58] <andrei> sjust: i've done that
[0:58] <andrei> i can't find any issues with it
[0:58] <andrei> done smart and dd tests
[0:58] <sjust> can you post the output of ceph pg dump
[0:58] <andrei> no errors
[0:58] <sjust> ?
[0:58] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[0:58] <andrei> speeds are the same as all other osds
[0:58] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[0:58] <davidz> andrei: If you are searching for PGs with [7] then you have a single replica for this data. This means that while any OSDs are down then some portion of your data is offline. Not sure that is a practical way to run rbd. Also, if you lost osd.7 due to disk failure you've lost VMs.
[0:58] <andrei> sure, one sec
[0:59] <andrei> how come?
[0:59] <andrei> davidz: my failure domain is set to host, so should I not have replication to other osds?
[0:59] <sjust> davidz: if I am understanding this correctly, he's got two hosts
[0:59] <andrei> sjust: yeah
[0:59] <andrei> 2 hosts
[0:59] <andrei> 8 osds each
[0:59] <sjust> when he fails the second host, he naturally ends up with 1 remaining osd
[0:59] <sjust> that's normal
[0:59] <andrei> sjust: that's the idea
[0:59] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[0:59] * ChanServ sets mode +v andreask
[0:59] <davidz> I think he only has 1 replica.
[1:00] <andrei> i would like to be able to restart one of the servers
[1:00] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[1:00] <andrei> my replica is set to 2
[1:00] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[1:00] <andrei> i've not changed that from default
[1:01] <andrei> sjust: my ceph pg dump is around 2k lines
[1:01] <andrei> should I upload it somewhere?
[1:01] <davidz> andrei: I expected your ceph pg dump to show locations like [7,X] ….
[1:02] <sjust> andrei: you could upload it to cephdrop
[1:02] <davidz> andrei: ceph pg dump | grep "\[7" | head
[1:02] <andrei> sjust, sure
[1:03] <andrei> sjust: could you remind me of the login details please
[1:03] <andrei> davidz: yeah, i have things like [7,14] [7,14]
[1:03] <andrei> and [7,13] [7,13]
[1:03] * tziOm (~bjornar@ti0099a340-dhcp0745.bb.online.no) Quit (Remote host closed the connection)
[1:03] <davidz> andrei: ok good
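
To double-check the replication factor itself, the pool settings can be read straight from the osd map; the "rep size" wording below matches cuttlefish-era output, and "rbd" is just the default pool name:

    # show replication per pool
    ceph osd dump | grep 'rep size'
    # raise it if a pool were really at size 1 (pool name illustrative)
    # ceph osd pool set rbd size 2
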
[1:05] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has left #ceph
[1:06] <andrei> sjust: not sure if this relates at all, but earlier last week i had an issue with 3 osds (numbers 0, 2 and 4)
[1:06] <andrei> which were flapping
[1:06] * jlogan1 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:06] <andrei> as a result my cluster was down
[1:06] <andrei> the osd processes were filling up disk space like crazy
[1:06] <dmick> ah yes; I remember opining that "flapping" wasn't a very useful term, and that knowing what was actually happening to the OSDs was important, but I don't remember what happened after that
[1:06] <andrei> about 2gb per hour
[1:07] <andrei> dmick: tnt has suggested that i should set weight of the flapping osds to 0
[1:07] <andrei> and let ceph rewrite data onto non-flapping osds
[1:07] <andrei> which has helped to bring back my cluster
[1:07] <andrei> the flapping osds are still with 0 weight
[1:07] <andrei> not sure what to do with them
[1:08] <dmick> I stand by what I recommended, which was "find out why they're not stable"
[1:08] <andrei> it has been a hell of a week ))
[1:08] <andrei> dmick: I do not see anything physically wrong with those osds
[1:08] <andrei> i've done many tests
[1:08] <andrei> and all are normal
[1:08] <andrei> so, it's got to be ceph related
[1:09] <andrei> and I am not an expert with ceph as you've gathered by now
[1:09] <dmick> I'm not saying it's not; I'm just saying "flapping" is nearly content-free
[1:09] <andrei> i've installed ceph about a month ago
[1:09] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[1:09] <dmick> perhaps it means there was some SEGV? in which case stack backtraces/cores can be investigated?
[1:09] <dmick> or assert failures in the logs?
[1:09] <dmick> basic investigation
[1:10] * matthewh (~matthewh@114.134.169.190) has joined #ceph
[1:10] <andrei> dmick: i did have segv on one of the osds
[1:10] <andrei> as per dmesg output
[1:10] <andrei> [91667.310626] init: ceph-osd (ceph/4) main process (20439) killed by SEGV signal
[1:11] <andrei> this osd was one of the osds which was going up and down all the time
[1:11] <dmick> so there will be more information about why that process died with a SEGV
[1:11] <dmick> that should be discovered and analyzed
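
A minimal sketch of preparing for that analysis: make sure the next SEGV actually leaves a core behind. The paths here are illustrative, and the daemon would need to be restarted under the raised limit:

    # allow unlimited-size core dumps and give them a predictable home
    ulimit -c unlimited
    mkdir -p /var/crash
    echo '/var/crash/core.%e.%p' > /proc/sys/kernel/core_pattern
    # after the next crash, open the core against the binary:
    # gdb /usr/bin/ceph-osd /var/crash/core.ceph-osd.<pid>
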
[1:11] <andrei> dmick: i had to wipe the logs as it was quickly filling up my partition
[1:12] <andrei> (((
[1:12] * matthewh (~matthewh@114.134.169.190) Quit ()
[1:12] <andrei> sjust: does the output of ceph pg dump give you any clues?
[1:13] <sjust> andrei: no
[1:13] <davidz> dmick: Maybe we should add an admin interface request to generate a core dump for all daemons. That way you can examine stack traces without having to drop daemons into gdb.
[1:14] <sjust> davidz: gdb seems like it's easy enough
[1:14] <andrei> sjust: so, from what i've gathered, my issue with slow peering on osd.7 is causing the ceph cluster to hang
[1:14] <andrei> and that is a known bug
[1:14] <davidz> sjust: but I always feel rushed because the daemon can be declared down if you hang out too long.
[1:15] <sjust> davidz: true
[1:15] <sjust> andrei: sort of, your version is pretty strange
[1:15] <andrei> do you have any idea when the fix for it will be available in the stable release?
[1:15] <sjust> andrei: it won't be available in cuttlefish, it will have to wait for dumpling
[1:16] <andrei> sjust: and when is dumpling release date? Any estimates?
[1:16] <sjust> I think there is info on the website
[1:16] <dmick> davidz: isn't there something to do that anyway?...gcore?...
[1:17] * tnt (~tnt@109.130.72.62) Quit (Ping timeout: 480 seconds)
[1:17] <andrei> to answer my question, August 2013
[1:17] <andrei> a long time away ((
[1:18] <andrei> sjust: is there a temp workaround that I can use to address this problem?
[1:18] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[1:19] <andrei> sjust: this issue seems to affect only osd.7
[1:20] <sagewk> joshd: around?
[1:20] <davidz> dmick: yes, gcore should work. We should try to remember that next time. I'm so used to gdb myself.
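
gcore ships with gdb and snapshots a live process to a core file without keeping it stopped for long, so the daemon is less likely to be declared down; the pid lookup is illustrative:

    # write /tmp/ceph-osd.<pid> while the daemon keeps running
    gcore -o /tmp/ceph-osd $(pidof ceph-osd)
    # analyze offline at leisure:
    # gdb /usr/bin/ceph-osd /tmp/ceph-osd.<pid>
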
[1:20] <andrei> i have now restarted another osd server and it took 3 seconds for the cluster to become available
[1:20] <andrei> all pgs got peered pretty fast
[1:22] <sjust> andrei: you have exactly two osd servers, correct?
[1:22] <sjust> so you restarted the other one and did not see a problem?
[1:22] <andrei> sjust: yeah, only 2
[1:22] <andrei> sjust: that's correct
[1:22] <andrei> the other one did not cause any issues
[1:22] <sjust> specifically, you just now restarted the host with osd.7 on it?
[1:23] <andrei> yeah, that is correct
[1:23] <andrei> whereas before I was restarting the server with osds 9-16
[1:23] <sjust> try now to restart the host without osd.7 on it again and confirm the bad behavior
[1:23] <andrei> sjust: give me a few minutes
[1:24] <andrei> i have done this 3 times today, but can do it again
[1:24] <andrei> all 3 times had identical behaviour
[1:24] <sjust> I'll be back in a little bit
[1:24] <sjust> hour or two
[1:24] <andrei> sjust: i will go to bed soon
[1:24] <andrei> it's getting late
[1:24] <sjust> k, i'll check in tomorrow
[1:25] <andrei> what time will you be around?
[1:25] <sjust> 10-5 or so PST
[1:25] <andrei> i can post my results here before leaving
[1:25] <sjust> that also works
[1:26] <andrei> sjust: thanks for your help!!!
[1:27] * dw (~dmw@188.40.84.149) Quit (Remote host closed the connection)
[1:34] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[1:34] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[1:42] * phillipsbot (~daniel@mail.phunq.net) has joined #ceph
[1:47] * nwat (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:47] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[2:00] * mschiff (~mschiff@81.92.22.210) Quit (Remote host closed the connection)
[2:14] <Qu310> anyone around who may have used the chef cookbook to deploy ceph? i'm getting a weird issue: it keeps trying to put my interface names in the mon host section, ie "mon host = lo, eth0, bond0, br0, eth0, lo, br0, bond0", and the chef client log says, "ERROR: execute[peer ["lo", "eth0", "bond0", "br0"]] (ceph::mon line 87) had an error: Expected process to exit with [0], but received '255'"
[2:17] <davidz> paravoid: I was not able to reproduce what you saw. I updated from v0.61 to the same 0.65 release (946a838).
[2:28] <andrei> sjust: after restarting the server, osd.7 had similar issues. so it does look like there are some issues with osd.7
[2:33] * phillipsbot (~daniel@mail.phunq.net) Quit (Remote host closed the connection)
[2:37] * rturk is now known as rturk-away
[2:43] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[2:48] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[2:51] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[2:53] * sagelap (~sage@2600:1012:b01a:dbf5:59ef:e19:c22c:ec80) has joined #ceph
[2:55] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[2:55] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[2:57] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Bye!)
[3:18] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[3:18] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[3:27] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[3:29] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[3:29] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[3:30] * markbby (~Adium@168.94.245.3) has joined #ceph
[3:31] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has left #ceph
[3:39] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[3:41] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) Quit (Read error: Connection timed out)
[3:41] * mikedawson (~chatzilla@c-68-58-243-29.hsd1.sc.comcast.net) Quit (Ping timeout: 480 seconds)
[3:42] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[3:47] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[3:52] * zhangbo (~zhangbo@221.226.39.82) has joined #ceph
[3:54] <Psi-jack> Well, blasted. :)
[4:02] * markbby (~Adium@168.94.245.3) Quit (Ping timeout: 480 seconds)
[4:08] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Quit: Leaving.)
[4:30] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:39] * grifferz (~andy@specialbrew.392abl.bitfolk.com) Quit (Remote host closed the connection)
[4:43] * portante is now known as portante|afk
[4:46] * portante|afk is now known as portante
[4:51] * portante is now known as portante|afk
[4:54] * grifferz (~andy@specialbrew.392abl.bitfolk.com) has joined #ceph
[5:00] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[5:00] * fireD (~fireD@93-139-180-220.adsl.net.t-com.hr) has joined #ceph
[5:03] * portante|afk is now known as portante
[5:06] * fireD1 (~fireD@93-142-205-151.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:08] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:19] * julian (~julianwa@125.69.104.220) has joined #ceph
[5:38] * AfC (~andrew@jim1020952.lnk.telstra.net) has joined #ceph
[5:38] * AfC (~andrew@jim1020952.lnk.telstra.net) Quit ()
[5:53] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[5:54] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[5:58] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[6:01] * Qu310 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: Connection reset by peer)
[6:01] * Qu310 (~Qten@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[6:06] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[6:06] * ChanServ sets mode +v andreask
[6:07] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has left #ceph
[6:08] <Qu310> when trying to use ceph-deploy, I keep finding the client.admin key isn't being created when i deploy the mons, and i can't seem to create it manually either, since connecting to the cluster to create it already requires a key
[6:10] * horsey (~horsey@27.34.251.98) has joined #ceph
[6:16] * sagelap (~sage@2600:1012:b01a:dbf5:59ef:e19:c22c:ec80) Quit (Ping timeout: 480 seconds)
[6:17] * julian (~julianwa@125.69.104.220) Quit (Read error: Connection reset by peer)
[6:18] * julian (~julianwa@125.69.104.220) has joined #ceph
[6:20] * yehudasa_ (~yehudasa@2602:306:330b:1410:ea03:9aff:fe98:e8ff) has joined #ceph
[6:21] * julian_ (~julianwa@125.69.104.220) has joined #ceph
[6:26] * julian (~julianwa@125.69.104.220) Quit (Ping timeout: 480 seconds)
[6:28] * zhangjf_zz2 (~zjfhappy@222.128.1.105) has joined #ceph
[7:08] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:15] * horsey_ (~horsey@203.92.58.165) has joined #ceph
[7:17] * horsey (~horsey@27.34.251.98) Quit (Ping timeout: 480 seconds)
[7:27] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:41] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[7:49] <dmick> Qu310: usually that means something's going wrong with the ceph-create-keys job
[7:50] <dmick> things to check: are the mon procs running? is ceph-create-keys logging something (to a standard place, like /var/log/syslog or /var/log/upstart/<mumble>)?
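
A few concrete checks along those lines, assuming the upstart job names cuttlefish ships on ubuntu; log locations vary by distro:

    # are the mon and the key-creation job running?
    initctl list | grep ceph
    ps aux | grep '[c]eph-create-keys'
    # watch what ceph-create-keys logs while it waits for quorum
    tail -f /var/log/upstart/ceph-create-keys*.log
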
[7:53] <zhangbo> libcephfs-test.jar is not found when i rpmbuild ceph. why?
[7:56] * tnt (~tnt@109.130.72.62) has joined #ceph
[8:04] <dmick> zhangbo: maybe this line from java/Makefile.am?
[8:04] <dmick> # build the tests if *both* --enable-cephfs-java and --with-debug were specifed
[8:08] * mschiff (~mschiff@81.92.22.210) has joined #ceph
[8:10] <Qu310> dmick: i aborted the ceph-deploy and went back to the chef cookbook, i think the issues i'm having are "linked" however, i keep getting in chef log, ERROR: execute[peer 10.100.96.40:6789] (ceph::mon line 87) had an error: Expected process to exit with [0], but received '152' ---- Begin output of ceph --admin-daemon '/var/run/ceph/ceph-mon.prbd01.asok' add_bootstrap_peer_hint 10.100.96.40:6789
[8:10] <Qu310> ----STDOUT: STDERR: read 32531 length from /var/run/ceph/ceph-mon.prbd01.asok failed with (104) Connection reset by peer
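
When the admin socket resets like that, querying it directly shows whether the mon process is healthy at all; the socket path is taken from the error message above:

    # ask the monitor for its state over its admin socket
    ceph --admin-daemon /var/run/ceph/ceph-mon.prbd01.asok mon_status
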
[8:11] <zhangbo> i have *both* --enable-cephfs-java and --with-debug in ceph.spec
[8:14] <zhangbo> i commented out "%{_javadir}/libcephfs-test.jar" in ceph.spec, otherwise rpmbuild will not run
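
Going by that Makefile.am comment, the test jar is only built when both options are passed to configure, so the spec's %build would need something roughly like this (flag names from the comment; the snippet is a sketch, not the shipped spec):

    %build
    # both flags are required for libcephfs-test.jar to be produced
    ./configure --enable-cephfs-java --with-debug
    make %{?_smp_mflags}
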
[8:16] <Qu310> if i try a ceph -s i get "2013-07-02 16:16:00.032546 7fa7af852700 0 -- :/25643 >> 10.100.96.42:6789/0 pipe(0x7fa7a0000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1).fault"
[8:16] * hujifeng (~hujifeng@221.226.39.82) has joined #ceph
[8:16] * haomaiwa_ (~haomaiwan@notes4.com) has joined #ceph
[8:17] * haomaiwang (~haomaiwan@117.79.232.249) Quit (Read error: Connection reset by peer)
[8:17] <hujifeng> hello zhangbo
[8:17] <zhangbo> hello hujifeng
[8:20] <hujifeng> anyone use chef cookbook to deploy ceph?
[8:21] <Qu310> hujifeng: haven't been able to get it to work myself yet
[8:21] <hujifeng> oh,bad news
[8:21] <hujifeng> hope you can
[8:20] <zhangbo> I have installed gperftools-devel, but rpmbuild can't find tcmalloc. why?
[8:31] * tchmnkyz (~jeremy@0001638b.user.oftc.net) has joined #ceph
[8:32] <tchmnkyz> hey guys. i am seeing an issue after upgrading to 61.4
[8:32] <tchmnkyz> i keep losing 2 OSDs randomly
[8:32] <paravoid> davidz: I'm here now but I'm guessing you're in PDT and not around anymore? :)
[8:35] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[8:35] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[8:37] <tchmnkyz> this was really strange. i had to do a complete reboot of everything in my ceph cluster to stabilize things.
[8:37] <tchmnkyz> i went home at 5pm CDT today and things were fine. about 10PM CDT my ceph cluster freaked out.
[8:37] <tchmnkyz> after rebooting all of the OSD/MON's everything looks fine now
[8:38] <tchmnkyz> anyone else seen that on 61.4?
[8:51] * mschiff (~mschiff@81.92.22.210) Quit (Remote host closed the connection)
[8:57] * bergerx_ (~bekir@78.188.101.175) has joined #ceph
[8:58] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:58] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[8:58] <davidz> paravoid: going to sleep soon.
[8:59] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[8:59] <paravoid> :)
[8:59] <davidz> paravoid: Did you check the versions of your daemons?
[8:59] <paravoid> I did, I replied to the bug report too
[8:59] <paravoid> 0.61.3 everywhere as reported
[9:00] <paravoid> the new OSDs were on the version I reported too; they're not running now but I previously copy/pasted the banner
[9:00] <paravoid> from the OSD booting up
[9:00] <paravoid> which has the timestamp, its version and pid
[9:04] <davidz> paravoid: I'll sleep on it and wake up with an answer. Well, maybe…. Strange that it is working for me. I only brought up a single new OSD today, though.
[9:05] <davidz> good night!
[9:07] <paravoid> rest well!
[9:08] * joshd1 (~joshd@2602:306:c5db:310:6996:4df7:648d:7b25) Quit (Ping timeout: 480 seconds)
[9:10] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[9:10] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[9:10] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[9:11] * bwesemann (~bwesemann@2001:1b30:0:6:9c59:3517:55a:64af) has joined #ceph
[9:19] <vipr> joelio: I'm using the same, what does your cluster look like?
[9:19] <vipr> I'll do some tests today and see if I can verify your results
[9:28] * markit (~marco@151.78.74.112) has joined #ceph
[9:34] * tnt (~tnt@109.130.72.62) Quit (Ping timeout: 480 seconds)
[9:36] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:39] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:43] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:45] * haomaiwang (~haomaiwan@106.120.176.105) has joined #ceph
[9:45] * haomaiwa_ (~haomaiwan@notes4.com) Quit (Read error: Connection reset by peer)
[9:47] * leseb (~Adium@83.167.43.235) has joined #ceph
[9:48] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:48] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) has joined #ceph
[9:55] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[9:58] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[9:58] <niklas> Hi! What do I do, if ceph-deploy osd prepare does not finish?
[9:58] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:58] <niklas> I started it yesterday in the evening and killed it just now
[9:59] <niklas> ceph.log says:
[9:59] <niklas> 2013-07-01 18:11:02,723 ceph_deploy.osd DEBUG Preparing cluster ceph disks cs-bigfoot05:/dev/sdaa:
[9:59] <niklas> 2013-07-01 18:11:02,953 ceph_deploy.osd DEBUG Deploying osd to cs-bigfoot05
[9:59] <niklas> 2013-07-01 18:11:02,988 ceph_deploy.osd DEBUG Host cs-bigfoot05 is now ready for osd use.
[9:59] <niklas> 2013-07-01 18:11:02,988 ceph_deploy.osd DEBUG Preparing host cs-bigfoot05 disk /dev/sdaa journal None activate False
[10:00] <niklas> also after zapping the disks, ceph-deploy disk list still finds xfs on about half of them
[10:00] <niklas> even after I manually created an ext3 partition and zapped it again
[10:09] * LeaChim (~LeaChim@90.221.247.164) has joined #ceph
[10:09] * stacker666 (~stacker66@85.61.185.94) has joined #ceph
[10:09] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[10:12] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) Quit (Quit: Ex-Chat)
[10:18] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[10:18] <ofu> whois niklas
[10:18] <niklas> thats me :p
[10:18] <ofu> niklas: what kind of os are you using?
[10:18] <niklas> debian squeeze
[10:19] <ofu> ceph-deploy uses a non-root user and needs ssh-keys, if i understand correctly
[10:20] <niklas> no, wait. Its wheezy
[10:20] <ofu> can you ssh to all your nodes and login as user ceph (or whatever you choose)? does this user have sudo rights to do root-user stuff?
[10:20] <niklas> sec
[10:21] <niklas> yes
[10:22] <niklas> Also on the osd-host I can see something being done in htop
[10:22] <niklas> /bin/sh /usr/sbin/ceph-disk-prepare -- /dev/sdq
[10:22] <niklas> /usr/bin/python /usr/sbin/ceph-disk prepare -- /dev/sdq
[10:23] <niklas> but it just does not finish
[10:23] <ofu> ceph-deploy works for my mons, but not for my osds, i created all the osd stuff by hand
[10:23] <niklas> and does not take any cpu or ram
[10:23] <ofu> partitions, mkfs, mount... and then I did ceph-osd --mkfs --mkjournal --mkkeys...
[10:25] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (Read error: Operation timed out)
[10:25] * gary (~gary@217.33.61.67) has joined #ceph
[10:25] <niklas> That seems to be rather ugly…
[10:26] <ofu> well, works for me.. and I know and understand what i am doing there
[10:26] <niklas> Is there a guide for manual osd configuration?
[10:29] <ofu> what I did can be found here: http://dedi3.fuckner.net/~ofu/ceph/
[10:31] <ofu> are you using hw raid?
[10:31] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[10:31] <niklas> nope
[10:32] <niklas> well, as my raid controller can't do jbod, I have 45 raid0…
[10:32] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[10:32] <niklas> but no, not really
[10:32] <ofu> so you have a rnt bigfoot with megaraid controller?
[10:33] <niklas> yep
[10:33] <ofu> thats what I expected when I saw the name bigfoot ;-)
[10:34] <ofu> i dont like megaraid, they are totally confusing
[10:35] <niklas> Yep
[10:36] <ofu> i am using lsi hbas right now, but journals to SSDs is way too slow
[10:39] <gary> Folks, any idea why I'm seeing this clock skew message and is it anything to worry about? (the clocks are the same): sudo ceph -s health HEALTH_WARN clock skew detected on mon.xyzserver2 monmap e1: 2 mons at {xyzserver1=xx.xx.xx.xx:6789/0,xyxserver2=xx.xx.xx.xx:6789/0}, election epoch 6, quorum 0,1 xyzserver1,xyzserver2 osdmap e41: 6 osds: 6 up, 6 in pgmap v163: 248 pgs: 248 active+clean;
[10:39] <gary> 9921 bytes data, 209 MB used, 11158 GB / 11158 GB avail mdsmap e4: 1/1/1 up {0=xyzserver1=up:active}
[10:40] <tchmnkyz> i just wish someone could tell me why i am seeing poor performance and OSD drops on 61.4
[10:40] <tchmnkyz> everything was fine on 61.3
[10:40] <tchmnkyz> the update this am seems to have caused problems
[10:43] * sleinen (~Adium@2001:620:0:26:b541:ba15:2940:1caf) has joined #ceph
[10:47] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[10:55] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[10:57] <loicd> ccourtaut: good morning sir :-)
[10:57] <loicd> ccourtaut: I would very much appreciate a review of https://github.com/dachary/ceph/commit/d00a85fa54aee75a1fcb4a0a3ee9b0fa7768f539 if you have time
[11:06] <joelio> gary: you need ntp in sync
[11:07] <joelio> use server {name} iburst
[11:07] <joelio> if you want it to aquiesce sooner. Or set via ntpdate to force.
[11:07] <joelio> If cephx is as kerberos-like as it says, timing is important
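
Along the lines joelio suggests, a minimal /etc/ntp.conf fragment plus a sanity check; the pool hostnames are placeholders:

    # /etc/ntp.conf - iburst speeds up the initial sync
    server 0.pool.ntp.org iburst
    server 1.pool.ntp.org iburst

    # then verify on every mon host that an upstream is selected (marked *)
    ntpq -p
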
[11:22] <vipr> joelio: what are your dd commands, i'm getting high speeds
[11:22] <vipr> are you running the rbd client cache?
[11:22] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[11:26] <joelio> vipr: it's only for the first few minutes it seems. I think it's just the snapshot state being updated. I get about 120MB/s inside a VM with no caching
[11:27] <joelio> just a plain dd if=/dev/zero of=test.bin bs=1M count=5000 (only 1GB RAM in the VMs, so I do an echo "3" > /proc/sys/vm/drop_caches)
[11:27] <joelio> guess I could add a direct flag to dd too
[11:28] <joelio> or use something better :)
[11:28] <joelio> it's not a real issue I guess. Just trying to understand
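
A hedged variant of that dd test which bypasses the guest page cache, so the figure reflects the rbd backend rather than RAM; the file name is arbitrary:

    # O_DIRECT write test, and a flushed variant for comparison
    dd if=/dev/zero of=test.bin bs=1M count=5000 oflag=direct
    dd if=/dev/zero of=test.bin bs=1M count=5000 conv=fdatasync
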
[11:38] <vipr> joelio: What does your ceph cluster look like?
[11:38] <joelio> 36 OSDs (6 per node, 6 nodes) 10Gbit
[11:39] <joelio> no ssd's (yet)
[11:39] <vipr> aha
[11:39] <vipr> nice
[11:39] <vipr> the ssd's will probably improve performance quite a bit
[11:39] <vipr> i'm on 9 OSDs, 3 per machine, 1Gbit
[11:40] <joelio> I didn't have enough to make it worthwhile, so waiting until more budget available
[11:40] <vipr> getting around 40MB/s
[11:40] <vipr> The cluster isn't idle when testing however
[11:41] <vipr> do you have a dedicated backbone for replication?
[11:46] <jluis> gary, the monitors are very sensitive to clock skews
[11:46] <jluis> they can cause all sort of mayhem on the monitors
[11:47] <jluis> solution is what joelio said: get them synced
[11:47] <markit> joelio: I'm interested in the 10 gb stuff, what nic and switch do you use? I've a limited budget, hope you don't point me to a $20K one ;P 10Gb over copper or what?
[11:47] * stacker666 (~stacker66@85.61.185.94) Quit (Ping timeout: 480 seconds)
[11:49] <markit> vipr: isn't 40MB/s a bit slow? 1Gbit should saturate at 80-90MB/s, what is your bottleneck? what disks do you use?
[11:49] <markit> I'm trying to understand if JBOD sata drives and 1Gbit can run fast enough
[12:03] <vipr> markit: i'm using 7200RPM seagate disks and intel ssds
[12:04] <vipr> Mind that the cluster is not idle however
[12:04] <vipr> Constant backups are being written to it
[12:04] <vipr> so I guess that's causing the low performance
[12:09] * zhangbo (~zhangbo@221.226.39.82) Quit (Remote host closed the connection)
[12:17] <markit> vipr: well sure!
[12:18] <markit> don't you use bonding and 2 x 1gbit interface? I mean, did you tried and had problems?
[12:26] <joelio> markit: We use Intel 82599EB NIC's with TwinAX connected to a nexus fabric extender. The extender itself is cheap and gives us lots of 10g ports, but the 'brains' is quite pricey. You can get reasonably priced 10G switches now - as we already had infra, this was the cheapest route
[12:41] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[12:42] * LiRul (~lirul@91.82.105.2) has joined #ceph
[12:43] <LiRul> hi
[12:43] <LiRul> is there any method to run two separate radosgw (with different .rgw* pools) on same ceph cluster?
[12:44] * zhangjf_zz2 (~zjfhappy@222.128.1.105) Quit (Remote host closed the connection)
[12:51] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[12:55] <zapotah_> just to make things clear: if you use the ceph network filesystem and mount it on a host, is the filesystem cluster-aware, in the sense that if you mount it on host a from osd a and on host b from osd b and do simultaneous writes, there won't be corruption?
[12:59] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Read error: Connection reset by peer)
[12:59] <joelio> zapotah_: define simultaneous writes. It's a POSIX compliant filesystem, so I assume it honours file locking.
[13:00] * joelio not an MDS user -< caveat
[13:00] <vipr> markit: haven't set up bonding yet
[13:00] <vipr> I'm gonna check if the 1gbit link is saturated
[13:05] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[13:14] <niklas> I got my cluster up and running, but now it tells me "health HEALTH_WARN 4 pgs degraded; 192 pgs stuck unclean", and does not seem to do anything about it…
[13:15] <niklas> pgmap v529: 192 pgs: 174 active, 14 active+remapped, 4 active+degraded;
[13:16] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[13:17] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[13:17] <ofu> niklas: wait and see if it goes away,
[13:17] <ofu> or you can force rebuilding the pgs
[13:17] <markit> vipr: do you use Proxmox pve by chance?
[13:17] <niklas> ofu: how long should I wait?
[13:17] <niklas> on my VM-Testcluster it took about 3 Minutes or so
[13:19] <ofu> a few minutes should be long enough, there is no data in there
[13:19] <ofu> try forcing it with: for i in `ceph pg dump |grep degraded | awk '{print $1}'`; do ceph pg force_create_pg $i; done
[13:19] <niklas> thats what I thought
[13:28] <vipr> markit: No, just a ceph test cluster and a debian machine with qemu
[13:37] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[13:41] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[13:43] * Machske2 (~bram@105.227-241-81.adsl-static.isp.belgacom.be) has joined #ceph
[13:43] * hujifeng (~hujifeng@221.226.39.82) Quit (Quit: Leaving)
[13:48] <markit> vipr: do you use the ssd as journal and OS disk? I mean, did you installed the OS on ssd?
[13:57] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: Operation timed out)
[13:57] <niklas> ofu: tried to force_create_pg for one degraded pg:
[13:57] <niklas> health HEALTH_WARN 3 pgs degraded; 1 pgs stuck inactive; 192 pgs stuck unclean
[13:57] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[14:02] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[14:05] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:06] * nhm (~nhm@65-128-142-169.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[14:06] * mschiff (~mschiff@tmo-111-83.customers.d1-online.com) has joined #ceph
[14:09] * nhm (~nhm@184-97-193-106.mpls.qwest.net) has joined #ceph
[14:12] <niklas> pgmap v530: 192 pgs: 1 creating, 174 active, 14 active+remapped, 3 active+degraded
[14:12] <niklas> stuck for 20 minutes now
[14:18] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[14:22] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:35] <vipr> markit: No, ssd's are solely being used as journals
[14:39] <gary> sudo ceph -s = health OK now :-). The firewall was blocking the ntp port! Thanks all.
[14:40] <joelio> ntpq -p is your friend ;)
[14:41] * dosaboy (~dosaboy@host86-164-218-7.range86-164.btcentralplus.com) Quit (Read error: Operation timed out)
[14:42] * dosaboy (~dosaboy@host86-164-137-63.range86-164.btcentralplus.com) has joined #ceph
[14:47] * dosaboy_ (~dosaboy@host86-161-166-222.range86-161.btcentralplus.com) has joined #ceph
[14:47] * dosaboy_ (~dosaboy@host86-161-166-222.range86-161.btcentralplus.com) Quit ()
[14:47] * dosaboy_ (~dosaboy@host86-161-166-222.range86-161.btcentralplus.com) has joined #ceph
[14:50] * dosaboy (~dosaboy@host86-164-137-63.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[14:53] * joshd1 (~joshd@108-93-176-49.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[14:54] * TiCPU (~jeromepou@190-130.cgocable.ca) Quit (Quit: Ex-Chat)
[14:57] * stacker666 (~stacker66@85.61.185.94) has joined #ceph
[14:59] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Quit: Leaving)
[15:00] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:13] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Remote host closed the connection)
[15:19] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[15:21] * julian_ (~julianwa@125.69.104.220) Quit (Quit: afk)
[15:23] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:24] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Remote host closed the connection)
[15:25] * Maskul (~Maskul@host-89-241-165-148.as13285.net) has joined #ceph
[15:28] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:29] * mikedawson (~chatzilla@c-68-58-243-29.hsd1.sc.comcast.net) has joined #ceph
[15:35] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[15:36] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[15:46] * dosaboy (~dosaboy@host86-164-222-232.range86-164.btcentralplus.com) has joined #ceph
[15:47] * drokita (~drokita@199.255.228.128) has joined #ceph
[15:52] * dosaboy_ (~dosaboy@host86-161-166-222.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[15:52] * aliguori (~anthony@32.97.110.51) has joined #ceph
[15:54] * portante is now known as portante|afk
[15:55] * portante|afk is now known as portante
[15:59] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[16:00] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[16:04] <markit> If I use simple sata drives, is it possible to "hot swap" an HD if I buy a "hot swap" bay, or is there something else I need / is it not possible?
[16:05] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[16:06] <janos> most sata drives and mobos these days are hotswap-ready
[16:06] <janos> though the bus won't always recognize
[16:06] <janos> i've swapped in/out sata drives live without a bay
[16:07] <janos> let me look up the echo to tell the machine to rescan
[16:07] <janos> of course this is all at your own risk ;)
[16:08] <janos> if i'm not mistaken
[16:08] <janos> echo "- - -" > /sys/class/scsi_host/hostX/scan
[16:08] <janos> you'd need to find out which bus - hostX
[16:09] <janos> i was adding an osd disk to a machine and didn't feel like cycling the machine
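
The reverse direction, telling the kernel a disk is going away before pulling it, is similar; sdX and the mount point are placeholders, and as janos says, at your own risk:

    # flush and detach the device before physically removing it
    umount /var/lib/ceph/osd/ceph-7        # illustrative osd mount point
    echo 1 > /sys/block/sdX/device/delete
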
[16:12] * portante is now known as portante|afk
[16:13] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[16:16] * portante|afk is now known as portante
[16:20] * Machske2 (~bram@105.227-241-81.adsl-static.isp.belgacom.be) Quit (Quit: Leaving)
[16:23] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[16:27] * illya (~illya_hav@21-155-135-95.pool.ukrtel.net) has joined #ceph
[16:27] <illya> hi
[16:27] <illya> I have question about ceph deployment with chef
[16:28] * mschiff (~mschiff@tmo-111-83.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[16:28] <illya> based on these cookbooks https://github.com/ceph/ceph-cookbooks
[16:30] <illya> in my generated config
[16:30] <illya> auth section is missing
[16:30] <illya> and I just wanted to double check this
[16:31] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[16:31] * ChanServ sets mode +v andreask
[16:31] * mschiff (~mschiff@tmo-111-83.customers.d1-online.com) has joined #ceph
[16:32] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has left #ceph
[16:35] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[16:35] * ChanServ sets mode +v andreask
[16:35] <joelio> illya: there are sane defaults.. ie. use cephx
[16:36] <joelio> so if not explicitly defined, it uses defaults (in docs) http://ceph.com/docs/master/rados/configuration/auth-config-ref/
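
For completeness, the explicit form of those defaults in ceph.conf looks roughly like this, per the linked auth reference (cuttlefish also still honours the older single "auth supported" option):

    [global]
        auth cluster required = cephx
        auth service required = cephx
        auth client required = cephx
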
[16:39] <skm> I have been following the directions on this page in order to setup radosgw http://ceph.com/docs/next/start/quick-rgw/ everything seems to work fine until I get to this step:
[16:39] <skm> root@ceph1:~# sudo ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
[16:39] <skm> no valid command found; 10 closest matches:
[16:39] <skm> auth add <entity> <caps> [<caps>...]
[16:39] <skm> Error EINVAL: invalid command
[16:40] <skm> I am using the testing branch...has there been a change to this step since that document was written?
[16:40] <illya> @joelio - thx
[16:40] <cephalobot> illya: Error: "joelio" is not a valid command.
[16:41] <illya> joelio - thx
[16:41] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[16:42] <joelio> I am a man.. not a command! :)
[16:42] <illya> sorry
[16:42] <joelio> haha, it's fine :)
[16:42] <illya> using IRC for a first time during last 4-5 years
[16:43] <joelio> illya: start typing name and press tab, should autocomplete depending on client
[16:44] <illya> works
[16:45] * markl (~mark@tpsit.com) has joined #ceph
[16:46] * markl (~mark@tpsit.com) Quit ()
[16:46] * markl (~mark@tpsit.com) has joined #ceph
[16:48] <markit> janos: thanks
[16:49] * yehudasa_ (~yehudasa@2602:306:330b:1410:ea03:9aff:fe98:e8ff) Quit (Ping timeout: 480 seconds)
[16:50] * portante is now known as portante|afk
[16:50] * horsey_ (~horsey@203.92.58.165) Quit (Ping timeout: 480 seconds)
[16:50] * vata (~vata@2607:fad8:4:6:3505:9b64:879e:bf93) has joined #ceph
[16:51] <tnt> If I change the crush_ruleset of a pool, will it reorganize data ?
[16:51] * portante|afk is now known as portante
[16:51] <paravoid> yes
[16:51] <paravoid> afaik
[16:52] <LiRul> i built a ceph cluster with ceph-deploy. the ceph.conf files are very thin; there are no mon, osd or mds entries at all. the init script in ubuntu does not support this, because i can't stop / restart any daemon types. is this a known issue?
[16:52] <tnt> paravoid: and during the move, data will stay available right ?
[16:52] <tnt> anybody ever confirm that ?
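
The command under discussion, for reference; the pool name and rule id are placeholders. Changing it triggers remapping and backfill of the affected pgs:

    # point an existing pool at a different crush rule
    ceph osd pool set rbd crush_ruleset 3
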
[16:53] <joelio> LiRul: It uses upstart now
[16:53] <joelio> stop ceph-all
[16:53] <joelio> start ceph-all
[16:53] <joelio> (if you're using a later version of ubuntu that is...)
[16:53] <LiRul> ahh joelio thanks
[16:54] <LiRul> 12.04 lts
[16:54] <LiRul> i hate upstart :)
[16:54] <joelio> does 12.04 use upstart?
[16:54] <LiRul> yes
[16:55] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[16:55] <LiRul> root@ceph3:~# restart ceph-mon-all
[16:55] <LiRul> ceph-mon-all start/running
[16:55] <LiRul> this works
[16:55] <LiRul> joelio: thx
[16:55] <joelio> the annoying thing is that there is still stuff in /etc/init.d - on my previous ceph install, with mkcephfs, I could do all the restarts via service and/or /etc/init.d/
[16:55] <joelio> LiRul: n/p
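
A short cheat sheet of the upstart targets being discussed, assuming the job names cuttlefish ships on ubuntu; the id= parameter addresses a single daemon:

    # everything ceph on this host
    stop ceph-all && start ceph-all
    # one daemon class
    restart ceph-mon-all
    # a single daemon, e.g. osd.7
    stop ceph-osd id=7 && start ceph-osd id=7
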
[16:57] * sleinen (~Adium@2001:620:0:26:b541:ba15:2940:1caf) Quit (Quit: Leaving.)
[16:57] * sleinen (~Adium@130.59.94.253) has joined #ceph
[17:05] * sleinen (~Adium@130.59.94.253) Quit (Ping timeout: 480 seconds)
[17:06] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[17:08] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[17:09] <skm> Anyone have an idea about what I could be doing wrong here?
[17:09] <skm> root@ceph1:~# ceph -k /etc/ceph/ceph.keyring auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
[17:09] <skm> 2013-07-02 11:00:51.425017 7f7318eac700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
[17:09] <skm> 2013-07-02 11:00:51.425050 7f7318eac700 0 librados: client.admin initialization error (2) No such file or directory
[17:09] <skm> Error connecting to cluster: ObjectNotFound
[17:10] <skm> I have followed the directions step by step (twice) using this doc: http://ceph.com/docs/next/start/quick-rgw/
[17:11] <skm> I am just testing this out at this point...is there any problem with having a mon and radosgw on the same server?
[17:21] * sleinen (~Adium@2001:620:0:25:c1cd:368:9cd2:4dcb) has joined #ceph
[17:25] <joelio> skm: the error says missing keyring. Can you run normal ceph commands? ceph -s for example?
[17:25] <skm> yes
[17:25] <joelio> without using -k ?
[17:25] <skm> root@ceph1:/etc/ceph# ceph -s
[17:25] <skm> cluster 299615fb-765a-4225-ab83-97719549eb4d
[17:25] <skm> health HEALTH_OK
[17:25] <skm> monmap e1: 3 mons at {ceph1=172.31.2.101:6789/0,ceph2=172.31.2.102:6789/0,ceph3=172.31.2.103:6789/0}, election epoch 8, quorum 0,1,2 ceph1,ceph2,ceph3
[17:25] <skm> osdmap e25: 3 osds: 3 up, 3 in
[17:25] <skm> pgmap v880: 390 pgs: 390 active+clean; 9336 MB data, 18632 MB used, 362 GB / 380 GB avail
[17:25] <skm> mdsmap e1: 0/0/1 up
[17:25] <skm> I have rbd up and running fine
[17:25] <joelio> ok, so why are you using '-k /etc/ceph/ceph.keyring' in your command above?
[17:26] <joelio> as the key is already present, no need to redefine
[17:27] <skm> ok..I was just following the docs...there is something wrong though:
[17:27] <skm> root@ceph1:/etc/ceph# ceph auth list
[17:27] <skm> installed auth entries:
[17:27] <skm> osd.0
[17:27] <skm> key: AQBx4NFRQNTHABAAsz0v2oM8a0T1H5aDL5LfCA==
[17:27] <skm> caps: [mon] allow rwx
[17:27] <skm> caps: [osd] allow *
[17:27] <skm> osd.1
[17:27] <skm> key: AQB44NFRwPfMNBAAylY3lHVw53XkGyorMKuhAg==
[17:27] <skm> caps: [mon] allow rwx
[17:27] <skm> caps: [osd] allow *
[17:27] <skm> osd.2
[17:27] <skm> key: AQCA4NFR8AK3ChAAI96tg6E4OY8rV0jHrOiw8Q==
[17:27] <skm> caps: [mon] allow rwx
[17:27] <skm> caps: [osd] allow *
[17:27] <skm> client.admin
[17:27] <skm> key: AQA33tFRYBMwORAAOFvOvMJakZk1FGbXspSJwA==
[17:27] <skm> caps: [mds] allow
[17:27] <joelio> pastebin!!!
[17:27] <skm> caps: [mon] allow *
[17:27] <skm> caps: [osd] allow *
[17:27] <skm> client.bootstrap-mds
[17:27] <skm> key: AQA43tFR2IIBFxAAbLri8LaSunGw6LX22CODxw==
[17:27] <joelio> pastebin!!!
[17:27] <joelio> pastebin!!!
[17:27] <skm> caps: [mon] allow profile bootstrap-mds
[17:27] <skm> client.bootstrap-osd
[17:27] <skm> key: AQA43tFRgNk8ChAA3uMc0AJsBWuQgoGVYYBtcA==
[17:27] <skm> caps: [mon] allow profile bootstrap-osd
[17:27] <joelio> uff.. PASTEBIN!!!!
[17:27] <joelio> :)
[17:27] <tnt> omg ...
[17:28] <skm> shouldn't I see some rados gateway info in there...sorry :-(
[17:29] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[17:29] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: Operation timed out)
[17:30] <joelio> skm: and there is a key in the file /etc/ceph/keyring.radosgw.gateway
[17:30] <joelio> ?
[17:30] <skm> yes...a key...and two other lines for caps mon and caps osd
[17:31] * yehudasa_ (~yehudasa@mc22736d0.tmodns.net) has joined #ceph
[17:32] <joelio> so, import the key then - ceph auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
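
For context, the steps the quick-rgw doc walks through just before that import look roughly like this; caps are as the doc gives them, and this is the cuttlefish-era flow:

    # create the gateway keyring, generate its key, set caps, then import
    ceph-authtool --create-keyring /etc/ceph/keyring.radosgw.gateway
    ceph-authtool /etc/ceph/keyring.radosgw.gateway -n client.radosgw.gateway --gen-key
    ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow rw' /etc/ceph/keyring.radosgw.gateway
    ceph auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
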
[17:34] * jebba (~aleph@2601:1:a300:8f:f2de:f1ff:fe69:6672) Quit (Quit: Leaving.)
[17:34] * LiRul (~lirul@91.82.105.2) Quit (Quit: Leaving.)
[17:36] <skm> can you be more specific about what you mean by importing the key? I might have already done that...however when i run that command I get this: http://pastebin.com/bRzquPvb
[17:37] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:38] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[17:40] <joelio> skm: one sec, let me reproduce
[17:40] <skm> thank you
[17:41] <skm> I am using the testing branch btw...I mentioned that above...but I just wanted to make sure you saw that
[17:41] <joelio> ahh, ok, then things must have changed
[17:42] <joelio> root@vm-ds-01:~# ceph auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
[17:42] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:42] <joelio> 2013-07-02 16:41:45.264813 7fa1d7e25780 -1 read 119 bytes from /etc/ceph/keyring.radosgw.gateway
[17:42] <joelio> added key for client.radosgw.gateway
[17:42] <joelio> skm: one for the devs I'm afraid.. must have been changes
[17:42] <skm> ok thank you for your help
[17:43] <joelio> no problem
[17:43] <skm> maybe I'll just redeploy with stable if I don't get it worked out in the next little bit
[17:44] <joelio> I would stick to stable wherever possible. Only go to testing if you need to or feel slightly masochistic :)
[17:44] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:44] * yehudasa_ (~yehudasa@mc22736d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[17:45] <skm> haa...ok...sounds like a plan...I am going to send an email to the mailing list...maybe it is a bug of some sort
[17:48] <joelio> worth a shout
[17:48] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:49] * tnt (~tnt@109.130.72.62) has joined #ceph
[17:50] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:51] * markit (~marco@151.78.74.112) Quit (Quit: Konversation terminated!)
[17:51] * dosaboy_ (~dosaboy@host86-164-220-6.range86-164.btcentralplus.com) has joined #ceph
[17:53] * jebba (~aleph@70-90-113-25-co.denver.hfc.comcastbusiness.net) has joined #ceph
[17:54] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[17:54] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:55] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[17:56] * Maskul (~Maskul@host-89-241-165-148.as13285.net) Quit (Quit: Leaving)
[17:56] * TiCPU (~jeromepou@190-130.cgocable.ca) has joined #ceph
[17:57] * dosaboy__ (~dosaboy@host86-145-216-237.range86-145.btcentralplus.com) has joined #ceph
[17:57] * dosaboy__ (~dosaboy@host86-145-216-237.range86-145.btcentralplus.com) Quit ()
[17:57] * dosaboy (~dosaboy@host86-164-222-232.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[17:57] * dosaboy (~dosaboy@host86-145-216-237.range86-145.btcentralplus.com) has joined #ceph
[18:01] * dosaboy_ (~dosaboy@host86-164-220-6.range86-164.btcentralplus.com) Quit (Read error: Operation timed out)
[18:03] * dosaboy_ (~dosaboy@host86-163-9-134.range86-163.btcentralplus.com) has joined #ceph
[18:04] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:05] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: Operation timed out)
[18:06] * dosaboy (~dosaboy@host86-145-216-237.range86-145.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[18:12] * sagelap (~sage@2600:1012:b016:4ef4:59ef:e19:c22c:ec80) has joined #ceph
[18:16] * danieagle (~Daniel@186.214.58.66) has joined #ceph
[18:18] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[18:22] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[18:32] <illya> hi
[18:32] <illya> can somebody please comment my monitor status
[18:32] <illya> { "name": "bb927c63-818c-4e68-8421-1b0207754924",
[18:32] <illya> "rank": -1,
[18:32] <illya> "state": "probing",
[18:32] <illya> "election_epoch": 0,
[18:32] <illya> "quorum": [],
[18:32] <illya> "outside_quorum": [],
[18:32] <illya> "extra_probe_peers": [
[18:32] <illya> "165.225.156.49:6789\/0"],
[18:32] <illya> "monmap": { "epoch": 0,
[18:32] <illya> "fsid": "423b1e92-c631-420e-bad7-318a5558c28d",
[18:32] <illya> "modified": "0.000000",
[18:33] <illya> "created": "0.000000",
[18:33] <illya> "mons": [
[18:33] <illya> { "rank": 0,
[18:33] <illya> "name": "{bb927c63-818c-4e68-8421-1b0207754924}",
[18:33] <illya> "addr": "0.0.0.0:0\/1"}]}}
[18:33] <illya> I'm trying to have 1 mon in a cluster
[18:33] <illya> and I'm not sure
[18:33] <illya> that
[18:33] <illya> "rank": -1,
[18:33] <illya> "state": "probing"
[18:33] <illya> is good
[18:34] <tnt> seriously people ... PASTEBIN
[18:35] * dosaboy (~dosaboy@host86-161-202-253.range86-161.btcentralplus.com) has joined #ceph
[18:35] <illya> sorry
[18:35] <illya> http://pastebin.com/sY9m7FUy
[18:36] <illya> (lost my IRC skills :()
[18:37] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[18:38] * dosaboy_ (~dosaboy@host86-163-9-134.range86-163.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[18:41] <gregaf> illya: it's trying to reach the "extra probe peer" before it starts up, iirc
[18:42] * dosaboy_ (~dosaboy@host86-161-242-57.range86-161.btcentralplus.com) has joined #ceph
[18:43] * yehudasa_ (~yehudasa@2602:306:330b:1410:ea03:9aff:fe98:e8ff) has joined #ceph
[18:48] * dosaboy (~dosaboy@host86-161-202-253.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[18:49] <illya> the config itself was a result of chef deployment
[18:49] <illya> and command
[18:49] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[18:49] <illya> ceph --admin-daemon '/var/run/ceph/ceph-mon.#{node['hostname']}.asok' add_bootstrap_peer_hint #{addr}
[18:49] <illya> was applied by chef
[18:51] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[18:51] <gregaf> probably something to do with having multiple network interfaces on the host and the monitor binding to the wrong one, then
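
One way to pin a monitor to a specific address in ceph.conf, using the peer address from the status above; the mon name and host are placeholders:

    [mon.a]
        host = monhost1
        mon addr = 165.225.156.49:6789
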
[18:52] <gary> Hi, my cluster is running OK now :-), but I'm still having issues connecting to the RADOS g/w using Cyberduck (I'm getting an 'unrecognised SSL message, plaintext connection?' error). Does anyone know if the Server field in the Cyberduck bookmark should be xyzserver1.domain.com or if I need a /something at the end of the URL. Thanks
[18:53] <Psi-jack> SO, curious, anyone here running Ceph on CentOS 6.4?
[18:54] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[18:54] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[18:56] <grepory> Psi-jack: Working on getting it up and running on SL 6.4… almost the same thing.
[18:58] * sagelap (~sage@2600:1012:b016:4ef4:59ef:e19:c22c:ec80) Quit (Ping timeout: 480 seconds)
[18:58] * joshd1 (~joshd@108-93-176-49.lightspeed.irvnca.sbcglobal.net) Quit (Quit: Leaving.)
[19:01] <Psi-jack> gregaf: Oh? Working on it? Okay, sounds good. I've been considering it, but what I'm after is people who actually run it, because I'm looking for specifics: whether they use the stock kernel or elrepo's kernel-ml instead, and how well it's been running, for how long. I've been running Ceph on Arch for 7 months now, and it's been rock solid, but Arch isn't so rock solid. :)
[19:01] <gregaf> wrong greg :)
[19:06] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[19:06] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[19:08] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[19:13] <grepory> Psi-jack: 0.61.3, elrepo's kernel-ml 3.9, it's been up and in quorum with 54TB of disk for about 4 weeks, since the cluster was built. we're using it primarily for block storage w/ cinder and image storage with glance.
[19:14] <grepory> i had a lot of difficulty getting the rbd kernel module to work for testing, so eventually abandoned that altogether, since it's not a very accurate test of throughput and latency from cinder volumes in openstack vms.
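(Aside: where the kernel rbd client is off the table, a server-side smoke test such as rados bench at least bounds raw cluster throughput; the pool name here is just an example.)

    rados bench -p rbd 30 write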
[19:14] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: Operation timed out)
[19:16] * portante is now known as portante|afk
[19:17] * Machske2 (~bram@d5152D87C.static.telenet.be) has joined #ceph
[19:19] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[19:21] * jluis is now known as joao
[19:22] * andrei (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Read error: Operation timed out)
[19:22] * mschiff (~mschiff@tmo-111-83.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[19:28] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[19:30] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[19:31] <Psi-jack> heh
[19:31] <Psi-jack> grepory: I see. Cuttlefish and all, too, I see.
[19:32] <Psi-jack> My Arch cluster's been running, as I said, 7 months on Arch, but it's on Bobtail. I need to keep it on Bobtail and change distro in place, by taking one node out of quorum and rebuilding it, then putting it back in, same data, just let it rebuild, then start on the next.
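(Aside: a rough sketch of that one-node-at-a-time cycle for a monitor, assuming a mon called "a"; the paths and address are placeholders, and the OS reinstall itself is elided.)

    ceph mon remove a
    # ... reinstall the distro on the node, keeping the Bobtail packages ...
    ceph auth get mon. -o /tmp/mon.keyring
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i a --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph mon add a 192.0.2.21:6789
    service ceph start mon.a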
[19:33] <Psi-jack> I'm mostly just curious about stability. I know Inktank supports Ubuntu primarily and tests it extensively, but I'm trying to keep away from Ubuntu as much as I can.
[19:33] <Psi-jack> (despite me supporting it heavily in the past)
[19:35] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[19:35] * madkiss (~madkiss@144-239.197-178.cust.bluewin.ch) has joined #ceph
[19:37] * alram (~alram@38.122.20.226) has joined #ceph
[19:38] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: Operation timed out)
[19:38] <Psi-jack> Either way, I suppose anything's better than Arch. Especially since it seems that CentOS's kernel supports what ceph needs, even if glibc doesn't.
[19:38] * horsey (~horsey@122.166.181.42) has joined #ceph
[19:38] <Psi-jack> Thankfully the glibc issue of needing syncfs(2) support seems to have been fixed in 0.55+
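(Aside: whether a given glibc actually exports syncfs can be checked directly; the path below is the usual CentOS 6 x86_64 location.)

    objdump -T /lib64/libc.so.6 | grep syncfs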
[19:40] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:44] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[19:44] * nwat (~Adium@eduroam-251-132.ucsc.edu) has joined #ceph
[19:45] * gary (~gary@217.33.61.67) Quit ()
[19:45] <loicd> sjust: https://github.com/ceph/ceph/pull/388 is updated. It is simpler indeed ... :-)
[19:50] * Machske2 (~bram@d5152D87C.static.telenet.be) Quit (Quit: This computer has gone to sleep)
[19:51] * rturk-away is now known as rturk
[19:53] * mschiff (~mschiff@81.92.22.210) has joined #ceph
[19:54] <sjust> loicd: merged!
[19:54] <loicd> cool :-)
[19:56] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[19:56] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: Operation timed out)
[20:01] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[20:03] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[20:16] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[20:16] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[20:17] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[20:23] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[20:27] * illya (~illya_hav@21-155-135-95.pool.ukrtel.net) Quit (Ping timeout: 480 seconds)
[20:29] * jks (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[20:31] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[20:31] * jluis (~JL@89.181.156.133) has joined #ceph
[20:35] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[20:36] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[20:37] * horsey (~horsey@122.166.181.42) Quit (Ping timeout: 480 seconds)
[20:37] * joao (~JL@89-181-149-236.net.novis.pt) Quit (Ping timeout: 480 seconds)
[20:39] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[20:42] * kr0t (~kr0t@178.172.139.240) has left #ceph
[20:45] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[20:46] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: No route to host)
[20:48] * mikedawson_ (~chatzilla@c-68-58-243-29.hsd1.sc.comcast.net) has joined #ceph
[20:51] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[20:53] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[20:53] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[20:54] * buck1 (~buck@bender.soe.ucsc.edu) has joined #ceph
[20:54] * mikedawson (~chatzilla@c-68-58-243-29.hsd1.sc.comcast.net) Quit (Ping timeout: 480 seconds)
[20:57] * danieagle (~Daniel@186.214.58.66) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[20:57] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit ()
[20:57] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[20:58] * Cube (~Cube@12.248.40.138) has joined #ceph
[20:59] * Midnightmyth_ (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[20:59] * Midnightmyth_ (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Remote host closed the connection)
[21:00] * bergerx_ (~bekir@78.188.101.175) Quit (Quit: Leaving.)
[21:01] * oddomatik (~Adium@12.248.40.138) has joined #ceph
[21:04] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[21:06] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[21:08] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:11] * portante|afk is now known as portante
[21:11] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Read error: Operation timed out)
[21:12] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Read error: Operation timed out)
[21:14] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Quit: You think I'm not online. But I'm always here. Even if I'm not typing. I'm here. Reading. Judging.)
[21:15] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[21:17] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[21:21] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: Operation timed out)
[21:22] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[21:23] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[21:29] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Remote host closed the connection)
[21:35] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[21:35] * LPG (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[21:36] * LPG|2 (~kvirc@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[21:39] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[21:41] * sjm (~oftc-webi@c73-107.rim.net) has joined #ceph
[21:45] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[21:46] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[21:47] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[21:55] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[21:57] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has left #ceph
[21:58] * illya (~illya_hav@9-158-135-95.pool.ukrtel.net) has joined #ceph
[21:58] <illya> hi
[21:58] <illya> I have question about "NAMES ONLY" config from
[21:58] <illya> http://ceph.com/docs/next/dev/mon-bootstrap/
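(Aside: the "names only" scheme on that page amounts to listing monitor names with no addresses and supplying the addresses later via bootstrap hints; a minimal ceph.conf sketch with hypothetical mon names:)

    [global]
        mon initial members = foo, bar, baz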
[21:59] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Read error: Operation timed out)
[22:00] <madkiss> Important: Check the key output. Sometimes radosgw-admin generates a key with an escape (\) character, and some clients do not know how to handle escape characters.
[22:00] <madkiss> http://ceph.com/docs/master/radosgw/config/ — and there was "\/" in the middle of the key every time
[22:01] <madkiss> is this standard now? (I did 10 attempts)
[22:01] <madkiss> or is this just the VM again being unable to produce proper entropy?
[22:02] <madkiss> oh, now it worked. hm.
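(Aside: a blunt workaround is to regenerate the secret until it comes out without the "\/" sequence; the uid is hypothetical and the exact flags may vary by version.)

    # loop while the generated secret still contains an escaped slash ("\/")
    while radosgw-admin key create --uid=johndoe --key-type=s3 --gen-secret \
            | grep -q '\\/'; do
        echo 'secret contains \/, regenerating'
    done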
[22:05] * fridudad_ (~oftc-webi@p5B09DD6A.dip0.t-ipconnect.de) has joined #ceph
[22:05] <dmick> illya: yes?
[22:05] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[22:06] * mschiff (~mschiff@81.92.22.210) Quit (Remote host closed the connection)
[22:07] * dosaboy (~dosaboy@host86-163-35-196.range86-163.btcentralplus.com) has joined #ceph
[22:11] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[22:11] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[22:12] * dosaboy_ (~dosaboy@host86-161-242-57.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[22:15] <illya> so my question is this
[22:16] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:16] <illya> sorry for the delay, I was reading the sources :)
[22:16] <illya> I'm starting the cluster with names only
[22:16] <illya> and then doing add_bootstrap_peer_hint
[22:17] <illya> but the names did not match the addresses
[22:17] <illya> and the cluster is not up
[22:17] <illya> this could happen because we are using dynamic cloud nodes
[22:17] * portante is now known as portante|afk
[22:18] <illya> without a good ip_to_name / name_to_ip configuration
[22:18] * madkiss (~madkiss@144-239.197-178.cust.bluewin.ch) Quit (Quit: Leaving.)
[22:19] <illya> so I'd like to know what should be configured at the OS / network level
[22:19] <illya> to have all this magic working
[22:20] <dmick> everything is predicated on having the short names of machines in the cluster mapping to real usable IP addresses
[22:21] <dmick> whichever nameservice you're using
[22:21] <illya> not using so far, manipulating only /etc/hostname and /etc/hosts
[22:21] <illya> :(
[22:22] <dmick> that's a nameservice, just a simple (files-based) one
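(Aside: dmick's point in file form: the short names just have to resolve to the cluster-facing addresses; the names and addresses below are hypothetical.)

    # /etc/hosts
    192.0.2.21   mon-a
    192.0.2.22   mon-b

    getent hosts mon-a    # verify the short name resolves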
[22:25] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit (Read error: Connection reset by peer)
[22:25] * dosaboy_ (~dosaboy@host86-164-222-119.range86-164.btcentralplus.com) has joined #ceph
[22:26] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[22:28] * zapotah (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[22:28] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[22:30] <illya> that seems to work
[22:30] <illya> thx
[22:30] * dosaboy (~dosaboy@host86-163-35-196.range86-163.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[22:30] <illya> and have a good day / night
[22:30] * dosaboy (~dosaboy@host86-145-218-64.range86-145.btcentralplus.com) has joined #ceph
[22:31] <dmick> gl
[22:33] * illya (~illya_hav@9-158-135-95.pool.ukrtel.net) has left #ceph
[22:34] * lxo is now known as lxo_away_at_fisl
[22:34] * dosaboy_ (~dosaboy@host86-164-222-119.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[22:35] * lxo_away_at_fisl is now known as lxi
[22:35] * lxi is now known as lxo_away_at_fisl
[22:35] * dosaboy_ (~dosaboy@host86-161-204-9.range86-161.btcentralplus.com) has joined #ceph
[22:35] * dosaboy (~dosaboy@host86-145-218-64.range86-145.btcentralplus.com) Quit (Read error: Operation timed out)
[22:38] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) has joined #ceph
[22:39] * LeaChim (~LeaChim@90.221.247.164) Quit (Ping timeout: 480 seconds)
[22:39] <sagewk> joshd: can you take a look at wip-5493
[22:40] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:40] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[22:41] <joshd> sagewk: looks fine to me. hopefully that's the last overlooked part that's common to various op types
[22:42] <sagewk> yeah really! :( let me grep for LingerOp and see if anything else is obvious
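(Aside: in the ceph tree that grep is something like the following; LingerOp lives in the osdc layer.)

    git grep -n LingerOp -- src/osdc/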
[22:42] * LeaChim (~LeaChim@90.221.247.164) has joined #ceph
[22:43] <sagewk> looks ok. thanks!
[22:43] * fridudad_ (~oftc-webi@p5B09DD6A.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[22:43] <joshd> yw
[22:44] * mschiff (~mschiff@81.92.22.210) has joined #ceph
[22:45] * lxo_away_at_fisl (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[22:47] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:54] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[22:55] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[23:01] * sleinen (~Adium@2001:620:0:25:c1cd:368:9cd2:4dcb) Quit (Quit: Leaving.)
[23:06] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) has joined #ceph
[23:07] * houkouonchi-home (~linux@pool-108-38-63-48.lsanca.fios.verizon.net) has joined #ceph
[23:08] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has left #ceph
[23:09] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) Quit (Ping timeout: 480 seconds)
[23:12] * zapotah_ (~zapotah@dsl-hkibrasgw2-50dfdb-234.dhcp.inet.fi) has joined #ceph
[23:15] * dosaboy (~dosaboy@host86-150-243-95.range86-150.btcentralplus.com) has joined #ceph
[23:16] * jebba (~aleph@70-90-113-25-co.denver.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[23:17] * dosaboy_ (~dosaboy@host86-161-204-9.range86-161.btcentralplus.com) Quit (Read error: Operation timed out)
[23:19] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[23:21] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[23:22] * dosaboy_ (~dosaboy@host86-164-80-209.range86-164.btcentralplus.com) has joined #ceph
[23:26] * jebba (~aleph@2601:1:a300:8f:f2de:f1ff:fe69:6672) has joined #ceph
[23:28] * dosaboy (~dosaboy@host86-150-243-95.range86-150.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[23:28] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[23:34] <sagewk> dmick: can you look at wip-5490 ?
[23:35] * dosaboy_ (~dosaboy@host86-164-80-209.range86-164.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[23:42] * dosaboy (~dosaboy@host86-161-164-10.range86-161.btcentralplus.com) has joined #ceph
[23:42] <dmick> sagewk: yeah, I think that's right
[23:43] <sagewk> cool thanks
[23:43] <dmick> heh. I just typed "grep find *"
[23:43] <dmick> who will find the finders?
[23:47] * BillK (~BillK@124-169-221-120.dyn.iinet.net.au) has joined #ceph
[23:47] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:48] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Remote host closed the connection)
[23:51] * BillK (~BillK@124-169-221-120.dyn.iinet.net.au) Quit ()
[23:54] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) Quit (Remote host closed the connection)
[23:54] * BillK (~BillK@124-169-221-120.dyn.iinet.net.au) has joined #ceph
[23:54] * markbby (~Adium@168.94.245.2) has joined #ceph
[23:54] * drokita (~drokita@199.255.228.128) Quit (Read error: Operation timed out)
[23:55] * BillK (~BillK@124-169-221-120.dyn.iinet.net.au) Quit ()
[23:56] * BillK (~BillK@124-169-221-120.dyn.iinet.net.au) has joined #ceph
[23:57] * BillK (~BillK@124-169-221-120.dyn.iinet.net.au) Quit ()
[23:58] * sjm (~oftc-webi@c73-107.rim.net) Quit (Quit: Page closed)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.