#ceph IRC Log

IRC Log for 2013-06-10

Timestamps are in GMT/BST.

[0:26] * mschiff_ (~mschiff@port-7047.pppoe.wtnet.de) has joined #ceph
[0:31] * mschiff (~mschiff@port-7047.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[1:33] * Farom (~leo@dslb-188-096-220-161.pools.arcor-ip.net) has joined #ceph
[1:41] * BManojlovic (~steki@178-222-75-71.dynamic.isp.telekom.rs) has joined #ceph
[1:41] * Faron (~leo@dslb-188-108-138-037.pools.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[1:43] * LeaChim (~LeaChim@2.122.119.234) Quit (Ping timeout: 480 seconds)
[1:50] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Remote host closed the connection)
[1:50] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:06] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:12] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 482 seconds)
[2:32] * ghartz (~ghartz@ill67-1-82-231-212-191.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[2:39] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Remote host closed the connection)
[2:39] * xmltok (~xmltok@relay.els4.ticketmaster.com) has joined #ceph
[2:46] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Remote host closed the connection)
[2:47] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[2:56] * mschiff (~mschiff@port-1123.pppoe.wtnet.de) has joined #ceph
[3:03] * Meyer__ (meyer@c64.org) Quit (Remote host closed the connection)
[3:03] * Meyer__ (meyer@c64.org) has joined #ceph
[3:04] * mschiff_ (~mschiff@port-7047.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[3:06] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[3:06] * mschiff (~mschiff@port-1123.pppoe.wtnet.de) Quit (Remote host closed the connection)
[3:12] * xmltok (~xmltok@relay.els4.ticketmaster.com) Quit (Ping timeout: 480 seconds)
[3:19] * mschiff (~mschiff@port-1123.pppoe.wtnet.de) has joined #ceph
[3:26] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[3:27] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[3:28] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[3:28] * mschiff (~mschiff@port-1123.pppoe.wtnet.de) Quit (Remote host closed the connection)
[4:01] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Remote host closed the connection)
[4:08] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[4:17] * rongze (~zhu@173-252-252-212.genericreverse.com) Quit (Ping timeout: 480 seconds)
[4:31] * redeemed (~quassel@cpe-192-136-224-78.tx.res.rr.com) Quit (Remote host closed the connection)
[4:41] * rongze (~zhu@173-252-252-212.genericreverse.com) has joined #ceph
[5:11] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[5:20] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Ping timeout: 480 seconds)
[5:32] * haomaiwang (~haomaiwan@124.161.78.121) has joined #ceph
[5:56] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[6:23] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[6:36] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[7:09] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[7:50] * capri (~capri@212.218.127.222) has joined #ceph
[7:57] * Machske (~Bram@d5152D87C.static.telenet.be) Quit ()
[8:02] * tnt (~tnt@228.199-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:06] * haomaiwang (~haomaiwan@124.161.78.121) Quit (Ping timeout: 480 seconds)
[8:12] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:12] * Faronitas (~leo@dslb-092-077-078-203.pools.arcor-ip.net) has joined #ceph
[8:17] * Farom (~leo@dslb-188-096-220-161.pools.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[8:25] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Some folks are wise, and some otherwise.)
[8:36] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:36] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:43] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[9:12] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[9:15] * madkiss1 (~madkiss@chello062178057005.20.11.vie.surfer.at) has joined #ceph
[9:18] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[9:19] * haomaiwang (~haomaiwan@119.6.71.106) has joined #ceph
[9:20] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[9:20] * bergerx_ (~bekir@78.188.101.175) has joined #ceph
[9:21] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:25] * ScOut3R (~ScOut3R@212.96.47.215) has joined #ceph
[9:31] * leseb (~Adium@83.167.43.235) has joined #ceph
[9:31] * ChanServ sets mode +v leseb
[9:33] * tnt (~tnt@228.199-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:35] * athrift (~nz_monkey@203.86.205.13) Quit (Remote host closed the connection)
[9:38] * mschiff (~mschiff@port-1123.pppoe.wtnet.de) has joined #ceph
[9:39] * Volture (~quassel@office.meganet.ru) Quit (Remote host closed the connection)
[9:40] * ssejour (~sebastien@lif35-1-78-232-187-11.fbx.proxad.net) has joined #ceph
[9:41] <ssejour> hi
[9:41] * ScOut3R_ (~ScOut3R@212.96.47.215) has joined #ceph
[9:43] * LeaChim (~LeaChim@2.122.119.234) has joined #ceph
[9:44] * athrift (~nz_monkey@203.86.205.13) has joined #ceph
[9:44] <ssejour> could you tell me if it's normal that I do not have OSD configuration in ceph.conf when I use ceph-deploy?
[9:45] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:46] * athrift (~nz_monkey@203.86.205.13) Quit (Remote host closed the connection)
[9:48] * ScOut3R (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[9:48] * athrift (~nz_monkey@203.86.205.13) has joined #ceph
[9:50] * dosaboy (~dosaboy@host86-161-201-199.range86-161.btcentralplus.com) has joined #ceph
[9:55] <ssejour> another question. My cluster is not clean : HEALTH_WARN 163 pgs degraded; 330 pgs stuck unclean; recovery 4/12 degraded (33.333%)
[9:55] <ssejour> and it's not clear for me on how I can fix these degraded pgs. any hint?
[9:56] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[10:01] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[10:01] * leseb (~Adium@83.167.43.235) has joined #ceph
[10:03] * Machske (~Bram@Fiberspeed768-2.fiberspeed.claranet.nl) has joined #ceph
[10:07] * haomaiwang (~haomaiwan@119.6.71.106) Quit (Read error: Operation timed out)
[10:11] <Gugge-47527> ssejour: paste the output from ceph osd tree somewhere
[10:12] <tnt> (and to be clear, 'somewhere' is not 'directly in the IRC window' :)
[10:14] <ssejour> tnt: ;-)
[10:15] <ssejour> http://pastebin.com/rH1GtDEz
[10:16] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:16] <ssejour> a lot of pg are stuck unclean for 127179....
[10:17] <wogri_risc> ssejour: did you fiddle with your crushmap?
[10:18] <ssejour> wogri_risc: no. it's a clean install (should be) with ceph-deploy
[10:18] <tnt> ceph pg dump | grep unclean
[10:19] <ssejour> tnt: no output
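
The grep above comes back empty because "ceph pg dump" prints concrete PG states such as "active+degraded"; "unclean" is a derived condition (anything that is not active+clean), so the word never appears in the dump. A minimal sketch of era-appropriate commands that list the problem PGs directly:

    ceph health detail            # lists each degraded/stuck PG and its acting OSDs
    ceph pg dump_stuck unclean    # only the PGs stuck in a non-clean state
    ceph pg dump | grep degraded  # works, because "degraded" is a real state name
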
[10:21] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[10:21] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[10:24] * leseb_ (~leseb@83.167.43.235) has joined #ceph
[10:24] * leseb_ (~leseb@83.167.43.235) has left #ceph
[10:25] * ChanServ sets mode +v leseb
[10:25] <ssejour> wogri_risc: I confirm. The crushmap is clean
[10:26] <wogri_risc> ok. any firewall rules preventing data from moving?
[10:27] * san (~san@81.17.168.194) Quit (Quit: Ex-Chat)
[10:28] <ssejour> wogri_risc: good point. my iptable rules are empty, but I use a dedicated network for the cluster. Let me check if there is a problem on this network
[10:29] <ssejour> wogri_risc: no problems on this network. I have established cnx between my OSDs
[10:31] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[10:36] <loicd> good morning ceph :-)
[10:39] <tnt> ssejour: try 'ceph pg dump | grep degraded' then
[10:40] <tnt> my current guess would be that because you only have two nodes you fell in the 'bad behavior' of crushmap with the legacy tunables.
[10:41] <ssejour> tnt: do you have any documentation for this?
[10:42] <tnt> http://ceph.com/docs/next/rados/operations/crush-map/#tunables
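
For reference, the page tnt links describes adjusting the tunables by round-tripping the CRUSH map through crushtool; a hedged sketch of that workflow (flag names may differ slightly between releases, and older kernels/clients must support the new values before they are injected):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt          # decompile to inspect the current map and tunables
    crushtool -i crush.bin --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 \
        --set-choose-total-tries 50 -o crush.new
    ceph osd setcrushmap -i crush.new
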
[10:43] * DarkAceZ (~BillyMays@50.107.53.195) Quit (Read error: Operation timed out)
[10:45] <ssejour> tnt: http://pastebin.com/aTskAEvs
[10:49] <tnt> ok, my theory of the tunables is not good. All those PGs have 2 OSDs assigned.
[10:49] <ssejour> tnt: if I change crush tunables to optimal profile, it may fix it?
[10:49] <tnt> no.
[10:49] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[10:49] * san (~san@81.17.168.194) has joined #ceph
[10:49] <tnt> The dump from above indicates that crush properly assigned OSD to all those pgs.
[10:49] <tnt> so that's not the issue.
[10:50] <ssejour> tnt: ok
[10:50] <tnt> did you try restarting the OSDs ?
[10:50] <ssejour> several time.
[10:50] <wogri_risc> maybe still your 'split' network is causing the issue.
[10:52] <ssejour> wogri_risc: what do you mean? it's not a usual configuration?
[10:54] <wogri_risc> it is.
[10:54] <wogri_risc> but maybe there's an issue there in your test-setup.
[10:54] <wogri_risc> just trying to remove complexity from your setup
[10:56] <tnt> what's weird is that they're not stuck peering, so the osd seem to talk to each other.
[10:57] <wogri_risc> yes. if one osd would be down or sth it would be easier to explain.
[10:57] <wogri_risc> but it's all OSD 3 and 4
[10:58] <ssejour> 3.11 is 7 and 2
[11:00] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[11:02] <wogri_risc> oh, sorry, I did look at the wrong place.
[11:03] <wogri_risc> it seems your problems are spread through the whole cluster.
[11:04] <ssejour> no pb. I tried to find a problem on OSDs, but all OSDs are concerned...
[11:07] <tnt> can you raise the log level of osd and see if they say anything ?
[11:08] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Read error: Connection reset by peer)
[11:10] <ssejour> tnt: 5/5 ?
[11:10] <tnt> ssejour: that's a good start.
[11:11] <tnt> if nothing pops up, raise to 10 then 20.
[11:12] <ssejour> ok. for now I have only ticks and heartbeat
[11:15] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[11:15] <ssejour> raised to 10 and 20... lot of logs :)
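
The debug levels being discussed can be changed at runtime without restarting the daemons; a minimal sketch using the same "ceph osd tell" form that appears later in this log (OSD id 0 is only an example, and the exact syntax varies a little between releases):

    ceph osd tell 0 injectargs '--debug-osd 10 --debug-ms 1'
    ceph osd tell 0 injectargs '--debug-osd 0/5 --debug-ms 0/5'   # back to (roughly) the defaults afterwards

The persistent equivalent is "debug osd = 10" and "debug ms = 1" under [osd] in ceph.conf, followed by a daemon restart.
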
[11:16] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Quit: Leaving)
[11:17] <ssejour> do you want a pastebin?
[11:17] <tnt> yup.
[11:18] <tnt> but I'm no expert so unless it stands out, it might not mean much to me. Maybe someone else will see smtg though
[11:21] <ssejour> http://pastebin.com/mJv10wrU
[11:22] <ssejour> I didn't find something clear about a "problem"
[11:25] * virsibl (~virsibl@94.231.117.244) has joined #ceph
[11:30] <tnt> me either
[11:30] <ssejour> :)
[11:31] <ssejour> this cluster is not yet in production and I have already a big problem
[11:33] <wogri_risc> I hope you won't go into production with 2 nodes.
[11:34] <tnt> ssejour: ask the ML, not sure what would cause this.
[11:36] <ssejour> wogri_risc: you are right, I'll add more nodes later ;)
[11:38] <ssejour> ok guys. thanks for trying. cheers
[11:59] * virsibl (~virsibl@94.231.117.244) has left #ceph
[12:01] * san (~san@81.17.168.194) Quit (Quit: Ex-Chat)
[12:03] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:05] * Faronitas (~leo@dslb-092-077-078-203.pools.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[12:20] <schlitzer|work> how can i set higher loglevels for ceph (and radosgw)
[12:20] <schlitzer|work> ?
[12:21] <schlitzer|work> or is there a doc page describing this?
[12:23] <schlitzer|work> at least the link from "http://ceph.com/docs/next/rados/configuration/ceph-conf/" to "http://ceph.com/docs/next/rados/operations/troubleshooting/debug" is dead :-(
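
Since the linked troubleshooting page is dead, here is a hedged sketch of the ceph.conf settings it would normally cover; the section names and levels are only examples, and the gateway section must match the name your radosgw instance actually runs under:

    [osd]
        debug osd = 10
        debug ms = 1

    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1
        log file = /var/log/ceph/radosgw.log

The same options can also be changed at runtime with injectargs, as sketched earlier in this log.
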
[12:29] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[12:33] <sha> hi all!
[12:35] * seif (uid11725@id-11725.hillingdon.irccloud.com) has joined #ceph
[12:40] * leseb (~Adium@83.167.43.235) has joined #ceph
[12:43] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:43] * ChanServ sets mode +v andreask
[12:48] * DarkAceZ (~BillyMays@50.107.55.63) has joined #ceph
[12:51] * Volture (~quassel@office.meganet.ru) has joined #ceph
[12:57] * hflai (~hflai@alumni.cs.nctu.edu.tw) Quit (Read error: Connection reset by peer)
[12:57] * hflai (~hflai@alumni.cs.nctu.edu.tw) has joined #ceph
[12:58] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[13:30] <seif> hey guys
[13:44] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[13:44] * diegows (~diegows@190.190.2.126) has joined #ceph
[13:47] * hflai_ (~hflai@alumni.cs.nctu.edu.tw) has joined #ceph
[13:48] * hflai (~hflai@alumni.cs.nctu.edu.tw) Quit (Read error: Operation timed out)
[13:48] * elder_ (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[13:54] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:56] * capri (~capri@212.218.127.222) Quit (Quit: Verlassend)
[13:59] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[13:59] * Maskul (~Maskul@host-89-241-175-25.as13285.net) has joined #ceph
[14:05] * Machske (~Bram@Fiberspeed768-2.fiberspeed.claranet.nl) Quit (Read error: Connection reset by peer)
[14:05] * Machske (~Bram@Fiberspeed768-2.fiberspeed.claranet.nl) has joined #ceph
[14:07] * maximilian is now known as mxmln
[14:10] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:17] * goldfish (~goldfish@91.215.166.4) has joined #ceph
[14:20] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Remote host closed the connection)
[14:28] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:33] * Maskul (~Maskul@host-89-241-175-25.as13285.net) Quit (Quit: Maskul)
[14:38] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) has joined #ceph
[14:38] * Machske (~Bram@Fiberspeed768-2.fiberspeed.claranet.nl) Quit (Ping timeout: 480 seconds)
[14:38] * Machske (~Bram@Fiberspeed768-2.fiberspeed.claranet.nl) has joined #ceph
[14:43] * Jahkeup (~Jahkeup@69.43.65.180) has joined #ceph
[14:44] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:54] * wer (~wer@206-248-239-142.unassigned.ntelos.net) has joined #ceph
[15:01] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[15:10] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) Quit (Quit: wogri_risc)
[15:16] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[15:23] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[15:27] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[15:29] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) Quit (Remote host closed the connection)
[15:29] <jtang> is there anyone here running radosgw on centos/sl6 ?
[15:30] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:35] * julian (~julianwa@125.70.132.9) has joined #ceph
[15:40] * Jahkeup_ (~Jahkeup@199.232.79.55) has joined #ceph
[15:46] * yehudasa (~yehudasa@2607:f298:a:607:4d8c:2491:279:9f1) Quit (Ping timeout: 480 seconds)
[15:47] * Jahkeup (~Jahkeup@69.43.65.180) Quit (Ping timeout: 480 seconds)
[15:53] * yanzheng (~zhyan@101.83.123.141) has joined #ceph
[15:53] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[15:55] * yehudasa (~yehudasa@2607:f298:a:607:85f0:aa00:835:5310) has joined #ceph
[16:05] * yanzheng (~zhyan@101.83.123.141) Quit (Ping timeout: 480 seconds)
[16:07] * portante (~user@66.187.233.206) has joined #ceph
[16:13] * portante (~user@66.187.233.206) Quit (Quit: upgrade)
[16:15] * yanzheng (~zhyan@101.83.98.145) has joined #ceph
[16:16] * Machske (~Bram@Fiberspeed768-2.fiberspeed.claranet.nl) Quit ()
[16:18] * portante (~user@66.187.233.206) has joined #ceph
[16:19] <sig_wal1> hello. Using cuttlefish I continuously get "osd wrongly marked me down" from random osds in cluster during recovery. Does anyone have such problem?
[16:27] * hufman (~hufman@rrcs-67-52-43-146.west.biz.rr.com) has joined #ceph
[16:39] * schlitzer|work (~schlitzer@109.75.189.45) Quit (Quit: Leaving)
[16:42] * julian_ (~julianwa@118.142.77.122) has joined #ceph
[16:43] * Julian__ (~julianwa@125.70.132.9) has joined #ceph
[16:43] <mikedawson> sig_wal1: I've seen that behavior. Most of the time it will self-heal, but it certainly isn't desirable
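
A common mitigation for "wrongly marked me down" flapping during recovery is to throttle recovery so client IO and heartbeats are not starved; a hedged sketch of the relevant [osd] options (the values are only examples, and they can also be changed at runtime with injectargs):

    [osd]
        osd max backfills = 1
        osd recovery max active = 1
        osd heartbeat grace = 30        # default is 20 seconds; raise cautiously
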
[16:47] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[16:49] * julian (~julianwa@125.70.132.9) Quit (Ping timeout: 480 seconds)
[16:50] * julian_ (~julianwa@118.142.77.122) Quit (Ping timeout: 480 seconds)
[16:52] * Julian__ (~julianwa@125.70.132.9) Quit (Ping timeout: 480 seconds)
[16:55] * redeemed (~quassel@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[16:56] * portante_ (~user@66.187.233.207) has joined #ceph
[17:01] * portante (~user@66.187.233.206) Quit (Ping timeout: 480 seconds)
[17:05] * julian (~julianwa@125.70.132.9) has joined #ceph
[17:11] * bergerx_ (~bekir@78.188.101.175) Quit (Quit: Leaving.)
[17:16] <jnq> Does anyone know if it would be reasonable to run a ceph cluster on two 2.5GHz 4c machines with 12GB ram to back a handful of KVM virtual machines?
[17:16] <jnq> I'm reading conflicting things, interested to know if anyone is actually doing it in the wild...
[17:17] <hufman> i'm running a handful of kvm virtual machines on 3 heterogenous 2c machines with 8gb ram
[17:18] <hufman> it works well, i can't put my monitoring server on it due to lack of iops
[17:18] <hufman> i think i have 8 on there?
[17:18] <hufman> i'm also running on consumer-grade gigabit ethernet heh
[17:18] <jnq> i'd be running the virtual machines on seperate hosts, just the storage on the two 4c machines
[17:18] <darkfader> hufman: what do you use for monitoring that it needs so much iops?
[17:19] <hufman> i narrowed it down to munin
[17:19] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:19] <darkfader> in nagios land we have rrdcached to "streamline" the rrd updates
[17:19] <darkfader> maybe it works with munin too
[17:19] <hufman> i think so, and i tried it and it helped a lot
[17:19] <hufman> i still wasn't happy about it
[17:20] <darkfader> ah ok :)
[17:20] <hufman> it's especially painful because the VMs freeze up temporarily when they run out of iops
[17:20] <darkfader> hehe
[17:20] <hufman> so i'll just rely on non-shared-disk migration if i need to move it
[17:20] <hufman> it's already far more portable than it used to be!
[17:23] <jnq> hufman: do you know of any reliable articles/lit on running kvm backed by ceph?
[17:24] <hufman> http://ceph.com/docs/master/rbd/libvirt/ is what i used
[17:25] <jnq> super, thanks
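
On the qemu/KVM side, that document boils down to handing qemu an rbd: drive spec; a minimal hedged sketch (pool and image names are hypothetical, and cache=writeback only helps if librbd caching is enabled):

    qemu-system-x86_64 -m 1024 -enable-kvm \
        -drive format=raw,file=rbd:rbd/vm-disk1:id=admin:conf=/etc/ceph/ceph.conf,if=virtio,cache=writeback

With libvirt the same disk is declared as a network disk with protocol rbd, as the linked page shows.
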
[17:25] * vata (~vata@2607:fad8:4:6:85c4:7639:81a9:fbd) has joined #ceph
[17:26] <TiCPU> I'm pretty sure my RBD cache is not working correctly, is there any way to be sure?
[17:26] * zhyan_ (~zhyan@101.83.44.104) has joined #ceph
[17:26] <TiCPU> that is in KVM
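
One way to check whether the librbd cache is actually active for a guest is to give the client an admin socket and query the running process; a hedged sketch (the socket path is a fixed example and would collide if several guests share a host):

    [client]
        rbd cache = true
        admin socket = /var/run/ceph/rbd-client.asok

    ceph --admin-daemon /var/run/ceph/rbd-client.asok config show | grep rbd_cache
    ceph --admin-daemon /var/run/ceph/rbd-client.asok perf dump

The qemu drive should also be started with cache=writeback, since qemu's cache flag influences how the librbd cache behaves.
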
[17:30] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:31] * yanzheng (~zhyan@101.83.98.145) Quit (Ping timeout: 480 seconds)
[17:38] * redeemed_ (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) has joined #ceph
[17:38] * redeemed_ (~redeemed@static-71-170-33-24.dllstx.fios.verizon.net) Quit ()
[17:44] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[17:45] <TiCPU> currently my KVM eats 188% CPU (2 cores) using only interrupts in Windows
[17:46] <darkfader> TiCPU: you do have pv drivers installed and it still needs that much?
[17:50] * zhyan_ (~zhyan@101.83.44.104) Quit (Ping timeout: 480 seconds)
[17:50] * tkensiski (~tkensiski@209.66.64.134) has joined #ceph
[17:50] * tkensiski (~tkensiski@209.66.64.134) has left #ceph
[17:52] <TiCPU> darkfader, yes
[17:52] <TiCPU> I'm only using a virtio disk
[17:56] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[17:56] * gregaf1 (~Adium@2607:f298:a:607:fda9:e687:6e3c:62d0) has joined #ceph
[17:57] <ssejour> I'm going crazy... "HEALTH_WARN 49 pgs degraded; 192 pgs stuck unclean" with a fresh clean install (ceph-deploy) on one node with 4 OSD
[17:57] * leseb (~Adium@83.167.43.235) has joined #ceph
[17:58] * leseb (~Adium@83.167.43.235) Quit ()
[18:01] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[18:03] <mikedawson> hufman:
[18:03] <hufman> hiya
[18:03] <mikedawson> hufman: "VMs freeze up temporarily due to lack of IOPS" http://tracker.ceph.com/issues/3737
[18:03] * gregaf (~Adium@2607:f298:a:607:ed33:570f:1106:e36b) Quit (Ping timeout: 480 seconds)
[18:03] * zhyan_ (~zhyan@101.83.44.104) has joined #ceph
[18:04] <mikedawson> hufman: look for qemu 1.4.2 or better for Josh Durgin's patch
[18:06] <hufman> exciting :D
[18:06] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[18:09] * ScOut3R_ (~ScOut3R@212.96.47.215) Quit (Ping timeout: 480 seconds)
[18:13] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[18:13] * DarkAceZ (~BillyMays@50.107.55.63) Quit (Ping timeout: 480 seconds)
[18:14] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[18:17] * tnt (~tnt@228.199-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:17] * Jahkeup_ (~Jahkeup@199.232.79.55) Quit (Remote host closed the connection)
[18:19] <hufman> and it looks like it'll be in cuttlefish too?
[18:19] <hufman> and i just need to update qemu?
[18:20] <mikedawson> hufman: yes, but cuttlefish doesn't get you the qemu packages. New enough qemu packages aren't widely available from distros yet
[18:20] <hufman> yeah
[18:21] <hufman> i built my own debian package to get the librbd support back
[18:21] <mikedawson> hufman: may help http://tracker.ceph.com/issues/4834
[18:21] <hufman> should be easy enough to add random patch there?
[18:21] <hufman> i should figure out what benchmark to use to verify that it worked heh
[18:29] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) has left #ceph
[18:29] * DarkAceZ (~BillyMays@50.107.55.63) has joined #ceph
[18:34] * vata (~vata@2607:fad8:4:6:85c4:7639:81a9:fbd) Quit (Quit: Leaving.)
[18:34] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:35] * zhyan_ (~zhyan@101.83.44.104) Quit (Ping timeout: 480 seconds)
[18:37] <joelio> ssejour: Is that normal? Will it not settle down after you've left it running for a short while? I didn't have a great experience with ceph-deploy, it has to be said. I did it all manually
[18:39] <ssejour> joelio: how long do I have to wait?
[18:40] * vata (~vata@2607:fad8:4:6:a98d:9901:f82f:678f) has joined #ceph
[18:40] <ssejour> joelio: I was waiting with a ceph -w but I was not able to know if something was on going...
[18:42] <joelio> it doesn't take too long - are you sure you have no osd's down? check with ceph osd tree and look for them all being up and having a 1
[18:43] <joelio> 'ceph osd tree' that is
[18:44] <ssejour> joelio: all osd are up...
[18:44] <joelio> and all are 'in'?
[18:45] <ssejour> yes
[18:45] * sagelap (~sage@2600:1012:b02d:c19b:bc04:545e:8f6f:72fe) has joined #ceph
[18:45] <ssejour> "pg 0.0 is stuck unclean since forever..."
[18:46] * gregaf1 (~Adium@2607:f298:a:607:fda9:e687:6e3c:62d0) Quit (Quit: Leaving.)
[18:46] <joelio> hmm, that shouldn't happen on a fresh install. especially not a one (?) node system
[18:46] <TiCPU> ssejour, did you consider bad RAM ?
[18:47] * gregaf (~Adium@2607:f298:a:607:fda9:e687:6e3c:62d0) has joined #ceph
[18:48] <ssejour> TiCPU: no. it's a good idea (as it's a too weird situation to be logical - would say Mister Spock)
[18:52] <joelio> ssejour: I'd also check that any disks attached are good too - check SMART status and do some dd's/benchmarks against individual disks. There is an inbuilt benchmark in ceph that may help
[18:52] <joelio> eph osd tell {$osd_id} bench
[18:52] <joelio> ceph osd tell {$osd_id} bench
[18:53] <joelio> results available in /var/log/ceph/ceph.log (or wherever you have configured)
[18:57] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) Quit (Quit: noahmehl)
[18:58] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[18:58] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[19:00] <imjustmatthew> Out of curiosity does anyone trust CephX enough yet to publicly expose the OSDs and MONs? (e.g. for partners in the same colo)
[19:01] * Tamil (~tamil@38.122.20.226) has joined #ceph
[19:14] * portante_ (~user@66.187.233.207) Quit (Ping timeout: 480 seconds)
[19:14] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:26] <TiCPU> for reason beyond my understanding, ballooning stopped working on my VMs..
[19:27] * portante (~user@66.187.233.206) has joined #ceph
[19:27] * dmick (~dmick@2607:f298:a:607:e80c:b504:a389:abfe) has joined #ceph
[19:27] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[19:32] * Fetch (fetch@gimel.cepheid.org) Quit (Read error: Operation timed out)
[19:36] * Jahkeup (~Jahkeup@199.232.79.12) Quit (Ping timeout: 480 seconds)
[19:38] * noahmehl (~noahmehl@cpe-71-67-115-16.cinci.res.rr.com) has joined #ceph
[19:40] * alexk (~alexk@of2-nat1.sat6.rackspace.com) has joined #ceph
[19:42] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[19:45] * rturk-away is now known as rturk
[19:47] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) has joined #ceph
[19:50] <TiCPU> darkfader, it seems my interrupt problem is related to ballooning finally
[19:51] * Fetch_ (fetch@gimel.cepheid.org) has joined #ceph
[19:51] * rturk (~rturk@ds2390.dreamservers.com) Quit (Quit: Coyote finally caught me)
[19:52] * sagelap1 (~sage@227.sub-70-197-77.myvzw.com) has joined #ceph
[19:52] * sagelap (~sage@2600:1012:b02d:c19b:bc04:545e:8f6f:72fe) Quit (Quit: Leaving.)
[19:52] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[19:53] * rturk-away is now known as rturk
[20:00] <mikedawson> TiCPU: interrupt problem? does it by chance involve ksoftirqd consuming lots of resources?
[20:04] <TiCPU> nope, this is in a windows 7 kvm guest, windows only says "interrupts" :(
[20:04] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[20:08] * sagelap1 (~sage@227.sub-70-197-77.myvzw.com) Quit (Ping timeout: 480 seconds)
[20:10] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:10] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:12] <mikedawson> ok, thanks TiCPU. I've been chasing a different issue.
[20:13] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[20:13] * mjblw1 (~mbaysek@wsip-174-79-34-244.ph.ph.cox.net) Quit (Quit: Leaving.)
[20:17] * john_barbee__ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:17] * john_barbee__ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit ()
[20:24] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:24] <redeemed> how does one tie ceph auth to a given RBD image? for example, i desire to mount an RBD image on ubuntu 12.04 using a mount entry such as: ceph-node3:6789 name=admin,secret={ASDF} rbd testshare1
[20:26] * sagelap (~sage@97.72.228.121) has joined #ceph
[20:26] <dmick> redeemed: auth is per-pool at the moment
[20:27] <redeemed> well, that's simple enough then, ain't it? just make the RBD image reside in a pool and link the auth to the pool...
[20:27] <redeemed> thank you, dmick
[20:28] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[20:28] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[20:28] <dmick> beware of using too many pools, but for tens of auth domains that's probably workable
[20:29] <dmick> namespaces should help when they come
[20:30] <redeemed> thank you, dmick. very good to know. i will likely stick to pools by use-case (NFS, service, etc).
[20:30] <dmick> cheers
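
A hedged sketch of the per-pool approach dmick describes: a dedicated pool, a key whose OSD caps are restricted to that pool, and the image created inside it (all names and sizes here are hypothetical):

    ceph osd pool create testshare 128
    ceph auth get-or-create client.testshare mon 'allow r' osd 'allow rwx pool=testshare' \
        -o /etc/ceph/ceph.client.testshare.keyring
    rbd create testshare1 --pool testshare --size 10240
    rbd map testshare1 --pool testshare --id testshare --keyring /etc/ceph/ceph.client.testshare.keyring
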
[20:36] <paravoid> sjust: good morning :)
[20:56] * sagelap (~sage@97.72.228.121) Quit (Read error: Connection reset by peer)
[20:57] * portante (~user@66.187.233.206) Quit (Ping timeout: 480 seconds)
[21:00] * joshd (~joshd@2607:f298:a:607:1d5a:ef63:530c:5125) Quit (Ping timeout: 480 seconds)
[21:04] <sjust> paravoid: what's up?
[21:09] * joshd (~joshd@2607:f298:a:607:6182:5933:82a:8a19) has joined #ceph
[21:32] <ssejour> TiCPU: I have a second server ready. Same hardware configuration. Same problem... "HEALTH_WARN 49 pgs degraded; 192 pgs stuck unclean" So I don't think it's a hardware problem.
[21:36] <ssejour> ceph.log during my new cluster creation (1 node and 4 OSD) : http://pastebin.com/TUwBz6h4
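
One cause worth ruling out here, since it is the classic one on a fresh single-host cluster: the default CRUSH rule places each replica on a different host, so with only one host the second copy of every PG can never be placed and the PGs stay degraded / stuck unclean forever. A hedged sketch of the usual workaround, telling CRUSH to separate replicas across OSDs instead of hosts:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # in crush.txt, inside the default rule, change
    #     step chooseleaf firstn 0 type host
    # to
    #     step chooseleaf firstn 0 type osd
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new

Setting "osd crush chooseleaf type = 0" in ceph.conf before the cluster is created has the same effect.
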
[21:43] * sagelap (~sage@2600:1010:b004:7b51:bc04:545e:8f6f:72fe) has joined #ceph
[21:53] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:53] * ChanServ sets mode +v andreask
[22:00] <paravoid> sjust: anything else you need re: slow peering?
[22:00] <paravoid> and is log refactoring still your plan?
[22:06] <saaby> sagelap: would you know what the expected performance impact would be on running osd's without tcmalloc? - looks like (deep)scrubs are running very slow..
[22:06] <sjust> paravoid: don't think there's anything else I need from you right now
[22:06] <saaby> also, we managed to segfault an osd without tcmalloc just now
[22:09] <nhm> saaby: tcmalloc definitely helps
[22:10] <nhm> saaby: Any reason you are running without it?
[22:10] <saaby> yeah, we are testing without tcmalloc because osd's were segfaulting in tcmalloc
[22:10] * sagelap1 (~sage@126.sub-70-197-15.myvzw.com) has joined #ceph
[22:10] * sagelap (~sage@2600:1010:b004:7b51:bc04:545e:8f6f:72fe) Quit (Read error: No route to host)
[22:11] <saaby> see: http://tracker.ceph.com/issues/5239
[22:12] <paravoid> sjust: don't mean to push, I'm just trying to align my plans here, but is there a rough estimate on when you'll have something I could test?
[22:12] <sjust> paravoid: not sure, may not be backportable
[22:12] <sjust> next few days?
[22:13] <paravoid> for this I may even run latest & greatest (unless you advise me not to do so :)
[22:13] <sjust> paravoid: you should probably stick to production releases for a non-test cluster
[22:14] <paravoid> heh
[22:14] <nhm> saaby: doh, ok. You said you also were able to cause a crash without tcmalloc?
[22:14] <saaby> yep, just now
[22:15] <paravoid> that's my inclination too, but this issue is really disruptive
[22:16] <nhm> saaby: same kind of crash?
[22:16] <saaby> not sure yet, one of my colleagues is working on it now
[22:17] <saaby> but it is a segfault, like before
[22:18] <saaby> but, probably not related to tcmalloc as originally thought
[22:20] <nhm> ok
[22:22] <saaby> and interesting.. we are able to trigger this quite easily, just by putting enough load on the osd's. So, why noone else is reporting this eludes me..
[22:22] <nhm> you may want to report it in the bug tracker for 5239 and just mention that you think it might not be related.
[22:23] <saaby> the only semi exotic thing about our cluster is it's running on debian..
[22:23] <saaby> yeah
[22:23] <saaby> we will, as soon as we have the trace
[22:25] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:26] * soren_ is now known as soren
[22:27] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[22:32] * sagelap1 (~sage@126.sub-70-197-15.myvzw.com) Quit (Quit: Leaving.)
[22:32] * sagelap (~sage@2600:1010:b004:7b51:bc04:545e:8f6f:72fe) has joined #ceph
[22:32] * sjustlaptop (~sam@2607:f298:a:697:4191:4879:bdf7:c7dd) has joined #ceph
[22:34] * gregphone (~gregphone@38.122.20.226) has joined #ceph
[22:34] <scheuk> We are running a 0.56.6 bobtail cluster, and we are seeing massive memory usage on our OSDs during a normal scrub (not a deep scrub). Is this a known problem with bobtail?
[22:34] <tnt> yes
[22:34] <scheuk> I mean (not a deep scrub)
[22:35] <tnt> I had the same with bobtail.
[22:35] <nhm> how massive is massive?
[22:35] <scheuk> like 96G Resident and 119G Virtual
[22:35] <nhm> scheuk: per OSD?
[22:35] <scheuk> the server has 128GB of ram
[22:35] <tnt> oh ... yeah, it wasn't that bad on my side.
[22:35] <scheuk> for one OSD
[22:35] <nhm> scheuk: wow!
[22:35] <paravoid> I've seen that happen too
[22:36] <nhm> hrm, can you repeat it?
[22:37] <scheuk> I did'nt catch wich PG it was scrubbing
[22:37] <scheuk> it finished the scrub and continued on
[22:37] <nhm> memory went back down after?
[22:37] <scheuk> not really
[22:37] <paravoid> not in my case
[22:37] <scheuk> it's still using 94GB right now
[22:37] <scheuk> it fell a little
[22:37] <nhm> how's CPU usage?
[22:38] <scheuk> low
[22:38] <scheuk> welp
[22:38] <scheuk> the cpu for the processes is a little high
[22:38] <scheuk> 100%+ CPU
[22:39] <scheuk> these servers have 2 AMD athalon 16 core procs
[22:39] <tnt> damn, those are pretty beefy ...
[22:39] <paravoid> isn't this http://tracker.ceph.com/issues/4179 ?
[22:41] * sagelap (~sage@2600:1010:b004:7b51:bc04:545e:8f6f:72fe) Quit (Ping timeout: 480 seconds)
[22:42] <scheuk> we did see the memory leak during a deepscrub
[22:42] <scheuk> and set deep scrubs to a 4 week interval
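
For reference, the scrub intervals scheuk mentions are plain ceph.conf options (values are in seconds; the numbers below are only examples matching a 4-week deep scrub cycle):

    [osd]
        osd deep scrub interval = 2419200   # 4 weeks
        osd scrub min interval = 86400      # 1 day
        osd scrub max interval = 604800     # 1 week
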
[22:42] <gregphone> scheuk: are you the ones who rebuilt without tcmalloc?
[22:42] * PodMan99 (~keith@dr-pepper.1stdomains.co.uk) has joined #ceph
[22:42] <scheuk> no
[22:43] <scheuk> I'm using the stock bobtail deb
[22:43] <scheuk> http://ceph.com/debian-bobtail/
[22:43] <nhm> scheuk: if the processes are still going nuts, try running "perf top" and see if it gives you anything useful.
[22:43] <gregphone> ah, just a thought. as you were :)
[22:43] <nhm> scheuk: you may need to install a package or two.
[22:43] <PodMan99> hey all ... I have been trying to play with a new CEPH install using IPv6, I try and run ceph-deploy install <HOSTNAME> and I get a complaint about pgp keys, I also notice that wget is not using the -6 switch? how can i make ceph-deploy work on IPv6
[22:44] <nhm> scheuk: if we are lucky maybe it's doing something stupid we can catch without heap profiling.
[22:46] <PodMan99> pushy.protocol.proxy.ExceptionProxy: Command 'wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -' returned non-zero exit status 2
[22:47] <PodMan99> sorry... damn terminal times out
[22:48] <PodMan99> main issue I believe is that ceph.com (according to the wget command) does not have any IPv6 DNS entries? :( suggestions?
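
A hedged workaround for the v6-only case, since both the key URL and the package repo resolve over IPv4 only: fetch the key through a dual-stack box and point apt at a dual-stack HTTP proxy instead of letting ceph-deploy run wget itself (proxy.example:3128 is purely hypothetical):

    # on a dual-stack machine
    wget -O release.asc 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
    scp release.asc cephnode:

    # on the v6-only node
    sudo apt-key add release.asc
    echo deb http://ceph.com/debian-cuttlefish/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
    echo 'Acquire::http::Proxy "http://proxy.example:3128";' | sudo tee /etc/apt/apt.conf.d/01proxy
    sudo apt-get update && sudo apt-get install ceph
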
[22:48] <scheuk> @nhm: I see that, working on getting the packages installed
[22:48] <cephalobot`> scheuk: Error: "nhm:" is not a valid command.
[22:49] * sagelap (~sage@2600:1010:b004:7b51:bc04:545e:8f6f:72fe) has joined #ceph
[22:49] * sagelap (~sage@2600:1010:b004:7b51:bc04:545e:8f6f:72fe) Quit ()
[22:53] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[23:00] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:00] * gregphone (~gregphone@38.122.20.226) Quit (Read error: Connection reset by peer)
[23:01] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[23:01] * Tamil (~tamil@38.122.20.226) has joined #ceph
[23:05] * sjustlaptop (~sam@2607:f298:a:697:4191:4879:bdf7:c7dd) Quit (Ping timeout: 480 seconds)
[23:13] * ssejour (~sebastien@lif35-1-78-232-187-11.fbx.proxad.net) Quit (Quit: Leaving.)
[23:16] * skm (~smiley@205.153.36.170) has left #ceph
[23:18] <Vjarjadian> just watching the youtube video from 4 days ago, and he says that there is a community initiative to write an iSCSI target into ceph... anyone here got a link to the project? quick google doesn't give me much
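
The project mentioned in the talk is not linked here, but one approach in use at the time was to map an RBD with the kernel client and re-export the block device through a standard Linux iSCSI target such as tgt or LIO; a hedged sketch with tgt (the IQN and image name are hypothetical, and the rbd device number may differ):

    rbd map rbd/iscsi-vol                       # shows up as /dev/rbd0 (or rbd1, ...)
    tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2013-06.com.example:rbd-vol
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/rbd0
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL
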
[23:22] * markbby (~Adium@168.94.245.2) has joined #ceph
[23:26] <scheuk> arg the package for linux-tools for our current kernel version doesn't exist anymore :(
[23:26] <scheuk> which means I'll have to upgrade the kernel to get linux-tools installed
[23:27] <dmick> build from source?...
[23:38] * vata (~vata@2607:fad8:4:6:a98d:9901:f82f:678f) Quit (Quit: Leaving.)
[23:39] * portante (~user@66.187.233.206) has joined #ceph
[23:41] * Jahkeup (~Jahkeup@199.232.79.12) has joined #ceph
[23:54] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[23:55] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[23:55] * LeaChim (~LeaChim@2.122.119.234) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.