#ceph IRC Log


IRC Log for 2013-07-08

Timestamps are in GMT/BST.

[0:01] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:04] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[0:05] * vhasi (vhasi@vha.si) Quit (Read error: Operation timed out)
[0:05] * vhasi (vhasi@vha.si) has joined #ceph
[0:06] * markit (~marco@88-149-177-66.v4.ngi.it) Quit (Remote host closed the connection)
[0:10] * markit (~marco@88-149-177-66.v4.ngi.it) has joined #ceph
[0:13] * markit (~marco@88-149-177-66.v4.ngi.it) Quit ()
[0:17] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Quit: Ex-Chat)
[0:19] * haomaiwang (~haomaiwan@117.79.232.241) has joined #ceph
[0:24] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[0:25] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[0:27] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[0:29] * haomaiwang (~haomaiwan@117.79.232.241) Quit (Ping timeout: 480 seconds)
[0:33] * LeaChim (~LeaChim@90.221.247.164) Quit (Ping timeout: 480 seconds)
[0:33] * DarkAceZ (~BillyMays@50.107.55.36) has joined #ceph
[0:38] * BillK (~BillK-OFT@124-169-221-120.dyn.iinet.net.au) has joined #ceph
[0:40] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[0:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:16] * tnt (~tnt@91.176.58.19) Quit (Ping timeout: 480 seconds)
[1:25] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[1:46] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:52] * haomaiwang (~haomaiwan@117.79.232.241) has joined #ceph
[2:00] * haomaiwang (~haomaiwan@117.79.232.241) Quit (Ping timeout: 480 seconds)
[2:01] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:21] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[2:29] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:40] * sebastiandeutsch (~sebastian@p57A06B01.dip0.t-ipconnect.de) has joined #ceph
[2:41] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Bye!)
[2:53] * haomaiwang (~haomaiwan@117.79.232.241) has joined #ceph
[3:01] * haomaiwang (~haomaiwan@117.79.232.241) Quit (Ping timeout: 480 seconds)
[3:04] * haomaiwang (~haomaiwan@117.79.232.241) has joined #ceph
[3:04] * rongze (~zhu@173-252-252-212.genericreverse.com) has joined #ceph
[3:11] * yy (~michealyx@218.74.35.50) has joined #ceph
[3:15] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[3:26] * rongze (~zhu@173-252-252-212.genericreverse.com) Quit (Quit: Leaving.)
[3:27] * sebastiandeutsch (~sebastian@p57A06B01.dip0.t-ipconnect.de) Quit (Quit: sebastiandeutsch)
[4:05] * rongze (~zhu@173-252-252-212.genericreverse.com) has joined #ceph
[4:28] * julian (~julianwa@125.69.104.140) has joined #ceph
[4:32] <Psi-jack> This is odd..
[4:32] <Psi-jack> HEALTH_WARN clock skew detected on mon.b, mon.c
[4:32] <Psi-jack> I have ntp running on all of them, and the clocks are very much in sync.
[4:35] <yy> try typing "ceph health detail"
[4:35] <Psi-jack> mon.b addr 172.18.0.6:6789/0 clock skew 0.0744548s > max 0.05s (latency 0.009273s)
[4:35] <Psi-jack> mon.c addr 172.18.0.7:6789/0 clock skew 0.0764901s > max 0.05s (latency 0.008684s)
[4:35] <Psi-jack> heh
[4:52] * smiley (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley)
[5:07] <Psi-jack> Okay. I see what was happening. My ntp servers were at stratum 16, so the clients weren't able to sync; they were dropped.
[5:08] * madkiss (~madkiss@089144192006.atnat0001.highway.a1.net) has joined #ceph
[5:10] <Psi-jack> One of the reasons I kinda like chrony: it just worked, more forcibly. LOL
[5:12] <Psi-jack> There we go. Now my primary in-house ntp servers are at stratum 3 and 4, as expected and reasonable. :)
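
A recap of the checks above, plus the tolerance knob involved. "ceph health detail" appears in the log; the "mon clock drift allowed" option name is taken from contemporary Ceph docs, and the value shown is only illustrative:

    # show per-monitor clock skew and latency
    ceph health detail
    # verify NTP peers are synced (stratum 16 means unsynchronized)
    ntpq -p          # or: chronyc sources
    # to tolerate a small, known skew, raise the limit in ceph.conf
    # (default 0.05 seconds):
    #   [mon]
    #   mon clock drift allowed = 0.1
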
[5:37] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) has joined #ceph
[5:43] * madkiss (~madkiss@089144192006.atnat0001.highway.a1.net) Quit (Quit: Leaving.)
[5:51] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) has joined #ceph
[6:51] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[6:51] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[7:00] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[7:03] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit ()
[7:12] * capri (~capri@212.218.127.222) has joined #ceph
[7:45] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Remote host closed the connection)
[7:47] * mjeanson (~mjeanson@bell.multivax.ca) has joined #ceph
[7:51] * tnt (~tnt@91.176.58.19) has joined #ceph
[8:15] * sleinen (~Adium@2001:620:0:46:6980:487e:1ccf:6176) has joined #ceph
[8:35] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[8:35] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:35] * LiRul (~lirul@91.82.105.2) has joined #ceph
[8:36] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[8:47] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[8:48] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Quit: Oops. My brain just hit a bad sector)
[9:06] * tnt (~tnt@91.176.58.19) Quit (Ping timeout: 480 seconds)
[9:15] * fridudad (~oftc-webi@fw-office.allied-internet.ag) has joined #ceph
[9:16] * madkiss (~madkiss@88.128.80.3) has joined #ceph
[9:19] * hybrid512 (~walid@106-171-static.pacwan.net) has joined #ceph
[9:22] * hybrid512 (~walid@106-171-static.pacwan.net) Quit (Remote host closed the connection)
[9:22] * hybrid512 (~walid@106-171-static.pacwan.net) has joined #ceph
[9:22] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:24] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:24] * ChanServ sets mode +v andreask
[9:26] * vipr (~vipr@office.loft169.be) has joined #ceph
[9:29] * john_ (~john@astound-64-85-225-33.ca.astound.net) Quit (Remote host closed the connection)
[9:31] * joshd1 (~jdurgin@2602:306:c5db:310:50ff:4b18:fd41:879a) Quit (Quit: Leaving.)
[9:33] * infinitytrapdoor (~infinityt@134.95.27.132) has joined #ceph
[9:34] * leseb1 (~Adium@83.167.43.235) has joined #ceph
[9:42] * mxmln (~maximilia@212.79.49.65) has joined #ceph
[10:10] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[10:11] * ScOut3R_ (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[10:12] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[10:12] * ScOut3R__ (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[10:12] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:13] * BManojlovic (~steki@91.195.39.5) Quit ()
[10:15] * mschiff (~mschiff@pD9511A92.dip0.t-ipconnect.de) has joined #ceph
[10:15] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[10:17] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:18] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[10:18] * Volture (~quassel@office.meganet.ru) Quit (Remote host closed the connection)
[10:19] * ScOut3R_ (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[10:20] * Volture (~quassel@office.meganet.ru) has joined #ceph
[10:21] * madkiss (~madkiss@88.128.80.3) Quit (Quit: Leaving.)
[10:22] * madkiss (~madkiss@88.128.80.3) has joined #ceph
[10:25] * LeaChim (~LeaChim@90.221.247.164) has joined #ceph
[10:38] * madkiss (~madkiss@88.128.80.3) Quit (Quit: Leaving.)
[10:40] * madkiss (~madkiss@88.128.80.3) has joined #ceph
[10:49] * julian (~julianwa@125.69.104.140) Quit (Quit: afk)
[10:53] * madkiss (~madkiss@88.128.80.3) Quit (Quit: Leaving.)
[10:55] * madkiss (~madkiss@88.128.80.3) has joined #ceph
[10:57] * madkiss (~madkiss@88.128.80.3) Quit ()
[11:10] * ScOut3R__ (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Remote host closed the connection)
[11:10] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[11:16] * haomaiwang (~haomaiwan@117.79.232.241) Quit (Remote host closed the connection)
[11:16] * fireD (~fireD@93-142-237-252.adsl.net.t-com.hr) has joined #ceph
[11:16] * haomaiwang (~haomaiwan@117.79.232.241) has joined #ceph
[11:24] * leseb1 (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[11:28] * leseb (~Adium@83.167.43.235) has joined #ceph
[11:28] * jjgew (~jjgew@ipc-hosting.de) has joined #ceph
[11:31] <jjgew> hi room! I have problems installing ceph on a fresh VM on debian (squeeze). I got a strange error while installing python with some test scripts, and later, after installing ceph-deploy, python is complaining about a missing package named pushy. I checked and saw it is installed (at least I can find it). also using pip install python-pushy did not help. Any suggestions? (ceph-deploy install --stable cuttlefish hostname1 -- ImportError: No module named pushy)
[11:33] <andreask> jjgew: you have python 2.7 installed?
[11:33] <jjgew> Python 2.6.6
[11:35] <jjgew> andreask: do I need 2.7?
[11:35] <andreask> hmm ... IIRC there was something with a missing dependency on python-setuptools ... you have that installed?
[11:35] <jjgew> andreask: seems I already have it - apt-get install python-setuptools: python-setuptools is already the newest version.
[11:36] <joelio> Wheezy is stable
[11:36] <joelio> if you have the option, go for that
[11:37] <andreask> that would be best option, yes
[11:37] * zapotah_ is now known as zapotah
[11:37] <jjgew> ok I will give it a try lets see if that helps coming back in a few :)
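
A quick check of whether the pushy module is visible to the Python interpreter ceph-deploy uses; this is a generic diagnostic rather than a command from the log, and the PyPI package name "pushy" is an assumption:

    # can the default python import pushy, and from where?
    python -c 'import pushy; print(pushy.__file__)'
    # if not, install it for that interpreter:
    pip install pushy
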
[11:38] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[11:43] * rongze (~zhu@173-252-252-212.genericreverse.com) has left #ceph
[11:45] * rongze (~zhu@173-252-252-212.genericreverse.com) has joined #ceph
[11:46] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Remote host closed the connection)
[11:51] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[11:52] * eternaleye (~eternaley@2002:3284:29cb::1) has joined #ceph
[11:52] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[11:53] * deadsimple (~infinityt@134.95.27.132) has joined #ceph
[11:58] * infinitytrapdoor (~infinityt@134.95.27.132) Quit (Ping timeout: 480 seconds)
[12:00] * yy (~michealyx@218.74.35.50) has left #ceph
[12:00] <jjgew> this command gives me an PGP error: wget -q -O- 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc' | apt-key add -
[12:01] <jjgew> gpg: no valid OpenPGP data found.
[12:02] <jjgew> and I am on wheezy now :) so it worked so far ...
[12:06] <joelio> jjgew: are you behind a proxy? Looks like it couldn't get the exported gpg key
[12:06] <joelio> try the wget on it's own and see if you get the file
[12:08] <jjgew> I do get the file
[12:08] <jjgew> with just the wget without piping it to apt-key
[12:09] <joelio> umm, that should work then, just try it again, maybe it was a transient issue?
[12:09] <jjgew> just did, same error
[12:09] <jjgew> hmm really strange as the wget command works just fine ....
[12:10] <joelio> well, this is for the signing key, you could install without them, mind - which would be ok to test, but definitely not in prod. It really should work though
[12:11] <jjgew> but ceph-deploy issues the command :(
[12:12] <andreask> jjgew: you are root or doing sudo?
[12:12] <jjgew> hmmm apt-key add release.asc
[12:12] <jjgew> OK
[12:12] <jjgew> so that works ...
[12:12] <jjgew> root
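
The manual two-step workaround jjgew lands on here, fetching the key to a file and then adding it, amounts to:

    wget -q -O release.asc 'https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc'
    apt-key add release.asc
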
[12:12] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[12:21] * portante|afk is now known as portante
[12:22] * tnt (~tnt@91.176.58.19) has joined #ceph
[12:23] * leseb1 (~Adium@83.167.43.235) has joined #ceph
[12:28] <jjgew> I am actually using the ceph-deploy tool when I get the error; when I do it manually it works, but then I can't use the ceph-deploy tool because it fails at this step
[12:30] <joelio> you just don't need to run the install step, do that manually
[12:30] <joelio> it's a strange one though, admittedly
[12:31] * leseb1 (~Adium@83.167.43.235) Quit (Ping timeout: 480 seconds)
[12:36] <jjgew> joelio: ok thanks
[12:41] * leseb (~Adium@83.167.43.235) has joined #ceph
[12:51] * tnt (~tnt@91.176.58.19) Quit (Ping timeout: 480 seconds)
[13:01] * Machske (~bram@ip-188-118-5-253.reverse.destiny.be) has joined #ceph
[13:02] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[13:09] * portante is now known as portante|afk
[13:12] * nlopes (~nlopes@a89-154-18-198.cpe.netcabo.pt) Quit (Ping timeout: 480 seconds)
[13:18] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[13:21] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[13:23] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[13:27] * mxmln3 (~maximilia@212.79.49.65) has joined #ceph
[13:30] * eternaleye (~eternaley@2002:3284:29cb::1) has joined #ceph
[13:32] * mxmln (~maximilia@212.79.49.65) Quit (Ping timeout: 480 seconds)
[13:36] * mxmln3 is now known as mxmln
[13:37] * LeaChim (~LeaChim@90.221.247.164) Quit (Ping timeout: 480 seconds)
[13:37] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:42] * X3NQ (~X3NQ@195.191.107.205) has joined #ceph
[13:42] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:42] * ChanServ sets mode +v andreask
[13:48] * deadsimple (~infinityt@134.95.27.132) Quit ()
[13:50] <loicd> does someone want to review https://github.com/ceph/ceph/pull/402 ?
[13:51] * nlopes (~nlopes@a89-154-18-198.cpe.netcabo.pt) has joined #ceph
[13:53] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[13:54] * LeaChim (~LeaChim@90.221.247.164) has joined #ceph
[14:03] * LeaChim (~LeaChim@90.221.247.164) Quit (Ping timeout: 480 seconds)
[14:09] * mozg (~andrei@212.183.128.61) has joined #ceph
[14:09] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[14:09] * vipr (~vipr@office.loft169.be) Quit (Remote host closed the connection)
[14:12] * leseb (~Adium@83.167.43.235) has joined #ceph
[14:18] * vipr (~vipr@office.loft169.be) has joined #ceph
[14:29] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:53] * dobber (~dobber@89.190.199.210) has joined #ceph
[15:02] <alexbligh> How do OSDs choose which ports they use? I am orchestrating deployment of Ceph OSDs, have the ceph.conf, but want to know which ports in my firewall I should open up given only a ceph.conf
[15:13] <Psi-jack> alexbligh: http://ceph.com/w/index.php?title=Cluster_configuration&oldid=3262
[15:15] <alexbligh> Well that says "Uses the first three available ports starting from 6800". So is the only way to count the number of mds & osd and hope it's chosen the same ports?
[15:16] <Psi-jack> Well, doesn't look to be the first 3.
[15:16] <alexbligh> "ceph-osd
[15:16] <alexbligh> Uses the first three available ports starting from 6800" <- from the link you posted. I guess it means "first 3 that were available when the OSD was started".
[15:16] <Psi-jack> That documentation looks to be slightly inaccurate, as I'm seeing ceph-osds using 6801-6809
[15:17] <Psi-jack> And ceph-mds also using 6801 as well, at the same time.
[15:17] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:17] <Psi-jack> Oh no, pardon, ceph-mds was using 6800. :)
[15:17] <alexbligh> I'm seeing 4 OSDs from 6800 to 6811 inclusive.
[15:18] <Psi-jack> Sounds like 4 OSDs?
[15:19] <Psi-jack> So, the documentation is accurate. Each OSD takes the first 3 available ports starting at 6800.
[15:20] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Read error: Operation timed out)
[15:21] <Psi-jack> So, if you run 4 OSDs, you'll use 12 ports.
[15:25] * ccourtaut (~ccourtaut@2001:41d0:1:eed3::1) Quit (Quit: Leaving)
[15:25] <alexbligh> ok - looks like I'll have to go with that then.
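
Putting the rule above into firewall terms: with the monitor on 6789 and each OSD taking three ports from 6800 up, a 4-OSD host needs 6800-6811 open, which matches what alexbligh observed. The iptables lines are a generic sketch, not from the log:

    # monitor port
    iptables -A INPUT -p tcp --dport 6789 -j ACCEPT
    # 3 ports per OSD starting at 6800; 4 OSDs -> 6800-6811
    iptables -A INPUT -p tcp --dport 6800:6811 -j ACCEPT
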
[15:28] * ccourtaut (~ccourtaut@2001:41d0:1:eed3::1) has joined #ceph
[15:30] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:37] * drokita (~drokita@199.255.228.128) has joined #ceph
[15:39] * drokita (~drokita@199.255.228.128) Quit ()
[15:40] * drokita (~drokita@199.255.228.128) has joined #ceph
[15:47] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[15:50] * guilhemfr (~guilhem@tui75-3-88-168-236-26.fbx.proxad.net) has joined #ceph
[15:55] <ofu> http://ceph.com/docs/master/rbd/ lacks index.html, http://ceph.com/docs/master/ => Link to Ceph Block Device
[15:58] * portante|afk is now known as portante
[16:01] * leseb (~Adium@83.167.43.235) has joined #ceph
[16:02] * capri (~capri@212.218.127.222) Quit (Quit: Verlassend)
[16:09] * BillK (~BillK-OFT@124-169-221-120.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:09] * markit (~marco@151.78.74.112) has joined #ceph
[16:10] <markit> hi, I'm trying to follow http://ceph.com/docs/master/start/quick-ceph-deploy/ and wondering what an "admin node" is. I want to create a 3 node cluster; is there a "special" node that acts as "admin"? or is it just for convenience, i.e. you issue all the cluster commands on one node, no different from the others, which you conventionally call the "admin node"?
[16:10] <markit> (but you could use a different one at any time)
[16:15] <Gugge-47527> It's the computer you choose to run all the admin commands from :)
[16:15] <Gugge-47527> It could be your workstation (if it can run ceph-deploy)
[16:15] <Gugge-47527> Or it could be one of the nodes ... i don't know if ceph-deploy has problems adding itself though
[16:16] <markit> Gugge-47527: well, now that you've explained it to me: it's just one of the nodes, used as the "admin node"
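
For context, the admin node is simply wherever ceph-deploy is invoked. A typical quick-start sequence from the documentation of this period looks like the following, with node1..node3 as placeholder hostnames:

    ceph-deploy new node1 node2 node3       # generate the initial ceph.conf and mon keyring
    ceph-deploy install node1 node2 node3   # install ceph packages on each node
    ceph-deploy mon create node1 node2 node3
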
[16:16] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[16:24] * vipr (~vipr@office.loft169.be) Quit (Remote host closed the connection)
[16:26] * dosaboy (~dosaboy@host86-164-81-178.range86-164.btcentralplus.com) has joined #ceph
[16:28] <jjgew> could anyone help me figuring this out, i issue the command: ceph-deploy --overwrite-conf mon create ceph.local …. and get this error: pushy.protocol.proxy.ExceptionProxy: [Errno 2] No such file or directory: '/var/lib/ceph/mon/ceph'
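
The path in that traceback is the monitor data directory ceph-deploy expected to exist. A plausible manual workaround, an assumption rather than something confirmed in the log, is to create the missing parent directory on the target host and retry:

    mkdir -p /var/lib/ceph/mon
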
[16:28] * dosaboy_ (~dosaboy@host86-161-205-138.range86-161.btcentralplus.com) Quit (Read error: Operation timed out)
[16:31] * LeaChim (~LeaChim@90.221.247.164) has joined #ceph
[16:41] * vata (~vata@2607:fad8:4:6:e017:3352:2091:12b4) has joined #ceph
[16:46] * leseb1 (~Adium@83.167.43.235) has joined #ceph
[16:48] * gaveen (~gaveen@175.157.183.243) has joined #ceph
[16:59] * leseb1 (~Adium@83.167.43.235) Quit (Ping timeout: 480 seconds)
[17:06] * leseb (~Adium@83.167.43.235) has joined #ceph
[17:09] * LiRul (~lirul@91.82.105.2) Quit (Quit: Leaving.)
[17:09] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:12] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[17:13] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Ping timeout: 480 seconds)
[17:13] * off_rhoden (~anonymous@pool-108-28-184-124.washdc.fios.verizon.net) has joined #ceph
[17:16] * eternaleye (~eternaley@2002:3284:29cb::1) has joined #ceph
[17:17] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:22] <guilhemfr> hi all
[17:22] <guilhemfr> it seems that I'm facing the same problem than here: http://tracker.ceph.com/issues/4521
[17:22] * mozg (~andrei@212.183.128.61) Quit (Ping timeout: 480 seconds)
[17:22] <guilhemfr> I tried to add a new osd
[17:23] <guilhemfr> (with ceph-disk-prepare etc)
[17:23] <guilhemfr> and all my mon go down with the same error
[17:24] * ron-slc_ (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[17:24] <guilhemfr> and for now, I can't up any mon with this new osd up
[17:28] * Machske (~bram@ip-188-118-5-253.reverse.destiny.be) Quit (Quit: Leaving)
[17:30] * vata (~vata@2607:fad8:4:6:e017:3352:2091:12b4) Quit (Quit: Leaving.)
[17:32] * scuttlemonkey_ is now known as scuttlemonkey
[17:33] * ron-slc (~Ron@173-165-129-125-utah.hfc.comcastbusiness.net) has joined #ceph
[17:33] * sleinen (~Adium@2001:620:0:46:6980:487e:1ccf:6176) Quit (Ping timeout: 480 seconds)
[17:35] <joao> guilhemfr, can you provide us with your mon stores?
[17:36] * dobber (~dobber@89.190.199.210) Quit (Remote host closed the connection)
[17:37] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:37] * madkiss (~madkiss@217.194.70.18) has joined #ceph
[17:40] * haomaiwang (~haomaiwan@117.79.232.241) Quit (Remote host closed the connection)
[17:47] * portante is now known as portante|afk
[17:47] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[17:49] * leseb (~Adium@83.167.43.235) has joined #ceph
[17:51] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[17:51] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[17:54] * markit (~marco@151.78.74.112) Quit (Quit: Konversation terminated!)
[17:54] <guilhemfr> joao, I'm doing it right now
[17:54] * ScOut3R_ (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[17:54] <joao> guilhemfr, thanks!
[17:56] * ScOut3R__ (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[17:59] * ScOut3R_ (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Read error: Operation timed out)
[18:00] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[18:01] * gregaf1 (~Adium@2607:f298:a:607:2522:386f:bdfa:dbdc) Quit (Quit: Leaving.)
[18:08] * mschiff (~mschiff@pD9511A92.dip0.t-ipconnect.de) Quit (Read error: Connection reset by peer)
[18:08] * guilhemfr (~guilhem@tui75-3-88-168-236-26.fbx.proxad.net) Quit (Quit: Quitte)
[18:09] * Tribaal (uid3081@id-3081.ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[18:09] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[18:11] * portante|afk is now known as portante
[18:11] * gregaf (~Adium@2607:f298:a:607:90b6:a075:51bd:1b9a) has joined #ceph
[18:11] <grepory> hrm. any thoughts on what would cause "ceph health" to succeed, but a call to rados lspools to never return?
[18:14] <grepory> specifically, this is from my openstack compute nodes using the admin key for authentication.
[18:14] <grepory> (openstack uses cephx which also fails in a similar fashion atm)
[18:14] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[18:15] <grepory> ceph osd ls succeeds, ceph os dump hangs forever.
[18:15] <grepory> ceph osd dump, rather.
[18:15] * ScOut3R__ (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[18:15] <grepory> ceph mon stat shows all three mons. i've verified that i can communicate with all three via tcp
[18:16] * leseb (~Adium@83.167.43.235) has joined #ceph
[18:20] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[18:26] <grepory> i see in the mon log that it gets the command (e.g. e2 handle_command mon_command(osd dump v 0) v1) which i issued against a specific mon for testing… but then on the client side nothing happens.
[18:28] * tnt (~tnt@109.130.77.55) has joined #ceph
[18:28] <gregaf> grepory: I don't have a lot of time right now, but that is indeed odd — you probably want to inject higher "debug mon = 20" levels, issue the command again, and see what the monitor is doing
[18:28] <gregaf> also, what version?
[18:28] <grepory> gregaf: thanks, i'll start there.
[18:28] <grepory> 0.64.1
[18:28] * b1tbkt_ (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) has joined #ceph
[18:29] <grepory> sorry 0.64.4
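
The injection gregaf suggests can be done at runtime, without restarting the monitor; a sketch using the injectargs mechanism, with mon.a as a placeholder id:

    # raise monitor debugging, reproduce the hang, then lower it again
    ceph tell mon.a injectargs '--debug-mon 20'
    ceph tell mon.a injectargs '--debug-mon 1'
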
[18:29] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Connection reset by peer)
[18:29] * vata (~vata@2607:fad8:4:6:4120:5799:7bb8:eff8) has joined #ceph
[18:29] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[18:29] * ChanServ sets mode +o scuttlemonkey
[18:32] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[18:39] * yehudasa_ (~yehudasa@2602:306:330b:1410:9de3:9265:e904:18d3) has joined #ceph
[18:40] * haomaiwang (~haomaiwan@211.155.113.223) has joined #ceph
[18:40] * jjgew (~jjgew@ipc-hosting.de) Quit (Quit: jjgew)
[18:41] * portante is now known as portante|afk
[18:41] * leseb (~Adium@83.167.43.235) Quit (Quit: Leaving.)
[18:44] * mschiff (~mschiff@85.182.236.82) has joined #ceph
[18:48] * haomaiwang (~haomaiwan@211.155.113.223) Quit (Ping timeout: 480 seconds)
[18:50] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:51] * portante|afk is now known as portante
[18:52] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[18:52] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[18:53] * jtang1 (~jtang@142.176.24.2) has joined #ceph
[18:54] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[18:54] * jtang1 (~jtang@142.176.24.2) Quit ()
[18:54] * jtang1 (~jtang@142.176.24.2) has joined #ceph
[18:55] * oddomatik is now known as Brian
[18:56] * b1tbkt_ (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[18:57] * nwat (~nwatkins@eduroam-226-128.ucsc.edu) has joined #ceph
[18:57] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) has joined #ceph
[19:00] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[19:02] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[19:02] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[19:02] * Tamil (~tamil@38.122.20.226) has joined #ceph
[19:02] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Quit: WeeChat 0.3.8)
[19:09] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:10] * joshd1 (~jdurgin@2602:306:c5db:310:e110:2102:3a19:638c) has joined #ceph
[19:13] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[19:18] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[19:19] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[19:21] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) has joined #ceph
[19:21] * athrift (~nz_monkey@203.86.205.13) Quit (Ping timeout: 480 seconds)
[19:23] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:26] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[19:26] * ChanServ sets mode +v andreask
[19:29] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:34] * nhm (~nhm@184-97-193-106.mpls.qwest.net) has joined #ceph
[19:34] * ChanServ sets mode +o nhm
[19:34] <loicd> sjust: I assume what sage means is that I should use https://github.com/ceph/ceph/blob/master/src/common/sharedptr_registry.hpp to replace https://github.com/ceph/ceph/blob/master/src/osd/ReplicatedPG.h#L445 instead of doing it manually (that's what I did in https://github.com/ceph/ceph/pull/402 )
[19:35] <loicd> I did not know about sharedptr_registry.hpp and it looks like it is exactly what's needed :-)
[19:35] <sjust> loicd: that's the idea
[19:35] <loicd> I'm on it :-) https://github.com/ceph/ceph/pull/407 is independent though and won't be affected by the change.
[19:36] <gregaf> given your description I wouldn't expect most of the conversion to shared pointer patches to rely on the specific registry, either? (w00t for that, btw)
[19:37] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:37] * mjeanson (~mjeanson@bell.multivax.ca) has joined #ceph
[19:38] * xmltok (~xmltok@pool101.bizrate.com) Quit (Remote host closed the connection)
[19:38] * xmltok (~xmltok@relay.els4.ticketmaster.com) has joined #ceph
[19:38] <loicd> gregaf: I don't know yet. It will definitely reduce the complexity of the change :-)
[19:38] * madkiss (~madkiss@217.194.70.18) Quit (Quit: Leaving.)
[19:39] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[19:41] * haomaiwang (~haomaiwan@li565-182.members.linode.com) has joined #ceph
[19:43] * fuzz (~pi@c-76-30-9-9.hsd1.tx.comcast.net) has joined #ceph
[19:44] <fuzz> hi all .. any advice on setting up ceph with 36 drives in a 6 core box? I was thinking of running 6 OSDs, each with a 6 drive mdraid
[19:44] <sjust> loicd: merged in the unit tests
[19:45] <fuzz> btw saw my find symlink bug patch made it in .. when's the next release?
[19:46] <sagewk> someone want to do a quick sanity check on https://github.com/ceph/ceph/pull/401 ? this fixes the arm and i386 builds from the intel crc stuff
[19:47] <nhm> fuzz: that will likely work fine, but probably won't be quite as fast as a 36 OSD machine with a pair of 6 core processors and 1 disk per OSD.
[19:48] * xmltok_ (~xmltok@pool101.bizrate.com) has joined #ceph
[19:49] * haomaiwang (~haomaiwan@li565-182.members.linode.com) Quit (Ping timeout: 480 seconds)
[19:51] <sjust> sagewk: that looks fairly reasonable
[19:51] <nhm> fuzz: depends on the number of controllers and SAS lanes in the box too though.
[19:52] <infernix> if a pg is stuck unclean but there are no unfound objects, what do/
[19:53] <infernix> pg 4.7c1 is stuck unclean since forever, current state active+remapped+backfilling, last acting [6,38,1]
[19:53] <infernix> restart those osds 6,38,1?
[19:54] <sjust> infernix: restarting 6 will probably cause it to start making progress again
[19:54] <sjust> just in case
[19:54] <sagewk> sjust: thanks
[19:54] <sjust> would you be ok with enabling osd debugging on those osds in case it happens again?
[19:54] <sjust> infernix: ^
[19:54] <sjust> debug osd = 20
[19:54] <infernix> i have had a ton of issues with ceph the past weekend
[19:54] * xmltok (~xmltok@relay.els4.ticketmaster.com) Quit (Ping timeout: 480 seconds)
[19:54] <sjust> debug filestore = 20
[19:54] <sjust> debug ms = 1
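
sjust's three settings, collected as a ceph.conf fragment to apply persistently on the affected OSDs (injecting them at runtime would also work):

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1
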
[19:55] <infernix> unfortunately i have to travel soon so no time to look into it
[19:55] <sjust> k
[19:56] <infernix> looks like that did help
[19:56] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Ping timeout: 480 seconds)
[19:57] <infernix> on 0.56.2 btw
[19:57] <sjust> you probably should upgrade to the most recent .56 point release
[19:58] <infernix> 2013-07-08 13:57:59.319582 osd.6 [WRN] map e28527 wrongly marked me down
[19:58] <infernix> tons of these again
[19:58] <infernix> that's actually on another cluster
[20:00] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[20:01] <infernix> upgrading
[20:02] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[20:03] * Tribaal (uid3081@id-3081.ealing.irccloud.com) has joined #ceph
[20:07] * gaveen (~gaveen@175.157.183.243) Quit (Remote host closed the connection)
[20:08] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Ping timeout: 480 seconds)
[20:09] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[20:09] <infernix> one OSD won't go up
[20:09] <infernix> "fault with nothing to send, going to standby"
[20:09] <infernix> "connect claims to be 10.246.1.19:6866/12874 not 10.246.1.19:6866/10177 - wrong node!"
[20:10] <infernix> ah, there it went
[20:11] * sleinen1 (~Adium@2001:620:0:26:e99d:4b1f:644a:f0fd) has joined #ceph
[20:14] <sagewk> https://github.com/ceph/ceph/pull/408
[20:16] <infernix> so, upgraded; my OSDs are still flapping. i go from 55 in/up to 49 up, 50, 52, 55 and down again to 49
[20:17] <infernix> lots of "wrongly marked me down"
[20:17] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[20:17] <infernix> down to 46
[20:20] * jtang1 (~jtang@142.176.24.2) Quit (Quit: Leaving.)
[20:20] <infernix> i have 55 OSDs and i see about 1000 established connections per host, so they peer alright
[20:21] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[20:23] <infernix> lots of slow requests still, after setting nodown/noup
[20:23] <infernix> is there a value to alter for marking OSDs down due to slow requests?
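
For what it's worth, slow requests by themselves don't get an OSD marked down; failure reports from heartbeat peers do. The grace period is tunable (option name from contemporary docs; the value is illustrative), and the nodown flag used above is the blunter alternative:

    # ceph.conf, under [osd] or [global]; default is 20 seconds
    #   osd heartbeat grace = 60
    # or suppress down-marking entirely while recovering:
    ceph osd set nodown
    ceph osd unset nodown
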
[20:26] * jtang1 (~jtang@142.176.24.2) has joined #ceph
[20:29] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[20:31] * madkiss (~madkiss@217.194.70.18) has joined #ceph
[20:31] * terje-_ (~root@135.109.216.239) has joined #ceph
[20:32] <infernix> argh
[20:32] <infernix> unsetting nodown and 10 osds go down immediately
[20:33] * unit3 (~Unit3@72.2.49.50) has joined #ceph
[20:33] * terje- (~root@135.109.216.239) Quit (Ping timeout: 480 seconds)
[20:33] <unit3> Hey all. Was asking about issues deploying ceph on CentOS6 on Friday. I've gotten marginally further, have OSDs running, but all of the pgs report "pgid currently maps to no osd".
[20:34] <unit3> I had to manually start up the OSDs because ceph-deploy wasn't doing it. What do I have to do to map pgs to the OSDs?
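
When every pg reports mapping to no OSD, a first check is whether the OSDs ever made it into the CRUSH map with nonzero weight; a generic diagnostic, not a command from this exchange:

    # OSDs absent from the tree, or with weight 0, receive no pgs
    ceph osd tree
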
[20:34] <infernix> heartbeat_check: no reply from osd.1 ever,
[20:34] <infernix> that can't be right
[20:36] * dmick (~dmick@2607:f298:a:607:c52d:3c98:eb9e:ff44) has joined #ceph
[20:37] <infernix> aha
[20:37] <infernix> balance-xor is failing on me
[20:37] <infernix> that's what all the flapping is about
[20:38] <nhm> infernix: doh
[20:38] <infernix> it works fine for a while
[20:39] <infernix> but then suddenly it fails
[20:39] <infernix> "a while" being weeks/months
[20:39] <nhm> infernix: I've seen behavior like that with balance-rr too. Trying to remember what caused it.
[20:39] <nhm> Though in my case it was more like days vs weeks.
[20:39] <infernix> it's on IB
[20:39] <nhm> oh, fun. ;)
[20:40] <infernix> only on one box
[20:40] <infernix> probably need upgrade kernel
[20:40] <fuzz> @nhm it has 2 internal SFFs to an LSI expander backplane I believe (8 SATA channels)
[20:40] <cephalobot> fuzz: Error: "nhm" is not a valid command.
[20:40] <nhm> fuzz: ah, the E16 then?
[20:41] <nhm> fuzz: I've got a SC847A sitting in my basement that we use for Ceph performance testing.
[20:41] <fuzz> they'll be 6047R
[20:42] <infernix> nhm: i have a new box, 48 SSDs, 6 HBAs
[20:42] <nhm> !
[20:42] * nhm jealous
[20:42] <infernix> i can let you benchmark it if you have time
[20:42] <tchmnkyz> hey dmick, question: those lag spikes i see when adding a new node and everything. If i increase my journal size from 1g to 10g would that help anyway? (i ask this because currently i use 10G ipoib.) Please let me know if this is something that would be beneficial to my setup.
[20:43] <nhm> infernix: Let me see what I can do. This week and probably next week are insane, maybe after that.
[20:43] <infernix> you have 4 weeks
[20:43] <infernix> then i'm putting it in production
[20:43] <infernix> i already know it screams but have no time to do ceph benchmarks for the next few weeks
[20:44] <nhm> infernix: lol, ok. Might have to skip it. What kind of controllers and SSDs?
[20:44] <fuzz> looks like it's a 847 chassis with E16
[20:44] <infernix> 9207-8i, intel 3700
[20:44] <nhm> infernix: perfect
[20:44] <infernix> 2x 8core 2.9ghz
[20:44] <infernix> 64gb
[20:44] <fuzz> so you would run 36 OSDs in this case?
[20:44] <nhm> infernix: that thing is going to scream.
[20:44] <infernix> i am doing 13GByte/s sequential 128k
[20:44] <infernix> about 10GB/s random
[20:45] <nhm> infernix: what benchmark?
[20:45] <infernix> and about 1 million iops random 4k
[20:45] <infernix> custom, fio-like
[20:45] <infernix> direct io
[20:45] <nhm> infernix: straight to disk?
[20:45] <infernix> the interrupts are holding it back
[20:45] <infernix> yes
[20:45] <fuzz> also going to have 2 of these boxes .. is there a way to ensure that replication occurs onto the second box? thus if one goes down, all blocks are available on the second?
[20:45] <nhm> fuzz: depends on the use case and how much CPU you end up with.
[20:46] <nhm> fuzz: by default you should get a crush map that puts replicas on the other machine.
[20:46] <infernix> nhm, the hope is we can repurpose these for ceph in the future when we have rsockets/rdma and erasure coding
[20:46] <fuzz> thanks i'll look into crush maps .. havn't read about them yet :)
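
The placement nhm describes comes from the default CRUSH rule choosing one replica per host. The relevant fragment of a decompiled map from this era looks roughly like this (the stock default rule, not fuzz's actual map):

    rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }
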
[20:46] <nhm> fuzz: but typically I'd recommend buying a couple of smaller machines so you spread objects out over more nodes.
[20:47] <infernix> for now they are linux md raid5s
[20:47] <fuzz> the IO will be fairly low .. about 1 million objects per day .. sizes between 32k-16M, average around 300kb
[20:48] <fuzz> (write)
[20:49] <nhm> fuzz: write or read heavy?
[20:50] <fuzz> i have 4 OSDs running in KVM each with a physical spindle with virtio, and it holds up right now
[20:50] <fuzz> basically 1:1 read:write, with some occasional random access
[20:51] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[20:52] <fuzz> over time it will probably become more read heavy though, but mostly just serving up recently added objects
[20:52] <nhm> yeah, that's only like 300GB/day so you should be able to design for capacity vs performance.
[20:53] <grepory1> gregaf: i tried mon debugging, but it wasn't very forthcoming. stracing the process on the client side was interesting… some mutex it's getting stuck on
[20:53] <fuzz> we'll become network bound first i think :)
[20:53] * grepory1 is now known as grepory
[20:53] <nhm> fuzz: might be worth looking into the fattwin nodes. I've never tested them, but it gives you almost as high density spread across more nodes.
[20:54] <fuzz> the reason i wanted so many spindles per box was density .. smaller nodes will fill the rack up quickly :)
[20:55] <nhm> fuzz: not hot-swappable, but the "hadoop" fattwin sleds give you 12 3.5" HDDs per U plus 2x2.5" drives.
[20:56] <nhm> http://www.supermicro.com.tw/products/nfo/FatTwin.cfm
[20:56] <fuzz> hmm interesting
[20:57] <janos> interesting... that people still use cold fusion to power web sites ;)
[20:57] <unit3> or the Dell C-series servers, which pack 4 dual core systems into 2U with a 12x3.5" drive plan up front.
[20:57] <unit3> haha
[20:58] <unit3> erm drive plane
[20:58] <unit3> http://www.dell.com/us/business/p/poweredge-c6220/pd
[20:58] <unit3> those. they're pretty alright, for Dell. ;)
[20:58] <nhm> unit3: yeah, we've got some folks looking at the 24 drive C6220s. Haven't tested it yet.
[20:59] <unit3> that'd be the 2.5" ones, I guess? I'm using a 12 drive c6100 for my ceph cluster at home, it's working really well.
[20:59] <joelio> now someone needs to invent a drive caddy robot, for screwing the damn things in
[20:59] <unit3> That's on Ubuntu... ceph-deploy works perfectly on Ubu, wish I wasn't having so much trouble with it on CentOS, since that's what we're forced to use at work.
[20:59] <unit3> haha
[21:00] * mnash (~chatzilla@vpn.expressionanalysis.com) Quit (Remote host closed the connection)
[21:09] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[21:15] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[21:17] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:18] <infernix> nhm: we have fattwin2s
[21:19] <infernix> 4 in 2u, 12 disks only though
[21:22] <gregaf> grepory: the mutex is probably just the "okay, now I'm waiting for a message to come in…" thing
[21:23] * Tamil (~tamil@38.122.20.226) has joined #ceph
[21:38] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:38] * ChanServ sets mode +v andreask
[21:39] <nhm> infernix: how do you like them?
[21:39] <infernix> great hypervisors
[21:40] <nhm> infernix: those dense disk nodes look very interesting.
[21:41] <nhm> infernix: In a big deployment where you've got replication and don't care about pulling a whole node out of production for 5 minutes to replace a failed drive, it could be a very interesting setup.
[21:41] <nhm> Especially if you can just pull the sled out without a lot of work.
[21:41] <infernix> you can
[21:42] <infernix> but they are best when run diskless
[21:42] <infernix> at least for OS disk
[21:42] <nhm> I believe that.
[21:42] * haomaiwang (~haomaiwan@117.79.232.241) has joined #ceph
[21:43] <nhm> 12 disks + 2 SSDs could potentially be pretty fast too, though I suppose you are stuck with something like an on-board 2308.
[21:45] * mnash (~chatzilla@vpn.expressionanalysis.com) has joined #ceph
[21:50] * haomaiwang (~haomaiwan@117.79.232.241) Quit (Ping timeout: 480 seconds)
[21:51] * jtang1 (~jtang@142.176.24.2) Quit (Quit: Leaving.)
[21:53] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[21:54] * LeaChim (~LeaChim@90.221.247.164) Quit (Ping timeout: 480 seconds)
[21:54] * jtang1 (~jtang@142.176.24.2) has joined #ceph
[21:54] * b1tbkt_ (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) has joined #ceph
[21:54] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[21:58] * jtang1 (~jtang@142.176.24.2) Quit ()
[21:59] * b1tbkt (~b1tbkt@24-217-196-119.dhcp.stls.mo.charter.com) Quit (Ping timeout: 480 seconds)
[21:59] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[22:00] * scuttlemonkey_ is now known as scuttlemonkey
[22:04] <mtanski> I'm still experiencing this: http://tracker.ceph.com/issues/5036 after applying Yan's patch on 3.10
[22:04] <mtanski> Any pointers how to go and debug this issue?
[22:04] * Midnightmyth_ (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[22:09] <nhm> mtanski: good question, let me see if Greg is around.
[22:10] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[22:11] * LeaChim (~LeaChim@90.217.166.163) has joined #ceph
[22:11] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[22:12] <nhm> mtanski: Greg is going to take a quick look at the bug again, but unfortunately we can't devote a ton of time to it right now since CephFS isn't really getting any funding atm. :(
[22:13] <gregaf> mtanski: do you have high-debug mds logs of the time before and as it happened?
[22:14] <gregaf> maybe somebody can take some time to find the cause in the next couple of days if you've got those, otherwise it's just a wild goose chase we unfortunately don't have time for right now :(
[22:15] <mtanski> I think I do have debug turned up in this case
[22:15] <mtanski> Since I did it last time I ran into this issue last week
[22:15] <mtanski> I guess I can trawl through /proc/<pid>/fd to see which one is doing it
[22:15] <gregaf> oh, unless it's another kclient bug, I suppose, but the logs would tell us that too
[22:15] <mtanski> and then backtrack
[22:16] <mtanski> I don't mind doing a lot of the investigation (at least kclient side), but I do need a little guidance on where to look tho
[22:18] <mtanski> Sadly root@betanode2:/proc/12985/task/12986/fd# ls hangs as well
[22:19] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Ping timeout: 480 seconds)
[22:20] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[22:21] <mtanski> FFS, I turned off the debugging for MDS on the 3rd
[22:21] <gregaf> mtanski: okay, so the request is blocking on the MDS for some reason
[22:22] <gregaf> if you remember the logs you pasted last time, they included a line with contents like "add_waiter tag…" and "taking waiter here" on the blocked inode, and right before that it would have printed out the inode state
[22:22] <gregaf> the inode state includes a "caps" section
[22:22] <gregaf> caps={5679=pAsLsXsFr/-@1,16909=pAsLsXsFr/pAsxXsxFxwb@3,16922=pAsLsXsFr/p@0,16923=pAsLsXsFr/pAsLsXsFrw/pAsxXsxFsxcrwb@2,16924=p/p@0,16961=pAsLsXsFr/-@1} for instance
[22:23] <mtanski> Yup
[22:24] <gregaf> that's "client=<pending caps>/<issued caps>/<wanted caps>" for each client; if the issued caps have stuff that the pending caps don't (in the first one there the xwb following the F, that being File exclusive, write, buffer) then the block is because a client is misbehaving and there's still/another client issue
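
To make that concrete, applying gregaf's key to the 16909 entry in the example line above: pending is pAsLsXsFr and issued is pAsxXsxFxwb, so the issued set carries Fxwb (File exclusive/write/buffer) that pending lacks; that is exactly the case gregaf describes as a misbehaving client still holding caps the MDS wants back.
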
[22:24] <grepory> gregaf: ahhh…. yeah i guess that makes sense.
[22:24] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[22:26] <gregaf> mtanski: in which case put it on the bug and I bet Yan Zheng will figure it out pretty quick
[22:26] <gregaf> if not then there's an MDS bug and we'll want the logs to track it down
[22:27] * nolan_ (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) has joined #ceph
[22:28] * nolan (~nolan@2001:470:1:41:20c:29ff:fe9a:60be) Quit (Ping timeout: 480 seconds)
[22:28] * nolan_ is now known as nolan
[22:28] <mtanski> Alright, I'll go investigate
[22:29] <gregaf> good luck! :)
[22:34] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has left #ceph
[22:40] * unit3 (~Unit3@72.2.49.50) has left #ceph
[22:41] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[22:42] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:43] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[22:43] * haomaiwang (~haomaiwan@211.155.113.223) has joined #ceph
[22:47] <mtanski> What do these mean? ptrwaiter=0 request=1 lock=1 caps=1 dirty=1 waiter=1 authpin=1
[22:48] * markbby (~Adium@168.94.245.2) has joined #ceph
[22:51] * haomaiwang (~haomaiwan@211.155.113.223) Quit (Ping timeout: 480 seconds)
[23:00] * madkiss (~madkiss@217.194.70.18) Quit (Quit: Leaving.)
[23:01] <gregaf> mtanski: they're the "pins" on the inode that are keeping it in cache — there's not a ptrwaiter (I forget what that is), but there is a request, there's a lock of some kind, there are clients with caps, the inode is dirty, there's a waiter, and it's auth pinned
[23:01] * rturk-away is now known as rturk
[23:09] <sagewk> ptrwaiter is usually (always?) a Context that has an CInode*
[23:13] <sagewk> sjust: can you look at wip-mon-osdmap-trim?
[23:13] <sagewk> wondering if i should break reported into a separate epoch and sequence field (and not an eversion_t) to avoid confusion. the semantics of that field are already pretty bogus
[23:14] <sjust> sagewk: yeah
[23:15] <sjust> sagewk: one observation, this osd receives >100 map messages notifying it of 8/10 of the map epochs I have glanced at in these logs
[23:16] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Ping timeout: 480 seconds)
[23:18] * jjgalvez1 (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) has joined #ceph
[23:19] <sjust> sagewk: yeah, that use of eversion_t is a bit bogus
[23:19] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) Quit (Ping timeout: 480 seconds)
[23:20] <sjust> sagewk: looks good otherwise
[23:23] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[23:27] * sleinen1 (~Adium@2001:620:0:26:e99d:4b1f:644a:f0fd) Quit (Quit: Leaving.)
[23:27] * Macheske (~Bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[23:30] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[23:31] * Machske (~Bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[23:31] * madkiss (~madkiss@217.194.70.18) has joined #ceph
[23:37] * rturk is now known as rturk-away
[23:39] <skm> why does the output from df on a cephfs mount show the incorrect usage info? Is this expected currently?
[23:39] <skm> 172.31.2.103:6789:/ 852M 83M 769M 10% /mnt/ceph
[23:40] <skm> it shows 83M used and there are 11GB worth of files in one of the dirs
[23:41] <dmick> 11GB of real data, or does it have holes in it
[23:41] <skm> real data...I copied a bunch of 700 MB .wav files there
[23:42] <skm> root@ceph4:/mnt/ceph# du -shc audio
[23:42] <skm> 11G audio
[23:42] * n3c8-35575 (~mhattersl@84.19.35.10) has joined #ceph
[23:42] * n3c8-35575v2 (~mhattersl@pix.office.vaioni.com) Quit (Read error: Connection reset by peer)
[23:43] <skm> doing an ls shows the correct numbers for the folder...but df looks broken
[23:44] * haomaiwang (~haomaiwan@notes4.com) has joined #ceph
[23:44] * madkiss (~madkiss@217.194.70.18) Quit (Ping timeout: 480 seconds)
[23:44] * kyle_ (~kyle@216.183.64.10) has joined #ceph
[23:45] <nhm> skm: Sounds like it could be a MDS bug. Mind submitting a bug report?
[23:45] <skm> sure
[23:45] <nhm> skm: can't guarantee that we'll be able to get to it really soon since the CephFS work is mostly unfunded right now, but at least we'll have a record of it.
[23:46] <skm> np
[23:46] <nhm> skm: also allows the community to pick it up if someone else has time to work on it! :)
[23:50] <dmick> ah, df on cephfs. I wonder, does ceph df give more-reasonable numbers?
[23:50] <gregaf> only in the sense that it doesn't try to map object store values into filesystem values
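
For reference, the cluster-side views mentioned here, both present in this release and reporting usage as the object store sees it:

    ceph df      # cluster-wide and per-pool usage
    rados df     # per-pool objects and space consumed
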
[23:50] * haomaiwang (~haomaiwan@notes4.com) Quit (Read error: Operation timed out)
[23:56] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Quit: Leaving.)
[23:58] <sagewk> gregaf: did you see revisions for wip-mon-scrub?
[23:59] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[23:59] <gregaf> I think they were in as I reviewed them? or is there something newer?

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.