#ceph IRC Log

IRC Log for 2013-07-12

Timestamps are in GMT/BST.

[0:02] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[0:04] * Dark-Ace-Z is now known as DarkAceZ
[0:06] * jeff-YF (~jeffyf@67.23.117.122) Quit (Ping timeout: 480 seconds)
[0:09] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[0:12] * markbby (~Adium@168.94.245.2) has joined #ceph
[0:15] * humbolt (~elias@p4FEAD8A4.dip0.t-ipconnect.de) has joined #ceph
[0:17] * BillK (~BillK-OFT@124-148-212-240.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[0:19] * aliguori (~anthony@32.97.110.51) Quit (Remote host closed the connection)
[0:27] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[0:36] * _Tass4dar (~tassadar@tassadar.xs4all.nl) has joined #ceph
[0:37] * _Tassadar (~tassadar@tassadar.xs4all.nl) Quit (Ping timeout: 480 seconds)
[0:40] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[0:40] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[0:48] * BillK (~BillK-OFT@124-148-212-240.dyn.iinet.net.au) has joined #ceph
[0:54] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[1:00] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[1:03] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[1:10] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[1:13] * Lea (~LeaChim@2.216.167.255) Quit (Remote host closed the connection)
[1:17] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[1:18] * LeaChim (~LeaChim@2.216.167.255) has joined #ceph
[1:20] * Cube (~Cube@12.248.40.138) has joined #ceph
[1:33] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[1:36] <jackhill> joshd, rturk thanks!
[1:38] * gregaf2 (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[1:38] <rturk> no problem :)
[1:39] <rturk> If you want to see me giving a similar talk: http://www.youtube.com/watch?v=RAfsHEaiVxI
[1:39] <rturk> might help you decode some of the slides with no words on them
[1:40] <Midnightmyth> it's a nice talk actually
[1:40] * infernix (nix@cl-1404.ams-04.nl.sixxs.net) Quit (Ping timeout: 480 seconds)
[1:42] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Read error: Operation timed out)
[1:42] * infernix (nix@cl-1404.ams-04.nl.sixxs.net) has joined #ceph
[1:48] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[1:50] * toMeloos (~tom@53545693.cm-6-5b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[1:54] * diegows (~diegows@200.68.116.185) Quit (Ping timeout: 480 seconds)
[1:59] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:05] * rturk is now known as rturk-away
[2:05] * humbolt_ (~elias@p4FEADAFC.dip0.t-ipconnect.de) has joined #ceph
[2:07] * rturk-away is now known as rturk
[2:08] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[2:08] <rturk> Midnightmyth: thanks :)
[2:11] * humbolt (~elias@p4FEAD8A4.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[2:11] * humbolt_ is now known as humbolt
[2:15] * rturk is now known as rturk-away
[2:20] * sagelap (~sage@12.130.118.19) has joined #ceph
[2:23] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[2:24] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[2:27] * humbolt (~elias@p4FEADAFC.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[2:27] * LeaChim (~LeaChim@2.216.167.255) Quit (Read error: Operation timed out)
[2:33] * jtang1 (~jtang@blk-222-209-164.eastlink.ca) has joined #ceph
[2:43] * jtang2 (~jtang@blk-222-209-164.eastlink.ca) has joined #ceph
[2:43] * jtang1 (~jtang@blk-222-209-164.eastlink.ca) Quit (Quit: Leaving.)
[2:54] * yy (~michealyx@218.74.34.54) has joined #ceph
[2:59] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[3:00] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Ping timeout: 480 seconds)
[3:07] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Bye!)
[3:16] * john_barbee (~jbarbee@c-50-165-106-164.hsd1.in.comcast.net) has joined #ceph
[3:18] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[3:19] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[3:20] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit ()
[3:32] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[3:43] * janisg (~troll@85.254.50.23) Quit (Ping timeout: 480 seconds)
[3:46] * jtang2 (~jtang@blk-222-209-164.eastlink.ca) Quit (Quit: Leaving.)
[3:49] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[3:49] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[3:51] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) has joined #ceph
[3:53] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[4:00] * julian (~julianwa@125.69.104.140) has joined #ceph
[4:02] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:14] * sagelap (~sage@12.130.118.19) Quit (Remote host closed the connection)
[4:14] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[4:15] * sagelap (~sage@12.130.118.19) has joined #ceph
[4:15] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Remote host closed the connection)
[4:15] * xmltok (~xmltok@relay.els4.ticketmaster.com) has joined #ceph
[4:20] * sagelap (~sage@12.130.118.19) Quit (Remote host closed the connection)
[4:34] * Cube (~Cube@12.248.40.138) Quit (Read error: Operation timed out)
[4:41] * sagelap (~sage@2600:1012:b01d:ba22:40de:4923:3059:6d04) has joined #ceph
[4:44] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[4:57] * gregaf2 (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[5:00] * fireD (~fireD@93-142-232-63.adsl.net.t-com.hr) has joined #ceph
[5:01] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[5:03] * fireD1 (~fireD@93-139-145-34.adsl.net.t-com.hr) Quit (Read error: Operation timed out)
[5:03] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) Quit (Quit: Leaving.)
[5:07] * sagelap (~sage@2600:1012:b01d:ba22:40de:4923:3059:6d04) Quit (Read error: No route to host)
[5:11] <yy> hello, I have a problem with mon election
[5:12] * gregaf1 (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[5:13] <yy> I have three mon nodes. when one of them is down or stopped, the ceph cluster just gets stuck! no response from the command line.
[5:15] <yy> when I check the logs of the remaining mons, I find one mon stuck in the "electing sync( leader provider state none )" state, and the other stuck in the "(synchronizing sync( requester state chunks ))" state.
[5:16] <dmick> yy: are all monitors running the same version?
[5:21] <yy> yes
[5:21] <yy> ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
[5:24] <yy> dmick: At first the ceph version was 0.61.1, then it was updated to 0.61.4
[5:27] <dmick> when the monitors are stuck, you can use the ceph --admin-daemon <asok> mon_status command to verify their status (and there's also a version command)
[5:28] <dmick> <asok> is usually in /var/run/ceph/{cluster}-mon.{id}.asok
[5:28] <dmick> (this connects through the admin socket, so doesn't require the monitors to work)
[5:28] <dmick> check that output on all the running mons
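For reference, the admin-socket check described above looks roughly like this, with the default cluster name "ceph" and a mon id of "a" as placeholders:
    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok version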
[5:30] * mtanski (~mtanski@cpe-74-65-252-48.nyc.res.rr.com) Quit (Quit: mtanski)
[5:33] * off_rhoden_ (~anonymous@pool-173-79-66-35.washdc.fios.verizon.net) has joined #ceph
[5:34] * off_rhoden (~anonymous@pool-173-79-66-35.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[5:34] * off_rhoden_ is now known as off_rhoden
[5:35] <yy> using the command you gave above, I get one mon's status as "synchronizing sync( requester state chunks )", and the other as "state": "electing sync( provider state none )"
[5:40] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[5:44] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley_)
[5:54] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[5:59] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[6:02] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) Quit (Read error: Connection reset by peer)
[6:05] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[6:58] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (Ping timeout: 480 seconds)
[7:03] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) Quit (Ping timeout: 480 seconds)
[7:03] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:08] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has joined #ceph
[7:15] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) has joined #ceph
[7:23] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) Quit (Ping timeout: 480 seconds)
[7:30] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[7:30] * Tamil (~Adium@cpe-108-184-66-69.socal.res.rr.com) has left #ceph
[7:35] * yehuda_hm (~yehuda@2602:306:330b:1410:baac:6fff:fec5:2aad) has joined #ceph
[7:47] * john_barbee (~jbarbee@c-50-165-106-164.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[7:49] * john_barbee (~jbarbee@c-50-165-106-164.hsd1.in.comcast.net) has joined #ceph
[8:06] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:12] * agh (~oftc-webi@gw-to-666.outscale.net) Quit (Quit: Page closed)
[8:29] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[8:29] * ChanServ sets mode +v andreask
[8:37] * yy (~michealyx@218.74.34.54) has left #ceph
[8:38] * yy (~michealyx@218.74.34.54) has joined #ceph
[8:45] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:48] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[8:57] * john_barbee (~jbarbee@c-50-165-106-164.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[9:00] * julian (~julianwa@125.69.104.140) Quit (Read error: Connection reset by peer)
[9:01] * janisg (~troll@159.148.100.241) has joined #ceph
[9:02] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[9:02] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit ()
[9:04] * mschiff (~mschiff@port-92723.pppoe.wtnet.de) has joined #ceph
[9:09] * xmltok (~xmltok@relay.els4.ticketmaster.com) Quit (Ping timeout: 480 seconds)
[9:10] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[9:12] * ChanServ sets mode +v leseb
[9:12] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[9:14] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:15] * eternaleye (~eternaley@2002:3284:29cb::1) has joined #ceph
[9:16] * humbolt (~elias@p4FEADAFC.dip0.t-ipconnect.de) has joined #ceph
[9:18] * humbolt (~elias@p4FEADAFC.dip0.t-ipconnect.de) Quit ()
[9:27] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:27] * eternaleye (~eternaley@2002:3284:29cb::1) Quit (Ping timeout: 480 seconds)
[9:28] * eternaleye (~eternaley@c-50-132-41-203.hsd1.wa.comcast.net) has joined #ceph
[9:31] * julian (~julianwa@125.69.105.128) has joined #ceph
[9:36] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[9:36] * trond_ (~trond@trh.betradar.com) Quit (Read error: Connection reset by peer)
[9:37] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Remote host closed the connection)
[9:38] * `10__ (~10@juke.fm) Quit (Remote host closed the connection)
[9:38] * mjeanson (~mjeanson@bell.multivax.ca) has joined #ceph
[9:39] * `10__ (~10@juke.fm) has joined #ceph
[9:40] * ScOut3R_ (~ScOut3R@rock.adverticum.com) has joined #ceph
[9:43] * wer (~wer@206-248-239-142.unassigned.ntelos.net) Quit (Ping timeout: 480 seconds)
[9:43] * wer (~wer@206-248-239-142.unassigned.ntelos.net) has joined #ceph
[9:43] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[9:46] * trond (~trond@trh.betradar.com) has joined #ceph
[9:46] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[9:53] <ccourtaut> morning
[10:00] * vipr (~vipr@78-21-229-157.access.telenet.be) has joined #ceph
[10:02] * stacker666 (~stacker66@90.163.235.0) has joined #ceph
[10:17] * LeaChim (~LeaChim@2.216.167.255) has joined #ceph
[10:28] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[10:42] * iggy (~iggy@theiggy.com) Quit (Remote host closed the connection)
[10:43] * iggy (~iggy@theiggy.com) has joined #ceph
[10:47] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[10:53] * ScOut3R_ (~ScOut3R@rock.adverticum.com) Quit (Ping timeout: 480 seconds)
[10:59] * john_barbee (~jbarbee@c-50-165-106-164.hsd1.in.comcast.net) has joined #ceph
[11:16] * dignus (~dignus@bastion.jkit.nl) has joined #ceph
[11:46] * jtang1 (~jtang@blk-222-209-164.eastlink.ca) has joined #ceph
[11:59] * yy (~michealyx@218.74.34.54) has left #ceph
[12:01] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[12:03] * julian (~julianwa@125.69.105.128) Quit (Quit: afk)
[12:07] * john_barbee (~jbarbee@c-50-165-106-164.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[12:09] * john_barbee (~jbarbee@c-50-165-106-164.hsd1.in.comcast.net) has joined #ceph
[12:28] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[12:37] * john_barbee (~jbarbee@c-50-165-106-164.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[12:43] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[12:43] * ChanServ sets mode +v andreask
[12:46] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[12:50] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:00] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[13:07] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[13:30] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[13:48] * AfC (~andrew@gateway.syd.operationaldynamics.com) has joined #ceph
[14:04] <vipr> Is there a way or a command to see which RBD are connected to/ being used at that moment?
[14:06] * jtang1 (~jtang@blk-222-209-164.eastlink.ca) Quit (Quit: Leaving.)
[14:08] * jtang1 (~jtang@blk-222-209-164.eastlink.ca) has joined #ceph
[14:11] * dignus (~dignus@bastion.jkit.nl) Quit (Ping timeout: 480 seconds)
[14:11] * dignus (~dignus@bastion.jkit.nl) has joined #ceph
[14:14] * jtang1 (~jtang@blk-222-209-164.eastlink.ca) Quit (Quit: Leaving.)
[14:21] * yanzheng (~zhyan@134.134.139.72) has joined #ceph
[14:22] * AfC (~andrew@gateway.syd.operationaldynamics.com) Quit (Quit: Leaving.)
[14:24] * andreask1 (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[14:24] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Connection reset by peer)
[14:24] * ChanServ sets mode +v andreask1
[14:24] * andreask1 is now known as andreask
[14:25] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[14:26] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[14:32] * jksM (~jks@4810ds1-ns.2.fullrate.dk) has joined #ceph
[14:40] * jks (~jks@3e6b5724.rev.stofanet.dk) Quit (Ping timeout: 480 seconds)
[14:40] * jtang1 (~jtang@142.176.24.2) has joined #ceph
[14:48] * markbby (~Adium@168.94.245.4) has joined #ceph
[14:50] * AfC (~andrew@2001:44b8:31cb:d400:7c45:f9b8:8d74:6b09) has joined #ceph
[14:50] * diegows (~diegows@190.190.2.126) has joined #ceph
[14:59] * jks (~jks@3e6b5724.rev.stofanet.dk) has joined #ceph
[14:59] * jksM (~jks@4810ds1-ns.2.fullrate.dk) Quit (Read error: Connection reset by peer)
[15:02] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[15:06] <odyssey4me> Hi everyone - I'm planning to use Ceph in production with Ubuntu Precise. I see the documentation recommends using kernel v3.6.6 or later, or v3.4.20 or later. Neither of these are available for Precise, but v3.5 or v3.8 are available. What's the recommendation for best performance... especially when using RBD for FUSE and/or KVM?
[15:07] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[15:10] * mtanski (~mtanski@69.193.178.202) Quit ()
[15:11] * ScOut3R_ (~ScOut3R@rock.adverticum.com) has joined #ceph
[15:12] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[15:14] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[15:17] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[15:19] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[15:21] * yanzheng (~zhyan@134.134.139.72) Quit (Remote host closed the connection)
[15:26] * fridudad (~oftc-webi@fw-office.allied-internet.ag) Quit (Remote host closed the connection)
[15:26] * ScOut3R_ (~ScOut3R@rock.adverticum.com) Quit (Ping timeout: 480 seconds)
[15:28] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[15:30] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[15:39] * markbby (~Adium@168.94.245.4) Quit (Quit: Leaving.)
[15:39] * markbby (~Adium@168.94.245.4) has joined #ceph
[15:44] <jks> Trying to start qemu-system-x86_64 with a disk on rbd, but it segfaults inside librbd::aio_flush(). Anyone seen this before?
[15:45] <jks> (qemu is version 1.4.2 and rbd version 0.61.4)
[15:45] <joelio> yea, I have, had an old libvirt iirc
[15:45] * joelio may be wrong ofc
[15:45] <jks> I'm not using libvirt
[15:45] <joelio> ignore me, that had nothing to do with libvirt
[15:45] <joelio> snap :)
[15:46] <jks> ;-)
[15:46] <joelio> odyssey4me: I use raring kernel (on raring mind), works fine
[15:48] * illya (~illya_hav@9-39-133-95.pool.ukrtel.net) has joined #ceph
[15:49] <illya> hi
[15:49] <jks> hi
[15:49] <illya> fyi I solved yesterday's issue by rebuilding the cluster and adding more space on the OSDs
[15:50] <illya> question - I'm doing remote start of ceph-fuse
[15:50] <illya> and my ceph-fuse process stops when I do logout
[15:50] <joelio> illya: :)
[15:51] <illya> any hints ?
[15:51] <illya> I've added row into fstab
[15:51] <joelio> not an mds user, but why not test the kernel driver?
[15:51] <illya> and when I do restart all fine
[15:51] <joelio> otherwise it may need some initscript etc
[15:51] <joelio> as FUSE is userspace
[15:52] <illya> I know
[15:52] <illya> kernel driver works fine
[15:52] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[15:52] <illya> currently one of our deployment options is to run inside a cloud
[15:52] <illya> and their kernel does not have the kernel module built in :(
[15:53] <joelio> well, that's kinda the point of FUSE: as it's user space you run it as a given user, so if you start it up outside a screen or without properly backgrounding it, I guess it'll get killed when you log out
[15:53] <illya> so trying FUSE as an other option
[15:53] <joelio> not that au fait with it, admittedly
[16:00] <illya> -d
[16:00] <illya> Detach from console and daemonize after startup.
[16:00] <illya> from ceph fuse docs
[16:01] * jtang1 (~jtang@142.176.24.2) Quit (Quit: Leaving.)
[16:04] <loicd> has anyone tried ceph-deploy osd prepare on an LVM logical volume ? For some reason the partitions do not show up in /dev/mapper and mkfs -t xfs fails
[16:04] * loicd investigating
[16:08] <odyssey4me> joelio - thanks, I just did a performance test using the raring kernel on precise and it definitely performs better :)
[16:08] <joelio> odyssey4me: good stuff!
[16:11] <odyssey4me> joelio - ceph is actually giving performance relatively close to a 6-disk RAID5 set... it's awesome :)
[16:12] <joelio> what are you testing rbd or cephfs?
[16:13] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:13] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit ()
[16:13] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[16:17] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[16:18] * scuttlemonkey_ is now known as scuttlemonkey
[16:27] * jtang1 (~jtang@142.176.24.2) has joined #ceph
[16:43] * jtang1 (~jtang@142.176.24.2) Quit (Quit: Leaving.)
[16:44] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[16:44] * jtang1 (~jtang@142.176.24.2) has joined #ceph
[16:47] * BillK (~BillK-OFT@124-148-212-240.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:50] <illya> fyi - my solution was:
[16:50] <illya> 1) add to /etc/fstab
[16:50] <illya> 2) call nohup mount -a
[16:50] <illya> ceph-fuse can now be started remotely and stays alive after logout
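A sketch of what that fstab entry for ceph-fuse can look like; the client id, mount point, and options below are illustrative and the exact syntax depends on the ceph-fuse version in use:
    # /etc/fstab -- illustrative ceph-fuse entry
    id=admin  /mnt/ceph  fuse.ceph  defaults,_netdev  0 0
    # then, from the remote session, so the mount survives logout:
    nohup mount -a &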
[16:54] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Operation timed out)
[16:54] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[16:55] <joelio> illya: cool, good to know
[16:58] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:59] <odyssey4me> joelio - testing rbd, mounting via kernel module
[17:00] <odyssey4me> so, I want to mount via fstab - is auth required when doing this or can it be done with auth disabled?
[17:00] <joelio> you can specify auth on the fstab entry iirc
[17:02] <joelio> oh actually, that might be for cephfs
[17:02] <odyssey4me> joelio - http://ceph.com/docs/next/cephfs/fstab/ shows how to do it, including the auth spec... I'm hoping to avoid using auth as ceph is in a private environment anyway
[17:03] <joelio> odyssey4me: that's cephfs, not rbd
[17:03] <odyssey4me> joelio - how do I mount rbd via fstab then?
[17:04] <joelio> do you need to mount rbd via fstab, and not just have them mapped?
[17:04] * jeff-YF (~jeffyf@67.23.117.122) Quit (Quit: jeff-YF)
[17:04] <joelio> mounting an rbd device as a filesystem is different to mapping the rbd device to be used as a block device
[17:05] <odyssey4me> joelio - ah, I see... so I'd like to map the rbd device as a block device on boot reliably... how is it best to do that?
[17:06] <odyssey4me> from what I can tell, this is what gives good performance - whereas fuse/cephfs, from what I've read, isn't as good or reliable...
[17:06] <jks> I just installed Ubuntu 13.04 from scratch and installed Ceph 0.61.4 and qemu 1.4.2 ... and I get the same segfault in librbd when I qemu-system-x86_64
[17:06] <jks> seems quite odd?
[17:06] <joelio> odyssey4me: well, yes, cephfs is a POSIX compliant filesystem, which is not a block device, which rbd is
[17:07] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) Quit (Quit: Leaving)
[17:08] <joelio> odyssey4me: https://github.com/ceph/ceph/commit/a4ddf704868832e119d7949e96fe35ab1920f06a
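The init script linked above reads a map file, conventionally /etc/ceph/rbdmap; a minimal sketch of an entry, with the pool, image, and keyring path as placeholders:
    # /etc/ceph/rbdmap -- one image per line: pool/image followed by map options
    rbd/myimage  id=admin,keyring=/etc/ceph/ceph.client.admin.keyring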
[17:09] <joelio> jks: does it start to boot at all? i.e. it's not an auth issue somewhere?
[17:10] <jks> joelio, as far as I can see, yes it does start to boot - but it segfaults in librbd::aio_flush()
[17:10] <odyssey4me> joelio - ah, good call... let me check that out. fyi I've been using this process so far for performance testing: http://ceph.com/docs/next/start/quick-rbd/
[17:10] * jeff-YF (~jeffyf@216.14.83.26) has joined #ceph
[17:11] <joelio> odyssey4me: yea, that's right. Check out rbd caching too for write caching (it doesn't wait for sync, so it's quicker)
[17:12] <joelio> jks: are you setting the cache in qemu. iirc you need to enable cache=writeback to make sure flushes are enabled (at least in libvirt)
[17:12] <joelio> may be way off there though
[17:12] <jks> joelio, I have tried both with and without cache=writeback
[17:12] <jks> but I'm pretty sure qemu shouldn't segfault in any case :-)
[17:13] <jks> this setup worked with bobtail - but now with cuttlefish I have this problem
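For context, one typical way to hand an rbd image to qemu with writeback caching enabled; the pool/image name, config path, and memory size are placeholders:
    qemu-system-x86_64 -m 1024 -enable-kvm \
      -drive file=rbd:rbd/vm-disk:conf=/etc/ceph/ceph.conf,format=raw,cache=writeback,if=virtio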
[17:13] <joelio> aye, sorry not too certain then mate, not been using aio (although I should)
[17:14] <joelio> and you've got it enabled in the cuttlefish config?
[17:14] <jks> got what enabled?
[17:14] <joelio> journal_aio?
[17:14] <jks> no?
[17:14] <jks> I have no idea what it is?
[17:15] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) has joined #ceph
[17:15] <joelio> jks: oh, I thought that's what you were using, but maybe this is only for storage side
[17:16] <joelio> http://ceph.com/docs/master/rados/configuration/journal-ref/
[17:16] <jks> It's a pretty "blank" config
[17:16] <jks> I just freshly installed ubuntu 13.04 and installed ceph from the ceph.com repository
[17:16] <joelio> ahh, apparmor?
[17:17] <jks> hmm, I could try disabling that ofcourse!
[17:17] <joelio> check in dmesg
[17:17] <joelio> if you see loads of errors.
[17:17] <jks> I don't
[17:17] <jks> the only thing is this: qemu-system-x86[2963]: segfault at 0 ip 00007f81a8b32fd3 sp 00007f818affcdb0 error 4 in librbd.so.1.0.0[7f81a8aec000+9e000]
[17:18] * AfC (~andrew@2001:44b8:31cb:d400:7c45:f9b8:8d74:6b09) Quit (Quit: Leaving.)
[17:18] <jks> I have tried now with apparmor disabled, and I get the same segfault
[17:20] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:20] * jeff-YF_ (~jeffyf@67.23.123.228) has joined #ceph
[17:21] <joelio> and you can successfully map that rbd as that user normally?
[17:21] <jks> with the kernel rbd mapper?
[17:22] <joelio> (just trying to remove ceph related things.. see if it's a qemu thing)
[17:22] <joelio> aye
[17:22] <joelio> just see if you can map them
[17:22] <jks> I could on other machines, I'll try on this machine also
[17:22] * jeff-YF (~jeffyf@216.14.83.26) Quit (Ping timeout: 480 seconds)
[17:22] * jeff-YF_ is now known as jeff-YF
[17:22] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[17:23] <jks> hmmm, that was odd! it's not working!
[17:24] <joelio> I don't know if that's good or bad, but it's something to go at
[17:24] <jks> rbd map just hangs and dmesg is full of messages like this: libceph: mon0 10.0.0.1:6789 feature set mismatch, my 40002 < server's 2040002, missing 2000000
[17:24] <joelio> eek
[17:24] <joelio> all upgraded, or fresh cuttlefish install?
[17:25] <jks> the ceph cluster itself started out as argonaut and has since been upgraded to bobtail and now cuttlefish
[17:25] <jks> the client is freshly installed with cuttlefish
[17:25] <joelio> not something I can advise on I'm afraid, maybe a dev has more insight
[17:25] <joelio> have you restarted all the cluster btw
[17:26] <odyssey4me> joelio - excellent, that init script and config is working perfectly now
[17:26] <jks> all mons and osds are running 0.61.4 and have been since the day it was released
[17:26] <joelio> jks: afaik the whole shebang needs to be restarted to make it current
[17:27] <jks> joelio, what do you mean by "restarted" exactly?
[17:27] <joelio> odyssey4me: cool, thank the dev who wrote it though, not me - I meerly pointed :D
[17:27] * stacker666 (~stacker66@90.163.235.0) Quit (Read error: Operation timed out)
[17:27] <odyssey4me> joelio - the point has massive value to me :)
[17:28] <joelio> :D
[17:29] <odyssey4me> now to test performance from inside a kvm domain - via a block device mapping, and via a block device image...
[17:29] <joelio> jks: ensure all the parts of the stack have been restarted, no errant processes kicking about still, using the older map
[17:29] <joelio> jks: apart from that, it's one for the list/devs
[17:29] <jks> joelio, everything has been restarted... everything has been running 0.61.4 for a very long time
[17:30] <odyssey4me> (ie using the path now mapped with a standard qcow2 image, or via putting the qcow image into rbd and configuring kvm to talk directly to rbd)
[17:30] <jks> as far as I understand the error, it says that the server has more features than the client - so I assume it is a problem with the client, not the server?
[17:30] <joelio> jks: no idea I'm afraid
[17:30] <joelio> odyssey4me: http://ceph.com/w/index.php?title=Benchmark&oldid=5733
[17:30] <jks> joelio: I have other servers not running ubuntu that are accessing the ceph-cluster perfectly fine using qemu-kvm client
[17:30] <jks> so it seems it is only a problem when I install the ubuntu packages hmm :-|
[17:31] * jtang1 (~jtang@142.176.24.2) Quit (Quit: Leaving.)
[17:31] <joelio> odyssey4me: is qcow2 supported? thought it was all raw - would be good if so
[17:32] <odyssey4me> joelio - I recall reading raw too... will test it all out
[17:33] <odyssey4me> joelio - thx for the benchmark... I've been referencing this too http://www.sebastien-han.fr/blog/2012/08/26/ceph-benchmarks/
[17:33] <jks> joelio, do you know if the feature set is printed in octal, decimal or?
[17:34] <mxmln> qcow2 doesn't work, you will face filesystem problems
[17:34] * mschiff (~mschiff@port-92723.pppoe.wtnet.de) Quit (Remote host closed the connection)
[17:34] <mxmln> inside rbd only raw
[17:34] * mschiff (~mschiff@port-92723.pppoe.wtnet.de) has joined #ceph
[17:35] <odyssey4me> mxmln I take it that's only when using libvirt with rbd?
[17:36] <nhm> benchmarks benchmark benchmarks. :)
[17:36] <mxmln> yes, I've tested it only using libvirt
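If an existing qcow2 image needs to end up in rbd, one approach is to convert it to raw on import (assuming qemu-img was built with rbd support); the image and pool names are placeholders:
    qemu-img convert -f qcow2 -O raw vm-disk.qcow2 rbd:rbd/vm-disk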
[17:38] * sagelap (~sage@76.89.177.113) has joined #ceph
[17:38] * vata (~vata@2607:fad8:4:6:70a6:cde4:f3b9:58c2) has joined #ceph
[17:39] <jks> joelio, I have checked the source... seems like aio_flush doesn't check the image context ictx for NULL before it dereferences the value :-|
[17:42] <joelio> jks: catch
[17:42] <joelio> now patch it, haha :)
[17:45] * AfC (~andrew@2001:44b8:31cb:d400:cca9:f8da:7025:edbc) has joined #ceph
[17:45] <jks> hehe, well... then I would need to wonder why it is called with a NULL parameter ;-)
[17:45] * AfC (~andrew@2001:44b8:31cb:d400:cca9:f8da:7025:edbc) Quit ()
[17:46] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[17:47] * AfC (~andrew@2001:44b8:31cb:d400:cca9:f8da:7025:edbc) has joined #ceph
[17:47] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:48] * DiabloD3 (~diablo@64.222.255.123) has joined #ceph
[17:48] <DiabloD3> is ceph available on osx?
[17:53] * hybrid512 (~walid@106-171-static.pacwan.net) Quit (Quit: Leaving.)
[17:56] * jeff-YF (~jeffyf@67.23.123.228) Quit (Quit: jeff-YF)
[17:56] <odyssey4me> joelio - I don't have SSD's... however, if I dedicate a disk per host for journals do you think it'd make a big difference?
[17:57] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[17:57] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[17:57] <nhm> odyssey4me: you want the throughput of the journal device to match the combined throughput of the underlying OSD disks it serves.
[17:58] <joelio> odyssey4me: I don't know your workload or setup. You need to measure at suitable points in the system, looks for bottleneck
[17:58] <joelio> yea what nhm said :)
[17:58] <nhm> odyssey4me: so e.g. a good match would be 3 spinning 150MB/s OSD disks and 1 450MB/s SSD holding the journals for those.
[17:59] * jeff-YF (~jeffyf@67.23.117.122) has joined #ceph
[18:04] <odyssey4me> joelio - regarding rbd caching... reference in the docs is to rbd.conf... but I don't find any other ref to it... do I just make one?
[18:04] * DiabloD3 (~diablo@64.222.255.123) has left #ceph
[18:06] <odyssey4me> nhm - ah, good call... so the number of journaling disks would depend on the number of disks being used as osd's... effectively the write speed of the osd's needs to be matched by the journal disks
[18:06] <odyssey4me> so if the osd & journal disks are the same speed, then best is a 1-1 ratio?
[18:07] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[18:08] * sagelap (~sage@76.89.177.113) Quit (Ping timeout: 480 seconds)
[18:09] <nhm> odyssey4me: yeah, though it may depend on your network too.
[18:09] <nhm> odyssey4me: ie if you have a single 10GbE link, it may not be worth using more than 2-3 SSDs.
[18:09] <joelio> odyssey4me: just add rbd_cache=true to config (along with any other tunables you want).
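A minimal sketch of the client-side ceph.conf section being described; the cache size is just an example value:
    [client]
        rbd cache = true
        rbd cache size = 33554432    # example: 32 MB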
[18:09] <nhm> (per node that is)
[18:10] <nhm> One combination might be something like 12-bay 2U boxes with 12 spinning disks, and 2 SSDs in 2x2.5" bays.
[18:11] <nhm> And a single 10GbE link.
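Working the earlier 3-disks-per-SSD example through: 3 x ~150MB/s = ~450MB/s, which one ~450MB/s SSD can absorb, so its partitions can hold those three journals. A hedged ceph.conf sketch, with device paths as placeholders:
    [osd.0]
        osd journal = /dev/sdg1    # partition on the shared journal SSD
    [osd.1]
        osd journal = /dev/sdg2
    [osd.2]
        osd journal = /dev/sdg3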
[18:16] * LeaChim (~LeaChim@2.216.167.255) Quit (Read error: Operation timed out)
[18:18] * AfC (~andrew@2001:44b8:31cb:d400:cca9:f8da:7025:edbc) Quit (Quit: Leaving.)
[18:20] <odyssey4me> nhm - using 2x10GbE bonded, with l2&l3 hash... so network linkage is pretty awesome :)
[18:24] <grepory> odyssey4me: we're doing the same thing, but we haven't started testing. throughput/latency are good?
[18:27] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[18:27] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:34] * julian (~julianwa@125.69.105.128) has joined #ceph
[18:42] <odyssey4me> grepory - throughput is comparable to RAID5 so far... still busy testing... latency I've not spent too much time on... gotta run, chat Monday if you're on
[18:43] <grepory> later, sounds good
[18:44] <nhm> odyssey4me: nice, I'm using RR bonded on my test setup.
[18:44] <nhm> odyssey4me: put out a bunch of RBD benchmarks this week.
[18:44] <grepory> nhm: any reason you went rr over 802.3ad?
[18:45] <grepory> switches?
[18:45] <nhm> grepory: the two nodes are directly connected via SFP+ cables. :)
[18:45] <nhm> grepory: RR is actually working really well though.
[18:45] <grepory> ohhhhhhh
[18:46] <grepory> we are pretty stoked… finally got 802.3ad working this week with pretty close to the full 20Gbps throughput
[18:46] <grepory> it was a good day.
[18:46] <grepory> whiskey-worthy, even.
[18:46] <nhm> grepory: nice!
[18:46] <grepory> it took more configuration on the linux side of things than i was originally expecting.
[18:50] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[18:55] * LeaChim (~LeaChim@2.216.167.255) has joined #ceph
[18:59] * jtang1 (~jtang@142.176.24.2) has joined #ceph
[19:00] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[19:03] * LeaChim (~LeaChim@2.216.167.255) Quit (Ping timeout: 480 seconds)
[19:04] * LeaChim (~LeaChim@2.216.167.255) has joined #ceph
[19:06] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) Quit (Remote host closed the connection)
[19:08] * Tamil (~tamil@38.122.20.226) has joined #ceph
[19:14] * sjust (~sam@38.122.20.226) has joined #ceph
[19:27] <sage> yehudasa: the wip-corpus rgw bits look ok?
[19:27] * nhm (~nhm@184-97-193-106.mpls.qwest.net) Quit (Read error: No route to host)
[19:28] * xmltok (~xmltok@pool101.bizrate.com) Quit (Remote host closed the connection)
[19:28] * xmltok (~xmltok@relay.els4.ticketmaster.com) has joined #ceph
[19:29] <jks> when upgrading from bobtail to cuttlefish, do I have to stop all clients at the same time in order for the mon to agree on a new feature set? - or is it enough that every client is eventually restarted one by one?
[19:29] * off_rhoden (~anonymous@pool-173-79-66-35.washdc.fios.verizon.net) Quit (Quit: off_rhoden)
[19:30] <gregaf> the clients shouldn't need to be restarted at all, I don't think
[19:31] <jks> I'm trying to investigate this error: "libceph: mon1 10.0.0.2:6789 feature set mismatch, my 40002 < server's 2040002, missing 2000000"
[19:31] * nhm (~nhm@184-97-193-106.mpls.qwest.net) has joined #ceph
[19:31] <jks> as everything is running 0.61.4, I cannot figure out why I get the error
[19:31] <gregaf> that's the kernel client, right?
[19:31] <jks> yes
[19:32] <jks> I tried the kernel client, since the user-space qemu librbd driver always segfaults
[19:32] <jks> I'm guessing that it perhaps segfaults due to the same cause
[19:32] <gregaf> you should talk to joshd about that when he gets in
[19:32] <gregaf> the segfault, I mean
[19:33] <jks> okay :-) I sent a post to the ceph-devel list with the line of code that causes the segfault
[19:33] <gregaf> yeah, i saw that
[19:33] <jks> okay :-)
[19:41] * julian (~julianwa@125.69.105.128) Quit (Quit: afk)
[19:44] * jtang1 (~jtang@142.176.24.2) Quit (Quit: Leaving.)
[19:48] * xmltok_ (~xmltok@pool101.bizrate.com) has joined #ceph
[19:53] * sjustlaptop (~sam@38.122.20.226) has joined #ceph
[19:56] * xmltok (~xmltok@relay.els4.ticketmaster.com) Quit (Ping timeout: 480 seconds)
[19:58] <dmick> hey nhm: along with the huge amount of really awesome blogging: are you aware of any short list of "this is why you can expect the performance you get out of your configuration" to address the inevitable n00b latency shock?
[20:00] * sjustlaptop (~sam@38.122.20.226) Quit (Quit: Leaving.)
[20:00] <dmick> I'm thinking like a one-pager, like "performance is much less than a physical disk for single-threaded I/O, here's why" and "if you test, test multiple threads; here are some easy ways" and "compare test X and test Y to see if your test is in the right range for your hardware/network"...stuff we can point people to when they say "crap, I'm only getting 200MB/s out of this, wth?"
[20:01] * sjustlaptop (~sam@2607:f298:a:607:5ac:ba1e:24f4:8e1) has joined #ceph
[20:01] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:02] <grepory> dmick: that sounds nice.
[20:03] <grepory> also whoah
[20:03] <grepory> i just realized who nhm is :D
[20:03] <janos> a hardware-hugging cave dweller?
[20:03] <grepory> hahahaha, yes!
[20:03] <dmick> *the* hardware-hugging cave dweller
[20:03] <janos> nodnod
[20:03] <grepory> and master of pretty graphs
[20:04] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:04] <nhm> grepory: hi. :)
[20:05] <grepory> nhm: hi! :D
[20:05] <jks> joelio, I am thinking that because I set the tunables to "optimal", I won't be able to use the kernel client - and that is the reason the kernel mapper fails?
[20:05] <jks> joelio, and then my qemu-kvm error is not related to this
[20:05] <jks> but perhaps I'm wrong :-|
[20:05] <nhm> dmick: yeah, I'd like to do something like that. So far I haven't because I wasn't sure how much that might overlap with what the PS guys are doing.
[20:05] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[20:05] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[20:06] <dmick> I think if we had a short page like that on the website we could short-circuit a lot of email and IRC :)
[20:06] <dmick> maybe the PS guys could come up with it, even
[20:09] <joshd> jks: that qemu error is a bug in the qemu rbd driver or qemu itself - it should never pass a NULL ictx
[20:09] <jks> joelio, I set my tunables back to "legacy" and now the kernel map succeeds :-) so the problem was unrelated to the qemu issue
[20:09] <joshd> jks: did you compile that qemu from stock 1.4.2?
[20:09] <jks> joshd, hmm :-| yes, stock 1.4.2 just downloaded from the qemu web site
[20:10] <joshd> jks: I don't have time to look into it right now unfortunately, but trying a different version of qemu might work
[20:11] <jks> joshd, hmm, yeah, but I wanted 1.4.2 to have the async flush patch :-)
[20:11] <joshd> jks: try 1.5 maybe
[20:11] <jks> joshd, but the odd thing here is that I had 1.4.2 running on my Ubuntu 12.10 server - and it was working just fine!
[20:11] <jks> joshd, then I upgraded to Cuttlefish, and I got this error
[20:11] <jks> (note: I upgraded my mons and osds to Cuttlefish... not the Ubuntu client)
[20:12] <jks> then I took a brand new server, installed Ubuntu 13.04 and the same qemu 1.4.2 - and got the same error
[20:12] <joshd> jks: you must have upgraded librbd on the client to be able to compile the async flush though
[20:12] <jks> joshd, not that I know of no? - it was working fine on bobtail?
[20:12] <joshd> jks: oh, or 0.56.6
[20:12] <jks> yep
[20:15] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[20:19] <joelio> jks: good stuff
[20:19] * jwilliams (~jwilliams@72.5.59.176) has joined #ceph
[20:20] <jwilliams> Hi, I'm having trouble mapping an rbd
[20:20] <jks> jwilliams, what is the problem?
[20:20] <jwilliams> [ 312.524836] libceph: mon1 10.11.2.102:6789 feature set mismatch, my 40002 < server's 42040002, missing 42000000
[20:20] <jwilliams> [ 312.525658] libceph: mon1 10.11.2.102:6789 socket error on read
[20:21] <jwilliams> everything should be identical, same kernel, same librbd, librados, libceph
[20:21] <jwilliams> same ceph.conf
[20:22] <jks> jwilliams, Just had a similar problem... what did you set your crush tunables to?
[20:22] <jks> jwilliams, I found that setting the tunables back (for example to legacy) lets the kernel client connect
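The tunables switch being referred to is a single command, assuming a release new enough to have the profile shortcuts (older releases need the crushtool route instead):
    ceph osd crush tunables legacy     # lets older kernel clients connect again
    ceph osd crush tunables optimal    # newest behaviour, requires recent clients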
[20:22] <jwilliams> I tried it with legacy and optimal and default
[20:23] <jwilliams> turning it to default or legacy it gives me 'missing 40000000'
[20:23] <guppy> is anyone running ceph on centos? I'm getting an error when trying to set up my second monitor
[20:23] <guppy> "ulimit -n 8192; /usr/bin/ceph-mon -i ceph2 --pid-file /var/run/ceph/mon.ceph2.pid -c /etc/ceph/ceph.conf " returned 1
[20:24] <jwilliams> I think that's the hashpspool flag, but i tried creating the pool with both false and true set
[20:25] <jks> looks like CEPH_FEATURE_OSDHASHPSPOOL yes
[20:26] <jks> I would have thought that tunables legacy would disable that, hmmm
[20:27] <jks> probably always forced on if it is actually used
[20:28] <jwilliams> why wouldn't the client use it?
[20:28] <jks> which kernel version do you have?
[20:28] <nhm> guppy: I've deployed on centos, but don't recall ever seeing that, sorry. :(
[20:29] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[20:29] <guppy> nhm: thanks
[20:29] <nhm> guppy: works with 1 mon?
[20:29] <jwilliams> Linux mcp01-atl 3.8.0-26-generic
[20:29] * TMM (~hp@535240C7.cm-6-3b.dynamic.ziggo.nl) has joined #ceph
[20:29] <guppy> nhm: it created one monitor yeah but then I'm trying to create 2 more and that's failing
[20:30] <guppy> seems to be stuck in ceph-create-keys waiting for a socket
[20:30] <guppy> do I not use "ceph-deploy mon create" to deploy additional monitors?
[20:30] <nhm> guppy: Not sure if dmick is at lunch yet, but he might have some ideas.
[20:31] <dmick> ceph-create-keys is probably logging; what specifically is it logging? If it can't connect to the first monitor, there will be trouble
[20:31] <dmick> with upstart it'd be in the upstart logs; not sure about init.d
[20:31] <dmick> syslog maybe
[20:31] <guppy> oh, maybe that's the issue.
[20:32] <guppy> connect to /var/run/ceph/ceph-mon.ceph2.asok failed with (2) No such file or directory
[20:32] <guppy> INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
[20:32] <guppy> is what ceph-create-keys is returning
[20:32] <jks> jwilliams, are you running cuttlefish or?
[20:32] <jwilliams> ceph version 0.66 (b6b48dbefadb39419f126d0e62c035e010906027)
[20:32] <dmick> guppy: so the first monitor probably isn't completely up; check for the proc and its logs
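A quick way to check that, assuming the default log and socket locations; the mon id "ceph2" is taken from the error above:
    ps aux | grep ceph-mon
    ls /var/run/ceph/                             # ceph-mon.ceph2.asok should appear here
    tail -n 50 /var/log/ceph/ceph-mon.ceph2.log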
[20:33] <jks> oh damn, I see that you shouldn't mess with tunables... my client's kernel just panicked :-|
[20:34] <jks> jwilliams, hmm, I don't know then... it works for me on cuttlefish, perhaps you need a newer kernel for 0.66
[20:35] <jwilliams> but, the cluster is using the same kernel?
[20:36] <jks> jwilliams, yes, but that doesn't matter
[20:36] <nhm> keruspe: ooh, how did you do that?
[20:36] <jks> jwilliams, the server side runs in user space and that is what the kernel client talks to... so if the server expects a newer client, it just won't work
[20:36] <nhm> jks: sorry, that was for you. :)
[20:36] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[20:36] <jwilliams> ah, I see
[20:37] <jks> nhm, I just set the tunables back to legacy, mapped and unmapped an image and then set the tunables to optimal again... and boom!
[20:38] <nhm> jks: hrm!
[20:38] <nhm> anything interesting in the panic?
[20:39] <jks> I'm sorry I have already rebooted as I needed to use the server - forgot to screenshot the message
[20:40] <nhm> jks: no worries
[20:41] * xmltok_ (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[20:42] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[20:44] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: sputnik13)
[20:47] * sjustlaptop (~sam@2607:f298:a:607:5ac:ba1e:24f4:8e1) Quit (Ping timeout: 480 seconds)
[20:47] <guppy> ah, http://tracker.ceph.com/issues/5195
[20:48] <guppy> should add that to the QSG also
[20:54] <illya> joelio: back to my issue
[20:54] <illya> if the cluster can't become healthy - it could be an issue with the OSD size
[20:54] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[20:54] <illya> if I had 2G - it was down
[20:55] <illya> if I had 2x8G it was 50% healthy
[20:55] <illya> if I had 2x15G it is Ok
[20:56] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[20:58] * Tamil (~tamil@38.122.20.226) Quit (Quit: Leaving.)
[20:59] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[21:01] <ofu> when an osd goes down (crash/reboot/whatever) and comes back up again, what will happen?
[21:01] <ofu> will all pgs be rebuilt?
[21:01] <ofu> are there timestamps so that only a small delta has to be applied?
[21:02] <saabylaptop> ofu: yes, only the delta is synced.
[21:02] <ofu> cool, that was the answer i hoped for
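The delta recovery can be watched as it happens with the standard status commands:
    ceph -w    # streams recovery/backfill progress as the OSD rejoins
    ceph -s    # one-shot summary of degraded/recovering PGs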
[21:05] * mikedawson (~chatzilla@23-25-19-14-static.hfc.comcastbusiness.net) has joined #ceph
[21:10] * stingray (~stingray@stingr.net) has joined #ceph
[21:11] <stingray> good morning
[21:12] <stingray> I am debugging this crash, if anybody wants to take a look: http://pastebin.com/NBs2rwGe
[21:12] <stingray> I'm not sure how I got there; presumably I stopped everything, then after a while updated the distro & stuff to 0.61.4
[21:13] <stingray> nevertheless it looks like the crash is happening inside leveldb, which suggests leveldb corruption; I'll try to find a way to repair it now.
[21:13] * jjgalvez (~jjgalvez@ip72-193-215-88.lv.lv.cox.net) Quit (Quit: Leaving.)
[21:15] * illya (~illya_hav@9-39-133-95.pool.ukrtel.net) has left #ceph
[21:20] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[21:22] <jwilliams> jks: for completeness, upgrading to 3.10.0-031000rc1-generic resolved the issue
[21:22] <jwilliams> thank you!
[21:23] <jks> jwilliams, super :-)
[21:27] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[21:28] <stingray> ok, this was a leveldb bug
[21:28] * john_barbee (~jbarbee@23-25-19-14-static.hfc.comcastbusiness.net) has joined #ceph
[21:28] * mikedawson_ (~chatzilla@23-25-19-9-static.hfc.comcastbusiness.net) has joined #ceph
[21:28] <stingray> upgraded it to 0.12 and it is now working fine
[21:33] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:34] * mikedawson (~chatzilla@23-25-19-14-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[21:34] * mikedawson_ is now known as mikedawson
[21:36] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[21:38] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:41] <guppy> http://ceph.com/resources/downloads/ -- it says "Important: Before upgrading to Bobtail, review these upgrade notes. - See more at", shouldn't that be to Cuttlefish not Bobtail?
[21:42] <jks> joshd, trying to trace it in the source code, but I lost the trail in coroutine_trampoline() - I don't know where the BlockDriverState structure was initialized, or why the "opaque" pointer was left as NULL
[21:47] <jks> joshd, I have now compiled 1.5.1 instead of 1.4.2 - and now it works!
[21:48] <jks> unfortunately, I cannot use 1.5.1 as it doesn't seem to be compatible with opennebula :-(
[21:51] * mikedawson (~chatzilla@23-25-19-9-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[22:01] * markbby (~Adium@168.94.245.4) Quit (Quit: Leaving.)
[22:02] * mikedawson (~chatzilla@23-25-19-14-static.hfc.comcastbusiness.net) has joined #ceph
[22:11] * john_barbee (~jbarbee@23-25-19-14-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[22:14] * Tamil (~tamil@38.122.20.226) has joined #ceph
[22:17] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) has joined #ceph
[22:17] * fridudad (~oftc-webi@p5B09C690.dip0.t-ipconnect.de) has joined #ceph
[22:19] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:19] * oddomatik (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[22:21] * mikedawson_ (~chatzilla@23-25-19-13-static.hfc.comcastbusiness.net) has joined #ceph
[22:28] * mikedawson (~chatzilla@23-25-19-14-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[22:36] * mikedawson_ (~chatzilla@23-25-19-13-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[22:38] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[22:39] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[22:41] * grepory1 (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[22:41] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Read error: Connection reset by peer)
[22:46] * jakes (~oftc-webi@128-107-239-234.cisco.com) has joined #ceph
[22:48] * jakes (~oftc-webi@128-107-239-234.cisco.com) Quit (Remote host closed the connection)
[22:49] * jakes (~oftc-webi@128-107-239-234.cisco.com) has joined #ceph
[22:50] <jakes> I have seen two ways to integrate ceph with hadoop. One is using the kernel client via mounting, and the other is using the user-level client via JNI. What are the performance differences? Is there any situation where one is preferred over the other?
[22:57] * nwat (~oftc-webi@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[22:58] <nwat> jakes: the hadoop ceph client only uses libcephfs
[22:58] * fireD (~fireD@93-142-232-63.adsl.net.t-com.hr) Quit (Read error: Operation timed out)
[22:59] <nwat> jakes: i think there is a JIRA for the kernel client version. it should be removed :)
[22:59] <jakes> yeah.. I was looking through that . https://issues.apache.org/jira/browse/HADOOP-6779
[23:00] * fridudad (~oftc-webi@p5B09C690.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[23:02] <nwat> jakes: the other JIRA for libcephfs is way out of date, mostly because it is unlikely that Hadoop will be merging new file systems. You can find updated information at http://ceph.com/docs/wip-hadoop-doc/cephfs/hadoop/
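A hedged sketch of the core-site.xml wiring that doc describes for the libcephfs-based Hadoop client; the monitor address is a placeholder and property names may differ between plugin versions:
    <property>
      <name>fs.default.name</name>
      <value>ceph://mon-host:6789/</value>
    </property>
    <property>
      <name>fs.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>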
[23:02] <jakes> nwat, I didn't get it. As in http://ceph.com/docs/master/man/8/mount.ceph/ , why can't it be mounted and used as a local filesystem?..
[23:05] * vata (~vata@2607:fad8:4:6:70a6:cde4:f3b9:58c2) Quit (Quit: Leaving.)
[23:05] <nwat> jakes: it could certainly be made to work
[23:09] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[23:09] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[23:10] * fireD (~fireD@93-142-210-212.adsl.net.t-com.hr) has joined #ceph
[23:10] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Quit: Ex-Chat)
[23:11] <jakes> nwat, what is the difference between the two?
[23:12] <nwat> jakes: just different client implementations; both should perform well. there won't be much benefit to mounting Ceph in the VFS on all of the OSDs, so it's extra configuration to deal with, too.
[23:17] <sjust> nwat, jakes: it's not generally safe to mount cephfs on an osd
[23:18] <nwat> sjust: good point. deadlock concerns?
[23:18] <sjust> correct
[23:18] <jakes> how?.. what is it used for then?
[23:19] <dmick> jakes: as we discussed at length
[23:19] <dmick> the cluster is a network service. clients can run anywhere. so, you mount a cephfs from a non-cluster machine
[23:22] <jakes> yup. I am trying out this setup for VMs in openstack. I thought of having kernel clients in each of the VMs, which connect to a common object store on the host.
[23:25] * saabylaptop (~saabylapt@1009ds5-oebr.1.fullrate.dk) Quit (Quit: Leaving.)
[23:27] * jeff-YF (~jeffyf@67.23.117.122) Quit (Read error: Operation timed out)
[23:34] <dmick> jakes: that's fine; you just don't want to run the kernel clients on the same machine as the OSDs.
[23:35] <jakes> yes.. just curious to know. what is the reason behind this?
[23:43] <sage> dmick: http://fpaste.org/25005/66538613/
[23:43] <sage> it was tehre for unlink but not rm or remove... the test script is catching it.
[23:43] <sage> not sure why thsi wasn't failing before tho
[23:44] <dmick> grr
[23:45] <dmick> are we calling it with ancestor? missing help is ugly tho
[23:45] <dmick> oh, no, it was, just reformatting, I see
[23:46] <dmick> wait, what is the test script catching?
[23:46] <dmick> oh crush_ops
[23:47] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:47] <dmick> but that doesn't use ancestor either. <confused>
[23:49] <sage> 2013-07-12T14:17:27.888 INFO:teuthology.task.workunit.client.0.err:+ ceph osd crush rm osd.6 host1
[23:49] <sage> 2013-07-12T14:17:27.989 INFO:teuthology.task.workunit.client.0.err:2013-07-12 14:17:50.804431 7f15a8270700 0 -- :/1011031 >> 10.214.131.30:6789/0 pipe(0x7f15a401ac60 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f15a4014010).fault
[23:49] <sage> 2013-07-12T14:17:31.055 INFO:teuthology.task.workunit.client.0.err:no valid command found; 10 closest matches:
[23:49] <sage> 2013-07-12T14:17:31.055 INFO:teuthology.task.workunit.client.0.err:osd crush rm <name>
[23:49] <sage> 2013-07-12T14:17:31.056 INFO:teuthology.task.workunit.client.0.err:Error EINVAL: invalid command
[23:50] <dmick> sigh. somehow I thought searching for "ancestor" would yield something. #deep-inside-rest
[23:51] * dmick (~dmick@2607:f298:a:607:c52d:3c98:eb9e:ff44) Quit (Quit: Leaving.)
[23:52] * dmick (~dmick@2607:f298:a:607:595c:2cc7:2718:adef) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.