#ceph IRC Log


IRC Log for 2013-12-02

Timestamps are in GMT/BST.

[0:02] * BillK (~BillK-OFT@124-148-75-108.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[0:04] * rendar (~s@host9-176-dynamic.22-79-r.retail.telecomitalia.it) Quit ()
[0:04] * BillK (~BillK-OFT@124-168-235-23.dyn.iinet.net.au) has joined #ceph
[0:06] <symmcom> while redeploying it says mon.mon-ceph-02 is not running and mon-ceph-02 does not exist in the monmap
[0:14] * BillK (~BillK-OFT@124-168-235-23.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[0:14] * BillK (~BillK-OFT@106-68-9-50.dyn.iinet.net.au) has joined #ceph
[0:16] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[0:17] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[0:18] <pmatulis> that's where you can do the restart thing
[0:19] * sarob (~sarob@2601:9:7080:13a:4c1a:73c1:dda0:5347) has joined #ceph
[0:20] <pmatulis> and if that doesn't work you might need to update ceph.conf on all nodes to declare that monitor
[0:21] <pmatulis> or declare a 'public network'
[0:21] <pmatulis> [global]
[0:21] <pmatulis> public_network = 192.168.122.0/24
[0:22] <symmcom> ok
[0:22] <pmatulis> OR
[0:22] <pmatulis> [global]
[0:22] <pmatulis> mon_host = 192.168.122.232
[0:22] <pmatulis> (example)
[0:22] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[0:23] <symmcom> is it auth cluster required = cephx or auth_cluster_required = cephx
[0:26] * BillK (~BillK-OFT@106-68-9-50.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[0:27] * sarob (~sarob@2601:9:7080:13a:4c1a:73c1:dda0:5347) Quit (Ping timeout: 480 seconds)
[0:28] * BillK (~BillK-OFT@106-69-27-252.dyn.iinet.net.au) has joined #ceph
[0:30] <pmatulis> these settings can be written with '_' and it seems ceph strips them (when viewed during a query)
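(A minimal illustration of the point above, assuming a standard /etc/ceph/ceph.conf: Ceph treats spaces and underscores in option names interchangeably, so both spellings below are the same setting, and a query through the admin socket reports the normalized name.)
    [global]
    auth_cluster_required = cephx
    ; equivalent spelling of the same option:
    auth cluster required = cephx
    # check the value a running daemon actually uses:
    ceph --admin-daemon /var/run/ceph/ceph-mon.mon-ceph-02.asok config get auth_cluster_required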
[0:31] <symmcom> i think i have found a possible flaw, the new node #2 MON just started and i finally have the file in /var/run/ceph/ceph-mon.mon-ceph-02.asok
[0:34] <pmatulis> it started by itself?
[0:34] <symmcom> i did ceph-deploy mon create
[0:36] <symmcom> but why did it create ceph-mon.mon-ceph-02.asok instead of ceph.mon-ceph-02.asok
[0:41] * DarkAce-Z (~BillyMays@50.107.53.200) has joined #ceph
[0:45] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[0:53] * BillK (~BillK-OFT@106-69-27-252.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[0:54] * BillK (~BillK-OFT@106-68-207-236.dyn.iinet.net.au) has joined #ceph
[0:55] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[0:56] <pmatulis> what was the full create command you used?
[1:03] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[1:06] * Cube (~Cube@66-87-65-52.pools.spcsdns.net) Quit (Quit: Leaving.)
[1:14] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[1:15] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[1:17] <symmcom> #ceph-deploy mon create mon-ceph-02
[1:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[1:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[1:20] <pmatulis> that's why
[1:20] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[1:20] <pmatulis> "ceph-mon" + "mon-ceph-02"
[1:24] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[1:25] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[1:26] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[1:29] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit ()
[1:35] <symmcom> looks like ceph-deploy made me lose sudo. /etc/ceph/ceph.conf now gives a permission error
[1:39] <alfredodeza> ceph-deploy would never make you lose sudo :)
[1:40] * mozg (~andrei@host81-151-251-29.range81-151.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:42] <symmcom> i did #ceph-deploy config push and right after that the permission issue started. it's not new
[1:42] <symmcom> cep
[1:48] <symmcom> looks like mon is running on 3 nodes, but all of them are still probing
[1:49] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[1:51] * Dark-Ace-Z (~BillyMays@50.107.53.200) has joined #ceph
[1:53] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[1:53] * Dark-Ace-Z (~BillyMays@50.107.53.200) Quit (Max SendQ exceeded)
[1:55] * dxd828 (~dxd828@host-92-24-127-29.ppp.as43234.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[1:56] * DarkAce-Z (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[2:01] * mwarwick (~mwarwick@2407:7800:400:107f:6e88:14ff:fe48:57e4) has joined #ceph
[2:08] <pmatulis> i've seen ceph.conf lose permissions before
[2:14] <symmcom> what would be my next step to bring these MONs out of probing
[2:15] * odyssey4me (~odyssey4m@41-132-44-27.dsl.mweb.co.za) has joined #ceph
[2:16] <pmatulis> symmcom: so does 'ceph status' work now?
[2:16] <symmcom> #ceph -s shows me nothing except the cursor just sitting, no error msg or anything
[2:16] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:17] <pmatulis> symmcom: so the new monitor didn't help. when you query it directly like we did before, it still says 'probing'?
[2:17] <symmcom> .asok mon_status shows all MONs are probing, and also one of the new nodes picked up an old node in the MON map somehow, so i think i need to delete entries from the map
[2:17] <symmcom> yes
[2:25] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:26] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:28] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:40] * torment3 (~torment@pool-71-180-185-11.tampfl.fios.verizon.net) Quit (Quit: WeeChat 0.3.7)
[2:47] <pmatulis> hmm, i never heard of editing a map other than the crush map
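(For completeness: the monitor map can in fact be edited offline with monmaptool; a hedged sketch of the documented sequence, run on the affected mon host with that monitor stopped -- the stale entry name is a placeholder:)
    ceph-mon -i mon-ceph-02 --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap
    monmaptool --rm <stale-mon-name> /tmp/monmap
    ceph-mon -i mon-ceph-02 --inject-monmap /tmp/monmap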
[2:52] <symmcom> i think i will call it quits for the day. it's been a very long day.. i think i am heading in the right direction, very slowly but surely. got to learn a lot, thanks to pmatulis and whoever helped
[2:52] <symmcom> i will restart the entire cluster tomorrow to see if that fixes something
[2:52] <pmatulis> symmcom: sure, n/p
[3:01] * stacker100 (~stacker66@113.pool85-58-86.dynamic.orange.es) has joined #ceph
[3:06] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[3:07] * stacker666 (~stacker66@241.pool85-58-88.dynamic.orange.es) Quit (Ping timeout: 480 seconds)
[3:11] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Remote host closed the connection)
[3:11] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[3:19] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[3:22] * unis (~unis@58.213.102.114) has joined #ceph
[3:24] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[3:31] * Shmouel1 (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) has joined #ceph
[3:36] * Shmouel (~Sam@ns1.anotherservice.com) Quit (Ping timeout: 480 seconds)
[3:52] * shang (~ShangWu@175.41.48.77) has joined #ceph
[3:52] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[3:55] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[3:58] * mwarwick (~mwarwick@2407:7800:400:107f:6e88:14ff:fe48:57e4) Quit (Ping timeout: 480 seconds)
[4:01] * DarkAce-Z (~BillyMays@50.107.53.200) has joined #ceph
[4:01] * sarob (~sarob@2601:9:7080:13a:b082:fc16:6746:cd63) has joined #ceph
[4:02] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[4:03] * odyssey4me (~odyssey4m@41-132-44-27.dsl.mweb.co.za) Quit (Quit: odyssey4me)
[4:05] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Read error: Operation timed out)
[4:08] * mwarwick (~mwarwick@2407:7800:400:1011:3e97:eff:fe91:d9bf) has joined #ceph
[4:15] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[4:47] * unis (~unis@58.213.102.114) Quit (Ping timeout: 480 seconds)
[4:55] * Cube (~Cube@66-87-65-52.pools.spcsdns.net) has joined #ceph
[4:56] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[4:59] * AfC (~andrew@2407:7800:400:107f:6e88:14ff:fe33:2a9c) has joined #ceph
[5:05] * shang_ (~ShangWu@175.41.48.77) has joined #ceph
[5:05] * shang (~ShangWu@175.41.48.77) Quit (Read error: Connection reset by peer)
[5:10] * Hakisho_ (~Hakisho@p4FC266B4.dip0.t-ipconnect.de) has joined #ceph
[5:13] * Hakisho (~Hakisho@0001be3c.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:13] * Hakisho_ is now known as Hakisho
[5:16] * sarob (~sarob@2601:9:7080:13a:b082:fc16:6746:cd63) Quit (Remote host closed the connection)
[5:16] * sarob (~sarob@2601:9:7080:13a:b082:fc16:6746:cd63) has joined #ceph
[5:19] * shang_ (~ShangWu@175.41.48.77) Quit (Remote host closed the connection)
[5:21] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[5:24] * sarob (~sarob@2601:9:7080:13a:b082:fc16:6746:cd63) Quit (Ping timeout: 480 seconds)
[5:27] * mmmucky_ (~mucky@mucky.socket7.org) Quit (Remote host closed the connection)
[5:27] * xmltok (~xmltok@cpe-23-240-222-226.socal.res.rr.com) Quit (Quit: Bye!)
[5:32] * KindTwo (KindOne@h210.45.28.71.dynamic.ip.windstream.net) has joined #ceph
[5:33] * mmmucky (~mucky@mucky.socket7.org) has joined #ceph
[5:35] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:35] * KindTwo is now known as KindOne
[5:39] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[5:44] * AfC (~andrew@2407:7800:400:107f:6e88:14ff:fe33:2a9c) Quit (Quit: Leaving.)
[5:44] * shang (~ShangWu@175.41.48.77) has joined #ceph
[5:45] * ivotron_ (~ivotron@c-98-196-87-151.hsd1.tx.comcast.net) has joined #ceph
[5:46] * mmmucky (~mucky@mucky.socket7.org) Quit (Remote host closed the connection)
[5:47] * sarob (~sarob@2601:9:7080:13a:d949:b07b:7d2:8eff) has joined #ceph
[5:52] * mmmucky (~mucky@mucky.socket7.org) has joined #ceph
[5:52] * Pedras1 (~Adium@c-67-188-26-20.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[5:53] <shang> I was not able to download the whitepaper from the inktank website: http://www.inktank.com/resource/calxeda-reference-architecture/
[5:53] <shang> can anyone else help to try it out?
[5:56] * cmdrk (~lincoln@c-24-12-206-91.hsd1.il.comcast.net) has joined #ceph
[5:58] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) has joined #ceph
[5:59] * mmmucky (~mucky@mucky.socket7.org) Quit (Remote host closed the connection)
[6:03] <pmatulis> shang: form is broken for me too
[6:05] <shang> pmatulis: how come you are still awake?
[6:12] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit (Quit: Leaving.)
[6:12] <pmatulis> shang: i haven't gone to bed yet
[6:13] * sarob (~sarob@2601:9:7080:13a:d949:b07b:7d2:8eff) Quit (Remote host closed the connection)
[6:13] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[6:13] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[6:13] * ChanServ sets mode +o elder
[6:16] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:16] * mmmucky (~mucky@mucky.socket7.org) has joined #ceph
[6:30] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[6:44] * unis (unis@58.213.102.115) has joined #ceph
[6:47] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[6:57] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[7:04] * unis_ (~unis@58.213.102.114) has joined #ceph
[7:07] * unis__ (~unis@58.213.102.114) has joined #ceph
[7:10] * unis (unis@58.213.102.115) Quit (Ping timeout: 480 seconds)
[7:11] * unis (unis@58.213.102.115) has joined #ceph
[7:12] * unis_ (~unis@58.213.102.114) Quit (Ping timeout: 480 seconds)
[7:12] * unis_ (~unis@58.213.102.114) has joined #ceph
[7:15] * unis__ (~unis@58.213.102.114) Quit (Ping timeout: 480 seconds)
[7:17] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[7:17] * unis__ (~unis@58.213.102.114) has joined #ceph
[7:18] * sarob (~sarob@2601:9:7080:13a:d949:b07b:7d2:8eff) has joined #ceph
[7:19] * unis (unis@58.213.102.115) Quit (Ping timeout: 480 seconds)
[7:20] * unis_ (~unis@58.213.102.114) Quit (Ping timeout: 480 seconds)
[7:24] * unis (unis@2002:3ad5:6673::3ad5:6673) has joined #ceph
[7:26] * sarob (~sarob@2601:9:7080:13a:d949:b07b:7d2:8eff) Quit (Ping timeout: 480 seconds)
[7:27] * unis__ (~unis@58.213.102.114) Quit (Ping timeout: 480 seconds)
[7:31] * unis_ (unis@2002:3ad5:6673::3ad5:6673) has joined #ceph
[7:34] * unis (unis@2002:3ad5:6673::3ad5:6673) Quit (Ping timeout: 481 seconds)
[7:39] * unis_ (unis@2002:3ad5:6673::3ad5:6673) Quit (Ping timeout: 480 seconds)
[7:39] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) has joined #ceph
[7:40] * danieagle_ (~Daniel@186.214.63.175) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[7:40] * unis (unis@58.213.102.115) has joined #ceph
[7:40] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) Quit ()
[7:40] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[7:48] * sarob (~sarob@2601:9:7080:13a:a995:f532:b11e:b5c1) has joined #ceph
[7:51] * codice (~toodles@71-80-186-21.dhcp.lnbh.ca.charter.com) Quit (Ping timeout: 480 seconds)
[7:56] * sarob (~sarob@2601:9:7080:13a:a995:f532:b11e:b5c1) Quit (Ping timeout: 480 seconds)
[8:12] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:13] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[8:13] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) Quit (Ping timeout: 480 seconds)
[8:13] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Quit: shimo)
[8:15] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[8:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[8:25] * Sodo (~Sodo@a88-113-108-239.elisa-laajakaista.fi) Quit (Ping timeout: 480 seconds)
[8:34] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) has joined #ceph
[8:45] * dvanders (~dvanders@dvanders-air.cern.ch) has joined #ceph
[8:46] * unis_ (~unis@58.213.102.114) has joined #ceph
[8:51] * Sysadmin88 (~IceChat77@90.221.57.103) Quit (Quit: Oops. My brain just hit a bad sector)
[8:52] * unis (unis@58.213.102.115) Quit (Ping timeout: 480 seconds)
[8:54] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:56] * fouxm (~fouxm@185.23.92.11) has joined #ceph
[8:58] * DarkAce-Z is now known as DarkAceZ
[8:59] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[9:01] * rendar (~s@host105-177-dynamic.20-87-r.retail.telecomitalia.it) has joined #ceph
[9:01] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[9:04] * fred__ (~fred@c83-248-221-150.bredband.comhem.se) has joined #ceph
[9:04] * fred_ (~fred@c83-248-221-150.bredband.comhem.se) Quit (Read error: Connection reset by peer)
[9:06] * mattt_ (~textual@94.236.7.190) has joined #ceph
[9:08] * Pauline (~middelink@2001:838:3c1:1:be5f:f4ff:fe58:e04) Quit (Quit: Leaving)
[9:09] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:10] * Pauline (~middelink@2001:838:3c1:1:be5f:f4ff:fe58:e04) has joined #ceph
[9:12] * mattt__ (~textual@92.52.76.140) has joined #ceph
[9:13] * mattt_ (~textual@94.236.7.190) Quit (Read error: Connection reset by peer)
[9:16] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) has joined #ceph
[9:17] * mwarwick (~mwarwick@2407:7800:400:1011:3e97:eff:fe91:d9bf) Quit (Ping timeout: 480 seconds)
[9:18] * sarob (~sarob@2601:9:7080:13a:c921:4228:4ca8:1550) has joined #ceph
[9:19] * jcfischer (~fischer@macjcf.switch.ch) has joined #ceph
[9:22] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[9:25] * mozg (~andrei@host81-151-251-29.range81-151.btcentralplus.com) has joined #ceph
[9:26] * sarob (~sarob@2601:9:7080:13a:c921:4228:4ca8:1550) Quit (Ping timeout: 480 seconds)
[9:28] * mwarwick (~mwarwick@2407:7800:400:107f:6e88:14ff:fe48:57e4) has joined #ceph
[9:29] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) has joined #ceph
[9:33] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[9:38] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[9:46] * xdeller (~xdeller@91.218.144.129) has joined #ceph
[9:51] * syed_ (~chatzilla@123.63.144.240) has joined #ceph
[9:57] * zoltan (~zoltan@2001:620:20:222:c932:42d5:cb07:a2ef) has joined #ceph
[9:57] <zoltan> hi
[9:58] <zoltan> I shut down one of the OSD nodes without setting noout and it started replicating; currently the node has trouble booting, but since I still have all my data it's fine for me - can I stop the replication somehow?
[9:58] <zoltan> ceph osd set noout didn't do anything at this point
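(A sketch of the flags that pause data movement in this situation, assuming the down node will come back: noout only prevents the OSDs from being marked out, so backfill/recovery already in flight needs its own flags.)
    ceph osd set noout        # stop the down OSDs from being marked out
    ceph osd set nobackfill   # pause backfill
    ceph osd set norecover    # pause recovery
    # once the node is back and its OSDs have rejoined:
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset noout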
[10:02] * philipgian (~philipgia@195.251.28.222) has joined #ceph
[10:03] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) Quit (Quit: Ex-Chat)
[10:03] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) has joined #ceph
[10:10] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) Quit (Quit: Ex-Chat)
[10:10] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[10:18] * sarob (~sarob@2601:9:7080:13a:a5e7:da04:e1ba:5ffe) has joined #ceph
[10:26] * sarob (~sarob@2601:9:7080:13a:a5e7:da04:e1ba:5ffe) Quit (Ping timeout: 480 seconds)
[10:28] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[10:29] * mmmucky (~mucky@mucky.socket7.org) Quit (Ping timeout: 480 seconds)
[10:29] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[10:32] * mmmucky (~mucky@mucky.socket7.org) has joined #ceph
[10:34] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[10:37] * unis_ (~unis@58.213.102.114) Quit (Ping timeout: 480 seconds)
[10:39] * L2SHO_ (~L2SHO@office-nat.choopa.net) has joined #ceph
[10:43] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) has joined #ceph
[10:44] * mwarwick (~mwarwick@2407:7800:400:107f:6e88:14ff:fe48:57e4) Quit (Quit: Leaving.)
[10:44] * fireD (~fireD@93-139-132-105.adsl.net.t-com.hr) has joined #ceph
[10:47] * L2SHO (~L2SHO@office-nat.choopa.net) Quit (Ping timeout: 480 seconds)
[10:50] * Cube1 (~Cube@66-87-66-198.pools.spcsdns.net) has joined #ceph
[10:50] * Cube (~Cube@66-87-65-52.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[10:51] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) has joined #ceph
[10:51] * Cube1 (~Cube@66-87-66-198.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[10:51] * Cube (~Cube@66-87-65-227.pools.spcsdns.net) has joined #ceph
[10:51] * Cube (~Cube@66-87-65-227.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[10:53] * mancdaz (~mancdaz@46.38.187.105) Quit (Quit: ZNC - http://znc.sourceforge.net)
[10:54] * mancdaz (~mancdaz@46.38.187.105) has joined #ceph
[10:56] * ivotron_ (~ivotron@c-98-196-87-151.hsd1.tx.comcast.net) Quit (Remote host closed the connection)
[10:57] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[10:59] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Remote host closed the connection)
[11:15] * syed_ (~chatzilla@123.63.144.240) Quit (Ping timeout: 480 seconds)
[11:15] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[11:18] * sarob (~sarob@2601:9:7080:13a:d8a3:cfbd:c3eb:e39) has joined #ceph
[11:25] * shang (~ShangWu@175.41.48.77) Quit (Remote host closed the connection)
[11:35] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[11:35] * syed_ (~chatzilla@180.151.28.189) has joined #ceph
[11:38] * ScOut3R (~ScOut3R@212.96.46.212) has joined #ceph
[11:46] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[11:55] * sarob (~sarob@2601:9:7080:13a:d8a3:cfbd:c3eb:e39) Quit (Ping timeout: 480 seconds)
[11:55] * yanzheng (~zhyan@134.134.137.75) has joined #ceph
[11:58] * DarkAce-Z (~BillyMays@50.107.53.200) has joined #ceph
[11:59] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[12:05] * mozg (~andrei@host81-151-251-29.range81-151.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[12:07] * rendar (~s@host105-177-dynamic.20-87-r.retail.telecomitalia.it) Quit ()
[12:12] * DarkAce-Z (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[12:13] <mattch> Odd problem this morning... running 'service ceph start' isn't mounting/starting osds - only mons. I have to explicitly mount the osds before it will see them. Has the init.d script behaviour changed recently? (rhel6, using epel 0.67 packages)
[12:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[12:18] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[12:30] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[12:48] * fireD (~fireD@93-139-132-105.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[12:50] <mattch> It appears that the epel package doesn't have some of the udev rules scripts that the ceph-provided package does - specifically 50-rbd.rules, 60-ceph-partuuid-workaround.rules, 95-ceph-osd.rules
[12:50] * fireD (~fireD@93-136-2-65.adsl.net.t-com.hr) has joined #ceph
[12:51] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) Quit (Quit: Ex-Chat)
[12:52] * diegows (~diegows@190.190.11.42) has joined #ceph
[12:55] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[12:56] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:08] * yanzheng (~zhyan@134.134.137.75) Quit (Remote host closed the connection)
[13:10] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[13:17] * jcfischer (~fischer@macjcf.switch.ch) Quit (Read error: Connection reset by peer)
[13:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[13:23] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) has joined #ceph
[13:26] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[13:33] * LnxBil (~LnxBil@p5099afb6.dip0.t-ipconnect.de) has joined #ceph
[13:33] <LnxBil> Hi everybody
[13:33] <LnxBil> I'm having trouble with the quick install guide on Wheezy. Maybe some steps are missing?
[13:34] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[13:34] <LnxBil> The osd prepare step is failing because of missing directories (after mkdir'ing them, this step works as expected)
[13:34] * syed_ (~chatzilla@180.151.28.189) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 23.0/20130803193131])
[13:35] <LnxBil> Full output of failed admin-step is at http://paste.debian.net/68853/
[13:40] <mattch> Can't seem to create the /dev/disk/by-partuuid entries with udev on rhel6.4, since blkid doesn't return the ID_PART_ENTRY_SCHEME and ...UUID values the udev rules are looking for...
[13:46] * mrjack_ (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[13:48] <ofu> i use /dev/disk/by-path for disks on an lsi sas controller attached to a sas expander. This way missing disks won't renumber all the disks after a reboot
[13:54] * agh (~oftc-webi@gw-to-666.outscale.net) has joined #ceph
[13:54] <agh> Hello to all,
[13:54] <mattch> ok - turns out i was missing the sgdisk binary to set up the symlinks
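(Two ways to check whether a partition actually carries the GPT type/partition UUIDs those udev rules match on -- a sketch, device names are placeholders:)
    blkid -p -o udev /dev/sdb1     # should print ID_PART_ENTRY_SCHEME / ID_PART_ENTRY_UUID on a recent util-linux
    sgdisk -i 1 /dev/sdb           # prints the partition GUID and type GUID straight from the GPT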
[13:54] <agh> Is it possible to decrease the PG number ?
[13:55] <agh> if I do this :
[13:55] <agh> ceph osd pool set data pg_num 100
[13:55] <agh> I have this return :
[13:55] <agh> specified pg_num 100 <= current 10050
[13:56] * ScOut3R_ (~ScOut3R@212.96.46.212) has joined #ceph
[13:58] * Siva (~sivat@vpnnat.eglbp.corp.yahoo.com) Quit (Quit: Siva)
[14:01] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) Quit (Quit: Ex-Chat)
[14:02] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 25.0.1/20131112160018])
[14:02] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:03] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) has joined #ceph
[14:03] * ScOut3R (~ScOut3R@212.96.46.212) Quit (Ping timeout: 480 seconds)
[14:03] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:04] * allsystemsarego (~allsystem@5-12-240-115.residential.rdsnet.ro) has joined #ceph
[14:06] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:08] <agh> Is it possible to decrease the PG number ?
[14:08] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) Quit (Quit: Ex-Chat)
[14:10] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[14:11] <ccourtaut> agh: http://ceph.com/docs/master/rados/operations/placement-groups/ only speaks about increasing, not decreasing it. Might not be possible.
[14:12] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has left #ceph
[14:14] <agh> ccourtaut: mmm... well. So, I've made a big mistake... I have more than 200 OSDs. So, I followed the doc and increased pg_num and pgp_num for all my pools.
[14:15] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[14:15] <agh> since I've done it, my cluster is very slow... So I want to decrease pg_num for the "little" pools and leave pg_num high for the bigger pools... Is the only way to create new pools, copy the data onto them, then remove the old ones?
[14:16] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[14:17] <ccourtaut> agh: i've not dealt with this, so i'd prefer to let someone else answer
[14:18] <agh> ccourtaut: ok. Thanks
[14:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[14:29] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[14:30] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:31] * Cube (~Cube@12.248.40.138) has joined #ceph
[14:33] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) has joined #ceph
[14:33] <pmatulis> agh: replication factor, how many pools, original # of PGs, current # of PGs, # of OSDs?
[14:34] * mancdaz (~mancdaz@46.38.187.105) Quit (Quit: ZNC - http://znc.sourceforge.net)
[14:34] * mancdaz (~mancdaz@46.38.187.105) has joined #ceph
[14:35] <agh> pmatulis: replication of 2, 15 pools, initial pg_num=600, now, pg_num=10000
[14:43] <pmatulis> agh: how are you using these 15 pools?
[14:44] <agh> pmatulis: not all of them. I've removed data, metadata and rbd
[14:44] <LnxBil> After the quick tutorial, I got 'HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean' after finishing the 'ceph-deploy admin' command
[14:44] <agh> pmatulis: but RadosGW uses all the others (.rgw.*)
[14:45] <pmatulis> agh: PGs are created on a per pool basis. have you created 10k PGs for each?
[14:45] <agh> pmatulis: yes
[14:46] <pmatulis> agh: you now have 750 PGs per OSD. you are overloading your OSDs now i'm quite sure
[14:47] <agh> pmatulis: yes sure. So, there is no way to reduce pg_num ?
[14:47] * madkiss1 (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[14:47] * japuzzo (~japuzzo@pok2.bluebird.ibm.com) has joined #ceph
[14:47] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[14:48] <pmatulis> agh: there is no 'PG merge' feature to my knowledge. only 'PG split'
[14:48] <agh> pmatulis: ... ok. So I will have to create new pools with fewer PGs, then copy data from older pools, then remove them..
[14:49] <pmatulis> agh: how much data are we talking about?
[14:50] <agh> pmatulis: not so much (except for the biggest pool, .rgw.buckets) 8413 GB used, 719 TB / 727 TB avail
[14:53] * fouxm_ (~fouxm@185.23.92.11) has joined #ceph
[14:53] * fouxm (~fouxm@185.23.92.11) Quit (Read error: Connection reset by peer)
[14:53] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[14:54] <alphe> hello
[14:55] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[14:56] <alphe> whoa, my entire ceph cluster won't initialize, I have a thousand errors ...
[14:58] <alphe> ok so first I had one of the nodes go down
[14:58] <alphe> I tried to reinit it and it wasn't working
[14:58] <alphe> so I reinitialized the whole ceph cluster with 'stop ceph-all' then reboot, and poof, the ceph cluster was gone ... none of the daemons were loading
[14:59] * gmeno (~gmeno@c-50-160-216-89.hsd1.ga.comcast.net) has joined #ceph
[14:59] <pmatulis> agh: that sounds plausible but i'm not sure. research that. i'm very interested in the outcome of this
[15:00] <agh> pmatulis: ok. I'm going to check that
[15:00] <agh> pmatulis: thanks for you help
[15:00] * nregola_comcast (~nregola_c@fbr.reston.va.neto-iss.comcast.net) has joined #ceph
[15:00] <pmatulis> agh: if removing a pool in itself is ok you will still tax the OSDs tremendously by both ① adding yet more PGs and ② creating a lot of data movement
[15:01] <pmatulis> agh: any way to get that data off the cluster and put it back after the change?
[15:01] <agh> pmatulis: no. :(
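(A sketch of the copy-to-a-smaller-pool route agh is considering, with placeholder pool and PG numbers; radosgw would need to be stopped while the copy runs, and rados cppool does not carry over pool snapshots:)
    ceph osd pool create .rgw.foo.new 128 128
    rados cppool .rgw.foo .rgw.foo.new
    ceph osd pool delete .rgw.foo .rgw.foo --yes-i-really-really-mean-it
    ceph osd pool rename .rgw.foo.new .rgw.foo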
[15:06] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[15:08] * rmorrison (~rmorrison@162-204-234-147.lightspeed.mssnks.sbcglobal.net) has joined #ceph
[15:12] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) Quit (Quit: Leaving.)
[15:12] * markbby (~Adium@168.94.245.1) has joined #ceph
[15:13] * linuxkidd (~linuxkidd@cpe-066-057-061-231.nc.res.rr.com) has joined #ceph
[15:14] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[15:14] <alphe> could there be some good reason for a ceph cluster installed with ceph-deploy to have its components not loading automatically ?
[15:14] <alphe> or crashing at boot ?
[15:15] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[15:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[15:19] <linuxkidd> alphe: A couple of questions...
[15:19] <linuxkidd> alphe: a) What distro
[15:19] <linuxkidd> alphe: b) What was required to bring it back up
[15:19] <alphe> ubuntu 13|0
[15:19] <alphe> ubuntu 13. 10
[15:19] <alphe> kernel 3.11.something
[15:20] <alphe> kernel 3.11.0-13
[15:21] <linuxkidd> ok, and what was required to bring ceph back up?
[15:21] * zoltan (~zoltan@2001:620:20:222:c932:42d5:cb07:a2ef) Quit (Ping timeout: 480 seconds)
[15:22] <alphe> ?
[15:22] <alphe> normally the ceph cluster always comes back on its own, it auto starts ...
[15:23] <alphe> but now it crashes
[15:23] <linuxkidd> It either doesn't load automatically, or crashes on boot... but, what did you have to do after this to restore Ceph to normal functionality
[15:23] <linuxkidd> did you simply need to start it again, or was there more involved?
[15:23] <alphe> I need to start it
[15:23] <linuxkidd> ok...
[15:23] <alphe> and make it work
[15:24] * sleinen (~Adium@2001:620:0:25:31c0:635a:1f0c:dfa9) has joined #ceph
[15:24] <alphe> before I was doing a stop ceph-all
[15:24] <alphe> in dsh -aM -F 10 "stop ceph-all" then
[15:24] <alphe> dsh -aM -F 10 "reboot"
[15:25] <alphe> and everyone was coming back into the ceph cluster without problems
[15:25] <LnxBil> Where are the osd options/configuration stored if nothing is shown in /etc/ceph/ceph.conf?
[15:25] * fouxm_ (~fouxm@185.23.92.11) Quit (Read error: Connection reset by peer)
[15:26] * fouxm (~fouxm@185.23.92.11) has joined #ceph
[15:26] <alphe> yes, but it is a minimalistic config file generated by ceph-deploy and it always was like that
[15:26] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[15:26] <pmatulis> LnxBil: in the cluster map
[15:26] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:26] <alphe> and worked fine until now
[15:26] <linuxkidd> alphe: It sounds like the service is not set to start on boot... I'm not intimately familiar with Upstart (I'm an RPM distro kinda guy)
[15:27] <linuxkidd> alphe: You may find the answer here: https://help.ubuntu.com/community/UbuntuBootupHowto
[15:27] <linuxkidd> on how to get it to start at boot
[15:27] <alphe> linuxkidd when I start the monitor on node 3 manually it crashes
[15:27] <linuxkidd> ah, ok, that's a different thing altogether...
[15:27] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[15:27] <linuxkidd> have you reviewed the /var/log/ceph/ceph-mon.<node name>.log file to see if there's any indication why?
[15:28] <linuxkidd> LnxBil: OSD specifics are stored in the filesystem of the OSD itself..
[15:28] <linuxkidd> LnxBil: typically mounted at /var/lib/ceph/osd/ceph-##
[15:29] <linuxkidd> LnxBil: So, when the ceph service starts, it looks in this location and loads OSDs based on what it finds there
[15:29] <linuxkidd> LnxBil: Regarding OSD configuration... unless there's something in /etc/ceph/ceph.conf, all defaults are being used.
[15:29] <LnxBil> linuxkidd: Ah, nice. Thanks
[15:30] <LnxBil> linuxkidd: Yeah, I got confused by the quick tutorial because it does not fill ceph.conf
[15:30] <linuxkidd> LnxBil: You can also inject config arguments during normal operation, but those changes will be lost on next restart unless you also put them into the /etc/ceph/ceph.conf
[15:30] <linuxkidd> LnxBil: Ya, there's not a lot of detail in the quick-start...
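(A minimal sketch of the runtime-vs-persistent split linuxkidd describes; the option used here is only an illustration.)
    # change a value on every OSD at runtime (lost on the next daemon restart):
    ceph tell osd.* injectargs '--osd_recovery_max_active 1'
    # to make it survive restarts, also add it to /etc/ceph/ceph.conf on the relevant nodes:
    #   [osd]
    #   osd recovery max active = 1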
[15:30] <alfredodeza> alphe: to respond to your initial question: no, there is no way ceph-deploy is doing something to the cluster at install time to prevent it from loading automatically
[15:30] * zoltan (~zoltan@2001:620:20:16:8c78:d928:eed5:ada7) has joined #ceph
[15:31] <linuxkidd> LnxBil: and honestly, I only know from playing around with it and figuring things out myself... (On Fedora 19, it didn't set the OSDs to mount via the fstab.. so my OSDs weren't coming online after reboot.. took a while to figure that out)
[15:31] <alphe> http://pastebin.com/7qkkRrmq
[15:32] <alphe> alfredodeza well my cluster was working fine ... then node2 went missing, then I restarted the whole cluster and all the services were crashing ...
[15:33] <alphe> linuxkidd see the pastebin http://pastebin.com/7qkkRrmq
[15:33] <linuxkidd> alphe: Sorry, I'm not seeing anything in the pastebin that leads me to any conclusion... I'm afraid someone else will need to help with this...
[15:33] <alfredodeza> sure, I am saying that it doesn't sound like ceph-deploy is causing it from your description of the problem
[15:34] * BillK (~BillK-OFT@106-68-207-236.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:34] <LnxBil> linuxkidd: Yes, I'm just getting started with Ceph, but if the tutorial doesn't yield a working example, there's something wrong.
[15:34] <alphe> alfredodeza yeah, it would be weird for the cluster to suddenly stop working ...
[15:35] * linuxkidd (~linuxkidd@cpe-066-057-061-231.nc.res.rr.com) Quit (Quit: Konversation terminated!)
[15:36] * markl (~mark@tpsit.com) has joined #ceph
[15:37] * linuxkidd (~linuxkidd@cpe-066-057-061-231.nc.res.rr.com) has joined #ceph
[15:37] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[15:39] * Siva (~sivat@117.192.40.145) has joined #ceph
[15:39] * markbby1 (~Adium@168.94.245.2) has joined #ceph
[15:41] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[15:43] * markbby (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[15:45] * markbby (~Adium@168.94.245.1) has joined #ceph
[15:47] * Siva (~sivat@117.192.40.145) Quit (Ping timeout: 480 seconds)
[15:48] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[15:48] * markbby1 (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[15:51] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[15:51] * ScOut3R_ (~ScOut3R@212.96.46.212) Quit (Ping timeout: 480 seconds)
[15:53] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:58] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Remote host closed the connection)
[16:00] * rmorrison (~rmorrison@162-204-234-147.lightspeed.mssnks.sbcglobal.net) Quit (Quit: Leaving)
[16:01] * nregola_comcast (~nregola_c@fbr.reston.va.neto-iss.comcast.net) Quit (Read error: Operation timed out)
[16:02] * nregola_comcast (~nregola_c@c-69-140-90-184.hsd1.va.comcast.net) has joined #ceph
[16:08] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[16:08] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[16:09] * markbby (~Adium@168.94.245.1) has joined #ceph
[16:15] * xcrracer (~xcrracer@fw-ext-v-1.kvcc.edu) has joined #ceph
[16:18] * sarob (~sarob@2601:9:7080:13a:619c:2f17:1d0b:5e53) has joined #ceph
[16:25] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[16:27] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[16:29] * markbby (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[16:29] * markbby (~Adium@168.94.245.1) has joined #ceph
[16:32] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[16:37] * bandrus (~Adium@107.216.174.246) has joined #ceph
[16:37] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[16:40] <alphe> what does force sync on a monitor ?
[16:40] * mfisch (~mfisch@129.19.1.59) has joined #ceph
[16:45] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) has joined #ceph
[16:47] * nregola_comcast (~nregola_c@c-69-140-90-184.hsd1.va.comcast.net) Quit (Quit: Leaving.)
[16:49] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[16:50] * sleinen1 (~Adium@2001:620:0:2d:a986:4c3d:dee9:6bcb) has joined #ceph
[16:51] * philipgian (~philipgia@195.251.28.222) Quit (Quit: leaving)
[16:51] <alphe> after touching the network cables my node 01 is back, node 2 is back too, but now it is node 3 that is not willing to come back ...
[16:51] <alphe> seriously super strange ...
[16:54] * sleinen2 (~Adium@2001:620:0:46:5c1b:1a1b:a5f1:fd84) has joined #ceph
[16:54] * sarob (~sarob@2601:9:7080:13a:619c:2f17:1d0b:5e53) Quit (Ping timeout: 480 seconds)
[16:57] * sleinen (~Adium@2001:620:0:25:31c0:635a:1f0c:dfa9) Quit (Ping timeout: 480 seconds)
[17:00] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:00] * sleinen1 (~Adium@2001:620:0:2d:a986:4c3d:dee9:6bcb) Quit (Ping timeout: 480 seconds)
[17:00] * mfisch (~mfisch@129.19.1.59) Quit (Read error: Operation timed out)
[17:00] * glowell1 (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:06] * mancdaz (~mancdaz@46.38.187.105) Quit (Quit: ZNC - http://znc.sourceforge.net)
[17:09] * glzhao (~glzhao@118.195.65.67) Quit (Quit: leaving)
[17:11] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) Quit (Quit: Ex-Chat)
[17:11] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[17:11] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) has joined #ceph
[17:14] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[17:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[17:20] * gdavis331 (~gdavis@38.122.12.254) has left #ceph
[17:30] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[17:32] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:32] <PerlStalker> My ceph store is currently running bobtail. Can I go right to emperor or do I have to go through dumpling first?
[17:34] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:36] <bandrus> PerlStalker: you can upgrade directly, but keep in mind there are some protocol changes to work around
[17:36] <sage> PerlStalker: i'm not sure if we tested bobtail -> emperor, but we definitely did test bobtail->dumpling. Should be ok tho.
[17:37] <PerlStalker> Fair enough.
[17:39] * nregola_comcast (~nregola_c@fbr.reston.va.neto-iss.comcast.net) has joined #ceph
[17:39] * nregola_comcast (~nregola_c@fbr.reston.va.neto-iss.comcast.net) has left #ceph
[17:39] * mancdaz (~mancdaz@2a00:1a48:7807:102:94f4:6b56:ff08:7ca6) has joined #ceph
[17:40] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[17:41] * sagelap (~sage@2600:1012:b01e:6da3:7cdc:682:a7d2:5029) has joined #ceph
[17:42] * mancdaz (~mancdaz@2a00:1a48:7807:102:94f4:6b56:ff08:7ca6) Quit ()
[17:42] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[17:43] * mancdaz (~mancdaz@2a00:1a48:7807:102:94f4:6b56:ff08:7ca6) has joined #ceph
[17:50] * The_Bishop (~bishop@f055098013.adsl.alicedsl.de) has joined #ceph
[17:56] * The_Bishop (~bishop@f055098013.adsl.alicedsl.de) Quit (Read error: Connection reset by peer)
[17:58] * The_Bishop (~bishop@f055098013.adsl.alicedsl.de) has joined #ceph
[18:00] * gregsfortytwo (~Adium@2607:f298:a:607:1d36:18ae:f47a:8fd2) Quit (Quit: Leaving.)
[18:00] * mattt__ (~textual@92.52.76.140) Quit (Read error: Connection reset by peer)
[18:00] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:00] * gregsfortytwo (~Adium@2607:f298:a:607:f838:d41:b7e4:60d5) has joined #ceph
[18:02] * Sysadmin88 (~IceChat77@90.221.57.103) has joined #ceph
[18:06] * The_Bishop (~bishop@f055098013.adsl.alicedsl.de) Quit (Remote host closed the connection)
[18:07] * sagelap (~sage@2600:1012:b01e:6da3:7cdc:682:a7d2:5029) Quit (Ping timeout: 480 seconds)
[18:07] * Underbyte (~jerrad@pat-global.macpractice.net) has joined #ceph
[18:08] * sagelap (~sage@2600:1012:b02c:98d3:7cdc:682:a7d2:5029) has joined #ceph
[18:15] * nregola_comcast1 (~nregola_c@fbr.reston.va.neto-iss.comcast.net) has joined #ceph
[18:15] * nregola_comcast1 (~nregola_c@fbr.reston.va.neto-iss.comcast.net) Quit ()
[18:16] * sleinen2 (~Adium@2001:620:0:46:5c1b:1a1b:a5f1:fd84) Quit (Ping timeout: 480 seconds)
[18:18] * mjeanson_ (~mjeanson@bell.multivax.ca) has joined #ceph
[18:19] * mjeanson (~mjeanson@00012705.user.oftc.net) Quit (Read error: Connection reset by peer)
[18:19] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) has joined #ceph
[18:19] * xmltok (~xmltok@216.103.134.250) has joined #ceph
[18:19] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[18:21] * sagelap1 (~sage@38.122.20.226) has joined #ceph
[18:22] * davidzlap (~Adium@cpe-23-242-31-175.socal.res.rr.com) has joined #ceph
[18:23] * rendar (~s@host105-177-dynamic.20-87-r.retail.telecomitalia.it) has joined #ceph
[18:24] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[18:24] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) has joined #ceph
[18:25] * sagelap (~sage@2600:1012:b02c:98d3:7cdc:682:a7d2:5029) Quit (Ping timeout: 480 seconds)
[18:31] * sagelap1 (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[18:32] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[18:34] * Cube (~Cube@12.248.40.138) Quit (Read error: Connection reset by peer)
[18:38] * zoltan (~zoltan@2001:620:20:16:8c78:d928:eed5:ada7) Quit (Ping timeout: 480 seconds)
[18:39] * sagelap (~sage@2607:f298:a:607:7cdc:682:a7d2:5029) has joined #ceph
[18:41] * clayb (~kvirc@69.191.241.59) has joined #ceph
[18:42] * dpippenger (~riven@cpe-76-166-208-83.socal.res.rr.com) has joined #ceph
[18:48] * nregola_comcast (~nregola_c@fbr.reston.va.neto-iss.comcast.net) has joined #ceph
[18:48] * alphe (~alphe@0001ac6f.user.oftc.net) Quit (Remote host closed the connection)
[18:49] * mfisch (~mfisch@c-24-8-179-180.hsd1.co.comcast.net) has joined #ceph
[18:49] * onizo (~onizo@cpe-75-80-122-116.san.res.rr.com) has joined #ceph
[18:53] * \ask (~ask@oz.develooper.com) Quit (Ping timeout: 480 seconds)
[18:54] <loicd> I was going over the list of people signed up for the Ceph User Committee and realized there is no one from Dreamhost, which seems odd :-)
[18:55] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[18:55] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[19:00] * wusui1 (~Warren@2607:f298:a:607:cd5:7dc5:a006:fbcc) has joined #ceph
[19:02] * nregola_comcast (~nregola_c@fbr.reston.va.neto-iss.comcast.net) Quit (Read error: Operation timed out)
[19:05] * Sysadmin88 (~IceChat77@90.221.57.103) Quit (Read error: Connection reset by peer)
[19:06] * WarrenUsui (~Warren@2607:f298:a:607:3d58:9a3b:c9f8:8961) Quit (Ping timeout: 480 seconds)
[19:06] * WarrenUsui1 (~Warren@2607:f298:a:607:3d58:9a3b:c9f8:8961) Quit (Ping timeout: 480 seconds)
[19:06] * wusui (~Warren@2607:f298:a:607:cd5:7dc5:a006:fbcc) has joined #ceph
[19:08] * onizo (~onizo@cpe-75-80-122-116.san.res.rr.com) Quit (Remote host closed the connection)
[19:08] * nregola_comcast (~nregola_c@c-69-140-90-184.hsd1.va.comcast.net) has joined #ceph
[19:10] * nregola_comcast1 (~nregola_c@c-69-140-90-184.hsd1.va.comcast.net) has joined #ceph
[19:12] * warrenwang (~wwang@c-98-218-153-127.hsd1.va.comcast.net) has joined #ceph
[19:16] * nregola_comcast (~nregola_c@c-69-140-90-184.hsd1.va.comcast.net) Quit (Ping timeout: 480 seconds)
[19:19] * sarob (~sarob@nat-dip12.cfw-a-gci.corp.yahoo.com) has joined #ceph
[19:19] * houkouonchi-home (~linux@66-215-209-207.dhcp.rvsd.ca.charter.com) has joined #ceph
[19:20] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[19:21] <aarontc> loicd: why is that odd? didn't most of inktank's founders come from DH?
[19:27] <loicd> aarontc: to the point that there is no one left from DreamHost ? :-D
[19:35] * JoeGruher (~JoeGruher@134.134.139.76) has joined #ceph
[19:36] * mfisch (~mfisch@c-24-8-179-180.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[19:42] * danieagle (~Daniel@186.214.63.175) has joined #ceph
[19:43] <xarses> morning
[19:44] * fouxm (~fouxm@185.23.92.11) Quit (Remote host closed the connection)
[19:46] * dmick (~dmick@38.122.20.226) has joined #ceph
[19:50] * glowell (~glowell@69.170.166.146) has joined #ceph
[19:50] * nregola_comcast1 (~nregola_c@c-69-140-90-184.hsd1.va.comcast.net) Quit (Quit: Leaving.)
[19:51] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[19:52] * glowell (~glowell@69.170.166.146) Quit ()
[19:58] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[19:58] * sarob (~sarob@nat-dip12.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[19:58] * sarob (~sarob@nat-dip12.cfw-a-gci.corp.yahoo.com) has joined #ceph
[19:59] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[19:59] * glowell (~glowell@69.170.166.146) has joined #ceph
[20:03] * sarob (~sarob@nat-dip12.cfw-a-gci.corp.yahoo.com) Quit (Read error: Operation timed out)
[20:03] * Pedras (~Adium@216.207.42.132) has joined #ceph
[20:08] <alphe> i need to reinstall the os of one node, it has the main mon on it, that sucks
[20:08] <alphe> I don't know what the best course of action is to reinstate it in the ceph cluster
[20:09] <alphe> I copied the /var/lib/ceph files, should I simply restore them to the same place
[20:09] * aliguori (~anthony@74.202.210.82) has joined #ceph
[20:09] <alphe> after installing ceph on that reinstalled node ?
[20:12] * angdraug (~angdraug@64-79-127-122.static.wiline.com) has joined #ceph
[20:12] <sagewk> alphe: just make sure ceph.conf is in /etc/ceph too; should just start up
[20:13] * \ask (~ask@oz.develooper.com) has joined #ceph
[20:14] <alphe> ok
[20:14] * davidzlap (~Adium@cpe-23-242-31-175.socal.res.rr.com) Quit (Quit: Leaving.)
[20:15] <alphe> sagewk got a strange set of crazy stuff
[20:16] <sagewk> what kind of crazy? :)
[20:16] <alphe> sage, first, on friday i got mon2 of 3 going out, i tried to get it back but it kept going down with a crash
[20:17] <alphe> then I rebooted the whole ceph cluster and got the 3 osds down
[20:18] <alphe> the 3 monitors down too
[20:18] * michaelkk (~michaekk@ool-4353c729.dyn.optonline.net) Quit ()
[20:19] <alphe> the node 1 goes dead
[20:19] <alphe> then the node 2 was not automatically restarting so I had to start it manually
[20:19] * JoeGruher (~JoeGruher@134.134.139.76) Quit ()
[20:20] <aarontc> would adding SSDs to my OSD hosts for the OSD journals improve read latencies much, or just help writes?
[20:20] <xarses> aarontc: writes mostly
[20:21] <alphe> now I only have the dead node reinstalling; the rest of the cluster is happily trying to fill the gaps
[20:21] <aarontc> xarses: the problem I've observed is that writes starve reads terribly, like even a few writes per second adds hundreds of ms to read latency :/
[20:21] <alphe> but i don't understand why mon2 and its osds suddenly went missing and why it doesn't auto start anymore
[20:22] <alphe> start ceph-all not working on node 2 ...
[20:23] <aarontc> xarses: so my thought was that the spindles are busy committing writes to journal, and that could be alleviated with SSDs
[20:23] <xarses> aarontc: the data from the journal still has to be written to the osd volume; the SSD journal allows for better latency to ack the write IO. now, given that a write equates to two operations (one journal, and one osd), you would cut the write ops to the disk in half, which would allow for the chance of better read perf
[20:25] <aarontc> xarses: so the theoretical maximum performance gain is 2x IOPS by moving the journal to another device, SSD or otherwise
[20:26] <mikedawson> aarontc: I believe that is correct
[20:26] <aarontc> I don't have enough SATA ports to add a spindle per OSD for journaling, but I could manage to add a single SSD with multiple journal partitions (most of my OSD hosts have 4 to 6 OSDs)
[20:26] <xarses> aarontc: correct, the underlying device will reduce in load, but if your cluster is busy, it may stay busy
[20:28] <aarontc> xarses: long story short, I have a fairly light load (was down to a single qemu KVM running on rbd), and with any write activity I have to start measuring read latency in /seconds/ :(
[20:28] <xarses> aarontc: you might want to consider adding SSD _AND_ increasing the object replicas if the read latency continues to be an issue, that way you can have more sources to read from (at the cost of storage space). You may also want to inspect network bandwidth to ensure it isn't part of the problem
[20:29] <xarses> aarontc: ick
[20:29] <mikedawson> aarontc: my guess is you have some other issue going on
[20:29] <aarontc> mikedawson: well I added the performance logging you suggested and that's when it became very obvious that writes are completely starving reads
[20:29] <aarontc> I guess I haven't answered the question "why", yet, though
[20:30] <xarses> aarontc: i agree with mikedawson, that seems very bad
[20:30] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[20:31] <aarontc> xarses: I've had trouble finding any good reference numbers for what to expect, so I can't tell if it's "very bad" or not :/
[20:31] * nregola_comcast (~nregola_c@c-69-140-90-184.hsd1.va.comcast.net) has joined #ceph
[20:32] <mikedawson> aarontc: you should be able to get somewhere around 75 4KB iops using cheap 7200 rpm drives. Assuming you have 3x replication and the journal on the same spindle, that means you should get roughly 12 iops from each spinner
[20:32] <mikedawson> so if you have 10 spinners, 120 iops seems reasonable
[20:32] <aarontc> mikedawson: I have "size = 3", so it's only 2 replicas
[20:32] <aarontc> but the data is in 3 places
[20:32] <mikedawson> that's what I meant
[20:33] <aarontc> the cluster consists of 22 OSDs
[20:33] <aarontc> (cheap 7200 RPM drives)
[20:34] <alphe> hum, confusing, the /etc/ceph/ceph.conf created with ceph-deploy is really minimalistic
[20:34] <aarontc> does the 512B vs. 4kB sector size matter? I know most of my drives are "advanced format" but I think they still emulate 512B sectors
[20:34] <xarses> alphe: it can be
[20:34] <mikedawson> aarontc: in that case, 275 or so writes/s sounds reasonable in the cluster. If you move the journal, perhaps you'll be able to do 550
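(The arithmetic behind those numbers, using the figures from this conversation:)
    75 iops per 7200 rpm spindle
      / 2   journal and data share the same disk, so each client write is written twice
      / 3   size = 3, so each client write lands on three OSDs
      = ~12.5 client writes/s per spinner
      x 22 OSDs = ~275 client writes/s cluster-wide
    moving the journals off the spinners removes the "/ 2", hence the ~550 estimate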
[20:35] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[20:35] <alphe> xarses yes, but in fact ceph.conf doesn't hold every bit of info; most of it, the osdmap, monmap etc, is in /var/lib/ceph
[20:36] <aarontc> mikedawson: I'm not sure if I have numbers to measure writes/s vs reads/s, but I believe the write activity is minimal
[20:36] <alphe> I don't know why my node2 with mon2 and osd2 and osd3 doesn't auto start anymore ...
[20:36] <alphe> that freaks me out
[20:36] <mikedawson> aarontc: reads always come from the primary (and your two replicas don't need to seek), so they should perform up to 3x better than writes. On the other hand, rbd writeback cache may coalesce things nicely to up the apparent writes/s in your rbd guest
[20:37] <mikedawson> aarontc: You can poll the rbd and osd admin sockets to get writes/s and reads/s
[20:37] <aarontc> mikedawson: I have enabled writeback caching on my VMs, and I saw definite write improvements since the guest sees 0 latency (I used 512MB cache, and my linux VMs probably don't see that much write activity in a week)
[20:38] <alphe> if I manually kill -9 every ceph related service and then do a start ceph-all, ceph-all tells me that it is already running ...
[20:38] <alphe> and ps -edf | grep ceph shows nothing running ..
[20:39] <alphe> sagewk> alphe: just make sure ceph.conf is in /etc/ceph too; should just start up
[20:39] <aarontc> mikedawson: which variables should I look at for writes/s and reads/s from the perf dump on the rbd asok?
[20:39] <alphe> is that valid with a very minimalistic ceph.conf from ceph-deploy
[20:40] <alphe> or should I copy /var/lib/ceph content then push the config back to that rebuilt node ?
[20:40] <sagewk> the old /var/lib/ceph content needs to be copied into place too..
[20:40] <xdeller> hey, does anybody know what the estimate is for the recently discussed proposal on the ml to replace the filestore with a kvstore?
[20:40] <alphe> ok
[20:40] <alphe> great
[20:41] <aarontc> rbd stats - http://hastebin.com/howidajodo.json
[20:41] <alphe> I'm happy to have kept a copy of that directory before I jupiter the node
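(A hedged sketch of the restore sagewk describes, for a node that kept a backup of /var/lib/ceph; paths and hostnames are placeholders, and this assumes the node keeps the same hostname and IP as before the reinstall:)
    # on the freshly reinstalled node, after installing the ceph packages:
    rsync -a /backup/var-lib-ceph/ /var/lib/ceph/
    scp other-node:/etc/ceph/ceph.conf /etc/ceph/
    start ceph-all          # upstart on Ubuntu 13.10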
[20:41] <mikedawson> aarontc: to measure cluster-wide performance look at osd admin sockets. op_r (read), op_w (writes), op_rw (read-writes, used for instance with cloned volumes), and op (total iops)
[20:42] <alphe> recovery io 614 MB/s, 177 objects/s
[20:42] <aarontc> mikedawson: I've got the per-OSD data being polled and logged regularly, but I'm looking at the librbd sock now on the VM itself to try and get more data
[20:42] <alphe> erf, it is doing recovery while I restore the two missing nodes in a few moments
[20:43] <aarontc> I guess I'm wrong about the write stats, this VM has written 22GB, apparently
[20:45] <mikedawson> aarontc: You use aio (asynchronous io), so on the RBD perf dump, use aio_rd and aio_wr
[20:46] <aarontc> mikedawson: okay, so I need to graph those over time for each librbd client
[20:46] <aarontc> + ?
[20:48] <mikedawson> aarontc: yes. But, I will tell you the osd perf dumps have been far more interesting than rbd perf dumps to me so far.
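(The polling mikedawson describes boils down to dumping the daemons' perf counters; a sketch, with socket names following the $cluster-$name.asok convention and the rbd client path as a placeholder:)
    # per-OSD counters: op_r, op_w, op_rw, op, op_latency, subop_latency, ...
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
    # client-side counters (aio_rd, aio_wr, ...), if the librbd client has an admin socket configured:
    ceph --admin-daemon /var/run/ceph/<rbd-client>.asok perf dump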
[20:49] <aarontc> mikedawson: I think I've shown you the charts before, but here is the latest from osd 0: http://i.imgur.com/F8sV2Vc.png
[20:50] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[20:50] <aarontc> this is with a single VM running. cephfs is mounted on a few machines but there shouldn't be any activity at the moment
[20:51] * mfisch (~mfisch@c-76-25-23-72.hsd1.co.comcast.net) has joined #ceph
[20:52] * nregola_comcast1 (~nregola_c@fbr.reston.va.neto-iss.comcast.net) has joined #ceph
[20:55] <aarontc> and I guess the followup query is - if I added 3 OSDs that were solid state drives, made a new pool/crush ruleset so the pool lived only on the SSDs, could I expect to see thousands of IOPS and low read latencies for VMs?
[20:57] <mikedawson> aarontc: graphed data seems slow (too much latency) to me, almost certainly due to journals on the SSD.
[20:57] <mikedawson> aarontc: not SSD, on the spinner
[20:58] * nregola_comcast (~nregola_c@c-69-140-90-184.hsd1.va.comcast.net) Quit (Ping timeout: 480 seconds)
[20:58] <mikedawson> aarontc: yes, a pure ssd pool would make these write latencies look much better.
[20:58] <aarontc> mikedawson: so back to my original thought - moving journals to SSDs would improve read latencies?
[20:58] <aarontc> or you think there is some other problem going on?
[20:58] * thomnico (~thomnico@2a01:e35:8b41:120:c17:240c:3f05:30d5) Quit (Ping timeout: 480 seconds)
[20:59] <mikedawson> aarontc: with journals on spinners my cluster with lots of small, random writes shows an average op latency of around 40ms
[21:00] <aarontc> on the same spindle as the OSD or separate spindles?
[21:00] <mikedawson> aarontc: sorry, I'm a bit distracted right now...my journals are on ssds (3 osd journals per ssd)
[21:01] <aarontc> oh okay
[21:01] <aarontc> well, I can just "dd" the journal to a new device and update the OSD symlink to try using SSDs, right? I have a few of them available at the moment
[21:02] * nwat (~textual@eduroam-229-33.ucsc.edu) has joined #ceph
[21:03] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) has joined #ceph
[21:07] * vata (~vata@2607:fad8:4:6:c461:ff4:cee1:a342) has joined #ceph
[21:09] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:09] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:09] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[21:10] * mozg (~andrei@host81-151-251-29.range81-151.btcentralplus.com) has joined #ceph
[21:10] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[21:15] * Cube1 (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[21:17] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[21:18] * ericenb (~ericenb@50.240.86.181) has joined #ceph
[21:23] * codice (~toodles@71-80-186-21.dhcp.lnbh.ca.charter.com) has joined #ceph
[21:24] <aarontc> question - does the OSD journal have to be a block device, or could it be a file? at least short-term for performance testing?
[21:25] <alphe> usually it's a file ...
[21:25] <aarontc> alphe: I don't agree, both the ceph-deploy tool and the documentation set things up with a separate disk partition for the journal
[21:25] <mikedawson> aarontc: OSD journal can be a file. I have never moved a journal, but I believe I have read a procedure to drain a journal and create a new journal somewhere else
[21:26] <janos> i've done precisely that before
[21:26] <janos> though i don't recall the commands
[21:26] <janos> but know it's possible, fast and safe!
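The drain-and-recreate procedure mikedawson mentions looks roughly like this; the OSD id, init command, and paths are assumptions, so test it on one OSD first:

    service ceph stop osd.0                   # stop the OSD first
    ceph-osd -i 0 --flush-journal             # drain the old journal into the store
    ln -sf /mnt/ssd/osd0-journal \
        /var/lib/ceph/osd/ceph-0/journal      # point at the new journal location
    ceph-osd -i 0 --mkjournal                 # create a fresh journal there
    service ceph start osd.0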
[21:26] <saturnine> What would be the best way to go about getting usage information for billing purposes?
[21:28] * kraken (~kraken@gw.sepia.ceph.com) Quit (Remote host closed the connection)
[21:28] * kraken (~kraken@gw.sepia.ceph.com) has joined #ceph
[21:28] <aarontc> janos, I'm going to try the 'dd' method... wish me luck ;)
[21:28] <alphe> aarontc it depends which journal ... is it the XFS journal?
[21:29] <alphe> or is it the ceph osd journal ?
[21:29] <aarontc> alphe: no, the ceph OSD journal
[21:29] <alphe> aarontc i saw a journal file in /var/lib/ceph
[21:29] <janos> aarontc: good luck! i really wish i could recall what exactly i did. it was during bobtail days
[21:29] <aarontc> alphe: you should have /var/lib/ceph/osd/{cluster}-{osd#}/journal which is a symlink to the journal device (or file?)
[21:30] <alphe> symlink
[21:31] <alphe> /dev/disk/by-partuuid/3c01442f-6e46-41f2-b686-8a42f2dd8511
[21:32] <aarontc> 2013-12-02 12:31:49.284821 7f39093e1780 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway
[21:32] <aarontc> looks like the OSD doesn't mind it being a file
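If you do want aio against a file-backed journal, the warning above names the option; a hedged ceph.conf sketch:

    [osd]
    # only if you really want aio on a non-block (file) journal
    journal force aio = true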
[21:34] <aarontc> okay, two of my OSDs are using SSD for journal now, I'll see if they get less latency
[21:34] <alphe> aarontc great stuff !
[21:35] <alphe> aarontc you just removed the link and restored it ?
[21:35] <alphe> the osd was shutdown ?
[21:35] <mikedawson> aarontc: focus on subop latencies rather than op latencies. That will show you just the time spent on this OSD rather than the time spent on this OSD and its replicas.
[21:36] <aarontc> alphe: I stopped the OSD, dd if=journal of=/mnt/ssd/journalfile; ln -s /mnt/ssd/journalfile journal; start OSD
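Spelled out, the dd approach described above would look something like this; the OSD id and paths are examples:

    service ceph stop osd.3
    cd /var/lib/ceph/osd/ceph-3
    dd if=journal of=/mnt/ssd/osd3-journal bs=1M   # copy the existing journal contents
    rm journal
    ln -s /mnt/ssd/osd3-journal journal            # repoint the symlink
    service ceph start osd.3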
[21:37] <mikedawson> aarontc: if you look at op latencies, you likely won't see much difference because the other osds will still be slow. But subop latencies should show the difference well
[21:37] <aarontc> mikedawson: noted, I'll keep an eye on the subop latencies
[21:39] <mikedawson> aarontc: my subop latencies average around 15ms
[21:40] <aarontc> mikedawson: should I be capturing osd.subop_w_latency and osd.subop_latency, or just subop_latency?
[21:40] * xdeller (~xdeller@91.218.144.129) Quit (Quit: Leaving)
[21:41] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[21:41] <aarontc> (I don't know if subop_latency is an average of r/w or just r)
[21:42] <mikedawson> aarontc: subop_latency is both read and write, I believe. I capture all numeric output of the osd admin sockets, then build graphs as needed
[21:42] <aarontc> mikedawson: (currently) I have to manually add the expressions to capture, so I only added subop_latency
[21:42] <aarontc> since I'm capturing every 10 seconds, my database is growing rapidly already :/
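A minimal polling sketch in the same spirit; the socket path and jq usage are assumptions, and subop_latency is cumulative, so a real collector should diff successive samples:

    SOCK=/var/run/ceph/ceph-osd.0.asok
    while sleep 10; do
        ts=$(date +%s)
        ceph --admin-daemon "$SOCK" perf dump \
          | jq --arg ts "$ts" '{ts: $ts, subop_latency: .osd.subop_latency}' \
          >> /var/log/ceph-osd0-latency.jsonl
    done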
[21:43] * mtanski (~mtanski@69.193.178.202) Quit (Read error: Operation timed out)
[21:47] <aarontc> totally subjective, but after moving the journals there is a lot less disk activity on that OSD host
[21:48] <aarontc> the light is off a lot more than it's on now
[21:49] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[21:53] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[21:53] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[21:55] * sroy (~sroy@2607:fad8:4:6:3e97:eff:feb5:1e2b) Quit (Ping timeout: 480 seconds)
[21:58] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[21:58] * dwt (dwt@d.clients.kiwiirc.com) has joined #ceph
[21:59] * Cube (~Cube@66-87-66-39.pools.spcsdns.net) has joined #ceph
[22:00] * mozg (~andrei@host81-151-251-29.range81-151.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[22:01] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[22:02] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[22:02] * terje__ (~joey@174-16-114-127.hlrn.qwest.net) has joined #ceph
[22:03] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[22:03] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[22:04] * terje_ (~joey@75-171-250-107.hlrn.qwest.net) Quit (Ping timeout: 480 seconds)
[22:04] * sleinen1 (~Adium@2001:620:0:26:59cf:f3ac:db29:48ec) has joined #ceph
[22:05] * sroy (~sroy@ip-208-88-110-45.savoirfairelinux.net) has joined #ceph
[22:06] * sroy (~sroy@ip-208-88-110-45.savoirfairelinux.net) Quit ()
[22:07] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[22:10] * mtanski (~mtanski@69.193.178.202) Quit ()
[22:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[22:11] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[22:13] <dwt> is it possible to have redundency when using iscsi + ceph? eg having the same targets on two+ servers?
[22:14] <aarontc> mikedawson: the gray line is a lot less jittery after switching to SSD journal: http://i.imgur.com/X4wZevq.png
[22:15] <dmick> dwt: see comment 2 in http://ceph.com/dev-notes/updates-to-ceph-tgt-iscsi-support/ for my current thinking on that
[22:16] <dmick> you certainly need some sort of device access moderation agent outside of Ceph, at least
[22:16] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[22:17] <janos> oh neat, dmick. i had no idea this feature was in the works
[22:17] <dwt> ok great, thanks
[22:23] <dwt> looks like HA is achieved with drbd and pacemaker
[22:23] <mikedawson> aarontc: yeah, that looks better! subop_latency average from 62ms -> 19ms
[22:24] <aarontc> personally I wish there was a really simple way to do QEMU-KVM VM HA when backed by RBD running under Libvirtd
[22:24] <aarontc> mikedawson: I'm doing the same operation on another OSD host now, so far it's looking like a good improvement :)
[22:24] <aarontc> is there a limit to how many OSD journals can be on the same SSD? I have Intel 320 SSDs that are rated for, IIRC, 40k write IOPS and 35k read IOPS
[22:25] <PerlStalker> aarontc: What sort of HA are you looking for with KVM?
[22:26] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[22:26] <aarontc> PerlStalker: just if one VM host fails, the VMs get rebooted on other hosts depending on available RAM
[22:26] <mikedawson> aarontc: the goal is to not clobber large, sequential write performance. So, divide average write throughput of the ssd by average write throughput of the spinners to get a ratio
[22:26] <PerlStalker> aarontc: see pacemaker
[22:27] <mikedawson> aarontc: if the SSD does 500MB/s and the spinner does 100MB/s, then 5:1
[22:27] <aarontc> mikedawson: hmm, that's tough, most of the spindles can sustain 100MB/sec writes, but the SSDs are SATA-2, so even if they could go faster the most they can do is about 280MB/sec (never seen them go below that)
[22:28] <aarontc> so I guess without SATA-3 I'm limited to two or three OSD journals per SSD
[22:28] <aarontc> PerlStalker: thanks, I'll check it out
[22:28] <aarontc> PerlStalker: clusterlabs.org I'm assuming?
[22:29] <PerlStalker> aarontc: aye
[22:31] <aarontc> PerlStalker: http://www.woodwose.net/thatremindsme/2012/10/ha-virtualisation-with-pacemaker-and-ceph/ looks like a good quickstart
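For the pacemaker side, a hypothetical crm shell snippet along the lines of that write-up; the domain name, XML path, and timeouts are invented:

    crm configure primitive vm-web1 ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/web1.xml" \
               hypervisor="qemu:///system" \
               migration_transport="ssh" \
        meta allow-migrate="true" \
        op monitor interval="30s" timeout="60s" \
        op start timeout="120s" \
        op stop timeout="120s"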
[22:33] <aarontc> mikedawson: maybe since I only have 2x1Gbit network connections on my OSD hosts, I don't have to worry about exceeding 250MB/sec throughput on the SSD?
[22:33] <dwt> is there a general ratio I should use for extra available space in my cluster, above and beyond the size of the data I actually want to store?
[22:34] <mikedawson> aarontc: sounds reasonable
[22:34] <lurbs> aarontc: What that quickstart doesn't seem to mention is how important proper STONITH is. Having your VMs end up running on multiple VM hosts is all types of bad. :(
[22:35] <PerlStalker> It is indeed.
[22:35] <PerlStalker> You can generally kiss your file system goodbye
[22:36] <lurbs> At least you can roll back to your latest good RBD snapshot at that point. :)
[22:36] <janos> i think there is a libvirt sanlock process that can be used to prevent a guest from being run on multiple hosts at the same time
[22:37] <janos> http://libvirt.org/locking.html
[22:37] <janos> i've not used it myself, though
[22:37] <lurbs> I saw that. If someone has knowledge on how to get it working nicely on Ubuntu LTS I'd love to hear about it.
[22:38] <PerlStalker> I've not used the locking but I have used pacemaker+kvm+ceph
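Per the libvirt locking page janos linked, the sanlock hookup looks roughly like this; the host_id and lease directory are examples:

    # /etc/libvirt/qemu.conf
    lock_manager = "sanlock"

    # /etc/libvirt/qemu-sanlock.conf
    auto_disk_leases = 1
    disk_lease_dir = "/var/lib/libvirt/sanlock"   # must be on storage shared by all hosts
    host_id = 1                                   # unique per hypervisor
    require_lease_for_disks = 1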
[22:40] * bandrus (~Adium@107.216.174.246) Quit (Quit: Leaving.)
[22:40] * bandrus (~Adium@107.216.174.246) has joined #ceph
[22:40] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[22:40] * BillK (~BillK-OFT@106-68-207-236.dyn.iinet.net.au) has joined #ceph
[22:40] * japuzzo (~japuzzo@pok2.bluebird.ibm.com) Quit (Quit: Leaving)
[22:43] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[22:44] * thomnico (~thomnico@2a01:e35:8b41:120:c980:50bb:8bc7:6c93) has joined #ceph
[22:44] * dwt (dwt@d.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[22:45] <aarontc> so does sanlock work with cephFS? :)
[22:47] * alexxy[home] (~alexxy@2001:470:1f14:106::2) has joined #ceph
[22:47] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Read error: Connection reset by peer)
[22:47] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[22:50] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[22:51] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[22:51] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[22:51] * warrenwang1 (~wwang@cet-nat-254.ndceast.pa.bo.comcast.net) has joined #ceph
[22:53] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[22:56] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[22:57] * warrenwang (~wwang@c-98-218-153-127.hsd1.va.comcast.net) Quit (Ping timeout: 480 seconds)
[22:58] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[22:59] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[23:00] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Read error: Connection reset by peer)
[23:01] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[23:02] <PerlStalker> aarontc: I wouldn't use cephfs to store vm disk images.
[23:02] * allsystemsarego (~allsystem@5-12-240-115.residential.rdsnet.ro) Quit (Quit: Leaving)
[23:02] * xdeller (~xdeller@95-31-29-125.broadband.corbina.ru) has joined #ceph
[23:04] <aarontc> PerlStalker: I was talking about sanlock, the STONITH option for libvirt
[23:04] * nregola_comcast1 (~nregola_c@fbr.reston.va.neto-iss.comcast.net) has left #ceph
[23:06] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[23:06] <PerlStalker> aarontc: Since cephfs supports access from multiple hosts, it's probably not needed.
[23:07] <aarontc> PerlStalker: sanlock requires a shared filesystem to store lock files on, in order to know whether a VM is running or not; pacemaker requires it
[23:07] * xmltok (~xmltok@216.103.134.250) Quit (Quit: Leaving...)
[23:07] <PerlStalker> In that case, it should work. :-)
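If CephFS were used as that shared filesystem, mounting it for the lease directory might look like this; the monitor address, subdirectory, and key file are placeholders:

    # assumes a /sanlock directory already exists inside CephFS
    mount -t ceph 192.168.0.10:6789:/sanlock /var/lib/libvirt/sanlock \
        -o name=admin,secretfile=/etc/ceph/admin.secret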
[23:08] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[23:08] <aarontc> I think I'll hold off on HA for my VMs until I get RBD fast enough to run them all :)
[23:09] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) Quit (Quit: Computer has gone to sleep.)
[23:10] <aarontc> I have SSD journals for 3 of my 6 OSD hosts now
[23:10] <aarontc> hopefully I'll see a good improvement
[23:13] * ScOut3R (~scout3r@5400354A.dsl.pool.telekom.hu) has joined #ceph
[23:25] * alexxy[home] (~alexxy@2001:470:1f14:106::2) Quit (Read error: No route to host)
[23:25] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[23:27] * rendar (~s@host105-177-dynamic.20-87-r.retail.telecomitalia.it) Quit ()
[23:30] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[23:30] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[23:35] * sjustwork (~sam@2607:f298:a:607:38aa:d318:6f02:da9b) Quit (Ping timeout: 480 seconds)
[23:37] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:38] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[23:39] * yanzheng (~zhyan@134.134.137.71) has joined #ceph
[23:40] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:41] * philips (~philips@ec2-54-226-249-155.compute-1.amazonaws.com) Quit (Quit: http://ifup.org)
[23:43] * danieagle (~Daniel@186.214.63.175) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[23:43] * philips (~philips@ec2-54-226-249-155.compute-1.amazonaws.com) has joined #ceph
[23:45] * sleinen1 (~Adium@2001:620:0:26:59cf:f3ac:db29:48ec) Quit (Quit: Leaving.)
[23:45] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:45] * sjustwork (~sam@2607:f298:a:607:858c:1b47:e54b:d8d2) has joined #ceph
[23:46] * vata (~vata@2607:fad8:4:6:c461:ff4:cee1:a342) Quit (Quit: Leaving.)
[23:47] * Pauline (~middelink@2001:838:3c1:1:be5f:f4ff:fe58:e04) Quit (Ping timeout: 480 seconds)
[23:49] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[23:53] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:56] * Pauline (~middelink@2001:838:3c1:1:be5f:f4ff:fe58:e04) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.