#ceph IRC Log

IRC Log for 2013-09-03

Timestamps are in GMT/BST.

[0:07] * LeaChim (~LeaChim@054073db.skybroadband.com) has joined #ceph
[0:13] * sleinen1 (~Adium@2001:620:0:26:f43e:1367:4137:9058) Quit (Quit: Leaving.)
[0:13] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[0:15] * LeaChim (~LeaChim@054073db.skybroadband.com) Quit (Ping timeout: 480 seconds)
[0:18] * sleinen1 (~Adium@2001:620:0:26:3c99:1dd3:d6f3:9250) has joined #ceph
[0:18] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Leaving...)
[0:21] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:21] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:25] * LeaChim (~LeaChim@054073db.skybroadband.com) has joined #ceph
[0:25] * diegows (~diegows@200.68.116.185) Quit (Read error: Operation timed out)
[0:28] * markbby (~Adium@168.94.245.4) has joined #ceph
[0:31] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[0:41] * markbby (~Adium@168.94.245.4) Quit (Remote host closed the connection)
[0:41] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[0:41] * sherry (~sherry@wireless-nat-10.auckland.ac.nz) has joined #ceph
[0:50] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[0:58] * roald (~oftc-webi@87.209.150.214) Quit (Remote host closed the connection)
[0:58] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[1:01] * tnt (~tnt@91.176.35.47) Quit (Ping timeout: 480 seconds)
[1:03] * vanham (~vanham@gateway.mav.com.br) has joined #ceph
[1:04] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[1:05] <vanham> Guys, my MDS is stuck on rejoin, any1 have any tip for that?
[1:05] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[1:07] <vanham> any dev online plz?
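A minimal first-look sketch for an MDS stuck in rejoin, assuming default log paths and an MDS id of 'a':

    # see what state the MDS map reports and whether the cluster is otherwise healthy
    ceph mds dump
    ceph -s
    # watch the MDS log on the node running it (default path; the id 'a' is an assumption)
    tail -f /var/log/ceph/ceph-mds.a.log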
[1:07] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[1:08] * JustEra (~JustEra@ALille-555-1-227-212.w86-215.abo.wanadoo.fr) has joined #ceph
[1:09] * sleinen1 (~Adium@2001:620:0:26:3c99:1dd3:d6f3:9250) Quit (Quit: Leaving.)
[1:09] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[1:15] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[1:17] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:21] * LeaChim (~LeaChim@054073db.skybroadband.com) Quit (Ping timeout: 480 seconds)
[1:23] * vanham (~vanham@gateway.mav.com.br) Quit (Quit: Saindo)
[1:40] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[1:45] * aliguori_ (~anthony@cpe-70-112-153-179.austin.res.rr.com) Quit (Quit: Ex-Chat)
[1:51] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[1:52] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:53] * mozg (~andrei@host86-185-78-26.range86-185.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:53] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[1:56] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[2:01] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) Quit (Remote host closed the connection)
[2:01] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) has joined #ceph
[2:05] * mschiff_ (~mschiff@port-54664.pppoe.wtnet.de) has joined #ceph
[2:07] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[2:12] * mschiff (~mschiff@port-49715.pppoe.wtnet.de) Quit (Ping timeout: 480 seconds)
[2:13] * The_Bishop__ (~bishop@f052102251.adsl.alicedsl.de) has joined #ceph
[2:16] * diegows (~diegows@190.190.11.42) has joined #ceph
[2:17] * The_Bishop_ (~bishop@e179002175.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[2:17] * sherry (~sherry@wireless-nat-10.auckland.ac.nz) Quit (Quit: Konversation terminated!)
[2:25] * ross__ (~ross@60.208.111.209) has joined #ceph
[2:45] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[2:45] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[2:50] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[2:53] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[2:54] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Bye!)
[2:54] * yy-nm (~Thunderbi@125.122.52.139) has joined #ceph
[2:55] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[2:57] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[2:57] * ChanServ sets mode +o scuttlemonkey
[2:57] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[2:58] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[3:04] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[3:11] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[3:12] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[3:15] * DarkAceZ (~BillyMays@50.107.55.36) has joined #ceph
[3:16] * KindTwo (~KindOne@h227.215.89.75.dynamic.ip.windstream.net) has joined #ceph
[3:18] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:18] * KindTwo is now known as KindOne
[3:19] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has left #ceph
[3:29] * mschiff_ (~mschiff@port-54664.pppoe.wtnet.de) Quit (Remote host closed the connection)
[3:31] * KindTwo (~KindOne@h46.175.17.98.dynamic.ip.windstream.net) has joined #ceph
[3:32] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:32] * KindTwo is now known as KindOne
[3:34] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[3:42] * sherry (~sherry@wireless-nat-10.auckland.ac.nz) has joined #ceph
[3:42] <sherry> how to install rpm-build and rpmdevtool?
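On a yum-based distribution (RHEL/CentOS/Fedora) this is just two packages from the base repositories; note the package is named rpmdevtools (plural). A minimal sketch:

    # install the RPM build tooling
    sudo yum install rpm-build rpmdevtools
    # create the ~/rpmbuild directory layout used for building packages
    rpmdev-setuptree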
[3:45] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[3:46] * nerdtron (~kenneth@202.60.8.252) Quit (Remote host closed the connection)
[3:48] * cofol1986 (~xwrj@110.90.119.113) Quit (Read error: Connection reset by peer)
[3:52] * themgt_ (~themgt@201-223-251-89.baf.movistar.cl) has joined #ceph
[3:53] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[3:53] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[3:55] * peetaur2 (~peter@CPEbc1401e60493-CMbc1401e60490.cpe.net.cable.rogers.com) Quit (Quit: Konversation terminated!)
[3:58] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit ()
[3:58] * themgt (~themgt@201-223-204-144.baf.movistar.cl) Quit (Ping timeout: 480 seconds)
[3:58] * themgt_ is now known as themgt
[4:03] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[4:12] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley_)
[4:14] * yanlb (~bean@8.37.228.186) has joined #ceph
[4:17] * diegows (~diegows@190.190.11.42) Quit (Read error: Operation timed out)
[4:21] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[4:28] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[4:29] * yy-nm (~Thunderbi@125.122.52.139) Quit (Remote host closed the connection)
[4:29] * yy-nm (~Thunderbi@125.122.52.139) has joined #ceph
[4:34] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[4:42] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Read error: No route to host)
[4:45] * joao (~joao@89.181.152.211) has joined #ceph
[4:45] * ChanServ sets mode +o joao
[4:45] * KindTwo (~KindOne@h26.52.186.173.dynamic.ip.windstream.net) has joined #ceph
[4:45] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[4:46] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:46] * KindTwo is now known as KindOne
[4:49] * jluis (~joao@89.181.146.94) Quit (Ping timeout: 480 seconds)
[4:50] * yanzheng (~zhyan@134.134.139.70) has joined #ceph
[4:51] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[4:52] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[4:53] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[4:59] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[5:05] * fireD_ (~fireD@93-139-160-160.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD (~fireD@93-139-162-39.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:09] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[5:20] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[5:20] <yehudasa_> ccourtaut: sorry, it was a holiday around here today
[5:21] <yehudasa_> ccourtaut: basically you need to do an object copy, there's some extra header that you need to use that'll specify the target zone, and you can only use it using a system user
[5:22] <yehudasa_> ccourtaut: note that the api only works within a single region, so it aims at the DR case
[5:23] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[5:26] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) Quit (Quit: sprachgenerator)
[5:44] * dosaboy (~dosaboy@host81-155-236-130.range81-155.btcentralplus.com) has joined #ceph
[5:44] * themgt_ (~themgt@201-223-212-24.baf.movistar.cl) has joined #ceph
[5:46] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[5:50] * themgt (~themgt@201-223-251-89.baf.movistar.cl) Quit (Ping timeout: 480 seconds)
[5:50] * themgt_ is now known as themgt
[5:51] * dosaboy_ (~dosaboy@host86-152-199-206.range86-152.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[5:54] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[6:20] * yy-nm (~Thunderbi@125.122.52.139) Quit (Quit: yy-nm)
[6:24] * themgt_ (~themgt@201-223-217-113.baf.movistar.cl) has joined #ceph
[6:30] * themgt (~themgt@201-223-212-24.baf.movistar.cl) Quit (Ping timeout: 480 seconds)
[6:30] * themgt_ is now known as themgt
[6:43] <nerdtron> how do i remove a monitor? it is currently down and is not participating anymore in the cluster
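A sketch of the usual removal, assuming the dead monitor's id matches what appears in the monmap:

    # confirm which monitors the cluster still expects
    ceph mon stat
    # remove the dead monitor from the monmap ({mon-id} is a placeholder for its id)
    ceph mon remove {mon-id}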
[6:46] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[6:54] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley_)
[6:54] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:16] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[7:17] * ScOut3R (~ScOut3R@catv-89-133-25-52.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[7:24] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:24] * sleinen (~Adium@2001:620:0:26:256e:888c:6815:ed1f) has joined #ceph
[7:26] * sleinen (~Adium@2001:620:0:26:256e:888c:6815:ed1f) Quit ()
[7:26] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[7:27] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[7:27] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[7:27] * haomaiwang (~haomaiwan@124.161.72.254) has joined #ceph
[7:34] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:35] * shimo_ (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[7:38] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[7:38] * shimo_ is now known as shimo
[7:39] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[7:42] * haomaiwang (~haomaiwan@124.161.72.254) Quit (Remote host closed the connection)
[7:43] * haomaiwang (~haomaiwan@li498-162.members.linode.com) has joined #ceph
[7:44] * Karcaw (~evan@68-186-68-219.dhcp.knwc.wa.charter.com) Quit (Read error: Operation timed out)
[7:44] * Karcaw (~evan@68-186-68-219.dhcp.knwc.wa.charter.com) has joined #ceph
[7:47] <nerdtron> hi all!.. i was wondering what read and write speeds should i expect regarding our hardware/network config in ceph
[7:47] <nerdtron> I'll tell you the specs if you someone is willing to listen
[7:47] <nerdtron> :)
[7:50] <Vjarjadian> testing would be best option
[7:54] <nerdtron> I already did test. my boss would like to see comparison results
[7:54] <MACscr> compared with what?
[7:54] <nerdtron> like why would our write speed be only 30MB/sec
[7:54] <MACscr> btw, howdy =P
[7:54] <Vjarjadian> there are some benchmarks on ceph's website...
[7:54] <Vjarjadian> what hardware do you have?
[7:55] <MACscr> nerdtron: we exchanged some posts on the ubuntu forums yesterday concerning ceph and usb, etc =P
[7:55] * tnt (~tnt@91.176.35.47) has joined #ceph
[7:55] <nerdtron> oh it's you??
[7:55] <MACscr> yep, that was me
[7:55] <nerdtron> alright, wait i'll post my configs and hardware specs
[7:56] <nerdtron> MACscr, is your cluster up? dumpling version?
[7:56] <MACscr> not at all, not even close to that yet
[7:56] <nerdtron> haha why?
[7:58] <MACscr> im still waiting on some hardware, plus still trying to figure out what route i am going to go for my mini cloud
[7:59] * sleinen (~Adium@2001:620:0:25:dd48:eb8f:ca97:5e41) has joined #ceph
[7:59] <MACscr> im thinking i was pushing too much for openstack and should probably go with something simple like proxmox for my 14 node setup
[8:00] <Vjarjadian> i'm leaning towards Hyper-V for my new setup...
[8:00] <nerdtron> MACscr, proxmox can be good, easy to setup and can also live migrate a node..
[8:00] <Vjarjadian> just a pity ceph doesnt have a windows client...
[8:00] <nerdtron> however it's a bit difficult to interface to ceph
[8:01] * cofol1986 (~xwrj@110.90.119.113) has joined #ceph
[8:01] <MACscr> nerdtron: depends on your needs i guess. As long as the software you are using is made to work with ceph, it shouldnt be that big of a deal
[8:01] <Vjarjadian> nerdtron, what hardware were you using that gave you 30MB/s... and how many nodes, might give a clue how it could be improved
[8:02] <MACscr> also, what kind of networking are you using
[8:02] <nerdtron> Vjarjadian, just a little more..i'll post it to pastebin
[8:02] <nerdtron> all gigabit interfaces
[8:02] <Vjarjadian> how many per host?
[8:02] <cofol1986> hello, seems ceph will not report clock skew even when I update the time, is there any way to let ceph know this?
[8:03] <cofol1986> sorry, ceph will report clock skew even when I update the time
[8:04] <nerdtron> can you view? http://pastebin.com/aaEGWQ3Z
[8:04] <Vjarjadian> how many replicas are you using?
[8:04] <nerdtron> Vjarjadian, 2 gigabit per host, one is for cluster_network and one for public_network
[8:05] <nerdtron> cofol1986, you don't have to manually update time...install ntp on your nodes
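A sketch of clearing a clock-skew warning, assuming Debian/Ubuntu nodes and sysvinit-managed daemons:

    # keep clocks in sync from now on
    sudo apt-get install ntp
    # once the clocks have converged, restart the skewed monitor so it re-checks (the mon id is an assumption)
    sudo service ceph restart mon.ceph-node2
    # confirm the warning has cleared
    ceph health detail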
[8:05] <MACscr> nerdtron: is the gigabit setup just for ceph only communication? what kind of switch? Is the cluster being used for anything else during the test? Why is your block size so big?
[8:05] <MACscr> lastly, you dont mention anything about ceph mon, where is that being ran?
[8:06] <nerdtron> MACscr, i'm using a dlink (8port gigabit, unmanaged switch) dedicated only to the 3 nodes,
[8:06] <MACscr> nerdtron: yuck, thats not going to work well for you
[8:06] <nerdtron> the public network is also a 8 port gigabit switch connected to 3 opennebula nodes,
[8:06] <MACscr> plus you need to be using jumbo frames as well
[8:07] <MACscr> lastly, do something like so for a proper test: time dd if=/dev/zero of=/test/test.bin bs=1024 count=10000000
[8:07] <MACscr> you need to be doing tests higher than your ram
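A sketch of tests that avoid the guest page cache, assuming a pool named rbd and the mount path used above:

    # raw RADOS write benchmark for 60 seconds, straight against the cluster
    rados bench -p rbd 60 write
    # from inside a guest: write more data than the VM has RAM, bypassing the page cache
    time dd if=/dev/zero of=/test/test.bin bs=1M count=20480 oflag=direct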
[8:07] <nerdtron> the MONs are on the 3 ceph nodes, to sum it up, 3 nodes, 1 mon each and 3 osd each,,,
[8:08] <Vjarjadian> and your replica count?
[8:08] <nerdtron> so i'm running both the mon and osd daemon at the same time on a single machine
[8:08] <nerdtron> 3
[8:08] <Vjarjadian> Netgear GS724T served me well
[8:08] <nerdtron> rep size is 3
[8:08] <Vjarjadian> 30MB/s * 3 = 90MB/s which is near the top end of gigabit bandwidth
[8:08] <nerdtron> MACscr, is that 10GB??
[8:08] <MACscr> right
[8:08] <MACscr> nerdtron: yes
[8:09] <MACscr> but Vjarjadian is right
[8:09] <nerdtron> Vjarjadian, you mean when i see the " ceph -w" showing only 30MB, it means 90MB on single pipe
[8:09] <Vjarjadian> you're transferring to all your replicas simultaneously
[8:09] <Vjarjadian> iric
[8:10] <Vjarjadian> iirc
[8:10] <Vjarjadian> so three replicas, 1/3 speed
[8:10] <Vjarjadian> gigabit isnt that fast....
[8:10] <nerdtron> Vjarjadian, but why does a 21GB image take so long to replicate?
[8:10] <nerdtron> about 30 mins
[8:13] * JustEra (~JustEra@ALille-555-1-227-212.w86-215.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[8:14] <Vjarjadian> maybe your switch can't handle it? maybe your RAM is short when under load, maybe the i3 isnt coping...
[8:14] <Vjarjadian> could be many things
[8:14] <Vjarjadian> if you want big speeds... you need big hardware
[8:14] <nerdtron> hmm my i3 spikes load when doing 21GB cloning
[8:14] <Vjarjadian> if you want low cost expect lower performance
[8:15] <nerdtron> Vjarjadian, yeah i understand..i just want to know if anyone here has the same setup and I want to know if what i'm getting is already the limit of what my hardware can do
[8:15] <nerdtron> oh by the way any tips or settings that can make the rd/wr speed faster??
[8:16] <Vjarjadian> my last tests were in VMs... so they wouldnt be any good for comparison
[8:17] <Vjarjadian> nerdtron, SSD journal might help you a bit...
[8:18] <Vjarjadian> but maybe a better switch might help too
[8:18] <Vjarjadian> Netgear GS724T isnt expensive, and would probably surpass both your little unmanaged switches
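On the SSD journal suggestion, a sketch of how a journal is pointed at a separate device in ceph.conf (device path and size are assumptions):

    [osd]
            osd journal size = 10240        ; 10 GB journal, size is an assumption
    [osd.0]
            osd journal = /dev/sdg1         ; partition on the SSD, path is an assumption

Repointing an existing OSD's journal also needs a stop, ceph-osd -i N --flush-journal, then --mkjournal before restarting, so it is worth trying on one OSD first.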
[8:19] <nerdtron> Vjarjadian i have VMs too any results on yours?
[8:19] <nerdtron> hmmm you know, If it really looks promising i'll convince my boss to buy that
[8:19] <Vjarjadian> the hardware i had them on was a SSD Raid 0 over 4 disks... and with internal networking the results were ridiculous... i noted that it worked and scaled but didnt keep the actual numbers
[8:20] <nerdtron> holy...ssd in raid 0
[8:20] * haomaiwa_ (~haomaiwan@124.161.72.254) has joined #ceph
[8:21] <MACscr> im using 10GbE for my ceph nodes and monitoring nodes are separate
[8:21] <nerdtron> ouch....i'm really the only one here using cheap (and a bit old) hardware :(
[8:22] <MACscr> mine is old two. 3 to 5 years
[8:23] <MACscr> er, too
[8:23] <nerdtron> MACscr, and your rd/wr speed?
[8:23] <MACscr> lol
[8:23] <MACscr> nerdtron: not there yet. give me a few days =P
[8:23] <nerdtron> oh yeah..sorry
[8:23] <MACscr> but i know i have capacity to get pretty good results
[8:23] <MACscr> plus im only going to probably be doing a replica of 2
[8:23] <nerdtron> no SSD? separate journal disk?
[8:24] <MACscr> not yet. Going to test if that really makes a difference. I have them on hand though as i originally designed things to use zfs
[8:24] <Vjarjadian> nerdtron... is 30MB/s really that bad for you?
[8:25] * stxShadow (~Jens@ip-178-201-128-101.unitymediagroup.de) has joined #ceph
[8:26] <nerdtron> Vjarjadian, i'm ok with that...but my boss (you know the grumpy ones) says "i'm buying you gigabit nics, a gigabit switch and 9 hard drives, and you're saying 30MB is all you can do" :(
[8:27] <Vjarjadian> he didnt buy a gigabit switch... he purchased 'the little switch that could'
[8:27] <nerdtron> yeah you're right, it's just a small cheap 8 port switch
[8:28] * haomaiwang (~haomaiwan@li498-162.members.linode.com) Quit (Ping timeout: 480 seconds)
[8:28] <Vjarjadian> GS724T is only £125 ish and could run both your networks on it
[8:29] <Vjarjadian> and the specs say it has something like 48gbps bandwidth on it...
[8:30] <nerdtron> i don't think i'll be getting a new switch anytime soon..anyway is a load of 3.0 to 6.0 too much for an i3? It's the load when i write something big
[8:30] <Vjarjadian> what are you trying to use the setup for anyway? i've had a couple of workloads it was easier to just use synology NASes on
[8:30] <nerdtron> we could do nas, but ceph is easily scalable
[8:31] <nerdtron> i mean, easy to add nodes and osds, realtime
[8:31] <Vjarjadian> as long as the solution is 'good enough'
[8:33] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[8:33] <MACscr> nerdtron: is that i3 a quad or dual core?
[8:33] <Vjarjadian> dual
[8:33] <Vjarjadian> with HT
[8:33] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[8:33] <MACscr> ouch
[8:34] <MACscr> specs dont really match whats recommended for production
[8:34] <MACscr> nor is it recommended to run ceph-mon on the same system as the OSD's
[8:34] <nerdtron> Vjarjadian, dual with hyperthreading and 1st gen
[8:35] <Vjarjadian> i have an i3-560 as a hypervisor... it's a nive little machiner
[8:35] <Vjarjadian> nice
[8:36] * Vjarjadian (~IceChat77@176.254.37.210) Quit (Quit: The early bird may get the worm, but the second mouse gets the cheese)
[8:40] <nerdtron> monmap e2: 2 mons at {ceph-node1=10.1.0.11:6789/0,ceph-node3=10.1.0.13:6789/0}, election epoch 718, quorum 0,1 ceph-node1,ceph-node3, all right i can't seem to add my ceph-node2 mon
[8:41] * tiger (~tiger@58.213.102.114) has joined #ceph
[8:49] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) Quit (Remote host closed the connection)
[8:49] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) has joined #ceph
[8:51] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[8:54] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:58] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[8:59] * JustEra (~JustEra@89.234.148.11) has joined #ceph
[8:59] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[9:01] * kDaser (kDaser@c-69-142-166-209.hsd1.pa.comcast.net) Quit (Ping timeout: 480 seconds)
[9:02] <sherry> is there any ceph developer in here?!
[9:04] <yanzheng> sherry, ?
[9:05] * tnt (~tnt@91.176.35.47) Quit (Ping timeout: 480 seconds)
[9:08] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[9:12] * stxShadow1 (~Jens@jump.filoo.de) has joined #ceph
[9:17] * mschiff (~mschiff@port-54664.pppoe.wtnet.de) has joined #ceph
[9:17] * madkiss (~madkiss@2001:6f8:12c3:f00f:5c8c:d54d:43b5:1a95) has joined #ceph
[9:18] * stxShadow (~Jens@ip-178-201-128-101.unitymediagroup.de) Quit (Ping timeout: 480 seconds)
[9:19] * Bada (~Bada@195.65.225.142) has joined #ceph
[9:19] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:21] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:23] * sleinen (~Adium@2001:620:0:25:dd48:eb8f:ca97:5e41) Quit (Quit: Leaving.)
[9:25] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[9:29] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:30] <nerdtron> is anybody using the latest ceph 0.67 dumpling??
[9:32] * stxShadow1 (~Jens@jump.filoo.de) Quit (Read error: Connection reset by peer)
[9:34] <MACscr> nerdtron: have you joined the mailing list?
[9:34] <MACscr> you should
[9:35] <nerdtron> hmm i should try that..
[9:46] * sleinen (~Adium@eduroam-etx-dock-1-138.ethz.ch) has joined #ceph
[9:47] * sleinen1 (~Adium@2001:620:0:25:d137:87f:3f6e:23af) has joined #ceph
[9:54] * sleinen (~Adium@eduroam-etx-dock-1-138.ethz.ch) Quit (Ping timeout: 480 seconds)
[9:56] <nerdtron> how many month is the usual major release of ceph?
[9:57] <wogri_risc> I believe it's 3 months
[9:58] <nerdtron> hmmm too fast for long term deployment
[10:00] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[10:03] * allsystemsarego (~allsystem@188.27.167.103) has joined #ceph
[10:11] * roald (~oftc-webi@87.209.150.214) has joined #ceph
[10:12] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has left #ceph
[10:23] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[10:25] * kyann (~oftc-webi@AMontsouris-652-1-224-208.w92-128.abo.wanadoo.fr) has joined #ceph
[10:25] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[10:27] <jerker> but they are maintained longer aren't they? Bobtail 0.56.1 was released on 7 jan 2013 and Bobtail 0.56.7 on 28 aug 2013 according to file dates in http://www.ceph.com/download/ and http://eu.ceph.com/rpm-bobtail/el6/x86_64/
[10:28] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[10:28] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[10:29] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[10:33] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[10:41] * hybrid5121 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[10:41] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[10:42] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[10:46] * kyann (~oftc-webi@AMontsouris-652-1-224-208.w92-128.abo.wanadoo.fr) Quit (Quit: Page closed)
[10:46] * kyann (~oftc-webi@AMontsouris-652-1-224-208.w92-128.abo.wanadoo.fr) has joined #ceph
[10:52] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[11:00] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[11:04] * ScOut3R (~ScOut3R@BC2484D1.dsl.pool.telekom.hu) has joined #ceph
[11:07] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[11:14] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[11:20] * ScOut3R (~ScOut3R@BC2484D1.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[11:24] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Quit: shimo)
[11:25] <ofu_> i tried to send a snapshot to another pool with rbd export kvm/olli1.img@backup-2013-09-03 - | rbd import - backup/olli1.img@2013-09-03... but on the destination i only have the image, not a snapshot... so i can not do incremental updates?
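Incremental shipping is what rbd export-diff / import-diff (available since Cuttlefish) are meant for; a sketch using the image names above, where the second snapshot name is an assumption:

    # first, a full copy of the base snapshot, then recreate the snapshot marker on the destination
    rbd export kvm/olli1.img@backup-2013-09-03 - | rbd import - backup/olli1.img
    rbd snap create backup/olli1.img@backup-2013-09-03
    # later runs ship only the delta between two snapshots; import-diff recreates the end snapshot
    rbd export-diff --from-snap backup-2013-09-03 kvm/olli1.img@backup-2013-09-04 - | rbd import-diff - backup/olli1.img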
[11:37] * shimo (~A13032@61.121.217.66) has joined #ceph
[11:38] * tiger (~tiger@58.213.102.114) Quit (Ping timeout: 480 seconds)
[11:41] * tryggvil (~tryggvil@178.19.53.254) has joined #ceph
[11:42] <nerdtron> e3: 3 mons at {ceph-node1=10.1.0.11:6789/0,ceph-node2=10.1.0.12:6789/0,ceph-node3=10.1.0.13:6789/0}, election epoch 724, quorum 0,2 ceph-node1,ceph-node3
[11:42] <nerdtron> any help?
[11:43] <nerdtron> only node3 and node1 are up, i have 3 mons
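A sketch of chasing down the missing monitor, assuming default socket and log paths on ceph-node2:

    # quorum as the cluster sees it
    ceph mon stat
    # on ceph-node2 itself: ask the stuck monitor what state it thinks it is in
    ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node2.asok mon_status
    # and check its log for the reason it cannot join
    tail -n 100 /var/log/ceph/ceph-mon.ceph-node2.log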
[11:45] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[11:49] * dosaboy (~dosaboy@host81-155-236-130.range81-155.btcentralplus.com) Quit (Remote host closed the connection)
[11:49] * yanzheng (~zhyan@134.134.139.70) Quit (Quit: Leaving)
[11:51] * dosaboy (~dosaboy@host81-155-236-130.range81-155.btcentralplus.com) has joined #ceph
[11:53] * tryggvil (~tryggvil@178.19.53.254) Quit (Quit: tryggvil)
[11:56] <sel> Is there a way to ensure that a write is replicated without setting min_size 2?
[11:56] * tryggvil (~tryggvil@178.19.53.254) has joined #ceph
[11:56] <nerdtron> sel set the replication size to 2
[11:56] <nerdtron> well it is the default
[11:57] <nerdtron> and default min_size is 1
[11:57] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[11:57] <sel> yea, but that doesn't ensure that a replica is written.
[11:58] <sel> If one node goes down, some data might only be on that node.
[11:58] <sel> new data that is
[11:58] <nerdtron> it will replicate on a "best effort" basis; as long as you have the space for two copies, it will be replicated
[11:59] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit ()
[12:00] <nerdtron> sel if you nodes have the same capacity say, 2 identical nodes, the CRUSH algorithm will balance the data on the nodes
[12:00] <nerdtron> most likely on two nodes, CRUSH will separate the data to have a copy on two nodes
[12:03] <sel> The problem for me is the "best effort", I would have preferred to be able to say that a write isn't ack'ed before the data is replicated. I see that I can set size to 3, and min_size to 2 but 3 copies isn't exactly what I want...
[12:12] <nerdtron> and what exactly do you want?
[12:12] * xdeller (~xdeller@91.218.144.129) Quit (Ping timeout: 480 seconds)
[12:12] <nerdtron> sel, you want ex. 2 nodes, when you write on one node, it is automatically replicated on the second node?
[12:15] * shimo (~A13032@61.121.217.66) Quit (Quit: shimo)
[12:19] * sleinen1 (~Adium@2001:620:0:25:d137:87f:3f6e:23af) Quit (Quit: Leaving.)
[12:19] * sleinen (~Adium@eduroam-etx-dock-1-138.ethz.ch) has joined #ceph
[12:24] * xdeller (~xdeller@91.218.144.129) has joined #ceph
[12:27] <sel> I want to ensure that all data is replicated, and that I can handle that one datacenter (my failure domain) goes down.
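Spreading replicas across datacenters is done in the CRUSH rule rather than with min_size; a sketch of such a rule in a decompiled crush map, assuming a root bucket named default and datacenter buckets already defined in the hierarchy:

    rule replicated_across_dcs {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type datacenter
            step emit
    }

The map is edited with ceph osd getcrushmap and crushtool -d, recompiled with crushtool -c, and injected with ceph osd setcrushmap.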
[12:27] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[12:27] * sleinen (~Adium@eduroam-etx-dock-1-138.ethz.ch) Quit (Ping timeout: 480 seconds)
[12:28] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[12:32] <nerdtron> how do i update ceph version?
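A sketch of a package upgrade on Debian/Ubuntu using the ceph.com repositories (the distro codename and sysvinit are assumptions):

    # point apt at the dumpling repo, then upgrade the packages
    echo deb http://ceph.com/debian-dumpling/ precise main | sudo tee /etc/apt/sources.list.d/ceph.list
    sudo apt-get update && sudo apt-get install ceph
    # restart daemons one type at a time: monitors first, then OSDs, then any MDS
    sudo service ceph restart mon
    sudo service ceph restart osd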
[12:34] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[12:35] * nerdtron (~kenneth@202.60.8.252) Quit (Remote host closed the connection)
[12:41] <sel> My problem is that if I set size=2 and min_size=2 I end up with incomplete pg's
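A sketch of the pool settings being discussed and of inspecting the stuck placement groups, assuming a pool named rbd:

    # two copies, but acknowledge writes while only one copy is reachable
    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 1
    # list placement groups that have gone inactive because min_size cannot be met
    ceph pg dump_stuck inactive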
[12:49] * xdeller (~xdeller@91.218.144.129) has left #ceph
[12:51] <kislotniq> has anyone dealt with osd plugins?
[12:51] <kislotniq> i've asked for help on the mailing list, but got no response
[12:54] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[12:57] * peetaur (~peter@CPEbc1401e60493-CMbc1401e60490.cpe.net.cable.rogers.com) has joined #ceph
[12:57] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:03] <jerker> sel: i thought "osd pool default min size" was just that. are the data centers or racks or whatever in the ceph config?
[13:03] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[13:04] <sel> I've got the datacenters and racks in my crush map.
[13:09] * yanlb (~bean@8.37.228.186) Quit (Quit: Konversation terminated!)
[13:11] <jerker> nothing gets filled up when increasing the "min size"? /just asking questions that happened to me last time i played with ceph/
[13:25] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:33] * sleinen (~Adium@2001:620:0:26:1402:1b5:835b:e627) has joined #ceph
[13:34] * diegows (~diegows@190.190.11.42) has joined #ceph
[13:35] * tryggvil (~tryggvil@178.19.53.254) Quit (Quit: tryggvil)
[13:35] * sherry (~sherry@wireless-nat-10.auckland.ac.nz) Quit (Quit: Konversation terminated!)
[13:35] * tryggvil (~tryggvil@178.19.53.254) has joined #ceph
[13:46] * The_Bishop__ (~bishop@f052102251.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[13:47] * ross__ (~ross@60.208.111.209) Quit (Ping timeout: 480 seconds)
[13:49] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[13:49] * yy-nm (~Thunderbi@211.140.18.122) has joined #ceph
[13:50] * yy-nm (~Thunderbi@211.140.18.122) Quit ()
[13:52] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[13:52] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) Quit (Ping timeout: 480 seconds)
[13:59] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[13:59] * ChanServ sets mode +v andreask
[13:59] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[14:05] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:05] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:15] * rsanti (~rsanti@74.125.122.33) has joined #ceph
[14:15] * yanzheng (~zhyan@101.83.50.133) has joined #ceph
[14:16] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:20] * sel (~sel@212.62.233.233) Quit (Read error: Connection reset by peer)
[14:23] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) has joined #ceph
[14:24] <yo61> sel: if you have size=2 and min_size=2 then you'll lose everything if you lose a DC
[14:24] <yo61> (because size becomes 1 so ceph will not allow access to those objects)
[14:28] <ofu_> clever
[14:29] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[14:36] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley_)
[14:46] * sleinen1 (~Adium@2001:620:0:46:ddc4:3710:410f:3fe4) has joined #ceph
[14:48] <mozg> hello guys
[14:48] <mozg> i am having these messages in dmesg: [640720.920070] init: ceph-osd (ceph/0) main process (27786) killed by ABRT signal
[14:48] <mozg> [640720.920219] init: ceph-osd (ceph/0) main process ended, respawning
[14:48] <mozg> is this something I should worry about?
[14:49] <joao> you should check the osd's log file
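A sketch of where to look, assuming the default log path and the osd id 0 from the message above:

    # the abort and its backtrace end up in the OSD's log
    less /var/log/ceph/ceph-osd.0.log
    # jump straight to assertion failures
    grep -n "FAILED assert" /var/log/ceph/ceph-osd.0.log | tail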
[14:49] <joao> yo61> sel: if you have size=2 and min_size=2 then you'll lose everything if you lose a DC <- I don't think this is true
[14:50] <joao> size 2 means a replication factor of 2
[14:50] <joao> if you replicate across your two datacenters, if you lose a datacenter you still have your data on the other datacenter
[14:51] <joao> your cluster will become 50% degraded because you lost the other datacenter and now only have 1 replica out of 2, but you still have your objects
[14:51] * markbby (~Adium@168.94.245.1) has joined #ceph
[14:52] * sleinen (~Adium@2001:620:0:26:1402:1b5:835b:e627) Quit (Ping timeout: 480 seconds)
[14:52] <loicd> hi ceph
[14:52] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[14:52] <joao> hello loicd
[14:55] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:59] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[15:00] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[15:10] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[15:15] * decede (~deaced@178.78.113.112) has joined #ceph
[15:17] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[15:23] * imjustmatthew_ (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) has joined #ceph
[15:24] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[15:24] <yo61> joao: I wasn't clear
[15:24] <yo61> I meant you'll lose access to everything
[15:24] <yo61> ie. ceph will restrict access if size < min_size
[15:25] * mikedawson_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:25] <yo61> Yes, the objects will still be there, but you won't be able to get to them until the cluster is restored
[15:26] * clayb (~kvirc@proxy-nj2.bloomberg.com) has joined #ceph
[15:27] <tnt> will it block both read and write ? wouldn't it be possible to have a mode where you can't write/update the object but still read it ?
[15:27] <yo61> idk, tbh
[15:27] <yo61> I'm no expert - what I say is not authoritative!
[15:30] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:30] * mikedawson_ is now known as mikedawson
[15:40] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[15:42] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[15:43] * sprachgenerator (~sprachgen@c-50-141-192-36.hsd1.il.comcast.net) Quit (Quit: sprachgenerator)
[15:44] * fireD_ (~fireD@93-139-160-160.adsl.net.t-com.hr) Quit (Quit: Lost terminal)
[15:46] * BillK (~BillK-OFT@58-7-166-34.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:51] <ChoppingBrocoli> mozg: how do you set mode writeback? I have cloudstack running and I can dumpxml and define however on cloudstack restart it sets it back to default=none
[15:51] <mozg> ChoppingBrocoli, you can't at the moment
[15:51] <mozg> it's hard coded into CS
[15:52] <mozg> you can download sources and recompile them after changing that setting
[15:52] <mozg> at the moment it is set to cache=none
[15:52] <mozg> there is a ticket to implement the feature where you could change the cache option
[15:52] <mozg> but i am not sure when it is coming out
[15:52] <mozg> perhaps in 4.2
[15:53] <mozg> or 4.3
[15:53] <mozg> not sure
[15:54] <ChoppingBrocoli> Crap, thanks. 4.2 is prob a year away?
[15:54] <mozg> i've actually done some performance testing between cache=none and cache=writeback and didn't really notice a great deal of difference in performance
[15:54] <mozg> not sure about the data stability
[15:55] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:56] <mozg> 4.2 should be out in about 1-2 months
[15:56] <mozg> there has been a feature freeze a few months back
[15:56] <mozg> they are polishing the bugs
[15:56] <mozg> ChoppingBrocoli, have you done benchmarking with ceche=none and cache=writeback?
[15:57] <mozg> did you see performance improvements?
[15:57] <ChoppingBrocoli> yea, big difference on my end
[15:57] <mozg> really?
[15:57] <mozg> how much of a difference?
[15:57] <ChoppingBrocoli> dd with none = 70mbps with writeback = 230
[15:58] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[15:59] <mozg> what oflag settings did you use with dd?
[15:59] <mozg> i assume these are the write figures, right?
[15:59] * zack_ (~zack@formosa.juno.dreamhost.com) has joined #ceph
[15:59] * zack_ is now known as zackc
[15:59] <ChoppingBrocoli> correct
[15:59] <ChoppingBrocoli> i am allowing caching
[16:01] <LCF> https://gist.github.com/ljagiello/6424335 how to check what is wrong with authorization at radosgw ?
[16:04] * fireD (~fireD@93-139-160-160.adsl.net.t-com.hr) has joined #ceph
[16:06] * zhyan_ (~zhyan@101.83.49.69) has joined #ceph
[16:07] <mozg> ChoppingBrocoli, the trouble is your caching usually has a much smaller amount of storage than your data set
[16:07] <mozg> and when you do general benchmarks they do not show a clear picture
[16:08] <wrale> especially with volumes
[16:08] <mozg> I would suggest trying with a large data set
[16:08] <mozg> like a 20GB file taken from /dev/urandom
[16:08] * zhyan_ (~zhyan@101.83.49.69) Quit ()
[16:08] <mozg> or something like that
[16:08] <wrale> bonnie++ is a good tool for this
[16:08] <mozg> yeah
[16:09] <mozg> or you could use something like phoronix-test-suite with pts/disk benchmark set
[16:09] <mozg> you can get a more reliable output compared to just using dd
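A sketch of the kind of cache-busting run being described, assuming a 20GB working set and an RBD-backed filesystem mounted at /mnt/test:

    # 20GB of incompressible data written with the page cache bypassed
    dd if=/dev/urandom of=/mnt/test/random.bin bs=1M count=20480 oflag=direct
    # bonnie++ sized well past the guest's RAM, skipping the small-file phase
    bonnie++ -d /mnt/test -s 20g -n 0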
[16:10] <mozg> ChoppingBrocoli, i've done a great deal of testing using cache=none and cache=writeback and i've not really seen a show stopper performance difference to be honest
[16:10] <mozg> so, i've decided not to bother with this feature until it will be officially released by cloudstack
[16:10] <mozg> ChoppingBrocoli, i am sure it is not a problem to find this setting in the source code and change it and recompile CS
[16:11] <mozg> but it's honestly too much trouble for me
[16:11] <mozg> as i've not seen a great level of performance boost
[16:11] * ircolle (~Adium@75-145-122-2-Colorado.hfc.comcastbusiness.net) has joined #ceph
[16:13] * yanzheng (~zhyan@101.83.50.133) Quit (Ping timeout: 480 seconds)
[16:14] * tziOm (~bjornar@194.19.106.242) Quit (Remote host closed the connection)
[16:21] * sagelap (~sage@2600:1012:b020:3644:a1d8:6962:87a3:483b) has joined #ceph
[16:22] * __jt__ (~james@rhyolite.bx.mathcs.emory.edu) Quit (Quit: leaving)
[16:23] * sagelap1 (~sage@2600:1012:b001:be5:1c6f:32ff:fb4:3e63) has joined #ceph
[16:29] * sagelap (~sage@2600:1012:b020:3644:a1d8:6962:87a3:483b) Quit (Ping timeout: 480 seconds)
[16:32] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) has joined #ceph
[16:38] * sprachgenerator (~sprachgen@130.202.135.201) has joined #ceph
[16:39] * jmlowe (~Adium@2601:d:a800:97:34ed:df80:912:bb08) has joined #ceph
[16:40] <jmlowe> This doesn't seem right to me, any hints?
[16:40] <jmlowe> ceph pg dump |grep peering
[16:40] <jmlowe> dumped all in format plain
[16:40] <jmlowe> 1.35c 189 0 0 0 419682 996 996 remapped+peering 2013-09-03 10:28:50.256797 7812'3001 12289:11550 [22,18] [8,3,14] 7812'3001 2013-09-02 18:28:09.769116 7812'3001 2013-08-30 18:20:59.768718
[16:40] <jmlowe> 0.35d 1273 0 0 0 128605883 995 995 remapped+peering 2013-09-03 10:28:50.256155 7812'2524 12289:13834 [22,18] [8,3,14] 7812'2530 2013-09-02 16:37:49.131079 7812'2530 2013-08-30 16:36:44.870149
[16:40] * grepory (~Adium@c-69-181-42-170.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:43] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) has joined #ceph
[16:43] <jmlowe> nm, they finally cleared
[16:45] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) has left #ceph
[16:48] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:48] <yo61> So, what's the best/easiest strategy to monitor/observe a ceph cluster?
[16:56] * sagelap1 (~sage@2600:1012:b001:be5:1c6f:32ff:fb4:3e63) Quit (Read error: Connection reset by peer)
[16:56] <topro> anyone knows whats holding back 0.67.3? it has some important fixes I'm waiting for. does it have some serious known showstoppers over 0.67.2 or has just time not come for it to be released?
[16:59] * roald (~oftc-webi@87.209.150.214) Quit (Remote host closed the connection)
[17:00] <jmlowe> topro: what are your showstoppers, I just upgraded to dumpling a couple of hours ago
[17:00] <mikedawson> yo61: That is a question that is tough to answer. "ceph -w" is the easiest, but it will not give you much insight.
[17:01] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:01] <mikedawson> yo61: I'm a few months in to the process of building a pretty good system out of Sensu, Graphite, and a bunch of custom glue
[17:02] * mattt (~mattt@94.236.7.190) has joined #ceph
[17:03] <mikedawson> topro, jmlowe: if you have any performance issues with 0.67.2, I have strong evidence that wip-dumpling-perf2 is better. It should be included on 0.67.3, but if you need something sooner...
[17:04] * DarkAceZ (~BillyMays@50.107.55.36) Quit (Ping timeout: 480 seconds)
[17:07] * JustEra (~JustEra@89.234.148.11) Quit (Quit: This computer has gone to sleep)
[17:07] <jmlowe> mikedawson: I just added 6 more osd's, so far so good on load during backfill and recovery with dumpling
[17:08] <mikedawson> jmlowe: you must not be writing enough :-)
[17:09] <topro> jmlowe: OSDs performance, using lots of CPU and some MDS stuff (like rejoin issue)
[17:09] <jmlowe> mikedawson: probably not, the new osd's are 32 core monsters
[17:09] <jmlowe> mikedawson: with only 3 osd's per node
[17:10] <yo61> mikedawson: what are you monitoring?
[17:10] <topro> mikedawson: I could install deb packages from gitbuild if its in there. I'll have a look at wip-dumpling-perf2 branch
[17:13] <yo61> Hmm, HEALTH_WARN
[17:14] * DarkAceZ (~BillyMays@50.107.55.36) has joined #ceph
[17:15] <mikedawson> jmlowe: it's not about CPU, RAM, or network, just a lack of prioritizing client i/o when there is spindle contention due to Ceph getting aggressive (backfill, recover, or scrub/deep-scrub)
[17:17] <mikedawson> yo61: we collect cpu, memory, network, and disk metrics for our hosts and we're starting to collect info on our guests and info from the ceph admin sockets
[17:18] <yo61> Ah, OK. I was meaning ceph-specific
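On the ceph-specific side, a sketch of the sort of data those admin sockets expose, assuming default socket paths:

    # cluster-wide health and capacity summaries, easy to parse and graph
    ceph status
    ceph health detail
    # per-daemon internal counters as JSON, here for osd.0
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump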
[17:19] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:21] <yo61> Am liking it so far, but am very aware that I'm not entirely sure what's going on, how to check status, etc. etc.
[17:21] <PerlStalker> I need some help. I've had an OSD go down due to disk failure.
[17:22] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:22] <PerlStalker> And now I'm getting a whole stack of OSD "marked me out" messages on different OSD on different hosts.
[17:23] <PerlStalker> How to get the cluster to settle down?
[17:24] * tryggvil (~tryggvil@178.19.53.254) Quit (Quit: tryggvil)
[17:25] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:26] <PerlStalker> Is removing the failed OSD enough?
[17:27] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:28] <jmlowe> mikedawson: right, this new osd machine is serious overkill for 3 osd's as you point out it's i/o contention, we bought it bigger to do acceptance testing on our new 6PB lustre fs
[17:28] <yo61> PerlStalker: Does this help? http://ceph.com/w/index.php?title=Replacing_a_failed_disk/OSD&redirect=no
[17:29] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[17:29] <PerlStalker> yo61: No because I don't have a replacement disk yet.
[17:29] <PerlStalker> Right now, it looks like the cluster is thrashing.
[17:30] <PerlStalker> OSDs in this one pool keep going up and down.
[17:30] <yo61> If it were me (and bear in mind I don't know ceph all that well) I'd remove the OSD
[17:30] <mikedawson> PerlStalker: These are the types of problems I see under heavy load and spindle contention. A few strategies... 1) grin and bear it, 2) stop client i/o, 3) sometimes restarting OSDs can help clear the old requests, 4) deprioritize recovery
[17:30] * vata (~vata@2607:fad8:4:6:69c2:4273:75dd:943c) has joined #ceph
[17:31] <mikedawson> PerlStalker: for flapping OSDs.... 'ceph osd set nodown'
[17:31] <mikedawson> PerlStalker: after the cluster recovers, do a 'ceph osd unset nodown'
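For the dead OSD itself, the usual removal sequence is roughly the following (the id 12 is an assumption):

    # keep the cluster from flapping while it recovers
    ceph osd set nodown
    # take the failed OSD out of data placement and remove every trace of it
    ceph osd out 12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    # once recovery settles
    ceph osd unset nodown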
[17:31] <mikedawson> jmlowe: Data Capacitor II?
[17:32] <yo61> So, my cluster is showing HEALTH_WARN
[17:32] <yo61> And I can't figure out why
[17:32] <mikedawson> yo61: 'ceph health detail'
[17:32] <yo61> Ah, low disk space
[17:33] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[17:33] <yo61> ...which is because this is a PoC cluster on non-optimal hardware
[17:33] <yo61> root FS is on a 2GB flash drive
[17:33] <jmlowe> mikedawson: that's the one
[17:34] <jmlowe> mikedawson: we are mounting it and putting it into production today
[17:34] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[17:35] <mikedawson> jmlowe: fun!
[17:36] <yo61> So, if I were deploying a production cluster, would I create an OSD per raw disk? ie. multiple OSDs per machine?
[17:38] <janos> yo61: that's the typical (and i think intended) set up
[17:38] <yo61> Are there limits/best practices?
[17:39] <janos> there are tradeoffs like any system. in this case you can get speed gains by putting the OSD journals onto an SSD. you gain speeds, but one failed SSD can take down multiple OSD's, etc
[17:39] <yo61> eg. if I have 24 drives in an external chassis. I guess it depends on the bandwidth from memory → disk
[17:39] <janos> and you probably don't want to lump 48 OSD's into one host. I/O could be a bear and the default failure domain is host, not osd
[17:40] <yo61> Right
[17:40] <jmlowe> mikedawson: I'm taking a vacation day tomorrow
[17:40] <yo61> So, smaller hosts with fewer drives is prob. the optimal config
[17:40] <joao> yo61, you may find this useful: http://ceph.com/docs/next/install/hardware-recommendations/
[17:41] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[17:41] <yo61> k, thxs
[17:41] * freedomhui (~freedomhu@117.79.232.211) has joined #ceph
[17:44] * Bada (~Bada@195.65.225.142) Quit (Ping timeout: 480 seconds)
[17:48] * alram (~alram@38.122.20.226) has joined #ceph
[17:54] * Steki (~steki@198.199.65.141) has joined #ceph
[17:55] <jmlowe> http://pastebin.com/uHqN6qRq
[17:55] <jmlowe> I've got a osd bug I think
[17:57] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Ping timeout: 480 seconds)
[17:58] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:59] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[18:03] * mattt (~mattt@94.236.7.190) Quit (Quit: Computer has gone to sleep.)
[18:04] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[18:05] * freedomhui (~freedomhu@117.79.232.211) Quit (Quit: Leaving...)
[18:06] * markbby (~Adium@168.94.245.1) has joined #ceph
[18:06] <jmlowe> sjust: you around?
[18:07] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[18:09] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[18:12] * dosaboy (~dosaboy@host81-155-236-130.range81-155.btcentralplus.com) Quit (Quit: leaving)
[18:13] * dosaboy (~dosaboy@host81-155-236-130.range81-155.btcentralplus.com) has joined #ceph
[18:17] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[18:18] * xarses (~andreww@204.11.231.50.static.etheric.net) has joined #ceph
[18:19] * ChoppingBrocoli (~quassel@rrcs-74-218-204-10.central.biz.rr.com) Quit (Remote host closed the connection)
[18:19] * angdraug (~angdraug@204.11.231.50.static.etheric.net) has joined #ceph
[18:22] * xarses (~andreww@204.11.231.50.static.etheric.net) Quit (Remote host closed the connection)
[18:22] * xarses (~andreww@204.11.231.50.static.etheric.net) has joined #ceph
[18:22] * xarses (~andreww@204.11.231.50.static.etheric.net) Quit (Remote host closed the connection)
[18:22] * xarses (~andreww@204.11.231.50.static.etheric.net) has joined #ceph
[18:30] * Cube (~Cube@12.248.40.138) has joined #ceph
[18:31] * tnt (~tnt@91.176.35.47) has joined #ceph
[18:31] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[18:32] * gregaf (~Adium@2607:f298:a:607:c501:9f75:49ae:ffe5) has joined #ceph
[18:33] * elder_ (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[18:35] * Steki (~steki@198.199.65.141) Quit (Ping timeout: 480 seconds)
[18:36] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[18:36] <gregaf> sagewk: wip-6179 looks good to me!
[18:37] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[18:37] <sagewk> gregaf: ok cool. still one failure, trying to reproduce it with logs
[18:38] * gregmark (~Adium@68.87.42.115) has joined #ceph
[18:38] <gregaf> still one failure on the client side?
[18:38] <sagewk> yeah, the osd returned the expected version + 1 but i didn't have an osd log to see why
[18:38] * aliguori (~anthony@cpe-70-112-153-179.austin.res.rr.com) has joined #ceph
[18:38] <gregaf> hrm
[18:39] * marrusl (~mark@235.sub-70-208-73.myvzw.com) has joined #ceph
[18:39] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[18:39] <gregaf> should probably merge in this patch so we don't have to sort through any more of those failures though :)
[18:39] * gregaf1 (~Adium@2607:f298:a:607:94a1:e5b4:fc26:deb5) Quit (Ping timeout: 480 seconds)
[18:43] <sagewk> k
[18:46] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[18:47] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[18:47] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[18:52] * gregaf (~Adium@2607:f298:a:607:c501:9f75:49ae:ffe5) Quit (Quit: Leaving.)
[18:53] * gregaf (~Adium@2607:f298:a:607:c501:9f75:49ae:ffe5) has joined #ceph
[18:59] * shang (~ShangWu@122-116-16-162.HINET-IP.hinet.net) has joined #ceph
[19:01] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:02] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:02] * marrusl (~mark@235.sub-70-208-73.myvzw.com) Quit (Remote host closed the connection)
[19:04] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[19:05] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[19:07] * ScOut3R (~scout3r@BC2484D1.dsl.pool.telekom.hu) has joined #ceph
[19:08] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[19:09] * cydizen (~cydizen@ip98-177-171-174.ph.ph.cox.net) has joined #ceph
[19:11] * ScOut3R (~scout3r@BC2484D1.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[19:11] * ScOut3R (~scout3r@BC2484D1.dsl.pool.telekom.hu) has joined #ceph
[19:16] * Cube (~Cube@12.248.40.138) Quit (Read error: Connection reset by peer)
[19:16] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:16] * angdraug (~angdraug@204.11.231.50.static.etheric.net) Quit (Quit: Leaving)
[19:17] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:19] * angdraug (~angdraug@204.11.231.50.static.etheric.net) has joined #ceph
[19:19] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[19:24] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[19:24] * markgthomas (~anonymous@ip72-223-109-105.ph.ph.cox.net) has joined #ceph
[19:24] * markgthomas (~anonymous@ip72-223-109-105.ph.ph.cox.net) has left #ceph
[19:26] * cydizen (~cydizen@ip98-177-171-174.ph.ph.cox.net) has left #ceph
[19:26] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[19:26] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:32] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[19:32] <gregaf> sagewk: sorry I didn't notice those other patches before I said wip-6179 looked good :(
[19:33] * libsysguy (~libsysguy@2620:0:28a0:2004:1d35:2956:2d2e:7919) has joined #ceph
[19:33] <sagewk> np as long as they look good now :)
[19:33] <gregaf> anyway, the pg log entry version thing looks good to me and could definitely have been a source of issues, but we should run it by sjust because I actually swapped it to the current incarnation when he complained about having to remember that 0 was special
[19:35] <sjust> sagewk: I somewhat wanted pglog.head.user_version == info.user_version to always hold
[19:36] <libsysguy> I was reading in the documentation that ceph was eventually going to support async write capability through the RGW but in the mean time what is the best way to sync over a WAN
[19:36] <libsysguy> I was thinking that you would have to set up a cluster for each datacenter and then somehow sync them outside of ceph, maybe with a queueing daemon
[19:37] * rturk-away is now known as rturk
[19:37] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[19:39] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:41] * sleinen1 (~Adium@2001:620:0:46:ddc4:3710:410f:3fe4) Quit (Ping timeout: 480 seconds)
[19:42] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit ()
[19:43] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[19:45] * AndroUser (~androirc@med2736d0.tmodns.net) has joined #ceph
[19:47] * davidzlap (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:50] * AndroUser (~androirc@med2736d0.tmodns.net) Quit ()
[19:51] <sagewk> paravoid: did you try the fastcgi library upgrade that yehudasa recommended?
[19:53] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[19:55] * xmltok (~xmltok@pool101.bizrate.com) Quit (Remote host closed the connection)
[19:56] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[19:56] * dmick (~dmick@2607:f298:a:607:a55d:1154:5b5f:8288) has joined #ceph
[19:56] <gregaf> sagewk: two small changes I think we need to make: http://fpaste.org/36808/37823099/
[19:57] <sagewk> gregaf: looks good to me; go ahead and push?
[19:57] <gregaf> following your changes I believe we were returning 0 for the version on ENOENT ops, which could break higher-level functioning of that, and returning the old object user version instead of the pg user version for cls_current_version would have similarly-bad effects
[19:57] <gregaf> sweet, will do
[19:58] <gregaf> thanks!
[19:58] <sagewk> sadly i don't think that explains what i saw.. still trying to reproduce. :/
[20:00] <gregaf> sagewk: which assert was it, in what test?
[20:00] <gregaf> s/assert/assert or issue/
[20:00] <sagewk> iirc the read saw version X expected X-1
[20:00] <sagewk> ceph_test_rados + osd thrashing
[20:01] <gregaf> hrm, was it actually running with that whole chain of fixes? :)
[20:01] <sagewk> yeah
[20:01] <gregaf> hmm
[20:02] <sagewk> e.g., oid: 37 version is 93 and expected 91
[20:02] <sagewk> ubuntu@teuthology:/a/sage-2013-09-02_18:00:49-rados-wip-6179-testing-basic-plana/17655
[20:03] <gregaf> that's the same difference as I was seeing in the test I checked out on Friday :/
[20:03] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[20:03] * shang (~ShangWu@122-116-16-162.HINET-IP.hinet.net) Quit (Remote host closed the connection)
[20:03] <sagewk> was definite wip-6179 tho
[20:04] * sleinen1 (~Adium@2001:620:0:26:7c50:6ff7:2d6:f96f) has joined #ceph
[20:04] <jmlowe> I have a question about noout, if I have a down osd that was marked out by the cluster and I set noout to do some other work, what will happen?
[20:10] <mtanski> sage / great
[20:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[20:12] <mtanski> Sage / greg: in the kernel client in start_read (called from read pages). Is there any reason that we have to worry about freeing pages ourselves when things go wrong versus sticking them back in the pagelist that the caller will clean up.
[20:13] <mtanski> Since right now what happens if adding the page to the LRU fails we back out, free that page by hand and then free the pages in the page vec.
[20:13] * sleinen1 (~Adium@2001:620:0:26:7c50:6ff7:2d6:f96f) Quit (Quit: Leaving.)
[20:13] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[20:13] <mtanski> I'm fixing a subtle bug in fscache there, and it kind of caught me off guard due to the many different places the pages get cleaned up.
[20:16] * julienhuang (~julienhua@alf94-9-82-239-211-171.fbx.proxad.net) has joined #ceph
[20:16] * alfredodeza is now known as alfredo|lunching
[20:18] <gregaf> sagewk: ^ ?
[20:18] <sagewk> e
[20:18] <gregaf> I'm afraid I have no idea about the kclient unless I go in, audit the code, and look up the kernel interfaces ;)
[20:19] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) has joined #ceph
[20:19] <sagewk> mtanski: don't remember if there was a reason i did it that way or just thought it was necessary
[20:20] <mtanski> was hoping for some magical memory I guess :)
[20:20] <sagewk> i was probably just trying to make it match up with what the other paths into osd_client were doing
[20:21] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[20:23] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) Quit ()
[20:24] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) has joined #ceph
[20:24] * ircolle (~Adium@75-145-122-2-Colorado.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[20:29] <mtanski> alright, thanks
[20:31] * haomaiwa_ (~haomaiwan@124.161.72.254) Quit (Ping timeout: 480 seconds)
[20:33] * libsysguy (~libsysguy@2620:0:28a0:2004:1d35:2956:2d2e:7919) Quit (Quit: Leaving.)
[20:36] * gaveen (~gaveen@175.157.69.118) has joined #ceph
[20:41] * nwat (~nwat@eduroam-237-79.ucsc.edu) has joined #ceph
[20:45] * rturk is now known as rturk-away
[20:46] * tobru__ (~quassel@217-162-50-53.dynamic.hispeed.ch) has joined #ceph
[20:51] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[20:58] * julienhuang (~julienhua@alf94-9-82-239-211-171.fbx.proxad.net) Quit (Quit: julienhuang)
[21:02] * julienhuang (~julienhua@alf94-9-82-239-211-171.fbx.proxad.net) has joined #ceph
[21:13] * alfredo|lunching is now known as alfredodeza
[21:22] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[21:23] <mikedawson> jamespage: ping
[21:24] <jamespage> mikedawson, hey
[21:25] <mikedawson> jamespage: do you maintain the apparmor config for libvirt? I have a simple patch to enable RBD admin sockets
[21:25] <jamespage> mikedawson, I don't - but I can point someone who does at it
[21:26] <jamespage> mikedawson, can you raise a bug report on launchpad and attach the patch? I'll get hallyn or zul to pick it up
[21:26] <jamespage> mikedawson, or pastebinit or something :-)
[21:28] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[21:29] <mikedawson> jamespage: This is working for me http://pastebin.com/ARV5FPGu
[21:34] <sagewk> sjust: https://github.com/liewegas/ceph/commit/fecf7f02956d0931afd58c00bff8fa57b65dfcce#L0R1442
[21:34] <sagewk> what should i be doing there to give the target the src's content generator?
[21:35] <jamespage> mikedawson, what does this fix/enable? I think I'm being dumb?
[21:35] <jamespage> we already have /etc/ceph/ceph.conf r (I got that added a couple of cycles ago)
[21:36] <dmick> when running under libvirt, librbd needs to be able to log to the standard place (/var/log/ceph) and to be able to open admin sockets (for control and debugging) in /var/run/ceph
[21:36] <jamespage> dmick, ack
[21:36] <mikedawson> jamespage: 1) the ability to debug RBD volumes via the ceph admin socket (typically the mon and osd admin sockets are in /var/run/ceph), and 2) rbd logging in /var/log/ceph
[21:37] * Vjarjadian (~IceChat77@176.254.37.210) has joined #ceph
[21:38] <mikedawson> jamespage: not sure my syntax is perfect (as I'm new to apparmor), but this is at least an effective starting point
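(The pastebin itself is not reproduced in this log. For context, the change being discussed amounts to rules of roughly this shape in the libvirt-qemu apparmor abstraction, per dmick's description above; the paths and syntax here are illustrative only, not the submitted patch:)

    # allow librbd inside qemu to write its logs and create admin sockets
      /var/log/ceph/ rw,
      /var/log/ceph/** rw,
      /var/run/ceph/ rw,
      /var/run/ceph/** rw,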
[21:38] <jamespage> mikedawson, re 1) that only applies if the instance is running on the same server as osd or mon right?
[21:39] <mikedawson> jamespage: no, this can apply to a compute host without any mon or osds
[21:39] <jamespage> mikedawson, ok
[21:40] <jamespage> mikedawson, OK - I passed it to a libvirt guru
[21:40] <jamespage> cheers
[21:41] <mikedawson> jamespage: thx
[21:43] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[21:45] * rturk-away is now known as rturk
[21:52] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[21:52] * julienhuang (~julienhua@alf94-9-82-239-211-171.fbx.proxad.net) Quit (Quit: julienhuang)
[22:00] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) has joined #ceph
[22:05] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[22:05] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[22:06] * carif (~mcarifio@pool-96-233-32-122.bstnma.fios.verizon.net) has joined #ceph
[22:10] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[22:13] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[22:29] * Steki (~steki@198.199.65.141) has joined #ceph
[22:32] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Read error: Operation timed out)
[22:34] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[22:38] * tobru__ (~quassel@217-162-50-53.dynamic.hispeed.ch) Quit (Quit: No Ping reply in 180 seconds.)
[22:38] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) has joined #ceph
[22:38] * imjustmatthew_ (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[22:43] * tobru_ (~quassel@217-162-50-53.dynamic.hispeed.ch) Quit (Remote host closed the connection)
[22:49] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[22:49] <bstillwell> How can I increase the maximum number of buckets with rgw?
[22:49] <bstillwell> It appears to be set at 1,000 currently
[22:50] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[22:50] * ChanServ sets mode +v andreask
[22:51] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:52] <bstillwell> google searches aren't pulling up much for me
[22:52] * Steki (~steki@198.199.65.141) Quit (Ping timeout: 480 seconds)
[22:56] * allsystemsarego (~allsystem@188.27.167.103) Quit (Quit: Leaving)
[22:56] <dmick> bstillwell: you can set max-buckets per user when creating a user via the CLI or REST API, it looks like
[22:56] <dmick> --max-buckets in the CLI
[22:57] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[22:57] <dmick> max-buckets parameter in the REST API http://ceph.com/docs/master/radosgw/adminops/#create-user
[22:58] * sleinen1 (~Adium@2001:620:0:25:7c2c:7154:96aa:4930) has joined #ceph
[22:58] <bstillwell> dmick: what about modifying an existing user?
[22:58] <bstillwell> and is there a way to say no limit?
[22:58] <dmick> dunno
[22:59] <dmick> it does seem like, on that very same URL I quoted, if I search for 'max-buckets' (the parameter I mentioned), it also appears under MODIFY USER
[22:59] <dmick> so I'm guessing maybe?..
[23:01] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[23:01] <bstillwell> dmick: thanks, I'll give it a try!
[23:02] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[23:02] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit ()
[23:02] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[23:03] <sagewk> sjust: https://github.com/ceph/ceph/commit/960dd3c8480754effe1ca98a6c13e63665463b20
[23:03] <dmick> yehudasa: yehudasa_: looks like http://ceph.com/docs/master/man/8/radosgw-admin/ is somewhat out of date?...
[23:04] <sjust> sagewk: yep
[23:04] <sagewk> k thanks
[23:05] <bstillwell> dmick: This seems to be the command I was looking for:
[23:05] <bstillwell> radosgw-admin user modify --uid=bstillwell --max-buckets=1000000
[23:05] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:06] <yehudasa_> bstillwell: or just --max-buckets=0
[23:06] <yehudasa_> dmick: yeah, it's been a while since I updated it
[23:06] <dmick> yehudasa_: there should probably be an issue open at least
[23:06] * JustEra (~JustEra@ALille-555-1-227-212.w86-215.abo.wanadoo.fr) has joined #ceph
[23:07] <bstillwell> yehudasa_: thanks!
[23:07] <yehudasa_> dmick: that was part of the papercuts task, but never got to it
[23:08] <yehudasa_> I'll open a task now
[23:09] <bstillwell> is there a quick way to delete a bucket and all the objects associated with it?
[23:11] <bstillwell> The python plugin doesn't support forcing a bucket removal, and both the ruby and php plugins failed for me when using the force option.
[23:11] <yehudasa_> bstillwell: not quick, but you can try 'radosgw-admin bucket rm --purge-data'
[23:11] <yehudasa_> (--bucket=<bucket>)
[23:12] <bstillwell> yehudasa_: thanks!
[23:13] <bstillwell> that seems quicker than the way I was doing it
[23:14] <yehudasa_> bstillwell: it should be quicker, but it's doing it object-by-object
[23:14] <bstillwell> yehudasa_: I would expect it to take some time. I have 1.5 million objects in that bucket...
[23:15] <PerlStalker> How does one decrease the priority of a cluster rebuild?
[23:16] <PerlStalker> I repaired a failed OSD from this morning. It's rebuilding and absolutely killing preformance.
[23:16] <PerlStalker> s/preformance/performance/
[23:17] * Xenith (~xenith@0001bcd9.user.oftc.net) has joined #ceph
[23:17] <lurbs> PerlStalker: I believe it's done via various config options as per http://ceph.com/docs/master/rados/configuration/osd-config-ref/#recovery
[23:21] <PerlStalker> The next question: Is it possible to set those live?
[23:23] <mikedawson> joshd: Finally got rbd admin sockets working, now I'm going to start collecting perf dump metrics to correlate them with guest performance issues. Any hint on what metrics are important?
[23:24] * Xenith (~xenith@0001bcd9.user.oftc.net) has left #ceph
[23:25] <lurbs> PerlStalker: Yes, via the admin sockets.
[23:25] <mikedawson> joshd: my theory is my issues crop up anytime Ceph rebalances, backfills, scrubs, or deep-scrubs and I see spindle contention. I believe RBD isn't being prioritized relative to the ceph maintenance
[23:28] <lurbs> PerlStalker: For example: ceph tell osd.* injectargs "--osd_recovery_max_active 5"
[23:28] <lurbs> That's a potentially bogus value, BTW.
[23:29] <dmick> lurbs: that's not the admin socket, but it should work
[23:29] <dmick> except you probably need a '--' before the last arg
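(Putting lurbs' example and dmick's note together, the live-injection form would look something like the following; the values are illustrative, and osd_max_backfills is another knob from the same recovery/backfill documentation page:)

    ceph tell osd.* injectargs -- '--osd_recovery_max_active 1'
    ceph tell osd.* injectargs -- '--osd_max_backfills 1'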
[23:29] <lurbs> dmick: Yeah, I spotted that the admin sockets were largely deprecated when I saw 'ceph tell'.
[23:30] <lurbs> I'd not used it before.
[23:32] <dmick> still good stuff in the admin sockets (stats, etc.) but that's not where you typically do the config tweaking
[23:32] * vata (~vata@2607:fad8:4:6:69c2:4273:75dd:943c) Quit (Quit: Leaving.)
[23:33] <dmick> it's still a good place to dump the entire config for instance
[23:33] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[23:34] <lurbs> PerlStalker: Could also potentially be another bottleneck, of course. Network saturation across your cluster network for example.
[23:35] <PerlStalker> lurbs: Could be.
[23:37] <joshd> mikedawson: on the client side, large increases in aio_{read,write,flush} total latencies would be good to watch. on the osd side, I think there are some client request queue length measurements in there
[23:39] <mikedawson> joshd: Thx. Are those latency measurements in milliseconds? http://pastebin.com/raw.php?i=e0Q6amWG
[23:40] <joshd> I think so
[23:40] <mikedawson> joshd: does 6555ms seem high?
[23:41] <joshd> on the osd side you can also use the admin socket dump_historic_ops command to see where recent slow requests were held up
[23:42] <mikedawson> joshd: I assume dump_historic_ops is for that OSD, right? So we'd need to coalesce dump_historic_ops from all OSDs to get a full view.
[23:42] <joshd> mikedawson: yeah, it's just for that one osd
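(For reference, that per-OSD query looks something like the following, run against each OSD's admin socket in turn; the socket path assumes the default layout:)

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops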
[23:43] <joshd> mikedawson: and 6555 does seem high, but maybe it's not ms
[23:45] <mikedawson> joshd: aio_wr_latency has avgcount which seems like a rolling average, and also sum. Any idea what sum represents?
[23:46] <joshd> mikedawson: avgcount is actually the total number of writes, and sum is the total latency (it looks like it's in ns actually)
[23:47] <mikedawson> joshd: "aio_wr_latency": { "avgcount": 6894, "sum": 2.018846000}
[23:47] <dmick> it's a little less useful than you might hope, as it's basically "the quotient is an average for all time"
[23:48] <mikedawson> joshd: so we've written 6894 times at an average 2ns latency?
[23:49] <dmick> no. 2.01s / 6894 is the average latency for all time
[23:49] <dmick> take two snapshots, form deltas, and divide for a currentish average
[23:49] <joshd> mikedawson: ah, it's dumped in seconds
[23:50] <mikedawson> dmick: gotcha, I'm looking for a raw counter of i/o latency that I can periodically sample, then apply a time-adjusted derivative when graphing it
[23:50] <joshd> mikedawson: just stored internally with ns granularity
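(A small sketch of the sample-and-diff approach dmick describes: poll the client's admin socket, diff the counters, and divide. The socket path is a placeholder; the aio_wr_latency counter is located by name rather than by assuming a particular section, and sum is treated as seconds per the above:)

    # Sketch: recent average aio write latency from two perf dump samples.
    import json
    import subprocess
    import time

    ASOK = '/var/run/ceph/rbd-client.asok'  # placeholder admin socket path

    def wr_latency_counters():
        out = subprocess.check_output(
            ['ceph', '--admin-daemon', ASOK, 'perf', 'dump'])
        dump = json.loads(out)
        # Find whichever perf dump section carries the librbd aio_wr_latency counter.
        for section in dump.values():
            if isinstance(section, dict) and 'aio_wr_latency' in section:
                lat = section['aio_wr_latency']
                return lat['avgcount'], lat['sum']
        raise RuntimeError('aio_wr_latency not found in perf dump')

    count1, sum1 = wr_latency_counters()
    time.sleep(60)
    count2, sum2 = wr_latency_counters()

    if count2 > count1:
        avg_s = (sum2 - sum1) / (count2 - count1)  # sum is in seconds
        print('average write latency over the interval: %.3f ms' % (avg_s * 1000))
    else:
        print('no writes during the interval')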
[23:52] * BillK (~BillK-OFT@203-59-161-9.dyn.iinet.net.au) has joined #ceph
[23:52] * ScOut3R (~scout3r@BC2484D1.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[23:53] <mikedawson> so to contrast the read latency above, I have "aio_rd_latency": { "avgcount": 400, "sum": 6.424957000}. As expected the reads are much slower. Yeah RBD writeback cache!
[23:55] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[23:56] <joshd> hooray! the downside is the perf dump info from the cache isn't that useful in this case
[23:57] <joshd> you might see small spikes when it's flushed, but much larger ones when recovery is going on like you suggested
[23:58] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[23:59] * sprachgenerator (~sprachgen@130.202.135.201) Quit (Quit: sprachgenerator)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.