#ceph IRC Log


IRC Log for 2013-01-29

Timestamps are in GMT/BST.

[0:01] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[0:02] * rtek (~sjaak@empfindlichkeit.nl) Quit (Read error: Connection reset by peer)
[0:02] * rtek (~sjaak@empfindlichkeit.nl) has joined #ceph
[0:05] * PerlStalker (~PerlStalk@ Quit (Remote host closed the connection)
[0:08] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[0:08] * JohansGlock___ (~quassel@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[0:08] * PerlStalker (~PerlStalk@ has joined #ceph
[0:15] * ScOut3R (~ScOut3R@2E6BA0D4.dsl.pool.telekom.hu) has joined #ceph
[0:21] * sagelap (~sage@ has joined #ceph
[0:33] * dosaboy (~user1@host86-164-229-186.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[0:33] * jscharf (~scharf@c-76-21-1-236.hsd1.ca.comcast.net) has joined #ceph
[0:34] * jscharf (~scharf@c-76-21-1-236.hsd1.ca.comcast.net) has left #ceph
[0:49] * xiaoxi (~xiaoxiche@ has joined #ceph
[0:51] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[1:06] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[1:07] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:10] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:10] <Kdecherf> I spotted a strange behavior on MDS/ceph-fuse: some folders disappear (but still exist, a restart of the active mds is enough)
[1:10] <slang1> Kdecherf: what's your application workload?
[1:11] <slang1> Kdecherf: it's not something you see with the cephfs kernel client?
[1:12] <Kdecherf> slang1: a "lot" of IO, some cp/rm, a global chmod and a ls
[1:12] <Kdecherf> slang1: the pool is small (7GB data, 340k objects)
[1:12] <Kdecherf> I didn't see it on the kernel client atm
[1:13] <slang1> Kdecherf: bobtail?
[1:14] <slang1> Kdecherf: did you try remounting the ceph fuse mountpoint (without the active mds restart) and still saw the same behavior?
[1:15] <slang1> Kdecherf: I can understand not wanting to do that (restarting the mds can sometimes be easier)
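The remount slang1 suggests, instead of restarting the active MDS, would look roughly like this (a sketch only; the mountpoint /mnt/ceph and the monitor address are assumptions, not from the log):

```shell
# Unmount and remount a ceph-fuse client without touching the MDS
# (paths and monitor address are placeholders; adjust to your cluster)
fusermount -u /mnt/ceph                        # detach the FUSE mount
ceph-fuse -m mon1.example.com:6789 /mnt/ceph   # mount again via a monitor
```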
[1:22] * JohansGlock (~quassel@kantoor.transip.nl) Quit (Read error: Connection reset by peer)
[1:22] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[1:31] * jlogan (~Thunderbi@2600:c00:3010:1:5c67:1323:2b43:da43) Quit (Ping timeout: 480 seconds)
[1:32] <Kdecherf> slang1: I will try next time
[1:32] <Kdecherf> and yes, it's bobtail
[1:32] <Kdecherf> other clients on our network use the kernel client (3.7.0+)
[1:33] <jmlowe> sagelap: I noticed there were a large number of btrfs changes in the 3.8 rc's, I'm going to try to reproduce 3810 with 3.8rc5
[1:40] * ScOut3R (~ScOut3R@2E6BA0D4.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[1:41] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:54] <sagelap> ok cool
[1:54] <sagelap> jmlowe: ^
[1:57] <jmlowe> if I can get the damn thing to reboot
[1:58] * xiaoxi (~xiaoxiche@ Quit ()
[2:12] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[2:20] * LeaChim (~LeaChim@b01bd420.bb.sky.com) Quit (Read error: Connection reset by peer)
[2:23] * wschulze1 (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:27] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[2:31] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[2:47] * BManojlovic (~steki@46-172-222-85.adsl.verat.net) Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:57] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Remote host closed the connection)
[2:57] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[3:13] * sagelap (~sage@ has joined #ceph
[3:25] * JohansGlock (~quassel@kantoor.transip.nl) Quit (Read error: Connection reset by peer)
[3:25] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[3:26] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[3:37] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[3:43] * Cube1 (~Cube@ Quit (Ping timeout: 480 seconds)
[3:44] * xiaoxi (~xiaoxiche@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[3:45] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[3:46] <xiaoxi> Hi, my ceph cluster cannot push my disks to 100% utilization; since I have enough clients (128~480 rbds), I suppose the disks should be 100% busy..
[3:54] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[3:54] <phantomcircuit> xiaoxi, how many disks and how are you measuring disk utilization
[3:55] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[4:02] <dmick> xiaoxi: there are many places for bottlenecks; without discussing the configuration in detail it's difficult to discuss performance
[4:28] <paravoid> sagelap: hey
[4:29] <paravoid> so, any tips for optimizing ceph for small object sizes?
[4:29] <paravoid> performance is quite sucky
[4:34] <paravoid> rados bench for 4MB files is ~107MB/s, for 1MB ~60MB/s, for 512K ~30MB/s, for 256K ~20MB/s for 64K ~5MB/s, for 4K ~0.3MB/s
[4:35] <dmick> paravoid: do lots of them at the same time? :)
[4:36] <phantomcircuit> dmick, actually that doesn't seem to help
[4:37] <phantomcircuit> paravoid, put flashcache in writeback mode in between the journal and filestore...
[4:37] <paravoid> erm
[4:37] <paravoid> journal is raid0 SSDs and disks are behind a BBU
[4:38] <phantomcircuit> paravoid, try increasing the number of disk threads
[4:38] <paravoid> osd op threads?
[4:39] <phantomcircuit> yeah
[4:39] * nhm (~nh@184-97-251-146.mpls.qwest.net) Quit (Read error: Connection reset by peer)
[4:39] <paravoid> to what is your suggestion?
[4:39] * nhm (~nh@184-97-251-146.mpls.qwest.net) has joined #ceph
[4:39] <phantomcircuit> paravoid, it's hard to know what the limiting factor is without statistics for resource usage under load
[4:40] <phantomcircuit> so without them I'm basically just guessing
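Throughput figures like paravoid's are typically gathered with rados bench at several object sizes. A sketch of such a sweep, assuming a throwaway pool named "bench" already exists (the pool name, durations, and concurrency are assumptions):

```shell
# Write-throughput sweep across object sizes with rados bench
# (assumes a disposable pool named "bench"; -b is object size in bytes,
#  -t is concurrent operations, --no-cleanup keeps objects for later reads)
for size in 4194304 1048576 524288 65536 4096; do
    echo "== object size ${size} bytes =="
    rados bench -p bench 30 write -b "${size}" -t 16 --no-cleanup
done
# raising -t is the usual first lever when small-object throughput collapses
```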
[4:42] <xiaoxi> phantomcircuit:120 disks, I measure disk utilization by iostat
[4:42] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[4:42] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:43] <xiaoxi> dmick: Sorry for not enough info; I have 20 disks, 10Gb NICs, and 4 SSDs per node, with 6 nodes in total
[4:44] <xiaoxi> Ceph 0.56.4 (daily build, main branch), default ceph settings except ulimit -n 102400
[4:45] * scuttlemonkey (~scuttlemo@ has joined #ceph
[4:45] * ChanServ sets mode +o scuttlemonkey
[4:53] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[4:53] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.89 [Firefox 18.0.1/20130116073211])
[4:54] <elder> dmick, you still on?
[4:58] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[5:00] * sjustlaptop1 (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[5:00] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Connection reset by peer)
[5:02] <dmick> elder: yep
[5:04] <xiaoxi> dmick: <xiaoxi> dmick: Sorry for not enough info; I have 20 disks, 10Gb NICs, and 4 SSDs per node, with 6 nodes in total
[5:04] <xiaoxi> <xiaoxi> Ceph 0.56.4 (daily build, main branch), default ceph settings except ulimit -n 102400
[5:04] <dmick> elder: if you mean tracker, It Works For Me; I can reset passwords if that'll help
[5:04] <dmick> xiaoxi: so between 3 and 4 disks per node
[5:05] <xiaoxi> no,20disk per node
[5:05] <dmick> oh
[5:05] <xiaoxi> 120 in total
[5:05] <dmick> 1 10Gb NIC per node?
[5:05] <xiaoxi> yes
[5:06] <xiaoxi> 10Gb for data traffic and a 1Gb for monitor and management
[5:07] <phantomcircuit> xiaoxi, small io or large io?
[5:08] <xiaoxi> I have tested both; for 4K random IO on top of RBD, the backend disk utilization is significantly higher than for 64K sequential IO
[5:12] <dmick> so theoretical max for the NIC is, say, 1.25GB/s? let's say the disks could do 130MB/s sustained (I'm assuming they're 7200RPM)? For 20, that's slightly less than double what you can feed them with the NIC, right? Are my numbers working?
[5:12] * Pagefaulted (~AndChat73@c-67-168-132-228.hsd1.wa.comcast.net) Quit (Read error: Connection reset by peer)
[5:13] <dmick> (I suck at these sorts of estimations)
[5:14] <xiaoxi> dmick: yes, your numbers work, but the disks cannot do 130MB/s under RBD load; the typical bandwidth is ~30MB/s or less
[5:14] * chutzpah (~chutz@ Quit (Quit: Leaving)
[5:15] <dmick> you mean you're actually seeing 30MB
[5:15] <xiaoxi> yes
[5:16] <xiaoxi> Average: eth1 8808.95 146022.84 728.11 212562.37 0.00 0.00 0.00
[5:16] <xiaoxi> this is the statistics from sar of the 10Gb nic
[5:17] <xiaoxi> it's far from 100% utilized.
[5:17] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[5:17] <dmick> don't know what the sar columns are, but ok
[5:18] <xiaoxi> this is the sequential read test; txkB/s is 212562.37, or ~200MB/s = ~1.6Gb/s
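dmick's back-of-envelope above can be written out explicitly. All the numbers here are the conversation's assumptions (130MB/s sustained per 7200RPM disk, ~1.25GB/s usable on one 10GbE NIC), not measurements:

```shell
# Can 20 disks outrun one 10GbE NIC? (assumed figures from the discussion)
nic_mbs=1250          # one 10GbE NIC ~= 1.25 GB/s
disks=20              # disks per node
per_disk_mbs=130      # assumed sustained MB/s per 7200RPM disk
total=$((disks * per_disk_mbs))
echo "aggregate disk bandwidth: ${total} MB/s"
echo "NIC bandwidth:            ${nic_mbs} MB/s"
# 2600 vs 1250: the disks can absorb roughly twice what the NIC can feed them,
# so the NIC, not the spindles, is the first suspect for sequential workloads
```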
[5:26] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[5:35] * miroslav1 (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[5:36] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[5:41] * yoshi (~yoshi@p6124-ipngn1401marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[6:17] <Kdecherf> interesting, root can't read files without the o+r permission bit
[6:23] * dmick (~dmick@2607:f298:a:607:8996:57cd:5193:a7c0) Quit (Quit: Leaving.)
[6:39] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:43] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[6:49] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:04] * Pagefaulted (~AndChat73@c-67-168-132-228.hsd1.wa.comcast.net) has joined #ceph
[7:07] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[7:18] * sagelap (~sage@ has joined #ceph
[7:21] <xiaoxi> sagelap:hi~
[7:36] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[7:40] * sleinen (~Adium@2001:620:0:46:442d:b001:f46b:15b0) has joined #ceph
[7:48] * loicd (~loic@magenta.dachary.org) has joined #ceph
[7:52] * Pagefaulted (~AndChat73@c-67-168-132-228.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[7:57] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:17] * gucki (~smuxi@77-56-36-164.dclient.hispeed.ch) has joined #ceph
[8:17] <gucki> hi there
[8:17] <gucki> how can I increase the near full ratio of my osds? ceph injectargs '--mon_osd_nearfull_ratio 87' doesn't seem to work :(
[8:17] * sjustlaptop1 (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[8:18] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[8:20] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[8:23] <gucki> also ceph mon tell \* injectargs '--mon-osd-nearfull-ratio 86' does not work. the full osd is at 85%, so it should work?
[8:38] * ctrl (~ctrl@128-72-212-31.broadband.corbina.ru) has joined #ceph
[8:38] <ctrl> Hello everyone
[8:38] <nz_monkey> While learning the ins and outs of Ceph I have managed to do something stupid. I wanted to move a bunch of OSDs from spinners to SSDs; instead of marking the OSDs as down and removing them, I marked them as out, stopped the OSD processes, mounted the new devices at the same mountpoints, pushed the OSD keys onto them, and then started the OSD processes. As you can imagine, I now have a huge number of stuck unclean and
[8:38] <nz_monkey> quite a few stuck stale PGs. What is the best way to go about fixing this up?
[8:39] <nz_monkey> hi ctrl
[8:43] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[8:53] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[9:00] * KindOne (KindOne@h199.58.186.173.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[9:02] * KindOne (KindOne@h113.42.28.71.dynamic.ip.windstream.net) has joined #ceph
[9:04] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[9:09] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[9:11] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) has joined #ceph
[9:16] * leseb (~leseb@stoneit.xs4all.nl) has joined #ceph
[9:17] * leseb_ (~leseb@mx00.stone-it.com) has joined #ceph
[9:18] * scuttlemonkey (~scuttlemo@ has joined #ceph
[9:18] * ChanServ sets mode +o scuttlemonkey
[9:19] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) Quit (Read error: Operation timed out)
[9:22] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:22] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:24] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[9:24] * leseb (~leseb@stoneit.xs4all.nl) Quit (Ping timeout: 480 seconds)
[9:31] <ctrl> joao: are you here?
[9:33] <ninkotech> ctrl: hi. your nick reminds me a guy from concen ...
[9:34] <absynth_47215> ./nick alt-delete
[9:34] <ninkotech> :)
[9:34] <ninkotech> thats like bill gates - control -alt - vaccinate
[9:35] <ninkotech> (depopulate)
[9:36] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[9:41] <ctrl> %)
[9:48] * low (~low@ has joined #ceph
[9:51] * ScOut3R (~ScOut3R@ has joined #ceph
[10:04] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[10:07] * sleinen1 (~Adium@2001:620:0:26:51e8:94d6:d6d1:2d1) has joined #ceph
[10:07] * xiaoxi (~xiaoxiche@jfdmzpr02-ext.jf.intel.com) Quit (Ping timeout: 480 seconds)
[10:08] * sleinen (~Adium@2001:620:0:46:442d:b001:f46b:15b0) Quit (Read error: Operation timed out)
[10:18] * verwilst (~verwilst@ has joined #ceph
[10:18] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[10:20] * LeaChim (~LeaChim@b01bd420.bb.sky.com) has joined #ceph
[10:28] * LeaChim (~LeaChim@b01bd420.bb.sky.com) Quit (Ping timeout: 480 seconds)
[10:34] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:38] * LeaChim (~LeaChim@b01bd420.bb.sky.com) has joined #ceph
[10:40] * dosaboy (~gizmo@faun.canonical.com) Quit (Remote host closed the connection)
[10:41] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[10:47] <topro> on a ceph bobtail cluster with three osd hosts (2 osds each), all osds same size and weight 1, rep_size 3, min_size 2, would a default crushmap ensure to place all three replicas of each pg across all three hosts?
[11:09] * verwilst (~verwilst@ Quit (Ping timeout: 480 seconds)
[11:16] * benr (~benr@puma-mxisp.mxtelecom.com) Quit (Remote host closed the connection)
[11:28] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) has joined #ceph
[11:32] * scuttlemonkey (~scuttlemo@ has joined #ceph
[11:32] * ChanServ sets mode +o scuttlemonkey
[11:34] * scuttlemonkey (~scuttlemo@ Quit ()
[11:45] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[11:49] * xiaoxi (~xiaoxiche@ has joined #ceph
[11:49] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:55] * loicd1 (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Quit: Leaving.)
[11:59] * BillK (~BillK@124-168-240-221.dyn.iinet.net.au) Quit (Quit: Leaving)
[12:02] * sagelap (~sage@diaman3.lnk.telstra.net) has joined #ceph
[12:09] * scalability-junk (~stp@188-193-201-35-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[12:17] * madk (~mkrinke@ has joined #ceph
[12:17] <joao> ctrl, here now
[12:17] <joao> overslept for 2 hours
[12:18] <joao> topro, think so
[12:19] <joao> actually, no
[12:20] <joao> the default crushmap sets the osd bucket as the placement rule, iirc
[12:20] <joao> so you'd have your data replicated across osds, not hosts
[12:21] <joao> that could mean that one given pg could end up being replicated on only two hosts
[12:22] <madk> Hello
[12:22] <joao> hi
[12:22] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) has joined #ceph
[12:23] <madk> im trying to get phprados to speak to my rados cluster using cephx authentication. the examples from phprados state a configuration option for the keyring to use, but i am unsure on how to tell phprados which username to use.
[12:24] <ctrl> hi
[12:25] <ctrl> after change replica to 2 and restart ceph
[12:25] <ctrl> now health is ok
[12:25] <joao> ctrl, yeah, I'd figured changing the replica would fix it, but I don't see why you'd have to restart ceph
[12:38] <ctrl> joao: when is the default crushmap generated?
[12:40] <joao> it is not so much generated, afaiu, but it is more like the default settings plus whatever you specify when you add an osd to the cluster
[12:40] * xiaoxi (~xiaoxiche@ Quit (Remote host closed the connection)
[12:41] <joao> I'd have to go digging for the code that handles that, but I don't think that is what you want, is it?
[12:44] <joao> anyway, if no map is fed to the cluster, it has a set of implicit rules (data, metadata and rbd) with a set of default assumptions (such as steps), and a root; as you are adding osds to the cluster, the crushmap adapts to reflect where you're placing your osds (say, rack, row, w/e)
[12:50] <gucki> joao: do you know why ceph mon tell \* injectargs '--mon-osd-nearfull-ratio 86' does not work? the full osd is at 85%, so it should work?
[12:52] <joao> gucki, the default value for that option is '.85'
[12:52] <joao> so I'm assuming you should specify it as .86?
[12:52] <joao> or 0.86
[12:52] <joao> ?
[12:52] <gucki> joao: ok, i'll try that... :)
[12:53] <joao> let me know how it works out
[12:56] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[12:57] <topro> joao: ok, that describes what I see. so how would I have to change my crushmap to achieve replication across separate hosts?
[12:57] <joao> topro, http://ceph.com/docs/master/rados/operations/crush-map/
[12:58] <joao> in a nutshell, obtain crushmap from cluster; decompile it; edit it; compile it; feed it to the cluster
[12:58] <joao> ah, you mean what you'd have to change in the crushmap
[12:58] <topro> joao: yea
[12:59] <joao> on the placement rules, setting 'host' as the bucket type on 'step chooseleaf ...' instead of osd should do the trick
[13:00] <topro> joao: thanks a lot. i'll give it a try
[13:00] <joao> topro, I advise you reading that doc
[13:00] <joao> it's insightful in many ways
[13:00] <joao> :)
[13:00] <topro> i will, promised ;)
[13:01] <Gugge-47527> my default, unchanged crushmap has "step chooseleaf firstn 0 type host", so it should be good, right?
[13:03] <joao> should be right, yes
[13:03] <joao> I assumed the default was always 'osd' as it is what I see on my test cluster
[13:04] <joao> I'll try to remember to dig for it when I have the time, to put this question out of its misery
[13:04] <Gugge-47527> im not sure i understand the firstn 0 part
[13:04] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:04] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has left #ceph
[13:05] <joao> bbiab
[13:08] <gucki> joao: ceph mon tell \* injectargs '--mon-osd-nearfull-ratio .86' does not work
[13:08] <gucki> joao: ceph mon tell \* injectargs '--mon-osd-nearfull-ratio 0.86' does not work :(
[13:10] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[13:15] <joao> gucki, quoting sjust from earlier this month
[13:15] <joao> [1:32] <sjust> ceph mon tell \* -- injectargs '--mon-osd-full-ratio=0.98'
[13:15] <joao> [1:33] <sjust> that'll bump it to 0.98
[13:15] <joao> [1:33] <sjust> you really don't want to leave it like that
[13:16] <joao> s/0.98/0.86
[13:16] <gucki> joao: ah, probably i missed the =
[13:16] <joao> yep, and so did I ;)
[13:16] <gucki> joao: i also want to change the *near* full, not the full ratio :)
[13:16] <joao> same goes for that one
[13:17] <joao> didn't even notice that, though
[13:17] <topro> joao: if you don't mind, that's my crushmap http://paste.debian.net/230052/
[13:17] <joao> I clearly need more coffee before my reading skills are back to normal
[13:18] <gucki> joao: does not work for me ceph mon tell \* injectargs '--mon-osd-nearfull-ratio=0.86'
[13:18] <topro> joao: if I get it correctly, "step choose firstn 0 type osd" doesn't care how replicas are distributed, it selects by random choice?
[13:18] <joao> topro, 'type osd' will choose from osds
[13:19] <joao> as in, if you have 12 osds distributed across 3 hosts, replication 3 will ensure 3 osds will get the same pg, but will disregard where said osds are
[13:19] <gucki> also this does not work: ceph mon tell \* -- injectargs '--mon-osd-nearfull-ratio=0.86'
[13:19] <topro> joao: but to choose from type host I would have to replace "step choose firstn..." with chooseleaf, right?
[13:19] <joao> topro, ah, that I don't know :x
[13:19] <joao> let me get back to you on that one
[13:19] <topro> nevermind, I will find out the hard way ;)
[13:20] <topro> is there a chance to kill my data by stuffing around with crushmap?
[13:21] <joao> don't think so
[13:21] <joao> it will shift data around for sure though
[13:21] <topro> ok, i'll just go and see :|
[13:22] <joao> gucki, do you have debugging on the monitors?
[13:22] <joao> you should be able to see what's going on
[13:22] <joao> when that command is received
[13:23] <gucki> joao: is this enough? http://pastie.org/pastes/5929856/text
[13:24] <joao> gucki, well, it is applying the configuration change
[13:24] <joao> it says so in the log
[13:24] <gucki> joao: it's always telling so, no matter what you pass to injectargs ;)
[13:24] <gucki> joao: but it doesn't change anything..
[13:24] <joao> ah
[13:24] <joao> let me go check that
[13:25] <gucki> joao: using argonaut 0.48.2
[13:25] <joao> oh
[13:25] <joao> I don't really recall how this used to work on argonaut
[13:25] <joao> but I'm going out on a limb and assuming there's a possibility this isn't supported or something
[13:26] <joao> checking it though
[13:28] <topro> joao: fyi, replaced "step choose firstn 0 type osd" with "step chooseleaf firstn 0 type host", as "...choose... type host" would do exactly what it says: choose a host, not an osd. chooseleaf instead chooses a random leaf below the host
[13:29] <topro> it's rebalancing now and pg dump looks promising, so far
[13:29] <joao> topro, thanks for letting me know :)
[13:30] <topro> nevermind, i think I just got how crushmap basics work, thanks ;)
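The one-line change topro made sits inside a placement rule in the decompiled map. An illustrative rule (the ruleset number and pool name are generic defaults, not taken from topro's paste):

```
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        # before: step choose firstn 0 type osd
        step chooseleaf firstn 0 type host
        step emit
}
```

With `chooseleaf ... type host`, CRUSH picks N distinct hosts and then one OSD leaf under each, so no two replicas of a PG land on the same host.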
[13:31] <topro> another question, is tunables something i want to know about?
[13:37] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[13:43] * The_Bishop_ (~bishop@e177091169.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[13:44] * cgm_tco (~cgm_tco@ has joined #ceph
[13:50] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[13:53] * sagelap (~sage@diaman3.lnk.telstra.net) Quit (Ping timeout: 480 seconds)
[14:02] <joao> gucki, afaict, the argument injection is working; otherwise you'd have an error on the logs
[14:04] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[14:06] * BillK (~BillK@124-169-83-211.dyn.iinet.net.au) has joined #ceph
[14:16] * BillK (~BillK@124-169-83-211.dyn.iinet.net.au) Quit (Remote host closed the connection)
[14:25] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[14:31] * cgm_tco (~cgm_tco@ Quit ()
[14:38] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[14:41] <Kdecherf> I spotted a problem on the MDS when it consumes too much memory (don't know if it's linked, but it corresponds)
[14:43] <Kdecherf> it causes one client to hang (and not others) and sometimes some folders disappear
[14:43] <Kdecherf> same problem using ceph-fuse and the kernel client
[14:47] * xmltok (~xmltok@pool101.bizrate.com) Quit (Read error: Connection timed out)
[14:48] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[14:50] * cbm64 (c1cd3d4a@ircip2.mibbit.com) has joined #ceph
[15:06] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:10] * sleinen1 (~Adium@2001:620:0:26:51e8:94d6:d6d1:2d1) Quit (Quit: Leaving.)
[15:10] * sleinen (~Adium@ has joined #ceph
[15:12] * sleinen1 (~Adium@2001:620:0:25:8112:a74f:7c81:e9d5) has joined #ceph
[15:15] <gucki> joao: this is strange, as ceph health detail still shows HEALTH_WARN 1 near full osd(s)
[15:15] <gucki> osd.2 is near full at 85%
[15:15] <gucki> joao: shall i file a bug report for this?
[15:18] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[15:37] <joao> gucki, maybe someone else knows what's going on; from what I can tell, the command ought to be working
[15:38] <joao> gucki, have you tried with 0.87? just wondering if this happens to be some approximation on the calculated values or something
[15:38] <joao> it's a long shot
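Summing up the thread: sjust's quoted form uses an '=' between option and value. The second command below is an alternative that sets the ratio in the PG map directly; whether argonaut 0.48 supports it is an assumption to verify against that release's `ceph pg` help:

```shell
# The form sjust used (note the '=')
ceph mon tell \* -- injectargs '--mon-osd-nearfull-ratio=0.86'

# Possible alternative on releases that support it (unverified for argonaut):
# the enforced ratios live in the PG map, which this sets directly
ceph pg set_nearfull_ratio 0.86
```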
[15:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[15:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[15:43] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[15:51] * The_Bishop (~bishop@2001:470:50b6:0:adbc:f59c:21ba:248a) has joined #ceph
[15:52] <Kioob`Taff> Hi. I still have a problem with scrubbing: when some scrubs start, they saturate the network interface, and nearly the whole cluster goes down.
[15:53] <Kioob`Taff> (I have only 1 network card per host)
[15:53] <Kioob`Taff> is it possible that the problem comes from the fact that there is only one monitor, hosted on one of these hosts?
[15:53] <Kioob`Taff> (OSD hosts)
[15:55] <Kioob`Taff> I had 15 minutes of downtime of the entire cluster :/
[15:58] * PerlStalker (~PerlStalk@ has joined #ceph
[15:59] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[15:59] <Kioob`Taff> %util
[15:59] <Kioob`Taff> Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device).
[15:59] <Kioob`Taff> Device saturation occurs when this value is close to 100%.
[16:00] <Kioob`Taff> this one has been at 100% for several minutes on multiple RBDs (kernel)
[16:03] * tziOm (~bjornar@ has joined #ceph
[16:05] * cbm64 (c1cd3d4a@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[16:06] <Kioob`Taff> mmm
[16:06] <Kioob`Taff> not a network problem this time
[16:07] <Kioob`Taff> there was scrubbing of the PG [35,39,13]. After restarting OSD.13, that PG was blocked in state «peering»
[16:07] <Kioob`Taff> And after restarting OSD.39, all is ok
[16:13] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[16:14] * vata (~vata@2607:fad8:4:6:a512:68b7:27c0:9c13) has joined #ceph
[16:18] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[16:19] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[16:27] * ninkotech (~duplo@ Quit (Ping timeout: 480 seconds)
[16:31] * ninkotech (~duplo@ has joined #ceph
[16:33] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[16:36] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[16:40] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) has joined #ceph
[16:44] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[16:47] * aliguori (~anthony@cpe-70-112-157-151.austin.res.rr.com) Quit (Remote host closed the connection)
[16:47] * drokita (~drokita@ has joined #ceph
[16:53] * tziOm (~bjornar@ Quit (Remote host closed the connection)
[16:58] <Kioob`Taff> 2013-01-29 16:51:58.607391 7f2b3196d700 0 log [WRN] : slow request 240.744653 seconds old, received at 2013-01-29 16:47:57.862669: osd_op(client.4721.1:417108610 rb.0.1346.238e1f29.000000000480 [write 3092480~4096] 4.9a3bb1b0 RETRY) currently delayed
[16:58] <Kioob`Taff> 2013-01-29 16:51:59.607645 7f2b3196d700 0 log [WRN] : 63 slow requests, 6 included below; oldest blocked for > 310.447748 secs
[17:01] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[17:06] * jmlowe (~Adium@ has joined #ceph
[17:10] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[17:12] * jtangwk1 (~Adium@2001:770:10:500:29ad:e2b9:993e:8398) has joined #ceph
[17:12] * jtangwk (~Adium@2001:770:10:500:d903:a9f1:e23:62f2) Quit (Read error: Connection reset by peer)
[17:18] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[17:18] * aliguori (~anthony@ has joined #ceph
[17:19] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[17:20] * gregaf (~Adium@cpe-76-174-249-52.socal.res.rr.com) Quit (Quit: Leaving.)
[17:28] * low (~low@ Quit (Quit: Leaving)
[17:29] * dosaboy (~gizmo@faun.canonical.com) Quit (Read error: No route to host)
[17:30] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[17:32] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[17:42] * sleinen1 (~Adium@2001:620:0:25:8112:a74f:7c81:e9d5) Quit (Quit: Leaving.)
[17:42] * sleinen (~Adium@ has joined #ceph
[17:46] * jmlowe (~Adium@ Quit (Quit: Leaving.)
[17:47] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[17:50] * sleinen (~Adium@ Quit (Ping timeout: 480 seconds)
[17:55] <elder> Is "asok" a ceph thing?
[17:56] <joao> admin socket
[17:57] <nhm> very useful
[17:57] <ircolle> Not to be confused with http://en.wikipedia.org/wiki/Asok_(Dilbert)
[17:57] * mwcampbell (~mwc@2600:3c00::f03c:91ff:feae:4257) has joined #ceph
[17:57] <mwcampbell> Is Ceph practical for replication across a WAN (~70 ms round trip time)?
[17:57] <joao> nhm, I'd go so far as saying it's amazing stuff :)
[17:58] <elder> 2013-01-29 08:53:12.411195 7f956e3a4780 -1 asok(0x2691060) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/tmp/cephtest/asok.client.0': (98) Address already in use
[17:58] <elder> Not as useful
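For reference, the "asok" admin socket joao and nhm praise is queried like this (the socket path shown is the packaged default and is an assumption; elder's error came from a teuthology-specific path under /tmp/cephtest):

```shell
# Typical admin-socket queries against a running daemon
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok help         # list commands
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show  # running config
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump    # perf counters
```

elder's "Address already in use" means something is still bound to that socket path, usually a lingering process from a previous run, or a stale socket file left behind.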
[17:59] <joao> elder, maybe you're running some other process on it? maybe lingering from a previous run or something?
[17:59] <joao> or even running another client on the same teuthology run (don't know if that would be a problem, don't think I ever tried it)?
[18:00] <elder> I'm trying to run a bunch of things concurrently, maybe I'm exhausting a resource somewhere.
[18:00] <joao> mwcampbell, that would not fare well with the monitors
[18:01] <joao> there are a couple of timeouts that would be triggered after 50 ms, unless readjusted
[18:01] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[18:01] * mwcampbell (~mwc@2600:3c00::f03c:91ff:feae:4257) has left #ceph
[18:02] * leseb_ (~leseb@mx00.stone-it.com) Quit (Remote host closed the connection)
[18:07] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[18:09] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[18:11] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[18:12] * sleinen (~Adium@2001:620:0:25:ac79:569b:6f5b:2f1) has joined #ceph
[18:13] * gucki (~smuxi@77-56-36-164.dclient.hispeed.ch) Quit (Remote host closed the connection)
[18:15] * alram (~alram@ has joined #ceph
[18:27] * sleinen (~Adium@2001:620:0:25:ac79:569b:6f5b:2f1) Quit (Quit: Leaving.)
[18:29] * madk (~mkrinke@ Quit (Quit: Leaving.)
[18:35] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:36] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[18:38] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[18:39] * gregaf (~Adium@2607:f298:a:607:950f:2385:d01a:644e) has joined #ceph
[18:43] * gregaf1 (~Adium@ has joined #ceph
[18:43] * gregaf1 (~Adium@ Quit ()
[18:45] * loicd (~loic@3.46-14-84.ripe.coltfrance.com) Quit (Ping timeout: 480 seconds)
[18:46] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:50] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[18:58] * pja (~pja@kobz-590d71e3.pool.mediaWays.net) has joined #ceph
[18:58] <pja> hello, is somebody here familiar with radosgw?
[19:02] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[19:02] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[19:02] <pja> nobody :( ?
[19:06] * chutzpah (~chutz@ has joined #ceph
[19:07] * jackhill (jackhill@pilot.trilug.org) has joined #ceph
[19:09] * sjustlaptop (~sam@ has joined #ceph
[19:13] <madkiss> Is it a known bug that cinder as described in http://ceph.com/docs/master/rbd/rbd-openstack/ will not work with Cephx?
[19:14] <madkiss> looks like it is
[19:14] <madkiss> https://bugs.launchpad.net/cinder/+bug/1083540
[19:17] <pja> nobody using radosgw?
[19:17] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[19:17] <pja> is there a certain person i can contact?
[19:20] <joao> pja, what would you like to know about radosgw? Am not the best person to give you info or advice, but there's a lot of people around who would probably be able to help you
[19:24] <pja> thx
[19:24] <pja> i've set up ceph+radosgw
[19:24] <pja> like in the how to on the ceph website
[19:24] <pja> we want to use the s3 api
[19:25] <pja> im the systemadmin who set up the system, theres a software developer who wants to use it
[19:25] <pja> in the first step he said he can connect
[19:26] <pja> but fails to create a bucket
[19:26] <pja> the sample says my-new-bucket.servername.de
[19:28] <pja> com.amazonaws.http.HttpClient execute
[19:28] <pja> INFO: Sending Request: PUT http://my-new-bucket.foo.de
[19:28] <pja> headers: (Authorization: AWS
[19:28] <pja> Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
[19:29] <pja> [Fatal Error] :1:50: Leerstellen erforderlich zwischen publicId und systemId.
[19:29] <pja> .......
[19:29] <pja> it says you need spaces between publicid and systemid
[19:30] <joao> maybe someone else can take this one? yehudasa maybe?
[19:30] <joao> I gotta run (bbl)
[19:32] * Ryan_Lane (~Adium@ has joined #ceph
[19:34] <yehudasa> pja: where did that error come from? that's not a radosgw error
[19:34] <wer> with rest-bench "seq" option, is one expected to throw the --no-cleanup flag in order to have sequential objects to read? Also is there any provision to specify a new write starting point to add additional objects instead of overwriting them?
[19:35] <yehudasa> pja: make sure that you can access my-new-bucket.servername.de, it might be that you'd need some dns setup for that
[19:35] <yehudasa> wer: yes and no I think
[19:35] <yehudasa> respectively
[19:36] <wer> heh. hmm. I see some gets also from performing a write yesterday.... hmm.
[19:37] <wer> oooh. nm. that was me getting all the objects. So in order to put 1M objects in the test bucket I must let it run for 60k seconds with 20 concurrency or something to that effect?
[19:38] <wer> and then seq just assumes they are going to be there :)
[19:38] * dosaboy (~gizmo@faun.canonical.com) Quit (Quit: Leaving.)
[19:38] <yehudasa> 60k seconds? that's like 20 hours
[19:39] <wer> yup :)
[19:39] <yehudasa> I don't think so, unless your performance is really bad
[19:39] <yehudasa> ah, 1M objects
[19:39] <wer> I am limited now by 1gig links.
[19:39] <yehudasa> so you basically want to transfer a terabyte
[19:39] * Cube (~Cube@ has joined #ceph
[19:40] <wer> yeah. I have a ~200TB cluster.... and am designing some ongoing testing to gather metrics.
[19:41] * sjustlaptop (~sam@ Quit (Ping timeout: 480 seconds)
[19:42] <wer> reads will likely outperform writes so the length of my read tests will have to account for that..... I actually get good performance testing information with tsung on reads, but writes are memory bound in a big way......
[19:43] * Cube1 (~Cube@ has joined #ceph
[19:44] <wer> so I am having to use rest-bench, which seems like it is not really looked after these days :)
[19:47] * Cube (~Cube@ Quit (Ping timeout: 480 seconds)
[19:48] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[19:48] <noob2> is there code in the python library to create rados gateway users?
[19:48] <noob2> if not i'll just use subprocess :)
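(For context: there was no user-management call in the python bindings at this point, so wrapping the radosgw-admin CLI is the usual route; the uid and display name below are made-up examples.)

```
# Create and inspect a radosgw user; a python subprocess call can wrap these.
radosgw-admin user create --uid=johndoe --display-name="John Doe"
radosgw-admin user info --uid=johndoe
```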
[19:49] <pja> sorry yehudasa, im back
[19:49] <pja> yeah i make a cname entry in our dns
[19:50] <pja> are you still there?
[19:52] <wer> might be nice to add some object naming features (start seq and/or naming) to rest-bench for use cases such as continuing... using existing objects..., and/or using it to populate a lot of objects in a bucket. block size randomness would also be nice to some degree.
[19:53] <yehudasa> pja: yeah
[19:53] <yehudasa> wer: right
[19:54] <yehudasa> wer: rest-bench is basically doing whatever 'rados bench' is doing
[19:54] <pja> is it possible to write email?
[19:54] <wer> yehudasa: oh ok. Maybe I should be digging into rados bench :)
[19:55] <yehudasa> pja: if you can't do it here, you can try sending mail to ceph-devel, but chances are that I'll be the one to pick it up
[19:55] <yehudasa> wer: both share a common infrastructure
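For reference, the rados bench pattern that rest-bench mirrors looks roughly like this (pool name "bench", duration, and concurrency are made-up values; rest-bench takes the same write/seq phases but targets an S3 endpoint instead of a pool):

```
# Write phase: --no-cleanup keeps the written objects around so the
# later sequential-read phase has something to read back.
rados bench -p bench 600 write -t 20 --no-cleanup

# Read phase: "seq" assumes the objects written above still exist.
rados bench -p bench 600 seq -t 20
```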
[19:57] <pja> i read that i first have to subscribe via another mail, and i don't know whether that mail address is for questions
[19:57] * dmick (~dmick@2607:f298:a:607:c856:8f85:e202:43cc) has joined #ceph
[19:57] <pja> its hard to write in this public 1 sentence irc interface
[19:59] <pja> so again for "the story":
[19:59] <pja> INFO: Sending Request: PUT http://my-new-bucket.foooo.cgm.ag / Headers:
[19:59] * LeaChim (~LeaChim@b01bd420.bb.sky.com) Quit (Ping timeout: 480 seconds)
[19:59] <pja> and then the fatal error
[19:59] <dmick> pja: there's also ceph-devel@vger.kernel.org if you prefer
[20:00] <pja> thx dmick, yehudasa said that, but on the website it says you have to subscribe
[20:00] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[20:01] <pja> and its a quick plus that i can get here an answer directly
[20:01] <pja> so i will try it here again:
[20:02] * BManojlovic (~steki@46-172-222-85.adsl.verat.net) has joined #ceph
[20:04] * sagelap (~sage@diaman3.lnk.telstra.net) has joined #ceph
[20:05] <pja> have you read what i wrote before yehudasa?
[20:05] * sagewk (~sage@2607:f298:a:607:7c9e:ad40:b0ef:25d8) Quit (Read error: Operation timed out)
[20:07] <dmick> pja: he's in a short conference call ATM. Should be back in a few.
[20:07] <pja> ah thank you
[20:08] <pja> can i write to the mail address without subscribing or something like this?
[20:08] <pja> can i ask from which country you are
[20:08] <pja> because of the time zone
[20:08] <dmick> pja: I'm in the US. I don't know if nonsubscribers are permitted to post to the list
[20:09] <dmick> it's not super-high-volume, and a lot of it can be useful information about Ceph, though
[20:09] <pja> yeah i dont think so
[20:09] * LeaChim (~LeaChim@b01bd420.bb.sky.com) has joined #ceph
[20:15] <yehudasa> pja: will need to get more info about what tools you're using to send it
[20:15] <pja> what are you talking about
[20:15] <pja> ?
[20:15] <pja> try it another way
[20:16] <pja> from the beginning
[20:16] <pja> we want to use s3api(java) with ceph+radosgw
[20:16] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[20:16] <pja> So I install the radosgw this way:
[20:16] <pja> http://ceph.com/docs/master/radosgw/manual-install/
[20:16] <pja> and configure it like this:
[20:16] <pja> http://ceph.com/docs/master/radosgw/config/
[20:17] <pja> can you open the 2nd link please?
[20:17] <pja> " host = {host-name}"
[20:17] * LeaChim (~LeaChim@b01bd420.bb.sky.com) Quit (Ping timeout: 480 seconds)
[20:17] <pja> i read on some pages that it has to be the fqdn
[20:18] <pja> is it right?
[20:18] <pja> because in the ceph conf it was always only the hostname, not the full domain name
[20:19] <pja> next part: "CREATE RGW.CONF"
[20:19] <yehudasa> pja: yeah
[20:19] <pja> so fqdn or only hostname?
[20:19] <yehudasa> oh, not sure .. whichever works to start up the daemon
[20:19] <yehudasa> if the daemon runs then it's ok
[20:19] <yehudasa> it's used in /etc/init.d/radosgw
[20:20] <pja> that is another point, radosgw not running
[20:20] <pja> i thought it's started from the fastcgi script
[20:20] <dmick> it needs to match $(hostname)
[20:20] * scuttlemonkey (~scuttlemo@ has joined #ceph
[20:20] * ChanServ sets mode +o scuttlemonkey
[20:21] <pja> this variable is always only the hostname without domain i think dmick
[20:21] <yehudasa> pja: it can start by the fastcgi script, the instructions are for starting it manually
[20:21] <yehudasa> it's not recommended to let fastcgi control its lifecycle
[20:22] <dmick> pja: yes I think so too. hostname takes -f for fqdn
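A minimal section of the sort being discussed, per the config doc linked above ("gateway-host" is a placeholder; it must match the short `hostname` of the gateway machine):

```ini
[client.radosgw.gateway]
        host = gateway-host
        keyring = /etc/ceph/keyring.radosgw.gateway
        rgw socket path = /tmp/radosgw.sock
        log file = /var/log/ceph/radosgw.log
```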
[20:22] * sagewk (~sage@2607:f298:a:607:58b:3536:b4:f25b) has joined #ceph
[20:29] * pja (~pja@kobz-590d71e3.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[20:29] * pja (~pja@kobz-590d71e3.pool.mediaWays.net) has joined #ceph
[20:30] <pja> damn connection crashed
[20:30] <pja> hope you had not written anything
[20:31] <dmick> not after (11:22:14 AM) dmick: pja: yes I think so too. hostname takes -f for fqdn
[20:31] <dmick> seems like task one is to get radosgw running
[20:31] <pja> when i type "/etc/init.d/radosgw start" i get no output
[20:31] <pja> on the console
[20:31] <pja> ps ax says no radosgw running
[20:32] <dmick> perhaps reading radosgw and looking at the things it checks would be fruitful
[20:32] <pja> pardon?
[20:32] <pja> dont understand
[20:33] <dmick> less /etc/init.d/radosgw
[20:33] <dmick> look for start)
[20:33] <dmick> see what it does
[20:33] <dmick> and what's going wrong
[20:35] <pja> for name in `ceph-conf --list-sections $PREFIX`; do.....
[20:36] <dmick> yes, looking for client.radosgw. sections
[20:38] <pja> PREFIX='client.radosgw.'
[20:39] <pja> at the beginning
[20:39] <dmick> right
[20:39] <pja> sorry, its the standard init script, i install rados gw via apt-get
[20:39] <pja> nothing changed in the init script
[20:39] <dmick> we're debugging your problem
[20:39] <pja> im sorry
[20:40] <dmick> so first it checks that you have not set 'auto start = no" or "false" or "0" in that section of ceph.conf
[20:40] <dmick> which I assume you have not
[20:40] <pja> no
[20:40] <dmick> then it checks for "rgw socket path", which you must have for auto startup
[20:40] <pja> its the standard ceph.conf of the website
[20:40] <xmltok> i dont think i ever had the init script work during my testing, i was running it like this manually: /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
[20:40] <dmick> then it checks that "host" matches $(hostname)
[20:41] <dmick> if all those are true, it should proceed to "Starting..."
[20:41] <pja> yeah this is what the fcgi script does
[20:41] <dmick> so clearly one of those must not be true
[20:41] <pja> what is not true?
[20:41] <dmick> one of those three tests
[20:42] <pja> ive no knowledge in scripting
[20:42] <pja> but when i look in the init script
[20:42] <dmick> but this isn't scripting, this is just basic decision making
[20:42] <pja> i cannot find the dir of ceph or something, so who does it know the correct path
[20:42] <pja> who=how
[20:43] <dmick> RADOSGW=`which radosgw`
[20:43] <dmick> start-stop-daemon --start -u $user -x $RADOSGW -- -n $name
[20:43] <dmick> is how it finds the executable
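The three gates dmick enumerates can be paraphrased as a standalone sketch. The real init script reads these values with ceph-conf; here a sample section is inlined so the logic can be followed end to end (all values illustrative):

```shell
#!/bin/sh
# Paraphrase of the checks /etc/init.d/radosgw applies per client.radosgw.*
# section of ceph.conf before starting the daemon.
conf_get() {  # conf_get <key>: crude lookup in the inlined sample section
    awk -F' *= *' -v k="$1" '
        /^\[/             { in_sec = ($0 == "[client.radosgw.gateway]") }
        in_sec && $1 == k { print $2 }' <<'EOF'
[client.radosgw.gateway]
auto start = yes
rgw socket path = /tmp/radosgw.sock
host = gateway-host
EOF
}

start_ok=yes
case "$(conf_get 'auto start')" in no|false|0) start_ok=no ;; esac   # gate 1: auto start not disabled
[ -n "$(conf_get 'rgw socket path')" ] || start_ok=no                # gate 2: rgw socket path present
[ "$(conf_get host)" = "$(hostname)" ] || start_ok=no                # gate 3: host matches this machine
echo "would start radosgw on this host: $start_ok"
```

Gate 3 is the one that bites when ceph.conf carries an FQDN while `hostname` returns the short name.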
[20:43] <dmick> why don't you pastebin your ceph.conf and I'll look at those three conditions for you
[20:43] <pja> wonderful
[20:45] <pja> http://pastebin.com/ynAUNhqy
[20:46] <dmick> that's not ceph.conf
[20:46] <pja> ahh sorry i thought you mean the init script
[20:47] <dmick> then I would have said /etc/init.d/ceph
[20:47] <dmick> and as we know, you're using the default one, so I have that
[20:48] <pja> [global]
[20:48] <pja> http://pastebin.com/BR24JaaD
[20:48] <dmick> please don't paste it here
[20:49] <pja> copy paste fault
[20:49] <dmick> so that looks like an FQDN in the hostname
[20:49] <dmick> which we established above was not correct
[20:50] <dmick> that's condition 3 of the conditions I enumerated above
[20:50] * ScOut3R (~ScOut3R@rock.adverticum.com) has joined #ceph
[20:50] <pja> where?
[20:50] <pja> at the bottom?
[20:50] <dmick> I think maybe you misunderstand the overall intent of the process here
[20:50] <dmick> you're adding things to ceph.conf
[20:50] <dmick> in order to set parameters for the radosgw service
[20:51] * ctrl (~ctrl@128-72-212-31.broadband.corbina.ru) Quit (Ping timeout: 480 seconds)
[20:51] <dmick> ceph.conf has several sections, as you can see
[20:51] <dmick> one of them is the radosgw section
[20:51] <dmick> that's the one you've added things to
[20:51] <dmick> (missing the leading tabs, but I think that's accepted)
[20:51] <dmick> in that section are all the things that affect radosgw
[20:51] <dmick> there is one hostname in that section
[20:51] <pja> yeah you're right
[20:52] <pja> "Starting client.radosgw.gateway..." ;)
[20:52] <pja> i had this before
[20:52] <pja> as you can see this dns entry is also from another article
[20:52] * ctrl (~ctrl@95-24-254-107.broadband.corbina.ru) has joined #ceph
[20:52] <dmick> (and by the way, my test posting to the email list just went through, from a nonsubscribed address, so yes, you can post to the list without subscribing)
[20:53] <pja> thank you :)
[20:53] <pja> the problem is
[20:53] <pja> therefore i asked for the timezone
[20:53] <pja> we have 9 o clock in the evening here
[20:53] <pja> so there's no developer
[20:53] <pja> i dont think that this solves my problem
[20:54] <dmick> no developer where?
[20:54] <dmick> the email list is worldwide, as is IRC
[20:54] <pja> i mean on our site
[20:54] <pja> who can test our environment
[20:54] <pja> im not a programmer
[20:54] <pja> sysadmin for setting up this system i dont know
[20:55] <pja> so i cannot test if it solves my problem
[20:55] <dmick> ok, I can't help you there
[20:55] <pja> but thank you for that first step ;)
[20:55] <dmick> but I will note that email is less synchronous than IRC, and handles timezone differences better
[20:55] <dmick> but, use whatever support channel you feel is best
[20:55] <pja> yeah with pastebin its okay
[20:55] <pja> its directly
[20:56] <pja> does radosgw have a webinterface?
[20:56] <pja> because i dont understand
[20:56] <pja> that i created ssl certificates and so on
[20:56] <pja> - how does the rados know where i put them?
[20:57] <pja> 1. the virtualhostconf says *:80
[20:57] <dmick> What do you believe radosgw is for?
[20:57] <pja> i think you need it to interact with the s3 api
[20:58] <dmick> and the s3 api uses HTTP as a transport
[20:58] <pja> not https?
[20:58] <dmick> whichever
[20:58] <pja> yeah but theres my question
[20:58] <pja> i create the ssl certificates
[20:59] <pja> but i dont have to put the path to them or activate the sslengine in the apache conf
[20:59] <pja> so how does rados know where they are
[20:59] <dmick> http://ceph.com/docs/master/radosgw/manual-install/#enable-ssl does that help?
[20:59] <dmick> rados doesn't do SSL, Apache does
[20:59] <pja> right
[20:59] <pja> theres my big question
[20:59] <dmick> so why would rados have to know where they are?
[21:00] <pja> when rados does not know it
[21:00] <pja> how does it apache?
[21:00] <dmick> I don't know what that question means
[21:00] <pja> ssl module activated - okay
[21:01] <pja> but you have to say a vhost where the certificates are
[21:01] <pja> .
[21:01] <pja> when you host a website
[21:01] <pja> you say in the virtualhost
[21:01] <pja> *:443
[21:01] <pja> sslengine on
[21:01] <pja> sslcertificatefile /var/www/ssl/apache.key
[21:01] <pja> apache.crt, sorry
[21:02] <pja> and also sslcertificatekeyfile ....apache.crt
[21:02] <pja> ------
[21:02] <pja> but here is nothing like that
[21:02] <dmick> I don't know. This sounds like an Apache configuration question, though, and nothing to do with RADOS or radosgw
[21:02] * BManojlovic (~steki@46-172-222-85.adsl.verat.net) Quit (Ping timeout: 480 seconds)
[21:03] * leseb (~leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) has joined #ceph
[21:04] <pja> yes it has
[21:04] <pja> that is my big question
[21:05] <pja> you said before "rados doesn't do SSL, Apache does "
[21:05] <pja> so it has to be configured in apache
[21:05] <pja> when you look here:
[21:05] <pja> http://ceph.com/docs/master/radosgw/config/
[21:06] <pja> theres only a virtualhost for port 80
[21:06] <pja> no word about adding something like ssl engine
[21:06] <pja> dont you use ssl?
[21:06] <dmick> I'm missing the point about how this has anything to do with configuring radosgw
[21:06] <dmick> again: this is Apache configuration
[21:07] <pja> yeah but it has to do with rados
[21:07] <dmick> yes, the sample Apache config on Ceph's page is for http
[21:07] <pja> because i need apache for rados
[21:07] <dmick> not https
[21:07] <dmick> pja: yes, everyone understands that
[21:07] <pja> ahh ok, so i have to edit it
[21:07] <dmick> I suggest you read up on, or find help with, Apache configuration
[21:07] <pja> no sample config from you?
[21:08] <dmick> not that I know of
[21:08] <pja> the ceph/rados page is written in detail... and then no word about having to edit the vhost
[21:08] <pja> therefore i ask
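For reference: enabling https here is ordinary Apache mod_ssl configuration layered on the docs' port-80 vhost, nothing radosgw-specific. A sketch, with placeholder ServerName and certificate paths:

```apache
<VirtualHost *:443>
    ServerName gateway.example.com
    SSLEngine on
    SSLCertificateFile    /etc/apache2/ssl/apache.crt
    SSLCertificateKeyFile /etc/apache2/ssl/apache.key
    # ...then the same FastCGI/rewrite directives as the port-80
    # vhost shown at http://ceph.com/docs/master/radosgw/config/ ...
</VirtualHost>
```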
[21:08] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[21:09] <pja> for me its all a big question mark
[21:09] <pja> i understand that radosgw is the blackbox between the ceph storage and the s3 api the programmers use
[21:09] <pja> the connection point
[21:12] <pja> do you know boto?
[21:12] <pja> it's a test program for the s3 api
[21:12] <dmick> I would suggest you could start testing this with http, but it seems like that would be obvious
[21:12] <pja> i installed it now
[21:12] <dmick> and yes, we use boto
[21:13] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[21:13] <pja> there is a test script
[21:13] <pja> we tested it with http and the developer says the api needs https
[21:13] <pja> so we thought this is the problem
[21:13] <pja> the test script with boto says:
[21:13] <pja> boto.exception.S3ResponseError: S3ResponseError: 405 Method Not Allowed
[21:15] <pja> the script looks like this if it helps: http://pastebin.com/iFzm6QsN
[21:22] <pja> the programmer uses this: http://ceph.com/docs/master/radosgw/s3/java/
[21:23] <pja> he scanned his connection
[21:23] <pja> it wants to start a https session
[21:23] <pja> (443)
[21:23] * aliguori (~anthony@ has joined #ceph
[21:23] <pja> "conn.setEndpoint("objects.dreamhost.com");"
[21:24] <pja> as you can see, no protocol typed in
[21:25] <pja> and it chooses automatically https
[21:26] <pja> so we thought that's our problem
[21:26] <pja> and added ssl
[21:33] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[21:57] * pja (~pja@kobz-590d71e3.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[22:00] * ctrl (~ctrl@95-24-254-107.broadband.corbina.ru) Quit ()
[22:00] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[22:07] <wido> I was looking at RGW and adding RADOS pools. Is there some way to tie a specific RGW user to a RADOS pool?
[22:07] <wido> Or does it simply do "round robin" over all the pools?
[22:17] * sagelap (~sage@diaman3.lnk.telstra.net) Quit (Quit: Leaving.)
[22:17] <dmick> yehudasa: ?
[22:18] <yehudasa> wido: currently no way
[22:18] <yehudasa> wido: it's just doing a round robin over all the pools
[22:18] * sagelap (~sage@diaman3.lnk.telstra.net) has joined #ceph
[22:19] <jmlowe> haven't built a kernel for a while, I see it hasn't gotten any faster
[22:21] <dmick> more memory, more -j, fewer optional modules: all your friends
[22:22] <iggy> and ccache if you are going to do it much
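The advice above in command form (assumes you are in a kernel source tree; ccache optional):

```
make localmodconfig          # trim the config to modules currently loaded
make -j"$(nproc)"            # one build job per CPU
# with ccache:
# make CC="ccache gcc" -j"$(nproc)"
```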
[22:23] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[22:24] * sleinen1 (~Adium@2001:620:0:25:c81d:10a7:9fb1:2611) has joined #ceph
[22:26] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:26] * sagelap (~sage@diaman3.lnk.telstra.net) Quit (Ping timeout: 480 seconds)
[22:27] * ScOut3R (~ScOut3R@rock.adverticum.com) Quit (Ping timeout: 480 seconds)
[22:27] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:31] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[22:32] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[22:35] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:36] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[22:38] * LeaChim (~LeaChim@027ee384.bb.sky.com) has joined #ceph
[22:39] * sagelap (~sage@ has joined #ceph
[22:46] * sagelap (~sage@ Quit (Read error: No route to host)
[22:46] * sagelap (~sage@ has joined #ceph
[22:49] * sagelap (~sage@ Quit (Remote host closed the connection)
[22:54] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[22:56] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[23:00] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[23:04] * noob2 (~noob2@ext.cscinfo.com) Quit (Quit: Leaving.)
[23:05] * loicd (~loic@magenta.dachary.org) has joined #ceph
[23:06] * BillK (~BillK@58-7-74-106.dyn.iinet.net.au) has joined #ceph
[23:08] * nwat (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[23:10] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[23:15] <loicd> I'm trying to figure out how to run https://github.com/ceph/ceph/blob/master/src/test/filestore/store_test.cc
[23:17] <nwat> I'm looking a bit at watch/notify/linger subsystem. I'm curious about how a watch is reestablished after a client loses connection. What identifier ties a watch to a client?
[23:17] <joshd> nwat: the cookie
[23:18] <joshd> nwat: and the connection state
[23:18] <loicd> when I run ./test_filestore it hangs ( the output is http://paste.debian.net/230258/ )
[23:19] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) has joined #ceph
[23:19] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) Quit ()
[23:20] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) has joined #ceph
[23:21] <nwat> joshd: so connection failures are OK, but would a client crash then not be able to recover its watches?
[23:27] <gregaf> nwat: it wouldn't make any sense for it to do so
[23:28] <nwat> Well, I can see value in attaching a name to a set of watches and reestablishing that watch by name.
[23:28] <gregaf> watch/notify just tells you that the object changed, but you have to read it on initial setup anyway so you'd get any previous changes
[23:28] * tchmnkyz (~jeremy@0001638b.user.oftc.net) Quit (Quit: Lost terminal)
[23:28] <gregaf> huhwhat?
[23:28] <nwat> If a client doesn't need to see all operations
[23:31] <joshd> loicd: joao or sjust would know
[23:32] <gregaf> nwat: I'm not understanding your statements; I'm not sure if it's because you don't know how watch-notify works or if we're just on different wavelengths
[23:33] <nwat> my understanding of watch/notify.. a watch registers an interest in an object. a notify will synchronously broadcast a messages to watchers on an object. correct?
[23:34] <gregaf> right
[23:34] <loicd> joshd: ok. I'm investigating :-) Although it's gtest based and could probably run standalone, it belongs to bin_DEBUGPROGRAMS += test_filestore and should probably be called with arguments or a pre-defined context
[23:34] <gregaf> nwat: but there's no "set" of watches that the cluster can keep an eye on; a watch is attached to the object being watched
[23:35] <nwat> gregaf: got it, i was thinking a watch was more general than a single object. ok, i think that answers my question.
[23:35] <dmick> loicd: I don't know test_filestore, but I note the existence of test/filestore/run_seed_to.sh
[23:37] <nwat> gregaf: presumably this might be doable within a pg for users with custom locator keys
[23:37] <loicd> dmick: yes, and it relies on test_filestore_idempotent_sequence but I don't see that it calls test_filestore. Am I missing something ?
[23:38] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[23:38] <dmick> so it does, sorry about that. hum.
[23:38] <gregaf> nwat: not with any of our current implementation, although I guess if we really wanted to (but we really don't want to)
[23:39] <nwat> gregaf: ya.. but i've been staring at the ceiling a lot today
[23:39] <gregaf> heh
[23:41] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:43] * scuttlemonkey (~scuttlemo@ has joined #ceph
[23:43] * ChanServ sets mode +o scuttlemonkey
[23:45] * sleinen1 (~Adium@2001:620:0:25:c81d:10a7:9fb1:2611) Quit (Quit: Leaving.)
[23:49] * leseb (~leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) Quit (Remote host closed the connection)
[23:49] * pagefaulted (~pagefault@ has joined #ceph
[23:50] <pagefaulted> How often does radosgw free up space from deleted objects?
[23:50] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[23:52] <madkiss> can I alter the credentials for a specific user?
[23:53] <dmick> loicd: I can reproduce the hang, indeed
[23:53] <loicd> dmick: I'm glad it's not something from my environment, thanks for the confirmation ;-)
[23:56] <dmick> and running it again coredumps. wee.
[23:59] <loicd> dmick: yes, but it's just because of the stale store_test_temp_dir and store_test_temp_journal; if you remove them it will run and block again

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.