#ceph IRC Log


IRC Log for 2011-11-16

Timestamps are in GMT/BST.

[0:00] <gregaf> yep!
[0:00] <todin> it's $1 = 0
[0:00] <gregaf> cool, that's what sage wanted
[0:01] <grape> gregaf: so who decides what $name is? Is it $name.$id as in osd.0
[0:01] <todin> gregaf: yep that I know ;-)
[0:01] <gregaf> hopefully from his email he knows the probable cause given that
[0:01] <gregaf> grape: pretty sure, yep
[0:01] <grape> gregaf: ok that makes much more sense
[0:01] * todin learned a new trick today
[0:03] * MK_FG (~MK_FG@ has joined #ceph
[0:05] <Tv> grape: name = $type.$id
[0:05] <Tv> defining name based on name would be an interesting topological exercise
[0:06] <grape> Tv: LOL
[0:21] * adjohn (~adjohn@ has joined #ceph
[0:22] <grape> gregaf, joshd & Tv: Thanks for your help!
[0:28] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[0:39] * gregorg_taf (~Greg@ has joined #ceph
[0:39] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[1:10] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[1:35] <yehudasa_> Tv: (remote,) = ctx.cluster.only(client).remotes.keys() fails on ValueError: need more than 0 values to unpack on the swift task that I'm creating
[1:35] <yehudasa_> I'm trying to figure out what that means..
[1:35] <Tv> yehudasa_: you have more than one remote with that role
[1:36] <Tv> yehudasa_: that's meant for things like "osd.3"
[1:36] <Tv> oh wait "more than 0" -> you have no remotes with that role
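The failing idiom is single-element tuple unpacking: it raises exactly that ValueError when the mapping has zero matching entries. A minimal sketch of the pattern (not teuthology's actual code; `sole_remote` is a hypothetical helper name):

```python
# Single-element tuple unpacking insists on exactly one item.
# Sketch of the pattern behind
#   (remote,) = ctx.cluster.only(client).remotes.keys()
# `sole_remote` is a made-up helper, not teuthology code.

def sole_remote(remotes):
    """Return the single remote matching a role, or raise ValueError."""
    (remote,) = remotes.keys()  # ValueError if there are 0 or >1 keys
    return remote

print(sole_remote({"ubuntu@host1": ["mon.a", "osd.0", "client.0"]}))
```

With an empty dict (no remote carries the role) Python 2 reports "need more than 0 values to unpack", which is what yehudasa_ saw; with two or more keys it complains about too many values instead.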
[1:37] <yehudasa_> Tv: where does it take the remotes from the yaml?
[1:38] <Tv> yehudasa_: "targets:"
[1:38] <Tv> or dynamically from the lock server
[1:38] <Tv> matches against "roles:"
[1:38] <yehudasa_> Tv: roles:
[1:38] <yehudasa_> - [mon.a, osd.0, client.0]
[1:39] <Tv> so what's "client" in your python task?
[1:39] <yehudasa_> client.0?
[1:39] <yehudasa_> - testswift:
[1:39] <yehudasa_> clients: [client.0]
[1:39] <yehudasa_> testswift_conf:
[1:39] <yehudasa_> client.0:
[1:39] <yehudasa_> func_test:
[1:39] <Tv> i can't help you debug if you're guessing ;)
[1:43] <yehudasa_> Tv: that's my yaml: http://pastebin.com/BLCYKuaR
[1:43] <yehudasa_> the swift task is basically a modified s3tests task
[1:43] <yehudasa_> I got the create_users to work
[1:43] <Tv> yehudasa_: put a "print client" in the right spot to see what's actually happening
[1:52] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:54] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:00] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[2:52] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has left #ceph
[2:53] * adjohn (~adjohn@ Quit (Quit: adjohn)
[2:53] * adjohn (~adjohn@ has joined #ceph
[2:53] * adjohn (~adjohn@ Quit ()
[3:01] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[3:11] * cp (~cp@108-203-50-207.lightspeed.sntcca.sbcglobal.net) Quit (Quit: cp)
[3:19] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[3:27] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:34] * jojy (~jvarghese@ Quit (Quit: jojy)
[4:20] * jojy (~jvarghese@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[4:20] * jojy (~jvarghese@75-54-231-2.lightspeed.sntcca.sbcglobal.net) Quit ()
[4:32] * grape (~grape@c-76-17-80-143.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[5:44] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:02] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:04] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:31] * adjohn (~adjohn@50-0-164-220.dsl.dynamic.sonic.net) has joined #ceph
[7:09] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[8:29] * chaos_ (~chaos@hybris.inf.ug.edu.pl) has joined #ceph
[8:31] <chaos_> gregaf, the values are directly from the performance sockets.. they're pulled by my ruby 'app', but it's based on the collectd plugin
[9:12] * gregorg (~Greg@ has joined #ceph
[9:12] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[9:13] * adjohn (~adjohn@50-0-164-220.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[9:17] * tom (~tom@n16h50.rev.sprintdatacenter.pl) has joined #ceph
[9:18] <tom> hi every1, does anyone know if there's a known problem with user creation in radosgw v0.38?
[9:21] <tom> "radosgw-admin user create --uid=TEST2 --gen-access-key --gen-secret --access=full --display-name="User TEST2" --email=test@test.test" runs forever, but it works in v0.34.
[9:22] <tom> also tried the working branch from git, and still the same result
[9:49] * Olivier_bzh (~langella@xunil.moulon.inra.fr) Quit (Quit: Leaving.)
[9:55] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:49] * Olivier_bzh (~langella@xunil.moulon.inra.fr) has joined #ceph
[10:50] <Olivier_bzh> hi everybody
[10:50] <Olivier_bzh> I am a very new user of ceph : since yesterday
[10:50] <Olivier_bzh> and I am very happy :
[10:51] <Olivier_bzh> it works fine : 3 servers running, 25T of physical space 17T online...
[10:52] <Olivier_bzh> For the moment, I am sharing this space with NFS... perhaps it is not so good
[10:52] <Olivier_bzh> I don't know : do you have some advice about that ?
[10:53] <chaos_> why with nfs?
[10:53] <chaos_> it's crazy idea;p
[10:54] <Olivier_bzh> the clients are running Debian squeeze, kernel 2.6.32, and there is no ceph module no ?
[10:54] <chaos_> oh
[10:54] <chaos_> but sharing ceph by nfs.. it's just.. wrong
[10:54] <Olivier_bzh> ah ok ;-)
[10:54] <chaos_> ;p
[10:54] <Olivier_bzh> so It's time to modify it ;-)
[10:54] <chaos_> use native ceph client
[10:55] <Olivier_bzh> ok
[10:55] <chaos_> use fuse
[10:55] <chaos_> you don't have to have native module
[10:55] <Olivier_bzh> alright, thanks for the advice
[10:55] <chaos_> it's just point of view;p
[10:56] <Olivier_bzh> That's good for me... I'm a newbie, and I trust people using it for a long time
[10:57] <Olivier_bzh> I have an other question if you don't mind :
[10:57] <chaos_> i'm using ceph for 1-2 months ;)
[10:57] <Olivier_bzh> that's a long time ;-)
[10:58] <Olivier_bzh> I also need to share this space with internet or intranet
[10:59] <Olivier_bzh> and I am wondering what would be usable...
[10:59] <Olivier_bzh> I planned to share it with webdav...
[10:59] <Olivier_bzh> I've seen radosgw, but I don't know anything yet about it
[10:59] <chaos_> i'm thinking about rados too
[10:59] <chaos_> i need shared space for web app
[11:00] <chaos_> and it looks like it
[11:00] <chaos_> 11 app servers.. and some data should be shared
[11:01] <Olivier_bzh> yes I see... for my nightmare, I've some windows clients too
[11:02] <chaos_> ;-)
[11:04] <Olivier_bzh> So I'll take a look at rados, I will see
[11:04] <Olivier_bzh> thank you once more Chaos
[11:05] <chaos_> np
[11:05] <chaos_> i think we're only 'euro' guys using ceph ;p
[11:06] <Olivier_bzh> ;-) I'm French yes
[11:06] <failbaitr> chaos_: nah, plenty of dutch peeps in the channel also
[11:06] <Olivier_bzh> first euro users... wow
[11:07] <chaos_> failbaitr, oh ;) someone is alive.. ;p
[11:07] <chaos_> i can't get anything from anyone beyond california prime time ;)
[11:07] <failbaitr> chaos_: yes, but im just a user, and haven't seen many problems yet
[11:08] <failbaitr> at least not once I had everything set up and running
[11:08] <chaos_> software without problems.. where this world is going;p
[11:08] <Olivier_bzh> that's a good information ;-)
[11:09] <Olivier_bzh> for my part, I've run a few tests since yesterday, and I found it very good:
[11:10] <chaos_> i've small performance problems, but it might be network/architecture related...
[11:10] <Olivier_bzh> 1Gb/s of write speed (my network card maximum)
[11:10] <chaos_> i've high write latency
[11:10] <failbaitr> chaos_: high write latency is triggered by the journals as far as I know
[11:11] <failbaitr> using a dedicated disk / partition / ssd should help
[11:11] <failbaitr> at least, that's what the docs say
[11:12] <Olivier_bzh> I've used a dedicated partition on RAID1 disk, SAS 6gb, 15k rpm
[11:12] <chaos_> failbaitr, i've a dedicated raid0
[11:13] <chaos_> and write latency can go up to 2s during heavy load.. but even idle it's 300ms
[11:13] <chaos_> sometimes 150ms.. but never lower
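failbaitr's advice above (a dedicated journal disk, partition, or ssd) translates into ceph.conf roughly like this; hostnames and device paths here are hypothetical, so adapt them to your layout:

```
[osd]
        ; journal size in MB
        osd journal size = 1000

[osd.0]
        host = alpha
        osd data = /srv/ceph/osd0
        ; hypothetical dedicated SSD partition for the journal
        osd journal = /dev/sdc1
```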
[11:13] <chaos_> i'm wondering if the number of mds daemons has any performance impact
[11:13] <chaos_> gregaf ^^
[11:15] <chaos_> failbaitr, what does your write latency look like?
[11:17] <failbaitr> haven't benchmarked yet
[11:17] <failbaitr> but it looks like around 250ms or a bit higher
[11:17] <NaioN> anybody also using rsync?
[11:17] <failbaitr> depends on the load, it can go much higher
[11:17] <chaos_> you can check it via the performance counters... i've read that there's a command line tool in 0.38, and i'm writing a wiki article about using the performance sockets from ruby
[11:17] <NaioN> I'm having trouble getting it stable under a heavy rsync load
[11:17] <NaioN> but it could also be btrfs related
[11:17] <chaos_> failbaitr, 250ms is quite high for me
[11:18] <NaioN> and do you guys use mdraid?
[11:18] <chaos_> yes
[11:18] <failbaitr> nope, just clean disks
[11:19] <chaos_> i'm using md
[11:19] <NaioN> failbaitr: with btrfs?
[11:19] <failbaitr> yes
[11:19] <NaioN> stripe or raid10?
[11:19] <failbaitr> I had some weirdness with the snapshots
[11:19] <failbaitr> NaioN: 1 partition per disk atm
[11:19] <failbaitr> 2 disks per machine
[11:19] <chaos_> NaioN, raid0
[11:19] <failbaitr> and two datastores per server
[11:19] <NaioN> ah ok
[11:19] <chaos_> one data store here
[11:20] <failbaitr> I thought it might help reduce the overhead
[11:20] <NaioN> well i tried many combinations but I don't get it stable... but as said with a heavy rsync workload
[11:20] <failbaitr> fragment the data into more partitions, and thus avoid overhead if I had an actual disk crash
[11:21] <NaioN> failbaitr: well the trouble with your setup is if a disk crashes the filesystem "crashes" and you have to do a reboot
[11:21] <NaioN> or do you use raid1?
[11:22] <NaioN> therefore I use BTRFS in raid10 OR mdraid (raid5/6) with eg ext4
[11:23] <NaioN> if a disk dies on the osd i can replace it without rebooting the osd
[11:23] <NaioN> but with btrfs it looks like I'm hitting a bug under heavy load
[11:23] <failbaitr> NaioN: Nope, in case a disk crashes, the server needs a reboot, unless I just umount it and let it sit there until there's a reboot scheduled
[11:49] * slang (~slang@chml01.drwholdings.com) Quit (Ping timeout: 480 seconds)
[11:51] * johnl_ (~johnl@johnl.ipq.co) Quit (Ping timeout: 480 seconds)
[11:54] * stass (stas@ssh.deglitch.com) Quit (Ping timeout: 480 seconds)
[12:29] <chaos_> gregaf, i've written a wiki page about connecting to the performance counters - http://ceph.newdream.net/wiki/Perfomance_counters, i've placed it under the "Administration" section, i hope that's right
[12:31] <chaos_> maybe someone could add it to http://ceph.newdream.net/docs/latest/dev/logs/
[12:32] <chaos_> if it's useful ;)
[12:55] * tnt__ (~tnt@212-166-48-236.win.be) has joined #ceph
[12:57] <tnt__> Hi.
[12:57] <tnt__> Is it possible to have multiple "fs" hosted in one cluster (sharing disks dynamically) but having different replication level ?
[13:05] <chaos_> it's defined by the osd map
[13:05] <chaos_> i think
[13:05] <chaos_> erm.. crush map
[13:07] <tnt__> Yes, for a "pool" (and there is also a page about setting replication level per pool). But what I haven't figured out yet is how a "pool" maps to something I can mount.
[13:25] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[13:26] <psomas> How can I debug an rbd/rados cmd?
[13:33] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:34] <todin> psomas: which one?
[13:35] <todin> you could do something like this rbd create testmsg --size 10240 --log-to-stderr --debug-ms 1
[13:38] <psomas> i didn't use --log-to-stderr
[13:39] <psomas> without that, i wasn't getting any output
[13:39] <todin> psomas: ok
[13:40] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[13:40] <psomas> thanks
[13:40] <todin> np
[13:41] <tnt__> Apparently you can't create several volumes in a single ceph cluster from what I gather.
[13:41] <tnt__> But can you map 'directories' inside the cluster to several pools?
[13:50] <todin> tnt__: I think you can do it, if you create different pools for the fs. that could work
[13:51] <todin> normally you have three pools, I think you could create more, and use each one for one fs
[13:51] <psomas> if i don't use log-to-stderr, does it log the msgs somewhere?
[13:51] <tnt__> todin: I see how to create pools, but not how to use them for the fs.
[13:51] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[13:54] <todin> tnt__: me neither, I used it for rbd images, but I cannot find it in the cephfs
[13:54] <todin> the pool name is not a mount option
[13:55] <tnt__> todin: Yes, I don't think you can have several independent volumes. But I think I read you could define what files would be stored on what pool.
[13:59] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:00] <tnt__> Another question: Is ssh from server to server an absolute requirement? (in our prod environment, direct ssh as root is not authorized and machines cannot even ssh from one to another)
[14:03] <tnt__> gregaf: ping
[14:05] <tnt__> Ah found it ! Apparently you can use set_layout on the directory to change which pool is gonna be used !
[14:09] <NaioN> tnt__: as far as i know at the moment it isn't possible to use another pool as a fs
[14:10] <NaioN> the mds always uses the data pool
[14:10] <NaioN> but i also heard that they are working on it to have the mds use different pools
[14:10] <NaioN> and then you should be able to have more mountable fs'es on the same cluster
[14:10] <tnt__> NaioN: Apparently you can. It's not a separate fs and will appear as a single one, but the files can be made to use another pool.
[14:11] <NaioN> tnt__: yeah that's right
[14:11] <NaioN> sorry, I thought you meant multiple mountable fs'es
[14:11] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[14:12] <tnt__> Well, I was asking for any way I could have some data replicated 4 times and some only 2 times, e.g. And per-directory is fine.
[14:12] <NaioN> tnt__: but what you want you can do with the crushmap if i'm right
[14:12] <NaioN> but that was on the filename
[14:13] <tnt__> NaioN: with the crush map I think I can define how much and how each pool is distributed, but not how files map to each pool AFAICT.
[14:13] <tnt__> so I need both the crushmap to set my replication parameters and where to put the data and the set_layout to select which pool to use for which subdir.
[14:21] <NaioN> tnt__: I thought somebody on this channel did map filenames to different pools, but I can't find it anymore
[14:23] <tnt__> NaioN: I found gregaf talking about it.
[14:23] <tnt__> Apparently using cephfs set_layout -p pool_id directory/
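Pieced together, the per-directory replication recipe tnt__ is after looks roughly like this. The pool name, pool id, and mount point are hypothetical, and the invocations are from the 0.3x-era tools, so verify them against your version's man pages:

```
# create a pool for the less important data
rados mkpool thumbs

# lower that pool's replication level (replication is per pool)
ceph osd pool set thumbs size 2

# look up the pool id, then point the directory's layout at it
ceph osd dump -o -
cephfs /mnt/ceph/thumbnails set_layout -p <pool_id>
```

Note that a directory layout should only affect files created after the change; existing files keep their old layout.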
[14:24] <NaioN> tnt__: http://ceph.newdream.net/docs/latest/man/8/cephfs/#cmdoption-cephfs-c
[14:25] <tnt__> yes, that's the page I was on :)
[14:26] <NaioN> it looks like you could control it with cephfs
[14:27] <NaioN> I don't have a running cluster at the moment so I can't test
[14:28] <tnt__> Well, I'll try to build a small test one now. And then hopefully replace our lustre setup with ceph
[15:08] * adsllc (~davel@cblmdm72-240-119-60.buckeyecom.net) has joined #ceph
[15:12] <adsllc> Howdy! I'm brand new to ceph and trying to get a test system running... I think that I've got everything figured out except mounting it, but I'm not able to figure out the auth. My mount command fails with "Operation not permitted" and dmesg logs "libceph: ceph: Mount failed due to key not found: client.admin", but "ceph auth list" does show a client.admin. Any ideas what I'm missing?
[15:23] * grape (~grape@c-76-17-80-143.hsd1.ga.comcast.net) has joined #ceph
[15:57] * slang (~slang@chml01.drwholdings.com) has joined #ceph
[16:41] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[16:49] * pserik (~Serge@eduroam-61-104.uni-paderborn.de) has joined #ceph
[16:53] <pserik> hello @all, I have a question about the ceph configuration file. the config file has a section [osd] which is general for all OSDs, and [osd.X] sections for specific OSDs. E.g. in my case I have only [osd.1]. What exactly happens if I remove the entry "ext4 devs = /dev/sdx"?
[16:58] <pserik> I cannot understand the difference between "osd data = /srv/ceph/osd$id" in [osd] and "ext4 devs = /dev/sdx" in [osd.0]
[16:59] * adjohn (~adjohn@50-0-164-220.dsl.dynamic.sonic.net) has joined #ceph
[17:03] <NaioN> the first osd has to be 0
[17:03] <pserik> yeah… it is, it's just a typo :)
[17:03] <NaioN> k
[17:04] <NaioN> the osd data points to the dir on the osd where the datastore resides
[17:04] <NaioN> this doesn't have to be a new mounted fs
[17:05] <pserik> ok. also it can be a local dir… right?
[17:06] <NaioN> with the devs option ceph manages the mounting
[17:06] <NaioN> yes
[17:06] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:07] <NaioN> with devs ceph manages the mounting and creation (mkcephfs)
[17:08] <NaioN> be aware that it cleans the dir
[17:09] <pserik> with "devs" you mean the devs entries of specific OSDs?
[17:10] <NaioN> yes
[17:13] <NaioN> so the osd data points to the datastore on the osd
[17:13] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:14] <NaioN> and eg mkcephfs uses devs to format those devices with the fs and mount them under the data dir
[17:15] <pserik> ok, one more, if the "data" entry in [osd] is "/opt/test" and the [osd.0] entry has only the ip address of the osd host, which device on the osd host will be mounted?
[17:15] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Ping timeout: 480 seconds)
[17:15] <NaioN> if you don't use devs it's your own responsibility to have the right fs mounted under the data dir (or a subdir)
[17:15] <NaioN> with only data ceph doesn't mount anything
[17:16] <NaioN> it assumes a correct fs under the data dir
[17:16] <pserik> ok, I think I got it. thanks!
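NaioN's distinction, sketched as a ceph.conf fragment (hostname and device path hypothetical): `osd data` alone means "a suitable filesystem is already mounted here, use it", while adding a devs entry lets mkcephfs format the device and handle mounting it at the data dir itself:

```
[osd]
        ; where each osd keeps its datastore ($id is expanded per daemon)
        osd data = /srv/ceph/osd$id

[osd.0]
        host = alpha
        ; optional: mkcephfs formats this device and mounts it at osd data
        ; (beware: it cleans the device, as noted above)
        ext4 devs = /dev/sdb1
```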
[17:16] <NaioN> k
[17:19] <pserik> :q
[17:20] * ghaskins_ (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) Quit (Read error: Connection reset by peer)
[17:20] * tnt__ (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:21] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[17:21] <grape> NaioN: I was asking many of the same questions yesterday. You may want to check http://irclogs.ceph.widodh.nl/index.php?date=2011-11-15
[17:21] <grape> NaioN: sorry, i got the names mixed up
[17:22] <grape> pserik: I was asking many of the same questions yesterday. You may want to check http://irclogs.ceph.widodh.nl/index.php?date=2011-11-15
[17:23] <NaioN> :)
[17:24] <grape> NaioN: You might also want to throw in a couple of these http://ceph.newdream.net/wiki/Debugging
[17:26] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[17:28] <pserik> grape: thanks for the link, some time ago the log server was broken so I didn't try searching today
[17:31] <grape> pserik: I'm trying to document my installation as I go along. Hopefully I will have something to show today, along with some sample config files.
[17:34] <pserik> grape: nice, I think I am not here in the evening but if you paste the link here I will be able to find it
[17:36] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:36] <pserik> it would be nice to know if I've configured it right on our servers
[17:37] * tnt__ (~tnt@22.185-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[17:39] <grape> pserik: I always found that a nice, fully documented config file was handy.
[17:44] * adjohn (~adjohn@50-0-164-220.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[18:01] * tnt__ (~tnt@22.185-67-87.adsl-dyn.isp.belgacom.be) Quit (Read error: Connection reset by peer)
[18:05] * pserik (~Serge@eduroam-61-104.uni-paderborn.de) has left #ceph
[18:24] * tnt_ (~tnt@ has joined #ceph
[18:37] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[18:41] <Tv> tnt_: replication level is per pool, a subdirectory of cephfs can be told to use a different pool, and clients can mount subdirectories of the filesystem directly
[18:42] <Tv> tnt_: direct ssh between hosts in cluster is not required, it's just a convenience feature
[18:43] <Tv> adsllc: the client side is not finding the key; ceph auth list just says that your monitors have the key
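For a mount failure like adsllc's, the kernel client needs the secret handed to it at mount time. A sketch, with the monitor address, keyring path, and key value all hypothetical (check the ceph-authtool and mount.ceph man pages for your version):

```
# on the client: extract the client.admin secret from the keyring
ceph-authtool /etc/ceph/keyring.bin -n client.admin --print-key

# pass it to mount (or use secretfile= to keep it out of ps output)
mount -t ceph 10.0.0.1:6789:/ /mnt/ceph -o name=admin,secret=<key>
```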
[18:56] * bchrisman (~Adium@ has joined #ceph
[18:57] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:04] * tnt_ (~tnt@ Quit (Read error: Operation timed out)
[19:07] <gregaf> chaos_: looks good to me, although I don't know any ruby so I can't check that ;)
[19:08] <gregaf> about your latency questions, I don't know — we've got some new hardware coming in so we can start collecting this kind of data
[19:09] <gregaf> but it's not something we've looked at lately and since we usually expect actual write latencies to be hidden behind buffering the system is designed for overall bandwidth first
[19:10] * tnt_ (~tnt@45.184-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[19:11] <tnt_> Tv: thanks. Yes that's what I found and I'll try that tomorrow
[19:12] <gregaf> tnt_: you can also mount only a portion of the filesystem, so although they'll still live in one tree you can have clients that only see part of it (assuming you control the clients, that is…)
[19:22] <tnt_> gregaf: I'm mostly interested in setting different replication levels for != directories. Not per client but per 'importance'. Some of the data is not that important and can be regenerated if lost. Others are very important.
[19:29] <Tv> tnt_: how will you communicate the importance to the computer
[19:31] <tnt_> Tv: basically, I have two directories :) thumbnails/ documents/ ...
[19:31] <Tv> tnt_: how's that != directories?
[19:32] <tnt_> I want everything under thumbnails/ to have just 2 replicas. And everything under documents/ to have 4.
[19:33] <Tv> tnt_: given that, i still don't understand your line about "!= directories"
[19:35] <tnt_> I have two different directories and I want two different replication level. What don't you get about that ?
[19:35] <Tv> tnt_: i guess i'd have expected you to say "I'm mostly interested in setting different replication levels for directories."
[19:35] * adjohn (~adjohn@50-0-164-220.dsl.dynamic.sonic.net) has joined #ceph
[19:36] <Tv> tnt_: and read the "!= directories" as in you want different replication levels on a non-directory level
[19:36] <tnt_> Tv: mmm ... I meant "!=" as a shortcut for the word "different".
[19:36] <tnt_> But maybe the sentence is a bit weird ... I'm not a native english speaker ...
[19:36] <Tv> yeah, but you meant "differently named" i read "different types of objects than"
[19:37] <Tv> me neither ;)
[19:40] <tnt_> well, google shows other people using that construct, so if I'm wrong at least I'm not alone :p
[19:43] <chaos_> gregaf, what about the number of mds daemons, does it have anything to do with performance?
[19:44] <Tv> chaos_: right now we recommend just one active mds at a time
[19:44] <chaos_> i'm using one..
[19:44] <grape> in /etc/fstab, what mount options/flags should be used for a partition that will be used for block storage? I am only aware of "noatime", but saw a reference to something else in the docs.
[19:44] <Tv> chaos_: more would perform better, but trigger bugs more often
[19:44] <chaos_> and a second one at standby
[19:45] <grape> oh wow it is almost 2. Can't believe the day is going by so fast
[19:52] <grape> heh the tornado sirens are going off
[19:54] <gregaf> chaos_: MDS daemons shouldn't impact write latency; they'll impact how long MDS ops take but if you're not maxing out one MDS you're not going to gain anything by increasing them
[19:57] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit (Quit: cp)
[20:11] * jojy (~jvarghese@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[20:12] * jclendenan (~jclendena@ Quit (Read error: Connection reset by peer)
[20:15] * tjikkun (~tjikkun@82-169-255-84.ip.telfort.nl) Quit (Remote host closed the connection)
[20:19] <chaos_> gregaf, so... there is nothing i can change to improve latency
[20:20] <chaos_> do you have any plans to work on latency?
[20:21] <gregaf> not right now, but if it turns out to be a problem I'm sure it will go on the roadmap pretty quickly
[20:21] <gregaf> let me look at what those latencies are actually measuring
[20:22] <chaos_> well ;) it's a good question.. there are a few counters named 'latency'
[20:22] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) has joined #ceph
[20:22] <chaos_> i'm using plain write latency and overall operation latency
[20:25] <gregaf> so there's two write latencies, l_osd_op_w_rlat and l_osd_op_w_lat
[20:25] <chaos_> w_lat
[20:25] * sagelap (~sage@wireless-wscc-users-3028.sc11.org) has joined #ceph
[20:25] * sagelap (~sage@wireless-wscc-users-3028.sc11.org) has left #ceph
[20:25] <chaos_> i don't know what op_w_rlat measures
[20:25] <gregaf> depending upon your application you probably want the first; that's how long it takes from a write op coming in until the write is visible to readers
[20:26] <gregaf> the second is how long it takes from a write op coming in until it's been committed to disk
[20:26] <chaos_> oh ;) it should be written somewhere;p
[20:26] <chaos_> i took the counter descriptions from the code comments
[20:26] <gregaf> aren't you just using the ceph fs though?
[20:27] <chaos_> yes, by fuse driver
[20:27] <gregaf> so why does the OSD write latency matter?
[20:28] <chaos_> hmm
[20:28] <chaos_> it's only for the osd? it isn't overall latency?
[20:28] <chaos_> for whole cluster?
[20:28] <gregaf> yeah; the OSD write latency is how long an OSD op takes
[20:28] <chaos_> ...
[20:28] <gregaf> but if you're using the ceph FS that's all going to be hidden by client-side buffering anyway
[20:29] <chaos_> i'm soo dumb then;p but what about writing by one client and reading by another
[20:29] <gregaf> in that case you're going to care, yes
[20:30] <gregaf> but situations like that are pretty rare
[20:35] <chaos_> i've 11 app servers that share lots of small xml files with session data, and sometimes a session is served through a few servers.. yes it's rare, but it happens
[20:35] <chaos_> i have to ask the developers for more accurate data about switching sessions between app servers
[20:38] <chaos_> maybe i shouldn't care as you say ;)
[20:38] <gregaf> chaos_: in general though that kind of thing is write-once-read-many
[20:39] <chaos_> it's write-many here ;/
[20:40] * cp (~cp@ has joined #ceph
[20:40] <gregaf> well in that case you might care, but I'd try it out first and see if it's actually a problem or not *shrug*
[20:41] <chaos_> probably i'll do that, it won't be worse than now
[20:43] <gregaf> if it is a problem then the quickest solution is probably to get a fast journal device (ie ssd)
[20:44] <gregaf> and make sure you're testing with the right sizes
[20:47] <chaos_> we'll see, i've meeting about this next week
[20:48] <chaos_> thanks for the advice ;)
[20:48] <gregaf> np!
[20:49] <gregaf> thanks for testing :)
[20:49] <chaos_> i don't have choice;)
[20:49] <chaos_> my boss hates gluster and ceph looks more useful
[20:51] <gregaf> haha, I'm not sure whether that makes me happy or sad
[20:52] <chaos_> it should make you happy ;-) ceph with its short history is better than gluster, developed by redhat
[20:59] <grape> lol
[21:02] * grape wonders if Ceph is going to be known as the Gluster refuge.
[21:08] <chaos_> i don't even know gluster ;) when i came to my new job i got a big ticket for a ceph implementation, because gluster sucks and they don't want to have anything to do with gluster anymore
[21:10] <todin> Hi, atm I am testing osd failures and cluster resync. if I turn off one of four osds, the sync starts, but shortly before finishing it stops; that varies between 5 and 1% degraded status in different runs. how could I debug this?
[21:17] <joshd> todin: what does ceph -s say? is there any workload running on the cluster while you're testing this?
[21:18] <todin> joshd: no workload, ceph -s says it's degraded, but the sync just stops, I tried it three times now
[21:19] <joshd> are you sure it wasn't just taking a long time to rereplicate some objects?
[21:21] <todin> joshd: I am, there was no network traffic at all, the cluster was 25% degraded with 1 of 4 osds out, then it started to sync and counted down, and shortly before finishing it stopped, there were only two pgs left, all on one osd
[21:22] <todin> the sync from 25% to 5% took 15min, and after three hours still 5%, with no load at all
[21:24] <joshd> to debug further we'd need osd logs with debug osd = 25 for the osd that this occurs on - but before that, what version are you running?
[21:24] <todin> it was in 38 and in the git from today
[21:25] <joshd> ah, in that case logs would definitely help
[21:28] <todin> ok, just increased the log level on all 4 osds, and the interesting log should be from the osd where the pgs are which don't sync?
[21:28] <todin> and no increase on the mon?
[21:28] <joshd> yeah, the mon doesn't matter for this
[21:29] <joshd> the other osd logs might matter as well, actually
[21:30] <todin> ok, I will rerun it.
[21:30] <joshd> thanks. once it's reproduced, a pg dump and osd dump will help too
[21:40] * stass (stas@ssh.deglitch.com) has joined #ceph
[21:58] <wido> Is it correct that LIBRADOS_VER_MAJOR, MINOR and EXTRA are still set to 0.30.0 ?
[21:58] <wido> or just forgotten? :-)
[21:59] <joshd> wido: probably the latter, although I'm not sure it's supposed to match the ceph version
[21:59] <gregaf> wido: I don't know if the API has changed since then?
[22:01] * tom (~tom@n16h50.rev.sprintdatacenter.pl) Quit (Ping timeout: 480 seconds)
[22:05] <grape> gregaf: I'm trying to recall if it was a configuration error or simply a bug that was keeping my cluster from authenticating yesterday afternoon.
[22:06] <grape> gregaf: I think I just left it up in the air and spent the day today trying to document what I had figured out up until this point.
[22:07] <gregaf> I think the last thing I had worked out you didn't have your monitors running, but I don't know what state you were in before that
[22:08] <grape> gregaf: Indeed, nothing was running. I think because of an authentication issue.
[22:09] <grape> is it possible for $id to be anything other than a number?
[22:09] <gregaf> well the only auth issue you brought to me was that ceph tool couldn't connect to the monitors, and the monitors weren't running
[22:09] <gregaf> so maybe the monitors had their own auth issue, but that's unlikely
[22:09] <gregaf> you mean the config file id?
[22:09] <grape> yeah
[22:10] <gregaf> the OSD ids need to be numerical, but the MDS and monitors aren't
[22:10] <grape> so mon.a is not an issue
[22:12] <wido> gregaf: I'm not sure what the version should be or if the API changed since then
[22:12] <wido> but it seems to be the version convention Ceph is using
[22:12] <gregaf> yeah, me either :(
[22:12] <wido> 0.30.0 does seem like it has been forgotten since then ;)
[22:13] <gregaf> yeah, I'm sure it has, but that doesn't mean the API has changed, and if it hasn't the number should stay the same
[22:14] <wido> I see a commit from sage on September 12th adding add_conf_parse_env()
[22:14] <wido> that changed the API
[22:14] <gregaf> *sigh*
[22:15] <wido> there have been a number of API changes since 0.30.0
[22:17] <wido> I'm afk, ttyl
[22:22] <wido> oh, I forgot. Did something change on the ml? I seem to be subscribed but I'm not getting any messages
[22:22] <wido> I thought it was quiet on the ml
[22:23] <joshd> wido: vger will unsubscribe you if your address bounces - try resubscribing
[22:31] <Tv> what (if any) existing ceph.conf statements can take a list?
[22:31] <Tv> i feel this wheel must exist already
[22:32] <joshd> Tv: auth_supported
[22:32] <joshd> separated by spaces or semicolons, iirc
[22:32] <Tv> ooh get_str_list
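So a list-valued ceph.conf option looks like this, with items split on spaces or semicolons by get_str_list:

```
[global]
        ; list-valued option: space- or semicolon-separated
        auth supported = cephx none
```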
[22:32] <grape> Here is today's documentation effort - feedback is welcome. I have quite a few notes I haven't integrated yet, but it is getting closer to being something usable. http://ceph.newdream.net/wiki/User:Grape
[22:32] <Tv> that's exactly why i asked, thanks ;)
[22:39] * al (d@niel.cx) Quit (Remote host closed the connection)
[22:39] * al (d@niel.cx) has joined #ceph
[22:59] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:08] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:11] * al (d@niel.cx) Quit (Remote host closed the connection)
[23:11] * al (d@niel.cx) has joined #ceph
[23:23] <grape> once an admin key is placed in the keyring, is it the case that there is no more need for the ssh keys to manage the cluster?
[23:24] <joshd> grape: the only time you need ssh keys is when you're running mkcephfs
[23:24] * tomz (~tom@n16h50.rev.sprintdatacenter.pl) has joined #ceph
[23:25] <Tv> and even then only if you choose to use -a
[23:25] <grape> joshd: thanks. I really have had a difficult time stomaching the need for keys without passphrases.
[23:26] <grape> Tv: so is it correct to say that it is focused on transferring the config file?
[23:26] <Tv> grape: mkcephfs -a transfers ceph authentication keys, initial state data, etc
[23:28] <grape> Tv: Is the ssh method the best option you guys have found to handle this?
[23:29] <Tv> grape: right now, it's the most convenient; we're adding chef & juju for actual, proper, ops on large clusters
[23:29] <grape> Tv: that would be ideal.
[23:29] <Tv> grape: https://github.com/NewDreamNetwork/ceph-cookbooks
[23:30] <grape> Tv: thanks!
[23:30] <Tv> grape: severely limited, currently
[23:31] <Tv> grape: see the recent ceph-devel email from me
[23:31] <grape> Tv: will do
[23:32] * tomz (~tom@n16h50.rev.sprintdatacenter.pl) Quit (Ping timeout: 480 seconds)
[23:42] <Tv> grape: oh in case it's not obvious, i mean the Crowbar email.. crowbar uses chef cookbooks
[23:43] <Tv> http://www.spinics.net/lists/ceph-devel/msg04085.html
[23:44] <grape> Tv: cool, thanks
[23:48] <grape> could anyone take a look at this config file and let me know if there are any issues with it? http://ceph.newdream.net/wiki/User:Grape#ceph.conf
[23:50] <Tv> grape: that's so close to what i put in the docs that i'm not gonna find much fault in it ;)
[23:51] <grape> Tv: lol I took parts from a couple of sources. Omitted some things that were not applicable
[23:52] <joshd> grape: looks like log_to_stderr is just a bool now
[23:52] <grape> joshd: thanks
[23:56] <joshd> grape: you might add a [client] section with log_to_stderr set to true, and maybe some debugging option commented out (rados = 20, rbd = 20, monc = 10, auth = 20, ms = 1)
[23:56] <joshd> it's useful for figuring out why command line programs are failing, although those can all be specified on the command line instead
[23:57] <grape> joshd: great!
[23:57] <grape> joshd: Having them documented in the config file makes for an easy reference when there are multiple versions of docs floating around.
[23:57] <joshd> oh and debug objecter = 20 (only for clients)
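Collected into a ceph.conf fragment, joshd's suggestion would look roughly like this, with the debug lines commented out until needed:

```
[client]
        log to stderr = true
        ; uncomment when debugging command line tools:
        ;debug rados = 20
        ;debug rbd = 20
        ;debug monc = 10
        ;debug auth = 20
        ;debug ms = 1
        ;debug objecter = 20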
[23:58] * Tv . o O ( googling for std::list is tricky.. "list of STDs".. )

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.