#ceph IRC Log


IRC Log for 2012-12-07

Timestamps are in GMT/BST.

[0:00] <dmick> sometimes people put "keyring" in the global section of their ceph.conf
[0:00] <dmick> and that's bad if it doesn't include tokens (like $name, say) because there are multiple keyrings
[0:00] <dmick> (that was for Kioob)
[0:00] <Kioob> well, it's my case... like indicated in first doc
[0:01] <Kioob> so... good point :)
[0:01] <dmick> alan_: don't know; ceph is a script though so you can have a look
[0:04] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[0:05] <dmick> alan_: that looks like your shell got really confused
[0:06] <wer> hmmm. osd.010 and osd.10 appear equivalent to ceph.
[0:06] <Kioob> dmick: in the Debian Wiki http://wiki.debian.org/OpenStackCephHowto, there is « keyring = /etc/ceph/keyring.admin » in the global section... so, it's a stupid idea. right ?
[0:06] <Kioob> (it's overwritten in the [osd] section)
[0:07] <dmick> yeah, but not in mon
[0:08] <dmick> I don't think that's cool, no; I think the admin key and the mon key need to be separate
[0:09] <rweeks> wer, that's because mathematically, 010 and 10 are the same in decimal.
[0:09] <rweeks> ;)
[0:10] <wer> well that much is obvious. but is osd.010 mathematically equivalent to osd.10?
[0:11] <dmick> Kioob: otoh I have a cluster successfully running that way, so...
[0:11] <Kioob> so, now I think I really understand the problem, thanks for your patience dmick :)
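[Editor's note: the per-daemon keyring layout dmick is advocating can be sketched as a ceph.conf fragment like the one below. The paths are illustrative, not from the chat; the point is that each section gets its own keyring path, using variables such as $name so daemons don't share the admin keyring.]

```ini
; Sketch: avoid a single "keyring" in [global]; give each daemon type its own.
; $name expands to e.g. mon.a, osd.0, client.admin; $id to the bare id.
[mon]
    keyring = /var/lib/ceph/mon/ceph-$id/keyring

[osd]
    keyring = /etc/ceph/keyring.$name

[client.admin]
    keyring = /etc/ceph/keyring.admin
```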
[0:11] * chutzpah (~chutz@ has joined #ceph
[0:12] * joao (~JL@ has joined #ceph
[0:12] * ChanServ sets mode +o joao
[0:12] <rweeks> I think one of the devs will have to answer that, wer.
[0:12] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[0:12] <rweeks> I would assume so, since (I think) linux treats a leading zero that way
[0:13] <wer> You can see the magic if you look at ceph osd dump.... but it doesn't match the id's in the config.
[0:13] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[0:15] <dmick> so Kioob: I wish I could give you a definitive answer but I'm not certain. I *do* know that keeping the keyring paths separate is safe; I just don't know which daemons absolutely require it, and when the keyrings are overwritten
[0:15] <dmick> but I bet if you experiment based on that, you'll find a config that works at least
[0:15] * agh (~Adium@gw-to-666.outscale.net) Quit (Read error: Connection reset by peer)
[0:16] * agh (~Adium@gw-to-666.outscale.net) has joined #ceph
[0:16] <alan_> dmick http://tracker.newdream.net/issues/3581
[0:18] <dmick> alan_ are you referring to the fs_type problems?
[0:18] <dmick> there are more than two, which is why I ask
[0:19] <alan_> no, its not me
[0:19] <dmick> oh you mean two errors per run. ok.
[0:19] <dmick> oh OH. ok.
[0:19] <alan_> it can be fixed, but i will wait for the update ;)
[0:20] <dmick> looks like df550c9cce969b667f3b062ee3113a5493ac16ce
[0:20] <rweeks> gods I hate git sometimes.
[0:20] <rweeks> why can't that be human readable???
[0:22] <wer> are you a human rweeks ?
[0:22] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[0:22] <rweeks> wouldn't you like to know?
[0:22] <dmick> rweeks: it can't be human readable because it's intended to be globally unique for computers
[0:23] <dmick> but it's easy to translate that into a set of stuff
[0:23] <dmick> completely unambiguously
[0:23] <rweeks> I know, it just bugs me
[0:23] <alan_> dmick, readd osd's not help me, all osd's now down :(
[0:23] <dmick> it bugs me less than "the commit from X on date/time"
[0:24] <Kioob> dmick: yes, by splitting configuration it works well :)
[0:24] <dmick> alan_: what does "readd osd's" mean?
[0:24] <dmick> Kioob: great!
[0:24] * rweeks wonders if he just failed a turing test
[0:24] <alan_> dmick: remove and add failed osd's.
[0:25] <dmick> ok. I'm not sure what state you're in at the moment then I guess
[0:25] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[0:26] <alan_> goodbye for all and big thanx dmick!
[0:26] <dmick> gl alan_
[0:27] * alan_ (~alan@ctv-95-173-34-17.vinita.lt) Quit (Quit: alan_)
[0:27] * KindTwo (KindOne@h183.63.186.173.dynamic.ip.windstream.net) has joined #ceph
[0:29] * KindOne (KindOne@h162.25.131.174.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[0:29] * KindTwo is now known as KindOne
[0:31] * aliguori (~anthony@ Quit (Remote host closed the connection)
[0:36] <dmick> anyone else: that fs_type error is noted and fixed in git this morning; I was missing that it was a shell syntax error (can't have spaces in an assignment)
[0:36] <dmick> 0a137d76bd9ee924c43a42abc33f4c6c06a03d5e for rweeks :)
[0:36] * rweeks thwaps dmick with a trout
[0:37] * benner (~benner@ Quit (Read error: Connection reset by peer)
[0:37] * benner (~benner@ has joined #ceph
[0:37] <rweeks> I miss a bot in another irc channel
[0:37] <rweeks> anytime you hit someone with a fish it would auto-kick you with the message "Don't be a fishtard!"
[0:37] <dmick> must you dance while you do it?
[0:38] <rweeks> not any dance
[0:38] <rweeks> a fancy dance
[0:38] <dmick> http://www.youtube.com/watch?v=IhJQp-q1Y1s
[0:39] <rweeks> Unfortunately I left my pith helmet at home.
[0:39] <dmick> we had a chemistry teacher with a severe lisp who never tired of talking about his pith helmet
[0:40] <dmick> jr. high kids never tired of giggling at it either. (well some of them. I kind of lost interest)
[0:40] <Kioob> if I create a RBD image in a pool, is it possible to change that pool later ?
[0:43] <tnt> don't think so.
[0:43] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:44] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:44] <tnt> (well outside of creating the image in the other pool and copying it)
[0:44] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) has joined #ceph
[0:44] <Kioob> yes, of course
[0:44] <Kioob> in fact, I probably misuse pools
[0:44] <Kioob> I was creating pool per number of replica
[0:45] <Kioob> so one pool with only 1 replica, for testing purpose
[0:45] <Kioob> one pool with 2 replica, for huge data / low cost
[0:45] <Kioob> and one pool with 3 replicas, for standard use
[0:46] <Kioob> but if I can't easily move a RBD from one pool to the other, it's not a good idea
[0:47] <tnt> well different pools is the only way to get different replication level.
[0:47] <dmick> Kioob: no, it involves data copy, so not particularly easy, but doable
[0:47] <joshd> you can clone to another pool, and flatten later
[0:47] <dmick> wwwwell, that's true
[0:48] * benpol didn't know you could clone to a different pool, had assumed you couldn't
[0:48] * loicd (~loic@magenta.dachary.org) Quit ()
[0:48] <infernix> with the objsync tool i guess?
[0:48] <Kioob> if I clone to another pool, I have an instant copy ? (without replicas, of course)
[0:50] <dmick> Kioob: process is: make snapshot, which is readonly; then clone, and the clone can live in different pool
[0:51] <dmick> reads to the clone will actually read from the parent snapshot, so no data actually moves
[0:51] <dmick> until you write (at which point you get COWed data) or flatten (which basically requests a full COW)
[0:51] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[0:51] <dmick> flatten also then severs the link to the parent snap
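[Editor's note: the snapshot → clone → flatten sequence dmick and joshd describe looks roughly like the commands below. Image and pool names are hypothetical; this assumes a format-2 image, and clones must be accessed via librbd/QEMU, not the kernel module, in that release.]

```
# Hypothetical names: image "disk1" in pool "rbd", target pool "pool3".
rbd snap create rbd/disk1@migrate        # read-only snapshot of the parent
rbd snap protect rbd/disk1@migrate       # clones require a protected snapshot
rbd clone rbd/disk1@migrate pool3/disk1  # instant clone; reads fall through to parent
rbd flatten pool3/disk1                  # full copy-up, severs the link to the parent
rbd snap unprotect rbd/disk1@migrate     # after flattening, the source can be removed
rbd snap rm rbd/disk1@migrate
```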
[0:52] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[0:52] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[0:52] * Leseb_ is now known as Leseb
[0:53] <joshd> benpol: you're not the only one who assumed that. maybe that should be more explicit in the docs
[0:54] <joshd> http://ceph.com/docs/master/rbd/rbd-snapshot/#layering
[0:56] <Kioob> Ceph only supports cloning for format 2 images (i.e., created with rbd create --format 2), and is not yet supported by the kernel rbd module. So you MUST use QEMU/KVM or librbd directly to access clones in the current release.
[0:56] <dmick> correct
[0:56] <sjustlaptop> joshd gregaf: anyone want to review wip_dho_bugs?
[0:56] <Kioob> of course, I use the kernel rbd module :p
[0:56] <sjustlaptop> I got a librados bug and a dumb filestore bug
[0:57] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[0:58] <joshd> librados one looks fine, although I wonder if anything was somehow relying on that bug
[0:59] <joshd> actually, does it need to be fixed in the objecter too when it resends watches?
[1:00] <benpol> joshd: Nice docu, thanks!
[1:01] <joshd> nevermind, the same lingerop flags are reused, so it's fine
[1:06] <tnt> Kioob: Damn that sucks ... (kernel not supporting fmt 2)
[1:07] * wubo (~wubo@nat-ind-inet.jhuapl.edu) Quit ()
[1:08] * jjgalvez1 (~jjgalvez@ Quit (Quit: Leaving.)
[1:09] * roald (~Roald@ Quit (Read error: Connection reset by peer)
[1:10] <joshd> sjustlaptop: looks good to me, assuming it's tested
[1:13] <sjustlaptop> the first patch passed the filestore unit tests
[1:13] <sjustlaptop> I'll run an rgw test against the second
[1:14] <joshd> what about watch_notify_stress?
[1:15] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has left #ceph
[1:15] <sjustlaptop> ah, forgot about that
[1:19] <gregaf> sjustlaptop: haven't looked at it, but did you remember to change the librados version (if appropriate), and have you run it through the other librados user tests to make sure nobody relies on whatever behavior you considered buggy? :)
[1:19] <sjustlaptop> it doesn't change any user visible behabior
[1:19] <sjustlaptop> *behavior
[1:19] <sjustlaptop> except that it won't crash
[1:20] <gregaf> okay then
[1:20] <sjustlaptop> the old version could crash the replicas
[1:20] <sjustlaptop> this version won't
[1:24] <Kioob> which cephx privileges should a RBD client have, to be able to read/write an image ?
[1:24] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[1:26] <Kioob> on the Debian example, there is : --cap mon 'allow r' --cap osd 'allow rwx pool=XXX'
[1:26] <Kioob> so, only R on mon, and RWX on osd ?
[1:27] <joshd> yup
[1:27] <Kioob> and the X privilege is for ?
[1:27] <Kioob> « Gives the user the capability to call class methods (i.e., both read and write). »
[1:28] <joshd> osd class methods to interact with the rbd header
[1:28] <Kioob> mmm...
[1:28] <Kioob> ok...
[1:28] <Kioob> thanks :)
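[Editor's note: the capability set discussed above, as it would appear in a client keyring file. The client name and pool are hypothetical and the key value is elided; the x capability is what allows the OSD class-method calls librbd makes against the image header.]

```ini
[client.rbd-user]
    key = <base64 key>
    caps mon = "allow r"                ; read cluster maps from the monitors
    caps osd = "allow rwx pool=mypool"  ; rw data, x = call class methods (rbd header ops)
```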
[1:30] <yasu`> Hi I'm running Ceph-FS on 32-bit machine just fine
[1:30] <yasu`> when I mount it from 64-bit machine, ranlib on the ceph space failed.
[1:31] <yasu`> (I was compiling and installing on that ceph space, from 64-bit client)
[1:32] <yasu`> Is this expected ?
[1:33] <yasu`> the error was something like: ranlib: /mnt/ceph/libXXX.a: Error reading libXXX-YYY.o: File truncated
[1:34] <dmick> was that .a file larger than 2GB, perhaps?
[1:34] * jlogan1 (~Thunderbi@2600:c00:3010:1:7db5:bf2b:27d1:c794) Quit (Ping timeout: 480 seconds)
[1:34] <yasu`> It seems 2.0M
[1:35] <yasu`> nm on the file is just fine.
[1:35] <dmick> I would say that's not expected, no. is it reproducible
[1:35] <dmick> ?
[1:36] <yasu`> Yes, reproducible
[1:36] <dmick> could you put details in an issue at http://tracker.ceph.com?
[1:36] <yasu`> in fact, I was compiling ceph-0.55 from 64-bit linux on 32-bit ceph space :)
[1:36] <dmick> cool test :)
[1:37] <yasu`> tracker.newdream.net ?
[1:38] <yasu`> from my network, no DNS found for tracker.ceph.com ...
[1:40] <yasu`> It compiled just fine, but installing (make install) under /mnt/ceph/ stopped in ranlib.
[1:41] <yasu`> I'm not familiar with filing issues ... a little help please ?
[1:41] <yasu`> should I file it as a ... Bug ?
[1:42] <dmick> sorry
[1:42] <Kioob> 60,2 MB/s for writing 1GB with dd (from /dev/zero). It's not really fast
[1:42] <dmick> helps if I give you the right URL too: http://tracker.newdream.net/projects/ceph
[1:42] <Kioob> but it works
[1:43] <dmick> and yes, I'd call it a bug
[1:43] <yasu`> ah should I sign in or something in order to file an issue ?
[1:44] <dmick> you may have to create an account, yes. If you don't want to go through that I can do it for you but you'll have a better description, and can then see status updates
[1:45] <yasu`> I'll create an account. if I do that then a button like "new issue" comes up ?
[1:45] <dmick> probably. there's a tab in my view
[1:47] <yasu`> hmm, registering seemed to be okay but I cannot login with the password I set
[1:47] <Kioob> so, good night :)
[1:47] <Kioob> and thanks a lot for the help
[1:48] <dmick> yasu`: you need to confirm from the email it sent you
[1:48] <dmick> Kioob: gnite
[1:48] <yasu`> okay :)
[1:49] <tore_> yikes my client is hosed lol
[1:49] <tore_> man I hate ilo
[1:49] * tnt (~tnt@207.171-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:50] * benpol (~benp@garage.reed.edu) has left #ceph
[1:50] <yasu`> I logged in. which tab ?
[1:51] <dmick> New issue
[1:51] <dmick> in the bar starting with Overview
[1:51] <yasu`> got it
[1:53] * BManojlovic (~steki@242-174-222-85.adsl.verat.net) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:55] * LeaChim (~LeaChim@b0fafb7d.bb.sky.com) has joined #ceph
[1:57] <dmick> anyone besides Sage or possibly Peter willing to look into http://tracker.newdream.net/issues/3459 ?
[2:00] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:02] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Quit: Leaving.)
[2:02] <wer> sudo ceph auth get-or-create client.admin mds 'allow' osd 'allow *' mon 'allow *' > /etc/ceph/keyring just stalls
[2:03] <wer> it appears mon is ignored by init.d script. Is that normal?
[2:09] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[2:17] <wer> All my osd on two nodes are listed in a ceph osd dump, however from the second node, I am unable to do a ceph osd dump. This is a freshly made 2 node system that mkcephfs just finished making. I had to disable cephx and restart the mon in order to add my client.admin key, however the second node will not add its key due to "unable to authenticate as client.admin "
[2:18] <wer> node1 is v0.54 and node 2 is v0.55. I have also tried copying over the keys from the first node to the second node and no luck.
[2:18] <wer> So how can I check in keys when authentication will not let me?
[2:32] <joshd> wer: 0.55 turns on authn by default. you'd need to explicitly disable it (i.e. auth supported = none) on the 0.55 node, or just to register keys, 'ceph --auth-supported none ...'
[2:32] <wer> thanks joshd . Yeah I spotted that.
[2:33] <wer> auth cluster required = none
[2:33] <wer> auth service required = none
[2:33] <wer> auth client required = none
[2:33] <wer> That disables it since 0.51 I think. It is working now. And I guess I will check-in another client.admin key.... but it is the same key. So I just don't get it.
[2:35] <joshd> wer: I don't know the full context of your setup, but maybe it's not looking in the same location for the keyring
[2:36] <wer> :) yeah. I don't know. I don't specify. but on both nodes I have an identical setup.... the same key is in the same place.
[2:36] <wer> ceph auth get-or-create client.admin mds 'allow' osd 'allow *' mon 'allow *' > /etc/ceph/keyring
[2:36] <wer> I ran that
[2:37] <wer> and the entry in ceph auth list is still the same under client.admin... and the key I had locally is identical....
[2:38] <wer> So basically, I don't get it. And to add the client.admin for the first node I had to disable cephx. So there is some sort of chicken/egg thing happening, or the instructions are out of date for how to add/create client admins.....
[2:43] <joshd> could you pastebin your ceph.conf from both nodes?
[2:43] <wer> do I just need to restart the my one and only mon when enabling/disabling cepx, or do all the osd's in all the nodes need to be restarted?
[2:43] <wer> yeah
[2:43] <joshd> ah, yes
[2:43] <joshd> you need to restart the daemons to enable/disable authentication
[2:43] <joshd> they just read the config file on startup
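[Editor's note: the 0.55-era auth options joshd refers to, as a ceph.conf fragment. These replace the older "auth supported" style; since daemons only read the config at startup, every mon and osd has to be restarted after changing them.]

```ini
[global]
    ; 0.55+ defaults to cephx; to disable authentication explicitly:
    auth cluster required = none
    auth service required = none
    auth client required = none
```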
[2:45] <wer> well I was using the old style enablement in the ceph.conf to configure cephx... lemmie paste real quick, I have everything shut down.
[2:49] <wer> joshd: http://pastebin.com/M1X2z5Ch
[2:49] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:49] <wer> It's the same configs on both nodes... basically 24 osd's on one and 24 on the other.
[2:49] <wer> The mon lives on the first node for now.
[2:50] <wer> and so does radosgw.
[2:50] <tore_> anyone seeing errors like:
[2:50] <tore_> 2012-12-07 10:47:48.085324 7f1ed477d700 0 -- >> pipe(0x1472d80 sd=16 :0 pgs=0 cs=0 l=1).fault
[2:50] <tore_> in their mds and osd logs?
[2:51] * LeaChim (~LeaChim@b0fafb7d.bb.sky.com) Quit (Remote host closed the connection)
[2:51] <tore_> even attempting to restart the cluster just streams these error
[2:52] <via> tore_: i see lots of it, even upon output of things like ceph -s
[2:53] <tore_> yeah yesterday i fired up 0.55 and ran iozone against the cephfs mount
[2:53] <tore_> after about 20 min it locked up and it's proving very difficult to recover hehe
[2:54] <via> joshd: here's a monitor crashing: https://pastee.org/phr3v
[2:55] <joshd> tore_: that message indicates a tcp connection failure
[2:55] <tore_> So in 0.55 is fs_type no longer used?
[2:57] <tore_> mkcephfs chokes on fs_type at ln 328
[2:57] <via> tore_: remove the space between fs_type and =
[2:58] <tore_> ah good catch. is there already a commit for this?
[2:58] <via> they know about it at least
[2:59] <tore_> ah good saves me the time of submitting a bug report
[3:01] <tore_> need to reboot these nodes. they aren't crashed, but they are all reported as not responding
[3:01] <wer> well joshd .... like magic it is working now. Makes no sense to me. Perhaps it was the incorrect use of the old cephx statement....
[3:02] * chutzpah (~chutz@ Quit (Quit: Leaving)
[3:04] <joshd> via: that looks like http://www.tracker.newdream.net/issues/3495
[3:05] <joshd> wer: if you didn't restart them before, that's probably what did it
[3:06] <wer> ok... now osd's are just up down....... the node running the mon reports in ceph.log the correct 130TB I expect to see.... perhaps it will just take a while to settle?
[3:06] * dmick (~dmick@2607:f298:a:607:c8bf:e8a1:2154:e15f) Quit (Quit: Leaving.)
[3:09] <tore_> hmmm... rebooted all the nodes and now it's all back up
[3:10] <wer> is this bad ? osd.24 544 from dead osd.29, dropping, sharing map
[3:11] <wer> I am getting tons of debug saying one osd reported another dead.... and they keep dropping out....
[3:12] <wer> load average: 153.20, 131.82, 75.13
[3:12] <wer> :)
[3:13] <wer> I just never get used to those... it's like inflation.
[3:14] <joshd> something's seriously wrong if load is that high
[3:14] <wer> yeah... things keep getting reported as failed in the logs.... and the network isn't doing much.....
[3:15] <wer> osd.42 failed (5 reports from 2 peers after 100.217397 >= grace 95.738884)
[3:15] <joshd> anything in syslog? could be broken filesystems under the osds
[3:15] <wer> no it is pretty quiet in there....
[3:16] <wer> yeah something is seriously wrong though :)
[3:16] <joshd> what's using all the cpu or i/o?
[3:17] <wer> 5648 pgs down has been slowly incrementing..... all the osd's I think.
[3:17] <wer> They are each averaging 79.73 Blk_wrtn/s
[3:18] <joshd> but no network traffic?
[3:18] <wer> not really. like 100k.
[3:18] <wer> I am still waiting for the graphs.... installing nload too
[3:19] <wer> yeah the netowork is doing almost nothing :)
[3:19] <tore_> yeay a new error msg
[3:19] <tore_> 2012-12-07 11:16:57.475971 7fdff7b6d700 0 -- >> pipe(0x5ebd000 sd=29 :43029 pgs=0 cs=0 l=0).connect claims to be not - wrong node!
[3:20] <wer> osd.3.pid is going nuts though.....
[3:20] <joshd> tore_: there's a bug that causes that which should be fixed by 0.55
[3:20] <tore_> ah good show
[3:21] <joshd> wer: osd.3.pid? you mean it's crashing and restarting constantly by upstart or something?
[3:21] <wer> I don't know. It was eating 1300% cpu so I just restarted it.... checking the logs....
[3:26] <wer> osdmap e771: 48 osds: 20 up, 34 in
[3:26] <wer> I don't know what that osd.3 was doing.... or why it was using so much cpu....
[3:27] <wer> osdmap e781: 48 osds: 22 up, 34 in
[3:28] * jlogan1 (~Thunderbi@2600:c00:3010:1:7db5:bf2b:27d1:c794) has joined #ceph
[3:28] <wer> osdmap e790: 48 osds: 24 up, 34 in..... Slowly getting there?
[3:28] <via> joshd: i'll try that patch in the bug ticket
[3:29] <wer> I don't know why I am missing 14 osd's though..... There are a fair amount of [WRN] slow request 580.990385 seconds old, re in ceph.log
[3:29] <wer> oh wait a minute
[3:30] <joshd> wer: a number of them might have killed themselves since they were too slow
[3:30] <joshd> due to osd.3's craziness
[3:30] <wer> I might have some firewall craziness too. causing the osd's not to communicate..... they talk directly right?
[3:31] <wer> shit man. Yeah now I see 29MBits...... of traffic.
[3:31] <wer> I'm stupid.
[3:31] <wer> I had opened it up for mon..... but didn't think about the osd's
[3:32] <joshd> if you figure out what firewall stuff made osd.3 go crazy, it'd be good to know
[3:32] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[3:32] <joshd> cat /proc/$pid/stack if it happens again
[3:33] <wer> will do. If I catch one going crazy again I will be sure to grab that for you.
[3:34] <joshd> thanks
[3:34] <joshd> a short strace might help too
[3:34] <joshd> it's likely stuck in a loop somewhere it shouldn't have been
[3:35] <wer> load on that box is back down to 53 and no one osd looks angry.... but they are all kind of angry at this point I think.
[3:35] <joshd> via: that'd be great, I know joao hasn't been able to reproduce it himself
[3:37] <joshd> wer: yeah, they should sort themselves out, but in the mean time you might want to make them all stay in with 'ceph osd set noout', and 'ceph osd unset noout' when it's back to normal
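[Editor's note: the noout flag joshd suggests is a cluster-wide flag, set from any node with admin access. A minimal sketch of the sequence:]

```
ceph osd set noout      # down OSDs are not marked "out", so no re-replication starts
# ... wait for the OSDs to settle and rejoin ...
ceph osd unset noout    # restore the normal down -> out behaviour
```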
[3:38] <tore_> angry, like my wife angry?
[3:38] <wer> tore_: no. Because this was your fault.
[3:38] <tore_> haha
[3:39] <infernix> how do i determine OSD journal size?
[3:40] <tore_> i don't think you have to in 0.55
[3:40] <tore_> I had to comment that out of my conf to get it to start
[3:40] <wer> tore_: I didn't ?!
[3:40] <tore_> [osd]
[3:40] <tore_> osd data = /var/lib/ceph/osd/$name
[3:40] <tore_> osd journal = /var/lib/ceph/osd/$name/journal
[3:40] <tore_> #osd journal size = 1000 ; journal size, in megabytes
[3:40] <tore_> keyring = /etc/ceph/keyring.$name
[3:40] <tore_> #debug osd = 1
[3:40] <tore_> #debug filestore = 1
[3:41] <tore_> yeah upgraded from 48 - 55
[3:41] <tore_> and it was choking on this because I had it defined
[3:41] <tore_> after commenting it out, np
[3:41] <joshd> it defaults to the whole block device, or 5GB if you're using a file
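[Editor's note: joshd's defaults, expressed as the equivalent ceph.conf settings. The size is in megabytes and only applies when the journal path is a regular file; a block-device journal uses the whole device. Values below just mirror what was said.]

```ini
[osd]
    osd journal = /var/lib/ceph/osd/$name/journal
    ; only used when the journal is a file (~5 GB default);
    ; a block-device journal uses the entire device
    osd journal size = 5120
```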
[3:41] <wer> Shoot I have filestore xattr use omap = true and am using xfs too
[3:42] <joshd> wer: that shouldn't be too much of an issue
[3:42] <tore_> latest mainline with btrfs here
[3:42] <infernix> ok but if you use a file for logging it can get fragmented
[3:43] <infernix> whereas if i partition a spindle disk, i know that the first part of the disk is fastest
[3:43] <wer> ok. should I go ahead and ceph osd set noout now joshd? and ceph osd unset noout when it is done being crazy?
[3:43] <infernix> so wouldn't it make sense to put it in a partition if you're not using ssd?
[3:43] <infernix> s/logging/journaling
[3:44] <joshd> wer: yeah, that'll stop too much unnecessary data shuffling
[3:44] <wer> ty!
[3:44] <joshd> infernix: it avoids a little fs overhead, but if it's not a separate drive, it's not a big deal
[3:45] <joshd> especially for your backup use case
[3:46] <infernix> i'm looking at putting 6 DL380s together
[3:46] <infernix> for a proof of concept
[3:46] <infernix> before shelling out the $30k production setup
[3:47] <infernix> have 4 racked, need to find some more. might get 7 total
[3:50] <wer> joshd: ok, things are slowly reducing in numbers..... I think I will let this sit. I plan on blowing it up again tomorrow as I add two new nodes :(
[3:51] <wer> Thanks for all the help. it was much appreciated.
[3:51] <joshd> wer: doubling capacity all at once would cause a lot of data movement. you can put them in at weight 0, and ramp that up slowly
[3:52] <joshd> although with 0.55 I think the recovery throttling might be enough
[3:52] <wer> yeah. I have done that with one osd... but I am not sure how to do that with 24 at a time..
[3:52] <wer> I guess as long as I don't start them before I weight them?
[3:52] <joshd> yeah
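[Editor's note: the weight-0 expansion joshd describes, sketched for one hypothetical new OSD. Repeat per OSD; the id, bucket names, and weight steps are illustrative, and each reweight should be left to settle before the next.]

```
# Place the new OSD in the crush map at weight 0, so no data lands on it yet.
ceph osd crush set osd.48 0 root=default host=node3
service ceph start osd.48
# Ramp the weight up gradually, waiting for recovery to finish in between.
ceph osd crush reweight osd.48 0.5
ceph osd crush reweight osd.48 1.0
```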
[3:53] <wer> :) well we will see how it goes tomorrow. I completely messed up my whole thing today.... and punted to a fresh clean mkcephfs across both nodes. I don't want to do that tomorrow... but I seem to run into these stupid cephx issues every time.
[3:54] <wer> it is really annoying!!
[3:54] <joshd> good luck
[3:55] <joshd> i've got to run as well
[3:55] <infernix> on that note, with what data rates can a cluster expand?
[3:55] <wer> heh thanks! ok I am out.
[3:55] <infernix> one osd? combined osd bandwidth of a node? some other bottleneck?
[3:56] * wer is now known as wer_gone
[3:57] <joshd> complicated to answer, but basically recovery is parallel (it's done at the pg level) read up on placement groups if you haven't yet
[3:58] * houkouonchi-work (~linux@ Quit (Read error: Connection reset by peer)
[3:59] * houkouonchi-work (~linux@ has joined #ceph
[4:07] <mikedawson> Re-addressed my development nodes today. OSDs handled the address change, but MONs will not start. Fail referencing old IP address
[4:08] <mikedawson> When trying to fix the issue, should I update the file /var/lib/ceph/mon/ceph-a/monmap/{highest integer} ?
[4:29] <wer_gone> hmm. despite the docs saying you can do rolling upgrades.... I don't recommend it. I think a large part of this thing never settling down was the version mismatch between v0.54 and v0.55. As soon as I upgraded, things are heading in the right direction towards OK. And OSD's are staying up.
[4:53] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[5:07] * madkiss (~madkiss@ Quit (Ping timeout: 480 seconds)
[5:13] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:17] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.89 [Firefox 17.0.1/20121128204232])
[5:18] * yasu` (~yasu`@dhcp-59-168.cse.ucsc.edu) Quit (Remote host closed the connection)
[5:21] * Cube1 (~Cube@ Quit (Ping timeout: 480 seconds)
[5:22] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:26] * deepsa (~deepsa@ has joined #ceph
[5:30] * jlogan1 (~Thunderbi@2600:c00:3010:1:7db5:bf2b:27d1:c794) Quit (Ping timeout: 480 seconds)
[5:48] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[5:55] * jlogan1 (~Thunderbi@ has joined #ceph
[6:17] <via> joshd: i'll have you know, no monitor crashes since the patch was applied
[6:17] <via> its been under cephfs load for about 15 minutes and still cranking
[6:31] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:45] * madkiss (~madkiss@ has joined #ceph
[6:50] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[6:52] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[7:05] * drokita (~drokita@24-107-180-86.dhcp.stls.mo.charter.com) Quit (Quit: Leaving.)
[7:07] * yoshi (~yoshi@p4105-ipngn4301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[7:33] * madkiss (~madkiss@ Quit (Quit: Leaving.)
[7:52] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[7:58] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[8:02] * jlogan1 (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[8:32] * low (~low@ has joined #ceph
[8:36] * loicd (~loic@ has joined #ceph
[8:42] * loicd (~loic@ Quit (Quit: Leaving.)
[8:51] * tnt (~tnt@207.171-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:53] * madkiss (~madkiss@ has joined #ceph
[9:01] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:09] * nosebleedkt (~kostas@ has joined #ceph
[9:09] <nosebleedkt> goodmorning all !
[9:11] * loicd (~loic@ has joined #ceph
[9:14] <nosebleedkt> what is the default number of PGs in a new pool ?
[9:17] <Kiooby> monclient(hunting): failed to open keyring: (2) No such file or directory <=== it will really help to put the filename, in that kind of error
[9:22] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[9:26] * jlogan (~Thunderbi@2600:c00:3010:1:6963:f614:3534:e6d8) has joined #ceph
[9:30] * ctrl (~Nrg3tik@ Quit (Read error: Connection reset by peer)
[9:31] <tnt> nosebleedkt: 8 ... which is really not enough
[9:32] <nosebleedkt> tnt, i got 5 OSDs so I gave it 250 PGs
[9:32] <tnt> sounds goods
[9:32] <tnt> -s
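[Editor's note: nosebleedkt's 250 comes from the usual rule of thumb of that era, roughly 100 PGs per OSD divided by the replica count, often rounded up to a power of two. A small sketch of the arithmetic; the OSD and replica counts are just the numbers from the chat.]

```shell
#!/bin/sh
# Rule-of-thumb PG count: (100 * osds) / replicas, rounded up to a power of two.
osds=5
replicas=2
target=$(( 100 * osds / replicas ))   # 250 for the numbers above
pgs=1
while [ "$pgs" -lt "$target" ]; do
    pgs=$(( pgs * 2 ))
done
echo "$pgs"   # prints 256
```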
[9:39] * agh (~Adium@gw-to-666.outscale.net) Quit (Read error: Connection reset by peer)
[9:39] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:40] * BManojlovic (~steki@ has joined #ceph
[9:43] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit ()
[9:44] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:49] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[9:51] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[10:10] * ScOut3R (~ScOut3R@ has joined #ceph
[10:12] <Kiooby> after an upgrade from v48 to v55, OSD doesn't start, because of auth problem... (cephx was already enabled on v48)
[10:12] <Kiooby> something was changed between both versions ?
[10:15] * The_Bishop (~bishop@2001:470:50b6:0:21b0:97ac:b32b:7f5e) Quit (Ping timeout: 480 seconds)
[10:19] * fc (~fc@home.ploup.net) has joined #ceph
[10:19] * LeaChim (~LeaChim@b0fafb7d.bb.sky.com) has joined #ceph
[10:25] * The_Bishop (~bishop@2001:470:50b6:0:c4ec:2f12:2ccd:748a) has joined #ceph
[10:44] <nosebleedkt> tnt, i created a new pool
[10:44] <nosebleedkt> named mypool
[10:44] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[10:44] <nosebleedkt> then i want to map and rdb device on the new pool
[10:45] <nosebleedkt> root@cephfs:~# rbd create foobar --size 256
[10:45] <nosebleedkt> root@cephfs:~# rbd map foobar --pool mypool
[10:45] <nosebleedkt> rbd: add failed: (2) No such file or directory
[10:45] <tnt> you forgot the --pool argument during the create ...
[10:47] * gaveen (~gaveen@ has joined #ceph
[10:49] <nosebleedkt> ah
[10:49] <nosebleedkt> wait
[10:52] <nosebleedkt> tnt, thanks
[10:59] * maxiz (~pfliu@ Quit (Ping timeout: 480 seconds)
[11:01] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[11:06] * roald (~Roald@ has joined #ceph
[11:13] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[11:18] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[11:18] * yoshi (~yoshi@p4105-ipngn4301marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:18] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit ()
[11:20] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[11:26] * jlogan1 (~Thunderbi@ has joined #ceph
[11:29] * jlogan (~Thunderbi@2600:c00:3010:1:6963:f614:3534:e6d8) Quit (Ping timeout: 480 seconds)
[11:29] * madkiss (~madkiss@ Quit (Read error: Connection reset by peer)
[11:32] * madkiss (~madkiss@ has joined #ceph
[11:40] <nosebleedkt> tnt,
[11:40] <nosebleedkt> root@cephfs:~# ceph osd pool mksnap mypool mysnap071212
[11:40] <nosebleedkt> created pool mypool snap mysnap071212
[11:40] <nosebleedkt> now how can i export that image in a file?
[11:41] * madkiss (~madkiss@ Quit (Quit: Leaving.)
[11:45] <tnt> map the snap and dd it to a file
[11:46] <nosebleedkt> how to map the snap ?
[11:47] * deepsa_ (~deepsa@ has joined #ceph
[11:49] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[11:49] * deepsa_ is now known as deepsa
[11:52] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[11:54] <nosebleedkt> tnt, how to map the snap ?
[12:03] * Lennie`away is now known as leen
[12:03] * leen is now known as Lennie
[12:06] <Lennie> hi, I'm somewhat confused
[12:06] <Lennie> on a new ceph cluster, with 3 mon/osd machines, I have 192 pgs: 186 active+clean, 6 active+degraded; 0 bytes data
[12:07] <Lennie> if everything is up and running, why would it keep those 6 degraded ?
[12:08] <Lennie> this is with debian-testing from the eu-mirror by the way
[12:09] * Leseb (~Leseb@ has joined #ceph
[12:16] <Lennie> or does that solve itself when I start writing some data to it ?
[12:19] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[12:22] <tnt> doubtful
[12:22] <tnt> try restarting the osd one by one, waiting for a stable situation between each restart
[12:22] <nosebleedkt> tnt, how to map the snap ?
[12:24] <Lennie> tnt: ok, will try that
[12:31] <tnt> nosebleedkt: read the man page or doc ...
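(A sketch of what tnt suggested earlier, "map the snap and dd it to a file". This assumes an image-level RBD snapshot rather than the pool-level snapshot created above, and the exact flags and device paths may differ by version, so check the rbd man page:)

```shell
rbd snap create mypool/foobar@mysnap          # snapshot the image
rbd map foobar --pool mypool --snap mysnap    # snapshots map read-only
dd if=/dev/rbd/mypool/foobar@mysnap of=/tmp/foobar.img bs=4M
# newer rbd builds can also export without mapping at all:
rbd export mypool/foobar@mysnap /tmp/foobar.img
```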
[12:34] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[12:40] * gaveen (~gaveen@ has joined #ceph
[12:43] <Lennie> tnt: ok, after everything has settled, it still says 6 degraded :-(
[12:43] <Lennie> tnt: is it important that it gets marked down first before restart ?
[12:44] <jamespage> Lennie, which version of ceph?
[12:45] <jamespage> I've seen that on >= 0.48 when you have 1OSD/1MON per host
[12:46] <Lennie> jamespage: that is with debian-testing from the EU mirror: 0.55-1quantal
[12:46] <jamespage> Lennie, you can fix it up - http://ceph.com/docs/master/rados/operations/crush-map/?highlight=tunables#tunables
[12:47] <jamespage> Lennie, please read the "Which client versions support tunables" bit carefully :-)
[12:48] <jamespage> Lennie, I think you are seeing the first in the list of misbehaviours
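(A sketch of the tunables adjustment the linked page describes; the commands are from the docs of that era, so double-check them against your version, and heed jamespage's client-support warning:)

```shell
# Export the current crush map, apply the newer tunables with crushtool,
# and inject the adjusted map back into the cluster.
ceph osd getcrushmap -o /tmp/crush.orig
crushtool -i /tmp/crush.orig \
    --set-choose-local-tries 0 \
    --set-choose-local-fallback-tries 0 \
    --set-choose-total-tries 50 \
    -o /tmp/crush.adjusted
ceph osd setcrushmap -i /tmp/crush.adjusted
```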
[12:49] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[12:49] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has left #ceph
[12:52] <Lennie> jamespage: that works like a charm
[12:52] <jamespage> Lennie, great!
[12:53] <Lennie> jamespage: so would it have solved itself if I added an other osd ?
[12:53] <jamespage> Lennie, probably
[12:56] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:00] <jamespage> Lennie, I normally see it disappear when I add some more - so yes!
[13:20] * Lennie is now known as Lennie`away
[13:21] <nosebleedkt> tnt, cant find it :(
[13:32] * jlogan1 (~Thunderbi@ Quit (Ping timeout: 480 seconds)
[13:37] * absynth (~absynth@irc.absynth.de) Quit (Ping timeout: 480 seconds)
[13:38] * deepsa_ (~deepsa@ has joined #ceph
[13:40] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[13:40] * deepsa_ is now known as deepsa
[13:47] * absynth (~absynth@irc.absynth.de) has joined #ceph
[13:52] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[14:00] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[14:07] * deepsa_ (~deepsa@ has joined #ceph
[14:08] * fmarchand (~fmarchand@ has joined #ceph
[14:08] <fmarchand> Hi !
[14:08] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[14:08] * deepsa_ is now known as deepsa
[14:10] <fmarchand> I have a question about installing ceph to use rados: where can I find documentation for setting it up without installing cephfs? The documentation I found describes a setup with an mds
[14:13] * loicd (~loic@magenta.dachary.org) has joined #ceph
[14:13] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[14:13] * ChanServ sets mode +o elder
[14:14] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[14:16] <andreask> fmarchand: you had a look at http://ceph.com/docs/master/rados/configuration/ ?
[14:17] <fmarchand> andreask: thx ! perfect ! I was not in the right part of the documentation
[14:18] <fmarchand> andreask: am I right if I say that I won't have to use a mds ?
[14:18] <andreask> fmarchand: yes ... only needed if you want to use cephfs
[14:19] <fmarchand> andreask: when you use rados you can mount it as a partition like cephfs ?
[14:20] <andreask> fmarchand: you can use rbd if you need block-devices
[14:22] <fmarchand> andreask: so first I configure my cluster with rados like in the documentation and after I configure it with rbd ... is that the idea ?
[14:26] <andreask> fmarchand: depends on your needs: you can use rbd via the kernel module, or if you need e.g. disks for VMs you can use a new qemu with rbd support
[14:31] <fmarchand> the first option, I think; the second is interesting ... but I feel too much of a newbie to do it ...
[14:35] <nosebleedkt> does anyone know how to export a pool snapshot??
[14:36] * myth (c351e231@ircip1.mibbit.com) has joined #ceph
[14:37] <myth> hello
[14:37] <myth> I have an unwanted behavior with RBD
[14:38] <myth> I create an image, size 1GB
[14:38] <myth> write 500MB on it, then delete the file
[14:38] <myth> on OSD, 500MB is still used on disk
[14:39] <myth> then, when I write 600MB, OSD grow from only 100MB (to 600MB)
[14:39] <myth> etc
[14:39] <myth> how can I "clear" the space RBD has used on the OSDs?
[14:39] <myth> thanks :)
[14:42] <nosebleedkt> myth, you need to wait a bit so ceph can update the changes
[14:43] <nosebleedkt> run ceph -w
[14:43] <nosebleedkt> to watch the changes
[14:43] <myth> it was working with cephfs but not with rbd
[14:43] <myth> it's about 2H now
[14:43] <myth> OSD still have data
[14:44] <nosebleedkt> rbd is thin provisioned.
[14:44] <myth> using ubuntu 12.04
[14:44] <ScOut3R> myth: as far as i know that's the normal behaviour, the 500MB is freed up but that won't show on the OSDs
[14:44] <nosebleedkt> you might have done something else
[14:44] <ScOut3R> you can see the changes by the numbers only if you delete the rbd image
[14:44] <myth> ScOut3R: that's not cool :(
[14:45] <ScOut3R> well, it's something i can live with :)
[14:45] <myth> there is a way to clean OSD?
[14:45] <ScOut3R> actually it's "clean"
[14:45] <ScOut3R> because if you write another 500MB on it the space usage won't grow
[14:45] <myth> lol? :d
[14:45] <nosebleedkt> yes
[14:45] <myth> ScOut3R: yep, indeed
[14:45] <myth> but if I add other rbd
[14:46] <myth> and write/delete things
[14:46] <myth> osd still growing
[14:46] <ScOut3R> probably because it's another image; i don't know the internals of ceph, i'm a new user myself
[14:47] <myth> cephfs does the "normal" thing (osds are clean a few seconds after a delete)
[14:47] <ScOut3R> but if another image cannot reuse the space used by another image, now that might be a problem in the long run
[14:47] <myth> ScOut3R: that's what I'm seeing... (I would like to have a lot rbd... :()
[14:48] <myth> if I have waste space, it's not good in fact
[14:48] <nosebleedkt> ScOut3R, do u know how to export a pool snapshot??
[14:48] <nosebleedkt> root@cephfs:~# ceph osd pool mksnap mypool mysnap071212
[14:48] <tnt> you can't "clear" a RBD ... once a 'sector' has been written to once, it will always exist.
[14:49] <nosebleedkt> ah tnt ! man give me a hand 1
[14:49] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[14:49] <ScOut3R> tnt: and that means it's tied to that specific image, right?
[14:49] <myth> tnt: ok... so we should mostly avoid overprovisioning, then
[14:49] <myth> tnt: If i add more OSD, space will be lower because of the migrating, right?
[14:50] * deepsa (~deepsa@ has joined #ceph
[14:50] <tnt> yes if you add more osd, it will spread data around.
[14:50] <tnt> ScOut3R: yes, that space is reserved for that image.
[14:51] <ScOut3R> tnt: thanks for the explanation, good to know that it's by design, and that it helps mitigate overprovisioning
[14:51] <tnt> nosebleedkt: dude, I have no idea how to work with snapshot so quit asking !
[14:51] <nosebleedkt> lol
[14:52] <ScOut3R> nosebleedkt: sorry, i don't know either, i'm not using snapshots
[14:52] <nosebleedkt> yeah :(
[14:52] <tnt> ScOut3R: that might change if/when they implement the 'discard' function like on SSDs, where the OS can tell the block layer it doesn't need those sectors anymore, but until then, as soon as a block is written, it's allocated forever.
[14:53] <myth> tnt: you know if it's planned?
[14:55] <tnt> actually I might be wrong: http://patchwork.ozlabs.org/patch/156010/
[14:56] <ScOut3R> tnt: thanks, now that i know why this is happening i'm happy :) it's not a blocker for us
[14:56] <tnt> not sure if it's in the kernel rbd driver.
[14:56] <myth> I use 3.2 kernel
[14:56] <tnt> and using rbd with it ?
[14:57] <myth> yep
[14:57] <myth> ubuntu 12.04
[14:57] <myth> ceph version 0.48.2argonaut
[14:58] <tnt> you shouldn't
[14:58] <tnt> 3.2 is really not recent enough for RBD
[14:58] <myth> ok :)
[14:58] <myth> (I just do some test with ceph)
[14:58] <myth> but it looks like I'm making more trouble for myself than I should
[14:59] <myth> :(
[15:01] <ScOut3R> tnt: which kernel version would you recommend for 0.48.2? i'm also running my tests on ubuntu 12.04 and planning to start the production environment soon, and using a new, non-distro kernel would require a few more tests
[15:02] <myth> ScOut3R: on the FAQ they say 3.6
[15:02] <ScOut3R> ah, sorry, the FAQ :)
[15:02] <myth> latest 3.6.x release
[15:03] <myth> ceph has been under heavy development for the past few months... 12.04 seems too old :(
[15:03] <myth> ScOut3R: http://www.liberiangeek.net/2012/10/install-upgrade-to-linux-kernel-3-6-0-in-ubuntu-12-04-precise-pangolin/
[15:04] <tnt> yeah I use 3.6.9 currently
[15:05] <ScOut3R> myth: thanks for the link :) i'm going to upgrade those kernels now using the ppa repo
[15:05] <myth> ScOut3R: there is only v3.6, not 3.6.9
[15:06] <myth> maybe using the really latest kernel should be better
[15:07] <ScOut3R> myth: there's 3.6.3: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6.3-quantal/
[15:07] <tnt> http://www.upubuntu.com/2012/12/installupgrade-to-linux-kernel-369-in.html
[15:07] <myth> thanks tnt :)
[15:09] <nosebleedkt> tnt, I have a computer that has 2 OSDs in the cluster.
[15:09] <nosebleedkt> Another computer with 3 OSDs
[15:10] <nosebleedkt> i shut down the computer with 2 OSDs
[15:10] <nosebleedkt> what is going to happen ?
[15:11] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[15:11] <myth> nosebleedkt: you have to setup a proper crush map
[15:11] <myth> to tell the difference between computers
[15:13] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[15:13] * tryggvil_ is now known as tryggvil
[15:13] <nosebleedkt> isn't ceph supposed to solve that problem?
[15:13] <nosebleedkt> and rearrange the PG's in its map
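(What a "proper crush map" means here: a hypothetical decompiled-crushmap fragment, with made-up names and weights, that groups OSDs into host buckets and chooses leaves by host, so replicas land on different machines and losing one computer doesn't lose all copies:)

```
host machine-a {
        id -2
        alg straw
        hash 0
        item osd.3 weight 1.000
        item osd.4 weight 1.000
}
host machine-b {
        id -3
        alg straw
        hash 0
        item osd.0 weight 1.000
        item osd.1 weight 1.000
        item osd.2 weight 1.000
}
root default {
        id -1
        alg straw
        hash 0
        item machine-a weight 2.000
        item machine-b weight 3.000
}
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
```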
[15:21] <myth> tnt: since the kernel upgrade, speed is horrible when writing and reading
[15:22] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[15:23] <myth> ScOut3R: take care about it :)
[15:23] <myth> do you have the same behavior?
[15:24] <ScOut3R> myth: i've installed version 3.6.3 and the server won't boot, i'm in rescue mode now. Thanks to ubuntu's "quiet splash" mode i cannot see sh*t during boot and forgot to disable it
[15:24] <myth> arf
[15:24] <myth> too bad
[15:24] <myth> using the script tnt linked, it works like a charm
[15:27] <ScOut3R> i'm going to compile my own kernel when i'll have the time and for the prod environment
[15:27] <ScOut3R> but for initial testing prebuilt is good
[15:27] <ScOut3R> but i don't like scripts :)
[15:27] <myth> ScOut3R: ;)
[15:28] <myth> ScOut3R: I can tell you that 3.6.9 gives me poor performance when writing AND reading
[15:28] <myth> btw, osd doesn't clean
[15:28] <ScOut3R> myth: what do you mean by poor performance? do you have "numbers"?
[15:28] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:29] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[15:29] * tryggvil_ is now known as tryggvil
[15:29] <myth> only reading: 120MB/s (NC limiting); only writing (replica size 2): 25MB/s
[15:29] <tnt> myth: I never used 3.2 much ... it was oopsing when an OSD was going down so it was unusable anyway.
[15:29] <myth> it's the same for 3.6.9 and 3.2 for separate speed
[15:30] <myth> with both writing and reading on 3.2 I got 13MB/s writing and 50/60MB/s reading
[15:30] <ScOut3R> well, with 3.2 i have the writing speed around 20MB/s but i'm sure it's down to the poor network connection between the nodes: shared 1Gbps, 3 osd nodes, rep size 4
[15:30] <elder> nhm, what is your plan for lunch today? Do you want to meet? Would you prefer another reschedule?
[15:31] <myth> on 3.6.9 I have 13MB/s writing and 4/5MB/s reading
[15:31] <myth> tnt: yep... I did some test with osd up/down... strange behavior sometimes... :(
[15:32] <myth> ScOut3R: I'm using VMs on my laptop with 6GB, SSD and intel i7
[15:32] * flash (~user1@host86-164-235-115.range86-164.btcentralplus.com) has joined #ceph
[15:32] * flash is now known as Guest763
[15:32] <ScOut3R> myth: then it's a very unique testing environment :)
[15:33] * Guest763 (~user1@host86-164-235-115.range86-164.btcentralplus.com) has left #ceph
[15:33] * drokita (~drokita@ has joined #ceph
[15:33] <myth> ScOut3R: I don't have money to spend :(
[15:33] * Guest763 (~user1@host86-164-235-115.range86-164.btcentralplus.com) has joined #ceph
[15:33] <nhm> elder: I'm game, but I may not have a car available.
[15:34] <elder> Hmm. Let's reschedule. I'm feeling short on time. (That's not unusual...)
[15:34] <ScOut3R> myth: yeah, that's not an easy thing; i've waited over a year to get my hands on these servers
[15:34] <elder> Will you have a car one day next week?
[15:35] <nhm> elder: Mondays are usually best for me. Preschool dropoff kind of throws things off.
[15:35] <elder> Let's do Monday then. Can we meet at Rosedale again? It's a good "half way" point for me.
[15:35] <elder> There are other restaurants nearby too.
[15:35] <elder> (Near there)
[15:36] <ScOut3R> hoho, nice bootup error with the new kernel: disk by uuid does not exist
[15:37] <myth> ScOut3R: tnt I create the RBD with the old kernel... I'll try from scratch new setup
[15:38] <nhm> elder: that works for me, the only potential issue is my wife has an dr. appointment at 10:15, so if it goes really long I might be a bit late.
[15:38] <elder> Let's talk about it Monday morning.
[15:38] <nhm> elder: ok, sounds good.
[15:44] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[15:51] <jefferai> are there any instructions for upgrading a ceph node without causing havoc in the cluster (for instance, writes from VMs failing because it cannot write to all nodes while the one is down)?
[15:52] <jefferai> do you simply mark those osds down/out
[15:52] <jefferai> stop a mon running on the box
[15:52] <jefferai> do the upgrade
[15:52] <jefferai> then reverse it?
[15:52] <noob2> jefferai: i'm interested also :D
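(One commonly suggested sequence for jefferai's question; hedged, since the right procedure depends on version. The key piece is the "noout" flag, which keeps the cluster from rebalancing while the node's OSDs are down:)

```shell
ceph osd set noout        # OSDs may go down without being marked out
service ceph stop         # on the node being upgraded (its osds and mon)
# ... upgrade packages on that node, reboot if needed ...
service ceph start
ceph -s                   # wait until all PGs are active+clean again
ceph osd unset noout      # then move on to the next node
```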
[15:57] <via> joshd: still flawless. this is more stable than i've ever seen ceph thus far
[16:00] * Kiooby (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[16:00] <ScOut3R> myth: 3.6.3 finally running, going to upgrade the other two nodes then run some tests
[16:02] <myth> ScOut3R: ok :)
[16:08] <nosebleedkt> does the number of OSDs I have in the cluster matter?
[16:09] <nosebleedkt> for example, to be multiple of 2
[16:09] <myth> for osd, it looks like you can have what you want
[16:09] <myth> for mds/mon, odd number
[16:10] <absynth> with odd not being 1, preferrably
[16:10] <nosebleedkt> i dont understand what is going on when i shutdown some OSDs
[16:11] <absynth> after the osd out period (i think 30s default), the cluster rebalances itself, taking all the data that was supposed to be on the OSD that is down and putting it onto the other OSDs, so you still have your redundancy goal fulfilled
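(The "osd out period" absynth mentions is the "mon osd down out interval" option; a hypothetical ceph.conf fragment. Note the default in this era was 300 seconds, not 30:)

```
[mon]
        ; how long an OSD may stay "down" before it is marked "out"
        ; and its data gets re-replicated onto the remaining OSDs
        mon osd down out interval = 300
```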
[16:12] <nosebleedkt> yes but i see some crazy results here
[16:14] <nosebleedkt> I have 5 disks of 1GB each. I shut down the machine which provides osd.3 and osd.4
[16:14] <paravoid> is it possible to map different radosgw buckets/containers to different rados pools?
[16:14] <paravoid> there seems to be a "radosgw-admin pool add" functionality
[16:14] <nosebleedkt> so i have now osd.{0,1,2}
[16:14] <paravoid> but how buckets are mapped to those is not clear
[16:14] <nosebleedkt> but still ceph -w reports i have 5GB
[16:15] <absynth> you are running a cluster with 0 redundancy?
[16:15] <absynth> what sense does that make?
[16:15] <nosebleedkt> how to check that?
[16:15] <jefferai> nosebleedkt: ceph reports the total amount of raw space available
[16:15] <jefferai> regardless of how many copies of the data are being stored
[16:16] <jefferai> which is why you can put 500GB of data in a cluster and see 1.5TB used
[16:16] <jefferai> but the raw space remains raw space
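(jefferai's accounting as a tiny shell sketch, using his numbers:)

```shell
# 500 GB of data stored with 3 replicas consumes 1500 GB of raw space,
# while ceph keeps reporting total raw capacity either way.
data_gb=500
replicas=3
echo "$((data_gb * replicas)) GB raw used"
```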
[16:16] <andreask> nosebleedkt: you have 3 out of 5 osds up ... that is fine ... though you only have one copy of each pg available
[16:18] <nosebleedkt> ok
[16:18] * nosebleedkt (~kostas@ Quit (Quit: Leaving)
[16:19] <myth> you're welcome nosebleedkt :o
[16:19] <andreask> oh ... he did not like my answer?
[16:20] <myth> andreask: no, just not polite :)
[16:21] <andreask> ;-)
[16:22] <jefferai> besides, some people will never learn that hanging out in a chat channel is how you learn things
[16:24] <jtang> heh
[16:25] <jtang> i find this channel great for venting my anger at distributed and parallel filesystems
[16:25] <jtang> :)
[16:25] * jtang vents some anger at lustre
[16:26] <jtang> hrm, ddn just sent me a new letter
[16:26] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[16:26] <jtang> im still scratching my head about the new SFA range of ddn machines
[16:36] <nhm> jtang: sfa12k?
[16:37] <nhm> jtang: yes, feel free to vent. I think we want to hear about it, even if it's Ceph. Though especially if it's not. ;)
[16:43] * rlr219 (43c87e04@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[16:43] * rlr219 (43c87e04@ircip4.mibbit.com) has joined #ceph
[16:47] * jlogan (~Thunderbi@2600:c00:3010:1:1c0b:eec3:80d8:7155) has joined #ceph
[16:54] * low (~low@ Quit (Quit: Leaving)
[16:56] * PerlStalker (~PerlStalk@ has joined #ceph
[17:04] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:08] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[17:10] * aliguori (~anthony@cpe-70-113-5-4.austin.res.rr.com) has joined #ceph
[17:12] <jmlowe> so I accidentally created a 400TB rbd device, I meant to create one that was 400GB, it takes a while to delete
[17:15] <myth> jmlowe: haha :)
[17:15] <myth> too bad
[17:15] <myth> on the mailing list it looks like having multiple rbds is better than a few big ones
[17:16] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[17:17] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[17:17] <jmlowe> I'm just now remembering that you can resize, I should just take the day off while I have some foot left to shoot
[17:19] * Guest763 (~user1@host86-164-235-115.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[17:19] * ircolle2 (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) has joined #ceph
[17:22] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[17:23] * ircolle1 (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[17:27] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[17:31] <myth> ScOut3R: with ceph v0.55 and kernel 3.6.9, only write speed is twice better
[17:31] <myth> from 25MB/s to 55/60MB/s
[17:31] <myth> doing both read and write still slow :(
[17:32] <ScOut3R> myth: thanks for the info; i've managed to upgrade the ceph nodes but the xen host is still running 3.2 because 3.6.3 and 3.6.9 would not boot as dom0
[17:32] <myth> but if you limit speed on write, you have good speed on read
[17:32] <myth> just don't let write at full speed
[17:32] <tnt> ScOut3R: I'm using 3.6.9 as dom0 ..
[17:32] <tnt> ScOut3R: did you create a matching initramfs after installing the kernel ?
[17:32] <myth> write speed is really awesome compared to v0.48 :o
[17:33] <tnt> myth: good to know. I'm planning to migrate to bobtail in january ... I guess I'll see the difference then.
[17:34] <myth> tnt: yep, really :)
[17:34] <myth> at least on my setup
[17:34] <ScOut3R> tnt: i've just let ubuntu's update-grub2 script create the initramfs, i think it should be working because the official kernel packages are using that too
[17:34] <tnt> update-grub2 doesn't create initramfs AFAIK ... it just adds it to the grub config if it exists.
[17:35] <myth> huhu
[17:35] <nhm> myth: it'll help a lot, especially if you didn't have syncfs support previously and now do.
[17:35] <myth> my RBD image is now in read only mode
[17:35] <myth> hmmm
[17:35] <ScOut3R> tnt: sorry, not update-grub2 :) it's getting late here... the tool that ran when i installed the kernel deb is called update-initramfs
[17:36] <myth> arf
[17:36] <myth> normal
[17:36] <nhm> myth: theoretically I'm getting out an argonaut vs bo^h^h 0.55 performance comparison aritcle.
[17:36] <myth> no more disk space
[17:36] <myth> lol....
[17:36] <myth> arf, broken cluster :( (osd, mon & mds on same host)
[17:36] <myth> nhm: ok :)
[17:37] <tnt> yeah, getting the disk full is really not a good idea.
[17:37] <myth> tnt: I know
[17:37] <myth> it's the second time that's happened to me :(
[17:37] <tnt> nhm: when ? :p
[17:38] <myth> thanks for tips, see you
[17:38] * myth (c351e231@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[17:42] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[17:45] * Leseb (~Leseb@ Quit (Quit: Leseb)
[17:45] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[17:55] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[17:55] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[17:57] <nhm> tnt: Got the data, just need to get it into an article. If I'm lucky maybe I can get it done next week.
[17:58] <nhm> tnt: My setup has syncfs both in argonaut and in bobtail though, so the results aren't as impressive as myth's. :)
[17:58] <nhm> mostly the performance improvements are with small reads/writes.
[17:58] * gaveen (~gaveen@ has joined #ceph
[17:59] <nhm> there's also a couple of small degradations.
[18:06] <tnt> nhm: but myth was running under 12.04 that should have had syncfs already.
[18:06] <tnt> even on 0.48
[18:07] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:19] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) has joined #ceph
[18:23] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) has joined #ceph
[18:25] * slang1 (~slang@cpe-66-91-114-250.hawaii.res.rr.com) has joined #ceph
[18:25] <slang> exit
[18:26] <slang> grr irssi
[18:26] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) Quit ()
[18:26] <rweeks> hehe slang
[18:29] * deepsa (~deepsa@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[18:33] * slang1 (~slang@cpe-66-91-114-250.hawaii.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:33] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[18:39] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:41] * rlr219 (43c87e04@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[18:42] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) Quit (Ping timeout: 480 seconds)
[18:42] * rlr219 (43c87e04@ircip3.mibbit.com) has joined #ceph
[18:42] * joao (~JL@89-181-150-134.net.novis.pt) has joined #ceph
[18:42] * ChanServ sets mode +o joao
[18:43] <joao> forgot to turn xchat on after the power outage :x
[18:43] <joao> hello all
[18:45] <rweeks> wow, people still use xchat?
[18:45] <scuttlemonkey> hell yes
[18:45] <scuttlemonkey> although it may be muscle memory more than any real objective evaluation :P
[18:47] <jefferai> jmlowe: yeah, multiple RBD allows for multiplexing requests
[18:48] * fmarchand (~fmarchand@ Quit (Quit: Leaving)
[18:49] * jlogan (~Thunderbi@2600:c00:3010:1:1c0b:eec3:80d8:7155) Quit (Ping timeout: 480 seconds)
[18:53] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:54] * jlogan (~Thunderbi@2600:c00:3010:1:7829:6346:542c:5cf8) has joined #ceph
[18:54] * lilybet (52e7d4bf@ircip3.mibbit.com) has joined #ceph
[18:54] <jefferai> rweeks: who's the right person to run my Theory Of How To Properly Upgrade A Cluster Without VMs Going Nuts Due To Writes Not Happening To All OSD by?
[18:55] <lilybet> hi7
[18:55] * lilybet is now known as myth
[18:55] * myth is now known as mythzib
[18:55] <mythzib> hi :)
[18:56] <nhm> I had a brief xchat stint but it didnt last long between bitchx and irssi.
[18:56] <nhm> longer than pidgin though. :)
[18:57] <jefferai> there are some good clients that are still actually actively maintained
[18:57] <mythzib> nhm: if we have a lot of client hosts using CephFS, is it possible to get something great out of it?
[18:57] <jefferai> although, not text-based ones
[18:59] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) has joined #ceph
[18:59] <nhm> mythzib: how great do you need? ;)
[19:00] <mythzib> 300Mbit/s per host client
[19:01] <mythzib> for <750 concurrent connections
[19:02] <nhm> Well, unless I've done my math wrong, that's up to around 28GB/s?
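(nhm's arithmetic, under the reading that each of the 750 connections does 300 Mbit/s, a reading mythzib corrects just below; a quick check:)

```shell
# 750 connections * 300 Mbit/s each = 225000 Mbit/s;
# divide by 8 for megabytes, by 1000 for gigabytes.
awk 'BEGIN { printf "%.3f GB/s\n", 750 * 300 / 8 / 1000 }'
```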
[19:02] <mythzib> huhu no
[19:02] <mythzib> 300Mbits per host/computer
[19:02] <mythzib> one node
[19:03] <mythzib> 750 concurrent connections, doing 300-400Mbit/s
[19:03] <nhm> how many clients?
[19:03] <nhm> hostclients that is
[19:04] * chutzpah (~chutz@ has joined #ceph
[19:05] <mythzib> for example, for one data host with 10 OSDs (one per disk), I will have 4/5 hosts with cephFS doing 400Mbit/s each
[19:06] <mythzib> is that reasonable?
[19:06] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[19:06] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) Quit (Quit: Leaving.)
[19:08] <mythzib> the mds daemon will be hosted on SSD
[19:08] * benpol (~benp@garage.reed.edu) has joined #ceph
[19:10] <joao> is there any kungfu needed to get tcmalloc on the planas?
[19:11] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[19:12] * wer_gone is now known as wer
[19:12] <nhm> mythzib: ok, so you'd like to be able to get ~2Gb/s to 4-5 client hosts from 1 data host with 10 disks? Is that correct?
[19:12] <jmlowe> nhm: may have finally gotten a handle on my dropped packets problem, I think pci tweaks did the trick
[19:12] <rweeks> jefferai: I think any of the inktank folks in here would be interested in your Theory
[19:12] <nhm> jmlowe: interesting! What finally fixed it?
[19:13] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[19:13] <jefferai> rweeks: it's more like a "last time things got really screwed up so I think I know how to do it properly this time but would appreciate someone giving it a once-over" :-)
[19:15] <mythzib> nhm: yes, that's it
[19:15] <mythzib> I can add more client hosts if needed (to get 200Mbits per host)
[19:15] <jmlowe> nhm: method 3 http://dak1n1.com/blog/7-performance-tuning-intel-10gbe
[19:16] <nhm> mythzib: how much replication?
[19:16] <wer> Anyone know what this means? 2012-12-07 18:03:00.230083 7fbc34a46700 0 -- >> pipe(0x54e8240 sd=36 :58588 pgs=0 cs=0 l=0).connect claims to be not - wrong node!
[19:16] <rweeks> I think both here and the ceph-devel list would be appropriate, jefferai
[19:16] <mythzib> nhm: replica size 2
[19:17] <mythzib> nhm: in the end, each data node will have 34 disks (3TB, 7200rpm) but to start, only 10 disks
[19:18] <rweeks> 34 disks
[19:18] <rweeks> that's a LOT
[19:18] <nhm> jmlowe: wow, I'm surprised that was necessary!
[19:18] <jefferai> rweeks: ok, I'll post on ceph-devel
[19:19] <jmlowe> nhm: so am I, all of the memory tuning didn't seem to make a difference
[19:19] <nhm> mythzib: with 10 drives per server and 2x replication, it's possible with large write sizes. Here's some comparisons I did with 1x replication and 8 drives per storage node: http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/
[19:20] <mythzib> nhm: yep, already read :)
[19:20] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) has joined #ceph
[19:20] <mythzib> nhm: it's really about cephfs use in fact
[19:20] <nhm> jmlowe: I wonder if there is some strange interaction between your MB and card that makes that necessary.
[19:20] <mythzib> because it's not "production ready"
[19:20] <jmlowe> nhm: only needed for be2net emulex cards, my netxen cards are fine
[19:21] <jmlowe> nhm: also I think older kernels were ok, seems to be a recent problem
[19:21] <nhm> mythzib: Ah! Well, with large writes I think it will perform similarly. If you have a lot of small writes there will be advantages and disadvantages (caching, but also more overhead).
[19:22] <mythzib> ok
[19:22] <nhm> mythzib: it's definitely less mature than rbd or rgw, especially in multi-mds setups.
[19:22] <rweeks> nhm, have you considered doing a blog about network tuning for ceph?
[19:22] <mythzib> nhm: rbd has an unwanted behavior :(
[19:23] * gaveen (~gaveen@ has joined #ceph
[19:23] <nhm> rweeks: yes, I want to do a big messenger article. It's probably at least a month or two out.
[19:23] <mythzib> when a device is filled and then emptied, the OSDs will still use all the device's space
[19:23] <rweeks> ah cool
[19:23] <mythzib> it doesn't "clean" osd, when a file is deleted
[19:23] <mythzib> a function called "discard"
[19:24] <nhm> mythzib: ah yes. I think there has been some talk about that.
[19:25] <mythzib> nhm: will discard be added some day?
[19:27] <nhm> mythzib: actually, looking at the documentation, I see some mention of it: http://ceph.com/docs/master/rbd/qemu-rbd/
[19:27] <nhm> mythzib: are you using the qemu driver or kernel?
[19:27] <mythzib> nope
[19:27] <mythzib> uh
[19:27] <mythzib> kernel
[19:28] <mythzib> I use kernel to use rbd device
[19:28] <nhm> mythzib: looks like there is an (old) feature request: http://tracker.newdream.net/issues/190
[19:30] <mythzib> very old :(
[19:33] <mythzib> I have to use QEMU to have discard/trim?
[19:33] <mythzib> it's not really clear
[19:35] <rweeks> http://ceph.com/docs/master/rbd/qemu-rbd/#enabling-discard-trim I have a question about this too
[19:35] <rweeks> It says "Note that this uses the IDE driver. The virtio driver does not support discard."
[19:35] <rweeks> can I assume then that the kernel RBD uses the virtio driver?
[19:36] <joshd> rweeks: that's the interface qemu uses to expose the rbd device (i.e. scsi, ide, or virtio), unrelated to the kernel driver
[19:36] <rweeks> ok
[19:36] <rweeks> thanks. :)
[19:36] <joshd> mythzib: rweeks: the kernel driver does not support discard yet
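(The qemu route from the doc page rweeks linked, per joshd's clarification: a sketch of the sort of invocation that page shows, with hypothetical pool/image names; exact device options vary across qemu versions:)

```shell
# Expose an rbd image through the IDE driver with discard enabled;
# the virtio driver does not pass discard/TRIM through to the backend.
qemu -m 1024 \
  -drive format=rbd,file=rbd:mypool/myimage,id=drive1,if=none \
  -device driver=ide-hd,drive=drive1,discard_granularity=512
```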
[19:37] <mythzib> joshd: :(
[19:38] <mythzib> it sucks :(
[19:40] <tnt> the kernel driver also doesn't support the v2 version of rbd image which is required for layering (clone images) which sucks even more.
[19:41] <mythzib> tnt: yep
[19:41] <mythzib> I don't care about cloning but I can imagine...
[19:42] <wer> If I reweight an osd, shouldn't I see the new value in ceph osd tree?
[19:43] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) has joined #ceph
[19:45] * BManojlovic (~steki@242-174-222-85.adsl.verat.net) has joined #ceph
[19:48] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) Quit ()
[19:50] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) has joined #ceph
[19:53] * Cube (~Cube@ has joined #ceph
[19:54] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) Quit ()
[19:55] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) has joined #ceph
[19:55] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) Quit ()
[19:57] <rektide> what's the non-kernel RBD driver?
[19:58] <rektide> qemu-rbd? are there others?
[19:58] * slang (~slang@cpe-66-91-114-250.hawaii.res.rr.com) has joined #ceph
[20:03] <rweeks> those are the only two I am aware of
[20:04] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[20:04] <jefferai> I thought qemu-rbd used the kernel driver
[20:05] <rweeks> doesn't seem so
[20:06] <benpol> jefferai: no qemu-rbd definitely does not use the kernel driver. AFAIK it's also the only way to use rbd format 2 images (which support the whole cloning/layering thing)
[20:08] <mythzib> nhm: very impressive v0.55 on write... 70MB/s stable on single SSD with replica size 2
[20:11] * vjarjadian (~IceChat7@027a4b0a.bb.sky.com) has joined #ceph
[20:13] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[20:15] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[20:24] * deepsa (~deepsa@ has joined #ceph
[20:25] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[20:29] <gregaf> wer: there are two different weights you can set which are associated with the OSD
[20:30] <gregaf> if you set it and you aren't seeing it in the tree output, you've accidentally set the wrong one
[20:30] <gregaf> (which, yes, at one point at least was easier to set than the right one)
[20:30] * dmick (~dmick@2607:f298:a:607:2899:7889:ba56:8cb3) has joined #ceph
[20:31] <gregaf> http://ceph.com/docs/master/rados/operations/crush-map/#adjust-an-osd-s-crush-weight
[20:31] <gregaf> is the right one
[20:31] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[20:31] <tnt> there are 2 weights ? huh ... what does the wrong one do then ?
[20:31] <gregaf> there's the CRUSH weight, and there's also a monitor override to the CRUSH weight
[20:31] <gregaf> and honestly I don't remember exactly how it works
[20:32] <gregaf> calebamiles might know?
[20:33] <elder> I believe it's possible to put something in my yaml file that lets me run an arbitrary command. Can someone tell me how?
[20:36] <dmick> yyyyyeah.
[20:36] <dmick> um
[20:36] <dmick> no. I'm thinking python.
[20:36] <dmick> but
[20:36] <dmick> exec looks like your task
[20:37] <nhm> dmick: OK!
[20:37] <dmick> are....you a shell responding to me, nhm?
[20:37] <dmick> elder: yes, exec
[20:37] <dmick> see comments in that .py file
[20:38] <elder> Thanks dmick
[20:38] * wubo (~wubo@nat-ind-inet.jhuapl.edu) has joined #ceph
[20:38] <elder> Perfect.
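For reference, a minimal sketch of what the `exec` task dmick points to might look like in a teuthology yaml file — the role name and command here are placeholders, and the comments in the task's .py file remain the authoritative reference:

```yaml
tasks:
- exec:
    client.0:
      - echo "running an arbitrary command on the client.0 host"
```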
[20:38] <calebamiles> one weight is really more of a toggle to mark a device in or out and the other relates to a device's capacity
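The two weights gregaf and calebamiles describe can be seen in the CLI; a sketch, with example osd ids:

```shell
# 1) The CRUSH weight: reflects the device's capacity and drives data
#    placement; this is the value shown in `ceph osd tree`.
ceph osd crush reweight osd.10 1.0

# 2) The monitor override weight: a 0.0-1.0 factor applied on top of the
#    CRUSH weight (0 behaves like "out", 1 like fully "in").
ceph osd reweight 10 0.8
```

Per the crush-map docs linked above, `osd crush reweight` is the one to use if the goal is to change what `ceph osd tree` reports.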
[20:39] <nhm> dmick: sorry, I thought you were doing your Lil Jon impression.
[20:39] <dmick> nhm: ah.
[20:39] <calebamiles> @gregaf
[20:39] <cephalobot`> calebamiles: Error: "gregaf" is not a valid command.
[20:41] <dmick> damn twitter users :)
[20:42] * wubo (~wubo@nat-ind-inet.jhuapl.edu) Quit ()
[20:42] * wubo (~wubo3@nat-ind-inet.jhuapl.edu) has joined #ceph
[20:44] * The_Bishop (~bishop@2001:470:50b6:0:c4ec:2f12:2ccd:748a) Quit (Ping timeout: 480 seconds)
[20:51] * loicd (~loic@jem75-2-82-233-234-24.fbx.proxad.net) Quit (Quit: Leaving.)
[21:17] * rlr219 (43c87e04@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[21:20] * gaveen (~gaveen@ Quit (Remote host closed the connection)
[21:21] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:31] * Ryan_Lane (~Adium@ has joined #ceph
[21:41] * vjarjadian (~IceChat7@027a4b0a.bb.sky.com) Quit (Ping timeout: 480 seconds)
[21:47] <joey__> hi all..
[21:48] <joey__> I got a new drop of ceph and noticed that the ceph-fuse rpm is missing a library
[21:49] <joey__> Please see: http://pastie.org/5495687
[21:49] * joey__ is now known as terje
[21:53] <terje> the ceph-fuse RPM should have dependencies for fuse and fuse-libs
[21:54] <terje> to make it all happy
[21:54] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[22:03] * The_Bishop (~bishop@2001:470:50b6:0:24dc:9482:b62d:3434) has joined #ceph
[22:07] * Cube1 (~Cube@ has joined #ceph
[22:08] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:08] * Cube (~Cube@ Quit (Read error: Connection reset by peer)
[22:13] <joao> should tcmalloc's heap profiler work with the monitors?
[22:13] <gregaf> it should, and I believe we've got it set up to do so, yeah
[22:14] <joao> for some reason it appears that I'm unable to get anything into the 'mon.X.profile.*.heap' file
[22:14] <gregaf> hmm, I think sjust had trouble with this last time he tried as well
[22:15] <dmick> joey__, terje, thanks
[22:15] <joao> maybe I'm not being patient enough...
[22:15] <gregaf> don't remember if we ever diagnosed it
[22:15] <sjust> it currently seems to crash the osd
[22:15] <joao> I issue a 'heap dump', and the monitor just hangs
[22:15] <sjust> with a segfault somewhere in tcmalloc
[22:16] <joao> okay, before I go any further with this, would this help us in any way to figure out why the monitor is gradually eating more memory?
[22:16] <joao> or would I need to instrument the code in order to pinpoint the source?
[22:16] <gregaf> it was really nice for exactly that when it worked :(
[22:16] <gregaf> well, it would show you which functions were allocating memory
[22:16] <joao> right
[22:16] <gregaf> that was still in use in the heap
[22:17] <joao> will give this another shot
[22:17] <gregaf> sometimes you'd like to know more, but basic instrumentation would not add anything, no
[22:17] <joao> looks like the monitor's memory usage is gradually increasing
[22:17] <gregaf> I put a lot of sweat into making this work; I get annoyed when it fails :x
[22:19] <gregaf> let me know if I can help with it
[22:20] <joao> just confirm if this sequence of commands sounds right to you
[22:20] <joao> 'ceph -m ip:port heap start_profiler'
[22:20] <joao> 'ceph -m ip:port heap dump'
[22:20] <gregaf> yeah, should do it
[22:21] <gregaf> you could also do "ceph -m ip:port heap stats" instead of the "dump" to try and see if anything is working
[22:21] <joao> stats work
[22:21] <gregaf> that should send some basic info to the central log
[22:21] <joao> dump just hangs the monitor, but I will give it a bit longer to get on with it
[22:21] <gregaf> you can compare it to see if what "stats" says agrees with the amount of memory the daemon is actually using
[22:22] <gregaf> or try having the daemon start with the profiler on
[22:22] <gregaf> http://ceph.com/deprecated/Memory_Profiling
[22:22] <joao> yeah, so far, all the info from 'stats' matches top's
[22:22] <joao> aside from the monitor's virtual size
[22:22] <joao> but that's not that relevant anyway
[22:23] <joao> I'm more interested in figuring out why the resident memory is increasing 100k/sec or so
[22:23] <joao> (depending on workload)
[22:26] <joao> well, looks like the monitor to which I issued the dump is consuming a whole lot of CPU, so there's a chance this might be working in the background :)
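joao's sequence, plus an analysis step, might look like the following — the monitor address and file paths are illustrative, and pprof ships with google-perftools (the same package that provides tcmalloc):

```shell
# start the heap profiler on a running monitor and dump the heap
ceph -m heap start_profiler
ceph -m heap stats     # quick sanity check, reported to the central log
ceph -m heap dump

# the dumps are written as mon.<id>.profile.<n>.heap files; analyze one
# with pprof to see which call sites own the live heap memory
pprof --text /usr/bin/ceph-mon mon.a.profile.0001.heap | head
```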
[22:26] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:28] * The_Bishop (~bishop@2001:470:50b6:0:24dc:9482:b62d:3434) Quit (Remote host closed the connection)
[22:30] * The_Bishop (~bishop@2001:470:50b6:0:24dc:9482:b62d:3434) has joined #ceph
[22:32] <glowell> terje: I'll get that dependency added for the next release.
[22:35] <terje> cool, thanks.
[22:37] <terje> glowell: question..
[22:37] <terje> the ceph-fuse-mount command
[22:37] <glowell> Yes
[22:38] * nwat (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[22:38] <terje> do you have logging turned on by default?
[22:38] <terje> I have admin-socket on but logging is insanely noisy and filled our /var/ partition
[22:39] <dmick> admin socket is not related to logging level
[22:39] <terje> I recommend admin-socket but not a fuse log
[22:39] <terje> understand they are unrelated.
[22:40] <dmick> and I doubt (but am not sure) that ceph-fuse does any adjustment of the logging requested in the conf
[22:40] <dmick> I could be wrong
[22:40] <glowell> I don't know. I packaged the script as provided via our professional services. I don't have any experience using it.
[22:40] <terje> ok
[22:40] <dmick> and, oh; I assumed you meant ceph-fuse, but this is something else?
[22:41] <terje> no, I mean ceph-fuse
[22:41] <terje> there is a ceph-fuse-mount script that comes with the ceph-fuse rpm
[22:42] <terje> it tells ceph-fuse to log to /var/log/ceph/fuse.log
[22:42] <dmick> ok. there is a command called ceph-fuse that I use for mounting FUSE cephfs's
[22:42] <dmick> I don't know where that -mount script lives, do you glowell?
[22:42] <terje> it lives in /sbin/
[22:42] <dmick> (in the source)
[22:42] <terje> ah
[22:43] <glowell> The ceph-fuse-mount script is only a wip branch
[22:43] * jlogan2 (~Thunderbi@2600:c00:3010:1:8474:22a3:9709:26c3) has joined #ceph
[22:43] <dmick> and we released it? yikes
[22:43] <glowell> wip-55-fuse-script
[22:44] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:46] <dmick> terje: I don't see any extra logging in that script as it exists in the branch
[22:46] <gregaf> wait, how did a work in progress get into a release glowell?
[22:46] * AaronSchulz (~chatzilla@ Quit (Remote host closed the connection)
[22:47] * jlogan (~Thunderbi@2600:c00:3010:1:7829:6346:542c:5cf8) Quit (Ping timeout: 480 seconds)
[22:47] <terje> thanks dmick
[22:47] <terje> i suspect that we may be the only ones using that
[22:48] <dmick> well but I'm not helping, I think I'm just confusing the issue
[22:48] <terje> we're actually paying customers (rather than general public)
[22:48] <terje> anyway .. off to the lab..
[22:49] <dmick> maybe you have a different version than this?...https://github.com/ceph/ceph/blob/wip-55-fuse-script/src/ceph-fuse-mount
[22:50] * tnt (~tnt@207.171-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:51] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[22:52] <janos> does anyone have any idea about how much disk space a monitor needs? i was going to set up some vm's to test some things
[22:53] <janos> i wasn't sure if they got very big due to logs or any sort of monitoring db
[22:53] <gregaf> depends how unhealthy your cluster is
[22:53] <gregaf> and if you're turning on debug logging, that'll take up a bunch of space
[22:53] <terje> dmick: yep, I do have something different. That looks good, thanks.
[22:53] <janos> i was going to do a larger-than-minimal 20gb allocation
[22:54] <janos> not a huge cluster - i'm still dipping my toes in - i'm new to this
[22:54] <gregaf> in terms of what the actual monitor stores, that's plenty
[22:54] <dmick> terje: glad to get that sorted; sorry for the temporary confusion on my end
[22:54] <gregaf> I don't think I've seen the monstore over a couple gigs, and that was a pretty unhappy cluster at the time
[22:54] <janos> cool, thanks
[22:55] <janos> i'm bound to have many noobish questions, though i try to read up first
[22:55] <terje> dmick: np
[22:57] * BManojlovic (~steki@242-174-222-85.adsl.verat.net) Quit (Ping timeout: 480 seconds)
[22:59] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[23:00] * deepsa (~deepsa@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[23:01] <noob2> hey do you guys know if bobtail is coming out next week?
[23:01] <noob2> deciding on what version to go with
[23:01] * vata (~vata@ Quit (Quit: Leaving.)
[23:02] <sjust> noob2: i think it's more like 2 weeks from now
[23:02] <noob2> ok cool
[23:02] <noob2> thanks :)
[23:05] * noob2 (~noob2@ext.cscinfo.com) Quit (Quit: Leaving.)
[23:05] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[23:06] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[23:15] <jefferai> meh, majordomo is *not* working for me
[23:15] <jefferai> I can't subscribe to ceph-devel
[23:21] <dmick> jefferai: it can be a pain
[23:21] <jefferai> yeah
[23:21] <jefferai> dmick: / other Inktank guys: any comments on this? http://paste.kde.org/621938/
[23:21] <jefferai> I was going to put this on ceph-devel -- and still can, once I can manage to subscribe -- but I was going to attempt some work on the cluster tonight
[23:23] <dmick> actually I'm a little surprised we don't have upgrade documentation; searching
[23:24] <gregaf> jefferai: if you mark it "out" then the cluster will start re-replicating data away from it
[23:24] <gregaf> you probably want to first "ceph osd set noout", to prevent data migrations
[23:24] <gregaf> then mark them all down (but not out)
[23:24] <jefferai> gregaf: what if it can't replicate anywhere?
[23:24] <jefferai> e.g. I already have two copies of the data
[23:24] <jefferai> it'll just put two copies on one node?
[23:24] <gregaf> not if the CRUSH rules prevent that
[23:25] <gregaf> but "out" means "do not assign data to this node"
[23:25] <jefferai> I have default CRUSH rules
[23:25] <jefferai> hm, ok
[23:25] <gregaf> so it will try to put it somewhere else
[23:25] <dmick> you'll be degraded for a while, and you don't want to lose the one remaining OSD while you're doing that, but, once the new one comes back up, it'll catch up
[23:25] <dmick> right gregaf?
[23:25] <gregaf> yeah
[23:25] <jefferai> well, two remaining OSDs
[23:25] <dmick> ok
[23:25] <gregaf> then at the end when all the nodes are back up
[23:25] <jefferai> so set each osd that I'm taking out "noout"?
[23:25] <jefferai> and then mark them down?
[23:25] <gregaf> "ceph osd unset noout"
[23:25] <gregaf> no, "noout" is a cluster command that prevents any nodes from being marked out
[23:26] <jefferai> oh
[23:26] <gregaf> s/any/*any*
[23:26] <jefferai> okay
[23:26] <jefferai> different from marking a particular osd out
[23:26] <jefferai> which I know you can do
[23:26] <jefferai> I guess I'm not clear on the difference between out/in and up/down
[23:26] <gregaf> what you care about is marking the OSDs down before they actually get turned off, because that'll tell everybody it's down before it actually is
[23:26] <gregaf> up/down means "currently running and handling IO"
[23:26] <gregaf> (or not, in the down case)
[23:26] <jefferai> or, "not currently running"
[23:26] <jefferai> ok
[23:27] <jefferai> so if I have a replication factor of 3
[23:27] <jefferai> and I mark one of the OSDs handling data down
[23:27] <gregaf> in/out is "this OSD should be assigned data and considered to be one of the replicas" or not
[23:27] <jefferai> will that cause my VMs to freeze due to writes not being able to be satisfied?
[23:27] <jefferai> that's my real goal here, is not have that condition happen
[23:27] <joshd> jefferai: vger can be strict about mail, see http://vger.kernel.org/majordomo-info.html if you're having trouble sending or subscribing to ceph-devel
[23:27] <gregaf> if you only have 3 OSDs then the PGs will be "degraded" because it can't satisfy the replication requirements
[23:27] <jefferai> sure
[23:27] <jefferai> that I know
[23:28] <jefferai> but what I don't want is writes to block
[23:28] <gregaf> they won't prevent writes though, no
[23:28] <jefferai> ok
[23:28] <jefferai> so writes will only block if a node should be up/in
[23:28] <jefferai> but isn't
[23:28] <jefferai> right?
[23:28] <gregaf> correct
[23:28] <jefferai> so marking it down should ensure that it doesn't block
[23:28] <gregaf> it's when the daemon isn't there but the cluster still believes it is that you run into trouble
[23:28] <gregaf> correct
[23:28] <jefferai> I see
[23:29] <jefferai> so then this should be a good workflow: http://paste.kde.org/621950/
[23:29] <jefferai> that's an updated version
[23:30] <gregaf> as long as by "down/out" you mean merely "down", yes :)
[23:30] <jefferai> ah
[23:30] <jefferai> heh
[23:30] <gregaf> those are two separate states with two separate commands
[23:31] <jefferai> ok, http://paste.kde.org/621956/
[23:31] <jefferai> sure
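Put together, the maintenance workflow gregaf describes might be sketched like this (the osd id and the service commands are examples for a 2012-era sysvinit setup; adjust to your init system):

```shell
ceph osd set noout        # cluster-wide flag: no OSD gets marked out,
                          # so no data migration starts during maintenance
ceph osd down 0           # tell the cluster osd.0 is going down *before*
                          # it actually stops, so writes don't block on it
service ceph stop osd.0   # stop the daemon, then upgrade/maintain the node
# ... do the upgrade ...
service ceph start osd.0  # the daemon rejoins and catches up on its PGs
ceph osd unset noout      # re-enable normal out-marking
```

The cluster is degraded while the OSD is down, but with the daemon marked down (not out) the remaining replicas keep serving I/O and nothing re-replicates.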
[23:31] <jefferai> so any recommendations on write-back vs. write-through?
[23:31] <jefferai> with 3 replicas I imagine in many cases write-back should be safe up to a reasonable threshold
[23:31] <jefferai> although I'm not sure what that threshold might be
[23:31] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:32] <gregaf> the replica count doesn't impact your write-back versus write-through decisions
[23:32] <gregaf> those concerns are entirely local to the VM's host
[23:32] <jefferai> no, they impact my peace of mind :-D
[23:32] <jefferai> ah
[23:32] <jefferai> ok, fair enough
[23:33] <jefferai> that's a *client* issue...right
[23:33] <jefferai> so it's a matter of how safe I want my clients to be
[23:37] <jefferai> I was thinking it was a cluster issue since the cluster reporting back that the writes have been committed is what triggers the behavior
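Since the cache mode is entirely a client-side choice, it is set where the VM's disk is defined; with qemu's librbd driver that might look like this (pool and image names are placeholders):

```shell
# write-back caching: faster, acks writes from the client-side cache
qemu-system-x86_64 -drive file=rbd:rbd/vm1,format=raw,cache=writeback

# write-through: safer, every write waits for the cluster's commit ack
qemu-system-x86_64 -drive file=rbd:rbd/vm1,format=raw,cache=writethrough
```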
[23:52] * drokita (~drokita@ Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.