#ceph IRC Log


IRC Log for 2012-07-24

Timestamps are in GMT/BST.

[0:01] * jeffp (~jplaisanc@net66-219-41-161.static-customer.corenap.com) has joined #ceph
[0:05] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:06] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Read error: Connection reset by peer)
[0:09] <lurbs> Are there any performance tuning knobs that should be tweaked if the main use case for a cluster is for RBD backed VMs?
[0:10] <lurbs> Except rb_cache, of course.
[0:10] * MarkN (~nathan@ has joined #ceph
[0:10] <lurbs> rbd, rather.
[0:12] * MarkN (~nathan@ has left #ceph
[0:19] <nhm> lurbs: what kind of IO do you expect to be doing?
[0:21] * MarkN (~nathan@ has joined #ceph
[0:21] * MarkN (~nathan@ has left #ceph
[0:21] <lurbs> That's the tricky bit. Not sure what people will end up wanting to deploy inside those VMs. I'd expect something read heavy, tending towards many smaller files rather than fewer larger ones.
[0:22] <lurbs> Web servers, file servers, etc.
[0:27] <nhm> lurbs: We are actually working on some small write performance bugs right now, so I guess I'd say stay tuned. ;)
[0:28] <lurbs> Or untuned, for now? :)
[0:32] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[0:33] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:34] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[0:34] * MarkN (~nathan@ has joined #ceph
[0:36] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) Quit (Quit: LarsFronius)
[0:36] * MarkN (~nathan@ has left #ceph
[0:41] <nhm> lurbs: Maybe. ;) Feel free to try things and let us know what is terrible. ;)
[0:44] <lurbs> I did hit a nasty bug with the 3.2 kernels in 12.04 LTS where a sustained write inside a VM slowed a read (inside the same VM, but to a different backing RBD volume) way down.
[0:45] <lurbs> 3.5 kernels for the VM host and guest seemed to fix that.
[0:46] <gregaf> joshd was guessing that was because QEMU IO is single-threaded, but if kernel upgrades fixed it then maybe that wasn't the cause
[0:47] <lurbs> In iotop on the host I tend to see a bunch of threads for the same VM all doing IO.
[0:48] <lurbs> But it's more tricky to tell if it's going to a remote RBD volume, because it's not local IO and so I don't see it there at all.
[0:50] <iggy> depends on what disk emulation you use, ide is single threaded, none of the other are afaik
[0:50] <lurbs> It's all virtio.
[0:51] <iggy> and are you using the qemu rbd support or the kernel's rbd driver?
[0:51] <lurbs> QEMU. I've tried to avoid the kernel RBD driver.
[0:52] <iggy> okay, not too sure how that works, but the other drive backends can use aio simulated via a thread pool, so that might be the bunch of threads
[0:52] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Connection reset by peer)
[1:08] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:09] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:25] * Tamil (~Adium@2607:f298:a:607:497:2394:6f2a:eefa) has joined #ceph
[1:37] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:38] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[1:50] * MarkDude (~MT@c-67-170-181-8.hsd1.or.comcast.net) Quit (Read error: Connection reset by peer)
[1:50] * MarkDude (~MT@c-67-170-181-8.hsd1.or.comcast.net) has joined #ceph
[2:46] * themgt (~themgt@24-181-215-214.dhcp.hckr.nc.charter.com) Quit (Read error: Connection reset by peer)
[2:46] * Cube (~Adium@ Quit (Quit: Leaving.)
[2:46] * themgt (~themgt@24-181-215-214.dhcp.hckr.nc.charter.com) has joined #ceph
[2:50] * MarkDude (~MT@c-67-170-181-8.hsd1.or.comcast.net) Quit (Quit: Leaving)
[2:52] * Tamil (~Adium@2607:f298:a:607:497:2394:6f2a:eefa) Quit (Quit: Leaving.)
[2:56] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[3:03] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[3:32] * ogelbukh (~weechat@nat3.4c.ru) Quit (Ping timeout: 480 seconds)
[3:44] * ogelbukh (~weechat@nat3.4c.ru) has joined #ceph
[4:00] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:04] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[4:31] * glowell (~glowell@ Quit (Remote host closed the connection)
[4:33] * chutzpah (~chutz@ Quit (Quit: Leaving)
[5:19] * glowell (~glowell@ip-64-134-166-114.public.wayport.net) has joined #ceph
[5:48] * glowell (~glowell@ip-64-134-166-114.public.wayport.net) Quit (Remote host closed the connection)
[6:03] * glowell (~glowell@ip-64-134-166-114.public.wayport.net) has joined #ceph
[6:11] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[6:22] * MarkN (~nathan@ has joined #ceph
[6:23] * MarkN (~nathan@ has left #ceph
[7:10] * nhm_ (~nh@65-128-130-177.mpls.qwest.net) has joined #ceph
[7:15] * nhm (~nh@65-128-165-73.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[7:16] * MarkDude (~MT@c-50-137-1-13.hsd1.wa.comcast.net) has joined #ceph
[7:21] * sdouglas (~sdouglas@c-24-6-44-231.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[7:50] * MarkDude (~MT@c-50-137-1-13.hsd1.wa.comcast.net) Quit (Read error: Connection reset by peer)
[7:50] * MarkDude (~MT@c-50-137-1-13.hsd1.wa.comcast.net) has joined #ceph
[8:04] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:07] * loicd (~loic@magenta.dachary.org) has joined #ceph
[8:26] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[8:28] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[8:30] * loicd (~loic@magenta.dachary.org) Quit (Ping timeout: 480 seconds)
[8:54] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[9:01] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) Quit (Ping timeout: 480 seconds)
[9:07] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[9:08] * NaioN (~stefan@andor.naion.nl) Quit (Remote host closed the connection)
[9:09] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:11] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:19] * NaioN (~stefan@andor.naion.nl) has joined #ceph
[10:14] * BManojlovic (~steki@ has joined #ceph
[10:26] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:58] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[11:12] * loicd (~loic@ has joined #ceph
[11:20] <loicd> sileht: reading http://ceph.com/wiki/Benchmark
[11:21] <loicd> is it linked from the homepage or another page ? I'm not sure if there is a policy or anything.
[11:21] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[11:42] <loicd> sileht: ?
[11:43] <sileht> loicd, I made a link on the homepage in the misc section
[11:44] <loicd> sileht: ok. Did you find anyone here or on the mailing list suggesting it should be placed somewhere else ?
[11:46] <sileht> sileht, nope, I plan to announce it today on the mailing list
[11:46] <sileht> loicd, ^
[11:46] <loicd> sileht: ok
[11:47] <loicd> sileht: is there a page somewhere that explains how to read the results from http://ceph.com/wiki/Benchmark#RADOS_benchmark ?
[11:48] <sileht> loicd, only the man page of rados
[11:49] <sileht> loicd, the test starts 16 threads and writes many 4M blocks for 60s
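To read a result from a run like the one sileht describes, the arithmetic can be sketched as follows. This is a minimal illustration, not the `rados` tool's own reporting: the function names are hypothetical, and the object count of 1500 is a made-up figure rather than a number from the wiki page. The latency estimate assumes all 16 writers stay busy for the whole run.

```python
# Back-of-the-envelope reading of a write benchmark with the parameters
# sileht mentions: 16 concurrent writers, 4 MB blocks, 60 second run.
# All names here are illustrative; they are not part of any Ceph tool.

BLOCK_SIZE_MB = 4    # "many 4M blocks"
DURATION_S = 60      # "for 60s"
CONCURRENCY = 16     # "16 threads"


def bandwidth_mb_per_s(objects_written: int,
                       block_size_mb: float = BLOCK_SIZE_MB,
                       duration_s: float = DURATION_S) -> float:
    """Average write bandwidth in MB/s over the whole run."""
    return objects_written * block_size_mb / duration_s


def avg_latency_s(objects_written: int,
                  concurrency: int = CONCURRENCY,
                  duration_s: float = DURATION_S) -> float:
    """Rough per-write latency, assuming every writer is always busy."""
    return concurrency * duration_s / objects_written


if __name__ == "__main__":
    written = 1500  # hypothetical number of completed 4 MB writes
    print(f"{bandwidth_mb_per_s(written):.1f} MB/s")
    print(f"{avg_latency_s(written):.2f} s per write")
```

With more writers in flight, bandwidth can rise while per-write latency also rises, so the two numbers are best read together.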
[11:51] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[12:07] * ninkotech (~duplo@ has joined #ceph
[12:10] <loicd> sileht: ok.
[12:28] * sileht (~sileht@sileht.net) Quit (Ping timeout: 480 seconds)
[12:38] * sileht (~sileht@sileht.net) has joined #ceph
[12:52] * rosco (~r.nap@ Quit (Quit: *Poof*)
[12:52] * rosco (~r.nap@ has joined #ceph
[12:54] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) has joined #ceph
[12:58] <newtontm> iggy and gregaf: thx for the info
[13:40] * newtontm (~jsfrerot@charlie.mdc.gameloft.com) Quit (Quit: Lost terminal)
[13:57] * newtontm (~jsfrerot@charlie.mdc.gameloft.com) has joined #ceph
[13:58] <newtontm> is it possible to restrict a client.user access to a specific subfolder in your ceph folder tree?
[14:03] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[14:05] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[14:13] <iggy> newtontm: I think the answer is no (at least it was at one point), but you should verify that with the devs
[14:28] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[14:32] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:41] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:52] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[15:09] * glowell (~glowell@ip-64-134-166-114.public.wayport.net) Quit (Remote host closed the connection)
[15:41] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[15:55] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[15:56] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:04] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[16:06] <tnt> Is there a way to modify the mon address ? I did a quick test and used when creating the cluster but now I'd like to use an external ip to test non-local clients but if I just change the cfg file, it says config doesn't match monmap and still uses the old IP.
[16:42] <dspano> You could try this, I found it in the docs. ceph-mon -i newname --public-addr <ip:port>
[16:43] <dspano> replace newname with whatever the monitor id is. For example mine are 0,1,2 (mon.0, mon.1, mon.2)
[16:45] <tnt> That just starts the mon but specifying --public-addr is the same as mon addr in the config and this gives the warning and still starts on old address
[16:45] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[16:53] <tnt> Arf ... even the OSDs are somehow registered as ... so no way it's gonna work.
[17:07] <nhm_> good morning #ceph
[17:09] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:10] <joao> hello nhm_
[17:11] <tnt> dspano: I ended up adding a new mon with the right address and remove the old one ...
[17:12] * gadago (~gavin@2001:9d8::223:54ff:fee2:f41d) has joined #ceph
[17:12] <gadago> hello
[17:12] <tnt> I have ceph health returning "HEALTH_WARN 1 pgs stale; 1 pgs stuck stale". I did a test by destroying an OSD (like if i had lost a hdd) and reconstructed it. Now some data had a replication size of 1 so I expect some data is lost. How can I tell it that it's "ok" and go back to healthy ?
[17:15] <gadago> ok, I have a question, here goes
[17:15] <gadago> I have just setup ceph using the getting started guide here http://ceph.com/docs/master/start/quick-start/#install-debian-ubuntu
[17:16] <dspano> tnt: Yeah, that'll work too.
[17:16] <gadago> and then mounted cephfs using sudo mkdir /mnt/mycephfs
[17:16] <gadago> sudo mount -t ceph /mnt/mycephfs
[17:16] <gadago> can someone explain why, when I do a df -h
[17:16] <gadago> it shows my cephfs mount point as a size of 38GB ?
[17:17] <dspano> gadago: How much disk space do you have allotted to the cluster?
[17:17] <gadago> neither of my disks are that size
[17:17] <gadago> dspano: I'm not sure
[17:17] <gadago> I just used the getting started guide
[17:17] <dspano> gadago: It should be the sum of all the disk space you've allotted to your OSDs.
[17:18] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:18] <dspano> gadago: For instance, I have two OSDs with 2.2TB of space on each, so my mount shows 4.4TB.
[17:19] <dspano>,, 4.4T 56G 4.4T 2% /data
[17:19] <dspano> gadago: Check out Sebastien Han's howto also, I got my cluster running in a day with it.
[17:20] <dspano> http://www.sebastien-han.fr/blog/2012/06/10/introducing-ceph-to-openstack/
[17:20] <gadago> dspano: thanks I'll take a look
[17:21] <dspano> gadago: This one is helpful too. http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/
[17:21] <gadago> dspano: how do you specify the size of the OSDs?
[17:22] * lofejndif (~lsqavnbok@19NAAA8G0.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:23] <dspano> gadago: My disk space is on with LVM. I then have an xfs filesystem on that volume that ceph uses.
[17:23] <dspano> Sorry, I meant, is on LVM.
[17:23] <gadago> dspano: where in your ceph.conf do you pint ceph at the LVM?
[17:24] <gadago> *point
[17:25] <dspano> Here's the OSD part of my ceph.conf. http://pastebin.com/Zy0ZVN1u
[17:26] <dspano> Sorry. I missed the first part. Here's the whole thing. http://pastebin.com/WCUvTGhR
[17:26] <dspano> That part is in Sebastien's howto also.
[17:27] <gadago> dspano: many thanks - that's very helpful
[17:27] <gadago> one last thing
[17:28] <dspano> gadago: No problem.
[17:28] <gadago> do you know of any good tutorials for ceph + rados gateway?
[17:28] <gadago> I'm basically looking to present object based storage via an s3 compatible interface
[17:29] <dspano> gadago: Personally, I haven't gotten into it yet.
[17:29] <dspano> gadago: I was thinking of using it as a swift replacement once I get my Openstack cloud off the ground.
[17:30] * BManojlovic (~steki@ Quit (Read error: Connection reset by peer)
[17:30] * BManojlovic (~steki@ has joined #ceph
[17:32] <dspano> gadago: Looking on Google, I did see this. If you're using Openstack, it includes a part on the rados gateway. http://wiki.debian.org/OpenStackCephHowto
[17:36] <gadago> dspano: thank you. I had actually come across that one. The trouble I'm having is actually working out which component I need to achieve what I want
[17:36] <gadago> ceph/rados seems to be a very diverse beast
[17:38] <dspano> gadago: It sure is. Kind of like a storage swiss army knife.
[17:43] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:44] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:44] * lofejndif (~lsqavnbok@19NAAA8G0.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[17:46] <elder> http://ceph.com/gitbuilder-precise-kernel-amd64/ is out to lunch again.
[17:53] * lofejndif (~lsqavnbok@04ZAAEOI1.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:55] <Tv_> elder: its disk image is trying to transfer between data centers
[17:56] <elder> I'm not sure what that means, but I'll accept that as an explanation.
[17:56] * tnt (~tnt@99.56-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[17:57] <Tv_> elder: we're getting rid of the old hardware & old hosting
[17:57] <elder> So the transfer of that VM is underway.
[17:57] <Tv_> yeah
[17:58] <elder> OK.
[17:58] <elder> Thanks.
[17:58] <Tv_> it's going at 30 MB/s again, but i think the transfer i left running overnight just crapped out
[17:58] <Tv_> well 30MB/s aggregate across 4 transfers
[17:59] <Tv_> it's unfortunately not DC-to-DC, because they're both in private networks.. i need to use the office as an intermediate hop :(
[17:59] <Tv_> and that makes it sloooow
[18:00] <elder> That's OK, as long as it's coming back again I can wait.
[18:02] <tnt> gadago: I find that the official howto works nicely ...
[18:04] <tnt> gadago: http://ceph.com/docs/master/start/quick-start/
[18:05] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) Quit (Quit: Leaving.)
[18:09] * BManojlovic (~steki@ has joined #ceph
[18:11] <gadago> tnt: I have been using this. I just didn't find it explained things in enough depth
[18:12] * glowell (~glowell@ has joined #ceph
[18:13] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Quit: Ex-Chat)
[18:21] <Tv_> gadago: is http://ceph.com/docs/master/install/not enough depth? what more do you need?
[18:21] <Tv_> oops http://ceph.com/docs/master/install/
[18:21] <nhm_> ooh, mutrace looks useful.
[18:22] <gadago> Tv_: I'm looking for something that details the rados gateway set-up that will give me s3 compatible storage
[18:23] <Tv_> gadago: install ceph: http://ceph.com/docs/master/install/ configure it: http://ceph.com/docs/master/config-cluster/ set up radosgw: http://ceph.com/docs/master/radosgw/
[18:25] <gadago> Tv_: the first link you provide http://ceph.com/docs/master/install/ has many sections, for example 'Installing chef'. Do I even need that? How do I know which bits I need?
[18:26] <Tv_> gadago: "You may deploy Ceph with our mkcephfs bootstrap utility for development and test environments. For production environments, we recommend deploying Ceph with the Chef cloud management tool." -- they're alternative ways
[18:27] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[18:30] <andrewbogott> If I have a cephfs volume with two osd nodes and a replication of 2, it should be exactly as big as one of those nodes, right?
[18:30] <andrewbogott> When I do 'df' it says that it has size 2x which makes me think it isn't really replicating.
[18:31] <Tv_> andrewbogott: are the osds on the same server, using the same underlying disk?
[18:31] <andrewbogott> different servers, different disks
[18:32] <andrewbogott> which, btw, if someone can tell me how to specify replication count within ceph.conf, that'd rule. At the moment I'm doing it on the commandline after running mkcephfs, which is maybe my problem.
[18:47] * bchrisman (~Adium@ has joined #ceph
[18:48] * loicd (~loic@ Quit (Quit: Leaving.)
[18:50] <andrewbogott> Tv_: Am I misunderstanding what 'replication' means in this context?
[18:51] <gregaf> andrewbogott: because you can have different storage pools, each with their own replication count, Ceph simply reports the raw disk space available; it doesn't try to fit it in with a certain replication level
[18:51] <gregaf> if you had 10TB raw, and some of your data was in a 2x pool and some of it was in a 3x pool, would you want to be told that you had 5TB or 3.3TB? so it just says 10TB
[18:52] <andrewbogott> But... df is showing me the size of a mounted volume. A single mounted volume can have different pools with different replications?
[18:53] <Tv_> andrewbogott: actually, yes ;)
[18:53] <Tv_> cephfs set_layout i recall
[18:53] <andrewbogott> Oh, but multiple volumes could be mounted into the same pool, and which one fills up first would affect the available size...
[18:53] <andrewbogott> This makes sense now. Thanks.
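The two df explanations in this log (dspano's sum of OSD space and gregaf's raw-versus-replicated example) come down to simple arithmetic, sketched below. The function names are hypothetical and this is not how Ceph computes anything internally; it only reproduces the numbers quoted in the conversation.

```python
# Raw capacity is what df reports; usable capacity depends on the
# replication count of whichever pool the data lands in.
# Function names are illustrative only.

def raw_capacity_tb(osd_sizes_tb):
    """Sum of the disk space given to each OSD (what df shows)."""
    return sum(osd_sizes_tb)


def usable_capacity_tb(raw_tb, replication):
    """Data that fits if everything is stored `replication` times."""
    return raw_tb / replication


if __name__ == "__main__":
    # dspano's cluster: two OSDs with 2.2 TB each.
    print(round(raw_capacity_tb([2.2, 2.2]), 1))

    # gregaf's example: 10 TB raw, stored in a 2x pool or a 3x pool.
    print(round(usable_capacity_tb(10, 2), 1))
    print(round(usable_capacity_tb(10, 3), 1))
```

Because one filesystem can place data in pools with different replication counts, no single "usable" figure is correct, which is why only the raw number is reported.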
[19:02] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[19:03] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[19:04] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[19:04] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[19:04] * LarsFronius_ is now known as LarsFronius
[19:06] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) has joined #ceph
[19:09] * aliguori (~anthony@ has joined #ceph
[19:11] <Tv_> elder: gitbuilder-precise-kernel-amd64 is back
[19:12] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:13] <elder> Thank you.
[19:13] * andrewbogott_ (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[19:13] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Read error: Connection reset by peer)
[19:13] * andrewbogott_ is now known as andrewbogott
[19:16] * Cube (~Adium@ has joined #ceph
[19:22] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:22] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[19:22] * Leseb_ is now known as Leseb
[19:24] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[19:24] <elder> Tv_, my build shows " hostname: Name or service not known " at the end, resulting in a failure.
[19:26] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:36] * yehudasa (~yehudasa@2607:f298:a:607:c475:173a:4a0b:a6) Quit (Ping timeout: 480 seconds)
[19:39] <dspano> gregaf: I fixed the issue from yesterday. Not sure why, but I had to uninstall ceph, then re-install it and rebuild the OSD. Now they work.
[19:40] <dspano> gregaf: I initially upgraded the packages from the Ubuntu version to the ceph.com repository, so I'm not sure if that had anything to do with it or not.
[19:40] <gregaf> so you did do a ceph package upgrade too?
[19:40] <dspano> gregaf: I did that last week.
[19:41] <gregaf> that shouldn't have been an ongoing concern then... hmm
[19:41] <gregaf> oh well, glad it works now!
[19:41] <dspano> Today, I purged the packages, then reinstalled it.
[19:41] <dspano> Lol.
[19:41] <dspano> gregaf: If it happens on the next kernel upgrade, I'll just rebuild from scratch.
[19:43] <dspano> I wonder if it had anything to do with stuff like librados because that all got uninstalled/reinstalled too.
[19:45] * yehudasa (~yehudasa@2607:f298:a:607:1815:c873:3856:4a95) has joined #ceph
[19:45] <dspano> gregaf: I forgot. Thanks for the help yesterday.
[19:47] <Tv_> elder: more detail please
[19:47] * mikeryan (mikeryan@lacklustre.net) has joined #ceph
[19:47] <elder> http://pastebin.com/HaPJt1ra
[19:48] <elder> The build seems to be near complete.
[19:49] <elder> That is from the end of this: http://ceph.com/gitbuilder-precise-kernel-amd64/log.cgi?log=ae80c31ef62f1babc7f9841659569510d2ae6108
[19:49] <Tv_> ah ok
[19:49] <Tv_> it seems /etc/hosts is broken on all these vms, and only worked because of their old dns location
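A quick way to check for the kind of breakage Tv_ describes is to see whether a machine's own hostname appears in its hosts file. The sketch below parses hosts-file text only; the helper names and the sample hostname `builder-vm` are hypothetical, and it does not consult DNS or any other resolver source.

```python
# Check whether a hostname has an entry in hosts-file text.
# Hosts-file format: an address followed by one or more names per line,
# with '#' starting a comment.

def names_in_hosts(hosts_text: str) -> set:
    """Collect every hostname and alias listed in the given text."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # fields[0] is the address
    return names


def hostname_is_listed(hosts_text: str, hostname: str) -> bool:
    """True if `hostname` would resolve from this hosts file alone."""
    return hostname in names_in_hosts(hosts_text)


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n# 127.0.1.1 builder-vm\n"
    # The only entry for builder-vm is commented out, so a lookup of the
    # machine's own name would fall through to DNS.
    print(hostname_is_listed(sample, "builder-vm"))
```

On a real host the text would come from `/etc/hosts` and the name from `socket.gethostname()`.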
[19:49] <Tv_> fixing
[19:50] <sagewk> mikeryan: can you take a quick look at bugfix-2799 patch?
[19:51] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[19:52] <sagewk> mikeryan: pretty trivial, but it got it wrong the first time..
[19:52] * chutzpah (~chutz@ has joined #ceph
[19:52] <mikeryan> sagewk: looking
[19:56] <Tv_> elder: fixed to work "as well as it ever did" ;)
[19:56] <Tv_> (dns outages break core functionality of the OS, but whatever)
[19:56] * adjohn (~adjohn@ has joined #ceph
[19:57] * adjohn (~adjohn@ Quit ()
[19:58] <elder> Thanks Tv_ I've started it up again.
[20:00] <nhm_> http://poormansprofiler.org/
[20:36] * mikeryan (mikeryan@lacklustre.net) has left #ceph
[20:36] * mikeryan (mikeryan@lacklustre.net) has joined #ceph
[20:37] <sagewk> nhm_: heh
[20:37] <sagewk> nhm_: let's see what it says :)
[20:40] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) has joined #ceph
[20:45] * andrewbogott_ (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[20:45] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Read error: Connection reset by peer)
[20:45] * andrewbogott_ is now known as andrewbogott
[20:52] * adjohn (~adjohn@ has joined #ceph
[21:11] * andrewbogott_ (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[21:11] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Read error: Connection reset by peer)
[21:12] * andrewbogott_ is now known as andrewbogott
[21:15] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has left #ceph
[21:17] * Cube (~Adium@ Quit (Ping timeout: 480 seconds)
[21:29] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[21:36] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[21:36] * adjohn (~adjohn@ Quit (Quit: adjohn)
[21:38] * nhorman (~nhorman@2001:470:8:a08:7aac:c0ff:fec2:933b) Quit (Ping timeout: 480 seconds)
[21:43] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Read error: Connection reset by peer)
[21:44] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[22:00] <nhm_> joao: ping
[22:04] * danieagle (~Daniel@ has joined #ceph
[22:05] <joao> nhm_, pong
[22:05] <joao> what's up?
[22:07] * lofejndif (~lsqavnbok@04ZAAEOI1.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[22:14] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) Quit (Quit: LarsFronius)
[22:15] <elder> joshd, Isn't it true that for a v1 image, when a "snap_add" method gets called, the header will get updated by the server. As a result, since the client had a watch out on that object, it will get a notification that should trigger a header refresh.
[22:16] <elder> What I'm getting at is that the client code now initiates a refresh of the image right after calling "snap_add".
[22:16] <elder> And I think that's redundant.
[22:16] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:17] <joshd> elder: yes, that's redundant, assuming watch/notify works correctly
[22:17] <elder> I'd really like to get rid of it.
[22:17] <joshd> go for it
[22:17] <elder> Is there testing that really beats the hell out of rbd snapshots?
[22:18] <joshd> not on the kernel side
[22:18] <elder> OK, well I'm going to need that. Another question.
[22:18] <joshd> I'm not sure we really want to keep that snap_add interface in the kernel in any case
[22:19] <joshd> it's just extra work to maintain, and all the other management stuff is solely in the rbd cli tool
[22:19] <elder> Right after that refresh header, the code calls rbd_req_sync_notify().
[22:19] <elder> That apparently is requesting a new notification event.
[22:20] <elder> Do you have to re-arm that every time?
[22:20] <joshd> you mean in snap_add?
[22:20] <elder> Or maybe I'm misinterpreting what's going on.
[22:20] <elder> CEPH_OSD_OP_NOTIFY,
[22:20] <elder> Yes.
[22:20] <elder> Is this the client trying to send a notify to the server?
[22:21] <joshd> in snap_add, that's sending a notification to anyone else with the image open that a snapshot was added
[22:21] <joshd> yes
[22:21] <elder> Won't the server do that?
[22:21] <joshd> the server doesn't know a notification is needed
[22:22] <joshd> the client could batch several operations together, and only do a single notify after them all, for example
[22:22] <elder> So... See if this is right. The client, having just added a snapshot, is sending a request to the server to send out notifications to other clients?
[22:22] <joshd> yes
[22:23] <elder> I don't understand why the server couldn't "know" that a notify was needed.
[22:24] <elder> The server, after all, is the one that knows who might have that image open.
[22:24] <joshd> well, with the old format, the client was sometimes writing directly to the header instead of calling a class method
[22:25] <elder> It doesn't look to me like it does that any more.
[22:25] <elder> It calls the "add_snap" method, then refreshes (re-reads the header)
[22:25] <joshd> it still does for resizing
[22:26] <elder> I don't see that. Can you point me at where roughly in the kernel code that happens?
[22:27] <joshd> it's not in the kernel code
[22:27] <elder> Oh. Well that's all I'm looking at right now.
[22:27] <joshd> the userspace code has all the management operations like snap create/delete/rollback, rename, resize, etc
[22:28] <joshd> the kernel only has snapshot add, for no good reason
[22:28] <elder> Well then let me re-state my question then. Do you know why the kernel client needs to send a notify event to the server after a snapshot add?
[22:28] <elder> When it doesn't actually update anything on disk?
[22:28] <elder> (metadata-wise)
[22:30] <elder> yehudasa, maybe you have insight.
[22:31] <joshd> whenever the header changes, a notify needs to be sent. rbd_header_add_snap changes the header by calling the snap_add class method
[22:31] <elder> So because the kernel client initiated that operation, it also has to ensure the notification occurs?
[22:32] <joshd> yes
[22:32] <elder> And is that because there's not really an "rbd server," but just OSD's and monitors?
[22:33] <Tv_> umm, anyone remember why "mountall" would hang at boot on ubuntu 10.10?
[22:33] <joshd> yes, and if other clients don't notice the snapshot creation, it effectively did not happen
[22:33] <elder> Hence there is no "rbd server" that would know that others needed to know it changed. Got it.
[22:33] <joshd> that's why the notify is synchronous as well
[22:33] <elder> OK, I get it now.
[22:33] <elder> I'll delete the refresh, but will keep the notify.
[22:33] <joshd> ok
[22:34] <elder> We'll wait for the refresh to be triggered by a watch event.
[22:34] <joshd> yup
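The flow elder and joshd settle on can be modelled with a toy simulation. This is only a sketch of the idea under discussion, not kernel or OSD code: the classes `HeaderObject` and `Client` and all their methods are invented for illustration. The point it shows is that the object store does not notify on its own, so the client that adds the snapshot sends the notify, and every watcher (the initiator included) then refreshes.

```python
# Toy model of watch/notify on an RBD header object.
# Everything here is hypothetical and in-memory only.

class HeaderObject:
    """Stand-in for the image header object stored on an OSD."""

    def __init__(self):
        self.snapshots = []
        self.watchers = []

    def watch(self, client):
        self.watchers.append(client)

    def snap_add(self, name):
        # Changes the header but does NOT tell anyone: there is no
        # "rbd server" that knows a notification is needed.
        self.snapshots.append(name)

    def notify(self):
        # Deliver a notification to every client holding a watch.
        for client in self.watchers:
            client.on_notify(self)


class Client:
    """Stand-in for anything that has the image open."""

    def __init__(self, name, header):
        self.name = name
        self.cached_snapshots = []
        self.refresh_count = 0
        header.watch(self)

    def on_notify(self, header):
        # Watch callback: re-read the header.
        self.cached_snapshots = list(header.snapshots)
        self.refresh_count += 1

    def create_snapshot(self, header, snap_name):
        header.snap_add(snap_name)
        header.notify()  # no explicit refresh here; the watch handles it


if __name__ == "__main__":
    header = HeaderObject()
    a = Client("a", header)
    b = Client("b", header)
    a.create_snapshot(header, "snap1")
    print(a.cached_snapshots, a.refresh_count)
    print(b.cached_snapshots, b.refresh_count)
```

In this model each client refreshes exactly once per snapshot. An extra refresh inside `create_snapshot` would only be useful if the notify could fail silently.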
[22:35] <elder> OK. One last thing. Any suggestions on easily getting a test to beat on the kernel client snapshot functionality?
[22:36] <elder> I can write a script and maybe just do it manually. But if you have a better suggestion I'd love to hear it.
[22:37] <joshd> there aren't good tools to stress it other than bash scripts, but you could try to make the script generic so it could be used against the kernel or userspace
[22:38] <joshd> that is, put the kernel-specific bits in functions that could be changed to do the equivalent userspace thing
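joshd's suggestion of keeping the backend-specific bits in swappable functions might look roughly like this. It is a sketch under stated assumptions: the `rbd snap create pool/image@snap` form is assumed for the userspace tool, the kernel backend simply writes the snapshot name to a sysfs attribute whose path the caller must supply (no real path is assumed here), and every function name is hypothetical. The default is a dry run that only prints the commands.

```python
# Generic snapshot stress plan with pluggable kernel/userspace backends.
import shlex
import subprocess
from typing import Callable, List

SnapCmd = Callable[[str], List[str]]


def userspace_backend(pool: str, image: str) -> SnapCmd:
    """Build snapshot commands for the rbd CLI (assumed syntax)."""
    def make(snap: str) -> List[str]:
        return ["rbd", "snap", "create", f"{pool}/{image}@{snap}"]
    return make


def kernel_backend(sysfs_attr: str) -> SnapCmd:
    """Build commands that write the snapshot name to a sysfs attribute.

    `sysfs_attr` is a caller-supplied path; nothing here knows the real one.
    """
    def make(snap: str) -> List[str]:
        script = f"echo {shlex.quote(snap)} > {shlex.quote(sysfs_attr)}"
        return ["sh", "-c", script]
    return make


def plan_snapshots(make_cmd: SnapCmd, count: int,
                   prefix: str = "stress") -> List[List[str]]:
    """The generic part: the same plan works for either backend."""
    return [make_cmd(f"{prefix}-{i}") for i in range(count)]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    for cmd in plan:
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(plan_snapshots(userspace_backend("rbd", "testimg"), 3))
```

Swapping `userspace_backend(...)` for `kernel_backend(...)` changes only how each snapshot is created; the loop, naming, and error handling stay shared.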
[22:38] <elder> OK. I'll start with something that will manually test stuff. After that I can figure out how to wedge it into broader application.
[22:39] <joshd> ok
[22:43] <Tv_> oh yay, one hundred 7-second timeouts!
[22:43] * Tv_ waits
[22:43] * adjohn (~adjohn@ has joined #ceph
[23:00] <joshd> elder: I just spoke with yehuda and sage, and they don't see any benefit to keeping the add_snap sysfs interface
[23:01] <joshd> elder: I'll send an email to the list, but I think we should just remove it
[23:12] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[23:14] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[23:16] <elder> joshd, sorry, my machine--or at least my display system--froze.
[23:17] <elder> I agree with that sentiment. The only reason to keep it would be for backward compatibility.
[23:17] <elder> And I'm not sure that's important.
[23:37] <sagewk> $ ./ceph --admin-daemon out/osd.0.asok version
[23:37] <sagewk> 0.49-244-g7d93a34
[23:38] <sagewk> that one people might actually notice when it turns into { "version": "0.49-244-g7d93a34" }
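A script that consumes this output could tolerate both shapes sagewk mentions, the bare string and the JSON object. The helper below is a hypothetical sketch, not part of Ceph; it only assumes the two output forms quoted above.

```python
# Accept either the old plain-text version output or the new JSON form.
import json


def parse_version(output: str) -> str:
    """Return the version string from plain or JSON admin-socket output."""
    text = output.strip()
    try:
        data = json.loads(text)
    except ValueError:
        # Old format: the output is just the version string.
        return text
    if isinstance(data, dict) and "version" in data:
        return str(data["version"])
    # Valid JSON but not the expected object; treat it as plain text.
    return text


if __name__ == "__main__":
    print(parse_version("0.49-244-g7d93a34\n"))
    print(parse_version('{ "version": "0.49-244-g7d93a34" }'))
```

Both calls print the same version string, so callers would not need to know which release produced the output.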
[23:45] <elder> joshd, just to clarify back on that redundant refresh call... The refresh that will be triggered via callback will occur because of the notify request *I* (the kernel client) initiate. Right?
[23:45] <elder> I.e., I initiate the notify because I just created the snapshot. And as a result, I (and any other clients) will get notified that there's a change to the header object that needs to be re-loaded.
[23:45] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Quit: andrewbogott)
[23:47] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) has joined #ceph
[23:48] <elder> That notify is done as a "best effort" and any errors are ignored. This may be why the "redundant" header refresh was needed, in the event that failed.
[23:49] <elder> Maybe I'll just hold off until we have concluded the whole hunk of functionality can be ripped out. (When can we decide that?)
[23:59] * andrewbogott (~andrewbog@50-93-251-66.fttp.usinternet.com) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.