#ceph IRC Log

IRC Log for 2012-12-14

Timestamps are in GMT/BST.

[0:02] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) has joined #ceph
[0:03] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit ()
[0:03] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[0:05] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit ()
[0:05] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[0:06] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[0:06] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[0:07] * Cube (~Cube@12.248.40.138) has joined #ceph
[0:10] * occ (~onur@38.103.149.209) has joined #ceph
[0:12] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:12] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:13] * calebamiles1 (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[0:16] * jjgalvez1 (~jjgalvez@12.248.40.138) has joined #ceph
[0:19] * rweeks (~rweeks@0127ahost2.starwoodbroadband.com) has left #ceph
[0:22] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Ping timeout: 480 seconds)
[0:22] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[0:22] * loicd (~loic@magenta.dachary.org) has joined #ceph
[0:22] * maxiz (~pfliu@222.128.143.175) Quit (Ping timeout: 480 seconds)
[0:24] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[0:27] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[0:29] * aliguori (~anthony@cpe-70-113-5-4.austin.res.rr.com) Quit (Remote host closed the connection)
[0:32] * jjgalvez1 (~jjgalvez@12.248.40.138) Quit (Quit: Leaving.)
[0:33] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Remote host closed the connection)
[0:37] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[0:39] * Cube (~Cube@12.248.40.138) has joined #ceph
[0:44] <joao> gregaf1, sagewk, pushed wip-3617 in case anyone wants to review the checks
[0:45] * Cube (~Cube@12.248.40.138) Quit (Read error: Operation timed out)
[0:46] * absynth_ (~info@ip-178-201-144-23.unitymediagroup.de) Quit (Quit: leaving)
[0:47] <gregaf1> joao: will do
[0:48] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[0:51] * gucki (~smuxi@HSI-KBW-082-212-034-021.hsi.kabelbw.de) Quit (Ping timeout: 480 seconds)
[0:54] <gregaf1> create, OSD, create!
[0:55] <terje> boy, twitter's new storage platform sounds familiar
[0:57] <Kioob> can I put disabled host in the CRUSH map ?
[0:57] <gregaf1> Kioob: what do you mean, disabled?
[0:58] <gregaf1> if you put a node in then it'll eventually get marked out so all data is allocated elsewhere, but the node will still be in there and get some hits (so it'll microscopically increase CRUSH calculation times)
[0:58] <Kioob> I start my cluster with 8 OSDs on 1 host, with the default CRUSH map. Now, I would like to add 1 host and update the CRUSH map. I will repeat that 3 or 4 times
[1:00] <gregaf1> okay; I'm still not seeing the disabled part here :)
[1:00] <gregaf1> but I am getting the feeling that the answer is "yes"
[1:00] <Kioob> :)
[1:00] <Kioob> disabled is "not enabled for now"
[1:01] <gregaf1> if you add them with a weight of zero they won't get any data assigned
[1:01] <Kioob> in fact I have a "drbd cluster" of 6 hosts, which I convert host by host to RBD
[1:01] <dmick> Kioob: as in "not booted"? Or just "not participating in the cluster, for whatever reason"?
[1:01] <gregaf1> or you can add them to the map but not connect them to your root node, and then move them again later
[1:01] <Kioob> not participating dmick
[1:01] <Kioob> gregaf1 : weight 0, good idea. Thanks
[1:01] <dmick> oh so basically you just wanna batch your crushmap updates, and then make them start playing all at once
[1:02] <Kioob> the problem is that it's production data
[1:03] <Kioob> so I prefer to switch more progressively
[1:07] <Kioob> since I have data on the unique host, I want to change the CRUSH map. I have to not change the ID of the host and the root, then I need to "move the bucket" of the host into my rack bucket ?
[1:11] <gregaf1> Kioob: sounds like you just want to start with weights of zero on the "disabled" hosts
[1:11] <gregaf1> and then you can gradually turn them up to 1 (or whatever your standard is) once they're ready
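A rough sketch of the commands being discussed (OSD IDs, host names and the weight scale are made up, and the exact CRUSH CLI syntax varies between Ceph releases):

    # add the new host's OSDs to the CRUSH map with weight 0, so no data lands on them yet
    ceph osd crush add osd.8 0 host=newhost rack=rack1 root=default
    # later, ramp the weight up step by step to shift data onto them gradually
    ceph osd crush reweight osd.8 0.5
    ceph osd crush reweight osd.8 1.0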
[1:11] * Cube (~Cube@12.248.40.138) has joined #ceph
[1:13] <andreask> Kiobb: so you "convert" all secondaries to OSDs and then move data from DRBD primaries into the RADOS pool?
[1:13] * BManojlovic (~steki@85.222.178.194) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:14] <gregaf1> joao: wip-3617 looked good; I tested, merged, and pushed
[1:16] * Cube1 (~Cube@12.248.40.138) has joined #ceph
[1:17] <andreask> ah ... wrong nick ....
[1:17] <andreask> Kioob: trying correct nick ;-) ... so you "convert" all secondaries to OSDs and then move data from DRBD primaries into the RADOS pool?
[1:18] * Cube1 (~Cube@12.248.40.138) Quit ()
[1:18] * Cube (~Cube@12.248.40.138) Quit (Read error: Operation timed out)
[1:20] <joao> gregaf1, cool, thanks!
[1:22] * Cube (~Cube@12.248.40.138) has joined #ceph
[1:22] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[1:23] * jlogan1 (~Thunderbi@72.5.59.176) has joined #ceph
[1:24] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[1:25] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[1:25] * Cube (~Cube@12.248.40.138) Quit ()
[1:26] <Kioob> Ok thanks andreask and gregaf1, I will do like that
[1:27] <andreask> Kioob: that was a question ;-)
[1:28] * jlogan (~Thunderbi@2600:c00:3010:1:e5d8:3402:7d8b:dbea) Quit (Ping timeout: 480 seconds)
[1:30] * Kioob (~kioob@luuna.daevel.fr) Quit (Remote host closed the connection)
[1:32] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[1:33] <Kioob> lol andreask
[1:33] <Kioob> so, yes, it was my idea
[1:34] * stxShadow (~Jens@ip-178-201-147-146.unitymediagroup.de) has left #ceph
[1:35] <andreask> Kioob: and how do yo plan to access the data?
[1:35] <Kioob> during conversion ?
[1:35] <andreask> later
[1:35] <Kioob> rbd
[1:36] <Kioob> (the kernel module version)
[1:36] <andreask> just wonder because typically on DRBD the data is accessed locally on the storage nodes
[1:36] <andreask> what is your use-case?
[1:37] <Kioob> yes, I have to add hosts for running VM
[1:37] <Kioob> it's for virtualization (Xen)
[1:39] <andreask> ah, only tried with kvm till now
[1:41] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:44] * Cube (~Cube@12.248.40.138) has joined #ceph
[1:44] <Kioob> the main "problem" I see is that Xen "Dom0" doesn't manage cache for block devices
[1:45] <Kioob> so booting from a RDB device is not fast
[1:45] <Kioob> (maybe it's a latency problem too)
[1:47] * maxiz (~pfliu@222.128.143.175) has joined #ceph
[1:51] * l0nk (~alex@173.231.115.58) Quit (Quit: Leaving.)
[1:51] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:51] * Oliver1 (~oliver1@ip-178-203-175-101.unitymediagroup.de) Quit (Quit: Leaving.)
[1:52] <andreask> hmm, yes ... with kvm and qemu-rbd you can also have caching
[1:54] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[1:54] <Kioob> I saw that someone uses flashcache over RBD to have a local cache, through an SSD for example
[1:55] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:59] * jlogan1 (~Thunderbi@72.5.59.176) Quit (Ping timeout: 480 seconds)
[2:00] <paravoid> is anyone around that could give a clarification for #3615?
[2:01] <paravoid> Samuel Just was the person that replied to me on that bug
[2:01] <gregaf1> sjustlaptop might be around
[2:02] <gregaf1> what're you interested in, paravoid?
[2:02] <sjustlaptop> looking
[2:02] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[2:03] <sjustlaptop> paravoid: that's the missing pg info file one?
[2:03] <paravoid> so Samuel said "You might be able to recover the OSD by renaming the pg directory"
[2:03] <paravoid> yes
[2:03] <paravoid> which pg directory?
[2:03] <sjustlaptop> oh, right
[2:03] <sjustlaptop> the 6.7111 pg directory under current
[2:03] <paravoid> there is no such directory
[2:03] <sjustlaptop> what's the output of ls current/ | grep '6.7111'
[2:03] <sjustlaptop> ?
[2:04] <Kioob> joao, here an example of trying to map a block device, with an auth problem :
[2:04] <paravoid> nothing, I know how to ls :)
[2:04] <Kioob> root@faude:~# time rbd map slhoka-hdd --pool hdd3copies --id faude
[2:04] <Kioob> rbd: add failed: (5) Input/output error
[2:04] <Kioob> real 1m0.016s
[2:04] <Kioob> user 0m0.012s
[2:04] <Kioob> sys 0m0.004s
[2:04] <sjustlaptop> ah, sorry
[2:04] <paravoid> np :)
[2:04] <sjustlaptop> oh, sorry, the pgs are in the directory in hex
[2:04] <sjustlaptop> one sec
[2:05] <paravoid> oh
[2:05] <sjustlaptop> how about looking for a 6.1bc7 pg info
[2:06] <paravoid> there is both a pginfo and a directory
[2:06] <sjustlaptop> ok, copy both of those out of the way
[2:06] <paravoid> ah, zero sized
[2:06] <sjustlaptop> sorry rename both of those out of the way
[2:06] <paravoid> typical XFS
[2:06] <sjustlaptop> there should be a log as well
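A hedged sketch of that "rename out of the way" step (the OSD id and on-disk layout below are illustrative, not authoritative; the point is to move both the pg's directory and its zero-length pginfo aside, with the OSD stopped, rather than delete anything):

    # stop the affected OSD first
    cd /var/lib/ceph/osd/ceph-12/current            # hypothetical osd.12
    mv 6.1bc7_head 6.1bc7_head.bad                  # the pg's data directory
    mv <pginfo file for 6.1bc7> <same name>.bad     # the truncated pginfo, wherever it sits in the store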
[2:06] <sjustlaptop> well, the reason we crash in this case is that we can't trust the contents of the store at all
[2:06] <sjustlaptop> writes are applied to xfs via a transactional interface
[2:07] <paravoid> I know, it was my fault for running with nobarrier
[2:07] <sjustlaptop> this error implies that our transaction failed to completely apply which is either a ceph bug or a filesystem problem
[2:07] <sjustlaptop> paravoid: yeah, just giving our thinking
[2:07] <paravoid> I assumed ceph would recover from a truncated file
[2:07] <paravoid> I still think that it shouldn't crash
[2:08] <paravoid> maybe say a big fat warning
[2:08] <sjustlaptop> well, the truncated file isn't the problem, if there is other corrupted data on the OSD due to the filesystem malfunction, we might propogate it to the healthy replicas
[2:08] <paravoid> "filesystem is corrupted or ceph has a bug. pg 6.1bc7 is missing, you can't trust the contents of the filesystem"
[2:08] <paravoid> or something along those lines
[2:08] <sjustlaptop> anyway, our error reporting could definitely be better
[2:08] <sjustlaptop> yeah, like that
[2:11] <paravoid> so, if e.g. there are zero-sized objects there in a different partition
[2:11] <paravoid> this might propagate to the other replicas?
[2:12] <paravoid> no other checks are being done? like the file in the filesystem having the right size?
[2:12] <paravoid> or is this about actually corrupted files?
[2:12] <sjustlaptop> actually corrupted files are among the many ways the filesystem might choose to screw us
[2:12] <paravoid> heh
[2:13] <sjustlaptop> we try to detect many of them, the logs (iirc) have checksumming
[2:13] <paravoid> aha
[2:13] <sjustlaptop> we compare object contents across replicas online to detect errors
[2:13] <sjustlaptop> that's scrub
[2:13] <sjustlaptop> but we can always to better
[2:13] <paravoid> yeah I've read about that
[2:13] <paravoid> thanks for all the info
[2:13] <sjustlaptop> sure, no problem
[2:14] <paravoid> oh and another random question
[2:14] <paravoid> because it just happened again
[2:14] <paravoid> whenever I e.g. add an OSD (like now)
[2:14] <paravoid> I see quite a few slow request warnings
[2:14] <paravoid> e.g. 2012-12-14 01:13:40.302812 osd.17 [WRN] 18 slow requests, 4 included below; oldest blocked for > 136.683210 secs
[2:14] <sjustlaptop> there are some settings you can tweak, one sec
[2:15] <paravoid> this was just a single OSD re-added in a cluster of 40
[2:15] <sjustlaptop> try setting 'osd recovery op priority' to 10
[2:15] <paravoid> it recovered now
[2:15] <sjustlaptop> and try setting 'osd max backfills to 3'
[2:15] <paravoid> but I'll try it for next time :-)
[2:16] <sjustlaptop> it's been a problem; bobtail (I think these specific things went in for 0.55) has some changes to relieve it, like the osd recovery op priority and max backfills settings
[2:17] <sjustlaptop> we've got more things planned for the release after bobtail as well
[2:17] <paravoid> that's 0.55
[2:17] <sjustlaptop> yeah, but it defaults to 30
[2:18] <sjustlaptop> your version has it, and you can reduce the priority to reduce impact on client io
[2:18] <sjustlaptop> oh, is this rbd?
[2:18] <sjustlaptop> you might want to set 'osd max recovery chunk' to 8388608
[2:19] <paravoid> no it's not
[2:19] <paravoid> it's radosgw
[2:19] <sjustlaptop> how large are your objects?
[2:19] <paravoid> small to very small
[2:19] <sjustlaptop> <1MB?
[2:19] <paravoid> possibly
[2:19] <sjustlaptop> well, 0.56 will actually default to the settings I mentioned, so it probably won't hurt to set them
[2:20] * justinwarner (~ceg442049@osis111.cs.wright.edu) has joined #ceph
[2:20] <sjustlaptop> but the recovery priority and max backfills are most likely to help
[2:20] <paravoid> it's wikipedia's images
[2:20] <paravoid> and thumbnails, which tend to be very small
[2:20] <paravoid> I'll try this, thanks
[2:21] <sjustlaptop> cool, good luck
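Where those knobs end up, as a sketch (values are the ones suggested above; the runtime injectargs spelling is an assumption and may differ by release):

    # ceph.conf, [osd] section
    osd recovery op priority = 10
    osd max backfills = 3

    # or pushed into running daemons (assumed syntax):
    ceph tell osd.* injectargs '--osd-recovery-op-priority 10 --osd-max-backfills 3'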
[2:21] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:21] <paravoid> do you also know radosgw to answer another minor question? :)
[2:21] <sjustlaptop> perhaps
[2:21] <paravoid> that's what you get from being helpful!
[2:21] <paravoid> I saw there is some support for adding multiple pools
[2:21] <paravoid> right now all of the objects are saved in .rgw.buckets
[2:22] <paravoid> I'm wondering how objects are going to be assigned in pools though
[2:22] <sjustlaptop> ah, that I don't know about
[2:22] <sjustlaptop> my very limited recollection is that objects have a different pool
[2:22] <sjustlaptop> ?
[2:22] <paravoid> basically I'd like to map different buckets to different pools
[2:22] <justinwarner> I was wondering if anyone could help, I'm following the walk through here: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/. I've done it successfully on one machine, however, while working on the next machine, on step 7 when you do: ceph-osd -i # --mkfs --mkkey, it gives me an error: "Must specify '--osd-data=foo' data path. From the example given after, it shows that this is first off, not required. Secondly, when checking th
[2:22] <sjustlaptop> yehudasa: would be the person to ask
[2:23] <paravoid> since we might have different constraints per bucket
[2:23] <justinwarner> (Sorry to interrupt your conversation)
[2:23] <sjustlaptop> justinwarner: no worries
[2:23] <paravoid> yeah, no worries :)
[2:24] <sjustlaptop> justinwarner: I'm not quite sure, but does the new machine have a ceph.conf in the right place with the osd data config set?
[2:24] <justinwarner> Yes
[2:24] <justinwarner> And the ceph.conf does match that of the other machine.
[2:24] <sjustlaptop> does it include entries for the new osd?
[2:25] <justinwarner> It never said to update the ceph.conf on the other osd, which that could be a problem. The new one, which is on the main machine (Admin?) is up to date on it, and the one I'm adding.
[2:25] <justinwarner> (Other osd being the one I added earlier and is working).
[2:25] <yehudasa> paravoid: radosgw-admin pool add
[2:25] <paravoid> yehudasa: hi
[2:25] <paravoid> yes, I've seen that
[2:25] <yehudasa> but you can't control which bucket goes to which pool
[2:25] <paravoid> what happens if I have three pools though?
[2:25] <paravoid> how are objects assigned to pools?
[2:26] <yehudasa> as it is now, buckets will be placed on the same pool, which is the pool selected when they were created
[2:26] <yehudasa> the pool is selected randomally
[2:26] <sjustlaptop> justinwarner: can you pastebin the ceph.conf from the machine where you are running ceph-osd -i # --mkfs --mkkey
[2:26] <paravoid> so, there is a pool-per-bucket setting, it's just not manually selectable
[2:26] <paravoid> that's a shame
[2:27] <justinwarner> And this might relate, when mounting the file systems (On the one I set up earlier, and the current one I'm having problems with), I couldn't include the -o user_xattr. It worked without this, I'm also using btrfs. And I'll upload it, one sec.
[2:27] <paravoid> looks like the foundations are there and only missing a very small piece, but this might be a too high level interpretation :)
[2:27] <yehudasa> paravoid: there's an open issue for that and we'll probably revise it for the next stable release (the one follows bobtail)
[2:27] <sjustlaptop> btrfs I think always has user_xattr
[2:28] <justinwarner> http://pastebin.com/Aje3H1G4
[2:29] <sjustlaptop> justinwarner: how about the ceph.conf from the node where you were successful?
[2:29] <justinwarner> http://pastebin.com/tCgJZqy8
[2:29] <justinwarner> And do you mean it has user_xattr by default?
[2:29] <justinwarner> Or would I still have to specify?
[2:32] <sjustlaptop> justinwarner: I don't think you can disable it
[2:32] <paravoid> yehudasa: I can't seem to find that issue
[2:32] <justinwarner> Oh okay, then that works out.
[2:33] <yehudasa> paravoid: 2169
[2:33] <paravoid> ah!
[2:33] <paravoid> thanks :)
[2:34] <sjustlaptop> justinwarner: did you create the directory in sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
[2:35] <sjustlaptop> ?
[2:35] <sjustlaptop> justinwarner: what version are you running?
[2:35] <justinwarner> Yes.
[2:35] * LeaChim (~LeaChim@5ad684ae.bb.sky.com) Quit (Remote host closed the connection)
[2:35] <justinwarner> Version .41
[2:35] <sjustlaptop> not 0.48?
[2:36] <sjustlaptop> or 0.55?
[2:36] <justinwarner> # ceph -v
[2:36] <justinwarner> ceph version 0.41 (commit:c1345f7136a0af55d88280ffe4b58339aaf28c9d)
[2:36] <sjustlaptop> oh....
[2:36] <justinwarner> Should I update?
[2:36] * bigzaqui (~edward@200.8.115.193) has joined #ceph
[2:36] <sjustlaptop> the osd add/remove stuff has completely changed since then
[2:36] <bigzaqui> gentlemen
[2:36] <bigzaqui> anyone alive?
[2:36] <sjustlaptop> sure
[2:36] <bigzaqui> good :)
[2:36] <sjustlaptop> as far as you know...
[2:36] <justinwarner> Any reason it would work on the other machine and not this one?
[2:37] <sjustlaptop> not sure about that
[2:37] <justinwarner> Then I guess, how does one go about updating? Any documentation on that?
[2:37] <justinwarner> I'll Google around, probably find it.
[2:37] <sjustlaptop> justinwarner: ok, we'll probably be more helpful in about 14 hours
[2:38] <justinwarner> Lol.
[2:38] <justinwarner> You've been helpful =). Another quick question though, is their any specific way to upgrade Ceph among machines?
[2:38] <sjustlaptop> bobtail is probably 2-3 weeks from released, if you want to wait for that, fyi
[2:39] <justinwarner> Hm
[2:39] <sjustlaptop> starting the new version should upgrade the ondisk data automatically
[2:39] <justinwarner> Okay, I'm actually doing this with a professor at school, I'll send him an email and talk to him about that. It's winter break, so that actually works out.
[2:39] <sjustlaptop> ok
[2:39] <justinwarner> Thanks a lot!
[2:39] <sjustlaptop> sure!
[2:40] <bigzaqui> I'm having the weirdest problem right now, I'll explain it as simply as possible. I installed CentOS 6.3 (kernel 2.6... old) and I wanted to install ceph but after I followed all the steps the system crashed saying that the ceph module was missing... so I did some research and the problem was in the kernel, I needed a new one... so I downloaded The_Bishop 3.6.10 (stable), and executed the "make menuconfig", inside the network Item I selected the "Ceph core
[2:40] <bigzaqui> library"
[2:41] <bigzaqui> and then, esc esc, esc esc and then did the make
[2:41] <bigzaqui> after 40 minutes, I checked the net/ceph folder, and there they were, ceph_fs.o, libceph.o and others
[2:42] <bigzaqui> ran the "make modules_install install"
[2:42] <bigzaqui> rebooted with that new kernel, but the module is still missing
[2:42] <sjustlaptop> modprobe ceph?
[2:42] <sjustlaptop> or ceph_fs?
[2:43] <bigzaqui> inside the /lib/modules/3.6.10/kernel/net/ there's no ceph folder
[2:43] <bigzaqui> I don't know what's happening, it's not being installed even though is being compiled..
[2:44] <bigzaqui> modprobe ceph
[2:44] <bigzaqui> -> FATAL: Module ceph not found.
[2:44] * maxiz (~pfliu@222.128.143.175) Quit (Quit: Ex-Chat)
[2:44] <sjustlaptop> oh, there might be another one under fs?
[2:44] <bigzaqui> in the make menuconfig?
[2:44] <sjustlaptop> yeah
[2:44] <sjustlaptop> I think you grabbed the common module for cephfs and rbd
[2:44] <bigzaqui> the only one I found was inside the networking
[2:46] <bigzaqui> I found it!
[2:46] <bigzaqui> god dammit
[2:46] <sjustlaptop> heh
[2:47] <bigzaqui> I'll compile again and let you know
[2:47] <bigzaqui> it was really deep
[2:47] <bigzaqui> File systems -> Network File systems -> Ceph distributed file system
[2:48] <bigzaqui> mm what the hell is the one in the networking then?
[2:48] <bigzaqui> "Ceph core library"
[2:49] <sjustlaptop> ceph and rbd share a messaging layer for communication with the cluster
[2:49] <bigzaqui> well, I selected both and compile again, I'll tell you if it worked in 30 minutes
[2:49] <bigzaqui> thanks so much
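For reference, the kernel options involved (a sketch; the menu paths are the ones found above, the CONFIG names are from mainline Kconfig):

    CONFIG_CEPH_LIB=m      # Networking support -> Ceph core library (shared messaging layer)
    CONFIG_CEPH_FS=m       # File systems -> Network File Systems -> Ceph distributed file system
    CONFIG_BLK_DEV_RBD=m   # Device Drivers -> Block devices -> Rados block device (RBD), if rbd is wanted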
[2:49] <yehudasa> paravoid: are you compiling from source?
[2:50] * justinwarner (~ceg442049@osis111.cs.wright.edu) has left #ceph
[2:50] <bigzaqui> one thing, do you think ceph is stable enough to use in a big project company?
[2:50] <yehudasa> paravoid: I pushed something, not tested, that allows specifying HTTP header to specify which pool to use when creating a bucket
[2:50] <bigzaqui> we were discussing if it's safe to use, considering that is still young
[2:50] <sjustlaptop> well, what do you want to use it for?
[2:51] <bigzaqui> we have 9 servers, with 12 hard drives each, each hard drive has 3 TB
[2:51] <yehudasa> paravoid: pushed it to wip-2169, you can get it there, done on top of current master
[2:51] <bigzaqui> we want to create a big pool with all that
[2:51] <sjustlaptop> so kernel cephfs?
[2:52] <yehudasa> paravoid: but note that we might not pick it up for upstream as we may want to implement it differently later
[2:52] <yehudasa> paravoid: but if you want to play with it you can
[2:52] <bigzaqui> we were thinking first in using zfs with AoE but the load is too big
[2:53] <bigzaqui> also talked with coraid but they said we need their special hardware, and we don't want to spend more money
[2:53] <bigzaqui> so is ceph or other cluster file system
[2:54] <sjustlaptop> The filesystem parts (mds and kernel client) are less stable than the object layer at this time, you'd want to test your workload and see if you run into problems
[2:54] <sjustlaptop> the focus has been shifting to stabilizing the mds and fs
[2:55] <sjustlaptop> there are people taking cephfs based clusters into production, I think
[2:55] <bigzaqui> the problem is that they already are using 4 servers to save a lot of info, so we can't "replicate" the same model we need with the 9 servers
[2:55] <bigzaqui> we will try it though
[2:55] <bigzaqui> if I need to search for another cluster FS, which one would you recommend?
[2:56] <sjustlaptop> people usually also mention glusterfs, but I think metadata throughput is frequently a concern
[2:56] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:56] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:56] <sjustlaptop> inktank does sell cephfs support...
[2:59] <paravoid> yehudasa: oh! wow, thanks
[2:59] <paravoid> yehudasa: currently using ubuntu packages, but I guess I can rebuild
[2:59] <paravoid> not sure if I'd want to depend on an out of tree patch though :-)
[3:00] <paravoid> is there perhaps a way of running stock ceph but being to modify the bucket placement manually by running a command or modifying a file?
[3:01] <yehudasa> paravoid: once you create the bucket, it's being set
[3:01] <yehudasa> so if you create the bucket using a modified version of radosgw, you can continue using the stock ceph stuff with it
[3:02] <paravoid> aha
[3:02] <paravoid> maybe you forgot to push it?
[3:02] <paravoid> I see wip-2169 as being the same as master
[3:02] <yehudasa> paravoid: nope, it's pushed
[3:02] <yehudasa> ah, forgot to commit
[3:04] <yehudasa> paravoid: pushed now
[3:04] <paravoid> looking
[3:04] <yehudasa> note that you can get the built packages off the gitbuilder
[3:04] <paravoid> amazing response, thanks :-)
[3:05] <yehudasa> add http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-2169 to your apt repository
[3:05] <paravoid> oh wow :)
[3:05] <yehudasa> but, again, all the relevant disclaimers hold here
[3:05] <yehudasa> it might take a bit to build though
[3:05] <paravoid> hehe yes
[3:05] <yehudasa> http://ceph.com/gitbuilder.cgi
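A sketch of pointing apt at that branch build (the suite/component naming here is an assumption; check the directory layout under the URL first):

    echo deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/wip-2169 precise main | sudo tee /etc/apt/sources.list.d/ceph-wip-2169.list
    sudo apt-get update && sudo apt-get install radosgw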
[3:07] <bigzaqui> I compiled again, this time selecting ceph in the networking menu and also in the network file system menu, didn't work
[3:07] <bigzaqui> I'll run make clean, and then compile again to see if that has something to do with it, which I doubt
[3:07] * jksM (~jks@3e6b7199.rev.stofanet.dk) has joined #ceph
[3:07] * jks (~jks@3e6b7199.rev.stofanet.dk) Quit (Read error: Connection reset by peer)
[3:08] <paravoid> yehudasa: why not merge it as it is if I may ask? what's the alternative plan?
[3:11] <sjustlaptop> bigzaqui: hmm, that should have worked, yehudasa thoughts?
[3:12] * llorieri (~llorieri@177.141.245.115) has joined #ceph
[3:12] <llorieri> hi
[3:13] <dmick> hi llorieri
[3:14] <dmick> I was just listening to Ella Fitzgerald the other night singing Lorelei
[3:14] <dmick> ;)
[3:14] <llorieri> dmick: :)
[3:14] <llorieri> is it the same as Tom Tom Club Lorelei ?
[3:14] <llorieri> I will check later :P
[3:14] <dmick> I doubt it; old jazz standard, but I'll check
[3:14] <dmick> probably same myth
[3:14] <llorieri> I just installed a ceph cluster
[3:15] <llorieri> it worked as a charm :)
[3:15] <bigzaqui> I'm looking for some manual where they had to compile a new kernel but I can't find it... I'll keep looking
[3:15] <llorieri> it is for some guys that wants to have a sandbox for S3
[3:16] <llorieri> the only thing I could not do so far is to grant access of a bucket for multiple accounts
[3:18] <dmick> like, ACL?
[3:18] <llorieri> yes
[3:19] <dmick> so you were sending an S3 request to do this?
[3:20] <llorieri> yes
[3:20] <dmick> sorta trying to get you describe what you did and what went wrong :)
[3:20] <llorieri> but it is not really necessary, it would help to use s3cmd or boto, but not necessary
[3:22] <llorieri> I dont know what to do ehhehehehe
[3:23] <dmick> oh you're asking how to do this?
[3:23] <llorieri> first if that is possible
[3:23] <llorieri> I only found exemples to make buckets and objects public or private
[3:23] <dmick> well I don't know much
[3:23] <dmick> but here's some doc
[3:23] <dmick> http://ceph.com/docs/master/radosgw/s3/bucketops/
[3:23] <dmick> that contains a description of ACL operations
[3:24] <dmick> specifically PUT BUCKET ACL
[3:24] <dmick> seems like generic S3 operation
[3:27] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[3:29] <llorieri> I believe it is by policies, but the docs says Policy (Buckets, Objects) Not Supported ACLs are supported
[3:31] <dmick> not sure what you mean
[3:32] <llorieri> I understood acls are: make it public, make it private
[3:33] <llorieri> policies: grant that ip, or that authenticated user some access
[3:33] <llorieri> something like that
[3:37] <dmick> well, yes, but, isn't that what you want?
[3:39] <llorieri> I would like to give write perms to more than one account
[3:40] <dmick> right, so Grantee would be teh other account, and Permission would be "write", yes?...
[3:40] <dmick> but this is basic S3; there are probably better references than the Ceph docs
[3:42] <llorieri> yes, that is :)
[3:42] <llorieri> cool, I will check it
[3:44] <dmick> http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTBucketPUTacl.html for example
[3:44] <dmick> http://docs.amazonwebservices.com/AmazonS3/latest/dev/ACLOverview.html basic info
[3:46] <llorieri> I just could connect using boto, it was trying SSL
[3:46] <llorieri> thanks
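A minimal boto sketch of the grant being discussed, assuming a non-SSL radosgw endpoint (host, keys, bucket name and the grantee's canonical user id are all placeholders):

    import boto
    from boto.s3.connection import S3Connection, OrdinaryCallingFormat

    conn = S3Connection(aws_access_key_id='ACCESS_KEY',
                        aws_secret_access_key='SECRET_KEY',
                        host='gw.example.com', is_secure=False,
                        calling_format=OrdinaryCallingFormat())
    bucket = conn.get_bucket('mybucket')
    # grant another radosgw account write access via its canonical user id
    bucket.add_user_grant('WRITE', 'other-user-canonical-id')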
[3:47] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[3:48] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[3:55] * MarkN (~nathan@197.204.233.220.static.exetel.com.au) has joined #ceph
[3:59] * MarkN (~nathan@197.204.233.220.static.exetel.com.au) has left #ceph
[4:00] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[4:00] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[4:05] * deepsa (~deepsa@101.62.35.24) has joined #ceph
[4:14] <infernix> can I in theory run HA-LVM on a rbd device and map it on multiple hosts?
[4:14] * renzhi (~renzhi@116.226.37.139) has joined #ceph
[4:16] <joshd> if HA-LVM takes care of coordinating access so only one host is doing I/O to a portion of the device
[4:16] <infernix> yeah
[4:16] <infernix> that's the HA part of it
[4:16] * MarkN1 (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[4:16] * MarkN1 (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[4:16] <joshd> should be fine then
[4:17] <infernix> interesting. but then i'd need a recent version of kernel rbd in centos 6
[4:17] <infernix> i wonder if I can make that work
[4:18] <joshd> giving up on python, or is this for another application?
[4:18] <infernix> no this is separate :)
[4:19] <infernix> i will still need to use multiprocessing to speed up reads. probably not writes
[4:19] <infernix> but i'm thinking of more ways of using ceph, in this case as a shared block device
[4:20] <renzhi> is there a way change the oid of an object in rados? Just curious
[4:21] <infernix> joshd: i haven't gotten to writing that python multiprocessing goodness yet, but when it's done i'll pastebin it
[4:21] <dmick> renzhi: since that could change its placement, no, I'm guessing
[4:21] <joshd> infernix: cool, looking forward to it
[4:21] * deepsa (~deepsa@101.62.35.24) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[4:22] <dmick> or IOW: ./rados cp ; ./rados rm :)
[4:22] <renzhi> dmick: just curious, why would changing oid change its placement?
[4:23] <dmick> because basically placement is a hash of oid
[4:23] <joshd> placement goes hash(oid) -> pgid -> crush(pgid, crushmap, osdmap) -> osds
[4:23] <renzhi> oh
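Spelled out, the cp-then-rm workaround (pool and object names are placeholders; note this rewrites the object under the new name rather than relocating it in place):

    rados -p mypool cp old-oid new-oid
    rados -p mypool rm old-oid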
[4:24] <llorieri> dmick: got it :)
[4:24] * deepsa (~deepsa@101.63.237.153) has joined #ceph
[4:24] <dmick> yay!
[4:24] <llorieri> dmick: thanks man :)
[4:24] <dmick> np
[4:26] <bigzaqui> one question, after I recompiled the kernel, using the make modules_install a lot of "ERROR: modinfo: could not find module " appeared
[4:26] <bigzaqui> does it have something to do with the fact that I already installed the same kernel?
[4:26] <llorieri> dmick: the real problem is the "is_secure=False" missing in the boto s3 connection
[4:27] <llorieri> dmick: second problem is using s3cmd to list all the buckets: when you do it, it shows only your own
[4:27] <llorieri> dmick: bye bye, and thanks again
[4:27] <dmick> ah, ok. and yw. have a good night.
[4:28] * llorieri (~llorieri@177.141.245.115) has left #ceph
[4:31] * sagelap (~sage@76.89.177.113) has joined #ceph
[4:42] * llorieri (~llorieri@177.141.245.115) has joined #ceph
[4:42] <llorieri> Hi dmick, one more question to help on my anxiety heheheh
[4:43] <llorieri> dmick: does it take long to free space after removing objects ?
[4:50] <llorieri> in that file: https://github.com/ceph/ceph/blob/ac92e4d6bd453ffc77e88ab3ec2d2015b70ba854/src/init-radosgw
[4:50] <llorieri> it checks the hostname
[4:50] <llorieri> I believe it's supposed to be hostname -s
[4:51] <llorieri> line 51
[4:58] <infernix> so although it's recommended, I really don't like the idea of SSDs as journal disks
[5:00] <infernix> they will see continuous writes. if I'm writing and deleting like 24TB of data a day, all that data will go over those ssds. suppose i have 6 nodes, that's 4TB a day per SSD
[5:01] <infernix> even with intels new 3700 at 100GB, that's 467 days
[5:02] <infernix> are people using SLC? is there another alternative? I mean, with 12 disks per node and 5GB journal, I only need 60GB
[5:03] <infernix> maybe split it in two 32GB SLCs. but then one 32GB SLC needs to write at 600MB/s to keep the 6 disks running at speed
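The back-of-the-envelope numbers above, written out (every input is an assumption taken from this discussion, not a measurement):

    daily_writes_tb = 24                # data written (and later deleted) per day, cluster-wide
    nodes = 6
    per_ssd_tb_per_day = daily_writes_tb / nodes                  # ~4 TB/day through each journal SSD

    disks_per_node = 12
    journal_gb_per_disk = 5
    journal_capacity_gb = disks_per_node * journal_gb_per_disk    # 60 GB of journal space needed

    disk_write_mb_s = 100               # assumed sustained write rate per spinning disk
    journal_write_mb_s = disks_per_node * disk_write_mb_s         # ~1200 MB/s the journal device(s) must absorb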
[5:07] <infernix> STEC PCIE SOLID STATE-ACCELERATOR 240GB SLC PCIE HH-HL does about 1.1GB/sec writes. but that's like $3,361.57
[5:08] <michaeltchapman> infernix: I think we were looking at FusionIO-style cards at one point for journals, but yeah they are expensive
[5:09] <infernix> the fastest and cheapest would be OCZ RevoDrive 3 X2 480GB but that's MLC
[5:09] <infernix> and that's still only 815MB/s writes, so still too little for a box with 12 disks
[5:10] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[5:11] <infernix> the iodrive is also mlc, as is the OCZ Z-Drive R4. even their enterprise model
[5:18] <infernix> so what about btrfs and no journal? or using a btrfs ioctl to bypass writing the data the second time?
[5:18] <infernix> i remember reading about that
[5:20] <infernix> "One potential optimization for the journal file case would be to use btrfs's clone ioctl to avoid writing data twice (at least for big writes). Write the journal transaction, and then clone the data portion into the target file. This is somewhat tricky to implement but could improve performance on disk-only hardware."
[5:21] <infernix> http://ceph.com/w/index.php?title=OSD_journal - but that was in march 2011
[5:21] <infernix> did that get implemented?
[5:21] * renzhi (~renzhi@116.226.37.139) Quit (Quit: Leaving)
[5:30] <slang> infernix: I don't see that ioctl being used anywhere in the code
[5:30] <slang> infernix: I don't think that's been done yet
[5:37] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:37] * llorieri (~llorieri@177.141.245.115) Quit ()
[5:37] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:40] <tore_> try a LSI nytro warpdrive 1.6TB acceleration card. those are fun if money is not an option. http://www.lsi.com/products/storagecomponents/Pages/NytroWarpDriveBLP4-1600.aspx
[5:40] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:40] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:41] <tore_> s/option/problem/
[5:41] <dmick> infernix: btrfs promises wonderful things
[5:41] <tore_> BLP4-1600 should be like $16.5k
[5:44] <infernix> tore_: the price has to be in line with the hardware
[5:46] <infernix> my storage nodes are about $3600 + 12*$160 3TB
[5:46] <infernix> adding a $16k card is not realistic :)
[5:47] <infernix> $500 or so would work. but for $500 i can't find slc/emlc that does 1200MB/sec, nor 2 at 600mb/sec
[5:47] <tore_> it's cheaper per GB than the setc ssd's you mentioned
[5:47] <tore_> stec
[5:47] <infernix> but i only need 60GB journal
[5:47] <infernix> 5GB per disk
[5:48] <infernix> that btrfs clone ioctl is probably the best bet
[5:48] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: This computer has gone to sleep)
[5:49] * deepsa_ (~deepsa@115.242.168.213) has joined #ceph
[5:50] * deepsa (~deepsa@101.63.237.153) Quit (Ping timeout: 480 seconds)
[5:50] * deepsa_ is now known as deepsa
[5:55] <infernix> slang: int FileStore::_do_clone_range(int from, int to, uint64_t srcoff, uint64_t len, uint64_t dstoff)
[5:55] <infernix> it's in there
[5:56] <tore_> no slc/emlc is not going to be that cheap
[5:56] <infernix> mount btrfs CLONE_RANGE ioctl is supported
[5:57] <infernix> tore_: stec mach16 50GB is $617.98. two of those. but one does only 190mb/s
[5:57] <infernix> that's slc
[5:57] <michaeltchapman> how can I check what the version the current loaded rbd kernel module is?
[5:59] <infernix> slang: i think it's just in filestore, not the journaling part
[5:59] <infernix> but not sure
[6:00] <slang> infernix: clone range is a separate ioctl from clone
[6:03] <infernix> slang: ah. so btrfs_ioctl_clone is in btrfs, just not in ceph yet.
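A toy demonstration of the whole-file clone ioctl being discussed, in Python (paths are hypothetical and must live on the same btrfs filesystem; the ioctl number is mainline's BTRFS_IOC_CLONE):

    import fcntl

    BTRFS_IOC_CLONE = 0x40049409   # _IOW(0x94, 9, int)

    with open('/mnt/btrfs/journal-chunk', 'rb') as src, open('/mnt/btrfs/object-file', 'wb') as dst:
        # share src's extents with dst instead of writing the bytes a second time
        fcntl.ioctl(dst.fileno(), BTRFS_IOC_CLONE, src.fileno())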
[6:09] <tore_> take a look at this: http://www.acard.com/english/fb01-product.jsp?idno_no=270&prod_no=ANS-9010&type1_idno=5&ino=28
[6:10] <tore_> supposedly you can get this for $350 plus the cost of the ram and cf card. some people have reported that it can be configured for 64gb @ 400mb/s
[6:10] <infernix> tore_: already considered it, but impossible to fit in a supermicro server, and battery maintenance is meh
[6:11] <infernix> over time it'll be dead, and then when there's a power failure, journal is wiped and osd is considered dead
[6:12] <infernix> also, limited by sata 3gbit
[6:12] <tore_> some people wire an external dc power supply in to compensate for that. usually the major problem with the batteries is leakage
[6:12] <infernix> no, i think the btrfs clone ioctl is the best way
[6:13] <infernix> write data to btrfs journal file with directio, write metadata, then clone eg update only metadata
[6:13] <infernix> not sure why it didn't get implemented yet but i suppose lack of time :)
[6:15] <infernix> you only write to disk once so i suppose an almost 2x performance gain in write speed
[6:15] * houkouonchi-work (~linux@12.248.40.138) Quit (Ping timeout: 480 seconds)
[6:17] <michaeltchapman> I'm seeing the same error (rbd: add failed: (2) No such file or directory) as here: http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/9168 . Josh's answer says to use the userspace side instead of the kernel module, how do I do this?
[6:19] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[6:20] <slang> infernix: yeah major kudos if you do add btrfs clone ioctl for journal
[6:24] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[6:24] * ChanServ sets mode +o scuttlemonkey
[6:28] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) has joined #ceph
[6:29] <slang> michaeltchapman: I think what josh means by userspace is librbd
[6:29] <slang> michaeltchapman: if you want to play with that, you can probably use the rbd python module: http://ceph.com/docs/master/rbd/librbdpy/
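A minimal sketch along the lines of that librbdpy doc page (pool and image names are placeholders):

    import rados, rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')           # pool name
    try:
        image = rbd.Image(ioctx, 'myimage')     # existing image name
        data = image.read(0, 4096)              # read the first 4 KB
        image.close()
    finally:
        ioctx.close()
        cluster.shutdown()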
[6:29] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[6:29] <slang> michaeltchapman: I think it depends on what you're using rbd for though
[6:30] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:30] <michaeltchapman> slang: I'm trying to attach it to an openstack cluster. Cinder is configured to make images using --new-format. and then the compute nodes can't attach the image because ceph rbd map fails
[6:36] <michaeltchapman> slang: it looks like it goes to rbd --help and searches for 'clone' and then assumes that if it can clone then it can make things using new-format.
[6:39] <slang> michaeltchapman: this sounds like something Josh is better suited to help with
[6:40] <slang> michaeltchapman: can you send an email to ceph-devel?
[6:43] <michaeltchapman> slang: Yep. I'm just trying to confirm I haven't got version mismatches between the clients and the cluster.
[6:46] * ircolle (~ircolle@c-67-172-132-164.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[6:57] * The_Bishop (~bishop@e179010086.adsl.alicedsl.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[7:07] * edward_ (~edward@200.8.115.193) has joined #ceph
[7:09] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[7:09] * houkouonchi-work (~linux@12.248.40.138) Quit ()
[7:10] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[7:14] * bigzaqui (~edward@200.8.115.193) Quit (Ping timeout: 480 seconds)
[7:28] * edward_ (~edward@200.8.115.193) Quit (Read error: Connection reset by peer)
[7:46] * loicd (~loic@magenta.dachary.org) has joined #ceph
[7:49] * Machske (~bram@d5152D87C.static.telenet.be) Quit (Quit: Ik ga weg)
[8:13] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:16] * low (~low@188.165.111.2) has joined #ceph
[8:17] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[8:19] * deepsa (~deepsa@115.242.168.213) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[8:26] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:26] * agh (~2ee79308@2600:3c00::2:2424) has joined #ceph
[8:26] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:26] <agh> Hello to all
[8:26] <agh> I've a problem, is someone here to help me ?
[8:26] * dmick (~dmick@2607:f298:a:607:b8a0:4e3e:8f14:62ad) Quit (Quit: Leaving.)
[8:27] * maxiz (~pfliu@202.108.130.138) has joined #ceph
[8:50] * ebo^ (~ebo@233.195.116.85.in-addr.arpa.manitu.net) has joined #ceph
[8:54] * ebo^ (~ebo@233.195.116.85.in-addr.arpa.manitu.net) Quit ()
[8:58] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[8:59] * fc (~fc@home.ploup.net) has joined #ceph
[9:01] <fghaas> hey joshd, do you still happen to be around or is this just your digital alter ego while you're fast asleep?
[9:05] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[9:06] * deepsa (~deepsa@115.241.67.11) has joined #ceph
[9:12] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[9:16] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:19] * maxiz (~pfliu@202.108.130.138) Quit (Quit: Ex-Chat)
[9:23] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:23] * loicd (~loic@178.20.50.225) has joined #ceph
[9:32] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:33] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[9:50] <agh> hello to all, is someone here ?
[9:54] <fghaas> agh: about 146 people, it seems :)
[10:01] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[10:05] <Kioob`Taff> OSD recovery works very well, thanks !
[10:14] <agh> Does anybody try Ceph argonaut with XFS OSD and 3.6.9 kernel ?
[10:14] <agh> It seems that there is a strange bug with it on OSDs
[10:15] <fghaas> agh: if you can provide a stack trace, that would be useful
[10:15] <fghaas> or however that bug manifests in dmesg
[10:15] <fghaas> just pastebin that and share the url here
[10:16] <agh> yes, here it is :http://pastebin.com/MF0QGS74
[10:16] * LeaChim (~LeaChim@5ad684ae.bb.sky.com) has joined #ceph
[10:16] <agh> I rebuild by cluster, but in Btrfs, and it seems to work better
[10:16] <fghaas> well that's just a slow I/O path that causes your ceph-osd to be stuck in the D state
[10:16] <fghaas> check your storage hardware would be my first reaction
[10:17] <fghaas> there were some xfssyncd issues in the 3.2 kernel timeframe, but those should have been resolved since
[10:17] <agh> fghaas: mmm.. yes but ok, that's true, i did tests with old machines... BUT, what if a node goes wrong in a production cluster ?
[10:18] <fghaas> sysctl -w kernel.hung_task_timeout_secs=30
[10:18] <fghaas> sysctl -w kernel.hung_task_panic=1
[10:19] <agh> fghaas: because, the problem is that, with a size replication of 3, and a per-host replication, the outage of one OSD made my cluster almost down : a lot lot lot of "slow request"
[10:19] <fghaas> means as soon as a node gets stuck, poof it leaves the cluster
[10:19] <agh> ah great :) it what i was looking for !
[10:19] <agh> so i have to type theses 2 commands on each OSD ?
[10:19] <fghaas> note, your OSD is not suffering an outage. it's just excruciatingly slow, i.e. an I/O tarpit
[10:20] <fghaas> I'm actually not sure if ceph-osd has some built-in mechanisms to deal with this better, but it tends to be a tricky issue... processes stuck in D are un-killable. joao, any additional input?
[10:21] <fghaas> agh: no, you'd likely just set that in /etc/sysctl.conf so those tunables are set automatically on system boot
[10:21] <agh> ok, thanks a lot. I will note that
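In /etc/sysctl.conf that would look like the following (a sketch of the two settings above; apply immediately with sysctl -p):

    # panic a node whose tasks hang in the D state, instead of letting it limp along
    kernel.hung_task_timeout_secs = 30
    kernel.hung_task_panic = 1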
[10:21] <agh> BTW, did you ever try to put journal on a ramfs disk ?
[10:22] <fghaas> sure, but Kids, Don't Try This At Home™
[10:23] <fghaas> your journal is meant to be persistent. if you want it to be fast, use an SSD or DDR-drive or fusion-io card
[10:23] <agh> ok, yes sure.
[10:23] <agh> And, is there a way to simulate a disk failure ? not only a "on/off" test, but a real life failure ?
[10:26] <fghaas> sure, slap your OSD on a device mapper target and suspend it, or use the "error" target
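A sketch of that device-mapper trick (the backing disk is hypothetical; point the OSD at the mapped device, then swap in the error target to simulate a dying disk):

    SIZE=$(blockdev --getsz /dev/sdX)
    dmsetup create osd-test --table "0 $SIZE linear /dev/sdX 0"
    # ... run the OSD on /dev/mapper/osd-test, then:
    dmsetup suspend osd-test
    dmsetup reload osd-test --table "0 $SIZE error"
    dmsetup resume osd-test      # every I/O to the device now fails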
[10:27] <Kioob`Taff> agh: Does anybody try Ceph argonaut with XFS OSD and 3.6.9 kernel ? <== I didn't do a lot of tests, but yes, that was my setup before I upgraded to Ceph v0.55
[10:28] * yoshi (~yoshi@80.30.51.242) has joined #ceph
[10:28] <Kioob`Taff> (10:23:03) fghaas: your journal is meant to be persistent. if you want it to be fast, use an SSD or DDR-drive or fusion-io card <== I add, use a *fast* SSD
[10:29] <Kioob`Taff> and there are not a lot of fast SSDs which don't lose data in case of power failure
[10:30] <agh> Ceph 0.55 is stable ? better than argonaut ?
[10:31] <Kioob`Taff> not «stable», it will be in some weeks I read
[10:31] * deepsa_ (~deepsa@115.242.207.252) has joined #ceph
[10:31] <Kioob`Taff> it's the last version before next stable, 0.56
[10:32] * deepsa (~deepsa@115.241.67.11) Quit (Ping timeout: 480 seconds)
[10:32] * deepsa_ is now known as deepsa
[10:59] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[11:01] * dxd828 (~dxd@195.191.107.205) has joined #ceph
[11:04] * dxd828 (~dxd@195.191.107.205) Quit ()
[11:25] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has left #ceph
[11:26] * yasu`_ (~yasu`@99.23.160.146) has joined #ceph
[11:27] * yoshi_ (~yoshi@80.30.51.242) has joined #ceph
[11:27] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:27] * mikedawson_ (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[11:27] * Leseb_ (~Leseb@193.172.124.196) has joined #ceph
[11:28] * gohko_ (~gohko@natter.interq.or.jp) has joined #ceph
[11:29] * jpieper_ (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[11:30] * yoshi (~yoshi@80.30.51.242) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * Leseb (~Leseb@193.172.124.196) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * sagelap (~sage@76.89.177.113) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * occ (~onur@38.103.149.209) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * yasu` (~yasu`@99.23.160.146) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * guigouz1 (~guigouz@177.33.243.196) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * gohko (~gohko@natter.interq.or.jp) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * michaeltchapman (~mxc900@150.203.248.116) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * terje_ (~joey@71-218-25-108.hlrn.qwest.net) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * terje (~terje@71-218-25-108.hlrn.qwest.net) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * kbad_ (~kbad@malicious.dreamhost.com) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (resistance.oftc.net oxygen.oftc.net)
[11:30] * Leseb_ is now known as Leseb
[11:30] * mikedawson_ is now known as mikedawson
[11:33] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[11:33] * sagelap (~sage@76.89.177.113) has joined #ceph
[11:33] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[11:33] * occ (~onur@38.103.149.209) has joined #ceph
[11:33] * guigouz1 (~guigouz@177.33.243.196) has joined #ceph
[11:33] * flakrat (~flakrat@eng-bec264la.eng.uab.edu) has joined #ceph
[11:33] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[11:33] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[11:33] * michaeltchapman (~mxc900@150.203.248.116) has joined #ceph
[11:33] * terje_ (~joey@71-218-25-108.hlrn.qwest.net) has joined #ceph
[11:33] * terje (~terje@71-218-25-108.hlrn.qwest.net) has joined #ceph
[11:33] * kbad_ (~kbad@malicious.dreamhost.com) has joined #ceph
[11:33] * gohko (~gohko@natter.interq.or.jp) Quit (Ping timeout: 480 seconds)
[11:40] * deepsa_ (~deepsa@106.221.169.184) has joined #ceph
[11:41] * deepsa (~deepsa@115.242.207.252) Quit (Ping timeout: 480 seconds)
[11:41] * deepsa_ is now known as deepsa
[11:54] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:54] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:19] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[12:25] * mark is now known as Guest1480
[12:25] * yasu`_ (~yasu`@99.23.160.146) Quit (Remote host closed the connection)
[12:36] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[12:51] * ebo^ (~ebo@icg1104.icg.kfa-juelich.de) has joined #ceph
[12:59] <francois-pl> Hi all :) I have some tuning question about ceph : i'm trying to create a ceph storage in a compute cluster... journal (1G) on system disk... and osd on a separate disk with xfs... but i have many slow request warning ("currently waiting for sub ops", "currently no flag points reached" and "currently delayed")
[13:00] <ebo^> i saw the same this disappeared as soon as i added more osds
[13:01] <francois-pl> I tried to modify "osd disk threads" (3), "filestore op threads" (4), "osd max write size" (50, due to some slower disks), "filestore max sync interval" (2) and "filestore queue max ops" (300)... do you think it's the good parameters to modify ?
[13:01] <ebo^> no idea
[13:01] <francois-pl> I already have 8 osd
[13:02] <francois-pl> The bigest problem is when big files are written : if too many slow request happen... writes are stopped and all request are delayed/stopped :(
[13:02] <ebo^> same here
[13:02] <ebo^> if you find out: tell me ;-)
[13:03] <francois-pl> (seem to be in a vicious circle...)
[13:03] <francois-pl> Thx for answer, i feel less alone ;)
[13:05] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[13:06] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit ()
[13:13] <Kioob`Taff> is it possible to reserve some OSD to 1 pool ?
[13:21] <jtang> morning #ceph
[13:21] <jtang> just to provide some feedback on failed osd's at our site, we like the behaviour of suspended disk io on the client side
[13:22] <jtang> we had a 100% of our osd's fail but the mons were on a different machine, and the system survived after a reboot
[13:22] <jtang> whilst the clients just waited for the osd's to comeback
[13:22] <jtang> no one had noticed that the mons had failed and in the mean time we generated about 2-3tb of new data on the system
[13:23] <jtang> so two thumbs up for ceph!
[13:43] * gregorg_taf (~Greg@78.155.152.6) Quit (Quit: Quitte)
[13:50] * deepsa (~deepsa@106.221.169.184) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[13:59] <ebo^> francois-pl: how are your osds connected to the network and do you use cephfs?
[14:12] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:27] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[14:38] * nosebleedkt (~kostas@213.140.128.74) has joined #ceph
[14:38] <nosebleedkt> hi everybody
[14:38] <nosebleedkt> i have an RDB device mounted on local filesystem
[14:38] <nosebleedkt> 256mb of capacity
[14:39] <nosebleedkt> when all capacity is filled
[14:39] <nosebleedkt> something marks it as read-only
[14:39] <nosebleedkt> who does this?
[14:41] * BManojlovic (~steki@91.195.39.5) Quit (Remote host closed the connection)
[14:43] <darkfaded> nosebleedkt: look in dmesg if you see a message about journal abort
[14:43] <nosebleedkt> ok let me see
[14:49] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[14:53] * aliguori (~anthony@cpe-70-113-5-4.austin.res.rr.com) has joined #ceph
[14:53] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Read error: Connection reset by peer)
[14:53] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[14:56] <nosebleedkt> darkfaded, no.
[14:56] <nosebleedkt> ah ok
[14:56] <nosebleedkt> that was happening in 0.48
[14:57] <nosebleedkt> now i have 0.55
[14:57] <nosebleedkt> and does not happen anymore
[15:00] <jtang> *sigh* i just noticed that archlinux is now using systemd
[15:00] <jtang> are upstart scripts/configs compatible with systemd
[15:00] * joao (~JL@89-181-151-182.net.novis.pt) Quit (Ping timeout: 480 seconds)
[15:01] <jtang> i'd imagine the upcoming release of rhel7 will be using systemd for startup of services
[15:01] <jtang> given that there are init.d and upstart scripts in ceph already, is there work being done on systemd ?
[15:02] <jtang> the startup process in rhel7 is going to be fun, even though its meant to be compatible with the lsb init scripts
[15:04] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:15] * drokita (~drokita@199.255.228.10) has joined #ceph
[15:17] * guigouz1 (~guigouz@177.33.243.196) Quit (Quit: Computer has gone to sleep.)
[15:22] * joao (~JL@89.181.149.6) has joined #ceph
[15:22] * ChanServ sets mode +o joao
[15:22] * gregorg (~Greg@78.155.152.6) has joined #ceph
[15:22] * joao hates winter and power outages
[15:24] <nosebleedkt> haha
[15:25] <joao> if they keep this up, I might end up reading paper books, made of actual paper
[15:26] <joao> why don't they think of the trees is beyond me
[15:27] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:28] <nosebleedkt> joao, i ride motorbike in winter
[15:28] <nosebleedkt> love the crystal cold air on me
[15:28] <nosebleedkt> :P
[15:28] <joao> I'm okay with winter; I just don't fancy heavy rain
[15:29] <joao> I'm all for rain over crops and farm land
[15:29] <joao> just not for wherever I am standing
[15:29] <joao> :(
[15:30] * agh (~2ee79308@2600:3c00::2:2424) Quit (Quit: TheGrebs.com CGI:IRC (Session timeout))
[15:32] <nosebleedkt> lol
[15:44] * nosebleedkt_ (~kostas@213.140.128.74) has joined #ceph
[15:44] <markl> morning all
[15:44] <markl> anyone here good with t3h btrfs
[15:46] <markl> more specifically, tuning it to store big vm images? in my lab i am running an oracle vm as a physical standby
[15:46] <markl> auto defrag has helped quite a bit but it is still just not going to cut it for real use
[15:47] * nosebleedkt_ (~kostas@213.140.128.74) Quit ()
[15:48] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[15:48] * nosebleedkt (~kostas@213.140.128.74) Quit (Ping timeout: 480 seconds)
[15:49] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[15:55] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[15:58] * ee_cc (~ec@oosteinde.castasoftware.com) has joined #ceph
[15:59] <ee_cc> folks... I'm losing my mind getting to connect to a test rados gw
[16:00] <ee_cc> I did followed http://ceph.com/docs/master/radosgw/config/
[16:00] <ee_cc> but keep getting 404
[16:01] <ee_cc> I try hitting http://10.10.100.175/s3gw.fcgi
[16:03] <ee_cc> I'm trying to connect using the S3 java api
[16:03] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[16:06] * loicd (~loic@178.20.50.225) Quit (Ping timeout: 480 seconds)
[16:07] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[16:07] <sstan> I compiled Ceph successfully (SLES 11sp2). After make finishes its work, what is the next step?
[16:08] <sstan> can't find the ceph binary to run it
[16:08] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[16:10] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Read error: Operation timed out)
[16:10] * tryggvil_ is now known as tryggvil
[16:12] <ee_cc> the radosgw is messing around with me
[16:12] <ee_cc> if I PUT a .../s3gw.fcgi/bucket01 I get 404
[16:13] * Machske (~bram@d5152D8A3.static.telenet.be) has joined #ceph
[16:15] <Machske> anyone experience with Ceph and Xen ?
[16:16] <Machske> I'm trying to find out how to use qemu rbd as disk, instead of mounting the rbd device via the linux rbd module.
[16:17] * sm (~sm@xdsl-78-35-233-48.netcologne.de) has joined #ceph
[16:19] * Machske (~bram@d5152D8A3.static.telenet.be) Quit (Quit: Ik ga weg)
[16:21] <sm> hi, i am currently trying to get radosgw to work, it does its job quite nicely but for larger files (> 800MB) i frequently get a 400 error (Request Timeout) with a ruby S3 client. Any idea why this is happening?
[16:21] <yehuda_hm> sm: is it when you put the objects or when you get them?
[16:22] <sm> oh i forgot to mention, those are PUT requests
[16:22] <yehuda_hm> sm: are you using apache?
[16:22] * loicd (~loic@magenta.dachary.org) has joined #ceph
[16:23] <sm> yes, with the custom-built fastcgi packages from gitbuilder for the fastcgi module
[16:23] <yehuda_hm> ee_cc: maybe you have another site enabled in your apache?
[16:23] <yehuda_hm> sm: usually it means that the wrong fastcgi module is being used
[16:24] <sm> i use this one currently: http://gitbuilder.ceph.com/libapache-mod-fastcgi-deb-oneiric-x86_64-basic/ref/master/
[16:24] <yehuda_hm> check to see in your apache mods-enabled that it's mod_fastcgi and not mod_fcgid
[16:24] <sm> for my debian squeeze installation
[16:24] <sm> the PUT generally works, just for some requests it fails
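(The mods-enabled check yehuda_hm mentions would look roughly like this on a Debian/Ubuntu apache -- paths assumed:)

    ls /etc/apache2/mods-enabled/ | grep -i cgi    # want fastcgi.load/.conf here, not fcgid.*
    sudo a2dismod fcgid && sudo a2enmod fastcgi
    sudo service apache2 restart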
[16:26] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: This computer has gone to sleep)
[16:27] <yehuda_hm> sm: what usually happens is that the fastcgi module buffers the whole request before handing it to radosgw, so by the time radosgw starts processing it against the backend it takes too long and apache times out
[16:27] * portante (~user@66.187.233.206) has joined #ceph
[16:28] <yehuda_hm> sm: so you can try looking at the logs, set 'debug ms = 1', try to isolate the issue
[16:28] <sm> i do not think that's what's going on, i straced the radosgw, it actually receives the data, i dug into the source code and added some debugging to rgw_op.cc, the ofs value there does not match content_length, but it is not 0
[16:28] <sm> it aborts somewhere down the road
[16:29] <yehuda_hm> I see.. maybe your backend is too slow?
[16:29] <yehuda_hm> try figuring out whether it's waiting for requests too much
[16:30] <yehuda_hm> I think the default fastcgi timeout is 30 seconds, you can bump that up
[16:30] <sm> already did that, it is at 600s now but the behaviour did not change
[16:31] <yehuda_hm> do you get the timeout from the radosgw or from apache?
[16:32] <sm> the apache actually throws a 500 at me, but i enabled radosgw logging and it looks like this:
[16:32] <sm> 2012-12-14 16:08:45.865851 7fe1dffff700 1 ====== content-length 1160887244 ofs=479761192
[16:32] <sm> 2012-12-14 16:08:46.724024 7fe1dffff700 10 --> Status: 400
[16:32] <sm> 2012-12-14 16:08:46.724049 7fe1dffff700 10 --> Content-Length: 80
[16:32] <sm> 2012-12-14 16:08:46.724052 7fe1dffff700 10 --> Accept-Ranges: bytes
[16:32] <sm> 2012-12-14 16:08:46.724055 7fe1dffff700 10 --> Content-type: application/xml
[16:34] <sm> (the first line is my debug output in rgw_op.cc right before the if (!chunked_upload && (uint64_t)ofs != s->content_length) { check)
[16:34] <yehuda_hm> hmm.. so it's waiting for data from the client side
[16:36] <sm> interestingly i was only able to reproduce this with the ruby client, tests with the same file and a python boto client worked
[16:36] <sm> so i suppose this is not only a ceph problem
[16:37] <yehuda_hm> might be a bug in the ruby client
[16:37] <yehuda_hm> can you debug it, make sure that it sends all the data it actually needs to?
[16:38] <yehuda_hm> that check kicks in after we reached the end of stream
[16:38] <yehuda_hm> so basically we're saying we didn't get as much data as in content_length
[16:38] <sm> how do you determine that? is it actually a timeout or when the connection closes
[16:39] <yehuda_hm> we'd get that when apache determined that we're done
[16:39] <yehuda_hm> so len == 0
[16:39] <yehuda_hm> it could happen if the client is too slow I guess
[16:40] <yehuda_hm> hence the timeout
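(A much-simplified sketch of the check being discussed -- not the actual rgw_op.cc code, and next_chunk() is an assumed stand-in for the FastCGI read path:)

    #include <cstdint>
    #include <sys/types.h>

    ssize_t next_chunk(char *buf, size_t len);   // assumed stand-in: read the next body chunk from apache

    int receive_body(uint64_t content_length, bool chunked_upload) {
      char buf[4096];
      uint64_t ofs = 0;
      for (;;) {
        ssize_t len = next_chunk(buf, sizeof(buf));
        if (len <= 0)
          break;                                 // apache says the body is done (or the client went away)
        ofs += len;
      }
      if (!chunked_upload && ofs != content_length)
        return 400;                              // got less data than the client promised
      return 200;
    }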
[16:41] <yehuda_hm> ok, I'll be off for an hour or two, be back later
[16:41] <sm> i will dig further into that, quite strange somehow
[16:48] * gaveen (~gaveen@112.135.133.70) has joined #ceph
[16:52] * vata (~vata@208.88.110.46) has joined #ceph
[16:53] * wer_gone (~wer@wer.youfarted.net) Quit (Ping timeout: 480 seconds)
[16:54] * sagelap (~sage@76.89.177.113) Quit (Read error: Operation timed out)
[16:56] * low (~low@188.165.111.2) Quit (Quit: bbl)
[16:58] * ebo^ (~ebo@icg1104.icg.kfa-juelich.de) Quit (Quit: Verlassend)
[16:58] <occ> hi all. I have a KVM/RBD setup with quite a bit of data. When I try to add a new osd, even if I slowly increase the weight (in 0.01 increments) I still end up having delayed writes and failing vms. As far as I understand, w/ a replica count of 2 and min_size 1 ceph should still be able to write to the rbd volume, but it can't. Am I missing something?
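(For reference, the gradual approach occ describes is normally done with crush reweight; osd.12 and the weights here are made-up examples, and the exact syntax varies a bit between releases:)

    ceph osd crush reweight osd.12 0.05    # nudge the new osd's weight up a little
    ceph -w                                # watch recovery settle before the next bump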
[17:02] <ee_cc> hmm, not really
[17:02] <ee_cc> if I just GET the http://10.10.100.175/s3gw.fcgi I get a response
[17:12] * Machske (~bram@d5152D87C.static.telenet.be) has joined #ceph
[17:16] * danieagle (~Daniel@177.133.174.165) has joined #ceph
[17:17] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[17:18] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[17:19] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Quit: Leaving.)
[17:20] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[17:20] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[17:21] * sagelap (~sage@160.sub-70-197-146.myvzw.com) has joined #ceph
[17:22] * l0nk (~alex@173.231.115.58) has joined #ceph
[17:23] * nwat (~Adium@50.12.61.82) has joined #ceph
[17:31] * ee_cc_ (~ec@oosteinde.castasoftware.com) has joined #ceph
[17:32] * iggy__ is now known as iggy_
[17:34] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[17:39] * ee_cc (~ec@oosteinde.castasoftware.com) Quit (Ping timeout: 480 seconds)
[17:49] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Quit: Leaving.)
[17:54] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[17:54] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[17:58] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[17:59] * sagelap1 (~sage@38.122.20.226) has joined #ceph
[17:59] * sagelap (~sage@160.sub-70-197-146.myvzw.com) Quit (Ping timeout: 480 seconds)
[18:02] * joao sets mode -o joao
[18:03] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:05] * danieagle (~Daniel@177.133.174.165) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[18:14] <yehudasa> ee_cc: you never really need to put s3gw.fcgi in your url
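(That is because the usual apache setup rewrites every request to the fastcgi handle internally, so an S3 client just PUTs http://host/bucket01. Roughly, with the socket path and vhost details assumed:)

    FastCgiExternalServer /var/www/s3gw.fcgi -socket /tmp/radosgw.sock
    <VirtualHost *:80>
        ServerName gw.example.com
        RewriteEngine On
        # hand everything to the gateway, preserving the S3 Authorization header
        RewriteRule ^/(.*) /s3gw.fcgi?%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L]
    </VirtualHost>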
[18:21] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[18:23] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[18:23] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:24] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:27] <infernix> what's the proper way to submit patches - just send github pull requests?
[18:27] * rino (~rino@12.250.146.102) Quit (Quit: ircII EPIC4-2.10.1 -- Are we there yet?)
[18:27] <slang> infernix: check out SubmittingPatches in the top-level dir
[18:31] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[18:31] * Leseb_ is now known as Leseb
[18:47] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[18:50] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[18:53] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Remote host closed the connection)
[18:53] * sm (~sm@xdsl-78-35-233-48.netcologne.de) Quit (Quit: sm)
[18:56] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[18:57] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Read error: Operation timed out)
[18:57] * nwat (~Adium@50.12.61.82) Quit (Quit: Leaving.)
[19:00] * sagelap1 (~sage@38.122.20.226) Quit (Quit: Leaving.)
[19:00] * sagelap (~sage@38.122.20.226) has joined #ceph
[19:07] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) Quit (Remote host closed the connection)
[19:11] * sjustlaptop (~sam@2607:f298:a:607:5ddc:973a:55e8:c460) has joined #ceph
[19:12] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:13] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[19:16] * sagelap (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[19:17] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[19:25] * sagelap (~sage@2607:f298:a:607:f5f5:ee4f:6791:8406) has joined #ceph
[19:26] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:26] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[19:27] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[19:32] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:32] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:38] * dpippenger (~riven@216.103.134.250) has joined #ceph
[19:41] * llorieri (~llorieri@177.141.245.115) has joined #ceph
[19:41] * sjustlaptop (~sam@2607:f298:a:607:5ddc:973a:55e8:c460) Quit (Ping timeout: 480 seconds)
[19:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[19:43] <llorieri> Hi guys, I have a question about radosgw: I can't purge files that were partially uploaded, is that normal ? is there a workaround ?
[19:44] <via> has anyone tried nfs exporting a mounted cephfs? i seem to be running into hangs trying to do so
[19:44] <joshd> fghaas: I'm here now, if you're still around
[19:47] <llorieri> I believe "Deprecated since version 0.52." in the temp remove doc page means something hehehe
[19:47] * ee_cc_ (~ec@oosteinde.castasoftware.com) Quit (Quit: Ik ga weg)
[19:47] <fghaas> ah, thanks joshd. I was wondering if you knew of performance data in the combination of RBD + Nova boot-from-volume + virtio + Windows PV drivers
[19:48] <fghaas> IOW, running Windows guests off of qemu-rbd block devices
[19:48] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[19:48] <joshd> no, I don't know of such data
[19:49] <gregaf1> llorieri: you need to do some manual purging, probably — yehudasa can talk about the specifics, but I think the radosgw-admin help text describes it
[19:49] <fghaas> ok, know of any stability concerns in that combination, joshd?
[19:49] <llorieri> gregaf1: thanks :) I run the temp remove, but the partially uploaded files were not purged
[19:49] <gregaf1> via: nfs-exporting of a ceph mount is a bit finicky; I don't know of any hangs off-hand but davidz has been working on a lot of that
[19:50] <gregaf1> llorieri: oh, you're probably still inside the upload window, then
[19:50] <gregaf1> I think it only deletes stuff older than 24 hours by default
[19:50] <llorieri> gregaf1: cool !
[19:50] <llorieri> it is less than that, it is about 20
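(The manual cleanup being discussed is roughly the following invocation -- syntax from memory, so treat it as a sketch; on 0.52+ the garbage-collection process is meant to take over:)

    radosgw-admin temp remove --date=2012-12-13    # drop temporary/partial objects older than the given date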
[19:50] <via> gregaf1: i see -- is it any better with ceph-fuse?
[19:51] <via> i was hoping to provide access to my many openbsd machines through it
[19:51] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:51] * loicd (~loic@magenta.dachary.org) has joined #ceph
[19:51] <gregaf1> I really don't know — it's fundamentally problematic enough that we're focusing more on the Ganesha NFS server than on kernel re-exporting
[19:51] <via> ok
[19:52] <joshd> fghaas: for windows 2008, before 0.55 it was possible to get a sigfpe during boot: http://tracker.newdream.net/issues/3521
[19:52] <davidz> via: No, ceph-fuse is not the way to go, it will work but is probably never going to be production ready. At least the way the kernel interfaces with fuse.
[19:52] <joshd> fghaas: that's the only windows-specific issue I've seen
[19:52] <via> davidz: ok
[19:53] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[19:53] <fghaas> joshd, thanks but that one wouldn't apply to the boot-from-volume use case as you would always be using cache=none for that
[19:54] <fghaas> you can't live-migrate with cache=writeback
[19:54] <gregaf1> I just got reminded that there are memory deadlock issues with NFS re-exporting a mounted filesystem that's all in kernel space, too, although I'm not sure if those are the bugs that davidz has been wrestling with or not
[19:54] <joshd> fghaas: that sounds like a bug in openstack if you can't live-migrate with cache=writeback
[19:55] <fghaas> nope, that's a libvirt feature
[19:55] <fghaas> a relatively recent one, iirc, but a live-migration will definitely fail with an error message in libvirtd.log if you're using a cache option other than none
[19:56] <davidz> via: There is more testing to be done and some performance issues, but all fixes I've made are in 0.55.
[19:56] <joshd> ah, right, that's libvirt wanting the '--unsafe' flag incorrectly for rbd. that's fixed in libvirt 0.10.0
[19:56] <via> davidz: okay. i just finally succeeded in mounting and the nfs node started shitting backtraces on the kernel log, so i think i'll try again in a month <_<
[19:57] <fghaas> joshd: huh? so live-migration with rbd caching enabled is actually expected to be safe?
[19:58] <joshd> fghaas: yes, qemu does a flush and halts i/o from the source before the destination reads anything
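(For context, the cache mode in question is just the driver attribute on the libvirt disk element; pool/image names here are made up, and older libvirt would additionally want virsh migrate --unsafe as joshd notes above:)

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source protocol='rbd' name='rbd/myimage'/>
      <target dev='vda' bus='virtio'/>
    </disk>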
[19:59] * noob2 (~noob2@ext.cscinfo.com) has joined #ceph
[19:59] <davidz> via: Oops… sorry, my changes are in our ceph-client kernel repo… so they don't correspond to the 0.55 Ceph release.
[19:59] <via> ah
[19:59] <via> i'm gonna take a look at ganesha
[19:59] <fghaas> joshd, sweet. makes me wonder why that is not implemented in qemu in some generic fashion, because what you say sounds like this would be a per-driver option
[19:59] <via> i'd not heard of it till today
[20:00] <fghaas> or per-driver feature, really
[20:00] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[20:00] * ChanServ sets mode +o scuttlemonkey
[20:00] <joshd> fghaas: it is; the issue is with qcow2, which reads some data that may change when it opens the file, before the source has shut off
[20:00] * yoshi_ (~yoshi@80.30.51.242) Quit (Remote host closed the connection)
[20:02] <fghaas> I see
[20:04] <fghaas> I recall that someone (wido perhaps?) mentioned here at some point that they were able to read off a virtio'd RBD at 500 MB/s from within a Linux KVM guest; I wonder what that would look like for a Windows guest
[20:04] <noob2> gah: lio-utils blew up in spectacular fashion for me
[20:05] <fghaas> noob2: if you happen to be following http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices -- I'd be happy to take comments and suggestions :)
[20:06] <noob2> i am
[20:06] <noob2> instead of iscsi i'm using fibre
[20:06] <joshd> fghaas: I'm not sure, I haven't seen benchmarks comparing windows vs virtio linux, and that might not be where the bottleneck is for different read sizes
[20:06] <noob2> i wrote a python script that mounts rbd's in a certain order and with identical wwn's across 2 proxies
[20:06] <noob2> so you can multipath it
[20:07] <noob2> that works fine. i started loading up some i/o's on this thing and the proxy locked up. it was ubuntu 12.10 with the dev packages of ceph
[20:07] <fghaas> for what purpose, for performance or failover?
[20:07] <noob2> i restarted and target won't start anymore :(
[20:07] <noob2> failover
[20:08] <fghaas> perfectly sure that you're not using both MPIO paths?
[20:08] <noob2> OSError: [Errno 2] No such file or directory: '/sys/kernel/config/target'
[20:08] <noob2> both mpio paths?
[20:08] <fghaas> forgot to mount your configfs?
[20:08] <noob2> i setup vmware to load balance round robin between them
[20:08] <fghaas> then that's not failover, if you're doing round robin
[20:08] <noob2> yeah someone else mentioned mounting configfs
[20:08] <noob2> how do i mount that?
[20:10] <fghaas> mount -t configfs configfs /sys/kernel/config
[20:10] <noob2> i wonder why it failed to do that on startup
[20:10] <noob2> lemme try that
[20:10] <fghaas> but the target init script would do that for you
[20:10] <noob2> right
[20:11] <noob2> fghaas: would you say load balancing with this setup is a bad idea and i should stick to failover?
[20:11] <fghaas> http://fghaas.wordpress.com/2011/11/29/dual-primary-drbd-iscsi-and-multipath-dont-do-that/ -- I believe this would also apply to an FC frontend and to RBD, unless joshd objects
[20:11] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[20:12] <noob2> maybe that's why my kernel panic'd haha
[20:12] <noob2> he seems pretty firm in his don't do that rant
[20:13] <fghaas> that "he" is me, as you may guess from the similarity of the domain name and my nick
[20:13] <noob2> haha
[20:13] <noob2> sorry
[20:13] <noob2> yeah i'm with ya
[20:13] <noob2> i'll change it to prefer one path and failover on problems
[20:14] <fghaas> well what you really want to do is manage your HA from the FC proxy
[20:14] <noob2> right
[20:14] <fghaas> such that you have one WWN, which happens to transparently float between your LIO heads
[20:15] <noob2> yeah i setup my script to build both proxies with 1 WWN so vmware sees that as 1 disk
[20:17] <fghaas> not what I mean; you're still exporting the same WWN from 2 nodes simultaneously
[20:17] <noob2> i'll share my script with you if you're interested
[20:17] <noob2> i guess i'm not following
[20:18] <fghaas> what you ought to be doing is the standard type of failover we're doing in pacemaker clusters for iscsi with the ocf:heartbeat:iSCSITarget resource agent
[20:18] <noob2> ok so with pacemaker if it sees one proxy dies it moves that wwn export to another node correct?
[20:18] <noob2> and brings that up
[20:18] <fghaas> as in, you have one IQN (for FC: WWN) that bounces between physical nodes just like a virtual IP address
[20:18] <noob2> right
[20:19] <fghaas> yes, what you say
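(A minimal crm sketch of that floating-target idea for the iSCSI case, with made-up IQN and paths; an FC/LIO variant would need its own resource agent:)

    primitive p_target ocf:heartbeat:iSCSITarget \
        params implementation="lio" iqn="iqn.2012-12.com.example:store1"
    primitive p_lun0 ocf:heartbeat:iSCSILogicalUnit \
        params target_iqn="iqn.2012-12.com.example:store1" lun="0" path="/dev/rbd0"
    group g_export p_target p_lun0    # the whole group fails over between nodes as a unit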
[20:19] <noob2> i think i like that setup better
[20:19] <noob2> how would you scale that though?
[20:19] <fghaas> how many storage proxies do you want? more than 32 in one cluster?
[20:19] <noob2> maybe like 6 or so
[20:19] <noob2> i have a pretty large vmware cluster
[20:19] <fghaas> 6 is a piece of cake for pacemaker
[20:20] <noob2> ok
[20:20] <fghaas> the other option is to shell out for RTS OS, which is rising tide's commercial product
[20:20] <noob2> did you need to build up scripts to move the wwn's around with pacemaker?
[20:20] <noob2> right
[20:20] <noob2> that's the other option
[20:20] <noob2> how expensive is RTS OS?
[20:20] <fghaas> not a clue :)
[20:20] <noob2> lol
[20:20] <noob2> me either
[20:21] <fghaas> but I _think_ they have HA built in
[20:21] <fghaas> gtg now; kids calling for bedtime stories :) bbl
[20:21] <noob2> yeah i believe so also
[20:21] <noob2> ok thanks :)
[20:21] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[20:21] <noob2> RTS OS seems to use pacemaker anyways lol
[20:22] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[20:28] <janos> i need to drop a machine out for maintenance (.52, on f17) and ceph --help shows me a "ceph mon delete <name>" command
[20:28] <janos> i need to drop a mon out
[20:28] <janos> but when i try to issue that command, it tells me "unknown command delete"
[20:28] <janos> is that expected? or do i just suck
[20:28] <janos> the mon is named "mon.3"
[20:29] <janos> i'm issuing "ceph mon delete mon.3"
[20:30] <janos> i may just tank it and go about my business
[20:30] <janos> and see how it handles it
[20:30] <janos> not a production env
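(If memory serves, the subcommand in that release is spelled "remove" rather than "delete", and it takes the bare monitor name, something like:)

    ceph mon remove 3    # note: no "mon." prefix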
[20:36] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:37] * Psi-jack (~psi-jack@psi-jack.user.oftc.net) has joined #ceph
[20:37] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[20:41] <infernix> gregaf1: i noticed you committed the clone ioctl for btrfs. is there any reason why it hasn't been implemented for ceph's journalling yet, other than lack of time?
[20:42] <infernix> maybe there's some reason why it hasn't been implemented that I'm missing?
[20:43] <via> i think something is wrong with cephfs permissions. if something is world-readable, and all directories are world-executable, how can i have permission denied messages for clients trying to browse
[20:43] * Cube (~Cube@12.248.40.138) has joined #ceph
[20:49] <via> is there a bug where things are only readable by their owner regardless of the perms?
[20:51] <via> i'm sorry, i was being retarded
[20:52] <via> and selinux
[20:52] <noob2> yeah selinux fails silently
[20:53] * f4m8 (f4m8@kudu.in-berlin.de) Quit (Read error: Connection reset by peer)
[20:53] * todin (tuxadero@kudu.in-berlin.de) Quit (Read error: Connection reset by peer)
[20:54] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:54] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:56] * f4m8 (f4m8@kudu.in-berlin.de) has joined #ceph
[20:56] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[21:00] <sstan> Is it possible to use Ceph storage without loading ceph/rbd modules ?
[21:03] <sstan> ceph-fuse returns: fuse: unknown option `big_writes'
[21:03] <sstan> I'm using SLES sp2
[21:11] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[21:12] <gregaf1> infernix: clone ioctl me journaling what?
[21:13] * gaveen (~gaveen@112.135.133.70) Quit (Remote host closed the connection)
[21:13] <gregaf1> sstan: try putting "fuse big writes = false" in your ceph.conf for the client
[21:13] <gregaf1> or just pass "--fuse_big_writes=false" as an argument to ceph-fuse
[21:13] <sstan> thanks I'll try that!
[21:16] * l0nk (~alex@173.231.115.58) Quit (Quit: Leaving.)
[21:16] <sstan> now it complains about something else => fuse: unknown option `atomic_o_trunc'
[21:16] <sstan> --atomic_o_trunc=false ?
[21:17] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has joined #ceph
[21:17] <gregaf1> that's been in longer and doesn't have a config flag
[21:17] <gregaf1> hrm
[21:18] <gregaf1> sstan: what kernel version does your box have?
[21:18] <sstan> gregaf1: so under SLES there aren't many options unless I'm missing something
[21:18] <sstan> 3.0.13-0.27-default
[21:18] <gregaf1> that ought to be new enough, odd...
[21:19] <gregaf1> sounds like maybe SLES has a weird FUSE and there might need to be some changes
[21:19] <gregaf1> you should put a bug in the tracker, I guess — I don't have time right now, sorry!
[21:19] <sstan> so if ceph, rbd , etc. modules are missing, it would still be possible to use ceph-fuse, correct ?
[21:20] <gregaf1> yeah
[21:20] <via> fwiw, i am having a decent amount of success with unfs3 re-exporting ceph
[21:20] <sstan> thanks for the info, gregaf1
[21:21] <sstan> via : how does unfs3 interact with ceph ?
[21:21] <via> no issues so far, i just have ceph mounted at the same point as an /etc/exports line
[21:24] * llorieri (~llorieri@177.141.245.115) Quit ()
[21:27] <sstan> via: what command (or module) is used to mount ceph ?
[21:27] <via> ...mount.ceph? it's mounted just like any other cephfs mount
[21:30] <sstan> ah .. using cephfs . It could have been a block device that is mounted, but I didn't think about mount.ceph
[21:31] <sstan> mount.ceph requires the ceph module I think
[21:31] <via> yeah
[21:31] <sstan> it doesn't come with SLES unfortunately, yet
[21:31] <via> i want to be able to use cephfs on linux machines and nfs of cephfs on nonlinux machines
[21:31] * zK4k7g (~zK4k7g@digilicious.com) has joined #ceph
[21:31] <via> i'm using Scientific Linux with elrepo mainline kernels
[21:31] <fghaas> sstan: I'm late to the discussion, but although SLES does ship a kernel that has the ceph fs module, it doesn't come with the helper binaries
[21:31] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Read error: Connection reset by peer)
[21:32] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[21:32] * jjgalvez1 (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[21:32] <fghaas> and it won't, at least for the forseeable future
[21:32] <fghaas> same thing is true for rbd, I believe
[21:32] <sstan> via: I see .. that's a good idea .. but if one node fails ... the NFS connection to non-linux machines will be cut ?
[21:33] <via> yeah, but this is all personal stuff, it doesn't need to be HA
[21:33] <via> but eventually i could have multiple nfs proxies
[21:33] <sstan> perhaps a multipath iSCSI could do the job
[21:34] <sstan> fghaas: thanks! my distribution doesn't support ceph as much as say ubuntu
[21:34] <fghaas> sstan: I hate to quote myself, but I just said this a little bit ago to a different person
[21:35] <fghaas> (08:11:25 PM) fghaas: http://fghaas.wordpress.com/2011/11/29/dual-primary-drbd-iscsi-and-multipath-dont-do-that/ -- I believe this would also apply to an FC frontend and to RBD, unless joshd objects
[21:35] <fghaas> that multipath iSCSI thingy for failover is a royally bad idea
[21:35] <fghaas> even more so for load balancing
[21:35] <sstan> fghaas : thanks, I didn't see that
[21:36] <sstan> via : multipath might not be a good idea XD
[21:36] <via> is there any reason if you had multiple nfs heads, since v3 is stateless, that you couldn't just load balance nfs requests across the heads?
[21:37] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[21:37] <noob2> fghaas: i'm looking over the tutorial for pacemaker now
[21:37] <fghaas> which tutorial?
[21:37] <noob2> the resource agent you wrote: http://clusterlabs.org/quickstart-ubuntu.html
[21:37] <noob2> does your resource agent know about wwn's for fibre lun's?
[21:38] <noob2> i'm confused how vmware would know that a moved resource is the same one
[21:38] <noob2> cause it would show up on a different port
[21:39] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Quit: Leaving.)
[21:39] <fghaas> ah, that tutorial
[21:39] <fghaas> no, the RA currently doesn't, but I'll be happy to take a patch :)
[21:39] <noob2> haha
[21:40] <noob2> you wrote it in bash?
[21:40] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) has joined #ceph
[21:40] <noob2> the python script i wrote uses the rtslib to make this work
[21:40] <fghaas> yeah, but OCF RAs can be any language
[21:40] <noob2> gotcha
[21:41] <fghaas> http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html (sorry everyone for the OT link, but noob2 could use it :) )
[21:41] <noob2> i looked up the RTS os also, it uses pacemaker anyways :D
[21:42] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[21:42] * dmick (~dmick@2607:f298:a:607:6ced:4a04:7d67:fd53) has joined #ceph
[21:48] * calebamiles (~caleb@c-107-3-1-145.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[21:49] * drokita (~drokita@199.255.228.10) Quit (Quit: Leaving.)
[21:49] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[21:50] <noob2> fghaas: if i multipath it on vmware, wouldn't that be the same thing as pacemaker? it'll failover to the other node on problems
[21:50] * drokita1 (~drokita@199.255.228.10) has joined #ceph
[21:50] <fghaas> noob2: only when you hit the simple problems :)
[21:51] <fghaas> a node dying hard is a no brainer, anyone can do failover in that situation
[21:51] <infernix> gregaf1: maybe I'm not understanding this right, but with btrfs, every write goes to the journal (with O_DIRECT); then once completed, the same data is written again. So it would make sense to me to write the data to the journal, then use the btrfs clone ioctls to not write out the data again, but simply clone the extent, i.e. only writing out metadata. that way you're only writing to disk once in the journal, and once committed the extents can be
[21:51] <infernix> cloned into the actual object
[21:51] <fghaas> it gets interesting when you hit intermittent network issues between nodes, both nodes firing up resources, etc.
[21:52] <infernix> that would of course only work if the journal is a file on the same btrfs filesystem as the OSD. there's probably some flaw in my thinking here but where? :)
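(A minimal sketch of the clone ioctl infernix means: both descriptors have to live on the same btrfs filesystem and offsets/lengths must be block-aligned. The header location is an assumption -- older systems get the definitions from btrfs-progs' ioctl.h rather than linux/btrfs.h:)

    #include <sys/ioctl.h>
    #include <linux/btrfs.h>   // defines BTRFS_IOC_CLONE_RANGE and its args struct

    // Reflink [src_off, src_off+len) from src_fd to dst_off in dst_fd without
    // rewriting the data; only extent metadata is updated. Returns 0 on success.
    int clone_range(int src_fd, __u64 src_off, __u64 len, int dst_fd, __u64 dst_off) {
      struct btrfs_ioctl_clone_range_args args;
      args.src_fd      = src_fd;
      args.src_offset  = src_off;
      args.src_length  = len;
      args.dest_offset = dst_off;
      return ioctl(dst_fd, BTRFS_IOC_CLONE_RANGE, &args);
    }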
[21:52] <noob2> true
[21:52] * portante (~user@66.187.233.206) Quit (Ping timeout: 480 seconds)
[21:52] <noob2> that's a hairy issue
[21:52] <fghaas> that's the kind of stuff any "real" ha stack goes to great lengths to avoid. which is why ceph has this pretty cool paxos implementation between mons, and corosync/pacemaker deal correctly with quorum and resource failure, etc.
[21:53] <noob2> that's where pacemaker comes in?
[21:53] <noob2> i see
[21:54] <noob2> yeah i can see where pacemaker shines now
[21:54] <noob2> fencing off the misbehaving host
[21:56] <fghaas> yup. this is something that ceph has very well covered by itself, but LIO (sans the RTS OS HA stuff -- how is that for alphabet soup) needs something to manage it, such as pacemaker
[21:56] <noob2> yup
[21:56] <noob2> i agree
[21:56] <noob2> ceph has the bases covered with this
[21:56] <noob2> LIO needs help
[21:58] * drokita1 (~drokita@199.255.228.10) Quit (Ping timeout: 480 seconds)
[22:02] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:05] * davidz1 (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[22:05] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[22:06] <gregaf1> infernix: I had a short discussion about that with somebody last week, but I don't know how you related me to the clone ioctl or anything :)
[22:07] <gregaf1> and btrfs journal-to-store cloning hasn't been implemented because it was just considered last week and doing so would be a serious undertaking we don't have internal time for right now
[22:07] * vata (~vata@208.88.110.46) Quit (Quit: Leaving.)
[22:08] <mikedawson> gregaf1: I'm having a tough time finding documentation on the best way to mount an XFS filesystem for an OSD. Do you use defaults or set something like noatime,nodiratime?
[22:09] <sjust> mikedawson: both of those are a good idea
[22:09] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Ping timeout: 480 seconds)
[22:09] * drokita (~drokita@199.255.228.10) has joined #ceph
[22:09] <mikedawson> sjust: any others?
[22:09] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[22:09] <noob2> i used the defaults
[22:09] <sjust> not that I can think of, xfs's defaults are pretty sane
[22:09] * wschulze (~wschulze@cpe-98-14-23-162.nyc.res.rr.com) has joined #ceph
[22:12] * loicd (~loic@magenta.dachary.org) has joined #ceph
[22:26] <nhm> sjust: mikedawson: if you use noatime, nodiratime shouldn't be necessary.
[22:31] * fghaas (~florian@91-119-215-212.dynamic.xdsl-line.inode.at) has left #ceph
[22:31] <mikedawson> nhm: so my fstab will look something like --- /dev/disk/by-partlabel/osd-sdb1 /var/lib/ceph/osd/ceph-0 xfs noatime 1 1 right?
[22:32] <mikedawson> nhm: using labels similar to your response on the mailing list today
[22:34] <mikedawson> nhm: what do you set for dump and fsck?
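(Those last two fstab fields are the dump flag and the fsck pass number; for a non-root data disk like an OSD they are commonly just zeroed, e.g.:)

    /dev/disk/by-partlabel/osd-sdb1  /var/lib/ceph/osd/ceph-0  xfs  noatime  0 0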
[22:38] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:44] * yasu` (~yasu`@dhcp-59-227.cse.ucsc.edu) has joined #ceph
[22:45] <elder> joshd, the udevadm command fixes the problem. Do you mind if I take that "bash -x" thing back out again, or do you want it to remain in there?
[22:47] <joshd> elder: I'd rather leave it in, since it's difficult to tell what's actually happening otherwise
[22:47] <elder> OK, that's fine.
[22:47] <elder> It just gets a lot more testing completed in the same amount of time without it.
[22:48] <joshd> really? I wouldn't expect it to have much overhead at all
[22:48] <elder> I'll let you know in a few minutes how much difference it makes.
[22:48] <elder> (Running without right now)
[22:49] <elder> The use of udevadm (with bash -x) made the number of iterations rise to more than two per second though.
[22:50] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[22:51] * noob2 (~noob2@ext.cscinfo.com) has left #ceph
[22:56] <elder> Well, I guess it doesn't make *that* much difference. 701 iterations in 300 seconds. With "-x" it was 68 iterations in 30 seconds. Before it was closer to 500. With "-x" it's maybe 600.
[22:56] <elder> Whoops.
[22:56] <sagewk> sounds good
[22:56] <elder> I'll keep it in.
[22:56] <elder> (I'll keep -x, that is)
[22:57] <dmick> so udevadm settle was the key?
[22:59] <elder> It seems to work like a charm, dmick.
[22:59] <elder> I think there's other code that should use it.
[23:00] <dmick> UDEVADM SETTLE ALL THE THINGS
[23:00] <elder> In fact, perhaps the rbd CLI could consider using it so the user experience is a little less surprising.
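(In script form the idea is simply to let udev finish before touching the device; image and device names here are made up:)

    rbd map rbd/testimg
    udevadm settle          # block until udev has processed the add event
    mkfs.xfs /dev/rbd0      # device node assumed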
[23:00] <dmick> srsly, good news
[23:00] <elder> Any need for review? I'm OK with what I've got in the script.
[23:01] <joshd> seems fine to me
[23:02] <dmick> I'll look; where is it?
[23:02] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[23:02] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[23:04] <elder> Look at ceph/wip-map-unmap
[23:04] <elder> Only I've put the "bash -x" back at the top.
[23:05] <elder> https://github.com/ceph/ceph/commit/61bafa16b690cbe66ccccf611f82f66a8114bbe2
[23:07] <dmick> cool
[23:10] <sagewk> elder: is that ready to go into next?
[23:10] <elder> Yes,
[23:10] <elder> I'll push it now.
[23:10] <elder> Since I got nobody saying I shouldn't...
[23:11] <elder> Done.
[23:13] <sagewk> sweet
[23:15] * tryggvil_ (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[23:15] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Read error: Connection reset by peer)
[23:15] * tryggvil_ is now known as tryggvil
[23:18] <elder> joshd, I inquired about that filestreams allocator test (#173) and will let you know what I hear back. I suspect that if the allocator fails it has nothing to do with the underlying media.
[23:18] <elder> And I don't think it's a "failure" per se anyway.
[23:19] <elder> The filestreams allocator is optional behavior on XFS that tries to implement its locality differently from its normal block allocator.
[23:20] <elder> It's intended to allow you to have multiple files open and writing (like writing several video files concurrently) and have each of them allocate from distinct regions of the disk rather than falling all over each other.
[23:21] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[23:21] <nhm> elder: ah yes, I think the filestreams allocator is the one I was looking into to put metadata on an SSD?
[23:22] * jlogan1 (~Thunderbi@2600:c00:3010:1:5dfe:284a:edf3:5b27) has joined #ceph
[23:24] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit ()
[23:25] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[23:25] <elder> sagewk, I'm going to force-update ceph-client/testing with ceph-client/wip-testing (after one last look-see to make sure it's what I want). This is the branch in which I've re-ordered commits with reviewed patches up front.
[23:26] <elder> OK with you?
[23:26] <elder> nhm, no that would have been a different thing. I still have the patch to implement the "ibound" option but it didn't apply cleanly so it ended up getting pushed to a back burner.
[23:26] <elder> The ibound option is more like the inode32 allocator, generalized.
[23:27] <elder> With inode32, all inodes are allocated from blocks whose offsets are representable in 32 bits.
[23:27] <elder> That allows you to have file systems bigger than 2TB (I think that's right, or 2^32 * some block size anyway), while keeping inode numbers 32 bits.
[23:28] <elder> The ibound option would make that 2^32 bit boundary adjustable. So you could fix it at the size of an SSD placed at the front of your logical volume, for example.
[23:29] <elder> The 64-bit allocator (inode64) places inodes anywhere on the volume.
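(Both allocators elder describes are plain XFS mount options; devices and mount points here are made up:)

    mount -o filestreams /dev/sdb1 /srv/streams    # keep concurrent writers in separate regions
    mount -o inode64     /dev/sdc1 /srv/big        # let inodes be allocated anywhere on the volume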
[23:30] * zK4k7g (~zK4k7g@digilicious.com) Quit (Quit: Leaving.)
[23:31] <sagewk> elder: go for it
[23:31] <elder> OK. In a minute.
[23:39] <elder> Done
[23:42] <joshd> elder: thanks for checking, I'll ignore it for now, and maybe remove it from the suite if it comes up again
[23:42] <elder> OK.
[23:43] <elder> My gut feeling is it's safe to ignore.
[23:47] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[23:47] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Read error: Connection reset by peer)
[23:47] * tryggvil_ (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[23:49] <dmick> NOTICE: sepia uplink-to-internet is about to be swapped. New cable is live, so this *should* be about 10s + STP recovery time, but in case something goes wrong, you may lose SSH or VPN connections and have to reestablish. Sage is about to start a new run, so now is the time
[23:50] <dmick> I'll let inktankers know when it's back and working
[23:55] <joao> gregaf1, sagewk, around?
[23:55] <gregaf1> hi
[23:56] <joao> http://imageshack.us/g/1/9919444/
[23:56] <gregaf1> I can't read any of that :/
[23:56] <joao> those are screenshots I took from htop running on a plana
[23:57] <joao> gregaf1, click on image, zoom in on the overlay that shows up
[23:57] <gregaf1> it doesn't zoom in to full resolution
[23:57] <joao> aww crap
[23:57] <gregaf1> I can make out what looks like a 335MB memory line that's highlighted?
[23:58] <gregaf1> are these time-lapsed and so I'm seeing one monitor grow from 280MB to 500+?
[23:58] <gregaf1> make that 700+
[23:58] <joao> go on...
[23:58] <joao> gregaf1, you're right, it doesn't zoom much more than that
[23:59] <joao> but ctrl-+ might help
[23:59] <joao> helps here :\
[23:59] <joao> gregaf1, if you get bored, jump to the last image
[23:59] <joao> http://imageshack.us/f/856/htop12121405.png/
[23:59] <joao> I'm sure I have a direct link for those

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.