#ceph IRC Log

IRC Log for 2010-08-24

Timestamps are in GMT/BST.

[0:00] * ghaskins_mobile (~ghaskins_@130.57.22.201) Quit (Quit: This computer has gone to sleep)
[0:45] * neale_ (~neale@va.sinenomine.net) Quit (Quit: neale_)
[1:03] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[1:12] * diss3ntive (~diss3ntiv@external.infinityward.net) has joined #ceph
[1:13] <diss3ntive> so....Is the latest build stable enough to use as a production/project storage system?
[1:14] <gregaf> wouldn't it be nice...
[1:14] <diss3ntive> hehe, I guess that is a no
[1:14] <gregaf> the RADOS object storage layer is pretty stable
[1:14] <gregaf> as is RBD, the block device built on it
[1:14] <gregaf> we're trying to put those into a beta production service soon
[1:15] <diss3ntive> what would you say is the instability?
[1:15] <gregaf> but the filesystem layer still needs a few more rounds of testing
[1:15] <diss3ntive> ahh ok
[1:15] <gregaf> object stores are a lot simpler than filesystems ;)
[1:16] <diss3ntive> I am tired of looking into fs's to try, and this one sounds promising thus far
[1:16] <diss3ntive> I ran into ntfs acl issues with gluster though, so I dropped that
[1:18] <gregaf> well, we like to think it's promising!
[1:19] <gregaf> but something as complicated as a distributed FS needs a lot of banging around before you say it's production-ready and we're still getting a fairly steady stream of bug reports in different configurations and such
[3:20] * MarkN (~nathan@59.167.240.178) has joined #ceph
[3:27] * MarkN (~nathan@59.167.240.178) Quit (Quit: Leaving.)
[3:27] * MarkN (~nathan@59.167.240.178) has joined #ceph
[3:44] * eternaleye (eternaleye@bach.exherbo.org) Quit (Quit: leaving)
[3:45] * eternaleye (eternaleye@bach.exherbo.org) has joined #ceph
[3:45] * eternaleye (eternaleye@bach.exherbo.org) Quit ()
[3:46] * eternaleye (eternaleye@bach.exherbo.org) has joined #ceph
[3:47] * eternaleye (eternaleye@bach.exherbo.org) Quit ()
[3:47] * eternaleye (eternaleye@bach.exherbo.org) has joined #ceph
[5:08] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[5:43] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[5:56] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[5:59] * ghaskins_mobile (~ghaskins_@209-255-241-2.ip.mcleodusa.net) has joined #ceph
[6:12] <cclien_> w/in 3
[6:24] * MarkN (~nathan@59.167.240.178) Quit (Quit: Leaving.)
[6:32] * MarkN (~nathan@59.167.240.178) has joined #ceph
[6:49] * f4m8_ is now known as f4m8
[6:59] * ghaskins_mobile (~ghaskins_@209-255-241-2.ip.mcleodusa.net) Quit (Quit: This computer has gone to sleep)
[7:07] * ghaskins_mobile (~ghaskins_@209-255-241-2.ip.mcleodusa.net) has joined #ceph
[7:20] * eternale1e (eternaleye@bach.exherbo.org) has joined #ceph
[7:20] * ghaskins_mobile (~ghaskins_@209-255-241-2.ip.mcleodusa.net) Quit (Quit: This computer has gone to sleep)
[7:20] * eternaleye (eternaleye@bach.exherbo.org) Quit (Read error: Connection reset by peer)
[7:23] * kblin (~kai@h1467546.stratoserver.net) Quit (Ping timeout: 480 seconds)
[7:29] * jantje (~jan@shell.sin.khk.be) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * hijacker (~hijacker@213.91.163.5) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * iggy (~iggy@theiggy.com) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * MarkN (~nathan@59.167.240.178) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * monrad-65532 (~mmk@domitian.tdx.dk) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * eternale1e (eternaleye@bach.exherbo.org) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * todinini (tuxadero@kudu.in-berlin.de) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * wido (~wido@fubar.widodh.nl) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * darkfader (~floh@host-82-135-62-109.customer.m-online.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * f4m8 (~drehmomen@lug-owl.de) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * Guest495 (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * DJCapelis (~djc@capelis.dj) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * nolan (~nolan@phong.sigbus.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * cclien_ (~cclien@60-250-103-120.HINET-IP.hinet.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * jeffhung_ (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * josef (~seven@nat-pool-rdu.redhat.com) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * DLange (~DLange@dlange.user.oftc.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * diss3ntive (~diss3ntiv@external.infinityward.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * atg (~atg@please.dont.hacktheinter.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * pruby (~tim@leibniz.catalyst.net.nz) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * revstray (~rev@blue-labs.net) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * cowbar (3af5b6381a@dagron.dreamhost.com) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * NoahWatkins (~jayhawk@waterdance.cse.ucsc.edu) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * conner (~conner@leo.tuc.noao.edu) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * lidongyang (~lidongyan@222.126.194.154) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * Jiaju (~jjzhang@222.126.194.154) Quit (solenoid.oftc.net reticulum.oftc.net)
[7:29] * kblin_ (~kai@h1467546.stratoserver.net) has joined #ceph
[7:29] * eternale1e (eternaleye@bach.exherbo.org) has joined #ceph
[7:29] * MarkN (~nathan@59.167.240.178) has joined #ceph
[7:29] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[7:29] * diss3ntive (~diss3ntiv@external.infinityward.net) has joined #ceph
[7:29] * cowbar (3af5b6381a@dagron.dreamhost.com) has joined #ceph
[7:29] * NoahWatkins (~jayhawk@waterdance.cse.ucsc.edu) has joined #ceph
[7:29] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[7:29] * Guest495 (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) has joined #ceph
[7:29] * conner (~conner@leo.tuc.noao.edu) has joined #ceph
[7:29] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[7:29] * revstray (~rev@blue-labs.net) has joined #ceph
[7:29] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) has joined #ceph
[7:29] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[7:29] * sagewk (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[7:29] * gregaf (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[7:29] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[7:29] * josef (~seven@nat-pool-rdu.redhat.com) has joined #ceph
[7:29] * lidongyang (~lidongyan@222.126.194.154) has joined #ceph
[7:29] * darkfader (~floh@host-82-135-62-109.customer.m-online.net) has joined #ceph
[7:29] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[7:29] * DJCapelis (~djc@capelis.dj) has joined #ceph
[7:29] * todinini (tuxadero@kudu.in-berlin.de) has joined #ceph
[7:29] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[7:29] * cclien_ (~cclien@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[7:29] * jeffhung_ (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[7:29] * wido (~wido@fubar.widodh.nl) has joined #ceph
[7:29] * jantje (~jan@shell.sin.khk.be) has joined #ceph
[7:29] * monrad-65532 (~mmk@domitian.tdx.dk) has joined #ceph
[7:29] * Jiaju (~jjzhang@222.126.194.154) has joined #ceph
[7:29] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[7:29] * iggy (~iggy@theiggy.com) has joined #ceph
[7:29] * f4m8 (~drehmomen@lug-owl.de) has joined #ceph
[8:30] * allsystemsarego (~allsystem@188.26.33.211) has joined #ceph
[8:56] * andret (~andre@pcandre.nine.ch) has joined #ceph
[9:25] * Osso (osso@AMontsouris-755-1-5-251.w86-212.abo.wanadoo.fr) has joined #ceph
[10:04] * Yoric (~David@213.144.210.93) has joined #ceph
[12:24] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[12:24] * Yoric (~David@213.144.210.93) has joined #ceph
[13:12] <jantje> the ./configure thing complained about me not having libboost-dev (something with spirit was missing), but I -did- have libboost-dev installed; I probably missed some others from (http://ceph.newdream.net/wiki/Debian), so you guys might want to check it out
[13:13] <jantje> and I don't have git access from work, so I downloaded the tarball: cat: .git_version: No such file or directory
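A quick way to sanity-check the Boost.Spirit dependency jantje mentions (a hedged sketch assuming the usual Debian header layout; the full dependency list is on the wiki page he links):

    sudo apt-get install libboost-dev     # Spirit ships inside the Boost headers on Debian
    ls /usr/include/boost/spirit*         # if nothing is listed, ./configure's Spirit check will fail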
[13:57] * ghaskins_mobile (~ghaskins_@209-255-241-2.ip.mcleodusa.net) has joined #ceph
[14:08] * ghaskins_mobile (~ghaskins_@209-255-241-2.ip.mcleodusa.net) Quit (Quit: This computer has gone to sleep)
[14:14] * DLange (~DLange@dlange.user.oftc.net) Quit (synthon.oftc.net resistance.oftc.net)
[14:14] * josef (~seven@nat-pool-rdu.redhat.com) Quit (synthon.oftc.net resistance.oftc.net)
[14:17] * josef (~seven@nat-pool-rdu.redhat.com) has joined #ceph
[14:17] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[14:37] * ghaskins_mobile (~ghaskins_@130.57.22.201) has joined #ceph
[15:46] * bbigras (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) has joined #ceph
[15:46] * Guest495 (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) Quit (Remote host closed the connection)
[15:46] * bbigras is now known as Guest663
[15:50] * f4m8 is now known as f4m8_
[17:57] <sage> jantje: what distribution are you running?
[18:26] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[18:48] * ghaskins_mobile (~ghaskins_@130.57.22.201) Quit (Quit: This computer has gone to sleep)
[18:53] <wido> hi sagewk, i've done some testing today, a lot of files seem to be corrupted
[18:53] <wido> couldn't really work with vbindiff, but comparing just checksums gave me corrupted files
[18:54] <sage> this is all live data, or snapshotted?
[18:57] <wido> live data, i'm not using any snapshots
[18:58] <sage> ok, probably something happened during recovery or something. can you put a few example file names (and master copies) in the bug?
[18:58] <sage> i'll take a look in a bit
[18:59] <wido> yes, i'll open a bug and include the source files
[18:59] <wido> most of them are ISOs which are affected, small files seem OK
[19:00] <wido> but i'll report this all in the bug
[19:00] <sage> ok
[19:01] <sage> are they large files (in the fs), or large objects (added via radosgw)?
[19:04] * atg (~atg@please.dont.hacktheinter.net) Quit (Quit: No Ping reply in 180 seconds.)
[19:04] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[19:05] <wido> large files on the fs, haven't checked the radosgw objects yet
[19:05] <wido> i'll do that too before opening the bug
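A minimal sketch of the checksum comparison wido describes (the file names and mount points below are hypothetical):

    # compare master copies against the copies stored on the Ceph mount
    for f in example-1.iso example-2.iso; do
        md5sum /srv/masters/"$f" /mnt/ceph/iso/"$f"   # differing sums mean the copy on Ceph is corrupted
    done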
[19:22] * Guest663 (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) Quit (synthon.oftc.net graviton.oftc.net)
[19:22] * diss3ntive (~diss3ntiv@external.infinityward.net) Quit (synthon.oftc.net graviton.oftc.net)
[19:25] * bbigras (quasselcor@bas11-montreal02-1128531598.dsl.bell.ca) has joined #ceph
[19:26] * bbigras is now known as Guest684
[19:26] * diss3ntive (~diss3ntiv@external.infinityward.net) has joined #ceph
[19:31] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[19:35] * diss3ntive (~diss3ntiv@external.infinityward.net) Quit ()
[20:28] <wido> sage: issue reported: #376
[20:58] <jantje> sage: debian
[21:03] <wido> i found something cool for OSD's: http://www.prlog.org/10874926-owc-announces-40gb-mercury-extreme-pro-ssd.html
[21:04] <wido> small SSD, affordable and fast, would be pretty nice for journaling
[21:04] <wido> only $99 for a 40GB SSD
[21:06] <darkfader> wido: there's also a sanely cheap 50GB version of the ocz vertex 2
[21:06] <darkfader> wido: do you think it'll be ok to use one SSD for 4 OSDs that each got their own disk?
[21:07] <wido> darkfader: i think so. I'm doing so with an X25-M right now, 4 OSDs with a 4GB journal each (separate partition)
[21:07] <darkfader> also I wonder how long the SSDs will live under a journal's constant writes
[21:07] <jantje> just leave enough space available
[21:07] <wido> i'm not so afraid of that. I'm using an X25-M
[21:08] <darkfader> zfs has a nice feature to just drop a L2ARC device when it starts being naughty
[21:08] <darkfader> wido: all hail intel hehe
[21:08] <wido> i'm using the X25-M for about 9 months now in a few really heavy MySQL servers, no problems at all
[21:08] <jantje> and ssd's have like 200MB/s, slower than your total disk speed
[21:08] <darkfader> so, do you get full write speed with the X25?
[21:08] <wido> (i pressed enter too early)
[21:08] <darkfader> ah, ok that much for the question
[21:09] <jantje> i would say 1 ssd for 2 osd disks
[21:09] <darkfader> ah ok
[21:09] <darkfader> i'll stick by that then ;)
[21:09] <jantje> (but my network is 4x1gbps)
[21:09] <wido> the X25-M performs pretty well, but it's my RAID controller (Areca 1220) which is lacking CPU power, so it limits my bandwidth to 350MB/sec, i only need the IOps, not the bandwidth
[21:10] <darkfader> gotta run now, thanks for the insight
[21:10] <wido> and you should underpartition the SSD, just use 80% of its capacity, never use the other 20%, the wear-leveling then should do its job
[21:10] <darkfader> yeah always doing that
[21:11] <jantje> wido: is there a way to verify that?
[21:12] <wido> not that i know of, i'm just assuming it does. No errors whatsoever after 9 months ;)
[21:16] <wido> jantje: http://communities.intel.com/message/74853;jsessionid=28B30450BA6A933EAABB48985EBBAD69.node7COM
[21:26] <jantje> thanks
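Roughly what wido's layout (four OSDs sharing one X25-M, a 4 GB journal partition per OSD) could look like in ceph.conf; the device and directory names are hypothetical and the option spelling should be checked against the wiki:

    [osd.0]
        osd data = /data/osd0      ; data on this OSD's own disk
        osd journal = /dev/sdb1    ; 4 GB partition on the shared SSD
    [osd.1]
        osd data = /data/osd1
        osd journal = /dev/sdb2    ; ...and likewise /dev/sdb3, /dev/sdb4 for osd.2 and osd.3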
[21:36] <gregaf> it turns out that the good consumer multi-level cell drives (like Intel) are rated for something like 5 years of continuous writes
[21:36] <darkfader> re
[21:37] <darkfader> the basic question is how a journal write error will be handled
[21:37] <gregaf> the enterprise single-level cell are for….nearly a century, I don't remember exactly
[21:37] <darkfader> yeah, i wouldn't worry if i had one of these sexy Sun^WOracle SSD racks
[21:37] <todinini> we tested quite a few ssds, the intel G2 were the best, then came the ATP Velocity, the worst were samsung and corsair
[21:38] <darkfader> lol yes, i got a samsung in my desktop and it's not even well performing for that
[21:39] <gregaf> the SSD landscape changes pretty quickly depending on which random controller the company is using now, though
[21:39] <gregaf> I think Corsair has drives using the Barefoot controller now, those are good, and Samsung has a new controller as of 8 months ago or something
[21:40] <todinini> doesn't samsung have their own controller?
[21:40] <gregaf> a lot of companies did their first SSDs using a really crappy JMicron controller, or a few others that just fell apart on random writes until they got caught by the tech review sites
[21:40] <darkfader> hehe
[21:41] <todinini> gregaf: we had a big problem with the wear-leveling, after the ssd was once full the performance dropped significantly
[21:41] <gregaf> ah, I meant Sandforce, not Barefoot, don't remember which is the Barefoot controller
[21:42] <darkfader> todinini: the samsung in my macbook is like that. i'll get a seagate hybrid drive now, that has much more reliable performance
[21:42] <todinini> for our workload it was essential to support the trim command,
[21:42] <darkfader> i wish there were 15k sas/8GB flash hybrids
[21:42] <gregaf> todinini: yeah, I just remember that most controllers had a lot of issues until AnandTech and a few other sites started doing comprehensive benchmark reviews of drives in new and used states and showed the difference between Intel's controller and everybody else's
[21:42] <darkfader> (used, on ebay, for $99)
[21:43] <darkfader> gregaf: did you read the review about btrfs ssd mode, by chance?
[21:43] <todinini> I don't like hybrids, the 8G would be much better used as an OS Cache, so buy more RAM
[21:43] <gregaf> no, don't think so
[21:43] <jantje> todinini: did you also test the newest ocz and such ?
[21:43] <darkfader> let me search for a moment
[21:44] <todinini> jantje: not sure, have to check at work
[21:44] <darkfader> http://www.phoronix.com/scan.php?page=article&item=btrfs_ssd_mode&num=1 (it was disappointing, i hope it's outdated)
[21:44] <gregaf> what kind of in-house testing do you need to do, if I can ask?
[21:44] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[21:44] <todinini> gregaf: what do you mean?
[21:45] <josef> darkfader: take phoronix's testing with a grain of salt
[21:45] <darkfader> josef: i hoped to hear that
[21:45] <josef> or anything smaller that you can think of
[21:45] <gregaf> I just ask because the good review sites have gotten pretty good at measuring SSD performance, they tend to give you sequential read/write, random read/write from 1KB to 4MB or so, IOPS under some sort of server workload, and run them all in new and fully-used states
[21:45] <darkfader> josef: then, i'll take the stability of our SAN netapp boxes lol
[21:46] <josef> heh
[21:47] <todinini> gregaf: That's right, but we like to do our own measurements, we get most of the ssds for a free trial
[21:48] <gregaf> like just confirmation testing on the drives you're looking at, plus stuff on specific apps
[21:48] <gregaf> ?
[21:49] <todinini> gregaf: yep, we use the ssds in internal servers, and we also rent servers with ssds to our customers, therefore we have plenty of them to test
[21:49] <todinini> and I like it to play with new hardware
[21:49] <gregaf> cool
[21:50] <todinini> atm, we are testing ceph, and we quite like the rbd
[21:50] <gregaf> I'm just not real familiar with industry practices on stuff like hardware selection, I'm stuck pretty firmly on the software side, so was curious :)
[21:51] <todinini> right now, we have around 44 servers in the cluster, if everything goes well, we will expand it
[21:52] <gregaf> not hitting too many bugs?
[21:52] <darkfader> wow thats quite a few
[21:52] <todinini> after I converted a qcow2 image into the rbd, is there a possibility to list all images in the rbd?
[21:53] <todinini> gregaf: the beginning was a bit difficult, because the wiki is quite out of date, and many things were trial and error
[21:54] <todinini> performance-wise the network is the bottleneck, we are hitting the 100M/s mark
[21:54] <gregaf> well let us know if you hit any issues to put in our tracker — it'll depend on your workload what you run into or don't, though
[21:54] <gregaf> sage might know about the rbd thing, I'm afraid I don't use it at all and yehudasa is building it but he's out on vacation
[21:54] <todinini> gregaf: the biggest problem seems to be the mds, and its need for memory
[21:54] <gregaf> yeah
[21:55] <gregaf> have you upgraded to the one using tcmalloc yet?
[21:55] <gregaf> that seems to help quite a bit
[21:55] <darkfader> todinini: i can try to un-outdate the wiki if it's easy things that i have understanding of.
[21:55] <todinini> the tcmalloc helps for the osd, but not for the mds
[21:55] <darkfader> so drop me a /msg if you find something
[21:55] <todinini> darkfader: ok
[21:55] <gregaf> huh, I thought wido said it was cutting his long-term mds memory usage about in half
[21:56] <todinini> gregaf: I did not notice that, but I can retest tomorrow
[21:57] <todinini> if I start blogbench on the fs, the mds goes oom, if I use kvm with rbd, everything is fine
[21:57] <gregaf> it might depend on your workload, and it's possible that cutting it in half is still too much for whatever box you're using
[21:58] <gregaf> does reducing the cache size help at all?
[21:58] <todinini> gregaf: yep, my servers are a bit short on the RAM side, they have only 1GB, but we want to use "old" unused servers
[21:58] <todinini> gregaf: you mean in config.cc and recompiling?
[21:58] <gregaf> heh
[21:59] <gregaf> you can also use any of those parameters as command-line options!
[21:59] <gregaf> no need to recompile :)
[21:59] <todinini> or which cache you mean?
[21:59] <todinini> really?
[21:59] <todinini> didn't know that
[21:59] <todinini> which options would you recommend to try out?
[22:00] <gregaf> mds_cache_size
[22:00] <gregaf> it refers to the number of inodes to cache, and I think the restrictions are a bit loose but adjusting it will at least impact how much memory it uses
[22:01] <todinini> you mean like this? /usr/bin/cmds -i 0 -c /tmp/ceph.conf.7899 -mds_cache_size 2000
[22:02] <gregaf> double-dash on the full-word options:
[22:02] <gregaf> like /usr/bin/cmds -i 0 -c /tmp/ceph.conf.7899 --mds_cache_size 2000
[22:03] <gregaf> 2000 is probably a little small at 2% of the default though ;)
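The same knob in both forms discussed here (the value is only an example; per gregaf the default is about 100000 inodes):

    ; ceph.conf form
    [mds]
        mds cache size = 50000

    # equivalent command-line form, double dash as gregaf notes
    /usr/bin/cmds -i 0 -c /tmp/ceph.conf.7899 --mds_cache_size 50000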
[22:03] <todinini> ahh, you guys should really work on the docs
[22:03] <gregaf> :( yes
[22:04] <darkfader> hmm there is nothing about cmd line options yet, is there?
[22:04] <todinini> do you know if commercial support is planed for ceph?
[22:04] <gregaf> at some point, ye
[22:04] <gregaf> *yes
[22:04] <todinini> gregaf: do you have any timeline?
[22:04] <gregaf> if there is one, I don't know it
[22:05] <gregaf> not until 1.0, certainly
[22:06] <todinini> we plan to start a beta testing program, and it would be nice to get some sort of support, or urgent bug fixes
[22:06] <darkfader> gregaf: might be a good idea to accept some money from early adopters :>
[22:07] <gregaf> that kind of thing is a bit above my pay grade ;)
[22:07] <darkfader> hehe
[22:08] <darkfader> on the good side, amplidata still hasn't bothered to answer my inquiry about a demo setup
[22:08] <gregaf> we tend to rank user bug reports pretty high priority to begin with, but if you really do want something paid for assurances sooner than 1.0 I'm sure Sage would be happy to talk to you
[22:08] <todinini> gregaf: I started the blogbench test, with the new command-line option
[22:10] <todinini> the memory consumption doesn't change
[22:10] <todinini> I used --mds_cache_size 40000
[22:12] <todinini> the mds use over 1GB of the mem
[22:12] <todinini> 0000000001690000 1244780 rw--- 0000000000000000 000:00000 [ anon ]
[22:13] <gregaf> do you have any swap?
[22:14] <gregaf> and how accurately are you checking usage — like would you notice if it's getting farther into the test?
[22:15] <todinini> yes 5G swap, I would see if I get more test loop runs through
[22:15] <gregaf> hmm
[22:16] <gregaf> todinini: by blogbench you mean http://www.pureftpd.org/project/blogbench/download ?
[22:16] <todinini> yep
[22:17] <gregaf> okay, I'll see if I can take a look at it today or tomorrow
[22:17] <gregaf> 6 GB is a lot higher usage than we normally see
[22:17] <gregaf> more like 1GB plus 1 or 2 GB swap, I think
[22:18] <todinini> gregaf: that would be great, so I should try with a server which has more RAM?
[22:18] <gregaf> which is still higher than we expect, our memory workload is apparently pretty bad for malloc
[22:19] <todinini> the memory is not fragmented, it is one big malloc area
[22:19] <gregaf> well, one of my dev servers has been running qa tests on cfuse with 1 MDS, one monitor, and one OSD for about 20 hours and the MDS is currently using 828MB+1024MB swap
[22:19] <gregaf> cfuse is pretty slow at these things compared to the kernel client, though, and that's with tcmalloc rather than the standard ptmalloc
[22:20] <todinini> I use tcmalloc as well
[22:21] <darkfader> gregaf: if i didn't have the slight hope to use the fuse client on BSD some day, i'd ... nvm
[22:21] <darkfader> fuse is utter crap for performance
[22:21] <gregaf> well our fuse client can get much, much, much faster once we put the work in
[22:21] <darkfader> gregaf: but do you think it is worth the effort at all?
[22:21] <gregaf> the kclient engages in a number of caching behaviors that cfuse doesn't yet, and should
[22:22] <gregaf> probably
[22:22] <gregaf> fuse is ported to a number of OSes
[22:22] <gregaf> and writing a native client for each would get annoying
[22:22] <gregaf> besides, doesn't gluster run entirely via fuse?
[22:23] <darkfader> no, it has a "real" kernel client too i think
[22:23] <darkfader> they list fuse as a feature and if you ask them why it eats up so much speed they'll first point you at a ld_preload library and next at a kernel client
[22:23] <darkfader> at least as much as i remember
[22:23] <darkfader> i think orionvm will know better
[22:35] <darkfader> gregaf: for the manual, --mds_cache_size does set the number of inodes the mds is caching
[22:36] <darkfader> does that more exactly mean the number of inodes and their OSD locations?
[22:36] <gregaf> it's the number of "CInodes" that are cached
[22:37] <gregaf> those contain a lot of data (Sage counted once and got ~2K per, but that may have changed since then)
[22:37] <darkfader> okay
[22:37] <gregaf> but there isn't really an OSD location for an inode
[22:37] <darkfader> ah, sorry.
[22:38] <gregaf> they're just MDS constructs to facilitate the filesystem, and the file data is striped across objects whose names are based on the inode
[22:38] <gregaf> and the object location is calculated rather than stored in a lookup table
[22:39] <darkfader> then i don't yet understand what is in the CInode cache
[22:39] <darkfader> but i'll just keep the entry vague enough to stay correct :)
[22:39] <gregaf> heh
[22:40] <darkfader> i guess -m monaddress is rarely used?
[22:40] <gregaf> well, it links to the dentries, it contains all the non-location data stored in a traditional inode, and it has a lot of complicated data structures that handle snapshots and distributing the fs across more than one mds
[22:41] <gregaf> rarely used?
[22:41] <darkfader> it's in the manpage, but do you ever manually specify a monitor to connect to instead of the one in your config?
[22:41] <gregaf> I think it's mostly just used with cfuse at this point
[22:41] <darkfader> ah, ok.
[22:42] <gregaf> but you could use it with whatever you liked if you wanted to specify a startup monitor for some reason
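For reference, the -m form gregaf describes, as it might be used with cfuse (the monitor address is hypothetical; 6789 is the usual monitor port):

    cfuse -m 192.168.0.10:6789 /mnt/ceph   # mount via FUSE, bootstrapping from the given monitor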
[22:48] <darkfader> can you give me a http link to the git so i can search out the other mds options? i wanna write some more doc but my brain is already shutting down
[22:48] <gregaf> http://ceph.newdream.net/git/?p=ceph.git;a=summary
[22:49] <gregaf> http://ceph.newdream.net/git/?p=ceph.git;a=blob;f=src/config.cc;h=4445240bb679b6558ead56da445d18dfae1c51b8;hb=unstable is the specific link to config.cc in unstable
[22:50] <gregaf> you want the bits starting at line 255 in the "struct config_options"
[22:50] <gregaf> http://ceph.newdream.net/git/?p=ceph.git;a=blob;f=src/config.cc;h=4445240bb679b6558ead56da445d18dfae1c51b8;hb=unstable#l255
[22:50] <darkfader> thanks
[22:50] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[22:51] <gregaf> some of it's documented and a lot isn't — if you can't figure out what it means just don't document it, it probably shouldn't be messed with if you don't know what you're doing ;)
[22:51] <todinini> darkfader: there is already a list in the wiki http://ceph.newdream.net/wiki/Ceph.conf#options_defined_in_src.2Fconfig.cc
[22:52] <todinini> but I didn't know I could use them as command-line options
[22:52] <darkfader> ah no wonder i didn't find it
[22:58] * allsystemsarego (~allsystem@188.26.33.211) Quit (Quit: Leaving)
[23:08] <darkfader> gregaf: i suspect mds_mem_max isn't really working, when we talk mds and OOM
[23:10] <gregaf> well, that's a very interesting config option
[23:11] <gregaf> I'm afraid the only time it's used is in a print out statement
[23:12] <darkfader> sigh. this salad sauce is so not OK any more
[23:13] <gregaf> I can't tell if you're being metaphorical or referring to a salad you're eating...
[23:14] <darkfader> sorry :(
[23:14] <darkfader> it was about real salad
[23:15] <gregaf> haha
[23:15] <gregaf> that's unfortunate
[23:32] <darkfader> do you know about the wait_safe option for mds?
[23:32] <darkfader> in the src it says either
[23:32] <darkfader> journaler->wait_for_flush(0, c);
[23:32] <darkfader> or
[23:32] <darkfader> journaler->wait_for_flush(c, 0);
[23:32] <darkfader> little me tries to understand what 0 and c are hehe
[23:32] <gregaf> I don't, let me look at it
[23:33] <darkfader> mds/MDLog.cc
[23:34] <darkfader> still looking where wait_for_flush is
[23:37] <gregaf> oh, c is a Context* and 0 is NULL
[23:38] * Osso (osso@AMontsouris-755-1-5-251.w86-212.abo.wanadoo.fr) Quit (Quit: Osso)
[23:38] <gregaf> the first position is the Context to call when it's synced to disk
[23:39] <gregaf> on the primary
[23:39] <gregaf> and all members of the PG have it in memory
[23:39] <gregaf> the second parameter is the Context to call when it's "safe", and all the OSDs in the group have it on disk
[23:41] <gregaf> so the mds_log_unsafe option, if true, will make the MDS somewhat faster by having looser requirements for journal safety
[23:42] <darkfader> thats a nice option while testing
[23:42] <darkfader> i.e. for the people that run it all on one node or in vmware
[23:42] <gregaf> yeah
[23:43] <gregaf> not a good idea in production, though, or you might get a corrupted journal, which would be bad
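The option gregaf refers to, written as a ceph.conf entry for a throwaway test cluster (a sketch based on this discussion, not a recommended setting):

    [mds]
        mds log unsafe = true   ; looser journal safety: a bit faster, but risks a corrupted journal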
[23:43] <darkfader> yup
[23:44] <darkfader> *bigsmile* somewhat like using ext3 instead of vxfs
[23:44] <darkfader> although i've not seen any issue with ext3 in the last year
[23:45] <gregaf> I've had a lot of issues with it, probably from making it run Ceph and use xattrs
[23:46] <darkfader> it just gets worse with how much you demand
[23:46] <darkfader> i don't know why barriers aren't used by default there

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.