#ceph IRC Log


IRC Log for 2012-01-24

Timestamps are in GMT/BST.

[0:04] <Tv|work> daemonik: on paper, yes
[0:04] <Tv|work> daemonik: then there's the part that says "budget 8GB of RAM for zfs if you don't want to reboot often, more if you actually run apps"
[0:05] <Tv|work> but really, the biggest issue is the damn licensing
[0:05] <daemonik> Tv|work: ZFS does aggressive caching, and needs that RAM for the deduplication tables and probably also the compression stuff. They made creating a subfilesystem/subvolume/whateveritscalled as inexpensive as creating a new directory.
[0:05] <Tv|work> the only way zfs will be popular in the linux world is via things taking inspiration from it -- such as btrfs
[0:06] <Tv|work> oracle could change that by relicensing it, but i just don't see that happening
[0:06] <daemonik> Tv|work: FreeBSD worked around the licensing issue. btrfs is years off. ZFS is stable here and now and if I can have Ceph on two FreeBSD boxes I'll happily spend 32gb of RAM on them.
[0:06] <Tv|work> FreeBSD ain't GPL.
[0:07] <Tv|work> the license is explicitly designed to prevent integrating with GPL code
[0:07] <daemonik> I've been using it a lot more lately (at home, because of ZFS, Linux softRAID doesn't play nice on crappy hardware) and its weak licensing shows.
[0:07] <Tv|work> because Sun designed it as a competitive edge over various Linux vendors, probably mostly Red Hat
[0:08] <daemonik> Yeup, and now Solaris is borderline irrelevant. Illumos isn't very useful to the people who aren't directly involved in it for now. Thanks Sun.
[0:17] <dwm__> Regarding software-RAID and rebuilding, mdadm has a mode where you can add a write-intent bitmap to a RAID set.
[0:18] <dwm__> This allows for just resynchronising those sections which were known to be potentially dirty after an unclean shutdown.
[0:18] <dwm__> Can kick write performance quite badly with all the seeks, however, unless you stick it on another device.
[0:20] <daemonik> dwm__: There are other reasons we use ZFS. Linux's softraid can't grow. With ZFS I can casually add another mirror.
[0:20] <dwm__> daemonik: Uh, you can do that with mdadm, too.
[0:21] <daemonik> I don't know Ceph well enough to know, but perhaps it would make more sense to have individual ZFS mirrors comprise many OSDs rather than have large zpools comprise two OSDs.
[0:21] <daemonik> mdadm can grow raid10?
[0:21] <dwm__> Sure.
[0:22] <dwm__> Hell, it can even up- and down-convert between a whole host of different RAID levels non-destructively.
[0:22] <daemonik> dwm__: Ah I looked this up and I see
[0:24] <dwm__> Ah, I stand corrected -- it doesn't seem to support RAID10 at present.
[0:27] <Tv|work> daemonik: yes, as in, you don't want a single PG to grow too big, and probably don't >>100 PGs per OSD either
[0:27] <Tv|work> +want
[0:27] <daemonik> PG?
[0:27] <Tv|work> placement group
[0:27] <Tv|work> http://ceph.newdream.net/docs/latest/dev/placement-group/
[0:28] <dwm__> Hmm, that implies a practical upper bound on how much space you want a single OSD to serve.
[0:28] <Tv|work> yes; just have more OSDs
[0:28] <dwm__> Oh, sure. What do the numbers work out as for that bound?
[0:29] <Tv|work> i don't think anyone's figured out how big a PG can become, before it's too much
[0:29] <Tv|work> but a PG is the unit of recovery, and you don't want that to be too big, for best recovery speed
[0:29] <daemonik> Does this mean that Ceph would perform noticeably better with one OSD per ZFS two-disk mirror than with two large OSDs on two large zpools?
[0:29] <Tv|work> eventually, number of PGs will auto-tune based on the amount of data stored
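A rough sketch of the arithmetic behind the "PGs per OSD" and "PG size" concerns raised above, assuming every PG is stored on `replicas` OSDs and that placement and data are spread roughly uniformly. The function names and the example numbers are illustrative only, not Ceph defaults or Ceph APIs.

```python
def pgs_per_osd(pg_count: int, replicas: int, osd_count: int) -> float:
    """Average number of PG copies hosted by each OSD (uniform placement assumed)."""
    if osd_count <= 0:
        raise ValueError("osd_count must be positive")
    return pg_count * replicas / osd_count


def avg_pg_size_gb(stored_gb: float, pg_count: int) -> float:
    """Average amount of data per PG, i.e. the rough size of one recovery unit."""
    if pg_count <= 0:
        raise ValueError("pg_count must be positive")
    return stored_gb / pg_count


if __name__ == "__main__":
    # Hypothetical layouts: two large OSDs versus ten smaller ones.
    for osds in (2, 10):
        print(osds, "OSDs:", pgs_per_osd(128, 2, osds), "PG copies per OSD")
    print("avg PG size:", avg_pg_size_gb(4000, 128), "GB")
```

With the same PG count, fewer and larger OSDs each carry more PG copies, which is the trade-off behind the "just have more OSDs" advice; whether it matters for a given workload still has to be benchmarked.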
[0:30] <Tv|work> daemonik: i have only one answer to that: benchmark ;)
[0:31] <daemonik> Tv|work: If I were to set up such systems, what benchmarks should I use?
[0:31] <dwm__> daemonik: What applications do you care about?
[0:31] <Tv|work> daemonik: depends on the production workload you need to support.. sorry to be circumspect, but that's the reality
[0:32] <daemonik> Tv|work: Are there any benchmarks I could run that would be helpful? I'll post on the list.
[0:32] <Tv|work> daemonik: we want to provide more "on hardware X, workload Y performs at Z speed", but we're not ready for that quite yet
[0:32] <daemonik> Are there any outstanding issues that should deter someone from using Ceph in production?
[0:32] <Tv|work> daemonik: what would you use to measure the performance of your local zfs fs?
[0:33] <Tv|work> daemonik: rados, rbd and radosgw are pretty good; the distributed filesystem is not really production ready yet; pay attention to backing filesystem choice
[8:41] * weetabeex (~jecluis@ has joined #ceph
[8:42] <weetabeex> hi
[8:58] * meyer (meyer@c64.org) has joined #ceph
[8:58] * meyer is now known as Meyer
[9:41] <malachhe1> hi
[9:43] <malachhe1> i get an error when starting ceph monitor
[9:43] <malachhe1> starting mon.0 rank 0 at mon_data /tmp/mon0 fsid 7e0b9410-7e6a-4390-9075-b5b2cc6568ea
[9:43] <malachhe1> accepter.bind unable to bind to Cannot assign requested address
[9:43] <malachhe1> failed: 'ssh itruc-1 /usr/bin/ceph-mon -i 0 -c /tmp/ceph.conf.23700 '
[9:44] <malachhe1> someone has an idea ?
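"Cannot assign requested address" is the Linux error text for EADDRNOTAVAIL, which bind() returns when the requested address is not configured on any local interface. One possible cause here (an assumption, since the log does not show the address) is that the monitor address in the config does not belong to the host the daemon is started on. A minimal sketch for checking whether a given address can be bound on a machine; `can_bind` is a hypothetical helper, not part of Ceph:

```python
import errno
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to host on this machine.

    Port 0 lets the OS pick a free port, so only the address is tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                # The address is not assigned to any local interface.
                return False
            raise  # Some other problem (permissions, bad hostname, ...)
        return True


if __name__ == "__main__":
    # Replace with the monitor address from your own config.
    print(can_bind("127.0.0.1"))
```

Running this on the monitor host with the configured monitor IP would show whether the address itself is the problem, as opposed to the port or the daemon.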
[14:03] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[18:38] * jojy (~jvarghese@ has joined #ceph
[19:40] <yehudasa> fred_: are you there?
[20:44] <dwm__> Hmm, does XFS have any ceph-significant limits on xattr size?
[20:46] <gregaf> dwm_: I don't think so…it might max out at 96k or something but I can't imagine Ceph reaching that
[20:50] <dwm__> Hmm, http://marc.info/?l=ceph-devel&m=131942130322957&w=2 indicates that Sage at least _used_ to believe that XFS had no restriction. :)
[20:50] <nhm> dwm__: http://ceph.newdream.net/wiki/Backend_filesystem_requirements
[20:52] <dwm__> nhm: Hmm, that sentence beginning 'XFS' seems to be missing a 'not' in there.
[20:52] <nhm> dwm__: agreed. :)
[20:53] <dwm__> Hmm, given we've been happily using XFS in production for some years now, I suspect that -- unless btrfs stabilizes substantially soon -- I might move our testing cluster over to that.
[20:53] <dwm__> (Though I think Oracle have committed to switching to btrfs for their next release? In which case, I wish Chris Mason luck..)
[20:54] <nhm> dwm__: it'd be interesting to hear how it goes for you if you go the XFS route.
[20:55] <dwm__> Given my current workload, I probably won't have time to poke it that hard for a short while..
