#ceph IRC Log


IRC Log for 2012-05-03

Timestamps are in GMT/BST.

[0:09] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:13] * aliguori (~anthony@c-68-44-125-131.hsd1.nj.comcast.net) Quit (Ping timeout: 480 seconds)
[0:16] * berwin22 (~joseph@ has joined #ceph
[0:16] * berwin22 (~joseph@ has left #ceph
[0:17] * berwin22 (~joseph@ has joined #ceph
[0:18] <sagewk> nhm, elder: the splash of reads at time ~90 in the 4m trace is during a sync(2)
[0:18] <yehuda_hm> sagewk: flusher?
[0:18] <sagewk> no flusher
[0:18] <sagewk> ceph-osd wasn't doing anything but sync(2) at that point
[0:20] <sagewk> the rest of the time, the workload is roughly this: http://fpaste.org/P1yG/
[0:20] <sagewk> (modulo some getattr/setattrs on inodes we're writing to)
[0:20] <yehuda_hm> hmm.. the problem is that the MB/s is whatever is going into the journal, whereas the visualization is what's going to the fs, right?
[0:20] <yehuda_hm> or have I missed it completely?
[0:20] <sagewk> the movie is all for the fs
[0:21] <sagewk> the journal is going way faster than all of this (it's a different raw disk we're writing to sequentially)
[0:21] <sagewk> the question is why xfs is so slow.. we should be getting something closer to the disk tput doing 4MB writes
[0:21] <sagewk> maybe the truncates need an inode update to truncate, and those are causing all the syncs?
[0:21] <nhm> yehuda_hm: the journal is a 10G partition on an SSD.
[0:21] <yehuda_hm> nhm: ok, yeah, missed that completely
[0:22] <nhm> yehuda_hm: /dev/sdi in the collectl output
[0:22] * berwin22 (~joseph@ has left #ceph
[0:26] <yehuda_hm> sagewk: how fast is disk tput doing 4MB writes?
[0:28] <sagewk> hrm, i only get 68 MB/sec
[0:28] <sagewk> let me try that a few times
[0:29] <nhm> sagewk: Are you doing something on the node right now?
[0:29] <sagewk> (time bash -c "dd .. ; sync")
[0:29] <sagewk> oh, yeah sorry :)
[0:29] <nhm> :D
[0:30] <sagewk> make that 129MB/sec when i'm not sharing :)
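The throughput check sagewk describes (`time bash -c "dd .. ; sync"`) can be sketched as a small script. The target path and sizes here are illustrative assumptions, not the exact command he ran; `conv=fsync` stands in for the trailing sync(2).

```shell
#!/bin/sh
# Hedged sketch of a raw sequential-write throughput check: time a large
# write plus a flush, then report MB/s.
measure_write_mbs() {
    target=$1
    size_mb=$2
    start=$(date +%s)
    # write size_mb megabytes and flush them to disk before dd exits
    dd if=/dev/zero of="$target" bs=1M count="$size_mb" conv=fsync 2>/dev/null
    end=$(date +%s)
    elapsed=$((end - start))
    [ "$elapsed" -gt 0 ] || elapsed=1    # avoid divide-by-zero on fast disks
    rm -f "$target"
    echo $((size_mb / elapsed))
}

measure_write_mbs /tmp/ddtest.bin 64    # prints approximate MB/s
```

Note the whole-second timer makes this coarse; it only illustrates why sharing the node (as above) skews the number.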
[0:30] <sagewk> nhm: all yours
[0:30] <nhm> sagewk: doing flusher tests now, will try some btrfs tests later.
[0:31] <nhm> sagewk: actually, once I finish this test with the flusher on, I'm going to go afk for a while and be back later, so in about 10 mins you can play on it if you want.
[0:33] <sagewk> k. then i suspect we should switch to btrfs. i think alex can tell us more given the operations that we're doing
[0:33] <sagewk> my guess is that truncate is the problem.. when xfs flushes its journal it probably has to update the inode, and that's scattered about. something like that.
[0:33] <nhm> sagewk: yeah, that was my thought too. See what happens with btrfs.
[0:33] <Tv_> hehe
[0:33] <Tv_> i'm just waiting for ext4 to come out on top for a moment again ;)
[0:33] <sagewk> :)
[0:33] <sagewk> could happen!
[0:34] <Tv_> maybe if we moved fully from xattr to leveldb, then it might have a chance
[0:34] <Tv_> the metadata ops will kill it pretty fast i think ;)
[0:34] <Tv_> but it's just funny to have the world asking us for recommendations on fs to use
[0:34] <Tv_> "they all suck"
[0:38] <dmick> ntfs
[0:38] <dmick> <ducks>
[0:39] <joao> ntfs is actually a pretty neat fs
[0:39] <nhm> I've actually been tempted to try ceph on zfs just to see what would happen.
[0:40] <Tv_> hey i was an OS/2 user, HPFS was a'ight
[0:41] <yehuda_hm> ebofs ftw
[0:46] <Tv_> sending a program into the 1541 to implement your own storage offload engine ftw
[0:47] <Tv_> uhhh /var/run/ceph/ceph.name.asok
[0:47] <Tv_> did someone forget a "$" ?
[0:47] <Tv_> OPTION(admin_socket, OPT_STR, "/var/run/ceph/$cluster.name.asok")
[0:48] <Tv_> hahaha
[0:48] <yehuda_hm> $cluster.$name
[0:48] <yehuda_hm> actually should be $cluster.$name.$id
[0:48] <Tv_> name = type.id
[0:49] <yehuda_hm> oh
[0:49] <yehuda_hm> then ignore me
[0:49] <Tv_> fixing..
[0:49] <dmick> and, asok, not asock?...
[0:49] <dmick> shades of creat()
[0:49] <yehuda_hm> yeah, asock is better
[0:49] <Tv_> that's not a bug, that's a... cockroach, or something
[0:50] <Tv_> i agree but don't feel strongly enough to worry about what might break
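The fix being discussed amounts to one missing character in the option's default; since `name` already expands to `type.id`, no separate `$id` component is needed. A sketch of the corrected line:

```
OPTION(admin_socket, OPT_STR, "/var/run/ceph/$cluster.$name.asok")
```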
[0:50] <yehuda_hm> sagewk: issue reproduced
[0:51] <dmick> ugh: m_asokc
[0:51] <dmick> thta mkaes my yees hrut
[0:51] <dmick> I'll shut up now
[0:52] <yehuda_hm> dmick: speaking of eyes hurting, I think I can finally see again
[0:52] <dmick> optometrist dilation?
[0:52] <yehuda_hm> yeah
[0:54] <sagewk> tv_: aie.. put that in stable branch, please :)
[0:55] <Tv_> sagewk: it went in master, want me to cherry-pick?
[0:55] <sagewk> i can do it.
[0:57] <dmick> which raises the question: "push to" default should be....?
[0:57] <Tv_> chef status update: limitations remaining: 1) single mon 2) journal inside osd
[0:57] <Tv_> new stuff: script to prepare a mounted disk as osd disk, another to activate such a disk (allocate id etc, mkfs, start a daemon)
[0:57] <Tv_> not complete: udev etc automagic to mount the disks automatically etc
[0:57] <Tv_> dmick: not stable
[0:58] <dmick> joshd suggested next, but then I couldn't find it
[0:58] <sagewk> dmick: git "push to" default? that sound dangerous..
[0:58] <Tv_> dmick: well the way *others* use next, it is scrapped & redone all the time, so using it as a base would be bad
[0:59] <dmick> well, I mean
[0:59] <Tv_> but yes, if something is identified as a bugfix, it should probably be prepared on top of stable, not in master
[0:59] <dmick> when I push to 'the tree', which branch should I generally be using. Obviously there are special cases. Or should it *always* be a private branch such that someone else can merge when wanted?
[1:00] <Tv_> but we don't really do topic branches like that
[1:00] <Tv_> dmick: only single value answer possible is "master"
[1:00] <Tv_> dmick: every other answer is more complex
[1:00] <Tv_> dmick: "private" branch is a good idea when you're unsure, etc
[1:00] <sagewk> dmick: normally topic branch, so it can be reviewed, unless it's trivial. bugfixes against stable (or next, depending on how old the bug is), new stuff against master.
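The rule of thumb sagewk gives (bugfixes based on stable, new work based on master) can be sketched with plain git. This builds a throwaway repo purely for illustration; the `wip-*` branch names are hypothetical.

```shell
#!/bin/sh
# Illustrative only: set up a toy repo with a stable branch, then base a
# bugfix topic on stable and a feature topic on master, as described above.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m init
main=$(git symbolic-ref --short HEAD)   # "master" on 2012-era git
git branch stable                       # stand-in for the release branch
git checkout -q -b wip-bugfix stable    # bugfix topic: starts from stable
git checkout -q -b wip-feature "$main"  # new feature: starts from master
git branch --list
```

Merging the bugfix topic into stable and then stable into master keeps the fix in both lines without cherry-picking.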
[1:01] <Tv_> dmick: and if this hadn't been a single-character change, i might even have used a branch
[1:02] <dmick> ah. I've been using my own repo for review
[1:02] <Tv_> dmick: we haven't done "the github way" much, largely because we picked up these habits before we used github
[1:02] <dmick> not like I know what that is either really :)
[1:02] <Tv_> forking on github, working in your own repo (perhaps in a branch there), sending pull requests etc
[1:03] <yehuda_hm> sagewk: I think what happens is that we hang up the connection due to inactivity (jenkins runs only every 30 minutes), but fail to reregister the watch
[1:03] <Tv_> and i think as long as Sage takes the hit of doing the merges, we really don't want topic branches for anything too trivial
[1:03] * mgalkiewicz (~mgalkiewi@ Quit (Quit: Leaving)
[1:03] <Tv_> and this is what the more technical release manager person would get to own
[1:04] <yehuda_hm> that's why I wasn't able to accelerate the issue by running the test consecutively .. the connection didn't drop down after 15 minutes
[1:04] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:05] <yehuda_hm> so for some reason we hang up the watching connection which is a big no, and we also fail to resend the watch request once reestablished
[1:06] <yehuda_hm> workaround for congress: update bucket information every 10 seconds
[1:10] <Tv_> sagewk: ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
[1:19] <yehuda_hm> sagewk: ok, reproduced locally: scratchtoolpp --ms-tcp-read-timeout=10 (and waiting more than 10 seconds before sending notification)
[1:22] <sagewk> yehuda_hm: nice
[1:25] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[1:27] <Tv_> what's /dev/sda called as opposed to /dev/sda1 ?
[1:27] <Tv_> they're both block devices but...
[1:28] <Qten> morning #ceph
[1:29] <sagewk> tv_: for the socket thing, the problem is you have 'admin socket = ...' in [global]. move it into [osd] or [mds] (or use the default) so that it doesn't affect the ceph tool.
[1:29] <Tv_> sagewk: oh so it's unsafe to set.. heh
[1:29] <sagewk> tv_: or, we can make the ceph tool explicitly _never_ have a socket. in general, though, it's useful even on client side stuff...e.g. rados bench
[1:30] <Tv_> i only set it to work around the bug in the first place ;)
[1:30] <sagewk> yeah
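As a config sketch, sagewk's suggestion means keeping the option out of `[global]` so client tools like `ceph` don't try to bind the same socket path (section contents here are illustrative):

```
[global]
    ; no 'admin socket' here, so the ceph CLI doesn't grab one

[osd]
    admin socket = /var/run/ceph/$cluster.$name.asok
```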
[1:31] <Tv_> sagewk: so the real culprit is, our debian packaging doesn't ensure /var/run/ceph exists
[1:32] <dmick> Tv_: called where?
[1:32] <Tv_> called?
[1:32] <dmick> (04:27:25 PM) Tv_: what's /dev/sda called as opposed to /dev/sda1 ?
[1:33] <Tv_> dmick: oh in general language.. what do i put in cli usage to imply i want the /dev/sda kind not /dev/sda1
[1:33] <dmick> by humans? I'd probably say "sda" is "whole device" and "sda1" is "partition device"
[1:34] <Tv_> hmm, i'll try that
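In code, the distinction dmick draws could be sketched with a tiny hypothetical helper; real tooling would consult sysfs or lsblk rather than pattern-match names.

```shell
#!/bin/sh
# Hypothetical helper: classify /dev/sdX-style paths. Only a sketch for the
# sdX naming scheme; NVMe, mmcblk, etc. follow different patterns.
classify_dev() {
    case "$1" in
        /dev/sd[a-z])        echo "whole device" ;;
        /dev/sd[a-z][0-9]*)  echo "partition device" ;;
        *)                   echo "unknown" ;;
    esac
}

classify_dev /dev/sda    # -> whole device
classify_dev /dev/sda1   # -> partition device
```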
[1:34] <sagewk> tv_: that's against debian policy :/
[1:34] <sagewk> the init script (or whatever) should create as needed.
[1:34] <Tv_> yeah i'm just reading that
[1:35] <Tv_> actually, i only see that for /run
[1:35] <Tv_> which is a tmpfs anyway
[1:35] <Tv_> oh now i see, in fhs
[1:35] <sagewk> same thing, they just switched that a couple months ago
[1:35] <Tv_> "must be cleared"
[1:36] <Tv_> yeah ok... i wonder where to slap the bugger
[1:36] <Tv_> i guess both mon and osd upstart jobs get it
[1:36] <sagewk> start scripts...?
[1:36] <sagewk> yeah
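Per the FHS/Debian policy point above, the start script would create the runtime directory on demand rather than shipping it in the package. A minimal sketch (mode is an assumption; the directory is parameterized so the logic is testable outside /var):

```shell
#!/bin/sh
# Sketch of the 'create /var/run/ceph as needed' step for an init/upstart job.
ensure_run_dir() {
    run_dir=${1:-/var/run/ceph}
    # install -d creates the directory (and any parents) only if missing
    install -d -m 0755 "$run_dir"
}

ensure_run_dir "${TMPDIR:-/tmp}/ceph-run-demo"
```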
[1:36] <Tv_> the problem with that is
[1:36] <sagewk> btw while we're talking about paths...
[1:37] <Tv_> does e.g. ceph-osd --mkfs try to use asok too?
[1:37] <sagewk> we ended with /var/lib/ceph/$type/$cluster-$id
[1:37] <sagewk> and /var/log/ceph/$cluster.$type.$id.log
[1:37] <sagewk> should we be more consistent with . vs -
[1:37] <Tv_> oh, hmm
[1:37] <sagewk> yeah
[1:38] <Tv_> log in $cluster-$type.$id.log won't make me lose sleep
[1:40] <sagewk> that's the spirit :)
[1:47] <sagewk> tv_: and ceph -n osd.123 -k $osd_data/keyring already works
[1:49] <Tv_> sagewk: sorry not sure what you mean by that
[1:49] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[1:49] <Tv_> sagewk: you mean crush set?
[1:50] <sagewk> yeah, no need to use the admin key there.
[1:50] <Tv_> ah nice; didn't actually try it
[1:56] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[2:05] * Tv_ (~tv@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:08] <gregaf> sagewk: finished reviewing that branch, couple comments on github
[2:17] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[2:19] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:21] <Qten> when using the XFS file system, are there recommended params for mounting and formatting?
[2:23] <gregaf> Qten: I don't think anything but the defaults for XFS (so far)
[2:23] <dmick> maybe noatime?
[2:29] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:29] <Qten> mmkay shall have a look
[2:48] <iggy> something with ACLs maybe
[2:49] <iggy> might check the list archives
[2:50] <iggy> err, not ACLs... xattrs
[3:02] * joao (~JL@89-181-154-158.net.novis.pt) Quit (Ping timeout: 480 seconds)
[3:39] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[3:51] * eightyeight (~atoponce@pthree.org) Quit (Ping timeout: 480 seconds)
[3:53] * eightyeight (~atoponce@pthree.org) has joined #ceph
[4:05] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:18] * eightyeight (~atoponce@pthree.org) Quit (Ping timeout: 480 seconds)
[4:20] * eightyeight (~atoponce@pthree.org) has joined #ceph
[5:16] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[5:26] * imjustmatthew (~imjustmat@pool-96-228-59-72.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[6:09] * The_Bishop (~bishop@cable-86-56-102-91.cust.telecolumbus.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[6:13] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[6:13] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has left #ceph
[6:28] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[6:30] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[6:43] * chutzpah (~chutz@ Quit (Quit: Leaving)
[6:52] * yoshi (~yoshi@p3167-ipngn3601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[7:02] * cattelan is now known as cattelan_away
[7:19] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) Quit (Quit: Leaving.)
[7:19] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[7:39] * Ryan_Lane (~Adium@c-98-210-205-93.hsd1.ca.comcast.net) has joined #ceph
[7:44] * renzhi (~renzhi@ has joined #ceph
[8:10] * The_Bishop (~bishop@cable-86-56-102-91.cust.telecolumbus.net) has joined #ceph
[8:13] * aa (~aa@r186-52-138-155.dialup.adsl.anteldata.net.uy) has joined #ceph
[8:16] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[8:37] * aa (~aa@r186-52-138-155.dialup.adsl.anteldata.net.uy) Quit (Ping timeout: 480 seconds)
[8:38] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[8:44] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[8:52] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) has joined #ceph
[8:53] * adjohn (~adjohn@70-36-139-109.dsl.dynamic.sonic.net) Quit ()
[8:57] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[8:58] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:09] * s[X]_ (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[9:10] * Theuni (~Theuni@ has joined #ceph
[9:10] * Oliver1 (~oliver1@ip-176-198-97-69.unitymediagroup.de) has joined #ceph
[9:20] * Oliver1 (~oliver1@ip-176-198-97-69.unitymediagroup.de) Quit (Quit: Leaving.)
[9:52] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[10:10] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) has joined #ceph
[10:33] * ceph-test (~Runner@mail.lexinter-sa.COM) has joined #ceph
[10:51] * joao (~JL@89-181-148-121.net.novis.pt) has joined #ceph
[10:56] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) Quit (Remote host closed the connection)
[11:03] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) has joined #ceph
[11:06] * The_Bishop (~bishop@cable-86-56-102-91.cust.telecolumbus.net) Quit (Ping timeout: 480 seconds)
[11:14] * The_Bishop (~bishop@cable-86-56-102-91.cust.telecolumbus.net) has joined #ceph
[11:39] <Dieter_b1> cool. new website.. btw newdream is now inktank, or what?
[11:41] <Theuni> oh yeah, nice
[11:41] <Theuni> and documentation!
[11:41] * Theuni is happy
[11:45] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) Quit (Remote host closed the connection)
[11:48] <Dieter_b1> huh. i just read this on the mailing list:
[11:48] <Dieter_b1> RADOS objects are stored on the OSD as a whole file, so potentially a
[11:48] <Dieter_b1> single RADOS object could press an OSD over the full_ratio and stalling
[11:48] <Dieter_b1> the whole cluster.
[11:48] <Dieter_b1> ^^ i thought files were always being chunked irrespective of their size?
[12:13] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[12:16] * Oliver2 (~oliver1@p4FFFEEEB.dip.t-dialin.net) has joined #ceph
[12:16] * Oliver2 (~oliver1@p4FFFEEEB.dip.t-dialin.net) Quit ()
[12:30] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) has joined #ceph
[12:36] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[12:52] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) Quit (Remote host closed the connection)
[12:58] * renzhi (~renzhi@ Quit (Quit: Leaving)
[13:09] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) has joined #ceph
[13:20] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[13:28] <Qten> Hi All, if ceph's journal disk were to fail, do I lose any data and/or does the current write get lost?
[13:30] <joao> depends if the last operation journaled got to be written to an osd
[13:31] <joao> I also think it depends on the journaling mode
[13:31] * pmjdebruijn (~pascal@overlord.pcode.nl) has left #ceph
[13:32] <joao> but I'm not sure about that
[13:32] <joao> would have to check it
[13:32] <joao> just a minute
[13:35] <Qten> no probs
[13:36] * yoshi (~yoshi@p3167-ipngn3601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:36] <Qten> i guess it would make sense to use some kind of raid 1 for the journal disks or something
[13:37] <Qten> any ideas why btrfs doesn't need one? is it something to do with zfs-like arc caches?
[13:38] <joao> when on btrfs, snapshots are taken instead of writing to a journal
[13:39] <NaioN> Qten: I only use refatime
[13:39] <NaioN> Qten: for XFS :)
[13:40] <NaioN> sorry relatime :)
[13:41] <Qten> hmm
[13:41] <NaioN> Qten: you don't need anything for xattrs for XFS
[13:41] <NaioN> but I formatted the XFS fs'es with -i 1024 in order to have 1k inodes
[13:42] <Qten> what i'm trying to work out is why my performance is so bad, large dd's (RAMx2) to the disks can get over 120mb/s but somehow i'm hardly getting 40mb/s
[13:42] <NaioN> that way a lot of xattrs will fit into the inode
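NaioN's format invocation would look roughly like this (device path hypothetical; `-i size=1024` is the mkfs.xfs spelling of the "-i 1024" shorthand above, enlarging inodes so xattrs can stay inline):

```
mkfs.xfs -i size=1024 /dev/sdX
```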
[13:42] <NaioN> with what kind of journal
[13:42] <NaioN> if i'm correct ceph uses writeback for xfs
[13:43] <NaioN> so with a lot of disks behind a journal, the journal could also be a bottleneck
[13:44] <Qten> osd.1 80 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 34.595994 sec at 30309 KB/sec
[13:44] <Qten> osd.2 43 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 33.980284 sec at 30858 KB/sec
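The lines above come from the OSD's built-in bench (1024 MB in 4096 KB blocks matches its defaults). In ceph releases of this vintage it was invoked roughly like this; treat the exact syntax as an assumption, since it varied across versions:

```
ceph osd tell 1 bench
```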
[13:44] <NaioN> I don't know what sort of workload the benchmark does
[13:45] <Qten> and these 2 servers with a normal dd get over 100mb/s
[13:45] <Qten> they pretty much are what i'm getting if i do a dd on a rbd volume
[13:45] <Qten> xfs noatime,nodiratime,nobarrier,logbufs=8
[13:46] <Qten> are my fstab mount options
[13:47] <Qten> what's the relatime option do?
[13:48] <NaioN> 2012-05-03 13:47:59.919742 osd.0 315 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 8.258861 sec at 123 MB/sec
[13:48] <joao> Qten, hang around for a bit; I'm sure nhm should be popping up any time now and he's probably the guy you want to talk to when it comes to performance
[13:48] <Qten> no probs
[13:48] <NaioN> looks a lot better
[13:48] <Qten> is that a single disk NaioN or raid?
[13:48] <NaioN> relatime is something like noatime
[13:49] <NaioN> single disk
[13:49] <NaioN> i have a setup with an osd per disk
[13:49] <Qten> nice
[13:49] <Qten> atm i was trying to get a bit more speed so i software raided 2 disks up in the lab machines but didn't see all that much difference :)
[13:50] <NaioN> no I've tested a lot with mdraid
[13:50] <Qten> what kind of disk is it? sata sas?
[13:50] <NaioN> and I didn't get it stable and it wasn't that fast
[13:50] <NaioN> sata's
[13:50] <Qten> enterprise or consumer not that it should matter for "speed"
[13:50] <NaioN> hmmm i wrote a e-mail on the mailing list about sata and sas
[13:51] <Qten> i need to get on that :)
[13:51] <NaioN> well somebody asked if it's better to use sata or sas disks for journal
[13:51] <NaioN> but the speed depends more on other things
[13:52] <NaioN> like rpms, density, cache, etc
[13:52] <NaioN> sata/sas is just the type of connection (protocol)
[13:52] <NaioN> but i use "consumer" grade sata
[13:52] <NaioN> 2TB disks
[13:52] <Qten> suppose at the end of the day a spinning disk can only do 1 thing at a time
[13:53] <NaioN> indeed
[13:53] <Qten> i was looking at the hitachi 2tb disks look pretty nice
[13:53] <NaioN> have those
[13:54] <NaioN> no problems with them
[13:54] <NaioN> and I have seagate's
[13:54] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[13:54] <Qten> in prod or just a lab?
[13:54] <NaioN> the hitachi's in prod
[13:55] <NaioN> but the prod clusters runs for a couple of weeks now, so no long term measurement
[13:55] <Qten> yah specs look nice for a non-enterprise i was looking at doing the same
[13:55] <NaioN> don't look to long at specs
[13:55] <NaioN> more disks is better :)
[13:56] <Qten> mainly interested in the mtbf and "run times" 24x7 vs 8x5 etc
[13:56] <NaioN> I looked at the price/GB
[13:56] <Qten> hitachi are the only ones i found that say 24x7
[13:56] <NaioN> I just bought a couple extra
[13:56] <Qten> for sure thats my plan too
[13:56] <Qten> cant justify spending 700 on a 2tb enterprise sas
[13:57] <NaioN> indeed
[13:57] <NaioN> for those prices you can get an extreme fallout...
[13:57] <NaioN> without losing
[13:57] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[13:57] <Qten> heh
[13:58] <NaioN> I mean a lot of consumer sata disks for 1 enterprise sas
[13:58] <Qten> i cant really find much in the way of recommended ram per node tho what did you end up using?
[13:58] <Qten> yeah i agree 4/5 consumer vs 1 enterprise ummm
[13:59] <NaioN> at the moment I have 24GB per node with 24 disks per node
[13:59] <Qten> how many nodes/rep level did you end up with?
[13:59] <Qten> i was thinking 4 nodes 3 rep
[14:00] <NaioN> at the moment 3 nodes with rep 2
[14:00] <Qten> as a starting point
[14:01] <NaioN> it's for backup
[14:01] <Qten> ah
[14:01] <Qten> any idea on how much ram the mds's use?
[14:02] <NaioN> the osd use about 750MB (VIRT) and 200MB (RES) and 2GB (SHARED) per osd
[14:02] <NaioN> a lot
[14:02] <NaioN> but I don't use it
[14:02] <NaioN> only rbd's
[14:03] <Qten> do we have any ideas on what a lot is :)
[14:03] <NaioN> no sorry
[14:03] <Qten> no drama
[14:04] <Qten> oh well i'm off for the night have a good one guys, thanks for the info NaioN
[14:05] <NaioN> np
[14:10] * BManojlovic (~steki@ has joined #ceph
[14:11] <Dieter_b1> hey joao or NaioN can you have a look at my question? (just before Qten's question)
[14:21] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[15:31] <nhm> good morning all
[15:34] <Dieter_b1> hi
[15:41] * s[X]_ (~sX]@60-241-151-10.tpgi.com.au) Quit (Remote host closed the connection)
[15:52] * Theuni (~Theuni@ Quit (Ping timeout: 480 seconds)
[15:54] * josef (~seven@nat-pool-rdu.redhat.com) has joined #ceph
[15:54] <josef> i'm trying to create an rbd image and it's giving me an error
[15:54] <josef> 2012-05-03 09:57:26.568955 7fef6260f780 librbd: failed to assign a block name for image
[15:54] <josef> i'm using the command rbd create -size 100 testimg
[15:54] <josef> any ideas what i'm doing wrong?
[16:00] * Theuni (~Theuni@ has joined #ceph
[16:00] <yehuda_hm> josef: maybe you don't have the rbd pool?
[16:01] <yehuda_hm> (where the image is being created by default, iirc)
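If the default pool really were missing, the rados tool of this era could create it by hand; hedged sketch, since the exact subcommand varied across versions:

```
rados mkpool rbd
```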
[16:02] <josef> yehuda_hm: probably, i'm following instructions from somebody else so i can reproduce a btrfs problem
[16:02] <josef> http://fpaste.org/4wnk/
[16:02] <josef> those are the steps i've followed
[16:02] <josef> exactly
[16:03] * Theuni (~Theuni@ has left #ceph
[16:04] <nhm> yehuda_hm: you are up early. :)
[16:04] <yehuda_hm> nhm: I'm usually awake this time actually, but yeah
[16:06] <yehuda_hm> josef: rbd --debug-ms=1 might give you extra info
[16:07] <josef> http://fpaste.org/e3BG/
[16:07] <nhm> yehuda_hm: I figured everyone out at the HQ must party until 1AM every morning. ;)
[16:08] <yehuda_hm> nhm: so true, we tried to hide that from you, but you're on to us
[16:16] <nhm> josef: yehuda had to run, I can try to help though I can't guarantee I know what I'm doing. ;)
[16:17] <josef> nhm: the errors i'm getting are in that fpaste
[16:18] <josef> <-- btrfs developer, i know nothing about ceph
[16:19] <nhm> josef: maybe this? http://ceph.com/docs/master/dev/osd-class-path/
[16:21] <josef> osd class dir = /usr/lib64/rados-classes
[16:21] <josef> i have that
[16:22] <nhm> hrm, is that where cls_rbd.so is?
[16:22] <nhm> on your system?
[16:22] <josef> yeah
[16:23] <josef> [root@destiny btrfs-next]# ls /usr/lib64/rados-classes/
[16:23] <josef> libcls_rbd.so.1 libcls_rbd.so.1.0.0 libcls_rgw.so libcls_rgw.so.1 libcls_rgw.so.1.0.0
[16:23] <nhm> ok, just to cover bases, that's in all of the ceph.conf files on the osd nodes and the osds have been restarted with that line present?
[16:25] <josef> yup
[16:25] <josef> it was there originally
[16:25] * josef restarts ceph just to make sure
[16:25] <josef> same thing
[16:27] <nhm> huh, ok. I'm not really familiar with this problem. I found an ancient reference to running "cclass -a"
[16:28] <nhm> oh, apparently you might be able to do a "ceph class list" too.
[16:29] <josef> class distribution is no longer handled by the monitor
[16:29] <josef> bummer
[16:30] <nhm> hrm, you might be able to do "ceph class add -i"
[16:32] <josef> still no go
[16:33] <nhm> did you try something like "ceph class add -i /usr/lib64/rados-classes/libcls_rbd.so.1.0.0 rbd 1.3 x86-64"?
[16:33] <josef> no i just did ceph class add -i /usr/lib64/rados-classes/libcls_rbd.so.1.0.0
[16:33] <nhm> what did it do?
[16:33] <josef> printed this
[16:33] <josef> 2012-05-03 10:37:45.323712 7f1a87d44780 read 22880 bytes from /usr/lib64/rados-classes/libcls_rbd.so.1.0.0
[16:33] <josef> and then the other error
[16:34] <josef> class distribution is no longer handled by the monitor
[16:34] <nhm> huh, ok. I think we need yehuda or one of the other guys. Sorry. :(
[16:35] <josef> hah its cool
[16:36] <nhm> btw, what OS is this?
[16:37] <josef> linux?
[16:37] <nhm> josef: distro?
[16:37] <josef> fedora
[16:37] <josef> there are other distros? ;_
[16:37] <josef> err ;)
[16:40] <nhm> josef: I think our marketing people would probably kill me if I got into a distro flamewar on the day our website launched. ;)
[16:41] <nhm> josef: looks like this is maybe a common problem on fedora with ceph: http://berrange.com/posts/2011/10/12/setting-up-a-ceph-cluster-and-exporting-a-rbd-volume-to-a-kvm-guest/
[16:43] <nhm> redhat bugzilla entry: https://bugzilla.redhat.com/show_bug.cgi?id=745460
[16:43] <josef> nhm: yeah i'm on 0.45
[16:43] <josef> so it should be fine
[16:44] <nhm> josef: yeah, maybe it broke again for some reason.
[16:44] <nhm> josef: apparently this is the patch that fixed it the first time: https://github.com/ceph/ceph/commit/7e5dee907a8218647a88d1c7d3316cc277e1c44b
[16:46] <josef> nhm: right but if it was the same thing osd class dir should fix it
[16:46] <josef> unless that option is not working as well
[16:48] <nhm> josef: that's what I'm wondering...
[16:56] <dwm_> Hey chaps, congratulations on Inktank launching. :-)
[17:03] <elder> http://www.inktank.com/news-events/news/new-startup-inktank-delivers-the-future-of-storage-with-ceph/
[17:03] <nhm> dwm_: thanks! :)
[17:07] <josef> hrm rebooted and now its doing something
[17:07] <josef> weird
[17:07] <josef> how long does it take for the command to run?
[17:08] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:14] <josef> its doing this over and over again
[17:14] <josef> 2012-05-03 11:16:17.832881 7ff8fffff700 -- --> -- ping v1 -- ?+0 0x7ff8f0002b50 con 0xba63e0
[17:15] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:18] * nhm (~nh@ Quit (Ping timeout: 480 seconds)
[17:22] * nhm (~nh@ has joined #ceph
[17:24] * Oliver1 (~oliver1@p4FFFEEEB.dip.t-dialin.net) has joined #ceph
[17:27] <nhm> josef: sorry, got disconnected for a bit due to rain. What's going on?
[17:28] <josef> http://fpaste.org/OE92/
[17:28] <josef> that
[17:28] <jefferai> nhm: speaking of which -- congrats on the website launch! (not sure who else to ping... sagewk, elder at least)
[17:28] * cattelan_away is now known as cattelan
[17:29] <nhm> jefferai: Thanks!
[17:29] <nhm> jefferai: Sage definitely. :)
[17:29] <elder> nhm too
[17:29] <jefferai> yeah, nice little profile of him on the page
[17:29] <elder> We're all in this together.
[17:30] * Oliver1 (~oliver1@p4FFFEEEB.dip.t-dialin.net) Quit (Quit: Leaving.)
[17:30] <elder> Others may not yet be online.
[17:30] <nhm> Yeah, everyone should be on in an hour or so.
[17:34] * danieagle (~Daniel@ has joined #ceph
[17:39] <joao> finally, got all my affairs in order
[17:39] <joao> this country is so damn bureaucratic...
[17:40] <nhm> joao: yours or ours? ;)
[17:41] <joao> mine
[17:47] <filoo_absynth> hrm
[17:47] <filoo_absynth> i cannot figure out how to use the latest php-cgi exploit for remote code execution
[17:48] <filoo_absynth> i read that dreamhost was specifically hit by this (via the nullcon CTF contest)
[17:49] * Theuni (~Theuni@ has joined #ceph
[17:50] * Tv_ (~tv@aon.hq.newdream.net) has joined #ceph
[18:04] * BManojlovic (~steki@ has joined #ceph
[18:06] <sagewk> gregaf: uh oh, i wonder if we broke this? http://fpaste.org/OE92/
[18:13] * Oliver1 (~oliver1@ip-176-198-97-69.unitymediagroup.de) has joined #ceph
[18:17] * Theuni (~Theuni@ Quit (Remote host closed the connection)
[18:30] <josef> sagewk: so you're to blame!?
[18:31] <josef> i have to go work from the dealership so i won't have access to that box for the next couple of hours
[18:31] <sagewk> :/ we'll see... should have an answer before you get back
[18:32] <josef> k
[18:32] <josef> its 0.45 btw
[18:32] <sagewk> if we are, it should be an intermittent thing
[18:32] <sagewk> k
[18:33] <josef> ill be on irc still, just wont have access to the box
[18:38] <morpheus> sometimes rbd info and other commands hang forever, example: http://fpaste.org/xhzr/ any idea what's the reason?
[18:39] <morpheus> if i kill the running rbd command and restart it after ~10 seconds its working fine
[18:50] * steki-BLAH (~steki@bojanka.net) has joined #ceph
[18:51] * Ryan_Lane (~Adium@c-98-210-205-93.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:53] <Tv_> morpheus: perhaps one of your mons is down; a random one is chosen, and the timeout can look like a hang
[18:54] * joshd (3f6e330b@ircip4.mibbit.com) has joined #ceph
[18:55] <Tv_> sagewk: i hear you might be merging wip-doc-rebase-2
[18:55] <Tv_> sagewk: i'd like to do s/_/-/g on the urls, and have all clear from John
[18:55] * BManojlovic (~steki@ Quit (Ping timeout: 480 seconds)
[18:56] <Tv_> sagewk: either before or after your merge, just let me know so we don't race each other
[18:56] <morpheus> hum, all mons seem to run, i'll do some more testing
[18:56] <sagewk> tv_: i think john is doing that now
[18:57] * mkampe (~markk@aon.hq.newdream.net) Quit (Remote host closed the connection)
[18:57] <Tv_> sagewk: ok i'm his faster text manipulator right hand, in this case
[18:57] <Tv_> sagewk: hold on the merge
[18:59] * lofejndif (~lsqavnbok@82VAADK2N.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:59] <sagewk> tv_ coordinate with him?
[19:00] <Tv_> on it
[19:02] <morpheus> another thing which happend twice since updating to 0.46 is ceph complaining about a secret: cephx: verify_authorizer could not get service secret for service osd secret_id=325
[19:03] * mkampe (~markk@aon.hq.newdream.net) has joined #ceph
[19:15] <Tv_> sagewk: ok wip-doc-rebase-2 updated, back to you and/or John
[19:16] * lofejndif (~lsqavnbok@82VAADK2N.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[19:17] * chutzpah (~chutz@ has joined #ceph
[19:21] * lofejndif (~lsqavnbok@09GAAFIZ7.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:22] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[19:23] * Ryan_Lane (~Adium@ has joined #ceph
[19:24] <morpheus> Tv_: i did some more testing and the problem seems to be caused by a OSD http://fpaste.org/CNax/raw/
[19:26] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[19:30] <nhm> elder: if you have a chance, I'd like your opinion on what Sage was talking about yesterday regarding truncate needing to update the inode and causing all kinds of syncs...
[19:33] <Tv_> morpheus: i wonder if that's the "filesystem gets slow" issue we've seen before
[19:33] * lofejndif (~lsqavnbok@09GAAFIZ7.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[19:34] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[19:34] <elder> nhm, Let me go back to see what you're talking about.
[19:35] * lofejndif (~lsqavnbok@09GAAFI0U.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:35] <elder> Nope, I don't know what you're talking about.
[19:35] <morpheus> Tv_: the interesting thing is, there are ~90 VMs running on the cluster without problems, can't track this down
[19:36] <elder> Forward me the e-mail or something, maybe I didn't get it, or maybe I deleted it, nhm.
[19:37] * adjohn (~adjohn@ has joined #ceph
[19:37] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[19:38] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[19:39] <elder> nhm, going to grab some lunch. I'll check back in about 10 minutes. I also need to leave about 1:45 for a haircut.
[19:41] * The_Bishop (~bishop@cable-86-56-102-91.cust.telecolumbus.net) Quit (Ping timeout: 480 seconds)
[19:43] <nhm> elder: Was on IRC
[19:43] * fzylogic (~fzylogic@ has joined #ceph
[19:43] <nhm> elder: sorry, I left to make a sandwich right before you replied
[19:44] <joshd> morpheus: if you have the admin socket enabled on osd.8 (defaults to /var/run/ceph/$cluster.$name.asok) you can check where the request is stuck with 'ceph --admin-daemon /path/to/socket dump_ops_in_flight'
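joshd's `dump_ops_in_flight` suggestion can be scripted. The sketch below is an assumption-laden illustration, not part of the log: the `ceph --admin-daemon ... dump_ops_in_flight` command is real, but the exact JSON layout (a `num_ops` count plus an `ops` list with per-op `age` fields) is inferred from morpheus's "num_ops = 0" reply and may differ between Ceph versions.

```python
import json
import subprocess


def dump_ops_in_flight(socket_path):
    """Run joshd's suggested admin-socket query and return parsed JSON.

    Requires a running daemon with the admin socket enabled
    (defaults to /var/run/ceph/$cluster.$name.asok).
    """
    out = subprocess.check_output(
        ["ceph", "--admin-daemon", socket_path, "dump_ops_in_flight"])
    return json.loads(out)


def stuck_ops(dump, min_age_secs=30.0):
    """Filter a dump for ops that have been in flight a long time.

    Assumes each op carries an 'age' field in seconds; that field name
    is a guess for illustration.
    """
    return [op for op in dump.get("ops", [])
            if op.get("age", 0.0) >= min_age_secs]
```

With a healthy OSD (as in morpheus's case), the dump reports zero ops and `stuck_ops` returns an empty list.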
[19:44] * fzylogic (~fzylogic@ Quit ()
[19:44] * fzylogic (~fzylogic@ has joined #ceph
[19:51] <morpheus> joshd: shows 'num_ops = 0'
[19:52] * The_Bishop (~bishop@cable-86-56-102-91.cust.telecolumbus.net) has joined #ceph
[19:52] * NaioN (~stefan@andor.naion.nl) Quit (Remote host closed the connection)
[19:57] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[19:57] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[20:24] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:40] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[20:41] * jlogan (~chatzilla@2600:c00:3010:1:f4d3:1617:caf4:ebd) has joined #ceph
[20:42] <sagewk> elder: http://fpaste.org/P1yG/
[20:43] <sagewk> elder: that's what ceph-osd is doing during that seeky movie
[20:43] <sagewk> vs ~127MB/sec for dd 4mb writes.
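sagewk's ~127 MB/s baseline comes from the `time bash -c "dd .. ; sync"` pattern he mentioned earlier in the day. A rough reproduction might look like the following; the target path and block count are placeholders, and real runs should use a much larger count on the filesystem under test.

```shell
# Sequential 4 MB writes followed by a sync, timed as one unit,
# mirroring sagewk's raw-throughput comparison for the seeky workload.
TARGET=${TARGET:-/tmp/ceph-dd-test.bin}   # placeholder path
COUNT=${COUNT:-8}                         # 8 x 4 MB = 32 MB; raise for a realistic run

time bash -c "dd if=/dev/zero of=$TARGET bs=4M count=$COUNT conv=fdatasync 2>/dev/null; sync"

rm -f "$TARGET"
```

Dividing bytes written by the reported elapsed time gives the MB/s figure to compare against what the OSD workload actually achieves.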
[20:44] <sagewk> my theory was that truncate +rewrite on the small files is the problem.. in order to retire those journal entries it has to update the inodes, which are scattered about..?
[20:44] <sagewk> i guess not truncate specifically, but the small file updates...
[20:49] <elder> I have to leave for a haircut. I'll be back in an hour or two... Thanks for posting that, sagewk. I'll take a closer look when I return.
[20:49] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[20:55] <Oliver1> joshd: took a look at personal note?
[20:56] * lofejndif (~lsqavnbok@09GAAFI0U.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[21:03] * adjohn (~adjohn@ Quit (Quit: adjohn)
[21:09] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[21:10] * joshd (3f6e330b@ircip4.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[21:11] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[21:13] * NaioN (~stefan@andor.naion.nl) has joined #ceph
[21:14] * Ryan_Lane (~Adium@ has joined #ceph
[21:32] <sagewk> oliver1: saw it.
[21:32] <sagewk> oliver1: he's been hammering on it for days and iirc still hasn't been able to reproduce the problem
[21:34] <Oliver1> sage: I have no problem providing my setup ;)
[21:41] * joshd (3f6e330b@ircip1.mibbit.com) has joined #ceph
[21:47] <sagewk> oliver1: that may be the next step..
[21:50] <nhm> sagewk: first test on btrfs looks like it may have similar issues as with xfs. Going to run a couple more tests then start generating movies and post results.
[21:50] <nhm> sagewk: with flusher on at least.
[21:50] <nhm> then I'll turn it off.
[21:52] <sagewk> nhm: btw i was also watching something like this: watch -n .2 'ceph --admin-daemon /var/run/ceph/ceph.name.asok perfcounters_dump | json_xs | grep -A 30 throttle' to see what was going on
[21:53] <nhm> sagewk: sadly I just screwed up and overwrote some old results, so I need to rerun the last test. ;(
[21:54] <nhm> sagewk: nice, I've got a little script that dumps the perfcounters at the start of each second with the second included in the json.
[21:54] <nhm> sagewk: if we want to start collecting that too.
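nhm's collection script (dump the perfcounters each second, with the second folded into the JSON) could be sketched as below. This is a guess at its shape, not the actual script: the socket path is a placeholder for `$cluster.$name.asok`, and `fetch` is injectable only so the structure can be exercised without a live daemon.

```python
import json
import subprocess
import time

SOCKET = "/var/run/ceph/ceph.osd.0.asok"  # placeholder $cluster.$name.asok


def timestamped_dump(now=None, fetch=None):
    """Fetch perfcounters_dump and fold the wall-clock second into the JSON,
    in the spirit of the little script nhm describes."""
    if fetch is None:
        def fetch():
            out = subprocess.check_output(
                ["ceph", "--admin-daemon", SOCKET, "perfcounters_dump"])
            return json.loads(out)
    sample = {"time": int(now if now is not None else time.time())}
    sample.update(fetch())
    return sample


# Collection loop (requires a live daemon; emits one JSON line per second):
#   while True:
#       print(json.dumps(timestamped_dump()))
#       time.sleep(1)
```

Lining these timestamped samples up with the collectl output and the seekwatcher movies would let the throttle counters be correlated with the seek storms.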
[22:06] * Oliver1 (~oliver1@ip-176-198-97-69.unitymediagroup.de) Quit (Quit: Leaving.)
[22:07] * lofejndif (~lsqavnbok@19NAAIJ8M.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:10] * NaioN (~stefan@andor.naion.nl) Quit (Remote host closed the connection)
[22:10] * NaioN (~stefan@andor.naion.nl) has joined #ceph
[22:22] * Theuni (~Theuni@dslb-088-066-111-066.pools.arcor-ip.net) has joined #ceph
[22:23] * adjohn (~adjohn@ has joined #ceph
[22:37] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:47] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[23:05] * Theuni (~Theuni@dslb-088-066-111-066.pools.arcor-ip.net) Quit (Quit: Leaving.)
[23:17] * jlogan (~chatzilla@2600:c00:3010:1:f4d3:1617:caf4:ebd) Quit (Quit: ChatZilla [Firefox 12.0/20120417165043])
[23:31] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[23:32] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[23:34] * andreask1 (~andreas@chello062178013131.5.11.vie.surfer.at) Quit ()
[23:35] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[23:40] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[23:40] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[23:52] * joshd (3f6e330b@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.