#ceph IRC Log


IRC Log for 2011-08-03

Timestamps are in GMT/BST.

[0:15] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[0:27] <gregaf> cmccabe, anybody else: it doesn't appear to be possible to set the op size for rados bench anymore?
[0:28] <cmccabe> gregaf: the block-size parameter to radostool?
[0:28] <gregaf> yeah
[0:29] <gregaf> gregf@kai:~/ceph/src$ ./rados -p data --block-size 1 bench 0 write -t 4
[0:29] <gregaf> unrecognized command --block-size
[0:29] <gregaf> gregf@kai:~/ceph/src$ ./rados -p data -b 1 bench 0 write -t 4
[0:29] <gregaf> unrecognized command -b
[0:29] <gregaf> etc
[0:29] <cmccabe> gregaf: the first parameter should be the action
[0:29] <cmccabe> bench in this case
[0:30] <gregaf> alternatively:
[0:30] <gregaf> gregf@kai:~/ceph/src$ ./rados -p data bench 0 write -t 4 -b 1
[0:30] <gregaf> Maintaining 4 concurrent writes of 4194304 bytes for at least 0 seconds.
[0:30] <gregaf> note the lack of size change
[0:30] <cmccabe> actually, I think I know what's going on here
[0:30] <gregaf> I'd hoped you would
[0:31] <cmccabe> that first argument parsing loop is supposed to search for block-size (and any other similar parameters)
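The fix cmccabe describes is a first pass that sweeps the argument list for known options wherever they appear, leaving the action and its positionals behind. A hypothetical sketch of that idea (not the actual radostool code; only `-b`/`--block-size` is handled here, and the 4 MiB default matches the output in the log above):

```python
def parse_rados_args(argv):
    """Sweep argv for known options first (wherever they appear),
    then treat what remains as the action and its positional args."""
    opts = {"block_size": 4 << 20}  # 4194304-byte default, as shown above
    positional = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-b", "--block-size"):
            opts["block_size"] = int(argv[i + 1])
            i += 2  # consume the option and its value
        else:
            positional.append(arg)  # e.g. "bench", "0", "write"
            i += 1
    return opts, positional
```

With this shape, `-b 1` is picked up whether it comes before or after `bench`, which is exactly the case that failed above.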
[0:37] <cmccabe> gregaf: fixed by 9dbeeaaf0eaeb3f213b3dc4a6d9421101845a7fc
[0:37] <gregaf> cool, thanks
[0:50] * _Shiva_ (shiva@whatcha.looking.at) Quit (Server closed connection)
[0:50] * _Shiva_ (shiva@whatcha.looking.at) has joined #ceph
[0:56] * stingray (~stingray@stingr.net) Quit (Server closed connection)
[0:57] * stingray (~stingray@stingr.net) has joined #ceph
[0:59] * damoxc (~damien@andromeda.digitalnetworks.co.uk) Quit (Server closed connection)
[1:00] * damoxc (~damien@94-23-154-182.kimsufi.com) has joined #ceph
[1:09] * peritus (~andreas@h-150-131.a163.priv.bahnhof.se) Quit (Server closed connection)
[1:09] * peritus (~andreas@h-150-131.a163.priv.bahnhof.se) has joined #ceph
[1:29] * Tv (~Tv|work@ip-64-111-111-107.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:04] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Ping timeout: 480 seconds)
[2:13] * huangjun (~root@ has joined #ceph
[2:19] * cmccabe (~cmccabe@ has left #ceph
[2:27] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[2:48] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:52] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[3:02] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[3:03] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[3:08] * wonko_be_ (bernard@november.openminds.be) has joined #ceph
[3:10] * wonko_be (bernard@november.openminds.be) Quit (Ping timeout: 480 seconds)
[3:11] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Ping timeout: 480 seconds)
[3:13] <lxo> gregaf, in my case, all OSDs carry all PGs (3 OSDs, all PGs 3-plicated), so would that have worked? (i.e., would the xattrs or the contents of the meta/ directory make for trouble?)
[3:27] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:28] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[3:50] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[3:54] * kemo (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) Quit ()
[5:08] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[5:59] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[7:31] * lxo (~aoliva@1GLAAC0A0.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[7:31] * lxo (~aoliva@09GAAFVPI.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:38] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[10:38] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[10:54] * huangjun (~root@ Quit (Quit: leaving)
[10:58] <lxo> yay, rsyncing a PG's _head from one osd to another works!
[11:22] * Juul (~Juul@ has joined #ceph
[11:23] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[11:58] * Juul (~Juul@ Quit (Remote host closed the connection)
[13:20] * yoshi (~yoshi@p10166-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:25] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Read error: Connection reset by peer)
[13:42] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[16:00] * lxo (~aoliva@09GAAFVPI.tor-irc.dnsbl.oftc.net) Quit (Read error: Connection reset by peer)
[16:01] * lxo (~aoliva@ has joined #ceph
[18:10] * darkfader (~floh@ Quit (Remote host closed the connection)
[18:10] * darkfader (~floh@ has joined #ceph
[18:13] * DLange (~DLange@dlange.user.oftc.net) Quit (Remote host closed the connection)
[18:14] * monrad-51468 (~mmk@domitian.tdx.dk) Quit (Quit: bla)
[18:37] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) has joined #ceph
[18:39] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[18:46] <sagewk> bchrisman: any news on http://tracker.newdream.net/issues/1194?
[18:48] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:59] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[19:15] * slang (~slang@chml01.drwholdings.com) has joined #ceph
[19:16] <slang> hi all
[19:16] <slang> I'm trying to test out ceph in a 5 node setup, with (around) 8 osds per node
[19:17] <slang> I hadn't had any problems with this setup in the past, but on some hardware with 10ge nics, when I first start the ceph servers, I see quite a few heartbeats getting missed, and then a bunch of osds getting marked down
[19:18] <slang> has anyone else seen anything like this?
[19:19] <slang> I've enabled debugging on the osds (set to 30). logs and config are here: http://dl.dropbox.com/u/18702194/ceph-files-t1.tbz2
[19:21] <slang> (this is with the current stable branch)
[19:22] <slang> eventually things settle down, and we don't see missed heartbeats anymore
[19:22] <slang> but the nodes remain down, and eventually get taken out
[19:22] <slang> s/nodes/osds/
[19:23] <cmccabe> slang: I have to head to a meeting, brb
[19:23] <slang> cmccabe: ok no worries
[19:24] <slang> I also see a number of 'journal throttle: waiting for ops' messages initially
[19:25] <slang> even with osd_journal_queue_max_ops set to 2000
[19:36] <gregaf> slang: have you tested on 10GbE before?
[19:37] <gregaf> the defaults are all set based on our 1GbE experience so it may just be that there's too much stuff getting pumped through before the heartbeats and the receiving OSDs don't see the heartbeat messages until too late
[19:39] <gregaf> also I don't see osd_journal_queue_max_ops as being an option...
[19:39] <gregaf> but yehudasa should be able to help you more!
[19:45] <slang> gregaf: haven't tested on 10ge before, no
[19:46] <slang> OPTION(journal_queue_max_ops, OPT_INT, 500),
[19:47] <slang> that one
[19:48] <slang> looks like that's the limit that causes the 'waited for ops' message in the log
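The option slang is raising (named `journal_queue_max_ops` in the OPTION line quoted above, default 500) would be set in ceph.conf roughly like this; the `[osd]` section placement is an assumption:

```ini
[osd]
    ; default is 500; slang raised it to 2000, which delays but does
    ; not eliminate the "journal throttle: waiting for ops" stalls
    journal queue max ops = 2000
```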
[19:52] <slang> it seems that initially on startup an osd tries to queue up a bunch of transactions to the file store (based on the number of other osds in the pool), and we hit that config limit of 500 (or 2000 in my case), causing the thread to wait
[19:52] <slang> on some condition variable (I assume it gets signaled when the next op finishes)
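The throttle slang is describing (submitters block once the queue is full; each completed op signals a condition variable to wake one waiter) can be sketched like this. This is an illustrative model, not Ceph's actual journal code; the class and method names are made up:

```python
import threading

class OpThrottle:
    """Block submitters once max_ops are in flight; each finished op
    notifies the condition variable so one blocked submitter wakes."""

    def __init__(self, max_ops):
        self.max_ops = max_ops
        self.in_flight = 0
        self.cond = threading.Condition()

    def start_op(self):
        with self.cond:
            while self.in_flight >= self.max_ops:
                self.cond.wait()   # queue full: wait for a completion
            self.in_flight += 1

    def finish_op(self):
        with self.cond:
            self.in_flight -= 1
            self.cond.notify()     # wake one waiting submitter
```

On startup, queuing a burst of transactions larger than `max_ops` makes every extra submitter sit in `cond.wait()`, which matches the stalls described above.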
[19:54] * kemo (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) has joined #ceph
[19:57] <kemo> Hmm...anyone know what's the max RAM the CMDS can utilize?
[19:57] <yehudasa> kemo: it's not completely bounded currently
[20:00] <kemo> Nice
[20:00] <kemo> Trying to design my machines and keep running into random variables..
[20:01] <kemo> Wondering if I should do it big or bigger on RAM, haha
[20:03] <slang> maybe the journal throttling is causing the heartbeat messages to not be sent?
[20:06] <yehudasa> slang: osds actually went down
[20:06] <yehudasa> do you have core files for them?
[20:07] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[20:08] <slang> yehudasa: I killed them eventually
[20:08] <slang> yehudasa: don't think they went down though
[20:08] <yehudasa> oh, I see
[20:09] <slang> yehudasa: I was just trying to capture the init phase where osds get marked down (without actually being down), so I killed all the osds once I saw that behavior, to avoid getting huge logs
[20:12] <yehudasa> slang: debug ms = 1 could help
[20:13] <slang> yehudasa: ok
[20:13] <slang> yehudasa: anything particular you're looking for?
[20:14] <yehudasa> slang: not yet
[20:14] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[20:15] <slang> yehudasa: what makes you suggest debug ms = 1?
[20:15] <yehudasa> slang: it dumps the messages passed between the nodes
[20:16] <slang> i see
[20:19] <yehudasa> slang: there are some issues I see in your logs, sjust will probably have a better idea
[20:21] <sjust> slang: I'm grabbing your logs now
[20:23] <slang> sjust: cool
[20:23] <slang> fyi the resulting osdmap is in there too
[20:34] <sjust> slang: seems to be caused by heart_beat thread unable to get map_lock
[20:37] <slang> ok
[20:37] <slang> sjust: due to race or?
[20:39] <sjust> slang: map_lock may be monopolized by map activation, not sure though, still looking (may also be something else, this should not prevent the heartbeat peers set from being populated anyway)
[20:51] <slang> http://dl.dropbox.com/u/18702194/ceph-files-t2.tbz2
[20:52] <slang> that's another run with debug ms = 1, debug osd = 30, debug filestore = 20, debug journal = 20
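For reference, the debug levels slang lists for this run correspond to a ceph.conf fragment along these lines; putting them in `[osd]` is an assumption:

```ini
[osd]
    debug ms = 1
    debug osd = 30
    debug filestore = 20
    debug journal = 20
```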
[20:52] <slang> looks like only 2 osds were marked down in this run
[20:53] <slang> I ran ceph -w right after starting, the output is included in ceph-w.out
[20:53] <slang> might help get a summarized form of what happened
[21:04] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[21:13] * phil__ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[21:15] * MarkN (~nathan@ Quit (Read error: Operation timed out)
[21:16] * MarkN (~nathan@ has joined #ceph
[21:17] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[21:52] <wido> Hi
[21:52] <wido> just wanted to give an update about the crashes I'm seeing
[21:52] <wido> Last week two disks failed (don't buy the WD green 2TB!!) and caused two OSDs to crash
[21:53] <wido> the cluster recovered from that, but not much later everything started to cascade and right now I'm down to 2 OSDs (from 40)
[21:53] <wido> I also saw my mon going OOM again
[21:53] <wido> problem is, everybody here is still on vacation so I don't have the time to take a close look
[22:09] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[22:49] <cmccabe> wido: that answers the question I had had about the WD Green drives
[22:50] <cmccabe> wido: basically they found that the green drives were constantly parking the heads, which just meant they had to be unparked a short time afterwards. So they reach their rated lifetime limit of head parkings in a few hundred days with a standard linux setup
[22:52] <cmccabe> wido: there is a firmware update that fixes it: http://support.wdc.com/product/download.asp?groupid=609&sid=113&lang=en
[22:53] <cmccabe> wido: I think you need this too: http://support.wdc.com/product/download.asp?groupid=609&sid=114&lang=en
[22:55] <wido> cmccabe: These two brought the failure rate to 12%
[22:55] <wido> I bought 40 disks, 5 failed...
[22:57] <cmccabe> wido: it's just that what WD is trying to do requires cooperation from the OS
[22:58] <cmccabe> wido: if the OS accesses the drive every few seconds, spinning down every 8 seconds is a real mistake
[22:58] <wido> yeah, indeed. Tnx for the firmware, I'll take a look!
[22:58] <wido> really going afk now, bed time here
[22:58] <wido> ttyl!
[22:58] <cmccabe> wido: there's something called laptop mode that you could enable, which might help. Of course it would never be useful in combination with ceph
[22:58] <cmccabe> wido: bye!
[23:14] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[23:18] * phil__ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[23:25] <gregaf> hmm, sepia22 is stuck and won't reboot (hung kernel mount, we think)... anybody have a good solution?
[23:25] <cmccabe> wasn't there some super-secret /proc entry or something
[23:25] <cmccabe> or are you already at the state where you can't ssh in
[23:27] <gregaf> yeah, stuck
[23:27] <gregaf> probably need to powercycle it but josh and I don't know how
[23:27] <cmccabe> isidore?

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.