#ceph IRC Log


IRC Log for 2011-08-04

Timestamps are in GMT/BST.

[0:04] <sagewk> gregaf: still stuck?
[0:04] <sagewk> use the powercycle script in ceph-qa-deploy.git
[0:09] <gregaf> we got it, but I'll put that in my toolbox, thanks!
[0:12] <slang> it seems that if you kill all the osds at once, the monitor never discovers that they have died because none of the osds are left to report on each other
[0:14] <slang> which makes me wonder if there is a degenerate case of certain network partitions between monitors and osds going undetected
[0:19] <gregaf> there's a (long) timeout after which the monitor will start actively querying OSDs and discover that they're down
[0:24] <slang> gregaf: ok
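
The exchange above describes two detection paths: OSDs report on each other, and after a long timeout the monitor queries OSDs directly. A toy sketch of that idea follows. It is not Ceph's implementation; the class, method names, and the timeout value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMonitor:
    """Toy model: failures arrive via peer reports, with a slow direct-probe fallback."""

    fallback_timeout: float  # hypothetical: seconds of silence before probing directly
    last_seen: dict = field(default_factory=dict)  # osd id -> last time known alive
    down: set = field(default_factory=set)

    def register(self, osd_id, now):
        self.last_seen[osd_id] = now

    def peer_report(self, reporter, failed, now):
        # A live OSD tells the monitor that one of its peers stopped responding.
        self.last_seen[reporter] = now
        self.down.add(failed)

    def tick(self, now, probe):
        # Fallback path: probe any OSD that has been silent longer than the timeout.
        for osd_id, seen in list(self.last_seen.items()):
            if osd_id in self.down or now - seen <= self.fallback_timeout:
                continue
            if probe(osd_id):
                self.last_seen[osd_id] = now
            else:
                self.down.add(osd_id)
```

If every OSD dies at once, no `peer_report` ever arrives, so in this model nothing is marked down until `tick` runs after `fallback_timeout` has elapsed.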
[1:15] <gregaf> sagewk: joshd: I need a good way to create hung kernel mounts for testing, suggestions?
[1:16] <sagewk> teuthology with interactive:, kill the daemons, control-c out of teuthology.
[1:49] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (Read error: Connection reset by peer)
[2:03] <cmccabe> just FYI
[2:03] <cmccabe> I got an assert from os/FileStore:3176 (basically asserting that BTRFS_IOC_SNAP_CREATE succeeds)
[2:04] <sagewk> cmccabe: what was the error code?
[2:04] <cmccabe> -1
[2:05] <cmccabe> errno was...
[2:05] <cmccabe> errno was not printed
[2:06] <sagewk> :( that's unfortunate
[2:06] <cmccabe> it looks like gdb can't find it either
[2:06] <sagewk> wanna fix that in master for next time?
[2:06] <cmccabe> ok
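
The point raised above is that the failed call returned -1 but errno was never logged, which left nothing to diagnose. A minimal sketch of the general habit, reporting the errno alongside the failure, is below. It is written in Python for brevity and is not the FileStore change itself (FileStore is C++); the helper names and the example path are hypothetical.

```python
import errno
import os


def describe_failure(call_name, exc):
    """Format a failed system call with its errno number, symbolic name, and message."""
    code = exc.errno
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{call_name} failed: errno {code} ({name}): {os.strerror(code)}"


def open_or_report(path):
    """Try to open path read-only; return None on success or a diagnostic string."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # The errno is captured at the point of failure, before anything else can clobber it.
        return describe_failure(f"open({path!r})", exc)
    os.close(fd)
    return None
```

In C or C++ the same idea means saving `errno` into a local variable immediately after the failing call and including it in the assert or log message.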
[2:07] <sagewk> anything in dmesg?
[2:07] <cmccabe> [5967969.391206] INFO: task cmon:31331 blocked for more than 120 seconds.
[2:08] <sagewk> you're using btrfs on metropolis?
[2:08] <cmccabe> it's on cmccabe-rgw
[2:08] <sagewk> ah ok
[2:09] <cmccabe> it's probably pretty far from head-of-line btrfs at this point
[2:11] <cmccabe> https://uebernet.dreamhost.com/index.cgi?tree=dev.paste&action=view&id=3044
[2:17] * phil__ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[2:19] * huangjun (~root@ has joined #ceph
[2:21] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[2:41] * greglap (~Adium@ has joined #ceph
[2:58] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:03] * phil__ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[3:11] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[3:21] * hutchint (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) has joined #ceph
[3:22] * Ceph (~Adium@aon.hq.newdream.net) has joined #ceph
[3:23] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:26] * Ceph (~Adium@aon.hq.newdream.net) has left #ceph
[3:28] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has left #ceph
[3:34] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) Quit (Quit: jojy)
[3:38] * greglap (~Adium@ Quit (Quit: Leaving.)
[3:52] * joshd (~jdurgin@108-89-24-20.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[4:20] * huangjun (~root@ Quit (Ping timeout: 480 seconds)
[4:25] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[4:25] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has left #ceph
[4:29] * huangjun (~root@ has joined #ceph
[4:49] * joshd (~jdurgin@108-89-24-20.lightspeed.irvnca.sbcglobal.net) Quit (Quit: Leaving.)
[4:57] <huangjun> hi,all
[4:57] <huangjun> if an object is marked LOST, we cannot recover it. so should the cosd process shut down?
[5:33] * hutchint (~hutchint@c-75-71-83-44.hsd1.co.comcast.net) Quit (Quit: Leaving)
[5:53] * DLange (~DLange@dlange.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:58] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[6:15] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:56] * kilburn (~kilburnsc@ Quit (Ping timeout: 480 seconds)
[8:27] * kemo- (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) has joined #ceph
[8:27] * kemo (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) Quit (Write error: connection closed)
[9:31] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Read error: Connection reset by peer)
[9:31] * huangjun (~root@ Quit (Ping timeout: 480 seconds)
[9:46] * huangjun (~root@ has joined #ceph
[11:46] * lxo (~aoliva@ Quit (Ping timeout: 480 seconds)
[11:48] * lxo (~aoliva@83TAACP02.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:28] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[15:36] * huangjun (~root@ Quit (Remote host closed the connection)
[15:56] * Nadir_Seen_Fire (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 481 seconds)
[16:33] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[17:02] * greglap (~Adium@static-72-67-79-74.lsanca.dsl-w.verizon.net) has joined #ceph
[17:03] * aliguori (~anthony@ has joined #ceph
[17:28] * greglap (~Adium@static-72-67-79-74.lsanca.dsl-w.verizon.net) Quit (Ping timeout: 480 seconds)
[17:29] * greglap (~Adium@ has joined #ceph
[17:32] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[17:37] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:43] * aliguori (~anthony@ has joined #ceph
[18:31] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:31] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Remote host closed the connection)
[18:40] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:41] * greglap (~Adium@ Quit (Quit: Leaving.)
[18:47] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:55] * cmccabe (~cmccabe@ has joined #ceph
[18:56] <u3q> Can't contact the database server: Lost connection to MySQL server at 'reading authorization packet', system error: 0 (mysql.ceph.newdream.net)
[18:56] <u3q> wiki sadness
[18:57] <yehudasa> u3q: looking into it
[18:57] <u3q> looks like it came back now too
[18:57] <u3q> woot
[19:17] <wido> cmccabe: The firmware you gave me is for the WD RAID edition disks, I'm using the WD Caviar Green (WD20EARS), the tools don't seem to apply for those
[19:17] <wido> another disk failed today, 8 failed disks on 40 disks now...
[19:18] <cmccabe> wido: yeah, I just got that from an article about the WD green disks in general
[19:18] <cmccabe> wido: you might have to really hunt for the particular firmware for you
[19:19] <cmccabe> wido: these things are always a pain-- you find that you have model 1a instead of model 1, etc
[19:19] * jim_ (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[19:20] <wido> searching didn't reveal anything for the WD20EARS :(
[19:29] <u3q> pretty sure they make WD green disks with the idea that some dude will copy his photos there once and never ever write to the disk again
[19:29] <u3q> we had tons of the WD20EARS fails and just stopped using them
[19:30] <cmccabe> u3q: yeah, they posted a very defensive press release with the updated firmware for the Green RAID edition disks
[19:30] <u3q> ah
[19:31] <cmccabe> u3q: it was like "everyone else is stupid for using the disks in the traditional way, our original firmware is fine, if only your OS was leet enough to handle it"
[19:32] <cmccabe> u3q: actually you could configure your OS to write to the disks less often-- there's something called laptop mode in Linux
[19:32] <u3q> ahh
[19:32] <cmccabe> u3q: but it still won't help if there's an application explicitly flushing data out to disk every so often, like a database-- or Ceph
[19:33] <u3q> i assume it also doesnt help that they try to spin down whenever they are idle so you are constantly respinning them in a server environment
[19:33] <u3q> ya
[19:33] <u3q> or any unix probably doing logs heh
[19:33] <cmccabe> u3q: yeah, it seems fairly useless for a server environment
[19:33] <cmccabe> u3q: even more useless for RAID
[19:34] <cmccabe> u3q: with any kind of RAID that does striping (I guess everything except RAID0) in general you can almost never spin down
[19:34] <cmccabe> u3q: as long as there is some guy accessing something somewhere, you will need all the disks spinning, because the data is probably striped across all of them
[19:35] <cmccabe> u3q: netapp has spent a lot of engineering resources trying to crack this problem, but not very successfully.
[19:35] <cmccabe> u3q: RAID = always spinning
[19:36] <cmccabe> u3q: eventually solid state disks will take over and all of this will be a historical footnote
[19:37] <u3q> i hope so except that most of the SSDs we stuck in desktops here are failing too
[19:37] <u3q> so i think thats a ways out yet
[19:37] <u3q> so fast though... so very fast
[19:48] <wido> Trying the WD20EARS was just a test, they are pretty cheap :)
[19:48] <wido> So I tried them for our Ceph env, not a really good choice :(
[19:50] <wido> I think I'll take Seagate for our production machines
[19:51] <wido> btw, have any of you encountered the "btrfs open_ctree failed" message? I didn't unmount my systems today, so I gave them a hard reboot
[19:51] <wido> I have about 10 OSD's out due to my btrfs being broken
[19:51] <wido> I tried the btrfsck and btrfs-select-super, but none fixed it :(
[20:10] * aliguori (~anthony@ Quit (Ping timeout: 480 seconds)
[20:11] <bchrisman> getting this error on creating an rbd device... vaguely familiar... like I just forgot something:
[20:11] <bchrisman> 2011-08-04 18:10:21.115209 7f7fed548720 librbd: failed to assign a block name for image
[20:11] <bchrisman> create error: Input/output error
[20:12] <joshd> bchrisman: check your 'osd class dir' in ceph.conf
[20:12] <joshd> bchrisman: that was the problem last time, iirc
[20:13] <bchrisman> ahh yeah.. thanks..
[20:13] <bchrisman> gotta put that in my stupid automation.. feh
[20:16] <bchrisman> put that error string in the ceph wiki rbd/debugging... hopefully a google search will pick it up quickly then.
[20:21] * aliguori (~anthony@ has joined #ceph
[20:40] <u3q> 11T 12M 11T 1% /ceph
[20:41] <u3q> so much storage space
[20:41] <u3q> now to download the entire internet to it.
[20:43] * jim_ (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Remote host closed the connection)
[21:04] * monrad-51468 (~mmk@domitian.tdx.dk) has joined #ceph
[21:14] * jim_ (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[21:16] * jim_ (~chatzilla@astound-69-42-16-6.ca.astound.net) Quit (Remote host closed the connection)
[21:40] <u3q> lol
[21:40] <u3q> maxing gige to ceph
[21:40] <u3q> need 10g now!
[22:04] <sagewk> wido: what is the probability that a starting osd will crash?
[22:04] <sagewk> wido: is it high enough that you can crank up the log on restart and at least one will crash the first time around?
[22:04] <sagewk> wido: specifically i want to see 'debug filestore = 20'
[22:05] <wido> sagewk: I tried that, but now that same OSD went down with a completely different backtrace
[22:05] <wido> in PG::read_state
[22:06] <sagewk> may be the same root cause.. looks like it's reading bad data back from disk
[22:06] <wido> Ok, just crank up the debug filestore?
[22:07] <wido> or also osd?
[22:07] <wido> no read errors from btrfs nor the disk itself btw
[22:07] <wido> osd is at 65% of the 2TB
[22:11] <wido> sagewk: I'm updating the issue as we speak
[22:11] <sagewk> ok
[22:11] <wido> could I ask for a favour? I'm tracking my time in redmine since we have a tax rebate when you work on Open Source projects as a company, but currently I always select "Development"
[22:12] <wido> could you add "Testing" or so?
[22:12] <sagewk> sure
[22:14] <wido> sagewk: issue updated
[22:18] <sagewk> wido: you're using btrfs?
[22:18] <gregaf> sagewk: joshd: you guys are making anything to do with teuthology pretty difficult... :(
[22:19] <sagewk> gregaf: ?
[22:19] <gregaf> 3 "locked_by": "colin",
[22:19] <gregaf> 1 "locked_by": "sage",
[22:19] <gregaf> 6 "locked_by": "sage@fatty",
[22:19] <gregaf> 22 "locked_by": "sage@metropolis",
[22:19] <gregaf> 5 "locked_by": "sam",
[22:19] <gregaf> 30 "locked_by": "scheduled_joshd@vit",
[22:19] <gregaf> 1 "locked_by": "stephon@flak",
[22:19] <gregaf> 1 "locked_by": "tv",
[22:19] <gregaf> 3 "locked_by": "tv@dreamer",
[22:19] <gregaf> 3 "locked_by": "yehuda",
[22:19] <gregaf> 21 "locked_by": null,
[22:19] <gregaf> 20 of those 21 free machines are down
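
The tally above counts machines per `locked_by` value and then notes how many unlocked machines are actually usable. A small sketch of producing the same kind of summary from a JSON listing follows. Only the `locked_by` key comes from the log; the `name` and `up` fields, the sample data, and the function names are assumptions made for illustration, not the lock service's real schema.

```python
import json
from collections import Counter

# Hypothetical sample in the shape suggested by the pasted output.
SAMPLE = """[
  {"name": "node01", "locked_by": "sage@metropolis", "up": true},
  {"name": "node02", "locked_by": "sage@metropolis", "up": true},
  {"name": "node03", "locked_by": null, "up": false},
  {"name": "node04", "locked_by": null, "up": true}
]"""


def tally_locks(machines):
    """Count machines per owner; unlocked machines are counted under None."""
    return Counter(m.get("locked_by") for m in machines)


def free_and_up(machines):
    """Names of machines that are both unlocked and reachable."""
    return [m["name"] for m in machines if m.get("locked_by") is None and m.get("up")]


if __name__ == "__main__":
    machines = json.loads(SAMPLE)
    for owner, count in tally_locks(machines).most_common():
        print(count, json.dumps(owner))
    print("usable:", free_and_up(machines))
```

Separating "unlocked" from "unlocked and up" is what exposes the situation gregaf describes, where most of the nominally free machines cannot be used.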
[22:19] <sagewk> i'll unlock some
[22:20] <gregaf> sjust: cmccabe: yehudasa: are you guys actually using those machines that are locked by your names, or are those leftovers from when we just started the locking service?
[22:20] <u3q> jb: sent
[22:20] <u3q> er ww
[22:20] <sjust> I am actively using nothing
[22:20] <sagewk> there is no @ it's probably old
[22:20] <yehudasa> gregaf: nope
[22:20] <gregaf> sagewk: yeah, that's my guess
[22:20] <sagewk> since teuth will fail the lock check unless you're explicitly setting your user to something else
[22:22] <gregaf> of course freeing them up won't do much good until joshd implements a cap on how many teuthology-suite will take over
[22:23] <gregaf> ...ungh, yep, it already ate them up
[22:23] <joshd> I'll have some more unlocked soon - the socket module isn't returning the errors it promised
[22:29] <wido> sagewk: Yes, using btrfs, kernel .39
[22:30] <sagewk> wido: can you look in snap_* and see if it's 0 bytes there too?
[22:30] * kemo- (~kemo@c-68-54-224-104.hsd1.tn.comcast.net) Quit (Ping timeout: 480 seconds)
[22:31] <wido> sagewk: Yes, in both snapshots
[22:31] <wido> find current/ -type f -size 0 -name 'pginfo*' shows a lot of pginfo files being empty
[22:32] <wido> 47 to be exact
[22:32] * jim (~chatzilla@astound-69-42-16-6.ca.astound.net) has joined #ceph
[22:32] <wido> and they are all empty in the snapshots
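
The `find` command quoted above looks for zero-byte `pginfo*` files under `current/`. A rough Python equivalent is sketched below for anyone who wants to run the same check across several directories (for example `current/` and each snapshot) from a script. The function name is hypothetical; the matching rules mirror the `find` invocation from the log.

```python
import os


def empty_pginfo_files(root):
    """Return sorted paths of regular, zero-byte files under root whose name starts with 'pginfo'."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith("pginfo"):
                continue
            path = os.path.join(dirpath, name)
            # Like `find -type f`, skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) == 0:
                hits.append(path)
    return sorted(hits)
```

Calling `len(empty_pginfo_files("current/"))` gives the count that wido reports by hand.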
[22:34] <sagewk> yeah, those files should never be 0 bytes. it's either a FileStore or btrfs bug. :/
[22:35] <sagewk> well, they could be 0 in current/ if cosd crashes while writing them, but that should always get thrown out when cosd next starts up.
[22:36] <wido> I'm doing a quick search if I can find more OSDs with the same issue
[22:41] * phil__ (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[22:44] <cmccabe> gregaf: not using that machine
[22:45] * phil_ (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[22:48] <wido> sagewk: I just updated the issue again. Going afk now
[22:48] <wido> ttyl!
[22:54] <sagewk> wido: ttyl!
[23:02] <sagewk> slang: around?
[23:02] <slang> sagewk: hey
[23:02] <slang> sagewk: just saw you created that branch
[23:02] <sagewk> just pushed a wip-heartbeats branch
[23:03] <slang> I'll try it out
[23:03] <sagewk> i don't have a large cluster handy to hammer it with, unfortunately. it looks well behaved on a small set of nodes though
[23:03] <sagewk> thanks!
[23:21] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[23:31] <bchrisman> quick question.. does rbd support unmap requests? I'm guessing it can't for the same reason ceph doesn't track st_blocks?
[23:34] <sagewk> bchrisman: it doesn't right now. it will eventually, though.. it's a pretty quick add
[23:34] <sagewk> it'll be an O(n) operation for the trivial implementation, though. to delete the objects for the given range.
[23:34] <sagewk> when we add the layering there will be some bitmaps in the rbd header to fix that (and make the COW more efficient)
[23:34] <bchrisman> ahh I see
[23:35] <bchrisman> it'll have to look through everything with the simple-implementation
[23:35] <sagewk> yeah
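
The trivial unmap described above deletes the objects backing the given byte range, so its cost grows with the number of objects the range spans. A minimal sketch of the range arithmetic follows. It assumes the image is striped over fixed-size objects and uses 4 MiB as an illustrative object size; the function name is hypothetical and no real librbd call is shown.

```python
def objects_fully_covered(offset, length, object_size=4 * 1024 * 1024):
    """Return the range of object indices that lie entirely inside [offset, offset + length).

    Only whole objects can simply be deleted; partially covered objects at either
    end of the range would need different handling and are excluded here.
    """
    if length <= 0:
        return range(0)
    first = -(-offset // object_size)          # round the start up to an object boundary
    end = (offset + length) // object_size     # round the end down (exclusive index)
    return range(first, max(first, end))


def trivial_unmap(offset, length, delete_object, object_size=4 * 1024 * 1024):
    """Issue one delete per candidate object: O(n) in the number of objects spanned."""
    count = 0
    for index in objects_fully_covered(offset, length, object_size):
        delete_object(index)  # hypothetical callback standing in for the real delete
        count += 1
    return count
```

Without a record of which objects actually exist, every candidate index has to be visited, which is the cost the bitmaps mentioned by sagewk are meant to avoid.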

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.