#ceph IRC Log


IRC Log for 2012-04-14

Timestamps are in GMT/BST.

[0:12] <sagewk> elder: around?
[0:40] * gregorg (~Greg@ Quit (Ping timeout: 480 seconds)
[0:47] * gregorg (~Greg@ has joined #ceph
[0:53] <elder> sagewk, I am here.
[0:53] <elder> Sorry my window was hidden.
[1:13] <elder> Tv_, how do I go about capturing the output of a command run on a remote, such that I can use the result in my teuthology script? Here's what I'm trying to do.
[1:13] <elder> I have a path, /dev/rbd/rbd/image, and it is a symlink to /dev/rbd1 (on the target client system).
[1:14] <elder> I know the path, but I need to have the remote interpret it for me and tell me the canonical path name (/dev/rbd1) for the symlink path I give it (/dev/rbd/rbd/image).
[1:14] <elder> I know I can do it with readlink -f /dev/rbd/rbd/image
[1:15] <elder> But how can I run that command and then use the result in place of the /dev/rbd/rbd/image path I have?
[1:16] <joshd> elder: check out the kernel task's need_to_install function - it reads the output of uname -r
[1:17] <elder> Cool!
[1:22] <elder> Do you know if ctx.cluster.run returns a value to indicate whether the command was successful?
[1:23] <joshd> it raises an exception if it exited non-zero
[1:24] <elder> Exceptions are beyond me at the moment. I'm going to assume it succeeds...
[1:25] <joshd> you might need to wait in a loop until it does (i.e. until udev creates the link) - the rbd task does this in dev_create
[1:27] <elder> I am just using dev_create so I should be fine.
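The pattern elder and joshd are working out here — run a command, capture its stdout, and reuse the result as a path — can be sketched locally with `subprocess` standing in for teuthology's remote execution. This is an illustrative analogue, not teuthology code; the directory and file names are invented for the demo:

```python
import os
import subprocess
import tempfile

def canonical_path(path):
    """Resolve a symlink to its canonical target by capturing the
    output of `readlink -f`, as discussed above."""
    result = subprocess.run(
        ["readlink", "-f", path],
        capture_output=True, text=True,
        check=True,  # like teuthology, treat a non-zero exit as an error
    )
    return result.stdout.strip()

# Demo: build a symlink (stand-in for /dev/rbd/rbd/image -> /dev/rbd1)
# and resolve it.
tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "rbd1")
open(target, "w").close()
link = os.path.join(tmp, "image")
os.symlink(target, link)
print(canonical_path(link))
```

In a real teuthology task the command runs on the remote, so the link must be resolved there rather than with a local `os.path.realpath` call, which is why the output-capture approach is needed.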
[1:28] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:28] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:30] <sagewk> sjust: see new wip-guard?
[2:11] * Tv_ (~tv@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:13] * Qten (Qten@ppp59-167-157-24.static.internode.on.net) has left #ceph
[2:13] <joao> 'night guys
[2:13] <joao> o/
[2:15] * joao (~JL@ Quit (Quit: Leaving)
[2:31] <elder> joshd, remote.run() has a wait parameter. If False, does that essentially specify background completion, True means wait for it?
[2:32] <joshd> yes
[2:33] <elder> Thanks.
[2:34] <dmick> :param wait: Whether to wait for process to exit. If False, returned ``r.exitstatus`` is a `gevent.event.AsyncResult`, and the actual status is available via ``.get()``.
[2:35] <elder> Where is that?
[2:35] <dmick> teuthology/orchestra/run.py
[2:35] <elder> OK.
[2:35] <elder> I can find things in cscope, but not when they have names like "run"
[2:36] <dmick> remote sets up run.run() as its _runner, and calls it from run()
[2:36] <dmick> yeah, I sorta dug from filenames
[2:36] <elder> I'm sure I've encountered all of those along the way.
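The `wait` semantics dmick quotes can be mimicked with `subprocess.Popen`. This is a hedged stand-in: teuthology's actual implementation returns a `gevent.event.AsyncResult` for the exit status, while here the `Popen` object itself plays that role and `run` is a made-up helper:

```python
import subprocess

def run(args, wait=True):
    """Stand-in for remote.run(wait=...): wait=True blocks until the
    command exits; wait=False returns immediately and the exit status
    is fetched later."""
    proc = subprocess.Popen(args)
    if wait:
        proc.wait()  # blocks; proc.returncode now holds the status
    return proc

fg = run(["true"])                       # foreground: finished on return
bg = run(["sleep", "0.1"], wait=False)   # background: still running here
status = bg.wait()                       # analogous to r.exitstatus.get()
print(fg.returncode, status)
```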
[2:49] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[3:41] <The_Bishop> i've a problem here. my setup is 1 mon, 2 mds and 4 osd.
[3:42] <The_Bishop> i copied some gigabytes into the cluster and now both of the mds die shortly after start.
[3:46] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:47] <The_Bishop> here is the logfile from mds.0: http://pastebin.com/GEyCVbAH
[3:48] <The_Bishop> what can i do now? the cluster is dead this way
[3:50] * lofejndif (~lsqavnbok@1RDAAAUKL.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[4:09] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[4:13] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:13] * Qten (Qten@ppp59-167-157-24.static.internode.on.net) has joined #ceph
[4:39] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[4:53] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[4:53] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:12] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[5:13] * f4m8_ (f4m8@kudu.in-berlin.de) Quit (Read error: Operation timed out)
[5:13] * f4m8_ (f4m8@kudu.in-berlin.de) has joined #ceph
[5:17] <The_Bishop> 2012-04-14 05:14:06.383989 b1af7b70 10 mds.0.log _replay_thread finish
[5:17] <The_Bishop> 2012-04-14 05:14:06.389330 b46f2b70 -1 mds.0.journaler(rw) _prezeroed got (6) No such device or address
[5:17] <The_Bishop> 2012-04-14 05:14:06.390275 b46f2b70 -1 mds.0.journaler(rw) handle_write_error (6) No such device or address
[5:17] <The_Bishop> 2012-04-14 05:14:06.390299 b46f2b70 -1 mds.0.log unhandled error (6) No such device or address, shutting down...
[5:17] <The_Bishop> 2012-04-14 05:14:06.390329 b46f2b70 1 mds.0.50 suicide. wanted down:dne, now up:replay
[5:19] * mint (~mint@ip70-191-88-25.sb.sd.cox.net) has joined #ceph
[5:23] * mint (~mint@ip70-191-88-25.sb.sd.cox.net) has left #ceph
[6:09] <The_Bishop> this strace line seems to be related to the case: strace-mds.log.7194: 0.000119 [00f2a416] mkdir("gmon/7194", 0755) = -1 ENOENT (No such file or directory)
[6:12] <The_Bishop> ceph-mds chdir's to /
[6:13] <The_Bishop> later on it does mkdir("gmon/$PID",0755), but this is a relative path
[6:14] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[6:14] <The_Bishop> when i mkdir /gmon , this error vanishes in the further logs, but the mds still dies
[6:17] <sage> The_Bishop: hmm, ENXIO is coming from the osd i think. you can run with '--debug-ms 1' to verify that
[6:17] <sage> it means the mds sent a request to the wrong osd. there is a bug lurking there, see #2022
[6:18] <sage> the gmon is old cruft to make gprof behave; it should probably be removed. but it's not related to your problem
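The_Bishop's gmon observation boils down to a relative path being resolved against the daemon's new working directory after it chdir's. A minimal reproduction (using a temp directory as a stand-in for `/` so it needs no root privileges; the pid in the path is taken from the strace line above):

```python
import os
import tempfile

root = tempfile.mkdtemp()  # stand-in for the daemon's chdir("/")
os.chdir(root)
try:
    os.mkdir("gmon/7194")  # relative path, like the strace shows
    created = True
except FileNotFoundError:  # ENOENT: the "gmon" parent does not exist
    created = False

os.mkdir("gmon")           # creating the parent lets the mkdir succeed...
os.mkdir("gmon/7194")      # ...which is why `mkdir /gmon` hid the error
print(created, os.path.isdir("gmon/7194"))
```

As sage notes, this gprof leftover is harmless noise; the ENXIO from the OSD is the actual failure.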
[6:19] <The_Bishop> this is the last line from additional output: 2012-04-14 06:18:14.231726 b4758b70 1 -- <== osd.0 16 ==== osd_op_reply(45 200.00000078 [delete] ondisk = -6 (No such device or address)) v4 ==== 111+0+0 (4084408190 0 0) 0x94555e0 con 0x941f2d0
[6:19] <The_Bishop> ok, how to resolve this?
[6:24] <The_Bishop> i dont find much similarities to my case
[6:24] <The_Bishop> all my MDSs die after init :(
[6:25] <The_Bishop> #2022 does not seem close
[7:35] * cattelan is now known as cattelan_away
[7:49] * gregaf1 (~Adium@aon.hq.newdream.net) has joined #ceph
[7:50] * sagewk (~sage@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[7:50] * sjust (~sam@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[7:50] * gregaf (~Adium@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[7:52] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[7:52] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[9:04] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:05] * loicd (~loic@magenta.dachary.org) Quit ()
[9:16] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:21] * loicd (~loic@magenta.dachary.org) Quit ()
[9:23] * loicd (~loic@magenta.dachary.org) has joined #ceph
[10:48] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[11:13] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (Read error: Operation timed out)
[11:14] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[12:23] * sagewk (~sage@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[12:25] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) Quit (Remote host closed the connection)
[12:26] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[12:27] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) has joined #ceph
[12:38] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Read error: Operation timed out)
[12:43] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) Quit (Read error: Operation timed out)
[12:43] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) has joined #ceph
[14:28] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has joined #ceph
[14:28] * stxShadow (~Jens@ip-78-94-239-132.unitymediagroup.de) has left #ceph
[14:31] * stxShadow1 (~Jens@jump.filoo.de) has joined #ceph
[14:51] * stxShadow1 (~Jens@jump.filoo.de) has left #ceph
[15:05] * joao (~JL@ has joined #ceph
[15:15] * tjikkun (~tjikkun@82-169-255-84.ip.telfort.nl) Quit (Ping timeout: 480 seconds)
[15:41] * deam (~deam@dhcp-077-249-088-048.chello.nl) has joined #ceph
[15:41] <deam> hi
[15:47] <deam> Is the statement that Ceph is not suitable for production systems still valid? And what does it mean? Does it mean that it will crash on a day to day basis or just that you need to take the proper precautions to prevent data loss?
[15:51] <The_Bishop> the statement still holds...
[15:51] <Kioob> data loss.
[15:52] <The_Bishop> play with it but keep a copy outside ceph
[15:52] <deam> Are there any numbers about the chance of data loss?
[15:53] <The_Bishop> what numbers? you mean the seconds until data loss?
[15:53] <deam> for example, I am trying to get a feeling about how serious the chance is
[15:55] <deam> Ceph is perfect for my project but that statement frightens me.
[15:55] <deam> and keeping/creating a back-up of petabytes of storage is a no go
[15:59] <The_Bishop> be frightened, i just did not lose data with ceph because i kept a copy outside
[16:01] <deam> hmm
[16:01] <deam> was it because you were experimenting or during regular runs?
[16:01] <The_Bishop> it's still far from stability imho. i think the code needs more time to get better
[16:02] <deam> bummer
[16:02] <The_Bishop> i set up the cluster, copy some gigabytes on it, restart some services, read some gig...
[16:02] <The_Bishop> this stuff
[16:03] <joao> I'm shaky on details, but I think there are a couple of guys around here that have been testing ceph for the long run for a while
[16:03] <The_Bishop> and right now the MDS die right after start
[16:03] <joao> but every now and then there are reports of lost data
[16:04] <deam> so how does one scale to petabytes then? You can hardly double all that data
[16:04] <The_Bishop> yes, this is done
[16:04] <deam> double 1PB of data?
[16:04] <The_Bishop> you can even set the replication level >2
[16:05] <The_Bishop> "scale" means only that the infrastructure can handle this amount
[16:06] <deam> I see
[16:06] <The_Bishop> if you want to lose data quickly then you can turn replication off and watch your data vanish on the first disk fault
[16:08] <deam> well that's obvious
[16:09] <deam> so with the rep level set to >2 it's more reliable?
[16:09] <The_Bishop> well, i don't think so
[16:10] <The_Bishop> but the data survives more dead disks
[16:12] <The_Bishop> the ceph system is the main problem for reliability so far, the replication works as i can see
[16:12] <deam> maybe for the start of the project I can go with a back-up
[16:12] <deam> but when grown to 1PB it's a serious issue, I can't even imagine restoring 1PB of data in a reasonable time frame
[16:13] <The_Bishop> yepp
[16:13] <The_Bishop> it is not ready for prime time
[16:14] <deam> any idea when it will be? any roadmap/plans for that?
[16:14] <The_Bishop> i'm newbie here too :) i play with it for two weeks now
[16:14] <deam> ah
[16:15] <joao> deam, efforts are being made in that regard
[16:16] <deam> are we talking years/months?
[16:16] <joao> the thing is, it's Saturday and the guys aren't at the office today.... maybe you can get more details next Monday if you stop by...
[16:16] <deam> ah sure
[16:16] <deam> no problem
[16:16] <joao> deam, I am in no position to speak of time frames
[16:16] <The_Bishop> joao: do you think i should issue a bug report about my short-lived MDSs?
[16:17] <joao> The_Bishop, to be on the safe side, I'd say go for it :) If it happens to be a duplicate I'm sure someone will notice it
[16:18] <deam> joao: I am also curious about their views on having it run as a production system
[16:18] <The_Bishop> i found no exact match in the bug tracker so far
[16:18] <joao> deam, what do you mean?
[16:19] <deam> joao: well what I asked before, I only got an opinion of The_Bishop but he does not seem to be part of the project
[16:19] <joao> The_Bishop, then I'd say it would be safe to file a bug :)
[16:20] <joao> deam, and I don't know a lot of details when it comes to running ceph in production
[16:21] <joao> but from what I perceived during WHD is that there is interest of using ceph in production systems, and there are some guys running tests with it on a serious level
[16:21] <joao> we are setting up internal long-term runs
[16:21] <Qten> if btrfs is the preferred fs for ceph as the FS is doing the checksumming/data integ, I assume this means we'll need to use it in a raid mode to be able to use the data integ features? or can it run in individual disk mode to maximize performance of the machines?
[16:22] <joao> but when it comes to timeframes and its reliability, I'm just in no position to give you an informed opinion
[16:22] <The_Bishop> right, i'm not part of the project, just looking and testing
[16:23] * Hugh (~hughmacdo@soho-94-143-249-50.sohonet.co.uk) Quit (Ping timeout: 480 seconds)
[16:23] <joao> Qten, afaik, btrfs is used as an object store, much like xfs
[16:23] <joao> just that, although ceph takes advantage of some of btrfs capabilities
[16:24] <The_Bishop> i've already tried ceph with ext4, xfs and btrfs
[16:24] <joao> damn, you guys make me realize the amount of knowledge I'm lacking
[16:24] <joao> I gotta step away from the code for a bit and start gathering some infos when it comes in handy for IRC talks :p
[16:26] <Qten> joao: as i understand ceph doesn't have any built in checksumming/data protection is that right? it uses btrfs for this level of protection? which would mean if you were using the file replicated on an object basis you wouldn't know which version of the file has the correct checksum?
[16:26] <The_Bishop> half of the important info is on ceph.newdream.net/docs, the other on /wiki
[16:26] <The_Bishop> i find it quite hard to get a grip at first
[16:30] <joao> Qten, I'm not sure, but I think the PGs keep checksums
[16:31] <joao> and I don't think ceph relies on btrfs checksums for that, but I may be wrong
[16:33] * lofejndif (~lsqavnbok@04ZAACMXI.tor-irc.dnsbl.oftc.net) has joined #ceph
[17:16] * joao (~JL@ Quit (Quit: Leaving)
[17:52] * lofejndif (~lsqavnbok@04ZAACMXI.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[17:57] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[19:19] * Kioob (~kioob@luuna.daevel.fr) Quit (Quit: Leaving.)
[19:58] * lofejndif (~lsqavnbok@09GAAEW0P.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:56] * lofejndif (~lsqavnbok@09GAAEW0P.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[21:02] * lofejndif (~lsqavnbok@1RDAAAVNN.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:23] * al_o (~alo@host162-131-dynamic.244-95-r.retail.telecomitalia.it) has joined #ceph
[22:05] * stxShadow (~jens@jump.filoo.de) has joined #ceph
[22:05] * stxShadow (~jens@jump.filoo.de) Quit ()
[23:23] <al_o> hi, I would like to setup a Ceph with btrfs. It will be for production use... honestly do you think i'm mad?
[23:34] <The_Bishop> from my experience so far, btrfs is more stable than ceph
[23:46] <The_Bishop> do keep backups
[23:50] <iggy> al_o: the ceph site still says not to use ceph in production yet

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.