#ceph IRC Log

IRC Log for 2011-06-02

Timestamps are in GMT/BST.

[0:53] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[0:56] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[0:56] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[0:59] * Juul_ (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) has joined #ceph
[1:00] * Juul_ (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) Quit ()
[1:10] <sagewk1> djlee__: large journal means a longer time period that can elapse between btrfs snapshots/syncs. they can be slow at times.
[1:11] <gregaf> doesn't really matter if they're on the same partition though
[1:11] <gregaf> right?
[1:15] * votz (~votz@dhcp0020.grt.resnet.group.upenn.edu) Quit (Quit: Leaving)
[1:34] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[1:38] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[1:44] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[1:54] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:05] * yoshi (~yoshi@p24092-ipngn1301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:12] <yoshi> joshd: Are you around?
[2:12] <joshd> yeah
[2:13] <yoshi> joshd: great. sage posted on tracker to dump some info. do you still need them?
[2:13] <joshd> yeah, it's still useful
[2:14] <joshd> after that, the next thing to check is whether the files are still on disk
[2:14] <yoshi> alright. 'ceph pg dump -o -' https://gist.github.com/290d7c94b9aed3d17205
[2:15] <joshd> specifically, they'd be in <osd_data_dir>/current/3.1_head
[2:16] <yoshi> 'ceph osd dump -o -' https://gist.github.com/9032cb7277dfcf02ede2
[2:17] <yoshi> 'ceph osd dump -o - 26' https://gist.github.com/4cd22aa4d58c565b9689
[2:19] <yoshi> The on disk data is still there. https://gist.github.com/5be93a65abf3b981c081
[2:20] <sjust> yoshi: taking a look
[2:20] <yoshi> sjust: thanks.
[2:25] <sjust> yoshi: could you get me the output of grep 'pg\[3.26(' <osdlog>?
[2:27] <yoshi> sjust: that specific pg doesn't show up in the log.
[2:27] <sjust> ah
[2:27] <sjust> yoshi: one sec
[2:28] <yoshi> sjust: maybe the log could be too verbose.
[2:28] <sjust> yoshi: no, the pg isn't getting initialized in the first place
[2:28] <sjust> yoshi: were there originally more osds?
[2:29] <yoshi> sjust: No. It has been single and is single still.
[2:29] <sjust> in the current/ directory of the osd store, is there a 3.26* directory?
[2:31] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Read error: Operation timed out)
[2:32] <yoshi> yes, there is. https://gist.github.com/eb09cf1bd7498761cae6
[2:33] <sjust> could you post the output of ls current/meta?
[2:35] <yoshi> sure. just a sec.
[2:35] <yoshi> sjust: https://gist.github.com/e68898a66d875d284548
[2:37] <joshd> can you grep the log for 'load_pgs skipping'?
[2:42] <yoshi> joshd: in osd log? load_pgs doesn't show up.
[2:43] <joshd> what about just load_pgs?
[2:43] <joshd> are your logs getting rotated?
[2:45] <joshd> if they are being rotated, restarting the osd will give us the relevant output
[2:47] <yoshi> joshd: right. hold on.
[2:49] <yoshi> 'grep -A 30 load_pgs' https://gist.github.com/575ddb54ba70a2110bf6
[2:50] * djlee_ (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[2:51] * Meths (rift@2.25.212.84) Quit (Read error: Operation timed out)
[2:51] <sjust> dumb question: /data/osd0 is where the osd data is, right?
[2:52] <yoshi> sjust: yep. I believe so...
[2:52] <sjust> yoshi: ok
[2:57] * djlee (~dlee064@des152.esc.auckland.ac.nz) Quit (Ping timeout: 480 seconds)
[2:58] <sjust> yoshi: it looks like it's failing to list the pg collections in /data/osd0 on startup
[3:00] <joshd> yoshi: unfortunately we have no logging in the relevant loop
[3:01] <yoshi> joshd: hmm. let me take a look at rotated logs.
[3:01] <joshd> yoshi: either opendir or readdir is failing
[3:02] <joshd> yoshi: I mean the osd doesn't have any print statements there at all
[3:03] <yoshi> joshd: I see.
[3:04] <sjust> yoshi: if we get you some packages with updated debugging, could you install them?
[3:04] <yoshi> sjust: sure. I was wondering whether upgrading is going to fix it.
[3:05] <yoshi> BTW, are there any tools to clean up an unstable state like this?
[3:05] <yoshi> I think we definitely need some kind of tools like that.
[3:05] <joshd> this particular part of the code hasn't changed, but other bugs have been fixed
[3:05] <yoshi> Great. Let me try.
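
The diagnosis above (a directory listing that fails silently because the loop has no logging) can be illustrated with a small Python sketch. It is not the OSD code: `list_pg_dirs`, the `pg_scan` logger name, and the `_head` suffix filter are assumptions made for illustration, and `os.scandir` only roughly stands in for the opendir/readdir pair mentioned in the channel. The point is that the open step and the read step are logged separately, so a failure in either one is visible.

```python
import errno
import logging
import os

log = logging.getLogger("pg_scan")  # hypothetical logger name


def list_pg_dirs(current_dir):
    """Return (ret, names): ret is 0 or a negative errno value.

    names holds the directory entries ending in '_head' (an assumed
    naming pattern, based on the '3.1_head' path quoted in the channel).
    """
    try:
        entries = os.scandir(current_dir)  # roughly the opendir step
    except OSError as e:
        log.error("cannot open %s: %s", current_dir, e)
        return -e.errno, []

    names = []
    with entries:
        try:
            for entry in entries:  # roughly the readdir step
                if entry.is_dir() and entry.name.endswith("_head"):
                    names.append(entry.name)
        except OSError as e:
            log.error("cannot read entries of %s: %s", current_dir, e)
            return -e.errno, sorted(names)
    return 0, sorted(names)
```

Calling `list_pg_dirs("/data/osd0/current")` on a healthy store would return `0` plus the matching names; a missing or unreadable directory produces a log line and a negative errno instead of an empty result with no explanation.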
[3:19] * Meths (rift@2.25.212.84) has joined #ceph
[3:34] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:03] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Read error: Connection reset by peer)
[5:36] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[5:44] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[5:49] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[7:24] * lalitb (~lalitb@182.72.233.185) has joined #ceph
[7:51] <lalitb> Guys, I am new to ceph. Trying to create a 2 node cluster setup of ceph. Getting errors during mk file system : http://dpaste.com/hold/549518/. Any idea what's going wrong?
[7:51] * lxo (~aoliva@186.214.52.246) Quit (Ping timeout: 480 seconds)
[7:59] * lxo (~aoliva@186.214.51.106) has joined #ceph
[8:07] * lx0 (~aoliva@186.214.49.122) has joined #ceph
[8:12] * lxo (~aoliva@186.214.51.106) Quit (Ping timeout: 480 seconds)
[9:00] <lalitb> I have also posted my problem to the mailing list. Any help would be appreciated.
[9:09] * lalitb (~lalitb@182.72.233.185) Quit (Quit: Leaving)
[9:14] * lalitb (~lalitb@182.72.233.185) has joined #ceph
[9:40] * allsystemsarego (~allsystem@188.27.167.240) has joined #ceph
[9:58] * jbd (~jbd@ks305592.kimsufi.com) Quit (Ping timeout: 480 seconds)
[10:43] * yoshi (~yoshi@p24092-ipngn1301marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:29] * yoshi (~yoshi@FL1-122-135-122-224.tky.mesh.ad.jp) has joined #ceph
[11:34] * Juul_ (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) has joined #ceph
[11:46] * lalitb (~lalitb@182.72.233.185) Quit (Quit: Leaving)
[11:56] * yoshi (~yoshi@FL1-122-135-122-224.tky.mesh.ad.jp) Quit (Ping timeout: 480 seconds)
[12:10] * failboat (~stingray@stingr.net) Quit (Quit: WeeChat 0.3.3)
[12:22] * Juul_ (~Juul@c-76-21-88-119.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[13:32] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[13:36] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[13:36] * mtk (~mtk@ool-182c8e6c.dyn.optonline.net) has joined #ceph
[14:16] * lalitb (~lalitb@182.72.233.185) has joined #ceph
[14:21] * lalitb (~lalitb@182.72.233.185) Quit ()
[14:27] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[14:52] * johnl (~johnl@johnl.ipq.co) Quit (Ping timeout: 480 seconds)
[15:03] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:03] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[15:07] * johnl (~johnl@johnl.ipq.co) has joined #ceph
[15:09] * DLange (~DLange@dlange.user.oftc.net) Quit (Quit: reboot)
[15:13] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[15:13] * djlee_ (~dlee064@des152.esc.auckland.ac.nz) Quit (Read error: Connection reset by peer)
[15:14] * djlee_ (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[15:24] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) Quit (Remote host closed the connection)
[16:28] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[16:39] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[17:02] * greglap (~Adium@mobile-198-228-210-153.mycingular.net) has joined #ceph
[17:09] * djlee_ (~dlee064@des152.esc.auckland.ac.nz) Quit (Quit: Ex-Chat)
[17:15] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (synthon.oftc.net graviton.oftc.net)
[17:15] * lidongyang_ (~lidongyan@222.126.194.154) Quit (synthon.oftc.net graviton.oftc.net)
[17:15] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (synthon.oftc.net graviton.oftc.net)
[17:15] * todin (tuxadero@kudu.in-berlin.de) Quit (synthon.oftc.net graviton.oftc.net)
[17:16] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[17:16] * lidongyang_ (~lidongyan@222.126.194.154) has joined #ceph
[17:16] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:16] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[17:29] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[17:43] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:46] * djlee (~dlee064@des152.esc.auckland.ac.nz) has joined #ceph
[17:47] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:47] * verwilst (~verwilst@dD576F882.access.telenet.be) has joined #ceph
[17:52] * verwilst (~verwilst@dD576F882.access.telenet.be) Quit ()
[18:12] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: Leaving)
[18:12] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[18:13] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[18:35] * greglap (~Adium@mobile-198-228-210-153.mycingular.net) Quit (Read error: Connection reset by peer)
[18:52] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[18:55] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:57] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[19:06] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:25] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[19:25] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[19:50] * greglap1 (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:52] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[19:53] <cmccabe> tv, are you there?
[19:54] <Tv> cmccabe: yea
[19:54] <cmccabe> do you think it makes sense to use virtualenv for obsync?
[19:54] <cmccabe> I received a complaint about obsync's dependencies
[19:54] <cmccabe> and I guess virtualenv makes it easier to manage python dependencies?
[19:55] <cmccabe> on the other hand, I guess if obsync is installed, it needs to run in the real env rather than a virtual one
[19:56] * verwilst (~verwilst@dD576F882.access.telenet.be) has joined #ceph
[19:56] <Tv> i don't understand the last line
[19:56] <cmccabe> well, obsync is part of the ceph.spec right now, and the .deb
[19:56] <Tv> ahh
[19:57] <cmccabe> so it actually gets installed to /usr/bin or wherever
[19:57] <Tv> can you expand on the complaints?
[19:57] <Tv> as in, DH internal deployment?
[19:57] <cmccabe> well, the deps aren't in the rpm I guess
[19:57] <cmccabe> rpm/deb
[19:57] <Tv> ah, well that just needs to be fixed
[19:57] <cmccabe> so I guess I have to hunt down the dependencies and put them there
[19:57] <Tv> what you *don't* want is to try to use virtualenv inside the deb packaging
[19:57] <Tv> that's a road to insanity
[19:57] <cmccabe> k
[19:58] <Tv> but if e.g. DH runs a debian release too old to have the deps, then they need to use virtualenv to deploy it (and not use the debs)
[19:58] <cmccabe> I think we're trying to avoid that
[19:58] <Tv> yeah, then you need to dig up the right deps and hope they are in whatever the server happens to run
[19:59] <cmccabe> k
[19:59] <Tv> (or change obsync to not have that dep, etc)
[19:59] <Tv> virtualenv --no-site-packages is a good tool for discovering the deps
[19:59] <cmccabe> oh, good...
[20:17] <Tv> iiinteresting: http://bcov.sourceforge.net/
[20:18] <Tv> probably not high enough quality
[20:18] <cmccabe> I forget how breakpoints work; I don't remember it being very high performance though
[20:18] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[20:19] <Tv> oh yeah
[20:19] <cmccabe> I think gdb might just implement breakpoints with its gross ptrace stuff?
[20:21] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[20:56] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[20:56] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[21:24] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[21:57] <bchrisman> libceph and Client.cc issue. Client::lstat can return < -1, when the stat/lstat system calls generally return -1 on error and zero otherwise. I want to fix this in Client::lstat. It looks like the fuse client goes through ll_getattr to get stat info and I don't see any other references to Client::lstat around.
[21:57] <cmccabe> that seems reasonable to me at least
[21:57] <Tv> bchrisman: all of ceph follows the kernel -errno return convention
[21:57] <Tv> bchrisman: is that what this is about?
[21:57] <cmccabe> you might try removing that function and seeing if it breaks the compile
[21:58] <cmccabe> tv: no, he wants Client::lstat to follow the POSIX lstat convention
[21:58] <cmccabe> although... come to think of it
[21:58] <cmccabe> how would you get the actual error code then?
[21:58] <Tv> all of ceph api follows -errno afaik
[21:58] <bchrisman> Tv: I see...
[21:58] <Tv> and that makes it easily thread-safe yadda yadda
[21:58] <bchrisman> cmccabe: generally errno
[21:58] <Tv> if you need to map it to posix, errno = -ret; return -1
[21:59] <cmccabe> bchrisman: yeah, we don't have anything like errno (I think) so we follow the convention tv described
[21:59] <bchrisman> is the intent of libceph to have return codes similar to the kernel?
[22:00] <Tv> bchrisman: kernel is only relevant as inspiration
[22:00] <Tv> bchrisman: -errno is just a very sane convention
[22:00] <cmccabe> it's just easier
[22:01] <bchrisman> ahh okay... so maybe we should stick with that and do the conversion in the samba vfs layer, which expects a system-call-like API...
[22:01] <gregaf> bchrisman: yeah
[22:01] <gregaf> we can't change the return codes in the Client class or we'll muck up the errors that FUSE generates
[22:02] <bchrisman> would only make sense to change that return convention if there are a number of other clients using libceph that would otherwise expect a system call mechanism.
[22:02] <bchrisman> gregaf: yeah.. I was looking at that... thus far it's been avoided...
[22:02] <gregaf> I'm not sure about libceph itself, and the only other clients I know of are the Hadoop and Hypertable ones :P (though I don't know of anybody using those)
[22:03] <cmccabe> honestly, I think the old POSIX convention of -1 and errno is suboptimal
[22:03] <gregaf> but certainly those both want real error numbers
[22:03] <cmccabe> and the POSIX guys seem to agree, because they return -errno in some newer POSIX functions
[22:03] <cmccabe> and leave the global/thread-local variables alone
[22:03] <Tv> gregaf: they're just as real whether they come from errno or not..
[22:03] <Tv> and hadoop definitely doesn't need errno, it's an all-java api after all
[22:04] <gregaf> in the return values, I meant
[22:05] <gregaf> and really I'm just amusingly unversed in the semantics of errno so when people start talking about it not being thread safe I don't want to go near it
[22:05] * lx0 is now known as lxo
[22:06] <cmccabe> when people talk about errno being thread safe, it's because they are stupid
[22:06] <cmccabe> errno has been a thread-local variable for decades now
[22:06] <rsharpe> Really?
[22:06] <rsharpe> # ifndef __ASSEMBLER__
[22:06] <rsharpe> /* Function to get address of global `errno' variable. */
[22:06] <rsharpe> extern int *__errno_location (void) __THROW __attribute__ ((__const__));
[22:06] <rsharpe> # if !defined _LIBC || defined _LIBC_REENTRANT
[22:06] <rsharpe> /* When using threads, errno is a per-thread value. */
[22:06] <rsharpe> # define errno (*__errno_location ())
[22:06] <rsharpe> # endif
[22:06] <rsharpe> # endif /* !__ASSEMBLER__ */
[22:06] <rsharpe> #endif /* _ERRNO_H */
[22:06] <rsharpe> That is from bits/errno.h
[22:06] <Tv> that doesn't make it painless
[22:07] <Tv> errno clobbering is a real issue
[22:07] <Tv> n:m threading is a real issue
[22:07] <cmccabe> rsharpe: what you posted basically confirms what I just said
[22:07] <cmccabe> rsharpe: "When using threads, errno is a per-thread value"
[22:07] <gregaf> I was just looking at "and that makes it easily thread-safe yadda yadda"
[22:07] <rsharpe> Sorry, I am agreeing with cmccabe
[22:07] <gregaf> I really don't know the semantics
[22:08] <cmccabe> tv: If you are using n:m threading, it's up to your threading library to keep you safe
[22:08] <Tv> cmccabe: and the way errno is now, they are more annoying to write..
[22:08] <Tv> sure, it's a problem that has been solved several times, but the mere existence of the problem is stupid
[22:09] <cmccabe> tv: yes, I agree... the new return convention is better
[22:09] <Tv> errno clobbering is the one that's harder, because you can't solve it "once in a library"
[22:09] <cmccabe> tv: now we can all agree that the old convention will not go away and get on with our lives :)
[22:09] <cmccabe> tv: well, you can, but that library has to be named glibc
[22:09] <Tv> and have a different api
[22:09] <cmccabe> tv: honestly, n:m threading has a lot of problems
[22:09] <rsharpe> Tv: I am not sure what you mean. libceph is providing functions that are providing the same functionality as libc so it is appropriate to set errno on errors.
[22:09] <cmccabe> tv: for one thing, it kills the ability of the kernel to schedule processes properly and share them between CPUs
[22:10] <Tv> rsharpe: it's not the same api anyway
[22:10] <Tv> if you want the same api, mount it and then you'll have it ;)
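
The conversion discussed above ("errno = -ret; return -1") can be sketched in Python. This is an illustration of the two return conventions only, not libceph or Samba code: `fake_lstat`, `to_posix`, and `raise_on_error` are hypothetical names, and the POSIX global `errno` is modelled as the second element of a tuple. The last helper shows the same mapping for a binding layer that prefers exceptions to return codes.

```python
import errno
import os


def fake_lstat(path, known_paths):
    """Hypothetical stand-in for a call using the -errno convention.

    Returns 0 on success, or a negative errno value on failure.
    """
    return 0 if path in known_paths else -errno.ENOENT


def to_posix(ret):
    """Map a -errno style return to a POSIX style (rc, err) pair.

    Models 'errno = -ret; return -1': the error code moves out of the
    return value and the return value collapses to -1.
    """
    if ret < 0:
        return -1, -ret
    return ret, 0


def raise_on_error(ret, path):
    """Alternative mapping for a layer that uses exceptions."""
    if ret < 0:
        raise OSError(-ret, os.strerror(-ret), path)
    return ret


if __name__ == "__main__":
    ret = fake_lstat("/missing", {"/present"})
    rc, err = to_posix(ret)
    print(ret, rc, errno.errorcode[err])
```

Because the error code travels in the return value until the very last step, nothing between the failing call and the caller can overwrite it, which is the clobbering concern raised in the discussion.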
[22:30] * lxo (~aoliva@186.214.49.122) Quit (Remote host closed the connection)
[22:41] * lxo (~aoliva@186.214.49.122) has joined #ceph
[22:51] * jbd (~jbd@ks305592.kimsufi.com) has joined #ceph
[22:56] * Yulya_ (~Yu1ya_@ip-95-220-185-194.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[22:59] * Yulya_ (~Yu1ya_@ip-95-220-173-252.bb.netbynet.ru) has joined #ceph
[23:03] * verwilst (~verwilst@dD576F882.access.telenet.be) Quit (Quit: Ex-Chat)
[23:10] * jbd (~jbd@ks305592.kimsufi.com) has left #ceph
[23:23] <Tv> DEBUG:teuthology.misc:Ceph health: HEALTH_OK
[23:23] <Tv> Ceph test interactive mode, press control-D to exit...
[23:23] <Tv> >>> ctx.cluster.only('osd.1').run(args=['uptime'])
[23:23] <Tv> DEBUG:orchestra.run:Running: 'uptime'
[23:23] <Tv> INFO:orchestra.run.out: 14:23:31 up 17 days, 14:09, 0 users, load average: 0.01, 0.10, 0.12
[23:24] <Tv> [<orchestra.run.RemoteProcess object at 0x2b08730>]
[23:24] <Tv> >>>
[23:24] <Tv> mmmm....
[23:29] * yoshi (~yoshi@KD027091032046.ppp-bb.dion.ne.jp) has joined #ceph
[23:34] * allsystemsarego (~allsystem@188.27.167.240) Quit (Quit: Leaving)
[23:37] <yoshi> joshd: sjust: Hi. Are you around?
[23:37] <sjust> yoshi: yep
[23:40] <sjust> deb http://gitbuilder-deb-amd64.ceph.newdream.net/debian/filestore_debugging squeeze main
[23:40] <yoshi> sjust: great. are the debugging packages ready?
[23:40] <sjust> that repo has packages with debugging
[23:40] <sjust> you should be able to add that line to /etc/apt/sources.list, run apt-get update, and apt-get install ceph
[23:42] <Tv> sjust: note: that'll only work if the version number is > than currently installed
[23:43] <yoshi> sjust: the one you mentioned yesterday that has specific logging outputs in osd?
[23:43] <sjust> yoshi: yes
[23:44] <yoshi> sjust: oops. sorry, missed your comments before.
[23:44] <yoshi> just a sec.
[23:44] <sjust> yoshi: you should uninstall ceph and dependencies first, I think
[23:45] <yoshi> sjust: thanks. hold on.
[23:52] <yoshi> sjust: Hmm. I couldn't connect to http://gitbuilder-deb-amd64.ceph.newdream.net
[23:52] <yoshi> could be a typo?
[23:52] <sjust> oh, right, it's internal :(
[23:52] <yoshi> ?
[23:52] <sjust> hang on
[23:52] * Juul_ (~Juul@slim.dhcp.lbl.gov) has joined #ceph
[23:52] <yoshi> ah.
[23:57] <Tv> http://ceph.newdream.net/gitbuilder-deb-amd64/
[23:59] <yoshi> Tv: thanks. just a sec.

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.