#ceph IRC Log


IRC Log for 2011-02-18

Timestamps are in GMT/BST.

[0:02] * sagelap (~sage@216.2.29.105) has joined #ceph
[0:02] * sagelap (~sage@216.2.29.105) has left #ceph
[0:12] * sagelap (~sage@216.2.29.105) has joined #ceph
[0:12] * sagelap (~sage@216.2.29.105) has left #ceph
[0:30] <cmccabe> trying to create the same snapshot twice seems to lead to a hang
[0:30] <cmccabe> as in, same name
[0:33] <gregaf> ceph snapshots, you mean?
[0:33] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:33] <cmccabe> rados snaps
[0:33] <cmccabe> per-pool snaps
[0:34] <gregaf> huh
[0:34] <cmccabe> I'll see if I can reproduce it with the C-level testrados
[0:35] <Tv> what does cauthtool --set-uid=0 do?
[0:35] <Tv> what's auid?
[0:36] <gregaf> auth user id
[0:36] <gregaf> it's used in the pool-based capabilities infrastructure
[0:37] <Tv> gregaf: i would like to understand.. more
[0:38] <gregaf> users have capabilities bits for each of the MDS/Mon/OSD
[0:38] <gregaf> read/write/execute for each, plus the OSD caps include a read/write/execute scheme on a per-pool basis
[0:39] <Tv> i see/sort of understand these:
[0:39] <gregaf> these caps let them do various operations
[0:39] <Tv> caps mds = "allow"
[0:39] <Tv> caps mon = "allow *"
[0:39] <Tv> caps osd = "allow *"
[0:39] <Tv> what's auid?
[0:40] <gregaf> each user can be given an auid
[0:40] <gregaf> which is used in certain osd ops
[0:40] <gregaf> if they aren't given an auid, they get the anonymous auid
[0:41] <gregaf> then pool owners can give auid x permission to read from their pool
[0:41] <gregaf> or write to it, execute, whatever
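For context, the --set-uid flag Tv asked about stamps the generated key with an auid. A rough sketch; the entity name, auid value, and keyring path are made up, and the flag spellings are from memory and may differ by version:

    cauthtool --create-keyring --gen-key --name=client.foo --set-uid=123 keyring.foo

A pool owner can then grant auid 123 read or write access to their pool, which is the per-pool scheme gregaf just described.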
[0:41] <Tv> where i'm at right now is, if i make cfuse use the "ceph.keyring" with lots of "allow *" etc, it works; if i try to gen-key from scratch, i get cfuse[1895]: ceph mount failed with Operation not permitted
[0:41] <Tv> i'm trying to understand what part i missed from the auth
[0:41] <gregaf> I don't think that auid stuff should be causing you any trouble
[0:42] <Tv> from "ceph auth list":
[0:42] <Tv> client.admin
[0:42] <Tv> key: AQDbqV1NQIF4DBAAHRQ/qmu05F8CagzQuDJR8w==
[0:42] <Tv> caps: [mds] allow
[0:42] <Tv> caps: [mon] allow *
[0:42] <Tv> caps: [osd] allow *
[0:42] <Tv> client.cfuse
[0:42] <gregaf> although it's been mucked around with a lot since its creation so don't take what I'm saying here as authoritative any more
[0:42] <Tv> key: AQDfqV1NqNE/IxAAqVVfHCgPW+8/lNVm5Iyylg==
[0:42] <Tv> caps: [mds] allow
[0:42] <Tv> caps: [mon] allow r
[0:42] <Tv> caps: [osd] allow rw pool=data
[0:42] <Tv> it seems client.admin can run cfuse, client.cfuse can't
[0:42] <Tv> i guess i'll give client.cfuse those caps one by one and see if it starts working
[0:42] <gregaf> I think they're going to need rw on the monitor in order to mount
[0:43] <Tv> ahh
[0:43] <gregaf> that should do it, but I can check
[0:43] <Tv> something led me to think r would be enough, trying to find that again
[0:47] <Tv> A client mounting the file system with minimal permissions would need caps like
[0:47] <Tv> mds = "allow"
[0:47] <Tv> osd = "allow rw pool=data"
[0:47] <Tv> mon = "allow r"
[0:47] <Tv> man cauthtool(8)
[0:47] <gregaf> all right, hmm
[0:50] <gregaf> did you check that you can do other stuff with client.cfuse?
[0:50] <gregaf> like rados lspools?
[0:50] <Tv> not really no
[0:52] <Tv> i just did, works while cfuse mount fails
[0:53] <gregaf> well I'm confused, blargh
[0:53] <Tv> err sorry i may have used the wrong key, hold on
[0:53] <Tv> there's no explicit key option and it probably picked up the wrong config section
[0:53] <Tv> yeah it did, digging up what config section it uses
[0:54] <Tv> "rados"
[0:54] <Tv> except not really? :-o
[0:55] <Tv> 2011-02-17 15:55:05.786292 7fb4ea36d720 librados: client.admin authentication error Operation not permitted
[0:55] <Tv> yeah it also fails if i give it my "new" key
[0:55] <Tv> and at this point, ceph auth list says caps are the same for both keys
[0:55] <gregaf> are you sure your keyring and ceph auth list contain the same keys?
[0:56] <Tv> apart from me being confused about auid, yes
[0:56] <gregaf> well if client.cfuse and client.admin have the same caps on ceph auth list, and client.admin works and client.cfuse doesn't, I think that's pretty much the only thing left that can be broken...
[0:57] <Tv> ceph auth list doesn't echo back any auids to me
[0:57] <gregaf> yeah, you don't need to worry about the auids
[0:57] <Tv> i changed it later to be "0" just like the admin key, but i can't see if that took effect on the later re-add, or not
[1:04] <Tv> for posterity: i needed a --name=cfuse, or it tried to load a [client.admin] section, which didn't even exist in the keyring, and i don't even know what key it then tried to use
[1:05] <yehudasa> Tv: the default user is admin
[1:05] <yehudasa> it probably tried a blank key
[1:05] <yehudasa> which obviously didn't work
[1:05] <Tv> yehudasa: what i'm saying is the keyring file it read had no such section, yet it went ahead trying to do operations
[1:05] <Tv> i'd have expected it to crap out earlier
[1:06] <yehudasa> yeah, the auth function can be a bit smarter about that
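The fix Tv found, sketched out; the key is a placeholder, and the exact option spellings (--name, -k for the keyring) are from memory:

    # keyring.cfuse -- contains only a [client.cfuse] section, so the
    # default identity client.admin finds nothing in it
    [client.cfuse]
        key = <secret from cauthtool --gen-key>
        caps mds = "allow"
        caps mon = "allow r"
        caps osd = "allow rw pool=data"

    # without --name, cfuse tries to authenticate as client.admin and
    # the mount fails with "Operation not permitted"
    cfuse --name=cfuse -k keyring.cfuse /mnt/ceph

The caps shown are the minimal set quoted from cauthtool(8) earlier in the log.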
[1:08] <Tv> YES!
[1:08] <Tv> that is a successful 2-machine autotest run
[1:08] <yehudasa> cool!
[1:08] <Tv> cfuse mount of another host
[1:09] <Tv> next up: clean that up, push for others to see, generalize to >2 machines, refactor shared things to library, make writing things easy
[1:52] * ooolinux (~bless@203.114.244.22) has joined #ceph
[1:52] <cmccabe> so when I roll back an object to a particular pool snap
[1:52] <cmccabe> I must use the pool snap name, never snapid
[1:55] <gregaf> uh, yeah
[1:55] <cmccabe> but on the other hand, the selfmanaged snap stuff always uses the snapid
[1:57] <cmccabe> I'm guessing that rados_set_snap and rados_set_snap_context are only used with selfmanaged snaps?
[1:57] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[1:59] <gregaf> I don't remember for sure, but I think so
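A sketch of the split cmccabe is mapping out, written against the current librados C API names (rados_set_snap and rados_set_snap_context, as mentioned above, were the older equivalents of the snap-set calls); error handling omitted:

    #include <rados/librados.h>

    /* Pool snapshots are addressed by name: created, removed and
     * rolled back with a snapshot name, never a raw snapid. */
    void pool_snap_demo(rados_ioctx_t io)
    {
        rados_ioctx_snap_create(io, "mysnap");
        rados_ioctx_snap_rollback(io, "myobject", "mysnap");  /* name, not id */
    }

    /* Self-managed snapshots hand back a numeric snapid instead, and
     * the caller keeps track of the snap context itself. */
    void selfmanaged_snap_demo(rados_ioctx_t io)
    {
        rados_snap_t snapid;
        rados_ioctx_selfmanaged_snap_create(io, &snapid);
        rados_ioctx_selfmanaged_snap_rollback(io, "myobject", snapid);
    }

The two styles are mutually exclusive on a given pool: once self-managed snaps are in use, pool snaps are refused, and vice versa.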
[2:06] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[2:15] <cmccabe> is auid 0 special
[2:17] <cmccabe> I'm guessing 0 is the anonymous auid?
[2:18] <gregaf> I think 0 is admin
[2:18] <gregaf> anonymous is -1
[2:18] <gregaf> (it's unsigned, so not really -1)
[2:19] <cmccabe> ok
[2:19] <cmccabe> default seems to be 0 now?
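The anonymous auid being discussed is a sentinel in Ceph's headers; if memory serves, include/rados.h has roughly:

    #define CEPH_AUTH_UID_DEFAULT ((__u64) -1)  /* "anonymous": all bits set, hence "not really -1" */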
[2:41] <cmccabe> I'm guessing rados_write_full waits for a commit before returning?
[2:43] <cmccabe> well, actually, that's not quite it. It seems to replace the entire contents of the object with a different object
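A minimal sketch of the call cmccabe is describing, again with current C API names; the point is that it is a synchronous whole-object replace rather than a ranged write (error handling omitted):

    #include <rados/librados.h>

    /* rados_write_full replaces the entire object: after this call
     * "myobject" contains exactly buf, whatever its old length was. */
    void replace_object(rados_ioctx_t io)
    {
        const char buf[] = "new contents";
        rados_write_full(io, "myobject", buf, sizeof(buf) - 1);
    }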
[2:49] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[2:58] * cmccabe (~cmccabe@208.80.64.79) has left #ceph
[3:04] * ooolinux (~bless@203.114.244.22) Quit (Quit: Leaving)
[3:04] * ooolinux (~bless@203.114.244.22) has joined #ceph
[3:37] * tel_ (~tel@59.64-150-148-net.sccoast.net) has joined #ceph
[7:06] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) Quit (Read error: Connection reset by peer)
[7:06] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[7:51] <ooolinux> hi
[7:52] <ooolinux> hi
[7:55] <ooolinux> would you like to discuss strncpy in ceph?
[8:03] <ooolinux> does ceph promise no overflow when using strncpy or strcpy?
[8:21] * Psi-Jack- (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[8:23] * Meths_ (rift@91.106.208.48) has joined #ceph
[8:23] * Psi-Jack (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[8:29] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) Quit (Ping timeout: 480 seconds)
[8:29] * Psi-Jack- (~psi-jack@yggdrasil.hostdruids.com) Quit (Ping timeout: 480 seconds)
[8:29] * Psi-Jack_ (~psi-jack@yggdrasil.hostdruids.com) has joined #ceph
[8:30] * Meths (rift@91.106.241.181) Quit (Ping timeout: 480 seconds)
[8:31] * Psi-Jack (~psi-jack@yggdrasil.hostdruids.com) Quit (Ping timeout: 480 seconds)
[8:49] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (Read error: Operation timed out)
[9:06] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[9:17] <jantje> hmm
[9:20] <ooolinux> hi
[9:22] <ooolinux> #12 0x00007f6330b71a3a in start_thread () from /lib64/libpthread.so.0
[9:22] <ooolinux> #13 0x00007f632fd8f77d in clone () from /lib64/libc.so.6
[9:22] <ooolinux> #14 0x0000000000000000 in ?? ()
[9:22] <ooolinux> anyone know which function frame 14 is?
[9:49] <ooolinux> hi, are you there?
[10:01] * MK_FG (~MK_FG@188.226.51.71) Quit (Ping timeout: 480 seconds)
[10:06] * Yoric (~David@213.144.210.93) has joined #ceph
[10:41] * allsystemsarego (~allsystem@188.25.130.49) has joined #ceph
[10:43] * ooolinux (~bless@203.114.244.22) Quit (Ping timeout: 480 seconds)
[12:11] <DeHackEd> that's the end of the stack. there is no frame 14
[12:24] * Yoric_ (~David@213.144.210.93) has joined #ceph
[12:24] * Yoric (~David@213.144.210.93) Quit (Read error: Connection reset by peer)
[12:24] * Yoric_ is now known as Yoric
[12:32] * MK_FG (~MK_FG@188.226.51.71) has joined #ceph
[14:14] * julienhuang (~julienhua@pasteur.dedibox.netavenir.com) has joined #ceph
[14:48] * julienhuang (~julienhua@pasteur.dedibox.netavenir.com) Quit (Ping timeout: 480 seconds)
[14:50] * julienhuang (~julienhua@82.67.204.235) has joined #ceph
[15:07] * Meths_ is now known as Meths
[16:52] * julienhuang (~julienhua@82.67.204.235) Quit (Quit: julienhuang)
[17:29] <prometheanfire> I know this may have been discussed before, but what is the proper course of action for osds of differing sizes?
[17:29] <prometheanfire> I think it was to put a greater weight in the crushmap for the bigger node
[17:51] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:51] * greglap (~Adium@166.205.138.233) has joined #ceph
[17:55] * verwilst (~verwilst@router.begen1.office.netnoc.eu) Quit (Quit: Ex-Chat)
[18:09] * alexxy (~alexxy@79.173.81.171) Quit (Remote host closed the connection)
[18:12] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:13] * alexxy (~alexxy@79.173.81.171) has joined #ceph
[18:32] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[18:39] * Yoric (~David@213.144.210.93) Quit (Ping timeout: 480 seconds)
[18:40] * greglap (~Adium@166.205.138.233) Quit (Quit: Leaving.)
[18:44] * Yoric (~David@213.144.210.93) has joined #ceph
[18:46] * Yoric (~David@213.144.210.93) Quit ()
[18:46] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[19:00] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:01] <gregaf> prometheanfire: it depends on whether you want to optimize for storage space or speed :)
[19:01] <gregaf> but if you want to use all your available space, yes, you assign weights proportional to the disk space each OSD has
[19:02] <prometheanfire> I want to optimize for failover
[19:03] <prometheanfire> and space would be a big plus, if I need more speed then I can look at upgrading all nodes
[19:03] <gregaf> yeah
[19:03] <gregaf> I just mean that if you assign more weight to nodes with larger disks then you're going to be retrieving more data from those disks
[19:04] <gregaf> but they may not have enough extra bandwidth (either at the disk or network level) to match the amount of extra data they'll be required to serve
[19:05] <prometheanfire> ah, it would be a network limitation
[19:05] <bchrisman> are the weights in the crushmap normalized? ie, an example on the wiki has each 'rack' in the 'root' weighted as 4.0… those are all at the same level… and I can see how it represents four hosts for each rack… but would the crushmap be the same if those were all '1.0' instead of 4.0?
[19:05] <prometheanfire> right now I'm populating servers with 1TB 2.5 inch sas drives
[19:06] <prometheanfire> if I order another server down the line with 1.5tb drives I want to use it :D
[19:06] <Tv> bchrisman: whether or not they are normalized, i frankly don't know right now -- but the selection is made based on *relative* weights, purely
[19:06] <prometheanfire> bchrisman: I think it is, but I have no confirmation
[19:06] <gregaf> bchrisman: pretty sure they're normalized but that's the kind of admin stuff I don't get into much ;)
[19:06] <prometheanfire> yes, relative :D
[19:06] <bchrisman> okay.. that's what I figured.. thx
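What the proportional weighting looks like in a decompiled crush map, with hypothetical hosts and the weights made proportional to disk size; the syntax follows current crushtool output, and older versions differ slightly:

    host node1 {
        id -2
        alg straw
        hash 0
        item osd.0 weight 1.00    # 1TB drive
    }
    host node2 {
        id -3
        alg straw
        hash 0
        item osd.1 weight 1.50    # 1.5TB drive
    }

As Tv says, only the ratios matter: scaling every weight in a bucket by the same factor yields the same placement.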
[19:21] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[19:27] * cmccabe (~cmccabe@c-24-23-253-6.hsd1.ca.comcast.net) has joined #ceph
[20:17] * Ormod (~hjvalton@vipunen.hut.fi) Quit (Ping timeout: 480 seconds)
[20:18] <Tv> my 2-machine cfuse autotest is now pushed to ceph-autotests.git
[20:33] <Tv> the control file got a bit unwieldy :(
[20:51] * allsystemsarego (~allsystem@188.25.130.49) Quit (Ping timeout: 480 seconds)
[20:53] * allsystemsarego (~allsystem@188.27.164.243) has joined #ceph
[21:09] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[21:46] <wido> should the current ceph-client build against 2.6.38?
[21:49] <gregaf> wido: is it not?
[21:50] <wido> gregaf: well, I'm getting some errors. Let me dump it on pastebin
[21:50] <gregaf> I think it should build but it won't play nice — Nick Piggin's scalability patches broke a few things and he's supposedly pushing a fix through but nobody's heard from him in a few weeks!
[21:52] <wido> gregaf: Ah, I think I found it, I was using ceph-client.git at sage's repo at kernel.org
[21:52] <wido> That one hasn't been synced it seems
[21:52] <wido> It's still listed here btw: http://ceph.newdream.net/source-control/
[21:53] <Tv> wido: i recall that the repos on kernel.org are meant for linus to pull
[21:54] <wido> Tv: Oh, that could be, I changed to the newdream.net repo. But that one won't build either against 2.6.38 right now
[21:54] <Tv> wido: yeah i think yehuda mentioned a workaround a day or two back
[21:55] <gregaf> that's for correctness, though — I don't recall hearing about builds failing
[21:56] <gregaf> can you pastebin the output?
[21:56] <wido> gregaf: http://pastebin.com/DfPtk8rU
[21:57] <wido> gregaf: one when building libceph: http://pastebin.com/QpEkDQRE
[21:59] <gregaf> which branch are you building from?
[21:59] <wido> master branch
[21:59] <wido> gregaf: It's at 8dc22f352411a6a8349dc9961a6b141c20a6bc81
[22:00] <wido> But it doesn't seem to be up to date, even though git says it is
[22:00] <wido> last commit is Feb 7 in my repo
[22:00] <wido> Oh, no, that seems OK
[22:04] <gregaf> well I think Sage is looking at it — I have no idea what's been going on in the kernel lately and Yehuda's out for a while — but he just got back from FAST 11 so he's got several things going on at once
[22:04] <gregaf> it's pretty clear there are some headers not being included properly or something but no idea when that could have gotten broken
[22:06] <wido> gregaf: tnx, no hurry!
[22:07] <wido> Why have the python bindings been included in the main repo? Why not a separate repo?
[22:08] <gregaf> we want them available for scripting tests with
[22:09] <wido> Ah, ok. Imho the main repo seems a bit big sometimes, a lot of files which do not relate to each other
[22:09] <wido> like the radosgw, but that one is moving out
[22:10] <wido> but, tnx! I'll head back to testing again. Still testing that Atom, to me it seems to be working fine
[22:10] <wido> But the 4GB of RAM limitation is a bit low
[22:11] <gregaf> let us know how it goes :)
[22:11] <Tv> wido: the source size isn't actually all that big, it's just slooow to compile..
[22:12] <wido> gregaf: Will do. But the combination of the 4 2TB disks and the X25-M seems fine for now. Hope to get a 20 ~ 30 OSD cluster soon
[22:12] <wido> Tv: Get a faster CPU ;)
[22:13] <cmccabe> yeah, I'm trying to compile on my quad-core at home, and the compile times are killing me
[22:13] <cmccabe> I might have to go back to remoting, although the network has been erratic here today
[22:14] <wido> A friend of mine used to run Gentoo, and when updating his laptop he would hijack all the PCs in his house to have his compile cluster running
[22:14] <wido> Just to "emerge" his laptop
[22:15] <Tv> wido: hehe.. umm, about those CPUs.. http://i.imgur.com/Vahhd.png
[22:15] <Tv> <3 distcc
[22:16] <Tv> distcc over 5 8-core machines, and ccache on top of that
[22:17] <wido> Ok, that is enough. But does the compile scale over so many CPUs, since jobs are waiting for each other?
[22:17] <Tv> wido: as long as you see more green than white in that image, yes
[22:17] <cmccabe> it seems to parallelize fairly well for us at least
[22:18] <Tv> and note the scrollbar ;)
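For anyone replicating that setup: ccache chains onto distcc via CCACHE_PREFIX. A sketch with hypothetical host names, allowing 8 jobs per box to match the 5 8-core machines:

    export DISTCC_HOSTS="build1/8 build2/8 build3/8 build4/8 build5/8"
    export CCACHE_PREFIX=distcc
    make -j40 CC="ccache gcc" CXX="ccache g++"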
[22:20] <sagewk> wido: what kernel are you compiling against?
[22:21] <sagewk> there are lots of vfs/dcache changes in .38, so it has to build against latest mainline/-rc
[22:24] <wido> sagewk: Ubuntu's 2.6.38 from yesterday, let me check the version
[22:24] <wido> 2.6.38-rc5: http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/
[22:25] <sagewk> huh. can you try against mainline again? maybe the ubuntu team has changed things up some
[22:25] <sagewk> fwiw i have no compilation problems with current master merged with current mainline.
[22:27] <wido> sagewk: As far as I know, those kernels are mainline builds, but I'll try against mainline directly
[22:27] <sagewk> yeah.. :/ ok thanks!
[22:30] <wido> sagewk: I'll grab the mainline from kernel.org and try the ceph-tree against it
[22:51] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[22:52] <wido> http://pastebin.com/EEuDDbaR
[22:53] <wido> has this one been seen before?
[23:21] * allsystemsarego (~allsystem@188.27.164.243) Quit (Quit: Leaving)
[23:32] * Meths (rift@91.106.208.48) Quit (Ping timeout: 480 seconds)
[23:39] <gregaf> wido: is that on 2.6.38?
[23:40] <gregaf> looks like it might be that showstopper bug — it really isn't going to work on 2.6.38 until we implement a workaround or Nick pushes that patch!
[23:52] <Tv> gregaf: with regard to the 20 second stall, ltrace -tt -S or something would be helpful
[23:53] <Tv> i've never actually used ltrace with c++ and pthreads, not sure how well it'll cope
[23:54] <gregaf> I haven't looked to see if I'm reproducing the same issue as Jim (if I am I'll try that), but you probably want to email him about it :)
[23:54] <cmccabe> I wonder if we ever take locks inside constructors, destructors, or assignment operators?
[23:55] <cmccabe> the secret functions of C++
[23:55] <gregaf> the code path at issue is very short and easy to check manually
[23:56] <gregaf> void SafeTimer::add_event_after(double seconds, Context *callback)
[23:56] <gregaf> {
[23:56] <gregaf>   assert(lock.is_locked());
[23:56] <gregaf>   utime_t when = g_clock.now();
[23:56] <gregaf>   when += seconds;
[23:56] <gregaf>   add_event_at(when, callback);
[23:56] <gregaf> }
[23:56] <gregaf> void SafeTimer::add_event_at(utime_t when, Context *callback)
[23:56] <gregaf> {
[23:56] <gregaf>   assert(lock.is_locked());
[23:56] <gregaf>   dout(10) << "add_event_at " << when << " -> " << callback << dendl;
[23:56] <gregaf>   scheduled_map_t::value_type s_val(when, callback);
[23:56] <gregaf>   scheduled_map_t::iterator i = schedule.insert(s_val);
[23:56] <gregaf>   event_lookup_map_t::value_type e_val(callback, i);
[23:56] <gregaf>   pair < event_lookup_map_t::iterator, bool > rval(events.insert(e_val));
[23:56] <gregaf>   /* If you hit this, you tried to insert the same Context* twice. */
[23:56] <gregaf>   assert(rval.second);
[23:56] <gregaf>   /* If the event we have just inserted comes before everything else, we need to
[23:56] <gregaf>    * adjust our timeout. */
[23:56] <gregaf>   if (i == schedule.begin())
[23:56] <gregaf>     cond.Signal();
[23:56] <gregaf> }
[23:56] <gregaf>   clog.send_log();
[23:56] <gregaf>   timer.add_event_after(1.0, new C_Tick(this));
[23:56] <gregaf>   // only do waiters if dispatch() isn't currently running. (if it is,
[23:56] <gregaf>   // it'll do the waiters, and doing them here may screw up ordering
[23:56] <gregaf>   // of op_queue vs handle_osd_map.)
[23:56] <gregaf>   if (!dispatch_running) {
[23:57] <gregaf> between the start of timer.add_event_after() and the beginning of that if is where the mysterious pause comes from
[23:57] <gregaf> and there's the entirety of add_event_after and add_event_at
[23:57] <cmccabe> are we absolutely sure that it's not in clog.send_log?
[23:57] <cmccabe> I forget where the printouts were... let me check
[23:59] <cmccabe> hmm, my bad, clog.send_log is not involved

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.