#ceph IRC Log


IRC Log for 2011-11-10

Timestamps are in GMT/BST.

[0:00] * grape (~grape@ Quit (Ping timeout: 480 seconds)
[0:01] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:20] <gregaf1> nwatkins`: do you know how wide a range of things that patch applies to? is it only good against hadoop-.20.205 or should it be good with a generic .20 branch?
[0:31] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[0:42] <nwatkins`> gregaf1: hmm. i suspect it applies to any 0.20 (the core-hadoop change is to the distributed cache and i bet that code doesn't see much churn). similarly, I see barely any change to S3 File System, so the internal API probably is OK too. That said, I haven't tried any other versions.
[0:42] <gregaf1> cool, just want to get the docs right :)
[0:43] <gregaf1> pushed to master!
[0:46] <nwatkins`> ahh, sweet!
[1:11] * grape (~grape@c-76-17-80-143.hsd1.ga.comcast.net) has joined #ceph
[1:26] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:41] <Tv> if gitbuilders seem slow, you can blame me ;)
[1:41] <Tv> installing >=3 vms in parallel on that hardware, lots of disk writes going on
[1:43] * adjohn (~adjohn@70-36-139-211.dsl.dynamic.sonic.net) has joined #ceph
[2:58] * The_Bishop (~bishop@port-92-206-45-45.dynamic.qsc.de) Quit (Read error: Connection reset by peer)
[3:17] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:36] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:58] * nwatkins` (~user@kyoto.soe.ucsc.edu) Quit (Remote host closed the connection)
[4:22] * darkfader (~floh@ Quit (Remote host closed the connection)
[4:22] * darkfader (~floh@ has joined #ceph
[6:53] * jantje_ (~jan@paranoid.nl) has joined #ceph
[6:53] * jantje (~jan@paranoid.nl) Quit (Read error: Connection reset by peer)
[8:38] * adjohn (~adjohn@70-36-139-211.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[8:39] * votz (~votz@pool-108-52-121-103.phlapa.fios.verizon.net) Quit (Read error: Operation timed out)
[8:45] * votz (~votz@pool-108-52-121-103.phlapa.fios.verizon.net) has joined #ceph
[8:45] * adjohn (~adjohn@70-36-139-211.dsl.dynamic.sonic.net) has joined #ceph
[9:06] * adjohn (~adjohn@70-36-139-211.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[9:41] * tjikkun (~tjikkun@2001:7b8:356:0:225:22ff:fed2:9f1f) Quit (Ping timeout: 480 seconds)
[10:54] * MK_FG (~MK_FG@ Quit (Quit: o//)
[10:54] * MK_FG (~MK_FG@ has joined #ceph
[10:58] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:02] * NaioN (~stefan@andor.naion.nl) Quit (Read error: Connection reset by peer)
[11:06] * NaioN (~stefan@andor.naion.nl) has joined #ceph
[11:12] * stefanha (~stefanha@yuzuki.vmsplice.net) has joined #ceph
[11:13] <stefanha> Is it allowed to have multiple initiators accessing a RADOS block device?
[11:13] <stefanha> Or will librados return an error?
[11:14] <stefanha> (Same question applies to the Linux rbd block driver)
[11:15] <todin> stefanha: you can access the rbd block device from different machines at the same time
[11:17] <stefanha> todin: Thanks
[11:18] <todin> stefanha: but be careful, you have to do the locking yourself
[11:35] <stefanha> todin: I wanted to check whether there was a fundamental rule preventing this. I'm interested in the VM live migration case.
[11:35] <stefanha> I wanted to make sure that a VM with its disk on a RADOS block device can live migrate properly under KVM.
[11:36] <stefanha> That involves having the rbd open simultaneously but does not write from two initiators simultaneously :)
[11:45] <psomas> stefanha: I think it can live migrate without problems, i.e. you don't need to worry about locking and simultaneous writes
[11:50] <stefanha> great
[11:50] <todin> stefanha: you can use the rbd device which is already in qemu-kvm, then you can do a live migration via the qemu migration command
[11:51] <todin> stefanha: I use that myself.
[11:52] <stefanha> cool
[11:54] <todin> stefanha: you could use an rbd image in kvm like this -drive format=rbd,file=rbd:rbd/natty2:rbd_writeback_window=81920000,cache=none,if=virtio
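The `-drive` value above packs two layers of options: comma-separated qemu drive options (`format`, `file`, `cache`, `if`) and, inside `file`, an `rbd:<pool>/<image>` spec followed by colon-separated `key=value` settings. A minimal Python sketch that pulls the two layers apart, assuming no escaped commas (`,,`) and no `@snapshot` suffix on the image; `parse_drive` is a hypothetical helper for illustration, not part of qemu:

```python
def parse_drive(spec):
    """Split a qemu -drive value into its options and parse an rbd: file spec.

    Hypothetical helper: assumes no escaped commas (",,") and no @snapshot
    suffix on the image name, both true for the example in the log.
    """
    # Top level: comma-separated key=value drive options.
    opts = dict(item.split("=", 1) for item in spec.split(","))
    rbd = None
    file_spec = opts.get("file", "")
    if file_spec.startswith("rbd:"):
        # rbd:<pool>/<image>[:key=value[:key=value...]]
        parts = file_spec[len("rbd:"):].split(":")
        pool, image = parts[0].split("/", 1)
        rbd = {
            "pool": pool,
            "image": image,
            # Remaining colon-separated fields are key=value config settings.
            "options": dict(p.split("=", 1) for p in parts[1:]),
        }
    return opts, rbd


spec = "format=rbd,file=rbd:rbd/natty2:rbd_writeback_window=81920000,cache=none,if=virtio"
opts, rbd = parse_drive(spec)
print(rbd["pool"], rbd["image"], rbd["options"], opts["cache"], opts["if"])
```

Here the pool is `rbd`, the image is `natty2`, and `rbd_writeback_window` travels inside the `file` value rather than as a separate drive option.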
[12:09] * DLange_ is now known as DLange
[12:47] * Nightdog (~karl@190.84-48-62.nextgentel.com) has joined #ceph
[13:21] <NaioN> Does anybody know what these messages mean in the mds log:
[13:22] <NaioN> 2011-11-10 12:49:36.477458 7fe879f0f700 mds.0.cache.dir(100000b4d01) [dentry #1/I1543-AGeneWienberg/backup_20111110_1225/opt/apps/HeutBO/Afbeeldingen/075012A.JPG [2,head] auth (dversion lock) v=29705 inode=0xc19b820 | inodepin 0x49d0130] n(v0 b17669 1=1+0)
[13:22] <NaioN> I'm getting a huge amount of these
[13:22] <NaioN> and the log grows so fast it fills up the disk
[14:13] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[14:13] * gregorg_taf (~Greg@ has joined #ceph
[16:02] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[16:43] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:59] * adjohn (~adjohn@70-36-139-211.dsl.dynamic.sonic.net) has joined #ceph
[17:03] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (Quit: Ex-Chat)
[17:45] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:53] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[18:35] * bchrisman (~Adium@ has joined #ceph
[18:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[18:51] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:59] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:00] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[19:24] * aliguori (~anthony@ has joined #ceph
[19:44] <Tv> i don't understand ruby gems
[19:44] <Tv> that's the last problem with the barclamp
[19:45] <Tv> i see a gem installed on the filesystem, yet require 'open4' won't work
[19:46] <Tv> it seems like the /var/lib/gems path isn't being searched
[19:46] <Tv> yet that's where it gets installed with gem_package 'open4'
[19:53] <Tv> require 'rubygems', eh
[20:25] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[20:30] <jpieper> I'm trying to bring up a new OSD for the first time, and I'm having trouble. Every time I try to start the osd daemon, it fails with a "File exists not handled" error, tripped at FileStore.cc:2406. Running git HEAD as of earlier today and using btrfs. Any thoughts?
[20:32] <jpieper> Running strace on it indicates it barfs trying to create the /data/osd0/current/meta directory, which indeed, already exists.
[20:35] <Tv> sjust: any clue? ^
[20:35] <jpieper> I've tried stopping everything, running mkcephfs, and starting things up again, but get the same error each time.
[20:35] <sjust> jpieper, Tv: that's odd
[20:35] <Tv> i'm staring at the code but all that sounds to me like a half-way initialized osd
[20:35] <Tv> sjust: could this be about the idempotent transactions too?
[20:36] <Tv> jpieper: even if you rm -rf /data/osd0 before the mkcephfs?
[20:36] <sjust> Tv: it's btrfs, shouldn't matter
[20:36] <jpieper> Tv, I tried that once, but I haven't been doing that each time I tried.
[20:36] <sjust> jpieper: completely removing the /data/osd0 as well as the journal and recreating the osd from scratch should do it
[20:37] <jpieper> Hmm, now I remember, trying to 'rm -rf /data/osd0/*' gives me a "directory not empty" on /data/osd0/current. ls -al on it shows nothing. Is perhaps my btrfs corrupted?
[20:38] <sjust> jpieper: it probably means that you need to remove the subvolumes separately?
[20:38] <sjust> /data/osd0/current, that is
[20:38] <sjust> as well as the snap directories
[20:41] <jpieper> Not being an expert on btrfs, how would I go about removing the subvolumes separately?
[20:43] <jpieper> Well, I just unmounted it, ran mkfs.btrfs on the volume, mounted it, then ran mkcephfs again. Same problem.
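On jpieper's question of removing the subvolumes separately: sjust's point is that `current` and the snapshot directories are btrfs subvolumes, which `rm -rf` does not remove; the btrfs-progs command for that is `btrfs subvolume delete <path>`. A minimal Python sketch that builds those commands for an OSD data directory, dry-run by default. It assumes the snapshot directories are named with a `snap_` prefix; check against `btrfs subvolume list` before running it for real. The function names are hypothetical.

```python
import os
import subprocess


def subvolume_delete_cmds(osd_dir, entries):
    """Build `btrfs subvolume delete` commands for an OSD data dir.

    Assumption: the subvolumes are `current` plus snapshot dirs whose
    names start with "snap_"; everything else is left alone.
    """
    targets = [e for e in sorted(entries) if e == "current" or e.startswith("snap_")]
    return [["btrfs", "subvolume", "delete", os.path.join(osd_dir, e)] for e in targets]


def remove_subvolumes(osd_dir, dry_run=True):
    """Print the commands (dry run) or execute them (needs root on a btrfs mount)."""
    cmds = subvolume_delete_cmds(osd_dir, os.listdir(osd_dir))
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

After the subvolumes are gone, an ordinary `rm -rf` can clear whatever is left in the directory.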
[20:43] <Tv> jpieper: neat, a reproducible bug!
[20:44] <jpieper> ;)
[20:44] <Tv> jpieper: what version of the kernel / btrfs are you running?
[20:45] <jpieper> ubuntu 11.10, 3.0.0-12
[20:45] <Tv> (.. i wonder if our btrfs detection code could check the version number and ignore too old = fall back to posix behavior)
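Tv's aside amounts to gating the btrfs-specific code path on the running kernel version and falling back to plain POSIX behavior below some cutoff. A minimal Python sketch of that check; `MIN_BTRFS_KERNEL` is a hypothetical placeholder (the log does not name a cutoff, and Tv thinks 3.0 should be fine), and this is not Ceph's actual detection code.

```python
import re

# Hypothetical cutoff for illustration only; the log does not name a version.
MIN_BTRFS_KERNEL = (2, 6, 38)


def kernel_version(release):
    """Parse a kernel release string such as '3.0.0-12-generic' into (3, 0, 0)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError("unrecognized kernel release: %r" % release)
    # A missing third component (e.g. '5.10') is treated as 0.
    return tuple(int(g or 0) for g in m.groups())


def use_btrfs_features(release, minimum=MIN_BTRFS_KERNEL):
    """True if the kernel meets the cutoff; False means fall back to POSIX behavior."""
    return kernel_version(release) >= minimum


# On a live system the release string would come from platform.release().
print(use_btrfs_features("3.0.0-12"))
```

Comparing tuples rather than strings avoids the classic bug where "3.10" sorts before "3.9".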
[20:45] <jpieper> Perhaps that is too old?
[20:45] <Tv> jpieper: we think it should be fine
[20:45] * adjohn (~adjohn@70-36-139-211.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[20:46] <Tv> jpieper: but that's a "think"..
[20:53] <jpieper> For what it is worth, the log shows some possibly useful bits before the failure: http://pastebin.com/tEPhkytE
[20:54] <jpieper> That was a run made after a fresh mkfs.btrfs, running mkcephfs, then a single '/etc/init.d/ceph start osd'
[20:55] <Tv> jpieper: as it is lunch time and i fear many of us will be busy in the afternoon, would you please file a proper bug report so this gets recorded somewhere?
[20:55] * depend3nt (~default@cpe-98-149-53-63.socal.res.rr.com) has joined #ceph
[20:56] * depend3nt (~default@cpe-98-149-53-63.socal.res.rr.com) Quit (autokilled: Mail support@oftc.net with questions (2011-11-10 19:56:53))
[20:57] <Tv> #%@!@ chef nodes pxebooted the wrong interfaces
[20:58] <Tv> this thing is fiddly
[20:59] <jpieper> Tv: sure thing.
[21:00] * cp (~cp@ has joined #ceph
[21:06] <jpieper> Tv: http://tracker.newdream.net/issues/1707
[21:36] * The_Bishop (~bishop@port-92-206-45-45.dynamic.qsc.de) has joined #ceph
[21:49] * fronlius (~fronlius@e182093240.adsl.alicedsl.de) has joined #ceph
[22:33] <sjust> jpieper: was that log snippet from when you first started the osd?
[22:36] <sjust> d'oh, I think we just need to take a snapshot right after the filestore is created
[22:37] <jpieper> sjust: Yes.
[22:39] <jpieper> I switched to an ext4 backed filesystem, and now have gotten mon to crash: 'mon/PGMonitor.cc: 218: FAILED assert(paxos->get_version() + 1 == pending_inc.version)'
[22:39] <jpieper> This was while testing failover under rbd load. Is that expected, or should I file another bug report?
[22:40] <sjust> jpieper: another bug report would be cool
[22:46] <jpieper> sjust: http://tracker.newdream.net/issues/1708
[22:51] <sjust> jpieper: thanks
[22:52] <slang> any known issues with current stable branch?
[23:06] <Tv> fyi crowbar status: it seems the remaining bugs are crowbar bugs, not ceph barclamp bugs
[23:06] <Tv> still waiting for a successful run, but the repos are pushed
[23:06] <Tv> the new one being https://github.com/NewDreamNetwork/barclamp-ceph
[23:08] <sjust> jpieper: 7fb182a17b703002c1bd098391fb688b5b1e2749 should fix your first issue (it's in master now)
[23:10] <jpieper> sjust: Great, I'll test it out here shortly.
[23:13] <Tv> hrmm crowbar, why are missing .deb signatures a problem *now*, it's been like that forever
[23:22] * nwatkins` (~user@kyoto.soe.ucsc.edu) has joined #ceph
[23:22] <Tv> ohhh it failed to drop in an apt config file that says trust everything
[23:22] <Tv> huh
[23:22] <Tv> *sigh* reinstalling again
[23:24] <nwatkins`> Ceph client suddenly started receiving connection refused errors. Here is the client log http://pastebin.com/WdnpGepH
[23:28] <jpieper> sjust: Yep, 7fb182 appears to fix that problem. Thanks!
[23:30] <gregaf1> nwatkins`: well that's fun, what does the server side say?
[23:32] <nwatkins`> gregaf1: is it safe to rm the log file? it's like 250mb and i want to get something trimmed up to look at
[23:32] <gregaf1> not sure what you mean
[23:33] <gregaf1> it's some kind of messaging error, so assuming it persists I don't think we'll care about past history, if that's what you mean
[23:33] <nwatkins`> that's what i mean. i want to get the most recent stuff, but my box is pretty laggy dealing with this enormous txt file
[23:34] <nwatkins`> it's reproducible, so i wanted to just start with an empty log and re-run the test
[23:34] <gregaf1> heh
[23:34] <gregaf1> yeah, should be fine
[23:34] <nwatkins`> hmm
[23:34] <nwatkins`> it's not returning...
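When a log is too large to open comfortably, reading only its end is usually enough (`tail -n` does this from the shell). Note also that deleting a log that a running daemon still holds open does not free the disk space until the daemon closes it; truncating the file in place does. A minimal Python sketch of a tail that seeks backwards in fixed-size blocks instead of loading the whole file; `tail_lines` is a hypothetical helper.

```python
import os


def tail_lines(path, n=200, block_size=64 * 1024):
    """Return the last n lines of a file without reading the whole file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Read blocks backwards until we hold at least n + 1 newlines (so the
        # first returned line is complete) or we reach the start of the file.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Decode leniently: debug logs may contain stray non-UTF-8 bytes.
    return [line.decode("utf-8", "replace") for line in data.splitlines()[-n:]]
```

Memory use is bounded by roughly the size of the last n lines plus one block, regardless of how large the log has grown.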
[23:37] <nwatkins`> gregaf1: either way, here is the tail end of the mds log. it definitely includes at least one instance of the test in which the client received connection refused. http://pastebin.com/jK76X2hd
[23:38] * fronlius (~fronlius@e182093240.adsl.alicedsl.de) Quit (Quit: fronlius)
[23:39] <gregaf1> nwatkins: oh, no messenger debug on the mds? :(
[23:39] <nwatkins`> ahh
[23:40] <nwatkins`> yikes, no
[23:40] <gregaf1> I don't think I can do much without that, since the client never actually makes it to the MDS code
[23:41] <gregaf1> the clocks are off too, though I presume that's just because they're off on the hardware
[23:41] <nwatkins`> shoot, i restarted ceph too and the problem is gone. i believe that this is the same symptom that I saw last week, and when i bumped up the logs, the MDS was complaining that there were not any available file descriptors
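Running out of file descriptors means the process hit its `RLIMIT_NOFILE` limit. For a daemon such as the MDS, `/proc/<pid>/limits` and a count of `/proc/<pid>/fd` show how close it is, and the limit can be raised with `ulimit -n` in the shell that starts it. A minimal Python sketch of the same check for the current process, using the standard `resource` module; counting via `/proc/self/fd` is Linux-specific, and `fd_usage` is a hypothetical helper.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    Linux-specific: counts entries in /proc/self/fd, which includes the
    descriptor os.listdir itself opens while reading the directory.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


open_fds, soft, hard = fd_usage()
print("open fds: %d, soft limit: %s, hard limit: %s" % (open_fds, soft, hard))
```

A process fails with EMFILE once its open descriptors reach the soft limit, so that is the number to compare against.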
[23:42] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[23:49] * The_Bishop (~bishop@port-92-206-45-45.dynamic.qsc.de) Quit (Quit: Who the hell is this peer? If I catch him, I'll reset his connection!)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.