#ceph IRC Log


IRC Log for 2012-05-17

Timestamps are in GMT/BST.

[7:00] * CephLogBot (~PircBot@rockbox.widodh.nl) has joined #ceph
[10:17] <chuanyu> hi everyone, does someone meet "mount error 12 = Cannot allocate memory" error when mounting ceph ?
[10:20] <chuanyu> ceph health is ok, and no error log appear (level 2)
[10:22] <chuanyu> I have 3 mon, and no matter what mon I mount, just "mount error 12" after normal msg
[10:23] <chuanyu> I use ceph 0.46
[14:13] <darkfader> does any of you know how to identify if a dm device is a lvm devices?
[14:13] <darkfader> i need to make something lightweight to do per vg-stats
[14:13] <darkfader> err per lv
[14:14] <darkfader> and dmsetup table can't help me because lvm is just the target "linear"
[14:14] <darkfader> not something like "lvm"
[14:15] <darkfader> i can use 'lvs' and then grep wildly :)
[15:20] <benner> darkfader: try ls -la /dev/mapper/
[15:21] <benner> and dmsetup ls
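One lightweight way to answer darkfader's question, offered as a sketch rather than anything proposed in the channel: device-mapper exposes each device's UUID in sysfs, and LVM-managed devices carry a UUID starting with "LVM-" even though their table target is just "linear". The helper names `is_lvm_uuid` and `lvm_dm_devices` are hypothetical. Note that hidden LVM sub-devices (for example snapshot internals) would presumably match the prefix too.

```python
from pathlib import Path


def is_lvm_uuid(dm_uuid: str) -> bool:
    """Return True if a device-mapper UUID looks like it was created by LVM."""
    return dm_uuid.strip().startswith("LVM-")


def lvm_dm_devices(sys_block: str = "/sys/block") -> dict:
    """Map kernel names (dm-0, dm-1, ...) to dm names for LVM-owned devices.

    Reads <sys_block>/dm-*/dm/uuid and <sys_block>/dm-*/dm/name; devices
    whose files cannot be read are skipped.
    """
    found = {}
    for dev in sorted(Path(sys_block).glob("dm-*")):
        try:
            uuid = (dev / "dm" / "uuid").read_text().strip()
            name = (dev / "dm" / "name").read_text().strip()
        except OSError:
            continue  # device vanished or attribute missing
        if is_lvm_uuid(uuid):
            found[dev.name] = name
    return found


if __name__ == "__main__":
    for kernel_name, dm_name in lvm_dm_devices().items():
        print(kernel_name, dm_name)
```

The kernel names returned (dm-N) are the same ones that appear in /proc/diskstats, which may make per-LV statistics easier than grepping `lvs` output.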
[15:27] <chuanyu> anybody can help me figure out "mount error 12 = Cannot allocate memory" ?
[15:28] <chuanyu> i try to strace mount.ceph ... , but just mount("...") = -1 ENOMEM (Cannot allocate memory)
[18:40] <Tv_> sagewk: fyi SpamapS crashed ceph-mon last night, I just sent email to ceph-devel ("pushed it to a crash")
[18:40] <sagewk> k
[19:10] <Tv_> ruh-roh: http://tracker.newdream.net/issues/2443
[19:10] <Tv_> that's some mighty big stink there
[19:10] <gregaf> it might be the x cap that allows that, let me see if I can figure it out
[19:11] <gregaf> (not counting on it though)
[19:11] <Tv_> gregaf: thanks.. check against generate_caps in teuthology, which is the best understanding of what's meant to be used where that i've ever heard from anyone
[19:12] <Tv_> actually no the teuthology one is too wide
[19:12] <Tv_> the one i have in the ticket is already narrower
[19:14] <gregaf> heehee, there are no cap checks anywhere in the path to the AuthMonitor
[20:06] * BManojlovic (~steki@ has joined #ceph
[20:13] <MarkDude> rturk, hello. You were at UDS? I guess I missed you
[20:13] <MarkDude> Where are you based now?
[20:18] <rturk> Hi! Yes, I was at UDS but only on Monday
[20:18] <rturk> Tuesday I was at their cloud summit in the morning, then had to head out
[20:21] <rturk> I'm based in LA
[22:03] <SpamapS> Tv_: is there a different branch/repo I should try? Is it possible that crash is unique to the 'chef-2' branch?
[22:05] <gregaf> SpamapS: Tv_ went out to run some errands, but it's not a branch-specific problem
[22:05] <gregaf> assuming you're talking about the crash that he forwarded to ceph-devel this morning
[22:08] <SpamapS> gregaf: indeed I am. Ok thats good to know.
[22:25] <yehudasa> Tv_: I've got some tcpdump I need help to decipher
[22:41] <sagewk> spamaps: looking into it now
[23:01] <sagewk> spamaps: there?
[23:09] <nhm> sagewk: so I'm thinking I want to take a look at 1 OSD vs 5 OSDs per node on the ssd nodes, and then walk through and look at how long the OSDs are spending at each stage in the pipeline for each run and note the differences. Seem like a reasonable way to go to investigate the bottleneck?
[23:09] <SpamapS> sagewk: here now.
[23:10] <SpamapS> sagewk: I have the boxes that are affected running
[23:10] <SpamapS> sagewk: and can pretty easily (I think) cause it to happen
[23:11] <nhm> sagewk: alternatively, I could start out by just looking at the seekwatcher results for each and see if there is anything odd looking.
[23:11] <sagewk> nhm: i doubt seekwatcher will tell us much if its osds
[23:11] <sagewk> looking at latencies seems like the right way forward
[23:11] <nhm> sagewk: yeah, that was kind of my thought too.
[23:12] <sagewk> spamaps: question about how you're doing this.. the first mon forms its initial quorum just with itself, right?
[23:12] <sagewk> and each additional mon gets a monmap with the first guy + itself?
[23:12] <SpamapS> sagewk: I believe so. Let me pastebin the whole log
[23:13] <sagewk> spamaps: which branch are you working against?
[23:13] <SpamapS> sagewk: here is /var/log/ceph/ceph-mon.cmon-0.log http://paste.ubuntu.com/993124/
[23:13] <SpamapS> sagewk: deb http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/chef-2 precise main
[23:14] <sagewk> k
[23:14] <sagewk> can you reproduce with 'debug mon = 20' in the [mon] section of ceph.conf?
[23:14] <SpamapS> sure
[23:16] <SpamapS> sagewk: also, the monmap we're creating does not have the first guy + itself.. I believe we had to put in some workaround to avoid a bug in doing it that way, and use public network = x.x.x.0/24
[23:17] <SpamapS> sagewk: so each box gets one created just with monmaptool --create --clobber --add cmon-0 $address_of_cmon0 ...
[23:22] <dmick> we need to commission Ceph art from this guy: http://www.theatlanticcities.com/arts-and-lifestyle/2012/05/heart-stopping-art-day-giant-inflatable-tentacles/2032/
[23:22] <sagewk> spamaps: ah, let me test that scenario
[23:24] <rturk> dmick: yes... perhaps we can do it to the Aon building
[23:24] <elder> HUGE ones on the AON building.
[23:25] <dmick> I vote yes
[23:25] <rturk> I guess if we all vote "yes" that means we need to do it
[23:25] <elder> Yes
[23:25] <rturk> I want to see a helicopter stuck in it
[23:28] <gregaf> I shudder to think what Aon would charge to allow it
[23:29] <SpamapS> just guerilla them into the subway entrances. Its not like anybody would complain.. :)
[23:30] <nhm> gregaf: maybe work the publicity angle? "You too could have a law office in this building..."
[23:30] <SpamapS> "Send a message to potential clients.. We are in fact, SLIMY"
[23:31] <sagewk> spamaps: hmm, my simple test doesn't reproduce.. any luck with generating a log?
[23:31] <SpamapS> sagewk: its crashing now :)
[23:31] <elder> sagewk, we have more important things to talk about.
[23:32] <sagewk> spamaps: http://fpaste.org/piOr/
[23:32] <sagewk> elder: skype?
[23:32] <elder> No, I mean we need to talk about how to get tentacles mounted on the Aon building.
[23:32] <elder> That sort of important business.
[23:33] <sagewk> heh right :)
[23:33] <SpamapS> sagewk: http://paste.ubuntu.com/993157/
[23:33] <SpamapS> sagewk: log w/ debug
[23:33] <Tv_> SpamapS: i was preparing a chef-3 but tried to bundle a little more features into it, and ran into issues of my own
[23:34] <SpamapS> sagewk: does not seem to crash until the second and third machines connect
[23:34] <Tv_> sagewk, SpamapS: secondary mons get a monmap with just the first mon, to avoid the other bug..
[23:35] <Tv_> #2436
[23:35] <Tv_> and use public_network to figure out what their own ip is
[23:37] <SpamapS> sagewk: do you want the other logs from cmon-debug-1 and -2 ?
[23:37] <sagewk> oh i see what's going on.
[23:38] <sagewk> i think my wip branch already fixes this. tv_, what should i rebase on?
[23:38] <sagewk> chef-2?
[23:38] <Tv_> sagewk: SpamapS is running on chef-2, i was making a chef-3 based on wip-upstart etc in master but got sidetracked
[23:40] <sagewk> k. spamaps: pushed wip-quorum branch that includes chef-2 stuff. packages will take a bit to build..
[23:41] <SpamapS> sagewk: cool, I'll set it aside for a bit.. I'm sure I can find *something* else to do. :)
[23:44] <Tv_> SpamapS: the apt url you have, you can s/chef-2/any branch name/, and you can eyeball the url to see if it's built yet..
[23:46] <SpamapS> Tv_: man what a pain, I have to hit *refresh*? ;)
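Tv_'s branch substitution can be sketched as a small helper. It assumes the URL layout is exactly the one SpamapS pasted at 23:13 (base path, "precise main" suffix) and that any branch name can stand in for "chef-2"; the function name `apt_source_line` is hypothetical, and nothing here checks whether the packages have been built yet.

```python
# Base URL copied from the apt line pasted earlier in the log.
GITBUILDER_REF_BASE = "http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref"


def apt_source_line(branch: str, dist: str = "precise", component: str = "main") -> str:
    """Build a sources.list entry for a given gitbuilder branch."""
    if not branch or "/" in branch or " " in branch:
        raise ValueError("branch must be a single non-empty path segment")
    return f"deb {GITBUILDER_REF_BASE}/{branch} {dist} {component}"


if __name__ == "__main__":
    # The branch sagewk pushed at 23:40.
    print(apt_source_line("wip-quorum"))
```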
[23:51] <joshd> sagewk: was there a bug with dprune in 3.2?
[23:52] <sagewk> hmm... don't remember. what kind of bug?
[23:52] <sagewk> i think we're ignoring it by default
[23:53] <joshd> null pointer dereference reported on the mailing list with subject 'reproductible kernel oops with kernel 3.2 inside kvm'
[23:57] <sagewk> 774ac21da76f5c3018428725074e27a3fd40b128 perhaps
[23:57] <joshd> no, that was in 3.2
[23:59] <sagewk> yeah, not seeing obvious culprits
[23:59] <sagewk> i wonder if he can do it on a newer kernel too

