#ceph IRC Log

IRC Log for 2012-07-23

Timestamps are in GMT/BST.

[0:02] * BManojlovic (~steki@212.200.241.106) has joined #ceph
[0:12] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[0:16] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) has joined #ceph
[0:25] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[0:37] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) Quit (Quit: LarsFronius)
[0:40] * BManojlovic (~steki@212.200.241.106) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:44] * BManojlovic (~steki@212.200.241.106) has joined #ceph
[0:49] * nearbeer (~nik@c-75-75-33-53.hsd1.va.comcast.net) has joined #ceph
[0:59] * nearbeer (~nik@c-75-75-33-53.hsd1.va.comcast.net) Quit (Quit: Colloquy for iPad - http://colloquy.mobi)
[1:25] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:26] * BManojlovic (~steki@212.200.241.106) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:49] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:55] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[2:54] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[4:39] * deepsa (~deepsa@122.172.212.52) has joined #ceph
[4:41] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:43] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[4:45] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit ()
[4:58] * nhm (~nh@174-20-12-175.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[5:24] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[6:01] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[6:12] * zhangdongmao (~zhangdong@222.126.194.154) has joined #ceph
[6:25] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:45] * Meths_ (rift@2.25.212.98) has joined #ceph
[6:51] * Meths (rift@2.25.191.157) Quit (Ping timeout: 480 seconds)
[7:28] * zhangdongmao (~zhangdong@222.126.194.154) Quit (Quit: Lost terminal)
[7:36] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[7:44] * zhangdongmao (~zhangdong@222.126.194.154) has joined #ceph
[7:46] * Guangliang (~glzhao@222.126.194.154) has joined #ceph
[7:47] * zhangdongmao (~zhangdong@222.126.194.154) Quit ()
[7:55] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has left #ceph
[8:09] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:13] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has left #ceph
[8:19] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[8:48] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[9:05] * ryant5000 (~ryan@cpe-67-247-9-63.nyc.res.rr.com) has joined #ceph
[9:06] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Remote host closed the connection)
[9:07] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph
[9:12] <ryant5000> is there any way to get the size of a directory including the space used by snapshots?
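A minimal sketch of how one might check a directory's recursive size in CephFS, via the virtual xattrs the MDS maintains; the mount point /mnt/ceph and the directory name are placeholders, and whether snapshot space is included in these counters is exactly the open part of the question:

    # recursive byte count of everything under the directory
    getfattr -n ceph.dir.rbytes /mnt/ceph/somedir
    # recursive file and subdirectory counts
    getfattr -n ceph.dir.rfiles /mnt/ceph/somedir
    getfattr -n ceph.dir.rsubdirs /mnt/ceph/somedir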
[9:15] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) Quit (Ping timeout: 480 seconds)
[9:17] * loicd (~loic@83.167.43.235) has joined #ceph
[9:27] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:41] * Guangliang (~glzhao@222.126.194.154) has left #ceph
[9:45] <ryant5000> is it possible to use the cephfs command with a filesystem mounted by ceph-fuse?
[9:47] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[9:51] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[9:54] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:03] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:07] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[10:17] * Meths (rift@2.25.214.171) has joined #ceph
[10:18] * Meths_ (rift@2.25.212.98) Quit (Ping timeout: 480 seconds)
[10:25] * Meths (rift@2.25.214.171) Quit (Ping timeout: 480 seconds)
[10:27] * Meths (rift@2.25.214.159) has joined #ceph
[10:29] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[10:33] * Meths (rift@2.25.214.159) Quit (Remote host closed the connection)
[10:33] * Meths (rift@2.25.214.159) has joined #ceph
[10:40] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[10:41] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[10:41] * Meths (rift@2.25.214.159) Quit (Ping timeout: 480 seconds)
[11:03] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[11:03] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[11:35] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[11:38] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[11:39] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[11:56] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[11:57] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:58] * Meths (rift@2.27.73.166) has joined #ceph
[11:59] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit ()
[13:09] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[13:15] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[13:24] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[13:25] * Leseb_ (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[13:25] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[13:25] * Leseb_ is now known as Leseb
[13:28] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[13:33] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:44] * joao (~JL@89.181.148.137) has joined #ceph
[13:46] <joao> hello #ceph
[13:52] * loicd (~loic@83.167.43.235) Quit (Quit: Leaving.)
[14:06] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[14:09] * newtontm (~jsfrerot@charlie.mdc.gameloft.com) has joined #ceph
[14:09] <newtontm> hi,
[14:17] <newtontm> i'm trying to figure out the *best* way to have different mount points from my ceph fs. I've read about pools, but the doc is not very explicit. What would be your recommendation?
[14:19] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[14:21] * loicd (~loic@83.167.43.235) has joined #ceph
[14:33] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[14:34] <newtontm> also, what is the minimum block size for cephfs? let's say I have a file that contains 1 byte of data; how much space will it take in ceph?
[15:07] <joao> newtontm, regarding your last question, although not 100% certain, I would say it would take as much space as the block size of the osd's file system
[15:07] <joao> plus any space taken by xattrs
[15:08] <joao> but, as I said, not certain :)
[15:08] <joao> if you really need specifics, maybe some of the guys could provide them, once they arrive
[15:08] <joao> (which should happen in 3 hours or so)
[15:18] <newtontm> joao: thx, I'll wait for confirmation from the guys
[15:21] <elder> My problems with teuthology have ceased. Turns out one of my machines had crashed. Duh.
[15:27] * ryant5000 (~ryan@cpe-67-247-9-63.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:03] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Connection reset by peer)
[16:16] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:41] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[16:41] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[17:07] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:20] * loicd (~loic@83.167.43.235) Quit (Quit: Leaving.)
[17:26] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:32] <newtontm> Hi again, seems there is a bug in the lock mechanism when 2 processes want to write to the same file from 2 different machines.
[17:33] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[17:35] <joao> looks like gitbuilder died
[17:36] <joao> at least, looks like gitbuilder-precise-amd64 died
[17:36] <iggy> newtontm: we've dealt with that with panasas, it's usually the application at fault
[17:38] <newtontm> i'm filing a bug report; i'm pretty sure it's not the application, but if you feel we need to discuss this first, I have no problem with that
[17:40] <iggy> I'm not a ceph dev, so I can't really speak to that... I was merely relaying issues we've had on other similar storage systems
[17:40] * loicd (~loic@magenta.dachary.org) has joined #ceph
[17:53] * joebaker (~joebaker@64.245.0.3) has joined #ceph
[17:55] * Tv_ (~tv@2607:f298:a:607:9cb9:bfb8:f8d0:361c) has joined #ceph
[17:57] <newtontm> http://tracker.newdream.net/issues/2825 created
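For anyone wanting to reproduce the kind of cross-host locking test at issue, a rough sketch using flock(1) follows; the path is a placeholder, and note that flock takes BSD locks, so an fcntl-specific test would need a small program instead:

    # machine A: take an exclusive lock on a shared file and hold it
    flock -x /mnt/ceph/locktest -c 'echo "A holds the lock"; sleep 60'
    # machine B, started while A holds the lock: should block until A releases
    flock -x /mnt/ceph/locktest -c 'echo "B got the lock"'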
[18:00] <iggy> someone might want to push the debian maintainer (of both ceph and leveldb) a bit; the packages are close to being dropped before the freeze
[18:03] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[18:05] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[18:11] * gregaf (~Adium@2607:f298:a:607:e987:dc8f:9b8a:10bd) Quit (Quit: Leaving.)
[18:12] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) Quit (Quit: Leaving.)
[18:13] * gregaf (~Adium@2607:f298:a:607:ed43:f4ec:1f6c:43d8) has joined #ceph
[18:14] * mkampe (~markk@2607:f298:a:607:222:19ff:fe31:b5d3) has joined #ceph
[18:15] * nhm (~nh@65-128-165-73.mpls.qwest.net) has joined #ceph
[18:20] <gregaf> newtontm: that's odd... the fcntl locks have actually been used before. Could you add the versions you're using to the bug report?
[18:22] * Cube (~Adium@12.248.40.138) has joined #ceph
[18:22] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Remote host closed the connection)
[18:22] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[18:23] <dspano> I stopped one of two OSDs, did an update, and then when the two try to peer, I'm getting errors like this. Has anyone run into this before?
[18:23] <dspano> 2 slow requests, 1 included below; oldest blocked for > 41.237050 secs
[18:24] <dspano> The cluster locks until I shut the node down.
[18:25] <gregaf> dspano: that output means that your OSDs are being slow at handling some requests, which isn't good but isn't necessarily a problem
[18:25] <gregaf> if the requests aren't eventually being satisfied, you should look at the node and see if there's output in dmesg or something about the local filesystem
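A few generic first checks for slow requests along the lines gregaf suggests, assuming a default install; the OSD id and log path are placeholders and may differ per version/config:

    ceph -s                               # overall cluster and pg state
    ceph osd tree                         # which OSDs are up/in
    dmesg | tail -n 50                    # local disk/filesystem errors on the OSD node
    tail -f /var/log/ceph/ceph-osd.0.log  # watch the slow OSD's log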
[18:25] <dspano> gregaf: That's bizarre.
[18:30] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[18:31] <Tv_> FYI: gitbuilders will be having intermittent downtime today, migrating them to newer hardware
[18:32] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[18:37] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:50] * glowell (~glowell@141.142.134.93) has joined #ceph
[18:55] <dspano> gregaf: When I take the one node out, the surviving one has this status. Is there any way to clear this out? pgmap v368371: 400 pgs: 400 active+degraded; 27777 MB data, 28952 MB used, 4436 GB / 4464 GB avail; 5808/11616 degraded (50.000%)
[18:55] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:01] <gregaf> dspano: that status (degraded) means that the data doesn't have as many copies as it should; you'll need to either change your CRUSH map to say one copy, or get another OSD working
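For reference, one way to get to a single copy is via the pool replication size setting (which is in fact what dspano reports doing below); a sketch, assuming the default "data" and "metadata" pools:

    ceph osd pool set data size 1
    ceph osd pool set metadata size 1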
[19:01] <newtontm> gregaf: it's in the bug report version 0.48
[19:10] * chutzpah (~chutz@100.42.98.5) has joined #ceph
[19:15] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[19:18] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[19:25] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:27] * ninkotech (~duplo@89.177.137.231) Quit (Remote host closed the connection)
[19:27] * deepsa_ (~deepsa@122.172.21.41) has joined #ceph
[19:29] * deepsa (~deepsa@122.172.212.52) Quit (Ping timeout: 480 seconds)
[19:29] * deepsa_ is now known as deepsa
[19:33] <dspano> gregaf: setting size 1 and removing the osd from the crushmap fixed it.
[19:34] <gregaf> of course now you only have one copy of everything ;)
[19:40] * BManojlovic (~steki@212.200.241.106) has joined #ceph
[19:51] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) Quit (Quit: adjohn)
[19:56] * Ryan_Lane (~Adium@216.38.130.164) has joined #ceph
[19:56] * Ryan_Lane (~Adium@216.38.130.164) Quit ()
[19:56] * Ryan_Lane (~Adium@216.38.130.164) has joined #ceph
[20:04] * mib_6m33m8 (c5cd2ebb@ircip2.mibbit.com) has joined #ceph
[20:05] * nymous (~darthampe@95-86-252-224.pppoe.yaroslavl.ru) has joined #ceph
[20:07] * mib_6m33m8 (c5cd2ebb@ircip2.mibbit.com) Quit ()
[20:09] * nymous (~darthampe@95-86-252-224.pppoe.yaroslavl.ru) Quit (Quit: RAGING AXE! RAGING AXE!)
[20:23] * cephalobot (~ceph@ps94005.dreamhost.com) Quit (Remote host closed the connection)
[20:23] * cephalobot (~ceph@ps94005.dreamhost.com) has joined #ceph
[20:24] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) has joined #ceph
[20:24] * cephalobot (~ceph@ps94005.dreamhost.com) Quit (Remote host closed the connection)
[20:25] * cephalobot (~ceph@ps94005.dreamhost.com) has joined #ceph
[20:26] * cephalobot (~ceph@ps94005.dreamhost.com) Quit (Remote host closed the connection)
[20:26] * cephalobot (~ceph@ps94005.dreamhost.com) has joined #ceph
[20:31] <newtontm> i'm trying to figure out the *best* way to have different mount points from my ceph fs, I've read about pools but the doc is not very explicit. What would be your recommandation?
[20:31] <newtontm> also, what is the minimum block size for cephfs, let's say I have a file which contains 1 byte of data, how much space will it take in ceph?
[20:42] <gregaf> newtontm: you need to define "different mount points" better... you can mount the Ceph filesystem in arbitrary directories
[20:42] <gregaf> ceph doesn't do minimum block sizes on its own; it just stores the data, so Joao is correct: it's dependent on the OSD's filesystem
[20:42] <gregaf> back later, lunchtime!
[20:44] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:46] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:47] <newtontm> gregaf: what do you mean by 'you need to define different mount points better' ?
[20:49] <newtontm> I guess what I mean is that ceph has a default pool named "data", and when I mount this pool I can mount it anywhere I want. Let's say I want to separate the data on the server: would I need to create a different pool, or should I create folders directly in the main "data" pool and then mount those directories as I see fit?
[20:53] <iggy> newtontm: you can only have one namespace in a ceph cluster, pool layout is personal preference/needs
[20:54] <iggy> newtontm: when I've set up ceph, I've just had 1 pool and then subdirs in the root (/) of the namespace, then mounted those subdirs where I need them
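A sketch of the single-namespace layout iggy describes, with subtrees mounted separately; the monitor address and directory names are placeholders:

    # kernel client: mount only the /projects subtree of the namespace
    # (add -o name=...,secret=... if cephx auth is enabled)
    mount -t ceph 192.168.3.11:6789:/projects /mnt/projects
    # ceph-fuse equivalent: -r picks the subtree to treat as root
    ceph-fuse -r /projects /mnt/projects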
[20:56] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:58] * loicd (~loic@magenta.dachary.org) has joined #ceph
[21:06] <rturk> If anyone is interested in seeing what happens when you run this channel's log into superseriousstats, look here: http://dreamy.inktank.com/irc/
[21:07] <elder> I guess I'm talkative.
[21:07] <elder> Who knew?
[21:08] <rturk> hah - I guess you do kind of win in a lot of categories
[21:08] <nhm> rturk: man, I clearly need to talk more.
[21:08] <elder> Well, I'll just say something each time you do, to preserve my pole position.
[21:08] <rturk> I wouldn't read too much into it.
[21:09] <rturk> I like how "ceph" is the fourth most common four-letter word
[21:09] <nhm> elder: that's fine, I'm ok with 2nd place. ;)
[21:09] <elder> There. Now I'm still in first.
[21:09] <rturk> elder: lol!
[21:09] <nhm> ooh, I'm 3rd for most tedious chatter
[21:10] <elder> But first for exclamations. (Still in first)
[21:10] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[21:10] <rturk> ya I don't really know what tedious means in this context
[21:10] <nhm> rturk: characters/line
[21:10] <nhm> rturk: So
[21:10] <nhm> I
[21:11] <nhm> Need
[21:11] <elder> And fluent?
[21:11] <nhm> To
[21:11] <nhm> do
[21:11] <nhm> this
[21:11] <elder> a
[21:11] <elder> b
[21:11] <elder> c
[21:11] <elder> d
[21:11] <elder> e
[21:11] <elder> f
[21:11] <elder> g
[21:11] <elder> !!!
[21:11] <rturk> I'm starting to regret this
[21:11] <nhm> joao: you are super moody
[21:12] <rturk> ok! lunch :)
[21:12] <joao> I am?
[21:12] <rturk> gotta go eat a penance salad
[21:12] <nhm> joao: yes, rturk's irc chatbot statistics prove it. ;)
[21:13] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Read error: Connection reset by peer)
[21:13] <joao> looks like I'm the third most talkative person, although I have no idea how or why
[21:14] <joao> what are the criteria for the "moodiest people"? :p
[21:14] <joao> oh
[21:14] <nhm> sagewk: apparently the number of emoticons you use, i.e. ":p"
[21:14] <joao> smilies
[21:14] <joao> ok
[21:15] * adjohn (~adjohn@50-0-133-101.dsl.static.sonic.net) has joined #ceph
[21:15] <joao> I knew the smilies were good for something; now I know what they're good for
[21:16] <nhm> ooh, performance is the #1 11 character word
[21:28] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:28] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:33] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[21:36] <elder> Going away for a few hours. Back online for most of the evening.
[21:44] * steki-BLAH (~steki@bojanka.net) has joined #ceph
[21:48] * BManojlovic (~steki@212.200.241.106) Quit (Ping timeout: 480 seconds)
[21:52] <dspano> No idea why yet, but doing the kernel upgrade for Ubuntu 12.04 that just came out hosed one of my OSDs. Rolling back to 3.2.0-26-generic fixed it.
[21:59] <dspano> I forgot to preface that: it's just a heads-up for anyone running Ubuntu 12.04
[22:05] <gregaf> newtontm: you don't mount pools; rather, files are stored in pools, and you mount some portion of the filesystem hierarchy
[22:06] <gregaf> you can set a different default pool for (newly-created) files on any directory, and all children will use that pool (if they don't have their own setting)
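A hedged sketch of setting a per-directory default pool with the cephfs(8) tool from this era of Ceph; the pool id 3 and the path are placeholders, and older versions of the tool may insist that the other layout fields be given explicitly alongside the pool:

    cephfs /mnt/ceph/fastdir show_layout
    cephfs /mnt/ceph/fastdir set_layout -p 3 -u 4194304 -c 1 -s 4194304

Per gregaf's note above, only newly created files under the directory pick up the layout; existing files keep the pool they were written to.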
[22:12] * MarkDude (~MT@c-67-170-181-8.hsd1.or.comcast.net) has joined #ceph
[22:27] * BManojlovic (~steki@212.200.241.106) has joined #ceph
[22:30] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:31] * steki-BLAH (~steki@bojanka.net) Quit (Ping timeout: 480 seconds)
[22:43] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[22:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:47] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:54] * Tamil (~Adium@2607:f298:a:607:d14c:28c:44ac:9eaa) has joined #ceph
[22:57] * LarsFronius (~LarsFroni@95-91-243-240-dynip.superkabel.de) has joined #ceph
[22:57] <dspano> Actually, upgrading the kernel took out both nodes, just at different times. I didn't receive any errors in the kernel, dmesg, or ceph logs to say why. They would both come up, but would get stuck on peering.
[22:59] * MarkDude (~MT@c-67-170-181-8.hsd1.or.comcast.net) Quit (Read error: Connection reset by peer)
[23:00] <dspano> Positive thing is that although it locked up, the cephfs I had mounted on a client server did not crash.
[23:14] * Tamil (~Adium@2607:f298:a:607:d14c:28c:44ac:9eaa) has left #ceph
[23:19] * joebaker (~joebaker@64.245.0.3) Quit (Remote host closed the connection)
[23:27] <dspano> When I run the newer kernel, this is the error I'm getting. 0 -- 192.168.3.12:6802/2277 >> 192.168.3.11:6802/2440 pipe(0x85a7500 sd=35 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state 1
[23:27] <dspano> 2012-07-23 17:26:25.822043 7f6cdb974700 0 -- 192.168.3.12:6802/2277 >> 192.168.3.11:6802/2440 pipe(0x88c5500 sd=32 pgs=0 cs=0 l=0).connect got RESETSESSION but no longer connecting
[23:28] <dspano> Should I submit it to the tracker?
[23:28] * MarkDude (~MT@c-67-170-181-8.hsd1.or.comcast.net) has joined #ceph
[23:31] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[23:31] <gregaf> sagewk: comments on wip-msgr-master (not that it had much to do with the messenger ;) )
[23:31] <gregaf> dspano: that's out of the OSD log, after upgrading the kernel?
[23:31] <gregaf> did you upgrade the userspace stuff too?
[23:32] <dspano> gregaf: Yeah. That's from the OSD that's up when the other one with the upgraded kernel starts.
[23:32] <gregaf> and that's an odd error to be seeing???what version of ceph-osd are you running?
[23:32] <dspano> 0.48argonaut
[23:32] <gregaf> what's "ceph -s" display?
[23:34] <dspano> gregaf: I don't have the ceph -s output; here's what ceph -w says.
[23:34] <dspano> 2012-07-23 17:31:16.081494 mon.0 [INF] pgmap v369369: 400 pgs: 27 active+clean, 183 peering, 187 active+degraded, 3 active+recovering; 27775 MB data, 56795 MB used, 4409 GB / 4464 GB avail; 2322/11616 degraded (19.990%)
[23:34] <dspano> It will just hang there.
[23:34] <gregaf> you should be able to run "ceph -s" from any node with monitor access
[23:35] <gregaf> same as ceph -w
[23:37] <dspano> I've got to break it again. Hang on. Lol.
[23:37] <gregaf> wait, it's not broken now?
[23:37] <gregaf> what's it say now?
[23:37] <gregaf> (then break it and see what it says again)
[23:38] <dspano> I rolled back the kernel to fix it before you replied.
[23:38] <gregaf> and that actually did fix it?
[23:38] <gregaf> you have both OSDs up and working fine now?
[23:42] <dspano> gregaf: Yeah, I just broke it again, and here's what I get with the newer kernel.
[23:42] <dspano> root@ha2:~# ceph -s
[23:42] <dspano> health HEALTH_WARN 400 pgs peering; 185 pgs stuck inactive; 395 pgs stuck unclean
[23:42] <dspano> monmap e1: 3 mons at {0=192.168.3.11:6789/0,1=192.168.3.12:6789/0,2=192.168.1.64:6789/0}, election epoch 118, quorum 0,1,2 0,1,2
[23:42] <dspano> osdmap e1451: 2 osds: 2 up, 2 in
[23:42] <dspano> pgmap v369519: 400 pgs: 400 peering; 27776 MB data, 56799 MB used, 4409 GB / 4464 GB avail
[23:42] <dspano> mdsmap e240: 1/1/1 up {0=2=up:active}, 2 up:standby
[23:42] <dspano> These are the packages Ubuntu released today or over the weekend.
[23:42] <dspano> The following NEW packages will be installed:
[23:42] <dspano> linux-image-3.2.0-27-generic
[23:42] <dspano> The following packages will be upgraded:
[23:42] <dspano> linux-generic linux-image-generic
[23:44] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:45] <gregaf> okay, so both the OSDs are up and communicating with the monitors, but they apparently can't talk to each other?
[23:45] <gregaf> can you check and see if that same error message has popped up again?
[23:47] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:51] <lurbs> FWIW, my test cluster seems okay with the new 3.2.0-27 kernel. Also 12.04 and 0.48.
[23:51] <gregaf> yeah, I don't think this can really be a kernel thing
[23:53] <dspano> Yeah this time it's something different.
[23:53] <dspano> 2012-07-23 17:42:05.846884 7f6cdb570700 1 CephxAuthorizeHandler::verify_authorizer isvalid=1
[23:53] <dspano> 2012-07-23 17:42:17.945702 7f6ce9492700 0 log [WRN] : 5 slow requests, 2 included below; oldest blocked for > 75.403134 secs
[23:53] <dspano> 2012-07-23 17:42:17.945727 7f6ce9492700 0 log [WRN] : slow request 60.403225 seconds old, received at 2012-07-23 17:41:17.542385: osd_op(client.8478.1:1785 10000029839.00000000 [write 0~35251 [1@-1]] 0.bb35e594 snapc 1=[]) currently delayed
[23:53] <gregaf> okay, that means the OSD is moving slowly, but it might be making progress
[23:54] <gregaf> let it run for a while and see if it gets better
[23:56] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[23:56] * MarkDude (~MT@c-67-170-181-8.hsd1.or.comcast.net) Quit (Read error: Connection reset by peer)
[23:57] * MarkDude (~MT@c-67-170-181-8.hsd1.or.comcast.net) has joined #ceph
[23:59] * s[X] (~sX]@eth589.qld.adsl.internode.on.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.