#ceph IRC Log


IRC Log for 2011-11-30

Timestamps are in GMT/BST.

[0:00] <grape> have a good night
[0:00] <NaioN> i advise looking at the wiki and beginning with as plain a config as possible
[0:00] <NaioN> np
[0:00] * aliguori (~anthony@ Quit (Remote host closed the connection)
[0:09] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[0:20] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[0:27] * adjohn (~adjohn@ has joined #ceph
[0:28] * adjohn is now known as Guest18753
[0:28] * _adjohn (~adjohn@ has joined #ceph
[0:28] * Guest18753 (~adjohn@ Quit (Read error: Connection reset by peer)
[0:28] * _adjohn is now known as adjohn
[0:37] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[0:42] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[0:44] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[0:50] * adjohn is now known as Guest18754
[0:50] * _adjohn (~adjohn@ has joined #ceph
[0:50] * _adjohn is now known as adjohn
[0:50] * Guest18754 (~adjohn@ Quit (Ping timeout: 480 seconds)
[1:24] <Tv> pmjdebruijn: still there?
[1:39] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:43] <Tv> pmjdebruijn: for the logs, your problem was likely fixed by 77a62fdce4afb305d5314590c02325b1b221c93f
[2:04] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[2:05] * greglap (~Adium@aon.hq.newdream.net) Quit ()
[2:08] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:23] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[2:35] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[3:01] * adjohn (~adjohn@ Quit (Quit: adjohn)
[3:04] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[3:52] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[3:52] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit ()
[4:07] * NightDog (~karl@52.84-48-58.nextgentel.com) Quit (Quit: Leaving)
[4:16] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[4:16] * NightDog (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[4:17] * NightDog (~karl@52.84-48-58.nextgentel.com) Quit ()
[4:17] * NightDog (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[4:56] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[4:59] * NightDog (~karl@52.84-48-58.nextgentel.com) Quit (Quit: This computer has gone to sleep)
[5:28] * elder (~elder@aon.hq.newdream.net) Quit (Quit: Leaving)
[6:00] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[6:02] * sjustlaptop (~sam@96-41-121-194.dhcp.mtpk.ca.charter.com) has joined #ceph
[6:07] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Read error: Operation timed out)
[6:10] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:30] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[6:46] * sjustlaptop (~sam@96-41-121-194.dhcp.mtpk.ca.charter.com) Quit (Read error: Operation timed out)
[7:26] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[8:40] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[9:22] * fronlius (~fronlius@testing78.jimdo-server.com) has joined #ceph
[9:43] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[10:37] * yoshi (~yoshi@p9224-ipngn1601marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:53] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[14:07] * gregorg_taf (~Greg@ has joined #ceph
[14:07] * gregorg (~Greg@ Quit (Read error: Connection reset by peer)
[14:08] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Remote host closed the connection)
[14:12] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[14:41] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[16:08] * NightDog (~karl@52.84-48-58.nextgentel.com) has joined #ceph
[16:48] * MarkDude (~MT@ has joined #ceph
[16:54] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) has joined #ceph
[16:56] * gregorg (~Greg@ has joined #ceph
[16:56] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[17:10] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:11] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:13] * Olivier_bzh (~langella@xunil.moulon.inra.fr) Quit (Quit: Leaving.)
[17:20] * fronlius (~fronlius@testing78.jimdo-server.com) Quit (Quit: fronlius)
[17:34] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:01] * fronlius (~fronlius@e176057253.adsl.alicedsl.de) has joined #ceph
[18:13] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[18:52] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:03] <wido> Hi
[19:04] <sjust> hi
[19:04] <wido> I just created a pool with a 'garbage' name, my char did not have a pool name, so I have two pools now with a weird name
[19:04] <wido> just the memory contents of that char were used for the pool name
[19:05] <wido> there is no way to remove a pool by its ID, is there?
[19:05] <wido> instead of the name
[19:05] <sjust> hmm
[19:05] * bchrisman (~Adium@ has joined #ceph
[19:05] <wido> "`y���Dއ"
[19:05] <wido> that is a pool name for example
[19:06] <Tv> wido: the api bindings should be able to remove it, if nothing else
[19:06] <joshd> you should be able to delete it by name still by listing pools, and deleting the weird ones you get back
[19:07] <joshd> the objecter has the function for deleting by id, but librados doesn't expose it directly
[19:07] <wido> Tv: I get it, but the name is non-ascii
[19:07] <Tv> wido: shouldn't matter to an api
[19:08] <sagewk> i wonder if we should disallow pool names that are numeric. then the mon commands (among other things) could delete either without ambiguity.
[19:08] <Tv> sagewk: numbers can always be passed in via an explicit --numeric=241
[19:08] <wido> Tv: true, but, it could be a feature request
[19:08] <wido> in the future you would see this happen more often
[19:08] <Tv> restricting to valid utf-8, that i can see
[19:09] <wido> and a sysadmin might want to remove that pool
[19:09] <Tv> but even then, you're gonna have a row of snowmen
[19:09] <sagewk> yeah. it can still be confusing if pool id 12 is called "14" or something. currently the pool name is totally unrestricted
[19:09] <Tv> wido: oh yes it should be removable with the command line tools
[19:10] <Tv> sagewk: i really don't expect people to deal with the numbers...
[19:10] <Tv> sagewk: that's like... is a completely valid dns *name*
[19:11] <Tv> that doesn't mean dns the mechanism shouldn't prevent numbers in labels
[19:11] <sagewk> except it isn't... the first char has to be a-z.
[19:11] <sagewk> right
[19:11] <wido> Tv: I was indeed able to remove it, just used phprados, listed all pools and issued a pool_delete
[19:12] <Tv> sagewk: $ORIGIN is a phone number stored in dns
[19:12] <wido> But indeed, the command-line tools should be able to do this, rados rmpool --numeric 451 ?
[19:13] <sagewk> oh right i'm thinking zone
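Editor's sketch of the workaround wido describes above (list every pool, then delete the garbage ones by the exact name that comes back). It assumes the python-rados bindings rather than the phprados bindings wido used, and the helper names `is_suspicious_pool_name`, `find_garbage_pools` and `delete_garbage_pools` are hypothetical. The heuristic for "garbage" (undecodable bytes, control characters, or the U+FFFD replacement character) is likewise an assumption, loosely following Tv's remark about restricting names to valid UTF-8. How the bindings hand back a name that is not valid UTF-8 may vary by version, so treat this as a starting point, not a definitive tool.

```python
import unicodedata


def is_suspicious_pool_name(name):
    """Heuristic (an assumption, not a Ceph rule): does this look like stray memory?"""
    if isinstance(name, bytes):
        try:
            name = name.decode("utf-8")
        except UnicodeDecodeError:
            return True  # not valid UTF-8 at all
    if not name:
        return True
    for ch in name:
        # U+FFFD is what a lossy decoder substitutes for bytes it could not decode.
        if ch == "\ufffd":
            return True
        # Unicode categories starting with "C" cover control, format,
        # surrogate, private-use and unassigned code points.
        if unicodedata.category(ch).startswith("C"):
            return True
    return False


def find_garbage_pools(pool_names):
    """Return the subset of pool names that look like garbage, in order."""
    return [n for n in pool_names if is_suspicious_pool_name(n)]


def delete_garbage_pools(conffile="/etc/ceph/ceph.conf", dry_run=True):
    """List pools via librados and delete the suspicious ones by name.

    Defaults to a dry run; review the returned list before passing dry_run=False.
    """
    import rados  # python-rados bindings; imported here so the helpers above work without it

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        doomed = find_garbage_pools(cluster.list_pools())
        for name in doomed:
            if not dry_run:
                cluster.delete_pool(name)
        return doomed
    finally:
        cluster.shutdown()
```

Calling `delete_garbage_pools()` with the default `dry_run=True` only reports what would be removed. Note that a purely numeric name such as "14" is not flagged here; the ambiguity sagewk raises between pool ids and numeric names is a separate question.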
[19:15] <Tv> are we doing standup?
[19:16] <sagewk> yeah
[19:45] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[20:17] * MarkDude (~MT@ Quit (Quit: Leaving)
[20:17] * fronlius (~fronlius@e176057253.adsl.alicedsl.de) Quit (Read error: Connection reset by peer)
[20:18] * fronlius (~fronlius@e176057253.adsl.alicedsl.de) has joined #ceph
[20:43] <grape> I built a fresh VM cluster this morning and was able to get ceph running, which is always nice. I found that if I only use one mon node, the daemons start and the cluster runs. If I use the planned three mon nodes, the init process on all three nodes fails with the following message:
[20:43] <grape> unable to read magic from mon data.. did you run mkcephfs?
[20:43] <grape> failed: ' /usr/bin/ceph-mon -i c -c /etc/ceph/ceph.conf '
[20:43] <grape> started ceph on all the nodes
[20:43] <grape> the thing is that it isn't started on all the nodes ;-)
[20:46] <grape> oh. mon likes to remove its directory. MonitorStore::mkfs: failed to remove /srv/mon.a: rm returned run_cmd(rm): exited with status 1
[20:56] <grape> it appears that mon has a DIY streak
[21:01] <grape> It appears to like to make its own directory and doesn't take kindly to me mounting a partition for it.
[21:11] <grape> so now everything is running, most of the logs are flowing, yet the output of ceph health or ceph -s is:
[21:11] <grape> monclient(hunting): MonClient::init(): Failed to create keyring
[21:11] <grape> ceph_tool_common_init failed.
[21:12] <grape> anyone have any ideas?
[21:20] <NaioN> i think you didn't run mkcephfs right
[21:21] <NaioN> I assume you first mounted the partition under the specified dirs in the config for the mons
[21:21] <NaioN> and then a mkcephfs -a
[21:21] <NaioN> because then every mon dir on every server gets filled with the right information
[21:26] <grape> let me check into that
[21:30] <grape> i got rid of the mounts and it runs
[21:31] <NaioN> i think you did something wrong with the mounting and creating
[21:31] <grape> yeah, that fixed it
[21:32] <grape> the two osds that had strange logs are now looking normal and healthy
[21:32] <NaioN> nice
[21:32] <grape> but I still get that monclient message with ceph health
[21:33] * nhm (~mark@penguin.msi.umn.edu) has joined #ceph
[21:33] <grape> NaioN: you were right about keeping ceph.conf simple
[21:34] <NaioN> mine is as simple as it could be :)
[21:35] <NaioN> first try simple, then extend
[21:35] <NaioN> but I have a different problem, I keep triggering a BTRFS bug
[21:35] <grape> fancy ;-)
[21:36] <NaioN> and I think it gets triggered only under load
[21:48] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[21:48] <grape> NaioN: Thanks for the tip on the directories
[21:48] <grape> NaioN: everything ran perfectly this time
[21:51] <NaioN> nice
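Editor's sketch of a pre-flight check for the situation grape hit above: the mon's mkfs step tried to remove its data directory and failed when a partition was mounted directly on it. The function name `mounted_mon_dirs` is hypothetical, and of the paths only `/srv/mon.a` appears in the log; the other two are assumed by analogy. The log only shows that removing the mounts fixed grape's setup, so a flagged directory is a hint worth checking, not proof of a problem.

```python
import os


def mounted_mon_dirs(mon_data_dirs):
    """Return the mon data directories that are themselves mount points.

    Paths that do not exist are simply not reported, since
    os.path.ismount() returns False for them.
    """
    return [d for d in mon_data_dirs if os.path.ismount(d)]


if __name__ == "__main__":
    # /srv/mon.a comes from the error message in the log; mon.b and mon.c are assumed.
    candidates = ["/srv/mon.a", "/srv/mon.b", "/srv/mon.c"]
    for path in mounted_mon_dirs(candidates):
        print("warning: %s is a mount point; mon mkfs may fail to remove it" % path)
```

Running this on each monitor host before mkcephfs would have surfaced the mounted directories up front.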
[22:00] * adjohn (~adjohn@70-36-139-247.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[22:15] <sagewk> i wonder if we should maintain a public btrfs tree to collect all the patches that are specifically addressing ceph issues
[22:16] <sagewk> to at least have a central point of reference for talking about btrfs issues, and to have something to point people at if they run into problems
[22:17] <nhm> sagewk: I haven't been really keeping up, but are there a lot of issues specific to Ceph?
[22:20] * verwilst (~verwilst@dD576F6D0.access.telenet.be) has joined #ceph
[22:20] <sagewk> a few have come up recently and i've had a hard time keeping track of when they hit mainline. the ones that come to mind are josef's xattr regression fix, lxo's performance degradation after a couple days, all the metadata space reservation stuff
[22:21] <nhm> sagewk: I'm building out an openstack cloud and am thinking of using ceph for the nova-volumes storage, so I'm sure I'll run into all of this soon enough. ;)
[22:27] * sjustlaptop (~sam@aon.hq.newdream.net) has joined #ceph
[22:27] <NaioN> sagewk: wouldn't be a bad plan
[22:28] <NaioN> I'm still hitting one very annoying bug
[22:34] <sagewk> naion: which one?
[22:50] <NaioN> kernel BUG at fs/btrfs/extent-tree.c:3595!
[22:50] <NaioN> I'm communicating with Josef for that bug
[22:51] <NaioN> I applied a patch of his to get more info
[22:51] <NaioN> I'm posting it also to the btrfs mailing list
[22:51] <NaioN> I have a real simple setup: host with mon and mds and two hosts with osd (12 disks each)
[22:52] <NaioN> btrfs on both in raid10 on all disks
[22:52] <NaioN> then a ceph client host with rsync daemon
[22:52] <NaioN> ceph client mounts cephfs and exposes it with rsync daemon
[22:53] <NaioN> and then I run some rsync workloads
[22:53] <NaioN> I'm getting linerate speeds (1Gb) so that's the limiting factor
[22:54] <NaioN> and after some time one of the osds crashes with that bug and a while later the other
[22:54] <NaioN> sometimes I also get some of these messages: WARNING: at fs/btrfs/inode.c:2198 btrfs_orphan_commit_root+0xa8/0xc0
[22:54] <NaioN> That's what it started with
[23:08] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit (Quit: cp)
[23:28] * verwilst (~verwilst@dD576F6D0.access.telenet.be) Quit (Quit: Ex-Chat)
[23:28] * sjustlaptop (~sam@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[23:29] * cp (~cp@adsl-76-194-112-187.dsl.pltn13.sbcglobal.net) has joined #ceph
[23:39] * cp (~cp@adsl-76-194-112-187.dsl.pltn13.sbcglobal.net) Quit (Read error: Operation timed out)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.