[0:00] <grape> have a good night
[0:00] <NaioN> i advise to look at the wiki and begin with as plain a config as possible
[0:00] <NaioN> np
[1:24] <Tv> pmjdebruijn: still there?
[1:43] <Tv> pmjdebruijn: for the logs, your problem was likely fixed by 77a62fdce4afb305d5314590c02325b1b221c93f
[19:03] <wido> Hi
[19:04] <sjust> hi
[19:04] <wido> I just created a pool with a 'garbage' name; my char did not have a pool name, so I now have two pools with weird names
[19:04] <wido> just the memory contents of that char were used for the pool name
[19:05] <wido> there is no way to remove a pool by its ID, is there?
[19:05] <wido> instead of the name
[19:05] <sjust> hmm
[19:05] <wido> "`y���Dއ"
[19:05] <wido> that is a pool name for example
[19:06] <Tv> wido: the api bindings should be able to remove it, if nothing else
[19:06] <joshd> you should be able to delete it by name still by listing pools, and deleting the weird ones you get back
[19:07] <joshd> the objecter has the function for deleting by id, but librados doesn't expose it directly
[19:07] <wido> Tv: I get it, but the name is non-ascii
[19:07] <Tv> wido: shouldn't matter to an api
[19:08] <sagewk> i wonder if we should disallow pool names that are numeric. then the mon commands (among other things) could delete either without ambiguity.
[19:08] <Tv> sagewk: numbers can always be passed in via an explicit --numeric=241
[19:08] <wido> Tv: true, but, it could be a feature request
[19:08] <wido> in the future you would see this happen more often
[19:08] <Tv> restricting to valid utf-8, that i can see
[19:09] <wido> and a sysadmin might want to remove that pool
[19:09] <Tv> but even then, you're gonna have a row of snowmen
[19:09] <sagewk> yeah. it can still be confusing if pool id 12 is called "14" or something. currently the pool name is totally unrestricted
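A minimal sketch of the two restrictions floated in this exchange (reject purely numeric names, require valid UTF-8). It is not Ceph's actual behaviour; the function name classify_pool_name and its return labels are hypothetical, made up for illustration.

```python
def classify_pool_name(raw: bytes) -> str:
    """Classify a raw pool name as 'empty', 'invalid-utf8', 'numeric',
    'unprintable', or 'ok'. Labels are illustrative, not a Ceph API."""
    if not raw:
        return "empty"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Uninitialised memory, like wido's pool name, usually ends up here.
        return "invalid-utf8"
    if text.isascii() and text.isdigit():
        # A name such as "14" could be mistaken for a pool id.
        return "numeric"
    if not text.isprintable():
        # Valid UTF-8 can still contain control characters.
        return "unprintable"
    return "ok"
```

A name made entirely of snowmen is classified as 'ok' here, which matches Tv's point that UTF-8 validity alone does not make a name readable.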
[19:09] <Tv> wido: oh yes it should be removable with the command line tools
[19:10] <Tv> sagewk: i really don't expect people to deal with the numbers...
[19:10] <Tv> sagewk: that's like... is a completely valid dns *name*
[19:11] <Tv> that doesn't mean dns the mechanism shouldn't prevent numbers in labels
[19:11] <sagewk> except it isn't... the first char has to be a-z.
[19:11] <sagewk> right
[19:11] <wido> Tv: I was indeed able to remove it, just used phprados, listed all pools and issued a pool_delete
[19:12] <Tv> sagewk: $ORIGIN is a phone number stored in dns
[19:12] <wido> But indeed, the command-line tools should be able to do this, rados rmpool --numeric 451 ?
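The workaround wido and joshd describe (list the pools, then delete the odd ones using whatever name the API hands back) might look roughly like the sketch below. It is duck-typed: it assumes a cluster object with list_pools() and delete_pool(name) methods, names chosen to resemble the python-rados bindings (an assumption; wido used phprados). InMemoryCluster is a hypothetical stand-in so the example runs without a cluster, and whether a real binding round-trips a non-UTF-8 name is not verified here.

```python
class InMemoryCluster:
    """Hypothetical stand-in for a cluster handle; not a real binding."""

    def __init__(self, pools):
        self._pools = list(pools)

    def list_pools(self):
        # Return a copy so callers can delete while iterating.
        return list(self._pools)

    def delete_pool(self, name):
        self._pools.remove(name)


def remove_pools(cluster, should_remove):
    """Delete every pool whose name satisfies should_remove.

    Returns the names that were deleted, in listing order.
    """
    removed = []
    for name in cluster.list_pools():
        if should_remove(name):
            # Pass back exactly the name the listing returned,
            # so nobody has to type the garbage by hand.
            cluster.delete_pool(name)
            removed.append(name)
    return removed
```

The predicate is left to the caller on purpose: something like `lambda n: not n.isascii()` would catch garbage names, but it would also catch legitimate non-ASCII pool names, so reviewing the listing before deleting is safer.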
[19:13] <sagewk> oh right i'm thinking zone
[19:15] <Tv> are we doing standup?
[19:16] <sagewk> yeah
[20:43] <grape> I built a fresh VM cluster this morning and was able to get ceph running, which is always nice. I found that if I only use one mon node, the daemons start and the cluster runs. If I use the planned three mon nodes, the init process on all three nodes fails with the following message:
[20:43] <grape> unable to read magic from mon data.. did you run mkcephfs?
[20:43] <grape> failed: ' /usr/bin/ceph-mon -i c -c /etc/ceph/ceph.conf '
[20:43] <grape> started ceph on all the nodes
[20:43] <grape> the thing is that it isn't started on all the nodes ;-)
[20:46] <grape> oh. mon likes to remove its directory. MonitorStore::mkfs: failed to remove /srv/mon.a: rm returned run_cmd(rm): exited with status 1
[20:56] <grape> it appears that mon has a DIY streak
[21:01] <grape> It appears to like to make its own directory and doesn't take kindly to me mounting a partition for it.
[21:11] <grape> so now everything is running, most of the logs are flowing, yet the output of ceph health or ceph -s is:
[21:11] <grape> monclient(hunting): MonClient::init(): Failed to create keyring
[21:11] <grape> ceph_tool_common_init failed.
[21:12] <grape> anyone have any ideas?
[21:20] <NaioN> i think you didn't run mkcephfs right
[21:21] <NaioN> I assume you first mounted the partition under the specified dirs in the config for the mons
[21:21] <NaioN> and then a mkcephfs -a
[21:21] <NaioN> because then every mon dir on every server gets filled with the right information
[21:26] <grape> let me check into that
[21:30] <grape> i got rid of the mounts and it runs
[21:31] <NaioN> i think you did something wrong with the mounting and creating
[21:31] <grape> yeah, that fixed it
[21:32] <grape> the two osds that had strange logs are now looking normal and healthy
[21:32] <NaioN> nice
[21:32] <grape> but I still get that monclient message with ceph health
[21:33] <grape> NaioN: you were right about keeping ceph.conf simple
[21:34] <NaioN> mine is as simple as it could be :)
[21:35] <NaioN> first try simple, then extend
[21:35] <NaioN> but I have a different problem, I keep triggering a BTRFS bug
[21:35] <grape> fancy ;-)
[21:36] <NaioN> and I think it gets triggered only under load
[21:48] <grape> NaioN: Thanks for the tip on the directories
[21:48] <grape> NaioN: everything ran perfectly this time
[21:51] <NaioN> nice
[22:15] <sagewk> i wonder if we should maintain a public btrfs tree to collect all the patches that are specifically addressing ceph issues
[22:16] <sagewk> to at least have a central point of reference for talking about btrfs issues, and to have something to point people at if they run into problems
[22:16] <nhm> sagewk: I haven't really been keeping up, but are there a lot of issues specific to Ceph?
[22:20] <sagewk> a few have come up recently and i've had a hard time keeping track of when they hit mainline. the ones that come to mind are josef's xattr regression fix, lxo's performance degradation after a couple days, all the metadata space reservation stuff
[22:21] <nhm> sagewk: I'm building out an openstack cloud and am thinking of using ceph for the nova-volumes storage, so I'm sure I'll run into all of this soon enough. ;)
[22:27] <NaioN> sagewk: wouldn't be a bad plan
[22:28] <NaioN> I'm still hitting one very annoying bug
[22:34] <sagewk> naion: which one?
[22:50] <NaioN> kernel BUG at fs/btrfs/extent-tree.c:3595!
[22:50] <NaioN> I'm communicating with Josef for that bug
[22:51] <NaioN> I applied a patch of his to get more info
[22:51] <NaioN> I'm posting it also to the btrfs mailing list
[22:51] <NaioN> I have a real simple setup: host with mon and mds and two hosts with osd (12 disks each)
[22:52] <NaioN> btrfs on both in raid10 on all disks
[22:52] <NaioN> then a ceph client host with rsync daemon
[22:52] <NaioN> ceph client mounts cephfs and exposes it with rsync daemon
[22:53] <NaioN> and then I run some rsync workloads
[22:53] <NaioN> I'm getting linerate speeds (1Gb) so that's the limiting factor
[22:54] <NaioN> and after some time one of the osds crashes with that bug and a while later the other
[22:54] <NaioN> sometimes I also get some of these messages: WARNING: at fs/btrfs/inode.c:2198 btrfs_orphan_commit_root+0xa8/0xc0
[22:54] <NaioN> That's what it started with
