#ceph IRC Log

IRC Log for 2011-09-09

Timestamps are in GMT/BST.

[0:01] * Guest9525 (~adjohn@50.0.103.18) Quit (Read error: Operation timed out)
[0:02] * cp (~cp@206.15.24.21) has joined #ceph
[0:25] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[1:06] <Tv> joshd, gregaf: no guarantees it'll work, but it did work for me once ;) https://github.com/vishvananda/novascript.git
[1:07] <Tv> without the .git
[1:15] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[1:40] <cp> I'm following the instructions here and running into some trouble with the keyrings
[1:40] <cp> http://ceph.newdream.net/wiki/OSD_cluster_expansion/contraction
[1:41] <cp> The --mkkey flag doesn't seem to work, and I have no osd keyring so the later command to add the osd complains about a lack of a keyring
[1:41] <cp> (steps 4 and 5)
[1:43] <joshd> cp: are you using authentication (i.e. auth = cephx in your ceph.conf)?
[1:43] <joshd> if not, you don't have to worry about keyrings
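For reference, a minimal sketch of the setting joshd is referring to, as it might appear in ceph.conf of that era (the exact option name varied between early releases, so treat this as illustrative rather than authoritative):

    [global]
        ; enable cephx authentication; omit this to run without keyrings
        auth supported = cephx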
[1:56] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[1:56] <cp> joshd: I'm not using cephx, but I still haven't figured out a command which lets me add the osd
[1:57] <cp> ceph auth add osd.4 osd 'allow *' mon 'allow rwx'
[1:57] <cp> 2011-09-08 16:57:51.736342 mon <- [auth,add,osd.4,osd,allow *,mon,allow rwx]
[1:57] <cp> 2011-09-08 16:57:51.736952 mon0 -> 'error decoding keyring' (-22)
[1:58] <joshd> cp: you don't need to run that 'ceph auth add' command
[1:58] <joshd> that's only for cephx
[1:58] <joshd> for step 4, just leave off the --mkkey option
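A sketch of what step 4 might look like with the key generation dropped, assuming the osd.4 id and stock config path from this conversation (the exact flags should be checked against the wiki page and the installed version):

    cosd -c /etc/ceph/ceph.conf -i 4 --mkfs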
[1:59] * adjohn (~adjohn@50.0.103.34) Quit (Quit: adjohn)
[2:00] <cp> joshd: thanks
[2:00] <cp> Still have to use a command like this "service ceph -a start osd" to start the new osds. "service ceph start osd4" doesn't work
[2:00] <cp> New osds in the cluster! :)
[2:07] <joshd> hooray!
[2:14] <djlee> guys, im not sure why cmds goes 100% (and cosd writes very slows down) when im using lots of 1mb file, copying (using ffsb), >600,000 files
[2:15] <djlee> it happens on both ext4 and btrfs, whenever im using ffsb (initial file-copy) of large number of files, i think it happens as i get over 200,000 files, etc. cmds just goes 100% busy
[2:17] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[2:18] <greglap> djlee: are those files all in one directory?
[2:19] <djlee> yes
[2:19] <greglap> it sounds like your MDS is hitting its cache limit (I think it defaults to 100k inodes)
[2:19] <greglap> so if you have a directory larger than that everything gets very slow
[2:19] <djlee> i dont remember happening that when cmds/osd/mon were separated node before on 0.29
[2:19] <djlee> yeah
[2:19] <greglap> maybe you just didn't make such large directories?
[2:20] <greglap> it won't be the case permanently, we're working on getting directory fragmentation stable and then it will handle that situation much better
[2:20] <greglap> you could try enabling it and see if that helps your performance
[2:20] <djlee> large directory? how i used default mkfs.ext4 , and used ceph's btrfs format
[2:21] <greglap> it's a limitation in the MDS - it has a maximum number of inodes it keeps in its cache, and it always loads the entire fragment off of disk
[2:21] <djlee> ah, i see, so no way to increase that for now?
[2:21] <greglap> when the entire fragment is the entire directory, and the directory is larger than the max cache size, then it loads the entire directory off disk for every op, then throws out the inodes, then loads it for the next op
[2:22] <greglap> you can bump up the "mds cache size" parameter if you have more RAM than it's using
[2:22] <greglap> it defaults to 100k
[2:23] <greglap> there's also an option to enable fragmentation which I'm having trouble finding, give me a minute
[2:23] <greglap> if you want to live on the edge you can do that
[2:23] <djlee> i see, so a line just under [mds.0] , i set mds_cache_size = 200k
[2:23] <greglap> 200000, yeah
[2:24] <greglap> although if the cache isn't large enough to hold the entire directory it won't really help any
[2:24] <djlee> right now pretty much all of the debug options are off, just for the sake of performance (over)consideration, so can i turn on some debug that's not going to hurt performance too much?
[2:24] <djlee> the ram's 18gb
[2:25] <greglap> the logging is not very well organized, you probably don't want it
[2:25] <greglap> if you've got 18GB of ram the MDS can use you can probably give it a much bigger cache
[2:25] <greglap> I think it uses ~200MB when it's got a 100k inode cache
[2:25] <greglap> you could probably just let it use a million inodes, if that's as large as you want your dirs to get ;)
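Putting greglap's suggestion into config form, a sketch assuming the larger cache fits comfortably in the 18 GB of RAM mentioned above:

    [mds]
        ; default is 100000 inodes (roughly 200 MB of RAM); raise it if directories are larger
        mds cache size = 1000000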
[2:26] <djlee> i see, very helpful thanks..
[2:26] <djlee> so this doesnt have to do with the pgs ?
[2:26] <djlee> 0.34 is default 196 i think
[2:27] <greglap> no, nothing to do with the PGs
[2:27] <greglap> those impact OSD data placement, but the MDS doesn't care
[2:27] <djlee> i see
[2:28] <greglap> ah, there we go
[2:28] <greglap> try out the "mds bal frag" option
[2:28] <greglap> set it to true on your MDS and see if performance gets better on large dirs :)
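The fragmentation switch would sit in the same section; a sketch, with the caveat from the discussion that directory fragmentation was still being stabilized at the time:

    [mds]
        mds bal frag = true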
[2:29] <djlee> i see, oh i be using just a single dir with tons of files though..
[2:30] <djlee> another behavior i noticed is that, when i using ffsb on e.g., 12tb raid0 or just 2tb single space, just doing a very initial test (e..g, file copying, thus single-threaded writes), i don't really think it's writing "sequentially"? as the performance drops quite a lot over some time,
[2:31] <greglap> djlee: the point is that with fragmentation enabled, the MDS will fragment directories into more than one unit which it can work with
[2:31] <greglap> I'm not sure what you mean about the sequential thing with ffsb?
[2:32] <greglap> it might be that your first hundred MB go very quickly because it's all going into cache, and then it gets stuck going at network speed after that
[2:33] <djlee> right the net link is 4gb/s, i believe it is due to the preexisting mds_cache_size maxed out !
[2:33] <djlee> (iperf = 3.6gb/s)
[2:33] * yoshi (~yoshi@p15222-ipngn101marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[2:36] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) Quit (Quit: jojy)
[2:36] <gregaf> djlee: I wouldn't expect the MDS cache to impact sequential file transfers; it's not involved at all in doing those
[2:36] <djlee> i also see that mds_cache_limit only applies during the write operation? because once all files sitting on there, when i do the random-read, i get the expected performance.
[2:37] <djlee> the symptom was that everytime cmds 100% busy, the osd write performance drops (mds processing slowly), hence the slow write i observed.. :)
[2:38] <gregaf> djlee: oh, that's probably because the MDS is continually reading and writing large objects off the OSD
[2:38] <djlee> and the reason cmds is 100% busy is because im copying too many files, (yes smaller copies like 10000 files (20GB, or 40GB) , the sequential write is great)
[2:39] <gregaf> again, I think you'll find this works better if you turn on fragmentation - if it doesn't, then we'll certainly want to know about it!
[2:40] <gregaf> the way mds cache limits interact with benchmarks like ffsb can be a bit strange due to the distributed nature of the system
[2:41] <gregaf> like for reads the MDS is less likely to need to pull things off disk, as long as the client has touched the file recently
[2:41] <gregaf> but for file creates it will always need to look at the entire directory to make sure the file doesn't already exist
[2:41] <djlee> right~, i set mds_bal_frag = true as well
[2:41] <djlee> will let you know :p
[2:42] <gregaf> cool
[2:42] <djlee> thanks tons :)
[2:42] <gregaf> I've got to head out soon, but I'll check tomorrow if you leave anything here or send it to the list :)
[2:42] <djlee> ok, have a good day~
[2:43] * cp (~cp@206.15.24.21) Quit (Quit: cp)
[2:51] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[2:55] * greglap (~Adium@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:01] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Read error: Connection reset by peer)
[3:01] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[3:03] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[3:08] * bchrisman (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[3:21] * rsharpe (~Adium@70-35-37-146.static.wiline.com) Quit (Ping timeout: 480 seconds)
[3:22] * rsharpe (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[3:59] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[3:59] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) Quit ()
[4:32] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[4:49] * greglap (~Adium@mobile-166-205-141-172.mycingular.net) has joined #ceph
[4:49] * greglap (~Adium@mobile-166-205-141-172.mycingular.net) Quit ()
[5:50] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[6:42] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[6:56] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[7:16] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[7:16] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[7:17] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit ()
[7:20] * gregorg (~Greg@78.155.152.6) Quit (Ping timeout: 480 seconds)
[7:23] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[7:49] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[7:50] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit ()
[8:09] * yoshi (~yoshi@p15222-ipngn101marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[8:12] * yoshi (~yoshi@u671232.xgsfmg4.imtp.tachikawa.mopera.net) has joined #ceph
[8:45] * yoshi_ (~yoshi@p15222-ipngn101marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[8:52] * yoshi (~yoshi@u671232.xgsfmg4.imtp.tachikawa.mopera.net) Quit (Ping timeout: 480 seconds)
[8:52] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[9:06] * gregorg (~Greg@78.155.152.6) has joined #ceph
[10:51] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[11:32] * yoshi_ (~yoshi@p15222-ipngn101marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[13:49] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:13] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[14:26] * yehuda_hm (~yehuda@bzq-79-183-192-55.red.bezeqint.net) has joined #ceph
[14:26] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:32] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[15:25] * yehuda_hm (~yehuda@bzq-79-183-192-55.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[15:53] * drnexus (~drnexus@branch.inria.fr) has joined #ceph
[16:04] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[16:05] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[16:58] <drnexus> Hello everybody
[17:14] <stingray> o hai
[17:34] * niemeyer (~niemeyer@200-102-220-181.pltce701.dsl.brasiltelecom.net.br) has joined #ceph
[17:34] <niemeyer> Greetings
[17:35] <niemeyer> Sorry for the silly question.. surely I'm doing something wrong, but trying to use s3cmd with objects.dreamhost.com fails with auth issues (403) in all cases
[17:35] <niemeyer> Is there any gotchas ATM?
[17:43] * yehuda_hm (~yehuda@bzq-79-183-192-55.red.bezeqint.net) has joined #ceph
[17:50] * darktim (~andre@ticket1.nine.ch) Quit (Remote host closed the connection)
[17:52] * Tv (~Tv|work@aon.hq.newdream.net) has joined #ceph
[17:57] <Tv> it seems gitbuilder is down
[17:59] <Tv> thttpd was not running on it, started
[18:00] <Tv> weird
[18:03] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[18:30] <gregaf> drnexus: hey, can we help you?
[18:31] <gregaf> niemeyer: I think objects.dreamhost.com is down right now - the perils of early testing!
[18:33] <niemeyer> gregaf: Ah, sweet, will try out later. Thank you
[18:38] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) has joined #ceph
[18:42] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:43] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[19:13] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) Quit (Quit: jojy)
[19:16] * cp (~cp@c-98-234-218-251.hsd1.ca.comcast.net) Quit (Quit: cp)
[19:28] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:42] <sagewk> niemeyer: it's up, but i don't think there are any outside testers yet.. were you part of the beta earlier this year? those credentials won't work anymore
[19:46] <niemeyer> sagewk: I am/was
[19:46] <niemeyer> sagewk: But I generated new ones today thinking something creepy like that might be happening
[19:46] <niemeyer> sagewk: I suppose the ones in the panel should be alright?
[19:47] <sagewk> i'm actually not sure how the panel stuff interacts with the old/new beta stuff. send an email to ben.cherian@dreamhost.com and he can sort you out
[19:47] <niemeyer> sagewk: Still not working
[19:47] * cp (~cp@206.15.24.21) has joined #ceph
[19:48] <niemeyer> sagewk: Ok, will do
[19:49] <cp> Question: what's using all of my space: "380 MB data, 2213 MB used, 1206 MB / 4882 MB avail"
[19:53] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) has joined #ceph
[20:02] <cp> Ah, my question now is: how do I limit the sizes of the log files, or adjust the logging level?
[20:04] * xns (~xns@evul.net) has joined #ceph
[20:06] <xns> hey guys, just started experimenting with ceph a bit this week.. have been running into a mds crash a few times now
[20:06] <xns> rather hitting an assert
[20:06] <xns> mds/MDCache.cc: In function 'void MDCache::handle_cache_expire(MCacheExpire*)', in thread '0x7f66f8a66700'
[20:06] <xns> mds/MDCache.cc: 5895: FAILED assert(!(parent_dir->is_auth() && parent_dir->is_exporting()) || (migrator->get_export_state(parent_dir) == Migrator::EXPORT_WARNING && !migrator->export_has_warned(parent_dir, from)))
[20:07] <xns> I have a 4 node mds config.. 2 active, and two standby-replay
[20:07] <xns> in this case, 02 died.. then shortly after its standby died
[20:07] <xns> same asserr
[20:08] <xns> assert
[20:08] <xns> mind you I'm running bleeding edge (been following the development a bit) so this was as of 4281f02193539ee428cf98ad10a2277071ac341f
[20:11] * greglap (~Adium@aon.hq.newdream.net) has joined #ceph
[20:12] <greglap> cp: unfortunately there's not a good way to limit the size of the logs, although you could probably just truncate them from time to time
[20:13] <greglap> you can adjust debug levels in your ceph.conf by changing the numbers next to debug * options
[20:13] <greglap> although if you don't have any debugging explicitly turned on then it shouldn't be generating much
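An illustrative ceph.conf fragment for the debug knobs greglap describes; the subsystem names and section placement here are examples rather than a complete list (20 is extremely verbose, 1 is nearly silent, 0 is off):

    [global]
        debug ms = 0
    [osd]
        debug osd = 1
    [mds]
        debug mds = 1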
[20:14] <cp> greglap: Ok, thanks. Here's an informal request to add an option for that :)
[20:14] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[20:14] <greglap> xns: hmm, that's not something that looks familiar to me
[20:14] <cp> My osd's and mons were dying and it took me a little while to figure out that the logs had eaten all of my disk space :)
[20:15] <xns> greglap: yeah seems to be somehwat difficult to reproduce too
[20:15] <Tv> cp: you can rotate logs with logrotate, see /etc/logrotate.d/ceph
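Tv is pointing at the logrotate policy shipped with the packages; a generic sketch of what such a policy can look like (the path and retention values are placeholders, not the packaged file's contents):

    /var/log/ceph/*.log {
        daily
        rotate 7
        compress
        missingok
        notifempty
    }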
[20:15] <greglap> xns: if you could put a bug in the tracker we'd appreciate it, but for now I recommend running only one active MDS - it's much better-tested and more stable
[20:15] <greglap> and we're focusing on other aspects of the project that are a bit closer to done right now :)
[20:15] <greglap> cp: did you have logging turned on explicitly?
[20:16] <xns> greglap: will do, I dont mind trying to break multi mds's to help you guys along down the road however :)
[20:16] <cp> Yup, I'm turning it down in my conf files now. The one I'd started working from has a lot of 20s ...
[20:16] <greglap> xns: just be warned that we probably won't handle those bugs so quickly - we're a small team
[20:16] <greglap> cp: yeah, debug 20 generates a ton, 10 will be a lot less and 1 will barely generate any
[20:18] <xns> greglap: no worries, I've been eagerly trying to get up to speed on the code, really interested in the project I'll see if I can't debug some of them myself and contribute.
[20:19] <greglap> all right, sounds nice!
[20:19] <cp> Another random question: what does a directory like this (in my osd directory) mean? "current.remove.me.846930886"
[20:22] <greglap> cp: checking
[20:22] <greglap> looks like it's how we handle a problem with btrfs subvolumes
[20:23] <greglap> yep
[20:23] <cp> greglap: subvolumes in the sense that some of the files around here are actually file systems inside?
[20:24] <greglap> cp: it's a btrfs concept that I'm not intimate with, but...partly
[20:24] <greglap> one thing they do is establish snapshot "domain" for btrfs
[20:24] <greglap> ceph tried to remove a subvolume and the version of btrfs in use doesn't allow it, so it renamed it to "remove.me.*"
[20:25] <cp> greglap: ah. I'm having trouble nuking the directory too.
[20:26] <greglap> cp: my understanding is that btrfs didn't actually implement subvolume deletion for a long while, so you're probably stuck with it
[20:26] <greglap> or at least it was buggy
[20:27] <cp> greglap: does that mean I just need to get a new version of btrfs installed and migrate everything over?
[20:27] <greglap> cp: that would do it, yes
[20:29] <cp> tv: thanks
[20:44] * rsharpe (~Adium@70-35-37-146.static.wiline.com) Quit (Quit: Leaving.)
[20:51] <xns> greglap: safe to assume you guys are focusing on a single mds with at least one replay slave?
[21:06] <cp> greglap: I have another question: my journal files seem to be getting out of control (and eating all my space). Any idea what to do about this. I think deleting them messes up the osd.
[21:12] <greglap> xns: single-mds should be pretty safe, the standby MDS should be equivalently safe
[21:13] <greglap> cp: your osd journal? that's odd, what size do you have it set to?
[21:14] <cp> osd journal size = 100
[21:14] <greglap> cp: and it's much larger than that?
[21:16] <cp> Ah, I guess it is only 100mb.
[21:16] <cp> greplap: thanks
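For reference, the setting in question is given in megabytes, so cp's value corresponds to a 100 MB journal per OSD:

    [osd]
        ; per-OSD journal size, in MB
        osd journal size = 100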
[21:20] <cp> greglap: one more question: what command should work to start a specific osd or mon on a particular machine?
[21:22] <greglap> cp: I don't know if the launcher can do it - sjust thinks it can for daemons located on the machine?
[21:22] <greglap> otherwise right now you just have to ssh over and do it manually if you're trying to do individual nodes
[21:22] <cp> greglap: what's the manual command if I'm ssh'd in?
[21:23] <greglap> well, if sjust is right then you can just run it via service or init.d or whatever
[21:23] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[21:23] <greglap> otherwise, run the executable and pass in the conf location and (if necessary) the keyring and tell it its name
[21:23] <greglap> ie "cmds -c /etc/ceph/ceph.conf -k /etc/ceph/keyring -i a"
[21:23] <greglap> for mds.a
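By analogy, a sketch of the equivalent invocation for an OSD daemon (osd.4 is just an example id; the -k keyring argument can be dropped when cephx is not in use, as in cp's setup):

    cosd -c /etc/ceph/ceph.conf -i 4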
[21:27] <Tv> sjust: fyi http://tracker.newdream.net/issues/1529
[21:30] <cp> greglap: Ok, so my space is going somewhere else. I have a 5 node cluster with a grand total of almost nothing stored on it and I keep running out of space. current in 257MB, snap_X are about 250MB each as well. Is this supposed to be happening?
[21:31] <greglap> cp: well, it's safe to assume that you're not supposed to run out of space without storing data ;)
[21:31] <greglap> is there any churn on it? like are you creating a bunch of files and then you deleted them?
[21:32] <cp> greglap: :) that's what I was thinking. I just have a single object stored in a single pool. I'm mounting the object as a file system (when the cluster isn't totally degraded from running out of space on nodes), but haven't even been writing to it.
[21:32] <greglap> cp: using rbd, you mean?
[21:32] <cp> yup
[21:32] <cp> greglap: yup
[21:32] <greglap> hmmm
[21:33] <greglap> what's rados df give you?
[21:38] <cp> http://pastebin.com/Bha10AR6
[21:43] <greglap> cp: looks like you've got ~340MB in "mypool" spread over 93 objects - what's your replication level on that pool?
[21:43] <cp> greglap: Hmmm... the default? How do I check that?
[21:43] <greglap> okay, so that's 2x then
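One way cp could check this, assuming the ceph tool of that era: dump the osdmap to stdout and look at the pool's line for its replication size (the exact field name varies by version):

    ceph osd dump -o - | grep mypool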
[21:44] <greglap> and you have 5 OSDs?
[21:44] <cp> yup
[21:45] <cp> cp: but only two are up right now. They've been failing a lot
[21:46] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[21:46] <greglap> cp okay, hmm
[21:46] <greglap> so there's 1765MB in use for RADOS, of which 500MB will be journal, and 336MB of replicated data
[21:47] <greglap> that doesn't quite add up
[21:47] <greglap> I wonder if there's something weird going on with the snapshots
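A rough tally of the figures above, treating the 336 MB as pre-replication data and assuming all five 100 MB journals were allocated: 5 x 100 MB of journal is 500 MB, 336 MB at 2x replication is about 672 MB, together roughly 1172 MB, which leaves close to 600 MB of the 1765 MB unaccounted for and is consistent with greglap's suspicion about leftover snapshots or subvolumes.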
[21:49] <cp> greglap: (also, one of the osd's doesn't start and gives this log message: "2011-09-09 12:45:37.207379 7f0abd37b760 osd3 0 unable to mount object store") Another osd gives no complaint but doesn't stay up "2011-09-09 12:45:38.366248 7f17fff88700 journal write_thread_entry finish". The third one complains that the journal block size is off and crashes out.
[21:50] <greglap> cp: that's...a lot of errors
[21:50] <cp> greglap: yeah
[21:50] <greglap> I think maybe the space usage is due to failing to delete the subvolumes
[21:51] <greglap> so if you can't upgrade your kernel you might be better off with a different backing fs
[21:51] <cp> greglap: right. I'll try ext4 with xattrs and see how that works out.
[21:51] <greglap> are all your OSDs running with the same configuration?
[21:52] <cp> yup. Though I think a couple of them have slightly more space. I'm using files formatted with btrfs and mounted.
[21:52] <greglap> it almost sounds like some of them are pointed to a filesystem and some are pointed to a block device, but they aren't configured properly for the different environments
[21:53] <greglap> can you post your conf file?
[21:53] <cp> http://pastebin.com/rQNvsmt0
[21:55] <cp> greglap: lunch time
[21:55] <greglap> cp: okay
[21:55] <cp> greglap: thanks for the help so far
[21:55] <greglap> yeah
[21:55] <greglap> when you get back, can you make sure that each of your nodes has a filesystem mounted on /mnt/foo/osd*?
[22:40] * MK_FG (~MK_FG@188.226.51.71) Quit (Ping timeout: 480 seconds)
[22:52] <cp> greglap: yup, I have files in all of them.
[22:53] <greglap> cp: on each node?
[22:54] <cp> greglap: yup
[22:56] <greglap> cp: are they just part of the root fs or does it get its own disk?
[22:57] <cp> They have their own disk - it's a formatted mounted file
[22:57] <cp> dd if=/dev/zero of=/var/tmp/foo count=1000000 # create a large empty file
[22:57] <cp> mkfs.btrfs /var/tmp/foo # put a file system on it
[22:57] <cp> mount -o loop /var/tmp/foo /mnt/foo
[22:59] <greglap> hmm, I'm not sure what could be going on then
[22:59] <greglap> sjust, you have any ideas?
[23:00] <sjust> looking
[23:05] <greglap> cp: hmm, can you paste that error message about the block size?
[23:05] <greglap> not sure if loopback on btrfs will play nicely
[23:06] <cp> greglap: I think that may be because I deleted the journal. Just a sec
[23:06] <greglap> cp: well, that is likely to break things
[23:07] <cp> greglap: :) Yup. That failure mode makes sense at least
[23:07] <greglap> running it on loopback makes me a bit nervous, and you're probably not getting the real benefits of btrfs anyway
[23:07] <cp> http://pastebin.com/8i5Pm6U0
[23:07] <greglap> why not try just running them inside your regular ext4 partitions and seeing how that behaves
[23:08] <cp> greglap: That's an option. I'll need to remount with xattrs turned on, right?
[23:08] <greglap> cp: yes
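A sketch of the remount greglap confirms is needed, assuming the OSD data lives on an ext4 filesystem mounted at /mnt/foo (the path is only the one used earlier in the conversation; for a persistent change, add user_xattr to the corresponding /etc/fstab entry):

    mount -o remount,user_xattr /mnt/foo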
[23:09] <cp> greglap: Though I do like the idea of being able to control the size so it doesn't over-run the rest of the system (as it seems to chew a lot at the moment)
[23:09] <greglap> cp: well, if you've got logging off then it's going to chew exactly as much as you put into the system
[23:09] <greglap> if you want to control size, running in VMs is probably a better idea
[23:09] <greglap> Ceph doesn't do well when its nodes run out of disk space
[23:10] <cp> greglap: yes, though io performance could be bad in the VM case
[23:13] <greglap> cp: I wouldn't expect it to be any worse than with loopback - VM systems have gotten a lot better at handling I/O
[23:13] <greglap> how big are those devices, anyway? 500MB each?
[23:14] <greglap> that's probably why your other daemons are crashing, they're taking 100MB each for the journal and with that little data it's going to have trouble balancing properly
[23:42] * cp (~cp@206.15.24.21) Quit (Quit: cp)
[23:55] <slang> I have some pgs that are stuck in down+peering
[23:57] <slang> even though the osds that the pgs map to are up/in
[23:57] <greglap> slang: do you have any down OSDs?
[23:58] <slang> greglap: yes, and I've done the step to mark them lost
[23:58] <slang> I've also tried to do injectargs '--debug_osd 20 --debug_ms 1' on the osd that one of the pgs in down+peering maps to
[23:59] <slang> but I don't see anything about that particular pg in the log
[23:59] <greglap> yeah, I wouldn't expect you to at this point, it probably is waiting for some event

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.