#ceph IRC Log


IRC Log for 2013-05-27

Timestamps are in GMT/BST.

[0:07] * diegows (~diegows@ has joined #ceph
[0:12] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[0:33] * tnt (~tnt@ has joined #ceph
[0:35] * tnt_ (~tnt@ Quit (Ping timeout: 480 seconds)
[0:39] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[0:56] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[0:56] * ChanServ sets mode +o scuttlemonkey
[0:57] * ScOut3R (~ScOut3R@540240A4.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[1:32] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[1:32] * MarkN (~nathan@ has joined #ceph
[1:33] * MarkN (~nathan@ has left #ceph
[1:33] <mrjack> is the leveldb issue solved somehow?
[1:34] <mrjack> lightspeed: hm could be that conversion failed and now there is a folder store.db with b0rked conversion left?
[1:35] * LeaChim (~LeaChim@ Quit (Ping timeout: 480 seconds)
[1:36] <tnt> mrjack: the proposed patch seems to work fine.
[1:37] <mrjack> tnt: -vvv pls ;)
[1:39] <tnt> http://tracker.ceph.com/issues/4895
[1:40] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[1:40] * loicd (~loic@magenta.dachary.org) has joined #ceph
[1:42] <lightspeed> mrjack: there was a store.db folder containing 3 zero byte files (LOCK, LOG, and LOG.old)
[1:43] <lightspeed> anyway I think I fixed it by just recreating the broken mon
[1:44] <lightspeed> interestingly that one now only has a keyring file plus the store.db folder
[1:44] <lightspeed> whereas the others that had been upgraded have a load of other files/directories in them
[1:46] * The_Bishop (~bishop@f052098020.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[1:47] * The_Bishop (~bishop@f052098020.adsl.alicedsl.de) has joined #ceph
[1:52] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[2:03] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[2:06] * vipr_ (~vipr@78-23-112-22.access.telenet.be) has joined #ceph
[2:09] * vipr (~vipr@78-23-118-217.access.telenet.be) Quit (Read error: Operation timed out)
[2:12] * vipr (~vipr@78-23-112-45.access.telenet.be) has joined #ceph
[2:19] * vipr_ (~vipr@78-23-112-22.access.telenet.be) Quit (Ping timeout: 480 seconds)
[2:20] * BManojlovic (~steki@fo-d- Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:32] * Coyo (~coyo@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[2:32] * Coyo (~coyo@00017955.user.oftc.net) Quit ()
[2:49] * jahkeup (~jahkeup@ Quit (Quit: Textual IRC Client: www.textualapp.com)
[3:36] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[3:52] * julian (~julianwa@ has joined #ceph
[4:14] * The_Bishop_ (~bishop@e177088119.adsl.alicedsl.de) has joined #ceph
[4:21] * The_Bishop (~bishop@f052098020.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[4:44] * codice (~toodles@75-140-71-24.dhcp.lnbh.ca.charter.com) has joined #ceph
[4:55] * Crothers (~croths@ec2-54-214-232-166.us-west-2.compute.amazonaws.com) has joined #ceph
[4:58] * Crothers (~croths@ec2-54-214-232-166.us-west-2.compute.amazonaws.com) Quit (Quit: Leaving.)
[5:12] * Vanony_ (~vovo@ has joined #ceph
[5:18] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[5:19] * Vanony (~vovo@i59F7A824.versanet.de) Quit (Ping timeout: 480 seconds)
[6:58] * Coyo (~coyo@pool-71-164-242-68.dllstx.fios.verizon.net) has joined #ceph
[7:07] * tnt (~tnt@ has joined #ceph
[7:23] * Machske (~Bram@d5152D87C.static.telenet.be) Quit ()
[7:24] * capri (~capri@ Quit (Quit: Verlassend)
[7:30] * capri (~capri@ has joined #ceph
[8:13] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has joined #ceph
[8:17] * bergerx_ (~bekir@ has joined #ceph
[8:19] * ivoks (~ivoks@jupiter.init.hr) Quit (Remote host closed the connection)
[8:22] * capri (~capri@ Quit (Quit: Verlassend)
[8:37] * Machske (~Bram@d5152D8A3.static.telenet.be) has joined #ceph
[8:45] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[8:45] * Coyo (~coyo@00017955.user.oftc.net) Quit (Quit: Heaven is not a place, Bartleby, it's being with people who love you.)
[8:47] <alexxy> hi all
[8:48] <alexxy> seems i again get a problem with ceph on mon
[8:48] <alexxy> it seems it writes -rw-r--r-- 1 root root 51G May 27 10:47 ceph-mon.alpha.tdump
[8:48] <alexxy> with a huge size
[8:48] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[8:48] <tnt> you're on 0.61.1 ?
[8:49] <alexxy> yep
[8:49] <tnt> you shouldn't :)
[8:49] <tnt> upgrade to 0.61.2 and delete that file.
[8:49] <tnt> 0.61.1 had a debug option enabled by default by mistake that dumps all transactions to a dump file ...
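For anyone hitting the same thing, a minimal cleanup sketch; the mon data directory is an assumption (the packaged default is `/var/lib/ceph/mon`), so adjust for your layout:

```shell
# Sketch, assuming the default mon data dir: remove the leftover transaction
# dump produced by 0.61.1's accidentally-enabled debug option. The file is
# only ever written, never read back, so deleting it loses nothing.
MON_DIR=${MON_DIR:-/var/lib/ceph/mon}
find "$MON_DIR" -name '*.tdump' -type f -print -delete 2>/dev/null || true
```

Do this after stopping the mon and upgrading to 0.61.2, then restart the mon.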
[9:00] * loicd (~loic@magenta.dachary.org) has joined #ceph
[9:02] <alexxy> tnt: ok updating now
[9:03] <alexxy> tnt: is it enough to update mon only?
[9:03] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[9:05] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:05] * ChanServ sets mode +v andreask
[9:06] <wogri_risc> alexxy: please update the OSDs too. never a good idea to just restart the mons. even if functionality might not break...
[9:07] <alexxy> wogri_risc: there are hanging tasks on the osd
[9:07] <alexxy> so it will be problematic
[9:07] <wogri_risc> hm. then do the mon's first.
[9:23] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:32] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[9:33] <nigwil> my MDS crashed while "idle": http://pastebin.com/v3rDUBZ8
[9:33] <nigwil> restarting fixed it
[9:44] * leseb (~Adium@ has joined #ceph
[9:45] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:46] * ChanServ sets mode +v leseb
[9:47] * BManojlovic (~steki@ has joined #ceph
[9:52] * pja (~pja@a.clients.kiwiirc.com) has joined #ceph
[9:53] * pja (~pja@a.clients.kiwiirc.com) Quit ()
[9:55] * frank9999 (~frank@kantoor.transip.nl) Quit (Read error: Connection reset by peer)
[9:59] * fabioFVZ (~fabiofvz@ has joined #ceph
[10:02] * frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[10:05] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:08] * ScOut3R (~ScOut3R@ has joined #ceph
[10:09] * tziOm (~bjornar@ has joined #ceph
[10:16] <tnt> joao: The good news is that over the last 5 days since I installed the patch, there wasn't any huge growth of the mon. So given that and the fact that the patch just makes sense, I think #4895 is fixed.
[10:17] <tnt> joao: The bad news is that there seems to still be some weirdness in mon disk space usage: http://i.imgur.com/D17beYv.png
[10:17] <tnt> (those are averaged out a bit to remove some of the sawtooth effect of the compaction cycle).
[10:27] <tnt> and restarting the mon makes it go back to a lower disk usage.
[10:32] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[10:32] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[10:34] * eschnou (~eschnou@212-166-45-50.win.be) has joined #ceph
[10:35] * eschnou (~eschnou@212-166-45-50.win.be) Quit ()
[10:37] * frank9999 (~frank@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[10:39] * LeaChim (~LeaChim@ has joined #ceph
[10:41] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[10:58] * ghartz (~ghartz@33.ip-5-135-148.eu) Quit (Remote host closed the connection)
[10:59] * ghartz (~ghartz@33.ip-5-135-148.eu) has joined #ceph
[11:09] <joao> morning all
[11:10] <joao> tnt, is it safe to assume that bikini-01 and bikini-02 are peons?
[11:10] <tnt> yes
[11:10] <tnt> and they run the standard 0.61.2 and they've been restarted at the same time.
[11:11] <tnt> the -00 is the master and runs the patched code and has been restarted in the meantime.
[11:11] <tnt> -02 also has compact on trim disabled (which doesn't seem to change anything AFAICT)
[11:13] <joao> only the leader trims the state; then it passes it on to the peons as with any other proposal
[11:13] <tnt> the space seems to be used by the MANIFEST file (26M) and LOG (ls says 88M but du -sh says 152M ...).
[11:13] <joao> peons don't even notice there's a trim happening
[11:13] * rahmu (~rahmu@ has joined #ceph
[11:13] <joao> which now that I think of it may be the issue
[11:13] <joao> given they won't trim as often as the leader
[11:13] <joao> err
[11:13] <joao> compact
[11:14] <joao> they won't compact as often
[11:14] <joao> but it's rather interesting how the store grew so much in just an instant
[11:15] <joao> it just spiked
[11:15] <tnt> they do compact, if I "zoom in", I can clearly see the sawtooth pattern.
[11:15] <niklas> wido: Am I wrong, or does rados-java only support String as input/output for Object contents?
[11:16] <joao> tnt, but do they compact at the same time?
[11:16] <joao> leveldb is supposed to compact every now and then
[11:16] <joao> so that's what may be happening
[11:16] <wido> niklas: Not sure, let me check
[11:16] <wido> niklas: I think I used a byte array as buffer
[11:17] <joao> the thing is, whenever we trim, we compact the store; but given that we only actually trim on the leader, I'm almost certain that we will just compact-on-trim on the leader too
[11:17] <tnt> joao: pretty much http://i.imgur.com/1z1tpXr.png The drop of the red is when I restarted it, to see if it would drop in size.
[11:17] <wido> niklas: That's pretty stupid of me. I indeed used a String
[11:17] <wido> forgot the byte[] methods
[11:17] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:18] <wido> in IoCTX
[11:18] <wido> RBD does support byte[]
[11:18] <joao> tnt, that's interesting, but that might be just a trim going through, freeing the trimmed keys
[11:18] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:18] <joao> well
[11:18] <joao> if it is, then it is great
[11:18] <joao> but what we would like to achieve would be that considerable drop in disk usage obtained from restarting the monitor
[11:19] <tnt> on -02 I even disabled compact-on-trim and it still seems to compact in sync with the trims ...
[11:19] <niklas> wido: String seems to be rather inefficient for larger objects, that's why I ask…
[11:20] <joao> tnt, my guess there is that the peons have a lighter load than the leader, and more time for leveldb to compact after the trim
[11:20] <joao> but I'm just speculating
[11:20] <wido> niklas: No, you are completely right
[11:20] <wido> My bad
[11:20] <wido> One thing though, Java is 32-bit, so 2GB at max
[11:20] <wido> in a byte array
[11:20] * leseb (~Adium@ Quit (Quit: Leaving.)
[11:20] <tnt> joao: the leader also has those "jumps" in size. just at different times probably because it's been restarted at a different time.
[11:21] <wido> niklas: But you never write or read 2GB at once
[11:21] <tnt> joao: and as I mentioned, the bulk of the extra size seems to be in the LOG and MANIFEST files rather than the .sst table files
[11:21] <joao> tnt, the LOG is used as a write buffer
[11:21] * leseb (~Adium@ has joined #ceph
[11:22] <joao> on compact, it should force those writes to be applied on the store
[11:22] <tnt> joao: I don't think so ... it looks like a bunch of _text_ :p
[11:22] <joao> really?
[11:22] <tnt> yup ...
[11:23] <tnt> there is a 814764.log which is probably the DB transaction log.
[11:23] <joao> ah
[11:23] <joao> yeah, those should be the logs
[11:23] <joao> err
[11:23] <joao> write buffers, whatever
[11:23] <joao> no idea what the LOG file is then
[11:23] <joao> tnt, does the text make any sense?
[11:23] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:24] <tnt> http://pastebin.com/QbVPxsv1
[11:24] <tnt> they're logs ...
[11:24] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:24] <niklas> wido: rbd uses librbd, I guess? And librbd sits on top of librados?
[11:24] <joao> ah
[11:24] <tnt> joao: but I don't get why they would jump in size like that.
[11:24] <joao> literally leveldb logs
[11:24] <joao> kay
[11:24] <wido> niklas: Correct, librbd is on top of librbd. Same in the Java bindings
[11:24] <wido> that's why they are in one package
[11:25] <niklas> ok, thanks
[11:26] <tnt> joao: also, I've seen https://github.com/bitly/simplehttp/tree/master/simpleleveldb that has a warning about leveldb saying : "there is a known issue with long-running leveldb processes resulting in unbounded memory growth of the process. This is due to the fact that the MANIFEST file is never compacted"
[11:28] <joao> tnt, thanks, wasn't aware of that one
[11:28] <tnt> I didn't find any info about it on the leveldb site though.
[11:29] <joao> we could create a workaround for that, closing and reopening leveldb
[11:29] <joao> I'll poke the leveldb guys for confirmation on that being an issue
[11:29] <tnt> Also, does anyone know how a file could report a bigger size with 'du -sh filename' than with 'stat filename' ?? Smaller I can understand with sparse files, but larger ?
[11:31] <darkfader> tnt: i think du doesn't think in fragments, so if it sees a file in an 'inode' it assumes it's using inode_size space?
[11:33] <joao> tnt, try with --apparent-size
[11:34] <tnt> for du or stat ?
[11:34] <joao> du
[11:34] <joao> unsure if it will fix it
[11:34] <tnt> yes, then it gives the same value.
[11:34] <joao> honestly, the man page states that it may even make it bigger if it is a sparse file :p
[11:35] <tnt> but it seems to really occupy 156M of space on the disk.
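The du/stat discrepancy being chased here can be reproduced on any file: `stat` (and `ls`) report the apparent length, plain `du` reports allocated blocks, and the two can diverge in either direction. A quick demonstration with a deliberately sparse temp file:

```shell
# Apparent size vs allocated size: stat/ls report the apparent length,
# plain du reports blocks actually allocated on disk, and
# `du --apparent-size` matches stat again.
f=$(mktemp)
truncate -s 100M "$f"                 # 100 MiB apparent, almost 0 allocated
stat -c 'apparent=%s bytes allocated=%b blocks of %B bytes' "$f"
du -sh "$f"                           # allocated: near zero for this file
du -sh --apparent-size "$f"           # 100M, agreeing with stat and ls
rm -f "$f"
```

In the mon case the relationship is inverted: the filesystem has allocated more blocks than the file's apparent length, so plain `du` reads larger than `stat`.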
[11:35] <joao> waiting for the leveldb folks to wake up on freenode :)
[11:38] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[11:40] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[11:43] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:44] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:52] <sig_wal1> hello
[11:52] * loicd (~loic@ has joined #ceph
[11:53] <sig_wal1> after an unexpected poweroff of the HDD cage, some pginfo files are zero-sized on XFS and ceph does not run.
[11:53] <sig_wal1> how do I repair it?
[11:53] <darkfader> what does xfs_check / xfs_repair say?
[11:54] <sig_wal1> FS is clean
[11:54] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[11:54] * frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[11:54] <sig_wal1> the file is just zero-sized; it is normal XFS behaviour on power loss
[11:54] * IHS (~horux@ has joined #ceph
[11:54] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[11:55] <sig_wal1> I need to run ceph without that corrupted PG
[11:55] <sig_wal1> *ceph-osd
[11:56] <sig_wal1> is there any way to do this ?
[12:01] <sig_wal1> deleted all files for that pg
[12:01] <sig_wal1> that helped
[12:02] <andreask> ceph pg repair ... another possibility
[12:03] <sig_wal1> oops, sorry, that didn't help
[12:03] <sig_wal1> andreask: even if osd does not run ?
[12:04] <andreask> no
[12:12] <andreask> sig_wal1: so you might need to replace the osd disk
[12:12] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[12:12] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[12:15] <loicd> I'm interested to know how you managed to run rgw from scratch in the dev environment :-)
[12:15] <loicd> ccourtaut: ^
[12:15] <ccourtaut> loicd: i started a dev cluster with vstart
[12:15] <loicd> since you're using ext4, you are running with the omap flag
[12:16] <ccourtaut> loicd: OSD=3 vstart -l -x -n -r
[12:16] * loicd checking vstart
[12:16] <ccourtaut> on ext4 filesystem
[12:16] <ccourtaut> on the branch wip-rgw-geo-2
[12:18] <loicd> did you manage to successfully run rgw on the latest stable before trying with the dev branch? So as not to accumulate problems.
[12:18] <ccourtaut> loicd: i'll try that now
[12:18] <loicd> -r https://github.com/ceph/ceph/blob/master/src/vstart.sh#L76
[12:18] * loicd learning as he reads :-)
[12:19] <loicd> ccourtaut: I wish I had more expertise to help. But I know almost nothing in this area.
[12:19] <loicd> people like yehuda or wido: are the experts :-)
[12:20] * loicd reading https://github.com/ceph/ceph/blob/master/src/vstart.sh#L495
[12:20] * ccourtaut compiling master branch
[12:20] * wido hasn't worked with the geo replication yet of the RGW
[12:21] <ccourtaut> wido: i'm currently just trying to setup a dev env to run Swift tests on RGW
[12:23] <wido> ccourtaut: I'm a CloudStack guy :) OpenStack isn't my expertise
[12:23] <wido> I know how the S3 implementation works, but not Swift and keystone
[12:24] <loicd> ccourtaut: https://github.com/ceph/ceph/blame/master/src/vstart.sh#L494 has not changed in a long time, odds are it is slightly broken. That's usually what happens when code does not get updated ;-)
[12:24] <ccourtaut> wido: Ok
[12:25] <loicd> wido: I'm curious to know how you run the s3 tests on a dev branch for rgw ? https://github.com/ceph/s3-tests
[12:25] * ccourtaut is going out for lunch, be back later
[12:26] <sig_wal1> andreask: replace the osd disk because of one zero-sized pginfo file caused by an XFS quirk? I don't think that is a good idea...
[12:27] <sig_wal1> if ceph cannot recover after a power loss on XFS (XFS zero-sizes files on power loss very often), that is not very good...
[12:30] * leseb (~Adium@ Quit (Quit: Leaving.)
[12:38] * jpieper (~josh@209-6-205-161.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[12:42] * jpieper (~josh@209-6-205-161.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[12:44] * Vjarjadian (~IceChat77@ has joined #ceph
[12:44] <andreask> sig_wal1: you said you can't start the osd? looks like more than one corrupt pg file ... what do logs tell?
[12:45] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:46] * KindTwo (~KindOne@h50.34.28.71.dynamic.ip.windstream.net) has joined #ceph
[12:46] * leseb (~Adium@ has joined #ceph
[12:46] * KindTwo is now known as KindOne
[12:48] <sig_wal1> andreask: gdb traces show crash on one osd
[12:48] <sig_wal1> the same osd that has corrupted pginfo
[12:48] <sig_wal1> *on one pg
[12:48] <sig_wal1> *the same pg
[12:48] * leseb (~Adium@ Quit ()
[12:48] <sig_wal1> sorry
[12:52] <sig_wal1> so backtraces say that the osd always crashes on the pg that has an empty pginfo file.
[12:53] <sig_wal1> can I remove that pg from osd offline ?
[12:53] <sig_wal1> moving pg files does not help
[12:54] * KindTwo (~KindOne@h66.211.89.75.dynamic.ip.windstream.net) has joined #ceph
[12:54] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:54] * KindTwo is now known as KindOne
[12:59] <andreask> sig_wal1: have a look at that bug report http://tracker.ceph.com/issues/3615
[12:59] * leseb (~Adium@ has joined #ceph
[13:00] * KindTwo (~KindOne@h46.25.131.174.dynamic.ip.windstream.net) has joined #ceph
[13:01] <sig_wal1> we don't use nobarrier and deleting that directory does not help - osd still crashes on the same pg :(
[13:02] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:02] * KindTwo is now known as KindOne
[13:06] <andreask> sig_wal1: hmm ... so you removed pginfo, pglog and the according pg directory?
[13:06] <sig_wal1> yes
[13:08] <sig_wal1> wow, now it crashed on next pg with zero-sized pginfo !
[13:08] <andreask> you have volatile caches enabled?
[13:12] <sig_wal1> ah, barrier=0 really was enabled by the admin. I'll tell the admin that it is a bad idea. thank you for your help!
[13:13] <andreask> yw
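The check that settled this exchange is worth automating; a mount with barriers disabled plus a volatile write cache is exactly how pginfo files end up zero-length after power loss. A small sketch that scans /proc/mounts for the dangerous options (the filesystem list is an assumption, extend as needed):

```shell
# Flag any XFS (or ext4) filesystem mounted with write barriers disabled.
# With nobarrier/barrier=0 a power cut can lose writes still sitting in the
# disk's volatile cache, truncating recently-touched files to zero bytes.
check_barriers() {
    # reads mount-table lines on stdin; prints offenders, returns 1 if any
    ! grep -E '\b(xfs|ext4)\b' | grep -E 'nobarrier|barrier=0'
}
check_barriers < /proc/mounts && echo "no filesystems mounted with barriers disabled"
```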
[13:19] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[13:20] * eschnou (~eschnou@62-197-93-189.teledisnet.be) has joined #ceph
[13:23] * rahmu (~rahmu@ Quit (Remote host closed the connection)
[13:25] * mrjack (mrjack@office.smart-weblications.net) Quit ()
[13:31] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:32] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[13:32] <ccourtaut> loicd: vstart with -r seems to work on master
[13:32] <ccourtaut> loicd: retrying on wip-rgw-geo-2
[13:33] * eschnou (~eschnou@62-197-93-189.teledisnet.be) Quit (Read error: Operation timed out)
[13:40] * KindTwo (KindOne@h51.45.28.71.dynamic.ip.windstream.net) has joined #ceph
[13:40] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[13:42] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[13:43] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Remote host closed the connection)
[13:43] * eschnou (~eschnou@ has joined #ceph
[13:49] * KindTwo (KindOne@h51.45.28.71.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[13:50] <tnt> joao: is it possible that the leveldb is opened twice on mon boot ? (like open/close/open ?)
[13:51] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[13:52] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[13:58] * fabioFVZ (~fabiofvz@ Quit (Remote host closed the connection)
[14:01] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[14:08] <loicd> ccourtaut: good news that vstart with -r works on master. It's nice to find something that works when exploring new code :-)
[14:09] * wogri_risc (~wogri_ris@ro.risc.uni-linz.ac.at) has left #ceph
[14:10] <joao> tnt, IIRC, that's by design
[14:11] <joao> daemonizing ceph-mon made a leveldb thread quit, generating issues with compaction and all that
[14:11] <tnt> joao: ok, I was just wondering why the LOG.old was always short.
[14:11] <ccourtaut> loicd: yup, retrying on wip-rgw-geo-2, still compiling, to check if it was something related to my env, or to the code
[14:11] <joao> thus we open leveldb, do stuff, close level db, daemonize, open leveldb again
[14:11] <tnt> joao: I guess maybe closing/reopening the leveldb on HUP would make sense as well.
[14:12] <loicd> I like to use ./configure --with-debug CC='ccache gcc' CXX='ccache g++' CFLAGS='-g' CXXFLAGS='-g' --disable-silent-rules
[14:12] <loicd> ccourtaut: when switching branches it speeds up compilation considerably
[14:12] <joao> tnt, that would have to be looked into
[14:13] <joao> I'm afraid implementing that would be a little bit more complex than it should
[14:13] <joao> given we would have to stop with all leveldb accesses while we were closing/reopening it
[14:15] <ccourtaut> loicd: ok, so it seems broken on the branch, don't know why, but seems related to the new region/zone stuff
[14:16] <ccourtaut> 2013-05-27 14:14:52.077929 b6745980 -1 failed reading region info from .rgw.root:region_info.: (2) No such file or directory
[14:17] <loicd> :-)
[14:21] * diegows (~diegows@ has joined #ceph
[14:24] <tnt> I think the "jumps" are actually caused by xfs ...
[14:24] <tnt> joao: ^^
[14:25] * itamar_ (~itamar@IGLD-84-228-128-94.inter.net.il) has joined #ceph
[14:27] <joao> tnt, what makes you think that?
[14:28] <tnt> basically if you have a 128M file and you write 1 byte to it, it will start using 256M on disk until you close it.
[14:29] <tnt> that's the xfs "preallocation".
[14:29] * rahmu (~rahmu@ has joined #ceph
[14:29] <rahmu> hello. When making RGW work with Keystone, why don't I include the tenant_id in the URL?
[14:29] <wido> niklas: fyi, working on a patch for byte reading and writing
[14:30] <rahmu> I believe Swift would require the tenant_id in the URL
[14:30] <tnt> joao: and so it doubles in size each time it reaches the previously allocated size.
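This theory is checkable without restarting anything: watch the growing file's allocated blocks against its apparent length. On XFS, speculative preallocation shows up as `st_blocks` jumping well past `st_size/512`, then falling back when the file is closed. A hedged helper (plain `stat`, nothing ceph-specific):

```shell
# Sample a file's apparent length against its allocated space a few times;
# on XFS, speculative preallocation makes the allocated figure jump in
# roughly power-of-two steps past the apparent size while the file is
# still open for writing, then shrink back once it is closed.
watch_alloc() {    # usage: watch_alloc FILE [SAMPLES] [INTERVAL_SECONDS]
    file=$1; n=${2:-5}; dt=${3:-1}; i=0
    while [ "$i" -lt "$n" ]; do
        stat -c '%s apparent / %b blocks of %B bytes allocated' "$file"
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then sleep "$dt"; fi
    done
}
```

Running `watch_alloc` on the mon's LOG while it grows, and again right after a restart, would confirm whether the "jumps" are preallocation rather than real data.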
[14:35] <tnt> joao: unfortunately leveldb only cycles it when creating the DB object (i.e. on open).
[14:35] <joao> oh
[14:35] <joao> that's unfortunate
[14:43] * BillK (~BillK@58-7-155-66.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[14:47] <niklas> wido: Looks like some minor changes
[14:47] <niklas> wido: I just looked into that, and my changes seem to work
[14:49] <niklas> wido: You should also take a look at IoCTX.java lines 222
[14:49] <niklas> That call seems to be quite dangerous I think
[14:50] <wido> niklas: Small change indeed
[14:50] <loicd> ccourtaut: I write tests for pg_missing_t and it's surprisingly puzzling ... although it's not a lot of code :-)
[14:51] <wido> niklas: Line 222 will be removed, I'll return a byte array anyway :)
[14:51] <wido> and it's up to you to cast it to a String
[14:51] <wido> I dare to make that API change now since nobody is really using it yet
[14:52] * BillK (~BillK@124-169-77-36.dyn.iinet.net.au) has joined #ceph
[14:53] <niklas> wido: You could also just create a second method, like "public byte[] readBytes(String oid, int length, long offset)"
[14:53] <niklas> and just overload the write() methods
[14:53] <wido> niklas: True, still thinking about that
[14:53] <wido> But imho it should have been bytes anyway
[14:54] <niklas> True, but still an API change… It's up to you, I have no idea how many people are using it
[14:55] <wido> niklas: It is an API change indeed. I don't think a lot of people are using it right now, since I released it last week
[14:56] <wido> but all read methods in Java return bytes
[15:01] <ccourtaut> loicd: :)
[15:02] <ccourtaut> loicd: i'm investigating why vstart -r doesn't work in wip-rgw-geo-2
[15:03] <niklas> wido: public String read(String oid, int length, long offset) expects an int, but passes a long, why not change it so that it expects a long as length ?
[15:03] <wido> niklas: Changing that as well. It should be an int. Since it's going to read bytes and Java only supports 32-bit byte arrays
[15:03] <wido> so the length should be an int
[15:03] <wido> thanks for the feedback btw!
[15:04] * ChanServ sets mode +v leseb
[15:05] <loicd> ccourtaut: good luck with that, I'm sure yehudasa will be grateful :-)
[15:05] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[15:05] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[15:05] <niklas> wido: RadosObjectInfo.java holds the size of an object as long, so imho you should always use long
[15:05] <niklas> seems to be inconsistent otherwise…
[15:06] <wido> niklas: But when reading, I can't read more than 2GB at once
[15:06] <wido> and you probably don't want to read the whole object at once, but read in 4MB chunks or so
[15:06] <ccourtaut> loicd: sure, but i can't figure out why a command failed in the script, but right after that, if you launch it by hand, it works...
[15:06] <loicd> race conditions maybe ?
[15:07] <niklas> well, if I want to get an object right now, I'd use:
[15:07] <niklas> long length = io.stat(object).getSize();
[15:07] <niklas> byte[] buf = io.read(object, (int) length, 0);
[15:07] <ccourtaut> loicd: maybe, but i tried to wait a moment in the script (ugly sleep) before executing the failing command, but still the same result :/
[15:08] <ccourtaut> loicd: i'm certainly missing something around here
[15:08] <niklas> wido: the int cast seems rather odd…
[15:10] <wido> niklas: True, but an array in Java is 32-bit
[15:10] <wido> you can't create a 64-bit array in Java
[15:10] <wido> so reading more than 2GB at once is impossible in Java
[15:11] <loicd> ccourtaut: environment maybe ?
[15:11] <wido> So the stat will tell you how big the object is and you do a iteration until you've read everything
[15:12] <niklas> wido: ok, that makes sense…
[15:12] <niklas> thanks
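The chunking recommended above is the same pattern in any language; sketched here in shell against a local file (file name and chunk size are illustrative; with rados-java the offset would be passed to the read call rather than dd):

```shell
# Read an object in fixed 4 MiB chunks instead of one giant buffer: the
# same loop keeps a Java caller under the 2 GiB (32-bit) array limit and
# bounds memory use in any language.
read_chunked() {    # usage: read_chunked FILE; prints total bytes read
    obj=$1
    chunk=$((4 * 1024 * 1024))
    size=$(stat -c %s "$obj") || return 1
    offset=0; total=0
    while [ "$offset" -lt "$size" ]; do
        n=$(dd if="$obj" bs="$chunk" skip=$((offset / chunk)) count=1 \
              status=none | wc -c)    # stand-in for handling one chunk
        total=$((total + n))
        offset=$((offset + chunk))
    done
    echo "$total"
}
```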
[15:14] * vipr_ (~vipr@78-23-114-40.access.telenet.be) has joined #ceph
[15:14] <wido> niklas: I made mistake with read as well
[15:14] <wido> I think I have to make a API change there
[15:14] <wido> It should return an int with the number of bytes it read
[15:14] <wido> and return the actual data by reference
[15:15] <ccourtaut> loicd: maybe
[15:19] * vipr__ (~vipr@78-23-116-33.access.telenet.be) has joined #ceph
[15:21] * vipr (~vipr@78-23-112-45.access.telenet.be) Quit (Ping timeout: 480 seconds)
[15:23] * vipr_ (~vipr@78-23-114-40.access.telenet.be) Quit (Ping timeout: 480 seconds)
[15:24] * vipr (~vipr@78-23-113-4.access.telenet.be) has joined #ceph
[15:27] * vipr__ (~vipr@78-23-116-33.access.telenet.be) Quit (Ping timeout: 480 seconds)
[15:34] * brother| is now known as brother
[15:47] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Remote host closed the connection)
[15:48] * jgallard (~jgallard@gw-aql-129.aql.fr) has joined #ceph
[15:58] <loicd> ccourtaut: it works ?
[15:58] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Read error: Operation timed out)
[16:02] <loicd> ccourtaut: submitting a patch can be done via the mailing list or via a pull request. It tends to be more and more via pull requests.
[16:02] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[16:02] * vata (~vata@2607:fad8:4:6:6deb:15c:a201:cea4) has joined #ceph
[16:03] * cce is looking for a S3-like service I could use for my applications, that might scale from a single, disconnected VM on a laptop, to a multi-VM system at a client installation.
[16:03] <rahmu> sorry to ask again but when making RGW work with Keystone, why don't I include the tenant_id in the URL?
[16:03] <cce> Is ceph overkill? How would it back it up?
[16:06] <wido> niklas: Btw, be careful with writing. There is a RADOS option, max write size
[16:06] <wido> by default it's 100MB
[16:07] <wido> So you can't write more than 100MB at once. Better to write in chunks anyway
[16:08] <ccourtaut> loicd: i'll submit it via pull request
[16:13] * jahkeup (~jahkeup@ has joined #ceph
[16:38] <saaby> is there any way I can force/specify which osd is primary for a PG if my crushmap/crushruleset allows multiple placements?
[16:40] <saaby> I am asking because I have two racks which according to the crushrule can hold the primary osd, and I would like to have ~50/50 in each rack to balance traffic a bit, but right now I have a quite strong bias to one of the racks.
[16:41] <saaby> but I haven't come across any docs describing how that could be done.. by rule or by cmd.
[16:43] <saaby> this is the rule used: http://pastebin.com/gbBVMHXn
[16:43] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[16:47] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[16:53] <saaby> looks like ~90% of primary osd's are located in one of the two racks, ~10% in the other.
[16:57] <tnt> joao: I created 5175 and 5176 with the issues I still have with the mon. They're much less urgent than the 4895 but I guess they should still be dealt with somehow.
[17:00] * eschnou (~eschnou@ Quit (Remote host closed the connection)
[17:04] * itamar_ (~itamar@IGLD-84-228-128-94.inter.net.il) Quit (Quit: Leaving)
[17:11] <joao> tnt, thanks
[17:24] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[17:24] * ChanServ sets mode +v andreask
[17:27] * Machske (~Bram@d5152D8A3.static.telenet.be) Quit ()
[17:29] * tziOm (~bjornar@ Quit (Remote host closed the connection)
[17:30] * jgallard (~jgallard@gw-aql-129.aql.fr) Quit (Quit: Leaving)
[17:34] * loicd (~loic@ Quit (Read error: Operation timed out)
[17:36] * julian is now known as Guest6791
[17:37] <ccourtaut> yehudasa: i submitted a fix on wip-rgw-geo-2 about rgw_rados.cc
[17:38] <ccourtaut> yehudasa: it was preventing vstart to start with the -r option
[17:48] * fmarchand (~fmarchand@ has joined #ceph
[17:49] <fmarchand> Hi everyone !
[17:50] * fmarchand2 (~fmarchand@ has joined #ceph
[17:50] * fmarchand (~fmarchand@ Quit (Read error: Connection reset by peer)
[17:51] * ay (~ay@ Quit (Remote host closed the connection)
[17:51] * fmarchand2 (~fmarchand@ Quit (Read error: Connection reset by peer)
[17:51] * tnt (~tnt@212-166-48-236.win.be) Quit (Read error: Operation timed out)
[17:51] * fmarchand (~fmarchand@ has joined #ceph
[17:52] <fmarchand> where can I find more info about ACL in RGW?
[17:53] <fmarchand> I would like to know if wa can have millions of users with RGW
[17:54] <joao> ccourtaut, it's unlikely yehudasa or anyone else from LA comes in today (it's a holiday over there)
[17:54] <fmarchand> joao : hi !
[17:55] <joao> hello there fmarchand
[17:55] <fmarchand> joao : how are you snce last time we talked ? :)
[17:55] <joao> good, you?
[17:56] * ay (~ay@ has joined #ceph
[17:56] <fmarchand> joao : fine I'm looking for answers about RGW
[17:57] <fmarchand> :) but it's holidays in california ?
[17:57] <joao> not the best person to help you with that
[17:57] <joao> fmarchand, it's a holiday in the united states, afaik
[17:57] <joao> Memorial Day I believe
[17:57] <fmarchand> oh oki
[17:58] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:58] <fmarchand> thx ... I guess I won't find my answers if nobody's there ...
[17:59] <joao> you can try the list
[17:59] * fmarchand (~fmarchand@ Quit (Read error: Connection reset by peer)
[17:59] <joao> ceph-users I guess would be best
[18:02] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:05] * tnt (~tnt@ has joined #ceph
[18:05] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[18:12] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) Quit (Ping timeout: 480 seconds)
[18:14] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:19] <saaby> anyone here knows what to do with a mon which just used ~100GB space for the leveldb store (and then shut down because of low diskspace)?
[18:19] <tnt> saaby: restart it
[18:20] <tnt> actually you need to restart all the mons at the same time.
[18:20] <saaby> aha
[18:20] <saaby> ok
[18:21] <tnt> saaby: status is tracked here http://tracker.ceph.com/issues/4895
[18:21] <saaby> so, shut them all down at the same time and restart?
[18:21] <tnt> hopefully 0.61.3 will soon be out :p
[18:21] <tnt> saaby: yes.
[18:21] <saaby> right :)
[18:21] <saaby> ok
[18:21] <saaby> trying
[18:23] * leseb (~Adium@ Quit (Quit: Leaving.)
[18:25] * rahmu (~rahmu@ Quit (Remote host closed the connection)
[18:26] <ccourtaut> joao, oh, didn't know about that :)
[18:26] <joao> oh, that reminds me: loicd, no stand-up today!
[18:29] <loicd> joao: thanks for the warning :-)
[18:29] <ccourtaut> :D
[18:29] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[18:30] * loicd finished writing the notes for today's standup anyway ;-)
[18:30] <tnt> oh ... that's why none of my tv shows are out today ...
[18:31] <loicd> ccourtaut: if you'd like a review of your pull request, I can give it a try :-)
[18:31] <loicd> what's the URL ?
[18:31] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[18:31] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit ()
[18:32] <ccourtaut> loicd, https://github.com/ceph/ceph/pull/323
[18:32] <loicd> joao: I take it you're not located in LA and therefore working today ?
[18:32] <ccourtaut> loicd: don't know if it's the best thing to do
[18:33] <joao> loicd, I'm in Lisbon, and it feels just like any other monday; I'm just hanging around getting some stuff done but without too much fuss :)
[18:34] <loicd> ccourtaut: I assume there are no unit tests for src/rgw/rgw_rados.cc yet and that's why there are no tests associated to your patch, right ?
[18:35] <loicd> joao: :-)
[18:35] <ccourtaut> loicd, if there were tests, i don't know where they are located ^^'
[18:35] <loicd> :-D
[18:35] <ccourtaut> loicd: but I dug around :)
[18:37] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[18:38] <loicd> ccourtaut: you can git commit --amend --signoff ; git push --force origin wip-rgw-geo-2
[18:38] <loicd> there is no need to re-issue the pull request, it will be transparently updated
[18:38] <loicd> which is slightly confusing when you're the reviewer
[18:38] <loicd> but since most of them are on vacation today, I guess it's ok ;-)
[18:39] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[18:39] <tnt> saaby: did it work ?
[18:40] * loicd now trying to make sense of https://github.com/kri5/ceph/blob/4ed90da04f10ab27de624597f882ee09dc313601/src/rgw/rgw_rados.cc
[18:42] <saaby> tnt: thanks, yes, it worked
[18:42] <saaby> one of the mons is dead now, however - it crashes on startup
[18:43] <tnt> saaby: easiest to just shut it down, erase the data dir, re-run mkfs on it, put back the keyring and restart.
[18:43] <tnt> it will auto-resync
[18:43] <saaby> wow, ok
[18:43] <saaby> thanks
[18:44] <saaby> don't think I have tried that on a mon before
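The recreate-and-resync procedure tnt describes can be sketched as an ordered command sequence. A minimal Python sketch that only *builds* the commands rather than running them; the data-dir path, service invocation style, and file paths are assumptions for illustration, not taken from the log:

```python
def mon_recreate_cmds(mon_name, monmap_path, keyring_path):
    """Return, in order, the shell commands to rebuild a broken monitor
    so it resyncs from the surviving quorum (hypothetical paths)."""
    data_dir = f"/var/lib/ceph/mon/ceph-{mon_name}"  # assumed default layout
    return [
        f"service ceph stop mon.{mon_name}",   # shut the broken mon down
        f"rm -rf {data_dir}",                  # erase its data dir
        f"ceph mon getmap -o {monmap_path}",   # grab a current monmap from a healthy mon
        f"ceph-mon --mkfs -i {mon_name} --monmap {monmap_path} --keyring {keyring_path}",
        f"service ceph start mon.{mon_name}",  # it will auto-resync on startup
    ]
```

The key step is passing the *current* monmap and keyring to `--mkfs`, so the fresh monitor knows its peers and can sync from them.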
[18:45] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:45] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:45] <ccourtaut> loicd: updated
[18:52] <loicd> ccourtaut: other than this the logic seems good. How difficult would it be to bootstrap a unit test for this specific part of the code ?
[18:53] <loicd> rgw_rados being 4k lines long I figure it's not a trivial task
[18:53] <ccourtaut> loicd: yep, it's quite a big piece of code
[19:05] * jshen (~jshen@108-231-76-84.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[19:33] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:33] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:49] * diegows (~diegows@host195.190-224-143.telecom.net.ar) has joined #ceph
[19:54] <cce> Is it possible to run ceph on a single node (in production)?
[19:54] * BManojlovic (~steki@fo-d- has joined #ceph
[19:54] <tnt> possible ... yes.
[19:54] * cce realizes you lose redundancy, etc.
[19:54] <tnt> recommended ... not really
[19:54] <cce> tnt: I'm replacing a home-grown file storage mechanism
[19:55] <cce> so, my bar isn't very high.
[19:55] <tnt> and why do you want ceph ? with one node you get none of the advantages and all of the overhead ...
[19:56] <cce> Mostly so that it's a separate service that is documented.
[19:56] <cce> Right now we have a simple HTTP interface to folder, plus rsync on cron for mirroring.
[19:56] <cce> it works; but, it doesn't scale for clients who can scale
[19:56] <tnt> and you want to use radosgw ?
[19:57] <tnt> there are other S3-like servers that don't have all the overhead.
[19:57] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[19:57] <cce> swift duplicates the data 3x
[19:57] <cce> How bad is the overhead?
[19:58] <cce> tnt: so, yes, a S3 like provider is what I'm looking for.
[19:58] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has left #ceph
[19:59] <tnt> something like https://github.com/jubos/fake-s3 backed by a RAID or btrfs volume.
[19:59] * cce saw that.
[20:01] <cce> tnt: so, 1/2 of our clients are single node; the other half are 3-6 node installs
[20:02] <cce> I was hoping for something that might work with a single node, together with aggressive backups
[20:02] <cce> But, be something that might scale out more cleanly.
[20:02] <tnt> we use Ceph in our prod setup, but for the couple of clients that have an on-premise deployment, we don't use ceph but rather a custom S3 server just backed by RAID. much easier to set up ...
[20:03] <cce> Ok.
[20:03] <cce> So, you stick with S3 API for your application.
[20:03] <cce> Do you use Fake-S3?
[20:05] <cce> tnt: I was just hoping to remove our "custom server, backed by RAID"
[20:05] <cce> so, finding an external project that we could collaborate with would be quite great
[20:05] <tnt> cce: not Fake-S3, we built our own.
[20:05] <cce> tnt: any reason why you built your own? -:)
[20:06] <tnt> cce: fake-s3 was ruby ... we use all python and we didn't want to have to install and maintain a whole ruby setup.
[20:08] * cce sees a python clone of fake-s3, mock-s3
[20:08] <cce> i have same concern, we have python stack
[20:10] <cce> tnt: thank you for the advice
[20:22] <cce> I still wonder what overhead ceph has even when run as a single node (on mirrored hard drive).
[20:22] <cce> Having 2 solutions seems like it'd be more admin overhead.
[20:45] * Volture (~quassel@office.meganet.ru) Quit (Remote host closed the connection)
[21:13] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Quit: Leaving.)
[21:13] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[21:15] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:20] * pja (~pja@a.clients.kiwiirc.com) has joined #ceph
[21:20] <pja> hello
[21:20] <pja> when i want to start an added mon i get the following error message
[21:21] * pja (~pja@a.clients.kiwiirc.com) Quit ()
[21:21] * pja (~pja@a.clients.kiwiirc.com) has joined #ceph
[21:22] <pja> [19057]: (33) Numerical argument out of domain
[21:22] <pja> failed: 'ulimit -n 8192;  /usr/bin/ceph-mon -i d --pid-file /var/run/ceph/mon.d.pid -c /etc/ceph/ceph.conf '
[21:23] <pja> ceph version 0.61.2
[21:23] <pja> tested with ubuntu 12.04 lts and debian7
[21:37] <jksM> how can I read out the "min_size" of a pool? It is not listed in ceph osd dump... I'm running bobtail, does this require cuttlefish?
[21:38] <tnt> are you sure it's not ?
[21:39] <jksM> tnt, yes, pretty sure :-)
[21:39] <jksM> for example: pool 4 'jks' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 250 pgp_num 250 last_change 8 owner 0
[21:40] <lxo> anyone else using cuttlefish's ceph-fuse for writing experiencing inconsistent pgs quite often? I looked into two cases. in one, a 4MiB file was created for a small (~2KiB) file, and after the file's bytes, there was random junk, and the random junk was different in each replica of the pg; in another, the file was part of a torrent download, and a small portion of the file (<512B) at a random spot within the file differed between the 3 replicas
[21:41] * partner (joonas@ajaton.net) Quit (Ping timeout: 480 seconds)
[21:41] <tnt> jksM: yup, found the commit that added it ... so not in bobtail.
[21:42] <jksM> tnt, does that mean that it cannot be read out in bobtail, or that there is no concept of min_size in bobtail?
[21:42] <lxo> the oddity is that, although it is the pg's master that's supposed to replicate to replicas, this behavior only occurs when files are written to by ceph-fuse; with ceph.ko, files don't end up inconsistent, or with incorrect sizes, in the pgs
[21:42] <tnt> jksM: it can't be read out AFAICT.
[21:42] <jksM> okay, thanks :-)
[21:43] <tnt> but if you didn't change it, it's '1'
[21:43] <jksM> oh, so the default is 1 no matter what the size is?
[21:44] <jksM> that was really all I needed to know - thanks a lot! :-) I was afraid to change the min_size without knowing the default :)
[21:44] <tnt> jksM: it seems to depend on when the pool was created
[21:46] <tnt> jksM: Ah actually, it seems min_size was added on "Mon Oct 29 15:35:09 2012 "
[21:46] <jksM> okay - mine are old :-)
[21:46] <tnt> so not sure if bobtail is newer than that or not ...
[21:46] <jksM> hmm, my osd dump lists the created date as 2012-10-14
[21:47] <tnt> ok, then bobtail doesn't have the min_size concept.
[21:47] <jksM> I should probably consider upgrading to cuttlefish soon then!
[21:48] <jksM> I haven't been following the problem with mon leveldbs growing... is that solved or should I hold back to the next minor release?
[21:48] <joao> lxo, that is odd indeed
[21:48] <tnt> and when upgrading, min_size will be set to "size - size/2"
[21:48] <jksM> tnt, rounded up or down?
[21:49] <joao> lxo, not sure who's looking after ceph-fuse, and everyone LA-based is away today; mind sending an email to ceph-devel with that?
[21:49] <tnt> int arithmetic. so size/2 will be rounded down.
[21:49] <jksM> tnt, ok - thanks! :-)
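tnt's "size - size/2" with C's truncating integer division can be checked with a tiny sketch; Python's `//` mimics the rounding-down behaviour he describes:

```python
def default_min_size(size: int) -> int:
    """min_size assigned on upgrade: size - size/2, where the division
    is C integer arithmetic (rounds down)."""
    return size - size // 2
```

So a "rep size 2" pool like jksM's gets min_size 1, and a "rep size 3" pool gets min_size 2.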
[21:49] <tnt> jksM: and the mon growth issue is still present in 0.61.2 so better wait for 0.61.3
[21:49] <jksM> I'll do that! - thanks for the advice again! :)
[21:50] <saaby> tnt: is there any magic involved in starting up the recreated monitor again? - looks like it doesn't figure out that it is part of the cluster, and just starts by itself
[21:51] <tnt> saaby: did you provide the old monmap when doing the mkfs ?
[21:51] <saaby> yes
[21:51] <joao> pja, that might be a ceph.conf issue; mind pastebin it?
[21:51] <joao> *pastebin'ing
[21:51] <tnt> saaby: ceph-mon --mkfs -i <name> --monmap <initial_monmap> --keyring <initial_keyring> ?
[21:52] <tnt> that should just work ...
[21:52] <tnt> I did it recently.
[21:52] <saaby> yes, thats what I did
[21:52] <tnt> what does the log say ?
[21:53] <saaby> this is what the logs say when starting: http://pastebin.com/qQVLRken
[21:54] <tnt> and pastebin the start log of another mon ?
[21:54] <tnt> the monmap you used must be incorrect, it only has 1 mon in it.
[21:55] <saaby> the logs from restarting one of the functioning mons?
[21:55] <tnt> yes
[21:55] <joao> saaby, so, basically, your monitor starts, and is elected leader, but there's a client that is unable to authenticate? is that it?
[21:56] <saaby> joao: well, it looks as if it doesn't "see" the other two mons and the running cluster
[21:56] <tnt> joao: no. one of the mons was KO, so I told him to just erase the data and redo the mkfs and that it should resync from the other mons.
[21:56] <saaby> yup, thats it.
[21:56] <saaby> just checked the monmap used, it has references to all three mons
[21:56] * andreask (~andreask@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[21:56] * ChanServ sets mode +v andreask
[21:57] <joao> saaby, where did you grab that monmap from?
[21:58] <saaby> one of the other monitors: "ceph mon getmap -o monmap" - and scp'ed it over
[21:58] <jksM> general question: can much performance be gained from having mons be stored on ssd drives instead of on ordinary hard drives?
[21:58] <jksM> (having just added osd journals on ssds to all osd and gained quite a lot from that)
[21:59] <tnt> jksM: mons are not in the IO critical path.
[21:59] <jksM> tnt, okay, good to know :) are they in the critical path during a remapping for example? (for example if an osd crashes completely and has to be taken out)
[21:59] <lxo> joao, will do. thanks for reminding me of the partial-US holiday
[22:00] <joao> lxo, I thought it was all across the US :p
[22:01] <joao> I don't really get which ones are for everybody or just for CA :)
[22:02] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has left #ceph
[22:02] <joao> saaby, what does 'monmap --print <monmap-file>' say?
[22:02] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[22:02] * ChanServ sets mode +o scuttlemonkey
[22:02] <joao> does it hold all three monitors?
[22:03] <saaby> you mean monmaptool, right?
[22:03] <saaby> yeah, that shows all three mons
[22:04] <joao> yeah, monmaptool
[22:04] <joao> kay
[22:04] <joao> saaby, time to increase debug mon to 20 and pastebin the resulting log :)
[22:04] <saaby> http://pastebin.com/RZ1jGhgH
[22:05] <saaby> ok - will do
[22:11] <saaby> joao: here it is: http://pastebin.com/Tf5m4186
[22:12] <saaby> that is the log from the mon started as: "ceph-mon -i ceph1-cph1f11-mon1 --debug-osd 20"
[22:13] <wido> yehudasa: Quick question. Is there a limit on the object size in the RGW? I don't think so, right?
[22:17] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:29] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[22:30] <saaby> This is the part of the mon log that I think is interesting; it looks as if the other two mons are just ignored:
[22:30] <saaby> mon.ceph1-cph1f11-mon1@-1(probing) e0 initial_members ceph1-cph1f11-mon1, filtering seed monmap
[22:30] <saaby> and from then on it just assumes that it is the first mon in a new cluster, and goes on creating it.
[22:33] * diegows (~diegows@host195.190-224-143.telecom.net.ar) Quit (Read error: Operation timed out)
[22:38] <saaby> right - I think I found the problem. "mon initial members" only had this one mon defined (it was the first to be added to the cluster)
[22:38] <saaby> having added the two others, and started it again, it looks as though sync is happening
[22:39] * sleinen (~Adium@2001:620:0:26:c0b:2911:50d4:bd5b) has joined #ceph
[22:41] <nyerup> saaby: I wonder if we even need that directive persisted in the config. Isn't that mainly for when we're bootstrapping the cluster with only one mon?
[22:42] <nyerup> Now that all three share a monmap, an initial member list may just get in the way in recovery situations.
[22:43] <nyerup> But I'm not sure that point is valid.
[22:43] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:45] <saaby> nyerup: from what I can see/guess I think that conf directive is a fallback for bootstrapping. i.e. only used when bootstrapping, hence not being a problem for recovery
[22:45] <saaby> but.. I may be wrong.
[22:47] * pja (~pja@a.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[22:49] <nyerup> saaby: Well - that's my point. We just were in a recovery situation, and it kinda came in the way. :)
[22:49] <saaby> ah.. well. yes. I see your point now.
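For reference, the fix saaby describes amounts to listing every monitor under "mon initial members" instead of just the first one; a hypothetical ceph.conf fragment (the mon names here are placeholders, not the cluster's real names):

```
[global]
    ; with only one name here, a recreated mon filters the seed monmap
    ; down to itself and forms a new single-mon cluster
    mon initial members = mon-a, mon-b, mon-c
```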
[22:55] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[22:56] * fmarchand (~fmarchand@85-168-75-207.rev.numericable.fr) has joined #ceph
[22:56] <fmarchand> hi !
[22:57] <fmarchand> joao : still there ?
[22:59] <saaby> tnt, joao: thanks for your help. the mon works again now.
[23:01] <tnt> ok great :)
[23:03] <fmarchand> I'm looking for info about rgw ...
[23:03] <fmarchand> someone knows a little that subject ?
[23:06] <tnt> yup
[23:07] <fmarchand> mmm so tnt I have to work on a "poc" using a cluster which can be accessed through REST URLs
[23:08] <fmarchand> I need a rest endpoint which can handle a lot of concurrent connections
[23:09] <fmarchand> how can I "scale" rgw ?
[23:09] <tnt> you can just put several rgw server accessing the same cluster
[23:10] * andrei (~andrei@host86-155-31-94.range86-155.btcentralplus.com) has joined #ceph
[23:10] <andrei> hello guys
[23:10] <fmarchand> and I can load balance them ...
[23:10] <fmarchand> oki
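The scale-out tnt describes — several rgw servers in front of the same cluster, load-balanced — could look like this hypothetical haproxy fragment (hostnames, ports and health checks are assumptions, not from the log):

```
frontend rgw_front
    bind *:80
    mode http
    default_backend rgw_back

backend rgw_back
    mode http
    balance roundrobin
    server rgw1 10.0.0.11:80 check
    server rgw2 10.0.0.12:80 check
```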
[23:10] <andrei> i have a newbie question.
[23:11] <andrei> i would like to implement some performance improvements based on recommendations from benchmarks done by http://ceph.com/community/ceph-bobtail-jbod-performance-tuning/
[23:11] <fmarchand> tnt : last question ... can I have millions of "rgw" users with acl's ?
[23:12] <andrei> once i've chosen the options that I would like to go for, is it a matter of adding them to the ceph.conf and restarting ceph?
[23:12] <andrei> or do I need to do something else?
[23:12] <tnt> fmarchand: huh ... you said "knows a little" ... I don't know that much :p Ask yehudasa , he's the man for those questions.
[23:13] <fmarchand> :) I understand thx
[23:13] <fmarchand> I will try to find him !
[23:14] <fmarchand> he's working in US ?
[23:27] * jahkeup (~jahkeup@ Quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
[23:39] * eschnou (~eschnou@60.197-201-80.adsl-dyn.isp.belgacom.be) Quit (Read error: Operation timed out)
[23:51] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[23:54] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[23:54] * The_Bishop_ (~bishop@e177088119.adsl.alicedsl.de) Quit (Read error: Operation timed out)
[23:59] * BillK (~BillK@124-169-77-36.dyn.iinet.net.au) Quit (Read error: Connection reset by peer)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.