#ceph IRC Log


IRC Log for 2013-05-09

Timestamps are in GMT/BST.

[0:00] <pioto> i got this when i upgraded one of my OSDs to 0.61... fixed it by just rebuilding the whole osd
[0:00] <sagewk> pioto: hmm that's different
[0:01] <dmick> netmass: what are the perms on /etc/ceph/ceph.conf?
[0:01] <sjustlaptop> pioto: that's a different bug
[0:01] <sjustlaptop> pioto: fs?
[0:02] <pioto> sjustlaptop: btrfs
[0:03] <pioto> on "whatever 'ole kernel is the default on ubuntu 12.04"
[0:03] <pioto> 3.2
[0:04] <pioto> when i gdb'd it... i think the size was 1
[0:04] <pioto> but i didn't dig too deep
[0:04] <pioto> it didn't happen on my other OSDs, so... dunno
[0:04] <pioto> it's also only a test cluster
[0:05] <sjustlaptop> pioto: there are a plethora of problems with btrfs
[0:05] <sjustlaptop> esp. prior to 3.8
[0:05] <pioto> hm, ok
[0:05] <jmlowe> pioto: you will lose data without a 3.8 kernel
[0:05] * jeffv (~jeffv@2607:fad0:32:a02:f5ab:6bed:610e:8166) has left #ceph
[0:05] <pioto> sweet.
[0:06] <nhm> cjh_: heya, how's it going?
[0:06] <jmlowe> happened to me a lot
[0:06] <sjustlaptop> like that one
[0:06] <pioto> i wasn't planning on using it in production... but, wanted to see if it helped performance (and i think it did)
[0:06] * gmason (~gmason@hpcc-fw.net.msu.edu) Quit (Ping timeout: 480 seconds)
[0:06] <cjh_> pretty good :)
[0:06] <nhm> pioto: on a fresh cluster btrfs is definitely speedy.
[0:07] <cjh_> do you think i should try setting up 20 clients that are out of rack instead of testing in rack with rados bench? i'm thinking that might be causing my performance slowdown i keep seeing
[0:07] <cjh_> nothing obvious is the problem at this point
[0:08] <nhm> cjh_: I'm not sure I see how moving the clients out-of-rack would improve performance?
[0:09] <saras> ./configure ran, no errors
[0:09] <saras> holy crap
[0:09] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:09] <cjh_> nhm: true. i guess i'm out of things to test then
[0:10] <saras> make is running
[0:11] <nhm> cjh_: did you ever try a pool with just 1x replication?
[0:11] <cjh_> not yet. i'm still at 3x replication
[0:12] <cjh_> think that might help track down the culprit?
[0:12] <nhm> cjh_: it would be interesting to see if you end up seeing a 3x difference in performance, or more/less.
[0:13] <saras> doxygen <- is huge what does it do
[0:13] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:13] <cjh_> nhm: good thinking. that would narrow it down to disk or network
[0:14] <saras> saras: doxygen was almost a whole gig for what
[0:16] * JaksLap (~chris@ion.jakdaw.org) has joined #ceph
[0:21] <cjh_> nhm: for my 1x pool i get 3.5GB/s for the cluster
[0:21] <saras> https://plus.google.com/u/0/106479011389609622954/posts/4jtAV1rasVt
[0:22] <cjh_> so it's about 1.5GB/s faster. so i'm def taking a hit on the replication but i didn't think it would be this high
[0:22] <Ifur> any success stories here with using cephfs on SLES with libsdp.so?
[0:23] <Ifur> or without libsdp.so (but would assume most who would set it up; having Infiniband)
[0:23] <netmass> dmick: Headed out. I have the new test cluster up and running. Things went very well after getting the key generated. Many thanks for your help!
[0:24] <saras> http://paste.ubuntu.com/5646104/ i have no idea how to fix this one
[0:25] <dmick> netmass: I just wish I understood why it failed to do so automatically
[0:25] <matt_> Ifur, sdp support was discontinued on OFED unfortunately so I haven't used it. Plenty of us are running ceph on IPoIB just fine however
[0:25] <dmick> did you get a chance to check perms on /etc/ceph/ceph.conf?
[0:26] <netmass> Me too... perms are 0644
[0:26] <Fetch> joshd: on investigation, I'm using qemu 0.12 (what comes with CentOS6.4) which is very likely too old for rbd
[0:26] <Ifur> matt_: i can find libsdp.so installable here on SLES11, but we may not have latest OFED....
[0:27] <saras> dmick: when you have time take look at that error se if their any thing i can do
[0:27] <saras> please
[0:27] <saras> afk
[0:27] * markbby1 (~Adium@ Quit (Quit: Leaving.)
[0:27] <Ifur> IPoIB is not really a substitute for sdp
[0:27] <matt_> Ifur, yeh I think they removed it in 1.5.x onwards
[0:27] <netmass> One thing I noted... there was an "extra user" in /etc/sudoers.d called #sjpceph#. This user was left around accidentally. It did not have the proper permissions so each time a sudo command was executed by any user, there was a warning about that user that was listed on the screen. However, as far as I could tell... the sudo command would still run. I have removed that user. Don't think it is related, but I will have to re-run everything to check.
[0:28] <Ifur> matt_: thats very unfortunate....
[0:29] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) has joined #ceph
[0:29] <matt_> Ifur, I'm running datagram mode on a QDR fabric and it pushes 800MB/s no problem doing reads
[0:30] <matt_> connected would be better but my switch is a little dodgy I think, I have a new one on order
[0:31] <Ifur> matt_: as far as i can tell SDP is still supported, doesnt make sense to have SDP without libsdp.so -- dont see how that would work :)
[0:31] * aliguori (~anthony@ Quit (Quit: Ex-Chat)
[0:31] <Ifur> matt_: thats a quarter of theoretical max.... that is not much at all
[0:32] * stxShadow (~Jens@ip-88-152-161-249.unitymediagroup.de) Quit ()
[0:32] <Ifur> if cephfs cannot be used with SDP, its pretty much dead in the water for me atm... need to be able to push 2GB/s per object storage node
[0:32] <matt_> Connected mode got me to 18Gbps in testing
[0:33] <Ifur> stil less than half...
[0:33] <Ifur> mhm, but you dont happen to be running it on SLES?
[0:33] <matt_> yeh, it's more a limitation of the IPoIB stack
[0:33] * netmass (~netmass@ Quit (Quit: HydraIRC -> http://www.hydrairc.com <- Now with extra fish!)
[0:34] <matt_> No, I'm on Ubuntu using the kernel modules
[0:34] <Ifur> ouch, Ubuntu...
[0:34] <Ifur> thats pretty much the worst distro in a HPC environment, hehe
[0:35] <Ifur> at least the least flexible distro ive used in a HPC setting, was a workaround nightmare for me
[0:35] <matt_> I needed the 3.2 kernel from 12.04.1 to fix some KVM bugs. I was previously on SL6.3
[0:36] <Ifur> have nothing bad to say about SLES atm, even tho im new in using it
[0:36] <Ifur> and its 3.0 kernel
[0:36] <Ifur> would suspect that KVM fixes got backported to SL6?
[0:37] * diegows (~diegows@ Quit (Ping timeout: 480 seconds)
[0:37] <Ifur> RHEL have plenty of cloud stuff built around kvm
[0:37] <matt_> Ceph also had some kernel requirements, something to do with syncfs support
[0:37] <Ifur> hmm
[0:38] <matt_> Ubuntu actually has a libsdp package... interesting...
[0:38] <Ifur> yeah, i think you must have confused something there
[0:39] <joshd> Fetch: yeah, that doesn't have rbd in it. we'll have a qemu package for rhel/centos soon, but in the mean time you can compile e.g. qemu 1.4 with configure --enable-rbd
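joshd's suggestion above can be sketched as a short build recipe. Only the `--enable-rbd` flag comes from the conversation; the qemu version, download URL, and -devel package names are assumptions, so treat this as illustrative rather than tested:

```shell
# Hypothetical sketch of building a qemu new enough for rbd on CentOS 6.4,
# per joshd's advice; the stock qemu 0.12 predates the rbd driver.
# Version, URL, and package names are assumptions.
yum install -y gcc make glib2-devel zlib-devel librbd1-devel librados2-devel
curl -O http://wiki.qemu.org/download/qemu-1.4.0.tar.bz2
tar xjf qemu-1.4.0.tar.bz2
cd qemu-1.4.0
./configure --enable-rbd --target-list=x86_64-softmmu
make -j"$(nproc)"
```

The resulting qemu-system-x86_64 can then be pointed at rbd images directly, without waiting for distro packages.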
[0:39] <Ifur> SDP is a major selling point, unlikely to go away anytime soon
[0:39] <Ifur> just think of all those latency crazy finance people with custom and handbuilt data-transport frameworks... like SAP and so on....
[0:39] <Ifur> you'd lose that entire market segment
[0:41] <matt_> Maybe it was just the inclusion with OFED... who knows
[0:41] <Ifur> maybe....
[0:41] <Ifur> or maybe there were vendor specific issues?
[0:42] <matt_> hmm... could be
[0:42] <Ifur> but still, i for one would have preferred that ceph was developed for any distribution except ubuntu, simply because its downstream from debian, taking debian and doing desktop specifics.... while opensuse and fedora are the opposite...
[0:42] <Ifur> that RHEL is downstream from fedora, and SLES from opensuse
[0:43] <Ifur> makes more sense that way, rather than the wrong way... (ubuntu being debian unstable, mostly...)
[0:43] * Ifur should maybe not rant about that here...
[0:44] <kylem> hello all. I'm having trouble with my MDS servers. I have two of them and they both crash almost instantly after I start them. sometimes the second one will stay alive for a while. but seems to crash again if i start the first one. I just want one with a hot backup. Here is my conf: http://pastebin.com/aGA90vn9
[0:44] <matt_> I think it was partly because ceph was originally meant to be used with BTRFS. Ubuntu is probably the closest thing to bleeding edge with commercial support
[0:46] <Ifur> its bleeding edge because its debian unstable....
[0:46] <kylem> any help as to where i should start digging would be awesome.
[0:46] <Ifur> unlike ubuntu, debian is a server distro
[0:46] <Ifur> meaning that debian dont assume their users are idiots
[0:47] <Ifur> im not buying commercial support because it was designed with that mindset
[0:47] <matt_> This is the post about SDP : http://www.spinics.net/lists/ceph-users/msg01102.html
[0:49] <Ifur> well he is wrong, i dont think mellanox and qlogic, when trying to sell IB to the finance industry and companies like SAP for use with their ancient tcp/ip applications, will care what a ceph developer says
[0:49] <matt_> Post about SDP being deprecated : http://comments.gmane.org/gmane.network.openfabrics.enterprise/5371
[0:49] <Ifur> im sorry...
[0:50] <matt_> apparently rsockets is the future
[0:50] <Ifur> libsdp is stable and wont go away until rsockets is too
[0:51] <Ifur> something isnt deprecated just because work has started on something else, i get the impression people are jumping to conclusions
[0:53] <saras> back
[0:54] <matt_> I wonder how libsdp would work with qemu using librbd...
[0:55] <Ifur> matt_: found a pdf at openfabrics from 2012 presenting rsockets
[0:56] <Ifur> and well, the fact that they are complaining that sdp sucks because its bad with mpi is kind of crazy, especially when openmpi supports rdma through SCM
[0:56] <Ifur> didnt see any real test except against mpi, and well, if you are using sdp to get rdma with mpi you are doing it wrong
[0:57] <Ifur> and even the developer of rsockets said in the presentation that rsockets was about 5 years too late
[0:57] <nhm> cjh_: before were you seeing 2GB/s from the clients or 2GB/s total throughput to the disks?
[0:57] <Ifur> cephfs needs native verbs use at some point to be taken seriously in HPC
[0:57] <Ifur> in the meantime libsdp is the only thing you can do about that...
[0:58] <nhm> Ifur: for what it's worth, with IPoIB we can do about 2GB/s per link.
[0:58] <nhm> on QDR
[0:59] <Ifur> you'd do 3.2GB/s with verbs... and much much much higher IOPS on small files, which is where it matters most
[1:00] <nhm> Ifur: I'm not strictly disagreeing, but there's a tradeoff here in terms of how worthwhile it is to target native verbs vs something like rsockets.
[1:01] <Ifur> rsockets and SDP are workarounds for deprecated software, also... with how things are going RDMA may very well become a pervasive technology in the future
[1:02] <saras> dmick: did you look at that file
[1:02] <nhm> Ifur: In terms of throughput, there are two questions: Can the client reasonably drive 3.2GB/s per node, and can the OSDs drive 6.4GB/s of backend throughput since Ceph does full data journal writes.
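nhm's doubling can be made concrete with some back-of-envelope shell arithmetic. The 2x journal factor is from nhm's statement above; treating the conversation's ~2GB/s figure as client-side throughput is an assumption made purely for illustration:

```shell
# Back-of-envelope write amplification: each client write is stored
# replication-factor times, and every OSD write lands in the journal
# before the data disk (the 2x nhm mentions).
client_rate=2     # GB/s of client writes (assumed figure from this log)
replication=3
journal_factor=2
backend=$((client_rate * replication * journal_factor))
echo "${backend} GB/s of raw disk traffic"   # prints "12 GB/s of raw disk traffic"
```

That 6x multiplier is why a cluster can look disk-bound long before the clients are pushing impressive numbers.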
[1:02] <Ifur> you can get a 36-port 40Gbps IB switch for the same money that an 8-port 10GbE switch costs
[1:02] <dmick> sorry, been busy saras
[1:02] <saras> dmick: np
[1:03] <saras> Ifur: from who
[1:03] <dmick> atomic is an external dependency IIRC
[1:03] <Ifur> nhm: there is an order of magnitude latency difference with RDMA and without... MPI ping from core to core on two different nodes is 1.2 microseconds today
[1:04] * brady (~brady@rrcs-64-183-4-86.west.biz.rr.com) Quit (Quit: Konversation terminated!)
[1:04] <Ifur> saras: i checked last year, 10GbE switches might be cheaper, but ethernet by design has scaling issues, 40GbE and 100GbE are way behind schedule already... IB essentially uses the same architecture that pci express and SAS use.
[1:05] <Ifur> serial is where it is at today, ethernet was designed when a bus was a better idea than a serial connection
[1:05] <saras> max... guys
[1:06] <nhm> Ifur: RDMA is definitely lower latency I agree, but is it so much better than rsockets that it's worth the coding effort for something like ceph that likes to hide latency with parallelism anyway?
[1:06] * dxd828_ (~dxd828@host-2-97-78-18.as13285.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[1:06] * mip (~chatzilla@cpc15-thor5-2-0-cust102.14-2.cable.virginmedia.com) Quit (Quit: ChatZilla 0.9.90 [Firefox 5.0/20110615151330])
[1:06] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[1:07] <Ifur> nhm: do you want to bet on a workaround? you probably can, thanks to ancient software needed by finance and stock market analytics... but in order to be taken seriously, RDMA is needed in my opinion
[1:07] <nhm> Ifur: btw, what center are you with?
[1:08] <Ifur> nhm: does that matter?
[1:08] <Ifur> im new here, so rather not, but used to be at CERN
[1:08] <nhm> Ifur: Was just curious, most of the folks that are interested in Ceph for HPC have been with one of the various labs.
[1:09] <Ifur> of course they would be, hehe
[1:09] <nhm> Ifur: I used to work for the Minnesota Supercomputing Institute.
[1:10] <Ifur> last thing i did was auto vectorization for a physics simulation package
[1:10] * Oliver1 (~oliver1@ip-178-201-147-182.unitymediagroup.de) Quit (Quit: Leaving.)
[1:10] <Ifur> anyways, RDMA is not going away, even Intel has latched onto it with varying success
[1:11] <matt_> grrr... I swear I got given an impossible problem in my Google interview this morning
[1:11] <nhm> Ifur: It's a hot topic. What it comes down to for us is getting the best bang with limited resources.
[1:11] <Ifur> http://www.intel.com/content/www/us/en/infiniband/truescale-infiniband.html
[1:12] <Ifur> nhm: i can sympathise with that :-/
[1:13] <nhm> Ifur: I think we'll probably start out with rsockets just to see how well it works. If it's not doing what we need, I imagine we'll probably jump into a full RDMA version of the simplemessenger at some point.
[1:13] <sjustlaptop> Ifur: also, from what I have been able to work out, if we didn't use rsockets, we'd need to implement much of rsockets ourselves anyway
[1:13] <sjustlaptop> Ifur: seems easier to put that effort behind rsockets instead
[1:13] <saras> their 3 or 4 companys in networking that i care about mellanox
[1:14] <Ifur> and as you may be aware DoE and such recently funded nvidia and intel and so on a crap load of cash for exascale research, and the biggest problem atm is memory access (RDMA is relevant for that), atm its looking like the energy needed for one memory access is going to be the same as doing 1000 vector instructions
[1:14] <saras> arista
[1:15] <nhm> Ifur: Yeah, the fast-forward work will be interesting to watch. It's too bad the sequester is going to slow things down though. My guess is no exascale until 2020 or 2022, maybe later.
[1:15] <saras> plexxi
[1:16] <Ifur> sjustlaptop: suspect you are right there, but i would try to keep the door wide open for RDMA nonetheless, as in keeping it in mind
[1:16] <saras> maybe brocade
[1:16] <sjustlaptop> well, rsockets is using rdma under the covers anyway
[1:16] <Ifur> nhm: we still arent at petascale
[1:16] <Ifur> you have petaflop machines, but they arent able to use all the resources for any apps that i know of at least
[1:17] <Ifur> the reason why they are building them is probably because of politics and funding agencies
[1:17] <saras> it always is
[1:18] <Ifur> very very few people need more than 1k cores for one app, its just lots and lots of small jobs that gives the scheduler a hard time due to the sheer size of the systems
[1:18] <nhm> There are a *lot* of politics involved.
[1:18] <saras> network and storage are hard part
[1:18] <Ifur> posix is never what you want
[1:19] <Ifur> but unfortunately people write apps on their laptops where it works well, and wonder why its difficult to run it on a large scale
[1:19] <nhm> Ifur: I heard a while back that there may be some external work done on HDF5 for Ceph wihtout the posix layer. Should be interesting to see if that works out.
[1:19] <saras> weta still getting butt kicked by big data issues
[1:20] <Ifur> nhm: maybe intel will make a library or some profiling tool or compiler or something that can do some magic
[1:20] <saras> lol
[1:21] <Ifur> but still, when it comes to Infiniband, if you need IO and buy 10GbE instead of IB, youre a retard with too much money
[1:22] <saras> lol
[1:22] <nhm> Ifur: was just talking to some folks out at LLNL. Sounds like they've got quite a few people using HDF5 at this point, but still lots of posix too.
[1:22] <saras> that like fiber channel
[1:23] <Ifur> nhm: thats interesting and good news, probably comes from all the pressure to get something that can use these petascale machines they have, hehehehe
[1:24] <Ifur> saras, the same relationship exists with SAS atm, if you go with fibrechannel instead of SAS, youre likely buying a system from IBM -- and those guys like bleeding people for money on storage
[1:25] <saras> Ifur: so true
[1:25] <Ifur> SAS2 basically does what you want, 36 port expanders that work like infiniband, support zoning (think virtual lans) the drives have dual ports and can be connected to two expanders, and the expanders can form a network
[1:25] <Ifur> so why go with fibrechannel?
[1:26] <nhm> Ifur: I really wish expanders performed consistently.
[1:26] <nhm> across brands/models/configurations
[1:27] <Ifur> nhm: if you buy these big systems from LSI they probably do :D
[1:27] <Ifur> aha, well, you dont have an OFED equivalent for SAS AFAIK
[1:27] <nhm> Ifur: Best luck I've had so far is to just stick lots of controllers in a box and directly connect all of the disks.
[1:27] <Ifur> but even with IB there are subtle differences, even tho they interoperate
[1:27] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Read error: Operation timed out)
[1:28] <Ifur> you need to be on a very large scale if connecting two jbods isnt enough :)
[1:29] <saras> mesh networks are the only sane way to scale out the network
[1:29] <saras> not up
[1:31] <Ifur> well, id rather get screwed by paying a premium for something that has been properly tested on large scale by an integrator than doing it myself
[1:32] <saras> Ifur: pay the jester
[1:32] <saras> by who get ifur going
[1:32] * jlogan2 (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[1:34] <Ifur> lol, well, storage gives me anxiety :D
[1:34] <saras> the future of networking is all about mirrors
[1:34] <Ifur> i think cephfs is on the right track with doing stuff in software
[1:34] <dmick> saras: do you have libatomic-ops-dev installed?
[1:35] <Ifur> low level optimizations tend to be the bane of many a parallel and distributed storage
[1:35] <saras> dmick: i think so
[1:36] <dmick> those defs should be coming from /usr/include/atomic_ops.h (and friends)
[1:38] <saras> dmick: i did, should check the version
[1:43] <saras> dmick: i have ver 7.2 of that
[1:43] <cjh_> nhm: total throughput to the disks for the cluster
[1:43] <dmick> I doubt it's version-dependent
[1:43] <saras> dmick: 7.2~alpha5+cvs20101 armhf
[1:43] <dmick> but you want to diagnose why those definitions aren't taking effect
[1:44] <dmick> probably something about autoconf?
[1:44] <cjh_> nhm: i'm thinking of going back to one giant raid6 and then starting 8 osd processes ( 1 for each cpu ) and seeing if that does it
[1:44] <matt_> Ifur, do you know if SDP falls back to regular sockets if it can't connect using SDP?
[1:46] <sagewk> matt_: hey
[1:46] <sagewk> matt_: are you still seeing unbounded mon data growth?
[1:46] <matt_> sagewk, indeed I am
[1:47] <sagewk> on 0.61? with the default mon options?
[1:48] <Ifur> matt_: dont know for sure, but something tells me thats technically very problematic, as how would the receiving end know to start listening on other sockets?
[1:48] <Ifur> matt_: think ZMQ might do something like that, but its a different ball game
[1:48] <matt_> sagewk, Yep
[1:48] <Ifur> if you need SDP, then a fallback is likely not going to work anyways
[1:49] <sagewk> matt_: how big is it now?
[1:49] <saras> dmick: i went down the list one by one installing all the deps, the only one i did not was the google thing
[1:49] <Ifur> (if SDP was only for storage it would make sense, but still would need to be done on the application libsdp.so layer...)
[1:50] <matt_> sage, I compacted it offline this morning from 19G down to 660M. It's at 680M now but all my users are offline so there isn't much load yet
[1:50] <Ifur> i want things to follow the unix philosophy as much as possible... reliable failure is important!
[1:51] <matt_> Ifur, I thought that might be the case. I was just thinking of a test scenario having mixed ceph clients on sdp/tcp
[1:51] <matt_> Ifur, I need to get back in the office and test it out using some spare hardware
[1:52] <sagewk> matt_: i have instructions on http://tracker.ceph.com/issues/4895 for generating a trace with useful debug info. i will push a branch based on cuttlefish with the necessary support shortly
[1:52] <Ifur> matt_: since libsdp.so is LD_PRELOAD and basically screws with everything tcp, youd probably have to initiate it by UDP or some other hack, like telling it to HUP without libsdp.so
[1:53] <dmick> saras: not sure what to say; it's a build problem you're gonna have to dig into. We've never built on Pi, or ARM, to my knowledge
[1:54] <Ifur> i know FraunhoferFS or FhGFS uses all available links, whether IB or GbE... but doubt this works with SDP... likely only for verbs and ipoib
[1:54] <saras> i saw there were older packages so someone has
[1:54] <matt_> Ifur, the docs do mention a conf file where you can restrict the ports and IP's enabled for SDP. It appeared to be a part of OFED however
[1:54] <Ifur> but still, if you rely on IB then you kind of want to stop until its back up.
[1:54] <saras> dmick: so i have ever thing on readme but 2 google thing and dot
[1:55] <Ifur> matt_: yes. libsdp.conf tells which apps and so on should use sdp, but im not an expert on SDP
[1:55] <Ifur> didnt know that you can do this on a port by port basis to be honest
[1:55] <Ifur> (never had a use for SDP, just always wanted to try it)
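For anyone wanting to try what matt_ and Ifur describe, a hypothetical sketch of the LD_PRELOAD approach follows. The libsdp.conf match syntax and library path are assumptions based on OFED conventions, `some-tcp-daemon` and the port range are placeholders, and none of this is verified against a real SDP setup:

```shell
# Hypothetical: route a TCP application over SDP with libsdp's preload shim.
# libsdp.conf rules decide which sockets get converted; syntax assumed
# from OFED docs, not verified here.
cat > /etc/libsdp.conf <<'EOF'
use sdp server * *:6800-6900
use sdp client * *:6800-6900
EOF
LD_PRELOAD=/usr/lib/libsdp.so some-tcp-daemon --listen 6800
```

The appeal is exactly what the conversation says: no application changes, just socket calls transparently redirected onto RDMA.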
[1:56] <dmick> saras: ok
[1:56] <Ifur> so any proper sysadmin tasks involving it are still lacking
[1:56] <matt_> Ifur, haha same here. Only reason I haven't had a play is because I thought it was deprecated
[1:56] <Ifur> its not, and its not going away
[1:56] <saras> just got all the doc stuff on there, going to try it again
[1:57] <matt_> Ifur, I have an SSD storage tier in Ceph with 48 SSD's so far. It would fly if I have native IB support
[1:57] <Ifur> matt_: i mean, just because it gives lower latency, banking, finance and other commercial people will be willing to pay for it... finance people do stock trades on FPGA's because normal computers are too slow... doing one trade in 16 nanoseconds
[1:58] <Ifur> if you are doing million banking transactions per hour, and going from 10ms to 1ms means 9ms of interest per trade, this adds up quick
[1:59] <saras> delete and try from git
[1:59] <matt_> Ifur, I wasn't so worried about its availability in the future. It was more getting it to compile with OFED support. Now that I know there's an ubuntu module it's not really an issue anymore :)
[1:59] <saras> dmick: by the way thanks very much
[1:59] * mjblw (~mbaysek@wsip-174-79-34-244.ph.ph.cox.net) Quit (Quit: Leaving.)
[2:00] * LeaChim (~LeaChim@ Quit (Read error: Connection reset by peer)
[2:00] <Ifur> matt_: aha! well, from a technology point of view i agree its deprecated, but i dont agree that its going away anytime soon, depends on your time scale really, but even so... enterprise support will likely exist and be demanded for years to come, dont worry about it :)
[2:01] <Ifur> but damn, this was interesting, but will just have to buy hardware and test ceph on suse :P
[2:01] <matt_> sagewk, how large are the traces it produces? Just need to plan in case I'm going to fill my mon partition
[2:02] * berant (~blemmenes@24-236-240-247.dhcp.trcy.mi.charter.com) has joined #ceph
[2:02] <nhm> Ifur: yes, we are always looking for volunteer testers. ;)
[2:03] <saras> nhm: i try man
[2:03] <Ifur> nhm: but i shouldnt have a problem getting it up and running on suse, the packages and kernel modules should all work off the bat?
[2:03] <nhm> Ifur: to be honest I've never installed on SuSE. :)
[2:04] <nhm> Ifur: Most of my testing has been on Ubuntu or CentOS (and centos was compiled from scratch)
[2:04] <Ifur> suse tends to be a bit newer than rhel based i believe
[2:05] * JaksLap (~chris@ion.jakdaw.org) Quit (Ping timeout: 480 seconds)
[2:05] * dwt (~dwt@128-107-239-234.cisco.com) Quit (Quit: Leaving)
[2:05] <Ifur> SLES is kernel 3.0 atm, not 2.6 like in RHEL
[2:06] <sagewk> matt_: latest cuttlefish branch will do (still building :)
[2:06] * tkensiski (~tkensiski@ has joined #ceph
[2:06] <sagewk> matt_: trace size is probably on the order of the amount of growth
[2:07] <sagewk> so keep an eye on it... :/
[2:07] <Ifur> matt_: 3.7 for opensuse tbh
[2:07] <saras> Ifur: you work for suse
[2:07] <Ifur> nhm, i meant.
[2:07] <Ifur> saras: no, just recently started working with it, and am positively surprised...
[2:07] <Ifur> saras: OBS alone is quite nifty
[2:08] <Ifur> germans tend to over-engineer things, and i like this
[2:08] <matt_> sagewk, no problems. I should be able to get you a trace today once the load picks up again
[2:08] <sagewk> matt_: thanks, appreciate it!
[2:08] <saras> Ifur: one guy from the gallery team gave a great talk at linux fest north west about picking a js framework
[2:08] <Ifur> saras: http://openbuildservice.org/
[2:08] <sagewk> matt_: if we can get a sample of a small leveldb and a trace that makes it grow, hopefully fixing compaction won't be too difficult
[2:09] <Ifur> saras: lost me there, gallery team picking js framework?
[2:09] <saras> Ifur: once
[2:10] <Ifur> saras: still not following
[2:10] <Ifur> im dense
[2:10] <saras> http://blog.susestudio.com/2013/03/client-side-js-mv-framework-roundup.html
[2:11] <saras> he gave a really good talk about the 2 weeks of his life that were lost doing that
[2:11] <Ifur> saras: yeah, not that into suse stuff yet, but did notice there is an opensuse talk on gentoo-prefix, and i must admit im a long time gentoo fan
[2:12] <saras> by the way gallery will be getting the multi-arch stuff
[2:13] <Ifur> well, gotta get sleep, im basically screwing myself now, later and thanks for the chat!
[2:14] * tkensiski (~tkensiski@ Quit (Ping timeout: 480 seconds)
[2:21] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[2:25] <saras> sagewk: download from ceph org or git
[2:25] <saras> github
[2:26] * alram (~alram@ Quit (Quit: leaving)
[2:28] <dmick> saras: is that a question?
[2:28] <dmick> and, if so, what are you asking about?
[2:29] <saras> dmick: it was
[2:29] <saras> but trying with no git
[2:29] <dmick> what are you trying to download?
[2:29] <saras> yes
[2:30] <dmick> that question can't be answered with yes. What thing are you trying to get?
[2:30] <saras> use github or the version off ceph.com
[2:31] <dmick> the version *of what*
[2:31] <dmick> github has sources. ceph.com has packages. they're not equivalent.
[2:31] <saras> the ceph source code
[2:31] <saras> your tar.gz on the site
[2:31] <dmick> github is where the source code is
[2:32] <dmick> tarballs are always out of date
[2:33] <saras> so then i should use github
[2:36] <dmick> yes, absolutely
[2:37] <dmick> you said before "delete and try from git". Had you been trying to build from some tarball?
[2:38] <saras> yes
[2:38] <dmick> where did you get that?
[2:38] <dmick> ceph.com, I assume, but, where?
[2:38] <saras> http://ceph.com/download/ceph-0.61.tar.gz
[2:39] <dmick> ah, ok, so at least it wasn't hopelessly out of date
[2:39] <dmick> but yeah, I'd still use git
[2:39] <saras> http://ceph.com/resources/downloads/
[2:39] <dmick> (I honestly didn't know we were still publishing tarballs that easily findable)
[2:39] <saras> very easy
[2:40] <dmick> that shouldn't be your problem, in any event, though, but, still better to be set up with git
[2:41] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[2:48] * rustam (~rustam@ has joined #ceph
[2:48] * hflai (~hflai@alumni.cs.nctu.edu.tw) has joined #ceph
[2:48] <saras> dmick: git submodule update --init this is a must right
[2:49] * tkensiski (~tkensiski@2600:1010:b014:ef84:fd7d:3870:f801:e4ae) has joined #ceph
[2:49] <dmick> saras: just as the README says, yes
[2:49] * tkensiski (~tkensiski@2600:1010:b014:ef84:fd7d:3870:f801:e4ae) has left #ceph
[2:49] <saras> ok thanks
[2:51] * rustam (~rustam@ Quit (Remote host closed the connection)
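The from-git build saras is working through (per the README) looks roughly like this; the v0.61 tag is an assumption matching the tarball version discussed elsewhere in this log, and the autotools steps are the standard ones for a checkout of that era:

```shell
# Clone ceph and build from source, including the submodule step the
# README calls out. Tag choice is illustrative.
git clone https://github.com/ceph/ceph.git
cd ceph
git checkout v0.61
git submodule update --init
./autogen.sh
./configure
make
```

Building from git rather than a tarball also makes it easy to pull fixes as they land, which is why dmick steers saras this way.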
[2:56] <jfchevrette> Any ideas why I get this when trying to list an OSD's disks (it's my first time!) http://pastebin.com/vXj38kpC
[2:57] <dmick> jfchevrette: not very useful, is it
[2:58] <jfchevrette> dmick, well I'm a noob. Not really helpful indeed.
[2:58] <dmick> I'm actually working on the error handling as we speak, but, some things to check: passwordless ssh and sudo from 'ceph' to 'osd1' are working OK?
[2:58] <jfchevrette> dmick, #1
[2:59] <dmick> that is, ssh osd1 "sudo id" works ok?
[2:59] <jfchevrette> yep
[3:00] <dmick> what version of ceph does osd1 have, and does it have the 'ceph-disk' tool installed?
[3:00] <jfchevrette> Ah wait. I have to ceph-deploy install the OSDs for this to work, do I ?
[3:00] <jfchevrette> ah :)
[3:01] <dmick> the hosts have to at least have ceph installed, yes
[3:01] <jfchevrette> I was following the quick-ceph-deploy guide but trying to do a 2 mons 2 osds setup instead of a single host
[3:02] <dmick> yep. that'd be the [] in "ceph-deploy install {server-name}[,{server-name}]" :)
[3:02] <dmick> but the command should definitely have told you that instead of puking a backtrace without any info about what was missing
[3:02] <dmick> coming shortly
[3:02] <jfchevrette> Ah. Makes sense.
[3:03] <jfchevrette> I had only installed ceph on the monitor servers as the guide was only referencing monitors up to that point.
[3:03] <jfchevrette> thanks :)
[3:03] <dmick> np
[3:14] * glowell (~glowell@ Quit (Quit: Leaving.)
[3:14] * Tamil (~tamil@ Quit (Quit: Leaving.)
[3:22] <jfchevrette> dmick, apparently I have to create a partition table on my osds disks for ceph-deploy to be able to recognize/use them?
[3:22] <dmick> ceph-deploy should be able to do that for you
[3:23] <saras> dmick: http://paste.ubuntu.com/5646453/ is their any log files that could help
[3:23] <dmick> you specify an osd destination as host:data:journal, where data and journal can be whole disk (in which case you get a whole-disk partition, I think), a partition, or a path
[3:24] <dmick> special case, if you specify data as whole-disk, it'll slice it up into two partitions, one for data and one for journal. At least that is how I understand it
[3:24] <jfchevrette> I did disk zap osd:/dev/sdb. disk list won't list it though. I do see a new GPT partition table on it
[3:24] <jfchevrette> GPT table w/o partitions ...
[3:24] <jfchevrette> hmm ..
[3:25] <dmick> saras: not that I'm aware of. You'll need to examine the code. Note that you can "make V=1" and get more information about each compilation
[3:25] <sage> zap should blow away the gpt partition table...
[3:25] <jfchevrette> ahh.. then 'osd create' will do the rest
[3:25] <sage> if its still there (cat /proc/partitions) after zap, then shoot me in the head. and then smack the sgdisk devs
[3:25] <dmick> if you log into osd, does ceph-disk list show it?
[3:25] <dmick> lol@sage
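dmick's host:data:journal description and the zap/create flow above can be summarised as a short command sequence. The host and device names reuse the ones from this conversation; the exact subcommand forms are approximate for cuttlefish-era ceph-deploy, so check `ceph-deploy --help` on your version:

```shell
# Approximate cuttlefish-era ceph-deploy OSD workflow, as discussed above.
ceph-deploy install osd1              # the step jfchevrette had skipped: ceph must be on the host
ceph-deploy disk zap osd1:/dev/sdb    # wipe the old partition table
ceph-deploy osd create osd1:/dev/sdb  # whole disk: split into data + journal partitions
```

Giving just the disk (no :journal part) is the special case dmick mentions, where ceph-deploy slices the device into data and journal partitions itself.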
[3:26] <dmick> saras: it's likely this is not a problem with prerequisites; it's likely there's some incompatibility with the way autoconf set options and the atomic libs on your platform
[3:27] <dmick> but this is basic porting stuff: compiles break, and you have to diagnose why
[3:27] <saras> yeap
[3:27] <sage> saras: this is libatomic-ops not being fully supported for your platform
[3:27] <sage> you can build --without-libatomicops (or something like that)
[3:28] <dmick> sage: oh? someone would provide any implementation that doesn't include fetch_and_add1?
[3:28] <sage> ./configure --without-libatomic-ops
[3:28] <sage> probably has a slightly different name or something
[3:28] <dmick> O_o
[3:28] <sage> i seem to recall some grief from the debian maintainers about arm builds way back when
[3:28] * rustam (~rustam@ has joined #ceph
[3:29] <sage> in any case, turning off the lib will avoid it
[3:29] <saras> AO_fetch_ stuff
[3:29] <sage> and someday we should make the non-lib version use the gcc primitives instead of a spinlock :)
[3:29] * Oliver1 (~oliver1@ip-178-201-147-182.unitymediagroup.de) has joined #ceph
[3:29] * Oliver1 (~oliver1@ip-178-201-147-182.unitymediagroup.de) Quit ()
[3:29] <saras> sage: you know what the AO_fetch_ is about
[3:31] <dmick> saras: they're just routines in libatomic_ops. AO_fetch_and_add1 does an atomic increment
[3:31] <dmick> http://www.hpl.hp.com/research/linux/atomic_ops/README.txt
[3:37] <saras> dmick: why would that break the build
[3:38] <dmick> it's not available in your version of libatomic_ops
[3:38] <dmick> it is on x86
[3:38] <dmick> but not on arm
[3:38] <dmick> apparently
[3:38] <dmick> hence, do as sage suggested: just avoid the lib
[3:40] <dmick> maybe you have the wrong version of libatomic_ops, as I see this appears to implement it: https://github.com/ivmai/libatomic_ops/blob/master/src/atomic_ops/sysdeps/gcc/arm.h (but there are lots of different ARM variants, and maybe it's conditionally compiled)
[3:41] * jfchevrette (~jfchevret@modemcable208.144-177-173.mc.videotron.ca) Quit (Quit: Leaving)
[3:42] <saras> dmick: hum
[3:42] <dmick> then again that's 7.4
[3:43] <dmick> so you can try to sort out the right versions, or just configure without it
[3:43] <saras> i am trying without it
[3:44] <Fetch> anyone from the project know what the timeframe on making CentOS/RHEL qemu builds with rbd support is? I'm torn between doing it myself and waiting, if it's like "yeah we've got someone on it, should be done tomorrow"
[3:44] <saras> sage: how big a deal is the atomic-ops stuff
[3:45] * portante|ltp (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[3:46] * Cube (~Cube@ Quit (Ping timeout: 480 seconds)
[4:01] <saras> winning, i think we got it
[4:01] <joshd> Fetch: a repo should be ready this week or next, but in the mean time you can use https://objects.dreamhost.com/rpms/qemu/qemu-kvm-
[4:03] <joshd> also https://objects.dreamhost.com/rpms/qemu/qemu-img- and https://objects.dreamhost.com/rpms/qemu/qemu-kvm-tools-
[4:03] <Fetch> huh, 0.12 can built with rbd support? did that take a lot of patching?
[4:04] <joshd> not too much, since red hat already backported thousands of things
[4:09] <saras> how long has red hat shipped ceph
[4:10] <joshd> rhel doesn't include it yet, it's just in EPEL so far
[4:10] <saras> joshd: kool
[4:10] <Fetch> I hope they do soon, but to be honest Ceph has a crazy velocity to try to track for centos/rhel
[4:11] <Fetch> I couldn't blame them if they didn't
[4:11] <saras> it'll slow down some day
[4:11] <saras> maybe
[4:11] <joshd> true, but having even an older stable version included would make trying it out much easier
[4:12] <saras> well, it should, otherwise it will have crazy scope creep
[4:12] <saras> ceph db does not sound bad
[4:15] <joshd> rados is a nice building block for lots of things, but they don't all have to be part of the ceph project
[4:16] <saras> i've got to learn a lot about what the parts of ceph are
[4:33] * nhm (~nhm@65-128-150-185.mpls.qwest.net) Quit (Quit: Lost terminal)
[4:40] * b1tbkt (~Peekaboo@24-216-67-250.dhcp.stls.mo.charter.com) has joined #ceph
[4:45] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has joined #ceph
[4:46] * tkensiski (~tkensiski@c-98-234-160-131.hsd1.ca.comcast.net) has left #ceph
[5:10] * rustam (~rustam@ Quit (Remote host closed the connection)
[5:28] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:39] * JohansGlock (~quassel@kantoor.transip.nl) has joined #ceph
[5:44] * via (~via@smtp2.matthewvia.info) Quit (Ping timeout: 480 seconds)
[5:46] * JohansGlock_ (~quassel@kantoor.transip.nl) Quit (Ping timeout: 480 seconds)
[5:47] * noahmehl (~noahmehl@wsip-98-173-51-204.sd.sd.cox.net) has joined #ceph
[5:50] * mohits (~mohit@ Quit (Ping timeout: 480 seconds)
[5:55] * mohits (~mohit@zccy01cs105.houston.hp.com) has joined #ceph
[5:58] <saras> note: if anyone compiles on a Pi, set up more swap
[6:25] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[6:37] * via (~via@smtp2.matthewvia.info) has joined #ceph
[6:38] * brian_appscale_ (~brian@wsip-72-215-161-77.sb.sd.cox.net) has joined #ceph
[6:42] * noahmehl (~noahmehl@wsip-98-173-51-204.sd.sd.cox.net) Quit (Quit: noahmehl)
[6:43] * brian_appscale (~brian@wsip-72-215-161-77.sb.sd.cox.net) Quit (Ping timeout: 480 seconds)
[6:43] * brian_appscale_ is now known as brian_appscale
[7:52] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[7:53] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit ()
[8:00] * Oliver1 (~oliver1@ip-178-201-147-182.unitymediagroup.de) has joined #ceph
[8:00] * Oliver1 (~oliver1@ip-178-201-147-182.unitymediagroup.de) Quit ()
[8:04] * tnt (~tnt@ has joined #ceph
[8:09] * bergerx_ (~bekir@ has joined #ceph
[8:12] * dpippenger (~riven@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[8:18] * coyo (~unf@00017955.user.oftc.net) Quit (Quit: F*ck you, I'm a daemon.)
[8:22] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:38] * Vjarjadian (~IceChat77@ Quit (Quit: It's a dud! It's a dud! It's a du...)
[8:54] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[9:22] * The_Bishop (~bishop@2001:470:50b6:0:40e7:824f:8945:bf7f) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[9:30] * tnt (~tnt@ Quit (Ping timeout: 480 seconds)
[9:33] * LeaChim (~LeaChim@ has joined #ceph
[9:39] * ScOut3R (~ScOut3R@ has joined #ceph
[9:43] * Muhlemmer (~kvirc@cable-88-137.zeelandnet.nl) has joined #ceph
[9:48] * BManojlovic (~steki@ has joined #ceph
[10:02] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[10:10] * glowell (~glowell@c-98-210-224-250.hsd1.ca.comcast.net) has joined #ceph
[10:17] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[10:26] <saras> well, now my Pi has 16 gigs of swap
[10:27] <saras> aka my thumb drive
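[editor's note: saras's thumb-drive swap can be reproduced with a swap file; the path and 64 MiB size below are illustrative (a real ceph build wants several GiB), and the root-only activation steps are left commented:]

```shell
#!/bin/sh
# Create a swap file, as for a low-RAM build box like the Pi.
# /tmp/demo.swap and the 64 MiB size are illustrative.
SWAPFILE=/tmp/demo.swap
dd if=/dev/zero of="$SWAPFILE" bs=1M count=64 2>/dev/null
chmod 600 "$SWAPFILE"
# formatting and activation are typically root-only:
#   mkswap "$SWAPFILE"
#   swapon "$SWAPFILE"
#   echo "$SWAPFILE none swap sw 0 0" >> /etc/fstab   # persist across reboots
ls -l "$SWAPFILE"
```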
[10:29] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[10:32] * NyanDog (~q@ Quit (Ping timeout: 480 seconds)
[10:34] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[10:35] * fabioFVZ (~fabiofvz@ has joined #ceph
[10:46] * NyanDog (~q@ has joined #ceph
[10:48] <saras> winning, it got past Server.o
[11:10] <mattch> Are there any plans to make ceph-deploy read cluster provisioning info from ceph.conf in the way that mkcephfs does? So you can run 'ceph-deploy osd create' without further args and it will create osds as needed according to the conf args?
[11:11] <saras> at this point i have not thought that far
[11:13] <saras> mattch: why will ram be a huge issue with it
[11:14] <mattch> saras: Not sure I follow - where does ram come in?
[11:15] <saras> mattch: sorry, you were not talking to me
[11:15] <mattch> saras: :)
[11:17] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[11:17] <saras> mattch: https://plus.google.com/u/0/106479011389609622954/posts/JfeSzrZuq6K this what i have been fighting
[11:19] <mattch> saras: ceph for pi? brave :)
[11:20] <saras> nuts is more like it
[11:26] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) has joined #ceph
[11:37] <Ifur> would be nice if people throttled rejoins a little bit
[11:45] * fghaas (~florian@91-119-65-118.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[11:52] * Ifur (~osm@hornbill.csc.warwick.ac.uk) Quit (Read error: Connection reset by peer)
[11:54] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[11:59] * leseb (~Adium@AMarseille-651-1-203-27.w92-153.abo.wanadoo.fr) has joined #ceph
[12:11] * leseb (~Adium@AMarseille-651-1-203-27.w92-153.abo.wanadoo.fr) Quit (Quit: Leaving.)
[12:18] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[12:18] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[12:19] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[12:19] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[12:28] * kylem (~kyle@ Quit (Read error: Connection reset by peer)
[12:28] * kylem (~kyle@ has joined #ceph
[12:34] * allsystemsarego (~allsystem@ has joined #ceph
[12:35] * berant (~blemmenes@24-236-240-247.dhcp.trcy.mi.charter.com) Quit (Quit: berant)
[12:36] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Remote host closed the connection)
[12:36] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[12:43] * madkiss (~madkiss@2001:6f8:12c3:f00f:d413:cd8:6fc1:6e62) Quit (Ping timeout: 480 seconds)
[12:52] * BillK (~BillK@58-7-104-61.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[12:57] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:57] * KindTwo (KindOne@h6.178.130.174.dynamic.ip.windstream.net) has joined #ceph
[12:57] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[12:57] * KindTwo is now known as KindOne
[12:58] * ScOut3R (~ScOut3R@ Quit (Remote host closed the connection)
[13:00] * ScOut3R (~ScOut3R@ has joined #ceph
[13:01] * BillK (~BillK@58-7-220-225.dyn.iinet.net.au) has joined #ceph
[13:12] * leseb (~Adium@AMarseille-651-1-203-27.w92-153.abo.wanadoo.fr) has joined #ceph
[13:17] * fghaas (~florian@91-119-122-29.dynamic.xdsl-line.inode.at) has joined #ceph
[13:25] * Rocky (~r.nap@ Quit (Quit: **Poof**)
[13:27] * fghaas (~florian@91-119-122-29.dynamic.xdsl-line.inode.at) Quit (Ping timeout: 480 seconds)
[13:33] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[13:34] * markbby (~Adium@ has joined #ceph
[13:42] * leseb (~Adium@AMarseille-651-1-203-27.w92-153.abo.wanadoo.fr) Quit (Quit: Leaving.)
[13:50] * markbby1 (~Adium@ has joined #ceph
[13:50] * markbby (~Adium@ Quit (Remote host closed the connection)
[14:02] * berant (~blemmenes@sslvpn.ussignalcom.com) has joined #ceph
[14:14] <Fetch> joshd: as you would expect, those qemu rpms with rbd support fixed my issue
[14:18] * madkiss (~madkiss@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[14:18] * madkiss (~madkiss@91-119-68-174.dynamic.xdsl-line.inode.at) Quit ()
[14:22] * mip (~chatzilla@cpc15-thor5-2-0-cust102.14-2.cable.virginmedia.com) has joined #ceph
[14:47] * madkiss (~madkiss@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[14:48] * mohits (~mohit@zccy01cs105.houston.hp.com) Quit (Ping timeout: 480 seconds)
[14:57] * berant_ (~blemmenes@sslvpn.ussignalcom.com) has joined #ceph
[15:00] * mohits (~mohit@ has joined #ceph
[15:02] * berant (~blemmenes@sslvpn.ussignalcom.com) Quit (Ping timeout: 480 seconds)
[15:02] * berant_ is now known as berant
[15:03] * Vjarjadian (~IceChat77@ has joined #ceph
[15:18] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[15:32] * portante|ltp (~user@c-24-63-226-65.hsd1.ma.comcast.net) Quit (Ping timeout: 480 seconds)
[15:35] <Vjarjadian> the google hangouts work nicely, but the first one was a bit long.
[15:37] * tnt (~tnt@ has joined #ceph
[15:48] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[15:48] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has left #ceph
[15:55] * noahmehl (~noahmehl@wsip-98-173-51-204.sd.sd.cox.net) has joined #ceph
[15:56] * markbby1 (~Adium@ Quit (Quit: Leaving.)
[15:56] * yehuda_ (~yehuda@2602:306:330b:1410:8595:8c8e:abfc:6738) Quit (Ping timeout: 480 seconds)
[15:57] * DarkAce-Z (~BillyMays@ has joined #ceph
[15:58] * yehuda_ (~yehuda@2602:306:330b:1410:57:9565:bd48:71cb) has joined #ceph
[16:00] * DarkAceZ (~BillyMays@ Quit (Ping timeout: 480 seconds)
[16:03] * markbby (~Adium@ has joined #ceph
[16:12] <saras> http://paste.ubuntu.com/5647982/ spotted this message, might be useful
[16:12] <saras> ps the build is still going sweet
[16:17] * mega_au (~chatzilla@ has joined #ceph
[16:24] * drokita (~drokita@ has joined #ceph
[16:27] <scuttlemonkey> summaries and videos from the Summit are up: http://ceph.com/events/ceph-developer-summit-summary-and-session-videos/
[16:31] * mega_au (~chatzilla@ Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[16:31] <saras> scuttlemonkey: thanks
[16:32] * noahmehl (~noahmehl@wsip-98-173-51-204.sd.sd.cox.net) Quit (Quit: noahmehl)
[16:37] * eschnou (~eschnou@54.120-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[16:44] * jskinner (~jskinner@ has joined #ceph
[16:46] * madkiss1 (~madkiss@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[16:46] * madkiss (~madkiss@91-119-68-174.dynamic.xdsl-line.inode.at) Quit (Read error: Connection reset by peer)
[16:47] * stxShadow (~jens@p4FD06914.dip0.t-ipconnect.de) has joined #ceph
[16:51] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[16:51] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) Quit ()
[16:53] <Azrael> nyerup: http://pastebin.com/Bt3yWvfV
[16:53] <Azrael> nyerup: and also /var/log/ceph/ceph*osd*26.log or something
[16:54] <Azrael> nyerup: iirc thats on node data3
[16:57] <absynth> hello there
[16:57] * markbby (~Adium@ Quit (Quit: Leaving.)
[16:57] <absynth> anyone from inktank awake?
[17:00] * eschnou (~eschnou@54.120-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[17:03] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[17:03] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has left #ceph
[17:08] * markbby (~Adium@ has joined #ceph
[17:10] <scuttlemonkey> hey absynth
[17:12] <absynth> hey there
[17:12] <absynth> ceph update today :) making best use of the holiday
[17:17] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) has joined #ceph
[17:17] <scuttlemonkey> hehe nice
[17:19] <scuttlemonkey> saras: you were the saltstack fan, right?
[17:26] * BMDan (~BMDan@ has joined #ceph
[17:27] <absynth> gnah, that starts of nicely
[17:28] <BMDan> I swear I took my meds this morning, but for the life of me, I cannot find an instance of radosgw on my cluster. Yet, to be clear, my little OpenStack instances are humming along merrily. Did radosgw get made vestigial at some point? Or am I missing it somewhere?
[17:28] <mrjack> is it safe to upgrade from 0.56.6 to 0.61? are there performance improvements?
[17:29] <BMDan> mrjack: There's a difference in mon protocols that create an upgrade quorum cliff, but notwithstanding that, I had a clean experience with it. And yes, there are performance improvements.
[17:29] <scuttlemonkey> BMDan: radosgw isn't installed as a part of the default packages
[17:29] <jmlowe> mrjack: performance improvements: yes safe: relatively
[17:29] <scuttlemonkey> it's just the s3/swift API for RESTful object storage
[17:29] <scuttlemonkey> the openstack instance stuff uses RBD
[17:29] <jmlowe> mrjack: I had about 2 minutes of lost quorum when I upgraded
[17:30] <BMDan> scuttlemonkey: Yes, I installed it manually thinking it was required in my 0.56 cluster. But I never really *did* anything with it. So perhaps it was never running? If not, at the risk of asking something dumb—should it be running?
[17:30] * alram (~alram@ has joined #ceph
[17:30] <scuttlemonkey> BMDan: only if you want an s3/swift API for object storage stuff
[17:30] <absynth> hrm
[17:30] <jmlowe> mrjack: my only caveat, do not use the pg split feature until 0.61.1
[17:30] <BMDan> Okay, so if I don't use Swift—and I don't—I can just leave well enough alone? That would be a significant load off. Okay, cool. Thanks. :)
[17:30] <absynth> our 0.56.6 freshly restarted osds use a massive amount of CPU
[17:30] <mrjack> ok
[17:30] <mrjack> mh
[17:31] <scuttlemonkey> BMDan: yar :)
[17:31] <mrjack> i have seen errors with 0.56.6 where one osd marks out all other osds as down
[17:31] <BMDan> absynth: User space, kernel, or kernel-IO?
[17:31] <mrjack> is that fixed in 0.61?
[17:32] <mrjack> absynth: also observed this, ceph-osd took several minutes before osd was added to the osdmap again
[17:33] <scuttlemonkey> absynth: had a few people in here upgrading yesterday...didn't hear anyone mention cpu spikes for .61
[17:33] * gmason (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[17:34] <scuttlemonkey> but that's hardly an authoritative answer
[17:34] <absynth> we are upgrading TO 0.56.6
[17:34] <absynth> not to .61
[17:34] <sagewk> this is the bobtail point release
[17:34] <scuttlemonkey> absynth: oh, sry...saw mrjack's "is this fixed in 0.61" and parsed that as yours :P
[17:35] <mrjack> :)
[17:35] <mrjack> i am on 0.56.6
[17:36] <mrjack> i did the upgrade two days ago iirc... but i see sometimes errors where only one osd is left and stating that all other osds are down
[17:36] <mrjack> all qemu guests using rbd crash when they are connected to the ceph-mon only seeing one osd
[17:40] <mrjack> scuttlemonkey: but that's why i asked if it is safe to upgrade to 0.61
[17:40] <mrjack> scuttlemonkey: i hope my problems with my monitors go away
[17:40] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has joined #ceph
[17:40] * fghaas (~florian@91-119-68-174.dynamic.xdsl-line.inode.at) has left #ceph
[17:41] <scuttlemonkey> I heard someone mentioning a similar issue of all osds getting marked down just before I left last night
[17:41] <BMDan> mrjack: Do all monitors agree on the map versions if queried individually?
[17:41] <scuttlemonkey> but I didn't see context or resolution
[17:41] <BMDan> mrjack: Also, as a workaround, you can set nodown/noout.
[17:43] * jskinner_ (~jskinner@ has joined #ceph
[17:43] * jskinner (~jskinner@ Quit (Read error: Connection reset by peer)
[17:44] <mrjack> it looks like there is no quorum, i saw this happen when there are elections running.. then i see in the logs osd.X wrongly marked me down a few times... maybe my servers were too overloaded.. i only see this when there is load... (e.g. backup jobs or benchmarks within qemu guests using rbd..)
[17:45] <scuttlemonkey> looks like the discussion last night was paravoid:
[17:45] <scuttlemonkey> http://irclogs.ceph.widodh.nl/index.php?date=2013-05-08
[17:45] <scuttlemonkey> around 20:28
[17:45] <scuttlemonkey> his case was a memory leak causing swaths of osd to get marked down it seems
[17:48] <mrjack> hmm
[17:48] <scuttlemonkey> doesn't appear to be any real resolution for you though :(
[17:52] <mrjack> i have 5 mons, mon.0 won election and got quorum, then for unknown reason, new election starts... clients connected to mon.0 now somehow see for short period of time: 1/$numoftotalosds are down.. all pg stuck degraded.. after election completes, the osds which marked all other osds down then does recovery (not long, maybe 1 minute or two) and everything is working again, except clients connected to mon.0 (rbd qemu guests) got kernel panic...
[17:53] <sagewk> kernel panic in the guest? what was the dump?
[17:54] * madkiss1 (~madkiss@91-119-68-174.dynamic.xdsl-line.inode.at) Quit (Quit: Leaving.)
[17:55] <BMDan> mrjack: I can get the "wrongly marked me down" message pretty easily by running heavy benchmarks, so that makes sense. There are tunables related to this—number of missed beacons, etc. You may need to adjust those.
[17:57] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[17:57] <mrjack> sagewk: wait..
[17:58] <mrjack> sagewk: http://office.smart-weblications.net/panic_screenshot.png
[17:59] <mrjack> BMDan: where can i adjust that values? where can i see what is currently set?
[18:01] * ScOut3R (~ScOut3R@ Quit (Ping timeout: 480 seconds)
[18:04] <mrjack> BMDan: could yo point me to some info about those tunables?
[18:05] <BMDan> http://ceph.com/docs/master/rados/configuration/mon-osd-interaction/
[18:08] * fabioFVZ (~fabiofvz@ Quit (Remote host closed the connection)
[18:12] * mohits (~mohit@ Quit (Ping timeout: 480 seconds)
[18:16] <mrjack> thank you
[18:20] <Kioob> Hi
[18:20] <Kioob> is there a way from a file "rb.0.1b0e54.238e1f29.000000000129__277c_F49FCE63__3" to find the corresponding RBD image ?
[18:20] <Kioob> here I see the prefix
[18:21] <Kioob> I can get the prefix of each RBD image and compare, but is there another way ?
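[editor's note: one way to answer Kioob's question is exactly that comparison, scripted: every image's object prefix is reported by `rbd info` as block_name_prefix, so scan the pool for a match. The pool name "rbd" is an assumption, and the loop is guarded so it exits cleanly without a cluster:]

```shell
#!/bin/sh
# Map an OSD object name like rb.0.1b0e54.238e1f29.000000000129 back to
# its RBD image by matching the prefix against each image's
# block_name_prefix. Pool "rbd" is illustrative.
PREFIX="rb.0.1b0e54.238e1f29"
POOL="rbd"
if command -v rbd >/dev/null 2>&1; then
    for img in $(rbd ls "$POOL" 2>/dev/null); do
        if rbd info "$POOL/$img" 2>/dev/null | grep -q "block_name_prefix: $PREFIX"; then
            echo "object belongs to image: $img"
        fi
    done
fi
echo "scan complete"
```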
[18:22] <mrjack> BMDan: hm should i adjust osd min down reporters to a higher number.. maybe 3 if i have 7 osds?
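[editor's note: from the mon/osd interaction page BMDan linked above, the reporter- and heartbeat-related settings look roughly like this in ceph.conf. A sketch only; the values are illustrative, not recommendations:]

```ini
; sketch -- values are illustrative, see the linked docs for defaults
[mon]
mon osd min down reporters = 3   ; OSDs that must report a peer down before the mon marks it down
mon osd min down reports = 3     ; distinct down reports required

[osd]
osd heartbeat interval = 6       ; seconds between peer heartbeats
osd heartbeat grace = 30         ; missed-heartbeat window before reporting a peer down
```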
[18:22] * Tamil (~tamil@ has joined #ceph
[18:30] * rturk-away is now known as rturk
[18:38] * stxShadow (~jens@p4FD06914.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[18:41] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:43] * gmason (~gmason@hpcc-fw.net.msu.edu) Quit (Ping timeout: 480 seconds)
[18:59] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:03] <sagewk> pushing out v0.61.1 now
[19:05] <Kioob> great !
[19:05] <Kioob> I was waiting for the .1 :p (I dislike .0 release)
[19:06] <tnt> what's changed ?
[19:08] <cjh_> woot, .1 is when it's time to roll it out :D
[19:10] <tnt> first thing on my schedule for monday :p
[19:10] * yehudasa (~yehudasa@2607:f298:a:607:c018:c2d6:a679:684d) Quit (Ping timeout: 480 seconds)
[19:11] * Vjarjadian (~IceChat77@ Quit (Quit: Take my advice. I don't use it anyway)
[19:11] <scuttlemonkey> tnt: http://ceph.com/releases/v0-61-1-released/
[19:11] <tnt> scuttlemonkey: thanks. it wasn't there yet when I looked :p
[19:12] <pioto> excellent... the CEPH_MDS_OP_* constants seem to already have "write op" encoded in them... (when they're & 0x001000)... that'll make my job easier-er
[19:12] * sjusthm (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[19:12] <cjh_> i ran into a little issue where i have one monitor that keeps complaining about ulimits and won't start.
[19:12] <cjh_> on 61.0
[19:15] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has joined #ceph
[19:18] * rustam (~rustam@ has joined #ceph
[19:18] * yehudasa (~yehudasa@2607:f298:a:607:b1ff:b5ec:8c2f:bffd) has joined #ceph
[19:21] * BillK (~BillK@58-7-220-225.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[19:21] * nhm (~nhm@65-128-150-185.mpls.qwest.net) has joined #ceph
[19:27] * tziOm (~bjornar@ti0099a340-dhcp0870.bb.online.no) has joined #ceph
[19:33] <nlopes> is there anyway I can restart a conversion for the monitor store? upgrade to 0.61 gone bad :D
[19:36] * rustam (~rustam@ Quit (Remote host closed the connection)
[19:38] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[19:40] <cjh_> it seems like from testing with a jbod setup that ceph seems to only be able to pin about 1/4th of the drives in my 12 drive host. Has anyone else seen this behavior? It'll pin 4 drives while the others sit almost idle.
[19:41] <nhm> cjh_: often that means that all of the outstanding operations are backed up on a couple of OSDs.
[19:41] <cjh_> ah, interesting
[19:41] <cjh_> i wonder how i can dig into that further to see what is going on
[19:42] <nhm> cjh_: if you poll the admin sockets for each OSD and look at how many outstanding ops there are, you can see which ones are backing up.
[19:42] <cjh_> do you think the hardware is backed up or that the software is waiting for the osd's to return?
[19:42] <cjh_> interesting
[19:42] <nhm> If it's always the same ones, it might be a bad disk. If it changes, it might be something else.
[19:42] <cjh_> it changes. i have a dstat 'picture' with colors illustrating it
[19:42] <sagewk> cjh_: also check crush weights; 'ceph osd tree'
[19:43] <cjh_> i made sure my pool was a power of 2 also. i heard that could mess things up if it wasn't
[19:43] <berant> speaking of the admin socket, is there anything that documents what the individual metrics report (what the significance of the metric is, not the schema)
[19:43] <cjh_> sagewk: looks like all osd's are weighted at 1
[19:44] <nhm> cjh_: I feel like I'm a broken record with how much I keep complaining about expanders, but I still suspect that ceph's workload makes certain brands of expanders improperly distribute data over the available SAS links.
[19:44] <cjh_> nhm: yeah you're probably right
[19:44] <cjh_> i believe i'm on an LSI but i need to confirm
[19:44] <nhm> No real evidence to make that claim other than that I've seen a lot more machines without expanders perform well than machines with expanders.
[19:45] <cjh_> that's interesting
[19:45] * Dark-Ace-Z (~BillyMays@ has joined #ceph
[19:45] <cjh_> i wonder why it's doing that
[19:45] <cjh_> nhm: yeah i'm on an LSI megaraid 2108
[19:45] <nhm> cjh_: Yeah, that's like a 9260 or H700 controller. The interesting bit is the expanders though. Any idea what brand?
[19:46] <cjh_> i'm not sure
[19:46] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[19:46] <cjh_> i think it's a dell perc 700
[19:47] <nhm> cjh_: what systems are you using again?
[19:47] <cjh_> they're dell branded but i don't remember what version they are
[19:47] <nhm> ok
[19:48] * DarkAce-Z (~BillyMays@ Quit (Ping timeout: 480 seconds)
[19:48] <cjh_> i think you guys have a lot of experience with dell right?
[19:49] <nhm> cjh_: We've got some R515s and some 1U dells in-house. Working on testing some of the C-series gear too, but don't have any ourselves.
[19:50] <cjh_> gotcha
[19:50] <cjh_> i wish pastebin had color support. this dstat output makes it super easy to see what is going on
[19:50] <cjh_> i'll have to do some netstat commands to see which osd admin ports are backing up
[19:52] <cjh_> nhm: do you know what port range the admin uses?
[19:52] <cjh_> or how i can poll the admin sockets
[19:53] <via> i just updated to cuttlefish and get when i restart the mons: failed: 'ulimit -n 131072; /usr/bin/ceph-mon -i 1 --pid-file /var/run/ceph/mon.1.pid -c /etc/ceph/ceph.conf '
[19:53] * gmason (~gmason@hpcc-fw.net.msu.edu) has joined #ceph
[19:53] <cjh_> via: i get the same thing but it says 8192 instead of 131072
[19:53] <cjh_> it's only mon.b that does that. a and c are fine for some reason
[19:54] <via> none of my mons start
[19:54] <cjh_> ah, well that's a little worse haha
[19:55] <cjh_> nhm: i found it on sebastien's website: http://www.sebastien-han.fr/blog/2012/08/14/ceph-admin-socket/
[19:58] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[19:58] <via> https://pastee.org/twdtn
[19:58] <via> i was hoping there was a simple fix to this
[19:59] <cjh_> via: if you find out let me know. I need to fix this also
[20:00] <nhm> cjh_: yeah, that should work
[20:01] <nhm> cjh_: I just make a little for loop in bash to walk through all of the sockets on the server and then run them across all of the OSD nodes with pdsh.
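[editor's note: nhm's per-host loop might look something like the following; the socket glob and the dump_ops_in_flight command follow the admin-socket docs of this era, but treat both as assumptions. It exits quietly on a host with no OSDs, and can be fanned out with pdsh:]

```shell
#!/bin/sh
# Dump in-flight ops from every OSD admin socket on this host; run it
# across the cluster with pdsh. Socket path and command are assumptions.
for sock in /var/run/ceph/ceph-osd.*.asok; do
    [ -e "$sock" ] || continue       # unmatched glob: no OSDs on this host
    echo "== $sock =="
    ceph --admin-daemon "$sock" dump_ops_in_flight
done
echo "ops scan done"
```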
[20:01] <cjh_> nhm: now i just need to get it to query all 240 osd's at once haha
[20:01] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:01] <cjh_> yeah pdsh sounds good
[20:01] <via> is it a bad idea to downgrade back to bobtail?
[20:02] <cjh_> via: are you ubuntu or centos?
[20:02] <cjh_> i'm on centos
[20:02] <via> scientific, but still el6
[20:02] <cjh_> ok maybe we have a common theme here then
[20:03] <BMDan> mrjack: If you're still here (and still care), the answer is, I dunno. I've never had to adjust them; I just know that they're there to be adjusted. I literally haven't read the descriptions.
[20:04] <BMDan> cjh_: I'm using an LSI2808 in a SuperMicro box to drive twelve spinners and two SSDs, and they move right along, I can assure you.
[20:04] <cjh_> BMDan: can it pin all the drives at once?
[20:07] <joao> via, that happens when you have a lingering 'store.db/' on mon data
[20:08] <joao> and that may happen when the store conversion was for some reason interrupted, either due to a monitor crash or via user intervention
[20:08] <joao> try 'mv store.db store.db.old' and rerunning the monitor
[20:08] <joao> if something weird happens let us know
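joao's fix spelled out as an inert dry-run (the `run` helper just echoes; drop the echo to execute). The mon data path and id are assumptions matching via's `-i 1` from the error above; adjust both for your layout.

```shell
mon_data=/var/lib/ceph/mon/ceph-1    # assumed default layout for mon.1
run() { echo "$@"; }                 # dry-run helper: prints, doesn't execute
run service ceph stop mon.1
run mv "$mon_data/store.db" "$mon_data/store.db.old"   # move aside, don't delete
run service ceph start mon.1                           # conversion re-runs here
```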
[20:09] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[20:10] <BMDan> cjh: rados bench can; OpenStack can't, but that's a deficiency somewhere in the libvirt stack, I assume.
[20:10] <cjh_> joao: i have the same prob. so should i just blow away that store.db/ dir?
[20:10] <via> janos: okay, i'll try
[20:11] <cjh_> BMDan: mind sharing your ceph.conf? I'm curious if there's any major differences from mine
[20:11] <BMDan> cjh_: You'll be powerfully disappointed, but sure. ;)
[20:11] <cjh_> defaults?
[20:11] <cjh_> lol
[20:12] <joao> cjh_, try simply moving it out of the way
[20:12] <BMDan> cjh_: http://pastebin.ca/2377050
[20:12] <joao> don't blow it away until you know this does indeed solve the issue
[20:13] <Kioob> mmm I can't build ceph 0.61.1 on Squeeze
[20:13] <via> janos: https://pastee.org/ycf4
[20:13] <cjh_> BMDan: ok yes i'm disappointed lol.
[20:13] <cjh_> joao: will do. :)
[20:14] * kyle_ (~kyle@ has joined #ceph
[20:14] <Kioob> when I try to compile for Squeeze, I obtain : http://pastebin.com/8Q4kNVHz
[20:14] <cjh_> joao: that did it! You're awesome
[20:14] <joao> If I were that awesome I probably would have added that info to the error message :p
[20:15] <joao> cjh_, glad it worked
[20:15] <joao> :)
[20:15] <Kioob> I suppose the problem comes from:
[20:15] <Kioob> pbuilder-satisfydepends-dummy: Depends: libleveldb-dev which is a virtual package.
[20:15] <Kioob> Depends: libsnappy-dev which is a virtual package.
[20:15] <BMDan> cjh_: Couple things. One, do you have an unusually small number of PGs? Two, how are you testing, and is there a way for you to test that's closer to the hardware? Specifically, with question two, I'm wondering where the issue appears—can you run multiple ceph osd benches and see if that manifests the problem in iostat, for example?
[20:15] <Kioob> ok... I have to backport libleveldb then
[20:16] <joshd> Fetch: cool, let me know if you find any issues with them
[20:16] <nhm> BMDan: Do you have expanders in your nodes?
[20:17] <BMDan> The LSI card thinks so (enclosure 252 vs. enc 0), but the reality is that I do not.
[20:17] <cjh_> BMDan: i'm testing with 1 rados bench cmd on each host. It's all within 1 rack. I have 240 osd's and 8192 set for my PG's. I thought that was a sane number
[20:18] <via> joao: i don't suppose that log dump helps or that there's anything else i can do
[20:18] <nhm> cjh_: With 240 osds you may have a slightly unbalanced distribution with only 8k PGs.
[20:19] <cjh_> nhm: would you suggest i bump it up? I'm using replica=3
[20:20] <nhm> cjh_: you could try 32k
[20:20] <cjh_> wow that's a big jump :)
[20:20] <cjh_> i'll see what that does
[20:21] <nhm> cjh_: probably won't make a really big difference.
[20:21] * kylem (~kyle@ Quit (Ping timeout: 480 seconds)
[20:22] <cjh_> joao: so my mon came back up when i moved that store.db directory but it won't join the cluster for some reason. ceph -s says it's still down after waiting a few mins
[20:23] * rturk is now known as rturk-away
[20:24] <mrjack> can i upgrade from 0.56.6 to 0.61.1 on squeeze or do i have to update to wheezy first?
[20:29] * rustam (~rustam@ has joined #ceph
[20:30] <joao> via, did the conversion fail again?
[20:31] * rustam (~rustam@ Quit (Remote host closed the connection)
[20:31] <joao> cjh_, can you pastebin your log?
[20:32] <cjh_> joao: sure one sec
[20:34] <pioto> hi. are there docs somewhere on setting up a bare minimum cluster from a local git build, for testing purposes? i just want 1 mon, 1 osd, 1 mds, so i can hook up gdb and poke things...
[20:34] <pioto> ceph-deploy or mkcephfs seem like overkill
[20:35] * rustam (~rustam@ has joined #ceph
[20:35] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[20:35] <joshd> pioto: from the git tree's src dir, create dev and out directories, and run ./vstart.sh -n -x
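joshd's recipe, collected into one inert snippet (echoed, since it needs a finished build). The MON/OSD/MDS variables are an assumption about vstart.sh's knobs for shrinking the default cluster down to pioto's 1 mon / 1 osd / 1 mds; `-n` creates a new cluster and `-x` enables cephx.

```shell
# from the top of the ceph git tree, after a successful build
steps='cd src && mkdir -p dev out
MON=1 OSD=1 MDS=1 ./vstart.sh -n -x'
echo "$steps"   # shown, not run: requires the built binaries in src/
```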
[20:36] * rustam (~rustam@ Quit (Remote host closed the connection)
[20:38] <BMDan> cjh_: Back-of-napkin calcs suggest at least 12k PGs for 240 OSDs.
[20:39] <BMDan> Try ceph osd dump -o - | grep pool
[20:39] <BMDan> to verify that the PG is set properly on the pool you're using.
[20:39] <BMDan> Because if you're using the wrong pool, you could have as few as 8 PGs configured.
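BMDan's 12k follows from the low end of the docs' 50-100-PGs-per-OSD guidance; rounding up to the next power of two (the usual pg_num convention) gives the candidates, and the high end lands on the 32k nhm suggested. A sketch of the arithmetic:

```shell
# 50-100 PGs per OSD, rounded up to the next power of two. Low end for 240
# OSDs: 50 * 240 = 12000 -> 16384; the high end, 100 * 240 = 24000, rounds
# up to the 32768 ("32k") nhm suggests.
osds=240 per_osd=50
target=$(( osds * per_osd ))
pgs=1
while [ "$pgs" -lt "$target" ]; do pgs=$(( pgs * 2 )); done
echo "$target -> $pgs"   # 12000 -> 16384
```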
[20:42] * firdadud (~oftc-webi@p4FC2C72B.dip0.t-ipconnect.de) has joined #ceph
[20:42] <pioto> joshd: thanks
[20:42] <cjh_> BMDan: ok will do :)
[20:45] <mrjack> hm
[20:46] <mrjack> this is interesting... does it mean my rbd pool with 256 PGs has too few PGs? i use 7 OSDs with 1 TB each...
[20:46] * mjblw (~mbaysek@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[20:47] <mjblw> The release notes mention upgrading from 0.56.4 to cuttlefish. Can 0.56.3 be upgraded directly to cuttlefish?
[20:49] * bergerx_ (~bekir@ Quit (Quit: Leaving.)
[20:50] <BMDan> mrjack_: IIRC, the guidance is 50 * spinners.
[20:51] <BMDan> mrjack_: So 256 is low, but not "too low".
[20:51] <BMDan> To the extent that that phrase has any meaning.
[20:51] <mrjack> hmm
[20:51] <mrjack> o
[20:51] <mrjack> ok
[20:52] <mrjack> how can i see to which PGs a rbd image maps?
[20:53] <BMDan> uhhh
[20:53] <BMDan> like
[20:53] <nhm> our official docs say 50-100 it looks like: http://ceph.com/docs/master/rados/operations/placement-groups/
[20:53] <BMDan> ceph osd map volumes volume-somevolumeUUID
[20:53] <BMDan> But I doubt that's actually what you want.
[20:55] <mrjack> hm
[20:56] <paravoid> anyone around?
[20:56] <mrjack> i'd like to know how a rbd image is distributed on the osds ;)
[20:56] <tnt> mrjack: it's split in 4Mb chunks each chunk is 1 rados object
[20:56] <dmick> mrjack: rbd info to get object prefix
[20:57] <dmick> filter rados ls to get the list of objects
[20:57] <dmick> osd map to see the pg mapping
[20:57] <dmick> ceph osd map
[20:57] <mrjack> ceph osd map
[20:57] <mrjack> unknown command map
[20:57] <mrjack> hm
[20:57] <mrjack> does not work with 0.56.6?
[20:58] <dmick> should
[20:58] <tnt> ceph pg dump will also show pg->osd mapping
[20:58] <dmick> ceph osd map <pool> <objname>
[20:59] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) Quit (Quit: Leaving.)
[21:00] <mrjack> it would be cool if rbd info <image> would show this information, too
[21:00] <mrjack> rbd info ocfs2
[21:00] <mrjack> rbd image 'ocfs2':
[21:00] <mrjack> size 128 GB in 32768 objects
[21:00] <mrjack> order 22 (4096 KB objects)
[21:00] <mrjack> block_name_prefix: rb.0.1
[21:00] <mrjack> format: 1
[21:00] <mrjack> ceph osd map rbd rb.0.1
[21:00] <mrjack> osdmap e3552 pool 'rbd' (2) object 'rb.0.1' -> pg 2.5aeb10b0 (2.b0) -> up [5,3] acting [5,3]
[21:00] <mrjack> so this image is only using one pg?
[21:02] <dmick> rb.0.1 is not an object
[21:02] <dmick> it's an object prefix
[21:02] <dmick> (11:57:03 AM) dmick: filter rados ls to get the list of objects
[21:04] <mrjack> ah ok
[21:05] <mrjack> hm so when i do rados ls -p rbd |grep 'rb.0.1.' i would see all objects?
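Roughly, yes, with one catch: the dot in a grep pattern matches any character, so `rb.0.1.` would also catch a prefix like `rb.0.10` from another image. A sketch against made-up object names, with the pattern anchored and the dots escaped (the `rb.0.1` prefix is from the rbd info output above):

```shell
# fake stand-in for `rados ls -p rbd`; rb.0.10.* belongs to a different image
prefix=rb.0.1
pattern="^$(printf '%s' "$prefix" | sed 's/\./\\./g')\."
matched=$(printf 'rb.0.1.000000000000\nrb.0.1.000000000001\nrb.0.10.000000000000\n' |
          grep "$pattern")
echo "$matched"
# per dmick, each matched object then goes through: ceph osd map rbd <object>
```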
[21:10] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[21:11] <Fetch> mrjack: oh man, I know that was just an example, but are you really using ocfs2?
[21:13] <firdadud> fetch: whats the alternative?
[21:14] <Fetch> firdadud: I haven't been in the multiwriter san filesystem market for a while. It was pretty awful, though. I was going to give mrjack my sympathies if it was still as bad :)
[21:14] <mrjack> Fetch: yes
[21:15] <mrjack> Fetch: i feel like i don't have a better option than ocfs2 at the moment for distributing qemu guest config files..
[21:15] <tnt> Fetch: speaking of multiwriter, is there a good 'single writer' ? i.e. tens of machines mounting the RBD but being 'read only' and a single one that can write.
[21:15] <firdadud> fetch: yes it's still awful... especially with oracle stuff... but there's no alternative GFS2 is even more awful
[21:15] <Fetch> tnt: I'm a Ceph noob, sorry I have no useful answer to that
[21:16] <mrjack> tnt: you need a cluster-aware fs even for reads, unless you do not use fscache at all
[21:16] <tnt> well, I was looking for something other than gfs2/ocfs2 in the simplified case where there is only 1 writer ...
[21:16] <Fetch> mrjack: ouch, you got my sympathies. I was using it for shared VM storage with Xen back in 2007...buggy as crap
[21:17] <Fetch> tnt: yeah I get the need, I just don't have an answer. I understand why normal filesystems don't work well for that
[21:17] <mrjack> Fetch: it works stably for this purpose backed by rbd... but is evil if ceph hangs or io to rbd hangs for more than 60 secs... nodes panicking and fencing.... :(
[21:17] <firdadud> mrjack: consider nfs ...
[21:18] <mrjack> firdadud: i tried, but nfs is worse if there is heavy io etc, i get tons of "nfs server not responding" on clients, wreaking havoc with my pacemaker, etc etc...
[21:19] <firdadud> mrjack: heavy io for config files?
[21:19] <mrjack> firdadud: glusterfs sometimes doesn't get updates to files right on all nodes and caches old configs...
[21:19] <mrjack> firdadud: i use nfs also for installing clients with FAI, booting pxe etc
[21:19] <cjh_> joao: looks like my b monitor just spins after starting it up. I moved the store.db file and it starts but doesn't join the cluster. Logs here: http://fpaste.org/11341/36812715/
[21:20] <firdadud> mrjack: yes multi accessable FS is a mess...
[21:21] <mrjack> firdadud: yeah, so ocfs2 does it in the meanwhile until cephfs is ready ;)
[21:21] <firdadud> mrjack: same applies to me ;-)
[21:21] <firdadud> mrjack: even tough i'm using it for webserver content sharing
[21:22] <mrjack> i have apache on ocfs2, too
[21:22] <saras> http://paste.ubuntu.com/5648813/ it is done holy crap ceph will build on PI
[21:23] <mrjack> ocfs2 on rbd gives less headache than using glusterfs
[21:23] <saras> see you guys when time to setup up ceph have fun
[21:23] <firdadud> mrjack: yeah my experiences with glusterfs AND GFS2 were even worse than with ocfs2
[21:23] * saras (~kvirc@74-61-8-52.war.clearwire-wmx.net) Quit (Quit: KVIrc 4.1.3 Equilibrium http://www.kvirc.net/)
[21:28] <mrjack> firdadud: are you using 0.61.1
[21:28] <mrjack> ?
[21:29] <firdadud> mrjack: no still on 0.56.6 i'll wait until the leveldb trim / shrink stuff is def. solved. still seeing reports of problems when having a lot of small i/o using rbd
[21:29] <mrjack> firdadud: which qemu version do you use?
[21:30] <firdadud> mrjack: qemu 1.4.1 with cherry-picked async patch - why?
[21:30] <mrjack> firdadud: i upgraded to ceph 0.56.6 and qemu 1.4.50 to make use of aio_flush, but now i see some guests having oopses
[21:30] <jmlowe> was somebody working on a fuse driver for rbd?
[21:31] <sjusthm> jmlowe: there is one lying around somewhere
[21:32] <jmlowe> my life would get a lot better if I could have a fuse driver that presented the snapshot deltas as files to use with my tivoli backup clients
[21:33] <firdadud> mrjack: qemu 1.4.50?
[21:33] <mrjack> firdadud: latest master
[21:33] <firdadud> mrjack: why don't you use 1.4.1 and then just add the patch? it applies and works cleanly
[21:33] <mrjack> firdadud: i would, but i'm a relatively new git user and don't know how :-P
[21:34] <firdadud> git checkout v1.4.1; git cherry-pick dc7588c1eb3008bda53dde1d6b890cd299758155
[21:34] <firdadud> mrjack: that's it
[21:35] <mrjack> thanx, i'll try
[21:36] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[21:36] <firdadud> mrjack: no problem
[21:36] * Dark-Ace-Z is now known as DarkAceZ
[21:37] <joao> cjh_, thanks; I'll take a look in the next hour or so
[21:37] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[21:37] <cjh_> ok
[21:37] * Tamil (~tamil@ Quit (Quit: Leaving.)
[21:51] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[21:54] * Tamil (~tamil@ has joined #ceph
[21:57] * markbby (~Adium@ Quit (Quit: Leaving.)
[21:58] <Kioob> Is it possible to have the source of this package: http://ceph.com/debian-cuttlefish/pool/main/l/leveldb/libleveldb1_1.9.0-1~bpo60+1_amd64.deb
[21:58] <Kioob> I can't rebuild the one from Debian, since it depends on multiarch
[21:58] <sjusthm> it's probably cribbed from the leveldb git tree
[21:59] <Kioob> ok, I check
[22:01] <Kioob> there is no "debian" directory in the official sources
[22:01] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[22:04] <firdadud> Kioob: why do you want to rebuild them?
[22:05] <Kioob> because the squeeze package, found on "ceph.com" repository doesn't have syncfs support
[22:06] <Kioob> I use squeeze with recent kernels, and I have multiple OSD per host. I need syncfs support.
[22:06] <firdadud> Kioob: yes so you need to rebuild ceph but do you also need to rebuild leveldb? Does leveldb support syncfs? Then i need the same ;-)
[22:07] <firdadud> Kioob: also using squeeze with a recent kernel and multiple OSDs per host
[22:07] <Kioob> I need libleveldb-dev to rebuild ceph, which is not available on ceph.com
[22:07] <Kioob> and the one from Debian seems to require multiarch
[22:08] <firdadud> Kioob: oh it is available
[22:08] <firdadud> wait
[22:08] <firdadud> Kioob http://gitbuilder.ceph.com/leveldb-deb-x86_64/libleveldb-dev_1.9.0-1~bpo60+1_amd64.deb
[22:08] <Kioob> !!!
[22:08] <Kioob> great
[22:08] <firdadud> Kioob: have fun ;-)
[22:08] <Kioob> thanks a lot !
[22:09] <firdadud> Kioob: no problem i'm on the same boat
[22:09] <Kioob> and the sources are there too
[22:10] <firdadud> Kioob haven't checked
[22:10] <firdadud> Kioob good night will go to bed
[22:10] <Kioob> thanks again ;)
[22:11] <firdadud> Kioob: you can also install multiarch on squeeze, it's working fine... just download the wheezy deb and install
[22:11] * newbie (~kvirc@74-61-8-52.war.clearwire-wmx.net) has joined #ceph
[22:11] * newbie is now known as saras
[22:13] * firdadud (~oftc-webi@p4FC2C72B.dip0.t-ipconnect.de) Quit (Quit: Page closed)
[22:13] <saras> you know there's only one issue with google summer of code: it is only in the summer
[22:14] * ScOut3R (~ScOut3R@business-80-99-161-73.business.broadband.hu) has joined #ceph
[22:14] <Fetch> the assumption being that student coders will be busy with schoolwork during the rest of the year :)
[22:15] <saras> Fetch: I go year round that does not change
[22:16] <saras> just realised how much i missed the deadline by
[22:16] <Fetch> well then, let's say the original target of the SoC was university comp sci students who didn't have heavy classes during the summer, only in fall/spring
[22:17] <saras> well i call myself a university comp sci student
[22:17] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[22:18] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[22:18] <saras> am university comp sci student and my school goes year round
[22:20] <saras> well i still think i can get 2 more years
[22:21] <saras> does ceph have a project in it this year?
[22:21] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[22:22] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit ()
[22:22] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[22:30] * berant (~blemmenes@sslvpn.ussignalcom.com) Quit (Quit: berant)
[22:30] <Fetch> out of scope for this channel, but if anyone uses radosgw as a swift or s3 endpoint: if I don't have an extra server to make the endpoint with apache/fastcgi, should it work fairly well with a name based vhost?
[22:34] * jjgalvez1 (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[22:39] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:40] <Kioob> « 8 scrub errors »
[22:40] <Kioob> ouch...
[22:40] <via> so is it safe for me to roll back ceph after the monitor crash when upgrading to cuttlefish?
[22:40] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[22:40] <via> the crash being during the conversion to the new format
[22:40] <dmick> Fetch: it'll certainly plumb, but my impression is that the server uses a fair bit of network bw
[22:40] * Cube (~Cube@ has joined #ceph
[22:41] <dmick> at least that's my half-informed take
[22:41] <dmick> saras: how goes the Pi port?
[22:43] <saras> dmick: the build completed
[22:43] <dmick> excellent news
[22:44] <saras> I need to learn how to package but that's not going to happen today
[22:45] <dmick> oh it would be a long time before I worried about packaging
[22:45] <dmick> look at vstart.sh for a way to quickly bring up a test cluster from the build directory
[22:46] <saras> so i don't need to install/build on each node?
[22:47] <dmick> you can run a cluster with one monitor and one osd on the same machine, using directories for storage, yes
[22:47] <dmick> vstart by default starts 3 mon, 1 osd, and 3 mds, but you can be even smaller. It's a much faster build/test/edit cycle
[22:48] <dmick> very interested in the changes you had to make, too; you should consider making a fork on github and pushing your work there once it's in reasonable shape
[22:49] <saras> I have. I will write some stuff for the readme about the issues and how i got past them.
[22:49] <saras> did you see the last paste
[22:51] <saras> dmick: http://paste.ubuntu.com/5648813/
[22:51] <saras> oh i have a bug with apt-get to file
[22:52] <saras> or at least research
[22:54] * drokita1 (~drokita@ has joined #ceph
[22:58] * madkiss (~madkiss@2001:6f8:12c3:f00f:7c02:affe:6556:164d) has joined #ceph
[22:59] * drokita (~drokita@ Quit (Ping timeout: 480 seconds)
[22:59] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[22:59] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[23:00] * madkiss1 (~madkiss@2001:6f8:12c3:f00f:c6d:6b04:701b:2b1b) has joined #ceph
[23:02] * drokita1 (~drokita@ Quit (Ping timeout: 480 seconds)
[23:05] <Kioob> I'm trying to upgrade from Ceph 0.56.6 to 0.61.1, and one of my 5 mon crash
[23:06] * madkiss (~madkiss@2001:6f8:12c3:f00f:7c02:affe:6556:164d) Quit (Ping timeout: 480 seconds)
[23:07] <Kioob> here my logs : http://pastebin.com/20JkkPdD
[23:08] <Kioob> In function 'virtual void MDSMonitor::update_from_paxos()' thread 7f2200c50700 time 2013-05-09 23:04:30.636974
[23:08] <Kioob> mon/MDSMonitor.cc: 88: FAILED assert(version >= mdsmap.epoch)
[23:09] <Kioob> since other MON are fine, what should I do ?
[23:20] <BMDan> Kioob: Are all of your mons running 0.61?
[23:20] <Kioob> yes
[23:20] <BMDan> What does the quorum say that the mdsmap epoch is?
[23:20] * mnash (~chatzilla@vpn.expressionanalysis.com) Quit (Quit: ChatZilla 0.9.90 [Firefox 20.0.1/20130409194949])
[23:20] <derRichard> does a ceph setup with only two servers make sense? i'd like to use ceph on two servers, each with 4 disks. replication factor 1 (server a mirrors to server b)
[23:21] <BMDan> derRichard: Are you sure you don't want drbd?
[23:21] <derRichard> no
[23:21] <derRichard> :)
[23:21] <derRichard> i'm looking for cluster storage solutions
[23:22] <Kioob> BMDan: from "ceph status" command ? I see "election epoch 2298, quorum 0,1,2,3 a,b,c,e"
[23:22] * Muhlemmer (~kvirc@cable-88-137.zeelandnet.nl) Quit (Ping timeout: 480 seconds)
[23:22] <BMDan> derRichard: You need a minimum of three mons to establish redundancy, so you'd need a third "volunteer".
[23:22] <mjblw> The release notes mention upgrading from 0.56.4 to cuttlefish. Can 0.56.3 be upgraded directly to cuttlefish?
[23:22] * PerlStalker (~PerlStalk@ Quit (Quit: ...)
[23:23] <BMDan> Kioob: from ceph -s:
[23:23] <BMDan> mdsmap e535: 1/1/1 up {0=1=up:active}, 1 up:standby
[23:23] <Kioob> mdsmap e1: 0/0/1 up
[23:23] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[23:24] <derRichard> BMDan: okay. then i'd buy three servers, 4 disks each
[23:24] * tziOm (~bjornar@ti0099a340-dhcp0870.bb.online.no) Quit (Remote host closed the connection)
[23:24] <derRichard> using replication factor 2, two servers can die and my setup will still work, right?
[23:24] <BMDan> derRichard: Sounds reasonable to me at that point. Each server is four OSDs, a MON, and an MDS.
[23:25] <derRichard> BMDan: yep
[23:25] <BMDan> I'm not familiar with "replication factor"; I use the term "replica count".
[23:25] <BMDan> Your term appears to mean "replica count minus one".
[23:25] <BMDan> In which case what you said is still inaccurate; if two servers die, you lose mon quorum and cannot continue.
[23:25] <derRichard> okay. then s/replication factor/replica count/g
[23:25] <derRichard> :)
[23:26] * mnash (~chatzilla@66-194-114-178.static.twtelecom.net) has joined #ceph
[23:26] <BMDan> Replica count 2 means that you can lose *ONE* server and everything stays up.
[23:26] <derRichard> but all data exists 3 times?
[23:26] <BMDan> No matter your replica count, if you lose more than 1/2 of the mons (and the number of mons must be odd), you cannot continue.
[23:26] * ScOut3R_ (~ScOut3R@business-80-99-161-73.business.broadband.hu) has joined #ceph
[23:27] <BMDan> Replica count 2 means that all data exists twice.
[23:27] <BMDan> Thus my confusion.
[23:27] <BMDan> Replica count is very simple. The number after it is the number of times data exists.
[23:27] <BMDan> Your term appears to mean something different.
[23:27] <derRichard> hmm, i thought replica count 2 means that each objects has N copies and therefore each object exists N+1 times (N copies and one origin)
[23:27] * TiCPU__ (~jeromepou@190-130.cgocable.ca) has joined #ceph
[23:28] <BMDan> Ah, exactly.
[23:28] <BMDan> You are under a bad principle.
[23:28] <BMDan> There is no "original".
[23:28] <Kioob> well the term "replica" is not clear. From me a "replica" exclude the "original"
[23:28] <BMDan> There are only replicas.
[23:28] <derRichard> okay
[23:28] <BMDan> Even the original is a replica.
[23:29] <BMDan> See http://ceph.com/docs/master/rados/operations/pools/#set-the-number-of-object-replicas
[23:29] <BMDan> The reason this is important is that, if we lose the "original", then the copy is every bit as authoritative as the original.
[23:29] <BMDan> And if the copy gets written to, then we still treat that as valid and now the original is no good.
[23:29] <derRichard> okay :)
[23:29] <BMDan> So we don't bother tracking which sets of data are original versus copies.
[23:30] <BMDan> They're *ALL* copies, of which we strive to maintain a minimum number for each object identifier.
[23:30] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has joined #ceph
[23:30] <BMDan> Cleaner architecture. :)
[23:30] * ScOut3R (~ScOut3R@business-80-99-161-73.business.broadband.hu) Quit (Ping timeout: 480 seconds)
[23:30] <Kioob> BMDan: and for my failing mon, what should I do ? :D
[23:30] <BMDan> Kioob: That mdsmap seems very, very, very low.
[23:31] <Kioob> yes
[23:31] <BMDan> I seriously doubt e1 is the correct mdsmap.
[23:31] <Kioob> and so...
[23:31] <Kioob> mmm
[23:31] <derRichard> i'm currently exploring ceph because i need ~10tb of redundant storage. so far i'm not sure how many physical servers i'll need
[23:31] <Kioob> but MDSmap is for MDS, right ?
[23:31] <Kioob> I don't use mds at all...
[23:31] <BMDan> Honestly? Not sure. Manually remove the tainted MDSes from the MDSmap and let the remainder establish quorum, then add back the tainted ones so they get the new epoch?
[23:31] <BMDan> @Kioob
[23:31] <cephalobot> BMDan: Error: "Kioob" is not a valid command.
[23:32] <dmick> saras: I had not seen that paste, but that's what you said, so, again, excellent
[23:32] <Fetch> derRichard: if you can make your answer "at least 3 physical servers" then Ceph might be an answer for you. But it's such a small storage set that other solutions might be more performant with less overhead
[23:33] <BMDan> derRichard: We are running with 96 TB of storage giving us 48 TB usable on four boxes with twelve 2 TB drives apiece (there are also some mons sitting above that on separate hardware, but that's not relevant here).
[23:33] * jskinner_ (~jskinner@ Quit (Remote host closed the connection)
[23:33] <BMDan> derRichard: So 10 TB is very, very small; that's the sort of size where DRBD across two machines makes more sense, at least to me.
[23:34] <Fetch> drri: I have 6 nodes (3 osd, 3 mon) and 15 TB, and the only reason I'm using Ceph is a technology demonstration to get money out of the suits
[23:34] <cjh_> rounding error :D
[23:34] <derRichard> BMDan: i'm not a drbd expert, but AFAIK it does not offer something like RBD
[23:35] * TiCPU_ (~jeromepou@190-130.cgocable.ca) Quit (Ping timeout: 480 seconds)
[23:35] <pioto> gregaf: if you have a minute... any suggestions on how to turn filepath("foo", 10000000) into "dir/foo"?
[23:35] <BMDan> derRichard: It exports a raw block device. You can then do anything with it. You can make a fish, or an airplane, or a paper tiger, or a walrus....
[23:35] <Kioob> Since I have a MON which refuse to start, I now have 4 mons. Should I shutdown one of them, to have only 3 ?
[23:35] <BMDan> Kioob: Four out of five is better than 3 out of 5. With four, you can survive one more loss.
[23:36] <derRichard> BMDan: yeah, using drbd i'd need a cluster filesystem to serve all users of the storage. which sucks
[23:36] <BMDan> Kioob: That presumes that #5 will *EVENTUALLY* come back. If not, you should remove #5 from the map, then remove and shutdown #4.
[23:36] <derRichard> currently i have a ocfs2 setup
[23:36] <Kioob> and... what about odd number ?
[23:36] <BMDan> Odd number is number of configured mons. If it meant number of running mons, you'd be in very bad territory indeed whenever you lost a server, no? ;)
[23:37] <Kioob> well... it's about having a "quorum"
[23:38] <Kioob> if it's not running, there is no "vote" from him :S
[23:38] <BMDan> The quorum for a five-server cluster is three. You have four. This is okay.
[23:38] <BMDan> The quorum for a five-server cluster with only three running is still three. This is dangerous.
[23:38] <Kioob> ok, I understand
[23:39] <BMDan> The quorum for a three-server cluster with three running is two. This is okay.
[23:39] <Kioob> thanks :)
[23:39] <BMDan> Point being, simply shutting down #4 is bad. Removing it *AND* #5 from the map, though, is good, but no better than running in your current configuration in any meaningful way.
[23:42] <BMDan> To tie it all up with a bow: that last statement being true because, in both scenarios, you can survive the failure of only and exactly *one* more server.
[23:42] <derRichard> so, it is better to have four servers than three? (four mons and osds)? because then two physical servers can die and my cluster still works
[23:43] <BMDan> You'd have to define five mons for that to work.
[23:43] <BMDan> At which point, you'd only have four running and can only survive the loss of one and still maintain quorum.
[23:43] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) Quit (Quit: Leaving.)
[23:43] <BMDan> If you defined three mons, then you could have four mons running, but that leads to problems that are hopefully self-evident in the statement of the scenario. ;)
[23:44] <BMDan> If you want to survive the loss of two mons, you must run five mons to begin with.
[23:46] <derRichard> i think i got it ;)
[23:46] <BMDan> Or, more generally, x*2+1, where x is the number of losses you want to be able to sustain and keep the solution up.
[23:46] <BMDan> Which even works for −1 (infinity), because it evaluates to needing −1 (infinity) servers in order to survive infinite failures.
[23:46] <BMDan> Some days I <3 math.
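The arithmetic BMDan walks through, as two one-liners (a sketch; "mons" here means configured mons):

```shell
# 2x+1 configured mons survive x failures; quorum is a strict majority
mons_for_failures() { echo $(( 2 * $1 + 1 )); }
quorum_of() { echo $(( $1 / 2 + 1 )); }
mons_for_failures 2   # prints 5: five mons to survive two losses
quorum_of 5           # prints 3: Kioob's 4-of-5 is fine, 3-of-5 is the edge
quorum_of 3           # prints 2: derRichard's 3-server plan survives one loss
```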
[23:47] <mrjack> :)
[23:48] <cjh_> any idea when multiple active metadata servers will be available? dumpling maybe?
[23:48] <BMDan> cjh_: As soon as you write the patch, presumably. ;)
[23:48] <BMDan> And with that, I'm off! Have a good night, all.
[23:49] * BMDan (~BMDan@ Quit (Quit: Leaving.)
[23:49] * ScOut3R_ (~ScOut3R@business-80-99-161-73.business.broadband.hu) Quit (Ping timeout: 480 seconds)
[23:50] <cjh_> see ya. thanks for the help
[23:50] <derRichard> so, if i have three physical servers with 4x4tb disks each and a replica count of 2, i can use ~24tb. one physical server can die (one mon per physical server). and 1/3 of all disks can fail and i'll still not lose data. is this correct?
[23:51] <Kioob> 1 MON can die, and 2/3 of all disk can fail
[23:52] <Fetch> not quite on having 5 drives fail and being assured all data is good
[23:52] <Fetch> simultaneously, anyway
[23:52] * leseb (~Adium@bea13-1-82-228-104-16.fbx.proxad.net) has joined #ceph
[23:53] <derRichard> Kioob: 2/3 disks? are you sure? i don't think so
[23:54] <Kioob> replica count of 2, so 1/2
[23:54] <Kioob> not 2/3
[23:54] <derRichard> true
[23:55] <Fetch> each object is in 2 places, so the absolute highest number of drives you can lose is 1/2 and potentially not lose any data
[23:55] <Fetch> however
[23:55] <Fetch> if you lose the wrong two drives, you can lose objects. In my understanding
[23:55] <Kioob> +1
[23:55] * BillK (~BillK@58-7-220-225.dyn.iinet.net.au) has joined #ceph
[23:56] <Kioob> so you have to set up crush rule according to that
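The rule Kioob means, sketched in CRUSH map syntax (the rule name and ruleset number are illustrative, not from the log): chooseleaf over type host forces each replica onto a different server, so losing two drives in the same box can never take out both copies of an object.

```
# illustrative rule: place each replica on a distinct host
rule replicated_per_host {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
```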
[23:57] <derRichard> what replica count are you using?
[23:57] <Kioob> 3 and 2
[23:57] <Kioob> 2 for data that I can lose
[23:59] <derRichard> are you using (hardware)raid too?
[23:59] <Kioob> the "PGs are upgrading" step is very very long ! It's running since 20 minutes on a 1TB OSD...
[23:59] <Kioob> derRichard: no

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.