#ceph IRC Log

IRC Log for 2013-08-30

Timestamps are in GMT/BST.

[0:00] <xarses> if nothing good there check /var/log/ceph/ceph.log or ceph-osd*.log
[0:00] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[0:01] * BillK (~BillK-OFT@124-148-224-108.dyn.iinet.net.au) has joined #ceph
[0:02] <alphe> logs are showing a problem within monitors communications
[0:03] <alphe> seems that the nodes can receive data but not emit data, a weird thing ...
[0:04] <gregaf1> I'm confused, I thought I fixed the init-to-1 thing before leaving last night
[0:06] <xarses> alphhe: is your firewall on the OSD blocking ports 6800-7100?
[0:06] <xarses> alphe: ^^
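A quick way to check that on the OSD host, a sketch assuming iptables is the firewall in use:
    sudo iptables -L -n --line-numbers                         # list the current rules
    sudo iptables -I INPUT -p tcp --dport 6800:7100 -j ACCEPT  # open the OSD port range in use at the time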
[0:10] <sjusthm> sagewk, loicd, gregaf1: quick review on pull request 558?
[0:10] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) has left #ceph
[0:10] * sagewk (~sage@2607:f298:a:607:219:b9ff:fe40:55fe) has joined #ceph
[0:10] * ChanServ sets mode +o sagewk
[0:11] <alphe> nope normally not
[0:11] <sjusthm> sagewk (since you flapped and probably missed the message): review on 558?
[0:11] <sjusthm> one liner
[0:11] <alphe> it is more a bug: you can enter a node but it can't get out until you restart the networking
[0:11] <sagewk> sjusthm: looks good
[0:11] <sjusthm> k
[0:12] <alphe> I say it is a bug because out of 10 identical OSes the issue only appears on some of them from time to time ...
[0:12] <alphe> weird thing ...
[0:12] <gregaf1> sjusthm: I don't know what that data member does :(
[0:14] <alphe> in the ceph.deploy i see the prepare not the activate
[0:14] <alphe> for the osd
[0:14] <alphe> not sure it should be texted out ...
[0:15] <alphe> ok it works
[0:15] <alphe> it s cool to know about the osd activate
[0:16] <xarses> can someone help with some of the finer details of the monitors?
[0:18] <jlhawn> hey everyone, what kind of write speeds do you typically see with CephFS? I'm only getting 10MB/s which makes me think that something's wrong.
[0:20] <jlhawn> I was expecting something on the order of 100MB/s since I've got 1Gbps networking on the client, and 10Gbps for each of the OSDs, MONs, and the MDS
[0:20] <gregaf1> jlhawn: it depends an infinite amount on (1) what your base RADOS cluster can do (very complicated on its own) and (2) what the test looks like
[0:20] <xarses> I need to deploy the monitors serially due to the way our puppet env works. I can use ceph-deploy new on the first node, drop the monitor, move to the next, run ceph-deploy config pull, gatherkeys, mon create ... that's all fine. but then the ceph.conf only references mon_host = first_mon and mon_initial_members = first_mon
[0:20] <jlhawn> gregaf1: my test is simple > 'dd if=/dev/zero of=100M.zero bs=4M count=25'
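Side note: without a sync, dd mostly measures the page cache; a variant of the same test that forces the data out before reporting gives a more honest number:
    dd if=/dev/zero of=100M.zero bs=4M count=25 conv=fdatasync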
[0:20] <gregaf1> xarses: ceph-deploy is really intended to take the role of something like puppet, not to be used inside of it
[0:21] <gregaf1> for that you'd want to do the steps that ceph-deploy does from your puppet recipes
[0:21] <gregaf1> jlhawn: that's pretty slow then, yes; have you benchmarked raw rados yet?
[0:21] <jlhawn> what's the preferred method for testing the RADOS cluster performance?
[0:22] <gregaf1> and what's your cluster look like, and what client are you using?
[0:22] <alphe> bonnie++
[0:22] <jlhawn> gregaf1: no, I haven't
[0:22] <alphe> and dd the hell out of it
[0:22] <xarses> gregaf1, regardless i'd still have this problem, I cant deploy all of my monitors at once
[0:22] <jlhawn> I'm working on ec2. I have 3 cc1.4xlarge instances
[0:22] <xarses> and im worried about what happens when that first monitor is no longer around
[0:22] <gregaf1> xarses: yeah, but presumably puppet can generate its config file once you've already chosen all your monitors?
[0:23] <xarses> still serial
[0:23] <dmick> xarses: why would you need to deploy a monitor that goes away?
[0:23] <dmick> and serial is not fundamentally an issue
[0:23] <xarses> it explodes
[0:23] <jlhawn> gregaf1: each instance is an OSD and a monitor (3 monitors and 3 OSDs) and one is also the MDS
[0:23] <dmick> I don't know what that means.
[0:24] <sagewk> sjusthm: gregaf1: i'm changing the rollback op to roll back the user_version too. sounds ok?
[0:24] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:24] <gregaf1> jlhawn: okay, so you'd want to start with basic local tests first and see what those do; then run "ceph osd tell \* bench" and watch "ceph -w" to see what the results are
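A rough sketch of those steps, plus a raw RADOS benchmark; the 'data' pool name and the 30-second duration are assumptions:
    ceph osd tell \* bench          # each OSD runs a local write benchmark
    ceph -w                         # watch the cluster log for the per-OSD results
    rados bench -p data 30 write    # 30-second object-write benchmark against one pool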
[0:24] <xarses> dmick, let's say the hardware breaks and I load it into a trash can. the appropriate thing is to replace it with a new node, with a new IP
[0:24] <gregaf1> sagewk: hrm, do we do that for the standard version right now?
[0:25] <sagewk> the internal version goes up
[0:25] <gregaf1> nhm, do we have a doc on basic benchmarking steps?
[0:25] <dmick> xarses: so if that happens you have to do some stuff to remove and add a new mon, but, it's doable...
[0:25] <dmick> still confused
[0:26] <gregaf1> sagewk sjusthm: well, I'm a little concerned about things going backwards just because I haven't enumerated cases
[0:27] <xarses> dmick: so my issue is around understanding what parts need to be in the ceph.conf file to keep the cluster working and what parts can be left to Paxos to deal with
[0:27] <gregaf1> why do we want to start rolling back user versions?
[0:27] <alphe> after the update from 0.61.8 to 0.67.2 my S3-like ceph radosgw stopped working (I did the regionmap update) but then the radosgw wasn't able to locate my users
[0:27] <xarses> do I need to / should I update ceph.conf with all the mon nodes?
[0:27] <sagewk> that's what radosmodel expects, and it makes sense to me.
[0:29] <gregaf1> expects in the new thing from yesterday?
[0:29] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) has joined #ceph
[0:29] <sagewk> yeah :)
[0:29] <gregaf1> I don't really want to add more complicated semantics unless we have a need for them
[0:30] <dmick> xarses: yeah, rules on exactly what of the mon variables are needed for what would be nice to have written down
[0:30] <sagewk> i think it would be a special-case to make it not do it, which makes me think we have the wrong semantics
[0:30] <gregaf1> in particular if the user version rolls back users can't do a > comparison, they have to do an == comparison
[0:30] <sagewk> in the model i mean
[0:31] <gregaf1> I don't understand, sagewk, user-visible versions currently behave as they always have
[0:31] <alphe> what does this means ? 2013-08-29 18:31:35.299371 7f6fcc541700 0 -- :/1029747 >> 20.10.10.103:6789/0 pipe(0x7f6fc80211a0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f6fc8021400).fault
[0:32] <sagewk> yeah i know, i'm suggesting that they're wrong. if you roll back to an earlier snap, it makes intuitive sense that the version would roll back too
[0:32] <MACscr> what do you guys think of running the OS for OSD nodes on usb thumb drives? That way just the OSDs are on the spindles. Obviously you would want most of the OS running from ram then of course
[0:32] <joshd> not if you think of a rollback as just a writefull of old data + attrs
[0:33] <gregaf1> sagewk: that doesn't match with our ENOENT semantics, though
[0:33] <joshd> what are the intended uses of the user_version? would rgw use it for versioning?
[0:33] <alphe> MACscr: I'm doing that, it works great
[0:33] * Steki (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[0:33] <MACscr> alphe: what OS?
[0:33] <MACscr> and did you prepare it differently?
[0:33] <alphe> MACscr ubuntu 13.04 what else ?
[0:33] <sagewk> yeah, perhaps. and i'm hitting a comical series of annoying issues making it actually do it.
[0:33] <gregaf1> joshd: it's used exactly the same way we previously used reassert_version; it's just a little more careful and is kept consistent when you move objects between pools
[0:34] <joshd> gregaf1: but what actually used reassert_version?
[0:34] <MACscr> alphe: ha, thats what i plan to run too
[0:34] <gregaf1> oh
[0:34] <alphe> MACscr: to be honest I made a vm with the os then dd'ed it to the usb thumb drive
[0:34] <gregaf1> yes, rgw uses it for atomic replacement and such
[0:34] <alphe> then it worked like a charm
[0:35] <MACscr> alphe: lol, why did you do it that way versus just installing it directly to the USB thumb drive?
[0:35] <sagewk> even so, though, i think the 'rewrite the old data' is an implementation-view and not a nice data model view
[0:35] <gregaf1> I think all the watch-notify users use it (including rbd?) in order to make sure they have the latest version when somebody notifies they made a change
[0:35] <alphe> as I am a bit crazy I went through a bootp PXE deployment too, just for fun and to see how it goes ...
[0:35] <alphe> MACscr: basically that saves a lot of time
[0:35] <alphe> you install the vm ... 10 minutes
[0:36] <joshd> gregaf1: watch-notify had a separate version field that was unsafe for replay and thus rbd doesn't use, I'm not sure about rgw
[0:36] <gregaf1> sagewk: yeah, but when you do a rollback you don't get to re-use the lower version numbers again
[0:36] <alphe> you dd the vm hard drive to your 10+ thumbs, 2 hours for the 10 ... and that's for 8GB thumb drives
[0:36] <gregaf1> joshd: the version used by watch-notify was the version stored in reassert_version and it was filled in via the same mechanisms
[0:36] <sagewk> in a meeting, done in a few
[0:37] <gregaf1> just a couple special cases in the osd code for making it return the old version on watches
[0:37] <alphe> versus 5 hours with the pxe solution, since network/internet bandwidth is slower than a bus ...
[0:37] <xarses> dmick: so what should i configure? should I update mon_host and or mon_initial_members?
[0:37] <alphe> and I had to put a console on each node ...
[0:37] <alphe> which is a pain in the molly
[0:37] <dmick> xarses: I think mon addr may be the thing that needs to be up-to-date for both mons and clients, unless you specify -m to the client (and maybe something else for the mon, I don't know for sure)
[0:38] <dmick> I have to defer to someone else. gregaf1? sagewk?
[0:38] <MACscr> alphe: im assuming you disabled swap and did some other prep work on maybe the logging, etc, to minimize writes on the usb drives?
[0:38] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Ping timeout: 480 seconds)
[0:38] <alphe> macscr yes ...
[0:38] <joshd> gregaf1: rbd avoids the race by establishing the watch before reading data that should be updated by a notify
[0:39] <joshd> gregaf1: especially since old osds would not work with assert_version on watches
[0:39] <xarses> dmick: gregaf1: sagewk: could I place the mon_host behind a vip for the clients?
[0:39] <alphe> logging not deactivated since I need it to see what ceph is doing ...
[0:39] <gregaf1> joshd: oh, does it just always re-read on notify and not do anything with the versions it gets back?
[0:39] <joshd> yes
[0:39] <gregaf1> ah
[0:39] * zhyan__ (~zhyan@101.83.188.97) has joined #ceph
[0:39] <alphe> what does this means ?
[0:39] <alphe> 20.10.10.103:0/1017907 >> 20.10.10.103:6789/0 pipe(0x7f2674000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f2674000e60).fault
[0:39] <alphe> when doing a ceph -s
[0:40] <alphe> first time i see that ...
[0:40] <gregaf1> well, anyway, rgw uses it to maintain the bucket indices for doing atomic deletes and things in the presence of races
[0:41] * LeaChim (~LeaChim@176.24.168.228) Quit (Ping timeout: 480 seconds)
[0:42] <Tamil1> alphe: what do you see in the monitor log?
[0:43] <alphe> that there is a competition to elect who is the boss and my third osd03 is running ceph-create-keys which seems to be a very very bad idea
[0:43] * zhyan_ (~zhyan@101.83.160.172) Quit (Ping timeout: 480 seconds)
[0:43] <alphe> any clue how i can fix that ?
[0:43] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[0:44] <Tamil1> alphe: you mean the node on which osd03 is running, is also running a ceph-mon?
[0:45] <alphe> kill yes
[0:45] <alphe> yes
[0:45] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) Quit (Ping timeout: 480 seconds)
[0:45] <Tamil1> alphe: ceph-create-keys is not supposed to be running for this long.
[0:47] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[0:48] <alphe> tamil yes ...
[0:48] <alphe> this is weird
[0:48] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[0:48] <Tamil1> alphe: are you sure the firewall is turned off on that node?
[0:48] <alphe> why doesn't my third monitor want to go into peon mode ...
[0:49] <alphe> tamil yes it was working
[0:49] <Tamil1> alphe: you mean it was working when you tried ceph-deploy before and not this time?
[0:49] <alphe> port 6789 is not seen from outside ...
[0:50] <alphe> on the third node ...
[0:50] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Read error: Operation timed out)
[0:51] <alphe> can I destroy that monitor then recreate it?
[0:52] <alphe> can i transfer the keys to it ?
[0:52] * loicd1 (~loic@brln-4db8015a.pool.mediaWays.net) Quit (Quit: Leaving.)
[0:52] <alphe> ceph-deploy mon destroy node03 ?
[0:53] <Tamil1> alphe: well you can try that actually. kill ceph-create-keys, and then try mon destroy
[0:53] <Tamil1> alphe: yes
[0:53] <alphe> yep exactly what i did ...
[0:53] <alphe> hehehe
[0:53] <Tamil1> alphe: will see how it goes
[0:53] <alphe> actually the mon is still running
[0:54] <alphe> i think i will have to manually kill the monitor
[0:54] <Tamil1> alphe: sudo stop ceph-mon ?
[0:54] <alphe> ok no need, it stopped on its own
[0:55] <Tamil1> alphe: oh ok, that's what mon destroy is supposed to do
[0:55] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[0:55] <alphe> now I do the mon create, but shouldn't I destroy the 2 other mons too and start a new monitor creation process?
[0:56] <Tamil1> alphe: not needed, what are you trying to do?
[0:56] <alphe> I'm trying to make my third monitor understand it is not the boss of the ring ...
[0:57] <alphe> and that it doesn't have to mess up my ceph cluster by calling for monitor elections every second of every minute ...
[0:57] <Tamil1> alphe: you could still add the third mon to the cluster
[0:57] <alphe> ok or since the health is ok then do my life without a third monitor :)
[0:58] <Tamil1> alphe: but i have never tried it myself , i think that would alter your ceph.conf and you may have to push ceph.conf to other nodes to sync up
[0:58] <alphe> what does this mean? 0 -- 20.10.10.3:0/1030242 >> 20.10.10.103:6789/0 pipe(0x7f3244001d60 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f3244001fc0).fault
[0:58] <alphe> cluster d8b22d56-e374-426b-af18-7118be57418b
[0:58] <alphe> ?
[0:59] <Tamil1> alphe: yes, that would be a good option too, as far as you are testing
[1:00] <Tamil1> alphe: when and where do you see this error?
[1:00] <alphe> ceph -s
[1:00] <MACscr> alphe: well you wouldn't disable logging altogether, just disable it locally and have it log to a remote central logging server
[1:00] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[1:00] <alphe> that show first a couple of XXXXX.fault
[1:00] <alphe> then the list and says all is OK ...>_<
[1:00] <Tamil1> alphe: are you trying this on a node, where ceph-mon is running?
[1:01] <alphe> ok it disappeared
[1:01] <alphe> looking at the logs I know where it was coming from
[1:01] <alphe> the mds didn't like having a mon that claims to be the master since it already has one ...
[1:02] <alphe> since the third monitor was destroyed the ceph-cluster stabilised and now it is all happy cool and dandy
[1:02] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[1:02] <alphe> let's try to be freaky and create a third node?
[1:02] <alphe> third monitor
[1:02] <xarses> dmick, gregaf1, sagewk: https://gist.github.com/xarses/6384459#file-example-deployment
[1:03] <Tamil1> alphe: ceph --admin-daemon /var/run/ceph/ceph-mon.*.asok mon_status?
[1:03] * rturk is now known as rturk-away
[1:03] <Tamil1> alphe: well, you can always try adding a new mon to the running cluster. upto you
[1:04] <alphe> I did the ceph-deploy mon create node03, so now ceph -s shows that there is an election process
[1:04] <alphe> hehehehe for the moments no fights ...
[1:04] <Tamil1> alphe: kool
[1:06] <alphe> ok so today we learnt that if we have a rebel monitor we can ceph-deploy destroy that punk, and then once ceph -s shows a stabilised ceph cluster we can recreate it
[1:08] * mschiff (~mschiff@85.182.236.82) Quit (Remote host closed the connection)
[1:10] <bstillwell> Is there a way to add more mon's with ceph-deploy if I initially created the cluster with just one server in the 'new' command?
[1:11] <bstillwell> I'm running into this bug: http://tracker.ceph.com/issues/4924
[1:11] * rturk-away is now known as rturk
[1:13] <alphe> bstillwell yes
[1:13] * AfC (~andrew@2407:7800:200:1011:5910:8716:3d3c:bdb7) has joined #ceph
[1:14] <alphe> bstillwell: you can try, from your admin server in the local dir where you use ceph-deploy, to do a ceph-deploy mon create name-of-the-node-that-receives-the-new-monitor
[1:14] <alphe> that should work
[1:14] <alphe> without pain ...
[1:14] * fireD_ (~fireD@93-142-249-127.adsl.net.t-com.hr) has left #ceph
[1:14] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has left #ceph
[1:14] <bstillwell> alphe: It has problems starting the mon:
[1:14] <bstillwell> [ceph_deploy.mon][ERROR ] Failed to execute command: /sbin/service ceph start mon.den2ceph001
[1:15] * thelan (~thelan@paris.servme.fr) Quit (Ping timeout: 480 seconds)
[1:15] <alphe> strange ...
[1:15] <bstillwell> and if I login to the server it has a 'ceph-create-keys -i den2ceph001' for each time I run the create command
[1:16] <alphe> ok so kill the create-keys
[1:16] <alphe> then do a destroy then try again ?
[1:16] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) Quit (Ping timeout: 480 seconds)
[1:16] <xarses> create-keys blocks for the service starting
[1:16] <bstillwell> ahh, haven't tried the destroy yet
[1:16] <xarses> there should be a log stating why the service didn't start or establish quorum
[1:16] <alphe> bstillwell kill the create-keys then manually start it
[1:17] <xarses> if you run create-keys yourself it will tell you why it thinks it's waiting
[1:17] <bstillwell> This is what I get running it manually:
[1:17] <bstillwell> admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[1:17] <bstillwell> INFO:ceph-create-keys:ceph-mon admin socket not ready yet.
[1:18] * thelan (~thelan@paris.servme.fr) has joined #ceph
[1:18] <xarses> so ceph-mon didn't start
[1:18] * tnt (~tnt@109.130.110.3) Quit (Ping timeout: 480 seconds)
[1:18] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[1:19] <xarses> there should be a log of the command line passed
[1:20] <bstillwell> This one?:
[1:20] <bstillwell> 2013-08-29 17:16:46,416 [den2ceph001][INFO ] Running command: ceph-mon --cluster ceph --mkfs -i den2ceph001 --keyring /var/lib/ceph/tmp/ceph-den2ceph001.mon.keyring
[1:20] <xarses> in my case "ulimit -n 8192; /usr/bin/ceph-mon -i controller-18 --pid-file /var/run/ceph/mon.controller-18.pid -c /etc/ceph/ceph.conf"
[1:22] * Kioob (~kioob@2a01:e35:2432:58a0:21e:8cff:fe07:45b6) has joined #ceph
[1:22] <xarses> ok so try running that again
[1:22] <bstillwell> Which log file is that in? ceph.log where ceph-deploy is run? or on den2ceph001?
[1:25] <xarses> i cant remember
[1:29] <sagewk> sjusthm: gregaf1: https://github.com/ceph/ceph/pull/559
[1:31] <bstillwell> so /var/lib/ceph/mon/ceph-den2ceph001 isn't getting created
[1:33] <sjusthm> sagewk: looks good
[1:36] <gregaf1> I defer to sjusthm
[1:37] <gregaf1> oh, guess he did merge it, thought that "looks good" meant he was waiting :)
[1:38] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:38] * ircolle (~Adium@c-67-165-237-235.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:40] <bstillwell> ran it again and it did get created, but I have the ceph-create-keys hang again
[1:41] <bstillwell> failed: 'ulimit -n 32768; /usr/bin/ceph-mon -i den2ceph001 --pid-file /var/run/ceph/mon.den2ceph001.pid -c /etc/ceph/ceph.conf '
[1:42] <xarses> ok
[1:42] <xarses> when i had that it was because my working directory (cwd) was invalid
[1:44] <bstillwell> Does this tell us something? (taken from /var/log/ceph/ceph-mon.den2ceph001.log):
[1:44] <bstillwell> 2013-08-29 17:42:51.216322 7f2b1b2ee7a0 0 mon.den2ceph001 does not exist in monmap, will attempt to join an existing cluster
[1:44] <bstillwell> 2013-08-29 17:42:51.216500 7f2b1b2ee7a0 -1 no public_addr or public_network specified, and mon.den2ceph001 not present in monmap or ceph.conf
[1:45] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[1:45] <xarses> oh yes it does
[1:46] <xarses> you cannot add additional nodes that are not in the initial monitors list without public_network = network/cidr in your /etc/ceph/ceph.conf on both nodes
[1:46] <xarses> i did this
[1:46] <xarses> on the ceph-deploy node
[1:47] <xarses> echo public_network = 10.0.0.0/24 >> ceph.conf
[1:47] <xarses> ceph-deploy --overwrite-conf mon create <primary mon>
[1:47] <xarses> ceph-deploy config push <new mon>
[1:47] <xarses> ceph-deploy mon create <new mon>
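The resulting ceph.conf would look roughly like this; the fsid and addresses below are placeholders rather than actual values:
    [global]
    fsid = <cluster-uuid>
    mon_initial_members = den2ceph001
    mon_host = 10.0.0.1
    public_network = 10.0.0.0/24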
[1:52] <bstillwell> There we go! I had a slight problem at first because I was reading 10.0.0.0/24 as 10.0.0.0/8...
[1:52] <bstillwell> xarses: thanks!
[1:53] <xarses> bstillwell: you're welcome
[1:55] * alphe (~alphe@0001ac6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:56] * mozg (~andrei@host86-185-78-26.range86-185.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:58] * zhyan__ (~zhyan@101.83.188.97) Quit (Ping timeout: 480 seconds)
[1:58] * AfC (~andrew@2407:7800:200:1011:5910:8716:3d3c:bdb7) Quit (Read error: Connection reset by peer)
[1:59] * AfC (~andrew@2407:7800:200:1011:5910:8716:3d3c:bdb7) has joined #ceph
[2:04] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[2:08] <joao> xarses, still around?
[2:08] <xarses> yes
[2:08] * Tamil1 (~Adium@cpe-108-184-67-79.socal.res.rr.com) Quit (Quit: Leaving.)
[2:08] <joao> with regard to your deployment example
[2:08] <joao> I don't really get 1
[2:08] <joao> mind elaborating?
[2:09] <xarses> lets say there is a power outage in the datacenter
[2:09] <xarses> all systems turn off
[2:09] <xarses> is there any consideration needed to safely start the cluster back up
[2:09] <xarses> is there any configuration that would make it easier
[2:09] <joao> not that I am aware of
[2:10] <joao> start the mons and the osds and they should all play nice with each other and amongst themselves
[2:10] <joao> to make it easier: I can only speak for the monitors, but no
[2:10] <xarses> lets further pretend that the only node in the configuration mon1, wont start
[2:10] <joao> they should simply get back up, form a quorum and go back to business
[2:11] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[2:11] <xarses> will the other monitors still establish a quorum with out any additional config
[2:12] <joao> xarses, the monitors keep a "mon map", containing all monitors in the cluster with whom they allow themselves to form a quorum
[2:12] <joao> ceph.conf is mainly for clients and other daemons
[2:12] * Vjarjadian (~IceChat77@176.254.37.210) has joined #ceph
[2:12] <joao> the monitors will get by just fine; clients however won't be able to figure out where the other monitors are if they don't have their locations on ceph.conf
[2:12] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Quit: leaving)
[2:13] <joao> osds, iirc, cache the monmap, but I'm not sure about that nor do I know if they persist those copies
[2:13] <joao> sjust would probably know better
[2:13] <joao> if they do, they may have a chance at finding the other monitors
[2:13] <joao> but you should really consider having the monitors on any ceph.conf supplied to clients
[2:13] * ScOut3R (~scout3r@54026B73.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[2:16] <xarses> so then that goes back to, should they be set in mon_host and/or mon_initial_members or should they be declared as ini blocks [mon.a] host = ... and addr = ...
[2:16] <joao> ah, different questions here
[2:17] <joao> so, afaict, mon_host and mon_initial_members are only used to bootstrap a monitor
[2:17] <joao> and won't be used ever again after said monitor has been deployed
[2:18] <joao> clients won't look at those afaik
[2:18] <joao> but let me check that real quick just in case
[2:18] <xarses> ok so there must be some magic when the client is on the mon
[2:19] <xarses> because it works currently without any further declarations.
[2:19] <joao> maybe I'm wrong
[2:19] <joao> there's also that
[2:20] * Tamil1 (~Adium@cpe-108-184-67-79.socal.res.rr.com) has joined #ceph
[2:22] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) Quit (Remote host closed the connection)
[2:22] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[2:23] <joao> okay, looks like the clients will actually use mon_host
[2:23] <joao> today I learned
[2:24] <joao> mon_initial_members however is only for the monitor's bootstrap process
[2:24] <joao> looks like one can easily generate a monmap from mon_host, if one is a client
[2:25] <joao> I'd have to dig deeper into that portion of monitor code to see if the monitor also will, but I doubt it
[2:25] <joao> wouldn't be paxos-y enough for the monitor
[2:25] * sagelap (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[2:26] <joao> xarses, I then guess that setting them in mon_host would be fine
[2:26] <joao> don't know whether there's a preference or a guideline
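A minimal client-side ceph.conf carrying all the monitors might then look like this (addresses are placeholders; mon_initial_members only matters for monitor bootstrap):
    [global]
    mon_host = 10.0.0.1,10.0.0.2,10.0.0.3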
[2:26] <xarses> joao, sounds good. If i have time, I'll try both
[2:26] <xarses> thanks for your help
[2:27] <joao> yw
[2:27] <xarses> I'll be back in a bit
[2:28] * sagelap (~sage@2600:1012:b008:164e:f945:a530:9596:aa59) has joined #ceph
[2:35] * xarses (~andreww@204.11.231.50.static.etheric.net) Quit (Ping timeout: 480 seconds)
[2:38] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[2:51] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[2:51] * yy-nm (~Thunderbi@218.74.34.80) has joined #ceph
[2:51] * Tamil1 (~Adium@cpe-108-184-67-79.socal.res.rr.com) Quit (Quit: Leaving.)
[2:51] <xarses> back
[2:52] * Tamil1 (~Adium@cpe-108-184-67-79.socal.res.rr.com) has joined #ceph
[2:54] * AfC (~andrew@2407:7800:200:1011:5910:8716:3d3c:bdb7) Quit (Ping timeout: 480 seconds)
[2:58] * ross_ (~ross@60.208.111.209) has joined #ceph
[3:01] * berant (~blemmenes@24-236-241-163.dhcp.trcy.mi.charter.com) has joined #ceph
[3:01] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) has joined #ceph
[3:02] * dpippenger (~riven@tenant.pas.idealab.com) Quit (Quit: Leaving.)
[3:02] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[3:12] * jaydee (~jeandanie@124x35x46x8.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[3:12] * rturk is now known as rturk-away
[3:18] * jaydee (~jeandanie@124x35x46x15.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:21] <nerdtron> morning
[3:24] * AfC (~andrew@2407:7800:200:1011:5910:8716:3d3c:bdb7) has joined #ceph
[3:27] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[3:28] * Steki (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[3:34] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley_)
[3:35] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[3:40] <sglwlb> hi, I want to do something in kclient, but the ceph-client repo has the entire kernel source in it: https://github.com/ceph/ceph-client. Is there some other way to get just the kclient project?
[3:42] <joshd> no, it's just a regular kernel source tree. the relevant dirs are net/ceph include/linux/ceph drivers/block/rbd.c and fs/ceph
[3:44] <sglwlb> So, I just need to git clone only those files?
[3:45] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[3:45] <joshd> git can't clone subdirs - you should just clone the whole tree
[3:45] <joshd> or add it as a remote to another kernel tree if you already have one
[3:45] <joshd> that'll avoid a lot of duplicate content
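A sketch of that approach, assuming an existing kernel checkout in ./linux; the branch name is an assumption:
    cd linux
    git remote add ceph-client https://github.com/ceph/ceph-client.git
    git fetch ceph-client
    git checkout -b ceph-dev ceph-client/master   # then work on net/ceph, fs/ceph, drivers/block/rbd.c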
[3:45] <smiley_> does anyone know a way to make radosgw buckets public by default?
[3:47] <smiley_> I get access denied when trying to do 's3cmd setacl --acl-public --recursive bucket_name'
[3:48] <sglwlb> yeah, it's so big if I clone the whole tree. I will try to add the ceph code to a kernel tree. thanks
[3:52] * silversurfer (~jeandanie@124x35x46x8.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:56] * jaydee (~jeandanie@124x35x46x15.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[4:07] * berant (~blemmenes@24-236-241-163.dhcp.trcy.mi.charter.com) Quit (Quit: berant)
[4:09] * malcolm (~malcolm@silico24.lnk.telstra.net) has joined #ceph
[4:09] <malcolm> Hi all, having a bit of fun tracking down a recent answer to this. Which kernel revision has RBD Version 2 image support?
[4:12] <joshd> 3.10, though there are some bugs still (fixed the most visible already, working on more)
[4:16] * bandrus (~Adium@12.248.40.138) Quit (Ping timeout: 480 seconds)
[4:17] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[4:19] <malcolm> Thanks for that!
[4:22] <malcolm> Hmm... Still can't do an rbd map on it...
[4:22] <malcolm> gives me an error 22
[4:22] <alphe> I want a client for windows T___T
[4:23] <alphe> I already have the name for it Ceph4Win :)
[4:23] * Tamil1 (~Adium@cpe-108-184-67-79.socal.res.rr.com) has left #ceph
[4:23] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:23] * berant (~blemmenes@24-236-241-163.dhcp.trcy.mi.charter.com) has joined #ceph
[4:24] <alphe> s3 is too hard to set up and there is no high-speed client for it; the closest is WebDrive but that is so far from a perfect solution ...
[4:24] <alphe> samba is great for big files ...
[4:25] <alphe> ftp lacks a drive-mapping client capable of multiple connections ...
[4:25] <joshd> malcolm: check dmesg, maybe some other incompatible feature is in use, like stripingv2
[4:25] <malcolm> yeah that is why we want format 2
[4:25] <alphe> that is because s3 clients are designed for someone who has 1 GB or 2 of files to upload from time to time ...
[4:25] <malcolm> So striping v2 is still a no go on the in kernel rbd driver..
[4:26] <alphe> in my case this is completely useless ...
[4:26] <joshd> right, format 2 in the kernel just supports clones still
[4:26] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[4:27] * BillK (~BillK-OFT@124-148-224-108.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[4:28] <malcolm> Cheers :D That explains everything
[4:28] <malcolm> Any roadmap for stripingv2 in-kernel?
[4:28] <jlhawn> I'm running `rados bench … write` and I keep getting long periods of time when current speed is 0MB/s
[4:28] <jlhawn> does anyone know what might be causing this?
[4:28] <jlhawn> I first guess would be something with io flushing
[4:28] <alphe> ceph -s
[4:29] * dmick (~dmick@2607:f298:a:607:b938:cc3a:9a41:7c65) Quit (Quit: Leaving.)
[4:29] * BillK (~BillK-OFT@124.150.44.103) has joined #ceph
[4:30] <alphe> is 56 op/s a good op rate for a 10-node ceph cluster?
[4:30] * dmick (~dmick@38.122.20.226) has joined #ceph
[4:32] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[4:32] <nerdtron> alphe op/s changes depending on read or write operations
[4:32] <alphe> ok
[4:32] <joshd> malcolm: no timeline right now afaik
[4:32] <alphe> ok going to bed now have a good time !
[4:33] * alphe (~alphe@0001ac6f.user.oftc.net) Quit (Quit: Leaving)
[4:33] <malcolm> thanks for everything joshd. Slows down my use case a little.. but I can spruce it up with some Mal brand insanity :D
[4:33] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[4:33] <joshd> jlhawn: journals filling up and then flushing out to disk is a common cause
[4:34] <joshd> malcolm: you're welcome! good luck with the sprucing :)
[4:34] <jlhawn> joshd: thanks. That's what I thought. Do you know of a good solution?
[4:34] <jlhawn> or should I just get a faster journal device
[4:34] * sagelap (~sage@2600:1012:b008:164e:f945:a530:9596:aa59) Quit (Ping timeout: 480 seconds)
[4:34] <jlhawn> Right now, I'm just testing with EBS volumes on EC2
[4:34] <nerdtron> jlhawn an ssd journal is the best
[4:35] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[4:35] <nerdtron> does anyone know what happens to my cephFS data when the MDS goes down?
[4:35] <joshd> ebs is known for inconsistent performance as well
[4:35] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[4:36] <joshd> provisioned iops might help, but it's still not the same as real disks
[4:36] * BillK (~BillK-OFT@124.150.44.103) Quit (Read error: Operation timed out)
[4:36] <joshd> nerdtron: mds is just a cache - all its data is on the osds
[4:37] <nerdtron> yeah, if the MDS goes down, how will I mount the cephFS?
[4:38] <joshd> start another mds
[4:40] * BillK (~BillK-OFT@58-7-166-34.dyn.iinet.net.au) has joined #ceph
[4:42] <nerdtron> really? ceph-deploy mds create node2? simple?
[4:44] <joshd> actually I guess it's a bit more complex, but the easiest way is to run extra mdses in standby mode, with only one active
[4:44] <joshd> they'll take over automatically if the active one goes down
[4:45] <nerdtron> joshd seems to be a good idea..how do i set an mds as standby mode?
[4:46] <joshd> I think they're standby by default as long as you keep mds_max_active=1 (which is also default iirc)
[4:49] <joshd> that is, the first one active, and others above mds_max_active are standby
[4:50] <nerdtron> mds_max_active=1 in ceph.conf right? and it is the default? hmmm...thanks for the info
[4:52] <joshd> ah, it's actually max_mds, and it does default to 1
[4:54] <joshd> you'd change it with 'ceph mds set_max_mds N' after the cluster is initialized
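Putting that together, a sketch of running one active MDS plus a standby; the hostnames are assumptions:
    ceph-deploy mds create node1 node2   # the extra daemon becomes a standby
    ceph mds set_max_mds 1               # keep a single active MDS (the default)
    ceph mds stat                        # expect one up:active plus a standby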
[4:58] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Operation timed out)
[4:59] * rudolfsteiner (~federicon@190.244.11.181) has joined #ceph
[5:02] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[5:06] * rudolfsteiner (~federicon@190.244.11.181) Quit (Quit: rudolfsteiner)
[5:12] * dmick (~dmick@38.122.20.226) Quit (Quit: Leaving.)
[5:13] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:14] <nerdtron> joshd: my ceph cluster is currently running; is the command "ceph mds set_max_mds 1" safe to run?
[5:14] <joshd> it wouldn't do anything unless you'd changed it already
[5:15] <nerdtron> it outputted max_mds = 1
[5:16] <nerdtron> now i add another mds? with ceph-deploy mds create node2?
[5:19] <joshd> yeah that should work
[5:20] <nerdtron> thanks
[5:57] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[6:11] * haomaiwang (~haomaiwan@124.161.8.209) has joined #ceph
[6:12] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[6:15] * malcolm (~malcolm@silico24.lnk.telstra.net) Quit (Quit: Konversation terminated!)
[6:32] * berant (~blemmenes@24-236-241-163.dhcp.trcy.mi.charter.com) Quit (Quit: berant)
[6:35] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Ping timeout: 480 seconds)
[6:40] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:41] * jlhawn (~jlhawn@208-90-212-77.PUBLIC.monkeybrains.net) Quit (Quit: jlhawn)
[6:41] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[6:42] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[6:47] * cofol1986 (~xwrj@110.90.119.113) has joined #ceph
[6:49] <cofol1986> Hi guys, is there any way to identify the corresponding object names of a file under cephfs?
[6:50] * haomaiwang (~haomaiwan@124.161.8.209) Quit (Remote host closed the connection)
[6:50] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[6:51] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[6:53] * AfC (~andrew@2407:7800:200:1011:5910:8716:3d3c:bdb7) Quit (Ping timeout: 480 seconds)
[6:54] <cofol1986> Hi guys, is there any way to identify the corresponding objects of the file on cephfs?
[6:54] <yanzheng> cofol1986, sprintf(buf, "%llx.%08llx", (long long unsigned)ino, (long long unsigned)bno);
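That is, data objects are named <inode in hex>.<block number as 8 hex digits>. A shell sketch for finding a file's objects; the pool name, mount point, and file name are assumptions:
    ino=$(printf '%x' "$(stat -c %i /mnt/cephfs/somefile)")   # inode number in hex
    rados -p data ls | grep "^${ino}\."                       # e.g. 10000000000.00000000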
[7:01] * cofol1986 (~xwrj@110.90.119.113) Quit (Quit: Leaving.)
[7:02] * cofol1986 (~xwrj@110.90.119.113) has joined #ceph
[7:03] * AfC (~andrew@2407:7800:200:1011:78ec:b4f9:cc3d:f08b) has joined #ceph
[7:09] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:10] * lx0 is now known as lxo
[7:11] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[7:15] * athrift (~nz_monkey@203.86.205.13) has joined #ceph
[7:18] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[7:20] * ssejour (~sebastien@lif35-1-78-232-187-11.fbx.proxad.net) has joined #ceph
[7:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:24] * wschulze1 (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[7:28] * lx0 is now known as lxo
[7:31] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[7:45] * AfC (~andrew@2407:7800:200:1011:78ec:b4f9:cc3d:f08b) Quit (Quit: Leaving.)
[7:46] * wschulze1 (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:50] * AfC (~andrew@2407:7800:200:1011:78ec:b4f9:cc3d:f08b) has joined #ceph
[7:53] * ssejour (~sebastien@lif35-1-78-232-187-11.fbx.proxad.net) Quit (Quit: Leaving.)
[7:54] * tnt (~tnt@109.130.110.3) has joined #ceph
[7:58] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[8:02] * foosinn (~stefan@office.unitedcolo.de) has joined #ceph
[8:03] * ScOut3R (~ScOut3R@BC2484D1.dsl.pool.telekom.hu) has joined #ceph
[8:04] * AfC (~andrew@2407:7800:200:1011:78ec:b4f9:cc3d:f08b) Quit (Quit: Leaving.)
[8:04] * AfC (~andrew@2407:7800:200:1011:5910:8716:3d3c:bdb7) has joined #ceph
[8:11] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[8:13] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:13] <cofol1986> Hi, I got this error: 2013-08-30 06:11:21.035043 osd.0 [WRN] slow request 60.738004 seconds old, received at 2013-08-30 06:10:20.296999: osd_op(client.7023.0:5 [pgls start_epoch 0] 0.4 e336) v4 currently waiting for pg to exist locally. the osd.0 process is running fine; why does the error happen?
[8:14] * ScOut3R (~ScOut3R@BC2484D1.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[8:16] <yanzheng> cofol1986, i guess your cluster is recovering
[8:17] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[8:24] * Pauline (~middelink@2001:838:3c1:1:be5f:f4ff:fe58:e04) Quit (Remote host closed the connection)
[8:40] * sjusthm (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:40] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[8:42] * tnt (~tnt@109.130.110.3) Quit (Ping timeout: 480 seconds)
[8:51] * odyssey4me (~odyssey4m@41.13.4.159) has joined #ceph
[8:56] * KindTwo (~KindOne@50.96.228.46) has joined #ceph
[8:57] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:57] * KindTwo is now known as KindOne
[8:57] * odyssey4me2 (~odyssey4m@165.233.71.2) has joined #ceph
[8:59] * odyssey4me (~odyssey4m@41.13.4.159) Quit (Ping timeout: 480 seconds)
[9:00] * tnt (~tnt@ip-188-118-44-117.reverse.destiny.be) has joined #ceph
[9:06] * YD slaps YD around a bit with a large trout
[9:07] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[9:07] * AfC (~andrew@2407:7800:200:1011:5910:8716:3d3c:bdb7) Quit (Quit: Leaving.)
[9:08] * JustEra (~JustEra@89.234.148.11) has joined #ceph
[9:11] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[9:13] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[9:14] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[9:14] * nerdtron (~kenneth@202.60.8.252) Quit (Remote host closed the connection)
[9:18] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit ()
[9:21] * mschiff (~mschiff@p4FD7F12E.dip0.t-ipconnect.de) has joined #ceph
[9:26] * loicd1 (~loic@brln-4db8015a.pool.mediaWays.net) has joined #ceph
[9:26] * loicd1 (~loic@brln-4db8015a.pool.mediaWays.net) has left #ceph
[9:32] * janisg (~troll@85.254.50.23) has joined #ceph
[9:35] * Bada (~Bada@195.65.225.142) has joined #ceph
[9:42] * odyssey4me2 (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[9:45] * allsystemsarego (~allsystem@5-12-37-127.residential.rdsnet.ro) has joined #ceph
[9:46] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[9:56] * tiger (~chatzilla@58.213.102.114) has joined #ceph
[9:57] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[10:00] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) has joined #ceph
[10:10] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[10:17] * yy-nm (~Thunderbi@218.74.34.80) Quit (Remote host closed the connection)
[10:17] * yy-nm (~Thunderbi@218.74.34.80) has joined #ceph
[10:24] * adam4 (~adam@46-65-111-12.zone16.bethere.co.uk) has joined #ceph
[10:27] <ccourtaut> morning
[10:29] * adam3 (~adam@46-65-111-12.zone16.bethere.co.uk) Quit (Ping timeout: 480 seconds)
[10:36] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[10:40] * jluis (~joao@89.181.146.94) has joined #ceph
[10:40] * ChanServ sets mode +o jluis
[10:43] * joao (~joao@89.181.146.94) Quit (Read error: Operation timed out)
[10:43] * cofol1986 (~xwrj@110.90.119.113) Quit (Read error: Connection reset by peer)
[10:43] * cofol19861 (~xwrj@117.25.126.242) has joined #ceph
[10:47] * tiger (~chatzilla@58.213.102.114) Quit (Remote host closed the connection)
[10:47] * foobar (~foobar@lns-bzn-48f-62-147-153-127.adsl.proxad.net) has joined #ceph
[10:47] * foobar is now known as Guest4959
[10:47] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:47] * Guest4959 (~foobar@lns-bzn-48f-62-147-153-127.adsl.proxad.net) Quit ()
[10:49] * yy-nm (~Thunderbi@218.74.34.80) Quit (Ping timeout: 480 seconds)
[10:50] * tiger (~chatzilla@58.213.102.114) has joined #ceph
[10:51] * cofol19861 (~xwrj@117.25.126.242) Quit (Ping timeout: 480 seconds)
[10:52] * cofol1986 (~xwrj@110.90.119.113) has joined #ceph
[10:54] * yy-nm (~Thunderbi@218.74.34.80) has joined #ceph
[11:05] * yy-nm (~Thunderbi@218.74.34.80) Quit (Remote host closed the connection)
[11:06] * yy-nm (~Thunderbi@218.74.34.80) has joined #ceph
[11:09] * yy-nm (~Thunderbi@218.74.34.80) Quit ()
[11:20] * ninkotech_ (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[11:31] * shimo_ (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[11:34] * madkiss (~madkiss@89.204.130.163) has joined #ceph
[11:34] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Read error: Operation timed out)
[11:34] * shimo_ is now known as shimo
[11:38] * julian (~julianwa@125.70.133.115) has joined #ceph
[11:38] * tiger (~chatzilla@58.213.102.114) Quit (Ping timeout: 480 seconds)
[11:40] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[11:55] * indego (~indego@91.232.88.10) has joined #ceph
[12:00] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) Quit (Remote host closed the connection)
[12:00] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[12:13] * madkiss1 (~madkiss@89.204.139.118) has joined #ceph
[12:15] * madkiss (~madkiss@89.204.130.163) Quit (Ping timeout: 480 seconds)
[12:24] * silversurfer (~jeandanie@124x35x46x8.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[12:25] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[12:26] * madkiss (~madkiss@82.113.98.227) has joined #ceph
[12:27] * ross_ (~ross@60.208.111.209) Quit (Ping timeout: 480 seconds)
[12:31] * madkiss2 (~madkiss@89.204.137.100) has joined #ceph
[12:31] * madkiss1 (~madkiss@89.204.139.118) Quit (Ping timeout: 480 seconds)
[12:32] * madkiss (~madkiss@82.113.98.227) Quit (Read error: Operation timed out)
[12:38] * tryggvil (~tryggvil@217.28.181.130) has joined #ceph
[12:46] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[12:46] * ChanServ sets mode +v andreask
[12:48] * themgt_ (~themgt@201-223-199-254.baf.movistar.cl) has joined #ceph
[12:52] * julian (~julianwa@125.70.133.115) Quit (Quit: afk)
[12:53] * themgt (~themgt@201-223-252-184.baf.movistar.cl) Quit (Ping timeout: 480 seconds)
[12:53] * themgt_ is now known as themgt
[12:54] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit (Read error: Operation timed out)
[12:55] <lxo> yay, success! adjusting the osd number in the superblock enabled me to use a copy of another osd to initialize a new osd and avoid a longer recovery!
[12:57] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[12:59] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[12:59] * ChanServ sets mode +v andreask
[13:00] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:03] * madkiss2 (~madkiss@89.204.137.100) Quit (Quit: Leaving.)
[13:05] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[13:05] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[13:05] * ChanServ sets mode +v andreask
[13:07] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[13:08] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:12] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit (Read error: Connection reset by peer)
[13:12] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[13:12] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:18] * nerdtron (~kenneth@202.60.8.252) has joined #ceph
[13:24] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[13:26] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[13:28] * diegows (~diegows@190.190.11.42) has joined #ceph
[13:30] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:38] * nerdtron (~kenneth@202.60.8.252) Quit (Remote host closed the connection)
[13:38] * adam4 (~adam@46-65-111-12.zone16.bethere.co.uk) Quit (Ping timeout: 480 seconds)
[13:38] * yy-nm (~Thunderbi@211.140.18.115) has joined #ceph
[13:40] * yy-nm (~Thunderbi@211.140.18.115) Quit ()
[13:44] * adam4 (~adam@46-65-111-12.zone16.bethere.co.uk) has joined #ceph
[13:48] * berant (~blemmenes@gw01.ussignalcom.com) has joined #ceph
[13:48] * LeaChim (~LeaChim@176.24.168.228) has joined #ceph
[13:49] * roald (~oftc-webi@87.209.150.214) has joined #ceph
[14:02] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:06] * tryggvil (~tryggvil@217.28.181.130) Quit (Quit: tryggvil)
[14:13] * sel (~sel@212.62.233.233) Quit (Ping timeout: 480 seconds)
[14:17] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:19] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:19] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[14:19] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[14:20] * sel (~sel@212.62.233.233) has joined #ceph
[14:21] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:32] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[14:40] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:41] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[14:42] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[14:45] * rsanti (~rsanti@74.125.122.33) has joined #ceph
[14:46] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[14:48] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:48] <rsanti> Hello everyone. Out of curiosity, how up to date is http://ceph.com/docs/next in general? 0.61? 0.67.2? ...master?
[14:55] * shimo_ (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[14:55] <alfredodeza> rsanti: I *think* it is master
[14:57] * loicd educates himself about paxos http://blog.aetherworks.com/2013/05/paxos-by-example/
[14:58] * tryggvil (~tryggvil@217.28.181.130) has joined #ceph
[14:59] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[14:59] * shimo_ is now known as shimo
[15:00] * rudolfsteiner (~federicon@190.244.11.181) has joined #ceph
[15:03] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Ping timeout: 480 seconds)
[15:09] * mikedawson_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:14] * loicd likes the catchphrase of http://www.cs.utexas.edu/users/lorenzo/corsi/cs380d/past/03F/notes/paxos-simple.pdf
[15:14] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[15:14] * mikedawson_ is now known as mikedawson
[15:15] <loicd> Abstract: The Paxos algorithm, when presented in plain English, is very simple.
[15:16] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[15:17] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[15:18] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[15:18] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[15:20] * markbby (~Adium@168.94.245.1) has joined #ceph
[15:21] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[15:22] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[15:22] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[15:22] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[15:23] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[15:27] * vata (~vata@2607:fad8:4:6:45a9:9a9a:2fbe:f449) has joined #ceph
[15:34] * haomaiwang (~haomaiwan@111.10.113.213) has joined #ceph
[15:34] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[15:40] * mschiff (~mschiff@p4FD7F12E.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[15:44] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[15:44] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[15:44] * rudolfsteiner (~federicon@190.244.11.181) Quit (Quit: rudolfsteiner)
[15:47] * julienhuang (~julienhua@106.242-224-89.dsl.completel.net) has joined #ceph
[15:59] * rudolfsteiner (~federicon@190.244.11.181) has joined #ceph
[16:00] <ChoppingBrocoli> mozg: you use cloudstack right?
[16:00] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[16:01] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Operation timed out)
[16:02] <mozg> ChoppingBrocoli, yeah
[16:02] <mozg> 4.1.1 at the moment
[16:02] <mozg> but i've been using it since version 3.something
[16:03] <jmlowe> loicd: I like that one too
[16:03] <loicd> jmlowe: to be honest his plain english is greek to me. But what do I know ? I'm french ;-)
[16:06] * rudolfsteiner (~federicon@190.244.11.181) Quit (Quit: rudolfsteiner)
[16:07] * xmltok_ (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Bye!)
[16:07] <jmlowe> loicd: plain english is too often foreign to native english speakers
[16:07] <loicd> jmlowe: :-D
[16:16] * torment2 (~torment@pool-96-228-147-151.tampfl.fios.verizon.net) Quit (Read error: Operation timed out)
[16:18] <dmsimard> I have a question about pools and placement groups. The document reads something like "100 * OSD" is a good number for placement groups. What happens when I scale the amount of OSDs over time ?
[16:22] * zhyan_ (~zhyan@101.82.180.60) has joined #ceph
[16:22] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[16:35] * sagelap (~sage@2600:1012:b001:7e6f:a1d8:6962:87a3:483b) has joined #ceph
[16:35] <sagelap> zackc: good morning!
[16:35] <sagelap> did you see the top commit in teuthology.git?
[16:36] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[16:36] <berant> wrale: ping
[16:36] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[16:37] <zackc> sagelap: *that* is what was causing hangs?
[16:37] <zackc> wtf
[16:37] <sagelap> seriously. i pulled some hair tracking that down.
[16:38] <sagelap> it's easy to reproduce, at least
[16:38] <zackc> how in the...
[16:38] <wrale> berant: pong .. what's up
[16:38] <zackc> there has to be some other bug
[16:38] <sagelap> none of the scoping or lifetime for that stuff makes any sense to me, so i'm of no help...
[16:39] <berant> wrale: I just wanted to thank you for mentioning Docker, which led me to CoreOS and to destroying all my grey matter cycles thinking about a CoreOS+Docker+Ceph container… :)
[16:39] * rudolfsteiner (~federicon@190.244.11.181) has joined #ceph
[16:39] <zackc> importing modules is supposed to be safe
[16:39] <sagelap> i'd just set up a test with one target and one task (say, - clock.check: ) and it should be easy to reproduce
[16:39] <zackc> and in general it's encouraged to be done at the top of the file
[16:40] <wrale> berant: no problem at all.. i'm hoping to see the combination come together, too..
[16:40] <sagelap> wrale: berant: sounds awesome
[16:41] <wrale> berant: i almost think the docker container could contain ceph, but then address block devices outside for the osds.. .. not sure about that
[16:41] * rudolfsteiner (~federicon@190.244.11.181) Quit ()
[16:41] <alfredodeza> norris sagelap
[16:41] <alfredodeza> I have no idea how you got to that
[16:41] <wrale> surely for the mds and mon
[16:42] * zhyan_ (~zhyan@101.82.180.60) Quit (Ping timeout: 480 seconds)
[16:42] <wrale> another thing i'd like to see is openshift on docker
[16:42] <wrale> mostly to easily deploy to ec2 lol
[16:42] <berant> wrale: yeah I don't have the experience with LXC to know how that raw addressing for the OSD would be possible
[16:42] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[16:43] <wrale> berant: i have it on my list to find out.. i'm hoping to convince the developers around me to try docker
[16:43] <wrale> (to make my life easier) lol
[16:43] <berant> haha
[16:43] <mozg> guys, I was wondering if there are any ideas what might be causing bug 6139 which i've submitted a few days ago?
[16:43] <kraken> mozg might be talking about: http://tracker.ceph.com/issues/6139 [kernel panic in vms during disk benchmarking]
[16:43] * markbby (~Adium@168.94.245.1) has joined #ceph
[16:43] <wrale> brb.. user support :)
[16:43] <berant> wrale: yeah, our dev's do things bass-akwards so I agree it would be fantastic
[16:44] <mozg> i can reproduce it pretty much every time i run the io benchmarks
[16:44] * LCF (ball8@193.231.broadband16.iol.cz) has joined #ceph
[16:45] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[16:45] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[16:46] <LCF> Hi
[16:47] <LCF> looking for solution for that problem: https://gist.github.com/ljagiello/c6e64f237743912a780c
[16:47] <LCF> ceph-mon doesn't want to start
[16:48] <berant> sagelap: yeah, I was unawares of docker or CoreOS and it's got me quite intrigued at the possibilities
[16:51] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[16:51] <ChoppingBrocoli> mozg: do you have a reference you used to do DNS round robin?
[16:51] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[16:51] * bandrus (~Adium@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[16:51] <mozg> ChoppingBrocoli, there is no particular reference. i've just set up bind to give ceph-mon.domainname.com 3 different ip addresses
[16:52] <mozg> by default it cycles through when you do request
[16:52] <ChoppingBrocoli> oh that is easy
[16:52] <mozg> when setting up the rbd storage you give it ceph-mon.domainname.com
[16:52] <mozg> and that's it
[16:52] <mozg> yeah, pretty easy
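A minimal BIND zone sketch of that round-robin record; the name and addresses are placeholders:
    ; three A records for the same name; resolvers receive them in rotating order
    ceph-mon.domainname.com.    IN A    10.0.0.1
    ceph-mon.domainname.com.    IN A    10.0.0.2
    ceph-mon.domainname.com.    IN A    10.0.0.3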
[16:52] * BillK (~BillK-OFT@58-7-166-34.dyn.iinet.net.au) Quit (Read error: Operation timed out)
[16:53] <ChoppingBrocoli> wrale gave me some good advice on other cloud options, now I want to ask about cloudstack. mozg, have you had any issues with yours?
[16:53] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) Quit (Quit: Leaving.)
[16:54] <mozg> ChoppingBrocoli, it's like with anything else, you need to read documentation )) i would suggest twice
[16:54] <mozg> and follow through the installation guide
[16:54] <mozg> step by step
[16:54] <mozg> and you should be okay with your test cluster
[16:54] <mozg> once you are setup, you need to experiment and learn from mistakes )))
[16:55] <mozg> i've been using cloudstack for about a year now
[16:55] <mozg> and i can say that it has been good so far
[16:55] <mozg> not without its bugs, but i would say they are minor
[16:55] <mozg> i do like the flexibility of cloudstack
[16:55] <ChoppingBrocoli> have you ever experienced a total cluster failure?
[16:55] <mozg> plus it works pretty well with ceph
[16:55] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) has joined #ceph
[16:56] <mozg> i have experienced a cluster failure on several occasions, but it was down to ceph and not cloudstack
[16:56] <mozg> however, thanks to the guys at #ceph i've sorted them out
[16:56] <mozg> plus learned a lot about how to deal with issues
[16:57] <mozg> i would say that cloudstack is ready for prime time
[16:57] <ChoppingBrocoli> Can I use MS DNS with it instead of bind?
[16:57] <mozg> what do you mean? to provide the round robin thing for ceph?
[16:58] <mozg> if you mean for round robin, yeah, you can use any dns server
[16:58] <ChoppingBrocoli> Yea, can I use my Active Directory's dns?
[16:58] <mozg> however, depending on how you want to set up your cloud, cloudstack has its own dns services
[16:58] <wrale> back... working on my ansible setup today.. looks like i will have to wait on deploying ceph.. hardware is being redirected to cassandra for a test platform..
[16:58] <mozg> which are handled by the virtual router
[16:59] <mozg> back in 20 mins - got to eat
[17:03] <berant> wrale: bummer! I'll be racking mine today, hopefully have everything bootstrapped next week
[17:03] <wrale> berant: for sure. what is your mode of transport? will you have separate networks for storage, management (drac) and public?
[17:03] <wrale> 1GbE?
[17:05] <dmsimard> Hey, guys, I'll try asking again - Let's pretend I put 300 placement groups in a pool because I have 3 OSDs today (100 x OSD). Next week I need to scale and add 3 more OSDs. Should I be adjusting the number of placement groups so that the data is better balanced? How?
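A hedged sketch of the commands normally involved in dmsimard's scenario; the pool name and target count are placeholders, and pg_num can only ever be increased:

    # hypothetical pool 'volumes'; raise pg_num first, then pgp_num so the
    # data actually rebalances across the new placement groups
    ceph osd pool set volumes pg_num 600
    ceph osd pool set volumes pgp_num 600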
[17:05] <berant> 10Gb for both public and cluster, drac right now will not be used, I need to get a minimal access switch re-claimed from something else first
[17:05] <wrale> berant: excellent.. i'm eager to hear how it works out performance wise
[17:06] <berant> will be doing the management from a dedicated 1u box through the cluster network to start with
[17:06] <berant> wrale: yeah I'll let you know how it looks.
[17:07] <berant> I have a small single node ceph cluster at home with SATA NL and I added cheap consumer SSDs for journals and was quite shocked at the performance increase. So with this cluster I'm going to try it without the SSD journals first to get some data on the difference
[17:07] <kraken> http://i.imgur.com/zVyOBlR.gif
[17:09] <wrale> berant: right on. i hope to get back on the ceph thing soon.. we're building a big do-everything compute cluster soon.. trying to decide between openstack, virsh, some kinda grid, old school hpc and mesos.. hoping for ceph to back storage, though.. 40GbE, if my sources are correct
[17:09] <wrale> i'm not wanting to do lustre ..lol
[17:10] <berant> wrale: sounds very interesting. 40GE all the way to the nodes or just interconnect between fabrics?
[17:10] <wrale> all the way
[17:10] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[17:11] <berant> wrale: given all the interdependencies what were you considering for monitoring? I'm leaning towards sensu, but I don't have any experience yet
[17:11] <berant> wrale: impressive!
[17:12] <wrale> berant: not familiar with sensu.. will look into that.. i've been looking closely at zabbix for a bit.. looks like nagios + performance reports.. i also hear good things about collectd and graphite
[17:12] <wrale> berant: it's going to be a single rack, but i'm pretty excited
[17:12] <wrale> (maybe two..lol)
[17:13] * sagelap1 (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[17:13] * Bada (~Bada@195.65.225.142) Quit (Ping timeout: 480 seconds)
[17:14] <berant> wrale: yeah I'm just reading about it, it was talked about in DreamHost's preso about their DreamObjects ceph cluster
[17:15] * sagelap (~sage@2600:1012:b001:7e6f:a1d8:6962:87a3:483b) Quit (Ping timeout: 480 seconds)
[17:15] <wrale> cool
[17:17] * sprachgenerator (~sprachgen@130.202.135.193) has joined #ceph
[17:18] * JustEra (~JustEra@89.234.148.11) Quit (Quit: This computer has gone to sleep)
[17:18] <berant> wrale: if you have an existing HPC setup with hbase you might want to look at opentsdb, very scalable, similar to graphite but uses hbase and doesn't do any downsampling for the actual metric storage
[17:19] <berant> v2 is in release candidate right now and adds a ton of features
[17:19] <berant> bbl, real work calling
[17:20] * yehudasa_ (~yehudasa@2602:306:330b:1410:ea03:9aff:fe98:e8ff) Quit (Ping timeout: 480 seconds)
[17:24] <via> i'm having a consistently crashing osd: https://pastee.org/6auw5
[17:25] * foosinn (~stefan@office.unitedcolo.de) Quit (Quit: Leaving)
[17:26] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) Quit (Ping timeout: 480 seconds)
[17:26] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[17:27] * yanzheng (~zhyan@101.82.166.42) has joined #ceph
[17:28] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[17:29] * julienhuang (~julienhua@106.242-224-89.dsl.completel.net) Quit (Quit: julienhuang)
[17:37] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) has joined #ceph
[17:39] * JustEra (~JustEra@249.133.17.109.rev.sfr.net) has joined #ceph
[17:39] <wrale> berant: cool cool.. i'm having to create a custom instance type under openstack for a user.. this is anti-cloud.. lol.. 96GB of ram
[17:46] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[17:49] * tryggvil (~tryggvil@217.28.181.130) Quit (Quit: tryggvil)
[17:50] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[17:51] <sagewk> via: what version?
[17:51] <ccourtaut> buck: hi there
[17:51] <buck> ccourtaut: howdy
[17:52] <ccourtaut> buck: i was messing with the radosgw-agent today, and ran into a problem
[17:52] <ccourtaut> in your last commit of your branch, you're trying to get the list of objects in json format
[17:53] <ccourtaut> but in the radosgw, the formatter allocated will always be of xml type
[17:53] <buck> ccourtaut: oh, which branch is that?
[17:53] <ccourtaut> so you won't be able to get the json response
[17:53] * tryggvil (~tryggvil@217.28.181.130) has joined #ceph
[17:53] <ccourtaut> in radosgw-agent
[17:53] <ccourtaut> wip-buck-datasync
[17:54] <buck> ccourtaut: ah, ok. yeah, that's just where i left off for the day on Tuesday and I've been digging through some issues in the metadata sync code the last two days. I hadn't been thinking people would pull that so I left it in a state of disrepair with a stickie note on my desk to fix that specific issue
[17:54] <ccourtaut> https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rest_s3.cc#L2153
[17:54] <buck> ccourtaut: sorry :/
[17:55] <ccourtaut> here the configurable option is passed as false, so you won't be able to get a json response
[17:55] <via> sagewk: 0.61.8
[17:55] <ccourtaut> buck: oh no problem :)
[17:55] <ccourtaut> but if you need any help on this, feel free to tell me, or give me tasks to do :)
[17:56] <buck> ccourtaut: I need to switch the code and use the boto library to do something like 'bucket = conn.get_bucket(foo); bucket.list()'
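A rough sketch of the boto call buck is describing; the endpoint, credentials, and bucket name are placeholders rather than values from this conversation:

    import boto
    import boto.s3.connection

    # hypothetical radosgw endpoint and credentials
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='rgw.example.com',
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.get_bucket('foo')  # 'foo' stands in for the real bucket
    for key in bucket.list():        # pages through the bucket's objects
        print(key.name)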
[17:56] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) has joined #ceph
[17:56] <ccourtaut> oh ok
[17:56] <ccourtaut> would be easier to read though :)
[17:57] <buck> ccourtaut: cool. We have a meeting today to sort out our next sprint and talk over a few things. After that, we should have a better idea of where we could use help.
[17:57] <ccourtaut> ok
[17:57] <buck> ccourtaut: thanks for taking a look at the code and running it. More eyes are always a good thing.
[17:57] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:58] * indego (~indego@91.232.88.10) Quit (Quit: Leaving)
[17:58] <ccourtaut> i'm dedicated to work on async geo-replication, so as long as i stay in this area of work, it's fine for me :)
[17:59] <ccourtaut> buck: by now i have a local setup running and i'm able to launch the agent so that the metadata sync happens
[17:59] * tobru (~quassel@2a02:41a:3999::94) Quit (Remote host closed the connection)
[17:59] * JustEra (~JustEra@249.133.17.109.rev.sfr.net) Quit (Quit: This computer has gone to sleep)
[17:59] <ccourtaut> do not know if my setup is complete or accurate, but it seems to work for now
[18:00] <buck> ccourtaut: if you can sync users and buckets, that's likely correct (or real close to correct). Quick question: are you setting up the system users on both sides with the same keys or different keys? Just curious
[18:00] * yanzheng (~zhyan@101.82.166.42) Quit (Ping timeout: 480 seconds)
[18:00] <ccourtaut> same keys for now, could try different keys if you'd like
[18:01] <buck> ccourtaut: same keys is easier, but there may be a slight issue in some cases. I'm running it down now. I was just wondering if you a) were using the same keys and b) had run into any issues with syncs failing
[18:02] <ccourtaut> for now it seems ok to me for buckets, didn't look up whether users were synced
[18:02] <buck> ccourtaut: we should have a fix soon, so if you're not seeing issues, I wouldn't worry about it.
[18:04] * ScOut3R (~ScOut3R@catv-89-133-17-71.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[18:04] <ccourtaut> ok great
[18:05] * rudolfsteiner (~federicon@200.68.116.185) has joined #ceph
[18:08] * JustEra (~JustEra@249.133.17.109.rev.sfr.net) has joined #ceph
[18:10] <ccourtaut> buck: need to go from the office
[18:10] <ccourtaut> it is already the weekend over here
[18:10] <buck> ccourtaut: ok. i'll email you later today or this weekend
[18:10] <ccourtaut> buck: thanks!
[18:11] <ccourtaut> enjoy your day!
[18:11] <buck> you too
[18:11] <buck> what's left of it
[18:11] <ccourtaut> :)
[18:11] <ccourtaut> I'll try to
[18:12] * sagelap1 (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[18:15] * xarses (~andreww@204.11.231.50.static.etheric.net) has joined #ceph
[18:19] * JustEra (~JustEra@249.133.17.109.rev.sfr.net) Quit (Quit: This computer has gone to sleep)
[18:19] * JustEra (~JustEra@249.133.17.109.rev.sfr.net) has joined #ceph
[18:20] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:22] <sagewk> jluis: http://tracker.ceph.com/issues/6178
[18:22] <sagewk> jluis: mon/DataHealthService.cc: 131: FAILED assert(store_size > 0)
[18:22] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:22] <jluis> sagewk, checking
[18:23] <jamespage> hey sagewk
[18:24] <jamespage> if you have a moment could you give me an opinion on https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1218852
[18:25] <sagewk> jamespage: hi! looking
[18:25] <jamespage> sagewk, thanks
[18:26] * tnt (~tnt@ip-188-118-44-117.reverse.destiny.be) Quit (Ping timeout: 480 seconds)
[18:26] <sagewk> 851619ab6645967e5d7659d9b0eea63d5c402b15 fixed the upstart to handle this
[18:26] <sagewk> so, fixed in dumpling. easy to backport that to cuttlefish..
[18:27] <sagewk> not sure what version they are running?
[18:27] <sagewk> s/they/you/ :)
[18:27] <jamespage> sagewk, I hit it on bobtail actually
[18:27] <jamespage> I'm working on the 0.56.7 point release - I'll pull that in as a patch on top
[18:28] <sagewk> that explains it. sounds good. i can cherry-pick it as well
[18:28] <jamespage> sagewk, great - thanks
[18:28] <loicd> https://github.com/ceph/ceph/commit/851619ab6645967e5d7659d9b0eea63d5c402b15
[18:29] <sagewk> jamespage: done, 57cb25c851ff25a4270e414b1db617d0db68df53
[18:29] <jamespage> sagewk, marvellous!
[18:30] * JustEra (~JustEra@249.133.17.109.rev.sfr.net) Quit (Quit: This computer has gone to sleep)
[18:30] <mikedawson> sagewk: thanks for your help last Friday night on the rbd bench oddness! That patch didn't fix the problem. Just posted about deep-scrub & rbd latency on ceph-devel. I think this is the real source of my issue.
[18:34] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[18:35] * markbby (~Adium@168.94.245.1) Quit (Quit: Leaving.)
[18:37] <sagewk> mikedawson: hmm, maybe... if i'm remembering right the log that i saw shows that the client was blocking for a long period. did you also try turning the objecter in-flight io limits up/off?
[18:37] <sagewk> i suspect that both things are contributing
[18:38] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Remote host closed the connection)
[18:38] <mikedawson> sagewk: yeah, I upped them, but it made no difference whatsoever. Writes seem fine with RBD cache, then they flush and it gets slow as expected. But read latency is bad news.
[18:40] <mikedawson> sagewk: disks get saturated by deep-scrub, rbd writeback covers up the lack of performance, but reads have no recourse other than to wait in line with multi-second latency.
[18:41] <sagewk> yeah, the deep scrub definitely needs to throttle itself.
[18:43] <nhm> mikedawson: any idea what the io looks like during deep scrub?
[18:43] <nhm> mikedawson: I assume if the iowait is shooting up there's a lot of seek overhead.
[18:44] <mikedawson> nhm, sagewk: http://www.gammacode.com/deep-scrub.jpg That shows deep-scrub until about 10:45 when I disabled it, then re-enabled at ~11:30am
[18:44] <sel> quick question about dmcrypt, is a new key created for each osd, or can I create one key for all osds?
[18:45] <nhm> percent util is CPU time during which there was outstanding IO?
[18:45] <sagewk> mikedawson: we could just stick a sleep for a few ms in the scrub loop between objects and see what kind of impact that has
[18:46] <sagewk> ideally, the thread's io requests would be tagged with a lower priority through the kernel io stack; not sure how easy that is
[18:46] <mikedawson> nhm: this is the output of the last column (%util) from 'iostat -xt 2'
[18:48] <mikedawson> sagewk: yeah, prioritizing client i/o seems the way to go (hammering the drives doesn't concern me if the clients aren't affected)
[18:49] <sagewk> even if they are strictly prioritized there will be some impact just because the physical disk arm is always swinging around to do some operation, but it should get us most of the way there
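Later Ceph releases grew tunables along the lines sagewk sketches here; the option names below are an assumption about newer versions and may not exist in the builds being discussed:

    # hypothetical ceph.conf entries to throttle scrubbing
    [osd]
        osd scrub sleep = 0.1                  ; pause between scrub work chunks
        osd disk thread ioprio class = idle    ; lower the scrub thread's IO priority
        osd disk thread ioprio priority = 7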
[18:51] <nhm> sagewk: So I thought we already did some kind of prioritization of client ops? Maybe I'm misremembering.
[18:52] <mikedawson> nhm, sagewk: here is the effect on the guests: http://www.gammacode.com/deep-scrub-rbd-latency.jpg This shows the dips in interface traffic during the same time. This should generally be at or above the dashed baseline. Notice the drops while deep-scrub was occurring.
[18:53] <mikedawson> if you zoom out to a longer duration, the difference is shockingly evident.
[18:53] <kraken> http://i.imgur.com/Q4bI5.gif
[18:54] <sagewk> jluis: also, can you look at https://github.com/ceph/ceph/pull/530 sometime today?
[18:54] <jluis> sagewk, looking next
[18:54] <sagewk> thanks!
[18:55] <jluis> just finishing fixing the datahealthservice assert
[18:57] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[18:57] * tnt (~tnt@109.130.110.3) has joined #ceph
[18:57] <bstillwell> So I'm planning on installing radosgw today, but I already ran into an issue with the docs on this page:
[18:57] <bstillwell> http://ceph.com/docs/next/install/rpm/
[18:57] <bstillwell> It lists two rpm packages under 'Install Apache and FastCGI', but doesn't say where to get them
[18:57] <bstillwell> They aren't available in the EPEL repo
[18:58] <bstillwell> well, fcgi is, but not mod_fastcgi
[18:58] <nhm> mikedawson: btw, nice graphs
[18:59] <mikedawson> nhm: Thanks! We're not where we want to be just yet, but Ceph has given us the opportunity to improve ;-)
[19:00] * yehudasa_ (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) has joined #ceph
[19:00] <xarses> bstillwell: looks like fcgi might be the mod_fastcgi provider
[19:01] <bstillwell> I found this: http://apt.sw.be/redhat/el6/en/x86_64/dag/RPMS/mod_fastcgi-2.4.6-2.el6.rf.x86_64.rpm
[19:01] <bstillwell> The docs could be updated to use this command:
[19:01] <bstillwell> yum localinstall http://apt.sw.be/redhat/el6/en/x86_64/dag/RPMS/mod_fastcgi-2.4.6-2.el6.rf.x86_64.rpm
[19:02] <alfredodeza> gitbuilders?
[19:03] <alfredodeza> hrmn
[19:03] <alfredodeza> kraken: gitbuilders?
[19:03] <kraken> gitbuilders are http://ceph.com/gitbuilder.cgi (alfredodeza on 08/29/2013 11:48AM)
[19:03] <alfredodeza> good job kraken
[19:04] <bstillwell> alfredodeza: Are there plans to have ceph-deploy install radosgw as well at some point?
[19:04] <alfredodeza> bstillwell: indeed
[19:04] <berant> wrale: yeah, that does kinda defeat the purpose...
[19:04] <bstillwell> alfredodeza: :)
[19:04] <alfredodeza> we are waiting to get an actual RPM/DEB package
[19:04] <alfredodeza> so that ceph-deploy is just shuffling things around and not doing what an RPM/DEB should :D
[19:05] <bstillwell> RPM/DEB of what?
[19:05] <alfredodeza> radosgw
[19:05] <bstillwell> That's not the ceph-radosgw package?
[19:06] <alfredodeza> maybe? I have not looked into it for a while
[19:06] <alfredodeza> if that is ready then I should get ceph-deploy to use that soon-ish
[19:07] <alfredodeza> a while ago the intention was to make ceph-deploy do all the configuration, installation, etc...
[19:07] <alfredodeza> but a package is better at that
[19:07] <bstillwell> I'm not sure when it was added, but here's the RPM for dumpling:
[19:07] <bstillwell> http://ceph.com/rpm-dumpling/el6/x86_64/ceph-radosgw-0.67.2-0.el6.x86_64.rpm
[19:08] <bstillwell> and the dpkg:
[19:08] <bstillwell> http://ceph.com/debian-dumpling/pool/main/c/ceph/radosgw_0.67.2-1raring_amd64.deb
[19:08] <bstillwell> anyways, that'd be great if support was added to ceph-deploy.
[19:09] <sel> Nobody here who has used dm-crypt on their osds?
[19:09] <bstillwell> So anyways, when I used yum to install ceph-radosgw it pulled in mod_fcgid from epel.
[19:10] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[19:10] * ChanServ sets mode +v andreask
[19:10] <alfredodeza> issue 5956
[19:10] <bstillwell> mod_fcgid is a binary-compatible alternative to the Apache module mod_fastcgi.
[19:10] <kraken> alfredodeza might be talking about: http://tracker.ceph.com/issues/5956 [Implement a radosgw command in ceph-deploy ]
[19:10] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has left #ceph
[19:10] <alfredodeza> bstillwell: ^ ^
[19:10] <alfredodeza> bstillwell: yes, that is expected
[19:11] <alfredodeza> you want that because that patch is needed for performance reasons iirc
[19:11] <bstillwell> alfredodeza: thanks!
[19:11] <alfredodeza> no problem sir
[19:11] * b1tbkt (~b1tbkt@24-217-192-155.dhcp.stls.mo.charter.com) has joined #ceph
[19:12] <bstillwell> which patch?
[19:12] * rturk-away is now known as rturk
[19:13] <alfredodeza> argh I don't recall what was it
[19:13] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:14] <bstillwell> I find it odd that the instructions call for installing both fcgi-2.4.0-10.el6.x86_64.rpm and mod_fastcgi-2.4.6-2.el6.rf.x86_64.rpm, but mod_fastcgi doesn't exist in EPEL (only mod_fcgid)
[19:14] <bstillwell> afaict, the mod_fastcgi package doesn't contain any ceph specific modifications
[19:14] <wrale> sel: i'd like to try that, but i haven't gotten to it yet
[19:15] * ScOut3R (~scout3r@BC2484D1.dsl.pool.telekom.hu) has joined #ceph
[19:16] <bstillwell> Oh wait, I see custom debs for apache/fastcgi that add 100-continue support, but not seeing anything for RPMs
[19:19] <glowell> bstillwell: There are rpms for apache and fastcgi on gitbuilder.ceph.com
[19:19] <glowell> Apache is: http://gitbuilder.ceph.com/httpd-rpm-centos6-x86_64/
[19:20] <glowell> fastcgi is: http://gitbuilder.ceph.com/mod_fastcgi-rpm-centos6-x86_64/
[19:20] <bstillwell> glowell: ahh, thanks
[19:20] <glowell> The package names are slightly different between the various distros.
[19:21] * Cube (~Cube@cpe-76-95-217-129.socal.res.rr.com) Quit (Quit: Leaving.)
[19:21] * imjustmatthew (~imjustmat@c-24-127-107-51.hsd1.va.comcast.net) Quit (Remote host closed the connection)
[19:24] * dpippenger (~riven@tenant.pas.idealab.com) has joined #ceph
[19:31] <jluis> alfredodeza, there?
[19:31] <zackc> sagewk: the 'bare except:' anti-pattern i was talking about? paramiko uses it all over the place. :(
[19:31] <sagewk> sigh
[19:31] <alfredodeza> hi jluis
[19:31] <jluis> alfredodeza, hi there
[19:31] <jluis> alfredodeza, do you recall what your comment was about on this pull request? https://github.com/ceph/ceph/pull/530
[19:32] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:32] * alfredodeza looks
[19:32] <zackc> and that's why, if teuthology is hanging while trying to connect, ctrl-c won't interrupt it so you at least get a traceback
[19:32] <alfredodeza> jluis: yes, I thought that the file was never closed
[19:32] <alfredodeza> but sage confirmed that was not the case
[19:32] <alfredodeza> why?
[19:32] <jluis> alfredodeza, which file?
[19:32] <alfredodeza> woah
[19:33] <alfredodeza> that looks like a completely different changeset
[19:33] <alfredodeza> O_O
[19:33] <jluis> well, trying to figure out whether to add your reviewed-by or if your review was actually about something different from sage's patch :p
[19:34] * markbby (~Adium@168.94.245.2) has joined #ceph
[19:35] <jluis> alfredodeza, I'm going with your comment being meant either for some other branch or some other patchset no longer in that one :p
[19:36] <alfredodeza> yes
[19:36] <alfredodeza> I would not be the one to be reviewing C++ stuff for sure :)
[19:36] <jluis> sagewk, merged
[19:36] <alfredodeza> no idea why that got there
[19:36] <alfredodeza> I could *swear* it was for something else
[19:36] <jluis> the weirdest things happen on the internet
[19:37] <jluis> that's why I believe we should all file our reviews in hard-copies and mail them around by pigeon
[19:37] <jluis> or that could just be an excuse to get myself a bunch of pigeons
[19:40] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[19:40] <dmsimard> Magic. Magic I tell you. Ceph is magical.
[19:40] <dmsimard> That is all.
[19:40] <nhm> dmsimard: we sprinkle pixie dust in each drive.
[19:43] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[19:48] * Cube (~Cube@12.248.40.138) has joined #ceph
[19:50] * zakoski (~pty2i@bl7-191-196.dsl.telepac.pt) has joined #ceph
[19:58] * tryggvil_ (~tryggvil@217.28.181.130) has joined #ceph
[19:58] * ScOut3R (~scout3r@BC2484D1.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[20:01] * tryggvil (~tryggvil@217.28.181.130) Quit (Ping timeout: 480 seconds)
[20:01] * tryggvil_ is now known as tryggvil
[20:02] * The_Bishop (~bishop@2001:470:50b6:0:9d4b:db1b:3883:df83) Quit (Ping timeout: 480 seconds)
[20:10] * The_Bishop (~bishop@2001:470:50b6:0:6955:55fa:2ee7:b9e7) has joined #ceph
[20:11] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley_)
[20:11] * tryggvil (~tryggvil@217.28.181.130) Quit (Quit: tryggvil)
[20:12] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:22] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[20:22] <twx> hey guys, just thinking. are there any general limitations on what one can change in the crushmap while in flight that i should be aware of?
[20:24] * tryggvil (~tryggvil@217.28.181.130) has joined #ceph
[20:27] * berant (~blemmenes@gw01.ussignalcom.com) has left #ceph
[20:27] * zakoski (~pty2i@bl7-191-196.dsl.telepac.pt) Quit (autokilled: Do not spam. Mail support@oftc.net with questions. (2013-08-30 18:27:57))
[20:29] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) Quit (Quit: smiley_)
[20:31] * tryggvil (~tryggvil@217.28.181.130) Quit (Quit: tryggvil)
[20:39] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[20:41] <dmsimard> What do you guys think in general of the puppet-ceph initiative? https://github.com/enovance/puppet-ceph/network
[20:42] <dmsimard> I see for instance that it's probably not doing things as cleanly as it could (e.g., using mkpart and mkfs instead of ceph-osd-prepare)
[20:42] <dmsimard> Would it be better to wrap a puppet initiative around ceph-deploy, perhaps ?
[20:45] * rturk is now known as rturk-away
[20:47] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) Quit (Ping timeout: 480 seconds)
[20:47] * mikedawson (~chatzilla@12.89.37.218) has joined #ceph
[20:47] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) Quit (Quit: Leaving.)
[20:48] * grepory (~Adium@50-115-70-146.static-ip.telepacific.net) has joined #ceph
[20:48] <alfredodeza> ceph-deploy is actually a bit restricted on its set of features, so I would say to not use ceph-deploy for puppet/chef/ansible etc..
[20:50] <dmsimard> But ceph-deploy will continue to mature and features should be pushed upstream rather than patching around doing manual work in puppet, though
[20:51] * rturk-away is now known as rturk
[20:51] <dmsimard> I see your point, though
[20:53] <alfredodeza> ceph-deploy is meant to get you started, have some nice options and flags to set up ceph, but not to open the door to all the options and configurations that are available to ceph
[20:53] <alfredodeza> consider for example the usage of ceph-disk
[20:54] <alfredodeza> ceph-deploy can't do much of that
[20:55] <dmsimard> But that's because the features don't exist yet or will never really exist ?
[20:56] <dmsimard> I'm just trying to wrap my head around the philosophy behind the deployment and gauge the options
[20:57] <alfredodeza> if we go by the fact that ceph-deploy is meant to get you started, then, no. I don't see ceph-deploy having everything else implemented
[20:57] <alfredodeza> but rather, the small subset of things needed to get you started, done well, very well documented and with straightforward execution
[20:57] <alfredodeza> which (right now) is not the case
[20:57] <alfredodeza> although we are trying to get there
[20:58] <alfredodeza> so, ceph-deploy a.k.a. "get started with ceph as easy as possible"
[20:58] <dmsimard> fair enough
[21:00] * jlhawn (~jlhawn@208-90-212-77.PUBLIC.monkeybrains.net) has joined #ceph
[21:02] * ScOut3R (~scout3r@BC2484D1.dsl.pool.telekom.hu) has joined #ceph
[21:05] * doxavore (~doug@99-89-22-187.lightspeed.rcsntx.sbcglobal.net) has joined #ceph
[21:06] * imjustmatthew (~imjustmat@pool-173-53-100-217.rcmdva.fios.verizon.net) has joined #ceph
[21:10] * rootard (~rootard@pirlshell.lpl.arizona.edu) has joined #ceph
[21:11] <rootard> (cross post from #openstack) I am trying really hard to find documentation about ceph+kvm interaction. AFAICT my instance is trying to connect to ceph but has auth_supported=none when it should be auth_supported=cephx. No matter what I set in the virsh secret list, it seems to always use auth_supported=none. Any tips?
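For context, a hedged sketch of the usual libvirt/cephx wiring rootard is asking about; the client name and UUID are placeholders, and the guest's disk XML has to reference the secret or qemu typically ends up with auth_supported=none:

    # define a libvirt secret for the ceph client key
    cat > secret.xml <<'EOF'
    <secret ephemeral='no' private='no'>
      <usage type='ceph'>
        <name>client.libvirt secret</name>
      </usage>
    </secret>
    EOF
    virsh secret-define --file secret.xml        # prints the secret's UUID
    virsh secret-set-value --secret <uuid> --base64 "$(ceph auth get-key client.libvirt)"

    <!-- referenced from the guest's <disk> definition -->
    <auth username='libvirt'>
      <secret type='ceph' uuid='<uuid>'/>
    </auth>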
[21:11] * rturk is now known as rturk-away
[21:17] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[21:17] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[21:20] <sel> alfredodeza, Is there a plan to make it possible to set the osd id when an osd is created with ceph-deploy?
[21:20] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[21:20] <alfredodeza> there is currently no plan (if by plan you mean an open ticket)
[21:23] <sel> Ok, do you know if it would be hard to implement? The reason for my question is that I would like to skip some osd id's. At the moment my osd servers have 24 disks, but will be extended with 12 more at some point.
[21:24] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[21:24] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[21:24] <sel> Ideally I would like to use osd ID 100-124 on the first server, and 200-224 for the second one and so on..
[21:25] <alfredodeza> that sounds like something you could be doing with something like chef/puppet/cfengine ?
[21:25] <alfredodeza> that sounds really specific to extended ceph configuration
[21:25] <alfredodeza> ceph-deploy is really meant to get you started :)
[21:25] <alfredodeza> unless, from a 'getting started with ceph' point of view you believe this should totally be in
[21:26] <alfredodeza> which might be the case (I am open to get feedback here)
[21:28] <sel> Well, it would be nice; then you would be able to add some logic to the osd ids. Let's say that osd.223 goes down, then you would know which server has gone down, and which physical disk has the problem.
[21:28] <sel> err on which server the problem is
[21:28] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[21:28] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[21:28] <alfredodeza> sure, I understand the use case
[21:29] <alfredodeza> but do you see that as a use-case for someone that is getting started with ceph?
[21:29] <alfredodeza> if yes, then that definitely should go into ceph-deploy
[21:31] <sel> Well, I think ceph-deploy can be used by small clusters in production as well, and then this would be a good thing.
[21:32] <sel> Also it seems that the puppet-ceph thing is missing the ability to encrypt disks (which we will need to do)
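On the dm-crypt question from earlier: a hedged sketch of how ceph-disk handles it, where by default a separate key is generated per OSD under the key directory; the device path is a placeholder and option availability depends on the ceph-disk version:

    # hypothetical device; ceph-disk creates and stores a per-OSD dmcrypt key
    ceph-disk prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys /dev/sdb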
[21:33] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[21:33] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[21:37] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[21:37] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[21:37] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[21:38] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[21:42] <xarses> joshd: are you around?
[21:46] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[21:47] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has joined #ceph
[21:51] * haomaiwang (~haomaiwan@111.10.113.213) Quit (Remote host closed the connection)
[21:51] <ChoppingBrocoli> Does anyone know where to set the value to have libvirt always use writeback on vm creation?
[21:52] <jmlowe> <driver name='qemu' type='raw' cache='writeback'/>
[21:53] <jmlowe> under the disk definition
[21:53] <ChoppingBrocoli> Thanks but not that, I mean to always assign it
[21:53] <ChoppingBrocoli> make it the default
[21:54] <jmlowe> ?
[21:54] <jmlowe> didn't know that was possible
[21:54] <LCF> ChoppingBrocoli: generate xml with data you want and virsh define new.xml
[21:55] <ChoppingBrocoli> Yes, it is possible in the libvirt config to make it the default so that you do not need to define it or change it later on... it will be created with writeback as the default. I saw that a while back but I am having trouble finding it again
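For reference, a sketch of a complete rbd disk definition carrying the cache attribute jmlowe quotes; the pool, image, monitor host, and secret UUID are placeholders, and whether libvirt itself offers a global default is left open here, as in the discussion:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='SECRET-UUID'/>
      </auth>
      <source protocol='rbd' name='rbd/vm-disk-1'>
        <host name='mon1.example.com' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>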
[22:02] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[22:08] * roald (~oftc-webi@87.209.150.214) Quit (Quit: Page closed)
[22:25] * mikedawson (~chatzilla@12.89.37.218) Quit (Read error: Connection reset by peer)
[22:30] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit (Read error: Operation timed out)
[22:30] <bstillwell> Found another documentation issue with this page:
[22:30] <bstillwell> http://ceph.com/docs/next/install/rpm/
[22:31] <bstillwell> It has you configure /var/www/s3gw.fcgi in /etc/httpd/conf.d/rgw.conf
[22:31] <bstillwell> then step 2 below tells you to create it, but doesn't tell you where
[22:31] <bstillwell> step 3 tells you to make it executable, but then it's in /var/www/rgw/ instead of /var/www/
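A sketch of the wrapper those steps describe, kept under one consistent path; the client name is a guess at the conventional value rather than anything confirmed in the log:

    mkdir -p /var/www/rgw
    cat > /var/www/rgw/s3gw.fcgi <<'EOF'
    #!/bin/sh
    exec /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway
    EOF
    chmod +x /var/www/rgw/s3gw.fcgi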
[22:32] * athrift (~nz_monkey@203.86.205.13) Quit (Read error: No route to host)
[22:32] * athrift (~nz_monkey@203.86.205.13) has joined #ceph
[22:32] <xarses> please create an issue in tracker.ceph.com
[22:33] <bstillwell> xarses: will do
[22:38] * wusui (~Warren@2607:f298:a:607:ccbb:b6b0:2d7c:a034) Quit (Ping timeout: 480 seconds)
[22:38] * WarrenUsui (~Warren@2607:f298:a:607:ccbb:b6b0:2d7c:a034) Quit (Ping timeout: 480 seconds)
[22:38] * wusui (~Warren@2607:f298:a:607:d1d:b885:7622:d9bd) has joined #ceph
[22:38] * aardvark (~Warren@2607:f298:a:607:d1d:b885:7622:d9bd) has joined #ceph
[22:43] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[22:43] <bstillwell> xarses: issue #6182
[22:43] <kraken> bstillwell might be talking about: http://tracker.ceph.com/issues/6182 [Conflicting locations for s3gw.fcgi]
[22:45] * miniyo (~miniyo@0001b53b.user.oftc.net) Quit ()
[22:47] <jmlowe> Yea! Inktank support contract signed!
[22:47] <xarses> issue 6128
[22:47] <kraken> xarses might be talking about: http://tracker.ceph.com/issues/6128 [glance image-create with rbd --location fails to create image in rdb]
[22:47] * smiley_ (~smiley@pool-173-73-0-53.washdc.fios.verizon.net) has joined #ceph
[22:48] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:48] <xarses> joshd: I was able to get better logs for issue 6128, I've found that for whatever reason, rbd.py's Store.add is not even being called
[22:55] * alram (~alram@38.122.20.226) has joined #ceph
[22:59] * vata (~vata@2607:fad8:4:6:45a9:9a9a:2fbe:f449) Quit (Quit: Leaving.)
[23:01] * dmsimard (~Adium@108.163.152.2) Quit (Quit: Leaving.)
[23:05] * The_Bishop (~bishop@2001:470:50b6:0:6955:55fa:2ee7:b9e7) Quit (Ping timeout: 480 seconds)
[23:05] * buck (~buck@c-24-6-91-4.hsd1.ca.comcast.net) has left #ceph
[23:06] * torment2 (~torment@pool-72-64-192-178.tampfl.fios.verizon.net) has joined #ceph
[23:08] <bstillwell> With radosgw, any idea what would be giving me this:
[23:08] <bstillwell> WARNING: cannot read region map
[23:10] * markbby (~Adium@168.94.245.4) has joined #ceph
[23:10] <joshd> xarses: hmm, strange that it works for other drivers then. is it calling any rbd driver methods at all? I wonder if there's another whitelist of stores that support uploading somewhere...
[23:11] <xarses> joshd, apparently it works fine in horizon
[23:11] <xarses> just not from glance cli
[23:12] * miniyo (~miniyo@0001b53b.user.oftc.net) has joined #ceph
[23:12] <xarses> there is a location for swift
[23:12] <xarses> but they don't end up there
[23:13] <joshd> so it's not working from the cli for any driver then?
[23:13] <xarses> that i don't know
[23:13] <xarses> i would have thought it was in one of our integration tests for swift
[23:14] <xarses> I'll check
[23:14] <bstillwell> nm, figured it out: radosgw-admin regionmap update
[23:14] * sprachgenerator (~sprachgen@130.202.135.193) Quit (Quit: sprachgenerator)
[23:15] * The_Bishop (~bishop@93.182.144.2) has joined #ceph
[23:17] * roald (~oftc-webi@87.209.150.214) has joined #ceph
[23:17] * The_Bishop (~bishop@93.182.144.2) Quit ()
[23:20] * LeaChim (~LeaChim@176.24.168.228) Quit (Ping timeout: 480 seconds)
[23:21] <xarses> joshd: initial testing shows default_backend = swift --location is still buggy
[23:22] <joshd> xarses: starting to sound like an issue with the glance client or maybe it's using a different version of the api than horizon
[23:22] <xarses> but it would take over an hour to restore the env back to the original state where swift should have worked
[23:23] <joshd> what about just the filesystem backend?
[23:23] <joshd> no setup required for that
[23:24] <xarses> it returned status = active instantly, which is the same symptom of the other two backends
[23:24] <xarses> looking for the file
[23:25] <sel> I'm trying to understand the replication in ceph. I've got 4 nodes in two locations. I've set the rule for the pool to use the different locations as the leaf type, and I've set the min_size for the pool to 2. Given this configuration, would I survive one location going down?
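A hedged sketch of the kind of rule sel describes, placing one replica per location; the bucket type and names are placeholders for whatever the crush map actually defines. Note that with one replica per site and min_size 2, losing a whole site would generally leave PGs below min_size and block I/O until min_size is lowered or the site returns:

    rule replicated_across_sites {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type datacenter
        step emit
    }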
[23:27] <xarses> ok, confirmed same bug with default_store = file
[23:27] <xarses> --location doesn't create anything, from stdin works
[23:27] <joshd> cool, good to know
[23:28] <joshd> maybe see if #openstack-glance is interested in any extra logs
[23:28] <xarses> freenode?
[23:28] <joshd> yeah
[23:28] <xarses> glance -v -d says its using api v1
[23:29] <joshd> I think that's the horizon default too, so perhaps a bug in the cli
[23:29] * mozg (~andrei@host86-185-78-26.range86-185.btcentralplus.com) has joined #ceph
[23:29] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[23:30] * allsystemsarego (~allsystem@5-12-37-127.residential.rdsnet.ro) Quit (Quit: Leaving)
[23:33] * andreask (~andreask@h081217135028.dyn.cm.kabsi.at) has joined #ceph
[23:33] * ChanServ sets mode +v andreask
[23:34] <bstillwell> sweet, I have a working cluster on centos 6.4 now
[23:34] <bstillwell> with rgw going
[23:35] <bstillwell> previously I've only had it going with ubuntu
[23:36] * kraken (~kraken@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Read error: Operation timed out)
[23:36] <bstillwell> are there any limits on the number of objects in a bucket?
[23:40] * dmick (~dmick@2607:f298:a:607:113e:2cc0:562f:17ad) has joined #ceph
[23:40] * gregmark (~Adium@cet-nat-254.ndceast.pa.bo.comcast.net) Quit (Read error: Connection reset by peer)
[23:49] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:51] <xarses> joshd: image-create doesn't exist in the glance v2 api for me
[23:52] <xarses> so horizon is doing something with api v1
[23:55] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:55] <xarses> ohhh
[23:55] <xarses> interesting
[23:56] <xarses> http://paste.openstack.org/show/45501/
[23:57] * tobru (~quassel@217-162-50-53.dynamic.hispeed.ch) has joined #ceph
[23:59] * roald (~oftc-webi@87.209.150.214) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.