#ceph IRC Log


IRC Log for 2013-04-13

Timestamps are in GMT/BST.

[0:00] <humbolt> nhm: 3 servers, one 3TB disk (5400-7200rpm) each used exclusively for OSD, XFS, journals on the same disks.
[0:01] <nhm> humbolt: when you say 5400-7200rpm, do you mean variable speed or that one or more of the disks is 5400rpm?
[0:01] <humbolt> variable speed
[0:01] * smeven (~diffuse@123.208.160.236) has joined #ceph
[0:01] * smeven_ (~diffuse@123.208.160.236) has joined #ceph
[0:01] <humbolt> WD caviar green and red etc.
[0:02] <nhm> humbolt: ok. my expectation would be you'd see at best an aggregate performance of about half of one of those drives because you are doing 3x replication to 3 drives, and journal writes to the same disks.
[0:03] <humbolt> will the performance increase with more OSDs per host?
[0:03] <nhm> unfortunately with XFS I tend to see a fair amount of overhead, so likely you won't even get that much. :/
[0:04] <humbolt> how about putting the metadata pool on SSD OSDs through CRUSH map?
[0:04] <nhm> up until the point where you hit network throughput limitations. With 3x replication, each host can at best do about 35MB/s.
[0:04] <humbolt> how about putting all journals on SSD?
[0:05] <nhm> journals on SSDs will help so long as you don't overload the SSDs.
[0:05] <humbolt> each host, or each OSD?
[0:06] <humbolt> what do you mean with "up until network saturation", when you also say not more than 35 MB/s?
[0:06] <nhm> humbolt: if this is 1GbE, then each host will have 3x the data incoming (1x from the client and 2x from replicas from other OSD hosts). if you assume 105MB/s total for 1GbE, that's about 35MB/s that each host can take from the client.
[0:06] <humbolt> do you mean, if I have more hosts in my cluster, I would have better performance?
[0:07] <nhm> humbolt: if you add more OSDs per host, your throughput will increase up until the point where you saturate the network link.
[0:07] <nhm> right now you are disk limited, but at some point with 1GbE you will become network limited.
[0:08] * drokita1 (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[0:08] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[0:08] <humbolt> why are you saying "from the replicas"? the data is going to the replicas, is it not?
[0:09] <dspano> nhm: I have a very similar setup to humboldt.
[0:10] <humbolt> why is my cephfs performance even less than rados bench?
[0:10] <nhm> humbolt: With 3 hosts that each have 1 OSD, sometimes OSD 1 will be the primary and sometimes the other OSDs will be the primary. In each case, because you are using 3x replication, the primary will always send the data to both of the other OSDs. So for every write, each one of the OSDs is getting a copy of the data. That means that on average, every OSD is going to be getting 2/3 of its data from other OSDs as replica traffic, and 1/3 of it from the client.
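
A rough sketch of the arithmetic nhm is describing, assuming a single 1GbE link good for roughly 105 MB/s and 3x replication:

    writes arriving from the client:     ~105 MB/s / 3  ~= 35 MB/s per host
    replica writes from the other OSDs:  2 x 35 MB/s    ~= 70 MB/s per host
    total inbound per host:              ~= 105 MB/s, i.e. the 1GbE link is saturated
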
[0:11] <nhm> humbolt: because cephfs hasn't been performance tuned yet. It's still in active development.
[0:11] <nhm> humbolt: We've been focusing a lot more effort on RBD.
[0:11] <dmick> and because it involves a lot more code and overhead...
[0:14] <humbolt> hey one big question
[0:14] <nhm> Sure, then I've gotta run to dinner. :)
[0:14] <humbolt> through the s3 http interface, am I able to serve byte ranges … for pseudostreaming videos?
[0:15] <nhm> humbolt: that's a great question, and I have no idea. :) Yehuda is the guy to talk to.
[0:15] <humbolt> alright
[0:15] <humbolt> which part are you involved in?
[0:16] <nhm> humbolt: I'm kind of performance/hardwarey sort of guy.
[0:16] <dmick> humbolt: does the S3 protocol support it?
[0:17] <mrjack> nhm: is bonding mode 6 with ceph a good idea for 2x1GE?
[0:17] <humbolt> dmick: Well HTTP supports it.
[0:17] * portante|afk (~user@66.187.233.206) Quit (Quit: bye)
[0:18] <nhm> mrjack: I don't know too much about balance-alb. I've used round-robin on 2 10GbE cards effectively for performance testing, but not for fault tolerance.
[0:18] <dmick> humbolt: sure, but S3 is not HTTP, it's just carried over it
[0:19] <nhm> mrjack: It might be worth trying it out and seeing how it works.
[0:19] <mrjack> nhm: my current switch does not support LACP nor trunking.. so i am stuck with mode 6
[0:19] <mrjack> i use it
[0:19] <mrjack> but i don't know how to benchmark
[0:19] <nhm> mrjack: I think some folks are using mode 4.
[0:19] * LeaChim (~LeaChim@90.212.137.6) Quit (Ping timeout: 480 seconds)
[0:20] <mrjack> nhm: that requires intelligent switch
[0:20] <nhm> mrjack: I'd start out with just making sure the link works well with iperf.
[0:20] <dspano> nhm: That's what I'm using.
[0:21] <nhm> mrjack: yeah, I have the benefit of just directly connecting a pair of X520s with SFP+. :D
[0:21] <nhm> Ok, I gotta run guys. Good luck!
[0:21] <mrjack> i have a two node system with 2x10GE loop
[0:22] <dspano> nhm: Except I was crazy, and loaded them up with 4 2-port nics for each OSD server because initially, they were going to be DRBD servers.
[0:22] <mrjack> but that is production for $bigcustomer so i cannot "play" there ;)
[0:23] <mrjack> nhm mode 6 works different
[0:24] <mrjack> nhm: it connects via two nics with two different macs to two other hosts, but every tcp connection is limited to max 1gbps
[0:24] * rustam (~rustam@94.15.91.30) has joined #ceph
[0:24] <mrjack> nhm: but you can have two osds send data to you and receive with 2gbps
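
For anyone wanting to try the iperf check nhm suggests, a minimal sketch of a balance-alb (mode 6) bond on Debian/Ubuntu; the interface names and addresses are placeholders, and this assumes the ifenslave package is installed:

    # /etc/network/interfaces
    auto bond0
    iface bond0 inet static
        address 10.0.0.11
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode balance-alb
        bond-miimon 100

    # on one host:
    iperf -s
    # on the other, with several parallel streams so more than one NIC gets used:
    iperf -c 10.0.0.11 -P 4
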
[0:24] <dmick> nhm is out guys
[0:26] * BillK (~BillK@124-148-196-159.dyn.iinet.net.au) has joined #ceph
[0:27] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:29] * LeaChim (~LeaChim@90.214.200.177) has joined #ceph
[0:43] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:46] * yehuda_hm (~yehuda@2602:306:330b:1410:4cf0:8225:f626:5c15) Quit (Remote host closed the connection)
[0:53] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:55] * NightDog (~Karl@38.179.202.84.customer.cdi.no) Quit (Ping timeout: 480 seconds)
[1:10] * noob2 (~cjh@173.252.71.3) has left #ceph
[1:11] * mcclurmc_laptop (~mcclurmc@62.205.76.102) Quit (Ping timeout: 480 seconds)
[1:17] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has left #ceph
[1:31] * rturk is now known as rturk-away
[1:34] * Forced (~Forced@205.132.255.75) Quit ()
[1:42] * xmltok (~xmltok@pool101.bizrate.com) Quit (Quit: Leaving...)
[1:43] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[1:46] <mrjack> what happens when you lose the journal of an osd?
[1:46] * xmltok (~xmltok@pool101.bizrate.com) Quit ()
[1:48] * xmltok (~xmltok@pool101.bizrate.com) has joined #ceph
[1:48] * xmltok (~xmltok@pool101.bizrate.com) Quit ()
[1:49] <dmick> humbolt: was curious, so looked up http://docs.aws.amazon.com/AmazonS3/latest/dev/GettingObjectsUsingAPIs.html
[1:49] <dmick> says it supports Range
[1:50] <dmick> so, not 100% sure about rgw, but I'd assume so
[1:52] <dmick> certainly it tries to handle it; see RGWGetObj::init_common (range_str is filled from the Range header). So I increase the probability.
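
A quick, unauthenticated way to check this against a radosgw endpoint, assuming a bucket and object that are world-readable (the hostname and object names below are made up):

    curl -s -D - -o /dev/null -H "Range: bytes=0-1023" \
        http://rgw.example.com/mybucket/video.mp4
    # an HTTP/1.1 206 Partial Content reply with a Content-Range header
    # would indicate byte-range requests are being honoured
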
[2:00] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:01] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[2:04] <mrjack> is there a way to configure osd max outgoing bandwidth?
[2:07] * jlogan (~Thunderbi@2600:c00:3010:1:1::40) Quit (Ping timeout: 480 seconds)
[2:11] * LeaChim (~LeaChim@90.214.200.177) Quit (Ping timeout: 480 seconds)
[2:15] * alram (~alram@38.122.20.226) Quit (Quit: leaving)
[2:17] * Forced (~Forced@205.132.255.75) has joined #ceph
[2:23] <nigwil> I am trying Ceph for the first time with 0.60 and getting this: http://pastebin.com/Ug198jtF
[2:24] <nigwil> one thing is the host only has 1GB of memory, perhaps that is too little?
[2:24] <nigwil> root@ceph0:/var/log/ceph# free -m total used free shared buffers cached
[2:24] <nigwil> Mem: 868 783 85 0 0 363
[2:24] <nigwil> -/+ buffers/cache: 419 449
[2:24] <nigwil> Swap: 3915 3 3912
[2:24] <nigwil> root@ceph0:/var/log/ceph#
[2:26] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[2:32] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:32] <phantomcircuit> nigwil, way too little
[2:32] <nigwil> ok, I will hunt down some memory and see if that helps
[2:34] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[2:35] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[2:38] <gregaf1> nigwil: memory's not your problem; somebody else saw that too with v0.60
[2:38] <gregaf1> but I don't know why...
[2:38] <nigwil> ahh, so a known bug
[2:39] <gregaf1> well, I first saw it in an email today
[2:39] <gregaf1> unless you're Drunkard Zheng, in which case it's all you ;)
[2:39] <gregaf1> *Zhang
[2:39] <nigwil> :-)
[2:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[2:40] <nigwil> should I drop back to bobtail so I have a happy Ceph?
[2:40] <gregaf1> I'd kind of like to figure it out — although not really at 5:40pm on Friday
[2:40] <gregaf1> lemme see if I can reproduce it on my dev box real quick
[2:40] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[2:40] <gregaf1> and can you run ceph mds dump for me?
[2:41] <nigwil> root@ceph0:/var/log/ceph# ceph mds dump
[2:41] <nigwil> dumped mdsmap epoch 2612
[2:41] <nigwil> epoch 2612
[2:41] <nigwil> flags 0
[2:41] <nigwil> created 2013-04-12 23:32:41.842322
[2:41] <nigwil> modified 2013-04-13 10:41:03.755111
[2:41] <nigwil> tableserver 0
[2:41] <nigwil> root 0
[2:41] <nigwil> session_timeout 60
[2:41] <nigwil> session_autoclose 300
[2:41] <nigwil> last_failure 0
[2:41] <nigwil> last_failure_osd_epoch 21
[2:41] <nigwil> compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding}
[2:41] <nigwil> max_mds 1
[2:41] <nigwil> in 0
[2:41] <nigwil> up {0=9299}
[2:41] <nigwil> failed
[2:41] <nigwil> stopped
[2:41] <nigwil> data_pools 0
[2:41] <nigwil> metadata_pool 1
[2:41] <nigwil> 9299: 192.168.178.10:6800/2502 'a' mds.0.7 up:replay seq 1 laggy since 2013-04-13 00:23:51.655113
[2:41] <nigwil> root@ceph0:/var/log/ceph#
[2:42] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:43] <gregaf1> was this the first time you booted it up, or was it working previously?
[2:43] <gregaf1> and how did you set up your cluster, nigwil
[2:43] <gregaf1> ?
[2:44] <nigwil> first time, after starting it shows OK health briefly then drops back to laggy and the client always fails with -5 error
[2:44] <nigwil> "set up cluster", as in ceph.conf?
[2:44] <gregaf1> did you run mkcephfs, or something else?
[2:45] <nigwil> 77 mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
[2:45] <gregaf1> okay
[2:45] <nigwil> 78 service ceph -a start
[2:45] <nigwil> 79 ceph health
[2:45] <gregaf1> and it started out healthy and then went bad — what state was the MDS in when it was healthy?
[2:45] <nigwil> I did mess up one of the OSDs at some point (memory is fuzzy)
[2:46] <nigwil> and I stopped and started the service while I fixed the OSD, so is it possible the OSD is "damaged" in terms of the files in it?
[2:46] <nigwil> not really started out healthy, but if you do a health status quickly after starting then it shows OK for a short interval then degrades
[2:47] <nigwil> starting it = service ceph -a start
[2:47] <gregaf1> no, if you'd messed up the OSDs it would report badness there
[2:47] <nigwil> ok
[2:47] <gregaf1> oh, although it does say 0 bytes data
[2:47] * BillK (~BillK@124-148-196-159.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[2:47] <gregaf1> which is not really correct
[2:48] <gregaf1> yeah, I bet that has something to do with it, but I'm afraid it's been a long week without enough sleep for me so I really have to head out :(
[2:48] <nigwil> I'm cheating with one of the OSDs as it is just a directory on the OS partition
[2:48] <gregaf1> maybe dmick or somebody can carry along a bit
[2:48] <nigwil> the second OSD is an actual block device
[2:49] <gregaf1> that should work fine
[2:49] <gregaf1> you sure do have a lot of mdsmaps though for a brand new cluster
[2:49] <nigwil> no worries, thanks for the help so far, it is a good learning exercise for me.
[2:49] <gregaf1> how many times have you started it up, or do you have something else restarting the daemon — check that stuff
[2:50] <gregaf1> gl and thanks for testing :)
[2:50] <nigwil> I've started it manually 6 or so times
[2:50] * smeven (~diffuse@123.208.160.236) Quit (Quit: smeven)
[2:51] <nigwil> I'll revert to bobtail for now on another machine, and play there. Now that I have a bit of an idea how to do the quickstart it should be smoother the next time.
[2:57] * BillK (~BillK@124.150.40.203) has joined #ceph
[3:06] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[3:07] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit ()
[3:08] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[3:23] * acu (~acu07@24-159-215-150.static.roch.mn.charter.com) Quit (Quit: Leaving)
[3:25] * mrjack (mrjack@office.smart-weblications.net) Quit (Ping timeout: 480 seconds)
[3:26] * mjblw3 (~mbaysek@wsip-174-79-34-244.ph.ph.cox.net) Quit (Remote host closed the connection)
[3:28] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[3:28] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[3:31] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[3:44] <nigwil> anyone used the Dell C6145 as a Ceph node?
[3:45] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[3:46] * dmick (~dmick@2607:f298:a:607:607c:b82e:2508:2c50) Quit (Quit: Leaving.)
[3:57] * diegows (~diegows@190.190.2.126) has joined #ceph
[3:57] * jakku (~jakku@ad046161.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[3:58] * jakku (~jakku@ad046161.dynamic.ppp.asahi-net.or.jp) Quit (Remote host closed the connection)
[3:59] * jakku (~jakku@ad046161.dynamic.ppp.asahi-net.or.jp) has joined #ceph
[4:02] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[4:07] * humbolt (~elias@91-113-98-240.adsl.highway.telekom.at) Quit (Quit: humbolt)
[4:16] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[4:18] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[4:28] * rustam (~rustam@94.15.91.30) has joined #ceph
[4:34] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[4:40] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[5:14] <nigwil> after 0.60 broke down for me, I'm now all ok with 0.56.4, I can copy files ok to CephFS
[5:15] <nigwil> I am testing adding a new OSD (same host), and at the crush set step, is there an easy way of getting defaults for this?
[5:20] * BillK (~BillK@124.150.40.203) Quit (Ping timeout: 480 seconds)
[5:22] * calebamiles (~caleb@c-50-138-218-203.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[5:27] <nigwil> ok, got it, I did: root@ceph1:/etc/ceph# ceph osd crush set 2 1.0 root=default
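
For reference, the rough bobtail-era sequence for adding an OSD by hand looks something like the following; the id, host name, weight and paths are placeholders/defaults:

    ceph osd create                          # prints the new id, e.g. 2
    mkdir -p /var/lib/ceph/osd/ceph-2        # or mount a dedicated disk here
    ceph-osd -i 2 --mkfs --mkkey
    ceph auth add osd.2 osd 'allow *' mon 'allow rwx' \
        -i /var/lib/ceph/osd/ceph-2/keyring
    ceph osd crush set 2 1.0 root=default host=ceph1
    service ceph start osd.2
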
[5:30] * BillK (~BillK@124.150.42.26) has joined #ceph
[5:40] * nwl (~levine@atticus.yoyo.org) Quit (Quit: Lost terminal)
[5:50] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[5:59] * portante (~user@c-24-63-226-65.hsd1.ma.comcast.net) has joined #ceph
[6:00] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[6:07] * rustam (~rustam@94.15.91.30) has joined #ceph
[6:08] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[6:29] * nwl (~levine@atticus.yoyo.org) has joined #ceph
[6:39] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Read error: Operation timed out)
[6:41] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[7:07] * yasu` (~yasu`@dhcp-59-149.cse.ucsc.edu) Quit (Remote host closed the connection)
[7:08] * eschnou (~eschnou@131.167-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:12] <nigwil> I'm adding another mon on another host, and the operations page is a little unclear (to a newb like me) about step 8: ceph-mon -i ...etc.
[7:12] <nigwil> is it necessary to copy the (altered) /etc/ceph/ceph.conf file to the new host?
[7:25] <nigwil> unable to read magic from mon data.. did you run mkcephfs? <-- ick
[7:33] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[7:33] * Vjarjadian_ (~IceChat77@90.214.208.5) has joined #ceph
[7:38] * Vjarjadian (~IceChat77@90.214.208.5) Quit (Ping timeout: 480 seconds)
[7:49] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[8:01] <nigwil> adding a second MON I am getting this:
[8:01] <nigwil> Starting Ceph mon.b on ceph2...
[8:01] <nigwil> unable to read magic from mon data.. did you run mkcephfs?
[8:01] <nigwil> failed: 'ssh ceph2 ulimit -n 8192; /usr/bin/ceph-mon -i b --pid-file /var/run/ceph/mon.b.pid -c /tmp/ceph.conf.4852d3aac7636c14c1e3f6aa5c1abc1c '
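
That "unable to read magic from mon data" error usually just means the mon data directory on the new host was never initialized. Roughly, the bobtail-era manual steps for a new mon.b on ceph2 look something like this (paths and the address are placeholders, and the monitor keyring has to be copied over from an existing node):

    # on a working node:
    ceph mon getmap -o /tmp/monmap
    # copy /tmp/monmap and the mon keyring to ceph2, then on ceph2:
    mkdir -p /var/lib/ceph/mon/ceph-b
    ceph-mon -i b --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
    ceph mon add b 192.168.178.11:6789
    service ceph start mon.b
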
[8:21] * Yen (~Yen@ip-83-134-96-108.dsl.scarlet.be) Quit (Ping timeout: 480 seconds)
[8:23] * Yen (~Yen@ip-81-11-211-6.dsl.scarlet.be) has joined #ceph
[8:29] * Vjarjadian (~IceChat77@90.214.208.5) has joined #ceph
[8:29] * eschnou (~eschnou@131.167-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:30] * Vjarjadian_ (~IceChat77@90.214.208.5) Quit (Ping timeout: 480 seconds)
[8:37] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[8:44] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[10:23] * mcclurmc_laptop (~mcclurmc@d8D86A0A3.access.telenet.be) has joined #ceph
[10:43] <nigwil> in 0.56.4 service ceph stop does not appear to actually stop things
[10:48] * vo1d (~v0@212-183-97-9.adsl.highway.telekom.at) has joined #ceph
[10:49] * mcclurmc_laptop (~mcclurmc@d8D86A0A3.access.telenet.be) Quit (Ping timeout: 480 seconds)
[10:53] * v0id (~v0@212-183-96-57.adsl.highway.telekom.at) Quit (Read error: Operation timed out)
[11:08] * mcclurmc_laptop (~mcclurmc@d8D86A0A3.access.telenet.be) has joined #ceph
[11:12] * rustam (~rustam@94.15.91.30) has joined #ceph
[11:13] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[11:14] * eschnou (~eschnou@131.167-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[11:15] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[11:16] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit ()
[11:16] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[11:26] * mcclurmc_laptop (~mcclurmc@d8D86A0A3.access.telenet.be) Quit (Ping timeout: 481 seconds)
[11:29] * eschnou (~eschnou@131.167-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[11:29] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[11:29] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[11:31] * LeaChim (~LeaChim@90.214.200.177) has joined #ceph
[11:34] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) Quit (Read error: Connection reset by peer)
[11:35] * matt_ (~matt@220-245-1-152.static.tpgi.com.au) has joined #ceph
[11:35] <nigwil> how is the network assignment achieved so that OSDs can talk to each other for replication?
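
The usual answer here is the public/cluster network split in ceph.conf: clients and monitors talk to OSDs on the public network, while OSD-to-OSD replication and recovery traffic uses the cluster network if one is defined. A sketch, with placeholder subnets:

    [global]
        public network  = 192.168.1.0/24    ; client <-> mon/osd traffic
        cluster network = 10.10.1.0/24      ; osd <-> osd replication/recovery

If no cluster network is set, OSDs simply replicate over the public network.
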
[11:39] * tchmnkyz (~jeremy@0001638b.user.oftc.net) Quit (Quit: Lost terminal)
[11:41] * rustam (~rustam@94.15.91.30) has joined #ceph
[12:05] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[12:10] * rustam (~rustam@94.15.91.30) has joined #ceph
[12:13] * perohig (~email@92.40.253.195.threembb.co.uk) has joined #ceph
[12:21] * perohig (~email@92.40.253.195.threembb.co.uk) has left #ceph
[12:37] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[12:37] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[13:04] * mo- (~mo@2a01:4f8:141:3264::3) Quit (Remote host closed the connection)
[13:29] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) has joined #ceph
[13:44] * john_barbee_ (~jbarbee@c-98-226-73-253.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[14:02] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[14:05] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 19.0.2/20130307023931])
[14:06] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:20] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[14:22] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 19.0.2/20130307023931])
[14:24] * humbolt (~elias@62-46-145-13.adsl.highway.telekom.at) has joined #ceph
[14:25] <humbolt> is it recommended to have the OSD journals on SSDs?
[14:25] <humbolt> is this worth the effort
[14:26] <nigwil> good question. And how is the wear-factor on SSDs with OSD journals?
[14:29] <nigwil> humbolt: there is some discussion here: http://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals
[14:31] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[14:37] * diegows (~diegows@190.190.2.126) has joined #ceph
[14:49] <humbolt> nigwil: Yes, I have seen this. But it did not help me in my decision-making
[14:49] <humbolt> I will probably keep the journals on the respective disks
[14:49] <humbolt> however, I will set CRUSH to only use the SSDs for metadata.
[14:50] <nhm> humbolt: it depends on a couple of factors
[14:50] <humbolt> the other way round
[14:50] <humbolt> To make CRUSH put metadata only on SSDs
[14:50] <nhm> humbolt: if you don't have SSD journals, it's a good idea to have a controller with WB cache
[14:50] <humbolt> nhm: that is a new aspect for me
[14:50] <nhm> also, depending on the chassis, using SSD journals may mean less OSDs, so lower density and lower read speeds.
[14:51] * sleinen (~Adium@2001:620:0:26:c8ab:ae00:25a:afbd) Quit (Ping timeout: 480 seconds)
[14:51] <humbolt> I have room for 2 SSDs and 4 3.5" HDDs
[14:52] <humbolt> currently I only have one 3TB HDD and a 120GB SSD in 3 servers
[14:52] <nhm> Oh, and with SSD journals, if you are putting multiple journals on 1 SSD, losing the SSD means you lose multiple OSDs at once, with a bigger recovery impact during an outage.
[14:52] <humbolt> the SSD has 80GB left for journals and/or for a high speed OSD pool
[14:53] <humbolt> nhm: that is, what worries me the most
[14:53] <nhm> humbolt: yeah, and that's a big deal if you don't have many servers.
[14:53] <humbolt> so I would like to know more about how much faster it would make my system
[14:53] <nhm> humbolt: If you have a 100 servers it doesn't matter as much.
[14:54] <nhm> humbolt: have you read this? http://ceph.com/uncategorized/argonaut-vs-bobtail-performance-preview/
[14:54] <nhm> it shows performance for both 8 disk configurations and 6 disk + 2 SSD configurations.
[14:54] <humbolt> I would have put the SSDs in RAID1
[14:55] <humbolt> the other question I have concerns journal size
[14:55] <Vjarjadian> why use raid 1 when you have Ceph?
[14:55] <BillK> humbolt: was swapping back and forward between a 7200rpm and ssd, ssd better but not by much. faster recovery
[14:55] <humbolt> how bid does it have to be and what happens, if it runs out of space?
[14:55] * sleinen (~Adium@2001:620:0:26:4804:8fa1:cba2:8678) has joined #ceph
[14:55] <humbolt> Vjarjadian: for the journal disks
[14:55] <BillK> humbolt: biggest gain was getting journals off the osd's ... major gain in speed/recovery/stability
[14:56] <humbolt> nhm: how big, that should have meant
[14:56] <nhm> humbolt: Usually we recommend enough journal to absorb like 10-20s of writes. So if your OSD can do 100MB/s that's like 1-2GB per journal.
[14:56] <nhm> humbolt: If it runs out of space, then writes will pause until data can flush to the OSD.
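
As a sketch of that sizing rule (10-20 seconds of writes), a spinning OSD good for ~100 MB/s might get a 1-2 GB journal, e.g. in ceph.conf:

    [osd]
        ; value is in MB; 2048 MB ~= 20 s * 100 MB/s
        osd journal size = 2048
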
[14:57] <humbolt> it can do 450MB/s
[14:57] <Vjarjadian> are you currently bottlenecked by your NIC(s)? gigabit ethernet isnt insanely fast
[14:57] <humbolt> Vjarjadian: yes I am
[14:57] <nhm> humbolt: a 450MB/s SSD should be good (from a performance perspective) for at least 3-4 OSDs.
[14:57] <humbolt> This is a prototyping setup
[14:58] <humbolt> Vjarjadian: I can not currently afford a $ 10.000 10GB switch
[14:59] <humbolt> nhm: So I would need a 10G journal for each OSD then (20sec * 450MB/s)?
[15:00] <humbolt> ah, damn it
[15:00] <humbolt> OSD speed you meant
[15:00] <nhm> humbolt: yeah
[15:00] <humbolt> is it trivial to move the journals later
[15:00] <humbolt> currently I only have one ssd in there
[15:01] <nhm> humbolt: Also, on SSDs it may be good to only partition as much space as you need to help the drive do its wear levelling.
[15:01] <humbolt> As soon as I have a second one, I would want to move the journals to a raid1 setup
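
Moving journals later is reasonably painless; a rough outline for a single OSD (the id and the new journal location are placeholders):

    service ceph stop osd.0
    ceph-osd -i 0 --flush-journal
    # point 'osd journal' for osd.0 at the new partition/device in ceph.conf
    ceph-osd -i 0 --mkjournal
    service ceph start osd.0
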
[15:02] <humbolt> nhm: hmm
[15:02] <humbolt> nhm: smart idea
[15:03] <humbolt> nhm: But I was hoping to use the remaining space for RBD
[15:03] <humbolt> So my virtual machines run fast
[15:04] <nhm> humbolt: might be overloading things too much.
[15:04] <humbolt> I wish I had a 10GB switch
[15:05] <Vjarjadian> i had fiber at one client and it was still only 1gb....
[15:06] <humbolt> not for the clients I want this
[15:06] <humbolt> I am happy with 1GBit at the client side.
[15:07] <humbolt> One question concerning CRUSH
[15:07] <humbolt> weight
[15:08] <humbolt> To make things easy to maintain, should I put a weight of 3 on 3TB HDDs
[15:08] <humbolt> and 0.12 on a 120GB SSD
[15:08] <nhm> humbolt: you can set the weight differently if you have a heterogeneous cluster. In general I recommend sticking with homogeneous clusters though.
[15:09] <humbolt> or does that have any more impact than I presume
[15:09] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[15:09] <humbolt> nhm: well homogenous
[15:10] <humbolt> I would assign the SSDs to one highspeed pool and the HDDs to the others
[15:10] <nhm> humbolt: yes, that's typically what we recommend if you have a couple of classes of hardware.
[15:10] <humbolt> And I presume that next year, the biggest drives will be 4 or 5 TB not 3 like now
[15:10] <nhm> keep them in separate pools.
[15:11] <humbolt> for the ssds yes, easy
[15:11] <humbolt> do I even understand that right? weight defines how much data of the whole cluster will end up there?!
[15:12] <humbolt> 1 1 1 means, each osd gets an equal amount
[15:12] <humbolt> 1 2 1 would mean, the middle one gets half the data
[15:12] <humbolt> of the whole cluster
[15:12] <humbolt> and the others get a quarter
[15:12] <humbolt> right?
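
As a sketch of how capacity-proportional weights might look in a decompiled CRUSH map, assuming one 3 TB HDD and one 120 GB SSD in a host (the osd ids are placeholders):

    host ceph0 {
        id -2
        alg straw
        hash 0    # rjenkins1
        item osd.0 weight 3.000    # 3 TB HDD
        item osd.3 weight 0.120    # 120 GB SSD
    }
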
[15:13] <Vjarjadian> and iirc they recommend starting with weight 0 and then moving it up by 0.2 and then waiting for the cluster to rebalance (or that might have just been an old version)
[15:14] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[15:14] <humbolt> I just read that also. But this part is not my concern here. The end result is.
[15:15] <Vjarjadian> your cluster is only going to have 4 OSDs?
[15:15] <Vjarjadian> 3
[15:17] <humbolt> No
[15:17] <humbolt> It will grow
[15:17] <humbolt> But I have to start somewhere
[15:17] <humbolt> BTW, I hate fucking MACs!
[15:17] <Vjarjadian> lol
[15:18] <Vjarjadian> i hate the price....
[15:18] <humbolt> I just bought one and all the keys are somewhere else. Not even ALT+TAB is what you expect it to be!
[15:18] <nhm> I'm on an air running ubuntu.
[15:19] <nhm> It works pretty well.
[15:19] <humbolt> nhm: I am close to running back to ubuntu as well.
[15:19] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[15:19] <humbolt> nhm: I got an Air here too. How comfortable is it to install Ubuntu on Air?
[15:19] <nhm> humbolt: I did it about a year ago using the special mac disk
[15:20] <nhm> humbolt: it takes some tweaking to get the trackpad right, but otherwise works more or less properly.
[15:20] <humbolt> Ever ran OSX in a VM on Ubuntu on Air?
[15:20] <nhm> humbolt: naw, I just can't do OSX. I keep trying it every year or two and giving up.
[15:21] <humbolt> And how about battery runtime and sleep mode?
[15:21] <humbolt> nhm: I just need my Adobe Suite.
[15:21] <humbolt> Every once in a while
[15:21] <nhm> humbolt: I have a strange issue where the trackpad stops responding after hibernate wake and I have to reload the module.
[15:22] <humbolt> But I don't want to go through dualboot hell
[15:22] <nhm> humbolt: I don't sleep it often though so I haven't bothered fixing it.
[15:22] <nhm> humbolt: battery runtime is probably a couple of hours or more depending on what I'm doing.
[15:23] <nhm> humbolt: Not as good as OSX but not horrible.
[15:35] * joao (~JL@89-181-149-4.net.novis.pt) Quit (Quit: Leaving)
[15:37] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[15:37] * ChanServ sets mode +o scuttlemonkey
[15:46] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[16:12] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[16:30] <humbolt> When I have SSDs and HDDs in the same host, how do I represent that in a CRUSH map? The example in the docs shows that I would treat them as if they were separate hosts. Is that the way to go?
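
An abbreviated sketch of that docs approach, with each drive class hung off its own logical "host" and root, plus a rule a pool can be pointed at (bucket id/alg/hash lines omitted; all names are placeholders):

    host ceph0-hdd { ... item osd.0 weight 3.000 ... }
    host ceph0-ssd { ... item osd.3 weight 0.120 ... }
    root hdd { ... item ceph0-hdd weight 3.000 ... }
    root ssd { ... item ceph0-ssd weight 0.120 ... }

    rule ssd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
    }

    # then e.g.: ceph osd pool set metadata crush_ruleset 1
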
[16:40] * sleinen (~Adium@2001:620:0:26:4804:8fa1:cba2:8678) Quit (Quit: Leaving.)
[16:40] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[16:43] <humbolt> When I have 4 HDD-OSDs per server and want to store their journals on the server's SSD, do I need a separate journal partition for each OSD, or can they share a partition?
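
Each OSD needs its own journal; several OSDs can share one SSD by giving each its own partition (or file) on it. A ceph.conf sketch, assuming the SSD shows up as /dev/sda:

    [osd.0]
        osd journal = /dev/sda1
    [osd.1]
        osd journal = /dev/sda2
    [osd.2]
        osd journal = /dev/sda3
    [osd.3]
        osd journal = /dev/sda4
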
[16:47] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) has joined #ceph
[16:48] * loicd (~loic@173-12-167-177-oregon.hfc.comcastbusiness.net) Quit (Quit: Leaving.)
[16:48] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[16:49] * themgt (~themgt@96-37-28-221.dhcp.gnvl.sc.charter.com) Quit ()
[16:49] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[16:54] * rustam (~rustam@94.15.91.30) has joined #ceph
[16:56] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[17:34] * rustam (~rustam@94.15.91.30) has joined #ceph
[17:41] * mikedawson (~chatzilla@253.3.63.69.dyn.southslope.net) has joined #ceph
[17:50] * cyclone (~cyclone@37.131.0.227) has joined #ceph
[17:55] * mikedawson (~chatzilla@253.3.63.69.dyn.southslope.net) Quit (Ping timeout: 480 seconds)
[17:56] * cyclone is now known as msheikh
[17:56] * msheikh (~cyclone@37.131.0.227) Quit ()
[18:08] * BillK (~BillK@124.150.42.26) Quit (Ping timeout: 480 seconds)
[18:12] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[18:30] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[18:38] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[18:38] <mrjack> re
[18:44] * rekby (~Adium@2.93.58.253) has joined #ceph
[18:45] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[18:46] * mcclurmc_laptop (~mcclurmc@62.205.79.211) has joined #ceph
[18:47] * sleinen1 (~Adium@2001:620:0:26:18c5:dc9a:d7a2:494a) has joined #ceph
[18:52] * loicd (~loic@67.131.102.78) has joined #ceph
[18:53] <rekby> hello, I have a test cluster with one mon and 2 osds. I created the next osd (osd.2), added it to the crush map, added its auth key to ceph, and ran
[18:53] <rekby> ceph osd in 2, then started the osd process on the node, but it doesn't change status from down to up
[18:53] <rekby> I have turned off the firewall on all nodes and each can connect to the others by telnet
[18:54] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[18:54] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Remote host closed the connection)
[18:55] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[18:55] <rekby> I have no errors in the log
[19:34] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:35] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Quit: Run away!)
[19:36] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[19:38] * rustam (~rustam@94.15.91.30) has joined #ceph
[19:42] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[19:43] <Kdecherf> hm, I have a corrupted mon during upgrade from 0.56.1 to 0.60
[19:44] <Kdecherf> does anyone know a way to reinit a mon?
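
If other healthy monitors are still in quorum, one rough approach from that era is to drop the broken monitor and rebuild its data directory from the survivors (names, paths and the address are placeholders):

    ceph mon remove b
    rm -rf /var/lib/ceph/mon/ceph-b
    ceph mon getmap -o /tmp/monmap
    ceph-mon -i b --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph mon add b 10.0.0.12:6789
    service ceph start mon.b

With only a single monitor, a corrupted store generally means restoring from a backup of the mon data directory instead.
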
[19:49] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[19:54] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Ping timeout: 480 seconds)
[19:57] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[19:57] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Max SendQ exceeded)
[20:07] * mikedawson (~chatzilla@253.3.63.69.dyn.southslope.net) has joined #ceph
[20:13] * rekby (~Adium@2.93.58.253) Quit (Quit: Leaving.)
[20:13] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[20:26] * mikedawson (~chatzilla@253.3.63.69.dyn.southslope.net) Quit (Ping timeout: 480 seconds)
[20:28] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[20:32] * joao (~JL@89.181.149.4) has joined #ceph
[20:32] * ChanServ sets mode +o joao
[20:33] * judu (~judu@abo-241-183-68.mtp.modulonet.fr) has joined #ceph
[20:55] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) Quit (Quit: themgt)
[20:56] * mikedawson (~chatzilla@253.3.63.69.dyn.southslope.net) has joined #ceph
[20:57] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[21:05] * mikedawson_ (~chatzilla@253.3.63.69.dyn.southslope.net) has joined #ceph
[21:06] * mikedawson__ (~chatzilla@253.3.63.69.dyn.southslope.net) has joined #ceph
[21:07] * humbolt (~elias@62-46-145-13.adsl.highway.telekom.at) Quit (Quit: humbolt)
[21:13] * mikedawson (~chatzilla@253.3.63.69.dyn.southslope.net) Quit (Ping timeout: 480 seconds)
[21:13] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[21:14] * mikedawson_ (~chatzilla@253.3.63.69.dyn.southslope.net) Quit (Ping timeout: 480 seconds)
[21:15] * mikedawson__ (~chatzilla@253.3.63.69.dyn.southslope.net) Quit (Ping timeout: 480 seconds)
[21:19] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Quit: Leaving.)
[21:20] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:24] * eschnou (~eschnou@131.167-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[21:25] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:25] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[21:26] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[21:26] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:28] <mrjack> is there a way to prioritize io?
[21:28] <mrjack> when using ocfs2 on top of rbd and then saturating all ceph osd disk i/o, ocfs2 is going to do machine_reset() after hitting the timeout for the on-disk heartbeat..
[21:29] <mrjack> is there a way to say e.g. small writes have priority?
[21:29] * dxd828_ (~dxd828@host86-133-106-40.range86-133.btcentralplus.com) has joined #ceph
[21:34] * leseb1 (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) has joined #ceph
[21:34] * leseb (~Adium@pha75-6-82-226-32-84.fbx.proxad.net) Quit (Read error: Connection reset by peer)
[21:38] * rustam (~rustam@94.15.91.30) has joined #ceph
[21:40] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[21:45] * BillK (~BillK@124.150.42.26) has joined #ceph
[21:53] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[21:54] * mikedawson (~chatzilla@253.3.63.69.dyn.southslope.net) has joined #ceph
[21:57] * dxd828_ (~dxd828@host86-133-106-40.range86-133.btcentralplus.com) Quit (Quit: Computer has gone to sleep.)
[21:58] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Quit: Run away!)
[21:59] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[21:59] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[21:59] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Max SendQ exceeded)
[22:05] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[22:07] * DarkAceZ (~BillyMays@50.107.54.92) has joined #ceph
[22:24] * dxd828_ (~dxd828@host86-133-106-40.range86-133.btcentralplus.com) has joined #ceph
[22:31] * LeaChim (~LeaChim@90.214.200.177) Quit (Ping timeout: 480 seconds)
[22:31] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[22:33] * DarkAce-Z (~BillyMays@50.107.54.92) has joined #ceph
[22:38] * DarkAceZ (~BillyMays@50.107.54.92) Quit (Read error: Operation timed out)
[22:41] * LeaChim (~LeaChim@176.27.223.143) has joined #ceph
[22:42] * rustam (~rustam@94.15.91.30) has joined #ceph
[22:44] * rustam (~rustam@94.15.91.30) Quit (Remote host closed the connection)
[22:44] * loicd (~loic@67.131.102.78) Quit (Ping timeout: 480 seconds)
[22:59] * humbolt (~elias@62-46-145-13.adsl.highway.telekom.at) has joined #ceph
[23:05] * BillK (~BillK@124.150.42.26) Quit (Ping timeout: 480 seconds)
[23:15] * mikedawson (~chatzilla@253.3.63.69.dyn.southslope.net) Quit (Ping timeout: 480 seconds)
[23:35] <Kdecherf> oh god, we spotted #4521 on our cluster
[23:43] * dxd828_ (~dxd828@host86-133-106-40.range86-133.btcentralplus.com) Quit (Quit: Textual IRC Client: www.textualapp.com)
[23:50] * eschnou (~eschnou@131.167-201-80.adsl-dyn.isp.belgacom.be) Quit (Read error: Operation timed out)
[23:57] <humbolt> for some weird reason my osd stays down after I start the respective osd service
[23:57] <humbolt> service ceph -a start osd.5
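
When an OSD stays down like that, running the daemon in the foreground with debugging usually shows why; the id here just matches the command above:

    ceph-osd -i 5 -d -c /etc/ceph/ceph.conf
    # or check /var/log/ceph/ceph-osd.5.log and compare against 'ceph osd tree'
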
[23:59] * rustam (~rustam@94.15.91.30) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.