#ceph IRC Log


IRC Log for 2012-10-18

Timestamps are in GMT/BST.

[0:00] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:00] <sjust> renzhi: can you grep the log for 7fad80029700 and send the last 10000 lines of that?
[0:00] <renzhi> ?
[0:01] <sjust> grep '7fad80029700' <log file name> | tail -n 10000 > log
[0:01] <sjust> and upload log (gzipped)
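The grep-and-tail step sjust asks for can also be sketched in Python, which may help when the log is too large to handle comfortably in a shell pipeline. This is a minimal sketch: the function name `grep_tail` and the sample log lines are hypothetical; only the thread id `7fad80029700` and the 10000-line limit come from the conversation.

```python
from collections import deque
from typing import Iterable, List


def grep_tail(lines: Iterable[str], needle: str, keep: int = 10000) -> List[str]:
    """Return the last `keep` lines containing `needle`, in original order."""
    # A deque with maxlen silently drops the oldest match once full,
    # which mirrors `grep needle file | tail -n keep`.
    matches = deque((line for line in lines if needle in line), maxlen=keep)
    return list(matches)


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "7fad80029700 10 filestore op A",
        "7fad7f828700 10 osd tick",
        "7fad80029700 10 filestore op B",
        "7fad80029700 10 filestore op C",
    ]
    for line in grep_tail(sample, "7fad80029700", keep=2):
        print(line)
```

For a real log, passing an open file object as `lines` streams the whole file, so the filter is applied before the tail, which is the point sjust makes a few lines later.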
[0:01] <renzhi> hang on
[0:03] <renzhi> sjust: log2.gz uploaded
[0:04] <sjust> renzhi: you need to grep the whole log
[0:04] <sjust> the problem is that the information I need is earlier than the 10000 line log you sent originally
[0:05] <renzhi> I did grep on the whole log file,
[0:05] <sjust> oh, then we are missing some
[0:05] <sjust> or it did
[0:05] <sjust> hmm
[0:06] <sjust> can you post the contents of dmesg?
[0:06] <sjust> it sounds like your underlying filesystem is corrupt
[0:06] <renzhi> let me grep on the log of another osd
[0:06] <renzhi> ah no
[0:06] <renzhi> not the same thing
[0:06] <sjust> that thread id is from that osd
[0:06] <sjust> won't work
[0:06] <renzhi> yeah
[0:07] <sjust> not that we can't do the same thing with another osd, but it seems like the thread locked up as soon as it tried to touch the fs
[0:07] <sjust> can you post the first 1000 lines of the log?
[0:07] <renzhi> we are trying some test to see if it's a hw issue
[0:07] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[0:07] <renzhi> ok
[0:07] <sjust> renzhi: that is also plausible given the symptoms
[0:07] <sjust> but the number of osds involved makes me doubt that
[0:08] * Ryan_Lane (~Adium@ has joined #ceph
[0:08] <sjust> how many machines are the down osds spread over?
[0:08] <renzhi> 4
[0:08] <sjust> and are there up osds on those machines
[0:08] <sjust> ?
[0:08] <renzhi> on two machines, all 10 osds are down
[0:08] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Remote host closed the connection)
[0:09] <renzhi> I'm not sure it's hardware issue, because that would mean all 10 hd are bad at the same time.
[0:09] <renzhi> these are new machines that have been running for 3 months
[0:09] <sjust> or a controller
[0:09] * aliguori (~anthony@ Quit (Read error: Connection reset by peer)
[0:09] <renzhi> you mean a RAID controller?
[0:10] <sjust> this is a bit outside of my area, but yeah
[0:10] <sjust> we seem to set up our nodes with raid controllers in jbod mode for the most part
[0:10] <sjust> there are 50 defunct osds over 4 machines, right?
[0:10] <renzhi> we don't have RAID, as per recommendation by the ceph docs
[0:11] <renzhi> about 30 somethings
[0:11] * loicd (~loic@ Quit (Quit: Leaving.)
[0:12] <sjust> so 30 something down osds over 4 nodes?
[0:15] <renzhi> yes
[0:15] <renzhi> sjust: uploaded head-10k.log.gz
[0:16] <renzhi> the first 10000 lines of the log file
[0:17] <sjust> did you get a chance to upload the output of dmesg?
[0:18] <renzhi> oh
[0:19] <renzhi> uploaded dmesg.txt
[0:21] <renzhi> we lost even more osds now, only 22 left, out of 76 !
[0:21] <sjust> ok, can you upload out from grep -v 'read_log' <log file> | head -n 10000 > out
[0:21] <sjust> ?
[0:21] <renzhi> ok
[0:23] <renzhi> uploaded grep-log-10K.gz
[0:24] <elder> sagewk, the bug I have a fix for has to do with the function that determines when a page can be added to an existing bio. Our function is indicating it cannot in many cases when it can--like, many many times repeatedly.
[0:24] <elder> The consequence would be all those non-merged requests might go out as single operations rather than as one big coalesced operation.
[0:25] <sagewk> elder: nice.. that sounds like a win.
[0:25] * cdblack (c0373626@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[0:25] <sagewk> we have seen lots of small ios coming through that seemed wrong, fwiw
[0:25] <elder> But, it didn't make the problem go away...
[0:25] <sagewk> the crash you mean...
[0:25] <elder> Yes.
[0:25] <elder> It took two tries to crash it rather than one.
[0:25] <elder> Half as much!
[0:25] <sagewk> not reproducible with uml?
[0:25] <elder> Progress!
[0:25] <sagewk> :)
[0:25] <elder> I'll give that a try.
[0:25] <elder> Now that I have a reproduction.
[0:26] <elder> I've been more looking at how the code works rather than trying repeatedly to reproduce it. That and discussing things with Dan.
[0:26] <elder> I'll give it a shot now though, I think it will help.
[0:29] * vata (~vata@2607:fad8:4:0:d0ef:740b:1c51:6a09) Quit (Ping timeout: 480 seconds)
[0:29] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[0:30] <elder> Oh yeah, I did try with UML but I was running out of disk space.
[0:30] * PerlStalker (~PerlStalk@ Quit (Quit: rcirc on GNU Emacs 24.2.1)
[0:30] <elder> That doesn't make sense. Hmm.
[0:33] <elder> Yes, I can easily reproduce the problem with UML, sagewk. Sweet!
[0:33] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) has joined #ceph
[0:33] <sagewk> nice!
[0:33] <sagewk> ...with debugging cranked up to crazy?
[0:34] <dmick> "set phasers to crazy"
[0:35] <sagewk> turn it to 11!
[0:37] <Q310> we're giving it all she's got, captain
[0:37] <calebamiles> :)
[0:37] <sjust> renzhi: did you get a core dump?
[0:38] <renzhi> sjust: no
[0:38] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has joined #ceph
[0:38] <iggy> i didn't know uml still existed
[0:38] <renzhi> I got core dump from mon a couple of times, but not osd
[0:38] <elder> Hmmm.
[0:39] <elder> Now it isn't happening with GDB attached.
[0:40] * steki-BLAH (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:40] <elder> Yes it did! It helps to send your output to the right file system.
[0:40] <elder> Sorry for the emotional ride.
[0:40] * justinwarner (~Thunderbi@ has joined #ceph
[0:43] <rweeks> fasten your seat belts, it's going to be a bumpy chat?
[0:43] <rweeks> ;)
[0:45] <sjust> renzhi: try adding the following lines to the osd section of your ceph.conf's:
[0:45] <sjust> filestore op thread timeout 600
[0:45] <sjust> filestore op thread suicide timeout 18000
[0:45] <sjust> oops
[0:45] <renzhi> this would just lengthen the time out time?
[0:45] <sjust> filestore op thread timeout = 600
[0:46] <sjust> filestore op thread suicide timeout = 18000
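The two corrected settings belong in the `[osd]` section of ceph.conf. A minimal sketch of rendering that section with Python's standard `configparser` follows; the helper name `render_osd_timeouts` is hypothetical, and the values are the ones sjust gives above. It only produces text and does not touch any real configuration file.

```python
import configparser
import io


def render_osd_timeouts(op_timeout: int = 600, suicide_timeout: int = 18000) -> str:
    """Render an [osd] section carrying the two filestore timeouts."""
    cfg = configparser.ConfigParser()
    cfg["osd"] = {
        # Option names and values as quoted in the conversation.
        "filestore op thread timeout": str(op_timeout),
        "filestore op thread suicide timeout": str(suicide_timeout),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_osd_timeouts())
```

As comes up later in the log, daemons that are already running keep their old timeouts, so the change only applies to OSDs restarted after the edit.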
[0:46] <sjust> did you very recently create a pool with a lot of pgs?
[0:46] <renzhi> yeah
[0:46] * synapsr (~synapsr@ has joined #ceph
[0:46] <sjust> that's the problem
[0:46] <sjust> it should be ok
[0:46] <renzhi> why is that?
[0:46] <sjust> well, there is per-pg overhead
[0:46] <sjust> hang on
[0:47] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[0:47] <renzhi> the default value is too low, so disk usage is very unbalanced, data were concentrated on only 20 disks or so
[0:47] <sjust> yeah
[0:47] <sjust> hang on
[0:47] <renzhi> ok
[0:48] <sjust> how large is pool 18?
[0:48] <sjust> ideally, you want around 100-200 pgs per osd
[0:49] <renzhi> we plan to add more machines and a lot more osds soon, and that's why
[0:49] * tnt (~tnt@246.121-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[0:50] <sjust> newer versions have an easier time with larger pools, but currently using too many pgs per osd results in certain filesystem transactions taking a very long time
[0:50] <renzhi> pool 18 is the largest, with over 5 millions objects
[0:50] <sjust> anyway, it should be mostly ok if you increase the timeout
[0:50] <sjust> yeah, but how many pgs?
[0:51] * synapsr (~synapsr@ has joined #ceph
[0:51] <sjust> you can see it at the top of ceph osd dump, I think
[0:52] <renzhi> pool 18 'yunio' rep size 3 crush_ruleset 0 object_hash rjenkins pg_num 200000 pgp_num 200000 last_change 2496 owner 0
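The pool line renzhi pasted from `ceph osd dump` can be pulled apart programmatically to read off `pg_num`. The sketch below assumes the line layout shown above (a quoted pool name, the two-word key `rep size`, then key/value pairs); other Ceph versions may print pool lines differently, and the function name `parse_pool_line` is hypothetical.

```python
import re
from typing import Dict

# Matches: pool <id> '<name>' <rest of the line>
POOL_RE = re.compile(r"^pool (?P<id>\d+) '(?P<name>[^']*)' (?P<rest>.*)$")


def parse_pool_line(line: str) -> Dict[str, str]:
    """Split one pool line from `ceph osd dump` into a dict of strings."""
    m = POOL_RE.match(line.strip())
    if m is None:
        raise ValueError("not a pool line")
    fields = {"id": m.group("id"), "name": m.group("name")}
    tokens = m.group("rest").split()
    # "rep size 3" uses a two-word key, so handle it before pairing the rest.
    if len(tokens) >= 3 and tokens[:2] == ["rep", "size"]:
        fields["rep_size"] = tokens[2]
        tokens = tokens[3:]
    # The remaining tokens are assumed to come in key/value pairs.
    for key, value in zip(tokens[0::2], tokens[1::2]):
        fields[key] = value
    return fields


if __name__ == "__main__":
    line = ("pool 18 'yunio' rep size 3 crush_ruleset 0 object_hash rjenkins "
            "pg_num 200000 pgp_num 200000 last_change 2496 owner 0")
    info = parse_pool_line(line)
    print(info["name"], info["rep_size"], info["pg_num"])
```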
[0:52] * Kioob (~kioob@luuna.daevel.fr) Quit (Ping timeout: 480 seconds)
[0:53] <sjust> ok, try increasing the timeouts
[0:53] <sjust> it's mostly a problem on startup, it should be better once the cluster stabilizes
[0:53] <renzhi> ok
[0:53] * vata (~vata@ has joined #ceph
[0:53] <sjust> how many pgs do the other pools have?
[0:54] <renzhi> the others have the default values
[0:55] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Quit: Leaving)
[0:55] <sjust> ok
[0:55] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[0:56] <sjust> this is troubling, 10000 pgs per osd shouldn't be this much of a problem
[0:56] <sjust> though it is on the high side
[0:56] <sjust> we'll see how it behaves with the increased timeouts
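The per-OSD load being discussed can be estimated with simple arithmetic. The sketch below assumes that each placement group is counted once per replica and that replicas are spread evenly over all OSDs; that is one plausible reading of the "pgs per osd" figure, not a statement of how sjust computed it. The inputs (pg_num 200000, rep size 3, 76 OSDs, a target of 100 to 200 per OSD, a planned 2000 OSDs) are taken from the conversation; the function names are hypothetical.

```python
def pg_replicas_per_osd(pg_num: int, rep_size: int, num_osds: int) -> float:
    """Average number of PG replicas each OSD hosts, assuming an even spread."""
    return pg_num * rep_size / num_osds


def pg_num_for_target(num_osds: int, rep_size: int, target_per_osd: int) -> int:
    """pg_num that yields roughly `target_per_osd` PG replicas per OSD."""
    return (num_osds * target_per_osd) // rep_size


if __name__ == "__main__":
    # Pool 18 on the current 76-OSD cluster.
    print(round(pg_replicas_per_osd(200000, 3, 76)))
    # The same pool once the cluster grows to the planned 2000 OSDs.
    print(round(pg_replicas_per_osd(200000, 3, 2000)))
    # pg_num range matching the 100-200 per OSD guideline at 76 OSDs.
    print(pg_num_for_target(76, 3, 100), pg_num_for_target(76, 3, 200))
```

Under these assumptions the pool is sized for the planned cluster rather than the current one, which matches sjust's remark that the choice was reasonable on renzhi's part.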
[0:56] <renzhi> we were planning to have about 1000 to 2000 osds
[0:59] * synapsr (~synapsr@ Quit (Ping timeout: 480 seconds)
[0:59] <sjust> yeah, it's reasonable on your part, unreasonable on ours
[1:00] <sjust> the good news is that 0.50 greatly improved efficiency for exactly this sort of thing
[1:00] <sjust> so when bobtail is good to go, the situation should improve drastically
[1:00] <renzhi> yeah
[1:01] <sjust> but even now, it should behave ok
[1:01] <sjust> with the increased timeouts
[1:01] * synapsr (~synapsr@ has joined #ceph
[1:01] <renzhi> I added the two timeout params, started that osd.18, and it seems to have no effect, or should I wait longer?
[1:02] * loicd (~loic@ has joined #ceph
[1:02] <sjust> it could take a while to come back up
[1:02] <sjust> if you see the process die, that would be a bad sign
[1:02] <renzhi> ok, I'm keeping an eye
[1:02] <renzhi> darn, it's been 8 hours of down time ....
[1:03] * pentabular (~sean@adsl-70-231-129-172.dsl.snfc21.sbcglobal.net) has left #ceph
[1:04] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:07] * loicd1 (~loic@ has joined #ceph
[1:07] * loicd (~loic@ Quit (Read error: Connection reset by peer)
[1:08] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[1:08] * LarsFronius (~LarsFroni@95-91-242-153-dynip.superkabel.de) Quit (Quit: LarsFronius)
[1:12] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) Quit (Remote host closed the connection)
[1:13] * vata (~vata@ Quit (Ping timeout: 480 seconds)
[1:15] * sagelap1 (~sage@2607:f298:a:607:572:128b:51f4:d83) Quit (Quit: Leaving.)
[1:15] * sagelap (~sage@ has joined #ceph
[1:17] * maelfius1 (~mdrnstm@206.sub-70-197-141.myvzw.com) Quit (Quit: Leaving.)
[1:18] <renzhi> sjust: the number of up osds is stuck at 22, the # of in osd increases a bit, and falls again, at 31
[1:18] * Meths (rift@ has joined #ceph
[1:19] * sagelap (~sage@ Quit ()
[1:19] * sagelap (~sage@ has joined #ceph
[1:20] * lofejndif (~lsqavnbok@9KCAACE83.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:21] * sagelap (~sage@ Quit ()
[1:21] * sagelap (~sage@ has joined #ceph
[1:25] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Quit: Leaving)
[1:29] * vata (~vata@ has joined #ceph
[1:30] <sjust> renzhi: are you seeing the processes dying?
[1:30] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[1:30] <renzhi> sjust: after adding the two timeout params, the processes started after are still going, but those that were running started to die
[1:32] <sjust> well, the ones that were already running don't have the updated timeouts
[1:33] <renzhi> yeah
[1:33] <sjust> are the ones you have restarted with the long timeouts set to up and in yet?
[1:33] <rweeks> restart the other processes then?
[1:33] <sjust> yeah, that's the thing to do
[1:34] <renzhi> yeah, they are still running, so I'm going slowly to see how they would stand
[1:34] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) Quit (Remote host closed the connection)
[1:34] <sjust> yeah, good plan
[1:35] * catza (~catza@ has joined #ceph
[1:36] <renzhi> I'm having hope again :)
[1:38] * tryggvil (~tryggvil@ has joined #ceph
[1:38] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:38] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) has joined #ceph
[1:38] * loicd1 (~loic@ Quit (Quit: Leaving.)
[1:41] * sagelap (~sage@ Quit (Quit: Leaving.)
[1:41] * sagelap (~sage@ has joined #ceph
[1:46] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[1:53] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) has joined #ceph
[1:53] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[1:58] * loicd (~loic@ has joined #ceph
[1:58] * loicd (~loic@ Quit ()
[2:02] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[2:03] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) Quit (Remote host closed the connection)
[2:04] * miroslavk (~miroslavk@ Quit (Quit: Leaving.)
[2:04] * loicd (~loic@ has joined #ceph
[2:07] * loicd1 (~loic@ has joined #ceph
[2:07] * loicd (~loic@ Quit ()
[2:10] <renzhi> sjust: it seems to hold up for some osds, but as the number of osds up goes over 50, it does not seem to go further, and some osds start to die again
[2:10] * Tv_ (~tv@ Quit (Quit: Tv_)
[2:10] * loicd1 (~loic@ Quit ()
[2:10] <sjust> do those osds have the updated timeout value?
[2:11] <sjust> can you send the last 10k lines of one of the newly dead?
[2:12] <renzhi> yeah
[2:12] * loicd (~loic@ has joined #ceph
[2:13] * loicd1 (~loic@ has joined #ceph
[2:13] * loicd (~loic@ Quit ()
[2:13] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) has joined #ceph
[2:16] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[2:16] <renzhi> sjust: just uploaded osd23.log
[2:17] <renzhi> at one point, the number of osds up went up to 56, now went down below 50 again
[2:17] * loicd1 (~loic@ Quit ()
[2:18] <sjust> can you restart that node with debug osd = 20, debug filestore = 20, debug ms = 1 ?
[2:19] <renzhi> hang on
[2:20] * synapsr (~synapsr@ has joined #ceph
[2:21] <renzhi> ok, all 10 osds on one machine went down again, running with debug osd = 20 and debug ms = 1
[2:21] <renzhi> you want the last 10K lines again?
[2:22] <sjust> yeah
[2:23] <renzhi> uploaded osd10.10K.log.gz
[2:24] * tryggvil (~tryggvil@ Quit (Ping timeout: 480 seconds)
[2:27] <sjust> renzhi: that one doesn't seem to have crashed
[2:28] <sjust> you'll need to wait for it to crash
[2:28] <renzhi> no, however, the process just disappeared
[2:28] * sagelap1 (~sage@109.sub-70-197-145.myvzw.com) has joined #ceph
[2:29] <sjust> that's odd
[2:29] <sjust> did any of them crash again?
[2:29] <renzhi> it was running for over an hour, but as I was gradually bringing up others, all osds on this machine just disappeared
[2:29] <sjust> how much free memory have you had on your nodes?
[2:30] <sjust> if the nodes were running out of memory, they might have been oom killed?
[2:30] <gregaf> pretty sure OOM-killer shows up in dmesg, right?
[2:30] <sjust> anything in dmesg?
[2:30] <sjust> yeah
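Checking dmesg for OOM-killer activity, as gregaf and sjust suggest, can be automated. The sketch below scans saved dmesg text for phrases the Linux kernel commonly logs when it kills a process for lack of memory; the exact wording varies between kernel versions, so the patterns are an assumption, and the sample text in the usage example is hypothetical rather than taken from renzhi's upload.

```python
import re
from typing import List

# Phrases commonly seen in kernel OOM reports; wording differs across versions.
OOM_RE = re.compile(
    r"invoked oom-killer|Out of memory: Kill(ed)? process|Killed process",
    re.IGNORECASE,
)


def find_oom_lines(dmesg_text: str) -> List[str]:
    """Return the dmesg lines that look like OOM-killer reports."""
    return [line for line in dmesg_text.splitlines() if OOM_RE.search(line)]


if __name__ == "__main__":
    # Hypothetical dmesg excerpt, for illustration only.
    sample = "\n".join([
        "[1000.1] XFS (sdb1): some unrelated message",
        "[1000.2] ceph-osd invoked oom-killer: gfp_mask=0x201da, order=0",
        "[1000.3] Out of memory: Kill process 4242 (ceph-osd) score 900",
    ])
    for line in find_oom_lines(sample):
        print(line)
```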
[2:31] <renzhi> uploaded the new dmesg.txt
[2:31] * sagelap (~sage@ Quit (Ping timeout: 480 seconds)
[2:32] <sjust> the dmesg has a bunch of backtraces
[2:32] <renzhi> yeah
[2:32] <sjust> looks like something wrong with xfs
[2:33] <sjust> try rebooting the node and trying again?
[2:36] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[2:37] * loicd (~loic@ has joined #ceph
[2:39] * loicd (~loic@ Quit ()
[2:44] * oxhead (~oxhead@cpe-075-182-099-083.nc.res.rr.com) Quit (Remote host closed the connection)
[2:55] <sjust> renzhi: I am going home, I'll be back online in around 1 hour
[2:59] <renzhi> sjust: thanks a lot
[2:59] <renzhi> still struggling...
[3:03] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[3:04] * joshd (~joshd@ Quit (Ping timeout: 480 seconds)
[3:13] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[3:15] <Q310> anyone know if this is correct when using rbd with openstack/libvirt, /var/log/libvirt/qemu, instance log, "/var/lib/nova/instances/instance-00000002/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1" ? Trying to find why its not creating my volumes when i launch a instance...
[3:16] <renzhi> anyone can provide emergency tech support? :(
[3:17] <Q310> i'm not a guru however explain your problem and someone might speak up :)
[3:19] <renzhi> our osds have been crashing, they are running for a couple hours and crash
[3:19] <renzhi> we have 76 osds, but we can't bring up over 50. As soon as the number goes over 50, they seem to crash randomly
[3:20] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[3:20] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[3:20] <renzhi> a few nice folks have been helping, but they still keep crashing
[3:21] * sagelap1 (~sage@109.sub-70-197-145.myvzw.com) Quit (Ping timeout: 480 seconds)
[3:21] <Q310> you could try inktank directly i guess
[3:22] <gregaf> I've got to run soon, but:
[3:22] <gregaf> Q310: that command-line isn't using rbd at all
[3:23] <gregaf> not sure what you're trying to do, but it's definitely using a qcow2 image
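gregaf's reading of the libvirt log can be reproduced mechanically: a QEMU `-drive` option that uses RBD names its backing store as `file=rbd:pool/image`, whereas the pasted one points at a local path with `format=qcow2`. The sketch below is a simplified parser (it ignores QEMU's `,,` escape for literal commas). It assumes the pasted fragment was preceded by `file=`, which the log line cuts off, and the RBD example string in the usage is hypothetical.

```python
from typing import Dict


def parse_drive_option(option: str) -> Dict[str, str]:
    """Split a QEMU -drive option string into key/value pairs (simplified)."""
    fields: Dict[str, str] = {}
    for part in option.split(","):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that carry no '='
            fields[key] = value
    return fields


def drive_uses_rbd(option: str) -> bool:
    """True when the drive's file= value uses QEMU's rbd: prefix."""
    return parse_drive_option(option).get("file", "").startswith("rbd:")


if __name__ == "__main__":
    local = ("file=/var/lib/nova/instances/instance-00000002/disk,if=none,"
             "id=drive-virtio-disk0,format=qcow2,cache=none")
    # Hypothetical pool and image names.
    remote = "file=rbd:volumes/volume-0001,if=none,format=raw,cache=none"
    print(drive_uses_rbd(local), drive_uses_rbd(remote))
```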
[3:23] <renzhi> we tried to contact inktank previously, but no response though
[3:23] <Q310> gregaf: that's what I thought, very weird :\
[3:23] <renzhi> anyone from inktank here now?
[3:23] <renzhi> we need to resolve the issues badly
[3:23] <scuttlemonkey> Renzhi, I can help find you someone
[3:23] * synapsr (~synapsr@ has joined #ceph
[3:23] <renzhi> scuttlemonkey: thanks
[3:24] <Q310> renzhi: you could try asking for Dona Holmberg, she's been a great help to me
[3:24] <Q310> with inquiries etc
[3:24] <nhm> renzhi: I'm around, not sure if I can help though...
[3:25] <Q310> gregaf: that was just a cut of the libvirt log
[3:25] <scuttlemonkey> heh, I was actually just on the phone w/ Dona
[3:25] <scuttlemonkey> she said to give you her email and she can get you the help you need asap
[3:25] <renzhi> nhm: how can we work out?
[3:25] <gregaf> Sam'll be back on in a bit
[3:27] <Q310> dona@inktank.com
[3:27] * miroslavk (~miroslavk@ has joined #ceph
[3:27] <nhm> It gets dark so early these days. Feel like bed time already.
[3:27] <nhm> By winter it's going to be dark at like 2:30pm PST. :)
[3:27] <Q310> sleep's overrated
[3:28] <Q310> apparently
[3:28] * Cube1 (~Cube@ has joined #ceph
[3:28] <joao> nhm, damn...
[3:28] <nhm> Q310: Sleep hasn't been overrated since college. ;)
[3:28] <Q310> nhm: true, could be why on the weekends I sleep about 12 hours straight ha
[3:29] <joao> speaking about sleep, better head to bed
[3:29] <nhm> I need at least a good 7 hours these days, and prefer 8.
[3:29] <nhm> joao: yes, isn't it like 4am there?
[3:29] <Q310> <nod>
[3:30] <joao> naa, it's only 2h30
[3:30] * Cube2 (~Cube@ has joined #ceph
[3:30] * danieagle (~Daniel@ has joined #ceph
[3:30] <nhm> joao: good thing you guys have decent coffee.
[3:30] <joao> I guess that helps, yes :p
[3:31] <joao> I meant to call it a day earlier, but then I started watching yesterday's presidential debate and time just went by
[3:32] <joao> well, night #ceph
[3:32] <nhm> joao: good night
[3:33] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[3:33] <Q310> anyone else managed to do a folsom install and have ceph working 100%?
[3:34] * calebamiles1 (~caleb@65-183-128-164-dhcp.burlingtontelecom.net) has joined #ceph
[3:34] * calebamiles (~caleb@65-183-128-164-dhcp.burlingtontelecom.net) Quit (Read error: Connection reset by peer)
[3:34] <Q310> this is really weird, glance is using rbd no worries, if I do a "cinder create --display-name test 10" it creates an rbd volume
[3:35] <Q310> i guess its more of a openstack issue really
[3:35] * miroslavk (~miroslavk@ Quit (Ping timeout: 480 seconds)
[3:36] * Cube1 (~Cube@ Quit (Ping timeout: 480 seconds)
[3:39] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[3:39] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[3:40] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[3:44] * calebamiles1 (~caleb@65-183-128-164-dhcp.burlingtontelecom.net) Quit (Ping timeout: 480 seconds)
[3:52] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[3:53] <sjusthm> renzhi: hows it going?
[3:54] <renzhi> sjust: not well, osds continue to go down ....
[3:55] <renzhi> seems the timeout value just extends the time a bit and makes the problem take longer to appear
[3:55] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[3:55] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[3:55] <sjusthm> how many machines currently have down osds?
[3:56] <sjusthm> are the down osds clustered around different machines from before?
[3:56] <renzhi> all the machines have down osds, it's quite random
[3:56] <sjusthm> do you see a pattern in the osd logs?
[3:56] <sjusthm> is there a backtrace?
[3:56] <renzhi> not really
[3:57] * Cube2 (~Cube@ Quit (Quit: Leaving.)
[3:58] <sjusthm> so the osds are crashing without a backtrace?
[3:58] * chutzpah (~chutz@ Quit (Quit: Leaving)
[3:58] <renzhi> yeah, they suddenly consume a lot more RAM than normal
[3:59] <renzhi> so sometimes they got OOM
[4:00] <sjusthm> try bringing up 5 osds per node and wait for that to stabilize
[4:00] * Cube1 (~Cube@ has joined #ceph
[4:00] * Cube1 (~Cube@ Quit ()
[4:00] <renzhi> ok
[4:03] <sjusthm> and set:
[4:04] <sjusthm> ceph osd noout
[4:04] <sjusthm> ceph osd nodown
[4:04] <renzhi> what would that do?
[4:05] <sjusthm> that should prevent the mons from marking osds out or down
[4:05] <sjusthm> it'll prevent the problem from getting worse
[4:05] <sjusthm> while you bring up the nodes
[4:05] <renzhi> k
[4:05] <sjusthm> or, actually, just set ceph osd noout
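In current Ceph releases, cluster flags such as `noout` are toggled with `ceph osd set <flag>` and `ceph osd unset <flag>`; the lines above are shorthand that omits `set`. The sketch below only builds the argument list and does not run anything, so it can be inspected before being handed to `subprocess.run`. The helper name `osd_flag_command` and the restriction to the two flags named in the conversation are assumptions made for illustration.

```python
from typing import List

# Only the flags mentioned in the conversation are accepted in this sketch.
KNOWN_FLAGS = {"noout", "nodown"}


def osd_flag_command(flag: str, enable: bool = True) -> List[str]:
    """Build the argv for setting or clearing a cluster-wide OSD flag."""
    if flag not in KNOWN_FLAGS:
        raise ValueError(f"unexpected flag: {flag}")
    action = "set" if enable else "unset"
    return ["ceph", "osd", action, flag]


if __name__ == "__main__":
    # Set noout while OSDs are brought back up, then clear it afterwards.
    print(" ".join(osd_flag_command("noout")))
    print(" ".join(osd_flag_command("noout", enable=False)))
```

Clearing the flag once the cluster has stabilized matters, because while `noout` is set the monitors will not mark failed OSDs out and data will not be re-replicated away from them.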
[4:07] <renzhi> shutting down the osds leaves defunct now
[4:07] <sjusthm> what?
[4:07] <renzhi> defunct processes
[4:08] <renzhi> will reboot the machine first
[4:09] * Q310 (~Q@ip-121-0-1-110.static.dsl.onqcomms.net) Quit (Read error: Connection reset by peer)
[4:09] * Q310 (~Q@ip-121-0-1-110.static.dsl.onqcomms.net) has joined #ceph
[4:09] <sjusthm> ok
[4:11] <sjusthm> renzhi: try to bring in the osds a few at a time
[4:11] <sjusthm> renzhi: I'll be back in an hour or so
[4:12] <renzhi> sjusthm: thanks, will do
[4:18] <justinwarner> Under Designing a Cluster, (http://ceph.com/wiki/Designing_a_cluster) it gives examples at the bottom. I'm trying to connect up 30 machines (About 300-400 gigs each if that matters), would I want multiple monitors/servers? Or should I still stick with one of each and OSD's on the others?
[4:27] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[4:43] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) Quit (Remote host closed the connection)
[4:43] * dty (~derek@umiacs-vpn.pc.umiacs.umd.edu) has joined #ceph
[4:43] <phantomcircuit> hmm
[4:43] <phantomcircuit> i have some systems which have high write bandwidth
[4:43] <phantomcircuit> and others which have high read bandwidth
[4:44] <phantomcircuit> im guessing there isn't a way to have things write to one and then be migrated to the other slowly is there
[4:48] <sjusthm> renzhi: how is it going?
[4:56] * slang (~slang@ace.ops.newdream.net) Quit (Quit: slang)
[5:14] * sjustlaptop (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[5:55] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:55] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:02] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[6:02] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[6:05] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[6:05] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[6:10] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[6:11] * dty (~derek@umiacs-vpn.pc.umiacs.umd.edu) Quit (Quit: dty)
[6:12] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[6:12] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:22] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has joined #ceph
[6:24] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[6:24] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[6:29] * sagelap (~sage@ has joined #ceph
[6:35] * ballysta (0xff@ has joined #ceph
[6:37] * hky (0xff@ Quit (Ping timeout: 480 seconds)
[6:43] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[6:43] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[6:49] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[6:49] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[6:53] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[6:57] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[6:57] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[6:58] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[7:03] * grant (~grant@202-173-147-27.mach.com.au) has joined #ceph
[7:04] <grant> Hi all
[7:04] <grant> Has anyone tried feeding rbd images to freenas? or know if this can be done natively?
[7:04] <sjustlaptop> freenas?
[7:05] <grant> I wish to keep my ceph/storage backend pure and have "client"/"appliances" access storage and serve externally.
[7:05] <sjustlaptop> I don't quite follow
[7:06] * tnt (~tnt@246.121-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:06] <grant> :)
[7:06] <grant> Ok, I'll see if I can break it down in a way that makes sense.
[7:08] <grant> I wish to utilise "3 layers" "End user/application" / Appliance / Storage. I would like my ceph clusters to provide and feed storage to my appliances - which are accessible externally. End users or applications etc, will look to the appliance level for data access.
[7:08] <grant> This way, my cluster does not need to accommodate multiple services and export to many different "clients"
[7:08] <grant> Does that make sense?
[7:09] <grant> I would like to utilise FreeNAS as one of the appliances
[7:11] <sjustlaptop> well, if you run freenas on an rbd backed vm, that should work
[7:12] <sjustlaptop> does that make sense?
[7:13] <sjustlaptop> I don't know whether it has actually been done, but it should probably work
[7:13] <grant> That should work.
[7:14] <grant> I am also aware that Proxmox can provide rbd devices as storage for clients, but does not look like I can natively pass rbd devices through to freenas yet
[7:14] <grant> (I.e, it is not recognised as iscsi, nfs, smb etc are)
[7:14] <grant> :(
[7:14] * calebamiles (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) has joined #ceph
[7:15] <sjustlaptop> yeah, rbd has its own protocol
[7:33] * tnt (~tnt@246.121-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[7:35] * Cube1 (~Cube@ has joined #ceph
[7:44] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) has joined #ceph
[7:53] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[7:53] * loicd (~loic@ has joined #ceph
[7:56] * grant (~grant@202-173-147-27.mach.com.au) Quit (Quit: Leaving)
[7:57] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[7:57] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[8:31] <Q310> hrmm
[8:32] <Q310> openstack folsom is supposed to boot from rbd out of the box yeah?
[8:32] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[8:32] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[8:32] <sjustlaptop> Q310: that is my fuzzy recollection
[8:32] <Q310> some here
[8:32] <Q310> *same
[8:32] <Q310> silly keyboard
[8:34] <Q310> it's so annoying, I can create rbd's with cinder etc, I can even image an rbd from glance with cinder, but yet when I create a new vm with the dashboard it seems to know nothing about rbd
[8:34] <Q310> i can even create the rbd's in the dashboard as additional volumes no worries but not provision/boot from one
[8:42] <gregaf> Q310: I'm about to sleep, but are you actually telling them to boot from volume? I think that's still a checkbox you need to set in Folsom
[8:42] <gregaf> but I may be wrong or debugging at the wrong level...
[9:20] <Q310> fairy nuff
[9:20] <Q310> shall have a further look
[9:20] <Q310> seems to be that the image never gets created to the rbd vol
[9:21] <Q310> anyway get some sleep greg this can wait ;)
[9:24] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:26] * Leseb (~Leseb@ has joined #ceph
[9:31] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:34] <todin> morning
[9:36] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[9:37] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[9:37] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[9:38] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:40] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[9:40] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[9:55] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Ping timeout: 480 seconds)
[10:02] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[10:15] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[10:26] * Cube1 (~Cube@ Quit (Ping timeout: 480 seconds)
[10:32] * miroslavk (~miroslavk@ has joined #ceph
[10:35] * gaveen (~gaveen@ has joined #ceph
[10:40] * BManojlovic (~steki@ has joined #ceph
[10:44] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:47] * miroslavk (~miroslavk@ Quit (Quit: Leaving.)
[10:48] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[10:51] * synapsr (~synapsr@ has joined #ceph
[10:58] * MikeMcClurg1 (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[10:58] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Read error: Connection reset by peer)
[11:05] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) Quit (Remote host closed the connection)
[11:06] * miroslavk (~miroslavk@ has joined #ceph
[11:16] * MikeMcClurg1 (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Read error: Connection reset by peer)
[11:16] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[11:16] <jamespage> is there a nice way to tell osd's and mon's to stop/start using cephx authentication without restarting?
[11:18] * miroslavk (~miroslavk@ Quit (Quit: Leaving.)
[11:19] * Cube1 (~Cube@ has joined #ceph
[11:24] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Remote host closed the connection)
[11:26] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Read error: Connection reset by peer)
[11:26] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[11:27] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[11:34] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[11:34] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[11:44] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:45] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:46] <joao> jamespage, none I know of
[11:47] <joao> when it comes to the monitor, I'm pretty sure the whole keyring setup is done upon start
[11:50] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:53] <jamespage> joao, yeah - I'm still setting up the keys - but I wanted to see if I could disable cephx without interrupting the entire deployment
[11:56] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[12:04] * synapsr (~synapsr@ has joined #ceph
[12:06] <joao> jamespage, I don't think that's going to be possible either
[12:07] <jamespage> joao, no - that's what I thought....
[12:07] <joao> we could probably get around that, but I'm not sure what the implications would be off the top of my head
[12:07] <joao> and it would involve patching the daemons
[12:08] <jamespage> joao, it's really only a nice-to-have
[12:08] <jamespage> it's unlikely to ever happen IMHO
[12:24] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:24] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:29] * benner (~benner@ has joined #ceph
[12:29] <benner> hi
[12:37] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[12:38] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[12:38] * LarsFronius_ is now known as LarsFronius
[12:40] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[12:40] * Cube1 (~Cube@ Quit (Ping timeout: 480 seconds)
[12:43] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Remote host closed the connection)
[12:43] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Read error: Connection reset by peer)
[12:44] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[12:45] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit ()
[12:47] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) has joined #ceph
[12:48] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[12:58] * match (~mrichar1@pcw3047.see.ed.ac.uk) has joined #ceph
[13:00] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[13:02] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[13:03] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[13:04] * Cube1 (~Cube@ has joined #ceph
[13:09] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[13:09] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[13:11] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[13:13] <match> Quick docs question - when it says 'Ceph always uses an odd number of monitors' (http://ceph.com/docs/master/architecture/) does this mean that even given an even number of running mons, ceph will just opt to not use one, or is there an explicit requirement to only set up odd numbers of mons?
[13:13] * deepsa (~deepsa@ has joined #ceph
[13:19] <scalability-junk> afaik match you should only run odd numbers.
[13:21] <match> scalability-junk: It's obviously advisable - I was just hoping that it was clever enough for one mon to 'take a back seat' when there was an odd number, but jump in when mon failure had made the remaining number of mons even
[13:21] <andreask> match: odd numbers are recommended to limit the chance of having two cluster partitions, neither of which works because neither has quorum
[13:22] <andreask> ... ceph will also work with an even number of monitors
[13:23] <match> andreask: Yeah - I know why they're needed - I was just envisioning a situation where you have 9 mons, one fails, then the cluster partitions. If you'd had a 'spare one' it could join in when there were 8 mons, and help avoid the partitioning issue
[13:23] * loicd (~loic@ Quit (Ping timeout: 480 seconds)
[13:25] <andreask> match: all mons are active and use the Paxos algorithm to reach consensus about cluster membership
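A short aside on the odd-versus-even monitor question above. This is a minimal sketch, assuming only the usual strict-majority rule for quorum; the function names are illustrative and are not Ceph APIs.

```python
def quorum_size(num_mons: int) -> int:
    """Smallest strict majority of num_mons monitors."""
    return num_mons // 2 + 1


def tolerated_failures(num_mons: int) -> int:
    """How many monitors can be lost while a majority still remains."""
    return num_mons - quorum_size(num_mons)


if __name__ == "__main__":
    for n in (3, 4, 5, 8, 9):
        print(f"{n} mons: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this rule, 4 monitors tolerate the same single failure as 3, and 8 monitors split 4/4 leave neither side with the 5 needed for a majority, which is the situation match describes.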
[13:27] <match> andreask: Thanks - just making sure I get my head around how ceph does it - looking into issues of geographically separate parts of a cluster and considering the impacts of network separation etc (you'll probably see some related q's on the linux-ha list later :) )
[13:27] <andreask> ;-)
[13:28] <liiwi> hmm, is that feasible now?
[13:30] <match> liiwi: Currently looking at boothd (for pacemaker) which might tick some boxes
[13:33] <andreask> match: unfortunately there is no asynchronous replication available in ceph atm
[13:36] <match> Actually, that makes me think of a different question... Can you configure ceph (crush map?) to store each copy of a replicated object on specific groups of nodes? Again thinking of situations where you run 2 geographically separated halves of a pool, and want each half to have a copy of the data
[13:37] <scalability-junk> match, there was something I read online let me check
[13:38] <zynzel> match: yes you can.
[13:39] <zynzel> match: simply check your crush map ;)
[13:39] <zynzel> and search for the 'rule' definition, step chooseleaf firstn 0 type rack
[13:39] <zynzel> you can change rack to 'dc' or other, but you should provide low delay between DCs
[13:40] <scalability-junk> http://ceph.com/wiki/Custom_data_placement_with_CRUSH @match
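To illustrate the idea behind a rule such as "step chooseleaf firstn 0 type rack" (one leaf per distinct failure domain), here is a toy sketch. It is not the CRUSH algorithm; the topology, the hash-based scoring, and all names are hypothetical and only show the "one replica per rack or DC" outcome.

```python
import hashlib

# Hypothetical topology: failure domain (a 'dc' or 'rack') -> OSD ids in it.
TOPOLOGY = {
    "dc-a": [0, 1, 2],
    "dc-b": [3, 4, 5],
}


def _score(obj: str, item: str) -> int:
    """Deterministic pseudo-random score for an (object, item) pair."""
    digest = hashlib.sha256(f"{obj}/{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, replicas: int, topology=TOPOLOGY) -> list:
    """Pick `replicas` OSDs, each from a different failure domain."""
    if replicas > len(topology):
        raise ValueError("not enough failure domains for the replica count")
    # Rank the domains for this object, then take the best leaf in each.
    domains = sorted(topology, key=lambda d: _score(obj, d), reverse=True)
    chosen = []
    for domain in domains[:replicas]:
        best = max(topology[domain],
                   key=lambda osd: _score(obj, f"{domain}/osd.{osd}"))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    print(place("some-object", 2))
```

With two replicas and two domains, every object ends up with one copy in each half, which is the behaviour match asks about.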
[13:40] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Read error: Connection reset by peer)
[13:40] <match> hehe - I'm sure I looked at this before and couldn't find anything, but now when I search 'ceph crush map' the first result is that wiki page - oops!
[13:40] <scalability-junk> and there was a great article about it how to start with one rack and go up the ladder, but I couldn't find it.
[13:40] <scalability-junk> was quite interesting
[13:43] * joao (~JL@ Quit (Remote host closed the connection)
[13:44] * joao (~JL@89-181-147-186.net.novis.pt) has joined #ceph
[13:53] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[14:05] * tziOm (~bjornar@ has joined #ceph
[14:09] * Cube1 (~Cube@ Quit (Quit: Leaving.)
[14:24] * joao (~JL@89-181-147-186.net.novis.pt) Quit (Quit: Leaving)
[14:30] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[14:34] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[14:40] * idnc_sk (~idnc@ has joined #ceph
[14:40] <idnc_sk> Hi
[14:40] <idnc_sk> I'm having a strange issue on my ceph cluster
[14:40] <idnc_sk> tid 769 timed out on osd13, will reset os
[14:40] <idnc_sk> the strange thing is
[14:41] <idnc_sk> when I removed osd13 to do hd tests, I got the exact same issue on osd12
[14:42] <idnc_sk> cephfs mounted with a win2012 vm running a - slooooow - installation
[14:43] <idnc_sk> btw, off topic but are there any best practices on how to use ceph in regards to object/block storage and plain fs?
[14:44] * joao (~JL@89-181-147-186.net.novis.pt) has joined #ceph
[14:45] <idnc_sk> aaand if I have (500, 500, 500, 1TB)n HDD config - will crush take care of that 1tb hdd and replicate it in a ha fashion or should I use OSDs of the same size?
[14:45] * deepsa (~deepsa@ has joined #ceph
[14:47] <idnc_sk> ceph version 0.48.1argonaut (commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
[14:50] <idnc_sk> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/5633
[14:51] <idnc_sk> ok, will google further - the ro performance is sshfs over modem like, battery almost done, later
[14:51] * idnc_sk (~idnc@ has left #ceph
[15:08] * steki-BLAH (~steki@ has joined #ceph
[15:12] * BManojlovic (~steki@ Quit (Ping timeout: 480 seconds)
[15:22] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) has joined #ceph
[15:28] * nhorman (~nhorman@nat-pool-rdu.redhat.com) has joined #ceph
[15:30] * loicd (~loic@ has joined #ceph
[15:31] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[15:32] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[15:33] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[15:40] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[15:41] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Ping timeout: 480 seconds)
[15:43] * dty (~derek@pool-71-191-131-36.washdc.fios.verizon.net) Quit (Quit: dty)
[15:45] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[15:45] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[15:54] * steki-BLAH (~steki@ Quit (Ping timeout: 480 seconds)
[15:57] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[15:57] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[15:58] * PerlStalker (~PerlStalk@ has joined #ceph
[16:02] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[16:09] * dty (~derek@testproxy.umiacs.umd.edu) has joined #ceph
[16:11] * steki-BLAH (~steki@ has joined #ceph
[16:26] * steki-BLAH (~steki@ Quit (Ping timeout: 480 seconds)
[16:27] * rosco (~r.nap@ Quit (Quit: Changing server)
[16:27] * loicd (~loic@ Quit (Read error: Connection reset by peer)
[16:28] * cdblack (86868949@ircip4.mibbit.com) has joined #ceph
[16:29] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[16:29] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[16:32] * BManojlovic (~steki@ has joined #ceph
[16:33] * rosco (~r.nap@ has joined #ceph
[16:42] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[16:42] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[16:48] * tziOm (~bjornar@ Quit (Remote host closed the connection)
[16:51] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[17:08] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[17:08] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[17:10] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[17:15] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[17:15] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[17:17] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[17:20] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) has joined #ceph
[17:22] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[17:24] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:26] <cdblack> Ping... Question: Updating from 0.48.1 to 0.48.2 process, is there a link to instructions or is it a simple shutdown svcs, update, restart?
[17:35] * tziOm (~bjornar@ti0099a340-dhcp0778.bb.online.no) has joined #ceph
[17:38] * loicd (~loic@ has joined #ceph
[17:40] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:44] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[17:50] * miroslavk (~miroslavk@ has joined #ceph
[17:51] * loicd (~loic@ Quit (Quit: Leaving.)
[17:52] * sjustlaptop (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[17:55] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[17:58] * match (~mrichar1@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[17:59] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[18:02] * MikeMcClurg (~mike@client-7-193.eduroam.oxuni.org.uk) Quit (Ping timeout: 480 seconds)
[18:08] * tnt (~tnt@246.121-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:08] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[18:09] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[18:10] * loicd (~loic@ has joined #ceph
[18:10] * loicd (~loic@ Quit ()
[18:17] * loicd (~loic@ has joined #ceph
[18:18] * loicd (~loic@ Quit ()
[18:18] * loicd (~loic@ has joined #ceph
[18:22] * cowbell (~sean@ has joined #ceph
[18:23] * alphe (~asalas@ has joined #ceph
[18:23] <alphe> hello
[18:23] <gregaf> cdblack: yeah, just restart one daemon at a time (or all at once, but one-at-a-time leaves the cluster operating)
[18:24] <alphe> I have several issues with ceph client with 0.53 for a 0.52 ceph storage cluster
[18:25] <alphe> 1) I can't do mount -t ceph /mnt/ceph -o name=admin,secretefile=/etc/ceph/secret
[18:25] <alphe> adding ceph secret key to kernel failed: Invalid argument.
[18:25] <alphe> failed to parse ceph_options
[18:25] <alphe> like with the 0.52 client I copied the /etc/ceph/secret file
[18:25] <gregaf> "secretefile"? :)
[18:26] <alphe> without e ?
[18:26] <gregaf> wasn't sure if you were copying or typing
[18:26] <rweeks> yep, extraneous e
[18:26] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[18:27] <alphe> ok sorry when I wrote it back on my documentation (internal howtos It went wrong :)) )
[18:27] <alphe> ceph-fuse 175G 30G 146G 17% /mnt/ceph
[18:28] <alphe> 2) ceph-fuse works perfectly once you copied to the client the /etc/hosts and the /etc/ceph content from the MDS
[18:28] <alphe> but I still have a wrong sizing ><
[18:28] <alphe> ceph-fuse 175G 30G 146G 17% /mnt/ceph
[18:28] <gregaf> what are you expecting to see?
[18:28] <alphe> the weird part, because there is a weird part, is that an ubuntu client with ceph 0.39 on it sees the size of the ceph datastore perfectly
[18:31] <alphe> something that is close to the data from the ceph -w
[18:31] <alphe> pgmap v444352: 960 pgs: 639 active+clean, 300 active+recovering, 21 active+recovering+remapped+backfill; 3098 GB data, 7420 GB used, 37274 GB / 44712 GB avail; 125923/3450909 degraded (3.649%)
[18:31] <alphe> I copied the relevant line :)
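For comparison with the df figures, the usage numbers can be pulled out of the pgmap line pasted above. This is a small sketch; the regular expression is written against this one line and is an assumption about its layout, not a general parser for ceph -w output.

```python
import re

PGMAP = ("pgmap v444352: 960 pgs: 639 active+clean, 300 active+recovering, "
         "21 active+recovering+remapped+backfill; 3098 GB data, 7420 GB used, "
         "37274 GB / 44712 GB avail; 125923/3450909 degraded (3.649%)")


def parse_usage(line: str) -> dict:
    """Extract data/used/avail/total (in GB) and the used percentage."""
    m = re.search(
        r"(\d+) GB data, (\d+) GB used, (\d+) GB / (\d+) GB avail", line)
    if m is None:
        raise ValueError("no usage figures found in line")
    data, used, avail, total = map(int, m.groups())
    return {
        "data_gb": data,
        "used_gb": used,
        "avail_gb": avail,
        "total_gb": total,
        "used_pct": 100.0 * used / total,
    }


if __name__ == "__main__":
    usage = parse_usage(PGMAP)
    print(usage)
    print(f"total is about {usage['total_gb'] / 1024:.1f} T")
```

The cluster reports roughly 44 T in total with about 17% used, so a client showing 175G is seeing the right percentage but the wrong size.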
[18:32] <alphe> and the ubuntu with ceph 0.39 sees it properly
[18:33] <gregaf> huh, that is strange
[18:33] <gregaf> does it have the same data in the FS?
[18:34] <alphe> sorry I didn't understand
[18:34] <gregaf> when you look at the mountpoint, does the filesystem look the same?
[18:34] <alphe> maybe it is because the kernel on my archlinux 64 client is a 3.5.somethingnasty
[18:34] <gregaf> in terms of what files are there, and their contents
[18:35] <alphe> gregaf yeah if I travel to the mount point, do a ls, then travel lower in the tree from that point, then edit files, it is completely coherent
[18:35] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[18:35] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) Quit ()
[18:36] <alphe> Linux samceph01 3.6.2-1-ARCH #1 SMP PREEMPT Fri Oct 12 23:58:58 CEST 2012 x86_64 GNU/Linux
[18:37] <gregaf> the usage data is coming out of df?
[18:37] <alphe> (by the way using a cute and small archlinux to interface a windows machine with samba to a ceph data store is really neat )
[18:37] <alphe> gregaf yes df -h
[18:38] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[18:38] <alphe> and to mount I used ceph-fuse client instead of a mount -t ceph cause it had that error in the args
[18:38] * cowbell (~sean@ Quit (Quit: cowbell)
[18:39] <gregaf> yeah
[18:39] <alphe> and that allowed me to see that this issue shows up whichever way you mount it
[18:39] <gregaf> they're both 64-bit machines?
[18:39] <alphe> yes
[18:39] <gregaf> I'm confused
[18:39] <gregaf> :)
[18:40] <alphe> the datastore cluster servers are ubuntu 64. I tried to do a client with arch 32 and all I got was a panic on mount and then tiny pieces of kernel all around ><
[18:40] <gregaf> I've got a meeting but I or somebody will get back to you
[18:41] <alphe> ok great to talk to you later I will stay around (can be delayed to replay mail message to get me back in the live conversation would be great)
[18:41] * deepsa (~deepsa@ Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[18:41] <alphe> gregaf see you :)
[18:42] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Ping timeout: 480 seconds)
[18:42] * synapsr (~synapsr@ has joined #ceph
[18:42] * miroslavk (~miroslavk@ Quit (Quit: Leaving.)
[18:42] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[18:42] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[18:43] * loicd (~loic@ Quit (Quit: Leaving.)
[18:45] * nwatkins2 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[18:45] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Read error: Connection reset by peer)
[18:46] <alphe> brb
[18:46] * alphe (~asalas@ has left #ceph
[18:48] * alphe2 (~alphe@ has joined #ceph
[18:48] <alphe2> nick problems
[18:48] * alphe2 is now known as alphe
[18:49] <alphe> nice new irc service :)
[18:52] * loicd (~loic@ has joined #ceph
[18:56] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[18:56] * Ryan_Lane1 (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[18:58] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Remote host closed the connection)
[18:58] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[18:58] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) has joined #ceph
[18:59] * sjustlaptop (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[19:00] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[19:01] * Leseb (~Leseb@ Quit (Quit: Leseb)
[19:02] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[19:02] * chutzpah (~chutz@ has joined #ceph
[19:02] <joao> sjustlaptop, around?
[19:05] <sjusthm> yes
[19:05] <sjusthm> or close enough
[19:07] * cowbell (~sean@adsl-70-231-129-172.dsl.snfc21.sbcglobal.net) has joined #ceph
[19:09] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[19:12] * miroslavk (~miroslavk@ has joined #ceph
[19:22] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:24] <alphe> mount: error writing /etc/mtab: Invalid argument
[19:25] <alphe> when using mount -t ceph XXXX:/ /mnt/ceph -o user=admin,secretfile=/etc/ceph/secret
[19:26] <alphe> any clues why that happened ?
[19:27] * Tv_ (~tv@2607:f298:a:607:bc4d:663f:aa67:a0da) has joined #ceph
[19:29] <calebamiles> I believe the calling format changed
[19:29] <calebamiles> try swapping "user" for "admin"
[19:30] <calebamiles> and maybe also "secretfile" for "secret" if you're still having problems
[19:30] * nhorman (~nhorman@nat-pool-rdu.redhat.com) Quit (Quit: Leaving)
[19:30] <calebamiles> @alphe that is
[19:30] <cephalobot`> calebamiles: Error: "alphe" is not a valid command.
[19:30] <calebamiles> to your question alphe
[19:30] <tnt> alphe: look in dmesg
[19:31] <alphe> [ 4703.314374] libceph: client4517 fsid e756f9f1-a9f2-4a96-ba63-d1c88a44ac2d
[19:31] <alphe> [ 4703.321728] libceph: mon0 session established
[19:31] <alphe> [ 4763.359626] ceph: mds0 caps stale
[19:31] <alphe> [ 4763.360859] ceph: mds0 caps renewed
[19:32] <tnt> strange
[19:32] * BManojlovic (~steki@ has joined #ceph
[19:32] <alphe> yep no information about that /etc/mtab thing
[19:32] <calebamiles> I think that's the new "OK" return messages
[19:33] <alphe> /mnt/ceph ceph rw,relatime,name=admin,secret=<hidden>,nodcache 0
[19:33] <alphe> 0
[19:33] <calebamiles> I believe caps are a relatively new feature, but I'm new here :)
[19:33] <alphe> in the mtab I have that
[19:33] <alphe> I think the problem is the <hidden>
[19:33] <alphe> secret=<hidden>
[19:34] <calebamiles> I use something like
[19:34] <calebamiles> secret=`/host/home/caleb/ceph/src/ceph-authtool -p /host/home/caleb/ceph/src/keyring`
[19:35] <calebamiles> but my set up is different from yours
[19:35] * jks (~jks@3e6b7571.rev.stofanet.dk) Quit (Quit: jks)
[19:36] <tnt> personally I use the system keychain ...
[19:37] <gregaf> calebamiles: different caps
[19:37] * jks (~jks@3e6b7571.rev.stofanet.dk) has joined #ceph
[19:37] <calebamiles> ah
[19:37] <gregaf> elder: yehudasa: you guys know how mounting the kernel works
[19:38] <gregaf> "mount: error writing /etc/mtab: Invalid argument"
[19:38] <gregaf> "when using mount -t ceph XXXX:/ /mnt/ceph -o user=admin,secretfile=/etc/ceph/secret"
[19:38] <elder> understand that question, I do not.
[19:38] <elder> Oh.
[19:38] <calebamiles> user isn't a valid option anymore
[19:38] <calebamiles> if I remember correctly
[19:38] <calebamiles> and secretfile has probably become secret
[19:39] <elder> I can try to track that down for sure, just a minute.
[19:39] <calebamiles> this is how I mount in my UML environment
[19:39] <calebamiles> mount -t ceph XXX:/ mnt -o name=admin,secret=`ceph-authtool -p /host/home/caleb/ceph/src/keyring`
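Both mount failures in this log came down to option spelling ("secretefile", then "user=" instead of "name="). A minimal sketch of a pre-flight check follows. The set of known options is an assumption taken only from the commands quoted in this log (name, secret, secretfile) and is not a complete list of what mount.ceph accepts; the function name is hypothetical.

```python
import difflib

# Option names seen in the commands in this log; assumed, not exhaustive.
KNOWN_OPTIONS = ("name", "secret", "secretfile")


def check_mount_options(opts: str) -> list:
    """Return (option, suggestion) pairs for keys not in KNOWN_OPTIONS.

    The suggestion is the closest known option name, or None if nothing
    is similar enough.
    """
    problems = []
    for item in opts.split(","):
        key = item.split("=", 1)[0]
        if key not in KNOWN_OPTIONS:
            close = difflib.get_close_matches(
                key, KNOWN_OPTIONS, n=1, cutoff=0.8)
            problems.append((key, close[0] if close else None))
    return problems


if __name__ == "__main__":
    print(check_mount_options("name=admin,secretefile=/etc/ceph/secret"))
    print(check_mount_options("user=admin,secretfile=/etc/ceph/secret"))
```

The first call flags "secretefile" and suggests "secretfile"; the second flags "user" with no suggestion, since nothing in the small known set is close to it.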
[19:39] <alphe> gregaf in the biggest directory, which contains TB of data, I can't see more than a few MB
[19:40] <gregaf> alphe: hmm, are you using multiple pools?
[19:40] <alphe> you mean multi osd ?
[19:40] <alphe> yes with one mds and one mon
[19:41] <gregaf> no, did you add any additional RADOS pools to your filesystem?
[19:41] <gregaf> (if you don't know; you didn't)
[19:41] <alphe> gregaf ceph -w to see that no ?
[19:42] <gregaf> ceph osd dump lists your pools as part of its output
[19:42] * Ryan_Lane (~Adium@wsip-184-191-191-52.sd.sd.cox.net) Quit (Quit: Leaving.)
[19:43] <alphe> ok and rados are mentioned there ?
[19:43] <gregaf> actually, let's do this
[19:43] <alphe> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0
[19:43] <gregaf> what's the output of "ceph pg dump | grep pool"?
[19:43] <alphe> rbd are rados no ?
[19:43] <alphe> ceph pg dump | grep pool
[19:43] <alphe> pool 0 1164405 32982 45002 0 3326050666376 49911320 49911320
[19:43] <alphe> pool 1 364778 2249 11765 0 529716280 45362862 45362862
[19:43] <alphe> pool 2 0 0 0 0 0 0 0
[19:43] <gregaf> rbd is the third of three default pools, which by default is used to store RBD images ;)
[19:44] <gregaf> okay, that makes sense
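As a cross-check of the pool rows pasted above against the "3098 GB data" figure from ceph -w, here is a small sketch. It assumes the fifth number after the pool id is the byte count; the column headers are not shown in the log, so treat that mapping as a guess.

```python
# Rows pasted above from `ceph pg dump | grep pool`.
ROWS = """\
pool 0 1164405 32982 45002 0 3326050666376 49911320 49911320
pool 1 364778 2249 11765 0 529716280 45362862 45362862
pool 2 0 0 0 0 0 0 0
"""

GIB = 1 << 30


def pool_bytes(rows: str) -> dict:
    """Map pool id -> assumed byte count (field index 6 of each row)."""
    result = {}
    for line in rows.splitlines():
        fields = line.split()
        result[int(fields[1])] = int(fields[6])
    return result


if __name__ == "__main__":
    sizes = pool_bytes(ROWS)
    for pool_id, nbytes in sizes.items():
        print(f"pool {pool_id}: {nbytes / GIB:.1f} GiB")
    print(f"total: {sum(sizes.values()) / GIB:.0f} GiB")
```

Under that assumption the total comes to about 3098 GiB, nearly all of it in pool 0, which lines up with the data figure the cluster reports.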
[19:45] <gregaf> and your arch linux install only sees part of the data in a large directory?
[19:45] <gregaf> have you confirmed it's still there with a different client?
[19:45] <alphe> yes !
[19:45] <alphe> only part of the data and very very slow when browsing
[19:46] <gregaf> do you have any other machines you can check it out on?
[19:46] <alphe> from windows and slow when looking at the directory properties to compute the directory's size
[19:46] <alphe> I have tried archlinux with ceph 0.52 and 0.52 and 0.38, got the same issue each time
[19:47] <alphe> compiling from sources so maybe there is a trick thing there
[19:47] <gregaf> what's your other node running? ubuntu?
[19:47] <alphe> yes ubuntu with a .deb installed stable
[19:49] <gregaf> this issue just doesn't sound like anything I've ever heard of before
[19:49] <gregaf> I'm not sure how it could happen by building wrong either, though
[19:50] <gregaf> can you try and reproduce it on something besides Arch? I'd like to narrow the problem space down
[19:51] * loicd (~loic@ Quit (Quit: Leaving.)
[19:53] <alphe> I can install a virtual box machine with ubuntu
[19:53] <gregaf> that'd work
[19:53] <alphe> transfer the .deb install and see what's up
[19:53] <alphe> ok, doing it right now
[19:53] * loicd (~loic@ has joined #ceph
[19:54] * samppah (hemuli@namibia.aviation.fi) has joined #ceph
[19:54] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[19:55] <alphe> lol ... anyone saw the ubuntu.org homepage ?
[19:55] <alphe> ahahaha hilarious
[19:56] <alphe> ubuntu 12.10 avoid the pain of windows 8
[19:56] <alphe> should work right with 12.10 no ?
[19:57] <gregaf> yep
[19:57] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[19:58] * miroslavk (~miroslavk@ Quit (Quit: Leaving.)
[19:59] * synapsr (~synapsr@ has joined #ceph
[20:02] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:02] <Tv_> alphe: ubuntu.org has been owned by that organization since 2002...
[20:02] <alphe> ok starting with a mini.iso 12.10
[20:02] <gregaf> he meant ubuntu.com
[20:03] <alphe> it's ubuntu.com old habits
[20:03] <Tv_> hmm the actual release is out
[20:03] <alphe> but still very funny for first time there is a clear stand up from a linux distribution against windows
[20:03] <Tv_> now what boxes should i upgrade... ;)
[20:04] <gregaf> not really the first time for them, haven't you seen bug #1?
[20:04] <todin> hi, how could I debug performance issues? if I put the journal on a ram disk I get 950MB/s with rados bench, if I put it on a SSD I get just 550MB/s?
[20:04] <alphe> hum all of them with one single broadcasted command yeah !
[20:04] <alphe> why ? because risky life is a happy life
[20:05] <gregaf> https://bugs.launchpad.net/ubuntu/+bug/1
[20:05] <Tv_> gregaf: aka "the launchpad stress test bug"
[20:05] <alphe> gregaf nope
[20:06] <gregaf> todin: sjust has been working on performance stuff lately
[20:06] * slang (~slang@ace.ops.newdream.net) has joined #ceph
[20:07] <gregaf> but for one thing 550MB/s is towards the top end of an SSD
[20:07] <alphe> gregaf don't know what I'm looking at ...
[20:07] <gregaf> err, sjustlaptop or sjusthm today, I guess
[20:07] <alphe> gregaf right
[20:07] <todin> gregaf: I have 4 710 intel ssd, each is capable of around 250MB/s, the filestore are 16 sas disks
[20:07] <sjusthm> yo
[20:08] <todin> there the ssd shouldn't be the bottleneck
[20:08] * Ryan_Lane (~Adium@ has joined #ceph
[20:08] <todin> I have four OSDs on the node, and context switches are in the 100K area
[20:10] * jluis (~JL@ has joined #ceph
[20:11] * gaveen (~gaveen@ Quit (Ping timeout: 480 seconds)
[20:12] <sjustlaptop> todin: correct me if I'm wrong, you have 16 sas disks backing 16 osds
[20:12] <sjustlaptop> along with 4 ssds each backing 4 osds as journals?
[20:13] <todin> sjustlaptop: I have 4 osd, each osd has 4 sas disk for the filestore and one ssd for the journal, the platform is an E3
[20:13] <sjustlaptop> E3?
[20:13] <todin> Intel E3 cpu
[20:13] <sjustlaptop> the 4 sas disks on each osd are raid0?
[20:13] <todin> sjustlaptop: yes via an LSI controller
[20:14] <sjustlaptop> replication off?
[20:14] <sjustlaptop> (i.e. pool size is 1)
[20:14] <todin> for the test yes
[20:14] <alphe> there is no debian package for 0.53 right ?
[20:14] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[20:15] <todin> sjustlaptop: I have quite high latencies on the journal ssd, I don't know why, if I test the ssd with fio, they are fine
[20:15] <sjustlaptop> how did you measure that?
[20:15] <alphe> ok following this to install http://ceph.com/docs/master/install/debian/
[20:15] <gregaf> I believe there are .53 .debs available now
[20:15] <gregaf> if you choose the development packages instead of the stable release
[20:16] <alphe> ok that is what I was reading thank you
[20:16] <todin> sjustlaptop: with rados bench -p rbd 300 write -t 100 -b 1024000
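A rough way to reason about the bench command above: with -t 100 writes of -b 1024000 bytes outstanding, Little's law ties the amount of data in flight to throughput and mean latency. This is a back-of-the-envelope sketch that assumes the queue stays full and treats MB as 10^6 bytes; the 550 MB/s figure is the SSD-journal result todin quoted earlier.

```python
def in_flight_bytes(concurrent_ops: int, op_size_bytes: int) -> int:
    """Bytes outstanding when every slot holds one operation."""
    return concurrent_ops * op_size_bytes


def mean_latency_seconds(in_flight: int, throughput_bytes_per_s: float) -> float:
    """Little's law: mean latency = work in flight / throughput."""
    return in_flight / throughput_bytes_per_s


# Values from: rados bench -p rbd 300 write -t 100 -b 1024000
IN_FLIGHT = in_flight_bytes(100, 1024000)
LATENCY = mean_latency_seconds(IN_FLIGHT, 550e6)

if __name__ == "__main__":
    print(f"in flight: {IN_FLIGHT / 1e6:.1f} MB")
    print(f"implied mean latency at 550 MB/s: {LATENCY * 1000:.0f} ms")
```

So a per-operation latency in the high hundreds of milliseconds as seen by the client would be expected at this queue depth, independent of what iostat shows on the journal device.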
[20:16] * joao (~JL@89-181-147-186.net.novis.pt) Quit (Ping timeout: 480 seconds)
[20:16] <sjusthm> so the latencies are the latencies reported by rados bench?
[20:17] <todin> sjustlaptop: no, the latencies I see in iostat on the journal device
[20:17] <sjusthm> ok
[20:17] <sjusthm> can you post your ceph.conf?
[20:17] <todin> in rados bench the latencies are reported in ms?
[20:17] <alphe> don't know what will be the shape of the result install from the mini.iso of ubuntu 12.10 but it could be a good try for my mini version of a samba interface for windows
[20:17] <sjusthm> (I'm looking for journal aio and journal dio)
[20:18] <todin> both are active in the default, and I didn't override it
[20:18] <alphe> the idea is to put the most minimal possible working ceph linux samba in a virtual machine on the desktop of a windows client and allow that windows client to access the ceph datastore the regular way
[20:18] <sjusthm> I think aio is off by default
[20:19] <todin> let me check
[20:19] * gaveen (~gaveen@ has joined #ceph
[20:19] <alphe> as each client is its own broadcaster then we don't use the scalability of the broadcasted data
[20:20] <alphe> as each client is its own broadcaster then we don't lose the scalability of the broadcasted data
[20:20] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[20:21] * The_Bishop (~bishop@2001:470:50b6:0:3c0b:fe6b:861a:8c84) Quit (Ping timeout: 480 seconds)
[20:22] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[20:27] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[20:28] <sjustlaptop> todin, you might also get some speed boost out of completely disabling logging
[20:28] <todin> sjustlaptop: how do I do that?
[20:30] * The_Bishop (~bishop@2001:470:50b6:0:5471:3349:17ea:83c1) has joined #ceph
[20:30] <sjustlaptop> looking it up, don't quite remember
[20:31] * miroslavk (~miroslavk@ has joined #ceph
[20:31] * synapsr (~synapsr@ has joined #ceph
[20:33] <todin> sjustlaptop: in the config dump it says "journal_dio": "true",
[20:33] <todin> "journal_aio": "true",
[20:33] <todin> how could I increase the ops which are in flight?
[20:35] * idnc_sk (~idnc@ has joined #ceph
[20:35] <idnc_sk> Hi
[20:36] <todin> sjustlaptop: I did that in the config log file = ""
[20:38] * Kioob (~kioob@luuna.daevel.fr) Quit (Ping timeout: 480 seconds)
[20:42] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[20:42] * Ryan_Lane (~Adium@ Quit (Quit: Leaving.)
[20:43] <todin> sjustlaptop: and if I use 15K SAS disk for the journal the performance gets even worse, the disk can do 190MB/s
[20:43] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[20:45] <alphe> on ceph.com there is no quantal T___T
[20:45] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[20:45] <alphe> in debian/dist
[20:47] <alphe> ceph | 0.53-1quantal | http://ceph.com/debian-testing/ quantal/main amd64 Packages
[20:47] <alphe> good
[20:48] <sjustlaptop> todin: sorry, am in phone call
[20:48] <alphe> sorry wasn't looking the right place
[20:48] <todin> sjustlaptop: np, do you want to debug it later, or via the ml?
[20:48] <sjustlaptop> ok, off phone call
[20:48] <alphe> installing ceph-0.53
[20:49] <sjustlaptop> you can increase filestore_queue_max_ops to 1000 and filestore_queue_max_bytes to 1<<30
[20:49] <sjustlaptop> or in that range
[20:50] <sjustlaptop> similarly with journal_queue_max_ops and journal_queue_max_bytes
[20:50] <todin> sjustlaptop: ok
[20:50] * scuttlemonkey (~scuttlemo@wsip-70-164-119-40.sd.sd.cox.net) Quit (Quit: This computer has gone to sleep)
[20:51] <sjustlaptop> you might also increase osd_client_message_size_cap to 1<<30 (defaults to 500 MB)
[20:51] <sjustlaptop> all of these things will tend to increase memory usage though
[20:51] <sjustlaptop> but worth a try
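The tunables sjustlaptop lists can be written out with the shifts expanded. This sketch only renders a ceph.conf-style fragment from the option names and values mentioned above; placing them in an [osd] section, and using the same values for the journal queue options, are assumptions based on "similarly".

```python
TUNABLES = {
    "filestore_queue_max_ops": 1000,
    "filestore_queue_max_bytes": 1 << 30,
    # "similarly" in the log; the same values are assumed here.
    "journal_queue_max_ops": 1000,
    "journal_queue_max_bytes": 1 << 30,
    "osd_client_message_size_cap": 1 << 30,
}


def render_section(section: str, tunables: dict) -> str:
    """Render options as an ini-style section, one 'key = value' per line."""
    lines = [f"[{section}]"]
    lines.extend(f"    {key} = {value}" for key, value in tunables.items())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_section("osd", TUNABLES))
```

Here 1<<30 is 1073741824 bytes (1 GiB); as noted in the channel, larger queues trade memory for throughput.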
[20:53] * cowbell (~sean@adsl-70-231-129-172.dsl.snfc21.sbcglobal.net) Quit (Quit: cowbell)
[20:53] <sjustlaptop> todin: I'll be back in an hour or so
[20:54] <todin> sjustlaptop: ok, I am not sure if I will be still here, different time zone ;-)
[20:56] <alphe> almost ended to install my VM with ubuntu 12-10 and ceph 0.53 and samba
[20:58] <idnc_sk> my 2tb backup drive just died
[20:58] <idnc_sk> I can hear the head scratching the plates
[20:59] <idnc_sk> high capacity hdd's are a pain
[21:01] <alphe> so far so good no error with /etc/mtab arguments on ubuntu 12.10
[21:01] <alphe> at the mount time
[21:01] <alphe> root@ubsambceph:~# df -h
[21:01] <alphe> Filesystem Size Used Avail Use% Mounted on
[21:01] <alphe> /dev/sda1 7.4G 1.3G 5.8G 18% /
[21:01] <alphe> udev 237M 4.0K 237M 1% /dev
[21:01] <alphe> tmpfs 99M 1.3M 98M 2% /run
[21:01] <alphe> none 5.0M 0 5.0M 0% /run/lock
[21:01] <alphe> none 246M 0 246M 0% /run/shm
[21:01] <alphe> none 100M 0 100M 0% /run/user
[21:01] * idnc_sk (~idnc@ has left #ceph
[21:01] <alphe> 44T 6.5T 38T 15% /mnt/ceph
[21:01] <alphe> sorry for the spam but it works fine with the couple ubuntu 12.10 /ceph 0.53
[21:02] * nwatkins2 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[21:02] <alphe> think I will use that as broadcaster ...
[21:02] <alphe> instead of archlinux, which pains me because I liked the idea of having it working properly on the latest kernel
[21:03] * sjustlaptop (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[21:07] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[21:09] * sjust-phone (~sjust@m950536d0.tmodns.net) has joined #ceph
[21:10] <sjust-phone> todin: any improvement?
[21:11] <todin> sjust-phone: not really, maybe a few MB/s more. I thought a node of this size would be able to saturate 10G
[21:12] <nhm> todin: out of curiosity did you ever try single drive raid0 arrays or JBOD mode?
[21:12] <sjust-phone> is the ssd also on the same controller?
[21:13] <todin> nhm: I did, this config was the one with the best performance.
[21:13] <todin> sjust-phone: the ssd is on the 4 on-board sata controllers,
[21:13] <nhm> todin: Interesting. I typically see multi drive raid0 falling behind lots of OSDs.
[21:14] <todin> nhm: so you would recommend one osd per disk? so 16 osds?
[21:14] * AaronSchulz (~chatzilla@ Quit (Remote host closed the connection)
[21:14] <nhm> todin: that's the config I've had the best luck with.
[21:14] * synapsr (~synapsr@ has joined #ceph
[21:15] <todin> nhm: I have very high context switching, that was my reason to reduce the number of osds
[21:15] <nhm> todin: with 15 OSDs and 5 SSDs, I could do about 1.2-1.3GB/s on one node.
[21:15] <nhm> todin: more like 1.4GB/s if I disable crc32c calculations.
[21:15] <todin> crc32c in btrfs, or where?
[21:15] <nhm> todin: in ceph
[21:16] <todin> how could I do that?
[21:16] <nhm> todin: ms nocrc = false in the ceph.conf file
[21:16] <nhm> er = true, sorry
[21:17] <todin> nhm: and the 1.4GB/s is the bandwidth you get reported via rados bench?
[21:17] <nhm> todin: I run concurrent rados bench instances.
[21:17] <todin> nhm: I used 2 rados bench instances, you think I should use more?
[21:17] * sjustlaptop (~sam@m950536d0.tmodns.net) has joined #ceph
[21:18] <nhm> todin: At least on that node, I found that up to 8 concurrent instances were needed to saturate the OSDs.
[21:18] * scuttlemonkey (~scuttlemo@ has joined #ceph
[21:18] <nhm> todin: the difference between 4 and 8 is pretty small though.
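A minimal sketch of what running several concurrent rados bench instances could look like, written as a Python helper that only builds the command lines. The pool name `bench-pool`, the 60 second duration and the helper name are hypothetical; the `-t` value of 16 is assumed here as the per-instance concurrency and is not taken from the log.

```python
import shlex


def rados_bench_commands(pool, seconds, instances, concurrent_ops=16):
    """Build one `rados bench` write command per concurrent instance.

    Nothing is executed here; the caller decides how to launch the commands.
    """
    base = ["rados", "-p", pool, "bench", str(seconds), "write",
            "-t", str(concurrent_ops)]
    # Every instance runs the same command; they are meant to start together.
    return [list(base) for _ in range(instances)]


# Hypothetical pool name and duration, 8 instances as nhm suggests.
commands = rados_bench_commands("bench-pool", 60, 8)
for argv in commands:
    print(shlex.join(argv))
```

Each argument list could be handed to `subprocess.Popen` so that all instances run at the same time, with the per-instance bandwidth figures summed afterwards.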
[21:19] <todin> nhm: ok, I could try that, I do the test via network, so I have one loadgenerator node, and one osd node
[21:19] <nhm> todin: my tests were all on localhost, I haven't been able to try the tests on bonded 10GbE yet.
[21:20] <todin> nhm: ok, I got a free switch from HP
[21:20] <nhm> todin: here's the article I wrote. This doesn't have the 15 OSD scaling numbers, but looks at performance for a couple of different controllers: http://ceph.com/community/ceph-performance-part-1-disk-controller-write-throughput/
[21:21] <nhm> todin: all of those tests are 6 disks + 2 SSDs.
[21:21] <todin> nhm: I read that, I still have a few questions about your test, but didn't have the time to write them down
[21:21] <todin> did you use a raw partition for the journal or was there a fs on the device?
[21:22] <nhm> todin: each SSD had 3 10G partitions with the journal directly on the partition.
[21:22] <nhm> todin: when I did single OSD raid0 tests, I had a single 60G raid0 partition spanning both SSDs.
[21:23] <sjustlaptop> todin: you might try using -t 1000
[21:23] <sjustlaptop> unlikely to help
[21:23] <todin> nhm: the crc stuff gave me 10-20MB/s more, I think I am cpu bound
[21:24] <slang> sagewk: wip-3346 looks good?
[21:24] <todin> nhm: your setup is much stronger cpu-wise than mine
[21:24] * allsystemsarego (~allsystem@ has joined #ceph
[21:24] <nhm> todin: Which filesystem are you using?
[21:24] <todin> nhm: btrfs
[21:25] <todin> with large metadata
[21:25] <nhm> todin: How many cores on your i3?
[21:25] <todin> sjustlaptop: -t 1000 didn't change anything
[21:25] <sjustlaptop> yeah, didn't really think so
[21:25] <todin> nhm: 4 real ones
[21:25] <sjustlaptop> can you post the output of ceph osd tree?
[21:25] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[21:26] <nhm> todin: 16 OSDs with ext4 or xfs may do better if you are CPU bound...
[21:26] <todin> http://pastebin.com/jR99AnbU
[21:27] <sjustlaptop> todin: hmm, try also increasing journal_max_write_size to 100<<20
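The shifted values sjustlaptop quotes (1<<30 and 100<<20) are easier to compare once expanded. A small sketch, assuming both options take plain byte counts; the variable names are illustrative only.

```python
MIB = 1 << 20  # bytes in one mebibyte

# Values quoted in the channel, written the same way.
message_size_cap = 1 << 30          # suggested osd_client_message_size_cap
journal_max_write = 100 << 20       # suggested journal_max_write_size

for name, value in (("message_size_cap", message_size_cap),
                    ("journal_max_write", journal_max_write)):
    # Show the raw byte count next to the size in MiB.
    print(f"{name}: {value} bytes = {value // MIB} MiB")
```

So the first suggestion is a 1 GiB cap and the second is a 100 MiB maximum journal write.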
[21:27] <todin> nhm: I already have a single e5 and a dual e5, I will try them tomorrow, and an intel 910 ssd is on the way
[21:27] <amatter> hi guys. having a strange issue where I can't set the pool on a folder using cephfs. http://pastebin.com/ZyL0Z3Ez
[21:28] <nhm> todin: hrm, if I recall you have a single controller for all 20 drives?
[21:29] <sjustlaptop> todin: forgot to ask, are you seeing the throughput burst higher at the beginning of the run?
[21:29] <nhm> todin: sorry, 1 controller for the 16 OSDs, and then on-board SATA for the SSDs?
[21:30] <todin> nhm: right, one controller (LSI 9266), and the ssds on the on-board sata
[21:31] <nhm> todin: Are the disks SATA or SAS
[21:31] <nhm> ?
[21:31] <todin> nhm: sas, 7200 rpm
[21:31] <amatter> previously if I used the pool number it worked (due to a bug), now I've updated the kernel version and it accepts the string name of the pool but doesn't actually change the pool of the folder, as evidenced by a show_layout
[21:31] <nhm> todin: ok, theoretically expanders shouldn't cause problems, though I try to avoid them if I can.
[21:32] * loicd (~loic@ Quit (Quit: Leaving.)
[21:32] <todin> nhm: it is the supermicro dual expander backplane I don't have the type here
[21:33] <todin> nhm: I also want to try a passive backplane with a sas HBA
[21:33] <nhm> todin: yeah, I opted for the passive backplane with lots of controllers in our test setup.
[21:34] <todin> sjustlaptop: yes, it starts higher for the first 3-4 sec, then it drops, the diff is around 20MB/s
[21:35] <nhm> todin: is it consistent after that?
[21:35] <todin> nhm: more or less just a few MB up and down
[21:35] <sjustlaptop> ok, you probably are journal limited then (which is consistent with what you said about the ramdisk)
[21:36] <todin> sjustlaptop: then I will try the intel 510 ssds tomorrow, they can do 500MB/s
[21:36] <todin> the 710 only 250MB/s
[21:36] <sjustlaptop> sagelap: what is the syntax for changing the in-memory debug level?
[21:37] <nhm> todin: yeah, intel lists it at 170MB/s write for the 100GB model and 210MB/s write for the 200+GB model.
[21:37] <sjustlaptop> debug <subsys> = 0 0
[21:37] <sjustlaptop> ?
[21:37] <sjustlaptop> or is it 0,0
[21:38] <todin> nhm: I have the 200GB model
[21:39] <elder> 12.10 http://www.ubuntu.com/download
[21:39] <nhm> yikes, I think my desktop is still on 10.10
[21:41] <todin> nhm: if I write with fio on the ssd I get a bandwidth of 214MB/s
[21:41] <nhm> todin: what was the aggregate throughput you were getting again?
[21:41] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[21:42] <todin> nhm: around 550M/s
[21:42] <todin> nhm: what could I expect, in your experience?
[21:43] <nhm> todin: ok, so like 140MB/s per SSD.
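nhm's per-SSD estimate can be reproduced with a couple of divisions. A rough sketch, assuming the journal traffic is spread evenly over the four 710s todin mentions later in the log and that every written byte passes through a journal once; the 214 MB/s figure is todin's own fio result.

```python
aggregate_mb_s = 550   # aggregate rados bench throughput reported by todin
ssd_count = 4          # four Intel 710 journal SSDs (mentioned later in the log)
fio_mb_s = 214         # raw sequential write todin measured with fio on one SSD

# Even split of the journal traffic across the SSDs (an assumption).
per_ssd_mb_s = aggregate_mb_s / ssd_count
# How much of the measured raw SSD bandwidth that split would use.
utilisation = per_ssd_mb_s / fio_mb_s

print(f"per SSD: {per_ssd_mb_s:.1f} MB/s")
print(f"share of fio result: {utilisation:.0%}")
```

Under these assumptions each journal SSD sees roughly 140 MB/s, well below what fio achieved on the same device.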
[21:43] <todin> nhm: around that, what makes me wonder is that in iostat the latency of the ssds is in the area of 30-40ms
[21:44] <nhm> todin: tough to say. I think you are right to try a setup with more CPU power.
[21:44] <nhm> todin: out of curiosity, if you run rados bench from localhost on the OSD node, is it any faster?
[21:44] <nhm> (or slower)?
[21:45] <gregaf> amatter: I believe you still need to use pool IDs rather than names
[21:45] * joshd (~joshd@ has joined #ceph
[21:45] <gregaf> sjustlaptop: pretty sure it's debug <subsys> = 1 / 5
[21:45] * sjustlaptop (~sam@m950536d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[21:45] <todin> nhm: there is no difference, atm I run them on localhost
[21:46] <nhm> todin: any change if you run 4 or 8 concurrent instances?
[21:46] <todin> nhm: the lat in rados bench is in ms?
[21:46] * joshd (~joshd@ Quit (Read error: Connection reset by peer)
[21:46] * joshd1 (~joshd@ has joined #ceph
[21:47] <nhm> todin: seconds I think
[21:47] <amatter> gregaf: that's what I thought but it just returns invalid argument even though I've confirmed the pool number
[21:48] <amatter> gregaf: also, I checked the other arguments in case that was the problem, but I'm just matching the default configuration
[21:48] * sjustlaptop (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[21:49] <todin> nhm: with 4 concurrent I get 570-580
[21:49] * sjust-phone2 (~sjust@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[21:49] * sjust-phone2 (~sjust@24-205-61-15.dhcp.gldl.ca.charter.com) Quit ()
[21:49] <nhm> todin: ok, so a marginal increase, but not major.
[21:49] * sjust-phone2 (~sjust@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[21:50] * miroslavk (~miroslavk@ Quit (Quit: Leaving.)
[21:50] <nhm> todin: I think your next step is to see how the 510 does.
[21:51] <nhm> And throw a faster CPU in.
[21:52] <alphe> gregaf it works perfectly (the size of ceph-fuse storage in df -h) with ubuntu 12.10
[21:52] <alphe> I think it is due to the kernel 3.6
[21:52] <nhm> todin: Another crazy idea is you could try just skipping the SSDs entirely and see how you can do with journals on the OSD disks.
[21:52] <nhm> todin: With btrfs I've gotten up to around 600MB/s with 12 disks.
[21:52] <rweeks> num, I think he said early on he got more speed without the SSDs
[21:53] <rweeks> er, nhm
[21:53] * sjust-phone (~sjust@m950536d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[21:53] <nhm> rweeks: was that with journals on ramdisk or on osd data disks?
[21:53] <todin> nhm: is a single e5 really much faster than an e3? or should I go for the dual e5?
[21:54] <rweeks> oh never mind. He was putting the journal on ram disk, not actual disk
[21:54] <rweeks> had to read scrollback.
[21:54] <todin> nhm: I need the iops from the filestore disks for the customer vms
[21:54] * joshd1 (~joshd@ Quit (Ping timeout: 480 seconds)
[21:54] <amatter> gregaf: if I do --pool 0 (which is what it is on) the command returns without error. all other pools return invalid argument. the folder that I'm trying to configure is empty, I already know it doesn't like populated folders. going to try rebooting into an earlier kernel
[21:56] <gregaf> amatter: hmm, did you add those pools to the FS?
[21:58] <nhm> todin: probably depends on which E5 you go for. The ones I have are low power, so 6-cores at 2GHz. Some of the higher power ones I think are 8-core and faster.
[21:58] * maelfius (~mdrnstm@108.sub-70-197-146.myvzw.com) has joined #ceph
[21:59] <nhm> todin: dual 6-core low power ended up being a nice balance of cost, heat, and power for me.
[22:00] <amatter> gregaf: oh yeah, I forgot that step. can you remind me the syntax?
[22:00] <nhm> you could always go for dual E5-2687Ws. ;)
[22:00] * maelfius (~mdrnstm@108.sub-70-197-146.myvzw.com) has left #ceph
[22:00] <gregaf> ceph mds add_data_pool <poolid>
[22:00] <gregaf> amatter: ^
[22:00] <todin> nhm: as I looked at it, the e5 was too expensive, I do a cloud hosting platform for end customers, and the profit margins are small
[22:01] <amatter> gregaf: thanks!
[22:01] * loicd (~loic@ has joined #ceph
[22:01] * lofejndif (~lsqavnbok@04ZAAAWFY.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:01] * synapsr (~synapsr@ has joined #ceph
[22:01] <nhm> todin: Yeah, it's possible that it's overkill.
[22:02] <nhm> todin: at some point I'm going to try restricting cores on the test platform I have to look at how performance scales with more CPU cores.
[22:02] <todin> nhm: does inktank already do consulting services for projects which are in an early phase?
[22:04] <nhm> todin: We do offer some consultation services. You'd have to talk to one of our business folks to see if it would make sense for your purposes...
[22:05] <todin> nhm: ok, you have some contact for me?
[22:06] <nhm> todin: Dona Holmberg is our Business Development Manager: dona@inktank.com
[22:07] <todin> nhm: ok, I met her at the world hosting days
[22:09] <nhm> todin: I don't want to discourage you from giving us lots of money, but I would try out the 510 and see how it does first. :)
[22:09] <alphe> bye
[22:09] * alphe (~alphe@ Quit (Quit: Leaving)
[22:09] <nhm> todin: And then tell your customer to buy a support contract with the money saved. ;)
[22:10] <dmick> todin: also, of course, http://www.inktank.com/
[22:10] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[22:10] <todin> nhm: If we want to roll out ceph at our company we need a support contract, because of our risk management system.
[22:11] * synapsr (~synapsr@ has joined #ceph
[22:11] <rweeks> fortunately Inktank offers those. :)
[22:12] <todin> dmick: atm we need guidance on how to size nodes and clusters for a vm hosting system
[22:12] <todin> dmick: that is what you offer?
[22:12] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[22:13] * miroslavk (~miroslavk@ has joined #ceph
[22:13] <dmick> todin: those are some notes around many sorts of things we do. I'm sure we can flex to fit your needs, but, the point is, there's infrastructure and current offerings that we've packaged up
[22:14] * loicd (~loic@ Quit (Quit: Leaving.)
[22:14] * miroslavk (~miroslavk@ Quit ()
[22:15] <todin> dmick: you have CEO compatible info material, which I can give to him?
[22:15] <rweeks> Dona can get that sort of thing to you
[22:16] <dmick> todin: what rweeks said
[22:16] <todin> ok
[22:16] * Ryan_Lane (~Adium@ has joined #ceph
[22:17] <nhm> Ryan_Lane: good afternoon
[22:21] <Ryan_Lane> nhm: howdy
[22:22] <nhm> Ryan_Lane: how goes the ceph testing?
[22:22] <Ryan_Lane> haven't started it yet
[22:22] * scuttlemonkey (~scuttlemo@ Quit (Quit: This computer has gone to sleep)
[22:22] <Ryan_Lane> hopefully will soon :)
[22:22] <nhm> Cool. :)
[22:23] <Ryan_Lane> Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position'
[22:24] <Ryan_Lane> whoops
[22:24] <Ryan_Lane> wrong channel
[22:24] <nhm> that's a mysql error?
[22:24] * loicd (~loic@ has joined #ceph
[22:25] <todin> nhm: but if I am cpu bound, why is it faster when I do the journal in a ramdisk?
[22:25] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) has joined #ceph
[22:25] * scuttlemonkey (~scuttlemo@ has joined #ceph
[22:25] <rweeks> ram is on a faster bus than the SSD
[22:26] <rweeks> CPU to RAM, big fast bus dedicated to CPU to RAM
[22:26] <rweeks> CPU to SSD, you transit PCI Express
[22:26] <rweeks> (I am just guessing that's at least part of the difference)
[22:26] <nhm> todin: That's one of the reasons why I'm curious what you will see if you test with Intel 510s.
[22:28] <todin> rweeks: but I do not saturate the pci bus, and if I test with fio I get much more throughput
[22:29] <todin> nhm: but the 510 wouldn't be an option for a production system
[22:29] * cowbell (~sean@ has joined #ceph
[22:29] <todin> the write endurance is too low
[22:31] <nhm> todin: true, it may just mean if you need the write endurance you'll have to use a faster enterprise SSD or more smaller ones.
[22:31] * synapsr (~synapsr@ has joined #ceph
[22:31] <cdblack> todin: there's a new Intel ssd coming out (successor to the 710) which will be a definite prod option. can't give any details, but it's very prod-worthy; look for it in the next few months or so
[22:33] <todin> nhm: I have the 120GB model of the 510 SSD, according to the datasheet write throughput is around 210MB/s
[22:33] <todin> cdblack: slc or mlc?
[22:34] <cdblack> everyone's going away from slc, even fusion IO is mlc now
[22:34] <nhm> todin: I can do about 450-500MB/s on the 180GB model.
[22:35] <todin> cdblack: we are using the 710 in one of our storage products already, and we have to change them quite often
[22:35] <cdblack> your throughput on any device will be completely dependent on block size, controller, and CPU frequency
[22:36] <cdblack> todin: the next '710' - DC 3700 I think will fix that swap rate you're seeing
[22:36] <todin> cdblack: hopefully
[22:37] <todin> nhm: I just found a server with a 510 in it, I got 442MB/s with bs=1M
[22:37] <nhm> todin: that sounds about right.
[22:37] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:37] <nhm> todin: I do 3 journals per 510.
[22:37] <cdblack> todin: have you looked at any of the NAND stats from the toolkit and are you seeing high write amp in your 710's you have to swap out?
[22:38] <nhm> todin: 4 journals per 510 is probably reasonable too.
[22:38] <cdblack> just curious
[22:38] <todin> cdblack: I dunno, I am not an admin on this plattform, but I will ask
[22:39] <cdblack> cool, if your write amp is up over 3 you can adjust the LBA on the drive to help get that down
[22:39] <todin> nhm: I am going to try one journal per 510, just swap the 710 with the 510
[22:39] <todin> cdblack: you work for intel?
[22:39] <cdblack> I do, in the interest of transparency
[22:39] <nhm> todin: Cool. I'll be very curious how it goes.
[22:40] <nhm> cdblack: Nice, welcome. :)
[22:40] <cdblack> ty
[22:40] <todin> cdblack: I am also waiting for a 910 to arrive, I had a pre-production sample with slc, I liked that
[22:40] <nhm> cdblack: I'm looking forward to seeing what you guys do with qlogic's IB and Cray's interconnect divisions. :)
[22:41] <cdblack> todin: be careful with the 910, the 400/800 GB product shows up as 4x logical devices of 99 or 199GB; for some folks that's a show stopper
[22:41] <cdblack> nhm: so am I, zero news there
[22:41] <todin> cdblack: I know that, because of the prototype
[22:42] <nhm> bbl guys, gotta run for a while.
[22:42] <todin> cdblack: do you have a link to a howto how to read the nand stats?
[22:42] <todin> my server here has four 710s in it
[22:43] <todin> I use them as ceph journal devices and I have quite high wait times on them, around 30-40ms
[22:44] <cdblack> the 710s are really nice but the write IOPS was a bit low 2.7k, looking forward to the next one, what do you use to measure latency on your journal device?
[22:44] <cdblack> I'm only about a year into Linux so I don't know all the tools yet
[22:44] <todin> cdblack: I just look at iostat
[22:45] <cdblack> ah, what size 710?
[22:45] <todin> 200GB
[22:46] <todin> the intel toolbox is just for win?
[22:47] <cdblack> Yeah, toolkit only works in windows so far. You can probably get better latencies out of it if you nerf the LBA: do a fresh low level format, then reduce the LBA size by 30-50 GB and you'll see your latencies go way down even after the drive fills. The extra space gives the wear leveling algorithm more room and makes it more efficient
[22:48] <cdblack> Depending on your use case, the improvement in latency may be more valuable than the physical space
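To put cdblack's 30-50 GB suggestion in proportion for todin's 200GB drive, here is a small sketch of the arithmetic. It only counts the user-visible capacity given up; whatever spare area the drive already reserves internally is not known from the log and is left out. The function name is illustrative.

```python
def spare_fraction(capacity_gb, reserved_gb):
    """Fraction of the advertised capacity handed back to the drive as spare area."""
    return reserved_gb / capacity_gb


drive_gb = 200  # todin's Intel 710 model
for reserved_gb in (30, 50):
    usable_gb = drive_gb - reserved_gb
    share = spare_fraction(drive_gb, reserved_gb)
    print(f"reserve {reserved_gb} GB: {usable_gb} GB usable, {share:.0%} given up")
```

Giving up 30 to 50 GB therefore means trading 15 to 25 percent of the advertised capacity for the latency improvement cdblack describes.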
[22:48] <todin> cdblack: I have on the ssd just a 50 partition which I use
[22:48] <todin> 50G
[22:50] <cdblack> If you had it behind a RAID controller that would work if you set the logical volume up in the controller. I don't think that will work with just a partition; I had to use an LBA tool here for the nerf to work properly
[22:51] * slang (~slang@ace.ops.newdream.net) Quit (Ping timeout: 480 seconds)
[22:51] <todin> cdblack: hm, I will give that a try as well then, but first I will trim the ssds
[22:52] <cdblack> so who do you work for todin?
[22:53] <cdblack> if you can say
[22:53] <todin> for a hosting company of the german telekom
[22:54] <rweeks> t-systems?
[22:54] <rweeks> :)
[22:54] <cdblack> sweet, you're a busy man - guten abend!
[22:54] <todin> rweeks: no, strato
[22:54] <rweeks> ah ok
[22:54] <todin> rweeks: you work for t-system?
[22:54] <rweeks> no, they were a customer of my last company
[22:55] <rweeks> (NetApp)
[22:55] <todin> rweeks: nice, we have netapp as well
[22:55] <rweeks> many people do. :)
[22:55] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Quit: Leaving)
[22:56] <todin> rweeks: yep, but they are too expensive for cloud hosting
[22:56] <rweeks> agreed
[22:56] * sjustlaptop (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Read error: Connection reset by peer)
[22:56] <rweeks> personally I'd say they're too expensive, period, but that's another discussion
[22:57] <cdblack> that's why we're all looking at CEPH right? They're all too expensive!
[22:57] * sjustlaptop (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) has joined #ceph
[22:57] <rweeks> I hope that's just one reason
[22:57] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[22:57] <todin> and ceph has a bigger feature set for hosting than others have
[22:58] <rweeks> cdblack: I came to work at Inktank because I like what Ceph can do. It's not something that the bigger storage vendors have seemed to figure out
[22:58] <rweeks> NetApp is stuck with WAFL and I don't know how they will dig their way out
[22:59] <cdblack> agreed, the scale and ease with which you can add/remove OSDs is pretty amazing
[23:01] <cdblack> bbl, conf call time
[23:01] * synapsr (~synapsr@ has joined #ceph
[23:03] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[23:06] * sjustlaptop (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[23:09] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) Quit (Quit: tryggvil)
[23:09] * synapsr (~synapsr@ Quit (Ping timeout: 480 seconds)
[23:11] <todin> are any of you guys coming to the amsterdam workshop?
[23:12] * sjust-phone2 (~sjust@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[23:12] <rweeks> there will be several people from Inktank there
[23:12] <rweeks> not sure if Sage is going, but I think he is
[23:13] <gregaf> yep, Sage and I will be there
[23:13] <gregaf> I think that's all the tech guys
[23:13] <gregaf> but several more business people too — Bryan, Ed, Jude at least
[23:13] <rweeks> I think Ross as well
[23:14] <todin> that's great, do you already know where the workshop is?
[23:14] <gregaf> not I
[23:14] <todin> I have to book the hotel quite in advance
[23:17] * synapsr (~synapsr@ has joined #ceph
[23:17] <rweeks> details here:
[23:17] <rweeks> https://www.42on.com/events/
[23:17] <rweeks> Tobacco Theatre
[23:17] <rweeks> Nes 75
[23:17] <rweeks> 1012 KD Amsterdam
[23:17] <rweeks> The Netherlands
[23:18] * justinwarner1 (~ceg442049@osis111.cs.wright.edu) has joined #ceph
[23:19] * sjusthm (~sam@24-205-61-15.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[23:19] <todin> rweeks: great, a few days ago there was no info online
[23:19] <rweeks> The ceph page points to the 42on page. 42on is hosting the event.
[23:21] * synapsr (~synapsr@ Quit (Remote host closed the connection)
[23:22] * tryggvil (~tryggvil@163-60-19-178.xdsl.simafelagid.is) has joined #ceph
[23:25] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[23:28] * synapsr (~synapsr@ has joined #ceph
[23:32] * synapsr_ (~synapsr@ has joined #ceph
[23:32] * synapsr (~synapsr@ Quit (Read error: Connection reset by peer)
[23:34] * sjust-phone (~sjust@md60536d0.tmodns.net) has joined #ceph
[23:34] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[23:37] <elder> sage, it looks like btrfs has a problem that is at least contributing to bug 3291.
[23:38] <dmick> sure does
[23:38] <elder> They build up all their bio's using logical offsets, then translate them to use physical sector offsets late. This means that the merge function is not getting the correct sector offset to use in determining whether a given bio_vec should be added.
[23:39] <elder> They *do* properly use the bio_add_page() interface, they just don't provide the right information in the process.
[23:39] <sage> ah! that makes sense
[23:40] <sage> send email to linux-btrfs and throw the ball to them, probably?
[23:40] <elder> Josef said he would cry if that was the cause.
[23:40] <elder> They know about it.
[23:40] <elder> Chris and Josef have been very helpful for the last hour or two.
[23:40] <sage> sweet
[23:40] <sage> heh
[23:40] <elder> Chris did point out a problem with my "improved" merge function, which I'll fix.
[23:41] <elder> But it sounds like it won't be an easy fix, or if it's done easily it will not be pretty.
[23:41] <rweeks> good, pretty, fast
[23:41] <rweeks> pick two
[23:42] <sage> yeah... at least it's not something rbd is doing wrong, it sounds like
[23:42] <elder> I'll start recording what I know, and when Chris and Josef return to their keyboards we'll figure out next steps.
[23:42] <sage> great
[23:42] <elder> Well, rbd did have a bad merge function.
[23:42] <elder> But it wasn't the cause of this.
[23:42] <sage> yeah. :) good detective work!
[23:42] <elder> Maybe the causes of small I/O's though, we'll see.
[23:42] <elder> You have to stick with it. I'm glad I asked Josef though.
[23:43] * cowbell (~sean@ has left #ceph
[23:43] <elder> rweeks, pretty good, pretty fast?
[23:43] * sjustlaptop (~sam@md60536d0.tmodns.net) has joined #ceph
[23:44] * synapsr_ (~synapsr@ Quit (Remote host closed the connection)
[23:45] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:46] <slang> sage: wip-3346 now runs pjd.sh successfully
[23:46] <sage> sweet
[23:46] <sage> i'll take a look now
[23:46] <rweeks> hehe
[23:46] <rweeks> it rarely works that way, elder
[23:47] <sage> slang: that works because the O_CREAT in fuse is translated into an ll_create or something?
[23:47] <sage> and doesn't go through ll_open?
[23:49] * jks (~jks@3e6b7571.rev.stofanet.dk) Quit (Remote host closed the connection)
[23:50] <sage> slang: also, on the O_RDONLY thing, i think the "correct" way is if ((mode & O_ACCMODE) == O_whatever)
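The check sage describes can be illustrated outside the Ceph code base. This is a minimal Python sketch of the same idiom using the standard `os` flag constants on a Unix system, not the actual check_mode() implementation; the helper names are made up for the example.

```python
import os


def access_mode(flags):
    """Isolate the access-mode bits from a set of open(2) flags."""
    return flags & os.O_ACCMODE


def is_read_only(flags):
    """True when the access mode is O_RDONLY, whatever other flags are set."""
    return access_mode(flags) == os.O_RDONLY


flags = os.O_RDONLY | os.O_CREAT
print(is_read_only(flags))
print(is_read_only(os.O_RDWR | os.O_CREAT))

# The naive test `flags & os.O_RDONLY` cannot work where O_RDONLY is 0
# (as on Linux), because the result is then 0 for every flag combination.
print(flags & os.O_RDONLY)
```

Masking with O_ACCMODE first is what makes the comparison meaningful, since the access mode is a small enumerated field inside the flags rather than an independent bit.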
[23:52] * justinwarner1 (~ceg442049@osis111.cs.wright.edu) has left #ceph
[23:52] * amatter (~amatter@ Quit (Ping timeout: 480 seconds)
[23:53] * synapsr (~synapsr@ has joined #ceph
[23:53] <slang> sage: translated yes
[23:54] <slang> sage: although it looks like the kernel module does the translating
[23:54] <sage> cool.
[23:54] <sage> yeah, there were some recent changes in that area too with the 'atomic open' stuff that went into 3.5(?). anyway, looks good.
[23:54] <slang> sage: maybe it won't always and we should add the check to ll_open?
[23:55] * jks (~jks@3e6b7571.rev.stofanet.dk) has joined #ceph
[23:55] <slang> if we get O_CREAT with ll_open though, we don't get a mode...
[23:55] <tziOm> Keep up the good work, guys
[23:55] <sage> that's what your patch did, right? that seems right... as long as the caller branches into ll_open or ll_create we're ok
[23:57] <slang> sage: right, the patch will do the check_perms with ll_open even if O_CREAT is specified, which could break
[23:57] <slang> sage: maybe we should just add an assert that ll_open doesn't get called with O_CREAT
[23:58] <sage> hmm maybe...
[23:59] <slang> sage: re O_RDONLY, the & O_ACCMODE bit is at the top of the check_mode() call

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.