#ceph IRC Log


IRC Log for 2010-10-27

Timestamps are in GMT/BST.

[0:23] <jantje> looks like a busy evening/day
[1:06] * seibert (~seibert__@ has joined #ceph
[1:21] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[2:19] * delx (~delx@dslb-092-073-169-246.pools.arcor-ip.net) has joined #ceph
[2:19] <delx> Hey
[2:20] <delx> I followed the "Install on debian" wiki page, but mount takes very long and then quits with input/output error. What's wrong? :)
[2:21] <delx> I have the kernel client and the daemons both on my local machine
[2:21] * cmccabe (~cmccabe@adsl-76-202-119-74.dsl.pltn13.sbcglobal.net) has joined #ceph
[2:23] * eternaleye_ is now known as eternaleye
[2:23] * delx (~delx@dslb-092-073-169-246.pools.arcor-ip.net) Quit ()
[2:23] <sagewk> cmccabe: how's the gui stuff looking?
[2:23] <cmccabe> pretty good
[2:24] <cmccabe> after some fighting with automake over gtk2
[2:24] <cmccabe> I had to step out for an hour to go vote so I'll be working on it later tonight
[2:25] <cmccabe> oh btw
[2:25] <cmccabe> what is a clientmap? (or what did it used to be?)
[2:28] <sagewk> it used to just be the max client_id. it is no more
[2:29] <cmccabe> k
[3:00] <dubst> When you've created ceph over ext3; are you able to remove the journaling from an ext3 partition?
[3:01] <sagewk> more or less. there is a bit of code to trigger a sync that should be fixed, but it'll basically work.
[3:07] * seibert (~seibert__@ Quit (Quit: This computer has gone to sleep)
[3:07] <dubst> And if I change ceph.conf and restart ceph, it should work, right? I won't need to recreate the fs?
[3:07] <sagewk> depends on the change. if you're not adding/removing nodes or changing data paths, then generally a restart is all that's needed
[3:08] <dubst> Oh I see, thanks.
[4:10] * sjust (~sam@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[4:21] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[4:32] * jantje_ (~jan@paranoid.nl) has joined #ceph
[4:32] * jantje (~jan@paranoid.nl) Quit (Read error: Connection reset by peer)
[5:00] * todinini_ (tuxadero@kudu.in-berlin.de) Quit (Remote host closed the connection)
[5:14] * todinini (tuxadero@kudu.in-berlin.de) has joined #ceph
[5:30] * henry_c_chang (~chatzilla@ has joined #ceph
[5:33] * henry_c_chang (~chatzilla@ Quit ()
[5:35] <dubst> Is there a specific way you need to setup your ceph.conf when your mon,node and osd's are on one machine?
[5:42] <cmccabe> have you tried vstart.sh?
[5:42] <cmccabe> that will generate a ceph.conf that has the configuration you describe (where all are on the same machine)
[5:43] <cmccabe> and runs the daemons too
[5:43] <dubst> Aha, nooo
[5:51] <dubst> Looks like vstart.sh is only in my git clone; should it not be anywhere else?
[5:52] <cmccabe> ?
[5:52] <cmccabe> it's just for testing
[5:53] <cmccabe> you would normally use the scripts in /etc/init.d/
[6:00] <dubst> Ah okay, that's what I was wondering. Still some bummerish issues. I'll look more into my own faults (which I'm sure they are) :d
[6:02] <cmccabe> well, the docs are probably not... er... ideal
[6:03] <cmccabe> don't be discouraged though :)
[6:57] * dubst (~me@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[6:57] * dubst (~me@ip-66-33-206-8.dreamhost.com) has joined #ceph
[7:10] <dubst> cmccabe: thanks for the advice on using the vstart.sh conf! awesome
[7:10] <cmccabe> :)
[7:11] <cmccabe> you can set the number of OSDs if you set an environment variable before running vstart
[7:11] <cmccabe> CEPH_NUM_OSD
[7:12] <dubst> vstart seemed to detect that I'm only using 1 OSD; which is really what I was hoping for.
[7:22] <dubst> if osd data is for the file system that data will be stored in, what is mon data for?
[7:23] <cmccabe> osd data relates to objects
[7:23] <cmccabe> monitor data relates more to the file system as a whole
[7:23] <dubst> Oh I see.
[7:24] <cmccabe> I'm not sure exactly what data monitors store. I don't think it's very large.
[7:24] <cmccabe> I haven't looked at that code in depth
[7:25] <cmccabe> one of the other guys on the team would know about that
[7:29] <dubst> Cool.
[7:35] <dubst> lol @ me not paying attention to big red signs that say this over-rides data
[7:52] * dubst (~me@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[7:57] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:31] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[8:31] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:35] * dubst (~me@pool-173-55-24-140.lsanca.fios.verizon.net) has joined #ceph
[9:02] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:37] * allsystemsarego (~allsystem@ has joined #ceph
[10:18] * gregorg (~Greg@epoc-01.easyrencontre.com) has joined #ceph
[10:18] * Yoric (~David@ has joined #ceph
[11:36] * gregorg (~Greg@epoc-01.easyrencontre.com) Quit (Quit: Quitte)
[11:45] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[11:55] * delx (~delx@dslb-188-101-130-130.pools.arcor-ip.net) has joined #ceph
[11:55] <delx> Hey
[12:09] <dubst> Hi
[12:11] <delx> I saw a presentation (ceph-final.ppt on google) saying that ceph does not perform well with small file reads. Is this an architectural issue?
[12:27] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[12:32] * delx (~delx@dslb-188-101-130-130.pools.arcor-ip.net) Quit (Remote host closed the connection)
[12:38] * johnl (~johnl@cpc3-brad19-2-0-cust563.barn.cable.virginmedia.com) has joined #ceph
[13:43] <jantje_> 2010-10-27 13:43:15.266709 7f16908c6710 -- >> pipe(0x2967500 sd=19 pgs=0 cs=0 l=0).connect claims to be not - wrong node!
[13:43] <jantje_> i'm getting this all the time
[13:43] <jantje_> can I ignore it?
[14:23] * MarkN (~nathan@ Quit (Quit: Leaving.)
[15:10] * seibert (~seibert__@drl-dhcp092.sas.upenn.edu) has joined #ceph
[15:45] * seibert (~seibert__@drl-dhcp092.sas.upenn.edu) Quit (Ping timeout: 480 seconds)
[15:46] * seibert (~seibert__@drl075.apng.sas.upenn.edu) has joined #ceph
[15:47] * deksai (~deksai@96-35-100-192.dhcp.bycy.mi.charter.com) Quit (Ping timeout: 480 seconds)
[16:08] * alexxy (~alexxy@ has joined #ceph
[16:08] * alexxy[home] (~alexxy@ Quit (Read error: Connection reset by peer)
[16:38] * Meths_ (rift@ has joined #ceph
[16:45] * dubst (~me@pool-173-55-24-140.lsanca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[16:45] * Meths (rift@ Quit (Ping timeout: 480 seconds)
[16:50] * Meths_ is now known as Meths
[17:11] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:15] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:15] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:57] * greglap (~Adium@ has joined #ceph
[18:21] <cmccabe> jantje: I don't know. You could ask the mailing list?
[18:25] * tjikkun (~tjikkun@2001:7b8:356:0:204:bff:fe80:8080) Quit (Remote host closed the connection)
[18:30] * seibert (~seibert__@drl075.apng.sas.upenn.edu) Quit (Ping timeout: 480 seconds)
[18:30] * deksai (~deksai@dsl093-003-018.det1.dsl.speakeasy.net) has joined #ceph
[18:31] * seibert (~seibert__@drl-dhcp092.sas.upenn.edu) has joined #ceph
[18:38] * greglap (~Adium@ Quit (Read error: Connection reset by peer)
[18:38] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:38] * sagelap (~sage@ip-66-33-206-8.dreamhost.com) has left #ceph
[19:06] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:10] * Yoric (~David@ Quit (Quit: Yoric)
[19:10] * sjust (~sam@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:30] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[20:40] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:53] * dubst (~me@pool-173-55-24-140.lsanca.fios.verizon.net) has joined #ceph
[21:18] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[22:23] * dubst (~me@pool-173-55-24-140.lsanca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[22:59] * dubst (~me@ip-66-33-206-8.dreamhost.com) has joined #ceph
[22:59] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[22:59] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit ()
[23:00] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:09] * seibert (~seibert__@drl-dhcp092.sas.upenn.edu) Quit (Ping timeout: 480 seconds)
[23:21] * dubst (~me@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[23:23] * terang (~me@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:24] <jantje_> evening
[23:24] <jantje_> (or afternoon for some people)
[23:25] <cmccabe> evening
[23:26] <terang> hello
[23:26] <jantje_> when writing a large file with dd, blocksize=4k count=LARGE
[23:26] <jantje_> what happens on the ceph side?
[23:26] <jantje_> does every block go into a different PG ?
[23:27] <jantje_> because when I bench every OSD I get speeds like 80-90MB/s
[23:27] <jantje_> and I have my journal on a memory filesystem, 1GB size
[23:27] <gregaf> it depends on how your CRUSH map is set up for the particulars
[23:28] <jantje_> so let's assume the journal does not influence writing speeds of the OSDs
[23:28] <gregaf> but if you write enough blocks then yes, they will be approximately randomly distributed across all your PGs/OSDs
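A minimal sketch of the distribution gregaf describes, for readers following along. It assumes the file is split into fixed-size objects (4 MiB here, a value not stated in the log) and that each object name is hashed to a PG. The md5-based hash, the PG count, and the object naming are hypothetical stand-ins; Ceph uses its own hash and the CRUSH map for the real placement.

```python
import hashlib

OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size (not stated in the log)
BLOCK_SIZE = 4 * 1024          # dd block size from jantje_'s question
PG_COUNT = 64                  # hypothetical number of placement groups


def object_index(block_number: int) -> int:
    """Return the index of the object that holds the given 4k block."""
    return (block_number * BLOCK_SIZE) // OBJECT_SIZE


def placement_group(object_name: str) -> int:
    """Stand-in hash from object name to PG; not Ceph's actual hash."""
    digest = hashlib.md5(object_name.encode()).digest()
    return int.from_bytes(digest[:4], "little") % PG_COUNT


def objects_per_pg(total_blocks: int, file_id: str = "file") -> dict:
    """Count how many objects of a sequentially written file land in each PG."""
    counts = {}
    last_object = object_index(total_blocks - 1)
    for obj in range(last_object + 1):
        pg = placement_group(f"{file_id}.{obj:08x}")  # hypothetical naming
        counts[pg] = counts.get(pg, 0) + 1
    return counts


if __name__ == "__main__":
    # 1 GiB written as 4k blocks: 262144 blocks, 256 objects.
    spread = objects_per_pg(262144)
    print(len(spread), "PGs used by", sum(spread.values()), "objects")
```

Under these assumptions, consecutive 4k blocks share an object (1024 blocks per 4 MiB object), so it is the objects, not the individual blocks, that spread out across PGs once the file is large enough.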
[23:28] <jantje_> i 'only' get 120MB/sec with 6 OSDs on 3 machines (every machine has 4x1gbit LAG)
[23:29] <jantje_> (iperf throughput = 2.2gbit)
[23:29] <gregaf> do they each have their own spindle?
[23:29] <jantje_> yes
[23:29] <cmccabe> what's 4x1gbit LAG
[23:29] <cmccabe> link aggregation?
[23:30] <jantje_> bonding
[23:30] <jantje_> not real LAG
[23:30] <jantje_> I can get real LAG, I can hook up Alcatel-Lucent 7750 service router :P
[23:30] <jantje_> but total overkill for what I need I think
[23:30] <cmccabe> you could try benchmarking radostool on the same setup
[23:30] <gregaf> you don't have it set to sync writes, do you?
[23:31] <jantje_> journal is flushed in parallel, is that what you mean?
[23:31] <gregaf> no, just checking you didn't configure it to avoid client-side buffering
[23:31] <gregaf> oh, does your client have bonded ethernet?
[23:32] <jantje_> oh, it really buffers on the client, small writes are certainly buffered
[23:32] <jantje_> of course :-)
[23:32] <gregaf> just checking :)
[23:32] <jantje_> same configuration, same hardware as servers (except that the client boots from NFS)
[23:34] <gregaf> well that's about half the performance I'd expect to see
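One possible way to put the numbers quoted so far side by side, as a rough sketch only. The figures (120MB/sec observed, 2.2gbit iperf throughput, 80-90MB/s per OSD, 6 OSDs) come from the log; treating the iperf figure as the client's network ceiling and using decimal units are assumptions, and this is not necessarily how gregaf arrived at his estimate.

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to MB/s using decimal units (1 Gbit = 1000 Mbit)."""
    return gbit_per_s * 1000 / 8


OBSERVED_MB_S = 120.0      # dd throughput reported by jantje_
IPERF_GBIT_S = 2.2         # iperf throughput reported by jantje_
OSD_COUNT = 6              # OSDs in the cluster
OSD_BENCH_MB_S = 80.0      # lower end of the 80-90MB/s per-OSD figure

# Assumption: the iperf number is the ceiling for a single client.
network_ceiling = gbit_to_mbyte_per_s(IPERF_GBIT_S)

# Raw sum of per-OSD speeds, ignoring replication and journaling overhead.
disk_aggregate = OSD_COUNT * OSD_BENCH_MB_S

fraction_of_network = OBSERVED_MB_S / network_ceiling

if __name__ == "__main__":
    print(f"network ceiling: {network_ceiling:.0f} MB/s")
    print(f"disk aggregate:  {disk_aggregate:.0f} MB/s")
    print(f"observed / network ceiling: {fraction_of_network:.2f}")
```

Under these assumptions the observed rate is a bit under half of what the network alone would allow, which is in the same range as gregaf's remark.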
[23:34] <gregaf> can you run the OSD bench on each of your OSDs and see what it reports?
[23:34] <gregaf> ./ceph tell osd # bench
[23:34] <gregaf> for every number
[23:34] <gregaf> it'll write 1GB to each OSD and report the results to the central log, so if you run ./ceph -w you'll see the results as they finish
[23:35] <jantje_> just a second
[23:37] <gregaf> or ./ceph osd tell * bench will tell them all
[23:51] <jantje_> ok, now I have time to check it, just another minute :P
[23:57] <jantje_> gregaf http://pastebin.com/7Hb8m2Eq , the osd's tell the journal is full and throttle
[23:58] <jantje_> anyway, osd1 consistently shows slower write speeds, is it because of the throttle? hmm
[23:58] <jantje_> all disks are exactly the same, all brand new
[23:58] <gregaf> the journal throttling means that the main data store can't keep up with journal
[23:58] <gregaf> which makes sense since you're doing a RAM-disk journal
[23:58] <jantje_> which makes sense
[23:59] <jantje_> indeed
[23:59] <gregaf> so there's something wrong with osd1
[23:59] <gregaf> it's probably the disk since they're all set up the same, you could try some local benchmarks on them to see

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.