#ceph IRC Log

Index

IRC Log for 2012-04-18

Timestamps are in GMT/BST.

[0:02] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[0:03] * loicd1 (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[0:03] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Read error: Connection reset by peer)
[0:06] * aliguori (~anthony@32.97.110.59) has joined #ceph
[0:13] * rturk (rturk@ds2390.dreamservers.com) Quit (Ping timeout: 480 seconds)
[0:20] * MK_FG (~MK_FG@188.226.51.71) Quit (Ping timeout: 480 seconds)
[0:21] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[0:22] * aliguori (~anthony@32.97.110.59) Quit (Quit: Ex-Chat)
[0:24] * MK_FG (~MK_FG@188.226.51.71) has joined #ceph
[0:30] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[0:31] * perplexed_ (~ncampbell@216.113.168.141) has joined #ceph
[0:33] * rturk (rturk@ds2390.dreamservers.com) has joined #ceph
[0:39] * perplexed_ (~ncampbell@216.113.168.141) Quit (Ping timeout: 480 seconds)
[0:39] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[0:50] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[0:57] * lofejndif (~lsqavnbok@28IAAD0SW.tor-irc.dnsbl.oftc.net) has joined #ceph
[1:44] * loicd1 (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[1:45] * Tv_ (~tv@aon.hq.newdream.net) Quit (Quit: Tv_)
[1:45] * BManojlovic (~steki@212.200.243.246) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:45] * lofejndif (~lsqavnbok@28IAAD0SW.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:53] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[1:59] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[2:01] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[2:03] * Rankin (~sel@90-228-233-162-no223.tbcn.telia.com) Quit (Ping timeout: 480 seconds)
[2:04] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[2:18] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[2:37] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[3:00] * adjohn (~adjohn@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: adjohn)
[3:00] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[3:01] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[3:02] * danieagle (~Daniel@177.43.213.15) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[3:03] * ssedov (stas@ssh.deglitch.com) has joined #ceph
[3:03] * stass (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[3:09] * loicd (~loic@99-7-168-244.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[3:48] * joao (~JL@89-181-153-140.net.novis.pt) Quit (Ping timeout: 480 seconds)
[3:52] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[4:04] * MK_FG (~MK_FG@188.226.51.71) Quit (Ping timeout: 480 seconds)
[4:18] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[4:20] * MK_FG (~MK_FG@188.226.51.71) has joined #ceph
[4:42] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:46] * chutzpah (~chutz@216.174.109.254) Quit (Quit: Leaving)
[5:57] * gohko (~gohko@natter.interq.or.jp) Quit (Quit: Leaving...)
[6:02] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[6:05] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[6:11] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[6:14] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[6:46] * f4m8_ is now known as f4m8
[6:53] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:24] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Ping timeout: 480 seconds)
[9:27] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:33] <chaos_> gregaf, great news! ;-) changing the kernel to 3.2 sped up the filesystem
[9:35] <chaos_> online defragmentation does the magic ;p
[10:10] * gohko_ (~gohko@natter.interq.or.jp) has joined #ceph
[10:10] * gohko (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[10:52] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) Quit (Ping timeout: 480 seconds)
[11:23] * joao (~JL@89.181.153.140) has joined #ceph
[11:27] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:37] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[12:45] * lofejndif (~lsqavnbok@83TAAE2F7.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:39] * The_Bishop (~bishop@cable-89-16-138-109.cust.telecolumbus.net) has joined #ceph
[13:55] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:27] * oliver1 (~oliver@p4FFFE03B.dip.t-dialin.net) has joined #ceph
[14:47] * gohko_ (~gohko@natter.interq.or.jp) Quit (Read error: Connection reset by peer)
[14:48] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[14:49] * lofejndif (~lsqavnbok@83TAAE2F7.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[15:17] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[15:41] <nhm> good morning #ceph
[15:48] * f4m8 is now known as f4m8_
[15:50] <joao> hey nhm
[15:50] <joao> morning
[15:52] <nhm> joao: how goes the load generator?
[15:53] <joao> I stopped working on it when I started working on the idempotency tester
[15:53] <joao> it got into a state where it generates a workload
[15:53] <joao> but I'm not sure if it is representative of the common workload of an OSD
[15:54] <joao> do you happen to need it?
[15:55] <nhm> joao: Hopefully once I get these clusters built and running. :/
[15:56] * lofejndif (~lsqavnbok@82VAAC69M.tor-irc.dnsbl.oftc.net) has joined #ceph
[15:56] <nhm> joao: I've got to get some testing for congress going first, then I can play more.
[15:56] <joao> do you have some kind of workload in mind?
[15:57] <joao> in its current state, (iirc) the generator simply keeps doing random operations over a set of collections and objects
[15:57] <nhm> joao: Well, I guess it depends. What are the options for the generator, and how closely does it match what the OSDs do?
[15:58] <joao> let me check on that... if I spend more than a week away from something I tend to forget the details
[15:58] <joao> I've developed a serious case of selective memory
[15:58] <nhm> Is this just doing various operations against the underlying xfs/ext4/btrfs filesystems?
[15:59] <nhm> with a similar distribution of operations that you see from OSDs?
[15:59] <joao> we keep it random
[15:59] <joao> okay, let's put it this way
[16:00] <joao> we allow you to define the number of initial collections and objects on the store
[16:00] <joao> then the ops will be performed at random, without considering ratios
[16:01] <nhm> Ok. Is that reasonable? (I confess that I haven't really thought any of this through and I'm groggy this morning).
[16:01] <joao> well, I'm lying through my teeth
[16:01] <joao> that's not at all what happens actually
[16:02] <joao> it will write an object, set attr's on it and on the collection and append to a log
[16:03] <joao> iirc, Sage thought this would represent the expected OSD behavior
[16:03] <joao> every now and then, we remove an entire collection, but that's also tunable
[16:04] <joao> nhm, one later idea was to gather logs from running OSDs and infer the workload from them
[16:04] <nhm> joao: I've got tons of that data if you want it
[16:04] <joao> and then to create tests to mimic them
[16:04] <nhm> joao: like 15GB worth
[16:04] <joao> damn
[16:05] <joao> I guess that would be a great idea
[16:05] <nhm> various IO sizes, simultaneous writers, clients, OSDs, pg_bits values, etc.
[16:06] <joao> how did you come to them?
[16:06] <nhm> sam wrote some quick parsing scripts that I eventually want to expand on, but if you have time you could play with them too.
[16:06] <joao> sure
[16:06] <joao> I'm all for it
[16:07] <joao> is there someplace I can get them?
[16:07] <nhm> joao: I modified the teuthology-suite stuff so I could run suites interactively and wrote some scripts to generate fragments programmatically.
[16:07] <nhm> joao: everything I've got is on metropolis in /data/nh/archive
[16:07] <nhm> joao: I'm adding more to it periodically
[16:08] <joao> I think I need a ceph cluster at home... it's getting annoying installing a 1TB disk every now and then, and not having 15GB worth of available space when I need them
[16:08] <joao> nhm, kay, will take a look
[16:10] <nhm> joao: each top level directory in there is a different suite run, with directories for each test underneath. In the test directory, there is the config.yaml file for how that test was run, along with the typical "remote" directory for the archive dirs for each remote host.
[16:11] <joao> nhm, are you able to access metropolis?
[16:11] <nhm> joao: every remote host had collectl running on it during the test for system level monitoring (I can help if you want to look at this). The data is stored in the performance directory. ceph or client logs are in the logs directory for the remote host.
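An illustrative sketch of the archive layout nhm describes above (suite, test and host names are placeholders):

    /data/nh/archive/
        <suite-run>/                    # one directory per suite run
            <test>/                     # one directory per test
                config.yaml             # how the test was run
                remote/
                    <host>/
                        performance/    # collectl data for that host
                        logs/           # ceph / client logs, e.g. osd.*.log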
[16:11] <nhm> joao: I'm on it now. The host name changed to metroplis.ops.newdream.net
[16:11] <nhm> sorry metropolis
[16:12] <joao> oh...that must be it then :)
[16:12] <joao> okay, ssh'd
[16:12] <nhm> good deal
[16:15] <joao> nhm, do you have, by any chance, logs for the operations performed on the filestore?
[16:15] <joao> oh
[16:15] <joao> silly me
[16:15] <joao> everything should be in the osd.*.log
[16:16] <nhm> joao: yeah, on many of the tests in that directory I've got debug 20 on the OSDs.
[16:40] * dwm__ (~dwm@2001:ba8:0:1c0:225:90ff:fe08:9150) Quit (Ping timeout: 480 seconds)
[16:43] * dwm_ (~dwm@2001:ba8:0:1c0:225:90ff:fe08:9150) has joined #ceph
[17:11] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[17:11] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[17:35] * oliver1 (~oliver@p4FFFE03B.dip.t-dialin.net) has left #ceph
[17:37] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:37] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[17:41] * loicd (~loic@99-7-168-244.lightspeed.sntcca.sbcglobal.net) Quit (Quit: Leaving.)
[17:59] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[18:04] * technicool (~manschutz@pool-96-226-55-169.dllstx.fios.verizon.net) has joined #ceph
[18:06] <nhm> woo, one of the interns I mentored just got an NDSEG award.
[18:07] <joao> congratz
[18:09] <nhm> joao: I'll relay your congrats to him, I didn't do anything. ;)
[18:09] * perplexed (~ncampbell@216.113.168.141) has joined #ceph
[18:11] <technicool> Does ceph encrypt any/all communication between nodes?
[18:12] <nhm> technicool: http://ceph.newdream.net/wiki/FAQ#What_traffic_does_cephx_protect.3F
[18:14] <technicool> nhm: thanks. So, basically, all communication between nodes is in the clear?
[18:14] <technicool> Has anybody tried using ceph over stunnel or the like?
[18:14] <technicool> my need is for HIPAA reasons
[18:16] <nhm> technicool: I haven't found any way to encrypt all traffic, but someone else can probably confirm for sure.
[18:17] * perplexed (~ncampbell@216.113.168.141) Quit (Quit: perplexed)
[18:20] <gregaf> yep, there's currently no built-in traffic encryption :(
[18:21] <nhm> gregaf: ever looked at hpn-ssh?
[18:21] <gregaf> never heard of it, so no?
[18:22] <nhm> gregaf: http://www.psc.edu/networking/projects/hpn-ssh/
[18:26] <gregaf> yay ssh
[18:26] <gregaf> right now I think it's two things that have kept us from implementing any kind of network security:
[18:26] <gregaf> 1) concern over cpu utilization
[18:26] <gregaf> 2) programmer laziness (aka time)
[18:26] <gregaf> with (2) being much more important ;)
[18:27] <nhm> gregaf: yeah, for #1 AESNI is probably the way to go on compatible chips.
[18:28] <nhm> just found an interesting paper from Intel where they talk about aesni, hpn-ssh, etc: ftp://download.intel.com/support/network/sb/fedexcasestudyfinal.pdf
[18:29] <nhm> but #2 is indeed probably the bigger concern. ;)
[18:30] <gregaf> I'm actually not sure if it would be that hard to just stick an encryption library under a Messenger
[18:31] <gregaf> but... security? what's that?
[18:32] <gregaf> speaking of which I should find a crypto text, because distributed security is a nifty problem
[18:42] <nhm> gregaf: download.intel.com/design/intarch/papers/324238.pdf
[18:43] <nhm> gregaf: that talks both about single stream and multi-stream AES-NI results.
[18:43] <gregaf> someday :)
[18:50] <technicool> lots of good thoughts... do you know if TCP is exclusively used between the osd, mds, mon services? if any UDP or other stuff is used, that would complicate encryption significantly. Otherwise, we should be able to stick some x509 certificates into the configurations
[18:51] <gregaf> yeah, it's just tcp
[18:51] <gregaf> it's just that for something like HIPAA I believe you'd need to do a lot of other stuff, since none of the rest of the system is encrypted either
[18:52] <gregaf> (I'm not sure what the technical requirements are or if they can be satisfied just by doing stuff like turning on disk encryption or not)
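A rough sketch of the stunnel idea technicool floats above, wrapping a single fixed port in TLS. Hostnames, ports and certificate paths are invented, and Ceph daemons actually use many dynamically assigned ports, so this is illustrative only, not a working recipe:

    ; client-side stunnel.conf: forward local plaintext to the remote TLS endpoint
    cert = /etc/stunnel/ceph-client.pem
    client = yes

    [osd0]
    accept  = 127.0.0.1:16800
    connect = osd-host.example.com:26800

    ; server-side stunnel.conf: terminate TLS and hand off to the local daemon
    cert = /etc/stunnel/ceph-server.pem

    [osd0]
    accept  = 26800
    connect = 127.0.0.1:6800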
[18:54] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[18:55] <nhm> gregaf: We were being pressured into supporting HIPAA data storage/transfer at my last job. I never ended up going through the training, but some of the requirements seem pretty crazy.
[18:55] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[18:56] <nhm> sadly the professors we worked with basically ignored everything and just uploaded random HIPAA data to our clusters without informing anyone including us.
[18:56] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[18:56] <SpamapS> Question about python-ceph
[18:57] <SpamapS> is it useful without the rgw stuff?
[18:59] <SpamapS> I'm asking because we're not shipping radosgw or librgw in the Ubuntu 12.04 CEPH packages (due to security's concerns which are reported and being addressed I believe).
[18:59] <SpamapS> So I'm thinking of not shipping rgw.py at all
[18:59] <SpamapS> right now python-ceph is uninstallable in 12.04 because it lists librgw1 as a dep, which is not available anymore.
[19:01] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:01] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[19:02] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[19:02] * loicd1 (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:02] <yehudasa> SpamapS: from what I understood, the reason you didn't want to ship radosgw on 12.04 was libfcgi not being in main
[19:02] <sagewk> python-ceph has the librados and librbd wrappers, so it's probably useful. not needed for kvm etc tho
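For example, the librados wrapper alone already lets you talk to the cluster from Python; a minimal sketch, with the pool name and conffile path as assumptions:

    import rados

    # connect using the local ceph.conf and the default keyring
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('rbd')   # any existing pool
        ioctx.write_full('greeting', 'hello from python-ceph')
        print(ioctx.read('greeting'))
        ioctx.close()
    finally:
        cluster.shutdown()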
[19:03] <sagewk> does librgw1 depend on libfcgi?
[19:03] <sagewk> (... in v0.41?)
[19:05] <SpamapS> yehudasa: right, no time to review it for security/bugs/etc.
[19:05] <SpamapS> sagewk: we'd have to patch the build to build librgw without radosgw
[19:05] <SpamapS> --with-radosgw controls them both
[19:06] <yehudasa> SpamapS: maybe a flag to remove python-ceph <-> librgw1 dependency?
[19:07] <SpamapS> So I can just drop the dep..
[19:07] <SpamapS> but then people might try to import rgw
[19:07] <SpamapS> and that will fail spectacularly
[19:07] <sagewk> are there even python wrappers for librgw? not sure why it is a dep at all?
[19:08] <yehudasa> yeah, I'm having trouble seeing its relevance to python-ceph
[19:08] * loicd1 (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Read error: Operation timed out)
[19:08] <yehudasa> the current librgw is just some useless hack that was used for obsync, and it was ripped out anyway
[19:09] <yehudasa> from obsync I mean
[19:10] <yehudasa> .. so I guess obsync uses python-ceph?
[19:12] <yehudasa> yeah, in any case, rgw was torn out of obsync prior to 0.41, so I believe it's safe
[19:12] <sagewk> we should remove any lingering python wrappers for librgw and remove the dep in current master
[19:14] * chutzpah (~chutz@216.174.109.254) has joined #ceph
[19:15] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[19:19] <SpamapS> yehudasa: ah, so the 'rgw.py' that is in python-ceph is not actually useful?
[19:21] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[19:23] <SpamapS> Should I report this as a bug?
[19:25] * SpamapS awaits confirmation before running test build and uploading an rgw-less python-ceph to Ubuntu 12.04
[19:27] * lofejndif (~lsqavnbok@82VAAC69M.tor-irc.dnsbl.oftc.net) Quit (Read error: Connection reset by peer)
[19:28] * lofejndif (~lsqavnbok@09GAAE2WW.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:34] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:35] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[19:36] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[19:37] * lofejndif (~lsqavnbok@09GAAE2WW.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[19:40] <yehudasa> SpamapS: I'm verifying
[19:41] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Read error: Operation timed out)
[19:46] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[19:50] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[19:50] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[19:50] * grape (~grape@216.24.166.226) Quit (Ping timeout: 480 seconds)
[19:50] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[19:50] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[19:59] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[20:00] * loicd1 (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[20:00] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Read error: Connection reset by peer)
[20:01] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[20:01] * ivan\ (~ivan@108-213-76-179.lightspeed.frokca.sbcglobal.net) has joined #ceph
[20:07] * loicd1 (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[20:09] * cattelan (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[20:14] <sagewk> elder: kernel gitbuilder looks ok now... i think it's just slow. let me know next time you see it lagging...
[20:23] * MK_FG (~MK_FG@188.226.51.71) Quit (Remote host closed the connection)
[20:24] * MK_FG (~MK_FG@188.226.51.71) has joined #ceph
[20:32] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[20:33] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[20:40] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[20:54] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Read error: Connection reset by peer)
[20:55] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[21:07] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[21:08] * brambles (brambles@79.133.200.49) Quit (Remote host closed the connection)
[21:09] <todin> did I understand it right, that the osd journal ssd should be able to push the same bandwidth as the stable disk storage in an osd?
[21:11] <NaioN> well i don't think should, but would be nice
[21:12] <NaioN> cause else the journal can be a bottleneck
[21:12] <NaioN> but it also depends on the underlying filesystem, with btrfs you can opt for parallel
[21:13] <todin> NaioN: that's what I thought, what do you mean by that with btrfs?
[21:13] <NaioN> with btrfs you can add the option for parallel writes, to the journal and to the filesystem
[21:14] <NaioN> and the write is acked after the first returns
[21:14] <todin> hmm, I have btrfs on the hd for stable storage, the journal uses a raw partition on the ssd
[21:15] <NaioN> filestore journal parallel
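A minimal ceph.conf sketch of that option in the setup being discussed, i.e. a btrfs data disk plus a raw SSD partition for the journal (hostname and paths are hypothetical):

    [osd]
        ; write to the journal and the btrfs filestore in parallel,
        ; acking when the first one completes; only safe on btrfs
        filestore journal parallel = true

    [osd.0]
        host = node01
        osd data = /var/lib/ceph/osd/ceph-0    ; btrfs mount on the data disk
        osd journal = /dev/sdb1                ; raw partition on the SSD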
[21:15] <todin> and the ssd bandwidth is my prob right now
[21:16] <todin> the node has 12 hds which together could push around 1800MB/s, the ssd only 500MB/s, therefore the whole node only does 500MB/s
[21:16] <NaioN> if i'm correct the parallel option is only supported on btrfs
[21:16] <NaioN> todin: true, that's a problem with an ssd journal
[21:17] <NaioN> with a lot of disks they can be faster than a single ssd
[21:17] <NaioN> you could also add more ssds...
[21:18] <todin> hmm I looked at the OCZ Velodrive, which could push 1000MB/s but that's still not enough
[21:18] <NaioN> well do you really hit the bottleneck?
[21:18] <todin> NaioN: that's right but at some point the pci bus is full
[21:18] <NaioN> because 12 disks doing 1800MB/s is, I think, really the most ideal situation
[21:19] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[21:19] <todin> NaioN: only if the node goes from a failed state back to live and resyncs its data
[21:19] <NaioN> todin: same problem with all the disks
[21:19] <NaioN> or you need a system with enough separate busses
[21:20] * brambles (brambles@79.133.200.49) has joined #ceph
[21:20] <todin> NaioN: in commodity hardware?
[21:21] <NaioN> no, in any hardware
[21:22] <todin> NaioN: I mean you hardly find any mainboard with 3 PCIe x8 ports
[21:22] <NaioN> why not?
[21:24] <todin> NaioN: I couldn't find any on the supermicro site
[21:25] <NaioN> http://www.supermicro.nl/products/motherboard/QPI/5500/X8DTH-6F.cfm
[21:25] <NaioN> I'm using this one
[21:26] <NaioN> with 24 disks in a chassis
[21:26] <NaioN> we divide the sas controllers evenly between the two bridges
[21:26] <todin> a supermicro chassis? nice mainboard, I looked at e3 boards
[21:27] <NaioN> yeps
[21:27] <NaioN> http://www.supermicro.nl/products/chassis/4U/846/SC846A-R1200.cfm
[21:28] <NaioN> with sas multilane connectors on the backplane and 2 extra lsi sas controllers, each with 2 multilane ports
[21:28] <NaioN> and 2 multilane connectors on the mainboard to an lsi 2008
[21:29] <NaioN> well we don't use the raid, osd per disk
[21:29] <NaioN> but with an osd per disk you need a fair amount of cpu power
[21:29] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[21:29] <todin> atm I have 3 disks per osd process
[21:29] <todin> and you use 10G?
[21:30] <NaioN> infiniband 20G
[21:30] <NaioN> cheap! :)
[21:30] <todin> that's nice, we have only 10G spf+
[21:30] <NaioN> yes that's the joke
[21:30] <NaioN> I also looked at the 10G ethernet options
[21:30] <NaioN> but infiniband is just very cheap on ebay
[21:31] <darkfader> yup :)
[21:31] <todin> I use this at work, I cannot buy from ebay
[21:32] <darkfader> but i think if you run anything like IP then you'll want a 40g card instead of 20g
[21:32] <todin> we have arista 10G switches
[21:32] <NaioN> well don't know how much it is new
[21:32] <darkfader> NaioN: the 56gbit hcas are around Eur800 list price
[21:32] * LarsFronius (~LarsFroni@31-18-137-57-dynip.superkabel.de) has joined #ceph
[21:32] <NaioN> well infiniband is designed for low latency
[21:32] <NaioN> darkfader: we use the 20G (16G effective) for $125 each
[21:33] <darkfader> i have mostly 20g too, but they don't have ip offload stuff
[21:33] <darkfader> and the newer ones seem to
[21:33] <todin> do you know the latency of ib? can you really measure it in the ceph performance?
[21:33] <darkfader> todin: yes, very very low; and about the second, I don't know, I am not using the stuff at the moment
[21:34] <NaioN> darkfader: do you use rdma?
[21:34] <todin> darkfader: a comparison would be nice
[21:34] <NaioN> darkfader: SDP?
[21:34] <darkfader> NaioN: i only used the IB stuff when testing with gluster (fail fail) and then some more lab-ish work. but the lab is no more so it's all a huge pile
[21:35] <NaioN> todin: http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-7.html
[21:35] <darkfader> NaioN: i didn't have any real app that could make use of rdma, so just benchmarks; and sdp, yes, I had a little bit of exposure
[21:35] <NaioN> if you don't mind LD_PRELOAD :)
[21:35] <darkfader> not really mastered the configuration
[21:36] * jluis (~JL@89-181-153-140.net.novis.pt) has joined #ceph
[21:36] <darkfader> one of the showstoppers for me was that nobody can explain ib partitioning
[21:37] <NaioN> but even without the SDP you get a lot of bandwidth
[21:37] <todin> does ib use a switch or how do you connect the server?
[21:37] <NaioN> todin: you can use a switch
[21:37] * lofejndif (~lsqavnbok@1RDAAA0UZ.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:37] <darkfader> todin: normally you'll have a switch
[21:38] <NaioN> but you can crossconnect two hosts/nodes
[21:38] <todin> ok, how much would a switch cost?
[21:38] <NaioN> well the 10G is about $450 (saw them even for less than $100)
[21:38] <NaioN> all ebay
[21:38] <NaioN> and I bought the 20G for 1750
[21:39] <NaioN> but my problem is that not everyone wants to ship worldwide
[21:39] <NaioN> I don't know new prices
[21:39] <todin> NaioN: where are you?
[21:39] <darkfader> todin: new ones should be similar to 10g switches
[21:39] <NaioN> but I use the Cisco SFS 7000P and D
[21:39] <darkfader> 40g that is
[21:39] <NaioN> netherlands
[21:40] <darkfader> NaioN: i have a SFS 3504
[21:40] <todin> darkfader: 24port 10G is as low as 6800 euro
[21:40] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[21:40] <darkfader> cisco is our (cheapskates') friend since they had to kill off their infiniband line to avoid cannibalizing UCS dce stuff
[21:42] <NaioN> darkfader: aha that's the one with blades?
[21:42] * joao (~JL@89.181.153.140) Quit (Ping timeout: 480 seconds)
[21:42] <darkfader> NaioN: one big one with switch and 4 module slots
[21:42] <NaioN> darkfader: partitioning shouldn't be real hard
[21:42] <darkfader> only have one for gigE
[21:43] <NaioN> you create zones
[21:43] <darkfader> NaioN: yes but how do you put that in linux' config files :(
[21:43] <NaioN> oh you want it like tagged vlans to the host?
[21:44] <NaioN> well I don't know that
[21:44] <nhm> just read backlog... Not sure where we top out at yet in terms of per-node performance.
[21:45] <yehudasa> sjust: can you look at master, make sure that the ceph-python change is sane? (basically just removed the files, modified debian/control)
[21:45] <darkfader> NaioN: yes, like vlans. it's easy to do on the switch / ib side, but on linux very vague
[21:45] <nhm> I think best performance in a 2U box is going to come from 9-10 HDs with 2-3 SSDs for journals if you are doing 10G.
[21:45] <NaioN> todin: well besides, I think you can be happy if you get 500MB/s per node :)
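Rough arithmetic behind that sizing, assuming ~120-150 MB/s per spinning disk and ~400-500 MB/s per SATA SSD (ballpark assumptions, not measurements):

    10 GbE link         ~ 1.2 GB/s usable
    9-10 data disks     ~ 1.1-1.5 GB/s aggregate sequential
    2-3 SSD journals    ~ 0.8-1.5 GB/s aggregate
    1 SSD journal       ~ 0.4-0.5 GB/s, the bottleneck in todin's 12-disk node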
[21:45] <sjust> I'll take a look, not very familiar with it though
[21:46] <NaioN> at the moment I don't reach the peak performance of the ssd in my setup
[21:46] <sjust> yehudasa: looks reasonable
[21:46] <todin> NaioN: but i could do more
[21:46] <NaioN> but it also depends on the workload
[21:46] <NaioN> what's your workload?
[21:47] <todin> nhm: I like 2 10G interfaces and then do MLAG to the switch
[21:47] <todin> NaioN: ISP customer vms
[21:47] * mfoemmel (~mfoemmel@chml01.drwholdings.com) has joined #ceph
[21:47] <NaioN> with kvm+rbd
[21:47] <yehudasa> sjust: sagewk: do you know if there is anything that we can run to test that? (python-ceph)
[21:47] <nhm> todin: how many drives do you have per node?
[21:48] <todin> nhm: 12
[21:48] <nhm> todin: with 1 ssd?
[21:48] <todin> nhm: atm yep
[21:49] <nhm> todin: Ok. I haven't tested it yet, but theoretically 2 SSDs should help.
[21:49] <nhm> todin: I'm trying to get a setup like that in house so we can test it.
[21:49] <todin> nhm: I think that too, I was looking at pci-e ssd cards
[21:50] <nhm> todin: yeah, that's another option.
[21:50] <NaioN> I use rbd's to an intermediate server and format them with ext3 and use rsync for backups
[21:50] <nhm> todin: kind of pricey though.
[21:50] <todin> right now I am looking at the best combination
[21:50] <darkfader> good thing about 2 ssds instead of a pci-e is that you can replace them online
[21:50] <todin> darkfader: yes if they are in a hot-swap port
[21:51] <NaioN> and yes if ceph deals well with a journal that disappears
[21:51] * Oliver1 (~oliver1@ip-176-198-98-169.unitymediagroup.de) has joined #ceph
[21:51] <darkfader> i figured i'll go with raid5 + hotspare
[21:51] <darkfader> i saw that risingtide recommends that for their ssd layer in the storage boxes
[21:51] <todin> darkfader: why raid5? the redundancy is in ceph itself
[21:52] <darkfader> not for the journal
[21:52] <NaioN> darkfader: on disks or ssds?
[21:52] <darkfader> ssds
[21:52] <NaioN> because then you need at least 4 ssds
[21:52] <darkfader> yes
[21:53] <NaioN> per node
[21:53] <darkfader> NaioN: i basically have one "ceph box", that will have 6 ssds and 16 disks
[21:53] <darkfader> the others will just have 6 disks + 1 ssd
[21:53] <darkfader> + 1 backup disk
[21:53] * mfoemmel (~mfoemmel@chml01.drwholdings.com) has left #ceph
[21:54] * mfoemmel (~mfoemmel@chml01.drwholdings.com) has joined #ceph
[21:54] <NaioN> ok so you don't have it distributed evenly
[21:54] <darkfader> NaioN: couldnt :(
[21:54] <todin> darkfader: what kind of contoler do you use?
[21:54] <nhm> darkfader: interesting config. What kind of networking into the 16 drive box?
[21:54] <NaioN> todin: i recommend the lsi's
[21:55] <NaioN> they have good support under linux
[21:55] <todin> NaioN: yes and a nice cli tool
[21:55] <darkfader> todin: the storage thing has an areca1680 for the disks, the ssds will be on the onboard ports
[21:55] <darkfader> and the others have lsi
[21:55] <nhm> darkfader: have you had any problems with the 1680s? We kept having failures on our 8k core cluster when we were using them.
[21:55] * LarsFronius (~LarsFroni@31-18-137-57-dynip.superkabel.de) Quit (Quit: LarsFronius)
[21:56] <nhm> darkfader: we were beating on them pretty hard though.
[21:56] <darkfader> nhm: so far, with it on my desk, it has behaved
[21:56] <NaioN> hmmm yeah our experience with areca wasn't that good either
[21:56] <darkfader> yup, I'm not expecting it to be trouble-free
[21:56] <NaioN> also a lot of trouble with failed disks
[21:56] <nhm> NaioN: same here
[21:57] <NaioN> some areca's are very picky on the backplane / disk combo
[21:57] <darkfader> i'd hope it lasts a year or so, then i can go with upgrading it
[21:57] <darkfader> (to lsi)
[21:57] <nhm> NaioN: We had the exact same disks in a netapp and in some storage boxes connected to the Areca controllers. In the netapp, 0 failed disks after 5 years. The Arecas were reporting 1-2 failed disks a month.
[21:57] <darkfader> i wanna make it so that this storage brick can go fail
[21:58] <NaioN> with the 4 ports we had several servers with failed disks, and after swapping in a good disk the whole raid crashed...
[21:58] <darkfader> nhm: well netapp has their own firmware
[21:58] <darkfader> and QA
[21:58] <NaioN> nhm: we had a lot of trouble with maxtors
[21:58] <darkfader> areca probably has neither
[21:58] <NaioN> darkfader: true
[21:58] <dmick> nhm: read jabber
[21:59] <nhm> darkfader: fair enough, but 1-2 a month is a little nuts.
[21:59] <darkfader> nhm: yeah :)
[22:00] <nhm> that was with about 400 drives.
[22:01] * gregaf (~Adium@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[22:01] * dmick (~dmick@aon.hq.newdream.net) Quit (Read error: Operation timed out)
[22:03] <darkfader> nhm: did you have a look at the smart data from those?
[22:04] <nhm> darkfader: nope, policy was to ship em back to the vendor and let them deal with it. Having said that, most of them were probably perfectly fine.
[22:05] <darkfader> nhm: hehe
[22:05] <darkfader> that will be an experience i guess
[22:06] <darkfader> need some tea or i'll topple over into bed bbl
[22:06] * mkampe (~markk@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:06] * sagewk (~sage@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:07] * sjust (~sam@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:07] * yehudasa (~yehudasa@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[22:08] * yehudasa (~yehudasa@aon.hq.newdream.net) has joined #ceph
[22:09] * sjust (~sam@aon.hq.newdream.net) has joined #ceph
[22:09] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[22:09] * mkampe (~markk@aon.hq.newdream.net) has joined #ceph
[22:09] <nhm> welcome back all
[22:10] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[22:10] * lofejndif (~lsqavnbok@1RDAAA0UZ.tor-irc.dnsbl.oftc.net) Quit (Max SendQ exceeded)
[22:10] * sagewk (~sage@aon.hq.newdream.net) has joined #ceph
[22:13] * jluis is now known as joao
[22:15] <sagewk> oliver1: can you check if 'rbd writeback window' is present in any of your config files?
[22:15] <sagewk> (sorry if you already replied; network dropped out for a few minutes over here)
[22:16] <darkfader> today is the day of the network drops
[22:16] <Oliver1> @Sage: well, that option is always present, yes.
[22:16] <yehudasa> SpamapS: yeah, rgw.py is not useful, I'd say counter-useful. I pushed a fix that removes it to ceph's master branch.
[22:17] <sagewk> oliver1: ok, that's what caused the delete and then EEXIST on create.
[22:17] <sagewk> i'm happy to see that option die :)
[22:17] <sagewk> if you take it out of your config you shouldn't see that behavior any more.
[22:17] <sagewk> oliver1: at this point, i would move to latest master and test the new rbd cache code.
[22:17] <SpamapS> yehudasa: sweeet.. commit ID so I can reference it in the bug fix?
[22:18] <Oliver1> @sage: updated #2311 also.
[22:18] <yehudasa> SpamapS: ee22c97b0f1700fa8c915c71bc3c162ef12c78f3
[22:18] <Oliver1> @sage: was the version I used for first rbd-caching correct?
[22:19] <SpamapS> yehudasa: thanks!
[22:20] <Oliver1> I definitely would love to see the new caching code perform. But I did not find the right place to specify the params, though :-\
[22:23] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:24] <NaioN> rbd caching gets into 0.46?
[22:25] * sjust (~sam@aon.hq.newdream.net) Quit (Remote host closed the connection)
[22:25] <elder> Back in a few hours.
[22:29] <Oliver1> sage: thought specifying ":rbd_cache_enabled=1" instead of ":rbd_writeback_window=xxx" as the drive param for qemu would do the trick?
[22:30] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[22:30] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[22:33] <sagewk> :rbd_cache=1
[22:33] <sagewk> no _enabled
[22:34] <sagewk> oliver1: i think that's the way to go, though
[22:34] <sagewk> naion: yeah
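That is, passed as an option in the qemu rbd drive spec, the same way rbd_writeback_window was; a sketch with a made-up pool/image name:

    qemu -drive file=rbd:rbd/vm01:rbd_cache=1,if=virtio ...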
[22:34] <wido> sagewk: Short brainspin, but libvirt has cache='writeback'
[22:34] <wido> is that something Qemu specific or could the RBD driver do something with that?
[22:35] <wido> Never found what it really does, nor have I searched really hard :)
[22:37] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[22:37] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[22:37] * tv (~tv@md10536d0.tmodns.net) has joined #ceph
[22:37] * tv is now known as Tv
[22:37] <NaioN> sagewk: yeah!
[22:38] <sagewk> wido: i don't remember exactly what cache=writeback actually does..
[22:38] <sagewk> joshd did say something about wanting to wire it up to rbd so that it would turn on rbd_cache, though
[22:39] <sagewk> which makes me think the behavior is specific to the backend.. e.g. for raw it doesn't call fdatasync, etc.
[22:39] <Tv> sagewk: for qemu? cache=writeback means it can buffer writes for a moment, so crashes corrupt disk state
[22:39] <wido> sagewk: That's what I meant indeed. If it could trigger rbd_cache
[22:39] <Tv> cache=writethrough means reads can be served from cache, but writes go straight through
[22:39] <Tv> now, don't ask how that interacts with barriers
[22:40] <wido> Tv: Who does the write buffering? Qemu or is that up to the driver?
[22:41] <wido> If Qemu does the buffering, you could end up with a double buffer: cache='writeback' and rbd_cache=1
[22:41] <wido> that will confuse people
[22:41] <wido> might be worth mentioning in the docs
[22:45] <Tv> wido: qemu
[22:45] * rturk (rturk@ds2390.dreamservers.com) Quit (Quit: leaving)
[22:45] <Tv> wido: well maybe qemu driver, but qemu.git
[22:45] <Tv> wido: and yes, it would
[22:45] <Tv> wido: don't know how much the driver gets to see & fix that
[22:46] <wido> Tv: I'll check the source, but what I meant is that it would result in a double cache.
[22:47] <wido> Haven't looked into it, just came up
[22:47] <wido> anyway, I'm afk, ttyl
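For reference, the libvirt attribute wido mentions sits on the disk's <driver> element and is handed to qemu as the drive's cache mode; a sketch for an rbd-backed disk, with pool/image and monitor address made up:

    <disk type='network' device='disk'>
      <driver name='qemu' cache='writeback'/>
      <source protocol='rbd' name='rbd/vm01'>
        <host name='mon1.example.com' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>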
[22:49] * technicool (~manschutz@pool-96-226-55-169.dllstx.fios.verizon.net) Quit (Remote host closed the connection)
[22:49] * technicool (~manschutz@pool-96-226-55-169.dllstx.fios.verizon.net) has joined #ceph
[22:53] <Oliver1> sage: WOW, this is an increase in performance... I use "spew" for testing. About 25% increase in write IOPS. Great.
[22:55] <sagewk> tv: i seem to remember that it is the individual qemu driver, not something above that.. but we need to confirm.
[22:55] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[22:56] <Tv> sagewk: that sounds reasonable; i think i saw it as an argument to the drive= thing, which would be handled by individual drive types
[22:56] <sagewk> yeah. fingers crossed :)
[22:56] <dmick> Tv: you'll be pleased to know I think I have some vlans lit up
[22:56] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit ()
[22:57] <Tv> dmick: yay
[22:57] <Tv> dmick: i think i did the new-fangled 10g setup on two vercoi -- can they ping each other?
[22:59] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[22:59] <dmick> not sure, but 04 can ping burnupi.back
[22:59] <dmick> after I brought some stuff up on 04
[23:00] <Tv> dmick: brought up = ifup, or what?
[23:03] <dmick> yeah
[23:04] <Tv> dmick: ok that should be enough
[23:04] <dmick> also added an ipmi vlan, but I need an uplink to get there
[23:04] <Tv> sweet!
[23:04] <dmick> that may be coming shortly
[23:04] * BManojlovic (~steki@212.200.243.246) has joined #ceph
[23:04] <Tv> dmick: yeah that'd be via the 4948
[23:05] <dmick> yeah
[23:10] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Quit: Konversation terminated!)
[23:10] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Read error: Connection reset by peer)
[23:10] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) has joined #ceph
[23:10] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) has joined #ceph
[23:31] * Oliver1 (~oliver1@ip-176-198-98-169.unitymediagroup.de) Quit (Quit: Leaving.)
[23:51] * loicd (~loic@204-16-154-194-static.ipnetworksinc.net) Quit (Quit: Leaving.)
[23:58] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.