#ceph IRC Log


IRC Log for 2012-03-09

Timestamps are in GMT/BST.

[0:06] * LarsFronius (~LarsFroni@g231139098.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[0:20] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[0:40] <sagewk> gregaf1: can you take a quick peek at d563f5d47c570ab44876d93ce239b39337d774c5 and tell me if the naming convention makes you hurl?
[0:40] <sagewk> yehudasa_, gregaf1: don't forget about wip-2139!
[0:41] <yehudasa_> yeah, it's waiting review
[0:41] <gregaf1> didn't I do notes on that one Tuesday?
[0:42] <gregaf1> I'll check
[0:43] <gregaf1> sagewk: those names look reasonable
[0:43] <sagewk> thanks
[0:44] <gregaf1> I haven't looked at any of the rest of the branch, though, did you want me to watch it?
[0:44] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[0:46] <gregaf1> sagewk: yehudasa_: oh, nope, I don't think anybody asked me to look at wip-2139 (I'm getting confused by the number of rgw branches for review though :P)
[0:46] <gregaf1> ..wait, no, isn't that mostly the atomic stuff? augh
[0:47] <yehudasa_> gregaf1: yeah, it shouldn't be more than one or two commits
[0:47] <gregaf1> ah, there we go
[0:48] <gregaf1> yeah, I looked at that when I was checking out the atomic stuff, it all looked fine :)
[0:48] <yehudasa_> thanks
[0:49] <gregaf1> although looking at it now I see it's using TMAP and that got swapped out for OMAP, so you'll need to convert I think?
[0:50] <yehudasa_> heh.. yeah
[0:51] <sagewk> sjust1: do you have a rados.py patch to feed in omap op weights?
[1:11] <sjust1> sagewk: yeah, forgot to push it, one moment
[1:13] <sjust1> pushed
[1:32] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:34] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:34] * Tv|work (~Tv_@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:01] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[2:07] * Tv__ (~tv@cpe-24-24-131-250.socal.res.rr.com) has joined #ceph
[2:24] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[3:54] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:59] * jpieper_ (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[4:02] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:55] * adjohn (~adjohn@rackspacesf.static.monkeybrains.net) Quit (Ping timeout: 480 seconds)
[5:55] * jpieper (~josh@209-6-86-62.c3-0.smr-ubr2.sbo-smr.ma.cable.rcn.com) has joined #ceph
[6:48] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) has joined #ceph
[7:43] * Tv__ (~tv@cpe-24-24-131-250.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[8:02] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:10] * stxShadow (~Jens@ip-88-153-224-220.unitymediagroup.de) has joined #ceph
[8:20] * LarsFronius (~LarsFroni@g231139098.adsl.alicedsl.de) has joined #ceph
[8:58] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[9:03] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:05] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:05] * tnt_ (~tnt@office.intopix.com) has joined #ceph
[9:10] * BManojlovic (~steki@ has joined #ceph
[9:25] * stxShadow (~Jens@ip-88-153-224-220.unitymediagroup.de) Quit (Read error: Connection reset by peer)
[9:34] * LarsFronius_ (~LarsFroni@g231139098.adsl.alicedsl.de) has joined #ceph
[9:34] * LarsFronius (~LarsFroni@g231139098.adsl.alicedsl.de) Quit (Read error: Connection reset by peer)
[9:34] * LarsFronius_ is now known as LarsFronius
[9:35] * LarsFronius_ (~LarsFroni@g231139098.adsl.alicedsl.de) has joined #ceph
[9:35] * LarsFronius (~LarsFroni@g231139098.adsl.alicedsl.de) Quit (Read error: Connection reset by peer)
[9:35] * LarsFronius_ is now known as LarsFronius
[9:42] * LarsFronius_ (~LarsFroni@g231139098.adsl.alicedsl.de) has joined #ceph
[9:42] * LarsFronius (~LarsFroni@g231139098.adsl.alicedsl.de) Quit (Read error: Connection reset by peer)
[9:42] * LarsFronius_ is now known as LarsFronius
[10:11] * stxShadow (~jens@tmo-111-130.customers.d1-online.com) has joined #ceph
[10:11] * stxShadow (~jens@tmo-111-130.customers.d1-online.com) Quit ()
[10:11] * stxShadow (~jens@tmo-111-130.customers.d1-online.com) has joined #ceph
[10:19] * Qten (~Q@ppp59-167-157-24.static.internode.on.net) has joined #ceph
[10:27] * stxShadow (~jens@tmo-111-130.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[10:28] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:35] * stxShadow (~jens@jump.filoo.de) has joined #ceph
[10:38] <Qten> hi all, any guesses on how long until ceph is ready for a production deployment?
[10:49] <stxShadow> developers are here late in the evening ..... maybe they could answer that
[10:53] <Qten> fair enough
[10:55] * Enorian (~Enoria@albaldah.dreamhost.com) has joined #ceph
[10:56] * Enoria (~Enoria@albaldah.dreamhost.com) Quit (Ping timeout: 480 seconds)
[11:00] * stxShadow (~jens@jump.filoo.de) Quit (Ping timeout: 480 seconds)
[11:46] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[11:47] * softcrack (~softcrack@ has joined #ceph
[11:55] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:58] * softcrack1 (ca55d12f@ircip1.mibbit.com) has joined #ceph
[11:58] * softcrack (~softcrack@ Quit ()
[11:59] * softcrack1 (ca55d12f@ircip1.mibbit.com) Quit ()
[12:08] * joao (~JL@ has joined #ceph
[12:22] * nolan (~nolan@phong.sigbus.net) Quit (Ping timeout: 480 seconds)
[12:40] <wido> Qten: What are you planning to use?
[12:40] <wido> RBD, RADOS or the Posix Filesystem (CephFS)
[12:40] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[12:40] <wido> In order of stability at the moment: RADOS, RBD, CephFS
[12:43] * softcrack (de808f3b@ircip3.mibbit.com) has joined #ceph
[13:14] * softcrack (de808f3b@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[13:32] * jluis (~JL@ace.ops.newdream.net) has joined #ceph
[13:34] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) has joined #ceph
[13:35] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[13:35] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) Quit (Max SendQ exceeded)
[13:37] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) has joined #ceph
[13:40] * joao (~JL@ Quit (Ping timeout: 480 seconds)
[13:46] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[13:51] * stxShadow (~jens@p4FFFEF07.dip.t-dialin.net) has joined #ceph
[13:51] <stxShadow> Hi all
[13:52] <jluis> hi
[13:52] * jluis is now known as joao
[13:58] * softcrack (de808f3b@ircip3.mibbit.com) has joined #ceph
[14:07] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[14:40] <nhm> good morning all
[14:44] <stxShadow> heyho
[14:44] <joao> hey nhm
[14:46] * wonko_be (bernard@november.openminds.be) has joined #ceph
[14:49] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[14:51] * stxShadow (~jens@p4FFFEF07.dip.t-dialin.net) Quit (Remote host closed the connection)
[14:55] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[15:32] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[15:45] * softcrack (de808f3b@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[16:01] <Azrael> hey folks, i have two OSDs with 50GB each. replication size is 2, so data should be mirrored on each of the OSDs. how come on my client, df shows 100GB size instead of just 50GB?
[16:02] * The_Bishop (~bishop@178-17-163-220.static-host.net) Quit (Ping timeout: 480 seconds)
[16:04] * johnl_ (~johnl@2a02:1348:14c:1720:24:19ff:fef0:5c82) Quit (Quit: leaving)
[16:04] * johnl (~johnl@2a02:1348:14c:1720:24:19ff:fef0:5c82) has joined #ceph
[16:07] * The_Bishop (~bishop@178-17-163-220.static-host.net) has joined #ceph
[16:07] <tnt_> Azrael: I doubt df takes replication into account ... how could it ... you could have different replication per directory so df just displays raw free size (just a guess, but makes sense)
[16:10] <Azrael> tnt_: ok cool
[16:11] <Azrael> tnt_: is there a ceph command i can use to get the actual total and available sizes, with replication taken into account?
[16:17] <Azrael> iiiinteresting
[16:18] <Azrael> if i put a 2GB file into ceph and repl is 2
[16:18] <Azrael> shows up as 4GB used
[16:18] <Azrael> so it works out
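The accounting Azrael works out here can be sketched with plain shell arithmetic; the sizes are the ones quoted in the exchange (two 50GB OSDs, a 2GB file, 2x replication):

```shell
# df reports raw cluster capacity; usable capacity divides by the
# replication size, and a stored file consumes size * replication raw.
raw_kb=$((2 * 50 * 1024 * 1024))   # two 50GB OSDs, in KB
repl=2                             # pool replication size
usable_kb=$((raw_kb / repl))

file_kb=$((2 * 1024 * 1024))       # the 2GB file from the discussion
used_raw_kb=$((file_kb * repl))    # shows up as 4GB used, as observed

echo "raw=${raw_kb}KB usable=${usable_kb}KB used_raw=${used_raw_kb}KB"
```

This matches tnt_'s guess: df shows raw free space because replication can vary per pool, so the client cannot know a single divisor to apply.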
[16:20] * hijacker (~hijacker@ Quit (Quit: Leaving)
[16:23] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (Read error: Operation timed out)
[16:27] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[16:28] * hijacker (~hijacker@ has joined #ceph
[16:44] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[17:04] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[17:17] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[17:21] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[17:29] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[17:33] <sagewk> azrael: 'ceph pg dump' will tell you everything, including used disk (from osd perspective) and per-pool data (bytes stored from user perspective)
[17:47] <sagewk> darkfader: yep, several of us will be at WHD, although i'll miss part/most of tuesday.
[17:47] <Azrael> sagewk: cool, thanks
[17:48] <darkfader> sagewk: did you just see my mail or did you just see the old question?
[17:48] <darkfader> i feel paranoid now ;)
[17:48] <sagewk> darkfader: just saw your mail. i'll be the only dev
[17:48] <sagewk> :)
[17:48] <darkfader> hehe
[17:50] <darkfader> i'll see that i'll be able to make it
[17:50] <darkfader> then
[17:50] <darkfader> :)
[17:51] <sagewk> excellent. i'm looking forward to some good beer ;)
[17:51] <joao> WHD is that conference in Germany?
[17:51] <sagewk> it should be fun. i'm pretty sure wido will be around too
[17:51] <sagewk> yeah
[17:54] <joao> sagewk, if you happen to have a connection flight somewhere in Europe, let me know :p
[17:55] <sagewk> i'm going direct to frankfurt. the other guys will be coming from uk, though, not sure what their flights look like
[17:57] <darkfader> i just checked, 5 hours of train ride... seems i get a good nap on the way :>
[17:58] <darkfader> the website says it's a fantastic place for unforgettable parties
[17:59] <darkfader> not too bad ;)
[17:59] <joao> lol
[17:59] <darkfader> going by the hotel prices it might be smarter to party through the night, too :))
[18:00] <darkfader> sagewk: i just checked $calendar, i can only make it on friday
[18:00] <darkfader> i'm at a customer the other part of the week
[18:00] <darkfader> will you still be there on friday?
[18:01] <sagewk> only in the morning.. leaving around noon
[18:01] <darkfader> *scratch* planning fail
[18:11] * nolan (~nolan@phong.sigbus.net) has joined #ceph
[18:12] * tnt_ (~tnt@office.intopix.com) Quit (Ping timeout: 480 seconds)
[18:23] * Tv__ (~tv@cpe-24-24-131-250.socal.res.rr.com) has joined #ceph
[18:25] <Tv__> hey, anyone at the office yet? can you go hit the power button on my desktop machine to wake it up?
[18:30] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[18:31] * tnt_ (~tnt@37.191-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:39] <sagewk> tv__: poked
[18:39] * Tv|work (~Tv_@aon.hq.newdream.net) has joined #ceph
[18:40] <Tv__> heh, and i logged in ;)
[18:42] <Tv__> hrrm except of course i got a different ip address.. if you have zeroconf setup, can you "ping dreamer" or "ping dreamer.local" and tell me what ip address it is?
[18:46] <gregaf1> Tv__: I think mkampe would be crowing about NAS if he were paying attention to this ;)
[18:47] <Tv__> gregaf1: i don't want working files on a NAS anyway
[18:47] <Tv__> if anything, i'm crowing about not having a work laptop -- but then i like having my personal laptop at work..
[18:47] <Tv__> if anything, let's crow about NAT
[18:48] <gregaf1> unfortunately I don't have zeroconf (whatever that is; I can't resolve dreamer or dreamer.local anyway), anything I can do from your lock screen to find an ip?
[18:48] <Tv__> gregaf1: you have os x, you most definitely have zeroconf -- they call it bonjour in appleland
[18:48] <gregaf1> otherwise I am going to start Not Caring, since I'm one of the few people who actually doesn't need new sepia :D
[18:48] <Tv__> "host" might not do it right, but ping definitely should
[18:49] <gregaf1> haha, I was a little surprised since I recognize the .local
[18:49] <gregaf1> sadly...
[18:49] <gregaf1> gregory-farnums-mac-mini:teuthology gfarnum$ ping dreamer
[18:49] <gregaf1> ping: cannot resolve dreamer: Unknown host
[18:49] <gregaf1> gregory-farnums-mac-mini:teuthology gfarnum$ ping dreamer.local
[18:49] <gregaf1> ping: cannot resolve dreamer.local: Unknown host
[18:49] <Tv__> hrmph
[18:49] <Tv__> cue my rant about zeroconf being unreliable
[18:49] <Tv__> but i'm not even going to dream about a reliable reverse dns service at the office ;)
[18:51] <jmlowe> diff nmap output with network cable plugged in and unplugged?
[18:51] <Tv__> haha
[18:51] <sagewk> tv__ this is usually when i fall back to nmap
[18:51] <gregaf1> oh, hrm, my server can find my mini just fine
[18:51] <gregaf1> I think your desktop is Doing It Wrong :/
[18:53] <Tv__> and the office gets natted on the way to new sepia, so i can't even use the logs there
[18:53] <Tv__> oh well, i might come to the office to get the files, and sneeze on everything
[18:54] <joao> I'm so glad right now I'm working from the other side of the atlantic
[18:54] <gregaf1> I'd laugh at you for not using a disposable password if I were only doing it myself
[18:55] <sagewk> tv__: for f in `seq 2 254` ; do ssh tv@a.b.c.$f ...
[18:55] <Tv__> heh
[18:55] <Tv__> yeah
[18:58] <Tv__> cssh $(seq monster) worked!
[18:58] <Tv__> one of the postage stamp sized windows does not contain an error message ;)
[19:04] <gregaf1> hurray ridiculous number of cpu cycles
[19:05] <gregaf1> I love using computers today
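sagewk's one-liner and Tv__'s cssh trick amount to a brute-force sweep of the office subnet; a sketch of the same idea follows. The 10.0.0.x subnet is made up (the log only shows the placeholder `a.b.c.$f`), and the `scan` function is deliberately not invoked since it needs a live network:

```shell
# Enumerate every plausible host address on a /24, in the spirit of
# sagewk's `for f in \`seq 2 254\` ; do ssh tv@a.b.c.$f ...` loop.
list_candidates() {
    for f in $(seq 2 254); do
        echo "10.0.0.$f"
    done
}

scan() {
    # Not run here: would ssh to every candidate and print whichever
    # hosts answer, which is how you find a machine with an unknown ip.
    for host in $(list_candidates); do
        ssh -o ConnectTimeout=2 "tv@$host" hostname 2>/dev/null
    done
}

list_candidates | head -3
```

nmap (also suggested in the log) does the same discovery far more efficiently; the ssh loop just has the advantage of proving login works at the same time.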
[19:12] <sagewk> tv__: what's your skype?
[19:12] <Tv__> tommi.virtanen
[19:13] <sagewk> tnx
[19:14] * chutzpah (~chutz@ has joined #ceph
[19:15] * LarsFronius_ (~LarsFroni@f054097134.adsl.alicedsl.de) has joined #ceph
[19:16] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[19:20] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[19:20] * LarsFronius (~LarsFroni@g231139098.adsl.alicedsl.de) Quit (Read error: Operation timed out)
[19:20] * LarsFronius_ is now known as LarsFronius
[19:20] * stxShadow (~jens@ip-88-153-224-220.unitymediagroup.de) has joined #ceph
[19:21] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[19:26] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[19:31] <sagewk> sjust1: do you know if there's an easy way to make 'make check' do our tests before leveldb's? makes it very hard to iterate on those tests
[19:32] <sagewk> or maybe there is a different command that will skip the nested tests...
[19:32] <Tv__> sagewk: order of subdirs in automake
[19:32] <Tv__> sagewk: i think there's a "make checklocal" or something like that
[19:32] <Tv__> checklocal, check-local, local-check.. ;)
[19:32] <joao> sagewk, the ObjectStore::Transaction's setattr are for xattrs, right?
[19:32] <joao> *is
[19:32] <sjust1> joao: yes
[19:33] <joao> thought so
[19:33] <Tv__> sagewk: try cd src && make check-local
[19:33] <sagewk> make check-local does the stuff you define (like our encode/decode tests), but not the stuff in the current makefile
[19:33] <sagewk> (e.g. unittest_*)
[19:33] <sjust1> sagewk: I haven't tried, I usually just run the specific test I'm fixing
[19:33] <Tv__> sagewk: how's encode/decode special?
[19:34] <sagewk> check-local:
[19:34] <sagewk> $(srcdir)/test/encoding/check-generated.sh
[19:34] <sagewk> $(srcdir)/test/encoding/readable.sh ../ceph-object-corpus
[19:34] <Tv__> ohhh
[19:34] <sagewk> i.e. not check_PROGRAMS or whatever
[19:34] <Tv__> i wonder if we're supposed to override it like that
[19:35] <sagewk> sjust1: cool
[19:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:38] <Tv__> sagewk: automake defaults to depth-first, put an explicit "." into src/Makefile.am SUBDIRS line.. SUBDIRS = . ocf leveldb
[19:38] <sagewk> tv__ sweet thanks
[19:42] <sagewk> sjust1: it's the iterator that holds the lock, not the IndexedPath, right?
[19:43] <sjust1> IndexedPath holds a different lock
[19:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:57] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:03] <Tv__> how does one add a file to a tarball with a different filename than what it is in the source?
[20:04] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:04] <Tv__> hmm i can --transform the names..
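GNU tar's `--transform` (aka `--xform`, as dmick notes below) rewrites member names with a sed expression at archive-creation time, which answers Tv__'s question directly. A minimal sketch, with made-up file and archive names:

```shell
# Store client.key in the tarball under a different path, without
# copying or renaming the file on disk first.
tmp=$(mktemp -d)
echo "key material" > "$tmp/client.key"

# --transform applies the sed expression to each member name as it
# is written into the archive.
tar -C "$tmp" --transform='s,^client.key$,keys/client.key,' \
    -cf "$tmp/bundle.tar" client.key

tar -tf "$tmp/bundle.tar"
```

The alternative dmick suggests further down (staging files in a scratch directory laid out the way you want, then tarring that) is often simpler when many files need renaming.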
[20:06] <Tv__> note to self: lift fingers off keyboard before having a sneezing fit
[20:06] <Tv__> *undo undo undo*
[20:15] <sagewk> gregaf1: can you carve out some time today to look at wip-2116?
[20:15] <gregaf1> yeah, will do
[20:16] <sagewk> gregaf1: hmm, the alternative approach is to make them lossy_servers, and turn it into a ping/reply instead of start/heartbeat*n/stop type of deal
[20:17] <sagewk> gregaf1: that may be simpler to understand, actually.
[20:19] <gregaf1> sagewk: I'll have to check out the OSD heartbeat management a bit, but not having to deal with old pings on reconnect is probably happier
[20:19] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:19] <gregaf1> and requiring explicit replies will make debugging and reporting a lot easier
[20:20] <gregaf1> but isn't that what we had before that you tossed out?
[20:20] <dmick> Tv__: yeah, looks like --transform/--xform
[20:21] <Tv__> dmick: currently making it create a .tar with everything one needs to get a client going
[20:22] <dmick> you could always just toss it in your own prototype tmpdir and tar that, too
[20:23] <Tv__> got it already
[20:25] <dmick> Tv__: network is such that VPN is required for access to plana, yes?
[20:25] <Tv__> dmick: yes
[20:25] <dmick> k
[20:25] <Tv__> alright now i need to write a little script to generate randomness on the client side, then try setting up a client
[20:29] <gregaf1> sagewk: hmm, we're unconditionally setting heartbeat_need_update in PG::update_heartbeat_peers() even if we didn't actually change our peers
[20:30] <gregaf1> and [maybe_]update_heartbeat_peers() takes a lock from every PG as soon as one PG decides it needs a peer update, so maybe we should be more careful about only triggering that when we need it
[20:36] <gregaf1> sagewk: anyway, all the rest of the changes are pretty rote and look like they're right to me
[21:09] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:10] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[21:10] <Qten> lo
[21:11] <Qten> wido: are you still around?
[21:15] <Qten> anyone: the Rados block device, i assume this creates small files, eg 64mb or something, and the Object storage replicates/distributes them across the cluster?
[21:17] <joshd> Qten: yes, but by default they're 4mb - it's configurable at creation time by the --order parameter
[21:17] <jmlowe> that's correct, I think if you do rbd info it will tell you the exact size and number of objects
[21:18] <joshd> the number of objects is not the number actually used though - that's not tracked - it's just the size / object size
[21:19] <Qten> interesting, so does that mean a 1mb file inside the rbd would use 4mb or is it only creating new chunks as the last one fills up?
[21:19] <Qten> ah
[21:19] <joshd> Qten: the object store can read/write to different portions of an object
[21:19] <Qten> pretty much answered my question :)
[21:20] <joshd> so there's no extra space used due to large object sizes
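joshd's two points (a 4MB default object size set by `--order`, and a nominal object count of size / object size) can be checked with a little arithmetic; the order is a power-of-two exponent, so order 22 gives the 4MB default. The 1GB image size here is made up for illustration:

```shell
# rbd --order N means objects of 2^N bytes; the default order 22
# yields the 4MB objects joshd mentions.
order=22
obj_bytes=$((1 << order))                    # 4194304 bytes = 4MB
image_bytes=$((1024 * 1024 * 1024))          # a hypothetical 1GB image
nominal_objects=$((image_bytes / obj_bytes)) # size / object size

echo "object size: $obj_bytes bytes, nominal objects: $nominal_objects"
```

As joshd says, this count is only nominal: objects are created lazily as data is written, so a mostly-empty image uses far fewer.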
[21:20] <sagewk> gregaf1: ah, ill fix that bit and merge the update_heartbeat_peers patch
[21:21] <Qten> are you guys familiar with moosefs?
[21:22] <Qten> just wondering from a developers point of view how they differ from ceph
[21:22] <Tv__> crypto hashes are so finicky about the exact bytes they are fed....
[21:22] <Qten> it seems to use 64mb chunks (files) like a distributed fs
[21:22] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Operation timed out)
[21:23] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:23] <Qten> but seems slower than what i would have expected or is that just an issue with this concept?
[21:24] <Qten> of distributed fs/object storage / striping?
[21:25] <sagewk> gregaf1: http://fpaste.org/ACN9/
[21:27] <sagewk> tv__: can you get josh set up first so he can move the teuthology vm over? will prob take a while to copy etc.
[21:27] <Qten> so i guess further to that question, from a hardware point of view: if you had an array with say 36 sata disks (2700iops/3600mb/s) would you ever see close to the array's maximum output with say 1 or 2 clients? or would you need to have 50 or so?
[21:28] <Qten> assuming 40gb IB or something
[21:29] <Tv__> sagewk: yeah sorry fighting authentication now
[21:29] <joshd> Qten: depends on what the client's doing, but I'd guess you'd need more
[21:29] <nhm> Qten: I've read a bit about moose but never tested it. Not sure what kind of performance they see.
[21:30] <Tv__> i have stupid typo somewhere but it's crypto hashing so it's hard to find :(
[21:30] <jmlowe> I had 24 1tb sas 7.2k drives across 2 machines with 10GigE and I got up to 900MB/s with VMs running iozone
[21:30] <jmlowe> vm's spread out over 4 hosts
[21:30] <nhm> Qten: one of the things you run into with that many disks in a single box is PCIE/QPI limitations, especially if you have a NUMA setup.
[21:30] <Tv__> ahh \n
[21:30] <Qten> i was thinking 3 boxes, 12 disks each, sorry :)
[21:31] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[21:31] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[21:31] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:32] <jmlowe> there is a benchmark command for osd's, that should give you some idea of what you can get in aggregate
[21:32] <Qten> jmlowe: so that was 4 hosts at 900mb/s or 900mb/s div by 4 hosts so say 220~ each?
[21:33] <Tv__> so the .sepia.ceph.com dns only resolves from the office
[21:33] <Tv__> wth
[21:33] <nhm> Qten: A single client is going to be limited to about 3GB/s over QDR IB. So if we are talking entirely from a theoretical standpoint, you would end up network limited with 36 modern drives.
[21:33] <Tv__> but at least i can access plana via the vpn now
[21:33] <nhm> Qten: ignoring other factors...
[21:34] <Tv__> oh huh the dns is just *my* dns being broken? how can that be
[21:34] <Qten> nhm: sure
[21:34] <nhm> Qten: in the real world, you'd probably not hit the IB limitation first.
[21:34] <Tv__> ohh goddamn evil DH vpn steals routing
[21:34] <Tv__> gah
[21:35] <dmick> that...seems like a lot
[21:35] <jmlowe> Qten: 900MB/s aggregate, about 40MB/s per vm while each running iozone concurrently
[21:35] <nhm> Tv__: ACL it? ;)
[21:36] <Qten> jm: is that using the RBD driver?
[21:36] <jmlowe> nhm: agreed, IB is the least of your concerns if you are trying to do this
[21:36] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) Quit (Ping timeout: 480 seconds)
[21:37] <nhm> jmlowe: Yeah, I'll be happy when IB is a concern. :)
[21:37] <jmlowe> qten: ubuntu 11.10 with qemu 0.15.1 patched for async rbd and configured --with-rbd
[21:38] <darkfader> nhm: as long as you start pushing rdma support when you're that happy :)
[21:39] <nhm> darkfader: would you use it if it was available?
[21:39] <darkfader> yes
[21:39] <darkfader> i have infiniband in all servers
[21:39] <darkfader> and the "ip over infiniband" performance is very low when comparing
[21:40] <darkfader> plus it's lame
[21:40] <nhm> darkfader: Out of curiosity are you industry or academia?
[21:40] <darkfader> "wanna-be webhost"
[21:41] <nhm> interesting! What kind of IB are you running?
[21:41] <darkfader> the old ones have DDR, and the new supermicro has QDR
[21:41] <nhm> mellanox?
[21:41] <darkfader> my switch is DDR only
[21:41] <darkfader> yes
[21:41] <nhm> QDR switches are still pretty pricey.
[21:42] <nhm> DDR still gives you good throughput and low latency though.
[21:42] <darkfader> yes, i'll just do crossconnects with the QDR (methinks)
[21:42] <nhm> We never were brave enough to try a setup like that.
[21:43] <darkfader> the problem is that due to starting up i'll not be able to rack all of this from start
[21:43] <Tv__> dmick, joshd: check your email
[21:43] <darkfader> so at first it'll just be 1:1 connection for two servers and then well... scale up :)
[21:43] <nhm> darkfader: so are you running ceph on ipoib now?
[21:44] <joshd> Tv__: thanks
[21:44] <darkfader> no, not atm. my vps boxes are still on oracle VM 2.2.3, too old in kernel terms to tackle with ceph on there
[21:45] <Qten> this is probably a silly question, but would it be possible (has it been done, or would it even be useful) to have a fuse-iscsi driver which uses a folder with files in it instead of an RBD block file? so you're basically removing the extra file-inside-a-file layer
[21:45] <darkfader> nhm: i can understand you didn't yet test something like it because you have a huge amount of dependency hell along with it
[21:46] <nhm> darkfader: could be. I've not even begun to explore it.
[21:46] <darkfader> Qten: fuse would probably eat all your performance anyway :)
[21:46] <Tv__> Qten: how does iscsi enter the picture?
[21:46] <nhm> darkfader: looks like gregaf1 maybe knew of some people testing it based on some old irc logs.
[21:46] <Qten> tv__: as an export to hypervisors
[21:47] <darkfader> nhm: yes, there was one from academia for example
[21:47] <Tv__> Qten: then it's not a folder of files.. anyway, think of rbd as iscsi-on-steroids
[21:48] <Tv__> rbd : iscsi :: ceph fs : nfs
[21:48] <nhm> Tv__: btw, check IM when you have a moment
[21:48] <Tv__> nhm: oh crud i'm not logged into that account from here
[21:48] <Tv__> nhm: can you resend using google apps im or irc?
[21:48] <nhm> Tv__: no worries
[21:48] <Qten> tv__: I suppose the one of many issues would be file locking
[21:49] <Qten> tv__: by using a technology i was just talking about anyway
[21:49] <Qten> so currently the RBD driver isn't fuse based?
[21:50] <NaioN> no
[21:50] <Tv__> Qten: no, fuse is about filesystems, rbd is a block device
[21:50] * The_Bishop (~bishop@178-17-163-220.static-host.net) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[21:50] <Qten> understood, however how does rbd talk to the filesystem via the fuse mount?
[21:50] <Tv__> dmick: test your vpn please
[21:50] <Tv__> Qten: it doesn't
[21:51] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[21:52] <NaioN> Qten: rbd = RADOS block device, so you have to put a filesystem on it or have to re-export it...
[21:53] <Qten> Naion: understand that part, however i'm trying to figure out how rbd talks to the DFS
[21:53] <NaioN> or use kvm/qemu
[21:53] <NaioN> there is no DFS
[21:53] <Tv__> Qten: the rbd client talks the rados protocol directly
[21:53] <NaioN> there is a distributed object store
[21:54] <Tv__> joshd: any luck with the vpn?
[21:54] <NaioN> so the rbd gets chopped into parts/objects and distributed over the OSDs
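The chopping NaioN describes is a fixed-size striping of the block device across objects, so mapping a device offset to an object is integer division. A sketch, using the 4MB default object size mentioned earlier; the 10MB offset is an arbitrary example:

```shell
# Map a byte offset on the rbd device to (object index, offset within
# that object), assuming the default 4MB object size.
obj_bytes=$((4 * 1024 * 1024))
offset=$((10 * 1024 * 1024))       # e.g. a write landing 10MB in
obj_index=$((offset / obj_bytes))  # which object the byte falls in
obj_offset=$((offset % obj_bytes)) # where inside that object

echo "object $obj_index, offset $obj_offset within it"
```

Since the object store can read and write sub-ranges of an object (as joshd noted above), a small write only touches the one object it lands in.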
[21:54] <joshd> Tv__: no, sent mail with log
[21:54] <Qten> sorry i meant dfs as in rados
[21:56] <Qten> so the big question any ideas on the expected stable/production ready release?
[21:56] <NaioN> well as far as I have experimented with rbd it's pretty stable
[21:57] <NaioN> I have a setup with an OSD per disk, formatted with XFS
[21:57] <NaioN> and about 80 rbds in use
[21:58] <dmick> Tv__: "~/cephco..."?
[21:58] <Qten> nice
[21:58] <Tv__> dmick: replace with where the tar is
[21:58] <Tv__> joshd: i don't see your attempt on the server side :(
[21:58] <dmick> missed the attachment, sorry
[21:58] <nhm> Qten: it gets closer every day. ;)
[21:58] <jmlowe> Stable? the roadmap says in 23 days 0.45 will be out and the experimental warnings will go away
[21:59] <stxShadow> rbd is very stable .... we have a cluster with about 250 rbds
[21:59] <Tv__> joshd: last entry is 12:32:30
[21:59] <Tv__> joshd: i do have openvpn 2.2 here
[22:00] <dmick> Tv__: does MYHOST need to be somehow FQDN, or is simple hostname enough?
[22:00] <Qten> <wido> In order of stablity at the moment: RADOS, RBD, CephFS
[22:00] <Tv__> dmick: any string
[22:00] <dmick> ok
[22:00] <NaioN> Qten: yeah that's true
[22:01] <NaioN> well I had a lot of troubles with btrfs as underlying filesystem for the OSDs
[22:01] <Qten> interesting
[22:01] <stxShadow> NaioN .... same here .... therefore changed to xfs
[22:02] <NaioN> even with the latest kernels (3.2.x)
[22:02] <jmlowe> I've had much better luck with btrfs using 3.2.5+, currently on 3.2.9
[22:02] <NaioN> jmlowe: hmmm haven't tried the later 3.2.x kernels
[22:03] <NaioN> I'm waiting for new hardware, want to make a production cluster and experimental cluster
[22:03] <NaioN> I want to experiment again with btrfs
[22:03] <stxShadow> i've tested 3.2.4 ..... broke btrfs in 12 hours
[22:03] <Qten> so what size chunks if you will does RADOS use?
[22:03] <nhm> jmlowe: Are you seeing good throughput with that setup?
[22:03] <Tv__> joshd: works for dmick using openvpn 2.2.0-2ubuntu1, can you upgrade?
[22:03] <jmlowe> Not that I haven't cursed Chris Mason and his year late btrfsck
[22:03] <joshd> Tv__: trying 2.3 (my desktop is arch)
[22:03] <NaioN> the last time i used btrfs is with 3.2.5 (if i'm correct) and it didn't crash but it slowed down
[22:04] <NaioN> jmlowe: :)
[22:04] * lofejndif (~lsqavnbok@214.Red-83-43-124.dynamicIP.rima-tde.net) has joined #ceph
[22:04] <jmlowe> nhm: I had to go down to mirrored pairs, now seeing about 500-600MB/s; HP says I may have gotten a batch of bad disks
[22:04] <NaioN> jmlowe: no trouble with slowdowns?
[22:04] <nhm> jmlowe: how many nodes again?
[22:05] <NaioN> after a time with high load?
[22:05] <jmlowe> I've got two storage nodes with 8 vm hosts, currently I've got 4 vm's banging away with iozone on while /bin/true loops
[22:05] <gregaf1> sagewk: looks good
[22:06] <jmlowe> I don't think I've had slowdowns; I have to run some tests and check it Monday
[22:06] <nhm> jmlowe: ok, nice.
[22:06] <jmlowe> right now I'm trying to shake out the remaining bad disks
[22:07] <nhm> jmlowe: 2x replication?
[22:07] <jmlowe> replaced 5/24 earlier this week, I think my total replacement count is 11/24 disks
[22:07] <jmlowe> default 2x, 6 osd's per host
[22:08] <gregaf1> jmlowe: what disks are you using to get that kind of replacement count? :(
[22:08] <Qten> does RADOS currently have any protection for URE/BitRot etc?
[22:09] <NaioN> yeah haven't had that replacement count even with my cheapass sata disks :)
[22:09] <jmlowe> there is scrubbing, what it actually does is a question I'd like answered for myself
[22:09] <nhm> gregaf1: we had one storage solution that used seagate ES.2 drives that never had a bad disk in 4 years. We had another solution using the exact same drives that had 2-3 drives failing a month.
[22:09] <gregaf1> Qten: sadly not
[22:10] <gregaf1> there is a scrubbing mechanism, which right now is pretty primitive — it compares the expected metadata against the actual FS metadata for each object, and then compares the metadata on each replica
[22:10] <Qten> so you would need to run zfs/brtfs then i guess
[22:10] <sagewk> sjust1: wip-2103? :)
[22:10] <gregaf1> at some point it should also calculate checksums and compare those, but it's not something we currently consider high-enough priority
[22:11] <jmlowe> Model: HP DB1000BABFF,Model: HP MB1000BAWJP,Model: HP MB1000FBZPL,
[22:11] <gregaf1> also at some point we should have checksums that persist along with data, but again — not high enough priority at this time
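A toy illustration of the replica-comparison idea gregaf1 describes; this is not the real scrub code (which compares expected metadata against filesystem metadata, not content checksums), just a sketch of flagging objects that differ between two replicas:

```shell
# Two pretend replica directories with one matching and one
# deliberately mismatched object.
r1=$(mktemp -d); r2=$(mktemp -d)
echo data > "$r1/obj1"; echo data > "$r2/obj1"
echo aaaa > "$r1/obj2"; echo bbbb > "$r2/obj2"   # deliberate mismatch

# Checksum each object on both replicas and report disagreements.
for obj in obj1 obj2; do
    s1=$(md5sum < "$r1/$obj"); s2=$(md5sum < "$r2/$obj")
    if [ "$s1" != "$s2" ]; then
        echo "scrub: $obj inconsistent between replicas"
    fi
done
```

The persistent per-object checksums gregaf1 mentions would let scrub detect bit rot without trusting either replica's current contents, which is what Qten was asking after.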
[22:12] <jmlowe> All 1Tb 7.2k sas dual port drives
[22:12] <NaioN> Qten: yeah you need a fs for the OSDs
[22:12] <NaioN> and XFS and BTRFS have the advantage of dynamic xattr space
[22:13] <NaioN> although you could use ext3 or ext4
[22:13] <gregaf1> wow, those are some bad numbers out of SAS drives, I thought maybe you had the sleep-happy Caviar Greens that everybody's had trouble with
[22:13] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:13] <NaioN> gregaf1: well those are sata drives
[22:13] <sjust1> sagewk: looking
[22:14] <NaioN> with a sas connector
[22:14] <gregaf1> brb
[22:14] * gregaf1 (~Adium@aon.hq.newdream.net) Quit (Quit: Leaving.)
[22:15] * gregaf (~Adium@aon.hq.newdream.net) has joined #ceph
[22:17] <jmlowe> I don't think it's a coincidence that all the replacement drives have a different model number and the latest firmware rev is 7 less than the older ones; I suspect they replaced their OEM
[22:19] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Quit: Leaving)
[22:22] <nhm> jmlowe: I'm pretty sure vibration is what killed a lot of our drives in the batch that failed.
[22:23] <sjust1> sagewk: wip-2103 looks ok
[22:24] <sjust1> there is a watch_notify_stress task for testing watch notify
[22:25] <sagewk> sjust1: oh perfect, i'll run that with lockdep.
[22:30] * stxShadow (~jens@ip-88-153-224-220.unitymediagroup.de) Quit (Quit: bye bye !! )
[22:33] * The_Bishop (~bishop@178-17-163-220.static-host.net) has joined #ceph
[22:39] <sagewk> sjust1: we should add that to the qa suite
[23:05] <sagewk> tv__, sjust1: btw, the SUBDIRS = . leveldb thing doesn't work bc leveldb needs to build first. 'make check-am' doesn't recurse, tho, so i'm happy :)
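For reference, a sketch of the automake detail behind this exchange; only the `SUBDIRS = . ocf leveldb` line is quoted from the thread, and the comments are a hedged summary of how it played out:

```makefile
# src/Makefile.am -- sketch of the two approaches discussed.

# Attempt 1: put "." first so automake builds (and "make check" tests)
# this directory before the bundled subdirectories. As sagewk notes,
# this fails here because leveldb has to be built before src links
# against it.
SUBDIRS = . ocf leveldb

# Attempt 2 (what stuck): leave SUBDIRS in dependency order and use
# the non-recursive check target, which runs only this directory's
# check_PROGRAMS and check-local rules:
#
#     cd src && make check-am
```

`check-local` alone is not enough, as sagewk found: it runs only the hand-written rules (like the encode/decode scripts), not the `check_PROGRAMS` unit tests, whereas `check-am` covers both without recursing into leveldb.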
[23:27] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) Quit (Quit: Leaving.)
[23:27] * LarsFronius (~LarsFroni@f054097134.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[23:43] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[23:57] * mrjack (mrjack@office.smart-weblications.net) has joined #ceph
[23:57] <mrjack> hi
[23:57] <mrjack> is there some sort of fsck for ceph?
[23:59] <gregaf> mrjack: nope, not yet

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.