[0:40] <sagewk> gregaf1: can you take a quick peek at d563f5d47c570ab44876d93ce239b39337d774c5 and tell me if the naming convention makes you hurl?
[0:40] <sagewk> yehudasa_, gregaf1: don't forget about wip-2139!
[0:41] <yehudasa_> yeah, it's waiting review
[0:41] <gregaf1> didn't I do notes on that one Tuesday?
[0:42] <gregaf1> I'll check
[0:43] <gregaf1> sagewk: those names look reasonable
[0:43] <sagewk> thanks
[0:44] <gregaf1> I haven't looked at any of the rest of the branch, though, did you want me to watch it?
[0:46] <gregaf1> sagewk: yehudasa_: oh, nope, I don't think anybody asked me to look at wip-2139 (I'm getting confused by the number of rgw branches for review though :P)
[0:46] <gregaf1> ..wait, no, isn't that mostly the atomic stuff? augh
[0:47] <yehudasa_> gregaf1: yeah, it shouldn't be more than one or two commits
[0:47] <gregaf1> ah, there we go
[0:48] <gregaf1> yeah, I looked at that when I was checking out the atomic stuff, it all looked fine :)
[0:48] <yehudasa_> thanks
[0:49] <gregaf1> although looking at it now I see it's using TMAP and that got swapped out for OMAP, so you'll need to convert I think?
[0:50] <yehudasa_> heh.. yeah
[0:51] <sagewk> sjust1: do you have a rados.py patch to feed in omap op weights?
[1:11] <sjust1> sagewk: yeah, forgot to push it, one moment
[1:13] <sjust1> pushed
[10:38] <Qten> hi all, any guesses on how long until ceph is ready for a production deployment?
[10:49] <stxShadow> developers are here late in the evening ..... maybe they could answer that
[10:53] <Qten> fair enough
[12:40] <wido> RBD, RADOS or the Posix Filesystem (CephFS)
[12:40] <wido> In order of stability at the moment: RADOS, RBD, CephFS
[12:43] * softcrack (de808f3b@ircip3.mibbit.com) has joined #ceph
[13:14] * softcrack (de808f3b@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[13:51] <stxShadow> Hi all
[13:52] <jluis> hi
[13:52] * jluis is now known as joao
[13:58] * softcrack (de808f3b@ircip3.mibbit.com) has joined #ceph
[14:07] * mtk (~mtk@ool-44c35967.dyn.optonline.net) has joined #ceph
[14:40] <nhm> good morning all
[14:44] <stxShadow> heyho
[14:44] <joao> hey nhm
[14:46] * wonko_be (bernard@november.openminds.be) has joined #ceph
[16:07] <tnt_> Azrael: I doubt df takes replication into account ... how could it ... you could have different replication per directory so df just displays raw free size (just a guess, but makes sense)
[16:10] <Azrael> tnt_: ok cool
[16:11] <Azrael> tnt_: is there a ceph command i can use to get the actual total and available sizes, with replication taken into account?
[16:17] <Azrael> iiiinteresting
[16:18] <Azrael> if i put a 2GB file into ceph and repl is 2
[16:18] <Azrael> shows up as 4GB used
[16:18] <Azrael> so it works out
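The arithmetic Azrael describes can be sketched in a few lines. This is only an illustration, assuming a single uniform replication factor and ignoring journal and filesystem overhead; the function names are made up for the example and are not Ceph commands.

```python
def raw_usage_gb(logical_gb, replicas):
    """Raw space consumed when every byte is stored `replicas` times."""
    return logical_gb * replicas


def usable_gb(raw_free_gb, replicas):
    """Rough user-visible capacity left, given raw free space as df reports it."""
    return raw_free_gb / replicas


# A 2 GB file at replication 2 occupies 4 GB of raw space.
print(raw_usage_gb(2, 2))
# 100 GB of raw free space holds roughly 50 GB of new user data at replication 2.
print(usable_gb(100, 2))
```

Because pools can carry different replication factors, a single division like this is only meaningful per pool, which is why df can only show the raw figure.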
[17:21] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[17:29] * jmlowe (~Adium@129-79-195-139.dhcp-bl.indiana.edu) has joined #ceph
[17:33] <sagewk> azrael: 'ceph pg dump' will tell you everything, including used disk (from osd perspective) and per-pool data (bytes stored from user perspective)
[17:47] <sagewk> darkfader: yep, several of us will be at WHD, although i'll miss part/most of tuesday.
[17:47] <Azrael> sagewk: cool, thanks
[17:48] <darkfader> sagewk: did you just see my mail or did you just see the old question?
[17:48] <darkfader> i feel paranoid now ;)
[17:48] <sagewk> darkfader: just saw your mail. i'll be the only dev
[17:48] <sagewk> :)
[17:48] <darkfader> hehe
[17:50] <darkfader> i'll see that i'll be able to make it
[17:50] <darkfader> then
[17:50] <darkfader> :)
[17:51] <sagewk> excellent. i'm looking forward to some good beer ;)
[17:51] <joao> WHD is that conference in Germany?
[17:51] <sagewk> it should be fun. i'm pretty sure wido will be around too
[17:51] <sagewk> yeah
[17:54] <joao> sagewk, if you happen to have a connection flight somewhere in Europe, let me know :p
[17:55] <sagewk> i'm going direct to frankfurt. the other guys will be coming from uk, though, not sure what their flights look like
[17:57] <darkfader> i just checked, 5 hours of train ride... seems i get a good nap on the way :>
[17:58] <darkfader> the website says it's a fantastic place for unforgettable parties
[17:59] <darkfader> not too bad ;)
[17:59] <joao> lol
[17:59] <darkfader> going by the hotel prices it might be smarter to party through the night, too :))
[18:00] <darkfader> sagewk: i just checked $calendar, i can only make it on friday
[18:00] <darkfader> i'm at a customer the other part of the week
[18:00] <darkfader> will you still be there on friday?
[18:01] <sagewk> only in the morning.. leaving around noon
[18:01] <darkfader> *scratch* planning fail
[18:39] <sagewk> tv__: poked
[18:39] * Tv|work (~Tv_@aon.hq.newdream.net) has joined #ceph
[18:40] <Tv__> heh, and i logged in ;)
[18:42] <Tv__> hrrm except of course i got a different ip address.. if you have zeroconf setup, can you "ping dreamer" or "ping dreamer.local" and tell me what ip address it is?
[18:46] <gregaf1> Tv__: I think mkampe would be crowing about NAS if he were paying attention to this ;)
[18:47] <Tv__> gregaf1: i don't want working files on a NAS anyway
[18:47] <Tv__> if anything, i'm crowing about not having a work laptop -- but then i like having my personal laptop at work..
[18:47] <Tv__> if anything, let's crow about NAT
[18:48] <gregaf1> unfortunately I don't have zeroconf (whatever that is; I can't resolve dreamer or dreamer.local anyway), anything I can do from your lock screen to find an ip?
[18:48] <Tv__> gregaf1: you have os x, you most definitely have zeroconf -- they call it bonjour in appleland
[18:48] <gregaf1> otherwise I am going to start Not Caring, since I'm one of the few people who actually doesn't need new sepia :D
[18:48] <Tv__> "host" might not do it right, but ping definitely should
[18:49] <gregaf1> haha, I was a little surprised since I recognize the .local
[18:49] <gregaf1> sadly...
[18:49] <gregaf1> gregory-farnums-mac-mini:teuthology gfarnum$ ping dreamer
[18:49] <gregaf1> ping: cannot resolve dreamer: Unknown host
[18:49] <gregaf1> gregory-farnums-mac-mini:teuthology gfarnum$ ping dreamer.local
[18:49] <gregaf1> ping: cannot resolve dreamer.local: Unknown host
[18:49] <Tv__> hrmph
[18:49] <Tv__> cue my rant about zeroconf being unreliable
[18:49] <Tv__> but i'm not even going to dream about a reliable reverse dns service at the office ;)
[18:51] <jmlowe> diff nmap output with network cable plugged in and unplugged?
[18:51] <Tv__> haha
[18:51] <sagewk> tv__ this is usually when i fall back to nmap
[18:51] <gregaf1> oh, hrm, my server can find my mini just fine
[18:51] <gregaf1> I think your desktop is Doing It Wrong :/
[18:53] <Tv__> and the office gets natted on the way to new sepia, so i can't even use the logs there
[18:53] <Tv__> oh well, i might come to the office to get the files, and sneeze on everything
[18:54] <joao> I'm so glad right now I'm working from the other side of the atlantic
[18:54] <gregaf1> I'd laugh at you for not using a disposable password if I were only doing it myself
[18:55] <sagewk> tv__: for f in `seq 2 254` ; do ssh tv@a.b.c.$f ...
[18:55] <Tv__> heh
[18:55] <Tv__> yeah
[18:58] <Tv__> cssh $(seq monster) worked!
[18:58] <Tv__> one of the postage stamp sized windows does not contain an error message ;)
[19:04] <gregaf1> hurray ridiculous number of cpu cycles
[19:05] <gregaf1> I love using computers today
[19:12] <sagewk> tv__: what's your skype?
[19:12] <Tv__> tommi.virtanen
[19:13] <sagewk> tnx
[19:14] * chutzpah (~chutz@ has joined #ceph
[19:31] <sagewk> sjust1: do you know if there's an easy way to make 'make check' do our tests before leveldb's? makes it very hard to iterate on those tests
[19:32] <sagewk> or maybe there is a different command that will skip the nested tests...
[19:32] <Tv__> sagewk: order of subdirs in automake
[19:32] <Tv__> sagewk: i think there's a "make checklocal" or something like that
[19:32] <Tv__> checklocal, check-local, local-check.. ;)
[19:32] <joao> sagewk, the ObjectStore::Transaction's setattr are for xattrs, right?
[19:32] <joao> *is
[19:32] <sjust1> joao: yes
[19:33] <joao> thought so
[19:33] <Tv__> sagewk: try cd src && make check-local
[19:33] <sagewk> make check-local does the stuff you define (like our encode/decode tests), but not the stuff in the current makefile
[19:33] <sagewk> (e.g. unittest_*)
[19:33] <sjust1> sagewk: I haven't tried, I usually just run the specific test I'm fixing
[19:33] <Tv__> sagewk: how's encode/decode special?
[19:34] <sagewk> check-local:
[19:34] <sagewk> $(srcdir)/test/encoding/check-generated.sh
[19:34] <sagewk> $(srcdir)/test/encoding/readable.sh ../ceph-object-corpus
[19:34] <Tv__> ohhh
[19:34] <sagewk> i.e. not check_PROGRAMS or whatever
[19:34] <Tv__> i wonder if we're supposed to override it like that
[19:35] <sagewk> sjust1: cool
[19:38] <Tv__> sagewk: automake defaults to depth-first, put an explicit "." into src/Makefile.am SUBDIRS line.. SUBDIRS = . ocf leveldb
[19:38] <sagewk> tv__ sweet thanks
[19:42] <sagewk> sjust1: it's the iterator that holds the lock, not the IndexedPath, right?
[19:43] <sjust1> IndexedPath holds a different lock
[19:44] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[19:57] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[20:03] <Tv__> how does one add a file to a tarball with a different filename than what it is in the source?
[20:04] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:04] <Tv__> hmm i can --transform the names..
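Tv__ settles on GNU tar's --transform for this. For comparison, the same renaming can be sketched with Python's standard tarfile module, whose add() takes an arcname argument for the name stored in the archive. The file names below are hypothetical and chosen only for the example.

```python
import os
import tarfile
import tempfile


def add_renamed(tar_path, source_path, name_in_archive):
    """Create a tarball holding source_path under a different member name."""
    with tarfile.open(tar_path, "w") as tar:
        # arcname overrides the name recorded inside the archive.
        tar.add(source_path, arcname=name_in_archive)


# Build a throwaway source file (hypothetical name) and archive it renamed.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "client.conf.generated")
with open(src, "w") as f:
    f.write("example\n")

archive = os.path.join(workdir, "client.tar")
add_renamed(archive, src, "etc/client.conf")

with tarfile.open(archive) as tar:
    names = tar.getnames()
print(names)
```

The staging-directory approach dmick mentions (copy files into a prototype tmpdir and tar that) gives the same result without any renaming logic.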
[20:06] <Tv__> note to self: lift fingers off keyboard before having a sneezing fit
[20:06] <Tv__> *undo undo undo*
[20:15] <sagewk> gregaf1: can you carve out some time today to look at wip-2116?
[20:15] <gregaf1> yeah, will do
[20:16] <sagewk> gregaf1: hmm, the alternative approach is to make them lossy_servers, and turn it into a ping/reply instead of start/heartbeat*n/stop type of deal
[20:17] <sagewk> gregaf1: that may be simpler to understand, actually.
[20:19] <gregaf1> sagewk: I'll have to check out the OSD heartbeat management a bit, but not having to deal with old pings on reconnect is probably happier
[20:19] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:19] <gregaf1> and requiring explicit replies will make debugging and reporting a lot easier
[20:20] <gregaf1> but isn't that what we had before that you tossed out?
[20:20] <dmick> Tv__: yeah, looks like --transform/--xform
[20:21] <Tv__> dmick: currently making it create a .tar with everything one needs to get a client going
[20:22] <dmick> you could always just toss it in your own prototype tmpdir and tar that, too
[20:23] <Tv__> got it already
[20:25] <dmick> Tv__: network is such that VPN is required for access to plana, yes?
[20:25] <Tv__> dmick: yes
[20:25] <dmick> k
[20:25] <Tv__> alright now i need to write a little script to generate randomness on the client side, then try setting up a client
[20:29] <gregaf1> sagewk: hmm, we're unconditionally setting heartbeat_need_update in PG::update_heartbeat_peers() even if we didn't actually change our peers
[20:30] <gregaf1> and [maybe_]update_heartbeat_peers() takes a lock from every PG as soon as one PG decides it needs a peer update, so maybe we should be more careful about only triggering that when we need it
[20:36] <gregaf1> sagewk: anyway, all the rest of the changes are pretty rote and look like they're right to me
[21:10] <Qten> lo
[21:11] <Qten> wido: are you still around?
[21:15] <Qten> anyone: the Rados block device, i assume this creates small files eg 64mb or something and the Object storage replicates/distributes them across the cluster?
[21:17] <joshd> Qten: yes, but by default they're 4mb - it's configurable at creation time by the --order parameter
[21:17] <jmlowe> that's correct, I think if you do rbd info it will tell you the exact size and number of objects
[21:18] <joshd> the number of objects is not the number actually used though - that's not tracked - it's just the size / object size
[21:19] <Qten> interesting, so does that mean a 1mb file inside the rbd would use 4mb or is it only creating new chunks as the last one fills up?
[21:19] <Qten> ah
[21:19] <joshd> Qten: the object store can read/write to different portions of an object
[21:19] <Qten> pretty much answered my question :)
[21:20] <joshd> so there's no extra space used due to large object sizes
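The relationship joshd describes between --order, object size and the nominal object count can be sketched as below. The order is the base-2 logarithm of the object size, so order 22 gives the 4 MB default. The helper names are invented for this example, and rounding the count up for a partial last object is an assumption of the sketch.

```python
def object_size_bytes(order):
    """Object size for a given order: 2**order bytes (order 22 is 4 MiB)."""
    return 1 << order


def nominal_object_count(image_bytes, order):
    """Image size divided by object size, rounded up.

    This is the nominal count, not the number of objects actually written.
    """
    size = object_size_bytes(order)
    return (image_bytes + size - 1) // size


def locate(offset, order):
    """Map a byte offset in the image to (object index, offset inside it)."""
    size = object_size_bytes(order)
    return offset // size, offset % size


MIB = 1024 * 1024
print(object_size_bytes(22) // MIB)             # object size in MiB
print(nominal_object_count(10 * 1024 * MIB, 22))  # objects in a 10 GiB image
print(locate(5 * MIB, 22))                      # where byte offset 5 MiB lands
```

Since objects are only written where data lands and can be partially filled, a 1 MB file inside the image does not cost a full 4 MB object.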
[21:20] <sagewk> gregaf1: ah, ill fix that bit and merge the update_heartbeat_peers patch
[21:21] <Qten> are you guys familiar with moosefs?
[21:22] <Qten> just wondering from a developers point of view how they differ from ceph
[21:22] <Tv__> crypto hashes are so finicky about the exact bytes they are fed....
[21:22] <Qten> it seems to use 64mb chunks (files) like a distributed fs
[21:22] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Operation timed out)
[21:23] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:23] <Qten> but seems slower than what i would have expected or is that just an issue with this concept?
[21:24] <Qten> of distributed fs/object storage / striping?
[21:25] <sagewk> gregaf1: http://fpaste.org/ACN9/
[21:27] <sagewk> tv__: can you get josh set up first so he can move the teuthology vm over? will prob take a while to copy etc.
[21:27] <Qten> so i guess further to that question from a hardware point of view if you had an array with say 36 sata disks 2700iops/3600mb/s would you ever see close to the maximum array's output with say 1 or 2 clients? or would you need to have 50 or so?
[21:28] <Qten> assuming 40gb IB or something
[21:29] <Tv__> sagewk: yeah sorry fighting authentication now
[21:29] <joshd> Qten: depends on what the client's doing, but I'd guess you'd need more
[21:29] <nhm> Qten: I've read a bit about moose but never tested it. Not sure what kind of performance they see.
[21:30] <Tv__> i have stupid typo somewhere but it's crypto hashing so it's hard to find :(
[21:30] <jmlowe> I had 24 1tb sas 7.2k drives across 2 machines with 10GigE and I got up to 900MBs with vm's running iozone
[21:30] <jmlowe> vm's spread out over 4 hosts
[21:30] <nhm> Qten: one of the things you run into with that many disks in a single box is PCIE/QPI limitations, especially if you have a NUMA setup.
[21:30] <Tv__> ahh \n
[21:30] <Qten> i was thinking 3 boxes 12 disks each, sorry :)
[21:32] <jmlowe> there is a benchmark command for osd's, that should give you some idea of what you can get in aggregate
[21:32] <Qten> jmlowe: so that was 4 hosts at 900mb/s or 900mb/s div by 4 hosts so say 220~ each?
[21:33] <Tv__> so the .sepia.ceph.com dns only resolves from the office
[21:33] <Tv__> wth
[21:33] <nhm> Qten: A single client is going to be limited to about 3GB/s over QDR IB. So if we are talking entirely from a theoretical standpoint, you would end up network limited with 36 modern drives.
[21:33] <Tv__> but at least i can access plana via the vpn now
[21:33] <nhm> Qten: ignoring other factors...
[21:34] <Tv__> oh huh the dns is just *my* dns being broken? how can that be
[21:34] <Qten> nhm: sure
[21:34] <nhm> Qten: in the real world, you'd probably not hit the IB limitation first.
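nhm's theoretical comparison can be written out as a small calculation. The numbers come from the discussion above (36 disks totalling 3600 MB/s, so 100 MB/s each, and about 3 GB/s for one client over QDR IB); the function is a hypothetical helper, and it ignores replication, PCIe/QPI limits and every other real-world factor nhm mentions.

```python
def bottleneck(disk_count, per_disk_mb_s, client_link_mb_s):
    """Return which side caps a single client's throughput, and at what rate."""
    disk_total = disk_count * per_disk_mb_s
    if disk_total > client_link_mb_s:
        return "network", client_link_mb_s
    return "disks", disk_total


# 36 disks at 100 MB/s each against a roughly 3000 MB/s client link.
print(bottleneck(36, 100, 3000))
# A 12-disk box on the same link is disk-limited instead.
print(bottleneck(12, 100, 3000))
```

In practice other limits tend to bite long before either figure is reached; jmlowe's measured aggregate is about 900 MB/s across 24 drives.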
[21:34] <Tv__> ohh goddamn evil DH vpn steals routing
[21:34] <Tv__> gah
[21:35] <dmick> that...seems like a lot
[21:35] <jmlowe> Qten: 900mbs aggregate, about 40mbs per vm while each running iozone concurrently
[21:35] <nhm> Tv__: ACL it? ;)
[21:36] <Qten> jm: is that using the RBD driver?
[21:36] <jmlowe> nhm: agreed, IB is the least of your concerns if you are trying to do this
[21:36] * adjohn (~adjohn@50-0-92-115.dsl.dynamic.sonic.net) Quit (Ping timeout: 480 seconds)
[21:37] <nhm> jmlowe: Yeah, I'll be happy when IB is a concern. :)
[21:37] <jmlowe> qten: ubuntu 11.10 with qemu 0.15.1 patched for async rbd and configured --with-rbd
[21:38] <darkfader> nhm: as long as you start pushing rdma support when you're that happy :)
[21:39] <nhm> darkfader: would you use it if it was available?
[21:39] <darkfader> yes
[21:39] <darkfader> i have infiniband in all servers
[21:39] <darkfader> and the "ip over infiniband" performance is very low when comparing
[21:40] <darkfader> plus it's lame
[21:40] <nhm> darkfader: Out of curiosity are you industry or academia?
[21:40] <darkfader> "wanna-be webhost"
[21:41] <nhm> interesting! What kind of IB are you running?
[21:41] <darkfader> the old ones have DDR, and the new supermicro has QDR
[21:41] <nhm> mellanox?
[21:41] <darkfader> my switch is DDR only
[21:41] <darkfader> yes
[21:41] <nhm> QDR switches are still pretty pricey.
[21:42] <nhm> DDR still gives you good throughput and low latency though.
[21:42] <darkfader> yes, i'll just do crossconnects with the QDR (methinks)
[21:42] <nhm> We never were brave enough to try a setup like that.
[21:43] <darkfader> the problem is that due to starting up i'll not be able to rack all of this from start
[21:43] <Tv__> dmick, joshd: check your email
[21:43] <darkfader> so at first it'll just be 1:1 connection for two servers and then well... scale up :)
[21:43] <nhm> darkfader: so are you running ceph on ipoib now?
[21:44] <joshd> Tv__: thanks
[21:44] <darkfader> no, not atm. my vps boxes are still on oracle VM 2.2.3, too old in kernel terms to tackle with ceph on there
[21:45] <Qten> this is probably a silly question but would it be possible, has it been done, or even useful to have a fuse-iscsi driver which uses a folder with files in it instead of using like RBD block files? so you're basically removing the extra file inside a file layer
[21:45] <darkfader> nhm: i can understand you didn't yet test something like it because you have a huge amount of dependency hell along with it
[21:46] <nhm> darkfader: could be. I've not even begun to explore it.
[21:46] <darkfader> Qten: fuse would probably eat all your performance anyway :)
[21:46] <Tv__> Qten: how does iscsi enter the picture?
[21:46] <nhm> darkfader: looks like gregaf1 maybe knew of some people testing it based on some old irc logs.
[21:46] <Qten> tv__:as an export to hypervisors
[21:47] <darkfader> nhm: yes, there was one from academia for example
[21:47] <Tv__> Qten: then it's not a folder of files.. anyway, think of rbd as iscsi-on-steroid
[21:48] <Tv__> rbd : iscsi :: ceph fs : nfs
[21:48] <nhm> Tv__: btw, check IM when you have a moment
[21:48] <Tv__> nhm: oh crud i'm not logged into that account from here
[21:48] <Tv__> nhm: can you resend using google apps im or irc?
[21:48] <nhm> Tv__: no worries
[21:48] <Qten> tv__: I suppose the one of many issues would be file locking
[21:49] <Qten> tv__: by using a technology i was just talking about anyway
[21:49] <Qten> so currently the RBD driver isn't fuse based?
[21:50] <NaioN> no
[21:50] <Tv__> Qten: no, fuse is about filesystems, rbd is a block device
[21:50] <Qten> understood however how does rbd talk to the filesystem via the fuse mount?
[21:50] <Tv__> dmick: test your vpn please
[21:50] <Tv__> Qten: it doesn't
[21:52] <NaioN> Qten: rbd = RADOS block device, so you have to put a filesystem on it or have to re-export it...
[21:53] <Qten> Naion: understand that part however i'm trying to figure out how rbd talks to the DFS
[21:53] <NaioN> or use kvm/qemu
[21:53] <NaioN> there is no DFS
[21:53] <Tv__> Qten: the rbd client talks the rados protocol directly
[21:53] <NaioN> there is a distributed object store
[21:54] <Tv__> joshd: any luck with the vpn?
[21:54] <NaioN> so the rbd gets chopped into parts/objects and distributed over the OSDs
[21:54] <joshd> Tv__: no, sent mail with log
[21:54] <Qten> sorry i meant dfs as in rados
[21:56] <Qten> so the big question any ideas on the expected stable/production ready release?
[21:56] <NaioN> well as far as I have experimented with rbd it's pretty stable
[21:57] <NaioN> I have a setup with OSD per disk, formatted with XFS
[21:57] <NaioN> and about 80 rbds in use
[21:58] <dmick> Tv__: "~/cephco..."?
[21:58] <Qten> nice
[21:58] <Tv__> dmick: replace with where the tar is
[21:58] <Tv__> joshd: i don't see your attempt on the server side :(
[21:58] <dmick> missed the attachment, sorry
[21:58] <nhm> Qten: it gets closer every day. ;)
[21:58] <jmlowe> Stable? the roadmap says in 23 days 0.45 will be out and the experimental warnings will go away
[21:59] <stxShadow> rbd is very stable .... we have a cluster with about 250 rbds
[21:59] <Tv__> joshd: last entry is 12:32:30
[21:59] <Tv__> joshd: i do have openvpn 2.2 here
[22:00] <dmick> Tv__: does MYHOST need to be somehow FQDN, or is simple hostname enough?
[22:00] <Qten> <wido> In order of stability at the moment: RADOS, RBD, CephFS
[22:00] <Tv__> dmick: any string
[22:00] <dmick> ok
[22:00] <NaioN> Qten: yeah that's true
[22:01] <NaioN> well I had a lot of troubles with btrfs as underlying filesystem for the OSDs
[22:01] <Qten> interesting
[22:01] <stxShadow> NaioN .... same here .... therefore changed to xfs
[22:02] <NaioN> even with the latest kernels (3.2.x)
[22:02] <jmlowe> I've had much better luck with btrfs using 3.2.5+, currently on 3.2.9
[22:02] <NaioN> jmlowe: hmmm haven't tried the later 3.2.x kernels
[22:03] <NaioN> I'm waiting for new hardware, want to make a production cluster and experimental cluster
[22:03] <NaioN> I want to experiment again with btrfs
[22:03] <stxShadow> i've tested 3.2.4 ..... broke btrfs in 12 hours
[22:03] <Qten> so what size chunks if you will does RADOS use?
[22:03] <nhm> jmlowe: Are you seeing good throughput with that setup?
[22:03] <Tv__> joshd: works for dmick using openvpn 2.2.0-2ubuntu1, can you upgrade?
[22:03] <jmlowe> Not that I haven't cursed Chris Mason and his year late btrfsck
[22:03] <joshd> Tv__: trying 2.3 (my desktop is arch)
[22:03] <NaioN> the last time i used btrfs is with 3.2.5 (if i'm correct) and it didn't crash but it slowed down
[22:04] <NaioN> jmlowe: :)
[22:04] <jmlowe> nhm: I had to go down to mirrored pairs, now seeing about 500-600MBs, hp says I may have gotten a batch of bad disks
[22:04] <NaioN> jmlowe: no trouble with slowdowns?
[22:04] <nhm> jmlowe: how many nodes again?
[22:05] <NaioN> after a time with high load?
[22:05] <jmlowe> I've got two storage nodes with 8 vm hosts, currently I've got 4 vm's banging away with iozone on while /bin/true loops
[22:05] <gregaf1> sagewk: looks good
[22:06] <jmlowe> I don't think I've had slowdowns, I have to run some tests Monday and check it monday
[22:06] <nhm> jmlowe: ok, nice.
[22:06] <jmlowe> right now I'm trying to shake out the remaining bad disks
[22:07] <nhm> jmlowe: 2x replication?
[22:07] <jmlowe> replaced 5/24 earlier this week, I think my total replacement count is 11/24 disks
[22:07] <jmlowe> default 2x, 6 osd's per host
[22:08] <gregaf1> jmlowe: what disks are you using to get that kind of replacement count? :(
[22:08] <Qten> does RADOS currently have any protection for URE/BitRot etc?
[22:09] <NaioN> yeah haven't had that replacement count even with my cheapass sata disks :)
[22:09] <jmlowe> there is scrubbing, what it actually does is a question I'd like answered for myself
[22:09] <nhm> gregaf1: we had one storage solution that used seagate ES.2 drives that never had a bad disk in 4 years. We had another solution using the exact same drives that had 2-3 drives failing a month.
[22:09] <gregaf1> Qten: sadly not
[22:10] <gregaf1> there is a scrubbing mechanism, which right now is pretty primitive: it compares the expected metadata against the actual FS metadata for each object, and then compares the metadata on each replica
[22:10] <Qten> so you would need to run zfs/btrfs then i guess
[22:10] <sagewk> sjust1: wip-2103? :)
[22:10] <gregaf1> at some point it should also calculate checksums and compare those, but it's not something we currently consider high-enough priority
[22:11] <jmlowe> Model: HP DB1000BABFF,Model: HP MB1000BAWJP,Model: HP MB1000FBZPL,
[22:11] <gregaf1> also at some point we should have checksums that persist along with data, but again, not high enough priority at this time
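The metadata-only scrub gregaf1 outlines can be illustrated with a toy model. This is not Ceph's implementation: the data layout (dicts of per-object metadata keyed by OSD name) and the function name are assumptions made purely for the sketch, and it folds the two comparison steps into one pass.

```python
def scrub(expected, replicas):
    """Compare each replica's object metadata against the expected metadata.

    expected: {object_name: metadata dict}
    replicas: {osd_name: {object_name: metadata dict}}
    Returns a list of (osd_name, object_name, problem) tuples.
    """
    problems = []
    for osd, objects in sorted(replicas.items()):
        for name, meta in sorted(expected.items()):
            actual = objects.get(name)
            if actual is None:
                problems.append((osd, name, "missing"))
            elif actual != meta:
                problems.append((osd, name, "metadata mismatch"))
    return problems


expected = {"obj1": {"size": 4194304}, "obj2": {"size": 1024}}
replicas = {
    "osd.0": {"obj1": {"size": 4194304}, "obj2": {"size": 1024}},
    "osd.1": {"obj1": {"size": 4194304}, "obj2": {"size": 512}},
}
print(scrub(expected, replicas))
```

A check like this catches a missing or truncated object, but a flipped bit that leaves the size and attributes unchanged passes unnoticed, which is the gap the checksum work mentioned above would close.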
[22:12] <jmlowe> All 1Tb 7.2k sas dual port drives
[22:12] <NaioN> Qten: yeah you need a fs for the OSDs
[22:12] <NaioN> and XFS and BTRFS have the advantage of dynamic xattr space
[22:13] <NaioN> although you could use ext3 or ext4
[22:13] <gregaf1> wow, those are some bad numbers out of SAS drives, I thought maybe you had the sleep-happy Caviar Greens that everybody's had trouble with
[22:13] <NaioN> gregaf1: well those are sata drives
[22:13] <sjust1> sagewk: looking
[22:14] <NaioN> with a sas connector
[22:14] <gregaf1> brb
[22:17] <jmlowe> I don't think it's a coincidence that all the replacement drives have a different model number and the latest firmware rev is 7 less than the older ones, I suspect they replaced their oem
[22:19] * mtk (~mtk@ool-44c35967.dyn.optonline.net) Quit (Quit: Leaving)
[22:22] <nhm> jmlowe: I'm pretty sure vibration is what killed a lot of our drives in the batch that failed.
[22:23] <sjust1> sagewk: wip-2103 looks ok
[22:24] <sjust1> there is a watch_notify_stress task for testing watch notify
[22:25] <sagewk> sjust1: oh perfect, i'll run that with lockdep.
[22:39] <sagewk> sjust1: we should add that to the qa suite
[23:05] <sagewk> tv__, sjust1: btw, the SUBDIRS = . leveldb thing doesn't work bc leveldb needs to build first. 'make check-am' doesn't recurse, tho, so i'm happy :)
[23:57] <mrjack> hi
[23:57] <mrjack> is there some sort of fsck for ceph?
[23:59] <gregaf> mrjack: nope, not yet

