#ceph IRC Log


IRC Log for 2012-03-27

Timestamps are in GMT/BST.

[0:20] * eroanok (~Adium@pool-72-74-156-95.bstnma.fios.verizon.net) Quit (Quit: Leaving.)
[0:25] * LarsFronius (~LarsFroni@f054111048.adsl.alicedsl.de) Quit (Quit: LarsFronius)
[0:37] <dmick> nhm: I'd sort of assumed we'd just copy all the firmware to never have to worry about this again. Tv|work, do you have any opinions?
[0:38] <Tv|work> dmick: s/copy/git clone/; yes;
[0:38] <dmick> oh just clone it right into /lib/firmware/updates, and let the .git fall where it may, even?
[0:39] <Tv|work> yeah
[0:39] <Tv|work> makes incremental runs cheap
[0:39] <dmick> good point
[0:40] * `gregorg` (~Greg@ has joined #ceph
[0:41] * gregorg_taf (~Greg@ Quit (Read error: Connection reset by peer)
[0:50] <Tv|work> why does this supposedly automated installation insist on prompting me about partitioning :(
[0:56] <dmick> in my case, because there were emulated drives, but you knew that :)
[0:59] * lxo (~aoliva@lxo.user.oftc.net) Quit (Read error: Operation timed out)
[1:09] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:10] * joao (~JL@ Quit (Quit: Leaving)
[1:22] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[1:26] * lofejndif (~lsqavnbok@09GAAECCO.tor-irc.dnsbl.oftc.net) Quit (Quit: Leaving)
[1:30] <Tv|work> note to self: do not use vmbuilder, it's still based on running debootstrap in a chroot on the host and hence inherits that full decade of suck
[1:30] <Tv|work> it just messed up this server
[1:32] <iggy> "you're doing it wrong"
[1:32] <Tv|work> well i tried to use the thing with the most features; that was my problem
[1:32] <Tv|work> time to go back to the basics and construct the same thing myself, with something more reliable
[1:32] <Tv|work> (= don't run the vm's crap in the hypervisor)
[1:34] <iggy> targeting kvm?
[1:34] <Tv|work> yeah
[1:34] <Tv|work> next up, virt-install
[1:34] <iggy> there are some other tools under the libguestfs umbrella that may help
[1:37] * BManojlovic (~steki@ Quit (Ping timeout: 480 seconds)
[1:38] <perplexed> Are there any canned ceph test tools available to test basic file/object storage and retrieval performance of a cluster? Something adaptable to fire a bunch of pre-defined file objects into the cluster ideally?
[1:39] <perplexed> API access
[1:40] <perplexed> (librados I believe)
[1:40] <iggy> normal fs benchmarks generally do a good job of tripping up ceph
[1:40] <Tv|work> perplexed: for files, it's just a filesystem; for objects, radosbench is a low-level benchmark tool that you can use
[1:42] <Tv|work> here's a bunch of filesystem-based test programs we use a lot: https://github.com/ceph/ceph/tree/master/qa/workunits/suites
[1:42] <Tv|work> some of them are benchmarks, some not
[1:42] <perplexed> Thx. I'd like to use the API to store the files (image files in this case) as objects. Clients will be distributed, so I'm assuming mounting ceph as a filesystem isn't viable... but API access to the cluster would be the way to go (I believe).
[1:44] * Guest7776 (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[1:46] <joshd> perplexed: yeah, librados is probably the way to go. there's documentation for the C API at http://ceph.newdream.net/docs/master/api/librados/, and in src/include/rados/librados.h
[1:47] <perplexed> thx
[1:53] <iggy> does librbd use librados, or does it talk directly?
[1:53] <sjust> librados
[1:54] <joshd> users of librbd also use librados so librbd doesn't need to duplicate setup/configuration
[1:55] <iggy> that means it gets testing from librbd users too
[1:57] <joshd> yeah, but the portion of librados that rbd uses is small
[1:58] <iggy> :( i was trying to sell it
[1:58] <joshd> the radosgw does use most of librados, if it makes you feel better
[2:01] <iggy> which might be a good option for perplexed... there are likely tons of modules/bindings for whatever lang he's using
[2:02] <joshd> people have written librados bindings for java, ruby, python (those are in-tree now)
[2:03] <Tv|work> using e.g. the python rados bindings vs using boto to talk to rgw are roughly as simple/complex to use; rgw's biggest contribution is multi-tenancy and more common API
[2:08] <perplexed> Originally I was thinking of using the rados GW route w apache on the front-end, but by the sounds of it the highest performance route is to just hit the API direct...
[2:11] <perplexed> Under the gun from a time perspective, so I was hoping to find a test tool that could be modified to do what I need via the API. Interested in seeing how the API performs when storing image files.
[2:11] <perplexed> ls
[2:13] <perplexed> Is radosbench provided in the ceph source, or a separate download?
[2:13] <dmick> "rados bench"
[2:13] <dmick> (i.e. it's a subcommand of the rados command)
[2:13] <perplexed> ah... thx for the clarification
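As clarified above, radosbench is the `bench` subcommand of the `rados` tool. Conceptually, a write benchmark like this times writes of fixed-size objects and reports throughput. The sketch below is a toy in-memory version of that idea, not the real tool: it writes to a Python dict instead of a cluster, and the 4 MB object size is an assumption matching `rados bench`'s commonly cited default.

```python
# Toy write-benchmark loop (in the spirit of `rados bench ... write`, but
# against an in-memory dict rather than a real cluster). It shows what such
# a tool measures: total bytes written over elapsed wall-clock time.
import time

OBJECT_SIZE = 4 * 1024 * 1024   # assumed 4 MB objects, rados bench's usual default
NUM_OBJECTS = 64

store = {}
payload = b"\0" * OBJECT_SIZE

start = time.perf_counter()
for i in range(NUM_OBJECTS):
    # Stands in for one object write to the cluster.
    store[f"benchmark_object_{i}"] = payload
elapsed = time.perf_counter() - start

mb_written = NUM_OBJECTS * OBJECT_SIZE / (1024 * 1024)
print(f"wrote {mb_written:.0f} MB in {elapsed:.4f}s "
      f"({mb_written / elapsed:.1f} MB/s)")
```

Against a real cluster the interesting part is everything this sketch omits: network round-trips, journaling, and replication to secondary OSDs.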
[2:14] <iggy> if you read the radosgw page, it says librados is the slow part, so i don't know how much you save going directly
[2:31] <elder> sagewk, I am running xfstests on a pair of rbd devices now for the first time. The first 10 tests have all passed, though I've got some weird delays on mount once or twice. I'm going to let it run for a while and check back later for results.
[2:45] <nhm> iggy: fwiw, Sam is making some changes that should have some pretty nice performance benefits.
[2:46] <iggy> in radosgw? or librados?
[2:46] <nhm> iggy: on the OSDs. You'll see it with rados bench
[2:47] <iggy> deets?
[2:47] <iggy> read locality or something?
[2:48] <nhm> iggy: sync being used accidentally instead of dsync. Also migrating to writev
[2:49] <gregaf> there's something wrong in the journaling layer that he's tracking down and fixing
[2:49] <gregaf> also what nhm said
[2:50] <iggy> cool... i did notice from mailing list posts that performance was becoming a higher priority
[2:50] <iggy> which usually means stability is coming along nicely
[3:00] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[3:05] * imjustmatthew (~imjustmat@pool-96-228-59-130.rcmdva.fios.verizon.net) has joined #ceph
[3:08] * perplexed (~ncampbell@ Quit (Quit: perplexed)
[3:31] <imjustmatthew> gregaf: filed as http://tracker.newdream.net/issues/2218
[3:32] <imjustmatthew> gregaf: and about 5.25 GB/hour from the MDS at debug=20
[3:47] * danieagle (~Daniel@ Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[4:07] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:20] * chutzpah (~chutz@ Quit (Quit: Leaving)
[5:54] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[6:52] * pruby (~tim@leibniz.catalyst.net.nz) Quit (Remote host closed the connection)
[6:57] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[7:30] * ssedov (stas@ssh.deglitch.com) Quit (Read error: Connection reset by peer)
[7:31] * stass (stas@ssh.deglitch.com) has joined #ceph
[7:37] * perplexed (~ncampbell@ has joined #ceph
[7:45] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (Quit: eternaleye)
[7:45] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[7:54] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (Quit: eternaleye)
[7:54] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[7:54] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit ()
[7:55] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[7:59] <perplexed> Does a librados client distribute requests automatically across available OSD servers (round-robin/other logic)? I'm assuming there is some SW load-balancing logic (?).
[8:01] * perplexed_ (~ncampbell@c-76-21-85-168.hsd1.ca.comcast.net) has joined #ceph
[8:02] * perplexed (~ncampbell@ Quit (Read error: Connection reset by peer)
[8:02] * perplexed_ is now known as perplexed
[8:03] * perplexed (~ncampbell@c-76-21-85-168.hsd1.ca.comcast.net) Quit ()
[8:39] * foosinn (~foosinn@office.unitedcolo.de) has joined #ceph
[8:40] <foosinn> hi, is someone here who can help me with some questions about rados?
[8:51] * Theuni (~Theuni@ has joined #ceph
[8:58] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[9:01] * phil (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[9:02] * phil is now known as Guest7873
[9:07] * yoshi_ (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[9:07] * yoshi (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Read error: Connection reset by peer)
[10:02] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:02] * MarkN (~nathan@ Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * krisk (~kap@rndsec.net) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * ottod_ (~ANONYMOUS@9YYAAELTK.tor-irc.dnsbl.oftc.net) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * ivan` (~ivan`@li125-242.members.linode.com) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * imjustmatthew (~imjustmat@pool-96-228-59-130.rcmdva.fios.verizon.net) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * sboyette (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * rosco (~r.nap@ Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * gohko (~gohko@natter.interq.or.jp) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * iggy (~iggy@theiggy.com) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * eightyeight (~atoponce@pthree.org) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * darkfader (~floh@ Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * blufor (~blufor@mongo-rs2-1.candycloud.eu) Quit (reticulum.oftc.net resistance.oftc.net)
[10:02] * __jt__ (~james@jamestaylor.org) Quit (reticulum.oftc.net resistance.oftc.net)
[10:04] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[10:04] * imjustmatthew (~imjustmat@pool-96-228-59-130.rcmdva.fios.verizon.net) has joined #ceph
[10:04] * sboyette (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[10:04] * MarkN (~nathan@ has joined #ceph
[10:04] * krisk (~kap@rndsec.net) has joined #ceph
[10:04] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[10:04] * rosco (~r.nap@ has joined #ceph
[10:04] * gohko (~gohko@natter.interq.or.jp) has joined #ceph
[10:04] * cattelan_away (~cattelan@c-66-41-26-220.hsd1.mn.comcast.net) has joined #ceph
[10:04] * ottod_ (~ANONYMOUS@9YYAAELTK.tor-irc.dnsbl.oftc.net) has joined #ceph
[10:04] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[10:04] * ivan` (~ivan`@li125-242.members.linode.com) has joined #ceph
[10:04] * eightyeight (~atoponce@pthree.org) has joined #ceph
[10:04] * cclien (~cclien@ec2-50-112-123-234.us-west-2.compute.amazonaws.com) has joined #ceph
[10:04] * iggy (~iggy@theiggy.com) has joined #ceph
[10:04] * __jt__ (~james@jamestaylor.org) has joined #ceph
[10:06] * darkfader (~floh@ has joined #ceph
[10:06] * blufor (~blufor@mongo-rs2-1.candycloud.eu) has joined #ceph
[10:09] <wonko_be> foosinn: just ask your question - if somebody can answer it, he/she will
[10:10] <foosinn> ok, did i understand it
[10:10] <foosinn> is it right that i can mount the same rados block device in multiple clients?
[10:28] * LarsFronius_ (~LarsFroni@testing78.jimdo-server.com) has joined #ceph
[10:28] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Read error: Connection reset by peer)
[10:28] * LarsFronius_ is now known as LarsFronius
[10:30] * Theuni (~Theuni@ Quit (Ping timeout: 480 seconds)
[10:47] * Theuni (~Theuni@alphastar.fe.whq.gocept.net) has joined #ceph
[10:58] <wido> foosinn: Yes and no
[10:58] <wido> A RADOS Block Device is a block device which you can map to multiple machines
[10:59] <wido> so on let's say 4 machines you have /dev/rbd0, which all points to the same device
[10:59] <wido> you can't simply put ext4 on it and mount it on all 4 hosts, since ext4 is not cluster aware
[10:59] <wido> You will need to use something like OCFS2
[11:00] <foosinn> ok thats all i wanted to know, thank you :)
[11:02] * Theuni1 (~Theuni@ has joined #ceph
[11:06] * Theuni (~Theuni@alphastar.fe.whq.gocept.net) Quit (Ping timeout: 480 seconds)
[11:22] * phil (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[11:22] * phil is now known as Guest7898
[11:25] <dwm__> Query: would increasing the number of OSDs in a cluster dynamically increase its PG count?
[11:25] <dwm__> ('Cos that's what I've just seen happen on this test set-up, and it's done the known-broken things.)
[11:26] * Guest7873 (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[11:57] * yoshi_ (~yoshi@p1062-ipngn1901marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[12:37] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[13:01] * mgalkiewicz (~mgalkiewi@ has joined #ceph
[13:05] <mgalkiewicz> Hello guys. I have a problem with dying osd http://pastie.org/3677343. I am using ceph 0.44.
[13:47] * BManojlovic (~steki@ has joined #ceph
[13:57] <wido> mgalkiewicz: I'm seeing the same
[13:57] <wido> what kernel / btrfs are you using?
[13:59] <wido> dwm__: I don't think the PG's should increase automatically
[13:59] <wido> PG's are per pool, not cluster-wide
[14:02] <wido> mgalkiewicz: I'll create an issue for this
[14:09] <wido> mgalkiewicz: http://tracker.newdream.net/issues/2219
[14:14] <elder> I killed plana34 last night. I think maybe a deadlock when backing a loop device by an rbd device.
[14:14] <elder> Back in a couple hours. Off to the dentist.
[14:21] <nhm> elder: We should get IPMI access setup.
[14:21] <nhm> Not being able to sit at a remote console is limiting.
[14:28] * oliver1 (~oliver@p4FFFE89B.dip.t-dialin.net) has joined #ceph
[14:31] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[15:00] <mgalkiewicz> wido: 2.6.32-5
[15:00] <mgalkiewicz> wido: not sure how to check btrfs
[15:00] * joao (~JL@ has joined #ceph
[15:02] <wido> mgalkiewicz: Ok, 2.6.32, tnx :)
[15:02] <wido> I'll add to the issue
[15:02] <wido> are you running btrfs or ext4?
[15:02] <mgalkiewicz> wido: btrfs
[15:02] <wido> ok, tnx
[15:03] <wido> hey joao
[15:03] <wido> I started seeing those btrfs errors again today
[15:03] <wido> I'm at the office right now, but I have the pastebin link at my desktop at home
[15:06] <joao> wido, any chance you can update to the 3.3 kernel?
[15:08] <joao> in any case, if you're able to tell me the git's HEAD sha, I will take a look at it :)
[15:09] <wido> joao: Sure, I can upgrade to 3.3
[15:09] <wido> I grabbed the 3.2 kernel somewhere, let me check the git sha
[15:11] <wido> joao: 83eb26af0db71f2dfe551405c55d982288fa6178
[15:11] <wido> it's from ceph-client.git, it had some backported btrfs code
[15:11] <wido> joao: date "Wed Jan 11 17:41:01 2012 -0800"
[15:13] <joao> k thanks
[15:14] <wido> joao: I'm downloading 3.3 now, will build that
[15:18] <joao> well, with btrfs it's possible just upgrading fixes it
[15:21] <joao> but from what I've seen from the list so far, that warning has been triggered before
[15:21] <joao> as in, early last year
[15:21] <dwm__> wido: Hmm, interesting.
[15:21] <dwm__> wido: I also tweaked the replication-count from 2 -> 3, might that have been the trigger?
[15:23] <wido> dwm__: I don't think so. pg_num is a static setting of a pool. A PG gets more acting OSD's when you increase the replication level
[15:23] <wido> a dev might shed some more light on this, since I know PG's don't increase by themselves
[15:27] <dwm__> Mmm, I've mailed ceph-devel -- will see if anyone's able to shed some light.
[15:27] <joao> I'm checking out that commit, will try to figure out what's happening
[15:28] <joao> wido, you hit that warning since yesterday?
[15:29] <joao> or was it on the same long running system?
[15:30] <joao> what I mean by this is: can you reproduce it? :)
[15:30] <joao> I kinda love reproducible bugs
[15:30] <nhm> joao: btw, how much do you know about the journal code?
[15:31] <joao> ceph's journal?
[15:31] <nhm> joao: yeah
[15:31] <joao> can't say I know anything, I think
[15:32] <nhm> joao: ok, I don't know anything either, but am trying to figure out what kind of locking is done during a write to it.
[15:33] <joao> I suppose that would be done somewhere in FileJournal.cc, or something, no?
[15:33] <nhm> joao: yeah, reading it now
[15:34] <joao> nhm, the class has a Mutex
[15:34] <joao> FileJournal.h +210
[15:35] <joao> if I had to infer, I'd say it's plain mutual exclusion
[15:35] <joao> full serialization and all
[15:35] <joao> well, coffee run
[15:35] <joao> brb
[15:36] <nhm> joao: thanks!
[15:36] <nhm> looks like it gets locked at the beginning of submit_entry
[15:36] <nhm> well, and a whole bunch of other stuff.
[15:37] <nhm> nevermind, just committed_thru
[15:42] <wido> joao: I hit it yesterday indeed
[15:43] <wido> But, it might be related to: http://tracker.newdream.net/issues/2219
[15:43] <wido> I can't really say IF I can reproduce it, but sometimes OSD's die with the timeout
[15:43] <wido> and occasionally I see btrfs messages
[15:46] <joao> wido, regarding this
[15:46] <joao> 15:00 < mgalkiewicz> wido: 2.6.32-5
[15:46] <joao> 15:00 < mgalkiewicz> wido: not sure how to check btrfs
[15:46] <joao> on the bug report
[15:46] <joao> well, 2.6.32 is pretty old by btrfs standards
[15:46] <joao> not blaming btrfs though
[15:46] <joao> :)
[15:47] <wido> joao: I know. So I don't know IF they are related
[15:47] <joao> yeah
[15:47] <wido> it's just that I've started seeing both yesterday and today
[15:48] <joao> the thing is, if you don't see btrfs stack traces on the dying nodes, I have a hard time believing this issue with the extent-tree.c warning is related
[15:49] <joao> this warning appears to be triggered when umounting
[15:49] <joao> not sure, gotta take a closer look
[15:50] <mgalkiewicz> joao: I can probably check with kernel 3.2 if you want
[15:50] <joao> mgalkiewicz, 3.3 would be best
[15:50] <joao> but 3.2 would be good as well
[15:50] <joao> :)
[15:54] <elder> nhm absolutely. I *need* to have a console window for kernel work.
[15:56] <mgalkiewicz> joao: I have to go right now. I will update bug report when I check newer kernel.
[15:58] * mgalkiewicz (~mgalkiewi@ Quit (Quit: Ex-Chat)
[16:01] * Theuni (~Theuni@ has joined #ceph
[16:02] * Theuni1 (~Theuni@ Quit (Remote host closed the connection)
[16:42] * Theuni (~Theuni@ Quit (Quit: Leaving.)
[16:42] * Theuni (~Theuni@ has joined #ceph
[17:10] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[17:15] * foosinn (~foosinn@office.unitedcolo.de) Quit (Quit: Verlassend)
[17:36] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:42] * aliguori (~anthony@ has joined #ceph
[17:51] * oliver1 (~oliver@p4FFFE89B.dip.t-dialin.net) has left #ceph
[17:57] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:04] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[18:05] * joshd (~joshd@aon.hq.newdream.net) has joined #ceph
[18:09] * chutzpah (~chutz@ has joined #ceph
[18:21] * BManojlovic (~steki@ has joined #ceph
[18:21] * Theuni (~Theuni@ Quit (Quit: Leaving.)
[18:31] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[18:33] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) has joined #ceph
[18:34] * perplexed (~ncampbell@ has joined #ceph
[18:41] <wido> joao: Running 3.3.0 now, seeing what that does
[18:41] <wido> fyi, my pastebin: http://pastebin.com/0xEQusTM
[18:42] <dwm__> Okay, correction from my earlier: it's not the OSD addition that caused the PG count bump, it was changing the replica count from 2 -> 3.
[18:42] * dwm__ goes to update mailing-list thread.
[18:52] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[18:58] * bchrisman (~Adium@ has joined #ceph
[19:10] <sagewk> nhm, elder: get on skype?
[19:10] <sagewk> let me test this before the meeting this time :)
[19:13] <nhm> sage: on now
[19:36] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[19:45] * phil (~quassel@chello080109010223.16.14.vie.surfer.at) has joined #ceph
[19:45] * phil is now known as Guest7936
[19:49] * Guest7898 (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[19:54] * Guest7936 (~quassel@chello080109010223.16.14.vie.surfer.at) Quit (Remote host closed the connection)
[19:56] * swendel (~swendel@ has joined #ceph
[19:57] * swendel (~swendel@ Quit ()
[20:00] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:12] * ghaskins (~ghaskins@68-116-192-32.dhcp.oxfr.ma.charter.com) has joined #ceph
[20:37] * Oliver1 (~oliver1@ip-109-90-14-183.unitymediagroup.de) has joined #ceph
[20:44] <sagewk> gregaf: fixes look good.
[20:44] <sagewk> gregaf: well, maybe make the doc clear that it is specific to cephfs, not rbd
[20:45] * pruby (~tim@leibniz.catalyst.net.nz) has joined #ceph
[20:45] <gregaf> okay, can do
[20:45] <Oliver1> Hey Sage, had a good trip home? :-D
[20:50] <perplexed> Quick Q: Does a librados client distribute requests automatically across available OSD servers (round-robin/other logic)? I'm assuming there is some SW load-balancing logic (?)
[20:50] <Oliver1> BTW: updated #2178.
[20:51] <Tv|work> perplexed: CRUSH spreads data over the cluster; requests are directed to where the data is.
[20:52] <perplexed> Thx... So writes are balanced, and reads are directed to where the replicas are located...
[20:53] <perplexed> ... unless the crush map is biased to a particular server/rack/location? Sorry if this is an obvious thing... still coming up to speed on ceph
[20:55] <perplexed> In my test cluster I currently have 4 osd nodes in the same DC, with no rack location defined. Wondering how writes are distributed in this simple scenario. Looks like default replica count is 2
[20:57] <gregaf> perplexed: CRUSH maps objects (generally by name) to a primary (and secondary, if applicable) OSDs, using a map that you define (although Ceph auto-generates one for you)
[20:57] <gregaf> the client automatically directs reads and writes to the primary OSD
[20:58] <gregaf> you can bias the crush map if you want, but generally it should be matched to performance
[20:58] <gregaf> there is no round-robin on a per-object basis
[20:58] <gregaf> but with 4 OSDs in a default setup you should see each OSD getting roughly one quarter of the reads and writes (though with the 2x replication, writes will actually go to 2 OSDs so each OSD will see half of them)
[20:58] <nhm> gregaf: while we are on the subject, where exactly do placement groups fit in?
[20:59] <gregaf> objects are actually mapped to placement groups, which are then mapped to OSDs
[20:59] <Tv|work> perplexed: http://ceph.newdream.net/docs/master/dev/placement-group/
[20:59] <NaioN> perplexed: because the objects are semi-random placed you get a nice distributed write and read pattern
[21:00] <nhm> It would be fun to look at using a stratified pseudo-random placement strategy, or a halton sequence.
[21:00] <NaioN> but as gregaf said, normally you only read and write to the primary placement group and not to the replica's
[21:01] <gregaf> NaioN: there are no primary placement groups -- objects are mapped to a placement group and then the placement group gets replicated to a sequence of OSDs
[21:01] <gregaf> gotta run now, lunch
[21:02] <nhm> gregaf: yeah, just read that on the link TV provided. Clears it up.
[21:02] <NaioN> gregaf: ok, but there is a first pg? because that one replicates further?
[21:02] <NaioN> not the client...
[21:02] <nhm> NaioN: basically the placement group is just a collection of OSDs. The first OSD in the placement group is the primary, and the rest are secondaries.
[21:03] <nhm> So with lots of placement groups you have different combinations of primaries and secondaries.
[21:03] <NaioN> yeah ok but if you look to a pg as a pool of osd's the pool sometimes reads OSD1, OSD2 and sometimes OSD2, OSD1
[21:04] <perplexed> Thx for the link, I'll check it out.
[21:04] <NaioN> so in the first one OSD1 is the first one and in the second OSD2 is the first one
[21:04] <NaioN> and gets the writes and reads for the given object
[21:05] <Tv|work> yes it's a list not a bag or a set
[21:05] <nhm> NaioN: If I'm reading the docs correctly, the first OSD in the placement group should always be the primary. I confess though that I'm not familiar with the code in question.
[21:06] <NaioN> yeah ok, but as I said: does the pg sometimes read OSD1, OSD2 and sometimes OSD2, OSD1?
[21:06] <Tv|work> NaioN: yes
[21:06] <NaioN> because in that case it doesn't matter if you look it from the pg side or the osd side
[21:06] <NaioN> it's the same
[21:06] <NaioN> although I understand you have to code it in one way
[21:09] <NaioN> so if I'm correct it's coded in this way: Object ID maps to a PG (with the hash function) -> PG is a list of OSD's -> Client knows which OSD to contact first
[21:09] <Tv|work> NaioN: read the link
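The chain NaioN summarizes (object name -> PG via a hash, PG -> ordered list of OSDs, first OSD is the primary the client contacts) can be sketched as a toy model. This is emphatically not CRUSH: real Ceph computes the OSD list with the CRUSH algorithm against the cluster map and uses its own hash, while the modulo arithmetic, SHA-1, PG count, and OSD ids below are all illustrative placeholders.

```python
# Toy model of the placement chain described above. Four OSDs and 2x
# replication, matching perplexed's test cluster as discussed earlier.
import hashlib

PG_NUM = 8
OSDS = [0, 1, 2, 3]
REPLICAS = 2

def object_to_pg(name: str) -> int:
    """Hash the object name to a PG id (Ceph uses its own hash, not SHA-1)."""
    h = int(hashlib.sha1(name.encode()).hexdigest(), 16)
    return h % PG_NUM

def pg_to_osds(pg: int) -> list:
    """Map a PG to an ordered list of OSDs; index 0 is the primary.
    Real Ceph derives this list with CRUSH, not a modulo."""
    start = pg % len(OSDS)
    return [OSDS[(start + i) % len(OSDS)] for i in range(REPLICAS)]

def locate(name: str):
    """Return (pg, primary_osd, replica_osds) for an object name."""
    pg = object_to_pg(name)
    osds = pg_to_osds(pg)
    return pg, osds[0], osds[1:]

pg, primary, replicas = locate("image-0001")
print(f"object 'image-0001' -> pg {pg}, primary osd.{primary}, replicas {replicas}")
```

Because the mapping is a pure function of the object name and the (toy) cluster layout, any client computes the same primary independently, with no lookup service and no round-robin: that is the load-balancing answer to perplexed's earlier question, with the spreading coming from the hash rather than from per-request scheduling.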
[21:16] * MarkDude (~MT@c-71-198-138-155.hsd1.ca.comcast.net) has joined #ceph
[21:29] <sagewk> wido: there?
[21:37] * LarsFronius (~LarsFroni@testing78.jimdo-server.com) Quit (Quit: LarsFronius)
[21:56] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[22:01] * Theuni (~Theuni@ has joined #ceph
[22:18] * nhm (~nh@ Quit (Read error: Connection reset by peer)
[22:23] * nhm (~nh@ has joined #ceph
[22:28] * lofejndif (~lsqavnbok@659AAA3IB.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:30] * dmick (~dmick@aon.hq.newdream.net) has left #ceph
[22:32] * cattelan_away is now known as cattelan
[22:33] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:33] * f4m8_ (f4m8@kudu.in-berlin.de) Quit (Ping timeout: 480 seconds)
[22:34] * adamcrume (~quassel@adsl-99-115-83-144.dsl.pltn13.sbcglobal.net) has joined #ceph
[22:37] <adamcrume> I'm having trouble running autogen.sh. I get the error: "autoreconf: `configure.ac' or `configure.in' is required". Has anyone else seen this?
[22:38] <sagewk> is the leveldb submodule present? (git submodule init ; git submodule update)
[22:38] <Tv|work> adamcrume: git submodule init && git submodule update
[22:39] <adamcrume> That might be the problem.
[22:42] <adamcrume> I don't see that in the wiki; I'll try to add it.
[22:44] <adamcrume> Thanks. It looks like that fixed it.
[23:12] <elder> Does teuthology install the kernel using the debian package?
[23:13] <nhm> elder: I believe so
[23:14] <sagewk> yes
[23:14] <elder> Do you know where exactly that package resides, and where in the source the building of that package sits?
[23:14] <elder> Is it just the ceph-client tree?
[23:15] <sagewk> autobuild-ceph.git/build-kernel.sh builds it
[23:15] <sagewk> the gitbuilder results are all on gitbuilder.ceph.com
[23:15] <yehudasa> gregaf: want to take a look at wip-intent-fixes?
[23:15] <elder> Just found that.
[23:15] <elder> Thanks.
[23:15] <gregaf> give me a bit, looking over that rstat bug report
[23:17] * BManojlovic (~steki@ Quit (Remote host closed the connection)
[23:29] * BManojlovic (~steki@ has joined #ceph
[23:35] * adamcrume (~quassel@adsl-99-115-83-144.dsl.pltn13.sbcglobal.net) Quit (Remote host closed the connection)
[23:35] * lofejndif (~lsqavnbok@659AAA3IB.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[23:42] * lofejndif (~lsqavnbok@04ZAACBN2.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:44] * aliguori (~anthony@ Quit (Remote host closed the connection)
[23:53] <imjustmatthew> gregaf: you around?
[23:58] <gregaf> imjustmatthew: here!
[23:58] <imjustmatthew> cool
[23:59] <imjustmatthew> I can send you the whole log, or I can pull out just whtever you need if you can give me an idea of what I'm looking for
[23:59] <gregaf> probably easiest to just do the whole log, if that's practical for you

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.