#ceph IRC Log

IRC Log for 2012-08-08

Timestamps are in GMT/BST.

[0:00] * s[X] (~sX]@ppp59-167-154-113.static.internode.on.net) has joined #ceph
[0:00] <gregaf> JohnS50: what seems to be the problem?
[0:01] <JohnS50> i can start the services fine manually, but they don't auto start when I reboot
[0:02] <gregaf> okay, sounds like an init system config setting (which I don't know the rules to) - have you checked to see if they're set to auto-start?
[0:03] <JohnS50> as far as I can tell, they are set to start and stop on the appropriate run levels
[0:03] <JohnS50> is there a good place to look for errors, or a way to turn on more logging?
[0:04] <gregaf> well, there are logs by default located in /var/log/ceph/
[0:05] <gregaf> you can turn up debugging via the ceph.conf - see http://ceph.com/wiki/Debugging
[0:06] <JohnS50> it looks like it isn't even trying to start (no success or failure messages)
[0:06] <JohnS50> I'll crank up the debugging and test it tomorrow.
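For reference, turning up debugging means adding debug settings to ceph.conf (the output lands in the /var/log/ceph/ files gregaf mentions); a minimal sketch, with illustrative levels:

    [global]
        debug ms = 1            ; messenger / network traffic
    [mon]
        debug mon = 20          ; monitor internals
    [osd]
        debug osd = 20          ; OSD internals
        debug filestore = 20    ; backing filestore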
[0:07] <Tv_> JohnS50: can you pastebin your ceph.conf?
[0:07] <JohnS50> no - sorry - long story..
[0:08] * s[X] (~sX]@ppp59-167-154-113.static.internode.on.net) Quit (Remote host closed the connection)
[0:09] * s[X] (~sX]@ppp59-167-154-113.static.internode.on.net) has joined #ceph
[0:09] <JohnS50> I'll try some more and be back tomorrow - gotta go - thanks!
[0:10] * JohnS50 (~quassel@71-86-129-2.static.stls.mo.charter.com) has left #ceph
[0:10] * MarkDude (~MT@ip-64-134-223-15.public.wayport.net) Quit (Quit: #BAMF )
[0:13] * BManojlovic (~steki@212.200.241.96) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:34] <sjust1> nhm, mikeryan: I sent you an initial small write latency histogram, apparently, it shows up even without creates
[0:34] <sjust1> it being the long latency writes
[0:37] * nhm (~nh@184-97-251-210.mpls.qwest.net) has joined #ceph
[0:39] * MarkDude (~MT@c-98-210-253-235.hsd1.ca.comcast.net) has joined #ceph
[0:39] <mikeryan> sjust1: is that a histogram or a cdf?
[0:39] <mikeryan> also axis labels?
[0:45] <sjust1> cdf, x is latency(seconds)
[0:48] * tnt (~tnt@45.124-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[0:49] * allsystemsarego (~allsystem@188.27.167.83) Quit (Quit: Leaving)
[1:02] <nhm> sjust1: you mentioned that workload is similar to the workloadgen with a few hundred objects. How many creates is your test doing over how long?
[1:02] <sjust1> none. it precreates the objects beforehand
[1:02] <sjust1> if you run workloadgen with a smallish number of objects for long enough, the objects will mostly have been created
[1:11] <mikeryan> i noticed with the rbd killa the OSDs tended to spin at 100%
[1:11] <mikeryan> even long after the fio process died
[1:12] <mikeryan> i didn't notice if it was CPU-bound or waiting on IO
[1:19] * maelfius (~Adium@66.209.104.107) has joined #ceph
[1:21] * Tv_ (~tv@2607:f298:a:607:d976:71b0:669f:be18) Quit (Quit: Tv_)
[1:21] * Tv_ (~tv@38.122.20.226) has joined #ceph
[1:23] * Concubidated (~Adium@12.248.40.138) Quit (Ping timeout: 480 seconds)
[1:24] <maelfius> Hello everyone :).
[1:25] <sjust1> hi
[1:26] <dmick> hello maelfius
[1:27] <maelfius> I had a quick question regarding ceph-mon (for 0.48argonaut) if anyone can provide some assistance. Whenever I start up ceph-mon it seems to spin up and eat all available ram on the system until it gets a bad:alloc error due to no more ram available. I tried to find something in the bug tracker, but didn't see anything that seemed to match. I figured someone here might have some insight and/or a direction to look at. (This is using the 12.04 packages provi
[1:27] <sjust1> gregaf: any thoughts?
[1:28] <dmick> maelfius: does the monitor logfile give any hints?
[1:29] <maelfius> the log file stops logging anything after:
[1:29] <maelfius> 2012-08-07 23:16:24.826869 7f2e65288780 1 mon.1@0(probing) e0 win_standalone_election
[1:29] <maelfius> 2012-08-07 23:16:24.826920 7f2e65288780 0 log [INF] : mon.1@0 won leader election with quorum 0
[1:29] <maelfius> (I'm running it as a single mon for an initial test)
[1:29] <maelfius> with 3 OSDs on a separate system.
[1:29] <dmick> ok
[1:30] <maelfius> running it in the foreground with logging to STDERR it ends up getting
[1:30] <maelfius> terminate called after throwing an instance of 'std::bad_alloc'
[1:30] <maelfius> what(): std::bad_alloc
[1:30] <maelfius> *** Caught signal (Aborted) **
[1:30] <maelfius> in thread 7f2e65288780
[1:30] <maelfius> *** Caught signal (Segmentation fault) **
[1:30] <maelfius> in thread 7f2e65288780
[1:30] <maelfius> ceph version 0.48argonaut (commit:c2b20ca74249892c8e5e40c12aa14446a2bf2030)
[1:31] <maelfius> with a bunch more of the stack trace (I can paste if it would help), but I don't want to dump tons of info in here and spam everyone if it isn't needed.
[1:36] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[1:43] * Tv_ (~tv@38.122.20.226) Quit (Quit: Tv_)
[1:49] * izdubar (~MT@c-98-210-253-235.hsd1.ca.comcast.net) has joined #ceph
[1:50] <dmick> I don't offhand know of reasons ceph-mon would chew up memory, but clearly something's going wrong. Can you try increasing the log level and see if it puts out some status about what it's doing/
[1:50] <dmick> ?
[1:54] * MarkDude (~MT@c-98-210-253-235.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[1:54] <maelfius> sure.
[1:56] <maelfius> huh. looks like it's just spamming tons of this:
[1:56] <maelfius> 2012-08-07 23:55:43.678217 7ff455e80780 10 mon.1@0(leader).pg v1 register_new_pgs will create 0.103a08
[1:56] <maelfius> 2012-08-07 23:55:43.678221 7ff455e80780 10 mon.1@0(leader).pg v1 register_new_pgs will create 0.103a09
[1:56] <maelfius> 2012-08-07 23:55:43.678225 7ff455e80780 10 mon.1@0(leader).pg v1 register_new_pgs will create 0.103a0a
[1:56] <maelfius> 2012-08-07 23:55:43.678226 7ff455e80780 10 mon.1@0(leader).pg v1 register_new_pgs will create 0.103a0b
[1:56] <maelfius> 2012-08-07 23:55:43.678227 7ff455e80780 10 mon.1@0(leader).pg v1 register_new_pgs will create 0.103a0c
[1:56] <maelfius> 2012-08-07 23:55:43.678229 7ff455e80780 10 mon.1@0(leader).pg v1 register_new_pgs will create 0.103a0d
[1:57] <maelfius> yep. spams that until it runs the system out of memory.
[1:57] <maelfius> (obv. with different numbers on the end)
[1:57] <dmick> yeah, ok, that's a clue
[1:58] <maelfius> and right before it starts the register_new_pgs it does:
[1:58] <maelfius> 2012-08-07 23:55:39.507927 7ff455e80780 10 mon.1@-1(probing).mds e1 update_logger
[1:58] <maelfius> 2012-08-07 23:55:39.882879 7ff455e80780 7 mon.1@-1(probing).osd e0 update_from_paxos loading latest full map e1
[1:58] <maelfius> 2012-08-07 23:55:40.635794 7ff455e80780 10 mon.1@-1(probing).osd e1 send_to_waiting 1
[1:58] <maelfius> 2012-08-07 23:55:40.635872 7ff455e80780 10 mon.1@-1(probing).osd e1 update_logger
[1:58] <maelfius> 2012-08-07 23:55:40.637972 7ff455e80780 10 mon.1@-1(probing).auth v0 update_from_paxos()
[1:58] <maelfius> 2012-08-07 23:55:40.638008 7ff455e80780 10 mon.1@-1(probing) e0 loading initial keyring to bootstrap authentication for mkfs
[1:58] <maelfius> 2012-08-07 23:55:40.638550 7ff455e80780 10 mon.1@-1(probing) e0 bootstrap
[1:58] <maelfius> 2012-08-07 23:55:40.638560 7ff455e80780 10 mon.1@-1(probing) e0 unregister_cluster_logger - not registered
[1:58] <maelfius> 2012-08-07 23:55:40.638562 7ff455e80780 10 mon.1@-1(probing) e0 cancel_probe_timeout (none scheduled)
[1:59] <maelfius> 2012-08-07 23:55:40.638565 7ff455e80780 0 mon.1@-1(probing) e0 my rank is now 0 (was -1)
[1:59] <maelfius> 2012-08-07 23:55:40.638911 7ff455e80780 10 mon.1@0(probing) e0 reset
[1:59] <maelfius> 2012-08-07 23:55:40.638992 7ff455e80780 1 mon.1@0(probing) e0 win_standalone_election
[1:59] <maelfius> 2012-08-07 23:55:40.639001 7ff455e80780 10 mon.1@0(probing) e0 reset
[1:59] <maelfius> 2012-08-07 23:55:40.639004 7ff455e80780 10 mon.1@0(leader) e0 win_election, epoch 1 quorum is 0 features are 65535
[1:59] <maelfius> 2012-08-07 23:55:40.639028 7ff455e80780 0 log [INF] : mon.1@0 won leader election with quorum 0
[1:59] <maelfius> 2012-08-07 23:55:40.639047 7ff455e80780 10 mon.1@0(leader).pg v1 create_pending v 2
[1:59] <maelfius> 2012-08-07 23:55:40.639051 7ff455e80780 10 mon.1@0(leader).pg v1 check_osd_map applying osdmap e1 to pg_map
[1:59] <maelfius> 2012-08-07 23:55:40.684245 7ff450034700 10 mon.1@0(leader) e0 ms_verify_authorizer 172.31.226.10:6800/3713 mds protocol 0
[1:59] <maelfius> 2012-08-07 23:55:40.764376 7ff455e80780 10 mon.1@0(leader).pg v1 register_new_pgs checking pg pools for osdmap epoch 1, last_pg_scan 0
[1:59] <maelfius> 2012-08-07 23:55:40.764418 7ff455e80780 10 mon.1@0(leader).pg v1 register_new_pgs scanning pool 0 rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 64064256 pgp_num 64064256 last_change 1 owner 0 crash_replay_interval 45
[2:01] <sjust1> maelfius: pg_num/pgp_num appear to be around 64 million
[2:01] <sjust1> how many osds do you have
[2:01] <sjust1> ?
[2:01] <dmick> 3
[2:01] <maelfius> 3.
[2:02] <sjust1> that number should be closer to around 1024?
[2:02] <dmick> did you explicitly set the number of pgs high when creating the cluster?
[2:02] <maelfius> dmick: no
[2:02] <maelfius> my config is fairly simple
[2:03] <sjust1> well, that does appear to be your problem
[2:03] <sjust1> hmm
[2:03] <maelfius> i do have fairly high ID numbers for the osd IDs
[2:03] <sjust1> ah
[2:04] <sjust1> how high?
[2:04] <maelfius> 7 digits?
[2:04] <sjust1> yeah, that'll do it
[2:04] <maelfius> so, keep the OSD ids low
[2:04] <sjust1> you've hit an unfortunate implementation detail, the number of osds is pretty much defined to be the largest osd number
[2:04] <maelfius> aha
[2:04] <maelfius> good to know.
[2:04] <maelfius> I'll rethink the numbering convention
[2:05] <sjust1> sorry about that
[2:05] <maelfius> was trying to make it so i could use padded #'s for id of node/disk
[2:05] <sjust1> yeah, it's a reasonable approach
[2:05] <maelfius> for automated configuration
[2:05] <maelfius> not a big deal. glad it was something easy to work with
[2:05] <maelfius> i shouldn't have any issues if I drop to say 3 digits total for now?
[2:06] <sjust1> it's going to generate too many pgs, you'll have to override it
[2:06] <maelfius> ah.
[2:06] <sjust1> when you create the initial osdmap, you'll need to specify the number of pgs
[2:06] <maelfius> ok
[2:06] <sjust1> 200/osd is a good guess
[2:06] <maelfius> good to know
[2:06] <maelfius> that makes it easier if I set it then
[2:06] <maelfius> works for me :)
[2:07] <maelfius> thanks for your help! :) next i was going to head into the code to see how it worked, but that was going to take a good deal longer than this did. Really appreciate the help! :)
[2:10] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:10] * lofejndif (~lsqavnbok@28IAAGNBI.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[2:10] <sjust1> no problem!
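To spell out what sjust1 is describing: the initial pg_num is derived from the largest OSD id rather than the number of OSDs, so seven-digit ids produce the ~64 million PGs seen above. A hedged sketch of the arithmetic and of overriding the count (the exact knobs are assumptions for this era):

    # pg_num scales with the largest osd id, not the osd count:
    #   max id ~1,000,000  ->  pg_num ~64,064,256 (as in the log above)
    #   max id 2 (osd.0-2) ->  pg_num in the low hundreds
    # rule of thumb: ~200 PGs per OSD, so 3 OSDs -> roughly 600 PGs
    [global]
        osd pg bits = 7      ; assumption: initial PGs per OSD = 2^7 = 128
        osd pgp bits = 7
    # pools created after mkfs can be given an explicit count instead:
    #   ceph osd pool create mypool 576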
[2:12] <joshd> yehuda, elder: was one of you going to mention the fsx failure with the bio memory leak on the list?
[2:12] <joshd> err, yehudasa
[2:12] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[2:13] * bchrisman (~Adium@108.60.121.114) Quit (Quit: Leaving.)
[2:17] * JJ (~JJ@12.248.40.138) Quit (Quit: Leaving.)
[2:17] <joshd> yehudasa, elder: oops, I meant xfstests
[2:20] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:23] <maelfius> sjust1: something like this looks a bit more sane for the osdmap in the smallish cluster
[2:24] <maelfius> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 2304 pgp_num 576 last_change 0 owner 0 crash_replay_interval 45
[2:24] * lxo (~aoliva@lxo.user.oftc.net) Quit ()
[2:24] <maelfius> small/test that is.
[2:25] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:33] <dmick> maelfius: yes
[2:40] <dmick> maelfius: just out of curiosity, did you use mkcephfs to set up the cluster?
[2:41] <maelfius> dmick, initially, i am in the process of redoing it by hand
[2:42] <maelfius> since this was just a test environment, it was a quick way to get setup
[2:42] <dmick> sure. It's a perfectly fine way to set up a test cluster, just making sure.
[2:43] <maelfius> with a bigger cluster, I can see why mkcephfs wouldn't be very efficient and might make some bad assumptions.
[3:04] <elder> joshd, I'll come talk to you for more context in a few minutes.
[3:05] <joshd> ok
[3:21] * dmick (~dmick@2607:f298:a:607:58ae:bde9:7d72:af37) Quit (Quit: Leaving.)
[3:22] * ivan` (~ivan`@li125-242.members.linode.com) Quit (Quit: ERC Version 5.3 (IRC client for Emacs))
[3:22] * chutzpah (~chutz@100.42.98.5) Quit (Quit: Leaving)
[3:26] * ivan` (~ivan`@li125-242.members.linode.com) has joined #ceph
[3:30] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[3:31] <nhm> mikeryan: I think there is a good chance that cpu utilization is IO wait while the disk seeks trying to deal with the backlog of ops. If you are using teuthology, the collectl task will record a report for you and you can play it back in disk mode with "collectl -p <file> -sD -oT"
[3:58] * womble (~mjp16@mpalmer-1-pt.tunnel.tserv8.dal1.ipv6.he.net) has joined #ceph
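A short usage sketch of the command nhm quotes (paths are hypothetical; -sD selects disk detail, -oT adds timestamps):

    # record disk stats to a directory (roughly what the teuthology collectl task does)
    collectl -sD -f /tmp/collectl-out
    # play a recording back later in disk-detail mode with timestamps
    collectl -p /tmp/collectl-out/*.raw.gz -sD -oT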
[4:02] <womble> Apologies if this is covered in a doc I haven't found yet, but does Ceph have the ability to know that some storage locations are slower than others? I'm thinking tiered storage, specifically, but the general mechanism would be "prefer to read your blocks from this location, and only get a copy from over here if you don't have another option"
[4:02] <womble> So, in the event of a rebuild being required, you would read off the slow storage, and otherwise you'd read off the fast storage
[4:03] <ajm> tiering wasn't implemented last i looked
[4:03] <ajm> it probably existing on some roadmap somewhere.
[4:04] <womble> ajm: OK, thanks.
[4:04] <mikeryan> womble: that's the long and short of it
[4:04] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) Quit (Quit: themgt)
[4:04] <nhm> womble: I don't think we have that capability yet. right now I think the best you can do is specify custom crush maps for how objects should be distributed or alternate pools.
[4:05] <mikeryan> that's right, the most intelligent thing you can do is create a pool of slow storage and a pool of fast storage
[4:05] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) has joined #ceph
[4:05] <womble> Fair enough. It's not a deal-breaker, I just thought it'd be a nice bonus to wave at people and say "see, Ceph is *THIS* awesome!", when I convince everyone we should use it.
[4:05] <womble> mikeryan: That sounds like it might do
[4:06] <womble> I'm happy to manually define priorities / labels / whatever's required -- I don't need Ceph to benchmark my storage or anything
[4:06] <womble> nhm: I don't suppose you've got a link off-hand where I could learn all about crush maps and object distribution policies, do you?
[4:07] * themgt (~themgt@cpe-74-78-71-246.maine.res.rr.com) Quit ()
[4:08] <nhm> womble: we've got some documentation on our wiki, though I confess that I don't know how recent it is: http://ceph.com/wiki/Custom_data_placement_with_CRUSH
[4:08] <womble> Spooky... Google just served that page up for me. Also http://ceph.com/docs/master/ops/manage/crush/ looks useful, and http://ceph.com/wiki/OSD_cluster_expansion/contraction will come in handy sooner or later.
[4:10] <nhm> womble: I haven't actually personally done custom crush placements, but if you end up having questions many of the guys around here during the day should be able to help out.
[4:10] <womble> nhm: Thanks, I'll bear that in mind.
[4:10] <nhm> good deal. Off to watch some babylon 5. Good luck!
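For anyone reading later, a minimal sketch of the fast/slow pool idea mikeryan describes: separate CRUSH roots and a rule per tier, then point each pool at its rule (bucket names, ids and the ruleset number are made up):

    # fragment of a decompiled CRUSH map (crushtool -d <map> -o map.txt)
    root fast {
            id -10
            alg straw
            hash 0          # rjenkins1
            item ssd-host1 weight 1.000
    }
    root slow {
            id -20
            alg straw
            hash 0
            item sata-host1 weight 1.000
    }
    rule fast {
            ruleset 3
            type replicated
            min_size 1
            max_size 10
            step take fast
            step chooseleaf firstn 0 type host
            step emit
    }
    # compile, inject, and point a pool at the rule:
    #   crushtool -c map.txt -o newmap
    #   ceph osd setcrushmap -i newmap
    #   ceph osd pool set fastpool crush_ruleset 3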
[4:52] * izdubar (~MT@c-98-210-253-235.hsd1.ca.comcast.net) Quit (Quit: #BAMF )
[5:00] * adjohn (~adjohn@69.170.166.146) Quit (Quit: adjohn)
[5:09] * jefferai (~quassel@quassel.jefferai.org) Quit (Ping timeout: 480 seconds)
[5:37] * maelfius (~Adium@66.209.104.107) Quit (Quit: Leaving.)
[6:10] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) has joined #ceph
[6:17] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[6:50] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:15] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) Quit (Ping timeout: 480 seconds)
[8:42] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[8:44] * deepsa_ (~deepsa@122.167.169.8) has joined #ceph
[8:44] * deepsa (~deepsa@122.172.171.208) Quit (Ping timeout: 480 seconds)
[8:44] * deepsa_ is now known as deepsa
[8:53] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[8:58] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[9:05] * EmilienM (~EmilienM@arc68-4-88-173-120-14.fbx.proxad.net) Quit (Remote host closed the connection)
[9:08] * loicd (~loic@brln-4dbc32b5.pool.mediaWays.net) has joined #ceph
[9:21] * s[X] (~sX]@ppp59-167-154-113.static.internode.on.net) Quit (Remote host closed the connection)
[9:23] * deepsa (~deepsa@122.167.169.8) Quit ()
[9:25] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:37] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) Quit (Ping timeout: 480 seconds)
[9:38] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[9:38] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[9:41] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[9:43] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:44] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:46] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[9:48] * EmilienM (~EmilienM@arc68-4-88-173-120-14.fbx.proxad.net) has joined #ceph
[10:13] * deepsa (~deepsa@122.172.18.30) has joined #ceph
[10:27] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[11:25] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[11:27] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[11:27] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Read error: Connection reset by peer)
[11:28] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[11:28] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[11:38] * alexxy (~alexxy@masq119.gtn.ru) has joined #ceph
[11:45] * alexxy (~alexxy@masq119.gtn.ru) Quit (Read error: Operation timed out)
[12:15] * fiddyspence (~fiddyspen@94-192-234-112.zone6.bethere.co.uk) has joined #ceph
[12:15] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[12:19] * asadpanda (~asadpanda@2001:470:c09d:0:20c:29ff:fe4e:a66) has joined #ceph
[12:26] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[12:32] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[12:38] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[12:45] * fiddyspence (~fiddyspen@94-192-234-112.zone6.bethere.co.uk) Quit (Quit: Leaving.)
[13:01] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[13:07] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[13:30] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[13:55] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[13:59] * deepsa (~deepsa@122.172.18.30) Quit (Quit: Computer has gone to sleep.)
[14:39] * BManojlovic (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[14:40] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[14:40] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) has joined #ceph
[14:53] * s[X] (~sX]@ppp59-167-157-96.static.internode.on.net) Quit (Remote host closed the connection)
[14:55] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[14:57] * Kioob (~kioob@luuna.daevel.fr) Quit (Read error: Operation timed out)
[14:59] * fiddyspence (~fiddyspen@94-192-234-112.zone6.bethere.co.uk) has joined #ceph
[15:20] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[15:27] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:27] * NashTrash (~Adium@mobile-166-147-116-209.mycingular.net) has joined #ceph
[15:31] * deepsa (~deepsa@122.172.18.30) has joined #ceph
[16:03] * tjpatter (~tjpatter@c-68-62-88-148.hsd1.mi.comcast.net) has joined #ceph
[16:04] <tjpatter> Newbie question, hoping someone could shed some light for me. How does the OSD daemon know what drive or partition to use as its back-end data store? I've read the docs but this isn't clear to me.
[16:05] <tjpatter> Rephrase: OSD, not "OSD daemon" (double word)
[16:09] * deepsa (~deepsa@122.172.18.30) Quit (Quit: Computer has gone to sleep.)
[16:16] <tnt> it's in the config file ...
[16:18] <tjpatter> But what option under the OSD config?
[16:19] <tjpatter> I've read through all of the settings in the docs online... Is there a complete example ceph.conf somewhere I could reference?
[16:23] <tjpatter> Okay, nm. Found one in /usr/share/ceph
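For the record, the option tnt is pointing at is the per-daemon data path; a minimal ceph.conf sketch (paths and the btrfs devs line are illustrative, mkcephfs-era style):

    [osd]
        osd data = /var/lib/ceph/osd/ceph-$id
        osd journal = /var/lib/ceph/osd/ceph-$id/journal
        osd journal size = 1000     ; MB
    [osd.0]
        host = node1
        btrfs devs = /dev/sdb1      ; mkcephfs can mkfs/mount this device for you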
[16:37] * NashTrash (~Adium@mobile-166-147-116-209.mycingular.net) Quit (Quit: Leaving.)
[17:05] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:24] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[17:26] * loicd1 (~loic@brln-4d0cc179.pool.mediaWays.net) has joined #ceph
[17:31] * loicd (~loic@brln-4dbc32b5.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[17:35] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[17:40] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:44] * deepsa (~deepsa@122.172.18.30) has joined #ceph
[17:44] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[17:45] * tnt (~tnt@212-166-48-236.win.be) Quit (Read error: Operation timed out)
[17:52] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[17:54] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[17:58] * Tv_ (~tv@2607:f298:a:607:d976:71b0:669f:be18) has joined #ceph
[18:00] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[18:01] * tnt (~tnt@45.124-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:08] * maelfius (~Adium@pool-71-160-33-115.lsanca.fios.verizon.net) has joined #ceph
[18:27] * bchrisman (~Adium@108.60.121.114) has joined #ceph
[18:31] * BManojlovic (~steki@212.200.241.96) has joined #ceph
[18:35] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:37] * adjohn (~adjohn@69.170.166.146) has joined #ceph
[18:38] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[18:39] <flakrat> is there a PDF of the Ceph documentation, or is ceph.com/docs/master it?
[18:40] <gregaf> nope, no pdf
[18:40] <gregaf> although there are probably tools to turn .rst into PDFs, if you wanted to make one
[18:40] <flakrat> I'll just cache the pages to take them on the go :-)
[18:40] <joshd> pretty sure sphinx can generate pdf
[18:42] * maelfius (~Adium@pool-71-160-33-115.lsanca.fios.verizon.net) Quit (Quit: Leaving.)
[18:44] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[18:48] <Tv_> sphinx can generate pdf, i don't think we've burned any effort at making it pretty
[18:50] <Tv_> yeah we're missing configuration, latex_documents etc
[18:53] <Tv_> and our ditaa diagrams make it fail
[19:02] <Tv_> well i just commented out the ditaa stuff and i'm now looking at a 267-page book
[19:04] <Tv_> http://tracker.newdream.net/issues/2920
[19:12] <Tv_> and http://tracker.newdream.net/issues/2921 for the hell of it
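A hedged sketch of what Tv_ is describing: give Sphinx a latex_documents entry and run the LaTeX builder (titles and paths are illustrative):

    # doc/conf.py fragment
    latex_documents = [
        ('index', 'Ceph.tex', u'Ceph Documentation', u'Ceph contributors', 'manual'),
    ]

    # then build
    sphinx-build -b latex doc/ build/latex
    make -C build/latex all-pdf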
[19:13] * chutzpah (~chutz@100.42.98.5) has joined #ceph
[19:17] * Cube (~Adium@12.248.40.138) has joined #ceph
[19:20] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:27] * deepsa (~deepsa@122.172.18.30) Quit (Quit: Computer has gone to sleep.)
[19:31] * dmick (~dmick@2607:f298:a:607:4cd9:fe1c:42bd:84be) has joined #ceph
[19:31] <dmick> morning all
[19:33] <joao> hi Dan :)
[19:34] <dmick> How's life in Iberia?
[19:36] <yehudasa> gregaf: wip-2841, wip-2869, wip-2877, wip-2878, wip-2879 need some review ...
[19:38] * sileht (~sileht@sileht.net) Quit (Ping timeout: 480 seconds)
[19:39] * sileht (~sileht@sileht.net) has joined #ceph
[19:40] <joao> dmick, warm :)
[19:40] <joao> kind of reminds me of LA back in May :p
[19:41] <dmick> Yeah, it's *hot* in LA today
[19:41] <dmick> first really serious summer day yesterday IMO
[19:43] <joao> I was going with "hot", but considering that it's roughly 84F, I thought that those temperatures would be mid-winter for you guys :p
[19:43] <dmick> heh
[19:43] * Meths (rift@2.25.212.34) Quit (Remote host closed the connection)
[19:44] <joao> (and yes, I used google to get the temperature in Fahrenheit)
[19:44] <dmick> heh. well it's reportedly 84F here right now, but I'm sure it's still rising
[19:45] * EmilienM (~EmilienM@arc68-4-88-173-120-14.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[19:45] * yehudasa (~yehudasa@2607:f298:a:607:1420:29b1:501c:3f82) Quit (Ping timeout: 480 seconds)
[19:45] * EmilienM (~EmilienM@arc68-4-88-173-120-14.fbx.proxad.net) has joined #ceph
[19:47] <nhm> today it's finally cool after a very hot summer. About 72f.
[19:50] * Meths (rift@2.25.212.34) has joined #ceph
[19:50] <Tv_> sjust1: your samuelj-2012-08-07_17:12:00-regression-wip_deep_scrub4-testing-basic/5524 run borked due to networking
[19:51] <sjust1> meh, it was borked anyway
[19:55] * yehudasa (~yehudasa@2607:f298:a:607:d175:c595:b735:d9a1) has joined #ceph
[19:56] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[20:01] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Ping timeout: 480 seconds)
[20:06] * alexxy (~alexxy@masq119.gtn.ru) has joined #ceph
[20:10] * Kioob (~kioob@luuna.daevel.fr) has joined #ceph
[20:10] * alexxy[home] (~alexxy@2001:470:1f14:106::2) has joined #ceph
[20:14] * alexxy (~alexxy@masq119.gtn.ru) Quit (Ping timeout: 480 seconds)
[20:15] * maelfius (~Adium@66.209.104.107) has joined #ceph
[20:25] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:35] * jluis (~JL@89.181.159.77) has joined #ceph
[20:36] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[20:41] * joao (~JL@89.181.156.247) Quit (Ping timeout: 480 seconds)
[20:50] * nhm (~nh@184-97-251-210.mpls.qwest.net) Quit (Ping timeout: 480 seconds)
[20:56] * maelfius (~Adium@66.209.104.107) Quit (Quit: Leaving.)
[21:29] * lofejndif (~lsqavnbok@19NAABLFR.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:45] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:59] * nhm (~nh@184-97-251-210.mpls.qwest.net) has joined #ceph
[22:10] * glowell1 (~Adium@38.122.20.226) has joined #ceph
[22:10] * glowell (~Adium@38.122.20.226) Quit (Read error: Connection reset by peer)
[22:13] * glowell (~Adium@38.122.20.226) has joined #ceph
[22:13] * glowell1 (~Adium@38.122.20.226) Quit (Read error: Connection reset by peer)
[22:13] * glowell1 (~Adium@38.122.20.226) has joined #ceph
[22:19] * glowell2 (~Adium@38.122.20.226) has joined #ceph
[22:19] * glowell (~Adium@38.122.20.226) Quit (Read error: Connection reset by peer)
[22:26] * glowell1 (~Adium@38.122.20.226) Quit (Ping timeout: 480 seconds)
[22:32] * loicd1 (~loic@brln-4d0cc179.pool.mediaWays.net) Quit (Read error: Operation timed out)
[22:52] * sagelap1 (~sage@38.122.20.226) has joined #ceph
[22:53] <sagelap1> can someone take a look at wip-keyring2 before i merge it into stable?
[22:53] <sagelap1> (tested it out, looks good)
[22:53] <sagelap1> yehudasa, gregaf: ^
[22:56] <gregaf> sagelap1: looking at it
[22:59] <gregaf> not obviously broken to me (although the testing is worth more ;) )
[23:00] <sagelap1> gregaf: k thanks
[23:00] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[23:08] * loicd (~loic@brln-4d0cc179.pool.mediaWays.net) has joined #ceph
[23:08] <yehudasa> sagelap1: should wip-2869 go into argonaut?
[23:09] <yehudasa> "rgw: expand date format support", basically adding UTC to date
[23:09] <yehudasa> in addtion to GMT
[23:13] <dmick> I vote yes fwiw
[23:15] * sagelap1 (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[23:16] <yehudasa> dmick: thanks
[23:23] * lofejndif (~lsqavnbok@19NAABLFR.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[23:24] <Leseb> yo guys
[23:24] <Leseb> simple question
[23:24] <Leseb> why does this parameter belong in the [MON] section of ceph.conf? --> osd pool default size
[23:25] <wido> Leseb: where did you find that?
[23:25] <Leseb> just tested it
[23:25] <Leseb> I first put it in [OSD], didn't work
[23:26] <wido> Hmm, I thought that on my system I recently moved it to the osd section
[23:26] <Leseb> found some stuff in the ML and tried [MON] and it works
[23:26] <wido> Ah, no, you're right, it's in the mon section on my cluster as well
[23:26] <Leseb> I'm not sure why
[23:27] <wido> I checked the code, it's in the mon code in prepare_new_pool
[23:27] <dmick> it could be because pool creation is actually driven by the monitors?
[23:27] <wido> dmick: Indeed, see src/mon/OSDMonitor.cc
[23:28] <dmick> seems odd that it would be named "osd_" then, but...yeah.
[23:28] <wido> So the name is a bit confusing since you'd assume that osd_* is osd section
[23:28] <darkfader> it's technically correct and still confusing hehe
[23:28] <dmick> heh, I'll just let wido talk now :
[23:28] <dmick> )
[23:28] <gregaf> yeah, that's my fault and I spent some time thinking about it but didn't like any other names better... sorry :/
[23:28] <wido> gregaf: tnx! ;)
[23:29] <wido> Leseb: there's your answer
[23:29] <Leseb> I'm reading :)
[23:30] <Leseb> haha ok thx guys!
[23:31] <Leseb> quite confusing but I get it :)
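Summing up the answer: the option carries an osd_ prefix but is read by the monitor at pool-creation time (src/mon/OSDMonitor.cc, prepare_new_pool), so it lives under [mon]; the value here is illustrative:

    [mon]
        ; applied by the monitors when new pools are created
        osd pool default size = 2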
[23:33] <darkfader> http://tracker.newdream.net/issues/2423 did you really kick gceph
[23:33] <darkfader> ?
[23:33] * fiddyspence (~fiddyspen@94-192-234-112.zone6.bethere.co.uk) Quit (Quit: Leaving.)
[23:33] <dmick> darkfader: yes
[23:34] <Tv_> darkfader: have you seen American History X?
[23:34] <dmick> lol
[23:34] <darkfader> Tv_: ages ago i think
[23:34] <Tv_> so, not as much kick as curb stomp
[23:34] <darkfader> well, i personally don't care
[23:34] <dmick> I hear gregaf moaning from the other office
[23:34] <dmick> :)
[23:34] <darkfader> meh :)
[23:34] <gregaf> ewwwwwww
[23:34] <gregaf> *cry*
[23:35] <darkfader> gceph was AWESOME for giving the ceph class
[23:35] <darkfader> (the 10 seconds till it crashed)
[23:35] <gregaf> haha
[23:35] <gregaf> well, it should be pretty easy to rewrite in python ;)
[23:35] <Tv_> extract json, show in a browser...
[23:36] <darkfader> if i do anything like that i'll invest the time in the monitoring stuff for check_mk to get good ceph monitoring / stats
[23:36] <Tv_> darkfader: check_mk?
[23:36] <darkfader> but for beginners it was really nice the way it showed the single osds and all that
[23:36] <Tv_> oh nagios
[23:37] <darkfader> i can explain 30 minutes how ceph works and they stare, and then show them gceph and they understand ceph before it crashes :)
[23:37] <darkfader> Tv_: people mock that nagios is a check_mk addon :)
[23:37] <darkfader> officially it's the other way round
[23:38] <darkfader> anyway it's ok i just need to make better slides till next time
[23:40] <darkfader> if i may say, the bug tracker is damn tidy for an oss project
[23:40] <darkfader> very nice
[23:42] <dmick> glad you like it. I find Redmine pretty usable
[23:42] <Leseb> do you guys use this parameter "mon clock drift allowed" to avoid messages like "[WRN] message from mon.0 was stamped 0.065721s in the future, clocks not synchronized"?
[23:42] <Tv_> dmick: it's a decent implementation of a flawed process ;)
[23:43] <darkfader> hehe
[23:43] <dmick> Tv_: I ignore the process anyway and just use it to track bugs :)
[23:43] <darkfader> i meant it's well-maintained
[23:43] <dmick> Leseb: you can, but, if that's a sign NTP isn't working right, you're better off diagnosing/fixing that
[23:44] <dmick> the cluster really prefers reasonable time sync
[23:44] <gregaf> (although 6 hundredths of a second is fine on the time scales we use)
[23:44] <darkfader> ceph might be a good reason to enable ntp orphan mode
[23:44] <darkfader> i.e. all mons as ntp servers in orphan mode and all osds as clients to them
[23:44] <darkfader> then it's indestructible
[23:45] <darkfader> we use that for air traffic control
[23:45] <darkfader> if all sources fail the servers just agree on some time and keep that stable
[23:45] <darkfader> and phase back into "real" time when the sources come back
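A minimal ntp.conf sketch of the orphan-mode setup darkfader describes (host names and the orphan stratum are illustrative):

    # on each mon host (acting as NTP server)
    server 0.pool.ntp.org iburst
    peer mon2.example.com
    peer mon3.example.com
    tos orphan 10        # if all upstream sources fail, the peers agree on a time and hold it

    # on each osd host (client of the mons)
    server mon1.example.com iburst
    server mon2.example.com iburst
    server mon3.example.com iburst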
[23:46] <Leseb> ok thank you, so should I leave the default option or change it to something like "1"?
[23:47] <Leseb> because all those WARN fill up my logs...
[23:47] <gregaf> 1 would be too large; you could change it to .1 if you really didn't think you could get NTP any better
[23:47] <dmick> is your NTP set up and working correctly?
[23:48] <Leseb> it is but I will try to improve it
[23:48] <Leseb> last time I got this WARN, I restarted the MON and it was ok
[23:48] <dmick> if it is, then the next step is likely as gregaf says
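And the knob Leseb is asking about, if NTP genuinely can't be tightened: the default tolerance is around 0.05s, so gregaf's 0.1 is a mild relaxation rather than a full second:

    [mon]
        mon clock drift allowed = 0.1    ; seconds; prefer fixing NTP over raising this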
[23:49] * steki-BLAH (~steki@212.200.240.248) has joined #ceph
[23:55] * BManojlovic (~steki@212.200.241.96) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.