#ceph IRC Log

Index

IRC Log for 2013-10-18

Timestamps are in GMT/BST.

[0:00] <loicd> :-)
[0:01] <loicd> dmick: but ... ?
[0:01] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[0:01] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[0:01] <dmick> but what?
[0:02] * mikedawson (~chatzilla@23-25-46-107-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:02] * sagelap (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[0:02] <loicd> since you said you filed an issue, I assumed it did not work as you wanted, but maybe it's unrelated to my question
[0:02] <dmick> oh. no, the issue I thought I filed was to document this better
[0:03] <loicd> ah !
[0:03] <dmick> sorry, I could have said that :)
[0:03] <dmick> I was hoping to just give you a tracker URL
[0:03] <dmick> you know, in fact, I'm sure I filed this, because I updated it. let me search harder.
[0:05] * a_ (~a@209.12.169.218) Quit (Quit: This computer has gone to sleep)
[0:05] <loicd> I think I get it. It gets formatted by ceph-disk, which then activates it and presumably talks to the mon to register the new OSD? Which would require the machine hosting the OSD to have enough rights to do an osd create. Am I getting close?
[0:05] * rendar (~s@host200-180-dynamic.1-87-r.retail.telecomitalia.it) Quit ()
[0:06] * loicd reading ceph-disk activate
[0:06] <dmick> yeah, the top comment is pretty useful
[0:07] * JustEra (~JustEra@ALille-555-1-102-208.w90-34.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[0:07] <dmick> another wrinkle is that two different udev strategies exist, depending on how old the udev is
[0:07] <dmick> 95-ceph-osd.rules and 95-ceph-osd-alt.rules for the laggards
[0:07] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[0:08] <dmick> the latter uses ceph-disk-udev to make up for the missing udev functionality
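For context, the udev rules mentioned here key off the GPT partition type GUID. A rule of roughly this shape (an illustrative sketch, not the exact upstream file) is what hands a newly appearing Ceph data partition to ceph-disk for activation:

    ACTION=="add", SUBSYSTEM=="block", ENV{DEVTYPE}=="partition", \
      ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d", \
      RUN+="/usr/sbin/ceph-disk-activate /dev/$name"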
[0:09] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[0:12] <loicd> I'm glad I finally understood the udev logic :-) thanks dmick !
[0:12] <loicd> s/stood/stand/
[0:13] <dmick> it's tricky
[0:16] * dmsimard (~Adium@2607:f748:9:1666:890d:6d98:9fc5:a429) Quit (Ping timeout: 480 seconds)
[0:17] * ScOut3R_ (~scout3r@dsl51B61603.pool.t-online.hu) has joined #ceph
[0:17] * ScOut3R (~scout3r@dsl51B61603.pool.t-online.hu) Quit (Read error: Connection reset by peer)
[0:18] * a (~a@209.12.169.218) has joined #ceph
[0:19] * a is now known as Guest2696
[0:24] * Guest2696 (~a@209.12.169.218) Quit (Quit: This computer has gone to sleep)
[0:25] <loicd> dmick: say I have a new disk, how would I trigger this magic? I guess what I mean is: what should I do (read?) to make it so ENV{ID_PART_ENTRY_TYPE}=="4fbd7e29-9d25-41b8-afd0-062c0ceff05d" sees that the corresponding partition is destined to be a data partition as listed in 95-ceph-osd.rules?
[0:25] <dmick> the easy way is to set it up with ceph-deploy
[0:25] <dmick> if you want to do it manually, I recommend reading ceph-deploy
[0:25] <dmick> (which is really going to call out to ceph-disk)
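The manual path being hinted at boils down to two ceph-disk subcommands; a minimal sketch (device names and fs type here are placeholders):

    ceph-disk prepare --fs-type xfs /dev/sdb   # partition, format, tag the GPT type code
    ceph-disk activate /dev/sdb1               # mount it and register/start the OSD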
[0:26] <loicd> I'm trying to figure out how puppet-ceph should handle this
[0:26] <dmick> ah, so you want to read the document which is not created
[0:26] <dmick> :)
[0:26] <loicd> ahahah
[0:26] <dmick> ceph-disk:prepare_dev()
[0:26] * loicd reading
[0:26] * sagelap (~sage@2607:f298:a:607:cb5:e9b5:6897:6977) has joined #ceph
[0:27] <loicd> cool crystal clear, thanks dmick
[0:27] <dmick> GPTs have 3 different UUIDs: a whole-disk, a 'partition type' (generic), and a 'partition' (unique to each partition)
[0:27] <dmick> the 'partition type' uuid is the key
[0:28] <dmick> what sgdisk calls --partition-guid
[0:28] <dmick> is not it
[0:28] <dmick> but rather --typecode
[0:28] <dmick> (--partition-guid is the "unique" one)
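So to mark a partition as Ceph OSD data by hand, the GUID quoted above goes into sgdisk's --typecode, not --partition-guid; for example (partition number and device are hypothetical):

    sgdisk --new=1:0:0 --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sdb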
[0:28] * sleinen1 (~Adium@2001:620:0:26:94fc:24a6:8fd3:36ce) Quit (Quit: Leaving.)
[0:28] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[0:29] <loicd> cool
[0:29] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[0:30] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[0:33] * rudolfsteiner_ (~federicon@mail.bittanimation.com) has joined #ceph
[0:33] * Sara88 (~Sara@dynamic-adsl-78-14-186-198.clienti.tiscali.it) has joined #ceph
[0:33] * a_ (~a@209.12.169.218) has joined #ceph
[0:34] * rudolfsteiner (~federicon@mail.bittanimation.com) Quit (Read error: Operation timed out)
[0:34] * rudolfsteiner_ is now known as rudolfsteiner
[0:43] * Sara88 (~Sara@dynamic-adsl-78-14-186-198.clienti.tiscali.it) Quit (Quit: Leaving)
[0:44] * mattbenjamin (~matt@aa2.linuxbox.com) has left #ceph
[0:48] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[0:55] <ponyofdeath> hi, trying to do this: ceph-deploy osd prepare --fs-type btrfs prod-ent-ceph01:sdb:/dev/sda3, but i get "unknown option --fs-type"; any ideas how i can force it to use btrfs?
[0:56] * ScOut3R_ (~scout3r@dsl51B61603.pool.t-online.hu) Quit (Remote host closed the connection)
[1:00] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:01] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[1:01] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[1:01] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:13] <sagewk> ponyofdeath: alfredodeza is gone for the day (or at a conference); can you send an email to ceph-devel?
[1:13] <ponyofdeath> sagewk: sure
[1:16] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:16] <bandrus> ponyofdeath: you should be specifying fs type in the ceph.conf, ceph-deploy no longer allows for a --fs-type option
[1:17] <bandrus> or rather, not sure it has ever worked for ceph-deploy
[1:19] <ponyofdeath> ahh so something like this?
[1:19] <ponyofdeath> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-April/000988.html
[1:19] <ponyofdeath> i need to add that to the ceph.conf
[1:19] <ponyofdeath> and then run mkcephfs
[1:21] <ponyofdeath> bandrus: also how do the journal files work? if i specify a block device will it format it with btrfs and then put the journal files on there, or do i have to specify a dir with a btrfs partition already made
[1:21] <bandrus> your initial ceph-deploy create command will create a ceph.conf in the local directory. You can modify that ceph.conf with your desired settings (osd mkfs type = btrfs, osd mkfs options btrfs = whatever, osd mount options btrfs = whatever) before proceeding with using ceph-deploy to create your osds
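Written out, the ceph.conf stanza described above would look something like this (option names as given; the values are placeholders, not recommendations):

    [osd]
        osd mkfs type = btrfs
        osd mkfs options btrfs = -m single
        osd mount options btrfs = rw,noatime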
[1:21] <ponyofdeath> bandrus: perfect thanks!
[1:21] <ponyofdeath> bandrus: what about the jorunal block device
[1:22] <bandrus> ponyofdeath: as far as your journal question goes, I do not know the answer, sorry.
[1:22] <ponyofdeath> bandrus: thanks!
[1:22] <ponyofdeath> for your help
[1:25] <bandrus> ponyofdeath: I have only tested the above with xfs and ext4 to be honest, but I have seen in the past that ceph-deploy has provisions for btrfs as well. Some steps in using ceph.conf to create your cluster will overwrite the ceph.conf, so after each step, make sure the directives are still there and that they've properly been copied to other nodes and it worked as expected. It's a bit finicky, but you should be able to get it to work!
[1:26] <ponyofdeath> bandrus: thanks! how about the fact that when i do ceph-deply create
[1:26] <ponyofdeath> it creates configs in the local dir
[1:26] <ponyofdeath> do i move them to /etc/ceph/
[1:26] <ponyofdeath> after
[1:26] <bandrus> ceph-deploy will take care of that automatically when you install on the respective nodes
[1:27] <ponyofdeath> ahh that's because of the admin node
[1:27] <ponyofdeath> got it
[1:27] <ponyofdeath> in my case admin node is the ceph node as well
[1:27] <bandrus> that's fine, it will still work!
[1:27] * davidzlap (~Adium@76.173.16.173) has joined #ceph
[1:27] <ponyofdeath> [prod-ent-ceph01][ERROR ] RuntimeError: config file /etc/ceph/ceph.conf exists with different content; use --overwrite-conf to overwrite
[1:28] <ponyofdeath> ceph-deploy osd prepare prod-ent-ceph01:sdb:/dev/sda3
[1:28] <ponyofdeath> still running tho
[1:29] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[1:29] * LeaChim (~LeaChim@host86-174-76-26.range86-174.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:29] <bandrus> that means you already have a ceph.conf present in /etc/ceph that the command you are running wants to overwrite. If you're okay with that, then use --overwrite-conf. You should make sure that no processes are still running though, and that no mounts are still present before re-running anything like that. I assume this is a fresh cluster with no data?
[1:30] <ponyofdeath> yeah
[1:30] <ponyofdeath> would it put the [osd.1] section ?
[1:30] <ponyofdeath> if i do the overwrite
[1:30] <bandrus> if you only want osd.1 to be btrfs
[1:31] <bandrus> Also this bug: http://tracker.ceph.com/issues/6154
[1:31] <ponyofdeath> nice thanks!
[1:31] <bandrus> Greg mentions the specific parameter "fs type = foo"
[1:31] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:32] <bandrus> I do not know which is proper, the specific osd directives, or the single fs type one
[1:32] * sagelap (~sage@2607:f298:a:607:cb5:e9b5:6897:6977) Quit (Ping timeout: 480 seconds)
[1:33] <bandrus> and as he's mentioned, you would specify it under whatever section is appropriate for your cluster, either a single OSD or a more broad category
[1:33] <bandrus> ponyofdeath: what version of ceph-deploy are you using?
[1:33] <ponyofdeath> 1.2.7precise
[1:34] <ponyofdeath> deb http://ceph.com/debian-dumpling/ precise main
[1:34] <ponyofdeath> is what i have in my sources
[1:34] <infernix> there was something to be done if you are running ceph osds on only one server
[1:34] <bandrus> interesting that --fs-type doesn't work then, according to the patch in that bug
[1:34] <ponyofdeath> infernix: i have two servers
[1:35] <infernix> does anyone know what that was? I'm merely running some tests
[1:35] <bandrus> Crush Rules need to be adjusted for one node
[1:35] <infernix> wasnt there a way to optimize it automatically in dumpling?
[1:36] <bandrus> Not sure, but here's the documentation I'm aware of on the subject: http://ceph.com/docs/next/start/quick-ceph-deploy/
[1:36] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[1:36] <bandrus> or specifically: http://ceph.com/docs/next/start/quick-ceph-deploy/#create-a-cluster
[1:37] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[1:38] <ponyofdeath> bandrus: so yeah the command ceph-deploy --overwrite-conf osd prepare prod-ent-ceph01:sdb:/dev/sda3 did overwrite my ceph.conf but did not put anything osd related in it
[1:38] <ponyofdeath> only took my [osd] section out
[1:39] <ponyofdeath> should it not generate [osd.0]
[1:39] <infernix> that's it, thanks
[1:39] * jmlowe (~Adium@2601:d:a800:511:345c:56cb:4960:1280) has left #ceph
[1:41] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:44] <bandrus> ponyofdeath: you modified the ceph.conf in your local folder, right? You might need to back up a few steps in order for it to work properly, such as modifying the file before the install step
[1:45] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[1:48] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[1:48] <infernix> so, osd bench does about 300MB/s
[1:48] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit ()
[1:48] <infernix> 300-400
[1:48] * infernix tries rados
[1:49] <Pedras> infernix: your ssd-only box, what kind of network interfaces are you using?
[1:49] <infernix> infiniband
[1:49] <bandrus> ponyofdeath: unfortunately it's been a while since I've done this, and I don't think that in particular is well documented. It might take some experimenting
[1:51] <infernix> yeah, 1.2GB/s
[1:51] <infernix> nothing like the 10GB/s writes with raid 10
[1:54] <infernix> and that is with 2 E5-2690s
[1:54] <infernix> sagewk: we talked about this box at ceph days in NYC, what kind of profiling would you like me to capture?
[1:55] * rudolfsteiner (~federicon@mail.bittanimation.com) Quit (Quit: rudolfsteiner)
[1:56] <infernix> seq read is about 3GB/s
[1:57] <sagewk> infernix: nyc was so long ago! remind me what the box looks like?
[1:57] <infernix> all ssd all the time
[1:57] <infernix> md raid 10: 23 GByte/sec random reads, 10 GByte/sec random writes
[1:57] <infernix> large block of course. 4k: about 1 million random read iops, half that on write
[1:58] <infernix> just one box, rados bench running on the same box currently
[1:58] <sagewk> and a single ceph-osd in front of it?
[1:58] <infernix> no, 48 osds
[1:58] <infernix> 1 per disk
[1:58] <sagewk> ah, so no md raid with ceph, gotcha.
[1:58] <infernix> no, just for apples/oranges comparison
[1:59] <sagewk> well, something like perf top would be interesting.
[1:59] <sagewk> intel processors?
[1:59] <sagewk> if it's 0.69 or later the intel sse crc32c stuff may speed things up
[1:59] <infernix> E5-2690
[2:00] <infernix> dumpling, i can probably apt-get upgrade
[2:00] <infernix> emperor is it?
[2:00] <infernix> or cuttlefish/
[2:00] <sagewk> not yet. use ceph.com/debian-testing for latest dev release (v0.70)
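Reusing the sources-line format pasted earlier, switching to the dev release would be something like this (the file name and distro codename are assumptions):

    echo 'deb http://ceph.com/debian-testing/ precise main' > /etc/apt/sources.list.d/ceph.list
    apt-get update && apt-get upgrade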
[2:00] <infernix> x2 by the way, so that's 32 physical cores
[2:00] <infernix> at 2.6ghz
[2:00] <infernix> *2.9
[2:01] <infernix> i'm also just on xfs still, can try butter
[2:02] <sagewk> xfs vs btrfs pbly doesn't matter too much here
[2:03] <infernix> ok let me upgrade and build 3.11.5 to get perf up
[2:04] <sagewk> infernix: sounds good!
[2:14] * sagelap (~sage@2600:1012:b013:90ab:2075:1090:46da:31e0) has joined #ceph
[2:14] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Read error: Operation timed out)
[2:15] <infernix> sagewk: 0.70 same read/write numbers
[2:15] * sagelap1 (~sage@2600:1012:b013:90ab:98ec:6f0c:4a63:80e1) has joined #ceph
[2:16] * sagelap (~sage@2600:1012:b013:90ab:2075:1090:46da:31e0) Quit (Read error: Connection reset by peer)
[2:19] * onizo (~onizo@wsip-98-175-245-242.sd.sd.cox.net) has joined #ceph
[2:20] <mikedawson> infernix: if you run multiple rados bench instances concurrently, do you get more aggregate throughput?
[2:20] <infernix> not really
[2:21] <infernix> 1600, sometimes 2000mb/sec writes
[2:22] * imjustmatthew (~imjustmat@pool-71-251-233-166.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[2:22] <infernix> peak to 3gb but then drops to 1gb
[2:22] <mikedawson> infernix: nhm typically uses multiples, believe he saw a bottleneck somewhere at one point. At any rate, concurrency tends to help
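A simple way to get that concurrency is to launch several rados bench writers against the same pool in parallel; a sketch reusing the pool and options that appear later in this log:

    for i in $(seq 1 4); do
        rados bench -p pbench 90 write -t 16 --no-cleanup &
    done
    wait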
[2:23] <infernix> yeah not here
[2:26] * rudolfsteiner (~federicon@181.167.96.123) has joined #ceph
[2:26] <infernix> looking for rados bench results that go beyond 3000MB/s reads or 1 to 2GB/s writes
[2:27] * mtanski (~mtanski@69.193.178.202) Quit (Read error: Operation timed out)
[2:27] <infernix> let me test 4kb too
[2:28] <infernix> segfault
[2:28] <infernix> o_O
[2:29] <mikedawson> infernix: nhm was getting ~2000MB/s with 4MB writes and ~1650MB/s for 4MB reads on xfs here http://ceph.com/performance-2/ceph-cuttlefish-vs-bobtail-part-1-introduction-and-rados-bench/
[2:29] <infernix> i see that
[2:29] <infernix> so i'm going a tad bit faster
[2:29] <infernix> but not much
[2:30] <infernix> if i run rados bench -p pbench 90 write -t 16 --no-cleanup -b 4096 i get like 17MB/sec
[2:30] <infernix> all 16 cores are totally hammered
[2:30] <infernix> avg-cpu: %user %nice %system %iowait %steal %idle
[2:30] <infernix> 53.18 0.00 30.37 4.76 0.00 11.69
[2:31] <infernix> ceph -w reports anywhere between 3000-8000 op/s
[2:32] <mikedawson> infernix: with 2x as many osds and ssd vs spinners, you should crush his results for small reads/writes if ceph were to scale efficiently (which I don't think it will in this case)
[2:32] * a_ (~a@209.12.169.218) Quit (Quit: This computer has gone to sleep)
[2:32] <infernix> pool size is 2
[2:32] <infernix> well this is what i discussed with sage
[2:33] <infernix> we see ceph rock on spindles, but to justify cost for pure ssd, it needs to get more out of them
[2:33] <mikedawson> infernix: well, that will nullify the benefit of 2x the drives
[2:33] <infernix> size 1 makes it go lots faster
[2:34] <infernix> i think i'm getting better numbers with size 2 than mark was getting with size 1 in his benchmarks
[2:34] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[2:34] <mikedawson> infernix: yep
[2:34] <infernix> interestingly enough only small io is affected
[2:35] <infernix> getting 70GByte/s from page cache
[2:35] <infernix> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
[2:35] <infernix> 20 32 352738 352706 70533.8 71284 0.001559 0.00181158
[2:35] <infernix> maybe it's printing kb here
[2:35] <infernix> doesn't make much sense
[2:35] * infernix drops caches
[2:35] <mikedawson> infernix: what are you getting with 4K reads/writes?
[2:36] <infernix> 31mb/sec writes
[2:36] <infernix> yeah, that's not 70gbyte, that's mbyte
[2:36] <infernix> what the
[2:37] <infernix> 60mbyte/s 4k seq reads
[2:38] <mikedawson> infernix: that stinks. nhm was getting close to 12mb/sec for 4K writes with 7200 rpm drives that likely max at 75 iops or so.
[2:38] <infernix> right
[2:38] <mikedawson> infernix: what chassis is this, and what is the controller / expander setup?
[2:38] * wenjianhn (~wenjianhn@114.245.46.123) has joined #ceph
[2:39] <infernix> 6x lsi 9207s
[2:39] <infernix> all pcie 3
[2:39] <infernix> all disks direct attached
[2:39] <infernix> no sas backplane
[2:39] <mikedawson> seems reasonable
[2:40] <infernix> i can get the most out of it with raid 10 or 50 so far, exporting over srp with scst
[2:40] <infernix> haven't really tried LIO srp target yet
[2:41] * The_Bishop (~bishop@i59F6C731.versanet.de) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[2:42] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[2:47] * yanzheng (~zhyan@134.134.137.71) has joined #ceph
[2:49] * rudolfsteiner (~federicon@181.167.96.123) Quit (Quit: rudolfsteiner)
[2:54] * a (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[2:55] * a is now known as Guest2701
[2:57] * nigwil_ (~chatzilla@2001:44b8:5144:7b00:dc89:763e:eb:8b67) has joined #ceph
[2:59] * nwat (~nwat@c-24-5-146-110.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:01] * sagelap1 (~sage@2600:1012:b013:90ab:98ec:6f0c:4a63:80e1) Quit (Quit: Leaving.)
[3:02] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[3:02] * nigwil (~chatzilla@2001:44b8:5144:7b00:dc89:763e:eb:8b67) Quit (Ping timeout: 480 seconds)
[3:02] * nigwil_ is now known as nigwil
[3:04] <infernix> sagewk: ok, so i have perf running
[3:04] <infernix> what kind of reporting are you looking for?
[3:04] <infernix> 71.02% [kernel] [k] copy_user_generic_string ◆
[3:04] <infernix> 19.91% ceph-osd [.] crc32_iscsi_00
[3:05] <infernix> another thread
[3:05] <infernix> 92.01% libc-2.13.so [.] __memcpy_ssse3 ◆
[3:06] * onizo (~onizo@wsip-98-175-245-242.sd.sd.cox.net) Quit (Remote host closed the connection)
[3:08] * xmir (~xmeer@cm-84.208.159.149.getinternet.no) Quit (Ping timeout: 480 seconds)
[3:08] * yy-nm (~Thunderbi@122.224.154.38) has joined #ceph
[3:15] * rudolfsteiner (~federicon@181.167.96.123) has joined #ceph
[3:15] * rudolfsteiner (~federicon@181.167.96.123) Quit ()
[3:16] * yy-nm (~Thunderbi@122.224.154.38) Quit (Quit: yy-nm)
[3:18] * nerdtron (~Administr@202.60.8.250) has joined #ceph
[3:24] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[3:26] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[3:26] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Read error: Operation timed out)
[3:26] * yehudasa (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) has joined #ceph
[3:31] <smiley> is there any reason to not be using the 3.11 kernel with kernel rbd?
[3:31] <infernix> so, with 8 OSDs on btrfs i also get 1.3GB/s writes
[3:32] <smiley> I am seeing normal write speed…but little to no read speed using kernel rbd on 3.11.
[3:32] * sarob_ (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[3:33] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[3:33] * xmir (~xmeer@cm-84.208.159.149.getinternet.no) has joined #ceph
[3:34] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[3:37] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[3:39] <infernix> it doesn't matter if i use 8 osds or 48 osds
[3:39] <infernix> rados bottlenecks on cpu at around 1.3GB/s writes
[3:39] <infernix> :|
[3:41] <infernix> not that that's really the problem
[3:41] <infernix> but 35MB/s 4kb writes is
[3:42] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[3:47] * yy-nm (~Thunderbi@122.224.154.38) has joined #ceph
[3:47] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Ping timeout: 480 seconds)
[3:47] * cofol1986 (~xwrj@120.35.11.138) Quit (Read error: Connection reset by peer)
[3:48] * sarob (~sarob@nat-dip29-wl-c.cfw-a-gci.corp.yahoo.com) has joined #ceph
[3:48] <joshd1> infernix: turning off any kind of logging (even in-memory) should help http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/10485
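The logging being referred to is tuned per subsystem in ceph.conf; a partial sketch (only a few of the many debug subsystems are shown):

    [global]
        debug ms = 0/0
        debug osd = 0/0
        debug filestore = 0/0
        debug journal = 0/0
        debug auth = 0/0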
[3:50] * sarob (~sarob@nat-dip29-wl-c.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[3:50] <infernix> joshd1: 1.4GB/s writes
[3:50] * sarob (~sarob@nat-dip29-wl-c.cfw-a-gci.corp.yahoo.com) has joined #ceph
[3:51] <infernix> let me add some more osds again
[3:51] * themgt_ (~themgt@201-223-197-181.baf.movistar.cl) has joined #ceph
[3:51] * sarob (~sarob@nat-dip29-wl-c.cfw-a-gci.corp.yahoo.com) Quit (Read error: Connection reset by peer)
[3:51] * themgt_ (~themgt@201-223-197-181.baf.movistar.cl) Quit ()
[3:51] * sarob (~sarob@nat-dip29-wl-c.cfw-a-gci.corp.yahoo.com) has joined #ceph
[3:54] <infernix> 3GB/s
[3:54] <infernix> that does help
[3:55] * themgt (~themgt@201-223-255-38.baf.movistar.cl) Quit (Ping timeout: 480 seconds)
[3:57] <infernix> about 126MB/s of random 4k writes with 13 rados bench processes, 16 threads each
[3:58] <infernix> am now cpu constrained again
[4:00] <mikedawson> infernix: are you on 0.70 with all 48 osds for the the 126MB test?
[4:02] * yanzheng (~zhyan@134.134.137.71) Quit (Remote host closed the connection)
[4:03] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[4:07] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[4:08] * angdraug (~angdraug@64-79-127-122.static.wiline.com) Quit (Quit: Leaving)
[4:10] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[4:12] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[4:19] * yy-nm (~Thunderbi@122.224.154.38) Quit (Quit: yy-nm)
[4:34] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[4:42] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[4:55] * aliguori (~anthony@74.202.210.82) Quit (Remote host closed the connection)
[5:06] * fireD_ (~fireD@93-139-148-230.adsl.net.t-com.hr) has joined #ceph
[5:06] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[5:07] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Remote host closed the connection)
[5:07] * fireD (~fireD@93-139-172-198.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:22] * nerdtron (~Administr@202.60.8.250) Quit (Quit: Leaving)
[5:26] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20130910160258])
[5:26] * XiaoNi (~XiaoNi@203.114.244.88) Quit (Quit: Leaving)
[5:29] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:35] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[5:36] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:43] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[5:58] * huangjun (~kvirc@59.173.201.121) has joined #ceph
[6:06] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[6:25] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[6:27] * sarob (~sarob@nat-dip29-wl-c.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[6:27] * onizo (~onizo@wsip-98-175-245-242.sd.sd.cox.net) has joined #ceph
[6:27] * sarob (~sarob@nat-dip29-wl-c.cfw-a-gci.corp.yahoo.com) has joined #ceph
[6:35] * sarob (~sarob@nat-dip29-wl-c.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[6:35] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[6:39] * onizo (~onizo@wsip-98-175-245-242.sd.sd.cox.net) Quit (Remote host closed the connection)
[6:43] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[6:44] * Guest2701 (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[6:46] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) Quit (Ping timeout: 480 seconds)
[6:49] * phoenix (~phoenix@vpn1.safedata.ru) Quit ()
[6:53] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[7:00] * erice (~erice@71-208-244-175.hlrn.qwest.net) has joined #ceph
[7:09] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[7:16] * newbie|2 (~kvirc@59.173.201.121) has joined #ceph
[7:16] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[7:23] * huangjun (~kvirc@59.173.201.121) Quit (Ping timeout: 480 seconds)
[7:24] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[7:27] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:35] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[7:38] * joelio (~Joel@88.198.107.214) has joined #ceph
[7:39] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[7:40] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[7:41] * erice (~erice@71-208-244-175.hlrn.qwest.net) Quit (Ping timeout: 480 seconds)
[7:43] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:48] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[7:56] * davidzlap (~Adium@76.173.16.173) Quit (Quit: Leaving.)
[8:07] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Ping timeout: 480 seconds)
[8:18] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[8:18] <newbie|2> i have a host running 3 osds and 1 mon, and i stopped all ceph-related daemons on this host, then used start zbkc-all; is the zbkc-mon always the first daemon to start?
[8:19] <mattt> what is zbkc?
[8:20] <newbie|2> an alias for osd and mon
[8:21] * RuediR (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) has joined #ceph
[8:22] <mattt> newbie|2: i'm not following what the problem is
[8:23] <mattt> newbie|2: the mons are critical to cluster operation, so why not start first?
[8:23] * RuediR1 (~Adium@2001:620:0:26:f0d1:16ff:fe35:bce9) has joined #ceph
[8:24] <newbie|2> yes, i think we should start the mon first and then start the mds/osd
[8:25] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[8:25] <mattt> i'm not sure what the osds do if they come up with no mon tho, not tried that
[8:26] <newbie|2> it will always try to connect to the mon until it reaches a timeout
[8:28] <newbie|2> in upstart, if there are two jobs (B, C) that rely on the same event A, and event A is emitted, what is the order in which B and C run?
[8:29] * RuediR (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) Quit (Ping timeout: 480 seconds)
[8:30] <xarses> if the mon and osd are on the same host, the mon will always attempt to start first
[8:31] <xarses> but since they are separate functions, one will run if the other fails
[8:32] <xarses> the osd will continue to attempt to start because there could be enough monitors online to work
[8:32] <xarses> it would otherwise not care whether a monitor was running on the local host or elsewhere
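With the stock upstart jobs (the 'zbkc' names above are a local alias), starting things by hand in that mon-then-osd order would look roughly like:

    start ceph-mon-all
    start ceph-osd-all    # or per daemon: start ceph-osd id=0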
[8:35] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:35] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[8:44] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:45] * Vjarjadian (~IceChat77@94.1.37.151) Quit (Quit: Never put off till tomorrow, what you can do the day after tomorrow)
[8:47] <newbie|2> if the osd starts but the mon doesn't, will the osd fail?
[8:48] <newbie|2> and the osd/mon will stop if they respawn 5 times in 30s,
[8:52] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:53] * thomnico (~thomnico@2a01:e35:8b41:120:40c9:d032:a17b:d4df) has joined #ceph
[9:01] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:01] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) has joined #ceph
[9:06] * jochen_ (~jochen@laevar.de) has joined #ceph
[9:06] * jochen (~jochen@laevar.de) Quit (Read error: Connection reset by peer)
[9:07] * Pedras (~Adium@c-24-130-196-123.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:15] * huangjun (~kvirc@59.173.201.121) has joined #ceph
[9:15] * sleinen (~Adium@2001:620:0:26:98e0:abc9:b6c:40f3) has joined #ceph
[9:16] * john_barbee_ (~jbarbee@c-98-220-74-174.hsd1.in.comcast.net) Quit (Remote host closed the connection)
[9:17] * odyssey4me (~odyssey4m@41-132-104-169.dsl.mweb.co.za) has joined #ceph
[9:19] * RuediR (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) has joined #ceph
[9:21] * newbie|2 (~kvirc@59.173.201.121) Quit (Ping timeout: 480 seconds)
[9:24] * JustEra (~JustEra@89.234.148.11) has joined #ceph
[9:25] * RuediR1 (~Adium@2001:620:0:26:f0d1:16ff:fe35:bce9) Quit (Ping timeout: 480 seconds)
[9:27] * sleinen (~Adium@2001:620:0:26:98e0:abc9:b6c:40f3) Quit (Quit: Leaving.)
[9:27] * sleinen (~Adium@130.59.94.165) has joined #ceph
[9:33] * sleinen1 (~Adium@130.59.94.165) has joined #ceph
[9:34] * sleinen2 (~Adium@2001:620:0:25:29:b615:cf94:4603) has joined #ceph
[9:35] * sleinen (~Adium@130.59.94.165) Quit (Ping timeout: 480 seconds)
[9:36] * gucki (~smuxi@p549F966C.dip0.t-ipconnect.de) has joined #ceph
[9:38] * i_m (~ivan.miro@deibp9eh1--blueice1n1.emea.ibm.com) has joined #ceph
[9:41] * sleinen1 (~Adium@130.59.94.165) Quit (Ping timeout: 480 seconds)
[9:48] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Read error: Operation timed out)
[9:53] * sleinen2 (~Adium@2001:620:0:25:29:b615:cf94:4603) Quit (Quit: Leaving.)
[9:53] * sleinen (~Adium@130.59.94.165) has joined #ceph
[9:54] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) has joined #ceph
[10:00] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[10:01] * sleinen (~Adium@130.59.94.165) Quit (Ping timeout: 480 seconds)
[10:03] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[10:06] * sleinen (~Adium@2001:620:0:26:5894:2430:5149:8c7d) has joined #ceph
[10:09] * thomnico (~thomnico@2a01:e35:8b41:120:40c9:d032:a17b:d4df) Quit (Quit: Ex-Chat)
[10:17] * newbie|2 (~kvirc@111.172.154.2) has joined #ceph
[10:22] * huangjun (~kvirc@59.173.201.121) Quit (Ping timeout: 480 seconds)
[10:34] * saumya (uid12057@ealing.irccloud.com) Quit (Ping timeout: 480 seconds)
[10:34] * saumya (uid12057@ealing.irccloud.com) has joined #ceph
[10:38] <soren> I have a couple of pg's that are stuck. I'm pretty new to Ceph, so I need a little bit of help.
[10:39] <soren> "ceph health detail" says:
[10:39] <soren> pg 0.4 is stuck unclean since forever, current state active+remapped, last acting [1,3]
[10:39] <soren> The other three are identical (different pg id's, of course, but everything else is the same).
[10:42] <soren> "ceph pg 0.4 query" says: http://pastebin.com/N5R4QEXE
[10:43] <soren> What can I do to clean this up?
[10:45] <soren> "ceph osd dump" shows the same 4 pg's like so:
[10:45] <soren> pg_temp 0.4 [1,3]
[10:45] <soren> What's this pg_temp?
[10:47] <newbie|2> use ceph pg dump_stuck unclean to get the pgid
[10:47] <newbie|2> if a pg is stuck unclean, it may mean some objects are unfound
[10:48] * capri (~capri@212.218.127.222) has joined #ceph
[10:49] <soren> I have the pgid's.
[10:52] * huangjun (~kvirc@111.173.83.95) has joined #ceph
[10:52] * LeaChim (~LeaChim@host86-174-76-26.range86-174.btcentralplus.com) has joined #ceph
[10:53] <huangjun> files and dirs in src/libs3/ are missing after a git clone from the ceph repository
[10:56] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: No route to host)
[10:56] * newbie|2 (~kvirc@111.172.154.2) Quit (Ping timeout: 480 seconds)
[10:57] * thomnico (~thomnico@2a01:e35:8b41:120:40c9:d032:a17b:d4df) has joined #ceph
[11:31] <soren> Fixed it. Apparently, my CRUSH map was wrong. :-/
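For anyone hitting the same thing, the usual round trip for inspecting and fixing a CRUSH map is:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt   # decompile to editable text
    # edit crush.txt, then recompile and inject it:
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new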
[11:45] * haomaiwa_ (~haomaiwan@117.79.232.201) has joined #ceph
[11:45] * haomaiwang (~haomaiwan@211.155.113.208) Quit (Read error: Connection reset by peer)
[11:52] * allsystemsarego (~allsystem@5-12-37-46.residential.rdsnet.ro) has joined #ceph
[12:06] * odyssey4me2 (~odyssey4m@165.233.71.2) has joined #ceph
[12:07] * dmouse (~dmouse@141.0.32.125) has joined #ceph
[12:08] <dmouse> Hey. I'm getting 403 Forbidden when creating a bucket via radosgw (other calls work fine, like listing buckets). Can I debug this in any other way except the radosgw.log? This is an excerpt from the radosgw.log output http://pastebin.com/pmtazCs5
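To get more detail than the default radosgw.log provides, the usual approach is to raise the rgw and messenger debug levels in the gateway's ceph.conf section (the section name below is just an example):

    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1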
[12:09] * odyssey4me (~odyssey4m@41-132-104-169.dsl.mweb.co.za) Quit (Ping timeout: 480 seconds)
[12:30] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[12:39] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[12:55] * huangjun (~kvirc@111.173.83.95) Quit (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[12:57] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) Quit (Read error: Connection reset by peer)
[12:59] * ninkotech (~duplo@static-84-242-87-186.net.upcbroadband.cz) has joined #ceph
[13:02] * morse (~morse@supercomputing.univpm.it) Quit (Remote host closed the connection)
[13:08] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[13:12] * glzhao (~glzhao@118.195.65.67) Quit (Quit: Lost terminal)
[13:14] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[13:16] * BillK (~BillK-OFT@58-7-67-236.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[13:25] * tobru (~quassel@2a02:41a:3999::94) Quit (Ping timeout: 480 seconds)
[13:28] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[13:34] * tobru (~quassel@2a02:41a:3999::94) has joined #ceph
[13:39] * kb70 (~Adium@2001:628:1:5:903f:ef19:80e2:e435) Quit (Read error: Connection reset by peer)
[13:39] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[13:43] * yguang11 (~yguang11@corp-nat.peking.corp.yahoo.com) Quit (Read error: Operation timed out)
[13:43] * kb70 (~kb70@kb.aco.net) has joined #ceph
[13:46] * shang (~ShangWu@175.41.48.77) Quit (Quit: Ex-Chat)
[13:47] * BillK (~BillK-OFT@58-7-67-236.dyn.iinet.net.au) has joined #ceph
[14:17] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:17] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[14:21] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:23] <loicd> ccourtaut: let's start with http://ceph.com/docs/next/start/quick-start-preflight/ ?
[14:24] <loicd> only with no authentication
[14:31] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[14:31] * agh (~oftc-webi@gw-to-666.outscale.net) has joined #ceph
[14:31] <agh> Hello to all,
[14:32] <agh> Are there some people who use the radosgw usage function?
[14:40] * haomaiwang (~haomaiwan@117.79.232.233) has joined #ceph
[14:44] <mattt> agh: tested it, but don't use it extensively
[14:44] * haomaiwa_ (~haomaiwan@117.79.232.201) Quit (Ping timeout: 480 seconds)
[14:45] * tchmnkyz (~jeremy@0001638b.user.oftc.net) has joined #ceph
[14:45] <tchmnkyz> ok guys need some help really quick. i have 2 pg's that are "stuck" active+remapped... not really sure how i would go about fixing that. Can someone point me in the right direction?
[14:52] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20130910160258])
[14:54] * madkiss (~madkiss@chello062178057005.20.11.vie.surfer.at) Quit (Quit: Leaving.)
[14:55] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginmedia.com) has joined #ceph
[15:06] <agh> mattt: ok. So, i've a problem. I have a user who uses his account A LOT. And when i do a query to grab his usage... it takes a lot of time (in fact, it hangs)
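One way to keep a heavy user's usage query bounded is to ask radosgw-admin for a narrower window, e.g. (the uid and dates here are made up):

    radosgw-admin usage show --uid=heavyuser --start-date=2013-10-01 --end-date=2013-10-18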
[15:07] * diegows (~diegows@190.190.11.42) has joined #ceph
[15:11] * CANNIBAL (~CANNIBAL@200-232-224-242.dsl.telesp.net.br) has joined #ceph
[15:11] <CANNIBAL> Pass this message http://www.tatuuu.com.br Thank you!
[15:11] * CANNIBAL (~CANNIBAL@200-232-224-242.dsl.telesp.net.br) Quit ()
[15:11] <loicd> ccourtaut: https://github.com/CiscoSystems/openstack-installer/blob/master/data/role_mappings.yaml
[15:12] <loicd> this is where you would say : compute-server02: compute
[15:12] <loicd> assuming compute-server02 is the hostname of your other
[15:12] <loicd> host
[15:15] <ccourtaut> ccourtaut: hum ok
[15:15] <loicd> it points to https://github.com/CiscoSystems/openstack-installer/blob/master/data/scenarios/2_role.yaml#L22
[15:15] <ccourtaut> loicd: ok
[15:16] <loicd> because https://github.com/CiscoSystems/openstack-installer/blob/master/data/config.yaml#L8
[15:16] <loicd> and the class_group https://github.com/CiscoSystems/openstack-installer/blob/master/data/scenarios/2_role.yaml#L28
[15:16] <loicd> compute
[15:17] <loicd> refers to https://github.com/CiscoSystems/openstack-installer/blob/master/data/class_groups/compute.yaml
[15:17] * fretb (~fretb@37.139.16.111) Quit (Remote host closed the connection)
[15:18] * fretb (~fretb@37.139.16.111) has joined #ceph
[15:20] <loicd> which has class nova_compute which can be found in the https://github.com/stackforge/puppet-openstack/blob/master/manifests/controller.pp which is supposed to be installed in /etc/puppet/modules
[15:21] <loicd> sorry
[15:21] <loicd> https://github.com/stackforge/puppet-openstack/blob/master/manifests/compute.pp
[15:21] <loicd> but guessing that the "compute" class is in "puppet-openstack" is not quite intuitive ;-)
[15:21] * max-100 (~max-100@217.25.5.178) Quit (Quit: Leaving)
[15:22] <loicd> ccourtaut: this is the hierarchy class
[15:22] <ccourtaut> loicd: even more when not familiar with puppet :)
[15:22] <loicd> cat > /srv/openstack-installer/data/global_hiera_params/user.yaml <<'EOF'
[15:22] <loicd> cinder_backend: rbd
[15:22] <loicd> glance_backend: rbd
[15:22] <loicd> EOF
[15:22] <loicd> is how it is bound to ceph
[15:24] <loicd> the default value is https://github.com/CiscoSystems/openstack-installer/blob/master/data/global_hiera_params/common.yaml#L7
[15:25] * thomnico (~thomnico@2a01:e35:8b41:120:40c9:d032:a17b:d4df) Quit (Ping timeout: 480 seconds)
[15:26] * agh (~oftc-webi@gw-to-666.outscale.net) Quit (Quit: Page closed)
[15:27] <loicd> and it is used in
[15:27] <loicd> https://github.com/CiscoSystems/openstack-installer/blob/master/data/class_groups/cinder_volume.yaml#L4
[15:27] <loicd> ( I should have traced things down to cinder_volume above instead of compute ... but you get the idea ;-)
[15:32] * gaveen (~gaveen@175.157.29.49) has joined #ceph
[15:35] <Grasshopper> has anyone else had the problem where an osd will go into standby and then it is unable to come out of standby saying resource busy?
[15:35] <Grasshopper> This only happens on nodes where 2 osds or more are running
[15:36] <loicd> ccourtaut: http://ceph.com/docs/next/start/quick-ceph-deploy/
[15:37] <loicd> I advise you to ssh -A on the machine and skip the key creation step
[15:37] <loicd> ccourtaut:
[15:44] * erice (~erice@71-208-244-175.hlrn.qwest.net) has joined #ceph
[15:44] * vata (~vata@2607:fad8:4:6:4da0:eb94:ac70:4302) has joined #ceph
[15:46] <loicd> ccourtaut: http://ceph.com/docs/master/rados/operations/authentication/#disabling-cephx this is what you need to disable cephx
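In ceph.conf terms, disabling cephx amounts to setting the auth options to none on every node and restarting the daemons; roughly:

    [global]
        auth cluster required = none
        auth service required = none
        auth client required = none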
[15:52] * kb70 (~kb70@kb.aco.net) Quit (Quit: Leaving.)
[15:55] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:58] * bmurphy (~bmurphy@dc3officefw-outside.rtr.liquidweb.com) Quit (Remote host closed the connection)
[16:08] <Grasshopper> loicd : have you ever experienced an OSD being unable to come out of standby?
[16:08] * dmsimard (~Adium@2607:f748:9:1666:b560:a1e6:6767:52ea) has joined #ceph
[16:11] * odyssey4me2 (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[16:14] * erice_ (~erice@50.240.86.181) has joined #ceph
[16:16] * odyssey4me (~odyssey4m@165.233.71.2) has joined #ceph
[16:18] * erice (~erice@71-208-244-175.hlrn.qwest.net) Quit (Ping timeout: 480 seconds)
[16:23] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[16:24] <loicd> Grasshopper: hope
[16:24] <loicd> sorry
[16:24] <loicd> Grasshopper: nope :-)
[16:25] <dmouse> tchmnkyz, have you changed your crush map recently?
[16:26] * sjm (~sjm@38.98.115.250) has joined #ceph
[16:32] * L2SHO (~L2SHO@office-nat.choopa.net) has joined #ceph
[16:39] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:40] <nhm> any experts on HP Proliant gear around?
[16:40] <L2SHO> question, what would I need to do to enable cephx on a cluster that it's currently disabled on?
[16:42] <alfredodeza> L2SHO: this is very well documented here: http://ceph.com/docs/master/rados/operations/authentication/#enabling-cephx
[16:47] <L2SHO> alfredodeza, thanks, but those instructions are not very clear. The first step says to generate a keyring, do I need to do this on every server? Or do I make 1 keyring and copy it to every server. Is /etc/ceph/keyring different from the other keyrings that go in /tmp/monitor-key and /var/lib/ceph/mon/ceph-a/keyring?
[16:49] * KevinPerks1 (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[16:50] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) Quit (Quit: Leaving)
[16:53] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:53] <alfredodeza> L2SHO: the CEPHX configuration guide goes into more detail about keyrings and where to generate them http://ceph.com/docs/master/rados/operations/authentication/#configuring-cephx
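Condensed, the flow in that guide is: generate the keys once, copy the same keyrings to every node, then flip the auth switches. A rough sketch, not a substitute for the guide:

    # admin key and monitor key, generated once and then copied to the other nodes
    ceph-authtool --create-keyring /etc/ceph/keyring --gen-key -n client.admin \
        --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
    ceph-authtool --create-keyring /tmp/monitor-key --gen-key -n mon. --cap mon 'allow *'
    # then in ceph.conf on all nodes:
    #   auth cluster required = cephx
    #   auth service required = cephx
    #   auth client required = cephx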
[16:54] <mikedawson> nhm: jmlowe may be a source
[16:55] * raipin (raipin@a.clients.kiwiirc.com) has joined #ceph
[16:56] * tsnider1 (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[16:58] <sjm> Azrael: ping
[16:58] * tsnider1 (~tsnider@nat-216-240-30-23.netapp.com) Quit (Remote host closed the connection)
[16:58] * tsnider1 (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[16:58] * RuediR (~Adium@2001:620:0:2d:cae0:ebff:fe18:5325) Quit (Quit: Leaving.)
[16:59] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[17:03] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) Quit (Ping timeout: 480 seconds)
[17:04] * gucki_ (~smuxi@p549F966C.dip0.t-ipconnect.de) has joined #ceph
[17:04] <L2SHO> alfredodeza, do you know if I need to re-start all the mon's and osd's simultaneously, or is there some recommended order?
[17:05] * yanzheng (~zhyan@134.134.139.72) has joined #ceph
[17:05] * JustEra (~JustEra@89.234.148.11) Quit (Quit: This computer has gone to sleep)
[17:06] * dmouse (~dmouse@141.0.32.125) Quit (Read error: Operation timed out)
[17:16] <tchmnkyz> dmouse i have been changing the weight of OSD's
[17:16] <tchmnkyz> i have some old nodes i am phasing out in favor of faster nodes
[17:22] * yehudasa (~yehudasa@2602:306:330b:1980:ea03:9aff:fe98:e8ff) Quit (Ping timeout: 480 seconds)
[17:23] * Pedras (~Adium@c-24-130-196-123.hsd1.ca.comcast.net) has joined #ceph
[17:25] * odyssey4me2 (~odyssey4m@41-132-104-169.dsl.mweb.co.za) has joined #ceph
[17:26] * diegows (~diegows@190.190.11.42) Quit (Read error: Operation timed out)
[17:26] * diegows (~diegows@190.190.11.42) has joined #ceph
[17:29] * sagelap (~sage@2600:1012:b008:9708:a978:a418:588b:6faa) has joined #ceph
[17:30] * odyssey4me (~odyssey4m@165.233.71.2) Quit (Ping timeout: 480 seconds)
[17:30] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:32] * yanzheng (~zhyan@134.134.139.72) Quit (Remote host closed the connection)
[17:33] * yanzheng (~zhyan@101.83.199.47) has joined #ceph
[17:38] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:40] * gucki_ (~smuxi@p549F966C.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[17:47] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:49] * sprachgenerator (~sprachgen@130.202.135.198) has joined #ceph
[17:49] * odyssey4me2 (~odyssey4m@41-132-104-169.dsl.mweb.co.za) Quit (Ping timeout: 480 seconds)
[17:50] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[17:52] * wenjianhn (~wenjianhn@114.245.46.123) Quit (Ping timeout: 480 seconds)
[17:52] * sleinen (~Adium@2001:620:0:26:5894:2430:5149:8c7d) Quit (Quit: Leaving.)
[17:52] * sleinen (~Adium@130.59.94.165) has joined #ceph
[17:58] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:58] * gregsfortytwo1 (~Adium@2607:f298:a:607:1137:e269:4f0d:4cba) has joined #ceph
[18:00] * sleinen (~Adium@130.59.94.165) Quit (Ping timeout: 480 seconds)
[18:01] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) has joined #ceph
[18:02] * aliguori (~anthony@74.202.210.82) has joined #ceph
[18:05] * yanzheng (~zhyan@101.83.199.47) Quit (Ping timeout: 480 seconds)
[18:08] <sagelap> josef: is it something obvious?
[18:09] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[18:09] <josef> yeah
[18:09] * sagelap (~sage@2600:1012:b008:9708:a978:a418:588b:6faa) Quit (Read error: Connection reset by peer)
[18:12] <sagewk> josef: glad to hear it :)
[18:13] <sagewk> josef: can trigger it pretty easily, happy to test a patch
[18:13] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[18:13] <josef> sagewk: i just sent it out
[18:13] <josef> please test, i'm 100% sure it will fix the problem, i'm 99% sure i didn't break anything in the meantime :)
[18:14] * glzhao (~glzhao@118.195.65.67) Quit (Quit: leaving)
[18:14] * a_ (~a@209.12.169.218) has joined #ceph
[18:14] <sagewk> building now. thanks!
[18:15] <josef> np
[18:16] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) has joined #ceph
[18:16] <infernix> sagewk: i'm still looking for some pointers as to what data i need to capture with perf on my all-ssd box. going to 0.70 didn't improve performance in any way, but i'm not sure if the intel crc32c is in use and not sure how to confirm
[18:17] <infernix> nhm, fyi i've run rados bench with pool size = 1 and can get about 36MB/sec of 4k writes over 48 osds in 1 box, with 1 rados bench running on the same box
[18:17] <sagewk> a snapshot of 'perf top' under load would be a start...
[18:18] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[18:18] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:18] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) has joined #ceph
[18:19] * nwat (~nwat@eduroam-251-62.ucsc.edu) has joined #ceph
[18:21] * raipin (raipin@a.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[18:22] <infernix> snapshot meaning just a screen dump into a pastebin?
[18:23] * tsnider1 (~tsnider@nat-216-240-30-23.netapp.com) Quit (Quit: Leaving.)
[18:24] <sagewk> infernix: yeah sure
[18:25] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[18:27] * yankcrime is now known as _nick
[18:27] * raipin (raipin@a.clients.kiwiirc.com) has joined #ceph
[18:30] * i_m (~ivan.miro@deibp9eh1--blueice1n1.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[18:32] * angdraug (~angdraug@64-79-127-122.static.wiline.com) has joined #ceph
[18:32] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[18:35] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) Quit (Remote host closed the connection)
[18:35] * G_H_I_S (G_H_I_S@d.clients.kiwiirc.com) has joined #ceph
[18:35] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) has joined #ceph
[18:36] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginmedia.com) Quit (Quit: Computer has gone to sleep.)
[18:36] * yehudasa (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) has joined #ceph
[18:37] * G_H_I_S (G_H_I_S@d.clients.kiwiirc.com) Quit ()
[18:40] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[18:40] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) has joined #ceph
[18:41] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) Quit (Remote host closed the connection)
[18:41] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) has joined #ceph
[18:41] * gregsfortytwo1 (~Adium@2607:f298:a:607:1137:e269:4f0d:4cba) Quit (Quit: Leaving.)
[18:42] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[18:43] * Pedras (~Adium@c-24-130-196-123.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:45] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[18:45] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) has joined #ceph
[18:53] * Vjarjadian (~IceChat77@94.1.37.151) has joined #ceph
[18:55] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[18:58] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20130910160258])
[18:58] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) Quit (Remote host closed the connection)
[18:59] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) has joined #ceph
[19:03] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) Quit (Read error: Connection reset by peer)
[19:03] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[19:04] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) has joined #ceph
[19:04] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[19:07] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[19:09] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:13] <infernix> sagewk: there's your perf top, is that working for you?
[19:13] * davidzlap (~Adium@38.122.20.226) has joined #ceph
[19:18] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[19:19] * jbd_ (~jbd_@2001:41d0:52:a00::77) has left #ceph
[19:22] * xarses1 (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[19:22] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Read error: Connection reset by peer)
[19:25] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[19:26] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[19:26] <infernix> ah damn
[19:26] <infernix> sagewk: n/m. power outage.
[19:27] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[19:28] * xarses1 (~andreww@64-79-127-122.static.wiline.com) Quit (Quit: Leaving.)
[19:29] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[19:34] * sarob_ (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[19:35] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[19:35] * Gamekiller77 (~oftc-webi@128-107-239-235.cisco.com) has joined #ceph
[19:37] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[19:39] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[19:40] * davidzlap (~Adium@38.122.20.226) Quit (Read error: Connection reset by peer)
[19:40] * themgt (~themgt@201-223-223-113.baf.movistar.cl) has joined #ceph
[19:42] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Ping timeout: 480 seconds)
[19:43] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[19:45] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit ()
[19:45] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[19:45] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit ()
[19:46] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[19:48] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[19:49] * papamoose1 (~kauffman@hester.cs.uchicago.edu) Quit (Remote host closed the connection)
[19:52] * davidzlap (~Adium@38.122.20.226) has joined #ceph
[19:54] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) Quit (Remote host closed the connection)
[19:55] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) has joined #ceph
[19:56] * papamoose1 (~kauffman@hester.cs.uchicago.edu) has joined #ceph
[19:56] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Remote host closed the connection)
[19:59] <n1md4> can anyone help me with an rbd authentication problem? Just have a read of this, and let me know what problem I might be facing? http://pastebin.com/LVVNqzes
[20:00] * gaveen (~gaveen@175.157.29.49) Quit (Quit: Leaving)
[20:00] <n1md4> please? :)
[20:01] <Gamekiller77> n1md4: i'm not big into the xen setup but it looks like you are missing the keys
[20:01] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Quit: Leaving.)
[20:01] <Gamekiller77> i just got my KVM setup working and it had to do with file permissions
[20:02] * onizo (~onizo@wsip-70-166-5-159.sd.sd.cox.net) Quit (Read error: Operation timed out)
[20:02] <Gamekiller77> am i reading the paste correctly that you are using Xen
[20:04] * sarob_ (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[20:05] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[20:09] <mikedawson> loicd: could you point me to a commit / explanation of this snippet from the 0.71 release note? "fix exponential backoff of slow request warnings (Loic Dachary)"
[20:13] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[20:16] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Quit: Leaving.)
[20:16] * chris_lu (~ccc2@bolin.Lib.lehigh.EDU) has joined #ceph
[20:16] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[20:19] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[20:27] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[20:28] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:30] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:36] <xarses> n1md4 - line 20 requires /etc/ceph/ceph.client.admin.keyring
[20:37] <xarses> because you didn't specify -k
[20:39] <xarses> n1md4: line 1-3 looks like you didn't insert the secret correctly into libvirt
[20:42] <loicd> mikedawson: that would be https://github.com/ceph/ceph/pull/630
[20:43] * Pedras (~Adium@216.207.42.132) has joined #ceph
[20:43] <mikedawson> loicd: thx
[20:45] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[20:53] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[20:54] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[21:00] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) Quit (Ping timeout: 480 seconds)
[21:07] * papamoose1 (~kauffman@hester.cs.uchicago.edu) Quit (Remote host closed the connection)
[21:14] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[21:15] * papamoose (~kauffman@hester.cs.uchicago.edu) has joined #ceph
[21:15] * sarob (~sarob@adsl-99-150-210-177.dsl.pltn13.sbcglobal.net) has joined #ceph
[21:19] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) has joined #ceph
[21:22] * aliguori (~anthony@74.202.210.82) Quit (Ping timeout: 480 seconds)
[21:24] <mtanski> Is there a guide for how to set up an HA MDS service? I saw the configuration settings for standby but there really isn't a clear guide on how to do it.
[21:24] * sarob (~sarob@adsl-99-150-210-177.dsl.pltn13.sbcglobal.net) Quit (Read error: Operation timed out)
[21:25] <mtanski> unless I'm doing a poor job of looking for HA mds
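For what it's worth, the standby settings live in the mds sections of ceph.conf; a minimal sketch (the daemon names are placeholders) of one active MDS with a standby-replay follower:

    [mds.a]
        # active MDS
    [mds.b]
        mds standby replay = true
        mds standby for name = a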
[21:33] <nhm> infernix: hey, sorry, I was in a meeting earlier
[21:33] <nhm> infernix: might want to try multiple rados bench commands at once, lots of concurrency, and disable all in-memory logging in ceph.
[21:34] <nhm> infernix: "perf top" is useful but limited. You can get a call graph by running perf record -g -a and then perf report (or perf report > foo for an ascii dump).
[21:35] <nhm> infernix: unless you are running a really new kernel and also have perf compiled with unwind support, there will likely be a bunch of missing symbols.
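A sketch of that capture/report cycle, with an arbitrary 30-second window:

    # system-wide capture with call graphs
    perf record -g -a -- sleep 30
    # interactive browser
    perf report
    # plain-text dump
    perf report --stdio > foo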
[21:35] <n1md4> Gamekiller77: thanks. what permissions should be set? ..yes, using xen.
[21:36] <n1md4> xarses: I've not inserted any secret, what's involved?
[21:38] * sjm (~sjm@38.98.115.250) has left #ceph
[21:40] <xarses> n1md4: sorry, i found some reference about Xen; apparently no secret is involved, unlike OpenStack
[21:40] <xarses> i was hoping they were similar since they both use libvirt
[21:42] <infernix> nhm: i need to compile it without unwind?
[21:42] <infernix> nhm: all logging is off; multiple rados benches doesn't greatly improve things
[21:43] <n1md4> xarses: hmm, thanks anyway.
[21:44] <xarses> has any one seen upstart issues with cinder-volume in grizzly using ceph?
[21:45] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[21:45] <n1md4> xarses: not replying to your question, but if anyone else asks, this page looks like it has the answer ... http://libvirt.org/formatsecret.html#CephUsageType
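A sketch of the dance that page describes, assuming a Ceph user named client.libvirt; virsh prints the secret's UUID after secret-define:

    cat > ceph-secret.xml <<'EOF'
    <secret ephemeral='no' private='no'>
      <usage type='ceph'>
        <name>client.libvirt secret</name>
      </usage>
    </secret>
    EOF
    virsh secret-define --file ceph-secret.xml
    # plug the UUID printed above into the next command and into the domain's disk XML
    virsh secret-set-value --secret <uuid> --base64 "$(ceph auth get-key client.libvirt)"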
[21:47] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) has joined #ceph
[21:47] * erice_ (~erice@50.240.86.181) Quit (Ping timeout: 480 seconds)
[21:49] * erice (~erice@50.240.86.181) has joined #ceph
[21:50] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[21:57] * sarob_ (~sarob@ip-64-134-228-129.public.wayport.net) has joined #ceph
[21:57] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) Quit (Read error: Connection reset by peer)
[21:57] <mikedawson> xarses: I have seen some startup issues in that environment. What issue are you seeing? ours were related to osds
[21:58] <xarses> mikedawson: we are seeing that upstart doesn't start the cinder-volume service; running sudo -u cinder CEPH_ENV='--id volumes' cinder-volume by hand runs fine as expected, but upstart fails to start it or produce any usable logs
[22:00] <mikedawson> xarses: we don't have that issue. Is cinder-volume on a "controller" type machine or co-located with ceph mon/osds?
[22:03] * albionandrew (~albionand@65-128-12-211.hlrn.qwest.net) has joined #ceph
[22:04] * sagelap (~sage@38.122.20.226) has joined #ceph
[22:04] <albionandrew> If I have one node with ceph on should I see another node in the odd tree?
[22:04] <infernix> perf record: Captured and wrote 5461.522 MB perf.data (~238617550 samples)
[22:04] <infernix> o_O
[22:04] * aliguori (~anthony@74.202.210.82) has joined #ceph
[22:05] <xarses> mikedawson, it can be with the OSD, it will be with the Mon and controller
[22:06] * allsystemsarego (~allsystem@5-12-37-46.residential.rdsnet.ro) Quit (Quit: Leaving)
[22:08] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[22:11] <infernix> nhm, sagewk: so i have a perf.data report ready: http://dx.infernix.net/ceph.perf.data.xz
[22:12] <infernix> this is with one rados bench -p pbench 900 write -t 16 -b 4096 which gives about 30Mbyte/s
[22:12] <infernix> if i run 13 of those, i get about 3.5MB/s each
[22:15] <infernix> ascii report: http://dx.infernix.net/ceph.perf.log.xz
[22:18] <infernix> 48 osds all ssd, 2x intel e5-2690 (16 cores at 2.9ghz), 1 box, size 1, rados bench on the box
[22:20] <infernix> and md raid 10 does about 1 million random 4k read iops, 0.5 million random write (directio)
[22:31] <nhm> infernix: disabling in-memory debugging may help
[22:31] <nhm> infernix: not enough to get you 1 million IOPS, but at least a little. :)
[22:33] <nhm> infernix: btw, how's CPU usage during that time period? Are the cores all maxed out?
[22:34] <mikedawson> nhm: As we get closer to automatic tiering in Icehouse, are you going to get gear to look for this bottleneck?
[22:35] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[22:36] <infernix> nhm: it's off
[22:36] <infernix> all cores maxed
[22:36] <mikedawson> infernix: have any more nodes to generate load?
[22:36] <nhm> mikedawson: yes, but it's one of lots of priorities. :/
[22:36] <nhm> infernix: that's not terribly surprising.
[22:36] <infernix> i have a few but i don't see how that will help much
[22:36] <nhm> infernix: I can max my box with OSDs on 8 SSDs.
[22:37] <nhm> when doing small IO
[22:37] <nhm> infernix: so yeah, there's lots of missing symbols in this callgraph like expected
[22:37] <infernix> if i do 4MB i can't go past about 1.2GB/s writes, 3GB/s reads
[22:37] <mikedawson> nhm or infernix: is the load coming from rados bench (the client) or from osds?
[22:37] <nhm> infernix: I need to sit down and get a kernel and perf built with libunwind support.
[22:37] <infernix> nhm: not sure what else to install, i installed just about all the dbg packages
[22:37] <infernix> and this is 3.11.5 with perf+unwind
[22:37] <nhm> mikedawson: probably a bit of both, but the OSDs tend to consume a lot during small IO.
[22:38] <nhm> infernix: really? :/
[22:38] <nhm> infernix: hrm, maybe there is a switch
[22:38] <infernix> ldd shows /usr/lib/libunwind-x86_64.so.7
[22:38] * alexxy[home] (~alexxy@2001:470:1f14:106::2) Quit (Remote host closed the connection)
[22:38] <infernix> the only thing i didn't build is gtk
[22:38] <nhm> infernix: might have to tell it to use dwarf on the command line
[22:39] <nhm> infernix: trying to find the release notes
[22:39] <mikedawson> nhm: what throughput are you getting while saturating cpu with 8 ssds and small random writes?
[22:39] <nhm> try perf record -g dwarf -a or something
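Hedged, since the syntax changed between perf versions: older perf takes the unwind method as an argument to -g, newer perf wants --call-graph:

    # older perf
    perf record -g dwarf -a -- sleep 5
    # newer perf
    perf record --call-graph dwarf -a -- sleep 5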
[22:39] <nhm> mikedawson: don't remember, wasn't terribly impressive
[22:40] <nhm> mikedawson: I didn't get very far before I had to go do other stuff
[22:40] <infernix> raid10 on the box with 128kb block does 23GB/s read, 10GB/s write
[22:40] <nhm> mikedawson: highest report I've heard was something like 85K IOPs from 1 node.
[22:40] <infernix> i mean it screams, and i would love to get a bit closer to that with ceph somehow
[22:40] <nhm> mikedawson: don't know if those were reads or writes, but it was from 8 SSDs.
[22:41] <infernix> perf record: Captured and wrote 15375.120 MB
[22:41] <infernix> dwarf adds a lot
[22:41] <nhm> yikes!
[22:41] <mikedawson> nhm: I have 8 new boxes arriving for teuthology on Monday. I'll play around a bit to see what I can get with plenty of cpu available (1 ssd per node).
[22:41] <nhm> infernix: btw, I haven't used the dwarf support yet, so this is blind leading the blind. :)
[22:42] <nhm> mikedawson: fwiw, we've heard recently that 1 OSD seems to top out at like 8 cores for some reason, even with lots of threads.
[22:42] <nhm> might be lock contention or something.
[22:44] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[22:44] <nhm> whoa, perf mem report
[22:44] <nhm> neat
[22:45] * erice (~erice@50.240.86.181) Quit (Ping timeout: 480 seconds)
[22:45] * erice (~erice@71-208-244-175.hlrn.qwest.net) has joined #ceph
[22:46] <nhm> infernix: are you seeing anything interesting in that dwarf report?
[22:46] <infernix> rebuilding kernel with symbols
[22:46] <infernix> perf report -s dso
[22:46] <infernix> makes most sense to me
[22:49] <nhm> infernix: the other thing to do is probably dump_historic_ops after the benchmark is run
[22:50] <nhm> infernix: on all of the OSDs
[22:50] <nhm> you can start to get a feel for where latency is in the OSD
[22:57] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[23:02] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit ()
[23:03] <infernix> nhm: is that working for you?
[23:04] <sagewk> josef: that fix seems good!
[23:04] <sagewk> will throw the full qa suite against it tonight
[23:05] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[23:06] * yehudasa (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) Quit (Ping timeout: 480 seconds)
[23:07] * sleinen1 (~Adium@2001:620:0:25:8cdd:20fa:553d:a0ea) has joined #ceph
[23:08] <nhm> infernix: sorry, is what working for me?
[23:08] <nhm> oops, nm
[23:09] * yehudasa (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) has joined #ceph
[23:09] <nhm> neat
[23:09] * albionandrew (~albionand@65-128-12-211.hlrn.qwest.net) Quit (Quit: albionandrew)
[23:10] <nhm> infernix: that's awesome
[23:11] <nhm> infernix: I suspect single rados bench isn't going to get much more than like 1.3GB/s anyway
[23:12] <nhm> infernix: after a run, something like: find /var/run/ceph/*.asok -maxdepth 1 -exec sudo ceph --admin-daemon {} dump_historic_ops \; > foo
[23:13] <nhm> infernix: hrm... I wonder hwy it was unhappy
[23:13] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:14] <nhm> oh, mon asok
[23:14] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:14] <nhm> infernix: that command gives you the 10 slowest ops over the last 10 minutes on every OSD
[23:15] <nhm> infernix: and a trace of latencies as it went through the OSD
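A variant of that one-liner restricted to OSD sockets, so the mon asok (which has no dump_historic_ops) is skipped; default socket paths assumed:

    find /var/run/ceph -maxdepth 1 -name 'ceph-osd.*.asok' \
        -exec ceph --admin-daemon {} dump_historic_ops \; > foo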
[23:18] <nhm> infernix: look at that, like half a second latencies on that OSD
[23:19] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:20] <infernix> nhm: which osd is it?
[23:20] <nhm> infernix: problem with that 1-liner is that it didn't record it.
[23:21] <nhm> probably in order though
[23:21] <nhm> well, that order. ;)
[23:22] <nhm> infernix: how many controllers?
[23:23] <infernix> 6
[23:23] <nhm> infernix: on my todo list is to do some more work on a tool Sam wrote to go through optracker logs and make histograms
[23:23] <nhm> infernix: so we can track this kind of thing better
[23:23] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[23:24] <infernix> all osds perform as expected
[23:24] <infernix> 75k read io
[23:24] <infernix> 4k random, directio
[23:24] <infernix> perf report with dwarf on is slow
[23:25] <nhm> infernix: yeah, I don't think it's the drives necessarily.
[23:25] <nhm> infernix: but clearly we were doing something there that made something unhappy since the op commit took 0.5s
[23:26] <infernix> could be btrfs
[23:26] <infernix> i can switch to xfs or ext4 again
[23:26] <infernix> it's mostly the same on xfs
[23:26] <infernix> this is a 5 second dwarf record during a bench with 16 threads 4k io
[23:26] <nhm> tried turning off in-memory logging yet?
[23:26] <infernix> all off
[23:26] <nhm> ok
[23:27] <infernix> as in, debug just about * = 0/0
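Roughly what that looks like in ceph.conf; which subsystems are worth zeroing is a judgment call:

    [global]
        # first number is the log level, second the in-memory gather level
        debug ms = 0/0
        debug osd = 0/0
        debug filestore = 0/0
        debug journal = 0/0
        debug auth = 0/0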
[23:29] <infernix> i don't know about this dwarf mode, it is really really slow
[23:30] <nhm> infernix: might need to filter it or something
[23:31] * sleinen1 (~Adium@2001:620:0:25:8cdd:20fa:553d:a0ea) Quit (Quit: Leaving.)
[23:31] <nhm> how big is the perf data for 5s?
[23:32] * erice_ (~erice@50.240.86.181) has joined #ceph
[23:33] <infernix> 1gb
[23:33] * KindOne (~KindOne@0001a7db.user.oftc.net) has joined #ceph
[23:34] <nhm> sheesh
[23:35] <nhm> infernix: maybe just try attaching it to one of the ceph-osd processes
[23:36] <nhm> perf record -g dwarf -p pid
[23:37] * erice (~erice@71-208-244-175.hlrn.qwest.net) Quit (Ping timeout: 480 seconds)
[23:37] <nhm> ooh, I think it did something
[23:37] <nhm> no -a
[23:40] <infernix> i broke it, sec
[23:40] <nhm> :D
[23:40] * onizo (~onizo@wsip-98-175-245-242.sd.sd.cox.net) has joined #ceph
[23:41] <infernix> even a 35mb takes forever
[23:41] <nhm> hrm
[23:41] <nhm> just normal perf report?
[23:42] <infernix> with a dwarf capture
[23:42] <infernix> i think it's stuck in a loop or something
[23:42] <nhm> yeah, but just "perf report" afterward?
[23:42] <infernix> yes
[23:43] <nhm> bah
[23:43] <nhm> I wonder if perf stat would tell us anything useful
[23:44] <nhm> like lots of context switches
[23:47] * sarob_ (~sarob@ip-64-134-228-129.public.wayport.net) Quit (Remote host closed the connection)
[23:48] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) has joined #ceph
[23:49] <nhm> try perf stat -p <pid>
[23:50] <nhm> context switches?
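A sketch of grabbing those counters from a single OSD for a fixed window (the pgrep is an assumption about how the daemons show up on the host):

    # context-switches and cpu-migrations are in perf stat's default counter set
    perf stat -p $(pgrep -o ceph-osd) -- sleep 10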
[23:53] <infernix> nhm: i really have no idea what I'm looking at
[23:53] <nhm> infernix: are we under load right now?
[23:53] <infernix> rados bench running
[23:54] <infernix> it's cool stuff but what are we looking for?
[23:54] <nhm> infernix: we want that dwarf stuff. ;)
[23:54] <infernix> yeah and that just hangs
[23:54] * sprachgenerator (~sprachgen@130.202.135.198) Quit (Quit: sprachgenerator)
[23:55] <nhm> yeah. :/
[23:55] <nhm> yikes, it's going kind of crazy
[23:56] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) Quit (Ping timeout: 480 seconds)
[23:56] <infernix> terminal emulator issue
[23:57] <nhm> didn't get anything did you?
[23:57] <infernix> no samples found. f5 the page though
[23:58] <infernix> 1 second capture, 13mb, perf report just sits there eating 1 core
[23:58] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[23:59] <nhm> grr
[23:59] <infernix> oh look, sub-1 second and we have something

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.