#ceph IRC Log

IRC Log for 2013-03-13

Timestamps are in GMT/BST.

[0:01] * alram (~alram@38.122.20.226) Quit (Read error: Connection reset by peer)
[0:07] * Cube1 (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[0:08] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) Quit (Remote host closed the connection)
[0:13] <mauilion> dmick: I found my problem.
[0:13] <mauilion> it's a routing problem
[0:13] <mauilion> nothing to do with the mds config
[0:13] <dmick> cool
[0:13] * barryo1 (~barry@host86-128-180-76.range86-128.btcentralplus.com) has left #ceph
[0:14] <mauilion> multi homed servers really make more work than they solve :)
[0:14] * fghaas (~florian@5.135.71.96) has left #ceph
[0:17] * sleinen (~Adium@2001:620:0:25:459c:91da:6bb2:fb69) Quit (Quit: Leaving.)
[0:18] * ScOut3R (~ScOut3R@c83-249-233-227.bredband.comhem.se) Quit (Remote host closed the connection)
[0:18] <dmick> mauilion: that's why they pay you the big bucks :)
[0:19] <mauilion> HAH
[0:20] * gucki (~smuxi@HSI-KBW-095-208-162-072.hsi5.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[0:41] <nwat> Howdy.. I'm looking at the librados ObjectOperation API, and it seems as though read and write ops cannot be used together in a single compound operation. I'm wondering if this is a fundamental limitation at the OSD level, or just the state of the API, or .. ?
[0:42] * gucki (~smuxi@46.115.52.89) has joined #ceph
[0:45] * ScOut3R (~ScOut3R@c83-249-233-227.bredband.comhem.se) has joined #ceph
[0:49] <gregaf> nwat: I think there might be some softening of this coming up that joshd or yehuda_hm could talk about...
[0:49] <gregaf> but in general it's a fundamental limitation forced on us by operation replay
[0:49] <gregaf> if you try and replay an already-performed write operation you don't get data back, and making it give back data would be really hard
[0:50] <gregaf> since all users need to handle that and are unlikely to do so via librados, it just doesn't give you the option
[0:50] <nwat> gregaf: what do you mean by a write operation giving data back
[0:51] <gregaf> if a compound operation contains both read and write operations and it gets replayed due to some kind of failure
[0:51] <gregaf> then the OSD will get it and say "I already applied that operation" and send back success without doing any of the reads
[0:52] <gregaf> that's not really a choice — it's possible-to-probable that the writes the compound op performed changed the data which was being read
[0:52] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[0:52] <gregaf> so, you can't do both reads and writes
[0:52] <gregaf> make sense?
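For illustration only: the compound ObjectOperation interface discussed here is part of the C++ librados API, but the point that reads and writes are issued as separate operations shows up even in the simple python-rados bindings. A minimal sketch, with pool and object names made up for the example:

    #!/usr/bin/env python
    # Sketch: a write and a read against the same object are two separate
    # librados operations; no combined read+write op is exposed here.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('data')            # 'data' is just an example pool
    ioctx.write_full('greeting', b'hello world')  # write op
    print(ioctx.read('greeting'))                 # separate read op
    print('version after read: %d' % ioctx.get_last_version())
    ioctx.close()
    cluster.shutdown()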
[0:54] * Cube (~Cube@66-87-66-85.pools.spcsdns.net) has joined #ceph
[0:55] <nwat> yeh i think it seems to make sense that there isn't a good general solution. so, one form of softening of the restriction might be for very specific situations, such as entirely non-overlapping reads/writes?
[0:56] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[0:56] <nwat> It seems like someone could also impose a specific order on the reads/writes.
[0:56] <gregaf> I'm not a big fan of softening it but I think I heard them saying they'd come up with a good way to do it
[0:56] * gucki (~smuxi@46.115.52.89) Quit (Ping timeout: 480 seconds)
[0:57] <gregaf> so you'll have to talk to them about that
[0:57] <nwat> Cool, thanks
[0:57] <gregaf> detecting the non-overlaps etc isn't really feasible and would make the API way too complicated to communicate; we're not going to do that
[0:57] <gregaf> the most common read+write case is covered by the versioning asserts that I believe are actually part of the librados API ;)
[0:58] <gregaf> what were you wanting to do?
[0:59] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[1:00] <nwat> I wanted to build a new compound object operation with read/writes/comparisons, but impose within the OSD some specific rules about when to abort/commit the transaction
[1:00] <gregaf> sounds like something that's perfect for class objects ;)
[1:00] <nwat> yup, that was the backup plan
[1:00] <gregaf> s/class objects/object classes
[1:01] <nwat> all that client code boilerplate for cls_ stuff is kinda a pain sometimes
[1:01] * gucki (~smuxi@46.115.52.89) has joined #ceph
[1:03] * ScOut3R (~ScOut3R@c83-249-233-227.bredband.comhem.se) Quit (Remote host closed the connection)
[1:07] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[1:14] * gucki_ (~smuxi@HSI-KBW-149-172-49-22.hsi13.kabel-badenwuerttemberg.de) has joined #ceph
[1:14] * gucki (~smuxi@46.115.52.89) Quit (Ping timeout: 480 seconds)
[1:25] * jjgalvez1 (~jjgalvez@12.248.40.138) has joined #ceph
[1:25] * jjgalvez1 (~jjgalvez@12.248.40.138) Quit ()
[1:27] * noob2 (~cjh@173.252.71.4) Quit (Quit: Leaving.)
[1:30] * jjgalvez (~jjgalvez@12.248.40.138) Quit (Ping timeout: 480 seconds)
[1:32] * nwat (~Adium@eduroam-242-170.ucsc.edu) Quit (Quit: Leaving.)
[1:34] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[1:35] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit (Quit: Leaving.)
[1:35] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:35] * SteveB (~steveb@pool-72-66-65-227.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[1:38] * yanzheng (~zhyan@jfdmzpr03-ext.jf.intel.com) Quit (Remote host closed the connection)
[1:38] * Cube (~Cube@66-87-66-85.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[1:39] * jlogan (~Thunderbi@72.5.59.176) Quit (Read error: Connection reset by peer)
[1:39] * jlogan (~Thunderbi@2600:c00:3010:1:f195:abb6:9669:db22) has joined #ceph
[1:41] * Cube (~Cube@66-87-65-31.pools.spcsdns.net) has joined #ceph
[1:42] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[1:49] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[1:50] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:55] * esammy (~esamuels@host-2-102-70-24.as13285.net) Quit (Quit: esammy)
[2:02] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[2:06] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[2:06] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[2:12] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:25] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) has joined #ceph
[2:31] * Cube1 (~Cube@66-87-65-31.pools.spcsdns.net) has joined #ceph
[2:31] * Cube (~Cube@66-87-65-31.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[2:33] * LeaChim (~LeaChim@02da1ea0.bb.sky.com) Quit (Ping timeout: 480 seconds)
[2:35] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[2:38] * brambles (lechuck@s0.barwen.ch) has joined #ceph
[2:41] * gucki_ (~smuxi@HSI-KBW-149-172-49-22.hsi13.kabel-badenwuerttemberg.de) Quit (Read error: Operation timed out)
[2:42] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[2:45] * gucki (~smuxi@46.115.52.89) has joined #ceph
[2:54] * gucki (~smuxi@46.115.52.89) Quit (Ping timeout: 480 seconds)
[2:54] * gucki (~smuxi@HSI-KBW-149-172-49-22.hsi13.kabel-badenwuerttemberg.de) has joined #ceph
[2:55] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[2:57] * lightspeed (~lightspee@fw-carp-wan.ext.lspeed.org) has joined #ceph
[3:05] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[3:09] <infernix> would it hurt if i break off an rbd export early, like rbd export $image - | dd of=mbr bs=512 count=1
[3:09] <infernix> i can't think of a nicer way
[3:11] <dmick> infernix: shouldn't. You can certainly easily write a bit of python to read 512 bytes if you wish, but that's the easiest CLI-only way I can think of too
[3:12] <infernix> yeah, and i will, later :)
[3:12] <dmick> it's practically a oneliner :)
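The "bit of python" dmick mentions might look roughly like this with the python-rbd bindings (pool name, image name, and output file are placeholders; this is a sketch, not a tested one-liner):

    #!/usr/bin/env python
    # Read only the first 512 bytes (the MBR) of an rbd image, without
    # exporting the whole image the way the dd pipeline above does.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')        # pool holding the image
    image = rbd.Image(ioctx, 'myimage')      # image name is a placeholder
    with open('mbr', 'wb') as out:
        out.write(image.read(0, 512))        # offset 0, length 512
    image.close()
    ioctx.close()
    cluster.shutdown()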
[3:16] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[3:21] * Frank9999 (~frank@kantoor.transip.nl) Quit ()
[3:22] <infernix> first things first
[3:22] * infernix puts the finishing touches on a fully fledged DR backup system for his cloud
[3:22] <infernix> including a chkntfs that runs in linux
[3:23] * infernix runs
[3:25] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[3:30] <jmlowe> infernix: you have the insanely fast ib based cluster?
[3:30] <infernix> yes
[3:30] <infernix> well not as insanely fast yet as i want it to be
[3:30] <infernix> but fast enough for the moment
[3:31] <infernix> rsockets and more nodes will do it good
[3:36] <jmlowe> it's too late tonight, I'd love to have a rundown of your setup
[3:37] * gucki (~smuxi@HSI-KBW-149-172-49-22.hsi13.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[3:38] <infernix> i'm always here, ping me sometime :)
[3:39] * gucki (~smuxi@HSI-KBW-109-192-063-077.hsi6.kabel-badenwuerttemberg.de) has joined #ceph
[3:42] * KindOne (KindOne@h147.26.131.174.dynamic.ip.windstream.net) Quit (Quit: Quit Message!? Never heard of such a thing.)
[3:45] * jlogan (~Thunderbi@2600:c00:3010:1:f195:abb6:9669:db22) Quit (Ping timeout: 480 seconds)
[3:48] * livekcats (~stackevil@178.112.21.157.wireless.dyn.drei.com) Quit (Quit: This Mac has gone to sleep!)
[3:50] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[3:51] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Quit: ChatZilla 0.9.90 [Firefox 19.0.2/20130307023931])
[3:55] * KindOne (KindOne@h147.26.131.174.dynamic.ip.windstream.net) has joined #ceph
[3:58] * nwl (~levine@atticus.yoyo.org) Quit (Ping timeout: 480 seconds)
[4:08] * nwl (~levine@atticus.yoyo.org) has joined #ceph
[4:09] * Cube1 (~Cube@66-87-65-31.pools.spcsdns.net) Quit (Quit: Leaving.)
[4:12] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[4:16] * gucki (~smuxi@HSI-KBW-109-192-063-077.hsi6.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[4:19] * gucki (~smuxi@HSI-KBW-109-192-063-077.hsi6.kabel-badenwuerttemberg.de) has joined #ceph
[4:27] * gucki (~smuxi@HSI-KBW-109-192-063-077.hsi6.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[4:29] * gucki (~smuxi@HSI-KBW-109-192-063-077.hsi6.kabel-badenwuerttemberg.de) has joined #ceph
[4:29] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Quit: Leaving.)
[4:29] * dosa (dosa@140.114.91.32) has joined #ceph
[4:37] <dosa> Hello~
[4:38] <dosa> I have a question about Ceph File System
[4:38] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[4:38] * ChanServ sets mode +o scuttlemonkey
[4:39] <dosa> I use Ceph File System to save VM images
[4:40] * gucki (~smuxi@HSI-KBW-109-192-063-077.hsi6.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[4:40] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Read error: Connection reset by peer)
[4:40] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[4:41] <dosa> But there is one folder where VM images are running, and I can't use the command 'ls' in that folder.
[4:42] <dmick> what does "can't use" mean?
[4:43] <dosa> it's block
[4:43] <dosa> it's lock
[4:43] <dmick> the ls command hangs?
[4:45] <dosa> only ls locks up; putting files or using files still works
[4:46] <dmick> so tell me if this is correct: you're using cephfs to store VM images; the VMs are running OK; but if you try to use ls on the filesystem, ls hangs and doesn't show any output. Is that right?
[4:46] <dosa> Yes,
[4:47] <dosa> and the terminal is dead
[4:47] <dosa> I have to open another terminal to kill the ls process
[4:49] <dmick> you mean "you can't interrupt the ls process with ^C"?
[4:49] <dosa> I can't
[4:50] <dosa> Do you need to see my config file?
[4:51] <iggy> one question (won't help you solve your problem, but...), why are you putting VM images in cephfs?
[4:51] <dmick> no. are you mounting the filesystem with the kernel, or with ceph-fuse?
[4:53] <dosa> some machines mount with ceph-fuse and some machines use 'mount -t ceph'
[4:54] <dmick> and this happens on all of them?
[4:54] <dosa> Yes
[4:54] <iggy> what version of ceph?
[4:54] <dosa> 0.56.3
[4:55] <dmick> does ls hang when you first mount the filesystem, before anything else is using it?
[4:55] <dosa> No
[4:56] <dosa> it happened last week
[4:56] <dmick> well obviously this shouldn't happen
[4:56] <dosa> But only one node can use 'ls'
[4:56] <dmick> ?!
[4:56] <dosa> I have 20 nodes.
[4:57] <dosa> But only one node can use 'ls' and it does not lock
[4:58] <dmick> you're not using the kernel to mount cephfs on the same nodes that are running the cluster, are you?
[4:59] <dosa> Yes
[4:59] <iggy> also... I'm pretty sure cephfs still isn't considered production ready
[4:59] <dmick> you can't do that. deadlock will result.
[4:59] <dosa> Sorry, I got to go.
[5:01] <dmick> cheers
[5:01] * esammy (~esamuels@host-2-102-70-24.as13285.net) has joined #ceph
[5:18] * gucki (~smuxi@HSI-KBW-078-042-029-115.hsi3.kabel-badenwuerttemberg.de) has joined #ceph
[5:19] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[5:43] <dosa> Hi, I'm back.
[5:45] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[5:46] <dosa> I only use the kernel to mount cephfs in one node
[5:47] <dosa> I only use the kernel to mount cephfs in some nodes.
[5:47] <dosa> only the VMs use ceph-fuse
[5:53] * dosa (dosa@140.114.91.32) has left #ceph
[5:54] * dosa (dosa@140.114.91.32) has joined #ceph
[5:54] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[5:56] * jlogan (~Thunderbi@72.5.59.176) Quit (Ping timeout: 480 seconds)
[5:58] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) Quit ()
[5:59] * jlogan (~Thunderbi@2600:c00:3010:1:488d:5414:12e0:8e1c) has joined #ceph
[6:03] * mauilion (~dcooley@c-71-198-86-127.hsd1.ca.comcast.net) Quit (Quit: leaving)
[6:23] <iggy> you'd be better off doing it the other way around
[6:24] <iggy> but you're really not supposed to run kernel clients on the same hosts as OSDs, MDSs, or MONs
[7:31] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[7:31] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit ()
[7:34] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[7:35] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit ()
[8:05] * jlogan (~Thunderbi@2600:c00:3010:1:488d:5414:12e0:8e1c) Quit (Ping timeout: 480 seconds)
[8:10] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[8:15] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[8:15] * lmh_ (~lmh@203.114.244.88) has joined #ceph
[8:22] * sleinen (~Adium@130.59.94.175) has joined #ceph
[8:24] * sleinen1 (~Adium@2001:620:0:26:1f0:986:4c7e:4689) has joined #ceph
[8:30] * sleinen (~Adium@130.59.94.175) Quit (Ping timeout: 480 seconds)
[8:48] * fghaas (~florian@dhcp-admin-217-66-51-168.pixelpark.com) has joined #ceph
[8:52] * l0nk (~alex@83.167.43.235) has joined #ceph
[8:57] * dosaboy (~user1@host86-164-136-44.range86-164.btcentralplus.com) Quit (Quit: Leaving.)
[9:00] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[9:04] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[9:05] * capri (~capri@212.218.127.222) Quit (Read error: Connection reset by peer)
[9:06] * capri (~capri@212.218.127.222) has joined #ceph
[9:06] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[9:14] * Philip_ (~Philip@hnvr-4d0797a7.pool.mediaWays.net) Quit (Ping timeout: 480 seconds)
[9:20] * ScOut3R (~ScOut3R@c83-249-233-227.bredband.comhem.se) has joined #ceph
[9:22] * leseb (~leseb@83.167.43.235) has joined #ceph
[9:23] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[9:23] * leseb (~leseb@83.167.43.235) Quit (Read error: Connection reset by peer)
[9:23] * leseb (~leseb@83.167.43.235) has joined #ceph
[9:25] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) has joined #ceph
[9:25] * gerard_dethier (~Thunderbi@85.234.217.115.static.edpnet.net) has joined #ceph
[9:28] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:33] * LeaChim (~LeaChim@02da1ea0.bb.sky.com) has joined #ceph
[9:33] * frank9999 (~frank@kantoor.transip.nl) has joined #ceph
[9:57] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[10:02] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: If your not living on the edge, you're taking up too much space)
[10:06] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[10:09] * yanzheng (~zhyan@jfdmzpr06-ext.jf.intel.com) Quit (Remote host closed the connection)
[10:10] * ninkotech (~duplo@ip-89-102-24-167.net.upcbroadband.cz) Quit (Quit: Konversation terminated!)
[10:16] * jlogan (~Thunderbi@72.5.59.176) Quit (Ping timeout: 480 seconds)
[10:22] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[10:30] * lofejndif (~lsqavnbok@09GAAAOTU.tor-irc.dnsbl.oftc.net) has joined #ceph
[10:37] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[10:37] * jtangwk (~Adium@2001:770:10:500:3565:409a:b866:e8b2) Quit (Quit: Leaving.)
[10:38] * jtangwk (~Adium@2001:770:10:500:98b3:6878:990d:83c3) has joined #ceph
[10:40] * sleinen1 (~Adium@2001:620:0:26:1f0:986:4c7e:4689) Quit (Quit: Leaving.)
[10:41] * sleinen (~Adium@130.59.94.175) has joined #ceph
[10:42] * sleinen1 (~Adium@2001:620:0:25:cc9d:6b21:da95:14ae) has joined #ceph
[10:43] * maxiz (~pfliu@222.128.144.239) has joined #ceph
[10:45] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[10:49] * sleinen (~Adium@130.59.94.175) Quit (Ping timeout: 480 seconds)
[10:55] * leseb (~leseb@83.167.43.235) has joined #ceph
[10:56] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[10:59] * leseb (~leseb@83.167.43.235) has joined #ceph
[11:00] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[11:05] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[11:11] * lofejndif (~lsqavnbok@09GAAAOTU.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[11:13] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[11:23] * maxiz (~pfliu@222.128.144.239) Quit (Quit: Ex-Chat)
[11:25] * ShaunR (~ShaunR@staff.ndchost.com) Quit ()
[11:30] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[11:31] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:42] * dosa (dosa@140.114.91.32) Quit ()
[11:49] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[11:55] * sleinen1 (~Adium@2001:620:0:25:cc9d:6b21:da95:14ae) Quit (Quit: Leaving.)
[11:56] * sleinen (~Adium@130.59.94.175) has joined #ceph
[11:56] * leseb (~leseb@83.167.43.235) has joined #ceph
[11:58] * sleinen1 (~Adium@130.59.94.175) has joined #ceph
[11:59] * sleinen2 (~Adium@2001:620:0:26:1db8:89c8:e39c:7ac8) has joined #ceph
[12:01] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[12:04] * sleinen (~Adium@130.59.94.175) Quit (Ping timeout: 480 seconds)
[12:06] * sleinen1 (~Adium@130.59.94.175) Quit (Ping timeout: 480 seconds)
[12:16] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[12:19] <gucki> hi there
[12:20] <gucki> is it documented anywhere what the output of "rados df" means?
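For reference, the per-pool counters that "rados df" prints (objects, bytes, and the rd/wr operation counts that come up again later in this log) can also be fetched through the python-rados bindings; a rough sketch, with the pool name as a placeholder:

    #!/usr/bin/env python
    # Print the pool statistics behind `rados df` for one pool.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')   # pool name is an example
    stats = ioctx.get_stats()           # dict with keys such as num_objects,
                                        # num_bytes, num_rd, num_rd_kb,
                                        # num_wr, num_wr_kb
    for key, value in sorted(stats.items()):
        print('%s: %s' % (key, value))
    ioctx.close()
    cluster.shutdown()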
[12:24] * sleinen2 (~Adium@2001:620:0:26:1db8:89c8:e39c:7ac8) Quit (Quit: Leaving.)
[12:28] * jlogan (~Thunderbi@72.5.59.176) Quit (Ping timeout: 480 seconds)
[12:30] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) has joined #ceph
[12:33] * jlogan (~Thunderbi@2600:c00:3010:1:acf5:5480:846b:b86) has joined #ceph
[12:43] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[12:46] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[12:47] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Read error: Connection reset by peer)
[12:47] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[12:48] * fghaas (~florian@dhcp-admin-217-66-51-168.pixelpark.com) Quit (Read error: Operation timed out)
[12:51] * leseb (~leseb@83.167.43.235) has joined #ceph
[13:00] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Read error: No route to host)
[13:01] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[13:09] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Ping timeout: 480 seconds)
[13:11] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) has joined #ceph
[13:18] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) has joined #ceph
[13:29] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[13:31] * wschulze (~wschulze@cpe-69-203-80-81.nyc.res.rr.com) has joined #ceph
[13:44] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Ping timeout: 480 seconds)
[13:48] * markbby (~Adium@168.94.245.2) has joined #ceph
[13:57] * ashish (~ashish.ch@81.209.159.2) Quit ()
[13:59] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[14:00] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[14:03] * livekcats (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[14:03] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Read error: Connection reset by peer)
[14:04] * ramonskie (ab15507e@ircip2.mibbit.com) has joined #ceph
[14:05] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:08] <ramonskie> i have some problems with ceph and cinder/openstack, anyone here that has some knowledge about this setup?
[14:14] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[14:18] * sleinen (~Adium@130.59.94.175) has joined #ceph
[14:18] <iggy> ramonskie: there are a few that will be around in a couple hours (us west coast), but until then just ask
[14:18] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:18] <ramonskie> iggy: thanks
[14:19] * sleinen1 (~Adium@2001:620:0:25:a885:8d4c:7afc:d594) has joined #ceph
[14:20] * fghaas (~florian@dhcp-admin-217-66-51-168.pixelpark.com) has joined #ceph
[14:20] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[14:26] * sleinen (~Adium@130.59.94.175) Quit (Ping timeout: 480 seconds)
[14:26] <joelio> gucki: Looks self-explanatory to me, what are you not sure about?
[14:32] * drokita (~drokita@199.255.228.128) has joined #ceph
[14:35] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[14:36] * Rocky (~r.nap@188.205.52.204) has left #ceph
[14:37] * BillK (~BillK@124-149-92-238.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[14:39] * jlogan (~Thunderbi@2600:c00:3010:1:acf5:5480:846b:b86) Quit (Ping timeout: 480 seconds)
[14:42] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[14:43] * hybrid5121 (~w.moghrab@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Quit: Leaving.)
[14:45] * BillK (~BillK@58-7-59-228.dyn.iinet.net.au) has joined #ceph
[14:46] * aliguori (~anthony@32.97.110.51) has joined #ceph
[14:48] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[14:54] * Philip_ (~Philip@hnvr-4dbd07c1.pool.mediaWays.net) has joined #ceph
[14:56] * slang (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) Quit (Remote host closed the connection)
[14:56] * PerlStalker (~PerlStalk@72.166.192.70) has joined #ceph
[14:57] * leseb (~leseb@83.167.43.235) has joined #ceph
[14:59] * stackevil__ (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[15:05] * livekcats (~stackevil@cpe90-146-43-165.liwest.at) Quit (Ping timeout: 480 seconds)
[15:05] * yanzheng (~zhyan@jfdmzpr01-ext.jf.intel.com) Quit (Remote host closed the connection)
[15:09] <yehuda_hm> jamespage: around?
[15:11] * stackevil__ (~stackevil@cpe90-146-43-165.liwest.at) Quit (Ping timeout: 480 seconds)
[15:13] <jamespage> yehuda_hm, yep
[15:13] <yehuda_hm> jamespage: https://bugs.launchpad.net/ubuntu/+source/curl/+bug/613274
[15:14] <yehuda_hm> libcurl is still compiled without c-ares support
[15:15] <yehuda_hm> we use libcurl with keystone, found some issue yesterday
[15:15] <jamespage> indeed
[15:15] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) has joined #ceph
[15:15] <yehuda_hm> has some multithreading issue, the fix for that, however, may lead to name resolving not timing out
[15:17] <yehuda_hm> I think the fix to that would be by compiling in c-ares (as long as it doesn't break other things like IPv6 as it used to)
[15:17] * vata (~vata@2607:fad8:4:6:a842:ac48:fdc:58d6) has joined #ceph
[15:22] <jamespage> yehuda_hm, leave it with me - we are past feature freeze for raring and this is a core package so its a little outside my comfort zone tbh
[15:24] <yehuda_hm> jamespage: sure, thanks
[15:29] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Read error: Connection reset by peer)
[15:29] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[15:32] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[15:43] * Rocky (~r.nap@188.205.52.204) has joined #ceph
[15:46] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[15:51] * leseb (~leseb@83.167.43.235) has joined #ceph
[15:52] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[15:54] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[15:56] * leseb (~leseb@83.167.43.235) has joined #ceph
[15:58] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[16:01] <ramonskie> when i want to create a volume under a different id called 'volumes', i get operation not permitted
[16:02] <ramonskie> under admin it's no problem
[16:04] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) has joined #ceph
[16:05] <scuttlemonkey> did you give permissions to the new id?
[16:06] <ramonskie> yes
[16:06] <ramonskie> i followed http://ceph.com/docs/master/rbd/rbd-openstack/
[16:06] <scuttlemonkey> you're just doing it w/ the native rbd client then? Not via openstack or somesuch?
[16:06] <ramonskie> yes native rbd
[16:07] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[16:09] * sleinen1 (~Adium@2001:620:0:25:a885:8d4c:7afc:d594) Quit (Quit: Leaving.)
[16:14] <ramonskie> there must be some permissions that i'm overlooking
[16:15] <scuttlemonkey> what is 'ls -l /etc/ceph' ?
[16:16] <ramonskie> ceph.client.admin.keyring
[16:16] <ramonskie> ceph.client.volumes.keyring
[16:17] <ramonskie> ceph.client.images.keyring
[16:17] <ramonskie> ceph.conf
[16:18] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[16:18] <scuttlemonkey> yeah, I was looking for permissions of ls -l
[16:18] <scuttlemonkey> ownership and whatnot
[16:18] <scuttlemonkey> http://pastebin.com/
[16:20] <ramonskie> images.keyring are owned by glance user and volumes are owned by cinder
[16:20] <ramonskie> sorry, can't copy/paste, separate network
[16:20] <ramonskie> all files are rw-r--r--
[16:23] * sleinen (~Adium@130.59.94.175) has joined #ceph
[16:24] * sleinen1 (~Adium@2001:620:0:26:e491:c9ea:4462:72f7) has joined #ceph
[16:26] * jtangwk (~Adium@2001:770:10:500:98b3:6878:990d:83c3) Quit (Quit: Leaving.)
[16:27] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[16:27] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[16:27] * jtangwk (~Adium@2001:770:10:500:20fc:ee30:1f20:e4e2) has joined #ceph
[16:31] * sleinen (~Adium@130.59.94.175) Quit (Ping timeout: 480 seconds)
[16:34] * dosaboy (~gizmo@faun.canonical.com) Quit (Ping timeout: 480 seconds)
[16:34] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[16:35] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[16:36] * diegows (~diegows@190.190.2.126) Quit (Ping timeout: 480 seconds)
[16:40] <scuttlemonkey> sry, phone call
[16:40] <scuttlemonkey> you see the volumes key in 'sudo ceph auth list' ?
[16:42] <ramonskie> yup
[16:42] <scuttlemonkey> k
[16:42] <scuttlemonkey> try this: 'rbd --id volumes --keyring=/etc/ceph/ceph.client.volumes.keyring ls -p volumes'
[16:43] <ramonskie> nothing but no error
[16:43] <scuttlemonkey> ok
[16:43] <scuttlemonkey> do the same thing now with a create command
[16:44] <ramonskie> and if i create one under admin it works and then i see the disk i created with above command
[16:44] <ramonskie> the creation fails
[16:45] <scuttlemonkey> really?
[16:45] <scuttlemonkey> rbd --id volumes --keyring=/etc/ceph/ceph.client.volumes.keyring create foo --size 1024 --pool volumes
[16:45] <scuttlemonkey> with that ^
[16:45] <ramonskie> yup used 'rbd --id volumes --keyring=/etc/ceph/ceph.client.volumes.keyring create --pool volumes --size 1024 test'
[16:46] <scuttlemonkey> try with "...create {image name} --size {size} --pool volumes"
[16:46] <ramonskie> without pool specified same problem
[16:47] <ramonskie> also same problem
[16:47] <scuttlemonkey> hrm
[16:47] <scuttlemonkey> whoami?
[16:47] <ramonskie> ??
[16:47] <scuttlemonkey> are you currently the cinder user?
[16:47] <ramonskie> no as sudo root
[16:48] <scuttlemonkey> oh
[16:48] <scuttlemonkey> sudo -u cinder -i
[16:48] <scuttlemonkey> then do 'rbd --id volumes --keyring=/etc/ceph/ceph.client.volumes.keyring create foo --size 1024 --pool volumes'
[16:49] * jlogan (~Thunderbi@72.5.59.176) Quit (Ping timeout: 480 seconds)
[16:49] <ramonskie> same prob
[16:51] * The_Bishop (~bishop@2001:470:50b6:0:8510:6c35:1f57:ec80) has joined #ceph
[16:52] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[16:52] <scuttlemonkey> can you ls as cinder?
[16:52] <scuttlemonkey> a la 'rbd --id volumes --keyring=/etc/ceph/ceph.client.volumes.keyring ls -p volumes'
[16:52] <ramonskie> yes
[16:53] <scuttlemonkey> so when you sudo ceph auth list
[16:53] <scuttlemonkey> volumes says:
[16:53] <scuttlemonkey> caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images
[16:53] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[16:53] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit ()
[16:54] <ramonskie> i created the auth with the following line " ceph auth get-or-create client.volumes mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rx pool=images' "
[16:55] <ramonskie> caps: mon allow r
[16:55] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[16:55] * leseb (~leseb@83.167.43.235) has joined #ceph
[16:55] <ramonskie> yup states the same
[16:55] <scuttlemonkey> k
[16:56] <scuttlemonkey> ceph -v ?
[16:56] <scuttlemonkey> 0.56.3?
[16:56] <scuttlemonkey> or a newer version?
[16:56] <ramonskie> this morning i used 0.48 and upgraded to 0.56.3 so currently 0.56.3
[16:56] <ramonskie> both versions had the same problem
[16:57] <scuttlemonkey> are all mons and osds running 56.3?
[16:57] <ramonskie> health ok and all are up
[16:57] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[16:57] <joao> just a thought, try 'ceph -m ip:port auth list' for each mon ip:port combination
[16:58] <joao> and check if all the monitors output the same
[16:58] <joao> they should
[16:58] * gerard_dethier (~Thunderbi@85.234.217.115.static.edpnet.net) Quit (Quit: gerard_dethier)
[16:58] <gregaf> and verify that the output includes the key (and the right caps) matching your client key
[16:58] <ramonskie> i only have 1 machine now for debugging, with 4 osds and 1 mon installed
[16:58] <ramonskie> machines had 6 disks
[17:00] <ramonskie> ran it on the ceph node and on the client node and both state the same
[17:01] * portante (~user@66.187.233.206) has joined #ceph
[17:08] * BillK (~BillK@58-7-59-228.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:08] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[17:09] * leseb (~leseb@83.167.43.235) has joined #ceph
[17:09] <scuttlemonkey> just to be sure...ceph auth list and cat /etc/ceph/ceph.client.volumes.keyring are the same?
[17:10] <ramonskie> yup they are the same, i even gave the file all permissions
[17:10] <ramonskie> same result
[17:12] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[17:12] * dosaboy (~gizmo@faun.canonical.com) Quit (Quit: Leaving.)
[17:13] <elder> gitbuilder down?
[17:15] <gregaf> elder: I'm about to force push the testing branch to deal with those two patches of Yan's; do you like to announce that anywhere or is it sufficiently internal?
[17:15] * leseb (~leseb@83.167.43.235) has joined #ceph
[17:15] <gregaf> and I can't see the kernel gitbuilder's web page so it looks like maybe so
[17:15] <elder> I haven't been consistent about announcing changes on the testing branch. I personally would prefer if we didn't rebase things much if possible, but OK with me. What are you forcing?
[17:16] <elder> Did you replace those two patches?
[17:16] <gregaf> yeah, the ones you were asking about yesterday
[17:16] <elder> Are you tacking them onto the end?
[17:16] * dosaboy (~gizmo@faun.canonical.com) has joined #ceph
[17:16] <elder> That would be best.
[17:16] <gregaf> the new ones, yes, but I'm pulling the old ones out
[17:16] <gregaf> and one of the two new ones continues to be a VFS change that we can't upstream
[17:17] <elder> Can you revert the old two and then add the new ones? (If you've already done the work just say "no.")
[17:17] <gregaf> sorry, already pushed :(
[17:17] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[17:17] <elder> OK.
[17:17] <gregaf> I got the impression somewhere that testing was basically a free-for-all
[17:17] <elder> It sort of is.
[17:18] <elder> But I have been recording commit ids in tracker issues lately. Now those are all crap.
[17:18] <elder> It's not a big deal.
[17:18] <gregaf> in particular it seems impolitic to push nullified commits upstream when they're all internal testing cycles
[17:18] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[17:18] <elder> We'll be disciplined about the master branch in any case. Yes, the testing branch is more or less a free-for-all.
[17:18] <gregaf> Don't you pull the important ones into a separate upstreaming branch first?
[17:19] <gregaf> s/the important ones/the finalized ones
[17:19] <gucki> joelio: still there?
[17:19] <elder> Basically when the merge cycle opens, we set the master branch to be equal to the testing branch.
[17:19] <gregaf> ah
[17:19] <elder> Personally, I don't push to testing until they're finalized from my perspective.
[17:20] <elder> I don't know what Sage or others do.
[17:20] <elder> I do all my non-final testing in a private branch.
[17:20] <scuttlemonkey> gucki: I can try to help if you still have df questions
[17:20] <scuttlemonkey> ramonskie: I would try deleting the pool and starting clean tbh
[17:21] <scuttlemonkey> ramonskie: I really can't see what would allow you to ls the pool but not write to it if your permissions are indeed the same as mine
[17:21] <ramonskie> what do you mean with clean tbh
[17:21] <scuttlemonkey> tbh == to be honest
[17:22] <ramonskie> haha okee :)
[17:22] <ramonskie> okay will do that, luckily it's still a test environment..
[17:22] <scuttlemonkey> right
[17:22] <gucki> scuttlemonkey: great. i guess rd = read ops and wd = write ops. but i wonder why rd never increases for my cluster?
[17:22] <gucki> scuttlemonkey: http://pastebin.com/Ehc1cKyL
[17:22] <scuttlemonkey> if it weren't I'd see if we could snag someone to dig deeper
[17:23] <gucki> also, is there a way to make "ceph -w" also output rd/s, just like it outputs wd/s?
[17:23] <ramonskie> okay well thanks for all the help so far.
[17:24] <ramonskie> will let you know if it helped..
[17:24] <gucki> here's the output of my ceph -w: http://pastebin.com/A9f2012X
[17:24] <scuttlemonkey> cool, yeah...am definitely puzzled
[17:24] <scuttlemonkey> would like to know what's up
[17:25] <gregaf> elder: that's generally a good plan but unfortunately I don't have the time to run private suites of the kernel, so when I'm pulling in patches from the list I mostly have to rely on the nightlies
[17:25] <gucki> scuttlemonkey: probably a bug?
[17:25] <elder> No problem.
[17:25] <gregaf> someday we'll have more infrastructure and clearer lines of responsibility around that stuff so it's not just me or Sage when we have the time ;)
[17:27] <ramonskie> scuttlemonkey: do i also need to delete the auth keys?
[17:27] <scuttlemonkey> ok, sorry gucki...was cleaning up from my work with ramonskie
[17:27] <scuttlemonkey> ramonskie: yeah, I'd blow it all away and start fresh
[17:27] <scuttlemonkey> then verify each step...if you can paste what you type into a text window for later dump to me, that'd be great
[17:28] <scuttlemonkey> gucki: ok, so rd doesn't change?
[17:28] <gucki> scuttlemonkey: yes, rd and rd KB don't change
[17:28] <gucki> they also don't show up in ceph -w ...
[17:29] <gucki> (i assume because they don't change.. )
[17:29] <scuttlemonkey> gucki: what is your ceph -v
[17:29] <gucki> scuttlemonkey: latest bobtail from your repos
[17:30] <gucki> ceph version 0.56.3 (6eb7e15a4783b122e9b0c85ea9ba064145958aa5)
[17:30] <scuttlemonkey> hrm
[17:30] <scuttlemonkey> http://tracker.ceph.com/issues/2209 was resolved quite some time ago
[17:31] <gucki> scuttlemonkey: well....i do have another argonaut cluster 0.48.2 ...i just looked, rd is also not changing there
[17:31] <gucki> scuttlemonkey: i only use qemu rbd images...could this be the problem?
[17:32] <gucki> should i open a new bug?
[17:32] * alram (~alram@38.122.20.226) has joined #ceph
[17:32] <scuttlemonkey> one sec, lemme look here
[17:33] <paravoid> so, when using ceph-disk-prepare/activate, how do you handle disk failures?
[17:33] <paravoid> new OSD ids?
[17:37] <scuttlemonkey> gucki: yeah, will reopen this bug for sage to look at
[17:37] <scuttlemonkey> thanks for letting us know
[17:37] * andreask (~andreas@h081217068225.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[17:38] <gucki> scuttlemonkey: ok, i'll also write a comment and put my links in so i also get notified of comments
[17:38] <scuttlemonkey> cool
[17:39] * ramonskiee (3e295201@ircip2.mibbit.com) has joined #ceph
[17:40] * noob2 (~cjh@173.252.71.4) has joined #ceph
[17:41] <ramonskiee> scuttlemonkey: still the same i did the following http://mibpaste.com/iTYFPJ
[17:42] <scuttlemonkey> ramonskie: 'ceph auth get-or-create client.images | sudo tee /etc/ceph/ceph.client.images.keyring'
[17:42] <gucki> scuttlemonkey: ok done. but i don't see how i can change the state (resolved -> open?)
[17:42] <scuttlemonkey> 'sudo chown glance:glance /etc/ceph/ceph.client.images.keyring'
[17:42] <scuttlemonkey> same for volumes / cinder
[17:43] <scuttlemonkey> gucki: I'll have to do that...but my perms appear borked atm
[17:43] <ramonskie> no set them to root
[17:43] <ramonskie> but did try it both :)
[17:43] <gucki> btw, not a ceph question...but anybody here knows how much memory overhead for kvm is normal?
[17:44] <scuttlemonkey> ramonskie: you did the above, just substituting root:root?
[17:45] <ramonskie> i did chown root:root and tried it with cinder:cinder
[17:45] <scuttlemonkey> ok, just didn't see a command on your paste that had you creating the keyring file
[17:46] <ramonskie> no forgot to put it there..
[17:46] * eschnou (~eschnou@85.234.217.115.static.edpnet.net) Quit (Remote host closed the connection)
[17:48] * The_Bishop (~bishop@2001:470:50b6:0:8510:6c35:1f57:ec80) Quit (Ping timeout: 480 seconds)
[17:48] * sleinen1 (~Adium@2001:620:0:26:e491:c9ea:4462:72f7) Quit (Quit: Leaving.)
[17:48] * sleinen (~Adium@130.59.94.175) has joined #ceph
[17:49] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[17:50] <scuttlemonkey> since you can't paste...can you screenshot the term window for 'sudo ceph auth list' and 'sudo ceph-authtool -l /etc/ceph/ceph.client.volumes.keyring' ?
[17:50] <scuttlemonkey> a la: https://dl.dropbox.com/u/5334652/ramonskie.png
[17:51] <ramonskie> i can paste now
[17:51] <scuttlemonkey> k
[17:52] <ramonskiee> http://mibpaste.com/IyZDKn
[17:54] * KevinPerks1 (~Adium@cpe-066-026-239-136.triad.res.rr.com) has joined #ceph
[17:55] <scuttlemonkey> ramonskie: and ceph auth list?
[17:56] * The_Bishop (~bishop@2001:470:50b6:0:b420:1010:bbab:da7b) has joined #ceph
[17:56] <ramonskiee> http://mibpaste.com/57hMOe
[17:57] <scuttlemonkey> and you set chown root:root /etc/ceph/ceph.client.volumes.keyring?
[17:57] * KevinPerks (~Adium@cpe-066-026-239-136.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[17:58] <ramonskie> yup
[17:58] <ramonskie> and even tried it with cinder:cinder
[17:59] <ramonskiee> -rw-r--r-- 1 root root 66 Mar 13 11:44 ceph.client.volumes.keyring
[17:59] * sleinen (~Adium@130.59.94.175) Quit (Ping timeout: 480 seconds)
[17:59] <scuttlemonkey> ok
[17:59] <scuttlemonkey> so as root, you can 'rbd --id volumes --keyring=/etc/ceph/ceph.client.volumes.keyring ls -p volumes'
[18:00] <scuttlemonkey> but not 'rbd --id volumes --keyring=/etc/ceph/ceph.client.volumes.keyring create foo2 --size 1024 --pool volumes'
[18:00] <ramonskie> yup exactly
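The same check can be reproduced programmatically, which is roughly what cinder ends up doing through librbd. A sketch, using the client id, keyring path, and pool name from the thread above:

    #!/usr/bin/env python
    # Connect as client.volumes with its own keyring and try to create a
    # 1 GB image in the 'volumes' pool; a caps problem should surface here
    # as an error, just like the failing CLI create above.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf',
                          rados_id='volumes',
                          conf={'keyring': '/etc/ceph/ceph.client.volumes.keyring'})
    cluster.connect()
    ioctx = cluster.open_ioctx('volumes')
    try:
        rbd.RBD().create(ioctx, 'foo2', 1024 * 1024 * 1024)
        print('create succeeded')
    except (rados.Error, rbd.Error) as exc:
        print('create failed: %s' % exc)
    ioctx.close()
    cluster.shutdown()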
[18:01] * sleinen (~Adium@2001:620:0:26:d41c:272:ac37:7df7) has joined #ceph
[18:02] <scuttlemonkey> can you pastebin your ceph -s?
[18:03] <ramonskiee> http://mibpaste.com/M5RkWu
[18:05] * sleinen (~Adium@2001:620:0:26:d41c:272:ac37:7df7) Quit ()
[18:05] * sleinen (~Adium@130.59.94.175) has joined #ceph
[18:06] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[18:07] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[18:08] <ramonskie> scuttlemonkey: i'm done for the day (maybe the whole week... head's exploding) so thanks for all the help so far, will try again tomorrow.
[18:08] * leseb (~leseb@83.167.43.235) Quit (Remote host closed the connection)
[18:08] <scuttlemonkey> ok, sorry we couldn't get you straightened around
[18:09] <ramonskie> no problem, you tried a lot.
[18:09] <ramonskie> will be back tomorrow
[18:10] * ramonskie (ab15507e@ircip2.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[18:11] <yehudasa> gregaf: can you take a look at wip-4425-next (one liner)
[18:13] * sleinen (~Adium@130.59.94.175) Quit (Ping timeout: 480 seconds)
[18:16] <gregaf> yehudasa: we'll need a plan for those side effects you mention in the commit message
[18:16] <gregaf> the patch itself looks fine
[18:16] * jlk (~jlk@50.22.62.14-static.reverse.softlayer.com) Quit (Quit: Changing server)
[18:16] <yehudasa> gregaf: https://bugs.launchpad.net/ubuntu/+source/curl/+bug/613274
[18:21] * eschnou (~eschnou@148.91-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:21] * mattch (~mattch@pcw3047.see.ed.ac.uk) Quit (Quit: Leaving.)
[18:22] * mattch (~mattch@pcw3047.see.ed.ac.uk) has joined #ceph
[18:23] * houkouonchi-work (~linux@12.248.40.138) Quit (Read error: Connection reset by peer)
[18:24] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[18:26] * sjustlaptop (~sam@71-83-191-116.dhcp.gldl.ca.charter.com) has joined #ceph
[18:28] * sleinen (~Adium@2001:620:0:25:3036:246:fa55:7b96) has joined #ceph
[18:29] * sleinen (~Adium@2001:620:0:25:3036:246:fa55:7b96) Quit ()
[18:31] * eschnou (~eschnou@148.91-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[18:33] * oddover (~oddover@glados.colorado.edu) has joined #ceph
[18:34] <oddover> hi all. I'm trying to set up a ceph stack for testing and I need some help.
[18:34] <oddover> I'm following the 5-minute quick start guide, and I think I've followed all the steps correctly, but I'm getting an error when I try to mount the cephfs
[18:35] <oddover> "can't read superblock"
[18:37] <gregaf> you probably didn't set up an MDS node – I don't think the regular quick start does, which means you haven't set it up for the filesystem, just for RADOS (RBD and Rados Gateway)
[18:38] <oddover> ok. how do I do that?
[18:38] <scuttlemonkey> oddover: yeah, the quickstart has you add mds stuff to your conf, but not actually add the mds
[18:38] * dosaboy (~gizmo@faun.canonical.com) Quit (Quit: Leaving.)
[18:38] <scuttlemonkey> at the bottom of the quickstart should be another couple of links to "other quickstarts"
[18:39] <oddover> yep. there's one for cephfs. is that what I want?
[18:39] <scuttlemonkey> ahh, yeah...although I see that doesn't take you through mds setup
[18:39] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) has joined #ceph
[18:39] <oddover> yep
[18:40] * l0nk (~alex@83.167.43.235) Quit (Quit: Leaving.)
[18:42] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[18:43] <oddover> so how do I do the mds setup?
[18:43] <scuttlemonkey> one sec...was digging for more complete
[18:43] <scuttlemonkey> actually it looks like the 5-minute does include it now
[18:44] <scuttlemonkey> 'sudo mkdir -p /var/lib/ceph/mds/ceph-a' and [mds.a]
[18:44] <scuttlemonkey> the restriction being, you can't mount ceph from that same quickstart box
[18:44] <scuttlemonkey> has to be a diff box
[18:51] * b1tbkt (~Peekaboo@68-184-193-142.dhcp.stls.mo.charter.com) Quit (Remote host closed the connection)
[18:53] <oddover> ok, then my setup should work.
[18:53] <oddover> I did that step
[18:54] <oddover> what did you mean by [mds.a]?
[18:54] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:54] <scuttlemonkey> in your ceph.conf, there is an entry for [mds.a] and a hostname (if you followed 5-min quickstart)
[18:55] * gucki (~smuxi@HSI-KBW-078-042-029-115.hsi3.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[18:56] <oddover> yep
[18:57] <oddover> I'm trying to re-run the mkcephfs step, and now I'm getting "Invalid argument"
[18:57] <scuttlemonkey> can you pastebin your 'ceph -s' ?
[18:57] <oddover> I deleted /var/lib/ceph/osd/ceph-0's contents
[19:01] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Quit: tryggvil)
[19:02] * fghaas (~florian@dhcp-admin-217-66-51-168.pixelpark.com) Quit (Ping timeout: 480 seconds)
[19:02] * jlogan (~Thunderbi@72.5.59.176) Quit (Read error: Connection reset by peer)
[19:03] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[19:03] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[19:06] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) Quit (Quit: Easy as 3.14159265358979323846... )
[19:09] * aliguori (~anthony@32.97.110.51) Quit (Remote host closed the connection)
[19:12] * alram (~alram@38.122.20.226) Quit (Read error: Connection reset by peer)
[19:12] * alram (~alram@38.122.20.226) has joined #ceph
[19:17] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[19:18] * sleinen1 (~Adium@2001:620:0:26:9ef:6c83:c71a:6719) has joined #ceph
[19:24] * themgt (~themgt@24-177-232-181.dhcp.gnvl.sc.charter.com) has joined #ceph
[19:24] * ScOut3R (~ScOut3R@c83-249-233-227.bredband.comhem.se) Quit (Remote host closed the connection)
[19:24] * sleinen1 (~Adium@2001:620:0:26:9ef:6c83:c71a:6719) Quit (Quit: Leaving.)
[19:25] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[19:26] <mgalkiewicz> guys any estimates on 0.56.4?
[19:30] * mcclurmc_laptop (~mcclurmc@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[19:31] * Tribaal (uid3081@hillingdon.irccloud.com) Quit (Ping timeout: 480 seconds)
[19:37] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) has joined #ceph
[19:44] * diegows (~diegows@200.68.116.185) has joined #ceph
[19:50] * mjevans- (~mje@209.141.34.79) has joined #ceph
[19:50] * mjevans (~mje@209.141.34.79) Quit (Remote host closed the connection)
[19:52] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[19:53] <scuttlemonkey> mgalkiewicz: no solid estimate atm (that I'm aware of)...only numbers I have seen are for larger roadmap stuff
[19:56] * The_Bishop (~bishop@2001:470:50b6:0:b420:1010:bbab:da7b) Quit (Ping timeout: 480 seconds)
[19:58] * jjgalvez (~jjgalvez@cpe-76-175-30-67.socal.res.rr.com) Quit (Quit: Leaving.)
[19:59] * The_Bishop (~bishop@2001:470:50b6:0:b420:1010:bbab:da7b) has joined #ceph
[20:00] * dpippenger (~riven@cpe-75-85-17-224.socal.res.rr.com) Quit (Remote host closed the connection)
[20:02] * mjevans- (~mje@209.141.34.79) Quit (Ping timeout: 480 seconds)
[20:04] * leseb (~leseb@78.250.109.87) has joined #ceph
[20:05] <lx0> so, I tried to rearrange my crushmap to get 4 replicas equally divided among 3 equal-sized servers, on 13 different-sized disks, with usage proportional to disk size. it proved to be a bit challenging to set the weights so as to get the right number of PGs on each OSD
[20:06] <lx0> so I decided to experiment with the other crush algorithms; I used uniform and tree on different layers of the crushmap, and that turned out to get ceph.ko on 3.8.2 thoroughly confused: it started dumping out the hex representation of the osdmap to dmesg
[20:06] * lx0 is now known as lxo
[20:07] * mjevans (~mje@209.141.34.79) has joined #ceph
[20:09] <lxo> it was only when I went back to straw all over that I could mount the filesystem again (I had rebooted the client before, thinking it had got corrupted somehow)
[20:10] <scuttlemonkey> lxo: yeah, as far as I know only the one algorithm is really supported
[20:12] <lxo> now, as for placement, I kind of wish there was a more predictable algorithm that enabled easier PG assignments in a more static manner. I don't mean placing individual PGs manually, but perhaps some way to set weights manually, and then have internally-computed adjusted weights that bring the actual usage closer to the specified weights
[20:13] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[20:13] <lxo> 'cause otherwise I get severe imbalance on the (smallish) cluster, with one small disk being assigned even twice as many PGs as a crushmap neighbor with the same weight!
[20:13] <janos> lxo: how are you determining disk usage? parsing pg dump?
[20:14] * mjevans (~mje@209.141.34.79) Quit (Read error: Operation timed out)
[20:14] <scuttlemonkey> lxo: gregaf did a great job of summing up some of the decisions and functionality in his reply here: http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/
[20:14] <lxo> janos, I used df at first ;-) I went through parsing pg dumps, then I moved to crushmap placement dumps
[20:15] <janos> ah i haven't checked out crushmap placement dumps yet
[20:15] <lxo> scuttlemonkey, thanks for the pointer
[20:15] <janos> i need to check that out
[20:15] * nwat (~Adium@eduroam-242-170.ucsc.edu) has joined #ceph
[20:16] <lxo> janos, it works in my case, where I have exactly as many buckets as replicas, and you have to take into account that the pool number gets added to the pg number submitted to the crush hashing algorithm
[20:16] <janos> i wrote a script to parse pg dump to give me output like this: http://paste.fedoraproject.org/4982/36320219
[20:17] <janos> but i will check out the crush dump
[20:17] <scuttlemonkey> janos: neat
[20:17] * dpippenger (~riven@cpe-76-166-221-185.socal.res.rr.com) has joined #ceph
[20:17] <scuttlemonkey> that something you would mind putting up on github or similar?
[20:17] <janos> it was my ugly first hit ever writing python
[20:17] <scuttlemonkey> I am assembling a list of "random useful crap" for when we relaunch the wiki
[20:17] <janos> i have no idea how, but i would gladly
[20:17] <scuttlemonkey> or just email it to patrick at inktank dot com
[20:18] <lxo> osdmaptool can overcome the above and dump the placement given an osdmap with an embedded crushmap, I was told after I'd scripted what I needed based on the crushtool csv placement output ;-)
[20:18] <janos> any particluar open source "use as you wish" header i should put on it?
[20:18] <scuttlemonkey> /shrug
[20:18] <janos> i'll do the standard - no warranty. use at your own risk ;)
[20:18] * stackevil (~stackevil@cpe90-146-43-165.liwest.at) Quit (Quit: This Mac has gone to sleep!)
[20:18] <scuttlemonkey> I usually just choose something from this list: http://opensource.org/licenses/alphabetical
[20:19] * janos looks
[20:20] * mjevans (~mje@209.141.34.79) has joined #ceph
[20:21] <scuttlemonkey> but for random script-y stuff I often thumb my nose at the establishment too and do things like making users guarantee to provide me with 3 pigs and a chicken before any support requests are made
[20:21] <scuttlemonkey> ...so I may not be the best one to ask :P
[20:21] <janos> damn i like that deal
[20:22] <janos> aww bummer, no block comment in python
[20:23] <dmick> you can use """ strings
[20:24] <dmick> or an editor that knows how to format with # leading; vim's pretty good
[20:24] <janos> hrm, this is not ideal, but if you wanted it now i pasted
[20:24] <janos> http://paste.fedoraproject.org/4983/13632026/
[20:24] <janos> can paste elsewhere if more convenient
[20:24] <janos> be warned - ugly code ahead. my first python ever
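The pasted script itself isn't preserved in the log, only the URL; a minimal sketch of the same idea (counting PGs per OSD), assuming `ceph pg dump --format=json` is available and that the dump carries a 'pg_stats' list with per-PG 'acting' OSD lists (those key names are assumptions, not verified against this ceph version):

    #!/usr/bin/env python
    # Tally how many PGs each OSD carries, similar in spirit to the script
    # janos describes above.
    import json
    import subprocess
    from collections import Counter

    dump = json.loads(subprocess.check_output(
        ['ceph', 'pg', 'dump', '--format=json']))
    per_osd = Counter()
    for pg in dump['pg_stats']:
        for osd in pg['acting']:
            per_osd[osd] += 1

    for osd, count in sorted(per_osd.items()):
        print('osd.%s: %d pgs' % (osd, count))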
[20:25] <janos> lol, i'm a tard. i see where i used ''' later in the code
[20:25] <janos> gah
[20:25] <scuttlemonkey> hehe
[20:25] <janos> my noobness exposed!
[20:26] <scuttlemonkey> I would just use visual mode in vi and :'<,'>s!^!#!
[20:27] <janos> i just tossed MIT license on it, but if anyone has roadbumps with it or whatever i'm completely flexible
[20:27] <scuttlemonkey> thanks, I'll definitely drop this in the pile
[20:28] <janos> cool
[20:34] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[20:38] * leseb_ (~leseb@78.250.109.87) has joined #ceph
[20:41] * leseb (~leseb@78.250.109.87) Quit (Ping timeout: 480 seconds)
[20:49] * eschnou (~eschnou@148.91-201-80.adsl-dyn.isp.belgacom.be) has joined #ceph
[20:50] * leseb_ (~leseb@78.250.109.87) Quit (Ping timeout: 480 seconds)
[20:50] * leseb (~leseb@78.250.109.87) has joined #ceph
[20:52] * fghaas (~florian@ds80-237-216-36.dedicated.hosteurope.de) has joined #ceph
[20:52] * gaveen (~gaveen@175.157.72.253) has joined #ceph
[20:54] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[20:58] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) has joined #ceph
[20:59] * sleinen1 (~Adium@2001:620:0:26:4578:c043:9cd2:1d79) has joined #ceph
[21:00] * yanzheng (~zhyan@134.134.139.70) has joined #ceph
[21:06] * sleinen (~Adium@217-162-132-182.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[21:08] * sstan (~chatzilla@dmzgw2.cbnco.com) has joined #ceph
[21:09] <sstan> I mapped some rbd on 3 machines. Somehow fdisk -l /dev/rbd1 shows different sizes depending on which machine I run the command
[21:09] <sstan> any ideas ?
[21:12] * sleinen1 (~Adium@2001:620:0:26:4578:c043:9cd2:1d79) Quit (Quit: Leaving.)
[21:13] * fghaas (~florian@ds80-237-216-36.dedicated.hosteurope.de) Quit (Quit: Leaving.)
[21:13] * diegows (~diegows@200.68.116.185) Quit (Read error: Operation timed out)
[21:15] <jjgalvez> that's interesting, let me see if I can find something for you
[21:15] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:15] <sstan> thanks
[21:15] <sstan> partprobe didn't help either btw :/
[21:18] <jjgalvez> are the machines all the same OS? any differences between them?
[21:19] <sstan> ceph version 0.57 (9a7a9d06c0623ccc116a1d3b71c765c20a17e98e) on all three of them
[21:20] <sstan> Linux 3.0.58-0.6.2-xen on all 3 of them
[21:20] * stackevil (~stackevil@178.112.22.72.wireless.dyn.drei.com) has joined #ceph
[21:21] <jjgalvez> I'm going to create an image on my cluster and check it against a few of the machines in it, I'll let you know what I find on my end
[21:21] <jjgalvez> what size is the image?
[21:21] <sstan> about 200 GB
[21:21] <sstan> but I did a resize at some point
[21:21] * slang1 (~slang@207-229-177-80.c3-0.drb-ubr1.chi-drb.il.cable.rcn.com) has joined #ceph
[21:22] <sstan> also, why are rbds listed in /proc/devices?
[21:22] <sstan> rbd1, rbd2, ...
[21:22] <sstan> shouldn't there be only one entry?
[21:23] <gregaf> 3.0 has a pretty old version of the code; I'm not sure it's doing any of the watch-notify stuff that would tell it when the size changes
[21:23] <gregaf> and of course mounting on multiple machines at the same time is rarely a good plan
[21:23] <gregaf> joshd?
[21:24] <sstan> why not? it's required for clustered tasks
[21:24] <gregaf> if you've got a cluster-aware filesystem on top it'll work fine
[21:24] <gregaf> but you can't use it with eg ext4 any more than you could a hard drive with some magic SATA splicer
[21:24] <joshd> yeah, I'm pretty sure 3.0 krbd had a bug with resetting the block device size while it was mapped. you really want to use 3.6+ for a number of krbd bug fixes
[21:24] <sstan> unless the size gets modified :/
[21:25] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[21:25] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[21:25] <joshd> or the stable kernels for 3.4 and 3.5, which have backports
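One way to see through a stale kernel mapping is to ask librbd directly for the image size; a quick sketch with the python-rbd bindings (pool and image names are placeholders):

    #!/usr/bin/env python
    # Print the authoritative image size as librbd sees it, independent of
    # whatever an old krbd mapping still reports.
    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')        # pool name is an example
    image = rbd.Image(ioctx, 'myimage')      # image name is an example
    print('size: %d bytes' % image.size())
    image.close()
    ioctx.close()
    cluster.shutdown()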
[21:25] <sstan> joshd, about /proc/devices ...
[21:26] <sstan> did it change in kernel > 3.6
[21:26] <joshd> there were bugs around rbd id allocation which could cause something like that, yeah
[21:27] <sstan> yeah I guess it shouldn't be listed as "rbd1 , rbd2 , ..."
[21:27] * Tribaal (uid3081@hillingdon.irccloud.com) has joined #ceph
[21:28] <sstan> could someone who's running kernel > 3.6 please check their /proc/devices file?
[21:28] <nhm> sstan: sorry, I'm still on 3.6.3
[21:29] <sstan> sorry I mean >=
[21:29] * yanzheng (~zhyan@134.134.139.70) Quit (Remote host closed the connection)
[21:29] * LeaChim (~LeaChim@02da1ea0.bb.sky.com) Quit (Ping timeout: 480 seconds)
[21:30] <nhm> sstan: I'm seeing lots of listed rbds.
[21:30] <sstan> do they have numbers, i.e. "rbd1"?
[21:30] <nhm> yep
[21:31] <sstan> yeah that might be a problem imho
[21:31] <nhm> rbd1 through 16 for all 16 volumes
[21:31] <joshd> sstan: that won't change, the /dev/rbd/pool/image is just a symlink for convenience set up by udev
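For reference, the two places being compared here can be checked like this; the output varies with kernel version, so this is only a sketch:

    # block-device majors the rbd driver has registered on this machine
    grep -i rbd /proc/devices

    # udev-created convenience symlinks, pointing back at /dev/rbdN
    ls -l /dev/rbd/

On the older kernels discussed above, each mapped image registers its own major number, which is consistent with entries like rbd1 and rbd2 showing up individually.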
[21:31] <sstan> for example, all "fs"
[21:32] <sstan> I like that feature! but I'm talking about /proc/devices
[21:32] <sstan> if you have lvm, go see /etc/lvm/lvm.conf
[21:32] <sstan> # List of pairs of additional acceptable block device types found
[21:32] <sstan> # in /proc/devices with maximum (non-zero) number of partitions.
[21:32] <sstan> types = [ "rbd1", 252 ]
[21:32] <sstan> when I map another rbd, rbd1 will be 251 ... and rbd2 will take 252
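For anyone hitting the same thing, the relevant lvm.conf knob looks roughly like this. The string has to match whatever name actually appears in /proc/devices on that kernel (rbd1 in the output quoted above; it can differ on other kernels), and, per the comment quoted from lvm.conf, the number is the maximum partition count LVM will accept, not the device's major number:

    # /etc/lvm/lvm.conf -- devices section (sketch only)
    devices {
        types = [ "rbd", 1024 ]
    }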
[21:37] * PerlStalker (~PerlStalk@72.166.192.70) Quit (Quit: ...)
[21:37] * LeaChim (~LeaChim@b01bdcc0.bb.sky.com) has joined #ceph
[21:40] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[21:44] <joshd> sstan: the major number won't change, but that isn't what lvm is looking for anyway, is it?
[21:44] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[21:44] <sstan> "rbd1 252" has to be in /proc/devices ...else it doesn't work
[21:46] <sstan> it works if types = [ "rbd1", 252 ] is present; if it's not, it doesn't work.
[21:46] * Vjarjadian (~IceChat77@5ad6d005.bb.sky.com) has joined #ceph
[21:47] * gaveen (~gaveen@175.157.72.253) Quit (Remote host closed the connection)
[21:50] <sstan> hmm I unmapped /dev/rbd1, but it's still in /proc/devices (perhaps because of LVM)
[21:50] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Remote host closed the connection)
[21:57] * jlogan (~Thunderbi@72.5.59.176) Quit (Read error: Connection reset by peer)
[21:57] * jlogan (~Thunderbi@72.5.59.176) has joined #ceph
[21:59] * terje (~Adium@c-67-176-69-220.hsd1.co.comcast.net) has left #ceph
[22:09] * rturk-away is now known as rturk
[22:13] * portante (~user@66.187.233.206) Quit (Remote host closed the connection)
[22:17] * danieagle (~Daniel@177.99.133.222) has joined #ceph
[22:22] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)
[22:25] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) has joined #ceph
[22:26] * stackevil (~stackevil@178.112.22.72.wireless.dyn.drei.com) Quit (Quit: This Mac has gone to sleep!)
[22:33] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[22:33] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[22:37] <mjblw> can someone tell me: at http://ceph.com/docs/master/rados/configuration/osd-config-ref/#operations, the priority is not clearly explained. Is a lower number a higher or lower priority?
[22:43] <dmick> given the defaults, my guess is "higher number is higher priority"
[22:45] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[22:45] * Cube (~Cube@12.248.40.138) has joined #ceph
[22:46] * Cube (~Cube@12.248.40.138) Quit ()
[22:46] * Cube (~Cube@12.248.40.138) has joined #ceph
[22:46] * eschnou (~eschnou@148.91-201-80.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[22:47] * Cube (~Cube@12.248.40.138) Quit ()
[22:47] * Cube (~Cube@12.248.40.138) has joined #ceph
[22:48] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[22:51] * aliguori (~anthony@cpe-70-112-157-87.austin.res.rr.com) Quit (Quit: Ex-Chat)
[22:52] * BillK (~BillK@58-7-59-228.dyn.iinet.net.au) has joined #ceph
[22:53] * diegows (~diegows@190.190.2.126) has joined #ceph
[23:02] <dmick> but I agree it would be nice to actually state that
[23:02] * jluis (~JL@89-181-154-201.net.novis.pt) has joined #ceph
[23:08] * joao (~JL@89.181.152.115) Quit (Ping timeout: 480 seconds)
[23:08] * jluis is now known as joao
[23:09] * sstan_ (~chatzilla@modemcable016.164-202-24.mc.videotron.ca) has joined #ceph
[23:16] <mjblw> I'm kind of hoping for more than a guess, just because this is a live system and I've got what seems to be backfilling causing blocking IO to rbd volumes. I just put IN about 15 OSDs, and the resulting backfilling has caused a number of osd processes to die across the cluster, and I have blocked IO on volumes.
[23:17] <mjblw> what exactly does it mean when a pg is in active+recovery_wait?
[23:17] <jjgalvez> I've been reading the docs on those settings and it definitely looks like a higher number is higher priority, client defaults to 63 with recovery set to 10.
[23:18] <mjblw> then why would backfilling be DOSing client requests?
[23:18] <dmick> something else may be going on
[23:19] <gregaf> those priorities are just the message dispatch priorities
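For reference, the two knobs jjgalvez quoted are ceph.conf options on the OSDs; a sketch with the documented defaults, where a larger value means the operation is dispatched with higher priority:

    [osd]
        osd client op priority = 63
        osd recovery op priority = 10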
[23:20] <mjblw> i have http://pastebin.com/uU1zhz9H
[23:20] <jjgalvez> In regards to the pg status, active means ceph will process requests, and recovery_wait would mean it is in the queue to start the recovery process. http://ceph.com/docs/master/rados/operations/pg-states/
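To tell whether a PG is merely transitioning through recovery_wait or actually stuck, the usual checks look like this (2.3f is a placeholder pgid):

    # PGs that have been stuck in a non-clean state for a while
    ceph pg dump_stuck unclean
    ceph health detail

    # full state, including recovery information, for one PG
    ceph pg 2.3f query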
[23:20] <dpippenger> if your clients are using the same network interfaces as the osd for backfill maybe you are overloading the network interface
[23:20] <lurbs> mjblw: Have you tried decreasing the number of backfills per osd?
[23:20] <lurbs> http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling
[23:20] <mjblw> dual 10gbe is not saturated
[23:21] <gregaf> recovery_wait on its own shouldn't impact throughput
[23:21] <mjblw> how can i modify this setting on the running system?
[23:21] * drokita (~drokita@199.255.228.128) Quit (Ping timeout: 480 seconds)
[23:22] <gregaf> the dying OSDs, that's a problem and is probably why your client IO is blocking in a noticeable fashion
[23:22] <mjblw> even when no osds are dead it's blocked. i have no persistent stuck inactive pgs.
[23:23] * leseb (~leseb@78.250.109.87) Quit (Remote host closed the connection)
[23:24] <gregaf> what's ceph -s output? and how are you determining blocked IO?
[23:24] <mjblw> websites hosted on the volumes are not loading; reported from multiple servers using the volumes
[23:26] * mcclurmc_laptop (~mcclurmc@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[23:28] <mjblw> http://pastebin.com/G8AwGTFt ceph -s
[23:32] <gregaf> are those 3 peering PGs stuck there, or are they just some that are moving out of backfill and into active+clean?
[23:32] <mjblw> pgs go into peering and then come out
[23:33] <gregaf> and my *guess* is that your cluster was busy enough that this migration in addition to your normal workload is just overloading the disks, so stuff is becoming highly latent
[23:33] <gregaf> if the OSDs that died committed suicide on their store threads, that would be an indication of this
[23:33] <mjblw> I am trying to work out how to reduce the number of backfills per osd
[23:33] <gregaf> I believe jjgalvez can help you with that :)
[23:34] <mjblw> something like a ceph osd tell command?
[23:34] * stackevil (~stackevil@77.116.2.173.wireless.dyn.drei.com) has joined #ceph
[23:34] <jjgalvez> yeah I'm pulling that up now
[23:35] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[23:36] <mjblw> maybe ceph osd tell * injectargs --osd_max+backfills=1 ?
[23:36] <mjblw> er, ceph osd tell * injectargs --osd_max_backfills=1
[23:37] <mjblw> that's not it
[23:37] <jjgalvez> try: ceph osd tell \* injectargs '--osd-max-backfills 1'
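That injectargs form changes the value on all running OSDs immediately; to keep it across restarts the same setting goes into ceph.conf. A sketch of both (osd recovery max active is a related option that is sometimes tuned alongside it, mentioned here only as an aside):

    # runtime, all OSDs (the backslash stops the shell from expanding the *)
    ceph osd tell \* injectargs '--osd-max-backfills 1'

    # persistent equivalent
    [osd]
        osd max backfills = 1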
[23:37] * ScOut3R (~ScOut3R@c83-249-233-227.bredband.comhem.se) has joined #ceph
[23:38] <mjblw> trying that
[23:45] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[23:47] <mjblw> this is incredible: the cluster went down to only 9 pgs backfilling, down from 145+, and the disks on the cluster are still pegged, as observed by iostat
[23:48] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) Quit (Ping timeout: 480 seconds)
[23:48] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[23:52] * vata (~vata@2607:fad8:4:6:a842:ac48:fdc:58d6) Quit (Quit: Leaving.)
[23:52] <jjgalvez> what does your current ceph status read?
[23:57] * mgalkiewicz (~mgalkiewi@toya.hederanetworks.net) has joined #ceph
[23:57] * loicd (~loic@AAnnecy-257-1-112-254.w90-36.abo.wanadoo.fr) Quit (Quit: Leaving.)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.