#ceph IRC Log


IRC Log for 2014-09-04

Timestamps are in GMT/BST.

[0:00] * JayJ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[0:01] * nolan_ (~nolan@2001:470:1:41:a800:ff:fe3e:ad08) has joined #ceph
[0:02] * xarses (~andreww@12.164.168.117) Quit (Ping timeout: 480 seconds)
[0:03] * qhartman (~qhartman@den.direwolfdigital.com) Quit (Quit: Ex-Chat)
[0:08] * dneary (~dneary@nat-pool-bos-u.redhat.com) Quit (Ping timeout: 480 seconds)
[0:08] * markbby (~Adium@168.94.245.4) Quit (Quit: Leaving.)
[0:10] * JayJ (~jayj@157.130.21.226) has joined #ceph
[0:13] * xarses (~andreww@12.164.168.117) has joined #ceph
[0:13] * Shmouel (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) has joined #ceph
[0:15] * tab (~oftc-webi@89-212-99-37.dynamic.t-2.net) has joined #ceph
[0:16] * monsterz_ (~monsterzz@94.19.146.224) has joined #ceph
[0:16] * monsterzz (~monsterzz@94.19.146.224) Quit (Ping timeout: 480 seconds)
[0:16] <tab> On what basis does ceph decide when a disk is bad? Is there any script that filters logs for certain disk problem patterns?
[0:17] <tab> Does ceph then automatically remove the disk from the PG and use another disk?
[0:17] * jobewan (~jobewan@snapp.centurylink.net) Quit (Quit: Leaving)
[0:17] * JayJ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[0:22] * sreddy (~oftc-webi@32.97.110.56) Quit (Remote host closed the connection)
[0:25] * andrew__ (~oftc-webi@32.97.110.56) Quit (Remote host closed the connection)
[0:27] * alfredodeza (~alfredode@198.206.133.89) has joined #ceph
[0:29] * rendar (~I@host60-177-dynamic.8-79-r.retail.telecomitalia.it) Quit ()
[0:32] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[0:33] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[0:34] * marrusl (~mark@2604:2000:60e3:8900:99c5:57ab:ba78:1518) Quit (Remote host closed the connection)
[0:35] <carmstrong> is there a way to have rbd map not create a device? this doesn't seem to work in a container: rbd: add failed: (30) Read-only file system
[0:35] <carmstrong> only certain filesystems are writeable
[0:36] * monsterz_ (~monsterzz@94.19.146.224) Quit (Ping timeout: 480 seconds)
[0:36] * sputnik13 (~sputnik13@207.8.121.241) has joined #ceph
[0:37] * sz0 (~sz0@94.55.197.185) has joined #ceph
[0:41] * scuttlemonkey is now known as scuttle|afk
[0:45] <dmick> er....the whole point of rbdmap is to create a device
[0:45] * fsimonce (~simon@host135-17-dynamic.8-79-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[0:45] * _nitti_ (~nitti@162.222.47.218) has joined #ceph
[0:45] <dmick> you can access rbd images with librbd instead?...
[0:46] <carmstrong> didn't know about librbd - I'll check out that route
[0:46] <carmstrong> thanks
[0:46] * leseb (~leseb@81-64-215-19.rev.numericable.fr) has joined #ceph
[0:46] <carmstrong> I'd love to have a device or virtual file I can write to, but didn't know about the filesystem permissions until now
[0:47] * Eco (~Eco@99-6-86-41.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[0:47] * [1]bavila (~bavila@mail.pt.clara.net) has joined #ceph
[0:47] <carmstrong> containerizing ceph has been an interesting ride
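A minimal sketch of the userspace route dmick suggests above: rbd export/import and qemu-img go through librbd rather than the kernel client, so no /dev/rbd device or rbd.ko is needed inside the container. Pool and image names here are placeholders, and qemu-img must be built with rbd support.

    # read an image entirely in userspace via librbd
    rbd export mypool/myimage /tmp/myimage.raw
    # write a local file back into a (new) image the same way
    rbd import /tmp/myimage.raw mypool/myimage2
    # or let qemu-img talk to the cluster directly over librbd
    qemu-img convert -f raw -O raw /tmp/myimage.raw rbd:mypool/myimage3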
[0:47] * LeaChim (~LeaChim@host86-135-182-184.range86-135.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[0:48] * alram_ (~alram@38.122.20.226) has joined #ceph
[0:50] <dmick> I don't know how flexible that all is. Maybe you can symlink /dev/rbd to somewhere else or something?...
[0:50] * bandrus (~Adium@216.57.72.205) Quit (Read error: Connection reset by peer)
[0:51] * bandrus (~Adium@216.57.72.205) has joined #ceph
[0:53] * _nitti (~nitti@162.222.47.218) Quit (Ping timeout: 480 seconds)
[0:53] * bavila (~bavila@mail.pt.clara.net) Quit (Ping timeout: 480 seconds)
[0:53] * [1]bavila is now known as bavila
[0:53] * _nitti_ (~nitti@162.222.47.218) Quit (Ping timeout: 480 seconds)
[0:54] * sz0 (~sz0@94.55.197.185) Quit ()
[0:54] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[0:54] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) Quit (Quit: ...)
[0:56] * lupu (~lupu@86.107.101.214) has joined #ceph
[0:57] * diegows (~diegows@host131.181-1-236.telecom.net.ar) Quit (Read error: Operation timed out)
[0:59] * joerocklin (~joe@cpe-65-185-149-56.woh.res.rr.com) Quit (Quit: ZNC - http://znc.in)
[1:05] * [1]bavila (~bavila@mail.pt.clara.net) has joined #ceph
[1:06] * zack_dolby (~textual@p843a3d.tokynt01.ap.so-net.ne.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[1:10] * joerocklin (~joe@cpe-65-185-149-56.woh.res.rr.com) has joined #ceph
[1:11] * marrusl (~mark@2604:2000:60e3:8900:59a0:b2cb:af6b:a402) has joined #ceph
[1:11] * sputnik13 (~sputnik13@207.8.121.241) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[1:12] * sputnik13 (~sputnik13@207.8.121.241) has joined #ceph
[1:12] * sputnik13 (~sputnik13@207.8.121.241) Quit ()
[1:12] * bavila (~bavila@mail.pt.clara.net) Quit (Ping timeout: 480 seconds)
[1:12] * [1]bavila is now known as bavila
[1:13] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) Quit (Quit: Leaving.)
[1:15] * oms101 (~oms101@p20030057EA44EA00C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:18] * _nitti (~nitti@c-66-41-30-224.hsd1.mn.comcast.net) has joined #ceph
[1:23] * marrusl (~mark@2604:2000:60e3:8900:59a0:b2cb:af6b:a402) Quit (Remote host closed the connection)
[1:24] * oms101 (~oms101@p20030057EA3E6F00C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:25] * flaxy (~afx@dark.deflax.net) Quit (Quit: WeeChat 1.0)
[1:29] * Hazelesque_ (~hazel@2a03:9800:10:13::2) has joined #ceph
[1:29] * Hazelesque (~hazel@2a03:9800:10:13::2) Quit (Read error: Connection reset by peer)
[1:29] * dmsimard is now known as dmsimard_away
[1:31] * Eco (~Eco@99-6-86-41.lightspeed.sntcca.sbcglobal.net) Quit (Quit: Leaving)
[1:32] * astellwag (~astellwag@209.132.181.86) Quit (Read error: Connection reset by peer)
[1:32] * astellwag (~astellwag@209.132.181.86) has joined #ceph
[1:36] * flaxy (~afx@dark.deflax.net) has joined #ceph
[1:38] * flaxy (~afx@dark.deflax.net) Quit ()
[1:49] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[1:53] * reed (~reed@75-101-54-131.dsl.static.sonic.net) Quit (Remote host closed the connection)
[1:54] * flaxy (~afx@dark.deflax.net) has joined #ceph
[1:55] <SPACESHIP> question - if I lose so many nodes that the replica goal can't be met anymore, what happens? are all writes blocked?
[1:57] * flaxy (~afx@dark.deflax.net) Quit (Read error: No route to host)
[1:57] * flaxy (~afx@dark.deflax.net) has joined #ceph
[1:59] * zack_dolby (~textual@e0109-114-22-3-142.uqwimax.jp) has joined #ceph
[1:59] * zerick (~eocrospom@190.118.30.195) Quit (Ping timeout: 480 seconds)
[1:59] <lurbs> SPACESHIP: Depends what 'min size' is set to. See http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/
[1:59] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[2:00] <lurbs> You can set a default, and also per pool.
[2:00] <SPACESHIP> Awesome, that answers my question perfectly, I was just about to clarify
[2:00] <SPACESHIP> as I used the wrong terminology :P
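For reference, a short sketch of the per-pool knobs lurbs is pointing at; the pool name "rbd" and the values are only examples.

    # show the replica count and the minimum replicas needed to keep serving I/O
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size
    # keep 3 copies, but keep accepting writes while at least 2 copies are available
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2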
[2:08] * lofejndif (~lsqavnbok@176.10.100.227) has joined #ceph
[2:09] * dmsimard_away is now known as dmsimard
[2:10] * ircolle (~Adium@2601:1:a580:145a:4ad:3c8b:b00e:acf4) Quit (Quit: Leaving.)
[2:10] * dmsimard is now known as dmsimard_away
[2:10] * rmoe (~quassel@12.164.168.117) Quit (Read error: Operation timed out)
[2:12] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[2:16] <carmstrong> well, running the container as --privileged allowed me to get back the device creation error, but now I'm getting `rbd: add failed: (22) Invalid argument`
[2:16] <carmstrong> I'm running `rbd map $pool/$name`
[2:16] <carmstrong> tried also specifying --pool separately
[2:17] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[2:18] <carmstrong> anyone have any clues on which argument is invalid? a lot of google searching turned up various unrelated issues
[2:19] * lightspeed (~lightspee@2001:8b0:16e:1:8326:6f70:89f:8f9c) Quit (Ping timeout: 480 seconds)
[2:23] * marrusl (~mark@2604:2000:60e3:8900:2876:7efc:d1ce:3d36) has joined #ceph
[2:23] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) has joined #ceph
[2:24] * sjusthm (~sam@24-205-54-233.dhcp.gldl.ca.charter.com) Quit (Quit: Leaving.)
[2:24] <dmick> is rbd.ko loaded?
[2:25] <dmick> (I think you can tell that with lsmod | grep rbd)
[2:26] <carmstrong> it is indeed. I loaded it beforehand
[2:28] <carmstrong> interestingly, my osds aren't generating anything in the logs
[2:28] <carmstrong> monitors have lots of chatter, and I can see the osd lspools command in the logs
[2:28] * lightspeed (~lightspee@2001:8b0:16e:1:8326:6f70:89f:8f9c) has joined #ceph
[2:31] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[2:32] * xarses (~andreww@12.164.168.117) Quit (Read error: Operation timed out)
[2:32] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[2:33] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[2:37] <alram_> carmstrong: anything in dmesg?
[2:38] <carmstrong> alram_: unfortunately not
[2:39] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[2:39] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[2:43] <alram_> carmstrong: are you trying to map from a container?
[2:44] <alram_> https://lists.linuxcontainers.org/pipermail/lxc-users/2013-October/005795.html
[2:44] <carmstrong> alram_: indeed. a container running with --privileged
[2:44] <carmstrong> oh wow, great find
[2:44] <carmstrong> thanks. reading now
[2:44] <alram_> apparently you're not alone :) found a couple of msgs on ceph ML and lxc
[2:44] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[2:45] <carmstrong> well, that's definitely the issue. doesn't seem like there's any resolution :(
[2:45] * danieagle (~Daniel@179.184.165.184.static.gvt.net.br) Quit (Quit: Obrigado por Tudo! :-) inte+ :-))
[2:50] * _nitti (~nitti@c-66-41-30-224.hsd1.mn.comcast.net) Quit (Quit: Leaving...)
[2:50] * lucas1 (~Thunderbi@222.240.148.154) has joined #ceph
[2:52] <alram_> carmstrong: not too familiar with linux containers, but is there a reason why you can't map the RBD on the host and share the FS to the container?
[2:53] <carmstrong> alram_: cleanliness and portability, mostly. this is running on CoreOS, which has a lot of ro filesystems as well
[2:53] <carmstrong> containers are the preferred way to run things
[2:57] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Ping timeout: 480 seconds)
[2:58] * scuttle|afk is now known as scuttlemonkey
[2:58] <Nats> invalid argument is probably that it can't authenticate
[2:59] <Nats> ^^ carmstrong
[2:59] <carmstrong> hmm ok
[3:00] <Nats> # rbd map rbd_ssd/au000046_disk
[3:00] <Nats> rbd: add failed: (22) Invalid argument
[3:00] <Nats> in my case, its because i need --id compute in my setup
[3:00] <Nats> in your case, your /etc/ceph files are perhaps inaccessible?
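A sketch of what Nats describes: with a non-default cephx user, the client id (and, if it lives in a non-standard place, the keyring) has to be passed to rbd map explicitly. The keyring path below is an assumption.

    # defaults to client.admin and /etc/ceph/ceph.client.admin.keyring
    rbd map rbd_ssd/au000046_disk
    # with a dedicated cephx user such as client.compute
    rbd map rbd_ssd/au000046_disk --id compute --keyring /etc/ceph/ceph.client.compute.keyring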
[3:01] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[3:07] * lofejndif (~lsqavnbok@99WAAA93P.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[3:08] * alram_ (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[3:18] <carmstrong> Nats: hmm... it should have the same keyring files as the osd and monitor nodes, since they are templated the same
[3:18] * angdraug (~angdraug@12.164.168.117) Quit (Quit: Leaving)
[3:18] <carmstrong> although I'm not providing an --id flag
[3:18] * Sysadmin88 (~IceChat77@176.250.164.108) Quit (Ping timeout: 480 seconds)
[3:21] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Read error: Connection reset by peer)
[3:22] * chuffpdx__ (~chuffpdx@208.186.186.51) Quit (Read error: Connection reset by peer)
[3:22] * chuffpdx__ (~chuffpdx@208.186.186.51) has joined #ceph
[3:22] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[3:23] <joshd> carmstrong: to get more logging out of the kernel you can 'mount -t debugfs none /sys/kernel/debug', run https://raw.githubusercontent.com/ceph/ceph/master/src/script/kcon_all.sh and then try rbd map again - it'll appear in dmesg
[3:24] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[3:25] <carmstrong> joshd: mount: none is already mounted or /sys/kernel/debug busy
[3:25] <carmstrong> I can try running the scipr
[3:25] <carmstrong> script
[3:26] <joshd> yeah, if /sys/kernel/debug/dynamic_debug/control already exists it'll work (otherwise you'd need to compile your kernel with debugging enabled)
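Roughly what joshd's kcon_all.sh amounts to (a hedged sketch, not the script's exact contents): turning on dynamic debug for the kernel rbd/ceph modules so the next map attempt logs its failure reason to dmesg.

    mount -t debugfs none /sys/kernel/debug          # may already be mounted
    echo 'module rbd +p'     > /sys/kernel/debug/dynamic_debug/control
    echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
    rbd map $pool/$name                              # retry the mapping
    dmesg | tail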
[3:32] * diegows (~diegows@190.190.5.238) has joined #ceph
[3:41] * bandrus (~Adium@216.57.72.205) Quit (Quit: Leaving.)
[3:41] * bandrus (~Adium@216.57.72.205) has joined #ceph
[3:41] * bandrus (~Adium@216.57.72.205) Quit ()
[3:42] * KevinPerks (~Adium@2606:a000:80a1:1b00:24f8:39af:f63e:b4d6) Quit (Quit: Leaving.)
[3:44] <carmstrong> made that change, and now rbd create is hanging
[3:46] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[3:46] <carmstrong> ah. health HEALTH_WARN 66 pgs peering; 194 pgs stuck inactive; 194 pgs stuck unclean
[3:46] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[3:46] <carmstrong> that can't help things
[3:47] * tab (~oftc-webi@89-212-99-37.dynamic.t-2.net) Quit (Remote host closed the connection)
[3:49] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Remote host closed the connection)
[3:50] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[3:53] * adamcrume (~quassel@2601:9:6680:47:a840:2967:7e24:9028) Quit (Remote host closed the connection)
[3:57] * dmsimard_away is now known as dmsimard
[4:05] <carmstrong> ok. [ 874.254853] rbd: Error adding device 172.17.8.100:6789 name=admin,key=client.admin deis db
[4:05] <carmstrong> potentially an auth issue?
[4:06] <joshd> anything before that?
[4:06] * Alssi_ (~Alssi@lpe4.p59-icn.cdngp.net) has joined #ceph
[4:08] <Alssi_> Hi guys, any idea why http://ceph.com/ returns a 403 Forbidden for me?
[4:08] * zhaochao (~zhaochao@111.204.252.1) has joined #ceph
[4:09] * dmsimard is now known as dmsimard_away
[4:12] <dmick> Alssi_: working for me
[4:12] * Alssi_ (~Alssi@lpe4.p59-icn.cdngp.net) Quit (Read error: Connection reset by peer)
[4:13] <carmstrong> joshd: some docker-related networking, but that's about it
[4:13] * Alssi_ (~Alssi@lpe4.p59-icn.cdngp.net) has joined #ceph
[4:16] * carrot (carrot@103.6.103.83) Quit ()
[4:19] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Quit: Computer has gone to sleep.)
[4:32] * diegows (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[4:33] <joshd> carmstrong: that means it's failing very early, not even talking over the network
[4:34] <carmstrong> joshd: not a good sign :(
[4:34] <joshd> I'm suspicious it may have to do with the way auth info is passed to the kernel (maybe some extra capability is needed inside a container)
[4:34] <joshd> can you map one outside of a container with this kernel?
[4:35] <carmstrong> I'm unable to install the ceph packages in the root CoreOS machine. do you know of another way to test it?
[4:35] <joshd> which version is the kernel?
[4:37] <carmstrong> 3.15.8
[4:37] <joshd> it's worth trying in the container with cephx auth disabled ('auth supported = none' in the [global] section of /etc/ceph/ceph.conf on every node and restart the cluster)
[4:40] <carmstrong> that's just where I was headed! ok lemme try
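A sketch of the ceph.conf change joshd describes, assuming the file is /etc/ceph/ceph.conf on every node; on firefly the three split auth options are the usual spelling of the same thing. Monitors and OSDs need a restart afterwards.

    [global]
    auth supported = none
    # equivalent newer form
    auth cluster required = none
    auth service required = none
    auth client required = none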
[4:41] <absynth__> Alssi_: are you in china?
[4:42] * vbellur (~vijay@122.167.92.122) has joined #ceph
[4:44] <joshd> carmstrong: with auth disabled it's also easy to try out on the host with echo "172.17.8.100:6789 name=admin deis db" > /sys/bus/rbd/add
[4:45] <joshd> it'll show up as /dev/rbd0 if it works, and you can remove it with echo 0 > /sys/bus/rbd/remove
[4:45] * lucas1 (~Thunderbi@222.240.148.154) Quit (Quit: lucas1)
[4:48] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[4:48] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has left #ceph
[4:49] * lupu (~lupu@86.107.101.214) Quit (Quit: Leaving.)
[4:49] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) has joined #ceph
[4:50] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:50] * fmanana (~fdmanana@bl8-167-65.dsl.telepac.pt) has joined #ceph
[4:53] <carmstrong> joshd: /bin/bash: line 1: echo: write error: Invalid argument
[4:53] <carmstrong> weird - same error
[4:54] <carmstrong> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
[4:54] <carmstrong> so I should be able to write
[4:55] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Remote host closed the connection)
[4:56] <carmstrong> also, is [global] implicit in ceph.conf? I don't have section headings at all
[4:57] * fdmanana (~fdmanana@bl5-77-181.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[5:10] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[5:10] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[5:12] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[5:16] * jtaguinerd (~jtaguiner@203.215.116.66) has joined #ceph
[5:20] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[5:21] <carmstrong> joshd: no dice, even with no auth
[5:22] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Read error: Operation timed out)
[5:23] * marrusl (~mark@2604:2000:60e3:8900:2876:7efc:d1ce:3d36) Quit (Ping timeout: 480 seconds)
[5:25] * Vacuum (~vovo@i59F79388.versanet.de) has joined #ceph
[5:32] * Vacuum_ (~vovo@88.130.202.88) Quit (Ping timeout: 480 seconds)
[5:41] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[5:52] * saurabh (~saurabh@121.244.87.117) has joined #ceph
[5:52] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[5:58] * bkunal (~bkunal@121.244.87.115) has joined #ceph
[5:58] * Pedras (~Adium@50.185.218.255) has joined #ceph
[6:00] * saurabh (~saurabh@121.244.87.117) Quit (Quit: Leaving)
[6:09] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[6:15] * vbellur (~vijay@122.167.92.122) Quit (Ping timeout: 480 seconds)
[6:16] * barnim (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) Quit (Quit: Verlassend)
[6:20] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[6:21] <cooldharma06> alfredodeza: :)
[6:28] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:33] <Alssi_> absynth__: Korea more precisely
[6:36] * Jakey (uid1475@id-1475.uxbridge.irccloud.com) Quit (Quit: Connection closed for inactivity)
[6:38] <Alssi_> absynth__: well, I was able to access the web site after redirecting my traffic through our US office. Seems like Asian IPs are blocked, sadly.
[6:46] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Remote host closed the connection)
[6:47] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[6:48] * rdas (~rdas@110.227.44.34) has joined #ceph
[6:59] * lucas1 (~Thunderbi@222.247.57.50) has joined #ceph
[7:01] * lucas1 (~Thunderbi@222.247.57.50) Quit ()
[7:05] * Concubidated (~Adium@66-87-153-188.pools.spcsdns.net) has joined #ceph
[7:13] * vbellur (~vijay@121.244.87.117) has joined #ceph
[7:13] * Concubidated (~Adium@66-87-153-188.pools.spcsdns.net) Quit (Quit: Leaving.)
[7:14] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) has joined #ceph
[7:18] <cooldharma06> hi, i am a newbie to ceph and i have some doubts
[7:20] * AfC (~andrew@customer-hotspot.esshotell.se) has joined #ceph
[7:21] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[7:29] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:42] <cooldharma06> how do i run ceph-deploy install or new? it runs fine as the root user but not as another user.
[7:45] * lucas1 (~Thunderbi@222.240.148.154) has joined #ceph
[7:47] * michalefty (~micha@p20030071CE792547B814B40FBA00F8BD.dip0.t-ipconnect.de) has joined #ceph
[7:48] * squisher (~squisher@2601:0:580:8be:3285:a9ff:fe9c:4b04) Quit (Quit: Leaving)
[7:51] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[7:53] * ashishchandra (~ashish@49.32.0.239) has joined #ceph
[7:54] * ashishchandra (~ashish@49.32.0.239) Quit ()
[7:56] * ashishchandra (~ashish@49.32.0.239) has joined #ceph
[7:56] * ashishchandra (~ashish@49.32.0.239) Quit ()
[7:57] * ashishchandra (~ashish@49.32.0.239) has joined #ceph
[7:57] * DrewBeer (~DrewBeer@216.152.240.203) has joined #ceph
[7:57] * ashishchandra (~ashish@49.32.0.239) Quit ()
[7:57] * lalatenduM (~lalatendu@121.244.87.117) has joined #ceph
[7:58] * ashishchandra (~ashish@49.32.0.239) has joined #ceph
[7:58] * ashishchandra (~ashish@49.32.0.239) Quit ()
[7:58] <DrewBeer> so i just got a new cluster up, mapped an rbd, formatted and mounted it, then ran a speed test. for about the first 10 seconds it seemed to be writing; for the last hour it has just hung
[7:59] <skullone> 9/j ovs
[7:59] <DrewBeer> i can't kill the dd process, or bonnie++, ceph -w doesn't show anything being written or ops
[7:59] * ashishchandra (~ashish@49.32.0.239) has joined #ceph
[7:59] * ashishchandra (~ashish@49.32.0.239) Quit ()
[7:59] * ashishchandra (~ashish@49.32.0.239) has joined #ceph
[8:00] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[8:07] * saurabh (~saurabh@121.244.87.117) has joined #ceph
[8:08] * peedu (~peedu@170.91.235.80.dyn.estpak.ee) has joined #ceph
[8:08] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[8:12] * lcavassa (~lcavassa@89.184.114.246) has joined #ceph
[8:16] * peedu_ (~peedu@185.46.20.35) has joined #ceph
[8:16] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) has joined #ceph
[8:22] * peedu (~peedu@170.91.235.80.dyn.estpak.ee) Quit (Ping timeout: 480 seconds)
[8:24] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) Quit (Quit: Leaving.)
[8:24] * sleinen (~Adium@2001:620:0:2d:7ed1:c3ff:fedc:3223) has joined #ceph
[8:28] * b0e (~aledermue@213.95.25.82) has joined #ceph
[8:28] * zhaozhiming (~zhaozhimi@192.200.151.151) has joined #ceph
[8:28] * thomnico (~thomnico@2a01:e35:8b41:120:9456:1a39:bdaf:324a) has joined #ceph
[8:29] * true (~antrue@2a02:6b8:0:401:14f7:cca2:4631:98ac) Quit (Read error: Connection timed out)
[8:29] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) Quit (Read error: Connection reset by peer)
[8:31] * true (~antrue@2a02:6b8:0:401:ddb7:ca51:b50e:33b5) has joined #ceph
[8:31] * AfC (~andrew@customer-hotspot.esshotell.se) Quit (Quit: Leaving.)
[8:32] * ccourtaut (~ccourtaut@2001:41d0:2:4a25::1) Quit (Ping timeout: 480 seconds)
[8:33] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Quit: Leaving.)
[8:46] * peedu (~peedu@170.91.235.80.dyn.estpak.ee) has joined #ceph
[8:48] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[8:49] * linuxkidd (~linuxkidd@cpe-066-057-017-151.nc.res.rr.com) Quit (Remote host closed the connection)
[8:50] * rendar (~I@95.234.176.198) has joined #ceph
[8:53] * peedu_ (~peedu@185.46.20.35) Quit (Ping timeout: 480 seconds)
[8:56] * hybrid512 (~walid@195.200.167.70) has joined #ceph
[9:01] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[9:01] * garphy`aw is now known as garphy
[9:02] * ccourtaut (~ccourtaut@2001:41d0:2:4a25::1) has joined #ceph
[9:03] * analbeard (~shw@support.memset.com) has joined #ceph
[9:08] * zerick (~eocrospom@190.118.30.195) has joined #ceph
[9:08] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Ping timeout: 480 seconds)
[9:08] * Nacer (~Nacer@252-87-190-213.intermediasud.com) has joined #ceph
[9:09] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[9:11] * eqhmcow (~eqhmcow@adsl-98-69-161-166.rmo.bellsouth.net) Quit (Ping timeout: 480 seconds)
[9:13] * peedu_ (~peedu@185.46.20.35) has joined #ceph
[9:14] * Jakey (uid1475@id-1475.uxbridge.irccloud.com) has joined #ceph
[9:15] * eqhmcow (~eqhmcow@adsl-98-69-161-166.rmo.bellsouth.net) has joined #ceph
[9:18] * monsterzz (~monsterzz@94.19.146.224) has joined #ceph
[9:19] * peedu (~peedu@170.91.235.80.dyn.estpak.ee) Quit (Ping timeout: 480 seconds)
[9:20] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[9:22] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[9:22] <mgarcesMZ> morning
[9:23] * TMM (~hp@75.101.56.247) has joined #ceph
[9:23] * cok (~chk@2a02:2350:18:1012:fc9e:8435:8a3a:2505) has joined #ceph
[9:27] * zhaozhiming (~zhaozhimi@192.200.151.151) Quit (Quit: Computer has gone to sleep.)
[9:29] * rturk|afk (~rturk@nat-pool-rdu-t.redhat.com) Quit (Quit: Coyote finally caught me)
[9:29] * scuttlemonkey (~scuttle@nat-pool-rdu-t.redhat.com) Quit (Quit: Coyote finally caught me)
[9:30] * monsterzz (~monsterzz@94.19.146.224) Quit (Ping timeout: 480 seconds)
[9:31] * dgurtner (~dgurtner@249-236.197-178.cust.bluewin.ch) has joined #ceph
[9:35] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Quit: mgarcesMZ)
[9:35] * fsimonce (~simon@host135-17-dynamic.8-79-r.retail.telecomitalia.it) has joined #ceph
[9:44] * thomnico (~thomnico@2a01:e35:8b41:120:9456:1a39:bdaf:324a) Quit (Quit: Ex-Chat)
[9:45] * mathias (~mathias@ip-109-47-184-161.web.vodafone.de) has joined #ceph
[9:49] * steki (~steki@91.195.39.5) has joined #ceph
[9:50] * thomnico (~thomnico@2a01:e35:8b41:120:d4d5:4c7d:6707:1912) has joined #ceph
[9:54] * thomnico (~thomnico@2a01:e35:8b41:120:d4d5:4c7d:6707:1912) Quit (Remote host closed the connection)
[9:54] * rdas (~rdas@110.227.44.34) Quit (Quit: Leaving)
[9:55] * swizgard (~swizgard@gate.gxp-brain.fta-berlin.de) has joined #ceph
[9:55] * thomnico (~thomnico@2a01:e35:8b41:120:d4d5:4c7d:6707:1912) has joined #ceph
[9:57] * hybrid512 (~walid@195.200.167.70) Quit (Quit: Leaving.)
[9:57] * garphy is now known as garphy`aw
[9:58] * hybrid512 (~walid@195.200.167.70) has joined #ceph
[9:59] * jtang_ (~jtang@80.111.83.231) has joined #ceph
[10:00] * branto (~borix@ip-213-220-214-245.net.upcbroadband.cz) has joined #ceph
[10:01] * freire_ (~freire@186.202.170.145) has joined #ceph
[10:02] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[10:02] * steki (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[10:02] * RameshN (~rnachimu@121.244.87.117) has joined #ceph
[10:04] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Read error: Operation timed out)
[10:04] * garphy`aw is now known as garphy
[10:06] * freire (~freire@186.202.170.145) Quit (Ping timeout: 480 seconds)
[10:07] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[10:08] * mgarcesMZ (~mgarces@5.206.228.5) Quit ()
[10:09] * zack_dolby (~textual@e0109-114-22-3-142.uqwimax.jp) Quit (Ping timeout: 480 seconds)
[10:09] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[10:15] * zack_dolby (~textual@e0109-114-22-3-142.uqwimax.jp) has joined #ceph
[10:21] * steki (~steki@91.195.39.5) has joined #ceph
[10:23] * cok (~chk@2a02:2350:18:1012:fc9e:8435:8a3a:2505) Quit (Quit: Leaving.)
[10:25] * blackmen (~Ajit@121.244.87.115) has joined #ceph
[10:26] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[10:34] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[10:34] * rotbeard (~redbeard@b2b-94-79-138-170.unitymedia.biz) Quit (Ping timeout: 480 seconds)
[10:35] * linjan (~linjan@176.195.196.165) has joined #ceph
[10:36] * lucas1 (~Thunderbi@222.240.148.154) Quit (Quit: lucas1)
[10:36] <mgarcesMZ> ey guys
[10:36] <mgarcesMZ> I have a radosgw, and I'm testing it with python using swift
[10:36] <mgarcesMZ> I had a moron who created a few buckets, for example: "\/test\/0bda7298a1d03c9fc6a5425a05c8941f"
[10:36] <mgarcesMZ> in python, it says it's an invalid name
[10:37] <mgarcesMZ> if I try: radosgw-admin bucket rm "\/test\/0bda7298a1d03c9fc6a5425a05c8941f" --purge-data
[10:37] <mgarcesMZ> it does not remove the container/bucket
[10:37] <mgarcesMZ> can you help me?
[10:37] <mgarcesMZ> even the normally named containers do not disappear when I perform the bucket rm
[10:42] * linjan (~linjan@176.195.196.165) Quit (Remote host closed the connection)
[10:42] <mgarcesMZ> I tried removing a non existing container, and no error is output
[10:45] * steveeJ (~junky@HSI-KBW-085-216-022-246.hsi.kabelbw.de) has joined #ceph
[10:47] * darkling (~hrm@00012bd0.user.oftc.net) has joined #ceph
[10:55] * linjan (~linjan@176.195.196.165) has joined #ceph
[10:58] * zack_dolby (~textual@e0109-114-22-3-142.uqwimax.jp) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[11:00] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[11:02] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[11:05] * jtang_ (~jtang@80.111.83.231) Quit (Remote host closed the connection)
[11:06] * jtang_ (~jtang@80.111.83.231) has joined #ceph
[11:10] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Quit: mgarcesMZ)
[11:10] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[11:15] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[11:24] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[11:26] * ashishchandra (~ashish@49.32.0.239) Quit (Quit: Leaving)
[11:33] * rdas (~rdas@110.227.40.203) has joined #ceph
[11:33] * dgurtner (~dgurtner@249-236.197-178.cust.bluewin.ch) Quit (Read error: Connection reset by peer)
[11:33] * cok (~chk@2a02:2350:1:1203:2df8:3fa0:a7bf:b57e) has joined #ceph
[11:34] * davidz (~Adium@cpe-23-242-12-23.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[11:41] * hyperbaba (~hyperbaba@mw-at-rt-nat.mediaworksit.net) has joined #ceph
[11:42] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has joined #ceph
[11:42] * ChanServ sets mode +v andreask
[11:46] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[11:48] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[11:49] * dgurtner (~dgurtner@249-236.197-178.cust.bluewin.ch) has joined #ceph
[11:53] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[11:58] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[12:03] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[12:03] * dmsimard_away is now known as dmsimard
[12:05] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[12:06] * dmsimard is now known as dmsimard_away
[12:11] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[12:13] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) Quit (Quit: Leaving)
[12:15] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) Quit (Quit: Splunk> Be an IT superhero. Go home early.)
[12:17] * madkiss (~madkiss@089144197096.atnat0006.highway.a1.net) has joined #ceph
[12:25] * dgurtner (~dgurtner@249-236.197-178.cust.bluewin.ch) Quit (Read error: Connection reset by peer)
[12:26] * lucas1 (~Thunderbi@218.76.25.66) has joined #ceph
[12:27] * lkoranda (~lkoranda@nat-pool-brq-t.redhat.com) has joined #ceph
[12:29] * dgurtner (~dgurtner@249-236.197-178.cust.bluewin.ch) has joined #ceph
[12:31] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Ping timeout: 480 seconds)
[12:36] * dmsimard_away is now known as dmsimard
[12:38] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Quit: mgarcesMZ)
[12:46] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[12:47] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) has joined #ceph
[12:47] * ChanServ sets mode +v andreask
[12:53] * lucas1 (~Thunderbi@218.76.25.66) Quit (Ping timeout: 480 seconds)
[12:54] * cok (~chk@2a02:2350:1:1203:2df8:3fa0:a7bf:b57e) Quit (Quit: Leaving.)
[12:55] <bitserker> hi everyone! does anybody know how i can test whether my replication network works? when i launch a ceph bench i cannot see traffic on the interface associated with this network
[12:56] <bitserker> i have this in [global]
[12:56] <bitserker> cluster network = 192.168.201.0/24
[12:56] <bitserker> public network = 192.168.200.0/24
[12:57] <darkling> bitserker: I had that too on our system. Turns out that the network traffic tool I was using didn't work on that network for some reason.
[12:57] <darkling> I showed myself that there was something happening by running watch ifconfig -a
[12:57] <bitserker> mmm
[12:57] <bitserker> darkling: i use nmon
[12:58] <darkling> Our cluster network is InfiniBand, so possibly just caused by weird hardware.
[12:58] <bitserker> darkling: my network is a typical 1gb with bonding type 4
[12:58] <singler> bitserker: nuke an OSD
[12:59] <singler> or write data
[12:59] <bitserker> singler: yes i use the tool for benchmark
[13:00] <singler> well, there should be activity on that network then
[13:00] <singler> also you can check with tcpdump
[13:01] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[13:01] <bitserker> singler: it writes a lot of data. in previous versions of ceph i could see both networks working. now i don't see traffic on the replication network when i launch the benchmark
[13:02] <bitserker> singler: yes. the difference is that in the past i configured everything by hand and now i use ceph-deploy
[13:04] <bitserker> singler: tnx for your help anyway
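Two quick checks for bitserker's situation (a sketch; the interface name bond0 is an assumption): confirm the OSDs actually registered a cluster-network address, then watch that subnet directly, as singler suggests.

    # each OSD line shows its public and cluster addresses
    ceph osd dump | grep '^osd'
    # watch replication traffic on the cluster-network interface
    tcpdump -ni bond0 net 192.168.201.0/24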
[13:06] * vbellur (~vijay@121.244.87.117) Quit (Ping timeout: 480 seconds)
[13:07] * ashishchandra (~ashish@49.32.0.239) has joined #ceph
[13:07] * andreask (~andreask@h081217017238.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[13:09] * peedu (~peedu@170.91.235.80.dyn.estpak.ee) has joined #ceph
[13:09] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[13:09] * JayJ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit (Quit: Computer has gone to sleep.)
[13:12] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) has joined #ceph
[13:12] * vbellur (~vijay@121.244.87.124) has joined #ceph
[13:13] * davidz (~Adium@cpe-23-242-12-23.socal.res.rr.com) has joined #ceph
[13:15] * danieljh (~daniel@0001b4e9.user.oftc.net) Quit (Quit: Lost terminal)
[13:15] * JayJ_ (~jayj@pool-96-233-113-153.bstnma.fios.verizon.net) Quit ()
[13:16] * peedu_ (~peedu@185.46.20.35) Quit (Ping timeout: 480 seconds)
[13:17] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[13:23] * dneary (~dneary@nat-pool-bos-u.redhat.com) has joined #ceph
[13:24] * zhaochao (~zhaochao@111.204.252.1) has left #ceph
[13:31] * dis is now known as Guest1435
[13:32] * dis (~dis@109.110.67.220) has joined #ceph
[13:33] * Guest1435 (~dis@109.110.67.48) Quit (Ping timeout: 480 seconds)
[13:36] * dgurtner (~dgurtner@249-236.197-178.cust.bluewin.ch) Quit (Ping timeout: 480 seconds)
[13:40] * stefano (~stefano@dragas.org) has joined #ceph
[13:41] * danieljh (~daniel@0001b4e9.user.oftc.net) has joined #ceph
[13:43] <stefano> hello everybody! i've set up a nice ceph environment. A storage pool composed of "slow" disks and an ssd writeback tier cache. Everything works, objects are cached. What I don't get is how objects are evicted. I've set cache_target_full_ratio at 0.5 and cache_target_dirty_ratio at 0.4 but my ssds are 70% full and nothing gets evicted unless I force it manually. What's wrong here?
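One thing worth checking here (an assumption, not confirmed in the channel): the cache tiering agent interprets cache_target_dirty_ratio and cache_target_full_ratio relative to target_max_bytes/target_max_objects, so if no absolute target is set on the cache pool the ratios have nothing to act on. A sketch, with "ssd-cache" as a placeholder pool name:

    # give the ratios an absolute size to be a percentage of (example: ~100 GB)
    ceph osd pool set ssd-cache target_max_bytes 100000000000
    ceph osd pool set ssd-cache cache_target_dirty_ratio 0.4
    ceph osd pool set ssd-cache cache_target_full_ratio 0.5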
[13:49] * dgurtner (~dgurtner@217.192.177.51) has joined #ceph
[13:51] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[13:52] * peedu_ (~peedu@185.46.20.35) has joined #ceph
[13:55] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[13:57] * diegows (~diegows@190.190.5.238) has joined #ceph
[13:58] * bjornar (~bjornar@ns3.uniweb.no) has joined #ceph
[13:58] * peedu (~peedu@170.91.235.80.dyn.estpak.ee) Quit (Ping timeout: 480 seconds)
[13:58] * vbellur (~vijay@121.244.87.124) Quit (Ping timeout: 480 seconds)
[14:01] * linjan (~linjan@176.195.196.165) Quit (Ping timeout: 480 seconds)
[14:04] * mathias (~mathias@ip-109-47-184-161.web.vodafone.de) Quit (Quit: leaving)
[14:07] * ksingh (~Adium@2001:708:10:10:2472:baad:91e8:4c34) has joined #ceph
[14:08] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) has joined #ceph
[14:10] * fghaas (~florian@213162068042.public.t-mobile.at) has joined #ceph
[14:10] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[14:12] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) has joined #ceph
[14:12] * linjan (~linjan@176.195.196.165) has joined #ceph
[14:13] * fghaas (~florian@213162068042.public.t-mobile.at) Quit ()
[14:21] * hyperbaba (~hyperbaba@mw-at-rt-nat.mediaworksit.net) Quit (Ping timeout: 480 seconds)
[14:23] * dis is now known as Guest1438
[14:23] * dis (~dis@109.110.66.113) has joined #ceph
[14:23] * rwheeler (~rwheeler@173.48.207.57) has joined #ceph
[14:23] * michalefty (~micha@p20030071CE792547B814B40FBA00F8BD.dip0.t-ipconnect.de) has left #ceph
[14:24] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:24] * rwheeler (~rwheeler@173.48.207.57) Quit ()
[14:24] * marrusl (~mark@2604:2000:60e3:8900:508a:879a:1b12:efac) has joined #ceph
[14:24] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:25] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:25] * Guest1438 (~dis@109.110.67.220) Quit (Ping timeout: 480 seconds)
[14:25] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:25] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:26] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Quit: mgarcesMZ)
[14:26] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:26] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:27] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:27] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:28] * lalatenduM (~lalatendu@121.244.87.117) Quit (Quit: Leaving)
[14:28] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:28] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:29] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:29] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:29] * ganders (~root@200-127-158-54.net.prima.net.ar) has joined #ceph
[14:29] * rwheeler (~rwheeler@173.48.207.57) has joined #ceph
[14:29] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:30] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[14:30] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:31] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:31] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:31] <am88b> I see that 'ceph-deploy osd prepare' does not complain if preparing disk with journal partition that is already being used by another OSD. Also the cluster seems to be running just fine after the fact even though "ls -l /var/lib/ceph/osd/*/journal" reveals that the same partition is used as journal for many OSD's. How can this be? Shouldn't there be serious data corruption?
[14:31] * saurabh (~saurabh@121.244.87.117) Quit (Quit: Leaving)
[14:31] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:32] * lalatenduM (~lalatendu@121.244.87.117) has joined #ceph
[14:32] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:33] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[14:33] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:33] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:33] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Remote host closed the connection)
[14:33] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:34] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:35] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) has joined #ceph
[14:35] * monsterzz (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Read error: Connection reset by peer)
[14:35] * markbby (~Adium@168.94.245.1) has joined #ceph
[14:35] * marrusl (~mark@2604:2000:60e3:8900:508a:879a:1b12:efac) Quit (Ping timeout: 480 seconds)
[14:38] * madkiss (~madkiss@089144197096.atnat0006.highway.a1.net) Quit (Quit: Leaving.)
[14:39] * KevinPerks (~Adium@2606:a000:80a1:1b00:80d5:8f07:a8ea:5c4d) has joined #ceph
[14:40] * linjan (~linjan@176.195.196.165) Quit (Ping timeout: 480 seconds)
[14:42] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[14:45] * marrusl (~mark@2604:2000:60e3:8900:dcad:f09:3e35:1181) has joined #ceph
[14:46] * bkunal (~bkunal@121.244.87.115) Quit (Ping timeout: 480 seconds)
[14:46] * tab (~oftc-webi@194.249.247.164) has joined #ceph
[14:50] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[14:50] * michalefty (~micha@p20030071CE022739B814B40FBA00F8BD.dip0.t-ipconnect.de) has joined #ceph
[14:51] * cok (~chk@2a02:2350:18:1012:74fa:6b67:1901:4f0b) has joined #ceph
[14:56] * darkling (~hrm@00012bd0.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:57] * madkiss (~madkiss@089144197096.atnat0006.highway.a1.net) has joined #ceph
[14:57] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[14:58] * JayJ_ (~jayj@157.130.21.226) has joined #ceph
[14:59] * vbellur (~vijay@122.167.217.129) has joined #ceph
[15:00] <ganders> hi to all, i have the following crush map: http://pastebin.com/raw.php?i=eQsfSJBr
[15:01] <ganders> and when trying to compile it with (crushtool -c /tmp/crush..txt -o /tmp/crush...map) i'm getting this error: ":143 error: parse error at ''"
[15:02] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has joined #ceph
[15:02] * t0rn (~ssullivan@c-68-62-1-186.hsd1.mi.comcast.net) has left #ceph
[15:02] <ganders> i went to line 143 and found nothing wrong with it
[15:02] * true (~antrue@2a02:6b8:0:401:ddb7:ca51:b50e:33b5) Quit (Read error: Connection timed out)
[15:02] <ganders> maybe im missing some parameter?
[15:04] * scuttle|afk (~scuttle@nat-pool-rdu-t.redhat.com) has joined #ceph
[15:04] * scuttle|afk is now known as scuttlemonkey
[15:04] * true (~antrue@2a02:6b8:0:401:ddb7:ca51:b50e:33b5) has joined #ceph
[15:05] * madkiss (~madkiss@089144197096.atnat0006.highway.a1.net) Quit (Ping timeout: 480 seconds)
[15:06] * linuxkidd (~linuxkidd@cpe-066-057-017-151.nc.res.rr.com) has joined #ceph
[15:08] * madkiss (~madkiss@089144197096.atnat0006.highway.a1.net) has joined #ceph
[15:12] <steveeJ> ganders: not so smart to give us the raw link ;)
[15:12] <ganders> yeah sorry, i noticed that once i'd already sent it :P
[15:13] <ganders> anyway i found the error :P
[15:13] <steveeJ> ganders: in your fusionio ruleset, you "choose" default
[15:13] <steveeJ> that should be take
[15:14] <ganders> steveeJ, you mean to remove the "choose" default line from the fusionio ruleset?
[15:15] <ganders> yeah that was the error
[15:15] * michalefty (~micha@p20030071CE022739B814B40FBA00F8BD.dip0.t-ipconnect.de) Quit (Quit: Leaving.)
[15:15] <steveeJ> ganders: no, step choose needs more arguments because it wants to actually select a type. when you want to select a bucket group, you have to use "step take <bucket>"
[15:15] <steveeJ> see http://ceph.com/docs/master/rados/operations/crush-map/
[15:16] * shang (~ShangWu@111-83-90-44.EMOME-IP.hinet.net) has joined #ceph
[15:17] * shang (~ShangWu@111-83-90-44.EMOME-IP.hinet.net) Quit ()
[15:18] * simulx2 (~simulx@66-194-114-178.static.twtelecom.net) Quit (Read error: Connection reset by peer)
[15:18] * madkiss (~madkiss@089144197096.atnat0006.highway.a1.net) Quit (Ping timeout: 480 seconds)
[15:22] * ashishchandra (~ashish@49.32.0.239) Quit (Quit: Leaving)
[15:23] * simulx (~simulx@66-194-114-178.static.twtelecom.net) has joined #ceph
[15:23] * brad_mssw (~brad@shop.monetra.com) has joined #ceph
[15:24] * sz0 (~sz0@94.55.197.185) has joined #ceph
[15:24] <ganders> steveeJ: thx! my mistake, it was step take default and then the step chooseleaf firstn -1 type host
[15:25] <steveeJ> yep!
[15:25] <steveeJ> does it compile after correcting it?
[15:25] <ganders> yes :), thanks! I want to guarantee that at least one copy always ends up in the fusionio root on the cephosd01 and cephosd02 osds
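For reference, the general shape of a replicated rule per the crush-map doc steveeJ linked; the rule name, ruleset number, and sizes are placeholders. "step take" picks the bucket to start from (default, or a dedicated root such as ganders' fusionio root), and "step chooseleaf" then descends it.

    rule fusionio {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }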
[15:26] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[15:27] <steveeJ> ganders: have you heard of cache tiering yet? personally i don't have the hardware to make use of it, but it could be interesting for you
[15:27] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[15:27] * yanzheng (~zhyan@171.221.139.239) Quit ()
[15:28] * sz0 (~sz0@94.55.197.185) Quit ()
[15:28] <stefano> steveeJ: it's great, but i'm not able to get it flushed
[15:28] <steveeJ> stefano: you mean with the evict-flush(..) command?
[15:29] <ganders> steveeJ: yes, I've heard about that, and we are going to test it in another cluster. in this particular case we have 4 OSD servers, 2 have a Fusion-io card for journals, and the other 2 have ramdisks for journals, and we are getting really good performance numbers
[15:29] <ganders> but the thing is that we need at least 1 or 2 OSD servers to be able to hold all the pools' data, so if power goes down we could rebuild the 2 OSD servers (ramdisk journals)
[15:30] * simulx (~simulx@66-194-114-178.static.twtelecom.net) Quit (Quit: Nettalk6 - www.ntalk.de)
[15:30] <steveeJ> ganders: your OSDs on the fusion-io have to be the same size as your default OSDs, right?
[15:31] <ganders> the OSDs are actually on SAS disks, only the journals are on the fusion-io card
[15:31] <steveeJ> ganders: oh, so the name fusion-io for the ruleset just suggests the journals
[15:31] <ganders> steveeJ: yes
[15:32] <ganders> on ceph.conf file we have defined the SAS disks to be used as OSD data
[15:32] <steveeJ> ganders: I'm always jealous of you guys in here :)
[15:33] <ganders> but since losing the journal means losing the OSDs, we need to guarantee at least one OSD server is able to reconstruct the rest :)
[15:33] <ganders> steveeJ: hehe :), we are trying to get this thing working since we have really nice perf numbers on small and large writes/reads
[15:34] <stefano> steveeJ: the command is ok, it just doesn't seem to flush when reaching some limits I set
[15:35] <stefano> steveeJ: I've set cache_target_full_ratio at 0.5 and cache_target_dirty_ratio at 0.4 but my ssds are 70% full and nothing gets evicted unless I force it manually.
[15:35] * JayJ_ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[15:36] * yanzheng (~zhyan@171.221.139.239) has joined #ceph
[15:38] * Gill (~Gill@static-72-80-16-227.nycmny.fios.verizon.net) has joined #ceph
[15:38] <Gill> Hey guys. I just found this link - has anyone tried this out? got it working?
[15:38] <Gill> http://ceph.com/docs/master/radosgw/federated-config/#multi-site-data-replication
[15:44] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[15:48] * fghaas (~florian@217.116.189.90) has joined #ceph
[15:48] <JayJ__> I'm trying to import a VMDK data volume into a Cinder volume at create time. Ceph is the cinder backend. Is there a way I can do this without uploading the data image into glance and then creating a cinder volume using glance as the source?
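Not answered in the channel; as a hedged sketch of one possible route (file, pool, and image names are placeholders, and how the result is then surfaced to Cinder depends on the OpenStack release), the VMDK can at least be converted to raw and pushed into the Ceph pool backing Cinder:

    qemu-img convert -f vmdk -O raw data-disk.vmdk data-disk.raw
    rbd import data-disk.raw volumes/imported-data-disk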
[15:49] <flaf> Hi, I'm looking for docs on installing an mds manually. Do you have a link please?
[15:49] * freire (~freire@191.252.9.133) has joined #ceph
[15:50] * freire (~freire@191.252.9.133) Quit (Remote host closed the connection)
[15:50] * freire (~freire@186.202.170.145) has joined #ceph
[15:50] * freire_ (~freire@186.202.170.145) Quit (Read error: Connection reset by peer)
[15:51] * freire_ (~freire@191.252.9.133) has joined #ceph
[15:52] * portante (~portante@nat-pool-bos-t.redhat.com) Quit (Quit: ZNC - http://znc.in)
[15:52] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:58] * freire (~freire@186.202.170.145) Quit (Ping timeout: 480 seconds)
[16:00] * freire_ (~freire@191.252.9.133) Quit (Ping timeout: 480 seconds)
[16:01] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Quit: Leaving)
[16:02] * gregsfortytwo1 (~Adium@cpe-107-184-64-126.socal.res.rr.com) Quit (Quit: Leaving.)
[16:05] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[16:05] * analbeard1 (~shw@5.153.255.226) has joined #ceph
[16:05] <ganders> I have 640 pgs stuck unclean, 640 active+remapped, but that number never gets lower.. i mean, it's like it has been frozen there for a long time. also, I've queried one of the pgs and the recovery_state seems to be ok: http://pastebin.com/raw.php?i=m9aZazHx
[16:06] <ganders> any ideas?
[16:07] * peedu_ (~peedu@185.46.20.35) Quit (Ping timeout: 480 seconds)
[16:10] * RameshN (~rnachimu@121.244.87.117) Quit (Ping timeout: 480 seconds)
[16:11] * analbeard (~shw@support.memset.com) Quit (Ping timeout: 480 seconds)
[16:12] * bgardner_ is now known as millsu2
[16:13] <Gugge-47527> ganders: "ceph pg dump" and "ceph osd tree"
[16:16] <devicenull> do the file names in /var/lib/ceph/mon/ceph-x/store.db have any meaning?
[16:16] <devicenull> trying to determine why I have one mon out of sync, strace shows the file names there slowly incrementing
[16:16] <devicenull> and I checked another mon, the filenames are about 20k higher
[16:18] <ganders> Gugge-47527: http://pastebin.com/raw.php?i=BKpe37C9
[16:22] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[16:22] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:24] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[16:24] * portante (~portante@nat-pool-bos-t.redhat.com) has joined #ceph
[16:25] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[16:41] * JayJ__ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[16:43] <ganders> Gugge-47527: got it resolved
[16:45] * markbby1 (~Adium@168.94.245.2) has joined #ceph
[16:46] * zerick (~eocrospom@190.118.30.195) Quit (Ping timeout: 480 seconds)
[16:46] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[16:47] * markbby (~Adium@168.94.245.1) Quit (Remote host closed the connection)
[16:47] * cok (~chk@2a02:2350:18:1012:74fa:6b67:1901:4f0b) has left #ceph
[16:49] * monsterz_ (~monsterzz@77.88.2.43-spb.dhcp.yndx.net) Quit (Ping timeout: 480 seconds)
[16:51] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[16:51] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit ()
[16:51] * analbeard1 (~shw@5.153.255.226) Quit (Ping timeout: 480 seconds)
[16:51] * lalatenduM (~lalatendu@121.244.87.117) Quit (Quit: Leaving)
[16:54] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[16:55] * fghaas (~florian@217.116.189.90) Quit (Quit: Leaving.)
[16:56] * blackmen (~Ajit@121.244.87.115) Quit (Quit: Leaving)
[16:56] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[16:56] * blackmen (~Ajit@121.244.87.115) has joined #ceph
[16:58] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit ()
[16:58] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[16:59] * JayJ__ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[17:02] * lalatenduM (~lalatendu@121.244.87.117) has joined #ceph
[17:04] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[17:04] * schegi (~oftc-webi@maaswestcloud.uni-koblenz.de) has joined #ceph
[17:06] * rkdemon (~rkdemon@pool-71-244-62-208.dllstx.fios.verizon.net) has joined #ceph
[17:06] * jobewan (~jobewan@snapp.centurylink.net) has joined #ceph
[17:07] <lalatenduM> scuttlemonkey, ping
[17:08] * JayJ__ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[17:09] <schegi> i got some problems with a fresh ceph deployment. used ceph-deploy on ubuntu 14.04 and still run into the "ERROR: missing keyring" issue whenever i call ceph (as user or superuser). All keys are in place and readable. I am a bit puzzled right now. All logs state that the cluster is up and running.
[17:11] * madkiss (~madkiss@chello080108036100.31.11.vie.surfer.at) has joined #ceph
[17:12] * fghaas (~florian@188.118.222.14) has joined #ceph
[17:14] * marrusl (~mark@2604:2000:60e3:8900:dcad:f09:3e35:1181) Quit (Remote host closed the connection)
[17:17] * amospalla (~amospalla@0001a39c.user.oftc.net) Quit (Quit: WeeChat 1.1-dev)
[17:18] <schegi> ok, got it, but strange behaviour. I deployed a named cluster, and now if i run ceph -c /<pathtoconfig> -w i get the missing keyring error. But if i run ceph --cluster <clustername> everything works fine
[17:18] * alfredodeza (~alfredode@198.206.133.89) has left #ceph
[17:18] * amospalla (~amospalla@0001a39c.user.oftc.net) has joined #ceph
[17:18] <schegi> is this intended behaviour?
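A plausible explanation (an assumption, not confirmed in the channel): the default keyring search path is built from the cluster name, so passing only -c leaves $cluster at "ceph" and the client looks for /etc/ceph/ceph.client.admin.keyring, which a named cluster does not have. A sketch with "mycluster" as a placeholder:

    ceph -c /etc/ceph/mycluster.conf -w        # fails: looks for ceph.client.admin.keyring
    ceph --cluster mycluster -w                # works: looks for mycluster.client.admin.keyring
    # or point at the keyring explicitly
    ceph -c /etc/ceph/mycluster.conf --keyring /etc/ceph/mycluster.client.admin.keyring -w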
[17:18] * marrusl (~mark@2604:2000:60e3:8900:d97b:659a:c316:3f84) has joined #ceph
[17:20] * ircolle (~Adium@rrcs-74-62-57-62.west.biz.rr.com) has joined #ceph
[17:26] * dgurtner (~dgurtner@217.192.177.51) Quit (Ping timeout: 480 seconds)
[17:26] <scuttlemonkey> lalatenduM: hey
[17:27] <lalatenduM> scuttlemonkey, hello, did you my yesterday's msg to u? regarding building ceph for sig
[17:31] <lalatenduM> s/did you/did you get/
[17:31] <kraken> lalatenduM meant to say: scuttlemonkey, hello, did you get my yesterday's msg to u? regarding building ceph for sig
[17:31] <scuttlemonkey> lalatenduM: yeah, got it. thanks :)
[17:33] <lalatenduM> scuttlemonkey, so as of now it would be pretty straightforward. If you want, I can build them for you.. it would be just a couple of commands
[17:38] <scuttlemonkey> lalatenduM: sure, in the interests of getting it done (correctly) for our first push, it would be great if you could do that
[17:39] <lalatenduM> scuttlemonkey, cool, will do it then. The question I have is: are the versions correct
[17:39] <lalatenduM> i.e. ceph-0.80.5-8.el7, is the version correct
[17:40] <lalatenduM> ?
[17:41] <lalatenduM> s/is the version correct/is the correct version, right?/
[17:42] * b0e (~aledermue@213.95.25.82) Quit (Quit: Leaving.)
[17:43] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:43] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:44] <scuttlemonkey> lalatenduM: yeah
[17:44] <lalatenduM> scuttlemonkey, ok
[17:49] * true (~antrue@2a02:6b8:0:401:ddb7:ca51:b50e:33b5) Quit (Read error: Connection timed out)
[17:49] * lalatenduM (~lalatendu@121.244.87.117) Quit (Quit: Leaving)
[17:50] * hasues (~hasues@kwfw01.scrippsnetworksinteractive.com) has joined #ceph
[17:50] * monsterzz (~monsterzz@94.19.146.224) has joined #ceph
[17:51] * true (~antrue@2a02:6b8:0:401:ddb7:ca51:b50e:33b5) has joined #ceph
[17:51] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) Quit (Read error: Operation timed out)
[17:52] <fghaas> Everyone, bear with me here for a moment, I want to pen a writeup of this issue at some point but I'm not sure whether I'll have time, so I'm doing a quick brain dump here so the logbot picks it up and people can google it. Here goes:
[17:53] <fghaas> Issue: *creating* an RBD image works, but interacting with it and retrieving or writing data does not. rbd clients report the following error message:
[17:53] <fghaas> librbd: Error listing snapshots: (95) Operation not supported rbd: failed to open image
[17:54] <fghaas> additional symptoms:
[17:54] <fghaas> "rbd image rm" returns an error, but afterwards "rbd ls" does not list the image, but "rados ls" in the RBD pool *does* list the header object
[17:55] <fghaas> seeking out the OSD that it's trying to talk to and setting the debug osd level to 10/10 yields "no such file or directory".
[17:56] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) has joined #ceph
[17:57] <fghaas> in this case, this was caused by an upgrade from 0.80.5 (firefly from ceph.com) to 0.81 (from EPEL). this causes a conflict of librbd (0.80.5) and ceph-libs (0.81) which causes ceph to be updated and ceph-libs to not be installed
[17:57] <fghaas> meaning the OSD node can no longer find /usr/lib64/rados-classes/libcls_rbd.so
[17:58] <fghaas> so while the OSD node as such seems operational, it breaks all RBDs that it hosts.
[17:58] * RameshN (~rnachimu@101.222.225.14) has joined #ceph
[17:59] <fghaas> the remedy is obviously to either downgrade all the ceph packages to 0.80.5 or upgrade to 0.81, where I'd say the former is the better idea.
[17:59] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) has joined #ceph
[17:59] <fghaas> but if you're running a CentOS host with both ceph.com packages and EPEL, be sure to get your priorities right.
[18:00] <mgarcesMZ> fghaas: thanks
[18:01] <fghaas> That said (and maybe one of the devs can confirm or refute this), it would be nice if an OSD could just kill itself if it can't detect a single libcls_*.so. gregsfortytwo or sage, perhaps?
[18:02] * adamcrume (~quassel@50.247.81.99) has joined #ceph
[18:02] <fghaas> oh, and if you do run into this issue, dear random googler, and the logbot just saved your behind, consider (1) buying wido a beer and (2) leaving a comment at http://www.hastexo.com/shoutbox ... SCNR :)
[18:02] * sleinen1 (~Adium@2001:620:0:68::100) has joined #ceph
[18:03] <mgarcesMZ> fghaas: is writing messages to the future
[18:04] * rdas (~rdas@110.227.40.203) Quit (Quit: Leaving)
[18:07] * Gill (~Gill@static-72-80-16-227.nycmny.fios.verizon.net) Quit (Quit: Gill)
[18:08] * RameshN (~rnachimu@101.222.225.14) Quit (Ping timeout: 480 seconds)
[18:09] * sleinen (~Adium@2001:620:0:2d:7ed1:c3ff:fedc:3223) Quit (Ping timeout: 480 seconds)
[18:10] * rmoe (~quassel@12.164.168.117) has joined #ceph
[18:10] * zack_dolby (~textual@p843a3d.tokynt01.ap.so-net.ne.jp) has joined #ceph
[18:12] * yanzheng (~zhyan@171.221.139.239) Quit (Quit: This computer has gone to sleep)
[18:13] * RameshN (~rnachimu@101.222.225.14) has joined #ceph
[18:14] * danieagle (~Daniel@179.184.165.184.static.gvt.net.br) has joined #ceph
[18:19] * fghaas (~florian@188.118.222.14) Quit (Quit: Leaving.)
[18:20] * ksingh (~Adium@2001:708:10:10:2472:baad:91e8:4c34) Quit (Quit: Leaving.)
[18:21] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[18:22] * JayJ__ (~jayj@157.130.21.226) Quit ()
[18:26] * RameshN (~rnachimu@101.222.225.14) Quit (Quit: Quit)
[18:27] <rkdemon> hi
[18:27] * tab_ (~oftc-webi@89-212-99-37.dynamic.t-2.net) has joined #ceph
[18:27] * monsterzz (~monsterzz@94.19.146.224) Quit (Ping timeout: 480 seconds)
[18:27] <rkdemon> I need some advice on the following
[18:28] * madkiss (~madkiss@chello080108036100.31.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:28] <rkdemon> While setting up my osds in a ceph cluster I decided to get the journals to reside on different physical drives
[18:29] * marrusl (~mark@2604:2000:60e3:8900:d97b:659a:c316:3f84) Quit (Remote host closed the connection)
[18:30] * joshd1 (~jdurgin@2602:306:c5db:310:6d90:cc4b:79c0:1eb8) has joined #ceph
[18:30] * bandrus (~oddo@216.57.72.205) has joined #ceph
[18:30] * bandrus (~oddo@216.57.72.205) has left #ceph
[18:31] <rkdemon> My question is 1. How much journal space is needed .. ? Is it a ratio of the osd drive size ?
[18:31] <rkdemon> Is there a good way to understand that ?
[18:31] <rkdemon> Also, can multiple journals live on the same physical drive (in folders), or do they need to be on separate dedicated folders ?
[18:32] * steki (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[18:32] * steki (~steki@91.195.39.5) has joined #ceph
[18:33] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[18:34] * ircolle (~Adium@rrcs-74-62-57-62.west.biz.rr.com) Quit (Quit: Leaving.)
[18:34] * branto (~borix@ip-213-220-214-245.net.upcbroadband.cz) has left #ceph
[18:34] * BManojlovic (~steki@93-87-222-215.dynamic.isp.telekom.rs) has joined #ceph
[18:35] <schegi> rkdemon, osd journal size = {2 * (expected throughput * filestore max sync interval)}
[18:35] <schegi> says the ceph doc
[18:36] <schegi> http://ceph.com/docs/master/rados/configuration/osd-config-ref/
[18:37] <rkdemon> schegi: thanks.. I have not got my head wrapped around what the filestore max sync value is
[18:40] <schegi> rkdemon http://ceph.com/docs/master/rados/configuration/filestore-config-ref/
[18:40] * linjan (~linjan@176.195.196.165) has joined #ceph
[18:40] <schegi> The maximum interval in seconds for synchronizing the filestore.
[18:40] <rkdemon> thanks schegi (this is day 2 in my ceph history.. ) thanks a bunch
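To make the formula concrete, a worked example with assumed numbers: 100 MB/s expected throughput and the default 5 s filestore max sync interval gives 2 * 100 MB/s * 5 s = 1000 MB, so roughly a 1 GB journal per OSD or more. The setting is in megabytes and goes into ceph.conf before the OSDs are created, for instance:

    # append to ceph.conf on the deploy/admin node (value is illustrative)
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [osd]
    osd journal size = 1024
    EOF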
[18:41] * steki (~steki@91.195.39.5) Quit (Ping timeout: 480 seconds)
[18:41] * JayJ__ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[18:41] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Quit: mgarcesMZ)
[18:42] * mgarcesMZ (~mgarces@5.206.228.5) has joined #ceph
[18:45] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[18:45] * BManojlovic (~steki@93-87-222-215.dynamic.isp.telekom.rs) Quit (Ping timeout: 480 seconds)
[18:48] * linjan (~linjan@176.195.196.165) Quit (Ping timeout: 480 seconds)
[18:51] * analbeard (~shw@support.memset.com) has joined #ceph
[18:52] * lcavassa (~lcavassa@89.184.114.246) Quit (Remote host closed the connection)
[18:52] * Nacer (~Nacer@252-87-190-213.intermediasud.com) Quit (Read error: Operation timed out)
[18:55] * schegi (~oftc-webi@maaswestcloud.uni-koblenz.de) Quit (Remote host closed the connection)
[18:55] * mgarcesMZ (~mgarces@5.206.228.5) Quit (Quit: mgarcesMZ)
[18:57] * marrusl (~mark@2604:2000:60e3:8900:c044:9727:f15b:71a3) has joined #ceph
[18:57] * adamcrume (~quassel@50.247.81.99) Quit (Remote host closed the connection)
[18:59] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[19:01] * linjan (~linjan@176.195.196.165) has joined #ceph
[19:02] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[19:03] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[19:10] * JayJ__ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[19:15] * darkling (~hrm@00012bd0.user.oftc.net) has joined #ceph
[19:17] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[19:18] * davidz1 (~Adium@cpe-23-242-12-23.socal.res.rr.com) has joined #ceph
[19:19] * davidz (~Adium@cpe-23-242-12-23.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[19:20] <SPACESHIP> I'm currently doing some rados benches
[19:20] <SPACESHIP> and while the performance I'm currently getting is expected
[19:21] <SPACESHIP> the writes are getting absorbed by the journal (separate SSD)
[19:21] <SPACESHIP> but I'm not seeing the data getting flushed to disk
[19:22] <SPACESHIP> So I guess my question is two-fold
[19:22] * davidz (~Adium@cpe-23-242-12-23.socal.res.rr.com) has joined #ceph
[19:22] <SPACESHIP> Due to the way rados bench works, the data might be getting removed before it flushes to disk from journal
[19:23] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[19:23] <SPACESHIP> ?
[19:23] <SPACESHIP> and second, at what intervals does ceph flush from journal to disk
[19:23] * davidz1 (~Adium@cpe-23-242-12-23.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[19:24] * darkling (~hrm@00012bd0.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:27] <SPACESHIP> nevermind
[19:27] <SPACESHIP> The way I've got the OSD drives set up, things are just getting heavily compressed before they get written to disk
[19:28] <SPACESHIP> resulting in the data disks only seeing 100KBps of load - because rados bench data is highly compressible
[19:29] * linjan (~linjan@176.195.196.165) Quit (Ping timeout: 480 seconds)
[19:30] * reed (~reed@75-101-54-131.dsl.static.sonic.net) has joined #ceph
[19:30] <SPACESHIP> and http://ceph.com/docs/master/rados/configuration/filestore-config-ref/ answered my journal flushing question *I think*
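For the flushing question, the relevant knobs in that document are filestore min sync interval and filestore max sync interval (defaults are around 0.01 s and 5 s respectively); they can be inspected and tweaked on a running cluster, e.g.:

    # read the current value from a running OSD via its admin socket (run on the OSD host; osd.0 is an example)
    ceph daemon osd.0 config show | grep filestore_max_sync_interval
    # change it on the fly for all OSDs (illustrative value; not persistent across restarts)
    ceph tell osd.* injectargs '--filestore-max-sync-interval 10'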
[19:32] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[19:32] * blackmen (~Ajit@121.244.87.115) Quit (Quit: Leaving)
[19:34] * JayJ__ (~jayj@157.130.21.226) Quit ()
[19:37] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[19:39] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[19:40] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[19:41] * xarses (~andreww@c-76-126-112-92.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[19:41] * brad_mssw (~brad@shop.monetra.com) Quit (Quit: Leaving)
[19:41] * JayJ__ (~jayj@157.130.21.226) Quit ()
[19:43] * adamcrume (~quassel@2601:9:6680:47:d90c:73bf:4474:dda6) has joined #ceph
[19:45] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[19:47] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[19:47] * monsterzz (~monsterzz@94.19.146.224) has joined #ceph
[19:48] * sreddy (~oftc-webi@32.97.110.56) has joined #ceph
[19:49] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[19:50] <sreddy> Of the 6 OSDs, 4 of them are down right after setting up the cluster
[19:50] <sreddy> all OSD hotst same IP tables rules
[19:51] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[19:51] <sreddy> all OSD hosts have same IPtables rules
[19:51] <sreddy> wondering why one node connected and others are unable to do so..
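A few generic checks for OSDs that are down right after cluster creation (the osd id is an example; OSDs listen on 6800-7300/tcp by default and the monitor on 6789/tcp):

    ceph osd tree                        # which OSDs are down, and on which hosts
    ceph health detail
    # on an affected host:
    tail -n 50 /var/log/ceph/ceph-osd.2.log
    iptables -L -n                       # confirm 6789 and 6800-7300 are actually allowed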
[19:59] * monsterzz (~monsterzz@94.19.146.224) Quit (Ping timeout: 480 seconds)
[20:00] <carmstrong> joshd: starting fresh today. I think I'm going to try to access the cluster using python's librbd, just to isolate the kernel driver as the issue
[20:02] <joshd1> carmstrong: if you can use rbd create and other non-map commands you're already using librbd
[20:02] <carmstrong> joshd1: ah. so the issue is almost certainly the kernel driver, then. otherwise the auth would have failed sooner, right?
[20:03] <joshd1> yeah, and it's not necessarily auth
[20:04] <carmstrong> gotcha
[20:04] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[20:04] * analbeard (~shw@support.memset.com) Quit (Quit: Leaving.)
[20:06] * dgbaley27 (~matt@c-98-245-167-2.hsd1.co.comcast.net) has joined #ceph
[20:06] <carmstrong> cephfs is out for us, as running just one mds server isn't great for HA. not sure what options are left, except writing a python wrapper which would watch the filesystem and use librbd to store those things in ceph :/
[20:10] * angdraug (~angdraug@12.164.168.117) has joined #ceph
[20:11] <joshd1> carmstrong: did you try mapping on the host without using 'rbd map'?
[20:12] <carmstrong> joshd1: I used the echo command you gave me and got ` /bin/bash: line 1: echo: write error: Invalid argument`
[20:12] <carmstrong> same error we were getting with rbd map
[20:13] <joshd1> ah, so it's not a container issue then
[20:13] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[20:16] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[20:20] <carmstrong> joshd1: actually, can't recall if I did that on the host or in a container. let me try again to be certain
[20:22] * xarses (~andreww@12.164.168.117) has joined #ceph
[20:24] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[20:25] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[20:26] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[20:27] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) has joined #ceph
[20:29] * tab (~oftc-webi@194.249.247.164) Quit (Remote host closed the connection)
[20:30] * JayJ__ (~jayj@157.130.21.226) Quit ()
[20:35] * dgbaley27 (~matt@c-98-245-167-2.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[20:35] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[20:47] * rendar (~I@95.234.176.198) Quit (Read error: Operation timed out)
[20:47] * JayJ__ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[20:49] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[20:51] * rendar (~I@95.234.176.198) has joined #ceph
[20:51] * aaron_ (~oftc-webi@63.140.121.146) has joined #ceph
[20:52] <aaron_> hey guys
[20:52] <aaron_> I have a pg that is stuck "active+remapped"
[20:52] <aaron_> how do I troubleshoot this?
[20:54] <aaron_> the map shows: osdmap e7068 pg 1.f2 (1.f2) -> up [15,0,7] acting [15,0,7,1]
[20:54] <aaron_> but all OSDs are up
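A PG stuck active+remapped is usually a placement/CRUSH question rather than a down OSD; the usual starting points, using the pg id from the paste above:

    ceph health detail | grep 1.f2
    ceph pg 1.f2 query        # look at the up/acting sets and recovery_state
    ceph osd tree             # check that CRUSH can satisfy the pool's size with the available hosts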
[20:54] * monsterzz (~monsterzz@94.19.146.224) has joined #ceph
[20:56] * zerick (~eocrospom@190.187.21.53) Quit (Ping timeout: 480 seconds)
[21:00] <sreddy> any clues on why a set of osds won't come up right off the bat?
[21:00] <sreddy> [WRN] map e36 wrongly marked me down
[21:06] * Jakey (uid1475@id-1475.uxbridge.irccloud.com) Quit (Quit: Connection closed for inactivity)
[21:07] <carmstrong> joshd1: so this is interesting - trying a `rbd info` on an rbd that is created results in a librbd error: https://gist.githubusercontent.com/carmstrong/cd31ee408860084a9970/raw/101e4024457a29c6470f4953ad0170cb64519d4c/gistfile1.txt
[21:10] <carmstrong> looks like it actually wasn't created
[21:10] <carmstrong> which could result in the invalid argument error when trying to map, no?
[21:10] <carmstrong> like it's in a weird state - partially created
[21:13] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[21:14] <carmstrong> ah nevermind - that's my stupidity
[21:14] <carmstrong> info needs a pool flag
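For the record, the working invocation would be along these lines (the pool and image names match the ones used later in this log, so treat them as examples):

    rbd info --pool deis test
    # or equivalently:
    rbd info deis/test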
[21:20] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) has joined #ceph
[21:21] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[21:25] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[21:27] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[21:28] * lalatenduM (~lalatendu@122.172.34.85) has joined #ceph
[21:28] * true (~antrue@2a02:6b8:0:401:ddb7:ca51:b50e:33b5) has left #ceph
[21:32] * jtaguinerd (~jtaguiner@203.215.116.66) Quit (Quit: Leaving.)
[21:34] * carmstrong (sid22558@id-22558.charlton.irccloud.com) Quit (Ping timeout: 480 seconds)
[21:34] * rotbeard (~redbeard@2a02:908:df10:d300:76f0:6dff:fe3b:994d) has joined #ceph
[21:35] * carmstrong (sid22558@id-22558.charlton.irccloud.com) has joined #ceph
[21:35] * monsterzz (~monsterzz@94.19.146.224) Quit (Ping timeout: 480 seconds)
[21:37] * theanalyst (theanalyst@0001c1e3.user.oftc.net) Quit (Quit: ZNC - http://znc.in)
[21:38] * theanalyst (theanalyst@open.source.rocks.my.socks.firrre.com) has joined #ceph
[21:38] * masterpe (~masterpe@2a01:670:400::43) Quit (Remote host closed the connection)
[21:38] * masterpe (~masterpe@2a01:670:400::43) has joined #ceph
[21:39] * Japje (~Japje@2001:968:672:1::12) Quit (Remote host closed the connection)
[21:39] * Japje (~Japje@alsjeblieft.knuffel.me) has joined #ceph
[21:40] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[21:41] * shk (sid33582@id-33582.charlton.irccloud.com) Quit (Ping timeout: 480 seconds)
[21:41] * grepory (uid29799@id-29799.uxbridge.irccloud.com) Quit (Ping timeout: 480 seconds)
[21:42] * jgornick (~jgornick@2600:3c00::f03c:91ff:fedf:72b4) Quit (Quit: ZNC - http://znc.in)
[21:42] * jgornick (~jgornick@2600:3c00::f03c:91ff:fedf:72b4) has joined #ceph
[21:42] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit ()
[21:42] * shk (sid33582@id-33582.charlton.irccloud.com) has joined #ceph
[21:43] * grepory (uid29799@id-29799.uxbridge.irccloud.com) has joined #ceph
[21:44] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[21:46] <carmstrong> joshd1: so with auth disabled, doing an echo "172.17.8.100:6789 name=admin deis test" > /sys/bus/rbd/add as root causes a hang for a few minutes, then the machine crashes and reboots
[21:46] <carmstrong> that's on the host machine
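If retrying this, it may help to watch the kernel log from a second terminal so the last messages before the crash are visible; the add line below simply mirrors the one above (with cephx disabled no secret= is needed), and the format shown is an assumption based on the kernel rbd sysfs interface:

    # in a second terminal:
    dmesg -w                      # or: tail -f /var/log/kern.log
    # then retry the add (format: "<mon_addr> name=<user>[,secret=<key>] <pool> <image>")
    echo "172.17.8.100:6789 name=admin deis test" > /sys/bus/rbd/add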
[21:50] <joshd1> carmstrong: any log of the crash in syslog or dmesg or anything?
[21:50] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[21:50] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[21:50] * monsterzz (~monsterzz@94.19.146.224) has joined #ceph
[21:50] <carmstrong> joshd1: dmesg only has the most recent boot, nothing before. it mentioned that the system journal wasn't closed correctly because of the crash, but that's about it
[21:53] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[21:53] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[21:58] * aaron_ (~oftc-webi@63.140.121.146) Quit (Quit: Page closed)
[22:00] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[22:02] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[22:05] <absynth__> there was a way to make machines that crash regularly spit some details before they die with a panic... some sysrq magic?
[22:05] <kraken> http://i.imgur.com/H7PXV.gif
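The "sysrq magic" being remembered here is roughly the following; kdump or netconsole is the more reliable way to capture a full panic, but as a quick sketch:

    # enable the magic SysRq interface
    sysctl kernel.sysrq=1
    # dump task states / blocked tasks into the kernel log before the machine dies
    echo t > /proc/sysrq-trigger
    echo w > /proc/sysrq-trigger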
[22:07] * [1]bavila (~bavila@mail.pt.clara.net) has joined #ceph
[22:10] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz...)
[22:12] * photocyte (~hobbs@wsip-70-184-94-49.ph.ph.cox.net) has joined #ceph
[22:12] * bavila (~bavila@mail.pt.clara.net) Quit (Ping timeout: 480 seconds)
[22:12] * [1]bavila is now known as bavila
[22:13] * astellwag (~astellwag@209.132.181.86) Quit (Ping timeout: 480 seconds)
[22:15] * jiffe (~jiffe@mail.cryptotc.us) Quit (Quit: WeeChat 0.3.7)
[22:15] * lalatenduM (~lalatendu@122.172.34.85) Quit (Quit: Leaving)
[22:16] * lalatenduM (~lalatendu@122.172.34.85) has joined #ceph
[22:18] <photocyte> hi, i have a question. I've read that cephfs isn't fully supported yet
[22:18] <photocyte> and I wonder, can I create a block device, attach it to a machine, then share it out via NFS?
[22:18] <photocyte> to multiple users
[22:19] <photocyte> it seems like that would dodge the metadata issue. then i could rsync it or something for a backup
[22:20] <runfromnowhere> photocyte: That's definitely a viable way to go. Many people are having good results with CephFS, I've bumped into limitations a few times but found it to be fairly stable.
[22:21] <runfromnowhere> If you do it the way you're saying, you'll get all the replication and redundancy you configure on the RBD backend, and NFS will handle concurrency. Your only challenge, if it matters to you, is high-availability for the NFS mount if the node hosting it fails
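A rough outline of the RBD-behind-NFS approach being described (pool, image name and size are placeholders; assumes the kernel rbd client on the NFS server):

    rbd create nfsshare --size 102400            # 100 GB image in the default pool
    rbd map nfsshare
    mkfs.xfs /dev/rbd/rbd/nfsshare
    mkdir -p /export/nfsshare
    mount /dev/rbd/rbd/nfsshare /export/nfsshare
    echo '/export/nfsshare *(rw,sync,no_root_squash)' >> /etc/exports
    exportfs -ra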
[22:21] <photocyte> i'm just worried about the MDS as a SPOF
[22:21] * fghaas (~florian@213162068042.public.t-mobile.at) has joined #ceph
[22:22] <runfromnowhere> Well with recent Ceph versions you can run a multi-MDS setup
[22:22] <runfromnowhere> I run 3 MDSs
[22:22] <photocyte> oh really....starting at which version?
[22:22] <runfromnowhere> I HAVE encountered failure modes that will leave an MDS hung, however
[22:22] <runfromnowhere> And that can be a problem
[22:22] <runfromnowhere> But it seems to be pretty specific
[22:22] <photocyte> so how do you do multi-mds then?
[22:22] <runfromnowhere> I'm not sure which version introduced the feature, I'm running 0.80.5
[22:22] <photocyte> do they replicate data themselves now?
[22:23] * monsterzz (~monsterzz@94.19.146.224) Quit (Ping timeout: 480 seconds)
[22:23] <runfromnowhere> I'm not sure about the internal structure because this is the first version of Ceph I've messed with :)
[22:24] <photocyte> very cool
[22:24] <photocyte> so did you just install 3 mds and they magically work?
[22:24] <photocyte> like, did you fail 2 out for giggles?
[22:25] <runfromnowhere> LOL
[22:25] <runfromnowhere> I installed three and they're listed as one active and two backup
[22:25] <photocyte> ah ok cool
[22:25] <runfromnowhere> I'm not trying multi-live MDS yet
[22:25] <runfromnowhere> If one fails, another catches up and takes over
[22:25] <photocyte> i'm going to have to try that and fail out the active and see what kabooms
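For the one-active-plus-standbys MDS setup runfromnowhere describes, the state is visible with something like the following; hostnames are placeholders:

    ceph mds stat          # expect output along the lines of "1/1/1 up {0=a=up:active}, 2 up:standby"
    # adding standby MDS daemons with ceph-deploy:
    ceph-deploy mds create mds2 mds3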
[22:26] * astellwag (~astellwag@209.132.181.86) has joined #ceph
[22:31] * lalatenduM (~lalatendu@122.172.34.85) Quit (Quit: Leaving)
[22:33] * nolan_ is now known as nolan
[22:36] * Nacer (~Nacer@203-206-190-109.dsl.ovh.fr) Quit (Remote host closed the connection)
[22:39] * vbellur (~vijay@122.167.217.129) Quit (Ping timeout: 480 seconds)
[22:39] * eqhmcow_ (~eqhmcow@adsl-98-69-161-166.rmo.bellsouth.net) has joined #ceph
[22:39] * eqhmcow (~eqhmcow@adsl-98-69-161-166.rmo.bellsouth.net) Quit (Remote host closed the connection)
[22:40] * fghaas (~florian@213162068042.public.t-mobile.at) Quit (Quit: Leaving.)
[22:42] * monsterzz (~monsterzz@94.19.146.224) has joined #ceph
[22:59] * fghaas (~florian@85-127-80-104.dynamic.xdsl-line.inode.at) has joined #ceph
[23:01] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) Quit (Read error: Connection timed out)
[23:02] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) has joined #ceph
[23:02] * photocyte (~hobbs@wsip-70-184-94-49.ph.ph.cox.net) Quit (Read error: Connection reset by peer)
[23:07] <rkdemon> Hello, I was able to configure a cluster and osds etc.. the last step yields this very annoying health status
[23:07] <rkdemon> HEALTH_WARN 192 pgs stuck unclean; too few pgs per osd (8 < min 20); clock skew detected on mon.ceph2, mon.ceph1
[23:07] <rkdemon> I have added these lines in the ceph.conf file
[23:07] <rkdemon> before creating the monitors and osds
[23:07] <rkdemon> osd pool default pg num = 2048
[23:07] <rkdemon> osd pool default pgp num = 2048
[23:08] <rkdemon> any words of advice for me would be super cool!?
[23:08] <gleam> how many osds?
[23:08] <ganders> rkdemon: you need to put more pg_num and pgp_num on your pools
[23:08] <lurbs> http://ceph.com/docs/master/rados/operations/placement-groups/#set-the-number-of-placement-groups
[23:08] <ganders> rkdemon: did u try to get the actual values of the pools just to confirm the values?
[23:09] <dmick> the default numbers affect new pool creation
[23:09] <dmick> the original pools are created very early and maybe with fixed values, not sure
[23:10] <rkdemon> ganders: how do I find that out ?
[23:10] <rkdemon> I have 24 osds
[23:10] <lurbs> rkdemon: The link above has information of getting/setting that.
[23:11] <fghaas> I'm taking the liberty to re-raise my question from earlier, which is shouldn't the OSD daemon kill itself if it fails to load a single .so in /usr/lib64/rados-classes? (please see scrollback for context)
[23:11] <rkdemon> lurbs: How do I find my pool name ?
[23:11] <rkdemon> I have not explicitly set a pool name thus far in the ceph cluster setup
[23:12] <lurbs> 'rados lspools', although there may be a more current way.
[23:12] <lurbs> By default the pools 'rbd', 'metadata', and 'data' get created, I believe.
[23:12] <dmick> fghaas
[23:12] <rkdemon> yes that is what is listed
[23:12] <ganders> rkdemon: ceph osd pool get <pool> pg_num
[23:12] <dmick> looking
[23:12] <ganders> rkdemon: ceph osd pool get <pool> pgp_num
[23:12] <lurbs> dmick: Don't engage with him. He's no good.
[23:13] <sreddy> newly setup cluster not coming into active+clean state
[23:13] <ganders> and then to set the new values: ceph osd pool set <pool> pg_num <value>
[23:13] <rkdemon> ganders: for "data" pgp_num: 64
[23:13] <fghaas> lurbs: yeah thanks. regards to the rest of the Catalyst crew in Welly. :)
[23:14] <rkdemon> lurbs: pgp_num is set to 64 for all 3 pools : data, metadata,rbd
[23:14] <ganders> rkdemon: you need to adjust that values
[23:15] <rkdemon> ganders: Adding those values to the ceph.conf was no good "osd pool default pg num = 2048
[23:15] <rkdemon> osd pool default pgp num = 2048"
[23:15] <lurbs> fghaas: I'll pass that on. We're hiring, if you're bored or know anyone BTW. :)
[23:15] * garphy is now known as garphy`aw
[23:15] <ganders> that's for newly created pools
[23:15] <rkdemon> so at what stage should I issue the commands on the admin node ? I imagine right after the ceph install on the admin ?
[23:15] <sreddy> pg 0.24 is stuck inactive for 482.997044, current state remapped+peering, last acting [0,2,3]
[23:16] <lurbs> fghaas: Also, I blame your training session for the fact that we're now running OpenStack/Ceph in production, and have just finished kitting out a new data centre for a new region.
[23:16] <fghaas> lurbs: Sweet! Can I quote you on that one?
[23:16] <lurbs> Sure.
[23:16] <ganders> on those pools that are already created by default you need to set the new pg_num and pgp_num values with the command "ceph osd pool set <pool> pg_num 2048"
[23:16] <liiwi> /win 16
[23:16] <dmick> fghaas: eh. classes were intended to be loaded on-demand
[23:16] <sreddy> dmick can you please help
[23:16] <ganders> wait a couple of sec/min and then issue the same command but with pgp_num
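Putting ganders' advice together for the three default pools (the target pg_num is illustrative; size it per the placement-group docs linked above):

    for pool in data metadata rbd; do ceph osd pool set "$pool" pg_num 2048; done
    # wait for the new PGs to finish creating, then bump pgp_num to match:
    for pool in data metadata rbd; do ceph osd pool set "$pool" pgp_num 2048; done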
[23:16] <liiwi> blerp
[23:16] <rkdemon> Ok so I dont need to tear it down and restart ?
[23:17] <dmick> we preload in order to try to stave off some of the silly errors, but that's kinda a hack
[23:17] <dmick> IIRC it's just "preload everything in the dir".
[23:17] <dmick> arguably there could be *lots* better error reporting when we can't find the rbd cls .so
[23:18] <dmick> but I think (without looking) that you're suggesting making some .so's "required", which is new mechanism
[23:18] <fghaas> dmick: that, yes, plus it would be nice if there were, say, a configurable "required classes" option
[23:18] <ganders> rkdemon: and also you need to adjust the ntp on the mon servers
[23:19] <fghaas> as in, "RBD is what I use 95% of my cluster for; an OSD is useless to me if it can't load libcls_rbd.so"
[23:19] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Quit: Leaving.)
[23:19] <rkdemon> you think the time sync is lost on the other servers .. !! I will check on that and see if the ntp daemon is messed up
[23:20] <dmick> fghaas: at very least a clear log message, agreed
[23:20] <dmick> it's particularly frustrating. I've been through the same dance.
[23:20] <fghaas> yes, just -ENOENT isn't very helpful :)
[23:20] <fghaas> and even that only at a debug level >5
[23:21] * ganders (~root@200-127-158-54.net.prima.net.ar) Quit (Quit: WeeChat 0.4.1)
[23:22] * eqhmcow_ (~eqhmcow@adsl-98-69-161-166.rmo.bellsouth.net) Quit (Read error: Operation timed out)
[23:25] <rkdemon> ganders: There was a problem on one server "ceph2" and it needed a time sync. However ceph health from the admin node reports this
[23:25] <rkdemon> HEALTH_WARN 1 pgs degraded; 896 pgs stuck unclean; clock skew detected on mon.ceph2
[23:25] <rkdemon> On the admin node I fixed this
[23:25] <rkdemon> 16 ceph osd pool set data pg_num 768
[23:25] <rkdemon> 317 ceph health
[23:25] <rkdemon> 318 ceph osd pool set data pgp_num 768
[23:25] <lurbs> rkdemon: I've had issues with the clock skew issue not being cleared until I restarted the monitor(s).
[23:25] <rkdemon> 768 (24 osd X 32 )
[23:26] <lurbs> Also, how many hosts do you have that contain OSDs?
[23:26] <rkdemon> I have 3 servers, all 3 set to monitors. However only 2 of them have jbods with 12 drives each
[23:26] <rkdemon> So there are 3 monitors and 24 osds
[23:26] <lurbs> In which case I suspect that your pools also have their size (number of replicas) set to the default of 3.
[23:27] <lurbs> Can you pastebin 'ceph osd dump | grep size'?
[23:27] * alfredodeza (~alfredode@198.206.133.89) has joined #ceph
[23:28] <rkdemon> http://pastebin.com/KaQ4MXS8
[23:28] <dmick> fghaas: didn't you get
[23:28] <dmick> dout(0) << "_load_class could not open class " << fname
[23:28] <dmick> << " (dlopen failed): " << dlerror() << dendl;
[23:28] <dmick> ?
[23:28] <lurbs> Yeah, they're all 'size 3'.
[23:28] <fghaas> dmick: that would be at debug level 20, no?
[23:28] <dmick> dout(0)?
[23:28] <lurbs> rkdemon: http://ceph.com/docs/master/rados/operations/pools/#set-the-number-of-object-replicas
[23:29] <fghaas> wut? debug osd = 0?
[23:29] <dmick> ah, no, there's a stat() above it that just returns. doh.
[23:29] <dmick> (i.e. "always")
[23:29] <rkdemon> ceph osd pool set data size 2
[23:29] <rkdemon> ok..
[23:29] <fghaas> yeah. plus even at debug level 10 you get those annoying "tick" entries constantly, which you then have to filter out
[23:29] <lurbs> rkdemon: You'll want to set them to a size of 2, and probably also set the default in ceph.conf.
[23:29] <rkdemon> BTW is there an easy way to restart a monitor ?
[23:30] <fghaas> (as a side note to rkdemon/lurbs' discussion, the fact that the default pool size changed from 2 to 3 in firefly seems to not be very well known yet.)
[23:30] <lurbs> 'osd pool default size = 2' in ceph.conf BTW.
[23:31] <rkdemon> lurbs: thanks.
[23:31] <lurbs> fghaas: Oh, I'm *very* aware of it. This isn't exactly the first time I've run through this. ;)
[23:31] <fghaas> ah. lurbs, on a related note:
[23:31] <rkdemon> The "pool" is generally only "data" or do I set it for all ... metadata and rbd as well .. ? that should be the right thing to do I think
[23:32] <dmick> fghaas: http://fpaste.org/131119/66334140/, perhaps
[23:32] <lurbs> rkdemon: Yep, all of 'em.
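Assuming the default CRUSH rule that places each replica on a distinct host, size 3 can never be satisfied with only two OSD hosts, hence the stuck-unclean PGs; a sketch of lurbs' suggestion for all three pools:

    for pool in data metadata rbd; do ceph osd pool set "$pool" size 2; done
    # and make it the default for future pools in ceph.conf:
    #   [global]
    #   osd pool default size = 2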
[23:32] <fghaas> "osd pool default pg num" seems to be ignored for the initially created rbd, data and metadata pools -- have you devised a workaround for that?
[23:32] <fghaas> do you just never use those pools, or just pg-split them right after creation?
[23:33] * eqhmcow (~eqhmcow@cpe-075-177-132-024.nc.res.rr.com) has joined #ceph
[23:34] <fghaas> dmick: yes, that does look a bit more helpful :)
[23:34] <dmick> not 100% certain that's the failure point, but it looks plausible
[23:34] <dmick> too lazy to test :)
[23:35] * dgbaley27 (~matt@c-98-245-167-2.hsd1.co.comcast.net) has joined #ceph
[23:35] <lurbs> rkdemon: You'll need to reload the daemons for that setting in ceph.conf to take effect BTW. Or use (at a guess) "ceph tell osd.* injectargs '--osd-pool-default-size 2'" to set it live.
[23:37] <lurbs> In either case, you still need to edit ceph.conf - setting it via 'ceph tell' isn't persistent across restarts.
[23:38] <rkdemon> lurbs: I can't use this ceph tell osd.* injectargs '--osd-pool-default-size 2 ..
[23:39] <rkdemon> some error message with cannot use "tell" with interactive mode
[23:39] * shaon (~shaon@198.50.164.24) has joined #ceph
[23:39] <lurbs> Did you miss a finishing '?
[23:39] * LeaChim (~LeaChim@host86-135-182-184.range86-135.btcentralplus.com) has joined #ceph
[23:39] <rkdemon> got it
[23:40] <rkdemon> all 24 osds changed their pool default size
[23:41] <rkdemon> The health warning still looks bad with the clock skew and 896 pgs stuck
[23:41] <rkdemon> I am researching how to restart the monitor
[23:41] <fghaas> rkdemon: it's generally wise for your mons to act as each other's NTP peers (i.e. configure them as "peer" in ntp.conf, not "server")
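A minimal sketch of the mon-to-mon NTP peering fghaas suggests (the hostnames ceph1/ceph2/ceph3 match the ones in this log; assumes classic ntpd):

    # on each mon, e.g. on ceph1:
    cat >> /etc/ntp.conf <<'EOF'
    peer ceph2 iburst
    peer ceph3 iburst
    EOF
    service ntpd restart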
[23:42] <lurbs> To be honest, at this point I'd verify that your ceph.conf contains the default size changes, and just 'service ceph-all restart' on each node.
[23:44] * JayJ__ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[23:44] <rkdemon> lurbs: its gotten messed up
[23:45] <lurbs> Yaaay!
[23:45] <lurbs> What state's it in now?
[23:45] <rkdemon> http://pastebin.com/FxjziSVz
[23:45] * JayJ__ (~jayj@157.130.21.226) has joined #ceph
[23:46] <lurbs> Woah. NFI why it tried to re-create the leveldb store there.
[23:47] <rkdemon> :-) I imagine that is the journalling.. sorry this is my day 2 of serious ceph install and playtime with ceph
[23:48] <rkdemon> this was on one ceph node with the jbod
[23:48] <rkdemon> same state on the other node as well
[23:49] <rkdemon> Can I share my ceph.conf from the admin node ?
[23:49] <lurbs> Yeah, I'm curious to see what 'mon data' is set to.
[23:49] * Sysadmin88 (~IceChat77@94.8.80.73) has joined #ceph
[23:49] <rkdemon> http://pastebin.com/rAksNJU3
[23:50] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph
[23:52] <lurbs> That's quite bizarre. What does /var/lib/ceph/mon/ceph-a contain on ceph1?
[23:52] <lurbs> Does store.db even exist, or is it populated with data?
[23:53] <shaon> anyone's using ceph-cookbook to build ceph cluster?
[23:55] * JayJ__ (~jayj@157.130.21.226) Quit (Quit: Computer has gone to sleep.)
[23:55] <rkdemon> lurbs: /var/run/ceph/mon.a does not exist
[23:55] <rkdemon> lurbs: /var/lib/ceph/mon/ceph-ceph1 exists
[23:56] <rkdemon> oops.. sorry /var/lib/ceph/ceph-ceph1 does not
[23:56] <rkdemon> lurbs: /var/lib/ceph contains mds mon osd tmp and bootstrap*
[23:58] * ufven (~ufven@130-229-28-120-dhcp.cmm.ki.se) Quit (Read error: Connection reset by peer)
[23:58] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[23:59] <lurbs> Was curious about ceph1:/var/lib/ceph/mon/ceph-a specifically. Should contain a few small files (keyring, upstart, done, or similar), and a store.db directory.
[23:59] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[23:59] * Tamil (~Adium@cpe-108-184-74-11.socal.res.rr.com) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.