#ceph IRC Log

IRC Log for 2016-09-21

Timestamps are in GMT/BST.

[0:01] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Remote host closed the connection)
[0:02] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) has joined #ceph
[0:04] <jermudgeon> I'm not finding a detailed procedure for diagnosing stuck unclean pgs at docs.ceph.com; I find the section on "Unfound Objects". If I reweight an osd to, say, 0.8, then some pgs get marked stuck unclean, active+remapped. How do I get them to actually move/release/mark clean?
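A minimal sketch of the commands usually used to see why PGs stay stuck after a reweight (the pg id 2.1f is a placeholder):
    ceph health detail              # lists the stuck PGs and why they are unhealthy
    ceph pg dump_stuck unclean      # show only PGs stuck in the unclean state
    ceph pg 2.1f query              # per-PG detail: up/acting sets and recovery_state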
[0:04] <sphinxx> Hello all, my ceph admin node's CPU utilization has doubled within a few weeks. Any idea what may be causing this?
[0:04] <jermudgeon> sphinxx: have you used something like atop to watch ps usage, or is this something spiky/intermittent?
[0:07] <sphinxx> oh i haven't. i usually just monitor it from the VMware performance tab
[0:07] <jermudgeon> these are ceph only nodes, or are you running VMs on them?
[0:07] <sphinxx> i could try that. Thanks for the suggestion jermudgeon
[0:07] <jermudgeon> yw
[0:08] <sphinxx> no, i have other VMs together with the ceph nodes
[0:08] <sphinxx> but the spike is just on the ceph admin node
[0:08] <jermudgeon> got it
[0:08] <jermudgeon> if you have access to the CLI, you should be able to watch top (or in my case, atop, as it shows all load, not just cpu)
[0:10] * ledgr (~ledgr@88-119-196-104.static.zebra.lt) Quit (Ping timeout: 480 seconds)
[0:10] <sphinxx> oh okay, i'll try that
[0:12] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[0:15] * peetaur (~peter@p57AAAF92.dip0.t-ipconnect.de) Quit (Quit: Konversation terminated!)
[0:24] * bene3 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[0:25] * ivve (~zed@c83-254-7-248.bredband.comhem.se) has joined #ceph
[0:25] * bene3 (~bene@nat-pool-bos-t.redhat.com) Quit ()
[0:26] * vata (~vata@207.96.182.162) Quit (Quit: Leaving.)
[0:27] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[0:27] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) Quit (Ping timeout: 480 seconds)
[0:27] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[0:31] * xinli (~charleyst@32.97.110.56) Quit (Ping timeout: 480 seconds)
[0:33] * ivve (~zed@c83-254-7-248.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[0:42] <jermudgeon> Maybe I'm just too impatient. How long should creation of 256 pgs on three SSD osds take? Backed by a 10G network, too.
[0:42] * fsimonce (~simon@host98-71-dynamic.1-87-r.retail.telecomitalia.it) Quit (Remote host closed the connection)
[0:43] <jermudgeon> I'm trying to isolate behavior to a smaller set of osds, as it seems that no matter what size cluster I use, I'm getting stuck eventually.
[0:49] * sphinxx (~sphinxx@154.118.120.40) Quit (Ping timeout: 480 seconds)
[0:56] * hoonetorg (~hoonetorg@77.119.226.254.static.drei.at) has joined #ceph
[0:57] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) Quit (Read error: No route to host)
[1:08] * K3NT1S_aw (~AluAlu@exit0.radia.tor-relays.net) has joined #ceph
[1:11] * pdrakeweb (~pdrakeweb@pool-98-118-150-184.bflony.fios.verizon.net) Quit (Remote host closed the connection)
[1:16] * cathode (~cathode@50-232-215-114-static.hfc.comcastbusiness.net) has joined #ceph
[1:30] * [0x4A6F] (~ident@0x4a6f.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:30] * [0x4A6F] (~ident@p4FC26068.dip0.t-ipconnect.de) has joined #ceph
[1:38] * K3NT1S_aw (~AluAlu@635AAAP5U.tor-irc.dnsbl.oftc.net) Quit ()
[1:41] * oms101 (~oms101@p20030057EA01CF00C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:47] * yanzheng (~zhyan@118.116.115.254) has joined #ceph
[1:48] * yanzheng (~zhyan@118.116.115.254) Quit ()
[1:51] * oms101 (~oms101@p20030057EA016E00C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:54] * yanzheng (~zhyan@118.116.115.254) has joined #ceph
[1:55] * cathode (~cathode@50-232-215-114-static.hfc.comcastbusiness.net) Quit (Quit: Leaving)
[1:58] * neurodrone (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[1:58] * vbellur (~vijay@71.234.224.255) has joined #ceph
[1:59] <jermudgeon> It looks like I'm running into a 5GB journal size limit (buffer?) despite having 25 GB journal partitions. Hmm.
[2:00] * borei (~dan@216.13.217.230) Quit (Ping timeout: 480 seconds)
[2:04] * xarses (~xarses@64.124.158.3) Quit (Ping timeout: 480 seconds)
[2:05] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Ping timeout: 480 seconds)
[2:09] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[2:12] * wkennington (~wkenningt@c-71-204-170-241.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[2:13] * linuxkidd (~linuxkidd@ip70-189-202-62.lv.lv.cox.net) Quit (Ping timeout: 480 seconds)
[2:18] * lixiaoy1 (~lixiaoy1@shzdmzpr01-ext.sh.intel.com) has joined #ceph
[2:21] <jermudgeon> Doesn't look like there is a way to change journal size at runtime. Hmm.
[2:22] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) has joined #ceph
[2:23] * linuxkidd (~linuxkidd@ip70-189-202-62.lv.lv.cox.net) has joined #ceph
[2:24] * salwasser (~Adium@2601:197:101:5cc1:806e:b3d5:6930:9088) has joined #ceph
[2:25] * wushudoin (~wushudoin@2601:646:8200:c9f0:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[2:30] * linuxkidd (~linuxkidd@ip70-189-202-62.lv.lv.cox.net) Quit (Quit: Leaving)
[2:33] * jlayton (~jlayton@cpe-2606-A000-1125-405B-14D9-DFF4-8FF1-7DD8.dyn6.twc.com) Quit (Quit: ZNC 1.6.2 - http://znc.in)
[2:35] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:35] * jlayton (~jlayton@cpe-2606-A000-1125-405B-14D9-DFF4-8FF1-7DD8.dyn6.twc.com) has joined #ceph
[2:43] * malevolent (~quassel@192.146.172.118) Quit (Quit: No Ping reply in 180 seconds.)
[2:43] * ircuser-1 (~Johnny@158.183-62-69.ftth.swbr.surewest.net) has joined #ceph
[2:44] * malevolent (~quassel@192.146.172.118) has joined #ceph
[2:50] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) has joined #ceph
[2:58] * jermudgeon (~jhaustin@gw1.ttp.biz.whitestone.link) Quit (Quit: jermudgeon)
[3:09] * lixiaoy1 (~lixiaoy1@shzdmzpr01-ext.sh.intel.com) Quit (Remote host closed the connection)
[3:10] * jfaj__ (~jan@p4FD269FB.dip0.t-ipconnect.de) has joined #ceph
[3:14] * salwasser (~Adium@2601:197:101:5cc1:806e:b3d5:6930:9088) Quit (Quit: Leaving.)
[3:16] * jfaj_ (~jan@p4FD273F7.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:20] * masuberu (~masber@149.171.21.105) has joined #ceph
[3:27] * chutz (~chutz@rygel.linuxfreak.ca) Quit (Quit: Leaving)
[3:27] * masber (~masber@129.94.15.152) Quit (Ping timeout: 480 seconds)
[3:37] * Jeffrey4l (~Jeffrey@110.252.72.251) has joined #ceph
[3:40] * wido (~wido@2a00:f10:121:100:4a5:76ff:fe00:199) Quit (Quit: No Ping reply in 180 seconds.)
[3:40] * wido (~wido@2a00:f10:121:100:4a5:76ff:fe00:199) has joined #ceph
[3:42] * sebastian-w_ (~quassel@212.218.8.139) has joined #ceph
[3:44] * sebastian-w (~quassel@212.218.8.138) Quit (Ping timeout: 480 seconds)
[3:44] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:46] * adamcrume (~quassel@2601:647:cb01:f890:a10c:97c0:74c7:6069) has joined #ceph
[3:47] * squizzi_ (~squizzi@107.13.237.240) Quit (Quit: bye)
[3:57] * kefu (~kefu@114.92.125.128) has joined #ceph
[3:58] * jermudgeon (~jhaustin@tab.biz.whitestone.link) has joined #ceph
[4:02] * Meths_ (~meths@95.151.244.152) has joined #ceph
[4:02] * Meths (~meths@95.151.244.152) Quit (Remote host closed the connection)
[4:03] * vata (~vata@96.127.202.136) has joined #ceph
[4:03] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[4:09] * kefu (~kefu@114.92.125.128) has joined #ceph
[4:15] * chutz (~chutz@rygel.linuxfreak.ca) has joined #ceph
[4:16] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[4:20] * jermudgeon (~jhaustin@tab.biz.whitestone.link) Quit (Quit: jermudgeon)
[4:31] * hgjhgjh (~Scymex@5.153.234.98) has joined #ceph
[4:45] * neurodrone (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone)
[5:00] * jamespage (~jamespage@culvain.gromper.net) Quit (Read error: No route to host)
[5:01] * jamespag` (~jamespage@culvain.gromper.net) has joined #ceph
[5:01] * hgjhgjh (~Scymex@5.153.234.98) Quit ()
[5:01] * yanzheng (~zhyan@118.116.115.254) Quit (Quit: ??????)
[5:03] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[5:03] * jermudgeon (~jhaustin@199.200.6.173) has joined #ceph
[5:03] * georgem (~Adium@69-165-135-139.dsl.teksavvy.com) Quit (Quit: Leaving.)
[5:08] * kevcampb (~kev@orchid.vm.bytemark.co.uk) has left #ceph
[5:11] * davidzlap (~Adium@2605:e000:1313:8003:60e4:8915:f702:1f2) Quit (Quit: Leaving.)
[5:11] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Ping timeout: 480 seconds)
[5:16] * Vacuum__ (~Vacuum@88.130.211.244) has joined #ceph
[5:23] * Vacuum_ (~Vacuum@88.130.219.117) Quit (Ping timeout: 480 seconds)
[5:32] * jermudgeon (~jhaustin@199.200.6.173) Quit (Quit: jermudgeon)
[5:34] * rotbeard (~redbeard@2a02:908:df13:bb00:7d59:ba6a:b1b6:1109) has joined #ceph
[5:34] * doppelgrau1 (~doppelgra@132.252.235.172) has joined #ceph
[5:34] * jermudgeon (~jhaustin@199.200.6.173) has joined #ceph
[5:36] * ivve (~zed@c83-254-7-248.bredband.comhem.se) has joined #ceph
[5:37] * yanzheng (~zhyan@118.116.115.254) has joined #ceph
[5:38] * yanzheng (~zhyan@118.116.115.254) Quit ()
[5:45] * rotbeard (~redbeard@2a02:908:df13:bb00:7d59:ba6a:b1b6:1109) Quit (Ping timeout: 480 seconds)
[5:48] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) has joined #ceph
[5:50] * ivve (~zed@c83-254-7-248.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[5:51] * jermudgeon (~jhaustin@199.200.6.173) Quit (Quit: jermudgeon)
[5:55] * valeech (~valeech@pool-96-247-203-33.clppva.fios.verizon.net) has joined #ceph
[5:55] * kuku (~kuku@119.93.91.136) has joined #ceph
[5:56] * rotbeard (~redbeard@aftr-109-90-233-215.unity-media.net) has joined #ceph
[6:02] * jermudgeon (~jhaustin@199.200.6.173) has joined #ceph
[6:02] * jermudgeon (~jhaustin@199.200.6.173) Quit ()
[6:02] * jermudgeon (~jhaustin@199.200.6.173) has joined #ceph
[6:03] * valeech (~valeech@pool-96-247-203-33.clppva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[6:05] <ben1> i see openattic has a fancy looking webui. does anyone know of anything standalone for such?
[6:05] <ben1> for graphs etc
[6:07] * jermudgeon (~jhaustin@199.200.6.173) Quit ()
[6:08] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[6:09] * ircolle (~Adium@2601:285:201:633a:d0d2:c6dd:fe8c:a356) Quit (Quit: Leaving.)
[6:09] * vata (~vata@96.127.202.136) Quit (Quit: Leaving.)
[6:11] * walcubi (~walcubi@p5797AF6C.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[6:11] * walcubi (~walcubi@p5795B7DD.dip0.t-ipconnect.de) has joined #ceph
[6:13] * davidzlap (~Adium@rrcs-74-87-213-28.west.biz.rr.com) Quit (Quit: Leaving.)
[6:26] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[6:32] * jermudgeon (~jhaustin@199.200.6.173) has joined #ceph
[6:40] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[6:40] * kefu (~kefu@114.92.125.128) has joined #ceph
[6:43] * haplo37 (~haplo37@107.190.42.94) Quit (Ping timeout: 480 seconds)
[6:48] * swami1 (~swami@49.38.1.60) has joined #ceph
[7:06] * janos_ (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) has joined #ceph
[7:07] * toastyde2th (~toast@pool-71-255-253-39.washdc.fios.verizon.net) has joined #ceph
[7:09] * jermudgeon (~jhaustin@199.200.6.173) Quit (Quit: jermudgeon)
[7:10] * rwheeler (~rwheeler@bzq-84-111-170-30.cablep.bezeqint.net) Quit (Quit: Leaving)
[7:10] * TomasCZ (~TomasCZ@yes.tenlab.net) Quit (Quit: Leaving)
[7:13] * jermudgeon (~jhaustin@199.200.6.173) has joined #ceph
[7:13] * janos (~messy@static-71-176-211-4.rcmdva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:14] * toastyde1th (~toast@pool-71-255-253-39.washdc.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[7:14] * icey_ (~Chris@pool-71-162-145-72.phlapa.fios.verizon.net) has joined #ceph
[7:14] * icey (~Chris@0001bbad.user.oftc.net) Quit (Ping timeout: 480 seconds)
[7:18] * rdas (~rdas@121.244.87.116) has joined #ceph
[7:19] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[7:32] * sphinxx (~sphinxx@154.118.120.40) has joined #ceph
[7:37] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[7:40] * yanzheng (~zhyan@118.116.115.254) has joined #ceph
[7:40] * sphinxx (~sphinxx@154.118.120.40) Quit (Ping timeout: 480 seconds)
[7:43] * karnan (~karnan@106.51.139.84) has joined #ceph
[7:49] * Jeffrey4l (~Jeffrey@110.252.72.251) Quit (Remote host closed the connection)
[7:51] * Jeffrey4l (~Jeffrey@110.252.72.251) has joined #ceph
[7:54] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) has joined #ceph
[8:04] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[8:05] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[8:05] * vikhyat (~vumrao@121.244.87.116) Quit ()
[8:05] * vikhyat (~vumrao@121.244.87.116) has joined #ceph
[8:11] * ledgr (~ledgr@qw1.kauko.lt) has joined #ceph
[8:14] * ledgr (~ledgr@qw1.kauko.lt) Quit (Remote host closed the connection)
[8:15] * ledgr (~ledgr@qw1.kauko.lt) has joined #ceph
[8:23] * ledgr (~ledgr@qw1.kauko.lt) Quit (Ping timeout: 480 seconds)
[8:25] * nardial (~ls@p5DC0728A.dip0.t-ipconnect.de) has joined #ceph
[8:29] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[8:33] <IcePic> jermudgeon: <60s
[8:33] <IcePic> if they get stuck, the reason is something else, like missing/bad crush rules or so
[8:34] <jermudgeon> IcePic: yeah, I'm thinking maybe so. I've done a bunch of testing today, pool creation/benchmarking, and getting the hang of it
[8:35] <jermudgeon> one thing I'm still not figuring out is journal performance; default journal size is something like 5GB, and I was seeing write performance tank after every 5G, presumably as journals flushed ... this is with rados bench.
[8:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[8:35] <jermudgeon> So I set up a three SSD, three node system; two journals on SSD 1, OSDs on SSDs 2 and 3, and then tested... same diff
[8:36] <jermudgeon> is it pointless using a dedicated SSD for journals if you're using SSD for osds?
[8:37] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Ping timeout: 480 seconds)
[8:42] * rakeshgm (~rakesh@125.16.34.66) has joined #ceph
[8:44] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[8:46] <Be-El> jermudgeon: the main difference between writing to an osd and writing to the journal is the use of synchronous write mechanisms.
[8:47] <Be-El> jermudgeon: a journal entry is always written in synchronous mode (O_DSYNC flag), ensuring that it is stored on the device when the write is acknowledged
[8:48] <Be-El> jermudgeon: writes to the osd are queued, written asynchronously and use filesystem level synchronisation (e.g. calling fssync after writing)
[8:48] <jermudgeon> Gotcha. This is where min takes effect
[8:49] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[8:49] <doppelgrau> jermudgeon: there can be a benefit, e.g. using very fast and durable pcie attached ssds for journal or using "cheaper" SSDs for the storage, but if the SSDs are good (durable, enough IOPs, good performance with the sync IO) in my eyes the benefit is small,
[8:49] <Be-El> if you were able to ensure that writes to the osd's filesystem are synchronous, disabling the journal might be possible
[8:50] <jermudgeon> Hmm.
[8:50] <jermudgeon> doppelgrau: OK, so I seem to be hitting the limit of a single SSD in writes...
[8:50] <Be-El> jermudgeon: the upcoming bluestore will drop the requirements to have a journal for fast writes. but it's not ready for production use yet
[8:50] <jermudgeon> using 10G between nodes
[8:52] <jermudgeon> Be-El: so I read. I'll have to stick with hammer for now
[8:52] <jermudgeon> the default is to have journals directly on the node, no?
[8:53] <Be-El> jermudgeon: are you planing to use cephfs? in that case i highly recommend to use jewel
[8:53] <jermudgeon> nope, just rbd at present
[8:53] <jermudgeon> I'd love to be able to replicate the "fusion" model: first tier SSD, second tier platters, without having to explicitly replicate across. This could actually work though... I could do something like a 3:1 ratio of ssd to platters
[8:54] <jermudgeon> I have three nodes with 4 bays each, so this could work well
[8:54] <jermudgeon> with max=3
[8:54] <Be-El> in a default setup without any explicitly given journal, ceph-disk (or ceph-deploy invoking ceph-disk on the target host) will create a small journal partition and use the remaining space on the storage device for the osd
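A sketch of how the FileStore journal size is typically controlled, relevant to the 5GB ceiling mentioned earlier (the size, the osd id, and the assumption that the OSD is stopped with noout set are all placeholders/assumptions):
    # ceph.conf, [osd] section; value is in MB and only applies to journals created afterwards
    osd journal size = 20480
    # to rebuild an existing journal for osd.3:
    ceph-osd -i 3 --flush-journal    # drain outstanding journal entries to the filestore
    ceph-osd -i 3 --mkjournal        # create the new journal at the configured size/location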
[8:55] * sphinxx (~sphinxx@41.217.204.74) has joined #ceph
[8:55] <jermudgeon> Be-El: this might be the answer, just let each device maintain its own journal, and suffer the higher latency on the spinning disks ... or tier with crush
[8:55] <Be-El> jermudgeon: you can use a ssd cache tier
[8:55] <jermudgeon> can you recommend a guide for this? I've seen a few different approaches, and wasn't sure what would fit my use case
[8:55] <Be-El> so reads/writes will hit the ssd first, and the cache tier agent will flush the data to the backend storage later
[8:56] <jermudgeon> that's exactly what I want
[8:56] <jermudgeon> so I get at least single SSD block performance over a three node cluster; more would be better, but what was killing me was the flushing, and having average performance be half a single SSD for a 6-osd pool, all SSD!
[8:57] <Be-El> i would propose to start with the osd setup, crush rules etc. to ensure that you can put pools on ssd and on hdd without mixing the storage device technologies
[8:57] <Be-El> see https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ for some examples and guidelines
[8:58] <Be-El> after pool setup is ok, use can configure cache tiering (http://docs.ceph.com/docs/master/rados/operations/cache-tiering/)
[8:59] <Be-El> i would also recommend to have a closer look at jewel for cache tiering, there've been a lot of improvements to reduce flush/evict effects
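Roughly the command sequence from the cache-tiering documentation linked above, assuming a replicated SSD pool named hot-storage in front of a backing pool named cold-storage (pool names and thresholds are placeholders):
    ceph osd tier add cold-storage hot-storage
    ceph osd tier cache-mode hot-storage writeback
    ceph osd tier set-overlay cold-storage hot-storage
    ceph osd pool set hot-storage hit_set_type bloom
    ceph osd pool set hot-storage target_max_bytes 1099511627776       # ~1 TB of cached data
    ceph osd pool set hot-storage cache_target_dirty_ratio 0.4
    ceph osd pool set hot-storage cache_target_full_ratio 0.8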
[8:59] <jermudgeon> Be-El: this looks like what I'm after. I'm running proxmox, and I've heard of at least a few people successfully upgrading to jewel
[9:00] <jermudgeon> I ran into some serious issues back in the spring when I first started using rbd, so I put it on the back burner
[9:00] <jermudgeon> now I'm keen to figure out what I was doing wrong
[9:01] <Be-El> the cache tier documentation has been somewhat lacking; it did not state that the ceph user actually needs access rights to the cache tier pools, too.
[9:01] <Be-El> it also did not mention that you need to set all the configuration values for the cache tier to work correctly
[9:01] <jermudgeon> I'm interested in erasure coding, of course, but I see that it's not recommended yet with tiering
[9:01] <Be-El> not sure about the current state, cache tiering in jewel is still on my todo list
[9:02] <Be-El> jermudgeon: i would use a standard replicated pool for the cache tier (on ssds), and an EC pool for the backend storage
[9:02] <jermudgeon> despite the warning in the last link you sent?
[9:02] <Be-El> not sure if that setup is transparent in rbd use cases
[9:03] <jermudgeon> yeah, it's for vm storage
[9:03] <Be-El> rbd cannot work with an EC pool directly, since it needs to modify objects
[9:03] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[9:03] <Be-El> EC pools do not allow this operation (yet)
[9:03] <jermudgeon> interesting!
[9:03] <jermudgeon> I didn???t know that
[9:04] <Be-El> EC pools are good at creating objects, appending data or reading data
[9:04] <jermudgeon> you'd think that modify is just copy on write, no? or since you have to diff blocks...
[9:04] <Be-El> modifications of an existing object require reading the complete object, modifying it and writing it back (similar to raid 5/6 on traditional storage)
[9:05] <Be-El> the cache tier handles this transparently for the ceph client
[9:05] <jermudgeon> honestly, I'll be happy enough with something that takes reasonable bitrot precautions, and we're getting there
[9:05] <Be-El> deep scrub my detect bit rot
[9:05] <mistur> Be-El: finally, I have deleted the entire cephfs filesystem and recreated another one from scratch; it wasn't in use (cf my question yesterday)
[9:05] <Be-El> may even
[9:05] <mistur> (hello)
[9:06] <Be-El> mistur: i also wanted to delete the first data pool some time ago, but i wasn't able to do so (and with about 70 TB of data I do not want to restore from backup ;-) )
[9:06] <jermudgeon> Be-El: thanks for your help. I gotta run, power outage in my area, might have to be spinning up backup gens in morning
[9:07] <mistur> Be-El: indeed, with data on, it's a bug issue
[9:07] <Be-El> jermudgeon: there are also plans to include data checksum in bluestore, but i don't know about the current state
[9:07] <mistur> jermudgeon: I did many test with EC pool for radosgw
[9:07] <mistur> jermudgeon: in that case, it works like a charm
[9:08] <Be-El> jermudgeon: if you really really want instant protection from bitrot, you need to run an osd filesystem that provides bitrot detection on the filesystem level. -> btrfs
[9:08] <Be-El> and there are many caveats with btrfs....
[9:08] <jermudgeon> yeah, I'm not that keen... played around with zfs, it wasn't ready for me yet
[9:08] <ronrib> zfs snapshots are amazing
[9:09] <mistur> jermudgeon: and with my configuration I got 40% less performance on EC pool vs replicated pool (throughput)
[9:09] <ronrib> where's my cephfs snapshots
[9:09] <jermudgeon> mistur: seems like a reasonable tradeoff
[9:09] <jermudgeon> I still have quite a bit on glusterfs
[9:10] <Be-El> ronrib: zfs is nice, but afaik there's no explicit support for zfs features in filestore -> fallback to posix behaviour
[9:10] <mistur> jermudgeon: yes, it's not so bad, it was a 7+3 EC pool, the new one will be 8+2
[9:11] <jermudgeon> how many replicas do you use for 8+2?
[9:11] <jermudgeon> if only there were the holy grail of file systems... good for vm storage, tiering, and general purpose small files with full support for all desktop OSes :)
[9:12] <jermudgeon> that would be a horse of a different color
[9:12] <mistur> < jermudgeon> how many replicas do you use for 8+2? <<< what do you mean ?
[9:12] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[9:12] <Be-El> jermudgeon: + replication with low storage overhead ;-)
[9:13] <mistur> jermudgeon: https://xkcd.com/927/
[9:13] * BlS (~zc00gii@209.95.50.58) has joined #ceph
[9:13] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:13] <mistur> :)
[9:13] <jermudgeon> mistur: so true
[9:14] <jermudgeon> Be-El: and erasure coding, and bitrot detection, and it also waxes your floors. don't forget snapshots. and it recovers Hilary's emails.
[9:14] <jermudgeon> mistur: there's ALWAYS a relevant xkcd
[9:14] <mistur> jermudgeon: ALWAYS !
[9:15] <jermudgeon> mistur: I mean how many pgs do you require min and max for a typical pool
[9:15] <jermudgeon> since you have two nodes, I assume a primary tier is level host
[9:16] * neurodrone (~neurodron@158.106.193.162) has joined #ceph
[9:17] <mistur> jermudgeon: I have 12 nodes with 10 OSDs in each
[9:17] <jermudgeon> ah, I was misunderstanding 8+2
[9:17] <mistur> 8+2 is k+m EC parameter
[9:17] <jermudgeon> gotcha
[9:18] <mistur> :)
[9:18] <jermudgeon> I'm really fading, gotta run ... thanks for hanging
[9:18] <mistur> jermudgeon: np
[9:19] * nardial (~ls@p5DC0728A.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[9:20] * AlexeyAbashkin (~AlexeyAba@91.207.132.67) Quit (Quit: Leaving)
[9:23] * AlexeyAbashkin (~AlexeyAba@91.207.132.67) has joined #ceph
[9:27] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Ping timeout: 480 seconds)
[9:27] * T1w (~jens@node3.survey-it.dk) Quit (Remote host closed the connection)
[9:29] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[9:30] * hommie (~hommie@87.255.57.200) has joined #ceph
[9:31] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Remote host closed the connection)
[9:31] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[9:32] <hommie> quick question for the community: yesterday I replaced a couple of dead OSDs on a 0.94.7 cluster, and this morning I noticed that the amount of child processes per OSD increased significantly (thousands of child processes), the memory usage became sky high... has anyone ever faced this issue before? any suggestions?
[9:32] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[9:33] * doppelgrau1 (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[9:34] * wjw-freebsd2 (~wjw@smtp.digiware.nl) has joined #ceph
[9:38] <sep> hommie, backfilling and recovery uses a lot of resources and memory yes, that is expected. probably a child process per backfilling. i have reduced the --osd-max-backfills from default since my hardware can not keep up with that heavy demand.
[9:38] <sep> it ofcourse means that recovery and backfill goes slower.
[9:38] * fsimonce (~simon@host98-71-dynamic.1-87-r.retail.telecomitalia.it) has joined #ceph
[9:39] <sep> i have not looked into the amount of child processes. thousands extra sounds like a lot
[9:39] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Ping timeout: 480 seconds)
[9:39] <hommie> sep, actually the cluster health is 'ok', the recovery process finished many hours ago
[9:41] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[9:41] <sep> looking at one of my osd nodes i see 3k threads more during recovery compared to normal operation. so i assume what you see is correct.
[9:42] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[9:42] <sep> if you have hardware that can deal with the load, then getting the recovery and backfilling done quickly is best. absolutely. i am using repurposed hardware that does not meet the normal requirements, hence my need to throttle down the recovery...
[9:43] * BlS (~zc00gii@209.95.50.58) Quit ()
[9:44] <hommie> well, I have 8 cores and 16GB per node, each node has 3x 2TB OSD
[9:45] * TMM (~hp@dhcp-077-248-009-229.chello.nl) Quit (Quit: Ex-Chat)
[9:45] <hommie> what concerns me is that I have no backfilling going on (at all), or anything else for that matter, everything is active+clean
[9:46] <sep> lucky you...
[9:46] <sep> :)
[9:46] <sep> did you expect to have ?
[9:48] <hommie> nope, I just didn't expect this amount of threads given the current cluster status.. it feels as if these threads are just stuck there
[9:49] <T1w> what about setting noout and then restart the OSDs?
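A sketch of the noout-and-restart approach T1w suggests (the osd id and service name are assumptions; hammer-era hosts may use the sysvinit script instead of systemd):
    ceph osd set noout                  # stop CRUSH from marking restarted OSDs out
    systemctl restart ceph-osd@12       # or: /etc/init.d/ceph restart osd.12 on sysvinit hosts
    ceph -s                             # wait for all PGs to return to active+clean
    ceph osd unset noout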
[9:50] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[9:52] <hommie> T1w, I suppose this is the only course of action available at this point, but I'm still interested in knowing what could possibly have triggered such behaviour :( <googling stuff>
[9:53] * arthur (~arthur@89.200.203.7) has joined #ceph
[9:54] <T1w> how long since backfill completed?
[9:54] <hommie> 5 hours approx.
[9:55] <hommie> http://docs.ceph.com/docs/jewel/rados/operations/monitoring-osd-pg/#recovering
[9:55] <T1w> hm.. good question
[9:55] <hommie> found this regarding "recovery thread timeout"
[9:55] * Hemanth (~hkumar_@125.16.34.66) has joined #ceph
[9:56] <T1w> 30 sec as default..
[9:58] <hommie> well, I guess it's time to start restarting 900 OSDs :)
[9:59] * kuku (~kuku@119.93.91.136) Quit (Remote host closed the connection)
[10:01] * DanFoster (~Daniel@2a00:1ee0:3:1337:815e:13d9:d188:782d) has joined #ceph
[10:02] * nardial (~ls@p5DC0728A.dip0.t-ipconnect.de) has joined #ceph
[10:04] * bara_ (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[10:09] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys! (??????????????????? ?????????)
[10:09] * bara_ (~bara@nat-pool-brq-t.redhat.com) Quit (Quit: Bye guys! (??????????????????? ?????????)
[10:11] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[10:12] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) has joined #ceph
[10:16] * sphinxx_ (~sphinxx@41.217.207.67) has joined #ceph
[10:21] <T1w> hommie: eeeks.. :)
[10:23] <liiwi> shouldn't that be just one command?
[10:23] <etienneme> you will probably get downtimes with only 1 command
[10:23] * sphinxx (~sphinxx@41.217.204.74) Quit (Ping timeout: 480 seconds)
[10:26] <boolman> in case of a node failure, is it expected to see blocked read/writes while the osd's is still marked as UP ?
[10:27] <boolman> so basically: osd_heartbeat_grace * mon_osd_min_down_reports seconds of downtime of the entire cluster
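For reference, a sketch of how those intervals can be read off a running cluster (osd.0 and mon.a are placeholders; each command has to run on the host that owns that daemon's admin socket):
    ceph daemon osd.0 config show | grep -E 'osd_heartbeat_grace|osd_heartbeat_interval'
    ceph daemon mon.a config show | grep -E 'mon_osd_min_down|mon_osd_down_out_interval'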
[10:28] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) has joined #ceph
[10:29] <peetaur2> boolman: in my limited testing, that is what I have seen if the only copy is down ... or even if the whole cluster is down
[10:29] <peetaur2> osds that is
[10:30] <peetaur2> blocks and resumes later (and writes that were interrupted corrupt cephfs maybe :))
[10:30] <boolman> well my size is 3 and min_size is 2, so I have other copies
[10:30] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[10:33] <boolman> that seems a bit odd, since i know that peeps here have hundreds of osds and or nodes
[10:34] <peetaur2> well I wouldn't expect it to block very long if there are other copies... is that what you observed?
[10:34] <peetaur2> how can I reproduce it?
[10:36] <boolman> I just killed the node, hard reboot
[10:36] <boolman> I'm using Jewel
[10:37] <peetaur2> I killed an osd and it still didn't rebuild... :/
[10:37] <peetaur2> using hammer 0.94.9
[10:41] <peetaur2> ah there it goes... I still need to figure out how to read/set the timeouts for that stuff :)
[10:41] <peetaur2> but anyway, doing drop_caches=3 and then md5sum a 1GB file and killing the osd in the middle doesn't break anything
[10:42] <peetaur2> I didn't time it, dunno if it was slower, but it didn't hang
[10:42] <boolman> peetaur2: so you killed a node and you didnt see any interuptions?
[10:43] <peetaur2> I did killall ceph-osd rather than reboot, and I didn't notice an interruption, but maybe there was a delay
[10:43] <peetaur2> with qemu and rbd
[10:43] <boolman> peetaur2: ok thx
[10:43] <peetaur2> and that's what I'd expect... the client can connect to the mons too, and find out what's going on and reach any osd it wants... makes no sense to just hang, only timeout and retry another
[10:45] <peetaur2> also at home I made a 1 osd cluster with jewel 10.2.2 and the osd kept crashing, and each time it would hang the client, but the client would just resume gracefully(other than corrupting files) when the osd was back
[10:45] <Be-El> peetaur2: afaik the client has a permanent connection to the mons, and changes to the osd map (osd up/down) are actively notified to the clients
[10:45] <peetaur2> and at home that was cephfs
[10:46] * TMM (~hp@185.5.121.201) has joined #ceph
[10:48] <peetaur2> oh and bringing up the osd doesn't require any io to rebuild... (assuming little/no data written since) how nice. :)
[10:54] * derjohn_mob (~aj@46.189.28.88) has joined #ceph
[10:59] * hyperbaba (~hyperbaba@private.neobee.net) has joined #ceph
[11:10] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:35fa:21a0:3929:f94d) has joined #ceph
[11:12] <IcePic> peetaur2: there is a timestamp or something, so unchanged data can be quickly validated and added back without heavy moving ops
[11:13] <TMM> peetaur2, yeah, if you know you're just doing some maintenance on an OSD just setting 'noout' on the cluster means that usually you can go back to HEALTH_OK in seconds or minutes
[11:13] <TMM> if you don't expect problems leaving noout on is a terrible idea, btw
[11:17] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[11:21] <peetaur2> yeah I figured that, and I've also seen noout mentioned.
[11:45] * libracious (~libraciou@catchpenny.cf) has left #ceph
[11:58] * jermudgeon (~jhaustin@199.200.6.173) Quit (Read error: No route to host)
[11:58] * jermudgeon (~jhaustin@199.200.6.173) has joined #ceph
[12:01] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[12:03] <chrome0> I have an issue on a jewel cluster where 1 pg is stuck degraded and undersized, any ideas?
[12:08] * jermudgeon (~jhaustin@199.200.6.173) Quit (Ping timeout: 480 seconds)
[12:09] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[12:09] * kefu is now known as kefu|afk
[12:10] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit ()
[12:13] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[12:14] * jermudgeon (~jhaustin@199.200.6.173) has joined #ceph
[12:20] * kefu|afk (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[12:20] * kefu (~kefu@li753-94.members.linode.com) has joined #ceph
[12:22] * evrardjp (~oftc-webi@104.130.231.135) has joined #ceph
[12:22] <evrardjp> hello
[12:23] <evrardjp> I can't find in the documentation the procedure about changing a monitor hostname without changing its IP
[12:23] <evrardjp> could someone point me at the procedure?
[12:23] <evrardjp> The host hasn't changed (no disk change, etc...) only a hostname change
[12:25] <peetaur2> all I can say is probably the monmap can do it...but if you screw up the monmap, bad things happen :)
[12:27] <hyperbaba> hi there,
[12:28] <hyperbaba> Is there a way to "evacuate" existing data from a pg and remove the pg? i have 6 incomplete pgs and can't do anything with them. So i need to evacuate the good part of the data to some other pg and remove / recreate / whatever with those 6
[12:29] <Be-El> evrardjp: afaik the monmap does not contain any hostnames, just ip adresses
[12:30] <peetaur2> ceph mon getmap > monmap; monmaptool --print monmap prints ips and names (names by convention are hostnames but do not have to be)
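A rough sketch of the monmap-editing route (the mon names and the IP are placeholders; the monitor must be stopped first, and injecting a broken monmap can take down quorum, as peetaur2 warns):
    ceph mon getmap -o monmap
    monmaptool --print monmap
    monmaptool --rm oldname monmap
    monmaptool --add newname 192.0.2.10:6789 monmap
    ceph-mon -i newname --inject-monmap monmap     # against the stopped mon's data directory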
[12:31] * KindOne_ (kindone@198.14.199.32) has joined #ceph
[12:32] * dux0r (~ggg@exit0.radia.tor-relays.net) has joined #ceph
[12:37] * KindOne (kindone@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:37] * KindOne_ is now known as KindOne
[12:40] * valeech (~valeech@pool-96-247-203-33.clppva.fios.verizon.net) has joined #ceph
[12:47] * kefu is now known as kefu|afk
[12:48] * valeech (~valeech@pool-96-247-203-33.clppva.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[12:50] * Jeffrey4l (~Jeffrey@110.252.72.251) Quit (Read error: Connection reset by peer)
[12:50] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:51] * masber (~masber@149.171.21.105) has joined #ceph
[12:53] * masuberu (~masber@149.171.21.105) Quit (Read error: Connection reset by peer)
[12:53] * Jeffrey4l (~Jeffrey@119.251.252.244) has joined #ceph
[12:56] * psieklFH (psiekl@wombat.eu.org) has joined #ceph
[12:56] * psiekl (psiekl@wombat.eu.org) Quit (Read error: Connection reset by peer)
[12:59] <evrardjp> I'll have a look at this, thanks peetaur2
[12:59] * masuberu (~masber@129.94.15.152) has joined #ceph
[13:00] <evrardjp> Be-El: well the hostname is what I use for ID, so that's my problem
[13:00] <peetaur2> and use monmaptool to edit the file, not vim,etc.
[13:00] <Be-El> it would be interesting to know what's going to happen if you just shutdown the monitor, change the ip configuration, and restart it
[13:01] <peetaur2> and how to import it is something I have no idea about... all I know is ceph-mon --monmap ... which is for making a new monitor, not modifying the running cluster of monitors
[13:01] <peetaur2> you have backups or are doing it on a test cluster, right? :)
[13:01] <Be-El> evrardjp: you might also try to remove the monitor, change the host configuration, and add it again
[13:02] <Be-El> (given you have three or more monitors)
[13:02] * dux0r (~ggg@26XAAB26T.tor-irc.dnsbl.oftc.net) Quit ()
[13:03] <evrardjp> Be-El: that's what I tried first: remove and add it again
[13:03] <evrardjp> it's terrible
[13:03] <evrardjp> :p
[13:03] <evrardjp> I thought it was going to be alright
[13:04] <evrardjp> I just reconfigured it with a new ID and it seems ok, but it's ugly because there are still remnants of the old hostname in the /var/lib/ceph/mon
[13:04] <evrardjp> so I'll investigate the best way
[13:04] <evrardjp> and yes, it's a test cluster :)
[13:06] <T1w> MONs are not supposed to change - period
[13:06] * masber (~masber@149.171.21.105) Quit (Ping timeout: 480 seconds)
[13:06] * jermudgeon (~jhaustin@199.200.6.173) Quit (Ping timeout: 480 seconds)
[13:06] <T1w> if you really need to, then adding a new one and removing the old one is the way
[13:07] <peetaur2> evrardjp: if you have stopped the mon and do not want to use it again, you can simply rm -rf that dir (not mon but mon/clustername-monname)
[13:07] <T1w> .. and remember to update the MON IPs listed in ceph.conf and push the config to all nodes and restart all daemons
[13:07] <T1w> afterwards
[13:07] * kefu (~kefu@114.92.125.128) has joined #ceph
[13:07] <peetaur2> evrardjp: the old files would even cause problems if they had the "done" and "sysvinit" file there (marking it as ready to start with the init/rc system)
[13:08] <peetaur2> restart daemons, or some tell command
[13:09] * kefu|afk (~kefu@li753-94.members.linode.com) Quit (Ping timeout: 480 seconds)
[13:11] * wjw-freebsd2 (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[13:13] * karnan (~karnan@106.51.139.84) Quit (Read error: Connection timed out)
[13:14] * karnan (~karnan@106.51.139.84) has joined #ceph
[13:20] * rraja (~rraja@125.16.34.66) has joined #ceph
[13:21] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) has joined #ceph
[13:21] * Racpatel (~Racpatel@2601:87:3:31e3::77ec) Quit (Quit: Leaving)
[13:25] * krypto (~krypto@G68-121-13-250.sbcis.sbc.com) has joined #ceph
[13:26] * rmart04 (~rmart04@support.memset.com) has joined #ceph
[13:31] * bene2 (~bene@2601:193:4101:f410:ea2a:eaff:fe08:3c7a) has joined #ceph
[13:34] * salwasser (~Adium@2601:197:101:5cc1:5f7:8d8e:786d:5661) has joined #ceph
[13:38] * yanzheng (~zhyan@118.116.115.254) Quit (Quit: This computer has gone to sleep)
[13:40] * salwasser (~Adium@2601:197:101:5cc1:5f7:8d8e:786d:5661) Quit (Quit: Leaving.)
[13:47] * T1w (~jens@node3.survey-it.dk) Quit (Remote host closed the connection)
[13:47] * pdrakeweb (~pdrakeweb@pool-98-118-150-184.bflony.fios.verizon.net) has joined #ceph
[13:47] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[13:51] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[13:52] * kefu (~kefu@114.92.125.128) has joined #ceph
[13:56] * georgem (~Adium@24.114.55.197) has joined #ceph
[13:56] * georgem (~Adium@24.114.55.197) Quit ()
[13:56] * georgem (~Adium@206.108.127.16) has joined #ceph
[13:57] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) has joined #ceph
[14:00] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[14:00] * kefu (~kefu@114.92.125.128) has joined #ceph
[14:05] * hommie (~hommie@87.255.57.200) Quit (Quit: Leaving...)
[14:06] * kefu is now known as kefu|afk
[14:07] * kefu|afk (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[14:14] * valeech (~valeech@166.170.31.125) has joined #ceph
[14:16] * yanzheng (~zhyan@118.116.115.254) has joined #ceph
[14:16] * rdas (~rdas@121.244.87.116) Quit (Ping timeout: 480 seconds)
[14:17] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[14:20] * hyperbaba (~hyperbaba@private.neobee.net) Quit (Quit: Konversation terminated!)
[14:22] * Peaced (~TomyLobo@tsn109-201-152-225.dyn.nltelcom.net) has joined #ceph
[14:23] <evrardjp> T1w: I'm pretty sure you don't get my issue here: I don't have a choice, I have to change the hostname. And it's the same machine. So adding it back to the cluster and then removing the old hostname seems tedious and stupid
[14:25] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[14:25] <evrardjp> peetaur2: I planned to stop the ceph mon with the old hostname, reconfigure it with the new hostname, then remove the folder in /var/lib/ceph/mon/<old_hostname> and restart with the new hostname
[14:26] * morse (~morse@supercomputing.univpm.it) Quit (Ping timeout: 480 seconds)
[14:26] <peetaur2> making a new directory with keyring, etc. first, then restarting, right?
[14:26] * morse (~morse@supercomputing.univpm.it) has joined #ceph
[14:27] * rakeshgm (~rakesh@125.16.34.66) Quit (Ping timeout: 480 seconds)
[14:31] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[14:32] <evrardjp> peetaur2: yes
[14:39] * rakeshgm (~rakesh@125.16.34.66) has joined #ceph
[14:39] * georgem (~Adium@206.108.127.16) Quit (Ping timeout: 480 seconds)
[14:42] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[14:42] * Racpatel (~Racpatel@2601:87:3:31e3::77ec) has joined #ceph
[14:45] * pdrakeweb (~pdrakeweb@pool-98-118-150-184.bflony.fios.verizon.net) Quit (Quit: Leaving...)
[14:46] * pdrakeweb (~pdrakeweb@pool-98-118-150-184.bflony.fios.verizon.net) has joined #ceph
[14:46] * topology (~oftc-webi@89.36.217.13) has joined #ceph
[14:46] * topology (~oftc-webi@89.36.217.13) has left #ceph
[14:50] * jiffe (~jiffe@nsab.us) has joined #ceph
[14:51] * georgem (~Adium@206.108.127.16) has joined #ceph
[14:51] <jiffe> so I have an osd that died and I have a pg that is stuck peering, everything else has migrated to other osds. pg query says 'starting or marking this osd lost may let us proceed', is marking as lost safe to do?
[14:52] * andrewschoen (~andrewsch@192.237.167.184) Quit (Ping timeout: 480 seconds)
[14:52] * Peaced (~TomyLobo@tsn109-201-152-225.dyn.nltelcom.net) Quit ()
[14:53] <peetaur2> jiffe: maybe the cause is you have another bad disk
[14:54] * sphinxx (~sphinxx@41.217.204.74) has joined #ceph
[14:54] <peetaur2> and I could also say it seems obvious that marking lost was intended to be used that way, but I'm not such an expert yet :)
[14:54] <jiffe> I would agree, I'm just not sure if there are other consequences to marking it as lost
[14:55] <jiffe> or what the best way to proceed here is
[14:56] <jiffe> also if I mark as lost is that going to pull it from crush and cause a rebalance
[14:56] <peetaur2> maybe find the osd that has the pg, and test it for bad sectors
[14:57] <peetaur2> if it is the only copy and is lost, there's nothing to pull, is there?
[14:57] * nass5 (~fred@l-p-dn-in-12a.lionnois.site.univ-lorraine.fr) Quit (Quit: Leaving.)
[14:57] * nass5 (~fred@l-p-dn-in-12a.lionnois.site.univ-lorraine.fr) has joined #ceph
[14:59] <peetaur2> find it with: ceph pg {pg number} query
[15:00] * zhen (~Thunderbi@124.205.254.29) has joined #ceph
[15:00] * Kioob (~Kioob@LMontsouris-656-1-1-206.w80-12.abo.wanadoo.fr) has joined #ceph
[15:01] * sphinxx_ (~sphinxx@41.217.207.67) Quit (Ping timeout: 480 seconds)
[15:01] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[15:02] <jiffe> yeah, it says probing osds 8, 24, 27 and blocked by 36
[15:02] <jiffe> 36 is a dead drive
[15:03] <jiffe> acting and up are 8 and 27
[15:04] <jiffe> 36 is still in crush, just marked as down/out
[15:04] <peetaur2> what does it mean... you have size=4?
[15:05] * neurodrone_ (~neurodron@pool-100-35-225-168.nwrknj.fios.verizon.net) Quit (Quit: neurodrone_)
[15:05] <jiffe> if size is the number or replicas, replica count is 2
[15:05] <peetaur2> then I would expect it to only mention 2 osds
[15:06] <jiffe> one would think
[15:06] <peetaur2> in up, acting, and actingbackfill, mine says 2 osd ids with size 2
[15:06] <jiffe> I don't know what this probing process is trying to achieve, maybe it keeps backups
[15:07] <jiffe> pg's that are up don't show a probing section in query
[15:07] * rotbeard (~redbeard@aftr-109-90-233-215.unity-media.net) Quit (Quit: Leaving)
[15:07] <peetaur2> yep
[15:09] <peetaur2> when I killed 2 osds (0 and 1), then query would just hang... I didn't want to mark it lost, so I started osd 1 again, and then https://bpaste.net/show/989858259a32
[15:11] <jiffe> all my osds are up except this one and this pg seems to already have its 2 replicas in other osds, it just seems to be hanging onto this osd for some reason
[15:11] <arthur> hi, let's say i have a pool of osds backed by sata ssd disks. how much improvement would a dedicated ssd (nvme/pcie) for journals achieve?
[15:12] <peetaur2> and my guess was that the 2nd copy of that pg is on a bad sector...did you check that?
[15:13] <peetaur2> arthur: for random write, a bunch I'd think
[15:13] <peetaur2> er I missed "ssd disks" .... I think not much then
[15:13] <peetaur2> maybe even negative since it has to write to more places for the same amount of work
[15:14] * sankarshan (~sankarsha@125.16.34.66) has joined #ceph
[15:14] * Kurt1 (~Adium@2001:628:1:5:30bc:1c0:6911:a5bb) has joined #ceph
[15:15] * Kurt1 (~Adium@2001:628:1:5:30bc:1c0:6911:a5bb) Quit ()
[15:15] <peetaur2> jiffe: did you read this yet? http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#placement-group-down-peering-failure
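Roughly the sequence that troubleshooting page gives for a dead OSD blocking peering (the pg id and osd id 36 are placeholders; marking an OSD lost tells ceph to give up on any data only that OSD held, so it is irreversible):
    ceph pg 3.7f query                          # check recovery_state and the "blocked by" entry
    ceph osd out 36                             # make sure the dead osd is out
    ceph osd lost 36 --yes-i-really-mean-it     # declare its data lost so peering can proceed
    ceph osd crush remove osd.36                # optional cleanup once recovery finishes
    ceph auth del osd.36
    ceph osd rm 36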
[15:16] * nardial (~ls@p5DC0728A.dip0.t-ipconnect.de) Quit (Quit: Leaving)
[15:17] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[15:17] <- *johnavp1989* To prove that you are human, please enter the result of 8+3
[15:21] * salwasser (~Adium@72.246.3.14) has joined #ceph
[15:22] * valeech (~valeech@166.170.31.125) Quit (Quit: valeech)
[15:25] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[15:27] * lincolnb (~lincoln@c-71-57-68-189.hsd1.il.comcast.net) Quit (Read error: Connection reset by peer)
[15:27] * lincolnb (~lincoln@c-71-57-68-189.hsd1.il.comcast.net) has joined #ceph
[15:28] * mattbenjamin (~mbenjamin@76-206-42-50.lightspeed.livnmi.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[15:28] * TMM (~hp@185.5.121.201) has joined #ceph
[15:33] * rakeshgm (~rakesh@125.16.34.66) Quit (Remote host closed the connection)
[15:37] * mattbenjamin (~mbenjamin@12.118.3.106) has joined #ceph
[15:37] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[15:38] * Nicho1as (~nicho1as@00022427.user.oftc.net) has joined #ceph
[15:43] * swami1 (~swami@49.38.1.60) Quit (Quit: Leaving.)
[15:43] * kuku (~kuku@112.203.19.176) has joined #ceph
[15:45] * davidb (~David@MTRLPQ42-1176054809.sdsl.bell.ca) has joined #ceph
[15:46] * kuku (~kuku@112.203.19.176) Quit (Remote host closed the connection)
[15:59] * mitchty_ (~quassel@130-245-47-212.rev.cloud.scaleway.com) Quit (Remote host closed the connection)
[15:59] * mitchty (~quassel@130-245-47-212.rev.cloud.scaleway.com) has joined #ceph
[16:05] * yanzheng (~zhyan@118.116.115.254) Quit (Quit: This computer has gone to sleep)
[16:09] * andrewschoen (~andrewsch@2001:4801:7821:77:be76:4eff:fe10:afc7) has joined #ceph
[16:14] * vata (~vata@207.96.182.162) has joined #ceph
[16:21] * xarses (~xarses@c-73-202-191-48.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:21] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[16:22] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[16:25] * squizzi (~squizzi@107.13.237.240) has joined #ceph
[16:27] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[16:29] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[16:30] * haplo37 (~haplo37@199.91.185.156) has joined #ceph
[16:31] * kuku (~kuku@112.203.19.176) has joined #ceph
[16:35] * kuku (~kuku@112.203.19.176) Quit (Remote host closed the connection)
[16:37] * F|1nt (~F|1nt@host37-211.lan-isdn.imaginet.fr) has joined #ceph
[16:40] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:43] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[16:44] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:44] * krypto (~krypto@G68-121-13-250.sbcis.sbc.com) Quit (Read error: Connection reset by peer)
[16:44] * krypto (~krypto@G68-121-13-250.sbcis.sbc.com) has joined #ceph
[16:44] * xinli (~charleyst@32.97.110.50) has joined #ceph
[16:47] * zhen (~Thunderbi@124.205.254.29) Quit (Remote host closed the connection)
[16:50] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[16:50] * ivve (~zed@cust-gw-11.se.zetup.net) has joined #ceph
[16:50] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) Quit (Ping timeout: 480 seconds)
[16:52] * EinstCrazy (~EinstCraz@222.69.240.111) has joined #ceph
[16:53] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[16:54] * kefu (~kefu@114.92.125.128) has joined #ceph
[16:54] * TMM (~hp@185.5.121.201) Quit (Quit: Ex-Chat)
[16:56] * EinstCrazy (~EinstCraz@222.69.240.111) Quit (Remote host closed the connection)
[16:56] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) Quit (Remote host closed the connection)
[17:02] * ircolle (~Adium@2601:285:201:633a:ec7a:4ecb:2ec7:d829) has joined #ceph
[17:05] * wushudoin (~wushudoin@2601:646:8200:c9f0:2ab2:bdff:fe0b:a6ee) has joined #ceph
[17:06] * ivve (~zed@cust-gw-11.se.zetup.net) Quit (Ping timeout: 480 seconds)
[17:07] <etienneme> Is there a way to get status of a specific monitor?
[17:07] <etienneme> Doing ceph mon_status on monitor01 can give status of monitor02
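A sketch of querying one specific monitor through its local admin socket (mon id mon01 and the socket path are placeholders; these have to be run on that monitor's host):
    ceph daemon mon.mon01 mon_status
    ceph --admin-daemon /var/run/ceph/ceph-mon.mon01.asok mon_status    # same thing with an explicit socket path
    ceph daemon mon.mon01 quorum_status                                 # that mon's view of the quorum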
[17:10] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has left #ceph
[17:10] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[17:12] * vikhyat (~vumrao@121.244.87.116) Quit (Quit: Leaving)
[17:12] * xarses (~xarses@64.124.158.3) has joined #ceph
[17:15] * Hemanth (~hkumar_@125.16.34.66) Quit (Ping timeout: 480 seconds)
[17:15] * xinli (~charleyst@32.97.110.50) Quit (Remote host closed the connection)
[17:16] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) has joined #ceph
[17:16] * xinli (~charleyst@32.97.110.50) has joined #ceph
[17:18] <xinli> rkeene: if I install and configure ceph cluster in Ubuntu, should I also configure firewall and monitor node by iptables? sudo iptables -A INPUT -i {iface} -p tcp -s {ip-address}/{netmask} --dport 6789 -j ACCEPT
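A sketch along the lines of the firewall section of the ceph docs, assuming interface eth0 and a 10.0.0.0/24 cluster network (both placeholders); 6789 is the monitor port and 6800:7300 the default OSD/MDS range:
    sudo iptables -A INPUT -i eth0 -p tcp -s 10.0.0.0/24 --dport 6789 -j ACCEPT                        # monitor
    sudo iptables -A INPUT -i eth0 -m multiport -p tcp -s 10.0.0.0/24 --dports 6800:7300 -j ACCEPT     # osds/mds
    sudo iptables-save > /etc/iptables/rules.v4    # persisting rules; exact mechanism varies by distro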
[17:22] * kristen (~kristen@jfdmzpr03-ext.jf.intel.com) has joined #ceph
[17:24] <xinli> rkeene: when I run the command "ceph-deploy new ceph-osd2" to initialize the monitor node, I get error "[ERROR ] RuntimeError: connecting to host: ceph-osd2 resulted in errors: IOError cannot send (already closed?)"
[17:24] * EinstCrazy (~EinstCraz@222.69.240.111) has joined #ceph
[17:26] * EinstCrazy (~EinstCraz@222.69.240.111) Quit (Remote host closed the connection)
[17:26] * EinstCrazy (~EinstCraz@222.69.240.111) has joined #ceph
[17:26] * yanzheng (~zhyan@118.116.115.254) has joined #ceph
[17:27] * sankarshan (~sankarsha@125.16.34.66) Quit (Quit: Are you sure you want to quit this channel (Cancel/Ok) ?)
[17:27] * krypto (~krypto@G68-121-13-250.sbcis.sbc.com) Quit (Read error: Connection reset by peer)
[17:28] * krypto (~krypto@49.207.63.172) has joined #ceph
[17:33] * swami1 (~swami@27.7.173.101) has joined #ceph
[17:36] * yanzheng (~zhyan@118.116.115.254) Quit (Quit: This computer has gone to sleep)
[17:40] * vbellur (~vijay@71.234.224.255) Quit (Ping timeout: 480 seconds)
[17:41] * F|1nt (~F|1nt@host37-211.lan-isdn.imaginet.fr) Quit (Quit: Be back later ...)
[17:46] * swami1 (~swami@27.7.173.101) Quit (Quit: Leaving.)
[17:47] <xinli> hi, can someone help me to fix the issue that my "ceph-deploy new {mon}" failed with error: "RuntimeError: connecting to host: ceph-osd2 resulted in errors: IOError cannot send (already closed?)"
[17:47] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Ping timeout: 480 seconds)
[17:53] * rraja (~rraja@125.16.34.66) Quit (Quit: Leaving)
[17:55] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) has joined #ceph
[17:58] * vbellur (~vijay@nat-pool-bos-t.redhat.com) has joined #ceph
[17:58] * kefu (~kefu@114.92.125.128) Quit (Quit: My Mac has gone to sleep. ZZZzzz...)
[17:59] * EinstCrazy (~EinstCraz@222.69.240.111) Quit (Remote host closed the connection)
[18:00] * kefu (~kefu@114.92.125.128) has joined #ceph
[18:01] * linuxkidd (~linuxkidd@ip70-189-202-62.lv.lv.cox.net) has joined #ceph
[18:03] * Nicho1as (~nicho1as@00022427.user.oftc.net) Quit (Quit: A man from the Far East; using WeeChat 1.5)
[18:06] * TMM (~hp@31.161.155.187) has joined #ceph
[18:09] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[18:09] * krypto (~krypto@49.207.63.172) Quit (Ping timeout: 480 seconds)
[18:09] * mykola (~Mikolaj@91.245.73.87) has joined #ceph
[18:09] * derjohn_mob (~aj@46.189.28.88) Quit (Ping timeout: 480 seconds)
[18:09] * krypto (~krypto@192.71.175.30) has joined #ceph
[18:10] * borei (~dan@216.13.217.230) has joined #ceph
[18:13] * jarrpa (~jarrpa@205.158.164.101.ptr.us.xo.net) has joined #ceph
[18:14] * doppelgrau (~doppelgra@132.252.235.172) Quit (Quit: Leaving.)
[18:15] * kefu (~kefu@114.92.125.128) Quit (Remote host closed the connection)
[18:15] * kefu (~kefu@li1432-31.members.linode.com) has joined #ceph
[18:16] * sphinxx (~sphinxx@41.217.204.74) Quit (Ping timeout: 480 seconds)
[18:17] * blizzow (~jburns@50-243-148-102-static.hfc.comcastbusiness.net) has joined #ceph
[18:18] * kefu_ (~kefu@114.92.125.128) has joined #ceph
[18:20] <blizzow> I added a new OSD node to my 8 node cluster this morning. It's got 4 OSD in it. The OSDs are 3x7200RPM SATA spinners and 1xSSD. All of my VMs have painfully slow disk IO right now while it's in "recovery" and rebalancing. I don't see anything glaring when I look at my OSD nodes other than disks spiking to 100% utilization frequently.
[18:20] <blizzow> Is there anything I should be looking at in particular, or a way to fix this?
[18:20] <blizzow> I see the recovery rate spikes up past 900MB/sec sometimes, but it's still really slow.
[18:21] * kefu_ (~kefu@114.92.125.128) Quit ()
[18:22] <blizzow> All of my coworkers are complaining that anything accessing disk is brutal now.
[18:22] * vasu (~vasu@c-73-231-60-138.hsd1.ca.comcast.net) has joined #ceph
[18:22] <Kingrat> http://docs.ceph.com/docs/jewel/rados/configuration/osd-config-ref/#recovery
[18:22] <Kingrat> suggest looking at osd recovery max active
[18:23] * karnan (~karnan@106.51.139.84) Quit (Quit: Leaving)
[18:24] <blizzow> I'm at about 6% misplaced objects (~6TB) on a 10GBe store. I'm wondering if I turn that up and bear the pain for a short time vs. turning it down to improve performance.
[18:25] <Be-El> blizzow: do you use the hammer or jewel release? did you set specific options for backfilling (e.g. osd_max_backfills)?
[18:25] <blizzow> Be-El: I'm using jewel, it's a pretty vanilla cluster.
[18:25] <blizzow> I haven't set any backfill options.
[18:25] * kefu (~kefu@li1432-31.members.linode.com) Quit (Ping timeout: 480 seconds)
[18:26] <Be-El> blizzow: jewel has an improved op queuing implementation that should prevent a high impact of maintenance operations on client i/o
[18:26] <Be-El> blizzow: try setting osd_max_backfills to 1 for all osd to reduce the backfill traffic
[18:26] * jarrpa (~jarrpa@205.158.164.101.ptr.us.xo.net) Quit (Ping timeout: 480 seconds)
[18:27] * TMM (~hp@31.161.155.187) Quit (Ping timeout: 480 seconds)
[18:27] <blizzow> Be-El: can I set them ad-hoc, or do I have to change /etc/ceph/ceph.conf on all OSDs?
[18:28] <Be-El> blizzow: "ceph tell osd.* injectsargs "--osd-max-backfills=1"
[18:28] <Be-El> without the first quote
[18:28] * kefu (~kefu@114.92.125.128) has joined #ceph
[18:29] <blizzow> and without the s in injects ;)
[18:29] <Be-El> absolutely ;-)
[18:30] <Be-El> with this setting each osd will either have one outgoing or incoming backfill operation
[18:30] <Be-El> with 4 new osd you will have 4 backfill operations overall (given the 4 new osds are filled by four different existing osds)
[18:31] <Be-El> the overall backfill time will increase, but client operations should be smoother
[18:31] <Be-El> changing this setting will not affect running backfills, so you'll have to wait for the number of currently running ones to decrease
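A sketch of the throttling being discussed, combining the runtime injection with a persistent setting (the values shown are common conservative choices, not requirements):
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # and to persist across restarts, in ceph.conf under [osd]:
    #   osd max backfills = 1
    #   osd recovery max active = 1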
[18:32] <blizzow> okey dokey. It's kind of rough that adding OSDs to a cluster running in the green can bring it to its knees. :/
[18:33] <johnavp1989> How can I tell if a recovery operation is going well or not? I've got 81 active+degraded pg's. I've got ceph -w going but I don't see any change to the status over the last half hour. Still at 0.630%
[18:36] <Be-El> johnavp1989: does ceph -w print some information about the number of recovered objects?
[18:37] <Be-El> the values for degraded/remapped object counts sometimes only change after a complete pg has been processed
[18:37] <johnavp1989> 2016-09-21 16:36:23.509120 mon.0 [INF] pgmap v11498866: 4800 pgs: 81 active+degraded, 3 active+remapped, 4716 active+clean; 1243 GB data, 3671 GB used, 62181 GB / 65853 GB avail; 1184 B/s wr, 0 op/s; 4499/714492 objects degraded (0.630%)
[18:38] <johnavp1989> Be-El: is this what you mean, pgmap v11498866?
[18:38] * valeech (~valeech@50-20-85-98.customerip.birch.net) has joined #ceph
[18:38] * derjohn_mob (~aj@tmo-108-163.customers.d1-online.com) has joined #ceph
[18:39] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[18:39] <Be-El> johnavp1989: no, there should be some information about objects recovered since the last output, including bandwidth for recovery
[18:40] * kefu (~kefu@114.92.125.128) has joined #ceph
[18:40] <Be-El> johnavp1989: the pgs are also only in the active+degraded and active+remapped states. there's no active backfilling going on at the moment
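A few commands commonly used to check whether recovery is actually progressing (the pg id below is only a placeholder):

    ceph health detail           # lists the pgs behind the HEALTH_WARN
    ceph pg dump_stuck unclean   # pgs that have not reached active+clean
    ceph pg 1.2f3 query          # per-pg detail: recovery_state and the acting osds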
[18:42] <johnavp1989> yeah, that's all there is. do i need to force a backfill or something?
[18:42] <Be-El> johnavp1989: the recovery seems to be stuck, so you might need to trigger it again
[18:43] <Be-El> johnavp1989: afaik there's no way to trigger recovery directly, but i was able to trigger it in the past with a little trick
[18:43] <Be-El> johnavp1989: change the crush weight of a osd (triggers backfilling), and change it back to the original value
[18:44] * krypto (~krypto@192.71.175.30) Quit (Read error: Connection reset by peer)
[18:45] * krypto (~krypto@49.207.63.172) has joined #ceph
[18:45] <johnavp1989> hmmm okay
[18:46] <johnavp1989> what's also interesting is that one of my OSDs is down, but there's no warning in ceph health
[18:46] * jermudgeon (~jhaustin@31.207.56.59) has joined #ceph
[18:47] * todin (~tuxadero@kudu.in-berlin.de) has joined #ceph
[18:49] * rmart04 (~rmart04@support.memset.com) Quit (Ping timeout: 480 seconds)
[18:49] * sxc731_ (~sxc731@xdsl-188-154-67-50.adslplus.ch) has joined #ceph
[18:54] <johnavp1989> Be-El: Just change the weight of a single OSD?
[18:54] <johnavp1989> or a whole node?
[18:55] <Be-El> johnavp1989: that's what i usually do. just a small change (0.1) to prevent triggering too much additional backfilling
[18:56] <johnavp1989> ok and change it back immediately?
[18:57] <Be-El> after backfilling has started
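A rough sketch of the reweight trick described above, assuming the osd currently has a CRUSH weight of 1.82 (check ceph osd tree first and restore the exact original value afterwards, not a rounded one):

    ceph osd tree                          # note the current CRUSH weight of the osd
    ceph osd crush reweight osd.59 1.72    # lower it by ~0.1 to kick off backfilling
    # once ceph -s shows backfilling, put the original weight back
    ceph osd crush reweight osd.59 1.82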
[19:01] <johnavp1989> Be-El: Ok thank you. That kicked off a backfill.
[19:02] <johnavp1989> I'm not sure what's going on here though after I changed the weight.
[19:02] <johnavp1989> { "id": 55,
[19:02] <johnavp1989> "weight": 71434,
[19:02] <johnavp1989> "pos": 18},
[19:02] <johnavp1989> { "id": 59,
[19:02] <johnavp1989> "weight": -2147483648,
[19:02] <johnavp1989> "pos": 19}]}],
[19:02] <Be-El> did you change it back to the former value?
[19:02] <johnavp1989> yes
[19:03] <Be-El> and does ceph osd tree display the correct value for the osd?
[19:03] * fdmanana (~fdmanana@2001:8a0:6e0c:6601:35fa:21a0:3929:f94d) Quit (Ping timeout: 480 seconds)
[19:04] <johnavp1989> ahhh no i used the wrong value
[19:04] <johnavp1989> I got it back now
[19:05] <johnavp1989> Thanks for your help. I'll see if this does the trick
[19:05] <Be-El> the pg should be backfilling or wait_backfill now
[19:06] <johnavp1989> ceph -w shows a count of the objects recovery now
[19:06] <Be-El> what's the overall state for the pgs?
[19:07] <johnavp1989> 2349/714759 objects degraded (0.329%)
[19:07] <johnavp1989> 1 active+recovering+remapped
[19:07] <johnavp1989> 4713 active+clean
[19:07] <johnavp1989> 3 active+remapped+wait_backfill
[19:07] <johnavp1989> 14 active+degraded
[19:07] <johnavp1989> 51 active+remapped
[19:07] <johnavp1989> 18 active+recovery_wait+remapped
[19:08] <Be-El> there's still something wrong. the active+degraded and active+remapped ones should also be wait_backfill
[19:08] <Be-El> how many hosts do you have?
[19:08] <johnavp1989> 3
[19:08] <Be-El> and how many osds per host?
[19:09] <johnavp1989> 20
[19:11] <Be-El> can you upload the output of 'ceph pg dump | grep degraded' to some pastebin?
[19:11] <johnavp1989> http://paste.openstack.org/show/582434/
[19:12] <Be-El> do you use a replication factor of three?
[19:12] * sxc731_ (~sxc731@xdsl-188-154-67-50.adslplus.ch) Quit (Remote host closed the connection)
[19:13] <johnavp1989> yes
[19:13] <Be-El> so there's a problem with your crush rulesets
[19:14] <Be-El> there are only two osds associated with the degraded pgs
[19:14] <Be-El> which is why they are in the degraded state...the third replica is missing
[19:14] <Be-El> which ceph release do you use?
[19:14] <johnavp1989> ceph version 0.80.9
[19:15] * BrianA (~BrianA@fw-rw.shutterfly.com) has joined #ceph
[19:16] <Be-El> that's firefly or emperor?
[19:16] * DanFoster (~Daniel@2a00:1ee0:3:1337:815e:13d9:d188:782d) Quit (Quit: Leaving)
[19:17] * sxc731 (~sxc731@xdsl-188-154-67-50.adslplus.ch) has joined #ceph
[19:17] <johnavp1989> firefly
[19:18] * mhack (~mhack@24-151-36-149.dhcp.nwtn.ct.charter.com) Quit (Remote host closed the connection)
[19:18] * kefu (~kefu@114.92.125.128) Quit (Max SendQ exceeded)
[19:18] * swami1 (~swami@27.7.173.101) has joined #ceph
[19:18] <Be-El> do you use the firefly crush tunables?
[19:20] * valeech (~valeech@50-20-85-98.customerip.birch.net) Quit (Ping timeout: 480 seconds)
[19:20] <johnavp1989> vs the legacy ones?
[19:20] <Be-El> yes
[19:21] <Be-El> if all of your clients support the firefly tunables, i would propose changing the tunables to the 'optimal' value (might trigger a lot of data movement)
[19:21] <johnavp1989> I don't think it's using legacy. There's no warning and it's not disabled in ceph.conf
[19:21] <Be-El> otherwise (or if the firefly tunables don't resolve the problem), you can try to increase the choose_total_tries value to e.g. 50
[19:23] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) has joined #ceph
[19:23] <Be-El> the documentation has more details on how to change this value
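For reference, a sketch of the two options Be-El mentions; the choose_total_tries value of 50 is just the example used above, and switching tunables can move a large amount of data:

    # option 1: switch to the optimal tunables profile for this release
    ceph osd crush tunables optimal

    # option 2: raise choose_total_tries by editing the decompiled crush map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    #   edit the line: tunable choose_total_tries 50
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new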
[19:24] <Be-El> -> off for today
[19:24] * Be-El (~blinke@nat-router.computational.bio.uni-giessen.de) Quit (Quit: Leaving.)
[19:24] * jermudgeon (~jhaustin@31.207.56.59) Quit (Quit: jermudgeon)
[19:24] <johnavp1989> ok thank you
[19:26] <johnavp1989> Changed to optimal. Looks better?
[19:26] <johnavp1989> osdmap e756: 60 osds: 59 up, 59 in
[19:26] <johnavp1989> pgmap v11499779: 4800 pgs, 14 pools, 1243 GB data, 232 kobjects
[19:26] <johnavp1989> 3697 GB used, 62155 GB / 65853 GB avail
[19:26] <johnavp1989> 728677/883845 objects degraded (82.444%)
[19:26] <johnavp1989> 1809 active
[19:26] <johnavp1989> 299 active+recovery_wait
[19:26] <johnavp1989> 12 active+degraded+remapped
[19:26] <johnavp1989> 1616 active+clean
[19:26] <johnavp1989> 10 active+remapped+wait_backfill
[19:26] <johnavp1989> 3 active+recovering
[19:26] <johnavp1989> 1051 active+remapped
[19:26] <johnavp1989> recovery io 593 MB/s, 105 objects/s
[19:33] <SamYaple> johnavp1989: just be careful, make sure all your clients can use it
[19:33] <SamYaple> johnavp1989: otherwise you're going to have a lot of trouble
[19:33] * stiopa (~stiopa@cpc73832-dals21-2-0-cust453.20-2.cable.virginm.net) has joined #ceph
[19:33] <johnavp1989> SamYaple: by all my clients you mean all of my ceph nodes? Or everything that mounts RBDs?
[19:34] <SamYaple> everything
[19:35] <SamYaple> if you have any client that doesn't support the firefly tunables and has an rbd right now, it won't be able to talk to the OSDs and you're going to lose data
[19:36] <johnavp1989> ok thanks
[19:40] * Kioob (~Kioob@LMontsouris-656-1-1-206.w80-12.abo.wanadoo.fr) Quit (Quit: Leaving.)
[19:43] * jermudgeon (~jhaustin@199.200.6.48) has joined #ceph
[19:45] * derjohn_mob (~aj@tmo-108-163.customers.d1-online.com) Quit (Ping timeout: 480 seconds)
[19:45] * F|1nt (~F|1nt@85-170-91-210.rev.numericable.fr) has joined #ceph
[19:55] * sxc731 (~sxc731@xdsl-188-154-67-50.adslplus.ch) Quit (Remote host closed the connection)
[19:58] * sphinxx (~sphinxx@154.118.120.40) has joined #ceph
[20:02] * ledgr (~ledgr@88-222-11-185.meganet.lt) has joined #ceph
[20:06] <blizzow> @joao or @nhm can you change the room banner to point out where cephlogbot logs to?
[20:09] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) Quit (Ping timeout: 480 seconds)
[20:10] * ledgr (~ledgr@88-222-11-185.meganet.lt) Quit (Ping timeout: 480 seconds)
[20:11] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[20:15] * rmart04 (~rmart04@host86-185-106-132.range86-185.btcentralplus.com) has joined #ceph
[20:23] * rmart04_ (~rmart04@support.memset.com) has joined #ceph
[20:23] * rmart04 (~rmart04@host86-185-106-132.range86-185.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[20:23] * rmart04_ is now known as rmart04
[20:25] * F|1nt (~F|1nt@85-170-91-210.rev.numericable.fr) Quit (Quit: Oups, just gone away...)
[20:29] * davidb (~David@MTRLPQ42-1176054809.sdsl.bell.ca) has left #ceph
[20:30] * xinli (~charleyst@32.97.110.50) Quit (Remote host closed the connection)
[20:34] * TomasCZ (~TomasCZ@yes.tenlab.net) has joined #ceph
[20:39] * mykola (~Mikolaj@91.245.73.87) Quit (Read error: Connection reset by peer)
[20:39] * salwasser (~Adium@72.246.3.14) Quit (Quit: Leaving.)
[20:40] * xinli (~charleyst@32.97.110.53) has joined #ceph
[20:42] * derjohn_mob (~aj@b2b-94-79-172-98.unitymedia.biz) has joined #ceph
[20:45] * mykola (~Mikolaj@91.245.73.87) has joined #ceph
[20:54] <xinli> @joao, @nhm; can you help me fix a ceph-deploy failure: I ran ceph-deploy install node1 node2 node3 and the installation on node2 failed. How can I handle this issue?
[20:54] <xinli> error message: [2016-09-21 13:26:29,102][ceph-osd1][DEBUG ] dpkg: error processing package ceph-common (--configure):^M
[20:54] <xinli> [2016-09-21 13:26:29,102][ceph-osd1][DEBUG ] subprocess installed post-installation script returned error exit status 8^M
[20:54] * rwheeler (~rwheeler@bzq-84-111-170-30.red.bezeqint.net) has joined #ceph
[20:58] * jermudgeon (~jhaustin@199.200.6.48) Quit (Ping timeout: 480 seconds)
[20:58] * swami1 (~swami@27.7.173.101) Quit (Quit: Leaving.)
[21:00] * debian112 (~bcolbert@2600:1005:b054:9068:863a:4bff:fe9b:d1ba) has joined #ceph
[21:00] * jermudgeon (~jhaustin@wpc-pe-l2.whitestone.link) has joined #ceph
[21:00] <mnaser> speaking of which, what's the best way to find out if all clients can support the firefly tunables
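One way to approach this (a sketch, not a complete answer): librados/librbd clients running firefly or newer already understand the firefly tunables, while kernel RBD/CephFS clients need a sufficiently recent kernel (the CRUSH tunables page in the docs carries a kernel compatibility table). On a monitor you can also dump the connected sessions and inspect the feature bits each client reports; this assumes your mon id matches the short hostname:

    ceph daemon mon.$(hostname -s) sessions   # run on a mon host; lists clients and their feature bits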
[21:06] <xinli> team: Can I use ceph-deploy to install the cluster nodes one by one instead of all at once? I mean, can I just run ceph-deploy install node1 and, after node1 is done, then run ceph-deploy install node2?
[21:06] * kefu (~kefu@114.92.125.128) has joined #ceph
[21:08] * Mattress (~slowriot@exit1.radia.tor-relays.net) has joined #ceph
[21:09] * evrardjp (~oftc-webi@104.130.231.135) Quit (Remote host closed the connection)
[21:12] * debian1121 (~bcolbert@2600:1005:b014:291e:863a:4bff:fe9b:d1ba) has joined #ceph
[21:14] * kefu (~kefu@114.92.125.128) Quit (Ping timeout: 480 seconds)
[21:16] * debian112 (~bcolbert@2600:1005:b054:9068:863a:4bff:fe9b:d1ba) Quit (Ping timeout: 480 seconds)
[21:18] <vasu> yes, it's one and the same; even if you specify node1 node2 .. nodeN, the install happens one node at a time
[21:19] <xinli> vasu: thanks, any idea why my install failed? dpkg: error processing package ceph-common (--configure):^M
[21:19] <vasu> what version of ceph and ceph-deploy?
[21:20] <xinli> vasu: how can I reinstall ? purgedata / purge?
[21:20] <vasu> yeah purge and purgedata should cleanup and then you can rerun
[21:20] <xinli> vasu: in Ubuntu, ceph-jewel
[21:21] <xinli> vasu: how to check the release?
[21:21] <vasu> follow the release notes and don't miss the release flag during install, it should work
[21:21] * rmart04 (~rmart04@support.memset.com) Quit (Quit: rmart04)
[21:22] <blizzow> xinli: log into the node and try dpkg --configure -a and see if you have a messed up package state.
[21:22] <vasu> ceph-deploy install --stable jewel node1 node2 ...
[21:22] <vasu> http://docs.ceph.com/docs/jewel/release-notes/
[21:23] <blizzow> Make sure your nodes are in good health before running your ceph-deploy commands.
[21:23] <xinli> can I first purge and purgedata?
[21:24] * TMM (~hp@dhcp-077-248-009-229.chello.nl) has joined #ceph
[21:26] <xinli> blizzow: yes, the dpkg --configure -a shows a lot of errors in ceph: dpkg: error processing package ceph-base (--configure):
[21:26] <xinli> dependency problems - leaving unconfigured
[21:33] * fusl (fusl@1.0.0.127.in-addr.arpa.li) has joined #ceph
[21:33] * krypto (~krypto@49.207.63.172) Quit (Ping timeout: 480 seconds)
[21:36] <fusl> is there anyone who knows how to solve an "mds.$HOSTNAME unable to obtain rotating service keys; retrying" error? a couple of standby mds nodes have been kicked out of my small cluster and can't "rejoin" it anymore - restarting the daemons or the entire node does not help either, neither does another "ceph-deploy mds create $HOSTNAME" on the admin node
[21:36] <fusl> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
[21:38] * Mattress (~slowriot@26XAAB3J7.tor-irc.dnsbl.oftc.net) Quit ()
[21:40] * jermudgeon (~jhaustin@wpc-pe-l2.whitestone.link) Quit (Read error: Network is unreachable)
[21:42] * jermudgeon (~jhaustin@wpc-pe-l2.whitestone.link) has joined #ceph
[21:43] <xinli> vasu: I cleaned up the system and tried install --stable ceph-osd0, but it's reporting a missing file: Err:13 https://download.ceph.com/debian-jewel xenial/main amd64 ceph amd64 10.2.3-1xenial
[21:43] <xinli> [ceph-osd0][DEBUG ] 404 Not Found, any idea to debug?
[21:43] <blizzow> xinli: you may want to run apt-get purge ceph-common and then apt-get -f install to clean up your dependency problems.
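A sketch of the cleanup sequence being suggested here, assuming the broken node is ceph-osd1 and the admin node can reach it; exact package names may vary slightly between releases:

    # on ceph-osd1, as root
    dpkg --configure -a                        # see which packages are half-configured
    apt-get -f install                         # let apt try to fix the dependency state
    apt-get purge ceph ceph-common ceph-base   # remove the broken packages

    # from the admin node
    ceph-deploy purge ceph-osd1
    ceph-deploy purgedata ceph-osd1
    ceph-deploy install --stable jewel ceph-osd1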
[21:47] <mnaser> xinli, looks like you caught weird timing, 10.2.3 is just being released right now
[21:47] * derjohn_mob (~aj@b2b-94-79-172-98.unitymedia.biz) Quit (Ping timeout: 480 seconds)
[21:47] <mnaser> https://download.ceph.com/debian-jewel/pool/main/c/ceph/
[21:47] <xinli> mnaser: what should I do now, purgedata and purge again, and rerun?
[21:47] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) Quit (Ping timeout: 480 seconds)
[21:48] <mnaser> apt-get update
[21:48] <mnaser> and run it again
[21:48] <xinli> mnaser: ok
[21:48] <xinli> mnaser: even though I used install --stable?
[21:49] <mnaser> i don't know about those details.. i don't use ceph-deploy, sorry
[21:49] <mnaser> all i know is 10.2.3 just came out and that's what caused that to happen for you
[21:51] * jermudgeon (~jhaustin@wpc-pe-l2.whitestone.link) Quit (Read error: Connection reset by peer)
[21:52] * jermudgeon (~jhaustin@wpc-pe-l2.whitestone.link) has joined #ceph
[21:54] * valeech (~valeech@pool-96-247-203-33.clppva.fios.verizon.net) has joined #ceph
[21:58] * dyasny (~dyasny@cable-192.222.152.136.electronicbox.net) has joined #ceph
[21:59] * sxc731 (~sxc731@xdsl-188-154-67-50.adslplus.ch) has joined #ceph
[21:59] * diver (~diver@95.85.8.93) has joined #ceph
[22:01] <xinli> mnaser: yes, now the first node installed correctly: [ceph-osd0][INFO ] Running command: sudo ceph --version
[22:01] <xinli> [ceph-osd0][DEBUG ] ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
[22:01] * shorton (~oftc-webi@50.58.7.244) has joined #ceph
[22:01] <mnaser> you might just be the first 10.2.3 install xinli :)
[22:01] * icey_ is now known as icey
[22:01] <shorton> Hello, I'm new here, running Ceph Jewel on Ubuntu with Openstack Mitaka. Using Ceph to back Nova ephemeral, Cinder volumes, Glance images, and Manila File Share. All have been working well.
[22:01] <xinli> blizzow: my second node1 still failed
[22:01] <shorton> 2 days ago, I moved Manila File Share from using ceph-fuse driver to kernel driver for better performance with fileIO to the share. Now, I am getting, "HEALTH_WARN mds0: Many clients (12) failing to respond to cache pressure" in the logs.
[22:02] * doppelgrau (~doppelgra@dslb-088-072-094-200.088.072.pools.vodafone-ip.de) Quit (Quit: doppelgrau)
[22:02] <shorton> Can anyone advise me?
[22:03] * sphinxx_ (~sphinxx@41.217.204.74) has joined #ceph
[22:04] <xinli> blizzow: the error message is: [ceph-osd1][DEBUG ] Setting system user ceph properties..usermod: user ceph is currently used by process 4811
[22:04] <xinli> [ceph-osd1][DEBUG ] dpkg: error processing package ceph-common (--configure):
[22:04] * sxc731 (~sxc731@xdsl-188-154-67-50.adslplus.ch) Quit (Remote host closed the connection)
[22:04] <xinli> blizzow: my first node is the admin node, it will ask for sudo, but the second node needs ssh; is there any configuration issue?
[22:05] <gregsfortytwo> shorton: you'll want to be using an exceedingly recent kernel for CephFS clients, both for general bug fixes and for that particular issue
[22:05] <shorton> gregsfortytwo: I am on Ubuntu 16.04.1. Is that new enough?
[22:06] <shorton> gregsfortytwo: Linux arccloud01 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:42:33 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[22:06] <gregsfortytwo> as long as your MDS isn't running out of memory you can just bump up its "mds cache size" config (and maybe restart? I forget) and things may be okay
[22:06] <gregsfortytwo> shorton: I'm not sure, I don't track it closely
[22:07] <shorton> gregsfortytwo: ok, this is the latest release Ubuntu. I'm pretty new with ceph. Can you tell me how to tell if MDS is out of memory? ceph -s says all healthy
[22:07] <gregsfortytwo> I just meant the box, use top or whatever ;)
[22:07] <neurodrone> Has anyone seen an error like this before while bootstrapping MONs?
[22:07] <shorton> gregsfortytwo: and also where to change the cache size for MDS?
[22:07] <neurodrone> 2016-09-21 19:58:04.704101 7f6a835d9700 2 -- 10.114.128.25:6789/0 >> 0.0.0.0:0/1 pipe(0x7f6a8e482000 sd=8 :0 s=1 pgs=0 cs=0 l=0 c=0x7f6a8e2f2880).connect error 0.0.0.0:0/1, (111) Connection refused
[22:08] <gregsfortytwo> ceph.conf
[22:08] <blizzow> xinli: kill all running ceph processes on the node then fix your apt packages with the purge and install commands.
[22:08] <neurodrone> What does 0.0.0.0:0/1 even mean?
[22:08] * sphinxx (~sphinxx@154.118.120.40) Quit (Ping timeout: 480 seconds)
[22:09] <shorton> gregsfortytwo: ok, box is a monster with 14 TB disk, 256GB RAM, 30-something cpus.
[22:09] <neurodrone> 10.114.128.25 is where I have my primary mon01 running. My public network has 10.114.0.0/16.
[22:09] <neurodrone> Using 10.2.2.
[22:09] <gregsfortytwo> shorton: so the default config is 100,000 (inodes), which is not a lot (~2KB of memory per dentry+inode) to keep in cache if you have a large system or a lot of clients with active files
[22:10] <gregsfortytwo> bump that way, way up and you may see things do better
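A sketch of that change, assuming a single active mds (rank 0) and using 1000000 inodes purely as an example value; per the caveat above, setting it too large is also harmful:

    # /etc/ceph/ceph.conf on the mds host, then restart the mds
    [mds]
    mds cache size = 1000000

    # or inject at runtime while watching the mds memory footprint
    ceph tell mds.0 injectargs '--mds-cache-size 1000000'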
[22:11] * derjohn_mob (~aj@x590ca8e2.dyn.telefonica.de) has joined #ceph
[22:14] <shorton> gregsfortytwo: great, thanks so much! Based on my machine size, are there other ceph.conf tweaks that I should make to increase performance? non-defaults that I have now: mon_pg_warn_max_per_osd=400, mon_lease=50, mon_lease_renew_interval=30, mon_lease_ack_timeout=100
[22:14] <gregsfortytwo> not off the top of my head; the cache size is definitely the big one that's bad for most users (but there isn't really a good number and if it's set too large it's really bad)
[22:16] <shorton> gregsfortytwo: ok, thanks again for the speedy help!
[22:22] * debian1121 (~bcolbert@2600:1005:b014:291e:863a:4bff:fe9b:d1ba) Quit (Ping timeout: 480 seconds)
[22:23] * georgem (~Adium@206.108.127.16) Quit (Quit: Leaving.)
[22:23] <xinli> blizzow: after I ran purge on ceph-osd1, all the messed-up packages are gone: ceph@ceph-osd1:~$ sudo dpkg --configure -a gives no output
[22:24] <xinli> blizzow: can I now rerun ceph-deploy install --stable jewel ceph-osd1?
[22:24] <blizzow> xinli: make sure ceph is gone with dpkg --get-selections|grep ceph
[22:25] <xinli> blizzow: no output, and /etc/ceph also gone.
[22:27] * bniver (~bniver@71-9-144-29.static.oxfr.ma.charter.com) Quit (Remote host closed the connection)
[22:27] * shorton (~oftc-webi@50.58.7.244) Quit (Quit: Page closed)
[22:29] * davidzlap (~Adium@2605:e000:1313:8003:f890:a356:798d:2d6d) has joined #ceph
[22:30] * diver (~diver@95.85.8.93) Quit ()
[22:30] * mykola (~Mikolaj@91.245.73.87) Quit (Quit: away)
[22:34] <blizzow> Then I'd reboot the node and try running the ceph-deploy installation.
[22:35] * BrianA (~BrianA@fw-rw.shutterfly.com) has left #ceph
[22:35] * Jeffrey4l_ (~Jeffrey@110.244.236.101) has joined #ceph
[22:37] * d_shizzzzle (~d_shizzzz@128.104.173.229) has joined #ceph
[22:37] * d_shizzzzle (~d_shizzzz@128.104.173.229) Quit ()
[22:37] * debian112 (~bcolbert@c-73-184-103-26.hsd1.ga.comcast.net) has joined #ceph
[22:37] * georgem (~Adium@24.114.67.56) has joined #ceph
[22:38] * georgem (~Adium@24.114.67.56) Quit ()
[22:38] * georgem (~Adium@206.108.127.16) has joined #ceph
[22:39] * Jeffrey4l (~Jeffrey@119.251.252.244) Quit (Ping timeout: 480 seconds)
[22:41] * goberle (~goberle@jot.ygg.tf) Quit (Remote host closed the connection)
[22:41] * pdrakeweb (~pdrakeweb@pool-98-118-150-184.bflony.fios.verizon.net) Quit (Remote host closed the connection)
[22:44] * goberle (~goberle@jot.ygg.tf) has joined #ceph
[22:51] * goberle (~goberle@jot.ygg.tf) Quit (Remote host closed the connection)
[22:52] * goberle (~goberle@jot.ygg.tf) has joined #ceph
[22:52] <neurodrone> Is ceph-deploy mon create-initial used by anyone for deploying initial sets of mons?
[22:57] * Mosibi_ (~Mosibi@dld.unixguru.nl) Quit (Quit: reboot)
[22:58] * vbellur (~vijay@nat-pool-bos-t.redhat.com) Quit (Quit: Leaving.)
[22:59] * Mosibi (~Mosibi@dld.unixguru.nl) has joined #ceph
[23:00] * cathode (~cathode@50-232-215-114-static.hfc.comcastbusiness.net) has joined #ceph
[23:01] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) has joined #ceph
[23:03] * sphinxx (~sphinxx@154.118.120.40) has joined #ceph
[23:04] * sphinxx__ (~sphinxx@41.217.204.74) has joined #ceph
[23:08] * branto (~branto@ip-78-102-208-181.net.upcbroadband.cz) Quit (Quit: Leaving.)
[23:09] * dgurtner (~dgurtner@84-73-130-19.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[23:10] * sphinxx_ (~sphinxx@41.217.204.74) Quit (Ping timeout: 480 seconds)
[23:11] * sphinxx (~sphinxx@154.118.120.40) Quit (Ping timeout: 480 seconds)
[23:13] * loicd (~loicd@211.ip-167-114-243.eu) has left #ceph
[23:15] <xinli> blizzow: after cleaning up and rebooting the node, ceph-deploy still failed. I pasted the error at this link: http://paste.openstack.org/show/582467/
[23:20] * xarses (~xarses@64.124.158.3) Quit (Ping timeout: 480 seconds)
[23:28] * georgem (~Adium@206.108.127.16) has left #ceph
[23:35] * doppelgrau (~doppelgra@132.252.235.172) has joined #ceph
[23:35] * newbie (~kvirc@host217-114-156-249.pppoe.mark-itt.net) Quit (Ping timeout: 480 seconds)
[23:39] * wak-work (~wak-work@2620:15c:2c5:3:8cc1:c036:72d9:db09) Quit (Ping timeout: 480 seconds)
[23:46] <xinli> blizzow: I found the issue; the root cause is that I used ceph as the deploy user, which is wrong
[23:46] * mattbenjamin (~mbenjamin@12.118.3.106) Quit (Quit: Leaving.)
[23:47] * wak-work (~wak-work@2620:15c:2c5:3:a4c5:19d8:d655:a7ea) has joined #ceph
[23:50] * effractur (~Erik@hlm000.nl.z4p.nl) Quit (Quit: Reconnecting)
[23:50] * effractu1 (~Erik@hlm000.nl.z4p.nl) has joined #ceph
[23:55] * haplo37 (~haplo37@199.91.185.156) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.