#ceph IRC Log

IRC Log for 2015-12-16

Timestamps are in GMT/BST.

[0:00] * derjohn_mobi (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[0:02] * reed_ (~reed@2607:f298:a:607:8de3:3d27:1b21:9029) has joined #ceph
[0:02] * reed (~reed@2607:f298:a:607:8de3:3d27:1b21:9029) Quit (Read error: Connection reset by peer)
[0:02] * brad_mssw (~brad@66.129.88.50) Quit (Quit: Leaving)
[0:03] * fred`` (fred@2001:6f8:10c0:0:2010:abec:24d:2500) Quit (Server closed connection)
[0:03] * codice (~toodles@75-128-34-237.static.mtpk.ca.charter.com) has joined #ceph
[0:05] * fred`` (fred@2001:6f8:10c0:0:2010:abec:24d:2500) has joined #ceph
[0:08] * kraken (~kraken@8.43.84.3) has joined #ceph
[0:08] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[0:16] * Zombiekiller (~tZ@7V7AAB0JV.tor-irc.dnsbl.oftc.net) Quit ()
[0:18] * leseb- is now known as leseb_away
[0:18] * debian112 (~bcolbert@24.126.201.64) Quit (Quit: Leaving.)
[0:22] * jamespage (~jamespage@2a00:1098:0:80:1000:42:0:1) Quit (Server closed connection)
[0:22] * jamespage (~jamespage@2a00:1098:0:80:1000:42:0:1) has joined #ceph
[0:24] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[0:42] * Steki (~steki@87.116.182.245) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:43] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[0:47] * dyasny (~dyasny@dsl.198.58.159.95.ebox.ca) Quit (Ping timeout: 480 seconds)
[0:48] * dgurtner (~dgurtner@84.203.232.226) has joined #ceph
[0:49] * m0zes (~mozes@beocat.cis.ksu.edu) Quit (Ping timeout: 480 seconds)
[0:51] * dyasny (~dyasny@dsl.198.58.171.185.ebox.ca) has joined #ceph
[0:52] * theengineer (~theengine@45-31-177-36.lightspeed.austtx.sbcglobal.net) Quit (Quit: This computer has gone to sleep)
[0:54] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) Quit (Server closed connection)
[0:54] * bearkitten (~bearkitte@cpe-76-172-86-115.socal.res.rr.com) has joined #ceph
[0:55] * daviddcc (~dcasier@84.197.151.77.rev.sfr.net) Quit (Ping timeout: 480 seconds)
[0:56] * cathode (~cathode@50-198-166-81-static.hfc.comcastbusiness.net) has joined #ceph
[0:59] * steveeJ (~junky@HSI-KBW-149-172-252-139.hsi13.kabel-badenwuerttemberg.de) Quit (Ping timeout: 480 seconds)
[1:01] * lcurtis (~lcurtis@47.19.105.250) Quit (Remote host closed the connection)
[1:01] * dyasny (~dyasny@dsl.198.58.171.185.ebox.ca) Quit (Ping timeout: 480 seconds)
[1:03] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[1:05] * fsimonce (~simon@host28-31-dynamic.30-79-r.retail.telecomitalia.it) Quit (Quit: Coyote finally caught me)
[1:06] * theengineer (~theengine@45-31-177-36.lightspeed.austtx.sbcglobal.net) has joined #ceph
[1:07] * mancdaz (~mancdaz@2a00:1a48:7806:117:be76:4eff:fe08:7623) Quit (Server closed connection)
[1:07] * mancdaz (~mancdaz@2a00:1a48:7806:117:be76:4eff:fe08:7623) has joined #ceph
[1:08] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[1:09] * johnavp1989 (~jpetrini@pool-100-14-5-21.phlapa.fios.verizon.net) has joined #ceph
[1:11] * moore (~moore@64.202.160.88) Quit (Remote host closed the connection)
[1:12] * moore (~moore@64.202.160.88) has joined #ceph
[1:13] * garphy is now known as garphy`aw
[1:14] * yanzheng (~zhyan@125.71.106.102) has joined #ceph
[1:15] * vata (~vata@cable-21.246.173-197.electronicbox.net) has joined #ceph
[1:20] * moore (~moore@64.202.160.88) Quit (Ping timeout: 480 seconds)
[1:21] * kraken (~kraken@8.43.84.3) Quit (Remote host closed the connection)
[1:21] * kraken (~kraken@8.43.84.3) has joined #ceph
[1:25] * EinstCrazy (~EinstCraz@117.15.122.189) Quit (Remote host closed the connection)
[1:27] * dgurtner (~dgurtner@84.203.232.226) Quit (Ping timeout: 480 seconds)
[1:31] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[1:31] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[1:31] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[1:32] * yanzheng (~zhyan@125.71.106.102) Quit (Quit: This computer has gone to sleep)
[1:33] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[1:34] * cathode (~cathode@50-198-166-81-static.hfc.comcastbusiness.net) Quit (Quit: Leaving)
[1:37] * elder (~elder@c-24-245-18-91.hsd1.mn.comcast.net) Quit (Server closed connection)
[1:37] * elder (~elder@c-24-245-18-91.hsd1.mn.comcast.net) has joined #ceph
[1:37] * ChanServ sets mode +o elder
[1:39] * ircolle (~Adium@c-71-229-136-109.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:40] <KaneK> I've got plenty of log lines: slow request, currently waiting for active
[1:40] <KaneK> what does it mean?
[1:42] * oms101 (~oms101@p20030057EA02C700C6D987FFFE4339A1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[1:44] <doppelgrau> KaneK: do you have PGs that are not active
[1:44] * MACscr (~Adium@2601:247:4101:a0be:75f5:10f0:3de2:d6e3) has joined #ceph
[1:45] <KaneK> yes, something like this: pg 1.3e is undersized+degraded+remapped+peered, acting [2] ?
[1:46] <doppelgrau> KaneK: so you have undersized PGs => IO is blocked until the problem is solved
[1:46] <KaneK> doppelgrau: should ceph health report HEALTH_ERR in this case?
[1:47] <doppelgrau> not sure if error or warning
[1:47] <KaneK> it shows warning: ceph health detail
[1:47] <KaneK> HEALTH_WARN 5 pgs backfill; 1 pgs backfilling; 82 pgs degraded; 1 pgs recovering; 3 pgs recovery_wait; 40 pgs stuck degraded; 12 pgs stuck inactive; 106 pgs stuck unclean; 39 pgs stuck undersized; 65 pgs undersized;
[1:47] <KaneK> so the whole cluster is blocked or only those PGs?
[1:47] <doppelgrau> can you paste in a pastebin "ceph -s" and "ceph health detail"
[1:49] <KaneK> doppelgrau: ceph -s: http://pastebin.com/sDzuyyHF
[1:50] * oms101 (~oms101@p20030057EA020300C6D987FFFE4339A1.dip0.t-ipconnect.de) has joined #ceph
[1:51] <doppelgrau> can you take a look at the pg-details why the peering takes so long?
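For reference, the usual way to dig into stuck PGs is the standard ceph CLI; a minimal sketch (the PG id 1.3e is taken from KaneK's paste above):
    ceph health detail | grep stuck      # list the stuck PGs and their states
    ceph pg dump_stuck inactive          # or: unclean / undersized / degraded
    ceph pg 1.3e query                   # full peering/recovery state of one PG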
[1:53] * m0zes (~mozes@beocat.cis.ksu.edu) has joined #ceph
[1:54] * xarses (~xarses@64.124.158.100) Quit (Ping timeout: 480 seconds)
[1:55] * babilen (~babilen@babilen.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:56] * vata1 (~vata@cable-21.246.173-197.electronicbox.net) has joined #ceph
[1:57] <KaneK> I took down 3 osds in my staging at once (number of replicas was 3), I added them back, there is a bit of data to move around
[1:57] <KaneK> will it fully recover after some time?
[1:57] <KaneK> 3 out of 6 total
[1:58] <lurbs> You only have 4 of the 6 currently up.
[1:58] * EinstCrazy (~EinstCraz@111.30.21.47) has joined #ceph
[1:59] <KaneK> yeah, I brought 2 remaining after ceph -s
[1:59] <doppelgrau> KaneK: with only 4 out of the 6 up & in you might need manual intervention
[1:59] <doppelgrau> KaneK: with 6 up&in it should recover
[2:00] <KaneK> doppelgrau: if I had 4 - why would manual intervention be needed? the cluster should have at least 1 replica, right?
[2:02] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[2:02] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[2:03] <doppelgrau> KaneK: since the newest object version might be on one of the lost OSDs => you have to tell ceph that you accept the risk of losing some information
[2:03] <KaneK> got it
[2:04] <KaneK> so that means that some objects might not be fully replicated on the remaining osd?
[2:05] <doppelgrau> depends how the cluster fails
[2:06] <doppelgrau> e.g. size=3, min_size=2: OSD-A goes offline, min_size still satisfied => writes to OSD-B and OSD-C
[2:06] <doppelgrau> then OSD-B/C fails and OSD-A comes back online
[2:06] <doppelgrau> one copy, but not the most recent one
[2:07] <KaneK> aha makes sense.
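The size/min_size values in doppelgrau's example are per-pool settings; a minimal sketch, assuming a pool named rbd:
    ceph osd pool get rbd size       # number of replicas kept (3 in the example)
    ceph osd pool get rbd min_size   # replicas required before writes are accepted (2)
    ceph osd pool set rbd min_size 2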
[2:08] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[2:08] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[2:09] * xarses (~xarses@50.141.33.74) has joined #ceph
[2:10] * leseb_away is now known as leseb-
[2:10] * leseb- is now known as leseb_away
[2:11] <KaneK> doppelgrau: how ceph keeps track if osd is in sync and has latest data?
[2:12] <doppelgrau> KaneK: not sure, but I think it's linked to the pgmap version somehow
[2:12] * wushudoin (~wushudoin@2601:646:8201:7769:2ab2:bdff:fe0b:a6ee) Quit (Ping timeout: 480 seconds)
[2:13] <KaneK> doppelgrau: then in the case you described above: when you bring OSD-A back, ceph will know that OSD-A has an outdated pgmap version?
[2:14] <KaneK> i.e. if OSD-B/C fail, and then OSD-B comes up, ceph will be able to continue, because it knows that OSD-B has latest data?
[2:14] <doppelgrau> KaneK: the pgmap is maintained by the monitors, so they can keep track
[2:14] <KaneK> or in general if you lose all replicas, then one replica comes up - ceph will need manual repair?
[2:17] * rendar (~I@host1-143-dynamic.59-82-r.retail.telecomitalia.it) Quit (Quit: std::lower_bound + std::less_equal *works* with a vector without duplicates!)
[2:17] <doppelgrau> KaneK: not sure in general. I had the case where lots of disks had failed during a move (of the hardware) into a new datacenter, and for some PGs I had to manually mark the failed OSDs lost. But no idea if other PGs had recovered without assistance, didn't track the mapping
[2:18] <KaneK> thanks
[2:19] <doppelgrau> but after I found the reason the manual intervention was small
[2:20] <doppelgrau> (and losing about 10% of your disks at the same time distributed around all failure domains is really bad luck ^^)
[2:23] * dyasny (~dyasny@dsl.198.58.171.185.ebox.ca) has joined #ceph
[2:24] * xarses (~xarses@50.141.33.74) Quit (Ping timeout: 480 seconds)
[2:24] * KaneK (~kane@cpe-172-88-240-14.socal.res.rr.com) Quit (Quit: KaneK)
[2:24] <davidz1> Kanek: If an osd comes up with the only replica of a PG and was "in" the last time there were modifications then recovery is automatic. Repair is an admin action when there are inconsistencies between replicas.
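Telling ceph to accept the possible loss, as discussed above, is an explicit admin action; a hedged sketch (osd.3 and pg 1.3e are placeholder ids):
    ceph osd lost 3 --yes-i-really-mean-it     # give up on a dead OSD's data
    ceph pg 1.3e mark_unfound_lost revert      # if unfound objects remain in a PG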
[2:31] * smf68 (~n0x1d@46.166.188.193) has joined #ceph
[2:32] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[2:32] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[2:33] * angdraug (~angdraug@c-69-181-140-42.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[2:35] * smerz (~ircircirc@37.74.194.90) Quit (Server closed connection)
[2:35] * smerz (~ircircirc@37.74.194.90) has joined #ceph
[2:44] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[2:44] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[2:47] * yanzheng (~zhyan@222.209.142.107) has joined #ceph
[2:49] * MACscr (~Adium@2601:247:4101:a0be:75f5:10f0:3de2:d6e3) Quit (Ping timeout: 480 seconds)
[2:50] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[2:50] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[3:01] * smf68 (~n0x1d@4MJAAAGU8.tor-irc.dnsbl.oftc.net) Quit ()
[3:03] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[3:03] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[3:05] * xarses (~xarses@50.141.33.30) has joined #ceph
[3:07] * doppelgrau_ (~doppelgra@p5DC0731D.dip0.t-ipconnect.de) has joined #ceph
[3:07] * zhaochao (~zhaochao@60.206.230.66) has joined #ceph
[3:11] * doppelgrau (~doppelgra@p5DC07526.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:11] * doppelgrau_ is now known as doppelgrau
[3:11] * yanzheng (~zhyan@222.209.142.107) Quit (Quit: This computer has gone to sleep)
[3:21] * dyasny (~dyasny@dsl.198.58.171.185.ebox.ca) Quit (Ping timeout: 480 seconds)
[3:21] * dyasny (~dyasny@dsl.198.58.171.185.ebox.ca) has joined #ceph
[3:22] * reed_ (~reed@2607:f298:a:607:8de3:3d27:1b21:9029) Quit (Quit: Ex-Chat)
[3:28] * aarontc (~aarontc@2001:470:e893::1:1) Quit (Server closed connection)
[3:28] * aarontc (~aarontc@2001:470:e893::1:1) has joined #ceph
[3:34] * theengineer (~theengine@45-31-177-36.lightspeed.austtx.sbcglobal.net) Quit (Quit: This computer has gone to sleep)
[3:34] * naoto (~naotok@2401:bd00:b001:8920:27:131:11:254) has joined #ceph
[3:37] * LeaChim (~LeaChim@host86-185-146-193.range86-185.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:45] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[3:45] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[4:00] * leseb_away is now known as leseb-
[4:00] * leseb- is now known as leseb_away
[4:04] * basicxman (~kiasyn@46.166.188.193) has joined #ceph
[4:05] * naoto_ (~naotok@27.131.11.254) has joined #ceph
[4:08] * evilrob (~evilrob@2600:3c00::f03c:91ff:fedf:1d3d) Quit (Ping timeout: 480 seconds)
[4:09] * dyasny (~dyasny@dsl.198.58.171.185.ebox.ca) Quit (Ping timeout: 480 seconds)
[4:10] * naoto (~naotok@2401:bd00:b001:8920:27:131:11:254) Quit (Ping timeout: 480 seconds)
[4:12] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Server closed connection)
[4:12] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[4:13] * dyasny (~dyasny@dsl.dynamic.191-116-74-190.electronicbox.net) has joined #ceph
[4:25] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[4:30] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[4:30] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[4:34] * basicxman (~kiasyn@4MJAAAGXJ.tor-irc.dnsbl.oftc.net) Quit ()
[4:37] * telnoratti (~telnoratt@2a00:1768:1001:37:101:b00b:5:101) Quit (Ping timeout: 480 seconds)
[4:43] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[4:48] * xarses_ (~xarses@50.141.33.30) has joined #ceph
[4:49] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[4:49] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[4:49] * xarses_ (~xarses@50.141.33.30) Quit (Read error: Connection reset by peer)
[4:49] * xarses_ (~xarses@50.141.33.30) has joined #ceph
[4:50] * bj0rnar (~Bjornar@109.247.131.38) Quit (Server closed connection)
[4:50] * telnoratti (~telnoratt@2a00:1768:1001:37:101:b00b:5:101) has joined #ceph
[4:50] * bj0rnar (~Bjornar@109.247.131.38) has joined #ceph
[4:55] * xarses (~xarses@50.141.33.30) Quit (Ping timeout: 480 seconds)
[4:56] * x00350071 (~x00350071@119.145.15.121) has joined #ceph
[4:56] * Mosibi (~Mosibi@77.37.12.119) Quit (Server closed connection)
[4:56] * Mosibi (~Mosibi@77.37.12.119) has joined #ceph
[4:57] * vbellur (~vijay@121.244.87.124) has joined #ceph
[4:59] * bene_in_mtg (~bene@2601:18c:8300:f3ae:ea2a:eaff:fe08:3c7a) Quit (Quit: Konversation terminated!)
[5:00] * dyasny (~dyasny@dsl.dynamic.191-116-74-190.electronicbox.net) Quit (Ping timeout: 480 seconds)
[5:03] * yanzheng (~zhyan@125.71.106.102) has joined #ceph
[5:05] * MACscr (~Adium@2601:247:4101:a0be:88a8:92cd:80d8:7f59) has joined #ceph
[5:07] * theengineer (~theengine@45-31-177-36.lightspeed.austtx.sbcglobal.net) has joined #ceph
[5:13] * hgichon (~hgichon@112.220.91.130) has joined #ceph
[5:14] * yanzheng (~zhyan@125.71.106.102) Quit (Quit: This computer has gone to sleep)
[5:14] * lobstar (~Rosenblut@tor2r.ins.tor.net.eu.org) has joined #ceph
[5:17] * johnavp1989 (~jpetrini@pool-100-14-5-21.phlapa.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[5:21] * Vacuum__ (~Vacuum@88.130.200.164) has joined #ceph
[5:22] * wyang (~wyang@server213-171-196-75.live-servers.net) Quit (Remote host closed the connection)
[5:28] * Vacuum_ (~Vacuum@i59F79723.versanet.de) Quit (Ping timeout: 480 seconds)
[5:30] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[5:30] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Remote host closed the connection)
[5:32] * yanzheng (~zhyan@125.71.106.102) has joined #ceph
[5:38] * Concubidated (~Adium@pool-98-119-93-148.lsanca.fios.verizon.net) has joined #ceph
[5:40] * yanzheng (~zhyan@125.71.106.102) Quit (Quit: This computer has gone to sleep)
[5:44] * lobstar (~Rosenblut@7V7AAB0UI.tor-irc.dnsbl.oftc.net) Quit ()
[5:47] * yanzheng (~zhyan@125.71.106.102) has joined #ceph
[5:51] * kefu (~kefu@114.92.107.250) has joined #ceph
[5:53] * yanzheng (~zhyan@125.71.106.102) Quit (Quit: This computer has gone to sleep)
[6:00] * jamespage (~jamespage@2a00:1098:0:80:1000:42:0:1) Quit (Quit: Coyote finally caught me)
[6:00] * jamespage (~jamespage@2a00:1098:0:80:1000:42:0:1) has joined #ceph
[6:01] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[6:01] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[6:17] * theengineer (~theengine@45-31-177-36.lightspeed.austtx.sbcglobal.net) Quit (Quit: This computer has gone to sleep)
[6:20] * rdas (~rdas@121.244.87.116) has joined #ceph
[6:25] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[6:25] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[6:25] * KaneK (~kane@cpe-172-88-240-14.socal.res.rr.com) has joined #ceph
[6:31] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[6:31] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[6:39] * swami1 (~swami@106.216.181.17) has joined #ceph
[6:41] * leseb_away is now known as leseb-
[6:41] * leseb- is now known as leseb_away
[6:44] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[6:44] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[6:45] * yanzheng (~zhyan@125.71.106.102) has joined #ceph
[6:45] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) Quit (Remote host closed the connection)
[6:46] * vata1 (~vata@cable-21.246.173-197.electronicbox.net) Quit (Quit: Leaving.)
[7:01] * Mika_c (~Mika@122.146.93.152) has joined #ceph
[7:06] * nils_ (~nils_@doomstreet.collins.kg) Quit (Quit: This computer has gone to sleep)
[7:08] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[7:08] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[7:09] * overclk (~vshankar@121.244.87.117) has joined #ceph
[7:17] * kefu (~kefu@114.92.107.250) Quit (Max SendQ exceeded)
[7:18] * kefu (~kefu@211.22.145.245) has joined #ceph
[7:28] * wtracz2 (~williamtr@host109-155-156-10.range109-155.btcentralplus.com) has joined #ceph
[7:28] * wtracz2 (~williamtr@host109-155-156-10.range109-155.btcentralplus.com) has left #ceph
[7:28] * Cybertinus (~Cybertinu@cybertinus.customer.cloud.nl) Quit (Read error: Connection reset by peer)
[7:29] * Cybertinus (~Cybertinu@cybertinus.customer.cloud.nl) has joined #ceph
[7:31] * DV (~veillard@2001:41d0:1:d478::1) Quit (Remote host closed the connection)
[7:38] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[7:40] * Infected_ (infected@peon.lantrek.fi) Quit (Server closed connection)
[7:40] * Infected (infected@peon.lantrek.fi) has joined #ceph
[7:42] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[7:55] * sleinen (~Adium@2001:620:0:2d:7ed1:c3ff:fedc:3223) has joined #ceph
[7:56] * skoude (~skoude@193.142.1.54) has joined #ceph
[7:57] <Be-El> hi
[7:57] <skoude> noob question. If I have a 480GB SSD and I create 3 journals on it (/dev/sdc1, /dev/sdc2, /dev/sdc3), how can I use the rest of the free space on the /dev/sdc disk as an osd with the ceph-deploy command?
[7:58] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[7:58] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[7:58] <skoude> i tried to create /dev/sdc4(as journal) and /dev/sdc5 (as data) and then execute a command: ceph-deploy osd create d0c-c4-7a-1f-04-6c:sdc5:sdc4
[7:59] * cooldharma06 (~chatzilla@14.139.180.40) has joined #ceph
[7:59] <skoude> but it does not work.. Partitions have "type other" when I check with ceph-deploy disk list
[8:04] * Concubidated (~Adium@pool-98-119-93-148.lsanca.fios.verizon.net) Quit (Quit: Leaving.)
[8:05] <Be-El> ceph-deploy uses ceph-disk on the target host for managing osds, and ceph-disk does allow using existing partitions
[8:06] <Be-El> (i'm not sure whether ceph-deploy helps users or whether it keeps them from actually understanding how ceph works...)
[8:07] <Be-El> skoude: the output should have more information about the steps performed by ceph-deploy and ceph-disk. you also might want to check with ceph-disk on the target system itself
[8:08] <Be-El> skoude: there are some corner cases in which ceph-disk does not recognize partitions and creates a new partition table inside the partition, but working with standard sdX devices should be fine
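A sketch of what skoude is attempting, run on the OSD host itself (this is what ceph-deploy's HOST:DATA:JOURNAL form drives under the hood; device names are taken from the question, exact behaviour depends on the ceph-disk version):
    ceph-disk prepare /dev/sdc5 /dev/sdc4   # data partition first, then journal partition
    ceph-disk activate /dev/sdc5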
[8:08] * zaitcev (~zaitcev@c-50-130-189-82.hsd1.nm.comcast.net) Quit (Quit: Bye)
[8:10] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[8:13] * KaneK (~kane@cpe-172-88-240-14.socal.res.rr.com) Quit (Quit: KaneK)
[8:15] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) has joined #ceph
[8:20] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) has joined #ceph
[8:23] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[8:23] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[8:23] * overclk (~vshankar@121.244.87.117) Quit (Quit: BitchX-1.2.1 -- just do it.)
[8:24] * overclk (~vshankar@121.244.87.117) has joined #ceph
[8:25] * Zethrok (~martin@95.154.26.34) has joined #ceph
[8:25] * Snowcat4 (~LorenXo@176.123.6.153) has joined #ceph
[8:25] * shohn (~shohn@dslb-188-102-031-075.188.102.pools.vodafone-ip.de) has joined #ceph
[8:27] * joshd (~jdurgin@206.169.83.146) Quit (Ping timeout: 480 seconds)
[8:27] <skoude> Be-El: thanks, I will check with ceph-disk
[8:33] * bara (~bara@ip4-83-240-10-82.cust.nbox.cz) has joined #ceph
[8:34] * x00350071 (~x00350071@119.145.15.121) Quit (Read error: Connection reset by peer)
[8:34] * x00350071 (~x00350071@119.145.15.121) has joined #ceph
[8:37] * bvi (~bastiaan@185.56.32.1) has joined #ceph
[8:37] * andreww (~xarses@50.141.33.30) has joined #ceph
[8:38] * dmick (~dmick@206.169.83.146) has joined #ceph
[8:40] * joshd (~jdurgin@206.169.83.146) has joined #ceph
[8:41] <skoude> Be-El: strange, I don't have disk-list executable at all..
[8:41] <skoude> Be-El: sorry I meant ceph-disk executable :)
[8:41] <Be-El> skoude: the command is 'ceph-disk list'
[8:42] <Be-El> well, that's going to explain a lot of problems ;-)
[8:43] <skoude> Be-El: yeap, but I don't have it.. I have SLES12 installed with this server, and I'm using the Suse's own packages for this.. the version is: ceph version 0.80.9
[8:44] <Be-El> 0.80 is firefly afaik. and firefly should have a ceph-disk binary. maybe it's in another package for sles?
[8:44] * xarses_ (~xarses@50.141.33.30) Quit (Ping timeout: 480 seconds)
[8:44] <skoude> Be-El: but I can create osds like: ceph-deploy osd create d0c-c4-7a-1f-06-2a:sdl
[8:44] * DV__ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[8:44] <skoude> Be-El: and with that command everything works, and I get the journal and osd on the same disk
[8:45] <skoude> Be-El: I will check that
[8:45] * dmick (~dmick@206.169.83.146) has left #ceph
[8:45] <Be-El> skoude: are you sure you are checking in the right host? ceph-disk should be available on d0c-c4-7a-1f-06-2a, it does not need to be installed on the admin host
[8:47] <skoude> Be-El: yeap, the d0c-c4-7a-1f-06-2a is the same as admin host
[8:47] <skoude> Be-El: and I have all the ceph packages installed on SLES12 that suse offers.. I think I have to ask from Suse why ceph-disk is not included
[8:49] <Be-El> skoude: ceph-deploy should use ceph-disk, so it has to be there. maybe it is not available in PATH
[8:50] <skoude> Be-El: yeap, i found it with find.. For some reason it's not in the path.. now I included it so it will work :)
[8:51] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[8:55] * Snowcat4 (~LorenXo@6YRAABP5J.tor-irc.dnsbl.oftc.net) Quit ()
[8:55] * DV__ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[8:56] * jasuarez (~jasuarez@237.Red-83-39-111.dynamicIP.rima-tde.net) has joined #ceph
[8:58] <skoude> also I have one fundamental question.. I have 4 hosts and each of them has 2x 120GB ssd for the operating system, 6 x 480GB SSD and 6 x 4TB sata disks. I want to use ssd as a journal for these 4TB drives. So is it best to create two journals for each ssd, of which one is used for the ssd and one is used for a sata drive? So basically each ssd will have its own journal and also a journal for one sata drive? I will also want to split these drives into two pools: ceph_
[8:59] <skoude> or is it just best to dedicate one ssd for the journals for 4TB sata drives?
[9:01] * Kupo2 (~tyler.wil@23.111.254.159) has joined #ceph
[9:03] <Be-El> i would distribute the OSD journals between the SSDs. In case of an ssd failure you will lose only half of the OSDs
[9:04] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[9:04] <Be-El> if you want to put OSDs on the remaining space of the SSDs, you have to add some extra effort for placing the OSD at the right location within the crush tree
[9:05] <Be-El> for a mixed setup you usually create a second root element and a different crush rule for pools based on the second root
[9:05] * Kupo1 (~tyler.wil@23.111.254.159) Quit (Ping timeout: 480 seconds)
[9:05] * proc_ (~proc@213.180.65.2) has joined #ceph
[9:06] <Be-El> see http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
[9:06] <proc_> Hi, is it possible to mount a ceph block device in os x?
[9:06] <skoude> Be-El: So you mean, you would create the osd's for one ssd only? and then use the rest of the ssd space as osd?
[9:07] <Be-El> skoude: i would put 3 journals on one ssd, and the other three journals on the other ssd
[9:07] <skoude> Be-El: I'm just trying to achieve that I can use the rest of the space left in ssd as osd, but haven't found a way yet :)
[9:07] <Be-El> skoude: and if the ssd can handle the extra load the remaining space can become an OSD of its own, yes
[9:08] <skoude> Be-El: And I also checked that Sebastiens hans blog, and I'm making the crushmaps and pools like in sebastiens example
[9:10] <skoude> Be-El: The only problem is that I don't know how I can use the rest of the space as OSD.. Have been reading docs, but still haven't been able to figure that out..
[9:10] <skoude> Be-El: And I was also thinking the 2SSD option, so I will use that one :)
[9:11] <Be-El> skoude: use ceph-disk on the host itself
[9:11] <skoude> Be-El: but do I need to create the partition before it can use it, or will it create it automatically?
[9:12] <Be-El> you should create it before if other partitions exist. afaik ceph-disk wants to create the first and the second partition and fails if they already exist
[9:13] <Be-El> if there are active journal partitions on the ssd, creating new partitions may also require a run of partprobe to allow the kernel to update its partition tables
[9:13] <Be-El> that's why i prefer to do it manually in this case
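A minimal sketch of the manual route Be-El prefers (partition number 5 is a placeholder for the remaining free space; the GPT type codes needed for udev auto-activation are left out):
    sgdisk --new=5:0:0 --change-name=5:'ceph data' /dev/sdc   # use the rest of the SSD
    partprobe /dev/sdc                                        # make the kernel re-read the partition table
    ceph-disk prepare /dev/sdc5 /dev/sdc4                     # data partition, journal partition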
[9:16] * b0e (~aledermue@213.95.25.82) has joined #ceph
[9:17] * pabluk_ is now known as pabluk
[9:17] * leseb_away is now known as leseb-
[9:17] * leseb- is now known as leseb_away
[9:18] * bara (~bara@ip4-83-240-10-82.cust.nbox.cz) Quit (Ping timeout: 480 seconds)
[9:21] * linjan (~linjan@86.62.112.22) has joined #ceph
[9:22] <skoude> Be-El: Thanks, I will try to figure this out... somehow.. :)
[9:26] * LeaChim (~LeaChim@host86-185-146-193.range86-185.btcentralplus.com) has joined #ceph
[9:26] * leseb_away is now known as leseb-
[9:26] * leseb- is now known as leseb_away
[9:29] * overclk (~vshankar@121.244.87.117) Quit (Ping timeout: 480 seconds)
[9:30] * DV__ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[9:30] * fsimonce (~simon@host28-31-dynamic.30-79-r.retail.telecomitalia.it) has joined #ceph
[9:32] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[9:36] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit ()
[9:36] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[9:37] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[9:37] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[9:37] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[9:43] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) has joined #ceph
[9:43] * dugravot6 (~dugravot6@dn-infra-04.lionnois.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[9:44] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) has joined #ceph
[9:45] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[9:47] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit ()
[9:48] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[9:48] * bruc (~bruno.car@178.237.98.13) has joined #ceph
[9:48] <bruc> Hello
[9:49] <bruc> I have trouble getting my radosgw to work
[9:49] <bruc> it tells me
[9:49] <bruc> /etc/init.d/radosgw start /usr/bin/radosgw is not running.
[9:49] <bruc> which is obvious
[9:50] * leseb_away is now known as leseb-
[9:50] * leseb- is now known as leseb_away
[9:50] <bruc> and I try to have some details :
[9:50] <bruc> /usr/bin/radosgw -d --debug-rgw 20 --debug-ms 1 start
[9:50] <bruc> no monitors specified to connect to.
[9:50] <bruc> 2015-12-16 09:16:03.953260 7f7fd39337c0 -1 did not load config file, using default settings.
[9:50] <bruc> 2015-12-16 09:16:03.953570 7f7fd39337c0 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process radosgw, pid 5592
[9:50] <bruc> 2015-12-16 09:16:03.956241 7f7fd39337c0 -1 Couldn't init storage provider (RADOS)
[9:50] <bruc> does anyone got an idea ?
[9:52] * egonzalez (~egonzalez@238.238.14.62.static.jazztel.es) has joined #ceph
[9:53] * steveeJ (~junky@HSI-KBW-149-172-252-139.hsi13.kabel-badenwuerttemberg.de) has joined #ceph
[9:54] <Be-El> bruc: "no monitors specified to connect to.", "did not load config file, using default settings."
[9:55] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[9:55] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[9:56] * nardial (~ls@dslb-088-072-085-164.088.072.pools.vodafone-ip.de) has joined #ceph
[9:57] <bruc> I'm following the instructions from Ceph, there is no such thing specified
[9:57] <bruc> I don't know how to use this information :)
[9:57] * analbeard (~shw@support.memset.com) has joined #ceph
[9:57] * leseb_away is now known as leseb-
[9:58] <Be-El> you do not have a ceph configuration file
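radosgw reads /etc/ceph/ceph.conf and only picks up the [client.radosgw.gateway] section when started under that name; a hedged sketch (the trailing 'start' argument from the init script is not needed when running the daemon directly):
    /usr/bin/radosgw -c /etc/ceph/ceph.conf -n client.radosgw.gateway -d --debug-rgw 20 --debug-ms 1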
[9:59] * jclm (~jclm@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[10:00] * RMar04 (~RMar04@support.memset.com) has joined #ceph
[10:01] * jclm1 (~jclm@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[10:02] * T1w (~jens@node3.survey-it.dk) has joined #ceph
[10:03] * rabeeh (~rabeeh@IGLD-84-228-195-199.inter.net.il) Quit (Server closed connection)
[10:03] * rabeeh (~rabeeh@IGLD-84-228-195-199.inter.net.il) has joined #ceph
[10:04] <boolman> I have a slow recovery of my mds servers, 13 seconds from when I stop the active one until the hot-standby becomes active.. any tip?
[10:04] * garphy`aw is now known as garphy
[10:05] * overclk (~vshankar@121.244.87.124) has joined #ceph
[10:05] * Daniel (~Daniel@2a00:1ee0:3:1337:91ba:136c:f45a:13ce) has joined #ceph
[10:06] * Daniel is now known as Guest1520
[10:06] * derjohn_mob (~aj@p578b6aa1.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[10:07] * TMM (~hp@178-84-46-106.dynamic.upc.nl) Quit (Quit: Ex-Chat)
[10:07] * jclm (~jclm@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Ping timeout: 480 seconds)
[10:08] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[10:08] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[10:10] * nardial (~ls@dslb-088-072-085-164.088.072.pools.vodafone-ip.de) Quit (Quit: Leaving)
[10:12] * enax (~enax@hq.ezit.hu) has joined #ceph
[10:14] * babilen (~babilen@babilen.user.oftc.net) has joined #ceph
[10:14] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[10:14] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[10:14] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[10:15] * DV__ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[10:16] * lincolnb (~lincoln@c-71-57-68-189.hsd1.il.comcast.net) Quit (Remote host closed the connection)
[10:17] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) Quit (Ping timeout: 480 seconds)
[10:19] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[10:19] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) has joined #ceph
[10:23] * DV__ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[10:25] * AGaW (~PuyoDead@46.101.196.46) has joined #ceph
[10:27] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[10:27] * hgichon (~hgichon@112.220.91.130) Quit (Ping timeout: 480 seconds)
[10:27] * rendar (~I@95.239.179.247) has joined #ceph
[10:30] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[10:30] <bruc> Be-El: do want to see the conf file?
[10:31] <bruc> you*
[10:32] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) Quit (Ping timeout: 480 seconds)
[10:33] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[10:33] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[10:33] <bruc> [global]
[10:33] <bruc> fsid = faa83a09-3f21-41db-b8a7-6f281ad64c00
[10:33] <bruc> mon_initial_members = ceph-01, ceph-02, ceph-03, ceph-04
[10:33] <bruc> mon_host = X.X.112.199,X.X.112.224,X.X.112.225,X.X.112.56
[10:33] <bruc> auth_cluster_required = cephx
[10:33] <bruc> auth_service_required = cephx
[10:33] <bruc> auth_client_required = cephx
[10:33] <bruc> filestore_xattr_use_omap = true
[10:33] <bruc> [client.radosgw.gateway]
[10:33] <bruc> host = ceph-01
[10:33] <bruc> keyring = /etc/ceph/clustobj.client.radosgw.keyring
[10:33] <bruc> rgw socket path = /tmp/radosgw.sock
[10:33] <bruc> log file = /var/log/radosgw/client.radosgw.gateway.log
[10:33] <bruc> rgw frontends = fastcgi socket_port=9000 socket_host=X.X.112.199
[10:33] <bruc> rgw print continue = false
[10:33] <bruc> [client.admin]
[10:33] <bruc> keyring = /etc/ceph/clustobj.client.admin.keyring
[10:37] <bruc> my conf file works fine, as I am able to use ceph and rados commands
[10:37] <bruc> I just can't launch radosgw
[10:38] * oniane (~oniane@etno.u-strasbg.fr) Quit (Server closed connection)
[10:38] * oniane (~oniane@etno.u-strasbg.fr) has joined #ceph
[10:38] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[10:40] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[10:40] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[10:40] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[10:40] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[10:42] * lincolnb (~lincoln@c-71-57-68-189.hsd1.il.comcast.net) has joined #ceph
[10:45] * DV (~veillard@2001:41d0:1:d478::1) has joined #ceph
[10:46] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[10:46] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[10:47] * proc_ (~proc@213.180.65.2) Quit (Ping timeout: 480 seconds)
[10:47] * TMM (~hp@sams-office-nat.tomtomgroup.com) has joined #ceph
[10:49] * bara_ (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[10:50] * derjohn_mob (~aj@fw.gkh-setu.de) has joined #ceph
[10:50] * bara_ (~bara@nat-pool-brq-t.redhat.com) Quit (Read error: Connection reset by peer)
[10:50] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Read error: Connection reset by peer)
[10:51] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[10:51] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[10:51] * daviddcc (~dcasier@80.12.63.157) has joined #ceph
[10:52] * DV__ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[10:52] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[10:55] * AGaW (~PuyoDead@7V7AAB04L.tor-irc.dnsbl.oftc.net) Quit ()
[10:56] <bruc> /usr/bin/radosgw -d --debug-rgw 20 --debug-ms 1 start
[10:56] <bruc> no monitors specified to connect to.
[10:56] <bruc> 2015-12-16 09:16:03.953260 7f7fd39337c0 -1 did not load config file, using default settings.
[10:56] <bruc> 2015-12-16 09:16:03.953570 7f7fd39337c0 0 ceph version 0.80.10 (ea6c958c38df1216bf95c927f143d8b13c4a9e70), process radosgw, pid 5592
[10:56] <bruc> 2015-12-16 09:16:03.956241 7f7fd39337c0 -1 Couldn't init storage provider (RADOS)
[10:56] <bruc> does anyone see why exactly I can't launch radosgw?
[10:57] <bruc> I did everything Ceph recommends and my conf file seems to be good
[10:59] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[10:59] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[11:00] * atg (~atg@10.127.254.xxx) Quit (Quit: I bet that was the wrong command...)
[11:03] * DV (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[11:04] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[11:04] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[11:07] * skorgu (skorgu@pylon.skorgu.net) Quit (Ping timeout: 480 seconds)
[11:08] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Ping timeout: 480 seconds)
[11:10] * daviddcc (~dcasier@80.12.63.157) Quit (Ping timeout: 480 seconds)
[11:10] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[11:10] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[11:10] * proc_ (~proc@213.180.65.2) has joined #ceph
[11:12] * skorgu (skorgu@pylon.skorgu.net) has joined #ceph
[11:12] <boolman> I have a slow recovery of my mds servers, 13-20 seconds from when I stop the active one until the hot-standby becomes active.. during this time all my clients get stale. Looks like the rejoin is taking forever. I have 2 vCPU 8GB ram per mds
[11:13] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[11:13] * New_to_Ceph (~oftc-webi@84.241.250.11) has joined #ceph
[11:13] * colde1 (~Sketchfil@176.123.6.154) has joined #ceph
[11:14] <New_to_Ceph> Hi All I am new to ceph, and I want to start testing it, can I test it with 2 nodes only?
[11:17] <doppelgrau> New_to_Ceph: yes, but you have to change some settings; the defaults only work with >2 nodes
[11:18] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[11:18] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[11:18] <New_to_Ceph> What I need to change? I have 2 nodes each is 4x1TB spinning disks, and 4x1TB SSD
[11:19] * proc_ (~proc@213.180.65.2) Quit (Quit: leaving)
[11:19] * overclk (~vshankar@121.244.87.124) Quit (Ping timeout: 480 seconds)
[11:19] * ngoswami (~ngoswami@121.244.87.116) has joined #ceph
[11:20] * naoto_ (~naotok@27.131.11.254) Quit (Quit: Leaving...)
[11:21] <IcePic> you could test it with one machine and one disk of small size, but it would not be a very good test. Your two boxes will work and show you most of what ceph is
[11:21] <doppelgrau> New_to_Ceph: easiest change is to change "size" to two and min_size to one
[11:22] <doppelgrau> New_to_Ceph: or edit the crush-rules to choose osd instead of host as failure domain
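A sketch of the second option, a rule that replicates across OSDs instead of hosts (rule name and ruleset id are placeholders; this goes into the decompiled crush map, which is then re-injected):
    rule replicated_osd {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type osd   # osd instead of host as failure domain
        step emit
    }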
[11:22] <New_to_Ceph> and what will be the final size of both nodes? I will create 2 volumes per node, one of 3TB spinning disks, and the other of 3TB SSD
[11:23] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[11:23] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[11:26] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[11:26] <mistur> Hello
[11:27] <mistur> I'd like to know if s3ql is working with ceph
[11:28] <mistur> and if it is stable
[11:31] <skoude> is there any good ceph operations guide available that tells you what you need to do in case of, for example, osd and journal failures?
[11:35] <doppelgrau> skoude: basically lost journal = lost osd => replace/recreate OSD (remove from ceph, create new)
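A hedged sketch of the replace/recreate cycle doppelgrau describes (osd.12 and the device names are placeholders):
    ceph osd out 12                        # drain it, then stop the ceph-osd daemon on the host
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    ceph-disk prepare /dev/sdl /dev/sdc1   # recreate with a fresh data disk and journal partition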
[11:35] * Mika_c (~Mika@122.146.93.152) Quit (Ping timeout: 480 seconds)
[11:36] <New_to_Ceph> What will be the final size of both nodes? I will create 2 volumes per node, one of 3TB spinning disks, and the other of 3TB SSD
[11:36] <IcePic> New_to_Ceph: it's not like raid, you don't immediately lose space to checksums and so on; you decide how many replicas of data or which erasure-coding strategy to use, then as you fill it with data it eats more or less depending on the level of replication you have selected.
[11:38] <New_to_Ceph> Ohh clear now :-) Thank you a lot, for testing purposes I will decrease the replicas to the minimum, especially since I have HW raid 5
[11:39] <IcePic> you should give ceph the raw disks, not something-on-top-of-raid
[11:40] <IcePic> it wont hurt if it happens to be r5 or so, but it wont be optimal either
[11:40] <IcePic> "it wont hurt your testing if.."
[11:40] <IcePic> corrected.
[11:43] * colde1 (~Sketchfil@176.123.6.154) Quit ()
[11:43] <New_to_Ceph> so I do not need to create any logical volumes? Just use the raw disks as-is and install ceph, correct? Will all the raw disks be used for storage or will there be some used for control and management?
[11:48] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[11:48] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[11:52] <IcePic> some, but you make pools inside the total storage(s) later on, which can have different levels of copies and so on. Or one that covers all.
[11:54] <New_to_Ceph> Thank you so much for your clarification.
[11:55] * ade (~abradshaw@hicloud-mgmt.urz.uni-heidelberg.de) has joined #ceph
[12:00] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[12:00] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[12:04] <bruc> nobody knows how to get radosgw to work?
[12:05] * badone (~badone@66.187.239.16) Quit (Remote host closed the connection)
[12:06] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[12:06] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[12:06] <bruc> it doesn't want to start, even though I can use rados commands
[12:10] * badone (~badone@66.187.239.16) has joined #ceph
[12:10] * zhaochao (~zhaochao@60.206.230.66) Quit (Ping timeout: 480 seconds)
[12:11] <skoude> doppelgrau: thanks..
[12:14] <skoude> question about the crushmap. I have 4 nodes. Two of them are in dc1 and two of them are in dc2. On dc1 serv1 and serv2 should mirror each other. On dc2 serv3 and serv4 should mirror the data. So basically I want the ceph cluster to withstand the failure of one dc and one host. So basically ceph would still have all the data stored even if the second dc goes down and a node goes down in the other dc..
[12:15] <skoude> so is that possible?
[12:15] <bruc> is radosgw supposed to work?
[12:18] * Venturi (~Venturi@194.249.247.164) Quit (Remote host closed the connection)
[12:18] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Ping timeout: 480 seconds)
[12:19] * kanagaraj (~kanagaraj@121.244.87.117) has joined #ceph
[12:20] <doppelgrau> skoude: yes, with 4 copies, two in each dc
[12:22] <skoude> doppelgrau: what if I want to add more nodes later on? For example to grow from two nodes to three nodes per dc? Is that a problem, and can I hot add those nodes later on?
[12:23] <doppelgrau> skoude: no problem if you use a correct location and crush rules
[12:23] <skoude> doppelgrau: this is where I would definitely need some help :D
[12:24] * EinstCrazy (~EinstCraz@111.30.21.47) Quit (Remote host closed the connection)
[12:24] <T1w> take a proper look at the CRUSH map
[12:24] <T1w> and how the different bucket types work
[12:25] <T1w> basicly you need to define dc1 and dc2
[12:25] <T1w> and make sure the right nodes are placed below the right dcs
[12:27] * sleinen1 (~Adium@2001:620:0:82::103) has joined #ceph
[12:27] <skoude> T1w: thanks. so basically the correct type is type 8.. so basically I just add bucket like: datacenter: dc1 {.. item serv1 weight 2 item serv2 weight 2 } ??
[12:28] <T1w> as long as you do not have much data in ceph you can move nodes around without having to inject a new map - you can simply move your nodes to new parents with a few easy commands
[12:28] <skoude> T1w: and then just add the other datacenter at the same way
[12:28] <T1w> type (
[12:28] <T1w> type 8 even?
[12:28] <T1w> don't use the integers
[12:28] <T1w> use the names
[12:28] * RMar041 (~RMar04@5.153.255.226) has joined #ceph
[12:28] <T1w> either use the existing ones or add new proper ones (i.e. ones that make sense to you)
[12:28] <skoude> so basically: datacenter dc1 {}
[12:29] <skoude> ans datacenter dc2{}
[12:29] <T1w> I added a new "serverroom 2" room with this
[12:29] <T1w> sudo ceph osd crush add-bucket sr2 room
[12:29] <T1w> and then I moved sr2 to below "root" with
[12:29] <T1w> sudo ceph osd crush move sr2 root=default
[12:30] <T1w> I then added a rack - row 2, rack 2 with
[12:30] <T1w> rack b even
[12:30] <T1w> sudo ceph osd crush add-bucket sr2-b-02 rack
[12:30] <T1w> oh, sorry..
[12:30] <skoude> aah, okay, so I can do it with commands also.. But there is also the second thing.. I have mixed ssd and sata drives, so i need to define a ceph_fast and ceph_slow cluster
[12:30] <T1w> rack 02 in row B
[12:30] <T1w> and then I moved that rack below the sr2 room with
[12:30] <skoude> so it adds a little bit of complexity to the config
[12:30] <T1w> sudo ceph osd crush move sr2-b-02 root=default room=sr2
[12:31] * krogon (~krogon@irdmzpr02-ext.ir.intel.com) has joined #ceph
[12:31] <T1w> lastly I moved a single node (here "ceph1") below the sr2-b-02 rack in room sr2 with
[12:31] <T1w> sudo ceph osd crush move ceph1 rack=sr2-b-02 root=default
[12:31] <T1w> lastly I checked that everything was as I expected with
[12:31] <T1w> sudo ceph osd crush tree
[12:32] * sleinen (~Adium@2001:620:0:2d:7ed1:c3ff:fedc:3223) Quit (Read error: Connection reset by peer)
[12:32] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[12:32] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[12:32] <T1w> or you could extract an editable (ascii) CRUSH map, edit it and inject it again
[12:33] <T1w> but the other commands are IMO just as easy as long as the number of nodes/OSDs or other items are not that large
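The extract/edit/inject cycle T1w mentions looks roughly like this (file names are placeholders):
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt    # decompile to editable text
    # edit crush.txt: add datacenter buckets, move hosts, adjust rules
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new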
[12:34] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[12:34] <bruc> does anyone know anything about radosgw configuration?
[12:34] * RMar041 (~RMar04@5.153.255.226) Quit (Quit: Leaving.)
[12:34] * RMar04 (~RMar04@support.memset.com) Quit (Ping timeout: 480 seconds)
[12:34] * rakeshgm (~rakesh@121.244.87.124) has joined #ceph
[12:34] <skoude> T1w: thanks, I really appreciate this.. will test now..
[12:34] <T1w> sorry, I'm not using radosgw
[12:35] <T1w> skoude: np
[12:35] <skoude> T1w: can I just use datacenter and ignore room and rack? So that the server is just only in the datacenter?
[12:35] <T1w> skoude: sure
[12:36] <T1w> skoude: but you might need them at a later point where you've got lots of data in ceph
[12:36] <T1w> adding them at that point could cause a lot of data to shuffle around
[12:36] <skoude> T1w: okay thanks
[12:36] <T1w> atm I've got everything in that single rack sr2-b-02
[12:37] <T1w> but perhaps at some point in the future I might add some new nodes in sr2-b-01 or in sr2-d-04
[12:38] <T1w> just by having the individual rack specified right now I can much much easier add other racks at a later time
[12:38] <T1w> or other server rooms
[12:38] <T1w> or perhaps datacenters
[12:38] <T1w> (ok, I haven't got any DCs defined, but I probably should)
[12:39] * RMar04 (~RMar04@5.153.255.226) has joined #ceph
[12:43] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) has joined #ceph
[12:43] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Quit: Leaving.)
[12:43] * jclm1 (~jclm@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Quit: Leaving.)
[12:46] * enax (~enax@hq.ezit.hu) Quit (Remote host closed the connection)
[12:47] * kanagaraj (~kanagaraj@121.244.87.117) Quit (Quit: Leaving)
[12:48] * DV__ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[12:50] * enax (~enax@hq.ezit.hu) has joined #ceph
[12:52] * RMar04 (~RMar04@5.153.255.226) Quit (Quit: Leaving.)
[12:53] <skoude> T1w: now I have the buckets defined and the servers are in the correct place... Now I just need to separate the storage into ceph_fast and ceph_slow.. So basically if I understood correctly I just need to create new buckets (hosts, racks) and move the osds to those new hosts.. Am I correct?
[12:54] <skoude> I think it's easier to do it by just editing the crushmap. or?
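Either way works; a sketch of doing the ssd/sata split with the same CLI commands T1w used above (bucket and osd names, and the weight, are placeholders):
    ceph osd crush add-bucket ceph_fast root
    ceph osd crush add-bucket ceph1-ssd host
    ceph osd crush move ceph1-ssd root=ceph_fast
    ceph osd crush set osd.0 0.43 root=ceph_fast host=ceph1-ssd   # re-place an SSD OSD under the new root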
[12:54] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[12:55] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[12:56] * ogzy (~oguzy@212.174.232.222) Quit (Remote host closed the connection)
[12:56] * tacticus (~tacticus@v6.kca.id.au) Quit (Ping timeout: 480 seconds)
[12:57] * sankarshan (~sankarsha@121.244.87.124) has joined #ceph
[12:57] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[12:58] * RMar04 (~RMar04@5.153.255.226) has joined #ceph
[12:59] * dux0r (~shishi@nl2x.mullvad.net) has joined #ceph
[13:00] * nils_ (~nils_@doomstreet.collins.kg) has joined #ceph
[13:02] * RMar041 (~RMar04@support.memset.com) has joined #ceph
[13:06] * RMar04 (~RMar04@5.153.255.226) Quit (Ping timeout: 480 seconds)
[13:09] * RMar041 (~RMar04@support.memset.com) Quit (Quit: Leaving.)
[13:10] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[13:10] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[13:11] * overclk (~vshankar@121.244.87.117) has joined #ceph
[13:12] <skoude> can one room be inside multiple roots?
[13:12] * overclk (~vshankar@121.244.87.117) Quit ()
[13:14] <Be-El> no, labels have to be unique
[13:17] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Ping timeout: 480 seconds)
[13:20] * tacticus (~tacticus@v6.kca.id.au) has joined #ceph
[13:20] * kefu (~kefu@211.22.145.245) Quit (Ping timeout: 480 seconds)
[13:21] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) has joined #ceph
[13:21] * rdas (~rdas@121.244.87.116) Quit (Quit: Leaving)
[13:22] * EinstCrazy (~EinstCraz@117.15.122.189) has joined #ceph
[13:22] * kefu (~kefu@114.92.107.250) has joined #ceph
[13:23] * Fr33K (~freeky@80.123.93.206) Quit (Ping timeout: 480 seconds)
[13:25] <skoude> Be-El: thanks, I noticed that now..
[13:25] <skoude> Can somebody explain to me the meaning of weight?
[13:29] * dux0r (~shishi@4MJAAAHAJ.tor-irc.dnsbl.oftc.net) Quit ()
[13:29] <skoude> basically if I have a 480GB SSD drive will I give it weight = 0.430, and for 4TB drives I would give something like 3.640?
[13:30] <skoude> what if I give 480GB drive weight = 1.00, what will happen?
[13:31] <bjozet> weight will distribute i/o requests
[13:32] <bjozet> i don't think it's recommended to mix such different devices in same pool
[13:32] <skoude> I have different pools now :)=
[13:32] <bjozet> higher weight = more requests
[13:32] * swami1 (~swami@106.216.181.17) Quit (Ping timeout: 480 seconds)
[13:32] <skoude> So basically I will set the weight to be 1.
[13:32] <skoude> Because now I have ceph_fast and ceph_slow pools.
[13:33] <skoude> and in ceph_fast all of the disks are ssd, so they will have the same weight = 1?
[13:33] <skoude> and same in ceph_slow
[13:33] <bjozet> if they're the same size and specs, yes
[13:33] <bjozet> then that would ensure even distribution
[13:33] <peeejayz> Anyone done Hammer to infernalis upgrade? Testing on dev cluster and having some problems bringing up OSDs
[13:34] <T1w> read this for an explanation of the 2 different weights
[13:34] <T1w> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040961.html
[13:34] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[13:34] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[13:35] <T1w> there is a CRUSH OSD weight that should reflect the OSD capacity (it COULD reflect on ssd/sata capacity) and is most often based on the amount of data the OSD can store
[13:36] <T1w> and then there is the OSD weight that tells CRUSH how many requests CRUSH will send to it
[13:37] <Be-El> eh...no
[13:38] * Fr33K (~freeky@80.123.93.206) has joined #ceph
[13:38] <Be-El> both weights define the amount of data to be put on the OSD. they do not interfere with IO or requests.
[13:38] <Be-El> the OSD crush weight is the relative capacity of a OSD (usually reflected by being based on its size)
[13:39] <Be-El> the OSD weight (without crush) is just a corrective weight since data distribution is not perfectly even
[13:39] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[13:39] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[13:39] <Be-El> if you use 'ceph osd reweight-by-utilization' which will change the OSD weight to allow for a more even distribution of data without changing the crush weight itsel
[13:39] <Be-El> +f
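For reference, the two weights are adjusted with different commands; a sketch (osd.12 and the values are placeholders):
    ceph osd crush reweight osd.12 3.64   # crush weight, usually the capacity in TiB
    ceph osd reweight 12 0.9              # corrective override weight between 0 and 1
    ceph osd reweight-by-utilization      # let ceph adjust the override weights itself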
[13:40] <skoude> So basically, I have it like this now.. don't worry there are no racks and datacenters yet, i will add them later: http://pastebin.com/mqQEyXWk
[13:40] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) Quit (Read error: No route to host)
[13:40] <Be-El> skoude: you don't need racks,datacenters or whatever if they are not part of your failure domain
[13:41] <Be-El> skoude: our cluster only has root(s), hosts and OSDs, too
[13:41] <skoude> but now when I have ceph_fast and ceph_slow, how do I define that all of the data on the hosts is replicated on all of the hosts in ceph_fast? So that if three of the hosts go down, it would still work?
[13:42] <skoude> so basically 3 of 4 hosts could go down, and still the data would be secured and accessible..
[13:43] <Be-El> create a pool with the correct crush ruleset and a size of 4 and a min_size of 1
[13:43] <skoude> rule ssd { ruleset 0 type replicated min_size 1 max_size 10 step take ceph_fast step chooseleaf firstn 0 type host step emit
[13:43] <skoude> }
[13:44] <skoude> So I just change the max_size to 4 and that should do it?
[13:44] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) has joined #ceph
[13:44] <Be-El> the rule is independent of the size. for certain rules you need to ensure that a certain min or max size is used
[13:45] <Be-El> the size is set for the individual pool, since each pool can have a different size
[13:45] <Be-El> size == number of replicates
[13:45] <skoude> aah, okay it's explained here : http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
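Putting Be-El's advice together, the pool side would look roughly like this (pool name and pg count are placeholders; 'ssd' is the rule pasted above):
    ceph osd pool create ceph_fast 512 512 replicated ssd
    ceph osd pool set ceph_fast size 4
    ceph osd pool set ceph_fast min_size 1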
[13:47] * DV__ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[13:49] * analbeard (~shw@support.memset.com) Quit (Ping timeout: 480 seconds)
[13:50] <Be-El> a size of 4 is quite high. you will end up with only 25% usable capacity since every bit of data is stored in 4 replicas
[13:50] * DV__ (~veillard@2001:41d0:1:d478::1) has joined #ceph
[13:51] <Be-El> that's paranoia mode++
[13:51] * T1w (~jens@node3.survey-it.dk) Quit (Ping timeout: 480 seconds)
[13:51] <Be-El> also keep in mind that every write operation needs to be completed four times before it is acknowledged to the client
[13:55] <skoude> Be-El: Basically we want to survive the complete loss of one dc and one server in the other dc, so that's the reason why we are doing it. :)
[13:56] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Ping timeout: 480 seconds)
[13:56] <Be-El> skoude: sounds reasonable. but do not expect it to be amazingly fast ;-)
[13:58] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[13:58] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[13:58] <skoude> Be-El: the replication could also be so that ceph1_ssd and ceph3_ssd mirror each other, then it would tolerate the complete loss of one dc and the data would still be in the second dc. Maybe that would be a better solution, then we would just need to write the same data only two times?
[13:59] * kanagaraj (~kanagaraj@27.7.8.160) has joined #ceph
[14:00] * nils_ (~nils_@doomstreet.collins.kg) Quit (Quit: This computer has gone to sleep)
[14:00] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) has joined #ceph
[14:00] <Be-El> skoude: i would keep 4 separate hosts, since it allows you to change a single host in case of a failure. mirroring data between hosts sounds a little bit too adventurous to me. just let ceph do the distribution and run some benchmarks
[14:01] * analbeard (~shw@5.153.255.226) has joined #ceph
[14:03] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Ping timeout: 480 seconds)
[14:05] * sankarshan (~sankarsha@121.244.87.124) Quit (Quit: Are you sure you want to quit this channel (Cancel/Ok) ?)
[14:05] <Be-El> skoude: if you want to ensure that copies are stored on different datacenters, i would propose to write a new crush rule that is only applicable for a size of 4. it distributes 2 copies in datacenter A using host level for distribution on different hosts, and two copies in datacenter B
[14:06] <Be-El> skoude: if your cluster grows e.g. to 3 or 4 hosts in the individual datacenters, you don't need to change the rules
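A sketch of the rule Be-El proposes, assuming datacenter buckets exist under the ssd root (rule name and ruleset id are placeholders):
    rule ssd_two_dc {
        ruleset 2
        type replicated
        min_size 4
        max_size 4
        step take ceph_fast
        step choose firstn 2 type datacenter    # pick both datacenters
        step chooseleaf firstn 2 type host      # two different hosts in each
        step emit
    }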
[14:06] <skoude> hmm. but what if I define ceph_fast1 and ceph_fast2. ceph_fast1 would have hosts ceph1_ssd and ceph3_ssd. And ceph_fast2 would have hosts: ceph2_ssd and ceph4_ssd. Then i would have min_size=1 on the ruleset. This way the same data would be on two dc's and it would still work if another server goes down
[14:06] <skoude> am I correct?
[14:07] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[14:09] <Be-El> skoude: i would use only two roots (ssd + sata) and use intermediate branches like datacenter level and a crush ruleset as described above
[14:09] <Be-El> skoude: but with two datacenters you might have another problem
[14:09] <Be-El> skoude: how many monitor systems do you want to use?
[14:10] <skoude> Be-El: do you mean network monitoring like nagios etc?
[14:10] <Be-El> no, ceph monitors
[14:12] <skoude> Be-El: I have 4 mons
[14:12] <skoude> Be-El: ceph mon stat
[14:12] <skoude> e1: 4 mons at {d0c-c4-7a-1f-00-1e=172.20.36.12:6789/0,d0c-c4-7a-1f-04-6c=172.20.36.10:6789/0,d0c-c4-7a-1f-06-2a=172.20.36.11:6789/0,d0c-c4-7a-1f-0a-44=172.20.36.13:6789/0}, election epoch 4, quorum 0,1,2,3 d0c-c4-7a-1f-04-6c,d0c-c4-7a-1f-06-2a,d0c-c4-7a-1f-00-1e,d0c-c4-7a-1f-0a-44
[14:12] <Be-El> skoude: that's bad
[14:12] <Be-El> skoude: i assume two mons are located in each datacenter
[14:13] <skoude> Be-El: basically one on each server
[14:13] <Be-El> skoude: if the connection between the datacenters fails, you will have a netsplit with two mons on each side
[14:13] <Be-El> skoude: but you need quorum to have an operational cluster
[14:14] <Be-El> skoude: so you either should use 3 or 5 mons. and the additional mon should not be placed in one of the datacenters in the optimal case
[14:14] <skoude> Be-El: that's a good point :)
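For reference, monitor quorum needs a strict majority, floor(N/2)+1: with 4 mons that is 3, so a 2+2 split across two datacenters leaves neither side with quorum; with 5 mons placed 2+2+1, the side that can still reach the fifth (tie-breaker) mon keeps a 3-of-5 majority.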
[14:14] <peeejayz> When upgrading to infernalis, why are /usr/lib/udev/rules.d/95-ceph-osd.rules not automatically copied so they run?
[14:14] * Dragonshadow (~DougalJac@ns316491.ip-37-187-129.eu) has joined #ceph
[14:15] <skoude> Be-El: I can basically install one monitor more in the middle of the dc's
[14:15] <Be-El> skoude: i haven't used multi-datacenter setups yet, but there's been some discussion on the mailing list about it
[14:16] <skoude> Be-El: or if I just remove one of the monitors on one node?
[14:17] * skorgu (skorgu@pylon.skorgu.net) Quit (Quit: leaving)
[14:17] * skorgu (skorgu@pylon.skorgu.net) has joined #ceph
[14:17] <Be-El> skoude: in that case a link failure would allow one datacenter to be operational
[14:18] <Be-El> skoude: but it may be the datacenter with the failed uplink, so accessibility is not granted
[14:18] <skoude> Be-El: We have multiple routes and connections between the datacenters and so there should be no connection losses between the sites.
[14:19] * skorgu (skorgu@pylon.skorgu.net) Quit ()
[14:19] * skorgu (skorgu@pylon.skorgu.net) has joined #ceph
[14:23] * Mined is now known as MinedAWAY
[14:24] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[14:24] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[14:24] * daviddcc (~dcasier@LAubervilliers-656-1-16-160.w217-128.abo.wanadoo.fr) has joined #ceph
[14:26] * bara (~bara@213.175.37.12) has joined #ceph
[14:26] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[14:26] * jclm (~jclm@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[14:28] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Remote host closed the connection)
[14:29] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[14:31] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Remote host closed the connection)
[14:33] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[14:34] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit ()
[14:35] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) has joined #ceph
[14:36] * dugravot6 (~dugravot6@nat-persul-plg.wifi.univ-lorraine.fr) Quit ()
[14:37] * vbellur (~vijay@121.244.87.124) Quit (Ping timeout: 480 seconds)
[14:37] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) Quit (Remote host closed the connection)
[14:37] * nardial (~ls@dslb-088-072-085-164.088.072.pools.vodafone-ip.de) has joined #ceph
[14:37] * nhm (~nhm@c-50-171-139-246.hsd1.mn.comcast.net) has joined #ceph
[14:37] * ChanServ sets mode +o nhm
[14:38] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) Quit (Ping timeout: 480 seconds)
[14:44] * jasuarez (~jasuarez@237.Red-83-39-111.dynamicIP.rima-tde.net) Quit (Ping timeout: 480 seconds)
[14:44] * Dragonshadow (~DougalJac@7V7AAB1DH.tor-irc.dnsbl.oftc.net) Quit ()
[14:45] * theengineer (~theengine@45-31-177-36.lightspeed.austtx.sbcglobal.net) has joined #ceph
[14:47] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[14:47] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[14:47] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) Quit ()
[14:51] * kanagaraj (~kanagaraj@27.7.8.160) Quit (Quit: Leaving)
[14:52] * rotbeard (~redbeard@185.32.80.238) has joined #ceph
[14:53] * jdillaman (~jdillaman@pool-108-18-97-82.washdc.fios.verizon.net) has joined #ceph
[14:54] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[14:54] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[14:55] <Heebie> I have a pool, which appears to have policy applied for erasure-code, and a rule applied etc... if I try to "put" an object or even "ls" the pool, the command just hangs and nothing happens. No change if I specify a mon. Any thoughts on where I should look for clues?
[14:57] * dyasny (~dyasny@dsl.dynamic.191-116-74-190.electronicbox.net) has joined #ceph
[14:58] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) has joined #ceph
[14:59] <skoude> I was thinking: what if I just put the 4 servers in the same cluster and set it to write the data three times? This way the data would still be stored in one dc even if the two nodes from the second dc go out?
[14:59] * mhack (~mhack@66-168-117-78.dhcp.oxfr.ma.charter.com) has joined #ceph
[15:00] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) Quit ()
[15:02] * lcurtis (~lcurtis@ool-44c08556.dyn.optonline.net) has joined #ceph
[15:02] * rakeshgm (~rakesh@121.244.87.124) Quit (Quit: Leaving)
[15:02] * Concubidated (~Adium@pool-98-119-93-148.lsanca.fios.verizon.net) has joined #ceph
[15:02] * dyasny (~dyasny@dsl.dynamic.191-116-74-190.electronicbox.net) Quit (Quit: Ex-Chat)
[15:03] * dyasny (~dyasny@dsl.dynamic.191-116-74-190.electronicbox.net) has joined #ceph
[15:03] * Kupo1 (~tyler.wil@23.111.254.159) has joined #ceph
[15:05] * Kupo2 (~tyler.wil@23.111.254.159) Quit (Ping timeout: 480 seconds)
[15:08] * xarses (~xarses@50.141.34.210) has joined #ceph
[15:11] * wjw-freebsd (~wjw@vpn.ecoracks.nl) has joined #ceph
[15:12] * andreww (~xarses@50.141.33.30) Quit (Ping timeout: 480 seconds)
[15:15] * ira (~ira@c-73-238-173-100.hsd1.ma.comcast.net) has joined #ceph
[15:22] * rotbeard (~redbeard@185.32.80.238) Quit (Quit: Leaving)
[15:24] <bruc> finally got my radosgw to start !
[15:24] <skoude> can I just delete metadata and rbd pools (that are created in default installation) or are they needed anywhere?
[15:25] <skoude> bruc: brilliant, what was the problem?
[15:25] <bruc> but now i'm facing a 405 error when using the s3test.py script
[15:25] * kanagaraj (~kanagaraj@27.7.8.160) has joined #ceph
[15:25] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[15:25] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[15:25] * New_to_Ceph (~oftc-webi@84.241.250.11) Quit (Quit: Page closed)
[15:26] <bruc> actually the problem came from the way I was trying to start it; I got it to work with this command:
[15:26] <bruc> radosgw --cluster clustername
[15:26] <bruc> using /etc/init.d/radosgw start doesn't work at all
[15:27] * TheSov2 (~TheSov@cip-248.trustwave.com) has joined #ceph
[15:28] <bruc> I see that it could come from some missing entries such as rgw dns name
[15:28] <bruc> but I have no clue why I should use it
[15:29] <TheSov2> it has been my experience that ceph monitors REQUIRE a HA dns solution for client connectivity
[15:32] * bara (~bara@213.175.37.12) Quit (Ping timeout: 480 seconds)
[15:34] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) Quit (Read error: Connection reset by peer)
[15:34] * kanagaraj (~kanagaraj@27.7.8.160) Quit (Quit: Leaving)
[15:35] * brad_mssw (~brad@66.129.88.50) has joined #ceph
[15:37] <bruc> "HA"? what do you mean?
[15:37] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) Quit (Quit: Ex-Chat)
[15:37] <bruc> i'm trying to use the general DNS but it doesn't do much
[15:38] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[15:38] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[15:40] <TheSov2> HA as in highly available
[15:40] <TheSov2> remember the whole point is to avoid a single point of failure
[15:40] <bruc> ok, so it's already the case
[15:41] <bruc> so in my cluster conf file I added:
[15:41] <bruc> rgw dns name = dnsservername.domain.com
[15:41] <bruc> it should be this way ?
[15:41] <TheSov2> your monitors need to also have hostnames
[15:41] * mattbenjamin1 (~mbenjamin@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[15:42] <bruc> they are on servers with their own hostname, it's ok ?
[15:42] <TheSov2> use those hostnames
[15:42] <TheSov2> yeah as long as a hostname corresponds with the monitor
[15:43] <bruc> okay
[15:43] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) has joined #ceph
[15:43] * DV__ (~veillard@2001:41d0:1:d478::1) Quit (Ping timeout: 480 seconds)
[15:43] <bruc> how should the dns line look?
[15:44] <bruc> I used ceph-deploy with the host name every time
[15:44] <bruc> (when I created my ceph cluster)
[15:45] <bruc> so it created automatically a conf file for the cluster, using the Mon hostnames
[15:46] * vata (~vata@cable-21.246.173-197.electronicbox.net) Quit (Ping timeout: 480 seconds)
[15:47] <bruc> I still have a 405 not allowed error
[15:47] * bara (~bara@nat-pool-brq-t.redhat.com) has joined #ceph
[15:49] * jcsp (~jspray@82-71-16-249.dsl.in-addr.zen.co.uk) has joined #ceph
[15:49] <via> win 35
[15:49] <via> dammit
[15:49] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) has joined #ceph
[15:49] * bene_in_mtg (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[15:53] * DV__ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[15:54] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) has joined #ceph
[15:54] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) Quit ()
[15:54] * cephuser_ (~Ceph@al.secure.elitehosts.com) Quit (Ping timeout: 480 seconds)
[15:54] * avib (~Ceph@al.secure.elitehosts.com) Quit (Ping timeout: 480 seconds)
[15:55] <bruc> TheSov2: for the dns line in the conf file, should it look like :
[15:55] <bruc> rgw dns name = dnsservername.domain.com
[15:55] <bruc> ?
[15:56] <TheSov2> no
[15:56] * ngoswami (~ngoswami@121.244.87.116) Quit (Quit: Leaving)
[15:56] <TheSov2> wait
[15:57] <TheSov2> what exactly is your issue
[15:57] <TheSov2> i just got here
[15:57] <bruc> ^^
[15:58] <bruc> I'm trying the s3test.py script from the ceph documentation, to check if the radosgws is working properly
[15:58] * cephuser_ (~Ceph@al.secure.elitehosts.com) has joined #ceph
[15:58] <bruc> it displays me S3ResponseError: 405 Method Not Allowed
[15:58] * avib (~Ceph@al.secure.elitehosts.com) has joined #ceph
[15:59] <bruc> looking on some obscure websites, I found that it might be linked to the lack of a rgw dns name in the cluster.conf file
[16:00] <TheSov2> and you are sure all versions of the software you are using is correct?
[16:00] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) Quit (Quit: Ex-Chat)
[16:00] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) has joined #ceph
[16:00] <bruc> yes, every server is upgraded
[16:00] <TheSov2> one of the biggest issues i have seen people deal with is software mismatching
[16:01] <TheSov2> do you get a 405 on upload test
[16:01] <TheSov2> ?
[16:01] <TheSov2> or just when you run the s3test
[16:02] <Be-El> the log file might contain information about the cause of the error message...
[16:02] <bruc> for now i'm just using the s3test script
[16:02] <TheSov2> ok did you install phython-virtualenv?
[16:02] <TheSov2> python-virtualenv
[16:02] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) Quit ()
[16:02] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[16:02] <bruc> yes but i'm a bit lost between all the log that it generates
[16:02] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[16:02] * bara (~bara@nat-pool-brq-t.redhat.com) Quit (Ping timeout: 480 seconds)
[16:03] <bruc> no, i'm using python-boto
[16:04] <bruc> (following what the official documentation says)
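The docs' connection test is roughly the sketch below; the keys, the gateway hostname and the use of OrdinaryCallingFormat (path-style bucket access) are placeholders/assumptions, not bruc's actual values. Path-style access avoids the virtual-host bucket names that rgw dns name exists to support, which is one reason that option keeps coming up in 405 discussions:

    import boto
    import boto.s3.connection

    # placeholders: substitute the real gateway user's keys and the rgw host
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='gateway.example.com',
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.create_bucket('my-new-bucket')
    print bucket.name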
[16:04] <TheSov2> ok bruc show me your ceph.conf
[16:05] * xarses (~xarses@50.141.34.210) Quit (Ping timeout: 480 seconds)
[16:06] <TheSov2> rgw dns name = <dns>.<domainname>.com is that actually in your ceph.conf?
[16:06] <bruc> nope, it's just not the real name
[16:07] <bruc> just to say that it's not an IP address
[16:07] <TheSov2> it should be equal to the hostname
[16:07] <bruc> oh
[16:07] <bruc> I try
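For comparison, the radosgw section in ceph.conf from the docs of this era looks roughly like the sketch below; the section name, hostname, keyring path and civetweb port are assumptions, not values taken from bruc's cluster:

    [client.radosgw.gateway]
    host = gatewayhost
    keyring = /etc/ceph/ceph.client.radosgw.keyring
    rgw dns name = gatewayhost.example.com
    rgw frontends = "civetweb port=7480"
    log file = /var/log/ceph/client.radosgw.gateway.log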
[16:08] <TheSov2> hold on we arent done
[16:08] <bruc> Just to be sure I push / pull the conf + restart radosgw, is that enough?
[16:08] <TheSov2> we are gonna go step by step
[16:09] * DV (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[16:09] <bruc> ok
[16:09] <bruc> I did put the right hostname
[16:11] <bruc> I push pull the conf + restart radosgw now?
[16:11] * wyang (~wyang@46.21.158.66) has joined #ceph
[16:11] * yanzheng (~zhyan@125.71.106.102) Quit (Quit: This computer has gone to sleep)
[16:12] * bara (~bara@213.175.37.12) has joined #ceph
[16:13] * DV__ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[16:14] * BillyBobJohn (~hyst@109.201.143.40) has joined #ceph
[16:15] * enax (~enax@hq.ezit.hu) Quit (Ping timeout: 480 seconds)
[16:15] <TheSov2> try it
[16:16] <bruc> still 405
[16:16] * bara (~bara@213.175.37.12) Quit ()
[16:17] * bara (~bara@213.175.37.12) has joined #ceph
[16:17] <TheSov2> fak
[16:17] <TheSov2> ok
[16:18] <bruc> you're saying what comes to my mind
[16:19] * xarses (~xarses@64.124.158.100) has joined #ceph
[16:19] <bruc> ^^
[16:19] * DV__ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[16:20] * DV (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[16:22] <bruc> do you have another clue?
[16:25] * cmendes0101 (~cmendes01@pool-74-100-211-221.lsanca.fios.verizon.net) has joined #ceph
[16:26] * tsg (~tgohad@192.55.54.43) has joined #ceph
[16:29] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) has joined #ceph
[16:29] * zaitcev (~zaitcev@c-50-130-189-82.hsd1.nm.comcast.net) has joined #ceph
[16:29] <bruc> I tried sudo stop ceph-all
[16:29] <bruc> sudo start ceph-all
[16:29] <bruc> on every node, still nothing
[16:30] * egonzalez (~egonzalez@238.238.14.62.static.jazztel.es) Quit (Quit: Saliendo)
[16:30] <bruc> here is the complete error message :
[16:30] <bruc> Traceback (most recent call last):
[16:30] <bruc> File "s3test.py", line 12, in <module>
[16:30] <bruc> bucket = conn.create_bucket('my-new-bucket')
[16:30] <bruc> File "/usr/lib/python2.7/dist-packages/boto/s3/connection.py", line 504, in create_bucket
[16:30] <bruc> response.status, response.reason, body)
[16:30] <bruc> boto.exception.S3ResponseError: S3ResponseError: 405 Method Not Allowed
[16:30] <bruc> None
[16:31] * vata (~vata@207.96.182.162) has joined #ceph
[16:32] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) Quit (Ping timeout: 480 seconds)
[16:34] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[16:34] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[16:38] * magicrobotmonkey (~magicrobo@8.29.8.68) Quit (Remote host closed the connection)
[16:38] <bruc> There is a lack of explanation about the use of the options in the docs...
[16:40] * magicrobotmonkey (~magicrobo@8.29.8.68) has joined #ceph
[16:41] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) has joined #ceph
[16:42] <bruc> TheSov2 do you have any other suggestion ?
[16:43] * enax (~enax@94-21-125-67.pool.digikabel.hu) has joined #ceph
[16:44] * BillyBobJohn (~hyst@4MJAAAHH6.tor-irc.dnsbl.oftc.net) Quit ()
[16:44] * swami1 (~swami@27.7.169.76) has joined #ceph
[16:51] * dgurtner (~dgurtner@nat-pool-ork-u.redhat.com) Quit (Ping timeout: 480 seconds)
[16:53] * enax (~enax@94-21-125-67.pool.digikabel.hu) Quit (Ping timeout: 480 seconds)
[16:55] * egonzalez (~egonzalez@238.238.14.62.static.jazztel.es) has joined #ceph
[16:55] * ngoswami (~ngoswami@114.143.45.41) has joined #ceph
[16:57] * debian112 (~bcolbert@24.126.201.64) has joined #ceph
[17:00] * ircolle (~Adium@2601:285:201:2bf9:c99d:f371:fcea:f79e) has joined #ceph
[17:00] <bruc> halp
[17:00] * moore (~moore@64.202.160.88) has joined #ceph
[17:00] * analbeard (~shw@5.153.255.226) Quit (Quit: Leaving.)
[17:01] * brutuscat (~brutuscat@31.4.208.200) has joined #ceph
[17:02] * brutuscat (~brutuscat@31.4.208.200) Quit ()
[17:07] * Hazmat (~Inverness@anon-41-165.vpn.ipredator.se) has joined #ceph
[17:13] * jclm (~jclm@nat-persul-plg.wifi.univ-lorraine.fr) Quit (Quit: Leaving.)
[17:16] * wjw-freebsd2 (~wjw@vpn.ecoracks.nl) has joined #ceph
[17:20] * wjw-freebsd (~wjw@vpn.ecoracks.nl) Quit (Ping timeout: 480 seconds)
[17:23] * EinstCrazy (~EinstCraz@117.15.122.189) Quit (Remote host closed the connection)
[17:24] * shylesh (~shylesh@59.95.71.70) has joined #ceph
[17:26] * wushudoin (~wushudoin@38.140.108.2) has joined #ceph
[17:26] <Heebie> RE: My issue with my pool not accepting objects... it appears to have been because the crush rule I created was created with the wrong create command. I had used `ceph osd crush rule create-simple` but I needed to use `ceph osd crush rule create-erasure`. I did the create with the new command, then created a pool with that rule, and it works now.
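For anyone hitting the same hang, the erasure-coded path looks roughly like this; the profile name, k/m values, rule name and pg counts are placeholders:

    ceph osd erasure-code-profile set myprofile k=2 m=1
    ceph osd crush rule create-erasure ec_rule myprofile
    ceph osd pool create ecpool 128 128 erasure myprofile ec_rule

The point is that an erasure pool needs a rule created with create-erasure (or generated from the profile), not a replicated rule from create-simple.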
[17:26] * KaneK (~kane@cpe-172-88-240-14.socal.res.rr.com) has joined #ceph
[17:31] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) has joined #ceph
[17:32] * danieagle (~Daniel@177.188.66.106) has joined #ceph
[17:36] * libracious_ (uid95058@2001:67c:2f08:6::1:7352) has joined #ceph
[17:36] * dougf (~dougf@96-38-99-179.dhcp.jcsn.tn.charter.com) Quit (Ping timeout: 480 seconds)
[17:36] * Hazmat (~Inverness@84ZAAAAH7.tor-irc.dnsbl.oftc.net) Quit ()
[17:37] * linjan (~linjan@86.62.112.22) Quit (Ping timeout: 480 seconds)
[17:37] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[17:38] <libracious_> hi...so i have been reading and watching videos on ceph a lot lately.. and i become very interested in it...
[17:38] * dougf (~dougf@96-38-99-179.dhcp.jcsn.tn.charter.com) has joined #ceph
[17:38] <TheSov2> bruc, i didnt forget you, im having a difficult time with your problem
[17:39] <TheSov2> everything points to the name
[17:39] <TheSov2> libracious_, hello. ceph is awesome
[17:39] <TheSov2> libracious_, what can we do for you
[17:39] <libracious_> i am a programmer and i wanted to contribute to it..
[17:40] <TheSov2> oh then you wanna goto the dev channel heh
[17:40] <TheSov2> this is practical application
[17:40] * Be-El (~quassel@fb08-bcf-pc01.computational.bio.uni-giessen.de) Quit (Remote host closed the connection)
[17:40] * sudocat (~dibarra@2602:306:8bc7:4c50::46) Quit (Remote host closed the connection)
[17:41] <libracious_> thanks
[17:45] * DV__ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[17:47] * kefu (~kefu@114.92.107.250) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[17:49] * MinedAWAY is now known as Mined
[17:50] <bruc> thanks sov2
[17:50] <bruc> I have to leave now unfortunately
[17:50] <bruc> I got some help and now i'm stuck with a 503 error
[17:51] <bruc> that's awesome
[17:51] <bruc> ^^
[17:51] * squizzi (~squizzi@nat-pool-rdu-t.redhat.com) has joined #ceph
[17:51] <bruc> I wish the Ceph documentation was a bit clearer and had fewer errors
[17:51] <bruc> thank you for your help
[17:52] <TheSov2> yeah redhat has no incentive to do that
[17:52] <TheSov2> lol
[17:52] <TheSov2> they wanna sell support
[17:52] <bruc> I see that haha
[17:52] * DV_ (~veillard@2001:41d0:a:f29f::1) Quit (Ping timeout: 480 seconds)
[17:52] * bruc (~bruno.car@178.237.98.13) has left #ceph
[17:53] * cephuser__ (~Ceph@al.secure.elitehosts.com) has joined #ceph
[17:54] * b0e (~aledermue@213.95.25.82) Quit (Ping timeout: 480 seconds)
[17:54] * nardial (~ls@dslb-088-072-085-164.088.072.pools.vodafone-ip.de) Quit (Remote host closed the connection)
[17:55] * kefu (~kefu@114.92.107.250) has joined #ceph
[17:55] * jrankin (~jrankin@64.125.97.37) has joined #ceph
[18:02] * DV_ (~veillard@2001:41d0:a:f29f::1) has joined #ceph
[18:05] * swami1 (~swami@27.7.169.76) Quit (Quit: Leaving.)
[18:05] * dgurtner (~dgurtner@84.203.232.226) has joined #ceph
[18:05] * lcurtis (~lcurtis@ool-44c08556.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[18:06] * lcurtis (~lcurtis@ool-44c08556.dyn.optonline.net) has joined #ceph
[18:12] * mattbenjamin1 (~mbenjamin@76-206-42-105.lightspeed.livnmi.sbcglobal.net) Quit (Quit: Leaving.)
[18:12] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Quit: Ex-Chat)
[18:12] * reed (~reed@2607:f298:a:607:8de3:3d27:1b21:9029) has joined #ceph
[18:12] * martinohansen (~martinoha@185.85.5.78) has joined #ceph
[18:14] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[18:15] * Rachana (~Rachana@2601:87:3:3601::65f2) has joined #ceph
[18:16] * Rachana (~Rachana@2601:87:3:3601::65f2) Quit ()
[18:16] * Rachana (~Rachana@2601:87:3:3601::65f2) has joined #ceph
[18:17] * lcurtis (~lcurtis@ool-44c08556.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[18:18] * ade (~abradshaw@hicloud-mgmt.urz.uni-heidelberg.de) Quit (Ping timeout: 480 seconds)
[18:18] * kefu (~kefu@114.92.107.250) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[18:19] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[18:19] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[18:22] * pabluk is now known as pabluk_
[18:26] * maku1 (~HoboPickl@104.238.169.53) has joined #ceph
[18:26] * TMM (~hp@sams-office-nat.tomtomgroup.com) Quit (Quit: Ex-Chat)
[18:26] * wjw-freebsd2 (~wjw@vpn.ecoracks.nl) Quit (Ping timeout: 480 seconds)
[18:27] * bvi (~bastiaan@185.56.32.1) Quit (Quit: Ex-Chat)
[18:29] * martinohansen (~martinoha@185.85.5.78) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[18:30] * joshd1 (~jdurgin@68-119-140-18.dhcp.ahvl.nc.charter.com) Quit (Quit: Leaving.)
[18:34] * martinohansen (~martinoha@185.85.5.78) has joined #ceph
[18:34] * bara (~bara@213.175.37.12) Quit (Ping timeout: 480 seconds)
[18:35] * egonzalez (~egonzalez@238.238.14.62.static.jazztel.es) Quit (Quit: Saliendo)
[18:41] * beardo_ (~sma310@207-172-244-241.c3-0.atw-ubr5.atw.pa.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[18:45] * mykola (~Mikolaj@91.225.200.204) has joined #ceph
[18:46] <KaneK> hey guys, can you help me understand this state: http://pastebin.com/riNyg4pW
[18:47] <KaneK> I have 6 osds, with 2 out: http://pastebin.com/rg51GiFd
[18:47] <TheSov2> 6 osds: 4 up, 4 in; 46 remapped pgs
[18:47] <TheSov2> yes
[18:47] <TheSov2> where are those 2 located physically
[18:47] <TheSov2> are they the same node?
[18:48] <KaneK> yes, there is output of osd tree
[18:48] <KaneK> under 2nd link
[18:48] <KaneK> those 2 on different nodes
[18:48] <KaneK> the ones that are out
[18:48] <KaneK> there is total of 3 hosts, 2 osd each, 2 osd out on different hosts
[18:48] <KaneK> cluster never rebalanced/healed itself
[18:49] <TheSov2> why should it
[18:49] <TheSov2> you never deleted the out disks
[18:49] * thomnico (~thomnico@2a01:e35:8b41:120:4d4:ff04:2bff:227b) Quit (Ping timeout: 480 seconds)
[18:49] <KaneK> after osd is out shouldn't ceph fix undersized pgs?
[18:49] <TheSov2> if it has enough spare disks yes, but you dont
[18:50] <TheSov2> it will also backfill if you remove the out disks
[18:50] <KaneK> why? I have available osds with 50% usage
[18:51] <TheSov2> how long had the osds been out?
[18:51] <KaneK> almost 24 hours
[18:51] * logan2 (~a@216.144.251.246) Quit ()
[18:51] <TheSov2> i am assuming you are using the default crush rules?
[18:52] * LDA (~DM@host217-114-156-249.pppoe.mark-itt.net) has joined #ceph
[18:52] <KaneK> yes, it's default
[18:52] <TheSov2> ok
[18:52] <KaneK> I have flat hierarchy, host->osd
[18:53] <TheSov2> so default crush says, keep 3 copies of data on 3 seperate hosts
[18:53] <KaneK> yes, I still have 3 separate hosts
[18:53] <TheSov2> indeed
[18:53] <KaneK> and all remaining osds have 50% free space
[18:53] <KaneK> why wouldn't it heal?
[18:53] <TheSov2> but you no longer have enough space to have 3 copies of data on 3 different hosts
[18:53] <TheSov2> osd 0 is 60 percent full
[18:53] <TheSov2> osd 1 is 69 percent full
[18:54] <TheSov2> osd 6 is 55 percent full
[18:54] <KaneK> wouldn't it at least try to rebalance what's possible?
[18:54] <TheSov2> no
[18:54] <KaneK> until it fills up to 95%
[18:54] <KaneK> or whatever full ratio warn is
[18:55] <TheSov2> an ez fix for now is can be to set copies = 2
[18:55] <TheSov2> or rather size
[18:55] <TheSov2> you have 2 pools
[18:55] <TheSov2> which pool has the important data in it
[18:55] <TheSov2> leave that at 3
[18:56] <KaneK> and it should start rebalancing itself after that?
[18:56] <KaneK> or pg repair would be needed?
[18:56] <TheSov2> in theory yes
[18:56] <KaneK> lemme try that
[18:56] * maku1 (~HoboPickl@104.238.169.53) Quit ()
[18:56] <KaneK> thanks for advice
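The change being discussed, as a sketch; the pool name rbd is an assumption, apply it to whichever pool holds the less important data:

    ceph osd pool set rbd size 2
    ceph osd pool set rbd min_size 1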
[18:57] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) has joined #ceph
[18:57] <TheSov2> no probs
[18:57] <TheSov2> once you get how crush works stuff like that gets easy
[18:58] <TheSov2> now if i could only find myself a ceph crush for dummies book
[19:00] <KaneK> TheSov2: what does "remapped" state mean btw?
[19:00] <TheSov2> it means its repairing it
[19:00] <TheSov2> when its done that will disappear
[19:00] <KaneK> apparently I've got 3 active+undersized+degraded+remapped hanging forever in that state
[19:01] <TheSov2> did those 2 drives fail?
[19:01] <TheSov2> or did you pull them as a test
[19:01] <TheSov2> dont assume that, is there any activity in the cluster?
[19:01] <TheSov2> ceph -s shows a continous output of whats happening in the cluster
[19:01] <KaneK> yeah, I pulled them out as a test, trying to see if I can tolerate losing 2 osds
[19:02] <KaneK> out of 3 replicas
[19:02] <TheSov2> it can
[19:02] <TheSov2> if you want it to heal from that you have to make sure whatever data is left as singular orphans is still ok
[19:02] <TheSov2> 3 copies is good for a test
[19:03] <TheSov2> real life on the other hand, in production, would need EC or 4 to 5 copies depending on osd numbers
[19:03] <TheSov2> you never want to not have some redundancy
[19:03] <TheSov2> it floors me that some san storage is fully shelf dependant
[19:03] <KaneK> won't 3 copies let me lose 2 osds?
[19:04] <TheSov2> it would if you had more than 2 per host
[19:04] <TheSov2> look at it like this
[19:04] <TheSov2> if u have 3 hosts and 3 disks
[19:04] <TheSov2> 1 in each host
[19:04] <TheSov2> and u lose 1
[19:04] <TheSov2> you still have 2 copies but it wont "heal"
[19:04] <TheSov2> if you lose 2, same situation
[19:05] * EinstCrazy (~EinstCraz@117.15.122.189) has joined #ceph
[19:05] <KaneK> right, I'm going to have at least 4 hosts and pool size of 3
[19:05] * linjan (~linjan@176.195.31.43) has joined #ceph
[19:05] <TheSov2> now u have 2 disks each
[19:05] <TheSov2> which is great
[19:05] <TheSov2> but the same rule applies
[19:05] <TheSov2> you need enough space for 3 total copies, 1 on each node
[19:05] <TheSov2> u didnt have that
[19:06] <KaneK> I was expecting it to start healing and stop when it gets close to the full ratio
[19:06] <KaneK> is that the wrong expectation?
[19:06] <TheSov2> its not wrong, it just doesnt work that way
[19:07] <KaneK> got it
[19:07] <TheSov2> if you have 4 nodes
[19:07] <TheSov2> there is enough play to lose a whole node with 0 issue
[19:07] * shylesh (~shylesh@59.95.71.70) Quit (Remote host closed the connection)
[19:08] <KaneK> great, sounds good then
[19:08] <TheSov2> is your status health_ok yet?
[19:09] * Guest1520 is now known as DanFoster
[19:09] <KaneK> not yet, looks like progress stopped
[19:09] <KaneK> I've changed pool size to 2
[19:09] <KaneK> it started scrubbing
[19:09] <KaneK> then stopped
[19:09] <KaneK> end state is: 2015-12-16 12:03:36.409941 mon.0 [INF] HEALTH_WARN; 19 pgs degraded; 19 pgs stuck degraded; 2 pgs stuck inactive; 43 pgs stuck unclean; 19 pgs stuck undersized; 19 pgs undersized; recovery 485/43289 objects degraded (1.120%); recovery 1844/43289 objects misplaced (4.260%)
[19:10] <TheSov2> wow
[19:10] <KaneK> ceph osd df:
[19:10] <TheSov2> its rebalancing
[19:10] <TheSov2> LOL
[19:10] <KaneK> [root@ceph0001s1dfw2 ~]# ceph osd df
[19:10] <KaneK> ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
[19:10] <KaneK> 4 0.09270 0 0 0 0 0 0 0
[19:10] <KaneK> 0 0.09270 1.00000 97231M 37077M 60153M 38.13 0.87 93
[19:10] <KaneK> 5 0.09270 0 0 0 0 0 0 0
[19:10] <KaneK> 1 0.09270 1.00000 97231M 53028M 44202M 54.54 1.24 113
[19:10] <KaneK> 6 0.09270 1.00000 97231M 44313M 52917M 45.58 1.04 87
[19:10] <KaneK> 2 0.09270 1.00000 97231M 36613M 60618M 37.66 0.86 80
[19:10] <KaneK> TOTAL 379G 167G 212G 43.98
[19:10] <KaneK> looks like it's stuck there
[19:10] * avib (~Ceph@al.secure.elitehosts.com) Quit (Remote host closed the connection)
[19:10] * cephuser__ (~Ceph@al.secure.elitehosts.com) Quit (Remote host closed the connection)
[19:10] * cephuser_ (~Ceph@al.secure.elitehosts.com) Quit (Remote host closed the connection)
[19:10] <KaneK> no more activity
[19:10] <TheSov2> use ceph -s
[19:10] <TheSov2> it will tell you whats going on
[19:10] <TheSov2> its continous
[19:10] <KaneK> nothing
[19:10] <KaneK> ceph -w actually
[19:11] <TheSov2> one of thems
[19:11] <TheSov2> i forgot
[19:11] <KaneK> yeah nothing is happening
[19:11] <KaneK> last line is that HEALTH_WARN state and that's it
[19:12] <TheSov2> odd
[19:12] <KaneK> yeah, looks like my cluster is broken
[19:12] <KaneK> I can try pg repair?
[19:13] <KaneK> this is really weird, I'd not expect it to break after losing 2 osds
[19:15] * EinstCrazy (~EinstCraz@117.15.122.189) Quit (Ping timeout: 480 seconds)
[19:15] * jordanP (~jordan@204.13-14-84.ripe.coltfrance.com) Quit (Quit: Leaving)
[19:17] <davidz1> KaneK: So another ceph -s doesn't show changes of these numbers "recovery 1844/43289 objects misplaced (4.260%)"?
[19:17] <KaneK> nop
[19:17] <KaneK> it's been stuck in this state for ~10 minutes now
[19:18] <KaneK> ceph -w doesn't advance, ceph -s doesn't have any changes in numbers
[19:19] <davidz1> KaneK: undersized means it can't find places to create 3 copies using the crush rules.
[19:20] * TMM (~hp@178-84-46-106.dynamic.upc.nl) has joined #ceph
[19:20] <KaneK> yep, but I have space - there are 3 hosts available
[19:20] <KaneK> and I've changed pool size to 2
[19:20] <KaneK> now there should be enough space for it (I assume)
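A few stock commands for digging into PGs stuck like this; the pg id is a placeholder:

    ceph health detail
    ceph pg dump_stuck unclean
    ceph pg 2.3f query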
[19:21] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[19:21] * shohn (~shohn@dslb-188-102-031-075.188.102.pools.vodafone-ip.de) Quit (Ping timeout: 480 seconds)
[19:22] * reed (~reed@2607:f298:a:607:8de3:3d27:1b21:9029) Quit (Quit: Ex-Chat)
[19:25] * bene_in_mtg (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[19:25] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) Quit (Ping timeout: 480 seconds)
[19:28] * BlS (~luckz@85.17.25.22) has joined #ceph
[19:29] <davidz1> KaneK: What version of ceph are you running?
[19:29] <KaneK> infernalis
[19:29] <KaneK> I brought those 2 osds back - cluster started recovering
[19:30] * daviddcc (~dcasier@LAubervilliers-656-1-16-160.w217-128.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[19:30] <KaneK> gonna try to put them down again (after/if cluster is healthy)
[19:30] * martinohansen (~martinoha@185.85.5.78) Quit (Quit: My Mac has gone to sleep. ZZZzzz…)
[19:30] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) has joined #ceph
[19:34] * martinohansen (~martinoha@185.85.5.78) has joined #ceph
[19:34] * ade (~abradshaw@tmo-096-245.customers.d1-online.com) has joined #ceph
[19:34] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) Quit (Quit: Ex-Chat)
[19:35] * ade (~abradshaw@tmo-096-245.customers.d1-online.com) Quit (Remote host closed the connection)
[19:35] * owasserm (~owasserm@2001:984:d3f7:1:5ec5:d4ff:fee0:f6dc) has joined #ceph
[19:35] * sleinen1 (~Adium@2001:620:0:82::103) Quit (Ping timeout: 480 seconds)
[19:39] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph
[19:40] * dgurtner (~dgurtner@84.203.232.226) Quit (Ping timeout: 480 seconds)
[19:40] * bara (~bara@ip4-83-240-10-82.cust.nbox.cz) has joined #ceph
[19:40] * bara (~bara@ip4-83-240-10-82.cust.nbox.cz) Quit ()
[19:40] * enax (~enax@94-21-125-67.pool.digikabel.hu) has joined #ceph
[19:45] * krypto (~krypto@G68-121-13-149.sbcis.sbc.com) has joined #ceph
[19:48] * mattbenjamin1 (~mbenjamin@aa2.linuxbox.com) has joined #ceph
[19:49] * mattbenjamin1 (~mbenjamin@aa2.linuxbox.com) Quit ()
[19:49] * mattbenjamin1 (~mbenjamin@aa2.linuxbox.com) has joined #ceph
[19:50] * Concubidated (~Adium@pool-98-119-93-148.lsanca.fios.verizon.net) Quit (Remote host closed the connection)
[19:52] * mattbenjamin1 (~mbenjamin@aa2.linuxbox.com) has left #ceph
[19:53] * mattbenjamin1 (~mbenjamin@aa2.linuxbox.com) has joined #ceph
[19:57] * angdraug (~angdraug@c-69-181-140-42.hsd1.ca.comcast.net) has joined #ceph
[19:58] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) Quit (Remote host closed the connection)
[19:58] * jasuarez (~jasuarez@237.Red-83-39-111.dynamicIP.rima-tde.net) has joined #ceph
[19:59] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) Quit (Ping timeout: 480 seconds)
[19:59] * ngoswami (~ngoswami@114.143.45.41) Quit (Quit: Leaving)
[20:00] * johnavp19891 (~jpetrini@8.39.115.8) has joined #ceph
[20:00] * BlS (~luckz@4MJAAAHPQ.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[20:01] * enax (~enax@94-21-125-67.pool.digikabel.hu) Quit (Ping timeout: 480 seconds)
[20:02] * sleinen2 (~Adium@84-72-160-233.dclient.hispeed.ch) has joined #ceph
[20:03] * sleinen2 (~Adium@84-72-160-233.dclient.hispeed.ch) Quit ()
[20:03] * sleinen2 (~Adium@2001:620:1000:3:7ed1:c3ff:fedc:3223) has joined #ceph
[20:05] * LPG (~LPG@c-50-181-212-148.hsd1.wa.comcast.net) Quit (Remote host closed the connection)
[20:05] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[20:06] * nils_ (~nils_@doomstreet.collins.kg) has joined #ceph
[20:06] * daviddcc (~dcasier@80.12.55.163) has joined #ceph
[20:10] * alram (~alram@cpe-172-250-2-46.socal.res.rr.com) has joined #ceph
[20:12] * nils_ (~nils_@doomstreet.collins.kg) Quit (Read error: Connection reset by peer)
[20:18] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[20:18] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[20:20] * johnavp1989 (~jpetrini@8.39.115.8) has joined #ceph
[20:20] * dgbaley27 (~matt@75.148.118.217) has joined #ceph
[20:20] * Bj_o_rn (~Sigma@178.32.251.105) has joined #ceph
[20:21] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) has joined #ceph
[20:22] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) Quit ()
[20:24] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) has joined #ceph
[20:24] * johnavp19891 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[20:27] * beardo_ (~sma310@207-172-244-241.c3-0.atw-ubr5.atw.pa.cable.rcn.com) has joined #ceph
[20:30] <TheSov2> im out there evangelizing ceph every day
[20:30] <TheSov2> because i need to destroy all storage companies
[20:30] <TheSov2> they are a virus foisted on us by large and shitty corporations
[20:31] <TheSov2> Compellent, EMC, EQUALLOGIC, 3PAR, ALL YOU DIE!@
[20:33] * leseb- is now known as leseb_away
[20:34] * Rachana (~Rachana@2601:87:3:3601::65f2) Quit (Quit: Leaving)
[20:35] * Rachana (~Rachana@2601:87:3:3601::766) has joined #ceph
[20:37] * daviddcc (~dcasier@80.12.55.163) Quit (Ping timeout: 480 seconds)
[20:39] * mgolub (~Mikolaj@91.225.201.143) has joined #ceph
[20:39] * beardo_ (~sma310@207-172-244-241.c3-0.atw-ubr5.atw.pa.cable.rcn.com) Quit (Ping timeout: 480 seconds)
[20:41] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) Quit (Ping timeout: 480 seconds)
[20:42] * mgolub (~Mikolaj@91.225.201.143) Quit ()
[20:44] * mykola (~Mikolaj@91.225.200.204) Quit (Ping timeout: 480 seconds)
[20:47] * krypto (~krypto@G68-121-13-149.sbcis.sbc.com) Quit (Ping timeout: 480 seconds)
[20:47] * enax (~enax@94-21-125-67.pool.digikabel.hu) has joined #ceph
[20:48] * krypto (~krypto@G68-121-13-5.sbcis.sbc.com) has joined #ceph
[20:50] * Bj_o_rn (~Sigma@84ZAAAAR1.tor-irc.dnsbl.oftc.net) Quit ()
[20:53] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) has joined #ceph
[20:55] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) has joined #ceph
[20:58] * enax (~enax@94-21-125-67.pool.digikabel.hu) Quit (Ping timeout: 480 seconds)
[20:59] * daviddcc (~dcasier@80.12.55.163) has joined #ceph
[20:59] * Mined is now known as MinedAWAY
[21:06] <snakamoto> .
[21:06] <TheSov2> ..
[21:06] <snakamoto> basically, screw HP and Dell at this point
[21:10] * ptx0 (~kash@megatron.tripleback.net) has joined #ceph
[21:10] * leseb_away is now known as leseb-
[21:10] <TheSov2> yep
[21:11] * leseb- is now known as leseb_away
[21:11] <ptx0> i've got two nodes running ZFS + iscsi targets, i'm trying to come up with a clean way to do per-volume redundancy instead of whole-server master/slave relationship
[21:12] <ptx0> right now i have some custom stuff with DB and a UI that allows configuration of boot modes, but it's always one IP per iSCSI node to connect to, no elastic/virtual IP stuff
[21:13] * _s1gma (~Ian2128@185.100.86.69) has joined #ceph
[21:13] <TheSov2> ptx0, ?
[21:13] <TheSov2> what do you mean ceph is a peered system
[21:13] <ptx0> so if you boot to a slave node and it goes down you'll need to reboot or the target will have to reconnect somehow to the same IP because there's no IP takeover
[21:13] <ptx0> i netboot iSCSI clients (Windows)
[21:14] <ptx0> if the iscsi target goes down they blue screen and reboot but i'd love for it to not happen that way
[21:14] <TheSov2> ptx0, i dont know what you are talking about, are you using a ceph to iscsi setup?
[21:14] <ptx0> i dont want one server to be unused either, i do have the ability to add a couple tables in my DB for virtual IPs and their assignments
[21:15] <ptx0> TheSov2: right now there is no ceph, i'm imagining my storage as shared so i can come up with a good design to implement
[21:16] <TheSov2> well because ceph is peered, there no master/slave
[21:16] <ptx0> i have a UI, two servers, and i can have it deploy multiple OSDs, one per ZVOL (ZFS volume) - but the servers aren't using virtual IPs so there's no capability to move the IP from one node to another for takeover
[21:16] <TheSov2> if a node goes down the cluster kind of routes around it
[21:16] <ptx0> right but there's only one iscsi target
[21:16] <ptx0> because the target daemon is not clustered it's not aware of other nodes
[21:16] <TheSov2> wait you are building ceph onto traditional storage?
[21:17] <TheSov2> I'm very confused as to what you are doing
[21:17] <ptx0> i've got windows and linux clients netbooting from a zfs/iscsi server network, there's multiple servers and they replicate using zfs send/recv
[21:18] <TheSov2> ok
[21:18] <ptx0> it's not really ideal because if a iscsi target goes down, the windows/linux client can not do IO
[21:18] <TheSov2> yes that sounds about right
[21:19] <ptx0> i'm wondering if i should create like, a virtual IP for each server
[21:19] <ptx0> that can be reassigned to any other node if needed
[21:19] <theengineer> multipathing and a VIP?
[21:20] * pabluk_ is now known as pabluk
[21:20] <ptx0> TheSov2: i can create volumes in ZFS that are a fixed size, like 50G, on each node, right
[21:20] <TheSov2> yes
[21:20] <TheSov2> zvols
[21:20] <ptx0> then make a RBD on top of it
[21:21] <TheSov2> you could but that would be bad
[21:21] <ptx0> why?
[21:21] <TheSov2> ceph is a object based storage, that uses XFS which is cow
[21:21] <TheSov2> putting an osd on top of that would be a cow on cow
[21:21] <TheSov2> and thats bad
[21:22] <TheSov2> much like zfs osd's should be on hardware
[21:22] <TheSov2> you can do what you said but it will be like molasses in winter
[21:23] * mtanski (~mtanski@65.244.82.98) has joined #ceph
[21:24] * bene2 (~bene@nat-pool-bos-t.redhat.com) Quit (Ping timeout: 480 seconds)
[21:24] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[21:24] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Remote host closed the connection)
[21:26] * derjohn_mob (~aj@fw.gkh-setu.de) Quit (Ping timeout: 480 seconds)
[21:29] * bene2 (~bene@nat-pool-bos-t.redhat.com) has joined #ceph
[21:30] <ptx0> i'm more interested in massive reliability than speed with this setup
[21:30] <TheSov2> then you would drop your zfs
[21:31] <TheSov2> setup ceph, and put zfs ontop of RBD
[21:31] <ptx0> you can't do that because ZFS is not cluster-aware
[21:31] <TheSov2> i know
[21:31] <TheSov2> it doesnt have to be
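What TheSov2 is suggesting would look roughly like this on the serving host; image name, size and pool are placeholders. ZFS itself stays single-host, only the block device underneath it is distributed by ceph:

    rbd create --size 102400 rbd/zfs-disk0   # 100 GB image in the default rbd pool
    rbd map rbd/zfs-disk0                    # appears as /dev/rbd0 on this host
    zpool create tank /dev/rbd0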
[21:31] <theengineer> don't use ZFS?
[21:31] <TheSov2> the lowest level no
[21:31] <TheSov2> why would you?
[21:32] * tsg (~tgohad@192.55.54.43) Quit (Remote host closed the connection)
[21:32] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[21:32] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[21:32] <TheSov2> zfs is like raid
[21:32] <ptx0> because i like its ability to clone volumes, and it has a decent management interface, plus the whole admin backend is already set up to use it, i'd love to be able to just make its failover seamless
[21:32] <TheSov2> ceph does that too
[21:32] <theengineer> zfs? on solaris? freebsd?
[21:32] <ptx0> linux, solaris and freebsd
[21:32] <theengineer> are these VM's, zones?
[21:32] <TheSov2> wait i think we are confused as to what i mean
[21:32] <ptx0> containers, VMs, bare metal machines
[21:33] <theengineer> so a ton of randoms hit
[21:33] <TheSov2> on the hardware, you install linux and then ceph
[21:33] <TheSov2> once the ceph cluster is up
[21:33] <TheSov2> you patch an RBD to another system and u can format that as zfs
[21:33] <TheSov2> use that for your vm's
[21:33] <theengineer> setup a gateway for RBD, then setup your initiators on that gateway
[21:33] <ptx0> right but it's still not active-active
[21:34] <TheSov2> oh so you want an active active setup for iscsi
[21:34] <ptx0> i don't want a single gateway
[21:34] <ptx0> right
[21:34] <TheSov2> then you cannot use ceph at the moment
[21:34] <TheSov2> you can only do active passive
[21:34] <ptx0> i want all servers to be a gateway for their own little clustered volumes
[21:34] <TheSov2> using corosync or something
[21:34] <ptx0> my solution is to use zvol's with OSDs, i'd have to test this though
[21:34] <TheSov2> im doing that in my test evironment now
[21:35] <ptx0> i've run CoW on CoW several times with satisfactory responses
[21:35] <TheSov2> using cephfs and ganesha
[21:35] <TheSov2> i have a cephfs storage
[21:35] <TheSov2> mapped to several servers
[21:35] <TheSov2> using ganesha to export cephfs as nfs
[21:35] <TheSov2> each nfs server has a corresponding host
[21:35] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) Quit (Ping timeout: 480 seconds)
[21:35] <TheSov2> since the nfs servers run in userland, when vmware locks a file
[21:35] <TheSov2> none of the other nfs servers can access it
[21:36] <TheSov2> because the file is locked according to cephfs
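An nfs-ganesha export of cephfs is configured along these lines; export id, pseudo path and access settings are placeholders, and this is a generic FSAL_CEPH sketch rather than TheSov2's exact config:

    EXPORT {
        Export_ID = 1;
        Path = "/";
        Pseudo = "/cephfs";
        Access_Type = RW;
        Squash = No_Root_Squash;
        FSAL {
            Name = CEPH;
        }
    }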
[21:36] * BManojlovic (~steki@cable-89-216-231-142.dynamic.sbb.rs) has joined #ceph
[21:36] <ptx0> i think my best hope without completely reworking how the system talks to its storage, is to allow assignment of an iSCSI VIP to each gateway, and when it goes down, elect a new master based on resource consumption & health.. which i can do easily
[21:37] <ptx0> s/new master/new node to takeover that IP/
[21:37] <TheSov2> I was going to say you could build a HA proxy cluster with several iscsi hosts
[21:37] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[21:37] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[21:37] <TheSov2> or do an active passive deally
[21:37] <TheSov2> using heartbeat and corosync
[21:37] <ptx0> i was thinking before i came in here that i'd have to create a virtual IP pool and allow assignment on a per-device basis but it makes more sense to just give one per host
[21:38] * martinohansen (~martinoha@185.85.5.78) Quit (Quit: Textual IRC Client: www.textualapp.com)
[21:38] <ptx0> because it's not just going to be one device on a server, with an issue, it'll be all of the services there that must be fenced off
[21:39] <TheSov2> you could put a gateway on each host itself
[21:39] <TheSov2> hrh
[21:39] <TheSov2> heh
[21:39] <ptx0> that's how it is now, they only have two IPs though, one for internet and one for iSCSI/replication
[21:39] <TheSov2> ptx0, what is your plan for the iscsi
[21:40] <TheSov2> because if you are using kvm
[21:40] <ptx0> i do have libvirt in the equation but i need to remain compatible with bare-metal Windows installations
[21:40] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) has joined #ceph
[21:40] * tsg (~tgohad@134.134.139.77) has joined #ceph
[21:40] <TheSov2> ahh you are ahead of me eh?
[21:41] <ptx0> yeah i've been designing this for years
[21:41] <ptx0> i didn't like openstack so i created my own alternative
[21:41] <ptx0> woo
[21:41] <TheSov2> reinventing the wheel
[21:41] <TheSov2> hah
[21:41] <TheSov2> i love it
[21:41] <ptx0> yep, like Michelin did with the Tweel
[21:41] <ptx0> :)
[21:41] <TheSov2> no one has the engineering spirit these days
[21:42] <TheSov2> everyone just wants to buy whatever the big boys are selling
[21:42] <ptx0> i just want magic to be real
[21:42] <ptx0> i've already got windows booting from ZFS which feels pretty great :)
[21:42] * _s1gma (~Ian2128@76GAAAAWA.tor-irc.dnsbl.oftc.net) Quit ()
[21:43] <TheSov2> pretty impressive
[21:43] <TheSov2> qlogic?
[21:43] <ptx0> and i have the system abstracted enough that i could add ceph alongside ZFS so you'd do either-or but then i'd have to figure out how to import from ZFS to Ceph and vice-versa
[21:43] <ptx0> no, iPXE
[21:43] <TheSov2> right but at some point you have to load ntfs
[21:43] <TheSov2> so whats in between
[21:43] <TheSov2> if not an HBA
[21:43] <ptx0> windows has its own native stuff
[21:44] <ptx0> it loads int 13 from BIOS
[21:44] <ptx0> also known as ibft, iscsi boot firmware tables
[21:44] <TheSov2> right but is that software?
[21:44] <ptx0> so ipxe loads the iscsi target and windows says okey and loads up, but it's pretty specific, you have to install windows directly to the target usually
[21:44] <ptx0> yes
[21:44] <TheSov2> or is that something within your bios
[21:45] <TheSov2> ah ok
[21:45] <TheSov2> thats what i was asking
[21:45] <ptx0> it works on most PCs ive tested except for Macs
[21:45] <ptx0> and VirtualBox has some of its own unique issues
[21:45] <TheSov2> so it loads a linux kernel, then iscsi, then the windows kernel
[21:45] <ptx0> no, ipxe has its own iscsi drivers
[21:45] <TheSov2> oh really
[21:45] <ptx0> it has a UNDI driver layer afaik
[21:45] <TheSov2> where might i find this ipxe
[21:45] <lincolnb> anyone ever have trouble getting iostats out of nvmes?
[21:45] <ptx0> www.ipxe.org
[21:45] * tsg (~tgohad@134.134.139.77) Quit (Remote host closed the connection)
[21:46] <theengineer> you should be able to use pacemaker to do active/active
[21:48] <theengineer> heres an NFS example http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/ You should be able to do similar with iscsi / mulltipathing and a VIP then do redundancy across the switches
[21:49] * valeech (~valeech@pool-108-44-162-111.clppva.fios.verizon.net) has joined #ceph
[21:49] * thomnico (~thomnico@2a01:e35:8b41:120:e1bb:6cfe:5c3:27c8) Quit (Quit: Ex-Chat)
[21:52] <TheSov2> how do you control which iscsi target systems connect to via pxe?
[21:53] <ptx0> it's the PHP scripts that are hit during the boot process
[21:53] <TheSov2> ok but how do you specify that?
[21:53] <ptx0> there's a web UI
[21:53] <TheSov2> how does php know lol
[21:53] <TheSov2> system guid or something?
[21:54] <ptx0> and it saves to a DB, which the servers download when there is a change to be made
[21:54] <ptx0> the device is assigned a server ID and then the php boot scripts pull the server IP from the server ID row
[21:54] <TheSov2> so the pxe boots, does a undi driver load, assigns a ip from dhcp?
[21:54] <TheSov2> or uses the pxe ip?
[21:55] <ptx0> if the device is in round-robin or loadbalanced mode, it'll pull the servers in a circle, or based on a weighted list that the web UI made when it collected statistics and set a "score" for the server based on # clients connected, IOPS, bw usage etc
[21:55] <ptx0> it first loads from BOOTP/DHCP the bootstrap IP and then sets a static IP that is assigned via the DB
[21:56] <TheSov2> so is the ipxe server HA or no?
[21:56] <ptx0> each server has its own DHCP and TFTP so yes
[21:56] <TheSov2> wow that sounds nice
[21:56] <ptx0> each server advertises DHCP and tells the connecting device to use its own IP for TFTP
[21:56] <TheSov2> we have some prod systems that currently boot from SDcard a full centos
[21:56] <ptx0> after TFTP it is instructed by the panel / DB where to go next
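A minimal iPXE iSCSI boot script looks something like this; the addresses and IQNs are placeholders, and in ptx0's setup the equivalent is generated per device by the PHP scripts from the DB:

    #!ipxe
    dhcp
    set initiator-iqn iqn.2015-12.com.example:client01
    sanboot iscsi:172.20.36.50::::iqn.2015-12.com.example:target01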
[21:56] <TheSov2> and they die pretty fast
[21:56] <TheSov2> so this may save us
[21:57] <ptx0> i'm loading CentOS from this for dev work for contracts
[21:57] <TheSov2> i hate centos but thats what we use
[21:57] <ptx0> i can create & destroy virtual environments like no one's business :P
[21:57] * tsg (~tgohad@134.134.139.77) has joined #ceph
[21:57] <ptx0> i'm pretty good at screwing CentOS up, so it's good i can revert it just by rebooting etc
[21:58] * linuxkidd (~linuxkidd@49.sub-70-209-96.myvzw.com) Quit (Ping timeout: 480 seconds)
[21:58] <ptx0> i put in a new feature to my panel last week to assign persistent data volumes to devices, because before only one iSCSI volume was attached, the boot volume
[21:58] <ptx0> now it will add some disks to libvirt native or iscsi for bare metal
[21:58] <TheSov2> so when ipxe loads connects to iscsi and starts windows booting, the iscsi disk is C:
[21:59] * danieagle (~Daniel@177.188.66.106) Quit (Quit: Obrigado por Tudo! :-) inte+ :-))
[21:59] <ptx0> yeah or whatever it ends up being, doesn't need to be C
[21:59] <TheSov2> i know but its persistent eh?
[21:59] <ptx0> yes
[21:59] <ptx0> and you can load identical machines off the same volumes
[21:59] <TheSov2> ?
[21:59] <ptx0> if they have different NIC, windows will not want to boot, though there are workarounds
[21:59] <TheSov2> what?!?! mind=blown
[22:00] <TheSov2> how do you assign system names?
[22:00] <ptx0> CentOS i've been able to pretty much load anywhere.. like green eggs and ham.
[22:00] <ptx0> they don't. that's another one of those pesky limits.
[22:00] <TheSov2> well we dont use netbios so it doesnt matter for us
[22:00] <ptx0> i've integrated PowerDNS so it can assign DNS records in the future
[22:01] <TheSov2> we are 90 percent linux anyway
[22:01] <ptx0> Linux could probably boot right from Ceph, too
[22:01] <TheSov2> i was gonna say they need to add librados to ipxe and use that
[22:01] <TheSov2> shit that would be amazing
[22:01] <ptx0> more like to windows
[22:02] <ptx0> windows needs the ability to load the disk after ipxe or else it's gonna bsod and crash
[22:02] <TheSov2> you just said the iscsi was persistent
[22:02] <ptx0> windows must be intelligent too though
[22:02] <ptx0> it's driver / protocol related
[22:02] <ptx0> linux netboot clients will be fine
[22:02] <TheSov2> no i mean the windows iscsi service is not required to boot up right
[22:03] <ptx0> i guess it is for iscsi boot, but it wouldn't if you had librados
[22:03] <TheSov2> why not?
[22:03] <TheSov2> its basically the same
[22:03] <ptx0> tbh i'm not sure how MS does their iscsi boot stuff, it does look at the BIOS tables for some info about where to go but it's all closed source
[22:03] <TheSov2> in fact i would probably say that iscsi is more complex than ceph requests
[22:03] <ptx0> so when it breaks i'm left scratching my head
[22:04] <ptx0> Windows workstations seem to netboot more reliably than MS Server
[22:04] <ptx0> which probably require an HBA
[22:04] * coreping (~Michael_G@n1.coreping.org) has joined #ceph
[22:04] <ptx0> i shouldn't say "more reliably" because we never really had any issues with windows workstations
[22:05] <ptx0> probably because the workstations reprovision every time you reboot it
[22:05] <ptx0> aha
[22:05] <ptx0> how can your windows disks clutter up with trash when there is no cluttering to be had? >:)
[22:07] * rotbart (~redbeard@aftr-95-222-29-74.unity-media.net) has joined #ceph
[22:07] * linuxkidd (~linuxkidd@49.sub-70-209-96.myvzw.com) has joined #ceph
[22:07] <TheSov2> LOL
[22:08] <ptx0> in any case, iSCSI is not the same protocol so we'll stll need a custom driver for windows to boot from RADOS :)
[22:08] <TheSov2> yes
[22:08] <TheSov2> I am still left scratching my head as to why there is no ceph client for windows
[22:08] <ptx0> because the Ceph developers probably cringe when they think about working on MS products
[22:09] <qman> also driver signing
[22:09] <TheSov2> forget driver signing i mean just a plain vanilla exe
[22:10] <TheSov2> you know how badly i need Commvault to have access to ceph
[22:10] * tnt (~tnt@ec2-54-200-98-43.us-west-2.compute.amazonaws.com) has left #ceph
[22:12] <ptx0> pnfs?
[22:13] <TheSov2> pnfs is awesome
[22:13] <TheSov2> even vmware supports it now
[22:13] <TheSov2> its nfs with multipathing
[22:18] * cathode (~cathode@50-198-166-81-static.hfc.comcastbusiness.net) has joined #ceph
[22:22] * nardial (~ls@dslb-088-072-085-164.088.072.pools.vodafone-ip.de) has joined #ceph
[22:32] * rendar (~I@95.239.179.247) Quit (Ping timeout: 480 seconds)
[22:34] * BManojlovic (~steki@cable-89-216-231-142.dynamic.sbb.rs) Quit (Remote host closed the connection)
[22:34] * rendar (~I@95.239.179.247) has joined #ceph
[22:37] * leseb_away is now known as leseb-
[22:37] * leseb- is now known as leseb_away
[22:45] * brad[] (~Brad@184-94-17-26.dedicated.allstream.net) has joined #ceph
[22:45] * mLegion (~BlS@176.56.230.162) has joined #ceph
[22:46] <brad[]> Hi all, I've prepared and activated 9 OSD's which are situated on the fifth partition of a GPT formatted disk
[22:47] <brad[]> This worked fine until I restarted one of the systems, and now ceph-disk activate-all seems to do nothing, and the partitions aren't detected with the included udev rules
[22:47] <brad[]> Any chance I'm missing something obvious?
[22:48] * linjan (~linjan@176.195.31.43) Quit (Remote host closed the connection)
[22:48] * Concubidated (~Adium@pool-98-119-93-148.lsanca.fios.verizon.net) has joined #ceph
[22:49] * linjan (~linjan@176.195.31.43) has joined #ceph
[22:50] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) has joined #ceph
[22:50] * harold (~hamiller@71-94-227-123.dhcp.mdfd.or.charter.com) Quit ()
[22:51] <TheSov2> brad[], mount the partition to /var/ceph/osd/blabh blah osdnumber here
[22:51] <TheSov2> and then start the osd service
[22:51] <TheSov2> and why on earth are you putting ceph osd on disks with other data
[22:52] <brad[]> it's a sordid tale of woe
[22:52] <brad[]> TheSov: That worked. Any idea why udev didn't detect the partition?
[22:52] <brad[]> or, any way to find out
[22:52] <TheSov2> it has to do with not using the guid value of the disk
[22:53] <TheSov2> i noticed it happened any time the osd was not the first partition on disk
[22:53] <brad[]> ahh
[22:54] * libracious_ (uid95058@2001:67c:2f08:6::1:7352) Quit (Quit: Connection closed for inactivity)
[22:54] * leseb_away is now known as leseb-
[22:54] * leseb- is now known as leseb_away
[22:55] <brad[]> and healthy pool. awesome.
[22:55] <brad[]> I'll stop using partitions soon I swear
[22:55] <TheSov2> so dont shutdown your osds maybe?
[22:56] <brad[]> ?
[22:57] <TheSov2> lol
[22:57] <TheSov2> next time u reboot, the osd wont mount again
[22:57] <TheSov2> you have to add the mount to fstab or something
[22:58] <brad[]> ah lol
[22:58] <brad[]> yeah. Added to fstab.
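For reference, the manual workaround looks roughly like this; device, osd id and filesystem are placeholders, and the standard mount point is /var/lib/ceph/osd/<cluster>-<id>:

    mount /dev/sdb5 /var/lib/ceph/osd/ceph-3
    systemctl start ceph-osd@3     # or: service ceph start osd.3, depending on the init system

    # /etc/fstab entry (a UUID= source is less fragile than /dev/sdX naming)
    /dev/sdb5  /var/lib/ceph/osd/ceph-3  xfs  noatime  0 0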
[22:59] <brad[]> I'm glad I encountered this, I want to move some of the OSD's to a new node and I would have had a bit of a panic if they didn't automount.
[22:59] <brad[]> Am I right in assuming the name of the mountpoint is arbitrary? It gets the OSD number from an id file in there?
[23:03] * LDA (~DM@host217-114-156-249.pppoe.mark-itt.net) Quit (Quit: Nettalk6 - www.ntalk.de)
[23:03] <TheSov2> guid value i think
[23:03] <TheSov2> i dont remember
[23:10] * TheSov2 (~TheSov@cip-248.trustwave.com) Quit (Read error: Connection reset by peer)
[23:12] * dyasny (~dyasny@dsl.dynamic.191-116-74-190.electronicbox.net) Quit (Ping timeout: 480 seconds)
[23:14] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[23:14] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[23:15] <cholcombe> With erasure coded pools does the client or the osd do the data splitting into chunks? I think it's the primary osd
[23:15] * dyasny (~dyasny@dsl.198.58.153.172.ebox.ca) has joined #ceph
[23:15] * mLegion (~BlS@76GAAAAZH.tor-irc.dnsbl.oftc.net) Quit ()
[23:15] <cholcombe> I just found my answer in the architecture docs :). Looks like the primary receives all writes and then splits
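As a concrete example of that split: with an erasure profile of k=4, m=2 (placeholder values), the primary OSD cuts a 4 MB object into four 1 MB data chunks, computes two coding chunks, and sends one chunk to each of the six OSDs in the acting set; the raw-space overhead is (k+m)/k = 1.5x instead of 3x for triple replication.

    ceph osd erasure-code-profile set ec42 k=4 m=2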
[23:19] * lpabon (~quassel@24-151-54-34.dhcp.nwtn.ct.charter.com) Quit (Ping timeout: 480 seconds)
[23:22] * johnavp1989 (~jpetrini@8.39.115.8) Quit (Ping timeout: 480 seconds)
[23:24] * leseb_away is now known as leseb-
[23:24] * leseb- is now known as leseb_away
[23:25] * nardial (~ls@dslb-088-072-085-164.088.072.pools.vodafone-ip.de) Quit (Quit: Leaving)
[23:26] * Ceph-Log-Bot (~logstash@185.66.248.215) has joined #ceph
[23:26] * Ceph-Log-Bot (~logstash@185.66.248.215) Quit (Read error: Connection reset by peer)
[23:29] * leseb_away is now known as leseb-
[23:29] * leseb- is now known as leseb_away
[23:30] * jasuarez (~jasuarez@237.Red-83-39-111.dynamicIP.rima-tde.net) Quit (Ping timeout: 480 seconds)
[23:30] * sleinen2 (~Adium@2001:620:1000:3:7ed1:c3ff:fedc:3223) Quit (Quit: Leaving.)
[23:32] * diegows (~diegows@190.190.21.75) has joined #ceph
[23:33] * moore (~moore@64.202.160.88) Quit (Remote host closed the connection)
[23:34] * doppelgrau (~doppelgra@p5DC0731D.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[23:34] * moore (~moore@64.202.160.88) has joined #ceph
[23:42] * moore (~moore@64.202.160.88) Quit (Ping timeout: 480 seconds)
[23:43] * Concubidated (~Adium@pool-98-119-93-148.lsanca.fios.verizon.net) Quit (Quit: Leaving.)
[23:44] * sep76 (~sep@95.62-50-191.enivest.net) Quit (Ping timeout: 480 seconds)
[23:44] * pabluk is now known as pabluk_
[23:48] * wjw-freebsd (~wjw@smtp.digiware.nl) Quit (Read error: Connection reset by peer)
[23:50] * shawniverson (~shawniver@69.174.160.25) has joined #ceph
[23:50] * wjw-freebsd (~wjw@smtp.digiware.nl) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.