#ceph IRC Log

IRC Log for 2013-12-30

Timestamps are in GMT/BST.

[0:00] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Remote host closed the connection)
[0:01] * bjornar (~bjornar@ti0099a340-dhcp0395.bb.online.no) Quit (Ping timeout: 480 seconds)
[0:02] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Remote host closed the connection)
[0:02] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[0:06] * AjoCebollaSisal (~AjoCeboll@200.79.253.35) Quit (autokilled: Do not spam. Mail support@oftc.net with questions. (2013-12-29 23:06:24))
[0:10] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[0:11] * runfromnowhere (~runfromno@pool-108-29-25-203.nycmny.fios.verizon.net) has joined #ceph
[0:13] * rendar (~s@host141-177-dynamic.1-87-r.retail.telecomitalia.it) Quit ()
[0:22] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has joined #ceph
[0:22] * MarkN (~nathan@142.208.70.115.static.exetel.com.au) has left #ceph
[0:24] * AfC (~andrew@2407:7800:400:1011:6e88:14ff:fe33:2a9c) Quit (Quit: Leaving.)
[0:25] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[0:27] * wschulze1 (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[0:32] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[0:32] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[0:34] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[0:34] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Ping timeout: 480 seconds)
[0:36] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[0:49] * sarob (~sarob@76.197.12.195) Quit (Remote host closed the connection)
[0:49] * sarob (~sarob@76.197.12.195) has joined #ceph
[0:50] * AfC (~andrew@2407:7800:400:1011:6e88:14ff:fe33:2a9c) has joined #ceph
[0:50] * Dark-Ace-Z (~BillyMays@50-32-37-62.drr01.hrbg.pa.frontiernet.net) Quit (Ping timeout: 480 seconds)
[0:50] * codice_ (~toodles@75-140-64-194.dhcp.lnbh.ca.charter.com) has joined #ceph
[0:52] * AfC (~andrew@2407:7800:400:1011:6e88:14ff:fe33:2a9c) Quit ()
[0:52] * codice (~toodles@75-140-64-194.dhcp.lnbh.ca.charter.com) Quit (Ping timeout: 480 seconds)
[0:57] * sarob (~sarob@76.197.12.195) Quit (Ping timeout: 480 seconds)
[0:59] * Muhlemmer (~kvirc@cable-90-50.zeelandnet.nl) Quit (Ping timeout: 480 seconds)
[1:04] * sleinen (~Adium@2001:620:0:25:493:17f3:a294:b20) Quit (Quit: Leaving.)
[1:04] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[1:05] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[1:06] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[1:07] * lightspeed (~lightspee@2001:8b0:16e:1:216:eaff:fe59:4a3c) Quit (Ping timeout: 480 seconds)
[1:12] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:15] * lightspeed (~lightspee@2001:8b0:16e:1:216:eaff:fe59:4a3c) has joined #ceph
[1:16] * ShaunR- (~ShaunR@ip68-5-215-171.oc.oc.cox.net) has joined #ceph
[1:20] * ShaunR (~ShaunR@staff.ndchost.com) Quit (Ping timeout: 480 seconds)
[1:21] * DarkAceZ (~BillyMays@50-32-3-135.drr01.hrbg.pa.frontiernet.net) has joined #ceph
[1:31] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[1:35] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[1:42] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Remote host closed the connection)
[1:47] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[1:48] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[1:51] * xevwork (~xevious@6cb32e01.cst.lightpath.net) Quit (Quit: No Ping reply in 180 seconds.)
[1:55] * LeaChim (~LeaChim@host86-161-89-52.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[1:59] * mattbenjamin1 (~matt@aa2.linuxbox.com) Quit (Quit: Leaving.)
[2:00] * sarob (~sarob@76.197.12.195) has joined #ceph
[2:00] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[2:05] * diegows (~diegows@190.190.17.57) Quit (Ping timeout: 480 seconds)
[2:08] * sarob (~sarob@76.197.12.195) Quit (Ping timeout: 480 seconds)
[2:14] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:40] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[2:43] * KindTwo (KindOne@h163.41.186.173.dynamic.ip.windstream.net) has joined #ceph
[2:44] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[2:44] * KindTwo is now known as KindOne
[2:47] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Read error: Operation timed out)
[3:02] * Cube1 (~Cube@66-87-65-128.pools.spcsdns.net) has joined #ceph
[3:02] * Cube (~Cube@66-87-65-169.pools.spcsdns.net) Quit (Read error: Connection reset by peer)
[3:03] * haomaiwang (~haomaiwan@117.79.232.192) has joined #ceph
[3:07] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[3:07] * ChanServ sets mode +v andreask
[3:22] * i_m (~ivan.miro@95.180.8.206) Quit (Quit: Leaving.)
[3:40] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[3:48] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[4:06] * alfredodeza (~alfredode@198.206.133.89) Quit (Remote host closed the connection)
[4:07] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[4:07] * alfredodeza (~alfredode@198.206.133.89) has joined #ceph
[4:41] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[4:49] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[5:24] * Vacum (~vovo@88.130.220.14) has joined #ceph
[5:31] * Vacum_ (~vovo@i59F7AD26.versanet.de) Quit (Ping timeout: 480 seconds)
[5:41] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[5:49] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[6:05] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[6:23] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[6:24] <sherry> Hi, I'd like to know of any file-benchmarking tool that lets me specify my own workload (distribution of file sizes)
[6:41] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[6:49] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:03] * destrudo (~destrudo@this.is.a.d4m4g3d.net) has joined #ceph
[7:05] * Sysadmin88 (~IceChat77@2.218.8.40) has joined #ceph
[7:11] * wschulze1 (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[7:19] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[7:38] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[8:11] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[8:20] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:33] * peedu (~peedu@adsl89.uninet.ee) has joined #ceph
[8:36] * mozg (~andrei@46.229.149.194) has joined #ceph
[8:40] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[8:41] * Guest1129 (~coyo@thinks.outside.theb0x.org) Quit (Ping timeout: 480 seconds)
[8:45] * Coyo (~coyo@thinks.outside.theb0x.org) has joined #ceph
[8:45] * Coyo is now known as Guest1830
[8:55] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[8:58] * rendar (~s@host72-183-dynamic.11-87-r.retail.telecomitalia.it) has joined #ceph
[9:01] <peedu> hi
[9:02] <peedu> any idea how to monitor ceph latency, like how long it takes ceph to process read and write operations
[9:02] <peedu> with ceph perf counters i can get average latency on read operations, but how can i monitor it in real time?
[9:07] * Cube1 (~Cube@66-87-65-128.pools.spcsdns.net) Quit (Quit: Leaving.)
[9:09] * haomaiwa_ (~haomaiwan@117.79.232.155) has joined #ceph
[9:12] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[9:15] * haomaiwang (~haomaiwan@117.79.232.192) Quit (Ping timeout: 480 seconds)
[9:16] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[9:20] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[9:26] * Muhlemmer (~kvirc@cable-90-50.zeelandnet.nl) has joined #ceph
[9:27] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[9:37] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[9:37] * ChanServ sets mode +v andreask
[9:55] * zidarsk8 (~zidar@89-212-28-144.dynamic.t-2.net) has joined #ceph
[9:55] * Sysadmin88 (~IceChat77@2.218.8.40) Quit (Quit: OUCH!!!)
[9:58] * zidarsk8 (~zidar@89-212-28-144.dynamic.t-2.net) has left #ceph
[9:58] <peedu> any idea how to monitor ceph latency, like how long it takes ceph to process read and write operations
[9:58] <peedu> with ceph perf counters i can get average latency on read operations, but how can i monitor it in real time?
[9:58] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[9:59] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) Quit (Quit: Leaving)
[9:59] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) has joined #ceph
[10:07] <andreask> peedu: there is a collectd plugin to collect and aggregate the perf counters ... feed it into a graphing-tool of your choice
[10:08] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) Quit (Quit: Leaving)
[10:09] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) has joined #ceph
[10:11] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) Quit ()
[10:12] * hjjg (~hg@p3EE3271B.dip0.t-ipconnect.de) has joined #ceph
[10:12] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[10:14] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) has joined #ceph
[10:15] <peedu> thank you andreask, but i don't have problems with collecting and graphing the data. Perf counters show average latency of read and write, where can i get live latency of those operations?
[10:16] <andreask> you mean of the complete cluster?
[10:16] * Cube (~Cube@12.248.40.138) has joined #ceph
[10:17] * ShaunR- (~ShaunR@ip68-5-215-171.oc.oc.cox.net) Quit (Ping timeout: 480 seconds)
[10:17] <peedu> yes, i created more OSDs and the ceph recovery process took all of the performance, i would like to see how much time ceph operations take
[10:19] <peedu> like the latency of ceph write operations
[10:20] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[10:21] <peedu> for latency i can create a zabbix item, and can see when something goes bad
[10:23] <andreask> hmm ... I don't think you can get more than what the perf counters provide
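A minimal sketch of how the perf counters discussed above can be sampled, assuming osd.0 on the local host and the default admin socket path (counter names vary between releases):

    # Dump all perf counters for osd.0 as JSON; op_r_latency and op_w_latency
    # report a running sum and avgcount, so polling this periodically and
    # diffing sum/count gives an approximate "live" latency.
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump
    # Cluster-wide, "ceph osd perf" (if present in your release) lists per-OSD
    # commit and apply latencies.
    ceph osd perf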
[10:24] * thomnico (~thomnico@2a01:e35:8b41:120:699a:5c06:d10a:1b56) has joined #ceph
[10:29] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Quit: Leaving.)
[10:30] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[10:37] * i_m (~ivan.miro@95.180.8.206) has joined #ceph
[10:41] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) Quit (Quit: Leaving)
[10:44] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) has joined #ceph
[10:46] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) Quit ()
[10:48] <andreask> peedu: maybe you can get more information by querying ceph-rest-api
[10:48] * codice_ (~toodles@75-140-64-194.dhcp.lnbh.ca.charter.com) Quit (Read error: Operation timed out)
[10:48] <peedu> i'm researching the rest-api atm, when i get success i'll notify you
[10:49] * sherry (~sherry@mike-alien.esc.auckland.ac.nz) has joined #ceph
[10:52] * codice (~toodles@75-140-64-194.dhcp.lnbh.ca.charter.com) has joined #ceph
[11:01] * LeaChim (~LeaChim@host86-161-89-52.range86-161.btcentralplus.com) has joined #ceph
[11:04] * sleinen (~Adium@2001:620:0:26:ecf3:1220:d301:882d) has joined #ceph
[11:25] * thomnico (~thomnico@2a01:e35:8b41:120:699a:5c06:d10a:1b56) Quit (Ping timeout: 480 seconds)
[11:30] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Remote host closed the connection)
[11:31] * marrusl_ (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[11:31] * marrusl_ (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit ()
[11:32] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Remote host closed the connection)
[11:44] * thomnico (~thomnico@2a01:e35:8b41:120:1bb:a8ed:ab6:dc91) has joined #ceph
[12:04] * jcsp (~Adium@0001bf3a.user.oftc.net) has joined #ceph
[12:07] * thomnico_ (~thomnico@2a01:e35:8b41:120:8166:c68a:a08b:20a4) has joined #ceph
[12:10] * thomnico (~thomnico@2a01:e35:8b41:120:1bb:a8ed:ab6:dc91) Quit (Ping timeout: 480 seconds)
[12:22] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[12:24] * thomnico (~thomnico@2a01:e35:8b41:120:f8df:1d10:a2c2:1f6b) has joined #ceph
[12:27] * dzianis__ (~dzianis@86.57.255.91) has joined #ceph
[12:27] * thomnico_ (~thomnico@2a01:e35:8b41:120:8166:c68a:a08b:20a4) Quit (Ping timeout: 480 seconds)
[12:33] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Ping timeout: 480 seconds)
[12:34] * dzianis_ (~dzianis@86.57.255.91) Quit (Ping timeout: 480 seconds)
[12:41] * dzianis__ (~dzianis@86.57.255.91) Quit (Quit: Leaving)
[12:41] * dzianis__ (~dzianis@86.57.255.91) has joined #ceph
[12:50] * xdeller (~xdeller@91.218.144.129) has joined #ceph
[12:51] * thomnico_ (~thomnico@2a01:e35:8b41:120:85fe:c020:bdc6:307b) has joined #ceph
[12:52] * thomnico (~thomnico@2a01:e35:8b41:120:f8df:1d10:a2c2:1f6b) Quit (Ping timeout: 480 seconds)
[12:55] * fouxm (~foucault@ks3363630.kimsufi.com) has joined #ceph
[13:12] <haomaiwa_> Hi, everyone. Where could I find information on how compatible CephFS is with POSIX?
[13:16] * sleinen1 (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[13:16] * sleinen (~Adium@2001:620:0:26:ecf3:1220:d301:882d) Quit (Ping timeout: 480 seconds)
[13:19] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[13:19] * sleinen1 (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[13:19] * sleinen1 (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[13:19] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[13:22] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[13:22] * sleinen1 (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[13:25] * sleinen1 (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[13:25] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[13:26] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[13:26] * sleinen1 (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Read error: Connection reset by peer)
[13:28] * sleinen1 (~Adium@2001:620:0:25:e8cb:1532:7223:e7c6) has joined #ceph
[13:34] * thomnico_ (~thomnico@2a01:e35:8b41:120:85fe:c020:bdc6:307b) Quit (Ping timeout: 480 seconds)
[13:34] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[13:36] <loicd> haomaiwa_: how do you mean ?
[13:36] <loicd> Noah is working on portability, FWIW
[13:44] * pvsa (~pvsa@pd95c6a80.dip0.t-ipconnect.de) has joined #ceph
[13:44] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[13:45] * pvsa (~pvsa@pd95c6a80.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[13:55] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) has joined #ceph
[13:57] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[14:01] * mschiff (~mschiff@port-30114.pppoe.wtnet.de) has joined #ceph
[14:02] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[14:03] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Read error: Operation timed out)
[14:17] * thomnico (~thomnico@2a01:e35:8b41:120:c8d5:57f3:90b7:3251) has joined #ceph
[14:23] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[14:23] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has left #ceph
[14:24] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[14:28] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Quit: Leaving.)
[14:32] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:35] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[14:39] * psomas (~psomas@inferno.cc.ece.ntua.gr) has joined #ceph
[14:40] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[14:56] * psomas (~psomas@inferno.cc.ece.ntua.gr) Quit (Ping timeout: 480 seconds)
[15:10] * BillK (~BillK-OFT@106-69-25-13.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[15:15] * edle (~oftc-webi@199.168.44.193) has joined #ceph
[15:16] <edle> hello, are there any ceph openstack experts online?
[15:16] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Remote host closed the connection)
[15:16] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[15:17] <edle> test
[15:20] <edle> has anyone got ceph to work on openstack havana, specifically with bootable volumes?
[15:20] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Read error: Operation timed out)
[15:21] * yo61_ (~yo61@lin001.yo61.net) Quit (Ping timeout: 480 seconds)
[15:22] * Cube (~Cube@12.248.40.138) Quit (Quit: Leaving.)
[15:23] * t0rn1 (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[15:31] <mozg> edle, your best bet is to email the mailing list
[15:32] <edle> ill try that thanks mozg
[15:32] <mozg> no probs
[15:33] <mozg> there are more people reading mailing lists
[15:33] <mozg> i am using cloudstack and I have no problems with bootable volumes
[15:34] <mozg> i am using kvm + ubuntu cloud havana repo with libvirt 1.1.4 and qemu 1.5.0
[15:34] <mozg> that works like a charm without any tweaking
[15:37] * allsystemsarego (~allsystem@86.121.85.58) has joined #ceph
[15:37] <edle> i had openstack and ceph working on grizzly but cannot get the havana version to work with bootable volumes using fedora.
[15:38] <edle> do you know when this chat room is most active?
[15:39] <janos> i'd say in about 2-3 hours
[15:39] <janos> though now it's warming up
[15:39] <edle> thanks, i'll ask again then. in the meantime i guess i'll try the mailing list.
[15:48] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[15:50] * AfC (~andrew@2001:44b8:31cb:d400:6e88:14ff:fe33:2a9c) Quit (Quit: Leaving.)
[15:53] * Cube (~Cube@66-87-65-128.pools.spcsdns.net) has joined #ceph
[15:58] * clayb (~kvirc@proxy-nj1.bloomberg.com) has joined #ceph
[16:00] * peedu (~peedu@adsl89.uninet.ee) Quit (Remote host closed the connection)
[16:01] * pressureman (~pressurem@62.217.45.26) has joined #ceph
[16:02] * markbby (~Adium@168.94.245.2) has joined #ceph
[16:03] <pressureman> i've just added two new OSDs to an existing two-OSD cluster, and whilst all 512 pgs are active+clean, the status is "HEALTH_WARN pool data pg_num 128 > pgp_num 64; ..."
[16:03] <pressureman> is there a way to increase pgp_num, or did i screw up when i created these two new OSDs?
[16:06] * mattbenjamin (~matt@aa2.linuxbox.com) has joined #ceph
[16:08] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Remote host closed the connection)
[16:09] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[16:09] <pressureman> nevermind, just figured it out... ceph osd pool set data pgp_num 128
[16:10] <pressureman> yay, HEALTH_OK
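For reference, a minimal sketch of the commands behind the fix pressureman found, assuming the default pool named data; raising pg_num/pgp_num splits placement groups and triggers data movement, so it is usually done in small steps:

    # pg_num is how many placement groups exist; pgp_num is how many are used
    # for placement. The HEALTH_WARN above clears once they match.
    ceph osd pool get data pg_num
    ceph osd pool set data pgp_num 128
    ceph health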
[16:14] * diegows (~diegows@190.190.17.57) has joined #ceph
[16:15] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[16:15] * ChanServ sets mode +v andreask
[16:15] * zerick (~eocrospom@190.187.21.53) has joined #ceph
[16:17] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[16:22] * diegows (~diegows@190.190.17.57) Quit (Ping timeout: 480 seconds)
[16:23] * dmsimard (~Adium@108.163.152.2) has joined #ceph
[16:32] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[16:33] * wrencsok (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has left #ceph
[16:40] * sagelap (~sage@cpe-23-242-158-79.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[16:45] * hjjg (~hg@p3EE3271B.dip0.t-ipconnect.de) Quit (Remote host closed the connection)
[16:45] * edle (~oftc-webi@199.168.44.193) Quit (Remote host closed the connection)
[16:46] * wrencsok (~wrencsok@wsip-174-79-34-244.ph.ph.cox.net) has joined #ceph
[16:54] * tryggvil (~tryggvil@83.151.131.116) has joined #ceph
[16:59] * sagelap (~sage@38.122.20.226) has joined #ceph
[17:00] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) has joined #ceph
[17:01] * vata (~vata@2607:fad8:4:6:d074:8110:9aba:32f6) has joined #ceph
[17:02] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[17:04] <pmatulis2> strange, they should be identical by default
[17:06] * zirpu (~zirpu@2600:3c02::f03c:91ff:fe96:bae7) Quit (Quit: leaving)
[17:10] * peedu (~peedu@adsl89.uninet.ee) has joined #ceph
[17:10] * nwat (~textual@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[17:12] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[17:18] * peedu (~peedu@adsl89.uninet.ee) Quit (Ping timeout: 480 seconds)
[17:20] * gdavis33 (~gdavis@38.122.12.254) has joined #ceph
[17:20] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[17:20] * gdavis33 (~gdavis@38.122.12.254) has left #ceph
[17:28] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[17:30] * jhurlbert (~jhurlbert@216.57.209.252) Quit (Quit: jhurlbert)
[17:39] * Machske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[17:47] * bandrus (~Adium@107.222.155.194) has joined #ceph
[17:48] * mozg (~andrei@46.229.149.194) Quit (Ping timeout: 480 seconds)
[17:50] * hjjg (~hg@p3EE30EFB.dip0.t-ipconnect.de) has joined #ceph
[17:51] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[17:53] * ganders (~ganders@200.0.230.234) has joined #ceph
[17:55] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[17:56] <EWDurbin> ls
[17:56] * EWDurbin (~ernestd@ewd3do.ernest.ly) has left #ceph
[18:01] * jhurlbert (~jhurlbert@216.57.209.252) has joined #ceph
[18:04] * diegows (~diegows@200.68.116.185) has joined #ceph
[18:12] * hjjg (~hg@p3EE30EFB.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[18:16] * BillK (~BillK-OFT@106-69-25-13.dyn.iinet.net.au) has joined #ceph
[18:18] * thomnico (~thomnico@2a01:e35:8b41:120:c8d5:57f3:90b7:3251) Quit (Quit: Ex-Chat)
[18:28] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[18:29] * mattbenjamin1 (~matt@aa2.linuxbox.com) has joined #ceph
[18:31] <lxo> I'm getting a PGLog.cc:737 assertion failure upon starting an osd. it has been very slow in processing parent setxattrs, so I brought it down to complete things faster
[18:31] * pressureman (~pressurem@62.217.45.26) Quit (Quit: Ex-Chat)
[18:31] <lxo> that went well, but after I brought it back up, the assertion came up twice already (it restarted once during recovery due to a timeout)
[18:32] <lxo> now, I could bypass the error by simply disabling the assertion; after the PG fully recovers, the error is cleared (scrub is pending)
[18:33] * mattbenjamin (~matt@aa2.linuxbox.com) Quit (Ping timeout: 480 seconds)
[18:33] <lxo> question is, would you like me to collect any further info before the problem goes away again?
[18:33] <lxo> this is on 0.72.2, BTW
[18:35] * i_m (~ivan.miro@95.180.8.206) Quit (Quit: Leaving.)
[18:35] <lxo> I'm also confused because there are plenty of ancient pglog files in the osd's current/meta, but the code makes it seem like they're deprecated in favor of something else. is that so? why wouldn't they be getting cleaned up, as the rewrite_log = true assignment should imply?
[18:36] * i_m (~ivan.miro@95.180.8.206) has joined #ceph
[18:36] <lxo> oh, the existing pglog files are all zero-sized; I guess that's why
[18:39] * alphe (~alphe@0001ac6f.user.oftc.net) has joined #ceph
[18:40] <alphe> hello everybody!
[18:40] <alphe> I have a weird problem with my main ceph-mon
[18:40] <alphe> it doesn't start and crashes with auth messages
[18:44] <alphe> ceph is freaking me out, it is so unstable ...
[18:46] * sagelap1 (~sage@38.122.20.226) has joined #ceph
[18:46] * sagelap (~sage@38.122.20.226) Quit (Read error: Connection reset by peer)
[18:49] <alphe> I don't know why but several things are not working
[18:49] <alphe> on some nodes the osd doesn't auto-start anymore
[18:49] <alphe> on some nodes the monitor crashes at start
[18:50] <alphe> on some osds the journal was pointing to a disk-by-partuuid entry and that disappeared ..
[18:52] <janos> alphe - not sure what to say. i have a cluster on fedora min install machines that was bobtail upgraded to dumpling upgraded to emperor and it's fine
[18:53] <janos> and it's a mix of xfs and btrfs osd's
[18:53] <alphe> and in my case every 2 days I have to find a solution for a half-disappeared cluster
[18:54] <alphe> the problem could basically be that the os is on usb pendrives ...
[18:54] <alphe> another problem could be that I use the saucy versions from gitbuilder
[18:56] <alphe> ok, on osd03 the journals of my xfs disks are pointing to a /dev/disk/by-partuuid entry that doesn't exist anymore, this is weird
[18:56] <alphe> all the nodes were installed the same way
[18:56] <janos> i would imagine missing partition UUIDs are outside the scope of ceph
[18:57] <alphe> on node 1 the main monitor crashes at start, dumping a ton of numbers starting with a problem in auth
[18:57] <alphe> janos yeah that is related to the kernel, or better said udev ...
[19:03] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[19:03] * ChanServ sets mode +v andreask
[19:04] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Remote host closed the connection)
[19:05] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[19:06] * xdeller (~xdeller@91.218.144.129) Quit (Quit: Leaving)
[19:13] * houkouonchi-home (~linux@houkouonchi-1-pt.tunnel.tserv15.lax1.ipv6.he.net) has joined #ceph
[19:13] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Ping timeout: 480 seconds)
[19:15] * gregsfortytwo1 (~Adium@2607:f298:a:607:607c:463f:25ad:d43d) has joined #ceph
[19:15] * houkouonchi-work (~linux@12.248.40.138) Quit (Remote host closed the connection)
[19:16] * bandrus (~Adium@107.222.155.194) Quit (Quit: Leaving.)
[19:18] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[19:21] * gregsfortytwo (~Adium@2607:f298:a:607:78c4:ad72:a71d:da1c) Quit (Ping timeout: 480 seconds)
[19:21] * bandrus (~Adium@107.222.155.194) has joined #ceph
[19:34] * joshd (~joshd@2607:f298:a:607:e530:6c8f:27b5:91e) Quit (Ping timeout: 480 seconds)
[19:35] * sarob (~sarob@40.sub-70-211-64.myvzw.com) has joined #ceph
[19:39] * ShaunR (~ShaunR@staff.ndchost.com) has joined #ceph
[19:41] * joshd (~joshd@2607:f298:a:607:5daa:8dd2:ce69:ffbe) has joined #ceph
[19:44] * dmick (~dmick@2607:f298:a:607:5460:8109:e204:5d24) has joined #ceph
[19:46] * fireD (~fireD@93-139-140-159.adsl.net.t-com.hr) has joined #ceph
[20:00] <ganders> how can i remove a node from the cluster?
[20:00] <ganders> i've already removed all the OSDs from that node
[20:00] <alphe> ganders it depends on the services ...
[20:00] <ganders> but the osd tree is still showing the node
[20:01] <ganders> it only had the osd services
[20:01] <alphe> ganders you want to stop the node first
[20:01] <alphe> so kill -9 its pid
[20:02] <ganders> the node has crashed, that's why i need to take it out of the cluster
[20:02] <ganders> i don't have access to that node anymore
[20:02] <ganders> i'll reinstall it
[20:02] <andreask> ganders: see http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
[20:03] <ganders> thanks andreask, i've already done those steps and removed that node's OSDs from the crush map
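A minimal sketch of how an emptied host can be dropped from the osd tree, assuming a hypothetical host bucket named node5 and OSD id 12; the per-OSD steps are the ones from the add-or-rm-osds page linked above:

    # Per OSD on the dead host (already done in ganders' case):
    ceph osd out 12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    # Then remove the now-empty host bucket from the CRUSH map so it
    # disappears from "ceph osd tree":
    ceph osd crush remove node5
    ceph osd tree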
[20:06] * Pedras (~Adium@216.207.42.132) has joined #ceph
[20:23] * ircolle (~Adium@2601:1:8380:2d9:3d28:3acb:bb72:7346) has joined #ceph
[20:23] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[20:28] <alphe> really weird ...
[20:28] <alphe> janos I solved my main monitor crash problem by forcing a resync ...
[20:29] <alphe> but then the osd and monitor services don't auto-start on 3 nodes ...
[20:29] <alphe> 1 node simply forgot a ton of data ...
[20:29] <alphe> I think the usb pendrives I use are not fit for the osd workload
[20:30] <alphe> I think the usb pendrives I use are not fit for the OS workload
[20:32] <ganders> alphe: did you try to mount the osd's and then start the services?
[20:33] <alphe> ganders yes ...
[20:33] <alphe> that works on nodes 1 and 2; on node 3 I get a freak message: /dev/sda1 is not a special device ...
[20:34] * Sysadmin88 (~IceChat77@2.218.8.40) has joined #ceph
[20:34] <ganders> if you do a fdisk /dev/sda do you see the partitions?
[20:34] <alphe> yes
[20:35] <ganders> so, you have the osd mounted on the corresponding osd directory, but you can't get the service up & running
[20:36] <alphe> ganders on node 3 (the one that doesn't want to mount the xfs disk) I have another weird issue, I can't apt-get install or upgrade or update anything ..
[20:36] <alphe> I think the os filesystem, since it is on a pendrive and subjected to sustained I/O activity, doesn't handle it and breaks
[20:37] <alphe> as a matter of fact only the nodes that have monitors on them do those weird things ...
[20:37] <ganders> that's rare
[20:38] <alphe> ganders I think it is rare to have the OS filesystem on a pendrive too anyway :)
[20:38] <alphe> not many folks do it so there is no precedent, but yes I have a ton of strange things happening on those 3 nodes
[20:38] <ganders> hahah you're right on that, i'm scared to ask why you have the OS on a pendrive..
[20:39] <alphe> ganders don't be ...
[20:39] <alphe> I'm developing a ceph box .... 10 nodes in an 8U box, and the motherboard I use can only take 2 sata 2 drives
[20:39] <ganders> I think that if the cluster is only 3 nodes it's not for production right?
[20:40] <alphe> it's a 10 node cluster
[20:40] <ganders> oh ok i see
[20:40] <alphe> and only the ones with monitors on them go all freaky
[20:41] <alphe> I have an 11th machine with a real hard drive for the filesystem, I think I should use it as the main monitor and remove the other monitors
[20:41] <ganders> that sounds good
[20:41] <alphe> or reduce the verbosity of the monitors ...
[20:42] <alphe> the other advantage of having nodes with a pendrive filesystem is that it is easy to maintain them
[20:42] <alphe> my goal is to sell ceph boxes to clients so they can have 50 to 100 nodes ...
[20:43] <ganders> that's really interesting
[20:43] <alphe> you dd your OS pendrive onto a spare pendrive often, and if the main pendrive is dead you unplug it and replace it
[20:44] <alphe> the OS will still boot and work from the first pendrive it finds that is able to boot
[20:45] <alphe> ceph brings a ton of good, and here in my company we try to make the weak points like install, administration, and maintenance as flawless as possible in order to make our clients happy to purchase ceph clusters
[20:45] <ganders> Nice to see it work, it's a really cool project
[20:46] <alphe> while that ceph box is under development, every day I run into problems that mean finding a workaround or changing the initial design
[20:46] <ganders> we are trying ceph in our company
[20:46] <ganders> and we use an HP enclosure that we had, with 5 blades, and each blade has a D2200sb storage tray with 12 disks
[20:46] <alphe> the more problems we face now, the more efficient our reaction to our users' problems will be
[20:47] <ganders> exactly, we are also trying to run all kinds of tests
[20:48] <ganders> shut down nodes, kill processes, see how it works under the hood, measure performance and availability, and also see how "easy" it is to get the cluster back to a healthy status
[20:48] <alphe> oh, the other thing is that our ceph box has to be as low cost as possible ...
[20:48] <alphe> and hard drives for the OS instead of pendrives mean at least 6 times the money spent on that particular area
[20:49] <alphe> and once again the other nodes that only host OSDs have had no problems so far
[20:49] <alphe> they work as perfectly as you could expect
[20:51] <alphe> and the motherboard we use is not fit for more than 2 sata drives, which are used for the osds, so the OS has to go another way like usb, and a real hard drive on a usb 2.0 port feels like a wasted possibility
[20:51] <alphe> eventually I will have to come to that if I don't solve my filesystem stability problems
[20:52] * xmltok (~xmltok@cpe-76-90-130-65.socal.res.rr.com) has joined #ceph
[20:52] * angdraug (~angdraug@12.164.168.115) has joined #ceph
[20:52] <ganders> what os are you using?
[20:56] * JonTheNiceGuy (~JonTheNic@cpc15-stkp8-2-0-cust64.10-2.cable.virginm.net) has joined #ceph
[20:57] * Muhlemmer (~kvirc@cable-90-50.zeelandnet.nl) Quit (Quit: KVIrc 4.3.1 Aria http://www.kvirc.net/)
[20:57] <alphe> ubuntu 13.10
[20:57] <alphe> I tested many others, I found using ubuntu more interesting
[20:58] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[20:58] * Muhlemmer (~kvirc@cable-90-50.zeelandnet.nl) has joined #ceph
[20:59] <JonTheNiceGuy> Hi, is it possible to run Ceph on a single host? I've got a bunch of disks and I want to make sure I don't lose the data across them.
[20:59] <JonTheNiceGuy> Or, (and this might not be the best place to ask), is there a better way to do it? :)
[21:00] <JonTheNiceGuy> The disks are all USB based
[21:03] * markbby (~Adium@168.94.245.2) has joined #ceph
[21:11] <runfromnowhere> JonTheNiceGuy: You might be better off with some manner of software RAID if you're talking local to a single host and don't need any of the more advanced features Ceph provides
[21:13] <ganders> i'm using ubuntu 12.10
[21:13] <ganders> and it runs fine
[21:14] <JonTheNiceGuy> Cool, thanks for the feedback runfromnowhere
[21:26] * ganders (~ganders@200.0.230.234) Quit (Quit: WeeChat 0.4.0)
[21:29] * Tamil (~tamil@38.122.20.226) Quit (Read error: Connection reset by peer)
[21:37] <aarontc> JonTheNiceGuy: Ceph has an advantage over software RAID for your setup because if a drive is lost but not damaged, recovery will go much faster with Ceph... in my experience, the biggest problem with USB disks is they randomly stop talking to the host and need to be power-cycled or unplugged/replugged to talk again
[21:38] * Tamil (~tamil@38.122.20.226) has joined #ceph
[21:38] <aarontc> JonTheNiceGuy: that being said, Ceph will require a lot more RAM and CPU time than dmraid would
[21:41] <runfromnowhere> aarontc: Not being sarcastic or anything - I'd definitely believe that Ceph can recover faster than RAID5 but what about RAID0 or RAID10?
[21:42] <aarontc> runfromnowhere: RAID0 can't be recovered if any disk fails, and if any disk becomes unresponsive the whole array will stop responding, so I guess that recovery is pretty instant but not fault tolerant
[21:42] <janos> i think he means raid1
[21:42] <janos> i hope ;)
[21:42] <aarontc> runfromnowhere: and RAID1 or 10 or 01 will require (at best) the time it takes to copy the entire size of one disk back
[21:42] <runfromnowhere> Crud I always screw my numbers up
[21:42] <runfromnowhere> I did mean RAID1
[21:42] <janos> raid0 isn't raid!
[21:44] <aarontc> Linux's software RAID does have support for write-intent bitmaps, but those are (AFAIK) only used for erasure-coding levels, like 4,5,6, so to recover on RAID1 you have to rewrite the entire contents of the disk
[21:44] <runfromnowhere> Wow, that's harsh
[21:44] <runfromnowhere> I recall doing some software RAID1 setups with GEOM on BSD that were quite reliable, way back when
[21:45] <aarontc> RAID1 is certainly quite useful and reliable, but Ceph has the advantage that only object changes since a disk went out to lunch need to be propagated
[21:45] <aarontc> and since USB drives have (in my experience) a high tendency to go out to lunch, recovery will have to happen often
[21:45] <runfromnowhere> Interesting
[21:45] <runfromnowhere> Presuming the disk can be recovered as-is
[21:46] <janos> ceph doesn't sound like the proper path for that situation
[21:46] <aarontc> I could just have bad luck with USB<->SATA controllers
[21:46] <JonTheNiceGuy> And I guess that by using ceph it means that if I want to add a box later with more disks, I can :)
[21:46] <runfromnowhere> Ceph DEFINITELY lets you easily expand to multiple nodes, which RAID can't in a reasonable way
[21:46] <janos> JonTheNiceGuy: you would need to change the default failure domain from host to disk for your initial use-case
[21:46] <janos> with one host
[21:46] <aarontc> runfromnowhere: wellllllllll you could use NBD to do a multi-node RAID, but your failure domain only grows with each expansion ;)
[21:47] <janos> then if you added another host, you'd change the failure domain back up to host, assuming that's what you're aiming at
[21:47] <runfromnowhere> Hahah yeah I wouldn't advocate something like that
[21:47] <runfromnowhere> The number of potential failure modes just goes up like crazy
[21:48] <aarontc> (I've tried a LOT of things to get more storage than a single host could contain over the years, I'm really happy Ceph has come along)
[21:48] <aarontc> At one point I had a bunch of nodes running RAID5 exporting NBD devices which were then RAID0'd on another host
[21:52] <JonTheNiceGuy> So if I follow the quickstart, will that get me a working localhost ceph service, or is there a better beginners-guide-to-ceph I could follow?
[21:53] <aarontc> JonTheNiceGuy: I don't know if ceph-deploy can handle a single-node deployment, but you can definitely do it with a manual deploy. You just have to edit the default CRUSH map as janos mentioned
[21:53] <aarontc> JonTheNiceGuy: but you might want to tell us more about your setup like number of disks you plan to use, and your RAM and CPU core count for a better discussion of real-world feasibility ;)
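A minimal sketch of the failure-domain change janos mentions, done via ceph.conf before the cluster is created rather than by hand-editing the CRUSH map (0 selects OSD rather than host as the unit replicas are spread across):

    [global]
        # Replicate across OSDs instead of across hosts so a single-node
        # cluster can reach active+clean with the default replica count.
        osd crush chooseleaf type = 0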
[21:55] <janos> i've yet to touch ceph-deploy
[21:55] <janos> ;)
[21:55] <Sysadmin88> aarontc, so you had redundancy on your raid 5 and then killed it with raid 0...
[21:55] <aarontc> janos: yeah, me too. I tried it once on some VMs and couldn't make it work so went back to manual deployment
[21:55] * Macheske (~Bram@d5152D87C.static.telenet.be) has joined #ceph
[21:56] <aarontc> Sysadmin88: you're talking about my NBD setup? It didn't matter because that single host was the biggest point of failure anyway
[21:56] <janos> last i looked on an f18 or f19 box i don't think i had it installed and didn't see where to. and old manual style is working fine for me
[21:56] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[21:57] <JonTheNiceGuy> Thanks for your help runfromnowhere, aarontc and janos
[21:57] <aarontc> no problem, JonTheNiceGuy
[21:58] <janos> any time
[21:58] * Machske (~Bram@d5152D87C.static.telenet.be) Quit (Ping timeout: 480 seconds)
[21:58] <runfromnowhere> NP!
[22:01] * t0rn1 (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[22:03] <JonTheNiceGuy> aarontc; it's for my home media - I've got three 1tb hdds on an "Intel(R) Atom(TM) CPU 330 @ 1.60GHz" (according to /proc/cpuinfo) with 4Gb RAM
[22:03] <JonTheNiceGuy> Plus, a 250Gb internal drive
[22:04] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Read error: Operation timed out)
[22:04] * sarob (~sarob@40.sub-70-211-64.myvzw.com) Quit (Remote host closed the connection)
[22:04] <aarontc> JonTheNiceGuy: ah, so I wouldn't expect that setup to be very performant, and with 2 replicas (the default) you'll have a usable capacity of around 1400GiB
[22:04] * Muhlemmer (~kvirc@cable-90-50.zeelandnet.nl) Quit (Read error: Operation timed out)
[22:04] * sarob (~sarob@40.sub-70-211-64.myvzw.com) has joined #ceph
[22:04] <JonTheNiceGuy> Damn.
[22:05] <Sysadmin88> acceptable performance is different for every situation
[22:05] <JonTheNiceGuy> Adding a single 1tb drive adds about 400GiB then?
[22:05] <aarontc> Ceph needs some free space on each OSD to facilitate internal operations and recovery, normally you'll get warnings if you go below 30% free space
[22:05] <JonTheNiceGuy> OK, that's useful to know
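A rough back-of-the-envelope for aarontc's figure, leaving the 250 GB internal disk and the free-space headroom aside: 3 × 1 TB = 3 TB raw, and with the default two replicas that gives 3 TB / 2 = 1.5 TB ≈ 1400 GiB usable.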
[22:06] <alphe> [osd03][WARNIN] Error EINVAL: entity osd.0 exists but key does not match
[22:06] <JonTheNiceGuy> I just need to not panic when I lose a drive again (like I nearly did today!)
[22:06] <kraken> http://i.imgur.com/H7PXV.gif
[22:06] <alphe> what does that mean?
[22:06] <alphe> the key in osd-bootstrap is wrong?
[22:06] <JonTheNiceGuy> Anyway, thanks for that stat - it's useful to know, but I'll probably look more closely at this early next week I think now!
[22:06] <aarontc> JonTheNiceGuy: you also need a journal for each OSD (I guess 1GiB is about the minimum...)
[22:07] <JonTheNiceGuy> OK, cool!
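For reference, a small sketch of where the per-OSD journal size aarontc mentions is configured, assuming journals co-located on the OSD disks (the value is in MB):

    [osd]
        # 1 GiB journal per OSD; larger journals absorb longer write bursts.
        osd journal size = 1024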
[22:07] * JonTheNiceGuy (~JonTheNic@cpc15-stkp8-2-0-cust64.10-2.cable.virginm.net) Quit (Quit: Leaving)
[22:07] <aarontc> alphe: so your osd.3 is trying to be osd.0?
[22:08] <alphe> the machine is named osd03
[22:08] <alphe> it would be better named node03
[22:09] <alphe> i do a ceph-deploy osd activate osd03:sda1 and it doesn't work
[22:09] <aarontc> alphe: okay, so you're trying to start osd.0, with its data store mounted at /var/lib/ceph/osd/ceph-0?
[22:09] <alphe> aarontc I just created the node ...
[22:10] <aarontc> oh, I have no idea what black magic ceph-deploy does, sorry. alfredodeza is an expert on that though :)
[22:10] <alphe> /var/lib/ceph/osd is empty and disks for osd are not mounted ...
[22:11] <alphe> ok I did it manually and it worked
[22:11] <alphe> mounted the osd disk after creating the dirs ...
[22:12] <alphe> /var/lib/ceph/osd/ceph-0 and /var/lib/ceph/osd/ceph-1
[22:12] <alphe> but as it was all done manually I bet a big bunch of money that this node will never auto-start
[22:12] * sarob (~sarob@40.sub-70-211-64.myvzw.com) Quit (Ping timeout: 480 seconds)
[22:14] <aarontc> alphe: you can always manually add the correct entries to the /etc/ceph/ceph.conf file shared by all your nodes and that'll fix any auto-start issues :)
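A hypothetical fragment of the ceph.conf entries aarontc is suggesting, in the sysvinit style of that era; the section name, hostname, and path below are assumptions for illustration, and the host value must match the node's short hostname for the init script to start the daemon:

    [osd.0]
        host = node03
        osd data = /var/lib/ceph/osd/ceph-0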
[22:16] <alphe> aarontc but the osds don't even start ...
[22:17] <aarontc> alphe: how did you try to start them (after the ceph-deploy phase)
[22:17] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[22:17] <alphe> ceph-deploy did work
[22:18] <alphe> so I mounted the disk manually
[22:18] <alphe> and started the ceph-osd --cluster=ceph -i 0 -f &
[22:19] <alphe> but in fact those 2 ids were removed from the map, so somehow I have to reinsert them using ceph osd create or something similar
[22:19] <aarontc> it sounds like your ceph-deploy didn't actually work, otherwise I'm sure those nodes would have been in the CRUSH map
[22:20] <alphe> osd.0 down out weight 0 up_from 4 up_thru 26 down_at 29 last_clean_interval [0,0) 20.10.10.102:6800/8180 20.10.10.102:6801/8180 20.10.10.102:6802/8180 20.10.10.102:6803/8180 exists,new
[22:20] <alphe> it should be on osd03 with ip 20.10.10.103
[22:20] <alphe> heheh
[22:24] <alphe> aarontc what is the crush map ?
[22:24] <aarontc> alphe: I recommend reading http://ceph.com/docs/master/rados/operations/crush-map/
[22:26] <alphe> crushmap is empty
[22:27] * sarob (~sarob@40.sub-70-211-64.myvzw.com) has joined #ceph
[22:29] <alphe> ** ERROR: osd init failed: (1)
[22:29] <alphe> Operation not permitted
[22:29] <alphe> on cls/hello
[22:29] <alphe> that is all the ceph-osd log tells me
[22:35] * allsystemsarego (~allsystem@86.121.85.58) Quit (Quit: Leaving)
[22:40] * ircolle1 (~Adium@c-67-172-132-222.hsd1.co.comcast.net) has joined #ceph
[22:47] * ircolle (~Adium@2601:1:8380:2d9:3d28:3acb:bb72:7346) Quit (Ping timeout: 480 seconds)
[22:49] <alphe> aarontc reading this page didn't make the crush map any clearer
[22:50] <aarontc> alphe: that page should tell you how to add buckets and devices/nodes to your crush map...
[22:51] <alphe> aarontc but this isn't something ceph-deploy does by default
[22:51] <alphe> the benefit of having a crush map is not explained
[22:51] * rendar (~s@host72-183-dynamic.11-87-r.retail.telecomitalia.it) Quit ()
[22:51] <aarontc> alphe: I believe it is, ceph-deploy is supposed to leave you with a working cluster as far as I understand
[22:52] <alphe> aarontc a working cluster with 192 pgs and 3 pools ...
[22:52] <janos> i could be off base here, but it sounds like your problems possibly derive from ceph-deploy much moreso than ceph itself
[22:52] <alphe> janos hmm I don't know ...
[22:52] <aarontc> I agree with janos
[22:53] * Discovery (~Discovery@192.162.100.197) has joined #ceph
[22:53] <aarontc> I would recommend doing a manual deployment just to get a thorough understanding of the internals, honestly :)
[22:53] <janos> yep
[22:53] <alphe> now the crushmap is full, how do i decode it to get it into a readable format?
[22:53] <alphe> with osdmaptool ?
[22:54] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[22:54] <janos> alphe: http://ceph.com/docs/master/rados/operations/crush-map/#editing-a-crush-map
[22:55] <alphe> couldn't it be easier to read?
[22:55] <alphe> i have to decompile and recompile it ...
[22:55] <janos> it's a very simple process
[22:56] <janos> day to day it's better that it's easy for the system to read, not for humans
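For reference, a minimal sketch of the decompile/recompile round trip being discussed (file names are arbitrary):

    # Fetch the compiled CRUSH map and decompile it to editable text.
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # Edit crushmap.txt, then recompile and inject it back into the cluster.
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new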
[22:56] <alphe> janos anyway I have zapped the disks on my node3 and it still doesn't want to run them
[22:56] <janos> i have always only done manual operations
[22:58] <alphe> ceph-deploy always worked so far
[22:58] <janos> that's cool. i'm just no help with it since i have never used it
[22:59] * AfC (~andrew@2407:7800:400:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[23:00] <alphe> i will have to fully reinstall it ...
[23:00] <alphe> damn, i do it every week and that tires me a lot
[23:00] <janos> when i did my first cluster pre-bobtail i did that a lot
[23:00] <janos> i wrote scripts to automate parts
[23:00] <janos> because it was tiring
[23:00] <janos> ;)
[23:01] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[23:01] <alphe> what is tiring me is that as soon as i start to heavily write data to the ceph cluster it goes into dead mode and there is no way to repair it
[23:01] * tryggvil (~tryggvil@83.151.131.116) Quit (Quit: tryggvil)
[23:06] <alphe> ok it crashes like hell
[23:06] <janos> i'm curious what's different with your setup. but not sure where to start
[23:06] <alphe> janos same disk ...
[23:07] <alphe> same everything, i just use ceph 0.72.2-8
[23:07] <janos> sorry, i meant different from mine or other people here
[23:07] <janos> but it's just one disk that consistently causes problems?
[23:08] <alphe> two disks on the third node
[23:08] <alphe> they are seen
[23:08] <alphe> I zapped them and removed the partitions
[23:08] <alphe> they can be mounted, i see data under their mount point
[23:08] <janos> my fix for disks used to involve .357's. now i just take the magnets from them. somehow i don't think that's the answer you're looking for though
[23:09] <alphe> but I can't start the osds related to them without having a hard crash
[23:09] <alphe> should I resync the data?
[23:09] <janos> not sure
[23:09] <alphe> they are brand new osds, just created with ceph-deploy osd prepare --zap-disk
[23:15] <alphe> ok now they are working ...
[23:15] <alphe> this is just super crazy
[23:18] <alphe> starting osd.27 at :/0 - is that port number normal?
[23:22] <alphe> cls_hello and then crash
[23:23] <alphe> osd.27 101 handle_osd_map epochs [102,202], i have 101, src has [1,494]
[23:23] <alphe> there is a problem with the epochs, how do I reset that?
[23:23] * xmltok (~xmltok@cpe-76-90-130-65.socal.res.rr.com) Quit (Quit: Bye!)
[23:25] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Remote host closed the connection)
[23:32] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) has joined #ceph
[23:34] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Read error: Operation timed out)
[23:36] <Pedras> greetings folks
[23:36] <alphe> wow janos ... i solved the problem it seems
[23:37] <Pedras> what is the "preferred" platform to run ceph? ie. intel/amd ? if there is one...
[23:37] <alphe> all I had to do was unmount the disks related to the osds and do ceph-deploy osd activate node03:sda1 until it worked
[23:38] <janos> alphe: sorry. i stepped away
[23:38] <janos> about the end of my day here
[23:38] <janos> hrm
[23:38] <janos> that's funky
[23:38] <janos> Pedras: no inherent preference
[23:38] <alphe> hum and it crashed again ...
[23:39] <alphe> one of the osds is working, the other is crashing at init
[23:39] <alphe> it can do the cls_hello
[23:39] <alphe> but can't get the right epoch
[23:39] <Pedras> janos: just wondering if there was some (slight) difference
[23:40] <janos> Pedras: none that i've heard
[23:40] <Pedras> janos: okidoky. Tkx!
[23:44] * vata (~vata@2607:fad8:4:6:d074:8110:9aba:32f6) Quit (Quit: Leaving.)
[23:44] * diegows (~diegows@200.68.116.185) Quit (Ping timeout: 480 seconds)
[23:45] <alphe> what is happening with the gitbuilder repo? I have 0.72.2-11 now and 11 commits in a day, it's a bit much
[23:45] <alphe> I will go back to the raring version
[23:45] <alphe> most of the problems I get are from the unstable gitbuilder version
[23:47] <alphe> i forced a reboot and now it is working ...
[23:48] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[23:48] * ChanServ sets mode +o elder
[23:49] <alphe> does anyone know what is happening with the saucy emperor version on gitbuilder? why do i get 12 committed versions in a day?
[23:50] * ScOut3R (~ScOut3R@c83-253-234-122.bredband.comhem.se) Quit (Remote host closed the connection)
[23:53] <alphe> janos my problems probably originated in a version of 0.72.2 ...
[23:53] <alphe> now i upgraded ceph to the latest version and it is working fine ...
[23:53] * sileht (~sileht@gizmo.sileht.net) Quit (Quit: WeeChat 0.4.2)
[23:57] <janos> alphe: excellent. i hope that was it. would possibly explain the difficulty in pinning it down
[23:57] <alphe> janos it crashed again
[23:57] <janos> awwwww
[23:58] <alphe> I'm going to fully reboot the ceph cluster and see if that changes something
[23:58] <janos> well it's family time for this man! good luck sir
[23:58] <janos> i shall be around much later
[23:58] <alphe> ok have fun !
[23:59] * scuttlemonkey_ (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.