#ceph IRC Log

IRC Log for 2013-10-21

Timestamps are in GMT/BST.

[0:05] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[0:05] * AfC (~andrew@ppp244-218.static.internode.on.net) has joined #ceph
[0:07] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[0:08] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit ()
[0:08] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[0:11] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit ()
[0:11] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[0:25] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[0:39] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[0:56] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[0:57] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[1:04] * xarses1 (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[1:04] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Read error: Connection reset by peer)
[1:06] * xarses1 (~andreww@64-79-127-122.static.wiline.com) Quit ()
[1:07] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[1:08] * danieagle (~Daniel@186.214.61.130) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[1:09] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Read error: Connection reset by peer)
[1:09] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[1:10] * Knorrie (knorrie@yoshi.kantoor.mendix.nl) has joined #ceph
[1:13] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit ()
[1:13] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[1:15] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Remote host closed the connection)
[1:17] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[1:20] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:37] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[1:38] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Ping timeout: 480 seconds)
[1:56] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[1:56] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit ()
[1:58] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:09] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[2:11] * pedro (~pedro@186.92.209.84) has joined #ceph
[2:13] * pedro (~pedro@186.92.209.84) Quit ()
[2:19] * jhujhiti (~jhujhiti@00012a8b.user.oftc.net) has joined #ceph
[2:20] <jhujhiti> when an OSD goes down, does ceph automatically rebalance its data onto other OSDs?
[2:22] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[2:37] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:45] * The_Bishop (~bishop@2001:470:50b6:0:d052:4fac:ffc8:a87d) Quit (Ping timeout: 480 seconds)
[2:54] * The_Bishop (~bishop@2001:470:50b6:0:d052:4fac:ffc8:a87d) has joined #ceph
[3:01] * yy-nm (~Thunderbi@122.224.154.38) has joined #ceph
[3:09] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Read error: Operation timed out)
[3:15] * erice_ (~erice@c-98-245-48-79.hsd1.co.comcast.net) has joined #ceph
[3:15] * thomnico (~thomnico@38.126.120.10) has joined #ceph
[3:17] <lurbs> jhujhiti: I believe that after a configurable amount of time ('mon osd down out interval', defaults to 300 seconds) a 'down' OSD will be marked 'out', and data will be rebalanced.
[3:20] <jhujhiti> lurbs: how do i show that value?
[3:27] * peetaur (~peter@CPE788df73fb301-CM788df73fb300.cpe.net.cable.rogers.com) Quit (Quit: Konversation terminated!)
[3:27] <lurbs> jhujhiti: http://ceph.com/docs/master/rados/troubleshooting/log-and-debug/#runtime
[3:27] <jhujhiti> thanks
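
A minimal sketch of one way to read that value at runtime, along the lines of the runtime section linked above. It assumes the monitor's admin socket answers `config show` with a JSON object keyed by underscore-separated option names. The socket path, the monitor id "a", and the sample JSON are illustrative assumptions, not output from jhujhiti's cluster.

```python
import json

# Assumed admin socket location for a monitor with the hypothetical id "a".
ADMIN_SOCKET = "/var/run/ceph/ceph-mon.a.asok"


def config_show_command(socket_path=ADMIN_SOCKET):
    """Build the argv for dumping a daemon's running configuration."""
    return ["ceph", "--admin-daemon", socket_path, "config", "show"]


def down_out_interval(config_json):
    """Extract 'mon osd down out interval' (seconds) from `config show` JSON text."""
    config = json.loads(config_json)
    return int(config["mon_osd_down_out_interval"])


# Illustrative sample only; a real dump contains many more keys.
sample = '{"mon_osd_down_out_interval": "300"}'
print(" ".join(config_show_command()))
print(down_out_interval(sample))
```

The command has to be run on the host where that monitor's socket lives; the parsing step only needs the JSON text it prints.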
[3:30] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[3:32] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[3:48] * tsnider (~tsnider@ip68-102-128-87.ks.ok.cox.net) has joined #ceph
[4:01] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[4:01] * tsnider1 (~tsnider@198.95.226.40) has joined #ceph
[4:03] * tsnider1 (~tsnider@198.95.226.40) Quit (Read error: Connection reset by peer)
[4:07] * tsnider (~tsnider@ip68-102-128-87.ks.ok.cox.net) Quit (Ping timeout: 480 seconds)
[4:12] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[4:13] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) has joined #ceph
[4:23] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[4:25] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:26] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[4:28] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) Quit (Quit: Leaving)
[4:33] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[4:37] * themgt (~themgt@190.54.84.24) Quit (Ping timeout: 480 seconds)
[4:42] * themgt (~themgt@201-223-210-60.baf.movistar.cl) has joined #ceph
[5:03] * rongze (~rongze@117.79.232.201) has joined #ceph
[5:04] * KindTwo (KindOne@h79.33.186.173.dynamic.ip.windstream.net) has joined #ceph
[5:04] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[5:04] * KindTwo is now known as KindOne
[5:06] * fireD (~fireD@93-139-164-132.adsl.net.t-com.hr) has joined #ceph
[5:08] * fireD_ (~fireD@93-139-177-179.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:09] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:11] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:17] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[5:18] * The_Bishop (~bishop@2001:470:50b6:0:d052:4fac:ffc8:a87d) Quit (Ping timeout: 480 seconds)
[5:18] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[5:19] * bandrus (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) has joined #ceph
[5:20] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[5:21] * AfC (~andrew@ppp244-218.static.internode.on.net) Quit (Quit: Leaving.)
[5:27] * The_Bishop (~bishop@2001:470:50b6:0:30e4:1a0a:a3be:e998) has joined #ceph
[5:38] * h_bar (~John@c122-107-157-26.eburwd5.vic.optusnet.com.au) has joined #ceph
[5:39] * h_bar (~John@c122-107-157-26.eburwd5.vic.optusnet.com.au) has left #ceph
[5:41] * thomnico (~thomnico@38.126.120.10) Quit (Quit: Ex-Chat)
[5:51] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Bye!)
[6:02] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[6:07] * yy-nm (~Thunderbi@122.224.154.38) Quit (Quit: yy-nm)
[6:23] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[6:25] * glanzi (~glanzi@201.75.202.207) Quit (Quit: glanzi)
[6:26] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[6:26] * glanzi (~glanzi@201.75.202.207) Quit ()
[6:30] * haomaiwang (~haomaiwan@211.155.113.208) Quit (Remote host closed the connection)
[6:30] * haomaiwang (~haomaiwan@117.79.232.201) has joined #ceph
[6:38] * haomaiwang (~haomaiwan@117.79.232.201) Quit (Ping timeout: 480 seconds)
[6:53] * raipin (raipin@a.clients.kiwiirc.com) Quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
[6:55] * raipin (raipin@a.clients.kiwiirc.com) has joined #ceph
[6:57] <aarontc> hey all, I'm having some MDS troubles on 0.67.4, and I'm not sure how to troubleshoot - looks like an authentication problem but I haven't changed anything and it seemed to work fine as of a few days ago. Here is the mds log: http://hastebin.com/wopajucewo.vhdl
[7:01] * wazzaffen (~xbmc@CPE-121-208-48-218.gdqn1.woo.bigpond.net.au) has joined #ceph
[7:03] * wazzaffen (~xbmc@CPE-121-208-48-218.gdqn1.woo.bigpond.net.au) has left #ceph
[7:03] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[7:03] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) has joined #ceph
[7:04] <yanzheng> aarontc, is the mds/cluster newly created?
[7:04] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) has joined #ceph
[7:05] <aarontc> yanzheng: No, it's been working well for weeks
[7:05] <aarontc> yanzheng: but today I had trouble mounting CephFS so started investigating this
[7:08] <yanzheng> aarontc, the log says 'mds.0.cache creating system inode with ino:1', it only happens when creating newfs
[7:08] <yanzheng> what does ceph -w say?
[7:13] <aarontc> yanzheng: This is definitely not a new FS. I stopped the mds that was having problems to see if it was isolated, but it's not. Here is the log from the mds that tried to become active: http://hastebin.com/xemeqesono.vhdl
[7:14] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[7:17] <aarontc> yanzheng: I can only assume the 'creating system inode' message came about because the mds is having problems talking to the mon and/or osd, so it doesn't know there is an fs already? I'm not sure from the log which one is the problem
[7:18] * themgt (~themgt@201-223-210-60.baf.movistar.cl) Quit (Quit: themgt)
[7:19] <yanzheng> what does ceph -w say?
[7:19] <yanzheng> and your ceph.conf
[7:19] <aarontc> yanzheng: most recent line from ceph -w: 2013-10-20 22:19:35.301215 mon.1 [INF] pgmap v1301562: 4600 pgs: 3304 active+clean, 1295 active+degraded, 1 active+clean+scrubbing+deep; 10977 GB data, 22295 GB used, 14936 GB / 37231 GB avail; 172KB/s wr, 18op/s; 759631/6114641 degraded (12.423%)
[7:20] <aarontc> yanzheng: ceph.conf: http://hastebin.com/gigubafiyi
[7:21] <aarontc> (not all osd are listed, quit doing that a ways in)
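
As a quick sanity check of the status line aarontc pasted, the sketch below parses it and recomputes the totals. It only uses the numbers quoted above; the function names are hypothetical helpers, not part of any Ceph tool.

```python
import re

PGMAP_LINE = (
    "pgmap v1301562: 4600 pgs: 3304 active+clean, 1295 active+degraded, "
    "1 active+clean+scrubbing+deep; 10977 GB data, 22295 GB used, "
    "14936 GB / 37231 GB avail; 172KB/s wr, 18op/s; "
    "759631/6114641 degraded (12.423%)"
)


def pg_states(line):
    """Return (total_pgs, {state: count}) from a pgmap status line."""
    total = int(re.search(r"(\d+) pgs:", line).group(1))
    # The per-state counts sit between "pgs:" and the first semicolon.
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {}
    for item in states_part.split(","):
        count, state = item.split()
        states[state] = int(count)
    return total, states


def degraded_percent(line):
    """Recompute the percentage from the two counts printed before 'degraded'."""
    match = re.search(r"(\d+)/(\d+) degraded", line)
    degraded, total = int(match.group(1)), int(match.group(2))
    return 100.0 * degraded / total


total, states = pg_states(PGMAP_LINE)
print(total, sum(states.values()), states)
print(round(degraded_percent(PGMAP_LINE), 3))
```

The per-state counts add up to the 4600 PGs reported, and the recomputed ratio matches the 12.423% in the line, so roughly an eighth of the counted copies were still degraded while the MDS problem was being investigated.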
[7:28] <yanzheng> try restarting the monitor
[7:28] <aarontc> all 5?
[7:31] <yanzheng> just mon.chekov
[7:32] <aarontc> okay, and then tail mds log again?
[7:33] <aarontc> awesome, 2013-10-20 22:28:22.825284 7f61744ee700 1 mds.0.19 recovery_done -- successful recovery!
[7:35] <yanzheng> looks like a monitor issue, no idea what happened
[7:36] <aarontc> is it worth gathering logs for more information? I don't know if they are useful without debug = 20
[7:37] <yanzheng> if you have mon log, please open a ticket in the bug tracker
[7:38] <yanzheng> nothing interesting in the mds log
[7:38] <aarontc> I can gather logs from any of the hosts, do you only want chekov?
[7:39] <yanzheng> I think only chekov is useful
[7:43] <aarontc> how far back? The log is many MB
[7:53] <yanzheng> no idea, grep for lines that contain 'mds'
[7:54] <aarontc> 0 lines contain mds in mon log
[7:56] <aarontc> I was guessing logs might be useless without "debug = 20" :(
[7:57] <yanzheng> yes
[7:58] <aarontc> is it safe (or sane?) to run with debug = 20 all the time?
[7:59] <yanzheng> definitely not
[7:59] <yanzheng> you will run out of space soon
[7:59] <aarontc> okay. well thank you for the help troubleshooting! I guess there is no point in opening a ticket
[8:07] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Quit: This computer has gone to sleep)
[8:13] * smashmo (~smashmo@host109-154-221-125.range109-154.btcentralplus.com) has joined #ceph
[8:16] * yanzheng (~zhyan@jfdmzpr05-ext.jf.intel.com) Quit (Remote host closed the connection)
[8:19] * odyssey4me (~odyssey4m@165.233.205.190) has joined #ceph
[8:20] * RuediR (~Adium@macrr.switch.ch) has joined #ceph
[8:21] * mattt_ (~textual@94.236.7.190) has joined #ceph
[8:21] * smashmo (~smashmo@host109-154-221-125.range109-154.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[8:22] * yanzheng (~zhyan@134.134.139.74) has joined #ceph
[8:23] * odyssey4me (~odyssey4m@165.233.205.190) Quit ()
[8:25] * foosinn (~stefan@office.unitedcolo.de) has joined #ceph
[8:30] * mattt_ (~textual@94.236.7.190) Quit (Remote host closed the connection)
[8:31] * mattt_ (~textual@92.52.76.140) has joined #ceph
[8:37] * AfC (~andrew@2407:7800:200:1011:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[8:44] * rongze (~rongze@117.79.232.201) Quit (Remote host closed the connection)
[8:45] * topro (~topro@host-62-245-142-50.customer.m-online.net) has joined #ceph
[8:47] * Vjarjadian (~IceChat77@94.1.37.151) Quit (Quit: Oops. My brain just hit a bad sector)
[9:01] * sleinen (~Adium@2001:620:0:46:29f4:d28a:26c9:751e) has joined #ceph
[9:02] * sleinen1 (~Adium@2001:620:0:46:d82c:91d2:60a7:a5c5) has joined #ceph
[9:09] * sleinen (~Adium@2001:620:0:46:29f4:d28a:26c9:751e) Quit (Ping timeout: 480 seconds)
[9:10] * sleinen (~Adium@2001:620:0:26:d461:fbf6:499d:609d) has joined #ceph
[9:11] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Quit: www.adiirc.com - Free mIRC alternative - Without the Stockholm syndrome!)
[9:12] * sleinen2 (~Adium@2001:620:0:46:e854:6500:339c:ce2b) has joined #ceph
[9:15] * rturk-aw` (~rturk@irvine-dc.dreamhost.com) has joined #ceph
[9:17] * sleinen1 (~Adium@2001:620:0:46:d82c:91d2:60a7:a5c5) Quit (Ping timeout: 480 seconds)
[9:18] * JustEra (~JustEra@89.234.148.11) has joined #ceph
[9:18] * cephalobot (~ceph@ds2390.dreamservers.com) Quit (Ping timeout: 480 seconds)
[9:18] * rturk-away (~rturk@ds2390.dreamservers.com) Quit (Ping timeout: 480 seconds)
[9:19] * sleinen (~Adium@2001:620:0:26:d461:fbf6:499d:609d) Quit (Ping timeout: 480 seconds)
[9:21] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[9:24] * rturk-aw` (~rturk@irvine-dc.dreamhost.com) Quit (Ping timeout: 480 seconds)
[9:24] * rturk-away (~rturk@ds2390.dreamservers.com) has joined #ceph
[9:33] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[9:33] * ChanServ sets mode +v andreask
[9:36] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[9:40] * hijacker (~hijacker@213.91.163.5) has joined #ceph
[9:46] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) has joined #ceph
[9:48] * rongze (~rongze@117.79.232.201) has joined #ceph
[9:49] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) Quit (Remote host closed the connection)
[9:51] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) has joined #ceph
[9:54] <ofu> when I reboot a complete ceph-cluster, all nodes swap to death after reboot, is this a common problem?
[9:54] <ofu> I have 24 OSDs and 32GB RAM per host, and all osd processes collect huge amounts of private dirty data according to /proc/$pid/smaps
[9:56] * rongze (~rongze@117.79.232.201) Quit (Ping timeout: 480 seconds)
[9:57] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) Quit (Remote host closed the connection)
[9:57] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) has joined #ceph
[10:03] * kbm (~oftc-webi@ai126184001217.15.access-internet.ne.jp) has joined #ceph
[10:03] <kbm> What is the recommended ratio of ceph monitor nodes to ceph OSDs?
[10:04] <ofu> there is no recommended ratio, should just be an odd number
[10:05] * nigwil_ (~chatzilla@2001:44b8:5144:7b00:39ff:fd0b:6dee:4268) has joined #ceph
[10:05] <kbm> ofu, so it's OK to have 100 ceph OSDs and 7 monitor nodes?
[10:07] <andreask> kbm 3 mons are good to start
[10:08] <kbm> andreask: thanks :)
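
The "odd number" advice follows from majority quorum: the monitors can only make progress while more than half of them are up. A small worked example of that arithmetic (the function names are hypothetical, chosen for illustration):

```python
def quorum_size(monitors):
    """Smallest number of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors):
    """How many monitors can be down while a majority is still available."""
    return monitors - quorum_size(monitors)


for n in (1, 2, 3, 4, 5, 7):
    print("mons=%d quorum=%d tolerated_failures=%d"
          % (n, quorum_size(n), tolerated_failures(n)))
```

Going from 3 to 4 monitors does not raise the number of failures that can be tolerated, which is why even counts buy nothing, and the count is independent of how many OSDs the cluster has.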
[10:09] * kbm (~oftc-webi@ai126184001217.15.access-internet.ne.jp) Quit (Remote host closed the connection)
[10:12] * nigwil (~chatzilla@2001:44b8:5144:7b00:dc89:763e:eb:8b67) Quit (Ping timeout: 480 seconds)
[10:13] * smashmo (~smashmo@kyle.see.ed.ac.uk) has joined #ceph
[10:26] * jbd_ (~jbd_@2001:41d0:52:a00::77) has joined #ceph
[10:32] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[10:34] * aliguori (~anthony@ip-77-221-165-98.dsl.twang.net) has joined #ceph
[10:39] * G_H_I_S (G_H_I_S@a.clients.kiwiirc.com) has joined #ceph
[10:41] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[10:42] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) has joined #ceph
[10:46] * tziOm (~bjornar@194.19.106.242) has joined #ceph
[10:50] * rongze (~rongze@123.151.28.64) has joined #ceph
[11:00] * ksingh (~Adium@2001:708:10:10:fd56:470:e47d:f1a) has joined #ceph
[11:01] * rongze (~rongze@123.151.28.64) Quit (Ping timeout: 480 seconds)
[11:03] * aliguori (~anthony@ip-77-221-165-98.dsl.twang.net) Quit (Read error: Operation timed out)
[11:10] * ksingh (~Adium@2001:708:10:10:fd56:470:e47d:f1a) Quit (Quit: Leaving.)
[11:11] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[11:11] * LeaChim (~LeaChim@host86-162-2-255.range86-162.btcentralplus.com) has joined #ceph
[11:11] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[11:12] * ksingh1 (~Adium@hermes1-231.csc.fi) has joined #ceph
[11:12] * ksingh (~Adium@teeri.csc.fi) Quit (Read error: Connection reset by peer)
[11:14] * ksingh1 (~Adium@hermes1-231.csc.fi) Quit (Read error: Connection reset by peer)
[11:14] * ksingh (~Adium@b-v6-0005.vpn.csc.fi) has joined #ceph
[11:16] * ksingh1 (~Adium@teeri.csc.fi) has joined #ceph
[11:17] * yanzheng (~zhyan@134.134.139.74) Quit (Quit: Leaving)
[11:22] * ksingh (~Adium@b-v6-0005.vpn.csc.fi) Quit (Ping timeout: 480 seconds)
[11:26] * claenjoy (~leggenda@37.157.33.36) has joined #ceph
[11:39] * sel_ (~sel@212.62.233.233) has joined #ceph
[11:44] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[11:45] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[11:45] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit ()
[11:46] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[11:47] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit ()
[11:51] * G_H_I_S (G_H_I_S@a.clients.kiwiirc.com) has left #ceph
[11:52] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) Quit (Ping timeout: 480 seconds)
[11:54] * hybrid512 (~walid@LPoitiers-156-86-25-85.w193-248.abo.wanadoo.fr) has joined #ceph
[12:00] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[12:04] * Ghislain (Ghislain@a.clients.kiwiirc.com) has joined #ceph
[12:04] * Kioob`Taff (~plug-oliv@local.plusdinfo.com) has joined #ceph
[12:06] * Ghislain (Ghislain@a.clients.kiwiirc.com) has left #ceph
[12:10] * ksingh1 (~Adium@teeri.csc.fi) Quit (Read error: Connection reset by peer)
[12:10] * ksingh (~Adium@hermes1-231.csc.fi) has joined #ceph
[12:11] * ksingh1 (~Adium@2001:708:10:91:c904:f79c:8efb:ab8a) has joined #ceph
[12:11] * ksingh (~Adium@hermes1-231.csc.fi) Quit (Read error: Connection reset by peer)
[12:13] * ksingh (~Adium@b-v6-0005.vpn.csc.fi) has joined #ceph
[12:19] * ksingh1 (~Adium@2001:708:10:91:c904:f79c:8efb:ab8a) Quit (Ping timeout: 480 seconds)
[12:19] * i_m (~ivan.miro@deibp9eh1--blueice3n2.emea.ibm.com) has joined #ceph
[12:20] * erice_ (~erice@c-98-245-48-79.hsd1.co.comcast.net) Quit (Ping timeout: 480 seconds)
[12:21] * themgt (~themgt@201-223-210-60.baf.movistar.cl) has joined #ceph
[12:21] * themgt (~themgt@201-223-210-60.baf.movistar.cl) Quit (Remote host closed the connection)
[12:21] * aliguori (~anthony@ip-77-221-165-98.dsl.twang.net) has joined #ceph
[12:22] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[12:27] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[12:33] * aliguori (~anthony@ip-77-221-165-98.dsl.twang.net) Quit (Quit: Ex-Chat)
[12:35] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[12:40] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit ()
[12:40] * jcsp (~jcsp@212.20.242.100) has joined #ceph
[12:40] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[12:42] * i_m (~ivan.miro@deibp9eh1--blueice3n2.emea.ibm.com) Quit (Quit: Leaving.)
[12:42] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) has joined #ceph
[12:44] * adam1 (~adam@46-65-111-12.zone16.bethere.co.uk) has joined #ceph
[12:47] * ksingh1 (~Adium@hermes1-231.csc.fi) has joined #ceph
[12:48] * erice (~erice@50.240.86.181) has joined #ceph
[12:49] * adam4 (~adam@46-65-111-12.zone16.bethere.co.uk) Quit (Ping timeout: 480 seconds)
[12:50] * ksingh2 (~Adium@2001:708:10:10:9995:d8fe:be8b:4a18) has joined #ceph
[12:51] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Ping timeout: 480 seconds)
[12:53] * ksingh (~Adium@b-v6-0005.vpn.csc.fi) Quit (Ping timeout: 480 seconds)
[12:55] * ksingh1 (~Adium@hermes1-231.csc.fi) Quit (Ping timeout: 480 seconds)
[12:59] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[13:11] * mschiff (~mschiff@port-49786.pppoe.wtnet.de) has joined #ceph
[13:13] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) has joined #ceph
[13:19] * peetaur (~peter@CPE788df73fb301-CM788df73fb300.cpe.net.cable.rogers.com) has joined #ceph
[13:20] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[13:21] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[13:21] * julian (~julianwa@125.70.135.165) has joined #ceph
[13:24] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[13:27] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[13:30] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) has joined #ceph
[13:44] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[13:50] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) has joined #ceph
[13:50] * ChanServ sets mode +v andreask
[13:50] * allsystemsarego (~allsystem@188.27.166.164) has joined #ceph
[13:52] * Grasshopper (~quassel@rrcs-74-218-204-10.central.biz.rr.com) Quit (Read error: No route to host)
[13:55] * peetaur (~peter@CPE788df73fb301-CM788df73fb300.cpe.net.cable.rogers.com) Quit (Quit: Konversation terminated!)
[13:55] * peetaur (~peter@CPE788df73fb301-CM788df73fb300.cpe.net.cable.rogers.com) has joined #ceph
[13:57] * jesus (~jesus@emp180-32.eduroam.uu.se) has joined #ceph
[14:02] * ksingh2 (~Adium@2001:708:10:10:9995:d8fe:be8b:4a18) Quit (Quit: Leaving.)
[14:03] * gsaxena (~gsaxena@pool-71-178-225-182.washdc.fios.verizon.net) has joined #ceph
[14:03] * smashmo (~smashmo@kyle.see.ed.ac.uk) Quit (Quit: Leaving)
[14:07] * sleinen2 (~Adium@2001:620:0:46:e854:6500:339c:ce2b) Quit (Ping timeout: 480 seconds)
[14:13] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[14:13] * ksingh (~Adium@teeri.csc.fi) has joined #ceph
[14:15] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[14:20] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[14:24] * yanzheng (~zhyan@134.134.137.71) has joined #ceph
[14:25] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[14:25] * themgt (~themgt@201-223-210-60.baf.movistar.cl) has joined #ceph
[14:29] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[14:30] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[14:39] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:47] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[14:47] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[14:57] * nwf_ (~nwf@67.62.51.95) Quit (Remote host closed the connection)
[14:57] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[14:58] * markbby (~Adium@168.94.245.2) has joined #ceph
[14:58] * nwf (~nwf@67.62.51.95) has joined #ceph
[15:02] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[15:05] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[15:06] <tsnider> Do devices used for journals (which are separate from the OSDs) need to be mounted?
[15:07] * sleinen (~Adium@130.59.94.238) has joined #ceph
[15:08] <andreask> no, not if you are using raw devices rather than file systems
[15:08] <peetaur> tsnider: I think a journal can be a mounted file system or a block device
[15:08] <tsnider> ok thx
[15:08] * sleinen1 (~Adium@2001:620:0:26:3983:3fcc:7dd9:1f1) has joined #ceph
[15:11] * papamoose (~kauffman@hester.cs.uchicago.edu) has joined #ceph
[15:12] * ajazdzewski (~quassel@lpz-66.sprd.net) has joined #ceph
[15:14] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has left #ceph
[15:15] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[15:15] * sleinen (~Adium@130.59.94.238) Quit (Ping timeout: 480 seconds)
[15:19] <ajazdzewski> hi, I'd like to increase my sequential IO. I have 18 OSDs (2TB MDL-SAS) and I get only 30mb/s read and write speed inside an rbd disk, so something is wrong ;-)
[15:21] <mikedawson> ajazdzewski: do you have rbd writeback cache enabled?
[15:22] <andreask> is this Mbit or Mbyte?
[15:23] <ajazdzewski> i enabled "rbd cache = true" in the client section
[15:23] <ajazdzewski> Mbyte
[15:24] <andreask> and how many replicas?
[15:24] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[15:24] <ajazdzewski> i have 3 Nodes with 2Gigabit Bond LACP Layer 3+4
[15:24] <ajazdzewski> 2 replicas
[15:25] <mikedawson> ajazdzewski: are you using qemu? if so, paste the config somewhere (or paste the output of ps -ef | grep qemu)
[15:26] <ajazdzewski> yes i use ganeti one moment i paste it
[15:27] <mikedawson> tsnider: a block device for a journal does not need to be formatted or mounted
[15:28] <ajazdzewski> http://pastebin.com/dvp5ijiR
[15:29] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[15:29] * tchmnkyz (~jeremy@0001638b.user.oftc.net) Quit (Quit: Lost terminal)
[15:29] <mikedawson> ajazdzewski: you have 'cache=none' instead of the required 'cache=writeback'
[15:31] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[15:31] <mikedawson> ajazdzewski: actually that looks a bit odd to me, I reference rbd directly, not sure what it should look like with ganeti in the mix
[15:35] * dmsimard (~Adium@ap05.wireless.co.mtl.iweb.com) has joined #ceph
[15:36] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit ()
[15:37] * claenjoy (~leggenda@37.157.33.36) Quit (Quit: Leaving.)
[15:37] * jcsp (~jcsp@212.20.242.100) has joined #ceph
[15:38] * dmsimard1 (~Adium@2607:f748:9:1666:45b9:6e15:1cd7:5785) has joined #ceph
[15:39] <ajazdzewski> ganeti uses a local dev, so I map the disk on the host and give the dev to the kvm process; it will be possible to use the direct way in the future
[15:39] * dmsimard (~Adium@ap05.wireless.co.mtl.iweb.com) Quit (Read error: Operation timed out)
[15:40] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) has joined #ceph
[15:41] <ajazdzewski> my problem is the read speed at the moment. I think it must be possible to read the data faster; a backup of a 4TB disk takes 2 days, that is awful
[15:42] * haomaiwang (~haomaiwan@119.4.172.135) has joined #ceph
[15:48] <peetaur> ajazdzewski: what are you backing up? did you try incremental backup?
[15:49] * sleinen1 (~Adium@2001:620:0:26:3983:3fcc:7dd9:1f1) Quit (Quit: Leaving.)
[15:50] <ajazdzewski> i have a nas inside ganeti with a 6TB rbd disk, and I did a first full backup; it took 2 days
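
To put the backup time next to the 30 MB/s figure, here is the average throughput implied by "2 days" for both sizes mentioned (4TB of data, 6TB disk). This is a rough worked calculation that assumes binary units and a full 48 hours; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1024 * 1024  # assumption: binary units (1 TB = 1024 * 1024 MB)


def average_mb_per_sec(size_tb, days):
    """Average throughput needed to move size_tb in the given number of days."""
    return size_tb * MB_PER_TB / (days * SECONDS_PER_DAY)


for size_tb in (4, 6):
    print("%d TB in 2 days: %.2f MB/s" % (size_tb, average_mb_per_sec(size_tb, 2)))
```

Either way the average lands in the same few tens of MB/s that ajazdzewski measured inside the rbd disk, so the backup duration is consistent with the reported sequential speed rather than a separate problem.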
[15:50] * tobru (~quassel@2a02:41a:3999::94) Quit (Remote host closed the connection)
[15:51] * fireD_ (~fireD@78-0-203-148.adsl.net.t-com.hr) has joined #ceph
[15:51] * fireD (~fireD@93-139-164-132.adsl.net.t-com.hr) Quit (Remote host closed the connection)
[15:52] * tobru (~quassel@2a02:41a:3999::94) has joined #ceph
[15:52] * PerlStalker (~PerlStalk@2620:d3:8000:192::70) has joined #ceph
[15:52] <ajazdzewski> i run osd bench *2013-10-21 15:51:28.177001 osd.5 10.100.215.53:6826/8624 597 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 85.017448 sec at 12333 KB/sec* i think the result is OK for MDL SAS
[15:53] * KindOne (KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:54] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[15:54] <ajazdzewski> ok its bad
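
The rate in that bench line can be rechecked from its own numbers (1024 MB written in 85.017448 seconds). A short worked calculation, with a hypothetical helper name:

```python
def bench_rate_kb_per_sec(megabytes, seconds):
    """Convert an amount written in MB and a duration into KB/sec."""
    return megabytes * 1024 / seconds


rate = bench_rate_kb_per_sec(1024, 85.017448)
print("%d KB/sec" % int(rate))
print("%.2f MB/sec" % (rate / 1024))
```

That is about 12 MB/s of sequential writes for a single OSD, which is the number behind the "ok its bad" follow-up.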
[15:55] * markbby (~Adium@168.94.245.2) Quit (Ping timeout: 480 seconds)
[15:55] <ofu> what kind of caching is there in between?
[15:56] * tobru (~quassel@2a02:41a:3999::94) Quit (Remote host closed the connection)
[15:56] * sleinen (~Adium@2001:620:0:26:510:ea24:aa3c:40a2) has joined #ceph
[15:58] * rongze (~rongze@117.79.232.201) has joined #ceph
[15:59] * tobru (~quassel@2a02:41a:3999::94) has joined #ceph
[15:59] <BillK> The mds got killed by -1 mds/MDSTable.cc: In function 'void MDSTable::load_2(int, ceph::bufferlist&, Context*)' thread 7f680d2be700 time 2013-10-20 15:32:48.168371
[15:59] <BillK> mds/MDSTable.cc: 152: FAILED assert(r >= 0)
[15:59] <ajazdzewski> no cache, I have an LSI SAS HBA 9207 4i4e inside each node, connected to an HP DAS box
[16:00] <BillK> googling came up with one fix ... create a newfs, which it has done ... two questions
[16:00] * tnt_ (~tnt@ec2-54-200-98-43.us-west-2.compute.amazonaws.com) has joined #ceph
[16:00] <BillK> how can I fix it without losing the data
[16:01] <BillK> how can I remove the lost objects which are still in there somewhere (based on disk usage ... some 120Gb)
[16:06] * rongze (~rongze@117.79.232.201) Quit (Ping timeout: 480 seconds)
[16:06] <ksingh> guys need one suggestion , if i create my cluster using ceph-deploy and after that install ceph on node1 and create node1 as monitor
[16:07] <ksingh> so does the ceph.conf file ( configuration file ) get created on the ceph-deploy node
[16:07] <ksingh> do i need to manually copy this configuration file to other nodes in cluster
[16:08] <alfredodeza> ksingh: ceph-deploy copies ceph.conf to your nodes
[16:08] <alfredodeza> no need to manually do that
[16:09] <alfredodeza> you can also push them and collect them as well as a separate step if you prefer that
[16:10] <ksingh> when I create a second monitor or OSDs, those are not updated in the ceph.conf file automatically
[16:10] <ksingh> i need to manually edit ceph.conf file and update with osd.0 osd.1 etc changes
[16:10] * jcsp1 (~jcsp@212.20.242.100) has joined #ceph
[16:11] <ksingh> it should update automatically since i am using ceph-deploy for everything
[16:11] <alfredodeza> ceph-deploy will not create OSD sections for you
[16:11] <alfredodeza> in the config that is
[16:11] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Read error: Connection reset by peer)
[16:12] <ksingh> so do you mean to say that if i create new OSD then first add manually OSD section in ceph.conf file
[16:12] <ksingh> after that copy this ceph.conf file to OSD node
[16:13] <alfredodeza> ksingh: I think that because ceph-deploy will use defaults it will not create specific information about the OSDs in the configuration regarding those OSDs
[16:13] <ksingh> using scp or
[16:13] <alfredodeza> if you want to add specific information to the ceph.conf
[16:13] <alfredodeza> then
[16:13] <alfredodeza> yes, you can either scp
[16:13] <alfredodeza> or make ceph-deploy push that
[16:14] <ksingh> lets say i have 100 nodes , so scp manually will be a pain in ass
[16:14] <alfredodeza> ksingh: you can make ceph-deploy push the config
[16:14] * haomaiwang (~haomaiwan@119.4.172.135) Quit (Remote host closed the connection)
[16:14] <ksingh> what is the command for pushing from ceph-deploy , i am sorry to ask this
[16:14] <ksingh> but i dont know
[16:14] * haomaiwang (~haomaiwan@li498-162.members.linode.com) has joined #ceph
[16:14] <ksingh> and didn't find it
[16:15] <alfredodeza> ksingh: `ceph-deploy --help` lists `config` as the one you are looking for here :)
[16:15] * julian (~julianwa@125.70.135.165) Quit (Quit: afk)
[16:15] <alfredodeza> you can pull or push too
[16:15] <alfredodeza> `ceph-deploy config --help` will give you details
[16:15] <ksingh> ok i will dig it out , and thanks for your prompt revert
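
For the 100-node case ksingh raised, a minimal sketch of building one `ceph-deploy config push` invocation for many hosts instead of running scp by hand. The host names are hypothetical, and the sketch only assembles the command line; `ceph-deploy config --help` remains the authority on the exact arguments.

```python
def config_push_command(hosts):
    """Build argv for pushing the local ceph.conf to the given hosts."""
    hosts = list(hosts)
    if not hosts:
        raise ValueError("at least one host is required")
    return ["ceph-deploy", "config", "push"] + hosts


# Hypothetical host names node1 .. node100.
hosts = ["node%d" % i for i in range(1, 101)]
argv = config_push_command(hosts)
print(len(argv), " ".join(argv[:5]), "...")
```

The resulting list can be handed to `subprocess.run(argv, check=True)` from the directory that holds the edited ceph.conf.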
[16:16] <ksingh> one last thing , if i add MON or MDS does it update automatically in ceph.conf
[16:16] <ksingh> i guess yes
[16:16] <alfredodeza> it updates the global section
[16:16] <alfredodeza> it will not create specific sections for them, no
[16:17] <ksingh> i have seen one problem here , if we have one monitor already , it will be shown in global section
[16:18] <ksingh> if you try to add a second monitor ( scaling up your cluster ) this will remove the entry of the first mon from the global section
[16:18] <ksingh> and add the second monitor
[16:19] <alfredodeza> is this a monitor on the same host or a different one?
[16:19] <ksingh> monitor on different NODE
[16:19] <alfredodeza> if you could attempt to replicate that from scratch reliably then it looks like a bug
[16:19] <alfredodeza> ksingh: also make sure you are following this: http://ceph.com/docs/master/rados/deployment/ceph-deploy-mon/
[16:20] <ksingh> that is exactly the doc I am following; anyway, let me try to implement this and get back to you
[16:21] <ksingh> Thanks ALFREDODEZA , (*)
[16:22] <alfredodeza> np
[16:25] * markbby (~Adium@168.94.245.2) has joined #ceph
[16:26] * haomaiwa_ (~haomaiwan@119.4.172.135) has joined #ceph
[16:27] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has left #ceph
[16:27] * haomaiwa_ (~haomaiwan@119.4.172.135) Quit (Remote host closed the connection)
[16:28] * sleinen (~Adium@2001:620:0:26:510:ea24:aa3c:40a2) Quit (Quit: Leaving.)
[16:28] * sleinen (~Adium@130.59.94.238) has joined #ceph
[16:28] <peetaur> if I try to mount cephFS, I get this error "secret is empty." and if I remove the blank line in the secret file, I get "secret is not valid base64: Invalid argument."
[16:29] * tryggvil (~tryggvil@178.19.53.254) has joined #ceph
[16:29] * haomaiwa_ (~haomaiwan@124.161.72.97) has joined #ceph
[16:29] <peetaur> and it works fine with inline -o name=admin,secret=...
[16:30] <peetaur> the manpage doesn't seem to indicate that anything special is missing, so I'd say it's a new dumpling bug that didn't exist in cuttlefish, which I tested before
[16:31] * rongze (~rongze@123.151.28.64) has joined #ceph
[16:33] * haomaiwang (~haomaiwan@li498-162.members.linode.com) Quit (Ping timeout: 480 seconds)
[16:33] * jcsp1 (~jcsp@212.20.242.100) Quit (Quit: Leaving.)
[16:35] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[16:36] * sleinen (~Adium@130.59.94.238) Quit (Ping timeout: 480 seconds)
[16:36] <pmatulis_> peetaur: a blank line is needed in the secret file?
[16:37] <peetaur> I don't know what is needed... but the files that ceph-deploy left in the directory start with a blank line
[16:37] <peetaur> and those files worked fine in cuttlefish
[16:38] <tsnider> so, a follow-up to my earlier question: if I want ceph osd journals on separate devices, what's the best practice? 1. Use a raw unmounted device without a file system, e.g. ln -s /dev/sdz1 /var/lib/osd/ceph-0/journal. 2. Create a file system on the raw device and use that, i.e. mkfs.xfs /dev/sdz1; ln -s /dev/sdz1 /var/lib/osd/ceph-0/journal; ceph-osd --mkjournal -i 0. 3. Use a mounted device with a file system and a journal file?
[16:38] <pmatulis_> peetaur: the only time i tested it i recall having to use quotes around the secret file or password on the command line
[16:39] <peetaur> quotes will be interpreted by the shell rather than the kernel/ceph ... so I doubt that will work. But I'll try
[16:40] <peetaur> as expected, secretfile="..." didn't work, and using \" also didn't work
[16:40] * gregmark (~Adium@68.87.42.115) has joined #ceph
[16:42] * BillK (~BillK-OFT@58-7-67-236.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[16:42] * joshd1 (~jdurgin@2602:306:c5db:310:59a4:fe2c:41b0:2ee7) has joined #ceph
[16:43] * rongze (~rongze@123.151.28.64) Quit (Remote host closed the connection)
[16:51] * mtanski_ (~mtanski@69.193.178.202) has joined #ceph
[16:53] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[16:56] * tchmnkyz (~jeremy@0001638b.user.oftc.net) has joined #ceph
[16:56] <tchmnkyz> anyone alive to help me with a major problem.
[16:56] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit ()
[16:56] <sel_> Quick question about snapshots: I've been running a job that does a freeze of the rbd device and then makes a snapshot of it. As far as I can see that should work, but when I mount the snapshot the content isn't from the date I took the snapshot; the data is from a later point in time.
[16:56] <tchmnkyz> librbd::ImageCtx: error reading immutable metadata <- i get this when i try to do anything with one of my rbd images
[16:56] <sel_> What am I missing?
[16:57] * rongze (~rongze@211.155.113.161) has joined #ceph
[16:57] * mtanski (~mtanski@69.193.178.202) Quit (Ping timeout: 480 seconds)
[16:57] * mtanski_ is now known as mtanski
[16:59] * yanzheng (~zhyan@134.134.137.71) Quit (Ping timeout: 480 seconds)
[16:59] * sleinen (~Adium@130.59.94.238) has joined #ceph
[17:02] <joshd1> tchmnkyz: can you pastebin the output of 'rbd info --debug-ms 1' for one of your images?
[17:02] * sleinen (~Adium@130.59.94.238) Quit (Read error: Connection reset by peer)
[17:03] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[17:03] <joshd1> sel_: that should work, but there were bugs in the past with kernel rbd - are you using the kernel rbd driver, and if so, which version?
[17:04] * sleinen (~Adium@2001:620:0:25:602f:6eb8:7a1a:4235) has joined #ceph
[17:05] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:05] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[17:07] <tchmnkyz> yea
[17:07] <tchmnkyz> it is also only 1 image that is giving that
[17:07] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[17:09] * foosinn (~stefan@office.unitedcolo.de) Quit (Quit: Leaving)
[17:09] <tchmnkyz> http://pastebin.com/eGU4TRda
[17:09] <tchmnkyz> there we go
[17:10] * joshd (~joshd@2607:f298:a:607:d045:7239:b0e8:8b41) Quit (Ping timeout: 480 seconds)
[17:10] <joshd1> tchmnkyz: that says the header object doesn't exist - does that image still show up in 'rbd ls'?
[17:11] * pieter (~pieter@105-236-133-68.access.mtnbusiness.co.za) has joined #ceph
[17:11] <tchmnkyz> yes it does
[17:11] <pieter> Hi guys. Is there a way to query ceph for it's current active config?
[17:11] <tchmnkyz> that is why i did not understand how it does not exist
[17:11] <pieter> I've built a cluster using ceph-deploy but the config file seems very incomplete vs hand configs in the past
[17:12] <tchmnkyz> btw are you the guy assigned to work with me at 1PM EST?
[17:12] <tchmnkyz> lol
[17:12] * zhyan__ (~zhyan@101.82.254.117) has joined #ceph
[17:12] <tchmnkyz> KevinP is getting me someone
[17:13] <joshd1> tchmnkyz: no, that's someone else, perhaps we'd better wait for them
[17:14] <tchmnkyz> i would rather get it fixed no
[17:14] <tchmnkyz> now
[17:14] <tchmnkyz> it is kinda prod facing
[17:14] <tchmnkyz> i am also working to push a support contract through now too
[17:14] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[17:15] <tchmnkyz> so how can i recreate this header if i had to...
[17:17] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[17:19] <joshd1> tchmnkyz: if you know the size of the image, create a new image, 'rados get' its header object, edit it to use the broken image's block_prefix, and 'rados put' it into place
[17:19] <tchmnkyz> yeah, the image is 96GB
[17:19] <tchmnkyz> if that helps
[17:19] * brambles (lechuck@s0.barwen.ch) Quit (Ping timeout: 480 seconds)
[17:19] <tchmnkyz> i have one similar to it if that would help
[17:19] <andreask> pieter: like ... ceph --admin-daemon /var/run/ceph/$cluster-$name.asok config show
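A small sketch of how the admin socket path in the command above is assembled, assuming the `/var/run/ceph/$cluster-$name.asok` pattern andreask quotes. The cluster name `ceph` and daemon name `osd.0` are example values, and both helper functions are hypothetical.

```python
from typing import List


def admin_socket_path(cluster: str, name: str, run_dir: str = "/var/run/ceph") -> str:
    """Expand $cluster-$name.asok for one daemon, e.g. name='osd.0'."""
    return f"{run_dir}/{cluster}-{name}.asok"


def config_show_command(cluster: str, name: str) -> List[str]:
    """Build the argv that asks a running daemon for its active config."""
    return ["ceph", "--admin-daemon", admin_socket_path(cluster, name), "config", "show"]


if __name__ == "__main__":
    # Example values only; run on the host where the daemon's socket lives.
    print(" ".join(config_show_command("ceph", "osd.0")))
```

The socket belongs to one daemon on one host, so the query reports that daemon's running settings rather than the contents of ceph.conf.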
[17:20] * joshd (~joshd@2607:f298:a:607:c01b:65d1:71ca:5b36) has joined #ceph
[17:20] * mattt_ (~textual@92.52.76.140) Quit (Read error: Connection reset by peer)
[17:21] <joshd1> tchmnkyz: if you have any snapshots you care about on that image it would be harder to recreate it, but just restoring access is not too bad
[17:21] <tchmnkyz> no snaps on it
[17:22] <tchmnkyz> not yet
[17:22] <tchmnkyz> it is a newer image
[17:22] * mtanski (~mtanski@69.193.178.202) Quit (Quit: mtanski)
[17:22] <pieter> thanks
[17:22] * pieter (~pieter@105-236-133-68.access.mtnbusiness.co.za) Quit (Quit: Konversation terminated!)
[17:23] <tchmnkyz> block_name_prefix: rbd_data.c11a25c2ae8944a
[17:23] <tchmnkyz> that would be the header location right?
[17:23] <tchmnkyz> because when i try a get
[17:23] <tchmnkyz> it says not found also
[17:24] <joshd1> tchmnkyz: s/data/header/
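The `s/data/header/` substitution above can be sketched as a tiny helper, assuming the format 2 naming seen in this conversation (`rbd_data.<id>` for the block name prefix, `rbd_header.<id>` for the header object). The function name is hypothetical.

```python
def header_object_name(block_name_prefix: str) -> str:
    """Map a block_name_prefix such as 'rbd_data.<id>' to 'rbd_header.<id>'."""
    data_marker = "rbd_data."
    if not block_name_prefix.startswith(data_marker):
        raise ValueError(f"unexpected block name prefix: {block_name_prefix!r}")
    return "rbd_header." + block_name_prefix[len(data_marker):]


if __name__ == "__main__":
    print(header_object_name("rbd_data.c11a25c2ae8944a"))
```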
[17:24] <tchmnkyz> o
[17:24] <tchmnkyz> ok
[17:24] <sel_> joshd, I'm running a debian backport kernel (3.9-0.bpo.1-amd64)
[17:25] <joshd1> tchmnkyz: hmm, since this is a format 2 issue it will be more complicated
[17:25] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[17:25] * sprachgenerator (~sprachgen@130.202.135.192) has joined #ceph
[17:25] * markbby (~Adium@168.94.245.2) has joined #ceph
[17:25] <tchmnkyz> you mean like get not outputting anything?
[17:26] <tchmnkyz> root@stor01:~ # rados -p VKCloud_APO get rbd_header.c11a25c2ae8944a file
[17:26] <tchmnkyz> root@stor01:~ # cat file
[17:26] <tchmnkyz> root@stor01:~ #
[17:27] <joshd1> tchmnkyz: yeah, it's all in omap key-value pairs in format 2
[17:27] * r0r_taga (~nick@greenback.pod4.org) Quit (Remote host closed the connection)
[17:27] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[17:28] <tchmnkyz> so how can i do this
[17:28] <tchmnkyz> i just need to recover as fast as possible
[17:28] <tchmnkyz> i have the owner of my company coming in every 20 minutes asking if it is fixed
[17:28] <tchmnkyz> hence why i am trying to fix before my time with kevin's guy
[17:32] * zhyan_ (~zhyan@101.83.219.172) has joined #ceph
[17:32] * r0r_taga (~nick@greenback.pod4.org) has joined #ceph
[17:32] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[17:32] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[17:33] <joshd1> tchmnkyz: at this point the fastest way to recover is probably to create a new image using the old one's data (in the rbd_data.<blk_prefix>.* objects)
[17:34] <tchmnkyz> ok fastest way to do that?
[17:34] <joshd1> tchmnkyz: you can 'rados get' each of those, and 'rados put' them with your new image's block prefix
[17:34] <tchmnkyz> is there a way i can script it?
[17:34] * shang (~ShangWu@70.35.39.20) has joined #ceph
[17:35] * zhyan__ (~zhyan@101.82.254.117) Quit (Ping timeout: 480 seconds)
[17:35] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:35] * jwilliams_ (~jwilliams@72.5.59.176) has joined #ceph
[17:36] * glzhao (~glzhao@118.195.65.67) Quit (Quit: leaving)
[17:37] <joshd1> tchmnkyz: yeah, you can loop over 'rados -p rbd ls | grep rbd_data.<block_prefix>'
[17:37] * JustEra (~JustEra@89.234.148.11) Quit (Quit: This computer has gone to sleep)
[17:38] <tchmnkyz> ok and just get it and put it back with the new prefix?
[17:38] <joshd1> right
[17:38] <tchmnkyz> rados -p VKCloud_APO ls | grep c11a25c2ae8944a
[17:38] <tchmnkyz> ok
[17:39] <joshd1> those data objects don't use any xattrs or anything special, so rados get/put will work fine with them
[17:39] * sleinen (~Adium@2001:620:0:25:602f:6eb8:7a1a:4235) Quit (Quit: Leaving.)
[17:39] * sleinen (~Adium@130.59.94.238) has joined #ceph
[17:39] <joshd1> there's no metadata about which objects exist, so the new image will be just like the old one
[17:39] * ScOut3R (~ScOut3R@catv-89-133-21-203.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[17:39] <tchmnkyz> but if i get from the one and put, won't it overwrite the data that is on the other?
[17:41] * tryggvil (~tryggvil@178.19.53.254) Quit (Quit: tryggvil)
[17:41] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[17:42] * sleinen1 (~Adium@2001:620:0:25:a026:be2b:54:d4ec) has joined #ceph
[17:43] <joshd1> tchmnkyz: the varying part is the last number in the object name, so you can do the ls .. | grep .. | cut -d . -f 3
[17:44] <joshd1> and then get old_prefix.$num; put new_prefix.$num
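The loop joshd1 describes can be sketched as follows, assuming data objects are named `<block_prefix>.<number>` and that `rados -p <pool> get <obj> <file>` and `rados -p <pool> put <obj> <file>` are used for the copy. The new prefix `rbd_data.NEWPREFIX`, the temporary file path, and both helper names are placeholders, and the commands are only printed, not run.

```python
from typing import Iterable, List, Tuple


def plan_copies(listing: Iterable[str], old_prefix: str, new_prefix: str) -> List[Tuple[str, str]]:
    """Pair each old data object with its name under the new block prefix."""
    pairs = []
    for raw in listing:
        name = raw.strip()
        # Require the trailing dot so a longer, unrelated prefix cannot match.
        if not name.startswith(old_prefix + "."):
            continue
        suffix = name[len(old_prefix) + 1:]  # the varying object number
        pairs.append((name, f"{new_prefix}.{suffix}"))
    return pairs


def rados_commands(pool: str, old: str, new: str, tmp_file: str) -> List[List[str]]:
    """Return the get/put argv pair that copies one object via a temp file."""
    return [
        ["rados", "-p", pool, "get", old, tmp_file],
        ["rados", "-p", pool, "put", new, tmp_file],
    ]


if __name__ == "__main__":
    # Sample listing; in practice this comes from 'rados -p <pool> ls'.
    sample = [
        "rbd_data.c11a25c2ae8944a.0000000000000000",
        "rbd_data.c11a25c2ae8944a.0000000000000001",
        "rbd_header.c11a25c2ae8944a",
    ]
    for old, new in plan_copies(sample, "rbd_data.c11a25c2ae8944a", "rbd_data.NEWPREFIX"):
        for argv in rados_commands("VKCloud_APO", old, new, "/tmp/rbd_obj"):
            print(" ".join(argv))
```

Reusing one temporary file means only a single object sits on local disk at any time. Printing the plan first also makes it easy to confirm that the put targets carry the new image's prefix before anything is written.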
[17:44] * tryggvil (~tryggvil@178.19.53.254) has joined #ceph
[17:44] <tchmnkyz> josh something like this
[17:45] <tchmnkyz> http://pastebin.com/814M1YAC
[17:45] <tchmnkyz> and does it have to be a NEW image or can it be one that i already have the same size?
[17:46] <joshd1> exactly, but with rbd_data instead of rbd_header
[17:46] <joshd1> a new image would be ideal so there wouldn't be any extra data around, but it would still work copying over an existing image
[17:47] <joshd1> assuming the image isn't in use, of course
[17:47] * jcsp (~jcsp@212.20.242.100) has joined #ceph
[17:47] <tchmnkyz> oh
[17:47] <tchmnkyz> ya the image is kinda in use on another server
[17:48] <tchmnkyz> lol
[17:48] <tchmnkyz> ok
[17:48] <tchmnkyz> so i will make a new junk image
[17:49] * sleinen (~Adium@130.59.94.238) Quit (Ping timeout: 480 seconds)
[17:50] * zhyan__ (~zhyan@101.83.98.249) has joined #ceph
[17:52] * themgt (~themgt@201-223-210-60.baf.movistar.cl) Quit (Quit: themgt)
[17:52] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) has joined #ceph
[17:52] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[17:53] <tchmnkyz> http://pastebin.com/CJbNceLv
[17:53] <tchmnkyz> look good?
[17:53] <tchmnkyz> think it should work?
[17:53] * zhyan_ (~zhyan@101.83.219.172) Quit (Ping timeout: 480 seconds)
[17:54] <tchmnkyz> and we are 100% sure we won't be overwriting any data, right?
[17:55] * mtanski (~mtanski@69.193.178.202) Quit (Ping timeout: 480 seconds)
[17:55] * ajazdzewski (~quassel@lpz-66.sprd.net) Quit (Remote host closed the connection)
[17:56] * ksingh (~Adium@teeri.csc.fi) Quit (Quit: Leaving.)
[17:57] <joshd1> tchmnkyz: that's copying into the broken image, I thought you wanted to move the broken image data to a new working image
[17:58] <jwilliams_> Hi, I have a cluster that is misbehaving
[17:58] <jwilliams_> 2 out of 8 of my osds are crashing almost immediately after startup
[17:59] <jwilliams_> *8 is 12
[17:59] <tchmnkyz> i just wanted to re-create the image header data
[17:59] <tchmnkyz> but this can work too
[17:59] <mikedawson> jwilliams_: what version?
[17:59] <jwilliams_> ceph version 0.69 (6ca6f2f9f754031f4acdb971b71c92c9762e18c3)
[17:59] * sleinen1 (~Adium@2001:620:0:25:a026:be2b:54:d4ec) Quit (Quit: Leaving.)
[17:59] * sleinen (~Adium@130.59.94.238) has joined #ceph
[17:59] <jwilliams_> I have to set noout, or the rest of the cluster will start crashing
[18:00] <mikedawson> jwilliams_: paste logs showing the crash somewhere
[18:00] <jwilliams_> http://pastebin.com/cB9ML5md
[18:00] <jwilliams_> is a crash from one of the nodes
[18:01] <jwilliams_> here is the other: http://pastebin.com/csHHjC2h
[18:02] <tchmnkyz> http://pastebin.com/ifSVTtvC
[18:02] <tchmnkyz> final paste i think
[18:02] <tchmnkyz> let me know
[18:02] <joshd1> sel_: yes, I'm afraid 3.9 was an unlucky kernel hit by that bug. it's fixed in 3.10
[18:03] <tchmnkyz> joshd1: any thoughts?
[18:03] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[18:03] <sel_> joshd1, Ok, thanks :)
[18:04] <joshd1> tchmnkyz: looks good
[18:04] <mikedawson> jwilliams_: looks like a bug http://www.spinics.net/lists/ceph-users/msg04589.html
[18:07] * sleinen (~Adium@130.59.94.238) Quit (Ping timeout: 480 seconds)
[18:08] * alram (~alram@cpe-76-167-50-51.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[18:09] <tchmnkyz> so once i put, i don't need to keep the get file any more, right?
[18:09] <tchmnkyz> because i don't have a huge os drive in there
[18:09] <tchmnkyz> lol
[18:10] <jwilliams_> mikedawson: I ran it with the debugs set, but the files are 108m each
[18:10] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[18:11] * zhyan__ (~zhyan@101.83.98.249) Quit (Ping timeout: 480 seconds)
[18:12] * haomaiwa_ (~haomaiwan@124.161.72.97) Quit (Remote host closed the connection)
[18:12] <mikedawson> jwilliams_: get ahold of sjust - he may want them
[18:12] * haomaiwang (~haomaiwan@124.161.72.97) has joined #ceph
[18:12] <jwilliams_> ok, thanks
[18:15] * AaronSchulz (~chatzilla@216.38.130.164) Quit (Remote host closed the connection)
[18:20] * haomaiwang (~haomaiwan@124.161.72.97) Quit (Ping timeout: 480 seconds)
[18:24] * smashmo (~smashmo@host109-154-221-125.range109-154.btcentralplus.com) has joined #ceph
[18:24] * smashmo (~smashmo@host109-154-221-125.range109-154.btcentralplus.com) Quit ()
[18:25] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[18:26] * markbby (~Adium@168.94.245.2) has joined #ceph
[18:26] * RuediR (~Adium@macrr.switch.ch) Quit (Quit: Leaving.)
[18:27] * Cube (~Cube@12.248.40.138) has joined #ceph
[18:27] * shang (~ShangWu@70.35.39.20) Quit (Read error: Operation timed out)
[18:28] * brambles (lechuck@s0.barwen.ch) has joined #ceph
[18:34] * a (~a@209.12.169.218) has joined #ceph
[18:34] * a is now known as Guest3030
[18:37] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20130910160258])
[18:39] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[18:39] * diegows (~diegows@190.190.11.42) Quit (Read error: Operation timed out)
[18:40] * xarses (~andreww@64-79-127-122.static.wiline.com) has joined #ceph
[18:41] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[18:42] * ircolle (~Adium@2601:1:8380:2d9:4cf:f5a8:9108:e140) has joined #ceph
[18:46] * alram (~alram@216.103.134.250) has joined #ceph
[18:51] <mikedawson> zackc: ping
[18:54] * angdraug (~angdraug@64-79-127-122.static.wiline.com) has joined #ceph
[18:58] * glanzi (~glanzi@201.75.202.207) Quit (Quit: glanzi)
[18:58] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) has joined #ceph
[19:03] <zackc> mikedawson: hey, what's up?
[19:04] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[19:04] <mikedawson> zackc: is there any mechanism in teuthology to trigger bad things during runs? Something similar to netflix's chaos monkey
[19:05] * jjgalvez (~jjgalvez@ip72-193-217-254.lv.lv.cox.net) has joined #ceph
[19:05] * alram (~alram@216.103.134.250) Quit (Ping timeout: 480 seconds)
[19:05] <mikedawson> zackc: like build an infrastructure, create rbd volume, do a long-running rbd bench while randomly triggering osd outages
[19:08] * glanzi (~glanzi@201.75.202.207) has joined #ceph
[19:09] <zackc> mikedawson: not that i'm aware of, no
[19:09] <zackc> ha, chaos monkey looks cool
[19:09] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[19:09] * shang (~ShangWu@70.35.39.20) has joined #ceph
[19:10] * nhm (~nhm@184-97-129-163.mpls.qwest.net) has joined #ceph
[19:10] * ChanServ sets mode +o nhm
[19:10] <mikedawson> zackc: ok, perhaps we can discuss further on the Wed call
[19:10] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[19:16] * rongze (~rongze@211.155.113.161) Quit (Remote host closed the connection)
[19:20] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20130910160258])
[19:20] * rongze (~rongze@106.120.176.65) has joined #ceph
[19:21] * alram (~alram@216.103.134.250) has joined #ceph
[19:21] * amospalla (~amospalla@0001a39c.user.oftc.net) Quit (Ping timeout: 480 seconds)
[19:21] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) Quit (Ping timeout: 480 seconds)
[19:25] * ponyofdeath (~vladi@cpe-75-80-165-117.san.res.rr.com) Quit (Remote host closed the connection)
[19:25] * mozg (~andrei@host217-46-236-49.in-addr.btopenworld.com) Quit (Ping timeout: 480 seconds)
[19:25] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) has joined #ceph
[19:26] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[19:26] * markbby (~Adium@168.94.245.2) has joined #ceph
[19:36] <joshd1> mikedawson: zackc: take a look at the thrashosds task - it does exactly that, and is used by a bunch of suites. there's also mon_thrash and mds_thrash
[19:36] * mikedawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[19:39] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[19:40] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[19:40] * The_Bishop (~bishop@2001:470:50b6:0:30e4:1a0a:a3be:e998) Quit (Ping timeout: 480 seconds)
[19:43] <zackc> joshd1: ah, will do. thanks!
[19:47] * The_Bishop (~bishop@2001:470:50b6:0:d052:4fac:ffc8:a87d) has joined #ceph
[19:48] * shang (~ShangWu@70.35.39.20) Quit (Read error: Operation timed out)
[19:53] * amospalla (~amospalla@0001a39c.user.oftc.net) has joined #ceph
[19:56] * peetaur (~peter@CPE788df73fb301-CM788df73fb300.cpe.net.cable.rogers.com) Quit (Read error: Connection reset by peer)
[19:56] * peetaur (~peter@CPE788df73fb301-CM788df73fb300.cpe.net.cable.rogers.com) has joined #ceph
[19:58] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[19:58] * davidz (~Adium@ip68-5-239-214.oc.oc.cox.net) has joined #ceph
[19:58] * topro (~topro@host-62-245-142-50.customer.m-online.net) Quit (Quit: Konversation terminated!)
[19:59] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) Quit (Quit: Leaving.)
[19:59] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) has joined #ceph
[20:00] * Pedras (~Adium@64.191.206.83) has joined #ceph
[20:01] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[20:01] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:01] * amospalla (~amospalla@0001a39c.user.oftc.net) Quit (Ping timeout: 480 seconds)
[20:06] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Remote host closed the connection)
[20:07] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[20:09] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[20:10] * amospalla (~amospalla@0001a39c.user.oftc.net) has joined #ceph
[20:10] * joao (~joao@89.181.145.133) Quit (Read error: Connection reset by peer)
[20:11] * shang (~ShangWu@70.35.39.20) has joined #ceph
[20:11] * joao (~joao@89.181.145.133) has joined #ceph
[20:11] * ChanServ sets mode +o joao
[20:11] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) Quit (Quit: Leaving.)
[20:12] * alram (~alram@216.103.134.250) Quit (Ping timeout: 480 seconds)
[20:13] * alram (~alram@216.103.134.250) has joined #ceph
[20:15] * tryggvil (~tryggvil@178.19.53.254) Quit (Quit: tryggvil)
[20:19] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) has joined #ceph
[20:22] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:22] * sarob (~sarob@nat-dip28-wl-b.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[20:23] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:24] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[20:24] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:24] * The_Bishop (~bishop@2001:470:50b6:0:d052:4fac:ffc8:a87d) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[20:25] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:26] * markbby (~Adium@168.94.245.2) Quit (Quit: Leaving.)
[20:26] * markbby (~Adium@168.94.245.2) has joined #ceph
[20:27] * john_barbee_ (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:27] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) has joined #ceph
[20:28] * i_m (~ivan.miro@deibp9eh1--blueice2n2.emea.ibm.com) Quit (Ping timeout: 480 seconds)
[20:30] * john_barbee (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20130910160258])
[20:30] * john_barbee_ is now known as john_barbee
[20:30] * john_barbee_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:31] * themgt (~themgt@201-223-210-60.baf.movistar.cl) has joined #ceph
[20:31] * john_barbee_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit ()
[20:32] * john_barbee_ (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[20:32] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[20:35] * ponyofdeath (~vladi@cpe-75-80-165-117.san.res.rr.com) has joined #ceph
[20:36] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Remote host closed the connection)
[20:37] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[20:39] * rongze (~rongze@106.120.176.65) Quit (Remote host closed the connection)
[20:39] * rongze (~rongze@117.79.232.203) has joined #ceph
[20:42] * themgt (~themgt@201-223-210-60.baf.movistar.cl) Quit (Ping timeout: 480 seconds)
[20:44] * nwat (~nwat@eduroam-225-58.ucsc.edu) has joined #ceph
[20:45] * shang (~ShangWu@70.35.39.20) Quit (Ping timeout: 480 seconds)
[20:47] * rongze (~rongze@117.79.232.203) Quit (Ping timeout: 480 seconds)
[20:55] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:55] * tryggvil (~tryggvil@17-80-126-149.ftth.simafelagid.is) has joined #ceph
[20:56] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[20:56] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[20:58] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[20:59] * LPG (~LPG@c-76-104-197-224.hsd1.wa.comcast.net) has joined #ceph
[20:59] * thomnico (~thomnico@70.35.39.20) Quit (Quit: Ex-Chat)
[21:06] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[21:07] <cjh_> ceph: how do i diag a monitor that won't start?
[21:07] <cjh_> i tried firing it up manually and i have a stack dump
[21:07] * sleinen (~Adium@2001:620:0:25:6d0d:d2f1:43ea:c314) has joined #ceph
[21:07] <cjh_> i don't know what to do with the core dump i have though
[21:07] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[21:15] <joao> cjh_, pastebin it and point us to it? :)
[21:15] <cjh_> you got it :D
[21:15] <joao> the stack trace that is
[21:16] <cjh_> joao: http://fpaste.org/48382/13823829/
[21:17] <joao> cjh_, so you upgraded your cluster, was that it?
[21:17] <cjh_> no i just upgraded ubuntu from 12.10 -> 13.04
[21:17] * JoeGruher (~JoeGruher@134.134.139.72) has joined #ceph
[21:17] <cjh_> after that it started crashing
[21:17] <cjh_> this is the latest dumpling code
[21:17] <cjh_> 2 of my 3 monitors work fine though
[21:18] <joao> cjh_, please set '--debug-mon 10', rerun and pastebin the result
[21:18] <cjh_> ok
[21:18] <joao> I'll be back in a few minutes
[21:19] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[21:19] <cjh_> joao: http://fpaste.org/48384/38238315/
[21:19] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) Quit (Quit: Computer has gone to sleep.)
[21:23] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) has joined #ceph
[21:23] * shang (~ShangWu@38.126.120.10) has joined #ceph
[21:26] * hijacker (~hijacker@213.91.163.5) Quit (Read error: Connection timed out)
[21:26] * hijacker (~hijacker@bgva.sonic.taxback.ess.ie) has joined #ceph
[21:26] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[21:27] * markbby (~Adium@168.94.245.2) has joined #ceph
[21:28] <joao> cjh_, how big is your mon store and are you willing to tar it and drop it somewhere I can take a look?
[21:28] <joao> seen this thing over the weekend but aside from logs didn't get a chance to audit the store
[21:29] <cjh_> hmm lemme check
[21:29] <cjh_> 138MB. yeah i'll tar it
[21:29] <joao> cool
[21:30] <joao> ty
[21:30] <cjh_> joao: where can i drop this? It's 11M
[21:33] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[21:47] <JoeGruher> when upgrading centos 6.4 to a recommended kernel version do folks generally just build the new kernel manually from source? or is there a better way?
[21:48] <TVR> I had to build from source.. but there are pre-built ones out there as well
[21:48] <TVR> I needed it because I wanted features like cephfs
[21:48] * shang (~ShangWu@38.126.120.10) Quit (Ping timeout: 480 seconds)
[21:48] * thomnico (~thomnico@70.35.39.20) Quit (Ping timeout: 480 seconds)
[21:48] <JoeGruher> does 3.4.59 sound like the right kernel to target? it was mentioned on the mailing list a while back as one to use.
[21:48] <TVR> anything > 3.11
[21:49] <JoeGruher> hmmm ok
[21:50] <TVR> 3.11 is a bit old.. but it has all the features needed in ceph.. and as you are in this channel.. that should be your oldest target if you are going to build one
[21:50] <JoeGruher> ok, thanks
[21:52] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) Quit (Quit: Leaving.)
[21:52] * ksingh (~Adium@b-v6-0003.vpn.csc.fi) has joined #ceph
[22:01] * Vjarjadian (~IceChat77@94.1.37.151) has joined #ceph
[22:07] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) has joined #ceph
[22:09] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) Quit ()
[22:11] <lxo> ugh! anchortable grew too big (>92MB), now ceph-mds won't start. trying to find out what tuning parameter I have to change...
[22:13] * JoeGruher (~JoeGruher@134.134.139.72) Quit (Ping timeout: 480 seconds)
[22:16] * allsystemsarego (~allsystem@188.27.166.164) Quit (Quit: Leaving)
[22:16] * ksingh (~Adium@b-v6-0003.vpn.csc.fi) Quit (Ping timeout: 480 seconds)
[22:17] * t0rn (~ssullivan@2607:fad0:32:a02:d227:88ff:fe02:9896) Quit (Quit: Leaving.)
[22:17] * Tamil1 (~Adium@cpe-108-184-77-181.socal.res.rr.com) has joined #ceph
[22:19] <cmdrk> hi guys, i'm doing some benchmarking of CephFS. i'm copying some large-ish files from ramdisk to a cephfs mountpoint. is it sufficient to drop caches and sync on the OSD hosts before writing files in ? is there any caching on the ceph level that i should be aware of?
[22:20] <tsnider> can someone tell me what I'm missing. I have a radosgw user created. I get the following complaint when I try to create a pool with or without the --uid parameter:
[22:20] <tsnider> radosgw-admin pool add --pool=radosPool --uid=rados
[22:20] <tsnider> 2013-10-21 13:17:03.976358 7f7b5f78a780 0 WARNING: cannot read region map
[22:20] <tsnider> failed to add bucket placement: (2) No such file or directory
[22:23] * markbby (~Adium@168.94.245.2) Quit (Remote host closed the connection)
[22:26] * jcsp (~jcsp@0001bf3a.user.oftc.net) has joined #ceph
[22:27] * bandrus1 (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) has joined #ceph
[22:28] <lxo> looks like it is osd max write size
[22:30] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[22:30] * themgt (~themgt@201-223-204-108.baf.movistar.cl) has joined #ceph
[22:32] * jcsp (~jcsp@0001bf3a.user.oftc.net) Quit (Quit: Leaving.)
[22:33] * diegows (~diegows@200-127-157-157.net.prima.net.ar) has joined #ceph
[22:33] * bandrus (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[22:34] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[22:43] * rongze (~rongze@114.249.24.120) has joined #ceph
[22:47] <cjh_> loicd: i just saw your users post
[22:48] <cjh_> i have some salt files that i'm using to deploy ceph
[22:48] <loicd> hi
[22:48] <cjh_> yo
[22:49] <loicd> cool, will you be able to join CDS ? It's likely to be 14/15 november 2013 (not sure yet)
[22:49] <cjh_> loicd: i think i should sign up for a wiki account. i can contribute to this
[22:49] <cjh_> when is the CDS again?
[22:49] <cjh_> i might be able to
[22:49] <loicd> excellent
[22:50] <cjh_> are you guys joining up over google hangouts or something?
[22:51] * rongze (~rongze@114.249.24.120) Quit (Ping timeout: 480 seconds)
[22:52] * JoeGruher (~JoeGruher@134.134.139.72) has joined #ceph
[22:53] <cjh_> loicd: if i posted my setup to github would that be helpful?
[22:53] <loicd> cjh_: I'm not sure what scuttlemonkey will come up with but it will be online, yes.
[22:54] <loicd> cjh_: yes, please add the link to the list on the wiki !
[22:54] <cjh_> loicd: ok cool. i can most likely participate then :)
[22:54] <cjh_> loicd: will do. It's currently in HG so i need to convert it and post it to bitbucket.
[23:01] <loicd> ok
[23:05] <cjh_> *github i mean
[23:11] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Read error: Connection reset by peer)
[23:12] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[23:13] * kraken (~kraken@gw.sepia.ceph.com) Quit (Remote host closed the connection)
[23:14] * kraken (~kraken@gw.sepia.ceph.com) has joined #ceph
[23:17] <cjh_> joao: so is the monitor on that machine totally screwed?
[23:17] <joao> cjh_, haven't had the chance to dive into it
[23:18] <joao> cjh_, do you have other monitors?
[23:18] <joao> is quorum formed?
[23:19] <cjh_> yeah i have quorum with my 2 others
[23:19] <cjh_> i should be ok for awhile
[23:19] <joao> okay, if you want to get the monitor working asap, I'd recommend you to re-mkfs that monitor
[23:20] <joao> cjh_, can you upload the full log for this monitor as well?
[23:20] <joao> and throw in a log from another monitor (ideally the leader)?
[23:20] <cjh_> sure
[23:20] <cjh_> yeah
[23:20] <joao> thanks
[23:20] <joao> appreciated
[23:21] * JustEra (~JustEra@ALille-555-1-116-86.w90-7.abo.wanadoo.fr) has joined #ceph
[23:22] <cjh_> joao: done
[23:22] <joao> ty, grabbing them
[23:23] <joao> most likely, will only be able to look into this tomorrow
[23:23] <joao> getting pretty late as it is and I still have other things to attend to
[23:23] * papamoose (~kauffman@hester.cs.uchicago.edu) Quit (Ping timeout: 480 seconds)
[23:23] <cjh_> joao: ok cool :)
[23:25] * jharley (~jharley@69-196-134-224.dsl.teksavvy.com) has joined #ceph
[23:27] <jharley> I have a few questions about ceph being in a degraded state (e.g. a few OSDs being unavailable because they're in a single box and it's died). At what point (if any?) does ceph decide the machine being gone is 'serious' and re-copy the objects that were on those OSDs to make sure things are healthy?
[23:28] <sjustwork> when the osds are marked "out"
[23:28] <sjustwork> which happens by default after (consulting docs)
[23:28] <Cube> 300 seconds
[23:28] <sjustwork> that one
[23:28] <jharley> killer, thanks. For my own education, what's that tunable?
[23:29] <sjustwork> osd_down_out_interval?
[23:29] <jharley> and, I'll RTFM from there
[23:29] * themgt (~themgt@201-223-204-108.baf.movistar.cl) Quit (Quit: themgt)
[23:29] <sjustwork> oops
[23:29] <sjustwork> mon_osd_down_out_interval
[23:29] <sjustwork> almost got it
[23:29] * sprachgenerator (~sprachgen@130.202.135.192) Quit (Quit: sprachgenerator)
[23:32] <jharley> sjustwork: thanks a bunch!
[23:32] * kraken (~kraken@gw.sepia.ceph.com) Quit (Remote host closed the connection)
[23:32] <sjustwork> sure!
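
Note on the exchange above: the rule sjustwork and Cube describe (an OSD that stays down longer than mon_osd_down_out_interval is marked "out", which is what triggers re-replication) can be illustrated with a minimal sketch. This is not Ceph's monitor code. The function name should_mark_out and its arguments are hypothetical, the 300-second default is the value quoted in the conversation and may differ in other releases, and treating a non-positive interval as "never mark out" is a choice made for this sketch.

```python
from typing import Optional

# Value quoted in the conversation above; treat it as an assumption,
# since the default may differ between Ceph releases.
DOWN_OUT_INTERVAL = 300.0  # seconds


def should_mark_out(down_since: Optional[float],
                    now: float,
                    interval: float = DOWN_OUT_INTERVAL) -> bool:
    """Return True if an OSD has been down long enough to be marked out.

    down_since: time (in seconds) at which the OSD was marked down,
                or None if the OSD is currently up.
    now:        current time, in the same units.
    interval:   stand-in for mon_osd_down_out_interval.
    """
    if down_since is None:
        # An OSD that is up is never a candidate for marking out.
        return False
    if interval <= 0:
        # Sketch-level choice: a non-positive interval means "never".
        return False
    return (now - down_since) >= interval


if __name__ == "__main__":
    # An OSD that went down at t=0, checked at a few later times.
    for t in (60.0, 299.0, 300.0, 900.0):
        print(t, should_mark_out(0.0, t))
```

With these assumptions, a box that dies and takes several OSDs with it leaves the cluster degraded for the length of the interval. Only after that are the OSDs marked out and their objects copied elsewhere.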
[23:32] * kraken (~kraken@gw.sepia.ceph.com) has joined #ceph
[23:38] * thomnico (~thomnico@70.35.39.20) Quit (Quit: Ex-Chat)
[23:41] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[23:41] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[23:43] * shang (~ShangWu@70.35.39.20) has joined #ceph
[23:44] * jharley (~jharley@69-196-134-224.dsl.teksavvy.com) Quit (Quit: jharley)
[23:45] * brambles (lechuck@s0.barwen.ch) Quit (Ping timeout: 480 seconds)
[23:45] * thomnico (~thomnico@70.35.39.20) has joined #ceph
[23:45] * rongze (~rongze@114.249.24.120) has joined #ceph
[23:45] * thomnico (~thomnico@70.35.39.20) Quit (Read error: Connection reset by peer)
[23:46] * sprachgenerator (~sprachgen@130.202.135.192) has joined #ceph
[23:49] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[23:51] * sprachgenerator (~sprachgen@130.202.135.192) Quit (Quit: sprachgenerator)
[23:52] * markbby (~Adium@168.94.245.4) has joined #ceph
[23:54] * tsnider (~tsnider@nat-216-240-30-23.netapp.com) Quit (Quit: Leaving.)
[23:56] * john_barbee (~jbarbee@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Quit: ChatZilla 0.9.90.1 [Firefox 24.0/20130910160258])
[23:56] * rongze (~rongze@114.249.24.120) Quit (Ping timeout: 480 seconds)
[23:59] * glanzi (~glanzi@201.75.202.207) Quit (Quit: glanzi)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.