#ceph IRC Log


IRC Log for 2014-01-10

Timestamps are in GMT/BST.

[0:01] * sarob_ (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[0:01] * AaronSchulz (~chatzilla@c-24-7-62-151.hsd1.ca.comcast.net) has joined #ceph
[0:01] * sarob (~sarob@2001:4998:effd:600:d191:14b7:e9f6:fe7c) has joined #ceph
[0:03] * nwat (~textual@martini.fl-guest.ucar.edu) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[0:05] <AaronSchulz> so, is http://tracker.ceph.com/issues/6462 supposed to be fixed in Emperor?
[0:07] <sagelap> AaronSchulz: it should be, yeah
[0:07] <sagelap> it sounds like the Content-Disposition part is a separate/additional bug
[0:08] * nwat (~textual@martini.fl-guest.ucar.edu) has joined #ceph
[0:08] * nwat (~textual@martini.fl-guest.ucar.edu) Quit ()
[0:08] <AaronSchulz> content-encoding too, all of the non-metadata headers it seems
[0:08] <AaronSchulz> this works perfectly in regular swift
[0:09] <sagelap> can you include the (curl or whatever) commands that are failing in the bug so we can easily reproduce this?
[0:09] * sarob (~sarob@2001:4998:effd:600:d191:14b7:e9f6:fe7c) Quit (Ping timeout: 480 seconds)
[0:11] <AaronSchulz> my initial comment mentions some, they can still be used for this
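A reproduction of the kind sagelap asks for could look like the following minimal Python sketch. It assumes a Swift-compatible radosgw endpoint; the endpoint URL, token, and container/object names are hypothetical placeholders, not taken from issue 6462. It sends a PUT carrying `Content-Disposition` and `Content-Encoding`, then reads the object back with HEAD and lists any header that did not come back unchanged.

```python
import urllib.request

# Hypothetical placeholders: substitute a real radosgw Swift endpoint and token.
ENDPOINT = "http://radosgw.example.com/swift/v1"
TOKEN = "AUTH_tk_placeholder"

# Non-metadata headers that plain Swift stores and returns on HEAD/GET.
# The gateway should only echo these values back; it does not decode the body.
SENT_HEADERS = {
    "Content-Disposition": 'attachment; filename="test.txt"',
    "Content-Encoding": "gzip",
}


def build_put(container, obj, body):
    """Build (without sending) a PUT that carries the headers under test."""
    headers = {"X-Auth-Token": TOKEN, **SENT_HEADERS}
    return urllib.request.Request(
        f"{ENDPOINT}/{container}/{obj}", data=body, headers=headers, method="PUT"
    )


def build_head(container, obj):
    """Build (without sending) a HEAD that reads the stored headers back."""
    return urllib.request.Request(
        f"{ENDPOINT}/{container}/{obj}",
        headers={"X-Auth-Token": TOKEN},
        method="HEAD",
    )


def missing_headers(sent, received):
    """Sent headers whose values were not returned (names compared case-insensitively)."""
    got = {name.lower(): value for name, value in received.items()}
    return [name for name, value in sent.items() if got.get(name.lower()) != value]


def check(container, obj):
    """Run the round trip against a live gateway and print the dropped headers."""
    urllib.request.urlopen(build_put(container, obj, b"hello")).close()
    with urllib.request.urlopen(build_head(container, obj)) as resp:
        print(missing_headers(SENT_HEADERS, dict(resp.headers.items())))
```

Calling `check("test", "obj.txt")` against a live gateway would print the names of the headers it dropped; per AaronSchulz's report, the same round trip against regular Swift should print an empty list.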
[0:11] * BillK (~BillK-OFT@124-148-103-108.dyn.iinet.net.au) has joined #ceph
[0:14] * nwat (~textual@martini.fl-guest.ucar.edu) has joined #ceph
[0:15] * dmsimard (~Adium@108.163.152.2) Quit (Quit: Leaving.)
[0:16] * nwat (~textual@martini.fl-guest.ucar.edu) Quit ()
[0:16] * vata (~vata@2607:fad8:4:6:a891:c01c:3849:bc26) Quit (Quit: Leaving.)
[0:26] * bandrus1 (~Adium@63.192.141.3) has joined #ceph
[0:27] * mattbenjamin1 (~matt@aa2.linuxbox.com) Quit (Quit: Leaving.)
[0:27] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[0:28] * BManojlovic (~steki@217-162-200-111.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[0:28] <PerlStalker> I have a cluster where one of my osds has been down for a few days because of a disk failure.
[0:28] <PerlStalker> I removed the osd in preparation for recreating it.
[0:29] <PerlStalker> Why would the remove force a slew of remaps when the osd has been down for so long.
[0:29] <PerlStalker> ?
[0:32] * nwat (~textual@martini.fl-guest.ucar.edu) has joined #ceph
[0:32] * bandrus (~Adium@63.192.141.3) Quit (Ping timeout: 480 seconds)
[0:34] * nwat (~textual@martini.fl-guest.ucar.edu) Quit ()
[0:35] * nwat (~textual@martini.fl-guest.ucar.edu) has joined #ceph
[0:36] * nwat (~textual@martini.fl-guest.ucar.edu) Quit ()
[0:36] * nwat (~textual@martini.fl-guest.ucar.edu) has joined #ceph
[0:39] * japuzzo_ (~japuzzo@ool-4570886e.dyn.optonline.net) Quit (Quit: Leaving)
[0:39] * AfC (~andrew@2001:388:a098:120:2ad2:44ff:fe08:a4c) has joined #ceph
[0:41] * diegows (~diegows@190.190.17.57) has joined #ceph
[0:44] * nwat (~textual@martini.fl-guest.ucar.edu) Quit (Ping timeout: 480 seconds)
[0:48] * ScOut3R (~scout3r@54009895.dsl.pool.telekom.hu) Quit ()
[0:48] * sagelap (~sage@119.225.29.62) Quit (Read error: Connection reset by peer)
[0:48] * sagelap (~sage@119.225.29.62) has joined #ceph
[0:51] * andreask (~andreask@h081217067008.dyn.cm.kabsi.at) Quit (Ping timeout: 480 seconds)
[0:56] * glambert_ (~glambert@37.157.50.243) Quit (Ping timeout: 480 seconds)
[0:59] * capri_wk (~capri@p579F9ACB.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[0:59] * mattt_ (~textual@cpc25-rdng20-2-0-cust162.15-3.cable.virginm.net) Quit (Quit: Computer has gone to sleep.)
[1:06] * zjf (~zjf@103.31.149.32) has joined #ceph
[1:09] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) has joined #ceph
[1:09] * sarob (~sarob@nat-dip4.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[1:09] * sarob (~sarob@2001:4998:effd:600:299b:a89b:4c57:66d8) has joined #ceph
[1:10] * mattbenjamin (~matt@76-206-42-105.lightspeed.livnmi.sbcglobal.net) has joined #ceph
[1:14] * alram (~alram@38.122.20.226) Quit (Ping timeout: 480 seconds)
[1:15] * mwarwick (~mwarwick@2407:7800:400:1011:3e97:eff:fe91:d9bf) has joined #ceph
[1:16] * AfC (~andrew@2001:388:a098:120:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[1:22] <pmatulis> PerlStalker: b/c it was still considered part of the cluster. that is, it was 'down' but still 'in'
[1:28] * JoeGruher (~JoeGruher@jfdmzpr04-ext.jf.intel.com) has joined #ceph
[1:29] * sagelap (~sage@119.225.29.62) Quit (Ping timeout: 480 seconds)
[1:39] * mwarwick (~mwarwick@2407:7800:400:1011:3e97:eff:fe91:d9bf) has left #ceph
[1:42] * diegows (~diegows@190.190.17.57) Quit (Ping timeout: 480 seconds)
[1:43] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[1:47] * sagelap (~sage@md85636d0.tmodns.net) has joined #ceph
[1:53] * BillK (~BillK-OFT@124-148-103-108.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[1:53] * zerick (~eocrospom@190.187.21.53) Quit (Remote host closed the connection)
[1:57] * ScOut3R (~ScOut3R@54009895.dsl.pool.telekom.hu) has joined #ceph
[1:57] * zjf (~zjf@103.31.149.32) Quit (Remote host closed the connection)
[1:58] * cofol1986 (~xwrj@110.90.119.113) Quit (Read error: Connection reset by peer)
[2:00] * sagelap (~sage@md85636d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[2:01] * bandrus1 (~Adium@63.192.141.3) Quit (Quit: Leaving.)
[2:02] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[2:02] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Remote host closed the connection)
[2:02] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) has joined #ceph
[2:03] * kaizh (~oftc-webi@128-107-239-235.cisco.com) Quit (Remote host closed the connection)
[2:05] * xarses (~andreww@12.164.168.115) Quit (Ping timeout: 480 seconds)
[2:05] * diegows (~diegows@190.190.17.57) has joined #ceph
[2:06] * sarob (~sarob@2001:4998:effd:600:299b:a89b:4c57:66d8) Quit (Remote host closed the connection)
[2:06] * sarob (~sarob@2001:4998:effd:600:299b:a89b:4c57:66d8) has joined #ceph
[2:13] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[2:14] * ScOut3R (~ScOut3R@54009895.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[2:14] * sarob (~sarob@2001:4998:effd:600:299b:a89b:4c57:66d8) Quit (Ping timeout: 480 seconds)
[2:20] * Tamil2 (~Adium@cpe-76-168-18-224.socal.res.rr.com) Quit (Quit: Leaving.)
[2:25] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:26] * Tamil1 (~Adium@cpe-76-168-18-224.socal.res.rr.com) has joined #ceph
[2:26] * rmoe_ (~quassel@12.164.168.115) Quit (Ping timeout: 480 seconds)
[2:34] * dmsimard (~Adium@108.163.152.66) has joined #ceph
[2:35] * rmoe (~quassel@173-228-89-134.dsl.static.sonic.net) has joined #ceph
[2:36] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) has joined #ceph
[2:37] * Tamil1 (~Adium@cpe-76-168-18-224.socal.res.rr.com) Quit (Quit: Leaving.)
[2:38] * LeaChim (~LeaChim@host86-174-30-7.range86-174.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[2:40] * Tamil1 (~Adium@cpe-76-168-18-224.socal.res.rr.com) has joined #ceph
[2:43] * linuxkidd (~linuxkidd@cpe-066-057-019-145.nc.res.rr.com) Quit (Ping timeout: 480 seconds)
[2:45] * shang (~ShangWu@220-135-203-169.HINET-IP.hinet.net) Quit (Ping timeout: 480 seconds)
[2:52] * i_m (~ivan.miro@pool-109-191-69-34.is74.ru) has joined #ceph
[2:58] * diegows (~diegows@190.190.17.57) Quit (Read error: Operation timed out)
[3:13] * Tamil1 (~Adium@cpe-76-168-18-224.socal.res.rr.com) Quit (Quit: Leaving.)
[3:17] * sarob (~sarob@nat-dip32-wl-f.cfw-a-gci.corp.yahoo.com) has joined #ceph
[3:25] * sarob (~sarob@nat-dip32-wl-f.cfw-a-gci.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[3:29] * dmsimard1 (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[3:29] <PerlStalker> pmatulis: So, if I had marked it 'out' when I realized that the disk was hosed, I wouldn't have had this rebuild when I removed the osd.
[3:31] * xmltok (~xmltok@cpe-76-90-130-65.socal.res.rr.com) has joined #ceph
[3:32] * dmsimard1 (~Adium@69-165-206-93.cable.teksavvy.com) Quit ()
[3:33] <pmatulis> PerlStalker: the data would always need to be redistributed if an OSD is removed, maybe i'm missing something, isn't it obvious?
[3:34] * erice (~erice@50.240.86.181) Quit (Read error: Connection reset by peer)
[3:34] * dmsimard (~Adium@108.163.152.66) Quit (Ping timeout: 480 seconds)
[3:35] <PerlStalker> pmatulis: My question comes from the fact that this osd has been down for a while and ceph had already compensated for its loss, returning the cluster to an OK state.
[3:35] * xmltok (~xmltok@cpe-76-90-130-65.socal.res.rr.com) Quit (Remote host closed the connection)
[3:35] <PerlStalker> I figured since the cluster had made it to that point, removing the osd wouldn't require the cluster to recover the data again.
[3:35] * xmltok (~xmltok@cpe-76-90-130-65.socal.res.rr.com) has joined #ceph
[3:35] * xmltok (~xmltok@cpe-76-90-130-65.socal.res.rr.com) Quit (Remote host closed the connection)
[3:36] * xmltok (~xmltok@cpe-76-90-130-65.socal.res.rr.com) has joined #ceph
[3:36] <PerlStalker> s/recover/redistribute/
[3:36] <kraken> PerlStalker meant to say: I figured since the cluster had made it to that point, removing the osd wouldn't require the cluster to redistribute the data again.
[3:38] * xmltok_ (~xmltok@cpe-76-90-130-65.socal.res.rr.com) has joined #ceph
[3:38] * xmltok (~xmltok@cpe-76-90-130-65.socal.res.rr.com) Quit (Read error: Connection reset by peer)
[3:40] * luser0 (~chatzilla@2602:306:38a0:2c79:21d:72ff:fea4:5b69) has joined #ceph
[3:42] * luser0 (~chatzilla@2602:306:38a0:2c79:21d:72ff:fea4:5b69) has left #ceph
[3:42] * haomaiwa_ (~haomaiwan@211.155.113.185) Quit (Ping timeout: 480 seconds)
[3:43] <pmatulis> PerlStalker: b/c it's still part of the 'acting set' of a certain number of PGs, even if it was down
[3:43] * AfC (~andrew@182.255.122.166) has joined #ceph
[3:44] * erice (~erice@50.240.86.181) has joined #ceph
[3:44] <pmatulis> PerlStalker: if an OSD is not available for a while those PGs will be assigned to other OSDs, temporarily
[3:44] * angdraug (~angdraug@12.164.168.115) Quit (Quit: Leaving)
[3:47] * xmltok_ (~xmltok@cpe-76-90-130-65.socal.res.rr.com) Quit (Read error: Operation timed out)
[3:47] <pmatulis> if that's incorrect, hopefully someone here will say something
[3:48] * AfC (~andrew@182.255.122.166) Quit ()
[3:49] * dpippenger (~riven@66-192-9-78.static.twtelecom.net) Quit (Quit: Leaving.)
[3:53] * julian (~julianwa@125.69.104.241) has joined #ceph
[3:54] * julian (~julianwa@125.69.104.241) Quit ()
[3:55] * julian (~julianwa@125.69.104.241) has joined #ceph
[4:01] * shang (~ShangWu@175.41.48.77) has joined #ceph
[4:04] * sarob (~sarob@nat-dip32-wl-f.cfw-a-gci.corp.yahoo.com) has joined #ceph
[4:10] * nhm_ (~nhm@65-128-184-39.mpls.qwest.net) Quit (Quit: Lost terminal)
[4:23] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[4:27] <aarontc> pmatulis: I don't think that will happen until the OSD is set 'out'
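The disagreement between pmatulis and aarontc is about which OSD state drives placement. The toy model below uses rendezvous (highest-random-weight) hashing as a stand-in for CRUSH, which it is not: placement here considers only OSDs marked 'in', so marking an OSD down moves nothing, marking it out remaps the PGs that used it, and deleting an already-out OSD moves nothing further. Real CRUSH is hierarchical, so removing an out OSD from the CRUSH map may still change its host bucket's weight and trigger another round of data movement, which could explain what PerlStalker saw. The OSD ids, PG count, and replica count are illustrative.

```python
import hashlib


def score(pg, osd):
    """Deterministic pseudo-random weight for a (PG, OSD) pair."""
    digest = hashlib.sha256(f"{pg}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(pg, osds, replicas=2):
    """Rendezvous hashing over OSDs marked 'in'; 'up' is deliberately ignored."""
    candidates = [osd for osd, state in osds.items() if state["in"]]
    return sorted(candidates, key=lambda osd: score(pg, osd), reverse=True)[:replicas]


def moved(before, after, pgs):
    """Number of PGs whose OSD list differs between two cluster states."""
    return sum(place(pg, before) != place(pg, after) for pg in pgs)


pgs = range(192)
healthy = {osd: {"up": True, "in": True} for osd in range(4)}
down = {**healthy, 3: {"up": False, "in": True}}   # down but still in
out = {**healthy, 3: {"up": False, "in": False}}   # down and marked out
removed = {osd: s for osd, s in out.items() if osd != 3}

print(moved(healthy, down, pgs))  # 0: down alone moves nothing in this toy
print(moved(down, out, pgs))      # nonzero: marking out remaps PGs that used osd 3
print(moved(out, removed, pgs))   # 0: flat toy, so removal after out moves nothing
```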
[4:28] * haomaiwang (~haomaiwan@117.79.232.250) has joined #ceph
[4:30] * caius (~caius@pool-173-61-103-147.cmdnnj.fios.verizon.net) has joined #ceph
[4:38] * haomaiwang (~haomaiwan@117.79.232.250) Quit (Remote host closed the connection)
[4:40] * haomaiwang (~haomaiwan@101.78.195.61) has joined #ceph
[4:45] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) has joined #ceph
[4:56] * jesus (~jesus@emp048-51.eduroam.uu.se) Quit (Ping timeout: 480 seconds)
[4:56] * Tamil1 (~Adium@cpe-76-168-18-224.socal.res.rr.com) has joined #ceph
[5:00] * haomaiwa_ (~haomaiwan@117.79.232.250) has joined #ceph
[5:02] * BillK (~BillK-OFT@58-7-91-102.dyn.iinet.net.au) has joined #ceph
[5:06] * fireD (~fireD@93-142-206-75.adsl.net.t-com.hr) has joined #ceph
[5:07] * jesus (~jesus@emp048-51.eduroam.uu.se) has joined #ceph
[5:07] * haomaiwang (~haomaiwan@101.78.195.61) Quit (Ping timeout: 480 seconds)
[5:07] * shang (~ShangWu@175.41.48.77) Quit (Read error: Connection reset by peer)
[5:08] * fireD_ (~fireD@93-139-161-91.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:08] * Tamil1 (~Adium@cpe-76-168-18-224.socal.res.rr.com) Quit (Quit: Leaving.)
[5:08] <fcoj> Hi, I'm pretty new to ceph. Is it ok to ask a newbie question here?
[5:09] <fcoj> What does it mean when an osd is in but down?
[5:10] <fcoj> There are no errors or warnings in the logs.
[5:10] * doppelgrau (~doppelgra@pd956d116.dip0.t-ipconnect.de) Quit (Quit: doppelgrau)
[5:22] <dmick> fcoj: the process isn't running, or otherwise uncontactable, but the cluster still thinks it'll be usable again one day
[5:23] <dmick> and so may temporarily replicate its contents, but won't give up on it
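dmick's description matches the monitor's down/out timer: an OSD that stays down but in is eventually marked out automatically, after which its PGs are re-replicated elsewhere. A minimal toy model, assuming the `mon osd down out interval` default of 300 seconds and that the `noout` flag suppresses the automatic out; it is a sketch of the rule, not the monitor's actual code.

```python
# Assumed default for the `mon osd down out interval` option, in seconds.
DOWN_OUT_INTERVAL = 300


def should_auto_out(down_since, now, is_in, noout=False, interval=DOWN_OUT_INTERVAL):
    """Toy rule: mark an OSD out once it has been down (but still 'in')
    for longer than `interval` seconds, unless the noout flag is set."""
    if noout or not is_in:
        # Already out, or the operator asked the cluster not to mark OSDs out.
        return False
    return now - down_since > interval
```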
[5:30] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) has joined #ceph
[5:34] <fcoj> @dmick: Thank you for your answer. The process is running, that much I know.
[5:34] * JC (~JC@71-94-44-243.static.trlk.ca.charter.com) Quit (Quit: Leaving.)
[5:34] <dmick> but shows as down in osd tree?
[5:34] <fcoj> I can start it and stop it from my mon node.
[5:35] <fcoj> yes, it seems to be in state in but down.
[5:35] <fcoj> root@ceph0:/etc/ceph# ceph -c ceph.conf health
[5:35] <fcoj> HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean; 2/2 in osds are down
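Output like fcoj's `ceph health` line can be split mechanically when scripting checks. A hypothetical helper, assuming the `STATUS issue; issue; ...` layout shown above; the wording of individual issues differs between Ceph releases, so the regular expression is only illustrative.

```python
import re


def parse_health(line):
    """Split a `ceph health` summary into its status word and a list of issues."""
    status, _, rest = line.partition(" ")
    issues = [part.strip() for part in rest.split(";") if part.strip()]
    return status, issues


def osds_down(issues):
    """Return (down, in) from an 'N/M in osds are down' issue, or None if absent."""
    for issue in issues:
        m = re.fullmatch(r"(\d+)/(\d+) in osds are down", issue)
        if m:
            return int(m.group(1)), int(m.group(2))
    return None
```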
[5:35] <dmick> how many machines do you expect to be configured, and how many of which daemons are running on them?
[5:36] <fcoj> There are three machines
[5:36] <fcoj> One for mon
[5:36] <fcoj> Two for osd
[5:36] <dmick> what do you mean by "i can start it and stop it from my mon node", then (assuming we're talking about an OSD process)
[5:37] <fcoj> Well, /etc/init.d/ceph -a stop
[5:37] <fcoj> stops all daemons on all three machines.
[5:37] <dmick> oh.
[5:38] <dmick> so somehow the osds can't contact the mon, then. network issues?
[5:38] <dmick> selinux or apparmor?
[5:39] <fcoj> not selinux at least
[5:39] <fcoj> It's a Ubuntu 13.10 install
[5:40] * dmsimard (~Adium@69-165-206-93.cable.teksavvy.com) Quit (Quit: Leaving.)
[5:40] <dmick> anything odd about the net config at all? everyone know each other by shortnames?
[5:41] * Vacum_ (~vovo@i59F792F8.versanet.de) has joined #ceph
[5:42] <fcoj> do you mean on /etc/hosts level in all three machines?
[5:43] <fcoj> BTW I tore down apparmor in all machines, restarted all daemons and it's the same
[5:43] <dmick> /etc/hosts or whatever nameservice you're using, yes
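The two things dmick asks about, name resolution and reachability, can be checked from each OSD node with a short Python sketch. The host name `ceph0` (the mon node's prompt in fcoj's output) and port 6789 (the monitor's default port) are the assumed targets; a failed connect would suggest a network, firewall, or mon-side problem rather than an OSD configuration issue.

```python
import socket


def resolves(name):
    """IPv4 address that `name` resolves to via the system resolver, or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def can_reach(host, port=6789, timeout=3.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On each OSD node, `resolves("ceph0")` should return the mon's address and `can_reach("ceph0")` should return True.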
[5:43] <fcoj> monmap e1: 1 mons at {ceph0=192.168.10.200:6789/0}, election epoch 1, quorum 0 ceph0
[5:44] <fcoj> is this significant? Should not a single entity be able to reach a quorum with itself? :)
[5:48] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[5:48] * Vacum (~vovo@i59F7A87D.versanet.de) Quit (Ping timeout: 480 seconds)
[5:49] <fcoj> @dmick: and yes, all machines know each others short name from hosts
[5:49] <cephalobot> fcoj: Error: "dmick:" is not a valid command.
[5:49] <dmick> I think it's in quorum; I think that's just saying that the quorum consists of mon 0
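Monitors need a strict majority to form a quorum, so a single monitor is a quorum of one, as dmick says. The arithmetic as a small Python sketch:

```python
def quorum_size(num_mons):
    """Smallest strict majority of `num_mons` monitors."""
    return num_mons // 2 + 1


def has_quorum(num_up, num_mons):
    """True if `num_up` reachable monitors out of `num_mons` form a majority."""
    return num_up >= quorum_size(num_mons)
```

With one mon, `quorum_size(1)` is 1; with three, two must agree.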
[5:49] <fcoj> oh
[5:49] <fcoj> ok
[5:49] <dmick> I'd look at logs then, and see why the osds are complaining
[5:50] <fcoj> But the logs seem to indicate that all is fine and dandy
[5:50] * sarob (~sarob@nat-dip32-wl-f.cfw-a-gci.corp.yahoo.com) Quit (Remote host closed the connection)
[5:50] <fcoj> no errors
[5:50] * sarob (~sarob@nat-dip32-wl-f.cfw-a-gci.corp.yahoo.com) has joined #ceph
[5:51] <fcoj> 16 update_osd_stat osd_stat(1057 MB used, 29646 MB avail, 30704 MB total, peers []/[] op hist [])
[5:51] <fcoj> (from logs)
[5:51] <dmick> which log
[5:51] <dmick> that must be the osd log?
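The `update_osd_stat` line fcoj posted can be picked apart as below. The field layout is inferred from that single line, so treat the regular expression as an assumption. The `peers []/[]` field appears to list the OSD's heartbeat peers; empty lists on a two-OSD cluster would be consistent with the OSDs not seeing each other, though that is an interpretation the log does not confirm.

```python
import re

# Layout inferred from one log line; other Ceph versions may print more or fewer fields.
OSD_STAT_RE = re.compile(
    r"osd_stat\((?P<used>\d+) MB used, (?P<avail>\d+) MB avail, "
    r"(?P<total>\d+) MB total, peers \[(?P<hb_in>[^\]]*)\]/\[(?P<hb_out>[^\]]*)\]"
)


def _ids(field):
    """Turn a comma-separated id list such as '1,2' into [1, 2]; '' becomes []."""
    return [int(p) for p in field.split(",") if p.strip()]


def parse_osd_stat(line):
    """Extract space figures (MB) and peer lists from an update_osd_stat line."""
    m = OSD_STAT_RE.search(line)
    if m is None:
        return None
    return {
        "used_mb": int(m["used"]),
        "avail_mb": int(m["avail"]),
        "total_mb": int(m["total"]),
        "peers_in": _ids(m["hb_in"]),
        "peers_out": _ids(m["hb_out"]),
    }
```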
[6:12] * Disconnected.
[6:12] -kinetic.oftc.net- *** Looking up your hostname...
[6:12] -kinetic.oftc.net- *** Checking Ident
[6:12] -kinetic.oftc.net- *** Couldn't look up your hostname
[6:13] -kinetic.oftc.net- *** No Ident response

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.