#ceph IRC Log


IRC Log for 2013-10-19

Timestamps are in GMT/BST.

[0:00] <nhm> trying to get back to the terminal
[0:01] <infernix> try close and open in another tab
[0:01] <infernix> i restarted that thing, it complains about missing session
[0:01] <nhm> ah, there we go
[0:01] <nhm> wow, look at all of that!
[0:01] <nhm> beautiful
[0:02] <nhm> it's got EVERYTHING
[0:02] <nhm> so nice
[0:03] <nhm> how about: perf report --sort symbol --call-graph
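
A minimal sketch of the workflow nhm is pointing at, assuming perf is profiling a running ceph-osd process; the pid and the 30-second window are placeholders, not taken from the log:

    # record with call graphs for ~30s against the target process (pid is a placeholder)
    perf record -g -p <pid> -- sleep 30
    # browse the result sorted by symbol, with call graphs expanded
    perf report --sort symbol --call-graph
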
[0:05] * yehudasa (~yehudasa@2607:f298:a:607:ea03:9aff:fe98:e8ff) Quit (Read error: Connection reset by peer)
[0:09] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Remote host closed the connection)
[0:14] * sagelap (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[0:20] * dmsimard (~Adium@2607:f748:9:1666:b560:a1e6:6767:52ea) Quit (Quit: Leaving.)
[0:24] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[0:25] * AfC (~andrew@jim1020952.lnk.telstra.net) has joined #ceph
[0:27] * papamoose (~kauffman@hester.cs.uchicago.edu) Quit (Quit: Leaving.)
[0:30] * dosaboy (~dosaboy@host109-157-182-235.range109-157.btcentralplus.com) Quit (Quit: leaving)
[0:31] * mtanski (~mtanski@69.193.178.202) has joined #ceph
[0:32] <nhm> infernix: when you get back, can you email me callgraph_for_mark.tgz? mark.nelson@inktank.com
[0:32] <nhm> Thanks! :)
[0:32] <infernix> nhm, you can scp it somewhere if you like
[0:34] <cjh_> ceph: i'm having a problem in dumpling on ubuntu 13.04 where the process just core dumps when i start it up
[0:35] * mtanski (~mtanski@69.193.178.202) Quit (Read error: Operation timed out)
[0:35] <cjh_> this is what i see: ceph osd tree*** Error in `/usr/bin/python': realloc(): invalid next size: 0x00000000015c2070 ***
[0:35] * onizo (~onizo@wsip-98-175-245-242.sd.sd.cox.net) Quit (Remote host closed the connection)
[0:36] <nhm> infernix: hrm, they probably wouldn't like it if I scped from there. Can we stick it on the web?
[0:37] * dosaboy (~dosaboy@65.93.189.91.lcy-01.canonistack.canonical.com) has joined #ceph
[0:37] <infernix> nhm: sent :)
[0:38] * davidzlap (~Adium@38.122.20.226) Quit (Ping timeout: 480 seconds)
[0:40] <cjh_> anyone else experiencing this error?
[0:40] * gregmark (~Adium@68.87.42.115) Quit (Quit: Leaving.)
[0:41] <cjh_> actually, lemme update. starting up ceph crashes my machine completely and it reboots :(
[0:45] * erice_ (~erice@50.240.86.181) Quit (Ping timeout: 480 seconds)
[0:54] <cjh_> i suspect i installed a bad ram stick. it started crashing after that. that might explain the invalid next size also
[0:55] * KevinPerks1 (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[0:57] * Guest2182 (~coyo@thinks.outside.theb0x.org) Quit (Quit: om nom nom delicious bitcoins...)
[0:58] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) has joined #ceph
[1:01] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) Quit (Remote host closed the connection)
[1:01] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) has joined #ceph
[1:04] * danieagle (~Daniel@186.214.54.30) has joined #ceph
[1:05] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) has joined #ceph
[1:10] * vata (~vata@2607:fad8:4:6:4da0:eb94:ac70:4302) Quit (Quit: Leaving.)
[1:11] * LeaChim (~LeaChim@host86-174-76-26.range86-174.btcentralplus.com) Quit (Remote host closed the connection)
[1:12] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Remote host closed the connection)
[1:13] * davidzlap (~Adium@76.173.16.173) has joined #ceph
[1:14] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) has joined #ceph
[1:17] * alexxy (~alexxy@2001:470:1f14:106::2) has joined #ceph
[1:19] <n1md4> wido: Hi. Sorry to bother you, but I've been working out a problem, and you answered questions in the email thread that had the same problem as me. My question: I'd like to use libvirt with rbd. I have the secret.xml file, but what is the UUID that I should use in that file?
[1:20] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) has joined #ceph
[1:20] * dxd828 (~dxd828@host-2-97-72-213.as13285.net) Quit ()
[1:23] <cjh_> how do I kick off a backfill? My ceph cluster seems to just be sitting there with pg's degraded
[1:27] <n1md4> wido: Sorry to bother you again, but I've worked it out now. Thanks.
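
One common way the libvirt/rbd secret UUID gets wired up, roughly per the upstream libvirt integration docs; the client name client.libvirt and the UUID are placeholders, not anything from this conversation:

    # secret.xml - the uuid can be generated with uuidgen, or omitted and libvirt assigns one
    cat > secret.xml <<'EOF'
    <secret ephemeral='no' private='no'>
      <usage type='ceph'>
        <name>client.libvirt secret</name>
      </usage>
    </secret>
    EOF
    virsh secret-define --file secret.xml    # prints the UUID to reference in the disk XML
    virsh secret-set-value --secret <uuid> --base64 "$(ceph auth get-key client.libvirt)"
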
[1:31] * ircolle (~Adium@c-67-172-132-222.hsd1.co.comcast.net) Quit (Quit: Leaving.)
[1:34] * nwat (~nwat@eduroam-251-62.ucsc.edu) Quit (Ping timeout: 480 seconds)
[1:43] * xmltok (~xmltok@cpe-76-170-26-114.socal.res.rr.com) Quit (Quit: Bye!)
[1:45] * The_Bishop (~bishop@g229217042.adsl.alicedsl.de) has joined #ceph
[1:50] * Gamekiller77 (~oftc-webi@128-107-239-235.cisco.com) Quit (Remote host closed the connection)
[1:54] <cjh_> ceph, anyone around still?
[2:01] * AfC (~andrew@jim1020952.lnk.telstra.net) Quit (Quit: Leaving.)
[2:03] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[2:06] * a_ (~a@209.12.169.218) Quit (Quit: This computer has gone to sleep)
[2:09] * danieagle (~Daniel@186.214.54.30) Quit (Quit: inte+ e Obrigado Por tudo mesmo! :-D)
[2:11] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[2:16] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[2:18] <davidzlap> cjh_: It should be trying to recover itself, if it is possible. Maybe there is data which is unavailable because OSDs are down.
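
A few commands commonly used to see why PGs stay degraded, as a hedged sketch (nothing here is taken from cjh_'s actual cluster):

    ceph health detail            # lists stuck PGs and the OSDs involved
    ceph osd tree                 # confirm every OSD is up and in
    ceph pg dump_stuck unclean    # show PGs stuck unclean and their acting sets
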
[2:23] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[2:25] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) Quit (Remote host closed the connection)
[2:25] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) has joined #ceph
[2:32] * Pedras (~Adium@216.207.42.132) Quit (Ping timeout: 480 seconds)
[2:33] * sarob (~sarob@ip-64-134-228-129.public.wayport.net) Quit (Ping timeout: 480 seconds)
[2:43] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) Quit (Quit: Ja odoh a vi sta 'ocete...)
[2:53] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[2:59] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[3:02] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[3:10] * gucki (~smuxi@p549F966C.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[3:12] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:16] * sagelap1 (~sage@2600:1012:b008:8ba2:18f7:88d2:cd17:da33) has joined #ceph
[3:18] * shimo (~A13032@122x212x216x66.ap122.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[3:21] * bitblt (~don@wsip-70-166-101-169.ph.ph.cox.net) has joined #ceph
[3:21] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[3:21] * sagelap (~sage@2607:f298:a:607:ea03:9aff:febc:4c23) Quit (Ping timeout: 480 seconds)
[3:22] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[3:23] <bitblt> hi, if anyone has a moment, maybe you could help me with ceph-rest-api. I keep getting a rados_initialize failure. I added a user called restapi, added perms for mon = r.
[3:24] <bitblt> I launch it as 'ceph-rest-api -c /etc/ceph/ceph.conf -n restapi
[3:25] <bitblt> the errors seem to flip-flop between the error code: -22 and "rados.ObjectNotFound: error calling connect" which I figure is from using non-existent args like a bad username?
[3:27] * angdraug (~angdraug@64-79-127-122.static.wiline.com) Quit (Quit: Leaving)
[3:32] <dmick> "added a user" meaning "added a key/keyring for client.restapi"?
[3:33] <dmick> (-n restapi is the default)
[3:33] <dmick> (as is /etc/ceph/ceph.conf)
[3:33] <dmick> you should see errors in the monitor log, but it's probably bad keys/keyrings
[3:35] * xarses (~andreww@64-79-127-122.static.wiline.com) Quit (Ping timeout: 480 seconds)
[3:36] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[3:37] <bitblt> i created a new keyring for restapi, and gave it read permissions for mon
[3:37] <bitblt> nothing in the mon log though..
[3:37] <dmick> so, a keyring is just a file; if you made a new one that has to be mentioned in ceph.conf
[3:37] <dmick> if you mean you made a new key in an existing keyring file, that step isn't necessary
[3:37] <dmick> but you also need to add the key to the ceph cluster
[3:38] <dmick> with ceph auth
[3:38] <dmick> i.e., the key (perhaps from a keyring file) is what the client gets out of its pocket
[3:38] <dmick> the ceph auth is what installs the lock
[3:39] <bitblt> i created the keyring as /etc/ceph/client.restapi, then added client.restapi using ceph auth to have read perms on mon
[3:39] <bitblt> you're saying i should cat the restapi keyring into ceph.conf then?
[3:39] <dmick> no
[3:40] <dmick> let me see what the default keyring paths are
[3:40] <dmick> /etc/ceph/ceph.client.restapi.keyring might work automatically
[3:41] <bitblt> ok I'll try renaming
[3:41] <dmick> otherwise you'd need a [client.restapi] section in ceph.conf with "keyring = /etc/ceph/client.restapi"
[3:41] <bitblt> ah
[3:41] <dmick> so it knows where to find the keyring
[3:41] <dmick> the paths are, by default
[3:41] <dmick> /etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin
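
A sketch of the setup dmick is describing, assuming the restapi user gets read-only caps (the exact capability strings are an assumption based on this conversation):

    # create the key in the cluster and write it straight to a default keyring path
    ceph auth get-or-create client.restapi mon 'allow r' osd 'allow r' \
        -o /etc/ceph/ceph.client.restapi.keyring
    ceph-rest-api -n client.restapi
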
[3:42] * sagelap1 (~sage@2600:1012:b008:8ba2:18f7:88d2:cd17:da33) Quit (Quit: Leaving.)
[3:42] <bitblt> ok so this gets me to a different error, but looks closer now
[3:42] <bitblt> can't get osd dump output
[3:42] <bitblt> oh wait..duh, didn't give osd perms
[3:44] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[3:44] <bitblt> I wonder why it doesn't work with the admin user
[3:45] <dmick> why what doesn't work with the admin user how?
[3:46] <bitblt> oh I had tried it with -n admin and it failed too
[3:46] <dmick> that should work
[3:46] <bitblt> but now it works, without -n, and without touching the admin key
[3:46] <bitblt> oh wait, because the default is restapi
[3:46] <bitblt> ok so restapi works then
[3:47] * xarses (~andreww@c-71-202-167-197.hsd1.ca.comcast.net) has joined #ceph
[3:48] * aliguori (~anthony@74.202.210.82) Quit (Remote host closed the connection)
[3:49] <dmick> ah, apologies
[3:49] <dmick> -n needs to be "client.admin" or "client.restapi"
[3:49] <dmick> -i is just the post-dot part
[3:49] <dmick> my bad
[3:50] <dmick> as many times as I've typed "name = type.id", you'd think by now I'd remember
[3:50] <dmick> so either -i admin or -n client.admin should also work
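
The two equivalent invocations dmick describes, side by side:

    ceph-rest-api -n client.restapi   # -n takes the full name, type.id
    ceph-rest-api -i restapi          # -i takes only the id; the client. type is implied
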
[3:52] <bitblt> ok yes that does work
[3:52] <dmick> good, truth still runs the universe :)
[3:52] <bitblt> confusing the -n -i bit
[3:52] <bitblt> heh
[3:53] <bitblt> all is not lost
[3:53] <bitblt> thanks for the help
[3:53] <dmick> np
[3:55] * diegows (~diegows@190.190.11.42) has joined #ceph
[3:55] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[3:57] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Read error: Operation timed out)
[4:07] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) Quit (Ping timeout: 480 seconds)
[4:14] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[4:17] <BillK> is there a limit on the number of rbd snapshots you can have?
[4:20] <dmick> BillK: no hard limit I'm aware of
[4:25] <BillK> tkx, want to do hourly snaps and trying to figure out how many I can keep around - having random problems where rolling back might be a quick fix while I investigate.
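
A hedged sketch of the rbd snapshot commands involved in an hourly scheme; pool, image, and snapshot names are placeholders:

    rbd snap create rbd/myimage@hourly-2013-10-19-0400     # take the hourly snapshot
    rbd snap ls rbd/myimage                                # see how many are being kept
    rbd snap rollback rbd/myimage@hourly-2013-10-19-0400   # roll the image back if needed
    rbd snap rm rbd/myimage@hourly-2013-10-19-0300         # expire old snapshots
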
[4:36] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[4:38] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:42] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) has joined #ceph
[4:45] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[4:50] * a (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[4:51] * a is now known as Guest2811
[4:54] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) has joined #ceph
[4:58] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) Quit (Quit: smiley)
[5:00] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:01] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:04] * bitblt (~don@wsip-70-166-101-169.ph.ph.cox.net) Quit (Ping timeout: 480 seconds)
[5:06] * fireD (~fireD@93-142-212-14.adsl.net.t-com.hr) has joined #ceph
[5:07] * fireD_ (~fireD@93-139-148-230.adsl.net.t-com.hr) Quit (Ping timeout: 480 seconds)
[5:29] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[5:33] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[5:41] * AfC (~andrew@2001:44b8:31cb:d400:2ad2:44ff:fe08:a4c) Quit (Quit: Leaving.)
[5:43] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:43] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[5:49] * The_Bishop_ (~bishop@f055193005.adsl.alicedsl.de) has joined #ceph
[5:55] * The_Bishop (~bishop@g229217042.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[5:58] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[5:59] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) Quit (Quit: Leaving.)
[6:00] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[6:00] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) has joined #ceph
[6:00] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[6:06] * n1md4 (~nimda@anion.cinosure.com) Quit (Ping timeout: 480 seconds)
[6:11] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) Quit (Quit: Bye...)
[6:34] * \ask (~ask@oz.develooper.com) Quit (Quit: Bye)
[6:39] * dmick (~dmick@2607:f298:a:607:d44d:e3e0:6173:2a4f) has left #ceph
[6:54] * \ask (~ask@oz.develooper.com) has joined #ceph
[6:58] * MACscr (~Adium@c-98-214-103-147.hsd1.il.comcast.net) has left #ceph
[7:07] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:08] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[7:12] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[7:23] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[7:33] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:59] * gregsfortytwo1 (~Adium@cpe-172-250-69-138.socal.res.rr.com) Quit (Quit: Leaving.)
[8:02] * codice (~toodles@71-80-186-21.dhcp.lnbh.ca.charter.com) Quit (Quit: leaving)
[8:03] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) Quit (Ping timeout: 480 seconds)
[8:04] * codice (~toodles@71-80-186-21.dhcp.lnbh.ca.charter.com) has joined #ceph
[8:04] * codice (~toodles@71-80-186-21.dhcp.lnbh.ca.charter.com) Quit ()
[8:05] * codice (~toodles@71-80-186-21.dhcp.lnbh.ca.charter.com) has joined #ceph
[8:07] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[8:09] * sleinen1 (~Adium@2001:620:0:26:18bc:df1f:c9d:cb3b) has joined #ceph
[8:10] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[8:11] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[8:15] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[8:19] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[8:19] * sleinen1 (~Adium@2001:620:0:26:18bc:df1f:c9d:cb3b) Quit (Ping timeout: 480 seconds)
[8:27] * aarontc (~aaron@static-50-126-79-226.hlbo.or.frontiernet.net) has joined #ceph
[8:55] * haomaiwang (~haomaiwan@117.79.232.233) Quit (Remote host closed the connection)
[8:56] * haomaiwang (~haomaiwan@211.155.113.208) has joined #ceph
[9:01] * foosinn (~stefan@office.unitedcolo.de) has joined #ceph
[9:11] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[9:23] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[9:36] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[9:38] * sleinen1 (~Adium@2001:620:0:25:b197:2527:ab28:4ae6) has joined #ceph
[9:38] * wenjianhn (~wenjianhn@114.245.46.123) has joined #ceph
[9:44] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[10:03] * sleinen1 (~Adium@2001:620:0:25:b197:2527:ab28:4ae6) Quit (Quit: Leaving.)
[10:16] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[10:24] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[10:40] * themgt (~themgt@201-223-223-113.baf.movistar.cl) Quit (Ping timeout: 480 seconds)
[10:43] * themgt (~themgt@201-223-225-108.baf.movistar.cl) has joined #ceph
[10:43] * BManojlovic (~steki@fo-d-130.180.254.37.targo.rs) has joined #ceph
[11:16] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[11:16] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[11:23] * madkiss (~madkiss@089144203046.atnat0012.highway.a1.net) has joined #ceph
[11:25] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[11:45] * gucki (~smuxi@p549FB54E.dip0.t-ipconnect.de) has joined #ceph
[12:17] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[12:25] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[12:29] * madkiss (~madkiss@089144203046.atnat0012.highway.a1.net) Quit (Ping timeout: 480 seconds)
[12:37] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Remote host closed the connection)
[12:38] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) has joined #ceph
[12:46] * yguang11 (~yguang11@vpn-nat.peking.corp.yahoo.com) Quit (Ping timeout: 480 seconds)
[12:47] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[13:18] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[13:26] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[13:30] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[13:40] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[13:41] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[13:53] * n1md4 (~nimda@anion.cinosure.com) has joined #ceph
[13:55] * wenjianhn (~wenjianhn@114.245.46.123) Quit (Ping timeout: 480 seconds)
[14:11] * glzhao (~glzhao@118.195.65.67) has joined #ceph
[14:15] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[14:21] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[14:48] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[14:52] * diegows (~diegows@190.190.11.42) has joined #ceph
[14:56] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[15:06] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[15:07] * swinchen_ (~swinchen@samuel-winchenbach.ums.maine.edu) has joined #ceph
[15:09] * jnq (~jon@0001b7cc.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:09] * swinchen (~swinchen@samuel-winchenbach.ums.maine.edu) Quit (Ping timeout: 480 seconds)
[15:11] * KindOne (~KindOne@0001a7db.user.oftc.net) Quit (Ping timeout: 480 seconds)
[15:12] * KindOne (KindOne@0001a7db.user.oftc.net) has joined #ceph
[15:12] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[15:16] * JustEra (~JustEra@ALille-253-1-60-17.w90-7.abo.wanadoo.fr) has joined #ceph
[15:18] * yanzheng (~zhyan@134.134.137.75) has joined #ceph
[15:22] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) Quit (Remote host closed the connection)
[15:27] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) has joined #ceph
[15:28] * sleinen1 (~Adium@2001:620:0:26:a5eb:3cb1:71bf:f0a8) has joined #ceph
[15:35] * sleinen (~Adium@77-58-245-10.dclient.hispeed.ch) Quit (Ping timeout: 480 seconds)
[15:49] * sarob (~sarob@2601:9:7080:13a:992e:17f5:4672:e3c3) has joined #ceph
[15:50] * yanzheng (~zhyan@134.134.137.75) Quit (Remote host closed the connection)
[15:52] * JustEra (~JustEra@ALille-253-1-60-17.w90-7.abo.wanadoo.fr) Quit (Quit: This computer has gone to sleep)
[15:53] * jnq (~jon@gruidae.jonquinn.com) has joined #ceph
[15:57] * sarob (~sarob@2601:9:7080:13a:992e:17f5:4672:e3c3) Quit (Ping timeout: 480 seconds)
[16:03] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[16:08] * davidzlap1 (~Adium@76.173.16.173) has joined #ceph
[16:08] * davidzlap (~Adium@76.173.16.173) Quit (Read error: Connection reset by peer)
[16:12] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[16:13] * wenjianhn (~wenjianhn@114.245.46.123) has joined #ceph
[16:14] * bandrus (~Adium@c-98-238-148-252.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[16:14] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[16:15] * yanzheng (~zhyan@101.83.170.229) has joined #ceph
[16:23] * Coyo (~coyo@209.148.95.237) has joined #ceph
[16:23] * Coyo is now known as Guest2855
[16:31] <ksingh> do you know why this command execution is taking so much time
[16:31] <ksingh> rbd create san1 --size 4096 -m 192.168.56.100 -k /etc/ceph/ceph.client.admin.keyring
[16:31] <ksingh> this command and other commands are taking a very long time
[16:31] <ksingh> and not giving any output or any error message
[16:31] <mikedawson> ksingh: on a cluster operating properly it should be quick
[16:32] <ksingh> mikedawson : i think you caught the point
[16:32] <ksingh> my cluster seems to be sick
[16:32] <ksingh> health HEALTH_WARN 192 pgs degraded; 192 pgs stale; 192 pgs stuck stale; 192 pgs stuck unclean
[16:33] <ksingh> how can i make it healthy
[16:33] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) has joined #ceph
[16:33] <mikedawson> ksingh: do 'ceph osd tree' to see if all the osd processes are up. Get them all running if possible, then watch 'ceph -w' to see if it is making progress.
[16:34] <mikedawson> ksingh: I should have said up and in (both states matter)
[16:35] * madkiss (~madkiss@089144203046.atnat0012.highway.a1.net) has joined #ceph
[16:35] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[16:37] <ksingh> [root@ceph-client ceph]# ceph status
[16:37] <ksingh> cluster 112280c3-ac20-4a08-bed2-a4a0802ac0de
[16:37] <ksingh> health HEALTH_WARN 192 pgs degraded; 192 pgs stale; 192 pgs stuck stale; 192 pgs stuck unclean
[16:37] <ksingh> monmap e1: 1 mons at {ceph-client=192.168.56.100:6789/0}, election epoch 1, quorum 0 ceph-client
[16:37] <ksingh> osdmap e137: 2 osds: 2 up, 2 in
[16:37] <ksingh> pgmap v206: 192 pgs: 128 stale+active+degraded, 64 stale+active+replay+degraded; 0 bytes data, 2117 MB used, 14240 MB / 16358 MB avail
[16:37] <ksingh> mdsmap e3: 1/1/1 up {0=ceph-client=up:creating}
[16:37] <ksingh> [root@ceph-client ceph]#
[16:37] <ksingh> sorry for flooding channel
[16:38] * jcsp (~john@82-71-55-202.dsl.in-addr.zen.co.uk) Quit (Read error: No route to host)
[16:40] <mikedawson> ksingh: do you have a single server hosting OSDs? The default crush rules will not put two (or more) replicas on the same host
[16:42] <mikedawson> ksingh: the default is "step chooseleaf firstn 0 type host". For a single host setup, you would need "step chooseleaf firstn 0 type osd". See http://ceph.com/docs/master/rados/operations/crush-map/
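
For reference, the edited rule stanza in the decompiled crushmap would look roughly like this on a single-host test cluster (a sketch; rule names and ids come from whatever the decompiled map already contains):

    rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type osd    # was: type host
        step emit
    }
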
[16:42] * sarob (~sarob@c-50-161-65-119.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[16:45] <ksingh> mikedawson: you are correct, i am using 1 server for ceph-deploy, monitor, 2 OSDs, MDS as well as client
[16:45] <ksingh> i know this is not a good way but need to just test ceph
[16:45] <ksingh> once this is OK , i will get it deployed to new hardware
[16:46] <mikedawson> ksingh: then change the crushmap to use "step chooseleaf firstn 0 type osd", and don't run any pools with more than two replicas (the total osd count you have). That should allow your cluster to replicate properly and go from HEALTH_WARN to HEALTH_OK
[16:55] <ksingh> Thanks mikedawson for your guidance
[16:55] <ksingh> can you help me with how to change the crushmap
[16:55] <ksingh> is there any configuration file for this, or any command?
[16:56] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[17:05] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[17:24] * jluis (~JL@118.82.136.95.rev.vodafone.pt) has joined #ceph
[17:24] * ChanServ sets mode +o jluis
[17:26] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) Quit (Quit: Leaving.)
[17:26] * MK_FG (~MK_FG@00018720.user.oftc.net) Quit (Remote host closed the connection)
[17:27] * MK_FG (~MK_FG@00018720.user.oftc.net) has joined #ceph
[17:29] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[17:30] * BillK (~BillK-OFT@58-7-67-236.dyn.iinet.net.au) Quit (Ping timeout: 480 seconds)
[17:35] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[17:36] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[17:38] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[17:39] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Read error: Operation timed out)
[17:42] <foosinn> anyone here who is using rbd incremental backup? http://ceph.com/docs/master/dev/rbd-diff/
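
The rbd-diff format linked above is what rbd export-diff / import-diff produce and consume; a hedged sketch of the usual incremental loop, with image and snapshot names as placeholders:

    rbd snap create rbd/vm1@backup2
    rbd export-diff --from-snap backup1 rbd/vm1@backup2 vm1-backup1-to-backup2.diff
    # replay the delta onto the backup copy of the image
    rbd import-diff vm1-backup1-to-backup2.diff backup/vm1
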
[17:44] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[17:53] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) Quit (Quit: Leaving.)
[18:05] * erice (~erice@71-208-244-175.hlrn.qwest.net) has joined #ceph
[18:22] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Ping timeout: 480 seconds)
[18:26] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[18:27] * yanzheng (~zhyan@101.83.170.229) Quit (Ping timeout: 480 seconds)
[18:33] * erice (~erice@71-208-244-175.hlrn.qwest.net) Quit (Ping timeout: 480 seconds)
[18:36] * erice (~erice@71-208-244-175.hlrn.qwest.net) has joined #ceph
[18:36] * DarkAceZ (~BillyMays@50.107.53.200) Quit (Ping timeout: 480 seconds)
[18:37] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[18:43] * foosinn (~stefan@office.unitedcolo.de) Quit (Quit: Leaving)
[18:45] <tchmnkyz> is there a max setting for max_backfill/max_recovery that i should not go over?
[18:45] <tchmnkyz> like i need to up both to speed up some changes i have to make
[18:45] <tchmnkyz> i have 12 hours to add 475 OSDs
[18:46] <tchmnkyz> also in that time i need to remove 12 HUGE OSDs
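
The knobs usually meant here are osd_max_backfills and osd_recovery_max_active; a hedged sketch of bumping them at runtime (the values are illustrative, not recommendations):

    ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'
    # persist under [osd] in ceph.conf if the change should survive restarts
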
[18:52] * erice (~erice@71-208-244-175.hlrn.qwest.net) Quit (Ping timeout: 480 seconds)
[18:54] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[19:00] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[19:02] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) Quit (Ping timeout: 480 seconds)
[19:04] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[19:07] * glzhao (~glzhao@118.195.65.67) Quit (Quit: leaving)
[19:08] <tchmnkyz> hey KevinPerks you alive?
[19:09] * diegows (~diegows@190.190.11.42) has joined #ceph
[19:10] * DarkAceZ (~BillyMays@50.107.53.200) has joined #ceph
[19:10] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[19:10] <tchmnkyz> alexbligh: you?
[19:11] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[19:11] * ChanServ sets mode +o scuttlemonkey
[19:12] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) Quit (Ping timeout: 480 seconds)
[19:25] * sjustlaptop (~sam@24-205-35-233.dhcp.gldl.ca.charter.com) has joined #ceph
[19:28] * nwat (~nwat@c-50-131-197-174.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[19:48] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[19:56] * madkiss (~madkiss@089144203046.atnat0012.highway.a1.net) Quit (Quit: Leaving.)
[19:59] * themgt (~themgt@201-223-225-108.baf.movistar.cl) Quit (Quit: themgt)
[20:04] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[20:07] * The_Bishop__ (~bishop@g230095184.adsl.alicedsl.de) has joined #ceph
[20:07] <ksingh> hello guys
[20:07] <ksingh> can you help me with how to change the crushmap
[20:07] <ksingh> is there any configuration file for this, or any command?
[20:10] * The_Bishop_ (~bishop@f055193005.adsl.alicedsl.de) Quit (Ping timeout: 480 seconds)
[20:13] * themgt (~themgt@201-223-225-108.baf.movistar.cl) has joined #ceph
[20:26] * angdraug (~angdraug@c-67-169-181-128.hsd1.ca.comcast.net) has joined #ceph
[20:33] <tchmnkyz> what do you want to change on it
[20:33] <tchmnkyz> there are a few ways to change the crushmap
[20:33] <tchmnkyz> small changes to like weight of a OSD and such can be made using the ceph command
[20:33] <tchmnkyz> or you can dump the crushmap and make major changes
[20:34] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) has joined #ceph
[20:39] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) Quit (Read error: Connection reset by peer)
[20:41] * themgt (~themgt@201-223-225-108.baf.movistar.cl) Quit (Quit: themgt)
[20:55] * mozg (~andrei@host86-184-120-113.range86-184.btcentralplus.com) has joined #ceph
[21:20] * smiley (~smiley@pool-108-28-107-254.washdc.fios.verizon.net) has joined #ceph
[21:21] * erice (~erice@71-208-244-175.hlrn.qwest.net) has joined #ceph
[21:22] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) Quit (Ping timeout: 480 seconds)
[21:27] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) has joined #ceph
[21:33] * madkiss (~madkiss@089144203046.atnat0012.highway.a1.net) has joined #ceph
[21:41] * madkiss (~madkiss@089144203046.atnat0012.highway.a1.net) Quit (Ping timeout: 480 seconds)
[21:42] * lofejndif (~lsqavnbok@manning2.torservers.net) has joined #ceph
[21:43] * lofejndif (~lsqavnbok@83TAADD5Z.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[21:43] * lofejndif (~lsqavnbok@sipb-tor.mit.edu) has joined #ceph
[22:05] * themgt (~themgt@190.54.84.24) has joined #ceph
[22:06] * ksingh (~Adium@91-157-122-80.elisa-laajakaista.fi) has joined #ceph
[22:07] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[22:09] <ksingh> hello
[22:09] <ksingh> my cluster status is not healthy
[22:09] <ksingh> [root@ceph-client ceph]# ceph status
[22:09] <ksingh> cluster 112280c3-ac20-4a08-bed2-a4a0802ac0de
[22:09] <ksingh> health HEALTH_WARN 192 pgs degraded; 192 pgs stale; 192 pgs stuck stale; 192 pgs stuck unclean; 5 requests are blocked > 32 sec; mds cluster is degraded
[22:09] <ksingh> monmap e1: 1 mons at {ceph-client=192.168.56.100:6789/0}, election epoch 1, quorum 0 ceph-client
[22:09] <ksingh> osdmap e145: 2 osds: 2 up, 2 in
[22:09] <ksingh> pgmap v222: 192 pgs: 128 stale+active+degraded, 64 stale+active+replay+degraded; 0 bytes data, 2118 MB used, 14240 MB / 16358 MB avail
[22:09] <ksingh> mdsmap e5: 1/1/1 up {0=ceph-client=up:replay}
[22:09] <ksingh> [root@ceph-client ceph]#
[22:09] <ksingh> can you suggest how to make it healthy
[22:12] * nhm (~nhm@184-97-129-163.mpls.qwest.net) Quit (Quit: Lost terminal)
[22:14] * gucki (~smuxi@p549FB54E.dip0.t-ipconnect.de) Quit (Ping timeout: 480 seconds)
[22:15] * diegows (~diegows@190.190.11.42) Quit (Ping timeout: 480 seconds)
[22:15] <mozg> ksingh, hello
[22:15] <mozg> does the state change over time?
[22:15] <ksingh> hello mozg
[22:15] <ksingh> nope, i built a 1-node cluster; this node is my ceph-deploy, mon, OSD, and client
[22:16] <ksingh> i am just testing ceph , once its UP here i will move to several other physical nodes
[22:16] <mozg> so, this is just one server, right?
[22:16] <ksingh> yes
[22:16] <mozg> with 2 disks
[22:16] <ksingh> yes
[22:16] <mozg> try something. not sure if it will help
[22:17] <ksingh> like what
[22:17] <mozg> sorry, before that
[22:17] <mozg> have you tried restarting your server?
[22:17] <mozg> if you've tried it already
[22:18] <mozg> try to switch off a single osd and check what status it will show
[22:18] <mozg> and switch it back on
[22:18] <ksingh> yes several times
[22:18] <mozg> what os are you running?
[22:18] <ksingh> centos 6.4
[22:19] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[22:19] <mozg> not sure how the scripts work in centos
[22:20] <mozg> in ubuntu i would do the following
[22:20] <mozg> stop ceph-osd id=0
[22:20] <mozg> this will stop the osd.0
[22:20] <mozg> in centos there got to be something similar
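
On CentOS 6 ceph ships a sysvinit script, so the per-daemon equivalent is likely (a sketch, assuming the stock init script):

    service ceph stop osd.0      # CentOS / sysvinit
    service ceph start osd.0
    # Ubuntu upstart form from the chat: stop ceph-osd id=0
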
[22:20] <ksingh> yes i know how to stop the osd service for a particular OSD, i will try that
[22:20] <mozg> okay
[22:20] <mozg> stop it
[22:21] <ksingh> well do you know
[22:21] <mozg> and check what does your status show
[22:21] <mozg> does the number of pgs change or not
[22:21] <ksingh> in the output of ceph osd dump
[22:21] <ksingh> there is a string like max_osd 7
[22:21] <ksingh> do you know what this is
[22:21] <mozg> no idea, sorry man
[22:21] <mozg> i am not an expert
[22:21] <mozg> i have a ceph cluster
[22:22] <mozg> and did have several issues, so the guys here helped me to troubleshoot
[22:22] <ksingh> you have a running ceph cluster , thats great MAN
[22:22] <mozg> so, i know a few bits and bobs
[22:22] <mozg> i had an issue once
[22:22] <ksingh> its always learning by doing
[22:22] <mozg> where my health status was showing warning
[22:22] <ksingh> anyway i will stop 1 osd and check health
[22:22] <mozg> and several pgs were not clean+active
[22:22] <mozg> and that stayed like that for hours
[22:23] <mozg> so, i've switched off one of the osds
[22:23] <mozg> or several, I do not remember exactly
[22:23] <mozg> and switched it back on in a few minutes
[22:23] <mozg> and ceph automatically fixed it
[22:23] <mozg> if this doesn't work for you
[22:23] * danieagle (~Daniel@186.214.54.30) has joined #ceph
[22:23] <mozg> there is a troubleshooting guide for pgs
[22:23] * mikedawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[22:23] <mozg> let me dig the link
[22:23] <mozg> mikedawson, hello
[22:23] <mozg> how's it going?
[22:24] <mikedawson> good
[22:24] <mozg> ksingh, here you go
[22:24] <mozg> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/
[22:24] <mozg> try and see if anything there helps
[22:24] <ksingh> hello mike you are back :^)
[22:24] <ksingh> thanks buddy lemme check
[22:25] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Quit: Ex-Chat)
[22:26] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[22:26] <mikedawson> ksingh: do you still need help?
[22:27] <ksingh> yes brother i still need to fix cluster health problem
[22:27] <mikedawson> ksingh: ceph osd getcrushmap -o default-crushmap && crushtool -d default-crushmap -o default-crushmap.txt && cp default-crushmap.txt new-crushmap.txt
[22:28] <mikedawson> ksingh: then edit new-crushmap.txt, and change "step chooseleaf firstn 0 type host" to "step chooseleaf firstn 0 type osd"
[22:28] <mikedawson> ksingh: then crushtool -c new-crushmap.txt -o new-crushmap && ceph osd setcrushmap -i new-crushmap
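
Before injecting the new map, it can be sanity-checked offline with crushtool; a hedged sketch (rule 0 and two replicas match the pools shown later in this log):

    crushtool -i new-crushmap --test --show-mappings --rule 0 --num-rep 2
    # each PG should now map to two distinct OSDs rather than needing two hosts
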
[22:28] <ksingh> yes i got this far, but didn't know to change host to osd
[22:29] <ksingh> there are several lines containing step chooseleaf firstn 0 type host , so should i update all the lines
[22:29] <ksingh> or any specific one
[22:29] <mikedawson> That gets the crushmap, converts it to a text file, copy it, edit the copy, creates a new crushmap, and sets the new crushmap
[22:29] <mikedawson> ksingh: change ALL of them
[22:30] <ksingh> sure
[22:33] <ksingh> hi mikedawson , i have done exactly the same and restarted cluster services but still no LUCK
[22:33] <ksingh> [root@ceph-client ceph]# ceph status
[22:33] <ksingh> cluster 112280c3-ac20-4a08-bed2-a4a0802ac0de
[22:33] <ksingh> health HEALTH_WARN 192 pgs degraded; 192 pgs stale; 192 pgs stuck stale; 192 pgs stuck unclean; mds cluster is degraded
[22:33] <ksingh> monmap e1: 1 mons at {ceph-client=192.168.56.100:6789/0}, election epoch 1, quorum 0 ceph-client
[22:33] <ksingh> osdmap e153: 2 osds: 2 up, 2 in
[22:33] <ksingh> pgmap v232: 192 pgs: 128 stale+active+degraded, 64 stale+active+replay+degraded; 0 bytes data, 2118 MB used, 14240 MB / 16358 MB avail
[22:33] <ksingh> mdsmap e7: 1/1/1 up {0=ceph-client=up:replay}
[22:33] <ksingh> [root@ceph-client ceph]#
[22:34] * a_ (~a@pool-173-55-143-200.lsanca.fios.verizon.net) has joined #ceph
[22:34] <mikedawson> ksingh: please paste the output of
[22:35] <mikedawson> '
[22:35] <mikedawson> 'ceph osd dump | grep ^pool'
[22:36] <ksingh> pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
[22:36] <ksingh> pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
[22:36] <ksingh> pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
[22:37] <mikedawson> ksingh: ok, good. That means you do not have any pools with a replica size larger than your number of OSDs
[22:37] <ksingh> yep
[22:38] <mikedawson> ksingh: does 'ceph -w' show any movement. Placement groups hopefully will re-arrange themselves
[22:41] * Guest2811 (~a@pool-173-55-143-200.lsanca.fios.verizon.net) Quit (Ping timeout: 480 seconds)
[22:41] <ksingh> it shows mostly lines like
[22:41] <ksingh> 2013-10-17 16:21:24.362402 osd.0 [WRN] slow request 480.402092 seconds old, received at 2013-10-17 16:13:23.960209: osd_op(mds.0.3:5 200.00000000 [read 0~0] 1.844f3494 RETRY=4 e152) v4 currently waiting for pg to exist locally
[22:44] <ksingh> one more thing, could you check the output below
[22:45] <ksingh> [root@ceph-client ceph]# ceph osd dump
[22:45] <ksingh> epoch 155
[22:45] <ksingh> fsid 112280c3-ac20-4a08-bed2-a4a0802ac0de
[22:45] <ksingh> created 2013-10-14 12:00:10.427071
[22:45] <ksingh> modified 2013-10-17 16:20:36.407312
[22:45] <ksingh> flags
[22:45] <ksingh> pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 crash_replay_interval 45
[22:45] <ksingh> pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
[22:45] <ksingh> pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0
[22:45] <ksingh> max_osd 7
[22:45] <ksingh> osd.0 up in weight 1 up_from 152 up_thru 0 down_at 151 last_clean_interval [148,151) 192.168.56.100:6802/13916 192.168.56.100:6803/13916 192.168.56.100:6804/13916 192.168.56.100:6810/13916 exists,up a063de65-c37a-4445-a468-4975a7200ec2
[22:45] <ksingh> osd.1 down out weight 0 up_from 153 up_thru 0 down_at 154 last_clean_interval [149,152) 192.168.56.100:6806/14288 192.168.56.100:6808/14288 192.168.56.100:6809/14288 192.168.56.100:6811/14288 autoout,exists ad6b8481-85a0-4792-aa0c-203929716a92
[22:45] <ksingh> blacklist 192.168.56.100:6801/8826 expires 2013-10-17 16:37:20.087271
[22:45] <ksingh> [root@ceph-client ceph]#
[22:45] <ksingh> what is max_osd 7 here ??
[22:46] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[22:46] <mikedawson> ksingh: the slow requests are your problem, most likely. Stop and re-start the OSD processes, then watch ceph -w. If the slow requests come back, try again or think about rebooting.
[22:47] <mikedawson> ksingh: if you can't get past them, watch 'iostat -xt 2' and look for spindle contention (high % util)
[22:48] <ksingh> did you see max_osd 7 here
[22:48] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) Quit (Remote host closed the connection)
[22:48] <ksingh> is this normal
[22:49] * ScOut3R (~scout3r@BC24BF84.dsl.pool.telekom.hu) has joined #ceph
[22:50] <mikedawson> ksingh: what version of ceph are you running?
[22:50] <ksingh> iostat utilisation for the OSD disks is 0%
[22:51] <ksingh> ceph version 0.67.4 and ceph-deploy version is 1.2.7
[22:52] <mikedawson> ksingh: I've rarely run into max_osd. Looks like it is the count of how many OSDs have been created with 'ceph osd create'
[22:53] <mikedawson> ksingh: can you paste your 'ceph osd tree' and your new-crushmap.txt (use a 3rd party paste service, please)
[22:53] <ksingh> yes, earlier i created a few osds with osd create and a few with ceph-deploy, and then i removed them
[22:53] <ksingh> and removed them following the documentation
[22:53] <ksingh> now i have 2 osd that are up and IN
[22:53] <ksingh> i hope this will not cause any problem for me.
[22:54] * wschulze (~wschulze@cpe-72-229-37-201.nyc.res.rr.com) Quit (Quit: Leaving.)
[22:54] <mikedawson> ksingh: if you have other osds defined, you need to take them out
[22:55] <ksingh> i guess i removed them properly, they are not showing up in osd tree
[22:55] <ksingh> [root@ceph-client ceph]# ceph osd tree
[22:55] <ksingh> # id weight type name up/down reweight
[22:55] <ksingh> -1 0.01999 root default
[22:55] <ksingh> -2 0.01999 host ceph-client
[22:55] <ksingh> 0 0.009995 osd.0 up 1
[22:55] <ksingh> 1 0.009995 osd.1 down 0
[22:55] <ksingh> [root@ceph-client ceph]#
[22:56] <ksingh> osd.0 and osd.1 are the correct osds now; i took osd.1 down manually, i will bring it up
[22:56] <ksingh> just for troubleshooting
[22:56] <mikedawson> ksingh: osd.1 is your problem. It appears to be down+out. Need bring it up for the placement groups to get healthy
[22:57] * alfredodeza (~alfredode@c-24-131-46-23.hsd1.ga.comcast.net) Quit (Remote host closed the connection)
[22:58] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) Quit (Quit: Leaving.)
[22:58] <ksingh> yes i brought it up , no message in ceph -w , cluster health still in warning
[22:59] * Vjarjadian (~IceChat77@94.1.37.151) Quit (Quit: We be chillin - IceChat style)
[23:00] <mikedawson> ksingh: you can try
[23:00] <mikedawson> 'ceph osd crush tunables optimal', sometimes that will fix otherwise stuck clusters
[23:00] <mikedawson> ksingh: http://ceph.com/docs/master/rados/operations/crush-map/#tunables
[23:00] <ksingh> ok
[23:02] * KevinPerks (~Adium@cpe-066-026-252-218.triad.res.rr.com) has joined #ceph
[23:04] <tchmnkyz> KevinPerks: you around?
[23:05] <ksingh> mikedawson no luck , still the same
[23:07] <mikedawson> ksingh: are you still seeing slow requests?
[23:08] <ksingh> yes, still slow requests. do you know how to check the status of the MDS? i think there is something related to the mds
[23:08] <ksingh> 2013-10-17 16:45:24.039066 osd.0 [WRN] slow request 1920.079658 seconds old, received at 2013-10-17 16:13:23.959367: osd_op(mds.0.3:1 mds0_inotable [read 0~0] 1.b852b893 RETRY=2 e152) v4 currently waiting for pg to exist locally
[23:09] <mikedawson> ksingh: I would stop the MDS service for now, and focus on OSDs / placement groups. The MDS has no chance of working until we get the OSDs / placement groups fixed
[23:10] <mikedawson> ksingh: run this -> ceph pg dump | grep "^[0-9]\.[0-9a-f]*" | awk '{ print $14 }'
[23:10] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) Quit (Quit: Ex-Chat)
[23:10] <mikedawson> ksingh: you should see 192 lines of output with each line showing [0,1] or [1,0]. What do you see?
[23:12] <ksingh> i see 192 times [3]
[23:14] <mikedawson> ksingh: that is a problem. That means that at some point there was an osd.3 that held a copy of these placement groups. You only have osd.0 and osd.1, so all your placement groups should reside with the primary on one and the replica on the other
[23:14] <mikedawson> ksingh: you may need to remove each pool then re-add it
[23:15] <mikedawson> ksingh: http://ceph.com/docs/master/rados/operations/pools/
[23:15] <ksingh> i agree with you and i salute your keen observation, thanks
[23:15] <ksingh> yesterday i was doing some troubleshooting and removed a few other osds
[23:16] <ksingh> that might have caused it
[23:17] <ksingh> so should i delete all 3 pools and recreate them, using the document you shared?
[23:18] <mikedawson> ksingh: i would do it on the rbd pool first to test the theory
[23:18] <ksingh> ok i will do the same
[23:19] <mikedawson> ksingh: obviously, you'll kill all the data in those pools if you remove them... but you don't have any data given the status of your cluster anyway. Just want to mention that for others reading along
[23:20] <ksingh> yes this is an important point, but i don't have data :^)
[23:22] * joshd1 (~jdurgin@2602:306:c5db:310:5840:6456:9bc4:d59) Quit (Ping timeout: 480 seconds)
[23:22] * Shmouel (~Sam@fny94-12-83-157-27-95.fbx.proxad.net) Quit (Quit: Leaving.)
[23:31] * diegows (~diegows@190.190.11.42) has joined #ceph
[23:34] <ksingh> mike , whats wrong here :-S
[23:35] <ksingh> [root@ceph-client ceph]# ceph osd lspools
[23:35] <ksingh> 0 data,1 metadata,2 rbd,
[23:35] <ksingh> [root@ceph-client ceph]# ceph osd pool delete rbd
[23:35] <ksingh> Invalid command: saw 0 of pool2(<poolname>), expected 1
[23:35] <ksingh> Error EINVAL: invalid command
[23:35] <ksingh> [root@ceph-client ceph]#
[23:51] * carif (~mcarifio@cpe-74-78-54-137.maine.res.rr.com) has joined #ceph
[23:51] * jluis (~JL@118.82.136.95.rev.vodafone.pt) Quit (Ping timeout: 480 seconds)
[23:52] * davidzlap1 (~Adium@76.173.16.173) Quit (Quit: Leaving.)
[23:52] * davidzlap (~Adium@76.173.16.173) has joined #ceph
[23:55] <ksingh> mikedawson u there
[23:55] <mikedawson> ksingh: yep, just got back
[23:56] <mikedawson> ksingh: perhaps you hit a bug (the ceph cli was recently rewritten and there have been some parsing issues similar to this issue)
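
The EINVAL above looks like syntax rather than a bug: the dumpling CLI expects the pool name twice plus a confirmation flag. A sketch of the delete-and-recreate step mikedawson suggested (64/64 matches the pg counts in the earlier osd dump):

    ceph osd pool delete rbd rbd --yes-i-really-really-mean-it
    ceph osd pool create rbd 64 64
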

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.