#ceph IRC Log

IRC Log for 2012-08-26

Timestamps are in GMT/BST.

[0:37] * loicd (~loic@brln-4dbc2d62.pool.mediaWays.net) Quit (Quit: Leaving.)
[0:53] * darkfader (~floh@188.40.175.2) Quit (Ping timeout: 480 seconds)
[1:02] * liiwi (liiwi@idle.fi) Quit (Remote host closed the connection)
[1:09] * liiwi (~liiwi@idle.fi) has joined #ceph
[1:14] * darkfader (~floh@188.40.175.2) has joined #ceph
[1:17] * liiwi (~liiwi@idle.fi) Quit (Ping timeout: 480 seconds)
[1:17] <mrjack_> hm
[1:17] <mrjack_> can someone help? health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 3562/7124 degraded (50.000%)
[1:19] * liiwi (liiwi@idle.fi) has joined #ceph
[1:22] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) Quit (Remote host closed the connection)
[1:23] * sage (~sage@cpe-76-94-40-34.socal.res.rr.com) has joined #ceph
[1:24] <mrjack_> hi sage
[1:24] <mrjack_> are you there?
[1:55] * nhm (~nhm@64-90-86-39.brainerd.net) Quit (Ping timeout: 480 seconds)
[2:01] <Tobarja> Yesterday I added a 3rd machine to total 4 osds. Immediately afterwards I spun up mds on the new machine. I went back to ceph -s and was at HEALTH_WARN 212 pgs stuck unclean... I've been playing with crush reweight and even tunables trying to bring order back to chaos, but no go. I'm now at: HEALTH_WARN 375 pgs stuck unclean, and: 384 pgs: 9 active+clean, 375 active+remapped. I'd really appreciate any ideas. I've bounced boxes, mar
[2:12] * Tobarja_mobile (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[2:16] * nhm (~nhm@64-90-86-39.brainerd.net) has joined #ceph
[3:00] * Tobarja_android (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[3:00] * Tobarja_mobile (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) Quit (Read error: Connection reset by peer)
[3:09] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[3:16] * nhm (~nhm@64-90-86-39.brainerd.net) Quit (Ping timeout: 480 seconds)
[3:17] * danieagle (~Daniel@177.43.213.15) Quit (Quit: See ya :-) and Thank You Very Much For Everything!!! ^^)
[4:04] * Tobarja_mobile (~Tobarja@m892c36d0.tmodns.net) has joined #ceph
[4:06] * Tobarja_android (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) Quit (Read error: No route to host)
[4:12] * Tobarja_mobile (~Tobarja@m892c36d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[4:39] <iggy> Tobarja: I doubt I can help, but your message was cut off at "boxes, mar"
[4:53] * Tobarja1 (~athompson@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[4:53] * nhm (~nhm@64-90-86-39.brainerd.net) has joined #ceph
[4:58] * Tobarja (~athompson@cpe-071-075-064-255.carolina.res.rr.com) Quit (Ping timeout: 480 seconds)
[4:59] <Tobarja1> i've almost got it back together... lesson for today is setcrushmap doesn't undo reweight. after i got those matching, it started recovering/remapping/backfilling
[5:26] * Tobarja (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[5:27] <Tobarja1> is it ok for there to be a small number of active+remapped remaining? 12/384, about 3%
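
(For context on the setcrushmap/reweight confusion above: a rough sketch of commands for comparing the two kinds of weights and inspecting leftover remapped PGs, assuming a stock argonaut-era deployment; osd.3 and the weight 1.0 are placeholders only.)

    ceph osd tree                      # view the CRUSH hierarchy and per-OSD weights
    ceph osd crush reweight osd.3 1.0  # set the CRUSH weight used for data placement
    ceph osd reweight 3 1.0            # set the separate in/out reweight value (0.0 - 1.0)
    ceph pg dump_stuck unclean         # list the PGs still counted as stuck unclean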
[5:27] * nhm (~nhm@64-90-86-39.brainerd.net) Quit (Ping timeout: 480 seconds)
[7:25] * Tobarja (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) Quit (Quit: Bye)
[10:07] * loicd (~loic@brln-4dbc2d62.pool.mediaWays.net) has joined #ceph
[11:33] * rosco (~r.nap@188.205.52.204) Quit (Remote host closed the connection)
[11:33] * rosco (~r.nap@188.205.52.204) has joined #ceph
[11:34] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[11:35] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[11:38] * blufor (~blufor@adm-1.candycloud.eu) Quit (Remote host closed the connection)
[11:38] * blufor (~blufor@adm-1.candycloud.eu) has joined #ceph
[11:38] * liiwi (liiwi@idle.fi) Quit (Remote host closed the connection)
[11:38] * liiwi (liiwi@idle.fi) has joined #ceph
[11:43] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[11:52] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) has joined #ceph
[12:55] <mrjack_> hm
[12:55] <mrjack_> someone here? i have problem getting ceph back to normal again...
[12:55] <mrjack_> health HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 3562/7124 degraded (50.000%)
[13:14] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[14:22] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[15:04] * nhm (~nhm@64-90-86-39.brainerd.net) has joined #ceph
[15:29] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[15:30] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[15:31] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[15:32] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[15:57] <joao> mrjack, fwiw, I've seen that when the number of OSDs up and in the cluster is lower than the level of replication set
[15:57] <joao> e.g., you have only one osd and your replication is 2
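
(A quick way to check what joao describes, sketched for a default setup; the pool name "data" and size 2 below are only examples.)

    ceph osd stat                     # how many OSDs are up and in
    ceph osd dump | grep 'rep size'   # replication level configured for each pool
    ceph osd pool set data size 2     # adjust a pool's replication to match the OSDs available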
[16:26] * nhm (~nhm@64-90-86-39.brainerd.net) Quit (Ping timeout: 480 seconds)
[16:38] * joao (~JL@89.181.149.181) Quit (Remote host closed the connection)
[17:28] * loicd1 (~loic@brln-4d0cec6b.pool.mediaWays.net) has joined #ceph
[17:31] * loicd (~loic@brln-4dbc2d62.pool.mediaWays.net) Quit (Read error: Operation timed out)
[20:18] * Tobarja (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[20:20] * danieagle (~Daniel@177.43.213.15) has joined #ceph
[20:32] * Tobarja_mobile (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[20:32] * Tobarja (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) Quit (Read error: Connection reset by peer)
[21:19] * Ryan_Lane (~Adium@216.38.130.164) Quit (Quit: Leaving.)
[21:44] * joao (~JL@89-181-149-181.net.novis.pt) has joined #ceph
[21:44] * joao (~JL@89-181-149-181.net.novis.pt) Quit ()
[22:09] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[22:28] * Tobarja (~Tobarja@ma72c36d0.tmodns.net) has joined #ceph
[22:31] * Tobarja_mobile (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) Quit (Read error: No route to host)
[22:31] * Tobarja_mobile (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[22:36] * Tobarja (~Tobarja@ma72c36d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[22:43] * Tobarja_mobile (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) Quit (Quit: Bye)
[22:43] * Tobarja (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) has joined #ceph
[22:43] * Tobarja (~Tobarja@cpe-071-075-064-255.carolina.res.rr.com) Quit ()
[22:56] <mrjack_> please advise, ceph is stuck
[22:56] <mrjack_> ceph -s
[22:56] <mrjack_> health HEALTH_WARN 203 pgs degraded; 38 pgs peering; 37 pgs stuck inactive; 384 pgs stuck unclean; recovery 1882/7124 degraded (26.418%)
[22:56] <mrjack_> monmap e1: 3 mons at {0=192.168.0.10:6789/0,1=192.168.0.11:6789/0,2=192.168.0.12:6789/0}, election epoch 140, quorum 0,1,2 0,1,2
[22:56] <mrjack_> osdmap e153: 2 osds: 2 up, 2 in
[22:56] <mrjack_> pgmap v366830: 384 pgs: 143 active, 38 peering, 203 active+degraded; 7322 MB data, 523 GB used, 207 GB / 770 GB avail; 1882/7124 degraded (26.418%)
[22:56] <mrjack_> mdsmap e194: 1/1/1 up {0=1=up:active}, 1 up:standby
[22:59] <mrjack_> hm
[22:59] <mrjack_> node01:~# ceph -w
[22:59] <mrjack_> health HEALTH_WARN 203 pgs degraded; 38 pgs peering; 37 pgs stuck inactive; 384 pgs stuck unclean; recovery 1882/7124 degraded (26.418%)
[22:59] <mrjack_> monmap e1: 3 mons at {0=192.168.0.10:6789/0,1=192.168.0.11:6789/0,2=192.168.0.12:6789/0}, election epoch 140, quorum 0,1,2 0,1,2
[22:59] <mrjack_> osdmap e153: 2 osds: 2 up, 2 in
[22:59] <mrjack_> pgmap v366832: 384 pgs: 143 active, 38 peering, 203 active+degraded; 7322 MB data, 523 GB used, 207 GB / 770 GB avail; 1882/7124 degraded (26.418%)
[22:59] <mrjack_> mdsmap e194: 1/1/1 up {0=1=up:active}, 1 up:standby
[22:59] <mrjack_> 2012-08-26 22:56:36.090857 mon.0 [INF] pgmap v366831: 384 pgs: 143 active, 38 peering, 203 active+degraded; 7322 MB data, 523 GB used, 207 GB / 770 GB avail; 1882/7124 degraded (26.418%)
[22:59] <mrjack_> why does the output differ?
[22:59] <mrjack_> i did ceph -w
[23:02] <mrjack_> when i remove the osd, it gets to this again
[23:02] <mrjack_> 2012-08-26 23:02:26.423580 mon.0 [INF] pgmap v366837: 384 pgs: 384 active+degraded; 7322 MB data, 523 GB used, 207 GB / 770 GB avail; 3562/7124 degraded (50.000%)
[23:03] <mrjack_> what can i do?
[23:16] <mrjack_> i decided to remove the osd and readd the osd
[23:16] <mrjack_> but now
[23:16] <mrjack_> ceph-osd --conf /etc/ceph/ceph.conf -i 1 --mkfs
[23:16] <mrjack_> 2012-08-26 23:15:31.124777 f7297710 -1 filestore(/data/ceph_backend/osd) mkfs: BTRFS_IOC_SUBVOL_CREATE failed with error (22) Invalid argument
[23:16] <mrjack_> 2012-08-26 23:15:31.124803 f7297710 -1 OSD::mkfs: FileStore::mkfs failed with error -22
[23:16] <mrjack_> how can i tell ceph to not use btrfs and use ext4 instead?
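
(For reference, a hedged sketch of the ext4 route as commonly described for that era: format and mount the OSD data directory as ext4 yourself, then add something like the following to the [osd] section of ceph.conf. The option names come from the contemporary documentation and the mount options are illustrative, not verified against this exact setup.)

    [osd]
        # create and mount the OSD filesystem as ext4 rather than btrfs
        osd mkfs type = ext4
        osd mount options ext4 = rw,noatime,user_xattr
        # store large object xattrs in leveldb (omap), which ext4's xattr limits require
        filestore xattr use omap = true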

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.