#ceph IRC Log

Index

IRC Log for 2011-05-02

Timestamps are in GMT/BST.

[0:06] * MarkN (~nathan@59.167.240.178) has joined #ceph
[0:07] * MarkN (~nathan@59.167.240.178) has left #ceph
[1:23] * MarkN (~nathan@59.167.240.178) has joined #ceph
[1:27] * MarkN (~nathan@59.167.240.178) has left #ceph
[2:58] * adjohn (~adjohn@p24092-ipngn1301marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:29] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[4:38] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[8:48] * andret (~andre@pcandre.nine.ch) Quit (Remote host closed the connection)
[8:52] * andret (~andre@pcandre.nine.ch) has joined #ceph
[9:10] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[9:11] * Meths (rift@2.25.213.181) Quit (Remote host closed the connection)
[9:11] * Meths (rift@2.25.213.181) has joined #ceph
[9:12] * allsystemsarego (~allsystem@188.25.128.31) has joined #ceph
[9:15] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:33] * darktim (~andre@ticket1.nine.ch) has joined #ceph
[9:38] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[10:17] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) has joined #ceph
[10:37] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[11:33] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Read error: Connection reset by peer)
[11:33] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[11:47] * adjohn (~adjohn@p24092-ipngn1301marunouchi.tokyo.ocn.ne.jp) Quit (Quit: adjohn)
[12:13] * darktim (~andre@ticket1.nine.ch) Quit (Remote host closed the connection)
[12:13] * darktim (~andre@ticket1.nine.ch) has joined #ceph
[12:45] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[13:25] * darktim (~andre@ticket1.nine.ch) Quit (Remote host closed the connection)
[13:28] * adjohn (~adjohn@s201.GtokyoFL27.vectant.ne.jp) has joined #ceph
[13:29] * darktim (~andre@ticket1.nine.ch) has joined #ceph
[13:34] * _adjohn (~adjohn@s201.GtokyoFL27.vectant.ne.jp) has joined #ceph
[13:34] * adjohn (~adjohn@s201.GtokyoFL27.vectant.ne.jp) Quit (Read error: Connection reset by peer)
[13:34] * _adjohn is now known as adjohn
[13:41] * darktim (~andre@ticket1.nine.ch) Quit (Remote host closed the connection)
[13:41] * darktim (~andre@ticket1.nine.ch) has joined #ceph
[13:58] * Yoric_ (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[13:58] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Read error: Connection reset by peer)
[13:58] * Yoric_ is now known as Yoric
[14:14] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[14:21] * cengiz_ (~cengiz@80.160.83.116) has joined #ceph
[14:25] * cengiz_ (~cengiz@80.160.83.116) has left #ceph
[15:26] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Quit: Ex-Chat)
[15:40] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Read error: Connection reset by peer)
[15:40] * Yoric (~David@87-231-38-145.rev.numericable.fr) has joined #ceph
[15:42] * adjohn (~adjohn@s201.GtokyoFL27.vectant.ne.jp) Quit (Quit: adjohn)
[15:53] * adjohn (~adjohn@s201.GtokyoFL27.vectant.ne.jp) has joined #ceph
[15:53] * adjohn (~adjohn@s201.GtokyoFL27.vectant.ne.jp) Quit ()
[15:53] * aliguori (~anthony@32.97.110.59) has joined #ceph
[16:12] * pombreda (~Administr@109.128.209.232) has joined #ceph
[16:42] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[16:51] * aliguori (~anthony@32.97.110.59) Quit (Read error: Operation timed out)
[16:52] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:52] <pombreda> sage, gregaf: hiya octopus tamers :) FYI, the playground has been back up (as in "ssh reachable") for a few days, but the ceph FS is not mounted there
[17:00] * aliguori (~anthony@32.97.110.51) has joined #ceph
[17:13] <josef> man you guys really know how to piss off btrfs's orphan space accounting
[17:19] <pombreda> ?
[17:38] * greglap (~Adium@cpe-76-170-84-245.socal.res.rr.com) Quit (Quit: Leaving.)
[17:51] * greglap (~Adium@198.228.210.230) has joined #ceph
[17:52] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[17:53] * Tv (~Tv|work@ip-66-33-206-8.dreamhost.com) has joined #ceph
[17:57] <sagewk> josef: :) is there anything we can do to help narrow this down?
[17:57] <josef> sagewk: i figured out what's going on, i'm working on fixing it
[17:58] <sagewk> oh cool. let us know when you have something you want us to test
[17:58] <josef> it's a giant mess
[17:58] <sagewk> heh
[17:58] <josef> will do, i should have something this afternoon
[17:58] <sagewk> is it at all related to the async commits?
[17:59] <sagewk> that's the main thing that comes to mind where we would be that different than other workloads
[17:59] <sagewk> that and, i guess, a lot more snapshots (one every ~30 seconds)
[17:59] <josef> yeah its the snapshots
[17:59] * darktim (~andre@ticket1.nine.ch) Quit (Remote host closed the connection)
[17:59] <josef> we reserve space for the orphan item, updating the inode, and 2 for the truncate
[18:00] <josef> but the problem is if you have a lot of snapshots the truncate operation is going to use those 2 slots quickly
[18:00] <josef> and we never refill it, so if you have to stop the transaction and restart it again you could easily exhaust this space
[18:00] <josef> and then when we go to update the inode we don't have that saved space anymore
[18:01] <josef> so i'm going to allocate a block rsv every time we truncate to hold onto the truncate-specific space
[18:01] <josef> and then only account for the orphan item in the orphan reserve
[18:01] <josef> and then use the transaction reserve for updating the inode
[18:01] <josef> that should make everybody happy
[18:01] <sagewk> sounds like a plan
[18:02] <josef> a terrible one too, but i don't have any better ideas
[18:02] <sagewk> :)
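Josef's reservation scheme above can be modeled in a short sketch. This is purely an illustrative model in Python, not btrfs kernel code; the `Reserve` class and all names are hypothetical. It shows the failure he describes (a shared reserve sized for the orphan item, the inode update, and 2 truncate slots gets drained by a long truncate, so the inode update finds its space gone) and his proposed fix (a refillable per-truncate block rsv, with the orphan and transaction reserves covering only their own items):

```python
# Hypothetical model of the btrfs reservation problem described above.
class Reserve:
    """A pool of reserved metadata slots (illustrative, not a btrfs API)."""
    def __init__(self, slots):
        self.slots = slots

    def use(self, n=1):
        if self.slots < n:
            raise RuntimeError("reservation exhausted")
        self.slots -= n

def truncate_old_scheme(shared, extents_to_drop):
    # Old scheme: the shared reserve was sized for orphan item + inode
    # update + 2 truncate slots, but with many snapshots the truncate
    # keeps drawing from it and it is never refilled.
    for _ in range(extents_to_drop):
        shared.use()

def truncate_new_scheme(orphan_rsv, trans_rsv, extents_to_drop):
    # Proposed fix: a per-truncate reserve is refilled as the transaction
    # is stopped and restarted, so the orphan reserve covers only the
    # orphan item and the transaction reserve covers the inode update.
    orphan_rsv.use()                    # orphan item only
    truncate_rsv = Reserve(2)
    for _ in range(extents_to_drop):
        if truncate_rsv.slots == 0:
            truncate_rsv = Reserve(2)   # refill on transaction restart
        truncate_rsv.use()
    trans_rsv.use()                     # inode update from the txn reserve

# Old scheme: 4 reserved slots are exhausted by a long truncate, so the
# inode update that follows has no saved space left.
old = Reserve(4)
try:
    truncate_old_scheme(old, extents_to_drop=10)
    old.use()                           # inode update
    old_scheme_ok = True
except RuntimeError:
    old_scheme_ok = False
```

Under this model `old_scheme_ok` comes out `False`, while `truncate_new_scheme` completes for the same workload because the truncate-specific reserve is refilled instead of sharing one fixed pool.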
[18:04] <greglap> sagewk: you have any ideas about that "Object size" thread on the mailing list?
[18:04] <greglap> the data counts don't match up and I haven't been able to dig up any known confounding factors
[18:04] <sagewk> i haven't followed it closely.. i'll take a look
[18:04] <sagewk> as in, the ceph df doesn't make sense wrt the individual osd dfs?
[18:05] <greglap> yeah...
[18:05] <sagewk> ceph pg dump -o - will show the per-pg counts it's adding up (the osd lines at the end)
[18:05] <sagewk> er, per-osd
[18:05] <greglap> 12222MB data, 53579 used, each OSD has its own partition/disk
[18:05] <greglap> snapshots are accounted for correctly in doing that, right?
[18:06] <sagewk> the 'data' count is a sum of object sizes.
[18:06] <sagewk> oh
[18:07] <sagewk> it subtracts out extents allocated to two objects on the assumption it's doing the btrfs clone_range. if it's ext3 that's not the case and will undercount
[18:07] <sagewk> maybe that's it?
[18:07] <greglap> hmm, maybe that's it, dunno if he's using snapshots
[18:07] <greglap> guess I'll ask
[18:07] <greglap> because there's nothing else on his partitions and the journals are located elsewhere and that's the usual cause
[18:07] <sagewk> in any case, 'data' is the sum of object sizes. the other counts are a sum over statfs results on all osds (which you can see from ceph pg dump -o -)
[18:08] <sagewk> all of the numbers come from values in the pg dump actually, so that alone should have all the answers
[18:08] <greglap> yeah, I'm just trying to figure out how the counts could mismatch so much, I'll ask for the pg dump too
[18:38] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) Quit (Quit: zzZZZZzz)
[18:41] * Yoric (~David@87-231-38-145.rev.numericable.fr) Quit (Quit: Yoric)
[18:41] * greglap (~Adium@198.228.210.230) Quit (Read error: Connection reset by peer)
[18:54] * sagelap (~sage@12.248.40.138) has joined #ceph
[18:54] * sagelap (~sage@12.248.40.138) has left #ceph
[18:57] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[18:57] * joshd (~joshd@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:02] * cmccabe (~cmccabe@208.80.64.174) has joined #ceph
[19:09] * bchrisman (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[19:21] * pombreda (~Administr@109.128.209.232) Quit (Quit: Leaving.)
[19:24] * neurodrone (~neurodron@dhcp214-082.wireless.buffalo.edu) has joined #ceph
[19:24] * neurodrone (~neurodron@dhcp214-082.wireless.buffalo.edu) Quit ()
[19:29] <gregaf> cmccabe, everybody: skype!
[19:29] * aliguori (~anthony@32.97.110.51) Quit (Ping timeout: 480 seconds)
[19:29] <gregaf> bchrisman: you guys look to be online, call if you want in (although Sage is out today)
[19:30] <bchrisman> cool
[19:47] * neurodrone (~neurodron@dhcp214-082.wireless.buffalo.edu) has joined #ceph
[19:57] * neurodrone (~neurodron@dhcp214-082.wireless.buffalo.edu) Quit (Quit: zzZZZZzz)
[19:58] * aliguori (~anthony@32.97.110.59) has joined #ceph
[19:58] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[21:04] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving.)
[21:04] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[21:04] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) Quit ()
[21:07] * neurodrone (~neurodron@dhcp211-006.wireless.buffalo.edu) has joined #ceph
[21:29] * aliguori (~anthony@32.97.110.59) Quit (Remote host closed the connection)
[21:33] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[21:51] <greglap> wido: when's the last time you upgraded your cluster
[21:52] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[21:59] <wido> greglap: some time ago
[21:59] <wido> I think about a week ago
[21:59] <greglap> wido: okay
[21:59] <greglap> you probably don't want to upgrade to current master, there's a problem with data replication
[21:59] <wido> greglap: My cluster is broken, a lot
[22:00] <wido> You might have seen my issue
[22:00] <wido> monitor eating memory, OSDs that don't want to talk with each other
[22:03] <wido> greglap: I'm waiting for the peering/recovery refactor, hopefully that will fix the issues I'm seeing
[22:04] <wido> got a second (6 osd) cluster running which I upgraded to 0.27 last week, that one has been running fine for the last two months, serving one VM via RBD
[22:04] <greglap> wido: right, I've been alternating between ongoing MDS stuff and helping Sam with the OSD refactor
[22:04] <greglap> we've got 3 people on that now so it shouldn't be too long
[22:14] <wido> greglap: Yeah, I guessed so. I'll wait patiently
[22:14] <wido> :)
[22:24] * Yulya_the_drama_queen (~Yulya@ip-95-220-155-131.bb.netbynet.ru) Quit (Ping timeout: 480 seconds)
[22:33] * neurodrone (~neurodron@dhcp211-006.wireless.buffalo.edu) Quit (Quit: zzZZZZzz)
[22:34] * Yulya_the_drama_queen (~Yulya@ip-95-220-189-119.bb.netbynet.ru) has joined #ceph
[22:36] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has left #ceph
[22:36] * greglap (~Adium@ip-66-33-206-8.dreamhost.com) has joined #ceph
[23:33] * allsystemsarego (~allsystem@188.25.128.31) Quit (Quit: Leaving)
[23:34] * neurodrone (~neurodron@cpe-76-180-162-12.buffalo.res.rr.com) has joined #ceph
[23:52] * ghaskins_mobile (~ghaskins_@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[23:57] * sagelap (~sage@12.248.40.138) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.