#ceph IRC Log


IRC Log for 2010-09-16

Timestamps are in GMT/BST.

[0:38] * ezgreg_taf (~Greg@ has joined #ceph
[0:38] * ezgreg (~Greg@ Quit (Read error: Connection reset by peer)
[1:40] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[2:18] * deksai (~deksai@96-35-100-192.dhcp.bycy.mi.charter.com) has joined #ceph
[2:32] <yehudasa> wido: still working on it
[4:10] * xilei (~xilei@ has joined #ceph
[5:58] * deksai (~deksai@96-35-100-192.dhcp.bycy.mi.charter.com) Quit (Ping timeout: 480 seconds)
[5:59] * f4m8 (~f4m8@lug-owl.de) has joined #ceph
[6:17] * f4m8 is now known as f4m8_
[6:30] * Osso (osso@AMontsouris-755-1-7-230.w86-212.abo.wanadoo.fr) Quit (Quit: Osso)
[8:18] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:42] * allsystemsarego (~allsystem@ has joined #ceph
[9:01] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:21] * ezgreg_taf (~Greg@ Quit (Quit: Quitte)
[9:36] * ezgreg (~Greg@ has joined #ceph
[9:41] * hijacker (~hijacker@ has joined #ceph
[10:11] * Yoric (~David@ has joined #ceph
[11:09] * darktim (~andre@pcandre.nine.ch) has joined #ceph
[11:09] * andret (~andre@pcandre.nine.ch) Quit (Read error: Connection reset by peer)
[12:56] * Yoric_ (~David@ has joined #ceph
[12:56] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[12:56] * Yoric_ is now known as Yoric
[13:00] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[13:55] * Yoric (~David@ has joined #ceph
[15:01] * Yoric_ (~David@ has joined #ceph
[15:01] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[15:01] * Yoric_ is now known as Yoric
[15:45] * Osso (osso@AMontsouris-755-1-7-230.w86-212.abo.wanadoo.fr) has joined #ceph
[15:46] * deksai (~deksai@96-35-100-192.dhcp.bycy.mi.charter.com) has joined #ceph
[17:41] * fred_ (~fred@199-59.79-83.cust.bluewin.ch) has joined #ceph
[17:41] <fred_> hi
[17:51] * gregphone (~gregphone@ has joined #ceph
[17:52] <gregphone> fred_: hi
[17:54] <fred_> hi
[17:56] <fred_> gregphone, how is ceph development going? I'm still following the git repo but since src/TODO is not updated anymore, it's not as easy to follow as before...
[17:57] <gregphone> TODO has been superseded by the much more useful tracker.newdream.net
[17:58] <gregphone> which is redmine, so you can do all kinds if gunfights
[17:58] <gregphone> *fun things
[17:58] <gregphone> development is progressing as always
[18:03] <fred_> I'm still at 0.21.1, but had a problem earlier this week: there was only 90Mb free on my ceph fs (700Gb total) although a 'du' from ceph root only showed 130Gb used (replicated once, i.e., using approx 260Gb of 700Gb)
[18:04] <fred_> I had to restart all nodes, fs took a while to get back up, but now it's fine, the garbage disappeared and my free space is back
[18:04] <fred_> Is it something you have seen already ?
[18:04] <gregphone> don't think so, you should make a bug!
[18:05] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[18:05] <fred_> ok, I'll do after upgrading to latest "stable"
[18:05] <gregphone> although depending on your setup it could have been logs
[18:07] <fred_> what is strange is that used space jumped up every week although I don't have batch jobs running at that time
[18:08] <gregphone> how long-lived was this instance?
[18:08] <fred_> I've got btrfs partitions dedicated to my OSDs, and these were getting full, so I don't think it could be logs
[18:08] <fred_> about 1-2 months, but it seems ceph's init.d now restarts everything once a day
[18:09] <gregphone> I think I remember something about map states being unbounded right now, that might be it
[18:11] <fred_> and is it something that got (or will get) into 0.21.3 ?
[18:12] <gregphone> mmm, not sure
[18:13] <gregphone> I've been going into other things lately
[18:13] <fred_> anyway, I'll wait for 0.21.3 and report a bug if I see this again
[18:13] <gregphone> but I'll make sure to discuss it with sage today!
[18:13] <fred_> thank you
[18:14] <fred_> other things ceph related? hdfs related ?
[18:15] <gregphone> oh, always ceph related :)
[18:15] <gregphone> been doing some work on the messaging system and stuff
[18:26] * gregphone_ (~gregphone@ has joined #ceph
[18:33] * Yoric_ (~David@ has joined #ceph
[18:33] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[18:33] * Yoric_ is now known as Yoric
[18:33] * gregphone (~gregphone@ Quit (Ping timeout: 480 seconds)
[18:33] * gregphone_ is now known as gregphone
[18:35] * Yoric_ (~David@ has joined #ceph
[18:35] * Yoric (~David@ Quit (Read error: Connection reset by peer)
[18:35] * Yoric_ is now known as Yoric
[18:44] * fred_ (~fred@199-59.79-83.cust.bluewin.ch) Quit (Quit: Leaving)
[18:52] * gregphone (~gregphone@ Quit (Ping timeout: 480 seconds)
[18:56] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[19:49] * Yoric (~David@ Quit (Quit: Yoric)
[19:59] <wido> hi
[19:59] <yehudasa> hey wido!
[20:00] <yehudasa> had some progress on that btrfs issue
[20:00] <yehudasa> however, I'd like to be able to setup direct access to your device from here so that I can debug it more easily
[20:02] <wido> direct access? what do you mean exactly? you mean something like KVM over IP?
[20:03] <yehudasa> like iscsi
[20:05] <wido> sorry, I'm not understanding you fully. You want to access my disk over iSCSI? Or remove the btrfs stripe?
[20:05] <wido> but you are free to do what you want, it's a test cluster :)
[20:06] <yehudasa> yeah, access directly via iscsi
[20:07] <yehudasa> however, might be that removing the btrfs stripe is also a good way to proceed
[20:07] <yehudasa> can you do that easily?
[20:07] <wido> removing the stripe would involve doing a mkfs.btrfs
[20:08] <wido> you want a full copy of the data?
[20:08] <yehudasa> oh, not sure if it's a good idea
[20:09] <yehudasa> otoh, we can do that
[20:09] <yehudasa> what I'm worried about is that the problem will not happen anymore without the btrfs stripe
[20:09] <yehudasa> however, we can do it only on one node, right\
[20:09] <yehudasa> ?
[20:09] <wido> yes, sure
[20:10] <wido> I could just do a mkfs.btrfs so it goes back to one disk
[20:10] <wido> and we can see if the node then survives
[20:10] <yehudasa> yeah
[20:10] <wido> I'll do that on node05
[20:11] <yehudasa> sage says that you can probably remove a device online from the pool
[20:11] <sagewk> https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
[20:12] <sagewk> btrfs device delete /dev/whatever /mnt/foo
[20:12] <wido> indeed, i'll try
[20:13] <yehudasa> anyway, if you'd resort to doing the mkfs, can you copy the data?
[20:13] <wido> hmm: "btrfs: unable to go below two devices on raid1"
[20:13] <wido> I'll make a copy of the data to logger and then do a mkfs
[20:15] <yehudasa> great
[20:16] <wido> ok, backup is running right now
[20:16] <wido> anyone got a chance yet to build the PHP extension? ;)
[20:17] <yehudasa> no, not yet..
[20:18] <wido> I'd like to donate the code to the Ceph project if you want, so the GIT could be hosted at Ceph's hosting and we could have a sub-project in the tracker
[20:19] <sagewk> wido: sure!
[20:21] <wido> cool :)
[21:02] * xilei (~xilei@ Quit (Quit: Leaving)
[21:13] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[21:15] <wido> yehudasa: backup of node05 is almost done, when done, i'll do the mkfs to get it back to one disk
[21:18] <yehudasa> wido: great
[21:44] <wido> oh, rsync was almost down to 0, but it's back again to 25000 remaining files
[21:44] <wido> this will take some time I think
[21:49] <yehudasa> hmm, ok
[22:03] <wido> yehudasa: it's done :)
[22:03] <wido> run the mkfs, so we get a clean fs on one disk
[22:04] <yehudasa> ok
[22:04] <wido> then copy the data back? Or just start the OSD empty (mkfs the OSD of course)
[22:04] <yehudasa> yeah, copy the data back
[22:04] <wido> right now 4/12 OSD's are up
[22:04] <wido> i'll have to run a cosd --mkfs first i think? For the subvolumes to be created?
[22:04] <yehudasa> if you copy the data back, then no
[22:05] <wido> ok, then i'll copy it back now
[22:05] <wido> it will be running in a screen on node05, but i'm going afk in a moment
[22:14] <wido> yehudasa: i'm going afk, the rsync is running in a screen. If you want, you can check it and start the OSD when it's finished
[22:15] <yehudasa> ok
[22:15] <yehudasa> how can I check it?
[22:15] <gregaf> screen -dr? :p
[22:16] <wido> screen -x and detach with CTRL A+D
[22:32] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.