#ceph IRC Log

IRC Log for 2010-09-09

Timestamps are in GMT/BST.

[2:06] * conner (~conner@leo.tuc.noao.edu) Quit (Ping timeout: 480 seconds)
[2:15] * conner (~conner@leo.tuc.noao.edu) has joined #ceph
[3:49] * _MKFG_ (~fraggod@188.226.51.71) Quit (Remote host closed the connection)
[4:36] * lidongyang (~lidongyan@222.126.194.154) Quit (Remote host closed the connection)
[4:49] * MK_FG (~fraggod@wall.mplik.ru) has joined #ceph
[6:50] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) Quit (Quit: Osso)
[8:28] * littlejo (~joseph@78.155.152.6) Quit (Quit: Quitte)
[8:33] * allsystemsarego (~allsystem@188.27.166.252) has joined #ceph
[8:38] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[9:23] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[10:05] * Jiaju (~jjzhang@222.126.194.154) Quit (Ping timeout: 480 seconds)
[12:11] * oksamyt (~sana@193.151.59.60) has joined #ceph
[12:11] * Yoric (~David@213.144.210.93) has joined #ceph
[12:11] <oksamyt> hello all
[12:19] <oksamyt> i have a problem, and google didn't help me to solve it
[12:19] <oksamyt> maybe someone knows what i should do
[12:19] <oksamyt> i have checked out the latest version of ceph and tried to follow this howto:
[12:19] <oksamyt> http://www.ece.umd.edu/~posulliv/ceph/cluster_build.html
[12:19] <oksamyt> however, i got stuck at the 'mkmonfs' step
[12:19] <oksamyt> as i found out, the mkmonfs command was removed in the latest version, and the cmon --mkfs should provide the corresponding functionality, for example:
[12:19] <oksamyt> cmon -i 0 --mkfs
[12:19] <oksamyt> (i have the mon data value in the config file)
[12:19] <oksamyt> but trying to execute this command i get the 'usage' output, as if i have used incorrect options
[12:19] <oksamyt> the question is: how do i create the monitor fs in the current release?
[12:30] <oksamyt> seems like mkcephfs does the trick, sorry for bothering :-)
[12:57] * oksamyt (~sana@193.151.59.60) has left #ceph
[13:55] * MK_FG (~fraggod@wall.mplik.ru) Quit (Remote host closed the connection)
[14:21] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) has joined #ceph
[15:15] * MK_FG (~fraggod@188.226.51.71) has joined #ceph
[16:03] * f4m8 is now known as f4m8_
[16:07] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) Quit (Quit: Osso)
[18:18] * sentinel_x73 (~sentinel_@188.226.51.71) has joined #ceph
[18:37] * Yoric (~David@213.144.210.93) Quit (Quit: Yoric)
[19:04] * oksamyt (~sana@193.151.59.60) has joined #ceph
[19:05] <oksamyt> hi all again
[19:07] <gregaf> hi oksamyt
[19:08] <gregaf> sorry we couldn't help you earlier, most of us are in the United States time zones :)
[19:09] <oksamyt> that problem was easy compared to what i have now :-)
[19:09] <oksamyt> i have a monitor, mds, and osd on one node, and just osd on another one
[19:10] <oksamyt> the first one starts normally and shows the storage, i can mount it from another machine
[19:11] <oksamyt> but when i start the osd on the second one, it starts doing a lot of i/o operations on the hdd (journalling, i guess) and the storage doesn't show in the monitor
[19:11] <oksamyt> i have tried reading logs but i can't understand whether there are any errors and what can be done about it
[19:12] <gregaf> can we see your config file?
[19:12] <sagewk> there may be some delay before the storage shows up. look at the last few lines of 'ceph pg dump -o -' to see what it has for each osd
[19:15] <oksamyt> gregaf, sure:
[19:15] <oksamyt> http://paste.rtg.in.ua/d86081dc8cf21b82b9da99f6e2eeb012/
[19:17] <gregaf> well, that part looks okay
[19:17] <oksamyt> sagewk, the output doesn't seem to change when i start the osd, and i waited about half an hour before stopping it
[19:17] <gregaf> how are you starting the system?
[19:17] <oksamyt> can the storage nodes be different in size?
[19:18] <gregaf> yes
[19:18] <oksamyt> service ceph start -c /etc/ceph/ceph.conf
[19:18] <gregaf> if you want to balance the data distribution you'll need to modify the CRUSH map, but that's not your problem right now
[19:18] <sagewk> this is osd1 i assume? do you see a line after 'osdstat...' that starts with 1 at the bottom of the pg dump output?
[19:18] <oksamyt> i have 1gb and 260 mb osds
[19:19] <oksamyt> osdstat kbused kbavail kb hb in hb out
[19:19] <oksamyt> 0 2028 1046548 1048576 [] []
[19:19] <oksamyt> sum 2028 1046548 1048576
[19:19] <oksamyt> sagewk, it only has 0, and the needed osd is osd1
[19:19] <oksamyt> i can paste some log lines if they will be of any help
[19:19] <oksamyt> there is something about missing epochs
[19:20] <gregaf> your config file is on both machines, right?
[19:20] <gregaf> can you start them all up together using the -a option and see what happens?
[19:21] <oksamyt> 10.09.09_20:20:53.982645 b1876b70 -- 192.168.1.5:6801/12879 <== mon0 192.168.1.134:6789/0 124 ==== mon_subscribe_ack(300s) v1 ==== 20+0+0 (929125189 0 0) 0xa067908
[19:21] <oksamyt> 10.09.09_20:20:53.982713 b1876b70 -- 192.168.1.5:6801/12879 dispatch_throttle_release 20 to dispatch throttler 1866/104857600
[19:21] <oksamyt> 10.09.09_20:20:53.982750 b1876b70 -- 192.168.1.5:6801/12879 done calling dispatch on 0xa067908
[19:21] <oksamyt> 10.09.09_20:20:53.982787 b1876b70 -- 192.168.1.5:6801/12879 <== mon0 192.168.1.134:6789/0 125 ==== osd_map(27,27) v1 ==== 1826+0+0 (1345188110 0 0) 0xa066eb8
[19:21] <oksamyt> 10.09.09_20:20:53.982846 b1876b70 osd1 0 _dispatch 0xa066eb8 osd_map(27,27) v1
[19:21] <oksamyt> 10.09.09_20:20:53.982883 b1876b70 osd1 0 handle_osd_map epochs [27,27], i have 0
[19:21] <oksamyt> 10.09.09_20:20:53.983000 b1876b70 osd1 0 handle_osd_map already had full map epoch 27
[19:21] <oksamyt> 10.09.09_20:20:53.983106 b1876b70 osd1 0 .. it is 1794 bytes
[19:21] <oksamyt> 10.09.09_20:20:53.983146 b1876b70 osd1 0 cur 0 < newest 27
[19:21] <oksamyt> 10.09.09_20:20:53.983229 b1876b70 osd1 0 handle_osd_map missing epoch 1
[19:21] <oksamyt> 10.09.09_20:20:53.983290 b1876b70 -- 192.168.1.5:6801/12879 --> mon0 192.168.1.134:6789/0 -- mon_subscribe({monmap=2+,osdmap=1}) v1 -- ?+0 0xa066740
[19:21] <oksamyt> 10.09.09_20:20:53.983360 b1876b70 -- 192.168.1.5:6801/12879 submit_message mon_subscribe({monmap=2+,osdmap=1}) v1 remote, 192.168.1.134:6789/0, have pipe.
[19:21] <oksamyt> 10.09.09_20:20:53.983454 b1876b70 osd1 0 write_superblock sb(7b0d8bd7-0cdf-5244-8a38-a2d5acc25d94 osd1 e0 [11,27] lci=[0,0])
[19:22] <oksamyt> this log is on the lappy (osd1)
[19:22] <oksamyt> i need root's ssh key for that, right?
[19:23] <gregaf> what version are you running?
[19:23] <gregaf> this looks like a protocol bug we found and fixed a month or two ago
[19:24] <oksamyt> ceph version 0.21 (090436f5) on moar (osd0)
[19:24] <oksamyt> ceph version 0.21.1 (7aa332cd82d5a349e08d0f9a0149acd2489e0a80) on lappy (sod1)
[19:24] <oksamyt> osd*
[19:25] <oksamyt> i'll try to run version 0.21.1 on moar too
[19:26] <sagewk> can you post your whole osd1 log somewhere?
[19:27] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) has joined #ceph
[19:29] <oksamyt> should i use the 20th level of debugging for all osd parts (ms, osd, filestore, journal)?
[19:30] <sagewk> debug ms = 1, debug osd = 20. shouldn't need the others for this problem.
[19:36] <oksamyt> http://paste.ubuntu.com/491130/
[19:39] <sagewk> well, the osd is behaving. can you add 'debug ms = 1' and 'debug mon = 20' to the [mon] section, restart cmon, restart cosd, and then post the last bit of the mon log?
[19:39] <oksamyt> at the first node this time, right?
[19:40] <sagewk> wait, i think i see the bug.
[20:11] <sagewk> oksamyt: are you using .debs or building from source?
[20:13] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) has joined #ceph
[20:15] <oksamyt> on the first node it was installed by apt-get (but i have also compiled from source), and on the second node i used source
[20:16] <sagewk> are you pulling source from git?
[20:16] <oksamyt> yes
[20:16] <oksamyt> git clone git://ceph.newdream.net/ceph.git
[20:16] <sagewk> git fetch ; git checkout -b mon_osdmap_sub origin/mon_osdmap_sub
[20:17] <sagewk> rebuild cmon, restart the monitor, and see if the osd starts after that
[20:29] <oksamyt> hm, it starts and no longer writes to the disk
[20:29] <oksamyt> also, this appears in the log on osd2:
[20:29] <oksamyt> http://paste.rtg.in.ua/544d22ffa27a51870c62c8619b4ddf3d/
[20:29] <oksamyt> osd1*
[20:31] <gregaf> oh, that's because you ran mkcephfs separately on each node, probably
[20:32] <gregaf> I think there's a command-line option to set the fsid, let me check
[20:32] <oksamyt> but the storage node is not recognized
[20:32] <gregaf> yeah, with the mismatch in fsids it doesn't want to continue
[20:32] <oksamyt> oh
[20:35] <gregaf> hrm, maybe not
[20:42] <gregaf> oksamyt: yeah, you're going to need to set up root access so you can run mkcephfs on one and have it set up the other
[20:43] <oksamyt> i see, so they have the same fsid?
[20:44] <gregaf> yeah
[20:45] <gregaf> Sage just made an issue so we can fix that: http://tracker.newdream.net/issues/400
[20:56] <oksamyt> can i change the path to the command on the remote server? it tries to run
[20:56] <oksamyt> === osd.1 ===
[20:56] <oksamyt> monmap.25276 100% 187 0.2KB/s 00:00
[20:56] <oksamyt> bash: ./cosd: No such file or directory
[20:56] <oksamyt> failed: 'ssh lappy ./cosd -c /etc/ceph/ceph.conf --monmap /tmp/monmap.25276 -i 1 --mkfs --osd-data /mnt/ceph'
[20:56] <oksamyt> but the cosd is in /home/sana/ceph/ceph/src
[20:58] * conner (~conner@leo.tuc.noao.edu) Quit (Read error: Operation timed out)
[21:10] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) Quit (Quit: Osso)
[21:10] * Osso (osso@AMontsouris-755-1-19-69.w90-46.abo.wanadoo.fr) has joined #ceph
[21:20] * conner (~conner@leo.tuc.noao.edu) has joined #ceph
[22:03] <oksamyt> i have run mkcephfs, it seems to have created everything, but after i start the service on the first node, the osd0 does not appear:
[22:03] <oksamyt> http://paste.rtg.in.ua/9e70641cb58883b3f44943828f89adca/
[22:04] <oksamyt> the log:
[22:04] <oksamyt> 10.09.09_23:00:45.273587 7fd2f18c0710 osd0 1 map says i am down. switching to boot state.
[22:04] <oksamyt> 10.09.09_23:00:45.273732 7fd2f18c0710 log [WRN] : map e1 wrongly marked me down
[22:05] <oksamyt> the same with osd1
[22:24] * ezgreg (~Greg@78.155.152.6) Quit (Read error: Connection reset by peer)
[22:24] * ezgreg (~Greg@78.155.152.6) has joined #ceph
[22:37] * allsystemsarego (~allsystem@188.27.166.252) Quit (Quit: Leaving)
[22:51] <sagewk> oksamyt: can you post the mon and osd logs somewhere? something is preventing the osds from joining the cluster
[23:05] <oksamyt> hm, i have restarted everything once more and now it says there are two osds:
[23:05] <oksamyt> 10.09.10_00:05:10.158927 mon <- [pg,stat]
[23:05] <oksamyt> 10.09.10_00:05:10.159751 mon0 -> 'v6: 528 pgs: 528 creating; 0 KB data, 0 KB used, 0 KB / 0 KB avail' (0)
[23:05] <oksamyt> root@moar:~/src/ceph/src# ./ceph osd stat
[23:05] <oksamyt> 10.09.10_00:05:11.699330 mon <- [osd,stat]
[23:05] <oksamyt> 10.09.10_00:05:11.700967 mon0 -> 'e5: 2 osds: 2 up, 2 in' (0)
[23:06] <gregaf> looks like it's working then?
[23:07] <oksamyt> but 0 kb available
[23:07] <oksamyt> http://paste.rtg.in.ua/c3d004472dcebcb43b1ccba52d4f05a5/
[23:08] <oksamyt> it keeps creating sequences (last line), is it ok?
[23:10] <gregaf> actually the status is "up:creating" and "seq 71" is a sort of epoch number
[23:11] <oksamyt> do i have to wait until seq 528 then?
[23:13] <oksamyt> i have just restarted ceph on osd2, and now it seems to be available!
[23:14] <oksamyt> osdstat kbused kbavail kb hb in hb out
[23:14] <oksamyt> 0 0 0 0 [] []
[23:14] <oksamyt> 1 468 265772 266240 [0] []
[23:14] <oksamyt> sum 468 265772 266240
[23:17] <oksamyt> hmm
[23:17] <oksamyt> i restarted ceph on osd0 and now i have two nodes working!
[23:17] <sagewk> cool.
[23:18] <oksamyt> thank you very-very much, gregaf and sagewk!
[23:18] <oksamyt> still i don't exactly understand what the problem was
[23:18] <gregaf> np, thanks for testing!
[23:19] <gregaf> we're in a meeting now, sorry for slow responses...
[23:19] <gregaf> did you maybe not fully restart everything on the new map/fs data?
[23:19] <oksamyt> actually, we are going to use ceph for hosting home directories in a students' internet centre of a university :-)
[23:20] <oksamyt> i have restarted several times, but maybe that wasn't enough :-)
[23:21] <gregaf> I think you needed to shut down each daemon, run mkcephfs, and then start them all again
[23:23] <oksamyt> uh-huh
[23:31] <oksamyt> as i understood, ideally i just set up all the servers with a single mkcephfs call (in the future)
[23:31] <oksamyt> do i need to switch to the mon_osdmap_sub branch?
[23:32] <sagewk> i've merged that into testing... you can git fetch and git checkout -b testing origin/testing and stick with that for now
[23:43] <oksamyt> thanks again and good luck in development!
[23:48] * oksamyt (~sana@193.151.59.60) has left #ceph
[23:59] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
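
Editor's note: the check sagewk describes at 19:12 and 19:18 (look for a line starting with the OSD number after the 'osdstat' header at the end of 'ceph pg dump -o -') can be sketched in Python. This is a minimal sketch, not part of Ceph: the function name parse_osdstat is hypothetical, and the column layout is assumed from the output oksamyt pasted at 19:19 and 23:14, so it may not match other Ceph versions.

```python
def parse_osdstat(dump_text):
    """Return {osd_id: (kbused, kbavail, kb)} from the osdstat section.

    Assumes the layout pasted in the log: a header line starting with
    'osdstat', one line per OSD starting with its numeric id, and a
    closing line starting with 'sum'.
    """
    stats = {}
    in_section = False
    for line in dump_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "osdstat":
            in_section = True  # header found; OSD rows follow
            continue
        if not in_section:
            continue
        if fields[0] == "sum":
            break  # end of the per-OSD rows
        if fields[0].isdigit() and len(fields) >= 4:
            stats[int(fields[0])] = tuple(int(f) for f in fields[1:4])
    return stats


# Sample text copied from the 19:19 paste, where only osd0 is listed.
sample = """osdstat kbused kbavail kb hb in hb out
0 2028 1046548 1048576 [] []
sum 2028 1046548 1048576"""

expected_osds = {0, 1}  # the two OSDs in oksamyt's setup
reported = parse_osdstat(sample)
missing = sorted(expected_osds - set(reported))
print("reported:", reported)
print("missing:", missing)
```

With the 19:19 paste, osd1 is reported as missing, which matches what oksamyt saw before the monitor was rebuilt. In practice the text would come from the saved output of 'ceph pg dump -o -' rather than a string literal.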

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.