#ceph IRC Log


IRC Log for 2011-09-03

Timestamps are in GMT/BST.

[0:01] <Tv> reproducible
[0:05] <Tv> http://tracker.newdream.net/issues/1493
[0:07] * cmccabe (~cmccabe@ma35636d0.tmodns.net) has joined #ceph
[0:10] * lxo (~aoliva@9KCAAAUAB.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[0:45] * cmccabe (~cmccabe@ma35636d0.tmodns.net) Quit (Ping timeout: 480 seconds)
[1:04] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has joined #ceph
[1:12] <johnl_> hey, I bust my test ceph cluster, but it's in a kind of weird state
[1:12] <johnl_> wiped my 4 osds, cosd --mkfs on each of em and started em up again
[1:13] <johnl_> mons think all 4 osds are in and fine
[1:13] <johnl_> but "rados df" shows output that suggests something still expects data
[1:13] <johnl_> 451,242 objects in data pool
[1:14] <johnl_> I didn't know the monitors had local on-disk state
[1:15] <johnl_> (I see that they do now :)
[1:16] <johnl_> I suppose I'm asking how the monitors can think there are objects in the data pool that do not exist on the osds, without complaining about degraded state
[1:20] * jojy (~jojyvargh@70-35-37-146.static.wiline.com) Quit (Quit: jojy)
[1:40] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[2:07] <cmccabe> johnl: monitors have to have local on-disk state to implement paxos
[2:08] <cmccabe> johnl: the state that you're thinking about now probably lives on the metadata servers (mdses) rather than the monitors though
[2:09] <cmccabe> johnl: actually I'm not sure exactly what state is local to the metadata servers, as opposed to stored in objects on the OSDs
[2:09] <cmccabe> johnl: anyway, you need to run mkfs.ceph if you want to mkfs
[2:10] <Tv> cmccabe: my understanding is cmds has *no* local state
[2:10] <sjust> bchrisman: back, sorry, got busy
[2:10] <bchrisman> hey.. me too :)
[2:10] <Tv> there is no mkfs.ceph, but there is a mkcephfs.. it definitely doesn't act like the typical mkfs.foo /dev/sda4 things
[2:11] <bchrisman> I've really got no clue what the issue is there... what's the section that's failing?
[2:11] <cmccabe> tv: well, I can see that there is a cmon --mkfs
[2:11] <sjust> it's related to the filestore update code, it must have left some threads running some how
[2:11] <sjust> can you get it to start now?
[2:12] <bchrisman> I can't...
[2:12] <bchrisman> I can check for lingering processes.
[2:12] <cmccabe> johnl: ok, to clarify: you and tv are right. There is no mkfs for cmds per se.
[2:13] <bchrisman> sjust: odd though.. this happens right after I do a mkcephfs
[2:13] <cmccabe> johnl: however, there is a mkfs for the monitors, and I think you'll want to run that when reinstalling your FS
[2:13] <Tv> some day http://ceph.newdream.net/docs/latest/architecture/ will be enough to clarify all of this ;)
[2:14] <sjust> right after a mkcephfs? that's odd
[2:14] * Tv (~Tv|work@aon.hq.newdream.net) Quit (Quit: Tv)
[2:14] <bchrisman> well.. testing that again right now.. should only take a minute
[2:14] <sjust> ok, thanks for the help, this isn't showing up here
[2:16] <bchrisman> mkcephfs -a -c /etc/ceph/ceph.conf --mkbtrfs
[2:16] <bchrisman> service ceph -a start
[2:16] <bchrisman> yeah.. that does it.. very weird..
[2:17] <bchrisman> there any more data I can get? I really don't think it's got anything running.. I mean.. really absolutely shouldn't.
[2:17] <sjust> hmm
[2:20] <bchrisman> unfortunately I can only track it back to: "working in de681da.. not working now".. feh
[2:20] <sjust> looking at mkcephfs
[2:21] <sjust> try running ./cosd --convert-fs -c <conf_file> -i <num>
[2:27] * adjohn (~adjohn@ Quit (Quit: adjohn)
[2:38] <bchrisman> --convert-fs is not recognized
[2:38] <bchrisman> ahh filestore
[2:39] <bchrisman> no dice..
[2:43] <bchrisman> output was:
[2:43] <bchrisman> 2011-09-03 00:42:46.229338 7f62a9465720 FileStore is up to date.
[2:43] <bchrisman> 2011-09-03 00:42:46.230617 7f62a9465720 Converted Filestore /data/osd0
[2:43] <bchrisman> ...
[2:43] <bchrisman> 2011-09-03 00:43:16.416336 7f771f07a720 FileStore is up to date.
[2:43] <bchrisman> starting osd0 at osd_data /data/osd0 /dev/sda6
[2:43] <bchrisman> 2011-09-03 00:43:16.417362 7f771f07a720 global_init_daemonize: BUG: there are 1 child threads already started that will now die!
[2:44] <sjust> hang on, that all happened with just ./cosd --convert-filestore <etc> ?
[2:44] <bchrisman> nope
[2:44] <sjust> oh, ok
[2:44] <bchrisman> I attempted to start the cosd directly after
[2:44] <sjust> what was the second command?
[2:44] <bchrisman> cosd -i 0 -c /etc/ceph/ceph.conf
[2:49] <bchrisman> might be able to look more over the weekend.
[2:50] <sjust> well, I have a patch that may fix it
[2:50] <bchrisman> ahh cool.. going into master?
[2:50] <sjust> yeah, one sec
[2:51] <bchrisman> cool
[2:54] <bchrisman> will be off for a while.. will try another build later on. thx for the help.
[2:55] <sjust> ok, sure
[3:12] * cmccabe (~cmccabe@c-24-23-254-199.hsd1.ca.comcast.net) has left #ceph
[3:15] <sage> sjust: still there?
[3:38] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:04] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[4:04] * jojy (~jojyvargh@75-54-231-2.lightspeed.sntcca.sbcglobal.net) Quit ()
[4:43] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:44] * bchrisman (~Adium@c-98-207-207-62.hsd1.ca.comcast.net) has joined #ceph
[5:44] * lxo (~aoliva@9KCAAAUJ3.tor-irc.dnsbl.oftc.net) has joined #ceph
[9:55] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[9:58] * lxo (~aoliva@9KCAAAUJ3.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[10:02] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Ping timeout: 480 seconds)
[10:10] * lxo (~aoliva@659AAD0ZN.tor-irc.dnsbl.oftc.net) has joined #ceph
[13:36] * DanielFriesen (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (Remote host closed the connection)
[13:39] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[16:37] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) Quit (resistance.oftc.net synthon.oftc.net)
[16:37] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (resistance.oftc.net synthon.oftc.net)
[16:37] * todin (tuxadero@kudu.in-berlin.de) Quit (resistance.oftc.net synthon.oftc.net)
[16:37] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) Quit (resistance.oftc.net synthon.oftc.net)
[16:37] * ajm (adam@adam.gs) Quit (resistance.oftc.net synthon.oftc.net)
[16:37] * rsharpe (~Adium@70-35-37-146.static.wiline.com) Quit (resistance.oftc.net synthon.oftc.net)
[16:37] * u3q (~ben@uranus.tspigot.net) Quit (resistance.oftc.net synthon.oftc.net)
[16:37] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) Quit (resistance.oftc.net synthon.oftc.net)
[16:44] * Dantman (~dantman@S0106001731dfdb56.vs.shawcable.net) has joined #ceph
[16:44] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) has joined #ceph
[16:44] * ghaskins (~ghaskins@66-189-113-47.dhcp.oxfr.ma.charter.com) has joined #ceph
[16:44] * u3q (~ben@uranus.tspigot.net) has joined #ceph
[16:44] * ajm (adam@adam.gs) has joined #ceph
[16:44] * rsharpe (~Adium@70-35-37-146.static.wiline.com) has joined #ceph
[16:44] * sage (~sage@dsl092-035-022.lax1.dsl.speakeasy.net) has joined #ceph
[16:44] * todin (tuxadero@kudu.in-berlin.de) has joined #ceph
[17:38] <darkfader> does one of you happen to know why /sys/block/sda/device/model is being padded to 17 chars?
[17:38] <darkfader> trying to make my node setups a bit better but this is the oddest thing i've ever seen
[17:38] <darkfader> (except windows)
[18:31] * Salade (~chatzilla@ has joined #ceph
[18:31] <Salade> hi
[18:34] <Salade> I am trying to run Ceph on 11.04 in a EC2 large instance. all cmon, mds and osd are up and running. but when i try to mount, it said "FATAL: Module ceph not found.", "mount.ceph: modprobe failed, exit status 1", "mount error: ceph filesystem not supported by the system"
[18:35] <Salade> my ceph is installed via apt-get according to http://ceph.newdream.net/wiki/Debian
[18:37] <Salade> the kernel is 2.6.38-8-virtual. i wonder if anyone know if this EC2 kernel can use Ceph. 2.6.38-8-virtual is different from -server. any idea?
[19:27] <Salade> nvm. my kernel should have no ceph kernel module. i'm trying fuse and got a " auth_reply(proto 2 -1 Operation not permitted)" error. The full log message is " <== mon0 2 ==== auth_reply(proto 2 -1 Operation not permitted) v1 ==== 24+0+0 (3074150679 0 0) 0x18ffe00 con 0x19273c0". any idea?
[19:29] <Salade> got that error when i use "cfuse"
[19:50] * Salade (~chatzilla@ Quit (Quit: ChatZilla 0.9.87 [Firefox 3.6.21/20110830092825])
[20:59] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) has joined #ceph
[21:15] * adjohn (~adjohn@50-0-92-177.dsl.dynamic.sonic.net) Quit (Quit: adjohn)
[21:40] * aliguori (~anthony@cpe-70-123-132-139.austin.res.rr.com) Quit (Remote host closed the connection)
[22:13] * lxo (~aoliva@659AAD0ZN.tor-irc.dnsbl.oftc.net) Quit (Ping timeout: 480 seconds)
[22:17] * lxo (~aoliva@659AAD06Q.tor-irc.dnsbl.oftc.net) has joined #ceph

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.