#ceph IRC Log


IRC Log for 2012-09-13

Timestamps are in GMT/BST.

[0:00] <sjust> amatter_: lookin
[0:02] <amatter_> sjust: thx
[0:02] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:02] <nhmlap> mgalkiewicz: I have to go soon because we are going to have dinner. I guess going forward I'd see if you can try 0.51 on your staging cluster and see how that performs. If we could do some rados bench tests in isolation on that cluster it would be helpful.
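A minimal sketch of the kind of isolated rados bench run suggested above, assuming a pool named "data"; pool name, duration and concurrency are placeholders:

    rados bench -p data 60 write -t 16

There is also a seq mode for reading back previously written bench objects.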
[0:03] <mgalkiewicz> nhmlap: but staging works fine
[0:03] <nhmlap> mgalkiewicz: It's fast?
[0:03] <mgalkiewicz> yes
[0:03] <mgalkiewicz> however it is running only one postgres and mongo
[0:04] <mgalkiewicz> which basically do nothing
[0:04] <nhmlap> mgalkiewicz: Ok. How old is the staging filesystem relative to production?
[0:04] <nhmlap> mgalkiewicz: Is it possible the production filesystem has more fragmentation?
[0:04] <mgalkiewicz> hmm like 1 month where production is 2 or 3
[0:04] <mgalkiewicz> dont think so
[0:05] <nhmlap> nhmlap: Another thing you may want to do is play around with blktrace and seekwatcher. They will let you look at what the activity on the underlying disks looks like.
[0:05] <nhmlap> er, sorry. I'm getting tired. :)
[0:05] <mgalkiewicz> well it is possible but still that system should be able to work 3 months without fragmentation, shouldnt it?
[0:05] <nhmlap> mgalkiewicz: is this btrfs?
[0:05] <mgalkiewicz> yes
[0:06] <mgalkiewicz> 12GB out of 1.7TB is used
[0:06] <nhmlap> mgalkiewicz: we've seen some pretty bad degradation with btrfs, especially with really small IOs. In our tests it was small objects, I'm not sure if it would also happen with small writes into larger objects.
[0:07] <mgalkiewicz> I can defragment filesystem
[0:07] <mgalkiewicz> if it is safe
[0:08] <nhmlap> mgalkiewicz: Not sure. I'm still a bit wary of btrfs in production...
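For reference, the defragmentation being discussed would be something along these lines with the btrfs tools of this era, run against the OSD data directory (path is a placeholder; as nhmlap notes, doing this on a live production OSD is at your own risk):

    # defragment every file under the OSD data directory
    find /var/lib/ceph/osd/ceph-0 -type f -exec btrfs filesystem defragment {} \;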
[0:08] <nhmlap> mgalkiewicz: what kernel are you on?
[0:08] <mgalkiewicz> 3.2
[0:08] <mgalkiewicz> in my company we like bleeding edge
[0:08] <mgalkiewicz> :)
[0:08] <nhmlap> mgalkiewicz: I think there have been some fixes in recent kernels for btrfs.
[0:09] <mgalkiewicz> I would prefer to work with 3.2 because it will be in debian wheezy which we are using
[0:10] <nhmlap> mgalkiewicz: I guess for now I'd see if you can figure out if there is much fragmentation going on, maybe look at seeing if you can increase the write sizes of mongodb and postgres.
[0:10] <mgalkiewicz> to what values?
[0:12] <nhmlap> mgalkiewicz: if you are feeling like working extra hard, checkout blktrace and seekwatcher. You can see what the underlying disk activity looks like. Here's an example from an aging test I did with 4kb ios that shows all kinds of nasty behavior: http://nhm.ceph.com/movies/aging-test/i29-4KB.mpg
[0:12] <nhmlap> mgalkiewicz: The bigger the better, but I don't know what the limitations are when you do that.
[0:13] <nhmlap> compare that to a fresh filesystem with 4KB IOs: http://nhm.ceph.com/movies/aging-test/i0-4KB.mpg
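For anyone wanting to reproduce traces like the movies linked above, a rough sketch of the blktrace/seekwatcher workflow (device name and output basenames are placeholders; the --movie option depends on your seekwatcher build):

    blktrace -d /dev/sdb -o osd-trace &                 # trace the OSD's backing disk while the workload runs
    # ... run the workload, then stop blktrace ...
    seekwatcher -t osd-trace -o osd-trace.png           # static seek/throughput graph
    seekwatcher -t osd-trace -o osd-trace.mpg --movie   # movie similar to the ones above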
[0:13] <mgalkiewicz> ok I will take a look
[0:14] <mgalkiewicz> probably not today so I will contact you tomorrow or later
[0:14] <mgalkiewicz> thx for your help
[0:15] <nhmlap> mgalkiewicz: ok. No problem, sorry it's still a problem. Fixing performance issues with small IO is really tough.
[0:16] <mgalkiewicz> it would be nice to get some info on how safe it is to upgrade 0.48 to 0.51
[0:16] <mgalkiewicz> I am willing to perform it because of improvements you have made
[0:20] <nhmlap> yeah, I'm not terribly familiar with the upgrade process. Sage or one of the other guys might have some thoughts.
[0:21] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[0:22] <joshd> there shouldn't be any issues upgrading, but if you want to make sure you can test in your staging env first
[0:26] <mgalkiewicz> ok
[0:30] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[0:31] * EmilienM (~EmilienM@ADijon-654-1-133-33.w90-56.abo.wanadoo.fr) has left #ceph
[0:45] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[0:48] * houkouonchi-work (~linux@12.248.40.138) Quit (Remote host closed the connection)
[0:52] * BManojlovic (~steki@212.200.241.6) Quit (Quit: Ja odoh a vi sta 'ocete...)
[0:55] <lightspeed> I wonder how badly ceph would react to this scenario:
[0:55] <lightspeed> add a new OSD to a ceph cluster such that the backing device for the new OSD is an RBD device located on the ceph cluster to which the OSD is being added, and in a pool for which some PGs will be mapped to the new OSD :)
[0:57] <Tv_> run ceph in a vm backed by rbd...
[0:58] <Tv_> use rbd thin provisioning to create INFINITE STORAGE
[0:58] <gregaf> and INFINITE WRITE MULTIPLICATION! :D
[1:01] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[1:01] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[1:02] <lightspeed> if only one of the OSDs was recursive like that, do you think it'd still function, and just recurse writes sufficiently to map all blocks back to the non-recursive OSDs? :)
[1:03] <Tv_> everyone knows running ocfs2 on top of rbd is where it's at
[1:03] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[1:07] <ajm> lightspeed: you have a crazy mind :)
[1:08] <lightspeed> haha
[1:14] * mgalkiewicz (~mgalkiewi@staticline57333.toya.net.pl) has left #ceph
[1:15] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:15] * jlogan (~Thunderbi@2600:c00:3010:1:e09b:e760:9ba1:c8ae) Quit (Ping timeout: 480 seconds)
[1:16] * lofejndif (~lsqavnbok@1GLAAADMT.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[1:22] * houkouonchi-work (~linux@12.248.40.138) Quit (Ping timeout: 480 seconds)
[1:25] <lightspeed> whilst not comparable, how's this for crazy... back in the days of EVMS, I was running root on EVMS on a single disk in my laptop, and had a need to blow away the DOS partitions on the disk and recreate them from scratch, but didn't want downtime on the laptop
[1:26] <lightspeed> so I created a loopback block device from a sparse file inside an NFS mount from a remote file server
[1:26] <lightspeed> then made the loopback device into an EVMS PV, and shifted the live root filesystem over to it, wiped the local disk, then shifted it all back
[1:27] <lightspeed> wasn't crazy enough to do that over wireless though... wouldn't have been pleased if the network had dropped halfway through :)
[1:40] <ajm> hah, that's fairly crazy :)
[1:40] <ajm> linux pivot_root is usually fairly robust though
[1:50] * amatter_ (~amatter@209.63.136.130) Quit (Ping timeout: 480 seconds)
[1:51] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[1:58] * Cube (~Adium@12.248.40.138) Quit (Quit: Leaving.)
[2:00] * jjgalvez1 (~jjgalvez@12.248.40.138) Quit (Quit: Leaving.)
[2:00] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[2:02] <darkfader> lightspeed: you sound like someone who enjoys doing root disk encapsulation for VxVM by hand instead of script
[2:02] <darkfader> (it's fun when it works, admittedly)
[2:04] * Tv_ (~tv@2607:f298:a:607:5905:afb4:18b:79c5) Quit (Quit: Tv_)
[2:05] * sagelap1 (~sage@38.122.20.226) Quit (Ping timeout: 480 seconds)
[2:06] * sagelap (~sage@114.sub-70-197-140.myvzw.com) has joined #ceph
[2:15] <lightspeed> hah, I'd not heard of VxVM before
[2:16] <lightspeed> is it specific to solaris?
[2:24] * houkouonchi-work (~linux@12.248.40.138) Quit (Remote host closed the connection)
[2:27] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[2:33] <gregaf> sage: sagewk: finally got to wip-accepter; it looks good to me (good docs!) if you've tested it
[2:33] * Ryan_Lane (~Adium@127.sub-166-250-38.myvzw.com) has joined #ceph
[2:34] <gregaf> seems like an awfully narrow race to run into, though; how'd we hit it?
[2:39] * Ryan_Lane1 (~Adium@39.sub-166-250-35.myvzw.com) Quit (Ping timeout: 480 seconds)
[2:44] * houkouonchi-work (~linux@12.248.40.138) Quit (Remote host closed the connection)
[2:47] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[2:47] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[2:56] * sagelap (~sage@114.sub-70-197-140.myvzw.com) Quit (Ping timeout: 480 seconds)
[3:02] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[3:04] * Ryan_Lane (~Adium@127.sub-166-250-38.myvzw.com) Quit (Quit: Leaving.)
[3:07] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Ping timeout: 480 seconds)
[3:09] * Ryan_Lane (~Adium@166.250.35.39) has joined #ceph
[3:11] <joshd> damien: if you could compile qemu with https://gist.github.com/3711149 and re-run, it might give us a hint, or at least tell us if the core file is accurate about the backtrace
[3:13] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[3:18] * mrjack_ (mrjack@office.smart-weblications.net) Quit ()
[3:19] <dmick> ah *hah*.
[3:20] <dmick> these coredump files are corrupted in their "link map", built at runtime by the dynamic linker
[3:20] <dmick> addresses in the 0x7f64 xxxx xxxx range
[3:21] <dmick> the *stack* starts at something close to Top of Mem, which is something like 0x7fff xxxx xxxx, and grows down, of course
[3:21] <dmick> sp from the core file: 0x7f64 61b2 2b10
[3:21] <dmick> uh oh
[3:22] <dmick> one wonders how these boundaries are *actually* chosen, and whether there's any rlimit set for stack, and whether one can redzone the stack, and such things
[3:28] <dmick> although it has been pointed out to me that this may well be a thread stack, which surely is not the same as the process initial stack, so, hm.
[3:32] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[3:35] * Ryan_Lane (~Adium@166.250.35.39) Quit (Read error: Connection reset by peer)
[3:37] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[3:41] * Ryan_Lane (~Adium@191.sub-166-250-36.myvzw.com) has joined #ceph
[3:42] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:47] * Ryan_Lane (~Adium@191.sub-166-250-36.myvzw.com) Quit (Quit: Leaving.)
[4:02] * amatter (amatter@c-174-52-137-136.hsd1.ut.comcast.net) has joined #ceph
[4:05] * amatter_ (~amatter@209.63.136.130) has joined #ceph
[4:10] * amatter (amatter@c-174-52-137-136.hsd1.ut.comcast.net) Quit (Ping timeout: 480 seconds)
[4:38] * maelfius (~mdrnstm@66.209.104.107) Quit (Quit: Leaving.)
[4:56] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[5:24] * tttones (~tttones@ool-182ff74c.dyn.optonline.net) has joined #ceph
[5:25] * tttones (~tttones@ool-182ff74c.dyn.optonline.net) has left #ceph
[5:37] * pentabular (~sean@70.231.142.192) has joined #ceph
[5:37] * pentabular is now known as Guest6893
[5:38] * Guest6893 is now known as pentabular
[5:41] * tttones (~tttones@ool-182ff74c.dyn.optonline.net) has joined #ceph
[5:49] * huangjun (~hjwsm1989@183.62.232.94) has joined #ceph
[5:51] * pentabular (~sean@70.231.142.192) has left #ceph
[5:56] * tttones (~tttones@ool-182ff74c.dyn.optonline.net) has left #ceph
[5:56] * chutzpah (~chutz@100.42.98.5) Quit (Quit: Leaving)
[6:02] * amatter (amatter@c-174-52-137-136.hsd1.ut.comcast.net) has joined #ceph
[6:08] * amatter_ (~amatter@209.63.136.130) Quit (Ping timeout: 480 seconds)
[6:10] * amatter (amatter@c-174-52-137-136.hsd1.ut.comcast.net) Quit (Ping timeout: 480 seconds)
[6:22] * jluis (~JL@89.181.153.232) has joined #ceph
[6:28] * joao (~JL@89-181-145-30.net.novis.pt) Quit (Ping timeout: 480 seconds)
[7:29] * loicd (~loic@90.84.144.79) has joined #ceph
[7:32] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[8:01] * loicd (~loic@90.84.144.79) Quit (Quit: Leaving.)
[8:02] * EmilienM (~EmilienM@ADijon-654-1-133-33.w90-56.abo.wanadoo.fr) has joined #ceph
[8:38] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[8:40] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[8:45] * gvkhjv (~kuyggvj@82VAAGE27.tor-irc.dnsbl.oftc.net) Quit (Remote host closed the connection)
[8:46] * nick`m (~kuyggvj@83TAAARXT.tor-irc.dnsbl.oftc.net) has joined #ceph
[8:50] <huangjun> Does ceph use distributed lock in OSD or MDS module?
[9:07] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[9:09] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:11] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[9:29] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[9:46] <exec> nice msg: -2/68468 degraded (-0.003%);
[10:10] * loicd (~loic@178.20.50.225) has joined #ceph
[10:20] <jluis> exec, lol
[10:20] <jluis> is that a monitor report?
[10:32] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[10:32] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[10:38] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[10:42] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit ()
[11:08] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:10] * tomaw (tom@tomaw.netop.oftc.net) Quit (Read error: Operation timed out)
[11:12] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[11:16] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[11:20] <exec> jluis: ceph -w
[11:25] * tomaw (tom@tomaw.netop.oftc.net) has joined #ceph
[11:29] * EmilienM (~EmilienM@ADijon-654-1-133-33.w90-56.abo.wanadoo.fr) has left #ceph
[11:32] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[11:36] * tomaw (tom@tomaw.netop.oftc.net) Quit (Ping timeout: 480 seconds)
[11:37] * tomaw (tom@tomaw.netop.oftc.net) has joined #ceph
[11:42] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[11:47] <jluis> exec, does it show it still? how's the cluster healthiness?
[11:47] * jluis is now known as joao
[11:48] * Leseb_ (~Leseb@193.172.124.196) has joined #ceph
[11:54] * Leseb (~Leseb@193.172.124.196) Quit (Read error: Operation timed out)
[11:54] * Leseb_ is now known as Leseb
[12:04] * lofejndif (~lsqavnbok@9KCAABN9N.tor-irc.dnsbl.oftc.net) has joined #ceph
[12:11] * mrjack_ (mrjack@office.smart-weblications.net) has joined #ceph
[13:15] <nhmlap> good morning #ceph
[13:20] <wido> good morning nhmlap
[13:22] <joao> afternoon :p
[13:24] <wido> joao: something like that :)
[13:48] * masterpe (~masterpe@2001:990:0:1674::1:82) Quit (Quit: Changing server)
[13:48] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) Quit (Ping timeout: 480 seconds)
[13:50] * masterpe (~masterpe@2001:990:0:1674::1:82) has joined #ceph
[13:50] * masterpe (~masterpe@2001:990:0:1674::1:82) Quit ()
[13:51] * masterpe (~masterpe@2001:990:0:1674::1:82) has joined #ceph
[14:19] * lofejndif (~lsqavnbok@9KCAABN9N.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[14:19] * mtk (~mtk@ool-44c35bb4.dyn.optonline.net) has joined #ceph
[14:28] * nick`m (~kuyggvj@83TAAARXT.tor-irc.dnsbl.oftc.net) Quit ()
[14:48] <gregorg> hi
[14:48] <gregorg> just got "ERROR: osd init failed: (1) Operation not permitted" after a fresh setup
[14:48] <gregorg> what's wrong ?
[14:51] <nhmlap> gregorg: hrm, maybe a permission issue with logs or the osd data directory or something?
[14:54] <gregorg> I've disabled auth, then it works
[14:55] <gregorg> seems to be a problem with auth, however I just followed the wiki
[15:01] <nhmlap> gregorg: Ah, interesting. I haven't played with our auth stuff much.
[15:02] <gregorg> I'm going to recreate all keys
[15:02] <gregorg> but using "Cephx" wiki page
[15:04] * sagelap (~sage@56.sub-70-197-144.myvzw.com) has joined #ceph
[15:24] * sagelap (~sage@56.sub-70-197-144.myvzw.com) Quit (Ping timeout: 480 seconds)
[15:31] * f4m8_ is now known as f4m8
[15:31] <wido> gregorg: in which context did you get that message?
[15:33] <joao> was that during mkfs or something?
[15:34] <joao> feels like lack of permissions to do something
[15:38] <wido> joao: I had the same idea indeed, doesn't seem cephx to me
[15:39] * sagelap (~sage@222.sub-70-197-141.myvzw.com) has joined #ceph
[15:47] * sagelap1 (~sage@114.sub-70-197-142.myvzw.com) has joined #ceph
[15:48] * sagelap (~sage@222.sub-70-197-141.myvzw.com) Quit (Ping timeout: 480 seconds)
[15:52] <gregorg> I used this cmd to mk ceph fs: mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/keyring.bin --mkbtrfs
[15:52] <gregorg> and no error was reported
[15:52] <gregorg> I got this error when starting osd
[15:52] <gregorg> the only error I got during mkcephfs is "failed to read /dev/sr0"
[15:53] <gregorg> however I've never mentioned /dev/sr0 in ceph.conf ...
[15:53] * sagelap1 (~sage@114.sub-70-197-142.myvzw.com) Quit (Ping timeout: 480 seconds)
[15:54] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:00] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:00] <wido> gregorg: Ah, you are using btrfs devs in ceph.conf?
[16:00] <gregorg> yep
[16:00] <wido> That will become obsolete
[16:01] <gregorg> btrfs obsolete ?
[16:01] <wido> The reason you are seeing /dev/sr0 is because btrfs will scan all block devices
[16:01] <wido> No, but the "btrfs devs" option in ceph.conf
[16:01] <wido> I recommend you mount the devices in your fstab and point to the correct location in your ceph.conf
[16:01] <gregorg> btrfs devs = /dev/sda9
[16:02] <gregorg> ok, I will try that
[16:02] <wido> gregorg: Yes, but btrfs itself will scan all block devices, it could be that it is a stripe over multiple disks
[16:02] <wido> gregorg: http://zooi.widodh.nl/ceph/ceph.conf
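A sketch of the fstab-plus-ceph.conf layout wido is recommending, using the device gregorg mentioned above (mount point, host name and mount options are illustrative):

    # /etc/fstab
    /dev/sda9   /var/lib/ceph/osd/ceph-0   btrfs   defaults,noatime   0 0

    # ceph.conf
    [osd.0]
        host = myhost
        osd data = /var/lib/ceph/osd/ceph-0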
[16:02] <gregorg> not sure it is the problem
[16:02] <gregorg> since if I disable auth, it just works
[16:03] <wido> gregorg: http://pastebin.com/WvYZ0Sxm
[16:03] <wido> Where do you get the "operation not permitted" ?
[16:03] <wido> can you paste the output on pastebin?
[16:03] <gregorg> in osd log file
[16:04] <gregorg> http://paste.frsag.net/gCuTR
[16:04] * sagelap (~sage@201.sub-70-197-143.myvzw.com) has joined #ceph
[16:06] <wido> gregorg: Can you add "debug osd = 20" and "debug filestore = 20" to the [osd] section and re-try
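That is, roughly this in ceph.conf before restarting the OSD:

    [osd]
        debug osd = 20
        debug filestore = 20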
[16:07] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[16:07] <gregorg> ok wido
[16:09] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:11] <gregorg> wido: http://paste.frsag.net/ObJxp
[16:11] <gregorg> and without btrfs devs, but using /etc/fstab
[16:12] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:13] * loicd (~loic@178.20.50.225) Quit (Ping timeout: 480 seconds)
[16:14] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:15] <wido> gregorg: Hmm, you might be right indeed. This seems like a cephx/messenger issue
[16:15] <wido> debug auth = 20, debug ms = 20 ?
[16:16] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:16] <gregorg> wido: restarting ...
[16:17] <gregorg> how to restart just osd.0 ? /etc/init.d/ceph restart osd.0 does nothing
[16:17] <wido> gregorg: It's probably not running, try a stop and start
[16:17] <wido> service ceph stop osd.0
[16:17] <wido> service ceph start osd.0
[16:17] <gregorg> I tried that
[16:18] <gregorg> big log file with debug * = 20 : http://paste.frsag.net/OP2mY
[16:19] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:21] <wido> gregorg: It's indeed a cephx issue, something with your keys
[16:21] <wido> can you share your ceph.conf?
[16:21] <wido> and which Ceph version are you using?
[16:22] <gregorg> wido: ceph.conf http://paste.frsag.net/jpoFv
[16:22] <gregorg> ceph version 0.43-1
[16:22] <wido> gregorg: Ah, I recommend you try at least 0.48.1
[16:22] <wido> Ubuntu platform?
[16:22] <gregorg> debian wheezy
[16:23] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:23] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:24] <wido> gregorg: 0.43 is a pretty old version. I'm sure the problem you are seeing is something with cephx
[16:24] <wido> Ubuntu 12.04 is recommended with 0.48 right now
[16:24] <gregorg> wido: I will try with officials ceph.com repository packages
[16:24] <wido> gregorg: The 0.48 packages aren't built for Debian Wheezy
[16:24] <wido> My bad, they are
[16:25] <wido> sorry, you can install 0.48 on Debian Wheezy
[16:25] <wido> gregorg: You can remove the keyring = /etc/ceph/keyring.bin line, better to use the default
[16:25] <wido> gregorg: Ah, reading your config again, your problem is in the [osd] section
[16:26] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:26] <wido> the keyring directive is pointing to the wrong keyring, this should be: keyring = /etc/ceph/keyring.$name
[16:26] <wido> now it points to the keyring of client.admin
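In other words, the [osd] section wido is describing would look roughly like this ($name expands to the daemon name, e.g. osd.0, so each OSD reads its own key):

    [osd]
        keyring = /etc/ceph/keyring.$name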
[16:29] <gregorg> wido: now I have auth: failed to open keyring from /etc/ceph/keyring.osd.0
[16:29] <wido> gregorg: Yes, that file doesn't exist, you should run mkcephfs again to generate the key
[16:29] <wido> it could be done manually, but that's the easiest way for now
[16:31] <gregorg> ok thx
[16:33] <gregorg> wido: nice, it works now !!!
[16:33] <gregorg> with 0.43
[16:33] <gregorg> but I will upgrade to 0.48 now
[16:33] <wido> gregorg: Great!
[16:33] <wido> Yes, 0.48 is much better than 0.43
[16:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:35] * sagelap (~sage@201.sub-70-197-143.myvzw.com) Quit (Ping timeout: 480 seconds)
[16:36] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[16:37] * fc (~fc@83.167.43.235) has joined #ceph
[16:45] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[16:47] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:01] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:01] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:02] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:04] * loicd (~loic@magenta.dachary.org) has joined #ceph
[17:04] <gregorg> upgrade successful
[17:11] <gregorg> now I have this error: "mount: error writing /etc/mtab: Invalid argument" but cephfs is mounted
[17:13] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:14] * aliguori (~anthony@cpe-70-123-140-180.austin.res.rr.com) Quit (Remote host closed the connection)
[17:14] * lx0 (~aoliva@lxo.user.oftc.net) has joined #ceph
[17:14] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[17:17] * jlogan (~Thunderbi@2600:c00:3010:1:49cf:a720:7a5f:aaa9) has joined #ceph
[17:19] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Quit: Leaving.)
[17:19] * MikeMcClurg (~mike@62.200.22.2) has joined #ceph
[17:31] * Tv_ (~tv@2607:f298:a:607:5905:afb4:18b:79c5) has joined #ceph
[17:42] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[17:43] * lx0 (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[17:44] * aliguori (~anthony@32.97.110.59) has joined #ceph
[17:45] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[17:51] * ferai (~quassel@quassel.jefferai.org) Quit (Ping timeout: 480 seconds)
[17:52] * dabeowulf (~dabeowulf@free.blinkenshell.org) Quit (Read error: Connection reset by peer)
[17:55] * sagelap (~sage@166.250.39.146) has joined #ceph
[17:57] * jefferai (~quassel@quassel.jefferai.org) has joined #ceph
[17:57] * dabeowulf (dabeowulf@free.blinkenshell.org) has joined #ceph
[18:18] * sagelap1 (~sage@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:23] * sagelap (~sage@166.250.39.146) Quit (Ping timeout: 480 seconds)
[18:24] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[18:30] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[18:41] <joao> sagewk, did you get this error by any chance while compiling wip-mon-gv?
[18:41] <joao> make: *** No rule to make target `librbd.cc', needed by `librbd_la-librbd.lo'. Stop.
[18:42] <joao> oh
[18:42] <joao> oh
[18:42] <joao> forgot to ./configure -_-
[18:42] * amatter (~amatter@209.63.136.130) has joined #ceph
[18:44] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[18:44] * loicd (~loic@magenta.dachary.org) has joined #ceph
[18:45] * amatter (~amatter@209.63.136.130) Quit ()
[18:45] * amatter (~amatter@209.63.136.130) has joined #ceph
[18:56] * Ryan_Lane (~Adium@159.sub-166-250-39.myvzw.com) has joined #ceph
[18:57] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[18:58] * Ryan_Lane1 (~Adium@owa.robertfountain.com) has joined #ceph
[19:00] * joshd (~joshd@38.122.20.226) has joined #ceph
[19:04] * Ryan_Lane (~Adium@159.sub-166-250-39.myvzw.com) Quit (Ping timeout: 480 seconds)
[19:07] * MikeMcClurg (~mike@62.200.22.2) Quit (Quit: Leaving.)
[19:09] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[19:09] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[19:11] * chutzpah (~chutz@100.42.98.5) has joined #ceph
[19:12] * Tamil (~Adium@2607:f298:a:607:d427:d464:47a9:35f2) has joined #ceph
[19:15] * Cube (~Adium@12.248.40.138) has joined #ceph
[19:17] <nhmlap> wtf, I'm getting "all circuits are busy now" again trying to call into the stand-up.
[19:17] <nhmlap> ok, got in
[19:18] <dmick> we're still getting it together here
[19:29] * maelfius (~mdrnstm@66.209.104.107) has joined #ceph
[19:32] * BManojlovic (~steki@212.200.241.6) has joined #ceph
[19:35] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:40] * Tamil (~Adium@2607:f298:a:607:d427:d464:47a9:35f2) has left #ceph
[19:41] <nhmlap> sagelap1: sending out email about rgw performance in a bit
[19:42] <sagelap1> nhmlap: k.
[19:43] <sagelap1> nhmlap, yehudasa: i think we need to figure out if this is them misusing the api, or something we need to work on
[19:43] <sagelap1> do we know how s3 perf compares?
[19:43] <sagelap1> with the same workload?
[19:44] <yehudasa> nhmlap: did you send out the email?
[19:53] <nhmlap> yehudasa: just did a bit ago. Did you get it?
[19:53] <yehudasa> yeah, got iy
[19:53] <yehudasa> it
[19:54] <gregaf> did you just send out a performance doc to a private group? *shakes fist*
[19:54] <nhmlap> gregaf: oh, I'm happy to share. I just never know how many people I should spam. :)
[19:55] <nhmlap> gregaf: I can make you a permanent recipient. >:)
[19:55] <gregaf> the dev list is probably safe too, but that works for me :)
[20:07] * dabeowulf (dabeowulf@free.blinkenshell.org) Quit (Ping timeout: 480 seconds)
[20:11] * dabeowulf (dabeowulf@free.blinkenshell.org) has joined #ceph
[20:40] * sagelap1 (~sage@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[20:52] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[20:53] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:55] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[21:04] <damien> joshd: good morning, was that patch you linked me to supposed to print something on qemu's stdout at the point of the crash?
[21:04] <joshd> damien: yes
[21:04] <damien> joshd: okay, chance I didn't apply it in the package correctly then
[21:04] <joshd> damien: the actual address of the crash, and the reason for the floating point exception
[21:05] <joshd> damien: which qemu version are you using? perhaps there's some difference in its signal handling
[21:05] <damien> joshd: 1.1.2
[21:07] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) has joined #ceph
[21:09] <joshd> damien: it works fine with 1.1.2 with my test case (adding a divide by zero to aio_read for raw files)
[21:09] <damien> joshd: okay in that case the package must not be building correctly :-/ I'll build a new one
[21:10] <joshd> damien: oh, it may not be getting called because the exception isn't occurring in the main qemu thread
[21:10] <damien> joshd: would that mean installing the signal handler in each thread?
[21:11] <joshd> damien: the backtrace would claim it's from a librados thread, so I'll try installing it there and see if that catches it
[21:11] <jlogan> Working on getting a Ceph cluster up for some testing. Which distro? Debian testing or sid, Ubuntu 12.04 or 12.10 beta? Where are people having the most success? I saw some mailing list comments about qemu 1.1 or 1.2, but I know Ubuntu 12.04 is 1.0 still (via apt-cache show).
[21:12] <jlogan> I have 3 12-drive nodes for my test
[21:13] <damien> joshd: okay cool!
[21:14] <joshd> jlogan: ubuntu 12.04 is probably easiest, the only extra thing that qemu 1.2 gets you is you can use the qemu caching options instead of the rbd ones
[21:15] <jlogan> is there an advantage of one caching method over another?
[21:16] <damien> it uses the same caching method, it's just configured using qemu's ,cache=writeback|writethrough|none option
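For reference, a hypothetical qemu 1.2 invocation selecting the cache mode on the -drive line (pool and image names are placeholders); with older qemu the equivalent is set through rbd options in the device string, such as :rbd_cache=true:

    qemu-system-x86_64 ... \
        -drive file=rbd:rbd/vm1,format=raw,if=virtio,cache=writeback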
[21:17] <jlogan> For some initial tests should I bring up all three machines at once, or just start with one host and then add the other 2 nodes?
[21:19] <joshd> it doesn't really matter how you bring them up, but you might want to start with just a couple osds so you can get familiar with the process
[21:20] <jlogan> ok. I'll get started and see how it goes.
[21:21] <jlogan> also should I follow http://ceph.com/docs or http://ceph.com/wiki for initial steps?
[21:22] <joshd> docs, the wiki isn't up to date on a lot of things
[21:23] <jlogan> good to know.
[21:23] <joshd> yeah, maybe we should put a banner on the front page noting that
[21:25] * Ryan_Lane1 (~Adium@owa.robertfountain.com) Quit (Quit: Leaving.)
[21:39] <dilemma> I'm using the qemu rbd driver, and after hot-attaching a volume, and writing large amounts of data to it, my qemu process seems to consume an additional 300MB of memory.
[21:39] <dilemma> That seems a bit extreme. I have caching disabled on the volume when attaching.
[21:39] <jlogan> With the quickstart on 1 node I see a message when I run mkceph
[21:39] <jlogan> sudo mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
[21:39] <jlogan> ...
[21:39] <jlogan> 2012-09-13 12:37:43.385210 7fb3a2f95780 -1 filestore(/var/lib/ceph/osd/ceph-0) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
[21:39] <jlogan> 2012-09-13 12:37:43.836251 7fb3a2f95780 -1 created object store /var/lib/ceph/osd/ceph-0 journal /var/lib/ceph/osd/ceph-0/journal for osd.0 fsid 32afa63d-3b77-4847-9fac-04640df00a81
[21:39] <jlogan> ...
[21:39] <jlogan> But in the end the service does come up.
[21:39] <jlogan> root@ceph01-sef:/etc/ceph# ceph health
[21:39] <jlogan> HEALTH_OK
[21:42] <joshd> jlogan: I think that message is harmless
[21:43] <joshd> dilemma: what version are you running? we haven't seen any memory leaks recently other than one with discard and caching enabled
[21:45] <dilemma> QEMU emulator version 1.1.1
[21:46] <dilemma> also, my ceph libs are v0.48.1argonaut
[21:47] <joshd> dilemma: can you run it under valgrind with massif (i.e. valgrind --tool=massif) to see where the memory is being used?
[21:47] <dilemma> I can watch the memory usage balloon 315MB or so immediately as I start writing large amounts of data to the volume
[21:48] <dilemma> I'll give that a shot
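A rough sketch of the massif run being asked for (the qemu command line is illustrative, not the exact one dilemma is using):

    valgrind --tool=massif qemu-system-x86_64 -m 512 \
        -drive file=rbd:rbd/vm1,format=raw,if=virtio
    # after the guest exits, summarise the heap profile:
    ms_print massif.out.<pid>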
[21:48] <gregaf> dilemma: does the memory usage go back down when you stop writing?
[21:49] <gregaf> joshd: it could just be the message data being kept locally in case it needs to retransmit, right?
[21:49] <joshd> possibly, yeah
[21:49] <gregaf> the throttler is less than 300MB, but it might be 200MB and some other overhead somewhere
[21:49] <joshd> that would be a lot of overhead
[21:50] <gregaf> it would – I don't remember the real numbers or all the sources though, and I would certainly expect memory usage to go up by some amount
[21:50] <dilemma> it does not look like memory usage goes back down
[21:50] <dilemma> unless I'm not waiting long enough
[21:52] <joshd> it could be a memory leak in the qemu rbd driver/block layer, I haven't run it under valgrind in a while
[21:52] <dilemma> I'm not too familiar with fiddling with valgrind myself
[21:53] <joshd> generating the data is easy, determining where the leaks actually are can be more difficult sometimes
[21:55] <dilemma> another interesting fact: I don't even get the memory back when detaching the volume
[21:55] <joshd> hmm, could it be the guest using more memory with the new block device then?
[21:55] <joshd> generally guest memory isn't reclaimed by the host once it's allocated
[21:56] <dilemma> well, before attaching, the guest's qemu process was using about 550MB. I gave the guest 512MB of RAM.
[21:57] <dilemma> After attaching, writing, and detaching, it's now using around 850MB
[21:57] <joshd> you could try attaching a regular file instead of an rbd image and running the same test to check
[21:57] <joshd> but that's sounding more and more like a leak in qemu somewhere, possibly the rbd driver
[21:58] <dilemma> I haven't seen this happen before when attaching LVM volumes, but I'll give it a shot real quick.
[22:04] <dilemma> :/ You seem to be correct. I'm seeing similar memory usage when performing these steps with LVM rather than RBD
[22:05] <dilemma> I'll start tackling this as a qemu issue now, thank you.
[22:11] * dilemma (~dilemma@2607:fad0:32:a02:1e6f:65ff:feac:7f2a) Quit (Quit: Leaving)
[22:31] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:32] <joshd> damien: sorry for the delay, this patch to librbd should do it: https://gist.github.com/3717397
[22:36] <joshd> damien: actually, I'll add that to the log so it should be output
[22:44] * Ryan_Lane (~Adium@owa.robertfountain.com) has joined #ceph
[23:03] <elder> nhm, ping.
[23:08] <nhmlap> elder: pong
[23:19] * sagelap (~sage@38.sub-166-250-38.myvzw.com) has joined #ceph
[23:21] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[23:23] <joshd> damien: new plan, try running under gdb again but 'print $_siginfo' and look at the si_code and the si_addr (ADDR) (maybe buried deep in the structure) and list *ADDR
[23:24] <joshd> damien: my attempts to make a signal handler catch one that happens in librbd failed
[23:24] <joshd> damien: $_siginfo is case sensitive, and I think yesterday I'd accidentally said $_SIGINFO
[23:25] <dmick> ptype $_siginfo will show you the structure so you can pick the right fieldnames
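Put together, the gdb session joshd and dmick describe would look roughly like this (the field nesting follows the glibc siginfo layout gdb exposes; exact member names can vary, hence the ptype first):

    (gdb) ptype $_siginfo                                  # inspect the structure layout
    (gdb) print $_siginfo.si_signo
    (gdb) print $_siginfo.si_code
    (gdb) print $_siginfo._sifields._sigfault.si_addr     # the faulting address
    (gdb) list *$_siginfo._sifields._sigfault.si_addr     # source around that address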
[23:35] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[23:39] * sagelap (~sage@38.sub-166-250-38.myvzw.com) Quit (Ping timeout: 480 seconds)
[23:44] * BManojlovic (~steki@212.200.241.6) Quit (Quit: Ja odoh a vi sta 'ocete...)
[23:54] * aliguori (~anthony@32.97.110.59) Quit (Remote host closed the connection)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.