#ceph IRC Log


IRC Log for 2012-11-09

Timestamps are in GMT/BST.

[0:03] <Qten> joshd: i'm not using cephx at this stage if that helps
[0:04] <Qten> joshd: i'll try the glacepool/glaceimage option 1 sec
[0:04] <joshd> ah, no cephx makes this much stranger
[0:05] <Qten> rbd info images/b9022777-0bb4-4a08-b16a-1d75dc97af95
[0:05] <Qten> rbd image 'b9022777-0bb4-4a08-b16a-1d75dc97af95':
[0:05] <Qten> size 2048 MB in 256 objects
[0:05] <Qten> order 23 (8192 KB objects)
[0:05] <Qten> block_name_prefix: rbd_data.1008509aa4f6
[0:05] <Qten> format: 2
[0:05] <Qten> features: layering
[0:06] <Qten> interesting?
[0:06] <joshd> no, that's working as expected
[0:07] <Qten> well thats a good thing :)
[0:07] <joshd> could you pastebin cinder-volume.log when creating a volume from an image?
[0:07] <Qten> sure
[0:07] <joshd> something should be failing that makes it fall back to plain copying
[0:09] <joshd> also check the glance-api log to make sure there's a successful request for the image's location/direct_url
[0:13] <Qten> http://dpaste.com/827641/
[0:16] <Qten> seems to be trying to import the volume from tmp?
[0:17] <joshd> yeah, that's the fallback if it can't clone
[0:18] * danieagle (~Daniel@177.133.174.11) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[0:22] <joshd> it's not getting anything back for the location/direct_url
[0:22] <Qten> Constructed URL: http://0.0.0.0:9191/images/b9022777-0bb4-4a08-b16a-1d75dc97af95 _construct_url is that what we're looking for?
[0:23] <joshd> it's part of the detail request actually, so it's not visible just from the logs
[0:23] <joshd> I forgot about that part
[0:23] <Qten> ah
[0:23] <joshd> did you restart glance-api after setting show_image_direct_url=True?
[0:24] <Qten> yep
[0:24] <Qten> did that on my initial install and have had several reboots since
[0:25] <Qten> image is raw and container is bare if that helps
[0:25] <Qten> size of the volume from which the image was created is 80gb and i'm trying to create a new 80gb cloned volume
[0:25] <joshd> in the beginning of the glance-api.log, after startup it prints out its config
[0:25] <joshd> does it have the show_image_direct_url set correctly there?
[0:26] <Qten> hmm
[0:26] <Qten> just did a restart and looked for show_image and no go
[0:27] <Qten> double checked /etc/glance/glance-api.conf and its at the bottom, show_image_direct_url=True
[0:28] <joshd> hmm, you know, glance config can have multiple sections
[0:29] <joshd> like [DEFAULT] at the top, and others below. try putting it up there
[0:29] <joshd> in the default one
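For reference, a minimal sketch of the fix joshd is describing, against a stock /etc/glance/glance-api.conf (only the option itself comes from the discussion; the placement and comment are illustrative):

    [DEFAULT]
    # must sit under [DEFAULT]; it is not picked up if it lands in a later section
    show_image_direct_url = True

glance-api then needs a restart (e.g. service glance-api restart) before the image detail request starts returning direct_url.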
[0:30] <Qten> heh
[0:30] <Qten> bingo
[0:31] <Qten> ./facepalm
[0:31] <dmick> ./facepalm: command not found
[0:33] <Qten> indeed
[0:33] <Qten> joshd: might be worth adding that lil bit to the http://ceph.com/docs/master/rbd/rbd-openstack/
[0:34] <Qten> is that the same as the github doco
[0:34] <joshd> yeah
[0:35] <joshd> it'd be good to clarify that
[0:35] <Qten> joshd: i dunno if you saw the mailing list but i sent out an email with a "workaround" for nova dashboard not supporting rbd too
[0:35] <Qten> seems to work ok at this stage
[0:35] <joshd> cool
[0:36] <joshd> lots of people are interested in that, it'd be good to get it upstream for grizzly
[0:37] <Qten> vishy from openstack-dev is owed most of the credit, i just added a bit for downloading status and better vol names
[0:37] <Qten> ;)
[0:37] <joshd> still, nice to see it working from the dashboard
[0:37] <Qten> very
[0:40] <Qten> cow volumes hell yeah thats awesome!
[0:41] <lurbs> To what point does using copy on write clones of snapshots scale? Is it reasonable to use it as the base for the entire OpenStack deployment, or should you flatten the clones if possible?
[0:43] <joshd> there is a bit of a performance penalty (which will be made better in 0.55) for using a clone
[0:43] <Qten> joshd: thanks for your help once again
[0:43] <joshd> Qten: no problem, glad you got it all working
[0:44] <joshd> lurbs: reads are affected more than writes right now, although caching will make that better in 0.55
[0:45] <joshd> namely caching which objects in the clone don't exist and have to be read from the parent
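A hedged sketch of the copy-on-write workflow lurbs is asking about; pool, image, and snapshot names are hypothetical:

    # layering needs a format 2 image and a protected snapshot
    rbd snap create images/base@gold
    rbd snap protect images/base@gold
    rbd clone images/base@gold volumes/vm-disk
    # flattening copies every object up from the parent, removing the
    # dependency (and the parent-read penalty discussed above)
    rbd flatten volumes/vm-disk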
[0:46] <lurbs> I'm waiting for bobtail before moving out of testing anyway, so that's not a problem.
[0:46] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)
[0:46] <lurbs> Just don't do something daft like unprotect and delete your snapshots, I guess. :)
[0:47] <joshd> it won't let you unprotect them if there are dependent clones
[0:47] <lurbs> I just did.
[0:47] <joshd> then I'd be very interested to hear how you did that
[0:49] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[0:49] <joshd> lurbs: were you the person dmick and i told to remove the rbd_children object? because that's what tracks dependent clones
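A sketch of how the protection machinery is meant to behave from the CLI (names hypothetical):

    rbd children images/base@gold        # list clones that depend on the snapshot
    rbd snap unprotect images/base@gold  # should be refused while children exist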
[0:50] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[0:50] <lurbs> http://paste.nothing.net.nz/b01358
[0:50] * PerlStalker (~PerlStalk@perlstalker-1-pt.tunnel.tserv8.dal1.ipv6.he.net) Quit (Quit: ...)
[0:52] <lurbs> Very possible that it's been fixed since 0.53, of course.
[0:55] <lurbs> That's a newly created cluster, BTW. No daft legacy of mucking about with the rbd_directory stuff.
[0:55] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[0:55] * lurbs has to run.
[0:56] <joshd> hmm, seems to be a new bug that the tests didn't catch... thanks for letting us know
[0:57] <joshd> the fix will be in 0.54
[1:01] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[1:09] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[1:18] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[1:18] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[1:19] <lurbs> joshd: Excellent, thanks.
[1:20] * didders_ (~btaylor@142.196.239.240) has joined #ceph
[1:20] * BManojlovic (~steki@85.222.181.90) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:21] <joshd> not sure how this happened yet, as the test case with the python bindings still succeeds
[1:26] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[1:27] * jlogan1 (~Thunderbi@2600:c00:3010:1:4990:f1e9:6310:a09f) Quit (Ping timeout: 480 seconds)
[1:32] * tnt (~tnt@50.90-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:33] * maxiz (~pfliu@111.192.252.156) has joined #ceph
[1:35] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[1:59] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[2:08] <dmick> lurbs: found it, thanks for the tip
[2:08] <dmick> 1) missing test case (my fault); 2) using wrong snapshot id to search for children (also my fault); fixing
[2:12] * vata (~vata@208.88.110.46) Quit (Quit: Leaving.)
[2:15] <dmick> joshd: wip-rbd-unprotect (based on next)
[2:16] <joshd> dmick: looks good
[2:17] <dmick> I'll let gitbuilder complete just in case, then merge to next and master?
[2:17] <dmick> (and do you think we should be filing issues about things like this, just for the record, or no?)
[2:20] <joshd> push to next, then merge next into master, yeah
[2:21] <joshd> it's probably worth making an issue so you can search for it later
[2:21] <lurbs> I can report via the bug tracker, if that works better for you guys.
[2:22] <dmick> lurbs: no, that's ok, I'll create it; you've done more than we could expect :)
[2:22] <dmick> http://tracker.newdream.net/issues/3468
[2:49] * maxiz (~pfliu@111.192.252.156) Quit (Quit: Ex-Chat)
[2:49] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Ping timeout: 480 seconds)
[2:51] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:51] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:58] * didders_ (~btaylor@142.196.239.240) Quit (Quit: didders_)
[3:54] * adjohn (~adjohn@69.170.166.146) Quit (Quit: adjohn)
[3:59] * mdrnstm (~mdrnstm@206-169-78-213.static.twtelecom.net) Quit (Quit: Leaving.)
[4:06] * imjustmatthew (~imjustmat@pool-74-110-201-156.rcmdva.fios.verizon.net) Quit (Remote host closed the connection)
[4:08] * sagelap (~sage@bzq-218-183-205.red.bezeqint.net) Quit (Read error: No route to host)
[4:09] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[4:11] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[4:32] * maxiz (~pfliu@202.108.130.138) has joined #ceph
[4:32] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[4:33] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit ()
[4:43] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[4:45] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[4:50] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[4:53] <rektide> do ceph people have experience tuning to btrfs's allocator?
[4:54] <rektide> btrfs apparently uses 1GiB lumps to allocate
[4:54] <rektide> "As the filesystem needs storage to hold file data, or filesystem metadata, it allocates chunks of this raw storage, typically in 1GiB lumps, for use by the higher levels of the filesystem." https://btrfs.wiki.kernel.org/index.php/SysadminGuide
[4:54] <rektide> does this help Ceph at all?
[4:54] <rektide> i'm interested in hacking with allocating drive space dynamically to btrfs, thought ceph might have more experience/expertise with this kind of thing than btrfs would
[5:27] * sagelap (~sage@bzq-79-182-241-147.red.bezeqint.net) has joined #ceph
[5:38] * yoshi (~yoshi@ai126165130151.3.access-internet.ne.jp) has joined #ceph
[6:13] * s_parlane (~scott@202.49.72.37) Quit (Ping timeout: 480 seconds)
[6:15] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Ping timeout: 480 seconds)
[6:25] * dmick (~dmick@2607:f298:a:607:752e:36a4:1152:7d34) Quit (Quit: Leaving.)
[6:26] * sagelap (~sage@bzq-79-182-241-147.red.bezeqint.net) Quit (Ping timeout: 480 seconds)
[6:26] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) has joined #ceph
[6:30] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[6:38] * yoshi (~yoshi@ai126165130151.3.access-internet.ne.jp) Quit (Remote host closed the connection)
[7:01] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[7:07] * rweeks (~rweeks@c-24-4-66-108.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[7:10] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[7:12] * s_parlane (~scott@121-74-248-190.telstraclear.net) has joined #ceph
[7:19] * sjustlaptop1 (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[7:24] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[8:03] <todin> morning #ceph
[8:11] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[8:16] * KindTwo (KindOne@h210.25.131.174.dynamic.ip.windstream.net) has joined #ceph
[8:19] * KindOne (KindOne@h116.26.131.174.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[8:19] * KindTwo is now known as KindOne
[8:23] * tnt (~tnt@50.90-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[8:23] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[8:37] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[8:38] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[8:39] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[8:45] * guerby (~guerby@nc10d.tetaneutral.net) has joined #ceph
[8:53] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:53] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:01] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Ping timeout: 480 seconds)
[9:06] * miroslav (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[9:06] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[9:08] * silversurfer (~silversur@124x39x126x66.ap124.ftth.ucom.ne.jp) has joined #ceph
[9:16] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:19] * jtang1 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[9:21] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[9:24] * jtang1 (~jtang@79.97.135.214) Quit ()
[9:37] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:38] <ramsay_za> yo todin
[9:58] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[10:01] * tnt (~tnt@50.90-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[10:05] * maxiz (~pfliu@202.108.130.138) Quit (Remote host closed the connection)
[10:20] * Leseb (~Leseb@193.172.124.196) has joined #ceph
[10:40] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[10:40] * tryggvil (~tryggvil@16-80-126-149.ftth.simafelagid.is) Quit (Quit: tryggvil)
[10:48] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[10:57] * MikeMcClurg (~mike@62.200.22.2) has joined #ceph
[11:04] * yoshi (~yoshi@p30106-ipngn4002marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:05] * silversurfer (~silversur@124x39x126x66.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:12] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[11:13] * gucki (~smuxi@84-72-8-40.dclient.hispeed.ch) has joined #ceph
[11:18] <gucki> good morning
[11:29] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:41] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[11:41] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[11:47] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[11:54] <gucki> ok, today my first hardware with ceph osds on it died (just hard reset). is it safe to just restart the ceph osd daemons or can there be any data corruption? (xfs had to do recovery, but now it's clean)
[11:54] <s_parlane> do you have other osds ?
[11:55] <gucki> s_parlane: yes, the cluster is up fine...only degraded
[11:56] <gucki> s_parlane: i just wonder if xfs recovery mixed up some data, will the osd then send this data out or will it detect (by some checksum) that the data is broken and fix it?
[11:56] <s_parlane> did the osd get marked as down and/or out ?
[11:57] <gucki> s_parlane: it's marked down
[11:57] <s_parlane> is ceph -w showing recovery ?
[11:58] <gucki> s_parlane: i think they'll come back and start recovery once i restart the osds daemons. i'm just a bit concerned about data integrity...
[12:02] <s_parlane> ok, from manuals + mailing list: ceph osd scrub <num>
[12:02] <s_parlane> after you bring the failed osd back
[12:02] <s_parlane> it should detect any corruption that has occurred
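A sketch of that sequence, assuming the rebooted host carries osd.3:

    service ceph start osd.3   # bring the failed osd back in
    ceph osd scrub 3           # then have it compare its pgs against the replicas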
[12:07] <s_parlane> does that help ?
[12:12] <gucki> s_parlane: thanks, i'll do the scrubbing of the osds once they are up....but ideally i'd be able to do the scrub before they are in the cluster again, so they don't send corrupted data
[12:12] <gucki> s_parlane: need to go now, but i'll check later...thanks and cya!
[12:12] * gucki (~smuxi@84-72-8-40.dclient.hispeed.ch) Quit (Remote host closed the connection)
[12:38] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Ping timeout: 480 seconds)
[12:47] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[12:48] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[12:50] * vagabon (~fbui@au213-1-82-235-205-153.fbx.proxad.net) has joined #ceph
[12:52] <vagabon> hi. I'm trying to build Ceph from http://eu.ceph.com/download/ceph-0.53.tar.bz2, but getting the following build error:
[12:52] <vagabon> make[3]: Entering directory `/home/build/rpmbuild/BUILD/ceph-0.53/src'
[12:52] <vagabon> ./check_version ./.git_version
[12:52] <vagabon> not updating .git_version (no ../.git)
[12:52] <vagabon> CXXLD libcls_rbd.la
[12:52] <vagabon> .libs/libcls_rbd_la-cls_rbd.o: In function `copyup':
[12:52] <vagabon> /home/build/rpmbuild/BUILD/ceph-0.53/src/cls_rbd.cc:1234: undefined
[12:52] <vagabon> reference to `cls_cxx_stat(void*, unsigned long*, long*)'
[12:53] <vagabon> /home/build/rpmbuild/BUILD/ceph-0.53/src/cls_rbd.cc:1236: undefined reference to `cls_log'
[12:53] <vagabon> etc...
[12:53] <vagabon> could anybody give me some help please ?
[13:05] * guigouz (~guigouz@177.33.212.119) Quit (Ping timeout: 480 seconds)
[13:09] * joao (~JL@89-181-150-224.net.novis.pt) Quit (Ping timeout: 480 seconds)
[13:10] * sagelap (~sage@94.175.239.226) has joined #ceph
[13:13] <s_parlane> vagabon: on redhat ?
[13:20] <s_parlane> can you get it to build with V=1 and Q=
[13:20] <s_parlane> im guessing it's missing a -l, but need more info
[13:21] <vagabon> s_parlane: thanks for answering, it's on mandriva
[13:21] <vagabon> I'm adding "V=1 Q=" and report ASAP
[13:22] <s_parlane> are you using any arguments to configure?
[13:22] <vagabon> s_parlane: I'm actually using the .spec to build it
[13:22] <vagabon> s_parlane: http://pastebin.com/0E3zsgiN
[13:24] <s_parlane> ok, let me match that, hold on
[13:29] <s_parlane> what does head config.log have as configure options ?
[13:29] <vagabon> $ ./configure x86_64-mandriva-linux-gnu --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib64 --localstatedir=/var --sharedstatedir=/usr/com --mandir=/usr/share/man --infodir=/usr/share/info --x-includes=/usr/include --x-libraries=/usr/lib64 --prefix=/usr --sbindir=/sbin --localstatedir=/var --sysconfdir=/etc --docdir=/usr/share/doc/ceph --without-hadoop --with-radosgw --without-tcmalloc CFLAGS=-O2 -g CXXFLAGS=-O2 -g
[13:30] <s_parlane> getconf _NPROCESSORS_ONLN ?
[13:31] <vagabon> 8
[13:31] <s_parlane> ok
[13:31] <vagabon> but actually I got this error running "make"
[13:32] <s_parlane> the last part is passed to make
[13:34] <s_parlane> just making sure i build the same parts as you
[13:35] * didders_ (~btaylor@142.196.239.240) has joined #ceph
[13:36] <vagabon> thanks !
[13:44] * didders_ (~btaylor@142.196.239.240) Quit (Quit: didders_)
[13:47] * Meths (~meths@2.27.72.59) has joined #ceph
[13:49] * gucki (~smuxi@dslb-084-057-126-044.pools.arcor-ip.net) has joined #ceph
[13:53] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[14:02] <s_parlane> ok, mine didn't break
[14:02] <s_parlane> but thats ok
[14:12] <s_parlane> do you have a src/ceph_osd-class_api.o ?
[14:13] <vagabon> s_parlane: yes
[14:14] <s_parlane> that file contains all the parts that you are missing, it seems anyways
[14:14] <s_parlane> actually, no
[14:16] * joao (~JL@89-181-150-224.net.novis.pt) has joined #ceph
[14:18] <s_parlane> in the spec file change make -j$(getconf .... to just make
[14:18] <s_parlane> and then clean and build again
[14:18] <vagabon> ok
[14:19] <vagabon> s_parlane: building
[14:36] <s_parlane> did it break ?
[14:37] <vagabon> s_parlane: it just broke the same way :-/
[14:37] <vagabon> CXXLD libcls_rbd.la
[14:37] <vagabon> .libs/libcls_rbd_la-cls_rbd.o: In function `copyup':
[14:37] <vagabon> /home/build/rpmbuild/BUILD/ceph-0.53/src/cls_rbd.cc:1234: undefined reference to `cls_cxx_stat(void*, unsigned long*, long*)'
[14:37] <s_parlane> ok
[14:38] <s_parlane> in src/Makefile.am, find the following line and make it look like this (the last part is what i have added)
[14:38] <s_parlane> libcls_rbd_la_SOURCES = cls_rbd.cc objclass/class_api.cc
[14:39] <s_parlane> then rebuild (you can undo the change to the spec file)
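As a diff against src/Makefile.am, the suggested change would be:

    -libcls_rbd_la_SOURCES = cls_rbd.cc
    +libcls_rbd_la_SOURCES = cls_rbd.cc objclass/class_api.cc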
[14:42] <s_parlane> anyways, i should go to sleep, its 2:40am, hit up one of the devs if that still doesn't work
[14:42] <s_parlane> good luck
[14:43] <vagabon> s_parlane: thanks for helping
[14:43] <vagabon> just trying what you just suggested
[14:51] * s_parlane (~scott@121-74-248-190.telstraclear.net) Quit (Ping timeout: 480 seconds)
[15:00] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[15:35] * tnt (~tnt@212-166-48-236.win.be) Quit (Quit: leaving)
[15:44] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[15:49] * ninkotech (~duplo@89.177.137.231) Quit (Quit: Konversation terminated!)
[16:00] * PerlStalker (~PerlStalk@perlstalker-1-pt.tunnel.tserv8.dal1.ipv6.he.net) has joined #ceph
[16:05] * vata (~vata@208.88.110.46) has joined #ceph
[16:09] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[16:13] * sagelap (~sage@94.175.239.226) Quit (Ping timeout: 480 seconds)
[16:13] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[16:24] <Robe> how important is syncfs support when you're not using rbd and just cephfs?
[16:24] <Robe> from a performance PoV?
[16:30] * sagelap (~sage@94.175.239.226) has joined #ceph
[17:00] * sagewk (~sage@2607:f298:a:607:e973:a80c:de9b:db1d) Quit (Ping timeout: 480 seconds)
[17:08] * gucki (~smuxi@dslb-084-057-126-044.pools.arcor-ip.net) Quit (Remote host closed the connection)
[17:11] * sagewk (~sage@2607:f298:a:607:1d79:430b:893e:17a4) has joined #ceph
[17:14] * sagelap (~sage@94.175.239.226) Quit (Ping timeout: 480 seconds)
[17:17] * sagelap (~sage@149.6.120.133) has joined #ceph
[17:18] * TheSnide (~snide@2a00:1c10:5:201:216:3eff:fe7b:ab8d) has joined #ceph
[17:19] * jlogan1 (~Thunderbi@2600:c00:3010:1:4990:f1e9:6310:a09f) has joined #ceph
[17:21] * BManojlovic (~steki@91.195.39.5) Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:33] * sagelap (~sage@149.6.120.133) Quit (Ping timeout: 480 seconds)
[17:37] <elder> HTTPError: HTTP Error 403: Forbidden
[17:37] <elder> I get that on my teuthology run.
[17:41] <elder> Never mind. My build failed.
[17:45] <elder> I think it's something we saw before, dmick. I don't remember what happened or how it got fixed, but my build completed and the rsync failed.
[17:56] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[17:57] * buck (~buck@bender.soe.ucsc.edu) has joined #ceph
[17:58] * mdawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[18:01] * lofejndif (~lsqavnbok@1RDAAEYYS.tor-irc.dnsbl.oftc.net) has joined #ceph
[18:04] <jtang> right, off to sc12
[18:04] <jtang> see you there nhm! and whoever inktank/ceph people that might be around
[18:12] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: This computer has gone to sleep)
[18:12] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[18:13] <joao> jtang, if you bump into nhm, say hi for me :)
[18:13] <joao> and buy him a beer
[18:13] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[18:13] <joao> belgian beer preferably, iirc
[18:15] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit ()
[18:15] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) has joined #ceph
[18:22] * Leseb (~Leseb@193.172.124.196) Quit (Quit: Leseb)
[18:32] * didders_ (~btaylor@rrcs-71-43-128-65.se.biz.rr.com) has joined #ceph
[18:34] <nhm> jtang: ooh, have fun! I leave on Sunday
[18:34] <nhm> joao: I found a place around the corner from the hotel with 62 belgian beers.
[18:36] <joao> so it should be easy to find you whenever you're not on the booth :p
[18:36] <nhm> joao: I'm afraid they are going to run out
[18:37] <nhm> joao: What happens when 12k engineers/marketing folks descend on a city with a very limited alcohol supply?
[18:38] <joao> forced alcohol withdrawal?
[18:40] <joao> okay, I just made a commit to the docs that had a formatting typo; fixed in a second commit that was *just fine* on my branch, but it appears to still be borked on master
[18:40] <joao> any hints?
[18:40] <joao> oh
[18:40] <joao> nevermind
[18:40] <joao> chrome's refresh was to blame
[18:43] * noob2 (a5a00214@ircip1.mibbit.com) has joined #ceph
[18:44] <noob2> i know it might not be advisable but can i mount a ceph rbd on multiple servers at the same time?
[18:49] <rweeks> I don't actually think that SLC has a limited alcohol supply
[18:50] <nhm> rweeks: interesting, that's good to know.
[18:50] <rweeks> they might run out of Belgian beers, but there are plenty of breweries there now with local supply
[18:51] <nhm> rweeks: Not that I plan on really drinking *that* much, but in seattle last year people did a pretty good job of draining the better beers from the pubs near the convention center.
[18:51] <rweeks> yeah I imagine
[18:52] * nwatkins (~Adium@c-50-131-197-174.hsd1.ca.comcast.net) has joined #ceph
[18:52] * vagabon (~fbui@au213-1-82-235-205-153.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[18:53] * BManojlovic (~steki@85.222.181.90) has joined #ceph
[19:00] <nhm> I wonder how hard it would be to spray-paint the ceph logo with food coloring on a filled doughnut.
[19:01] <joao> the current logo is pretty geometrical, so it should be feasible
[19:02] <joao> assuming there's such a thing as food coloring in spray-paint format
[19:02] <rweeks> not sure about a donut, but a cake or a cupcake, super easy
[19:02] <nhm> joao: I'm pretty sure I've seen bakeries around here use something like that for cakes.
[19:02] <rweeks> you can get pretty high-res stuff printed on cakes.
[19:03] <rweeks> noob2: No, that is not advisable
[19:03] <nhm> rweeks: sounds like good exec-level meeting propaganda.
[19:03] <rweeks> if you want shared access to ceph you want to either look at API access or CephFS
[19:03] * MikeMcClurg (~mike@62.200.22.2) Quit (Quit: Leaving.)
[19:03] <rweeks> sharing block devices is a bad bad bad bad bad idea, no matter what kind of block storage it is.
[19:04] <elder> A frisbee would be better than a cake I think.
[19:04] <rweeks> not as tasty, though.
[19:04] <nhm> elder: that sounds like good booth swag
[19:04] <elder> That's what I was thinking.
[19:04] <nhm> elder: Sam Just was thinking yo-yo too.
[19:04] <elder> I wasn't sure the context of the discussion.
[19:04] <nhm> elder: It mostly stems from me wanting a filled doughnut.
[19:05] <elder> And by having a ceph logo it's justified as "work"
[19:05] <nhm> elder: I like your thinking!
[19:07] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[19:11] <nhm> Culinary Ceph: A whisper suite treat.
[19:12] <joao> do you guys know if it's possible to change the ceph branch mid-teuthology run?
[19:13] <elder> I don't see how that would work.
[19:14] <joao> my only objective would be to kill the monitors and run the versions from a different branch
[19:14] <joao> I might hack around it and create a workunit for that purpose, but it just seems wrong
[19:15] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[19:24] <elder> joshd, do you know why for rbd, its osd request header always fills in CEPH_NOSNAP for its snapid?
[19:28] <elder> It looks like ceph_osdc_new_request(), the other caller of ceph_osdc_alloc_request(), doesn't touch that field.
[19:37] <elder> I think I know what's going on joshd.
[19:39] <elder> It gets set unconditionally in the message header after calling ceph_osdc_alloc_request() for some reason. But then that gets overwritten again by ceph_calc_raw_layout().
[19:39] <elder> So the first one--writing CEPH_NOSNAP--appears to be useless.
[19:42] <noob2> how hard is it to move the ceph journal after the osd has been created?
[19:42] <noob2> i think i asked yesterday but i don't remember if someone said
[19:43] <joshd> noob2: not too hard, you just have to stop the osd, flush the journal, and initialize the new journal before starting it again
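A sketch of that sequence for one osd; the id and the new journal path are hypothetical:

    service ceph stop osd.0
    ceph-osd -i 0 --flush-journal   # drain the old journal into the filestore
    # edit 'osd journal' in ceph.conf to point at the new device/file, then:
    ceph-osd -i 0 --mkjournal       # initialize the new journal
    service ceph start osd.0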
[19:44] <noob2> ok
[19:44] <noob2> everything with ceph is generally not hard but i like to ask :)
[19:44] <noob2> it seems like a lot of thought has gone into the maintenance aspect of ceph from the admin side.
[19:45] <joshd> elder: yup, it looks useless to set it first
[19:48] <joshd> noob2: thanks, we try to keep it in mind
[19:53] <noob2> if i put journal size = x and then set it again under [osd.x], will that more specific setting take effect?
[19:55] * tziOm (~bjornar@ti0099a340-dhcp0628.bb.online.no) has joined #ceph
[19:57] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[19:57] * PerlStalker (~PerlStalk@perlstalker-1-pt.tunnel.tserv8.dal1.ipv6.he.net) Quit (Quit: rcirc on GNU Emacs 24.2.1)
[19:57] <elder> joshd, I already have a patch building that gets rid of it.
[19:59] <joshd> noob2: yes, the more specific settings (i.e. in [osd.1]) always override the general settings in [osd] or [global]
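A sketch of that precedence in ceph.conf (the sizes are illustrative):

    [osd]
    osd journal size = 1000    ; default for every osd...
    [osd.1]
    osd journal size = 10000   ; ...but osd.1 gets the more specific value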
[20:02] * mdawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Remote host closed the connection)
[20:03] * mdrnstm (~mdrnstm@206-169-78-213.static.twtelecom.net) has joined #ceph
[20:05] <elder> joshd, this seems more limited than it has to be: #define MAX_OBJ_NAME_SIZE 100
[20:05] <elder> It also disagrees with this: #define RBD_MAX_SEG_NAME_LEN 128
[20:05] <noob2> cool.
[20:05] <noob2> thanks josh
[20:06] <joshd> elder: yes, that was raised from 40 a while back just so larger rbd image names would work, but ideally it would not be restricted
[20:07] <elder> Not a big deal to me, really, just something I noticed just now.
[20:07] <elder> Whatever the max path component length is on the osd host is probably the real limit.
[20:07] <elder> 256?
[20:08] <joshd> no, it uses hashing on long names to get around that
[20:09] <elder> So a long name is converted to hash + shorter name or something?
[20:09] <elder> Or hash with name stored somewhere else?
[20:11] * yehuda_hm (~yehuda@2602:306:330b:a40:cbf:cddb:830:bce8) has joined #ceph
[20:11] <elder> In any case I think the *only* thing this affects on rbd is the image name, and I don't expect limiting that to about 75 characters is so troubling.
[20:15] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) has joined #ceph
[20:19] <noob2> is it normal for ceph to only write to 2 out of my 4 osd nodes? i have 600GB of space that i setup but i only allocated 20GB to rbd devices. maybe those are only on 2 nodes at this point if i have pool size of 2?
[20:20] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) Quit (Read error: Connection reset by peer)
[20:20] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) has joined #ceph
[20:20] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[20:20] <joshd> noob2: no, how many pgs do you have?
[20:21] <joshd> noob2: the full output of ceph -s would be informative
[20:21] <noob2> let me find out
[20:22] <noob2> it says health ok and all osd's are up and in
[20:22] <joshd> ceph osd dump will show the number of pgs in each pool
[20:22] <noob2> ok
[20:22] <noob2> in my pool i setup i have pg_num 8 and pgp_num 8
[20:23] <joshd> that would be the problem
[20:23] <noob2> not enough pages?
[20:23] <joshd> placement groups
[20:23] <noob2> oh sorry
[20:23] <noob2> yes
[20:23] <noob2> ok let me tweak that. thanks :)
[20:24] <noob2> wow it says ballpark 100 pg's per osd
[20:24] * dmick (~dmick@2607:f298:a:607:e473:ba49:fec:7032) has joined #ceph
[20:24] <noob2> i have 12 osd's haha
[20:24] <noob2> so i should have pg=1200?
[20:25] <TheSnide> isnt the fs layer stable ?
[20:25] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) has joined #ceph
[20:25] <joshd> noob2: yeah, or a bit more if you plan to add more osds
[20:25] * tryggvil_ (~tryggvil@rtr1.tolvusky.sip.is) Quit ()
[20:26] <noob2> wow
[20:26] <noob2> alright i was way off haha
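The arithmetic, roughly: ~100 pgs per osd times 12 osds gives on the order of 1200 pgs for the pool. A hedged sketch (the pool name is hypothetical, and since pg counts were not easily changed after creation in this era, it pays to size them up front):

    ceph osd pool create volumes 1200 1200   # <name> <pg_num> <pgp_num>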
[20:26] <TheSnide> (cause i see a big fat disclaimer on the 5min doc)
[20:27] <joshd> TheSnide: cephfs is still considered unstable. it's getting better, but still needs more qa (especially with multiple active mdses)
[20:27] <TheSnide> mdses ?
[20:27] <TheSnide> (sorry, n00b here)
[20:28] <jefferai> this is weird -- when I am adding a new monitor to the cluster, it shows up in the monmap but reports HEALTH_WARN 1 mons down
[20:28] <TheSnide> metadata servers ?
[20:28] <jefferai> not sure why
[20:28] <elder> joshd, is an object name sent over the wire assumed to be NUL-terminated on the OSD side?
[20:28] <joshd> TheSnide: yeah, metadata servers
[20:30] <dmick> elder: it's encoded
[20:30] <elder> Oooh!
[20:30] <dmick> IIRC that's count/bytes
[20:30] <dmick> but I don't know for sure
[20:31] <elder> That would suggest that no NUL-termination is expected.
[20:31] <dmick> looking at encoding.h
[20:31] <elder> I think that's right. Thank you.
[20:31] <dmick> __u32 len = s.length();
[20:31] <dmick> encode(len, bl);
[20:31] <dmick> bl.append(s.data(), len);
[20:32] <elder> It's the decode I think I'm interested in, to be precise.
[20:32] <elder> But I do think we've concluded the right answer.
[20:32] <dmick> I can say with *some* confidence that it's probably the inverse operation :)
[20:32] * tryggvil (~tryggvil@rtr1.tolvusky.sip.is) Quit (Ping timeout: 480 seconds)
[20:34] <dmick> rbd_children is an array of encoded sets-of-strings
[20:35] <dmick> so one could look at the "internal" format of an rbd_children entry
[20:35] <elder> I'm convinced, I'm not going to look any further.
[20:35] <joao> jefferai, is the monitor running?
[20:36] <dmick> yeah. le 32-bit followed by bytes, no NUL
[20:37] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) Quit (Quit: Leseb)
[20:37] <jefferai> joao: yeah, it is..it slurps the osds and then calls for a new election
[20:37] <jefferai> nothing in the log after that
[20:38] <elder> dmick, thank you for following that all the way through for me.
[20:38] <dmick> np; helps me too
[20:39] * sakine (~sakine@659AABWVT.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:39] <joao> jefferai, any chance you can query directly that monitor over the admin socket with 'mon_status'?
[20:40] <jefferai> joao: sure, what would the command be?
[20:40] <joao> would be something like 'ceph --admin-daemon path/to/mon.X.asok (or similar) mon_status'
[20:40] <joao> hey, dmick, any chance we can get rid of sakine?
[20:41] <joao> I could also just ignore him/her/it
[20:42] <joao> sakine!*@* added to ignore list.
[20:42] <joao> yay
[20:43] <elder> Interesting. The OSD client sets aside the maximum amount of space for an object name in its request buffer rather than just what's needed.
[20:44] <elder> OK.
[20:44] <elder> shit
[20:44] <jefferai> joao: I'm not sure where I'm supposed to find that path
[20:44] <jefferai> like, what is the asok file?
[20:44] <jefferai> I don't see one
[20:44] <elder> I guess I'm the target. SHit.
[20:45] <jefferai> elder: target?
[20:45] * BManojlovic (~steki@85.222.181.90) Quit (Ping timeout: 480 seconds)
[20:45] <joao> jefferai, trying to figure it out; not sure where it lands on a proper ceph install
[20:45] <jefferai> joao: oddly, after restarting the daemon twice, it's fine
[20:45] <joao> anyone have an idea?
[20:45] <jefferai> the first time I saw something in the log about bad authorization
[20:45] <elder> I guess -sakine- is an odd sort of robot.
[20:45] <jefferai> but when I restarted the daemon again, all was well
[20:45] <joao> jefferai, oh, that's a bummer from my perspective
[20:45] <jefferai> elder: yeah, it ain't just you
[20:45] <jefferai> joao: ah
[20:45] <jefferai> well
[20:46] <jefferai> I need to add another :-)
[20:46] <joao> elder, yeah, we should have more chanops
[20:46] <jefferai> let me see if that gives me the same problem
[20:46] <joao> jefferai, hey, I'm glad it worked for you :)
[20:46] <elder> dmick, shit.
[20:47] <joao> I was just wondering if by troubleshooting it we could assess if that's a bug or what
[20:47] <joao> btw, jefferai, what version are you running?
[20:48] <TheSnide> joshd: so, the 1-mdses would be safe then ?
[20:48] * BManojlovic (~steki@85.222.184.27) has joined #ceph
[20:48] <joao> elder, /ignore sakine!*@*
[20:48] <jefferai> joao: ok, replicated on the next mon box
[20:49] <jefferai> last thing in log is mon.e calling a new monitor election
[20:49] <joao> replicated? you mean reproduced?
[20:49] <jefferai> and ceph -s reports one down
[20:49] <jefferai> yes
[20:49] <jefferai> sorry
[20:49] <joao> okay
[20:49] <jefferai> so, we can try that admin-daemon command
[20:49] <joao> right
[20:49] <jefferai> do I just give it the path to the mon data dir?
[20:49] <joao> no, it should be somewhere maybe down /var/lib ?
[20:50] <joshd> TheSnide: much safer, but I'd be wary of putting it in production still
[20:50] <jefferai> ah
[20:50] <joao> I'm not really sure; anyone have an idea where the admin socket is on a proper ceph install?
[20:50] <jefferai> /var/run/ceph
[20:50] <jefferai> ok
[20:50] <jefferai> I see
[20:50] <joao> okay
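Putting joao's earlier command together with that path, the query would look something like this (the monitor id is illustrative):

    ceph --admin-daemon /var/run/ceph/ceph-mon.e.asok mon_status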
[20:50] <jefferai> joao: running current precise packages from your debian-testing repo
[20:50] <joao> jefferai, give me just a sec; phone call
[20:52] <jefferai> joao: http://paste.kde.org/600920/
[20:52] <jefferai> sure, no problem
[20:52] <sakine> joao say hello to the one on the phone ;)
[20:53] * Steki (~steki@85.222.179.105) has joined #ceph
[20:53] <joao> sorry; my brother handed me the phone, while talking to me and while I was trying to form a coherent thought to reply to you
[20:53] <joao> let me take a look at that
[20:54] * guigouz1 (~guigouz@201-87-100-166.static-corp.ajato.com.br) has joined #ceph
[20:55] <joao> jefferai, is mon.e still electing?
[20:55] <joao> it may sometimes take a bit until quorum is formed
[20:55] <joao> erm, I mean, until the monitor joins the quorum
[20:55] <joao> ;)
[20:56] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[20:57] * sakine is now known as bonboncha
[20:59] * BManojlovic (~steki@85.222.184.27) Quit (Ping timeout: 480 seconds)
[20:59] <elder> bonboncha, why not sakine?
[21:00] <joao> why do I have a feeling that my ignore filter will no longer work?
[21:00] <joao> yep...
[21:00] * lofejndif (~lsqavnbok@1RDAAEYYS.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[21:01] * Steki (~steki@85.222.179.105) Quit (Ping timeout: 480 seconds)
[21:01] <joao> gladly, did change the nick but not the username
[21:02] <bonboncha> elder, just to break joao's ignore
[21:05] * didders_ (~btaylor@rrcs-71-43-128-65.se.biz.rr.com) Quit (Quit: didders_)
[21:05] * tryggvil (~tryggvil@16-80-126-149.ftth.simafelagid.is) has joined #ceph
[21:11] <jefferai> joao: sorry, back
[21:11] <jefferai> joao: yep
[21:12] <joao> still electing then? could you check 'ceph -v'?
[21:12] * sattar (~Chatzilla@9KCAAC2IV.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:12] <sattar> what the fuck is ceph?
[21:12] <joao> I just hope that's a recent enough version so as not to incur the election loop issue
[21:13] * Cube (~Cube@12.248.40.138) has joined #ceph
[21:14] <joao> oh, actually, I don't think it would incur that issue; as far as I recall, the path taken to trigger it was a different one, but I'm not sure
[21:14] <jefferai> joao: ceph version 0.53 (commit:2528b5ee105b16352c91af064af5c0b5a7d45d7c)
[21:15] <joao> yeah, shouldn't be what I was thinking
[21:15] * jjgalvez (~jjgalvez@12.248.40.138) has joined #ceph
[21:15] <jefferai> and yes, still electing
[21:15] <jefferai> I can try bouncing the daemon a few times, but that won't help your debugging if it fixes the problem :-)
[21:15] <joao> jefferai, can you drop the mon logs for mon.a and mon.e somewhere?
[21:15] <jefferai> .a?
[21:16] <joao> the one with rank = 0
[21:16] <joao> should be the leader
[21:16] * ChanServ sets mode +o dmick
[21:16] <joao> I'm assuming that 'ceph -s' reports an existing quorum, right?
[21:17] <elder> dmick, back?
[21:17] <dmick> yes
[21:17] <jefferai> joao: yep
[21:17] * noob2 (a5a00214@ircip1.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[21:17] <jefferai> I can send logs, yes
[21:17] * danieagle (~Daniel@177.99.134.107) has joined #ceph
[21:17] <joao> dmick, could you add a couple more of us to the access list for redundancy purposes? :)
[21:18] * dmick sets mode +b bonboncha!*@*
[21:18] <jefferai> joao: e: http://paste.kde.org/600950/
[21:18] <dmick> joao: if I knew how. I can research
[21:18] <sattar> yes, I agree with joao, I volounteer
[21:18] <jefferai> dmick: I can help
[21:18] * bonboncha was kicked from #ceph by dmick
[21:18] <joao> dmick, something along the lines of /chanserv access #ceph add <nick> ?
[21:19] <joao> can't recall exactly; long gone my chanops days
[21:19] <dmick> consulting chanserv right now
[21:19] <jefferai> joao: ah!
[21:19] <jefferai> 2012-11-09 15:19:28.646834 7f105e362700 1 mon.a@0(leader).elector(54) discarding election message: 192.168.88.224:6800/0 not in my monmap e19: 5 mons at {a=192.168.66.201:6789/0,b=192.168.66.202:6789/0,c=192.168.66.204:6789/0,d=192.168.66.223:6789/0,e=192.168.66.224:6789/0}
[21:19] <jefferai> every five seconds
[21:19] <jefferai> that's on mon.a
[21:20] <joao> oh...
[21:20] <joao> the 6800 bug again
[21:20] <joao> that's why restarting the monitors fixed it for you
[21:20] <jefferai> I did do the
[21:20] <jefferai> ceph mon add <name> <ip>[:<port>]\n";
[21:20] <jefferai> from the wiki
[21:20] <jefferai> er, doc site
[21:20] <jefferai> without the \n"; of course :-)
[21:20] <jefferai> and the ceph.conf was updated on all boxes, including mon.a
[21:20] <joao> I would *really* love to get hold of the logs of your mon.e and mon.d (if that was the one that you fixed by multiple restarts)
[21:21] <jefferai> sure -- the paste I sent above was the entirety of e's logs
[21:21] <jefferai> this one: http://paste.kde.org/600950/
[21:21] <joao> jefferai, you did it correctly; but there's a lingering, recurring bug that only happens once in a while and we've not been able to figure out why
[21:21] <joao> that makes the monitor bind on port 6800 when it shouldn't
[21:21] * ChanServ sets mode +o elder
[21:22] <jefferai> joao: mon.d log
[21:22] <jefferai> http://paste.kde.org/600956/
[21:22] <elder> Watch out!
[21:22] <jefferai> also,I'll repaste e's so that it doesn't expire after 1 day
[21:22] <jefferai> sec
[21:22] <jefferai> http://paste.kde.org/600962/
[21:23] <joao> jefferai, thanks!
[21:23] <jefferai> joao: on mon.d, I cut off the traffic at the end of the log that was just it complaining about e not being in the monmap
[21:23] <joao> yeah
[21:23] <jefferai> joao: do you want me to leave mon.e in this state?
[21:23] <jefferai> cause I can
[21:23] <jefferai> at least for now
[21:23] <joao> jefferai, my best solution to your problem is to restart the monitor
[21:23] <jefferai> sure, but if it's helpful to you to leave it as it is
[21:24] <joao> if it is, I have no idea how to leverage it :\
[21:24] <joao> this is something that appears to happen when the monitor starts
[21:24] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[21:25] <joao> but thanks for offering
[21:25] <joao> btw
[21:25] <joao> could you just check if there's anything using port 6789?
[21:26] <jefferai> yah
[21:26] <jefferai> nope
[21:26] <jefferai> nothing
[21:26] <joao> okay
[21:26] * calebamiles (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) Quit (Ping timeout: 480 seconds)
[21:26] <jefferai> huh
[21:26] <jefferai> now it bound on 6801
[21:26] <joao> jefferai, before you restart the monitor, it would be great if you could add 'debug ms = 20' to your conf file
[21:26] <joao> oh
[21:26] <jefferai> too late for the restart
[21:26] <joao> well, next time
[21:26] <jefferai> but, didn't bind on 6789
[21:27] <joao> just in case it happens
[21:27] <jefferai> what section should I put it in?
[21:27] <jefferai> mon.e?
[21:27] <jefferai> or global?
[21:27] <joao> yeah, if it doesn't bind on 6789 it will keep being shunned from the cluster
[21:27] <joao> mon.e, yeah
[21:27] <joao> should be fine there
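i.e., a sketch of the change to make before the next restart:

    [mon.e]
    debug ms = 20   ; verbose messenger logging, to catch the bad bind if it recurs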
[21:27] <jefferai> ok
[21:27] <jefferai> trying again
[21:27] <jefferai> btw, thanks for being so helpful (all of you)
[21:28] <dmick> The address to mail your check is: ;)
[21:28] * calebamiles (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) has joined #ceph
[21:28] <jefferai> joao: this time of course it worked
[21:28] <joao> lol dmick
[21:28] <joao> yeah, of course :p
[21:28] <joao> my experience with that bug is that it goes away as soon as I crank up debugging
[21:28] <jefferai> hah
[21:29] <jefferai> dmick: seriously, I'd love to support you guys, but I have no monies
[21:29] <dmick> jefferai: I'm only joking of course
[21:29] <jefferai> didn't even have enough money to buy all the hard drives I needed
[21:29] * calebamiles (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) Quit ()
[21:29] * jaloose (~tor@659AABWXP.tor-irc.dnsbl.oftc.net) has joined #ceph
[21:29] <jefferai> I know, but as someone that also does open source development, I know the feeling
[21:29] <jefferai> dmick: elder^ new TOR kiddie
[21:30] <joao> jefferai, there's different ways to contribute to the project, and you're doing just fine having patience while we try to chase down bugs
[21:30] <joao> my 2cents
[21:30] <jefferai> joao: sure
[21:30] <jefferai> That's how I feel about users that help me chase down bugs in Tomahawk :-)
[21:30] <dmick> oh?
[21:30] <joao> yeah, dmick, kindly request you to boot jaloose out ;)
[21:31] <elder> jaloose?
[21:31] <jefferai> yeah
[21:31] * jaloose was kicked from #ceph by elder
[21:31] <elder> HAH!
[21:31] * dmick sets mode +b *!*@*.tor-irc*
[21:31] <jefferai> WHAM-O
[21:31] <elder> I HAVE THE POWER!!!
[21:32] <joao> oh snap!
[21:32] <dmick> added a ban for hostnames with tor-irc, I hope
[21:32] <dmick> ffs, what children
[21:32] <joao> yeah
[21:32] <dmick> joao: HAVE YOU READ YOUR PMS YET
[21:32] <joao> ops
[21:32] <elder> Yes, children. But seriously, I HAVE THE POWER!!!
[21:33] <dmick> with great power comes great responsibility
[21:34] <elder> Who says?
[21:34] <joao> elder, related: http://www.youtube.com/watch?v=z33tH-JdPDg
[21:34] <elder> My theme song.
[21:35] <dmick> which is ripping off http://www.youtube.com/watch?v=12VUjgYMm1U&feature=related
[21:35] <elder> I'm the lyrical Jesse James, by the way.
[21:35] <dmick> youngsters...
[21:36] <joao> damn, I google Jesse James and still have no clue what you meant :x
[21:36] <jefferai> elder: http://toma.hk/tJyaaaab
[21:36] <elder> (It's one of the lyrics in that song)
[21:36] <joao> oh
[21:37] * houkouonchi-work (~linux@12.248.40.138) Quit (Read error: Connection reset by peer)
[21:37] <joao> just proves how much attention I pay to lyrics
[21:37] <jefferai> damn
[21:37] <jefferai> I got beat to it
[21:37] <jefferai> because I thought it *was* C+C
[21:37] <jefferai> never knew it was Snap!
[21:37] * mdawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) has joined #ceph
[21:37] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[21:38] <joao> jefferai, I had no idea it was snap until World Hosting Days
[21:39] * houkouonchi-work (~linux@12.248.40.138) Quit (Remote host closed the connection)
[21:40] * calebamiles (~caleb@c-24-128-194-192.hsd1.vt.comcast.net) has joined #ceph
[21:40] <jefferai> huh
[21:40] <jefferai> Did they play there or something?
[21:41] <jefferai> blast from the past?
[21:41] <joao> yeah
[21:42] * houkouonchi-work (~linux@12.248.40.138) has joined #ceph
[21:42] <joao> sometime after the boxing match
[21:42] <joao> that was some serious party
[21:42] <dmick> hm
[21:42] <dmick> "The Power" was a massive hit, and it was based on the rap from Chill Rob G's "Let The Words Flow" and a sample of Disco singer Jocelyn Brown's song "Love's Gonna Get You," which is where the "I've got the power" line comes from.
[21:43] <dmick> http://www.youtube.com/watch?v=749MZ97Kztk
[21:43] <jefferai> Boxing match?
[21:44] <jefferai> man
[21:44] <jefferai> sounds like a good conference
[21:45] <jefferai> http://toma.hk/wJyaaaab
[21:45] <jefferai> huh
[21:46] <jefferai> Hm. "I've got the power...to chase you by the hour..."
[21:46] <jefferai> is where that comes from
[21:46] <jefferai> less impactful with the second lyric
[21:46] <dmick> jefferai: yeah, in that youtube link
[21:47] <elder> dmick, wasn't there some controversy about their use of a young model for that voice part in the video rather than Brown?
[21:48] * guigouz1 (~guigouz@201-87-100-166.static-corp.ajato.com.br) Quit (Ping timeout: 480 seconds)
[21:48] <dmick> it wasn't Brown, but another singer (in the C+C version, which was a much more lucrative hit in the US)
[21:49] <elder> http://memegenerator.net/instance/27859702
[21:49] <dmick> IIRC she won a profit lawsuit
[21:50] <dmick> http://en.wikipedia.org/wiki/Martha_Wash
[21:54] <mdawson> do ceph mons go on the public or cluster network?
[21:54] <jefferai> mdawson: they need access to the cluster network
[21:54] * jtang1 (~jtang@79.97.135.214) has joined #ceph
[21:54] <jefferai> unless I'm wrong :-0
[21:54] <dmick> but for contacting them from clients, public is fine
[21:54] <jefferai> :-)
[21:55] <dmick> low bw requirements IOW
[21:56] <jefferai> but high disk storage requirements
[21:56] <jefferai> what does the mon store, exactly?
[21:56] <dmick> ? I am surprised by this claim
[21:56] <jefferai> 10GB minimum
[21:57] <joshd> jefferai: dmick: mdawson: only the osds use separate networks
[21:57] <jefferai> http://ceph.com/docs/master/install/hardware-recommendations/#data-storage
[21:57] * guigouz1 (~guigouz@201-87-100-166.static-corp.ajato.com.br) has joined #ceph
[21:57] * sattar (~Chatzilla@9KCAAC2IV.tor-irc.dnsbl.oftc.net) Quit (Quit: ChatZilla 0.9.89 [Iceweasel 10.0.9/20121013091107])
[21:57] <jefferai> joao: when I tried to start a mon on a box that only had the public network, it complained about not having an ip address on the cluster network
[21:57] <joshd> mons connect to whichever they have defined first in (public, cluster) order iirc
[21:57] <jefferai> er
[21:57] <jefferai> joshd: ^
[21:57] <jefferai> sorry
[21:58] <jefferai> hm
[21:58] <joshd> usually you'd want to assign mons an ip:port explicitly though
[21:58] * Leseb (~Leseb@5ED17881.cm-7-2b.dynamic.ziggo.nl) has joined #ceph
[21:58] <jefferai> Yeah, I do
[21:58] <jefferai> although on their own vlan
[21:59] <jefferai> I only have cluster/public network statements in [global] and public is first
[22:03] <mdawson> Building my first RBD cluster with 4 identical nodes. Can I put OSDs on all four nodes and mon on node0, node1, and node2? I'm not using CephFS, so I don't need MDS, right?
[22:03] <dmick> mdawson: all sounds good, yes
[22:04] <dmick> if the nodes have multiple disks, you may want to even run more than one OSD per node
[22:05] <mdawson> so if public network = 10.10.1.0/24 and cluster network = 10.10.0.0/24, then should [mon.1] have "mon addr = 10.10.1.101:6789" or should that be on the cluster network?
[22:05] <mdawson> dmick: yes, multiple OSDs per node
[22:06] <dmick> mdawson: mon on public network sounds right to me. Just researching if you have to actually assign that in the .conf; you should not have to assign OSD addresses at least
[22:08] <dmick> if I'm reading MonMap::build_initial right, you have to specify monitor addresses
[22:08] <dmick> either in the conf or with -m
[22:08] <mdawson> thanks
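Pulling mdawson's numbers together, a hedged ceph.conf sketch (the hostname is hypothetical):

    [global]
    public network = 10.10.1.0/24
    cluster network = 10.10.0.0/24
    [mon.1]
    host = node1
    mon addr = 10.10.1.101:6789   ; monitors listen on the public network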
[22:10] * eternaleye (~eternaley@tchaikovsky.exherbo.org) Quit (Remote host closed the connection)
[22:11] * eternaleye (~eternaley@tchaikovsky.exherbo.org) has joined #ceph
[22:26] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:35] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[22:40] * mdrnstm1 (~mdrnstm@206-169-78-213.static.twtelecom.net) has joined #ceph
[22:40] * mdrnstm (~mdrnstm@206-169-78-213.static.twtelecom.net) Quit (Read error: Connection reset by peer)
[22:40] <mdawson> Using 512GB SSDs for OS and OSD journal partitions. What would be a reasonable HPA setup? Is 10GB the best practice for OSD journal size, or any reason to go bigger?
[22:40] <sjust> HPA?
[22:40] <sjust> 10G is a commonly used size
[22:40] <mdawson> HPA = Host Protected Area
[22:40] <sjust> what are you using for the osd filestore disks?
[22:41] <mdawson> 3TB SATA
[22:41] <sjust> 10G seems reasonable
[22:42] <mdawson> what filesystem is recommended for the journal partitions?
[22:42] <sjust> usually it's best to just use a raw partition
[22:44] <mdawson> just create a partition, never format it, and point to it in ceph.conf?
[22:45] <sjust> yeah
[22:45] <mdawson> sjust: thanks
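A sketch of sjust's advice for one osd; the device name is hypothetical:

    # make a ~10G partition on the SSD (parted/fdisk) and leave it unformatted,
    # then point the osd at the raw device in ceph.conf:
    [osd.0]
    osd journal = /dev/sda5
    osd journal size = 0   ; 0 = use the whole block device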
[22:46] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[22:53] <elder> OK I give up. dmick can you find where ceph_osd_op->op gets encoded on the osd side?
[22:54] <elder> Now that I gave up I think I figured it out.
[22:55] <elder> It doesn't get encoded...
[22:55] <elder> No wonder I couldn't find it.
[22:55] * joey_ (~terje@71-218-31-90.hlrn.qwest.net) has joined #ceph
[22:58] * mdawson (~chatzilla@23-25-46-97-static.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[23:04] * guigouz1 (~guigouz@201-87-100-166.static-corp.ajato.com.br) Quit (Quit: Computer has gone to sleep.)
[23:10] <dmick> glad I could help elder :-P
[23:10] <elder> I'm glad you're there to help.
[23:17] <dmick> I'm distracted by fascinating straces of bonnie++
[23:17] <dmick> it's writing every ascii character in one-byte-writes to my rbd image
[23:17] <dmick> I expect *really* high performance out of this test
[23:21] * danieagle (~Daniel@177.99.134.107) Quit (Quit: Inte+ :-) e Muito Obrigado Por Tudo!!! ^^)
[23:21] * pbeadles (~pat@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[23:23] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[23:23] * miroslav1 (~miroslav@c-98-248-210-170.hsd1.ca.comcast.net) has joined #ceph
[23:23] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Read error: Connection reset by peer)
[23:24] * JoDarc (~Adium@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[23:30] * s_parlane (~scott@121-74-248-190.telstraclear.net) has joined #ceph
[23:37] <dspano> I just noticed while testing a snapshot rollback that the rollback did not take effect until I umounted then remounted the volume in the vm. Is that normal?
[23:40] * drokita (~drokita@199.255.228.10) has joined #ceph
[23:40] <benpol> dspano: I would expect so, especially since as far as the vm is concerned it's just talking to a regular block device (not a cluster filesystem)
[23:44] <benpol> might even be inadvisable to do a snapshot rollback while the rbd image is mounted.
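A sketch of the safer ordering benpol implies; the image and mount point are hypothetical:

    # in the guest: quiesce and unmount the filesystem on the rbd-backed disk
    umount /mnt/data
    # from a ceph client: roll the image back to the snapshot
    rbd snap rollback volumes/app-disk@pre-upgrade
    # then reattach/remount in the guest
    mount /dev/vdb /mnt/data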
[23:46] <dspano> benpol: Thanks. The main reason I would need to do it is when our production management software vendor upgrades our system. In that scenario, I would be able to shutdown and unmount the volume anyway.
[23:50] * mdawson (~chatzilla@c-98-220-189-67.hsd1.in.comcast.net) has joined #ceph
[23:52] <dspano> I love how well this works. For a guy running a small datacenter like me this rollback stuff is like opening the Christmas present you always wanted.
[23:53] * drokita (~drokita@199.255.228.10) Quit (Ping timeout: 480 seconds)
[23:55] * vata (~vata@208.88.110.46) Quit (Quit: Leaving.)
[23:58] <benpol> dspano: indeed, very cool stuff
[23:59] <dspano> benpol: Have a great weekend.
[23:59] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.