#ceph IRC Log


IRC Log for 2012-08-17

Timestamps are in GMT/BST.

[0:03] <jamespage> SpamapS, sagewk: actually I'm still around - now that I have the final MIR/Security ack on libfcgi I can minimize the packaging delta
[0:03] * jamespage reads backscroll
[0:04] * tightwork (~tightwork@rrcs-71-43-128-65.se.biz.rr.com) Quit (Ping timeout: 480 seconds)
[0:05] <sagewk> jamespage: great!
[0:05] <sagewk> jamespage: and where does that leave radosgw? do we need a separate bug or something to get that MIR'd?
[0:06] <jamespage> sagewk, I think that if we want it to reside in main, then yes we do need to raise another MIR bug - I see SpamapS has already started the conversation in the existing one.
[0:06] <jamespage> lemme just check something...
[0:08] <jamespage> sagewk, the scope of work for 12.10 specifically excluded radosgw in main
[0:09] <jamespage> sagewk, all features should be enabled still (hence my push on libfcgi so we can at least have radosgw in universe)
[0:11] * loicd1 (~loic@brln-4db801aa.pool.mediaWays.net) Quit (Quit: Leaving.)
[0:12] <sagewk> jamespage: ah, okay. is there a blueprint for that, or is it internal?
[0:12] <yehudasa> SpamapS: jdstrand complains about defensive programming, but at the time we addressed all the relevant occurrences
[0:12] <sjust> Leseb: what does your workload look like and what sort of improvement did you get?
[0:13] <yehudasa> SpamapS: and in any case it wasn't specific to radosgw
[0:14] <jamespage> sagewk, well its kinda implicit in https://blueprints.launchpad.net/ubuntu/+spec/servercloud-q-ceph-object-integration
[0:14] <jamespage> but I agree it is vague
[0:14] <jamespage> (I had to double check in an internal doc as well...)
[0:16] <Leseb> sjust: it's just a simple 'dd' with direct io, with filestore flusher = true I got ~25MB/sec and with filestore flusher = false I got ~60MB/sec
[0:16] <sjust> on rbd or cephfs?
[0:16] <Leseb> on rbd
[0:16] <sjust> dd block size?
[0:16] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[0:16] <Leseb> big, 1G
[0:16] <sjust> was this directly on the disk, or through a filesystem?
[0:17] <Leseb> but bonnie++ also shows better results
[0:17] <Leseb> dd to the fs using direct flag
[0:17] <sjust> ok
[0:17] <sjust> how long did you run it?
[0:17] * aliguori (~anthony@ Quit (Remote host closed the connection)
[0:17] <Leseb> bonnie?
[0:18] <Leseb> bonnie++ -s 8192 -r 4096 -u root -d /mnt/ -m Control01
[0:19] <Leseb> (my journal is stored on tmpfs)
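Leseb's dd test can be sketched as below. This is a scaled-down, self-contained stand-in: the output path and sizes are hypothetical stand-ins (current directory, 4M blocks instead of the 1G blocks above), so point the output file at the RBD-backed mount and raise bs/count to reproduce the real run.

```shell
# Direct-I/O sequential write test along the lines discussed above.
# ddtest.bin in the current directory is a stand-in for a file on the
# RBD-backed filesystem; bs/count are scaled down from the 1G-block run.
testfile=ddtest.bin
dd if=/dev/zero of="$testfile" bs=4M count=4 oflag=direct
stat -c '%s bytes written' "$testfile"
```

With `oflag=direct`, dd bypasses the page cache, so the reported rate reflects the storage path rather than RAM.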
[0:21] <Leseb> could you elaborate a little bit about this option please? it's still a little unclear --> filestore flusher
[0:22] <sagewk> jamespage: it seems like radosgw would be needed for the frank user story?
[0:22] <jamespage> sagewk, I think so yes....
[0:24] <jamespage> sagewk, anyway - lets see where the conversation with Ubuntu security goes - I'm about to go on leave for the next 12 days - I'm sure SpamapS will keep nudging it forwards....
[0:24] <sjust> Leseb: we have observed cases where the filesystem fails to attempt to flush dirty data until we force it with a sync. we would prefer that it write out data pretty much at all times and just use the sync to establish a commit point. filestore flusher runs sync_file_range on the written data right after it's written in an attempt to get the filesystem to write out the data earlier.
[0:25] <sjust> it's frequently not a winning strategy, however
[0:25] <sagewk> jamespage: yep, okay. thanks!
[0:30] <Leseb> sjust: hum so enabling this option is not the safest way right?
[0:31] <sjust> no, it's safe either way
[0:31] <sjust> just a way to try to cajole the filesystem into doing the writes earlier
[0:31] <sjust> it'll definitely happen by the end of the sync
[0:32] <Leseb> ooh ok :)
[0:32] <sjust> if the filesystem is doing the right thing, filestore flusher is an unhelpful hack :)
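For reference, the toggle sjust describes is a plain ceph.conf option; by the numbers above, Leseb's dd throughput went from ~25MB/sec to ~60MB/sec with it off. A sketch, assuming the option spelling used in this era's releases:

```ini
[osd]
    # when true, run sync_file_range() on freshly written data to nudge
    # the filesystem into writing it out early; safe either way, since
    # data is durable by the end of the sync in any case
    filestore flusher = false
```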
[0:33] <Leseb> interesting, the thing is I'm running this test on a pretty old machine
[0:34] <Leseb> it's Intel Xeon CPU 3050 @ 2.13GHz x2 and 4G RAM
[0:34] <Leseb> and while writing the server is struggling
[0:39] <wido> Short question about rados_write
[0:39] <wido> It fails when writing 100mb at once, has that something to do with rados_max_write ?
[0:40] <wido> Ah, while asking I found: osd_max_write_size
[0:40] <wido> never mind
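For the record, the option wido found caps the size of a single client write, and the default (90 MB, assuming the argonaut-era default) is what makes a 100mb rados_write fail. A ceph.conf sketch raising it:

```ini
[osd]
    # largest single write accepted from a client, in MB
    # (default 90, which is why a single 100 MB rados_write is rejected)
    osd max write size = 200
```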
[0:40] <Leseb> sjust: really big thanks anyway :D
[0:41] <sjust> Leseb: high cpu utilization you mean?
[0:41] <Leseb> sjust: yes :)
[0:43] <Leseb> sjust: really dying??? I assume that nowadays 4 cores and 8GB RAM for 3 OSDs is a good recommendation?
[0:43] <sjust> so it's 4 osds on 2 cores and 4GB?
[0:43] <sjust> that might be a bit tight
[0:43] <sjust> what is the total throughput it's serving?
[0:44] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[0:44] <Leseb> no at the moment each server (the old one) only has one OSD
[0:45] <Leseb> it's Intel Xeon CPU 3050 @ 2.13GHz x2 and 4G RAM for one OSD, the disks are 15K, write at 110MB/s and the network is gigabit
[0:46] <Leseb> the rados bench shows Bandwidth (MB/sec): 75.069
[0:46] <wido> Leseb: I'm still running a Atom cluster, 4GB of RAM and 4 OSDs per server, 10 servers in total
[0:46] <wido> That works, but under heavy recovery it sometimes becomes too much for the Atoms
[0:47] <Leseb> wido: what is your value for this option filestore flusher?
[0:48] <Leseb> because for me, setting this option to false really improves the client writes
[0:48] <wido> Leseb: I haven't changed that one. Must be honest, I haven't done very much performance testing
[0:49] <wido> I'm seeing 35MB/sec write right now, but I have replication set to 3 and it's all on Gig network
[0:49] <wido> Started rados bench
[0:49] <Leseb> where do you store your journal? do you have a private network for the 'internal replication'?
[0:50] <wido> Leseb: I'm using an 80GB SSD for the journals, use LVM to split it up and each OSD has a 4GB journal.
[0:50] <wido> I'm not using a dedicated network between the OSD's, so client traffic goes over the same NIC as replication
[0:52] <Leseb> wido: how fast is your SSD? and your disks?
[0:53] <wido> Leseb: SSD is doing somewhere around 100MB/sec write, Intel X25-M (oldie) and using 2TB 5400RPM disks
[0:53] <wido> Leseb: my config is online at: http://zooi.widodh.nl/ceph/ceph.conf
[0:54] <wido> I'm going afk, almost 01:00 :)
[0:55] <Leseb> wido: thanks I will have a look :D
[0:55] <Leseb> wido: almost 01:00 and still really hot in the NL :D
[0:58] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:24] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[1:25] <yehuda_hm> is it me, or are the recaptcha words getting impossible?
[1:25] <yehuda_hm> it just presented me with one of the words in hebrew
[1:28] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:28] <mikeryan> yehuda_hm: did you enter it in hebrew?
[1:28] <yehuda_hm> yes, it worked!
[1:31] * tightwork (~tightwork@ has joined #ceph
[1:35] * lxo (~aoliva@lxo.user.oftc.net) Quit (Remote host closed the connection)
[1:37] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[1:45] * Leseb_ (~Leseb@ has joined #ceph
[1:51] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Ping timeout: 480 seconds)
[1:51] * Leseb_ is now known as Leseb
[1:56] <sagewk> test_librbd_fsx has a fully clean valgrind memcheck run, yay
[1:57] <joshd> hooray!
[1:58] * Leseb (~Leseb@ Quit (Quit: Leseb)
[2:05] * bchrisman (~Adium@ Quit (Quit: Leaving.)
[2:12] <tightwork> how can I find out what osd IDs are in the cluster?
[2:13] <joshd> check out 'ceph osd dump' and 'ceph osd tree'
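Since no live cluster is to hand here, the snippet below uses a hypothetical, argonaut-style sample of 'ceph osd tree' output to show how the OSD ids and their up/down states can be pulled out; on a real cluster, pipe the actual command instead.

```shell
# Hypothetical sample of 'ceph osd tree' output (format is illustrative).
sample_tree='# id    weight  type name       up/down reweight
-1      2       pool default
-2      2               host storage1
0       1                       osd.0   up      1
1       1                       osd.1   down    0'

# On a live cluster: ceph osd tree | awk '$3 ~ /^osd\./ {print $3, $4}'
osd_states=$(printf '%s\n' "$sample_tree" | awk '$3 ~ /^osd\./ {print $3, $4}')
printf '%s\n' "$osd_states"
```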
[2:14] <tightwork> thank you
[2:27] * Cube (~Adium@ Quit (Quit: Leaving.)
[2:48] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[2:58] * yehuda_hm (~yehuda@99-48-179-68.lightspeed.irvnca.sbcglobal.net) has joined #ceph
[3:08] * Tv_ (~tv@ Quit (Quit: Tv_)
[3:10] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) Quit (Quit: Leaving.)
[3:18] * dmick (~dmick@ Quit (Quit: Leaving.)
[4:02] * scheuk (~scheuk@ has joined #ceph
[4:02] <scheuk> hello
[4:04] <scheuk> I have a couple of higher level questions about ceph
[4:06] * scheuk (~scheuk@ Quit ()
[4:06] * scheuk (~scheuk@ has joined #ceph
[4:08] <iggy> scheuk: you should just ask, but you'll likely get more answers in California work hours
[4:15] <scheuk> my first question is what filesystem is suggested to use for the OSDs? Everything I read is conflicting between ext4, xfs, and btrfs
[4:15] <womble> I've only heard btrfs suggested.
[4:17] <scheuk> and my second question is in regard to using ceph as shared storage for openstack/nova instances: which would be better to use for all of the running instances, CephFS or RBD?
[4:18] <scheuk> I understand that nova-volume (EBS) would use RBD
[4:19] <scheuk> are there any benefits to using one over the other?
[4:20] <tightwork> I am looking at osd tree, osd.2 is shown as down. How can I check the host to ensure ceph is running? what is the process name?
[4:21] <womble> I wouldn't use CephFS for anything I could use a block device for. But then, I'm a bit weird like that.
[4:21] <tightwork> ceph-osd ?
[4:22] <iggy> scheuk: xfs, rbd
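iggy's xfs recommendation can also be expressed in ceph.conf so that tools like mkcephfs build new OSDs on xfs; a sketch with option names as in contemporary releases, and the mount flags purely illustrative:

```ini
[osd]
    osd mkfs type = xfs
    osd mkfs options xfs = -f
    osd mount options xfs = rw,noatime
```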
[4:23] <tightwork> I'm using Ubuntu; I tried 'service ceph start', no errors anywhere, nothing in /var/log/ceph/ceph-osd.2.log, it just returns to the console, yet no ceph-osd is running
[4:27] * chutzpah (~chutz@ Quit (Quit: Leaving)
[4:37] * renzhi (~renzhi@ has joined #ceph
[4:46] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit (Quit: adjohn)
[4:49] * The_Bishop (~bishop@2a01:198:2ee:0:c50d:2f48:eff6:9e1d) Quit (Quit: Wer zum Teufel ist dieser Peer? Wenn ich den erwische dann werde ich ihm mal die Verbindung resetten!)
[4:55] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[4:55] <tightwork> I create the mon map, run ceph-osd -i 2 --mkfs --monmap /tmp/monmap --mkkey, which creates the filesystem. Using 'service ceph start' does not work. I can use ceph-osd -d -i 2 and get info about journal _open under osd/ceph-2 showing that it's finally working, but ceph osd tree still shows osd.2 down?
[5:01] <tightwork> :-/
[5:05] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[5:07] * nhm (~nhm@ Quit (Ping timeout: 480 seconds)
[5:14] <scheuk> what are the differences between cephfs and rbd on the osd end?
[5:15] <scheuk> does a large cephfs file (10gb) get chunked up and split across OSDs like an rbd volume?
[5:18] <iggy> scheuk: nothing... it's all just objects... yes
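To make iggy's answer concrete: both CephFS files and RBD images are striped over RADOS objects, 4 MB each by default, so the 10gb file does indeed get chunked across OSDs. A quick back-of-the-envelope count:

```shell
# number of default-sized (4 MB) RADOS objects backing a 10 GB file
object_size=$((4 * 1024 * 1024))
file_size=$((10 * 1024 * 1024 * 1024))
num_objects=$(( (file_size + object_size - 1) / object_size ))
echo "$num_objects objects"
```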
[5:19] * nhm (~nhm@253-231-179-208.static.tierzero.net) has joined #ceph
[5:33] <scheuk> ok, I get it now, thanks
[5:35] * deepsa_ (~deepsa@ has joined #ceph
[5:41] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[5:41] * deepsa_ is now known as deepsa
[6:05] * EmilienM (~EmilienM@ has joined #ceph
[6:29] * lxo (~aoliva@lxo.user.oftc.net) Quit (Ping timeout: 480 seconds)
[6:34] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[6:39] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[6:44] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[7:20] * senner (~Wildcard@68-113-228-89.dhcp.stpt.wi.charter.com) Quit (Quit: Leaving.)
[7:25] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[7:31] * deepsa_ (~deepsa@ has joined #ceph
[7:32] * tightwork (~tightwork@ Quit (Read error: Operation timed out)
[7:35] * deepsa (~deepsa@ Quit (Ping timeout: 480 seconds)
[7:35] * deepsa_ is now known as deepsa
[7:43] * exec (~defiler@ has joined #ceph
[7:45] * exec (~defiler@ Quit ()
[7:45] * exec (~defiler@ has joined #ceph
[7:46] * exec (~defiler@ Quit ()
[7:46] * exec (~defiler@ has joined #ceph
[7:54] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:59] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[7:59] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit ()
[8:08] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[8:22] * Qten (Q@qten.qnet.net.au) has joined #ceph
[8:22] * Qu310 (Q@qten.qnet.net.au) Quit (Read error: Connection reset by peer)
[8:22] * loicd (~loic@brln-4db801aa.pool.mediaWays.net) has joined #ceph
[8:50] * nhm (~nhm@253-231-179-208.static.tierzero.net) Quit (Ping timeout: 480 seconds)
[9:03] * Cube (~Adium@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[9:08] * BManojlovic (~steki@ has joined #ceph
[9:10] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has joined #ceph
[9:20] * renzhi (~renzhi@ Quit (Ping timeout: 480 seconds)
[9:22] * verwilst (~verwilst@d5152FEFB.static.telenet.be) has joined #ceph
[9:28] * Leseb (~Leseb@ has joined #ceph
[10:16] <NaioN> anyone any idea how to grow an rbd device online?
[10:16] <NaioN> i've resized it with rbd resize --size X pool/imagename
[10:16] <pmjdebru1jn> so it doesn't have to be unmounted
[10:16] <NaioN> and with rbd info it states the new size
[10:17] <NaioN> but is there a way to have the client see the new size? like something as rescan for a scsi device?
[10:17] <Leseb> is your device mounted?
[10:17] <NaioN> no it's re-exported as a iscsi device with LIO
[10:19] <NaioN> I'm hoping I can resize it without un-exporting the volume, unmapping and remapping the rbd, and re-exporting the volume
[10:19] <Leseb> there is no need to unmap the device
[10:20] <NaioN> hmmm the client doesn't see the change in size
[10:20] <NaioN> do I need to trigger something?
[10:20] <Leseb> I don't know if it's comparable but let me explain my setup
[10:20] <NaioN> i've tried /sys/bus/rbd/device/X/refresh
[10:21] <NaioN> Leseb: ok
[10:21] <Leseb> I use an rbd mapped device, mount it and export it via NFS; I can easily resize the rbd image, but if the image is mounted on the client the system never sees the block device changing
[10:21] <Leseb> the only way is to umount and re-mount
[10:22] <NaioN> but if you umount and re-mount it automatically sees the resize?
[10:22] <Leseb> when I unmount the client system sees the block device changes
[10:22] <NaioN> aha ok
[10:22] <Leseb> I dropped something about that on the ML a couple of weeks ago
[10:22] <NaioN> well it could be that LIO also "locks" the rbd, so maybe it's only necessary to un-export
[10:22] <Leseb> still have no clue, and tried everything
[10:23] <Leseb> I think so
[10:23] <NaioN> ok thx I'll try that
[10:23] <Leseb> read this thread http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/8013
[10:24] <Leseb> at the end, the only solution for me is to umount and remount :(
[10:25] <Leseb> NaioN: afk :)
[10:25] <NaioN> i see
[10:26] <NaioN> well I have to experiment a bit :)
[10:26] <NaioN> the case is not exactly the same because we are exporting it as a iscsi volume
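The sequence NaioN is after, growing the image and then having the kernel client re-read the size without unmapping, can be sketched as follows. Shown as a dry run (the commands are only built and printed), since it needs a live cluster; the pool, image, and device id are hypothetical, and the sysfs node is the krbd one at /sys/bus/rbd/devices/<id>/refresh.

```shell
# Dry-run sketch: commands are constructed and printed, not executed.
pool=iscsi; image=lun0; devid=0   # hypothetical names
resize_cmd="rbd resize --size 20480 $pool/$image"           # grow to 20 GB
refresh_cmd="echo 1 > /sys/bus/rbd/devices/$devid/refresh"  # re-read the size
printf '%s\n%s\n' "$resize_cmd" "$refresh_cmd"
```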
[11:07] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[11:08] * yoshi (~yoshi@p22043-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[11:52] <wido> NaioN: Whenever a block device is in use you can't resize it
[11:52] <wido> RBD -> ietd -> iSCSI -> ext4. Something like that?
[11:52] <wido> On the iSCSI initiator side you will need to unmount the filesystem to do a refresh
[11:53] <wido> Although if you are using ietd you will need to restart ietd, since it caches the LUN size
[11:53] <wido> or are you using tgtd?
[11:54] * Leseb_ (~Leseb@ has joined #ceph
[11:54] * Leseb (~Leseb@ Quit (Read error: Connection reset by peer)
[11:54] * Leseb_ is now known as Leseb
[11:54] <wido> Leseb: NL?
[11:55] <Leseb> wido: Netherlands
[11:56] <wido> Leseb: Ik bedoelde meer of je ook uit NL kwam :) (I meant more whether you're from NL yourself)
[11:57] * pmjdebru1jn and NaioN too
[11:57] <Leseb> wido: I'm currently leaving in the Netherlands but I don't speak Dutch :p (just got a translation from a guy at the office ^^)
[12:01] <wido> Leseb: Get it :) I'm 100% dutch though
[12:01] <pmjdebru1jn> living I assume not leaving :)
[12:01] <Leseb> pmjdebru1jn: oops yes living :/
[12:01] <pmjdebru1jn> :D
[12:02] <wido> Shouldn't we have a dutch Ceph meeting?
[12:02] <Leseb> wido: we could
[12:02] <Leseb> wido: pm?
[12:02] * Fruit is dutch too
[12:03] <wido> This channel is flooded with dutchies!
[12:03] <Leseb> haha
[12:03] <pmjdebru1jn> Fruit: oh hi... still at the KUB?
[12:03] <Fruit> heh, that's what it was called 10 years ago, yes :P
[12:04] <pmjdebru1jn> right :)
[12:05] <Fruit> I believe TiU is the most recent name. it seems to change a lot.
[12:05] <wido> All playing around with CephFS, RBD or RADOS?
[12:05] <pmjdebru1jn> well, its original "correct" abbreviation was rather, erhm, well... :)
[12:05] <pmjdebru1jn> wido: RBD mostly over here
[12:06] <Fruit> pmjdebru1jn: heh that's an urban legend
[12:06] <pmjdebru1jn> Fruit: funny one though :)
[12:06] <pmjdebru1jn> Fruit: funny urban legends are particularly hard to kill :)
[12:06] <Fruit> apparently "tiu" is vulgar chinese for male reproductive organ
[12:06] <pmjdebru1jn> lol
[12:07] <pmjdebru1jn> and another urban legend is born :)
[12:07] <Fruit> this one is not made up though
[12:07] <Fruit> oh well.
[12:07] <Fruit> afk lunch ;)
[12:07] <pmjdebru1jn> bon apetit
[12:09] <Leseb> pmjdebru1jn: bon appétit I assume, not apetit =P
[12:10] <pmjdebru1jn> right
[12:10] <pmjdebru1jn> touche :)
[12:11] <Leseb> french people never say 'touché', I don't understand why non-french people use it :(
[12:12] <pmjdebru1jn> :)
[12:14] <rosco> wido: Dutch Ceph meeting sounds good :)
[12:18] <pmjdebru1jn> oh noes another one :)
[12:20] <jluis> that would be nice
[12:20] * jluis is now known as joao
[12:23] <wido> rosco: No.. Even you here!
[12:23] <wido> Like I said, this channel is flooded with dutchies
[12:23] <joao> if you guys go with that idea, let me know
[12:24] <joao> I'd probably give an arm and my first born for ceph meetings in europe :p
[12:25] <wido> joao: You're welcome! Spain, correct?
[12:27] <joao> Portugal, but close enough :p
[12:28] <wido> joao: Yes, you are right. I remember
[12:29] <liiwi> we can have one tonight in Helsinki :)
[12:30] <joao> looks like there is quite the European presence around :)
[12:30] <liiwi> at this time of the day, at least
[12:32] <wido> liiwi: 27 hours of driving from my place. Sorry, not going to make it tonight
[12:32] * joao checks for flights
[12:33] <wido> But it's nice to see more European presence at day. It always was very USA focussed. Nice to be able to help/share at normal times of the day
[12:41] <wido> joao: Fixed a lot of btrfs bugs lately?
[12:41] <joao> been working on the monitor for quite a while now
[12:41] <joao> closing in on a major rework
[12:42] <wido> Ah, nice :)
[12:43] <wido> From time to time I still see my monitors go OOM
[12:46] <wido> I'm out to lunch
[12:47] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[12:47] * BManojlovic (~steki@ has joined #ceph
[12:51] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[13:31] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[13:45] * tightwork (~tightwork@ has joined #ceph
[14:16] * tightwork (~tightwork@ Quit (Ping timeout: 480 seconds)
[14:53] * s15y (~s15y@sac91-2-88-163-166-69.fbx.proxad.net) Quit (Ping timeout: 480 seconds)
[15:22] * deepsa (~deepsa@ Quit (Quit: Computer has gone to sleep.)
[15:32] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) has joined #ceph
[15:33] * tjpatter (~tjpatter@ has joined #ceph
[15:35] * senner (~Wildcard@68-113-228-89.dhcp.stpt.wi.charter.com) has joined #ceph
[15:58] * verwilst (~verwilst@d5152FEFB.static.telenet.be) Quit (Quit: Ex-Chat)
[16:02] * nhm (~nhm@253-231-179-208.static.tierzero.net) has joined #ceph
[16:05] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[16:33] * deepsa (~deepsa@ has joined #ceph
[16:37] * Hobbz (~trhoden@pool-108-28-184-160.washdc.fios.verizon.net) has joined #ceph
[16:38] * Hobbz (~trhoden@pool-108-28-184-160.washdc.fios.verizon.net) Quit ()
[16:39] * trhoden (~trhoden@pool-108-28-184-160.washdc.fios.verizon.net) has joined #ceph
[17:03] * kibbu (claudio@owned.ethz.ch) Quit (Remote host closed the connection)
[17:10] * BManojlovic (~steki@ Quit (Quit: Ja odoh a vi sta 'ocete...)
[17:14] * kibbu (claudio@owned.ethz.ch) has joined #ceph
[17:30] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[17:39] * BManojlovic (~steki@ has joined #ceph
[17:58] * Tv_ (~tv@ has joined #ceph
[17:59] * tnt (~tnt@113.39-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:18] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[18:20] * Leseb_ (~Leseb@ has joined #ceph
[18:20] * Leseb (~Leseb@ Quit (Read error: Connection reset by peer)
[18:20] * Leseb_ is now known as Leseb
[18:23] * Cube (~Adium@ has joined #ceph
[18:28] * Leseb (~Leseb@ Quit (Quit: Leseb)
[18:31] * Cube1 (~Adium@ has joined #ceph
[18:32] * Cube (~Adium@ Quit (Read error: Connection reset by peer)
[18:42] * nhm (~nhm@253-231-179-208.static.tierzero.net) Quit (Ping timeout: 480 seconds)
[18:46] * jbd_ (~jbd_@34322hpv162162.ikoula.com) has left #ceph
[18:55] * joshd (~joshd@2607:f298:a:607:221:70ff:fe33:3fe3) has joined #ceph
[19:00] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[19:01] * nhm (~nhm@ has joined #ceph
[19:01] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Ping timeout: 480 seconds)
[19:07] * chutzpah (~chutz@ has joined #ceph
[19:07] * The_Bishop (~bishop@2a01:198:2ee:0:a5f6:6779:de5:5e2e) has joined #ceph
[19:14] * lofejndif (~lsqavnbok@9KCAAAVLB.tor-irc.dnsbl.oftc.net) has joined #ceph
[19:21] * lofejndif (~lsqavnbok@9KCAAAVLB.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[19:30] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[19:47] * cattelan (~cattelan@2001:4978:267:0:21c:c0ff:febf:814b) has joined #ceph
[19:48] * dmick (~dmick@ has joined #ceph
[19:50] * ninkotech (~duplo@ Quit (Quit: Konversation terminated!)
[19:51] * ninkotech (~duplo@ has joined #ceph
[19:55] * allsystemsarego (~allsystem@ has joined #ceph
[20:04] <sagewk> yehudasa: can you sanity-check wip-crypto for me please?
[20:05] * glowell_ (~glowell@c-98-210-226-131.hsd1.ca.comcast.net) Quit (Remote host closed the connection)
[20:14] * steki-BLAH (~steki@ has joined #ceph
[20:15] * glowell1 (~Adium@c-98-210-226-131.hsd1.ca.comcast.net) has joined #ceph
[20:16] * The_Bishop (~bishop@2a01:198:2ee:0:a5f6:6779:de5:5e2e) Quit (Ping timeout: 480 seconds)
[20:18] * BManojlovic (~steki@ Quit (Ping timeout: 480 seconds)
[20:21] <scheuk> I am running 0.48.1 of ceph, and would like to add an osd at a specific ID
[20:22] <scheuk> when I do a 'ceph osd create 41'
[20:22] <scheuk> it defaults to the 0 ID
[20:24] <scheuk> is specifying the ID broken in 0.48.1?
[20:26] * The_Bishop (~bishop@2a01:198:2ee:0:5d8b:da0:ca25:a5b7) has joined #ceph
[20:26] <dmick> it seems a bit odd to me that you would not already have a 0; is that correct?
[20:28] <gregaf> I don't think you specify IDs with "ceph osd create" - it allocates one for you
[20:30] <dmick> you *can* specify one, but it's optional
[20:31] <dmick> however, it doesn't necessarily work
[20:31] <gregaf> you can specify a UUID, but not the integer ID
[20:31] <gregaf> at least in .48.1 (I'm looking at the source), but I'm not sure how that behavior might have changed over the command's lifetime
[20:31] <dmick> hm. "osd-id" is a little ambiguous I suppose
[20:32] <gregaf> presumably we didn't change what it did between .48 and .48.1 ...
[20:32] <dmick> but yes, I see you're right
[20:32] <rturk> hi all - I'm getting ready to install a module to Redmine that will hopefully make the robots stop spinning out of control
[20:32] <dmick> are there robots spinning out of control?
[20:32] <rturk> going to require a restart, shouldn't be noticeable unless something goes wrong :)
[20:33] <rturk> ya, they're killing ceph.com every week or so
[20:33] <dmick> scheuk: so there's your answer; that parameter is actually the UUID, not the id. should probably change the usage message to clarify that.
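So, as of 0.48.x, the optional argument to 'ceph osd create' is a UUID and the integer id is always handed out by the monitors. A sketch (dry run, since it needs a live cluster):

```shell
# Generate a UUID for the new OSD; the monitor allocates the integer id.
uuid=$(cat /proc/sys/kernel/random/uuid)
echo "ceph osd create $uuid"   # on a live cluster, this prints the new id
```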
[20:35] <scheuk> correct I do not have 0 osd
[20:35] <scheuk> dmick: thank you
[20:36] <dmick> heh, thank gregaf, but yw
[20:38] * lofejndif (~lsqavnbok@19NAABVM3.tor-irc.dnsbl.oftc.net) has joined #ceph
[20:41] <dmick> scheuk: see http://tracker.newdream.net/issues/2960
[20:41] <yehudasa> sagewk: yeah
[20:41] * deepsa (~deepsa@ Quit ()
[20:46] <rturk> ok, I'm done with my redmine work. let me know if anything unusual comes up
[20:46] <rturk> for the record, I installed the PluginBotsFilter plugin
[20:47] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) Quit (Remote host closed the connection)
[20:56] * aliguori (~anthony@cpe-70-123-145-39.austin.res.rr.com) has joined #ceph
[21:19] <Fruit> killer robots? that sounds exciting
[21:23] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[21:23] <dmick> from outer space
[22:03] <yehudasa> SpamapS: I commented in launchpad on the issue yesterday; basically we think the problem has already been addressed. Is there anything else we can do now?
[22:08] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[22:14] * amatter (~amatter@ has joined #ceph
[22:16] <Tobarja> curiosity: is anyone aware of something a step above two boxes rsync'ing that might meet an office's needs for storage distributed over a couple of boxes? i'm beginning to wonder if ceph/gluster/moosefs/etc are overkill for me... (especially since clients are windows boxes that have to go through a samba middleman)
[22:18] <gregaf> depends on your exact requirements, but AFS (the Andrew File System) and derivatives sound like maybe what you want
[22:18] <gregaf> alternatively (evil closed source): Dropbox et al?
[22:19] <Tobarja> i have considered dropbox, but i have some files that i don't want going out into the cloud ;)
[22:19] <trhoden> Tobarja: since you mention rsync (which would mirror data in two places), what about drbd?
[22:19] <trhoden> Tobarja: not real flexible, but it's basically a networked RAID1
[22:21] <scheuk> has anyone seen this problem before: 4 storage nodes with 4 OSDs, client connected via kernel driver, created a new osd on one of the storage nodes, expansion went fine after adding to crush
[22:22] <scheuk> on the client tried copying a file, and the copy is blocked
[22:23] <scheuk> and I'm seeing log [WRN] : 8 slow requests, 1 included below; oldest blocked for > 480.750413 secs
[22:23] <scheuk> in the existing osd log of the storage node where I added the additional OSD
[22:27] <scheuk> and if I restart the exsisting osd processing, the client will start working again
[22:28] <amatter> hello. I'm new to ceph and installing on a fresh ubuntu 12.04 machine. I'm following the 5-min quick start instructions but getting a "/tmp/mkcephfs.0AxQ8FixEm/Key.*: No such file or directory" error on the mkcephfs cmd creating the keyring. Full dialog is here http://pastebin.com/398P1wEv
[22:28] <amatter> Not sure what to check
[22:30] * Ryan_Lane (~Adium@ has joined #ceph
[22:32] <scheuk> amatter: is there anything in: /tmp/mkcephfs.0AxQ8FixEm ?
[22:33] <amatter> no /tmp is void of directories
[22:33] <NaioN> scheuk: what does ceph -s say?
[22:33] <trhoden> amatter: I ran into a similar problem yesterday. Was able to figure it out using strace.
[22:34] <trhoden> amatter: in my case, it was that I wasn't providing "-c /etc/ceph/ceph.conf", even though the man page says that it is optional (and I was using the default value)
[22:34] <trhoden> amatter: but your pastebin shows that you were using that option.
[22:34] <scheuk> naion: when the osd is broken?
[22:35] <NaioN> euhmmm the osd crashes?
[22:35] <NaioN> because that of course explains the block for the client
[22:36] <scheuk> no the osd is running
[22:36] <Tobarja> line 420 of /sbin/mkcephfs looks like it cleans up /tmp/XXX at end of run... try commenting it out, rerunning, and taking a peek?
[22:36] <amatter> trhoden- reading a strace now to see if I can find any more hints
[22:36] <sjust> scheuk: what is the output of ceph -s?
[22:37] <NaioN> scheuk: you have a cluster with 5 osds? all up and in? and all pg's active+clean?
[22:37] <scheuk> right now it's working after restarting the osd
[22:37] <scheuk> root@de8-39-35-2a-3d-44:~# ceph -s
[22:37] <scheuk> health HEALTH_OK
[22:37] <scheuk> monmap e1: 3 mons at {40=,50=,60=}, election epoch 2, quorum 0,1,2 40,50,60
[22:37] <scheuk> osdmap e63: 6 osds: 6 up, 6 in
[22:37] <scheuk> pgmap v14535: 13632 pgs: 13632 active+clean; 17531 MB data, 38686 MB used, 25658 GB / 25696 GB avail
[22:37] <scheuk> mdsmap e20: 1/1/1 up {0=70=up:active}, 3 up:standby
[22:37] <NaioN> this all looks ok
[22:37] <scheuk> I'll do another expansion to 7 OSDs and run ceph -s before restarting the OSD
[22:38] <scheuk> give me a few :)
[22:38] <NaioN> ceph -w
[22:38] <NaioN> you can watch the process
[22:39] <amatter> :420
[22:39] <NaioN> if you expand the pg's will be spread over all the osds (after including the new osds in the crushmap)
[22:39] <NaioN> so a lot of data gets moved around
[22:40] <scheuk> yeah that makes sense
[22:40] <scheuk> I see that right after doing a "ceph osd crush set 1 osd.1 1.0 pool=default rack=unknownrack host=storage.storage5"
[22:41] <scheuk> during that period and right after the client is blocked from reading or writing
[22:42] <NaioN> yeah that's normal
[22:42] <NaioN> the osdmap gets updated
[22:43] <NaioN> but after a while the writes and reads should continue
[22:43] <scheuk> even when the osds are done rebalancing and ceph -s goes healthy
[22:43] <scheuk> the client is still blocked
[22:43] <NaioN> and the client will also contact the new osds
[22:44] <NaioN> hmmm that isn't correct
[22:44] <amatter> Instead of using the -a flag, I'm running each of the commands to initialize each part of the cluster. looks like the issue is in "sudo mkcephfs -d /tmp/foo --prepare-mon" so now I'm off to read the mkcephfs script and see what's failing
[22:44] <scheuk> then when I restart the OSD that's stuck, the client is unblocked
[22:44] <NaioN> the client should continue while the cluster is still rebalancing
[22:45] <NaioN> stuck?
[22:45] <NaioN> how can the cluster finish rebalancing if an osd is stuck?
[22:46] <NaioN> how do you know which osd is stuck?
[22:47] <scheuk> the log shows things like: log [WRN] : 33 slow requests, 6 included below; oldest blocked for > 1035.761830 secs
[22:47] <scheuk> and
[22:47] <scheuk> log [WRN] : slow request 480.812221 seconds old, received at 2012-08-17 14:40:20.125573: osd_op(client.4711.1:4596 1000000a57f.0000001d [read 0~4194304 [1@-1]] 0.836ae225 RETRY) currently no flag points reached
[22:49] <NaioN> and those messages are still coming when the cluster is in a healthy state?
[22:49] <scheuk> yes
[22:49] <scheuk> and it's only one of the OSDs
[22:49] <scheuk> coming from one of the osds
[22:49] <NaioN> one of the new?
[22:50] <NaioN> is there a difference in speed between the osds?
[22:50] <scheuk> yes, the existing osd where I added a new osd
[22:50] <scheuk> there is, but very little difference
[22:50] <NaioN> you mean the existing node with a new osd
[22:50] <scheuk> 8 disk raid 0 vs a 7 disk raid 0
[22:51] <scheuk> naion: yes
[22:51] <NaioN> euhmmm 8 disks raid 0?
[22:51] <NaioN> well that's one osd daemon you run on top of it?
[22:51] <scheuk> yes
[22:51] <scheuk> yes
[22:52] <NaioN> you mean 4x 2disk/raid0 setup?
[22:52] <scheuk> it's a hardware raid controller
[22:52] <NaioN> aha you made more volumes?
[22:53] <scheuk> no
[22:53] <scheuk> the server has 16 disks
[22:53] <scheuk> 1 OS, 7 in a raid 0, and the last 8 in a raid 0
[22:53] <scheuk> so I see sda sdb sdc in linux
[22:53] <NaioN> k
[22:54] <scheuk> sdc is the first existing osd
[22:54] <NaioN> you know that's real risky? raid0 on 7 or 8 disks? :)
[22:54] <scheuk> that's what the replication is used for :)
[22:54] <NaioN> hehe you're stretching it a bit... :)
[22:54] <NaioN> but ok it should work
[22:54] <NaioN> so you run 2 osds on that node
[22:55] <scheuk> yes
[22:55] <NaioN> one on sdb and one on sdc
[22:55] <scheuk> correct
[22:55] <scheuk> I started with sdc as the first 4
[22:55] <scheuk> on the 4 servers
[22:55] <NaioN> first 4?
[22:55] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[22:55] <scheuk> correct
[22:55] <NaioN> what do you mean with that?
[22:56] <scheuk> 4 servers with 2 disks/osds in the end
[22:56] <NaioN> ok
[22:57] <scheuk> then I have been adding the sdb one at a time to the cluster
[22:57] <NaioN> so you have 4 servers with each 2 raid 0 volumes and on each an osd daemon
[22:57] <scheuk> yes
[22:57] <NaioN> well first 4 and than the other 4
[22:57] <NaioN> ok
[22:58] <scheuk> when I add the sdb, I do a mkfs.xfs -f
[22:58] <scheuk> update fstab
[22:58] <scheuk> mkdir /srv/dev/osd#
[22:58] <scheuk> mount /srv/dev/osd#
[22:58] <scheuk> then edit ceph.conf
[22:59] <scheuk> do a ceph-osd -i # --mkfs --mkkey
[22:59] <scheuk> add it to mon
[22:59] <scheuk> ceph auth add osd.#
[22:59] <scheuk> start the service
[22:59] <scheuk> ceph osd tree shows the new osd
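[The manual OSD-add sequence scheuk lists above can be sketched roughly as below. This is a hedged reconstruction against argonaut-era (2012) ceph: the osd id N, the device /dev/sdb, and the mount point follow the paths mentioned in the log, but the auth caps and exact command spellings are assumptions, not taken from the conversation.]

```shell
# Sketch of the manual OSD-add steps described above (argonaut-era ceph).
# N is a placeholder osd id; /dev/sdb and /srv/dev/osdN come from the log.
N=7
mkfs.xfs -f /dev/sdb                        # fresh filesystem on the new volume
mkdir -p /srv/dev/osd$N
echo "/dev/sdb /srv/dev/osd$N xfs defaults 0 0" >> /etc/fstab
mount /srv/dev/osd$N
# ... add an [osd.N] section to /etc/ceph/ceph.conf here ...
ceph-osd -i $N --mkfs --mkkey               # initialize the osd data dir and key
ceph auth add osd.$N osd 'allow *' mon 'allow rwx' \
    -i /srv/dev/osd$N/keyring               # register the key with the monitors
service ceph start osd.$N                   # start the daemon
ceph osd tree                               # the new osd should now appear
```

[These commands require a live cluster; as joshd points out further down, the osd also has to be placed in the crushmap before it receives data.]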
[23:00] <NaioN> yeah if it turns up in the tree and it has a weight it will get data
[23:00] <scheuk> everything has a weight of 1
[23:00] <scheuk> this is just a test setup for now
[23:01] <scheuk> we are looking at replacing a gluster cluster
[23:01] <NaioN> ok, it isn't perfect in your case because the volumes/osds aren't the same size
[23:01] <NaioN> but ok that shouldn't be a problem
[23:01] <joshd> scheuk: there's two places the osds need to show up - the osdmap and the crushmap
[23:01] <joshd> ceph osd tree displays the crushmap
[23:01] <joshd> ceph osd dump displays the osdmap
[23:01] <joshd> 'ceph osd create' adds it to the osdmap
[23:02] <joshd> 'ceph crush add ...' adds it to the crushmap
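[joshd's distinction between the two maps can be summarized as the following command sketch. The exact spelling of the crush-add command varied across releases (the log says 'ceph crush add'; later releases spell it 'ceph osd crush add'), so treat the last line as an assumption.]

```shell
# The two places an osd must show up, per joshd:
ceph osd dump       # osdmap: which osds the cluster knows about, up/in state
ceph osd tree       # crushmap: where each osd sits in the placement hierarchy
ceph osd create     # allocates a new osd id in the osdmap
# placing the osd in the crushmap with a weight is what makes it receive data;
# command spelling is release-dependent, e.g.:
#   ceph osd crush add <id> <name> <weight> <location...>
```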
[23:02] <scheuk> joshd: yes
[23:02] <scheuk> that all works
[23:02] <scheuk> I am about to do a ceph crush add
[23:02] <joshd> ok, just wanted to make sure it was clear
[23:03] <scheuk> root@de8-39-35-2a-3d-44:~# ceph -s
[23:03] <scheuk> health HEALTH_WARN 348 pgs peering; 22 pgs recovering; recovery 70/8816 degraded (0.794%)
[23:03] <scheuk> monmap e1: 3 mons at {40=,50=,60=}, election epoch 2, quorum 0,1,2 40,50,60
[23:03] <scheuk> osdmap e67: 7 osds: 7 up, 7 in
[23:03] <scheuk> pgmap v14564: 13632 pgs: 101 active, 13161 active+clean, 348 peering, 22 active+recovering; 17531 MB data, 39233 MB used, 29568 GB / 29606 GB avail; 70/8816 degraded (0.794%)
[23:03] <scheuk> mdsmap e20: 1/1/1 up {0=70=up:active}, 3 up:standby
[23:03] <scheuk> after I did a ceph add
[23:03] <scheuk> ceph crush add
[23:03] <NaioN> yeah thats correct
[23:03] <scheuk> now the client is still connected
[23:04] <NaioN> because the crushmap changes the pg's get redistributed
[23:04] <scheuk> if I try to copy a file, it will lock the copy
[23:04] * nhm (~nhm@ Quit (Ping timeout: 480 seconds)
[23:04] <scheuk> copy a file on the client
[23:05] <NaioN> well in the state you pasted I can imagine that the client blocks
[23:05] <scheuk> ok
[23:05] <NaioN> but after a while a lot of pgs should be active+clean and some should be degraded+remapped+backfill or something
[23:06] <scheuk> ok the balance just finished
[23:06] <scheuk> root@de8-39-35-2a-3d-44:~# ceph -w
[23:06] <scheuk> health HEALTH_OK
[23:06] <scheuk> monmap e1: 3 mons at {40=,50=,60=}, election epoch 2, quorum 0,1,2 40,50,60
[23:06] <scheuk> osdmap e67: 7 osds: 7 up, 7 in
[23:06] <scheuk> pgmap v14655: 13632 pgs: 13617 active+clean, 15 active+clean+replay; 18559 MB data, 41266 MB used, 29566 GB / 29606 GB avail
[23:06] <scheuk> mdsmap e20: 1/1/1 up {0=70=up:active}, 3 up:standby
[23:06] <NaioN> some are still in replay mode
[23:06] <scheuk> ok
[23:07] <NaioN> but they should accept writes/reads
[23:07] <NaioN> because they're active
[23:07] <scheuk> the client is currently blocked, and I am seeing:
[23:07] <scheuk> osd.60 [WRN] slow request 30.064095 seconds old, received at 2012-08-17 16:06:01.199278: osd_op(client.4711.1:6540 1000000a581.00000033 [read 0~4194304 [1@-1]] 0.19778d89 RETRY) currently no flag points reached
[23:07] <scheuk> osd.60 [WRN] 8 slow requests, 1 included below; oldest blocked for > 105.373584 secs
[23:08] <NaioN> well you have still 15 in replay
[23:08] <NaioN> are those also going to active+clean?
[23:08] * sjustlaptop (~sam@ has joined #ceph
[23:09] <scheuk> naion: how do I tell?
[23:09] <scheuk> root@de8-39-35-2a-3d-44:~# ceph -s
[23:09] <scheuk> health HEALTH_OK
[23:09] <scheuk> monmap e1: 3 mons at {40=,50=,60=}, election epoch 2, quorum 0,1,2 40,50,60
[23:09] <scheuk> osdmap e67: 7 osds: 7 up, 7 in
[23:09] <scheuk> pgmap v14671: 13632 pgs: 13617 active+clean, 15 active+clean+replay; 18543 MB data, 41303 MB used, 29566 GB / 29606 GB avail
[23:09] <scheuk> mdsmap e20: 1/1/1 up {0=70=up:active}, 3 up:standby
[23:09] <NaioN> well keep watching ceph -s or ceph -w
[23:09] <scheuk> ok
[23:09] * dspano (~dspano@rrcs-24-103-221-202.nys.biz.rr.com) Quit (Quit: Leaving)
[23:09] <NaioN> hmmm you can use ceph -w
[23:09] <NaioN> but those 15 should go to active+clean
[23:10] <NaioN> else you have some pgs in trouble
[23:10] <NaioN> you could look with ceph health detail
[23:10] <NaioN> it should tell you which pgs are still in replay mode and I hope why
[23:11] <scheuk> root@de8-39-35-2a-3d-44:~# ceph health detail
[23:11] <scheuk> HEALTH_OK
[23:11] <NaioN> hmmm weird
[23:12] <scheuk> very
[23:12] <NaioN> and what tells ceph -s at the moment?
[23:12] <NaioN> still those 15 in replay?
[23:12] <scheuk> root@de8-39-35-2a-3d-44:~# ceph -s
[23:12] <scheuk> health HEALTH_OK
[23:12] <scheuk> monmap e1: 3 mons at {40=,50=,60=}, election epoch 2, quorum 0,1,2 40,50,60
[23:12] <scheuk> osdmap e67: 7 osds: 7 up, 7 in
[23:12] <scheuk> pgmap v14704: 13632 pgs: 13617 active+clean, 15 active+clean+replay; 19559 MB data, 43109 MB used, 29564 GB / 29606 GB avail
[23:12] <scheuk> mdsmap e20: 1/1/1 up {0=70=up:active}, 3 up:standby
[23:12] <scheuk> yes
[23:12] <scheuk> seems like they are stuck
[23:12] <NaioN> yeah they're stuck
[23:13] <Tobarja> what do you do then?
[23:14] <NaioN> scheuk: try ceph pg dump_stuck
[23:14] <NaioN> you can add 3 options
[23:14] <scheuk> root@de8-39-35-2a-3d-44:~# ceph pg dump_stuck
[23:14] <scheuk> Must specify inactive or unclean or stale.
[23:14] <NaioN> stale inactive or unclean
[23:15] <scheuk> root@de8-39-35-2a-3d-44:~# ceph pg dump_stuck inactive
[23:15] <scheuk> ok
[23:15] <scheuk> root@de8-39-35-2a-3d-44:~# ceph pg dump_stuck unclean
[23:15] <scheuk> ok
[23:15] <scheuk> root@de8-39-35-2a-3d-44:~# ceph pg dump_stuck stale
[23:15] <scheuk> ok
[23:15] <scheuk> hmm
[23:15] <scheuk> nothing is reporting
[23:16] <NaioN> hmmm i'm afraid i can't help you, maybe one of the developers in this channel can help you further
[23:16] <scheuk> ok
[23:16] <NaioN> but those 15 should go to active+clean
[23:16] <NaioN> and i think if you restart the osd those go to active+clean
[23:17] <NaioN> because you said earlier that restarting the right osds makes the client function again
[23:17] <NaioN> you could look in the log of that osd
[23:17] <scheuk> yeah
[23:17] <NaioN> well first set a higher log level for that osd
[23:17] <NaioN> so you see more
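[One way to raise an osd's log verbosity at runtime, as NaioN suggests. osd.60 is the daemon named in the log; the debug levels and log path are illustrative, and the injectargs syntax may differ between ceph releases.]

```shell
# Bump debug levels on the suspect osd at runtime (levels are illustrative):
ceph tell osd.60 injectargs '--debug-osd 20 --debug-ms 1'
# then follow its log, e.g.:
tail -f /var/log/ceph/osd.60.log
# or make it persistent in ceph.conf:
#   [osd.60]
#       debug osd = 20
```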
[23:17] <scheuk> there isn't anything out of the ordinary
[23:18] <scheuk> I keep seeing in ceph -w
[23:18] <scheuk> 2012-08-17 16:17:12.380083 osd.60 [WRN] 71 slow requests, 4 included below; oldest blocked for > 733.487829 secs
[23:18] <scheuk> 2012-08-17 16:17:12.380093 osd.60 [WRN] slow request 480.990521 seconds old, received at 2012-08-17 16:09:11.389442: osd_op(client.4711.1:6392 1000000a586.000000c2 [write 0~4194304 [1@-1]] 0.1f144550 RETRY snapc 1=[]) currently no flag points reached
[23:18] <scheuk> the slow requests keep adding up
[23:18] <NaioN> Well it looks like the osd gets flooded
[23:19] <NaioN> if I add disks/osds to my cluster I also see some slow requests, but after a while they go away
[23:19] <NaioN> not after the cluster is rebalanced
[23:20] <scheuk> are you using xfs?
[23:30] * lofejndif (~lsqavnbok@19NAABVM3.tor-irc.dnsbl.oftc.net) Quit (Quit: gone)
[23:32] * maelfius (~Adium@ has joined #ceph
[23:32] * loicd (~loic@brln-4db801aa.pool.mediaWays.net) Quit (Quit: Leaving.)
[23:34] * lofejndif (~lsqavnbok@09GAAHL9M.tor-irc.dnsbl.oftc.net) has joined #ceph
[23:35] <scheuk> ok I just restarted the osd and the 15 replays disappeared
[23:40] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) Quit (Quit: Leaving)
[23:41] * guerby (~guerby@nc10d-ipv6.tetaneutral.net) has joined #ceph
[23:58] * senner (~Wildcard@68-113-228-89.dhcp.stpt.wi.charter.com) Quit (Ping timeout: 480 seconds)
[23:59] * sjustlaptop (~sam@ Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.