#ceph IRC Log

Index

IRC Log for 2014-06-29

Timestamps are in GMT/BST.

[0:01] * cronix1 (~cronix@5.199.139.166) Quit (Ping timeout: 480 seconds)
[0:04] * joef (~Adium@c-67-188-220-98.hsd1.ca.comcast.net) has joined #ceph
[0:05] <iggy> adding nodes (eventually)
[0:06] <iggy> I'd guess they plan on it eventually being the cluster membership setup tool
[0:07] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[0:10] <MACscr> is it the recommended way to deploy? i want to make sure i do it right so that i learn the system
[0:10] <MACscr> and eventually i want to write my own puppet module for it
[0:11] <MACscr> (i dont learn as much when i use another persons puppet module and sometimes they are overly complex for my needs)
[0:13] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[0:15] <MACscr> iggy: ^
[0:16] <iggy> it's a fairly common method (as is chef, puppet, etc.)
[0:16] * xarses (~andreww@c-76-103-129-113.hsd1.ca.comcast.net) has joined #ceph
[0:17] * cronix1 (~cronix@5.199.139.166) Quit (Read error: Operation timed out)
[0:17] <MACscr> iggy: what im trying to do and i havent really found a solid guide for yet is to do a cache ssd pool in front of my sata pool
[0:18] <MACscr> though i dont have anything setup yet. all from scratch
[0:18] <iggy> that's because it's fairly new functionality
[0:19] <MACscr> its been stable a couple months hasnt it? and was in development/beta releases for 6 or so months, right?
[0:19] * joef (~Adium@c-67-188-220-98.hsd1.ca.comcast.net) Quit (Read error: Connection reset by peer)
[0:23] <iggy> that's not really very long...
[0:29] * haomaiwa_ (~haomaiwan@121.48.184.24) has joined #ceph
[0:30] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[0:30] * mlausch (~mlausch@2001:8d8:1fe:7:6d79:9831:8bba:e7e) Quit (Ping timeout: 480 seconds)
[0:34] * haomaiwang (~haomaiwan@121.48.186.45) Quit (Ping timeout: 480 seconds)
[0:35] * cronix1 (~cronix@5.199.139.166) Quit (Read error: Operation timed out)
[0:36] * xarses (~andreww@c-76-103-129-113.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[0:37] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[0:39] * mlausch (~mlausch@2001:8d8:1fe:7:6d0e:e50d:3e2e:1567) has joined #ceph
[0:41] <sherry> what happens to my Ceph OSDs that fail? I'm not able to bring them back again, and even when I try to restart my server I get this error > lock_fsid failed to lock /var/lib/ceph/osd/ceph-6/fsid, is another ceph-osd still running? (11) Resource temporarily unavailable
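(That lock_fsid error usually means a stale ceph-osd process for that id is still holding the data directory. A rough check-and-restart sequence for that era, assuming Ubuntu with upstart and taking the osd id 6 from the error above:)
    ps aux | grep ceph-osd         # is an old ceph-osd for that id still running?
    sudo stop ceph-osd id=6        # upstart form; on sysvinit hosts: /etc/init.d/ceph stop osd.6
    sudo start ceph-osd id=6       # bring it back once the old process is gone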
[1:03] <MACscr> iggy: hmm, so im trying out ceph deploy and while i already installed ceph manually on all the nodes with apt-get, it seemed to be able to connect to all my systems fine and showed they already had the files they needed. I then tried to create 3 monitors on those systems and seemed to get timeout issues:
[1:03] <MACscr> http://hastebin.com/ukafowotek.mel
[1:04] <MACscr> whats odd though is that it didnt seem to happen on the third node
[1:04] <MACscr> and they should pretty much all 3 be the same
[1:23] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[1:23] <MACscr> wth, ceph-deploy doesnt support creating monitors?
[1:23] <MACscr> that seems odd
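(ceph-deploy does create monitors; the quick-start flow current at the time was roughly the following, hostnames illustrative:)
    ceph-deploy new node1 node2 node3        # writes a ceph.conf listing the initial monitors
    ceph-deploy install node1 node2 node3    # optional here, since ceph was already installed via apt-get
    ceph-deploy mon create-initial           # creates the mons and gathers their keys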
[1:31] <MACscr> hmm, purge data command doesnt seem to work properly either. [ceph_deploy][ERROR ] RuntimeError: refusing to purge data while ceph is still installed
[1:32] <MACscr> the guide says to run that if you want to start over
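(The error above is ceph-deploy enforcing an order when starting over: purge the packages first, then the data, then the locally cached keys. Hostnames illustrative:)
    ceph-deploy purge node1 node2 node3        # removes the ceph packages
    ceph-deploy purgedata node1 node2 node3    # now allowed: wipes /var/lib/ceph and /etc/ceph
    ceph-deploy forgetkeys                     # drops the keyrings kept in the local working directory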
[1:51] * BManojlovic (~steki@cable-94-189-160-74.dynamic.sbb.rs) Quit (Ping timeout: 480 seconds)
[2:02] <MACscr> ah, i was following an older guide i guess
[2:02] * ScOut3R (~ScOut3R@4E5CC1B5.dsl.pool.telekom.hu) has joined #ceph
[2:13] * ScOut3R (~ScOut3R@4E5CC1B5.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[2:36] * T-Rex (~user@tor-exit-node.7by7.de) has joined #ceph
[2:37] * T-Rex (~user@tor-exit-node.7by7.de) has left #ceph
[2:59] * xarses (~andreww@c-24-23-183-44.hsd1.ca.comcast.net) has joined #ceph
[3:01] * fmanana (~fdmanana@bl5-3-231.dsl.telepac.pt) has joined #ceph
[3:08] * fdmanana (~fdmanana@bl10-142-244.dsl.telepac.pt) Quit (Ping timeout: 480 seconds)
[3:16] * KevinPerks (~Adium@cpe-76-180-81-150.buffalo.res.rr.com) has joined #ceph
[3:17] * LeaChim (~LeaChim@host86-161-90-122.range86-161.btcentralplus.com) Quit (Ping timeout: 480 seconds)
[3:24] * vilobhmm (~vilobhmm@c-50-152-188-98.hsd1.ca.comcast.net) has joined #ceph
[3:25] * swills (~swills@mouf.net) Quit (Quit: Coyote finally caught me)
[3:30] * swills (~swills@mouf.net) has joined #ceph
[4:07] <MACscr> ok, well i screwed up when zapping disks during my ceph-deploy and i had to reprovision one of my servers. DOH! Anyway, do I have to restart the process or is there a way to add this one to the mix? I had only gotten as far as setting them up as monitors and got their keys back
[4:22] * dome-house (~g@pool-173-60-206-213.lsanca.fios.verizon.net) has joined #ceph
[4:22] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Read error: Operation timed out)
[4:31] * diegows (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[4:35] * bkopilov (~bkopilov@213.57.16.131) Quit (Read error: Operation timed out)
[4:36] * vbellur (~vijay@122.172.236.116) has joined #ceph
[4:39] * KevinPerks (~Adium@cpe-76-180-81-150.buffalo.res.rr.com) Quit (Quit: Leaving.)
[4:39] * JCL (~JCL@c-24-23-166-139.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[5:00] * chrisjones (~chrisjone@12.237.137.162) Quit (Quit: chrisjones)
[5:02] * rturk|afk is now known as rturk
[5:02] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[5:02] * rturk is now known as rturk|afk
[5:07] * Vacum (~vovo@i59F7A287.versanet.de) has joined #ceph
[5:11] * chrisjones (~chrisjone@12.237.137.162) has joined #ceph
[5:13] * chrisjones (~chrisjone@12.237.137.162) Quit ()
[5:14] * Vacum_ (~vovo@88.130.206.241) Quit (Ping timeout: 480 seconds)
[5:20] * KevinPerks (~Adium@cpe-76-180-81-150.buffalo.res.rr.com) has joined #ceph
[5:21] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[5:28] * KevinPerks (~Adium@cpe-76-180-81-150.buffalo.res.rr.com) Quit (Ping timeout: 480 seconds)
[5:38] * lupu (~lupu@46.102.93.169) has joined #ceph
[5:49] * vbellur (~vijay@122.172.236.116) Quit (Read error: Operation timed out)
[5:59] * vbellur (~vijay@122.167.104.219) has joined #ceph
[6:04] * ScOut3R (~ScOut3R@4E5CC1B5.dsl.pool.telekom.hu) has joined #ceph
[6:04] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) has joined #ceph
[6:06] <longguang> hi
[6:14] * ScOut3R (~ScOut3R@4E5CC1B5.dsl.pool.telekom.hu) Quit (Ping timeout: 480 seconds)
[6:15] <MACscr> howdy longguang
[6:16] <dome-house> how is ceph different than memcached?
[6:16] <dome-house> :)
[6:16] <MACscr> lol, is that a troll attempt?
[6:18] <longguang> could you explain to me about 'osd heartbeat interval'
[6:19] <longguang> ceph is a fs, memcached is a cache which stores keys and values in order to speed up access.
[6:23] <longguang> what is the connection between rbd and rados? if i use rbd, is rados a must-use?
[6:27] <dome-house> m
[6:27] <dome-house> hm
[6:27] <MACscr> ceph can be a fs, but you dont have to use its fs
[6:29] <longguang> yes. ceph can be fs,rbd,object storage.
[6:46] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) Quit (Quit: Leaving.)
[7:02] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[7:10] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[7:12] <longguang> what is 'pod,pdu,row,rack,chassis'
[7:21] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[7:25] <MACscr> longguang: in reference to what?
[7:25] <MACscr> thats not ceph related
[7:27] <longguang> it belongs to crush map.
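(Those are the default CRUSH bucket types, i.e. the levels of the failure-domain hierarchy OSDs can be grouped under. From a default Firefly-era map, decompiled with: ceph osd getcrushmap -o map.bin; crushtool -d map.bin -o map.txt)
    type 0 osd
    type 1 host
    type 2 chassis
    type 3 rack
    type 4 row
    type 5 pdu
    type 6 pod
    type 7 room
    type 8 datacenter
    type 9 region
    type 10 root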
[7:32] * rwheeler (~rwheeler@107.17.67.64) Quit (Ping timeout: 480 seconds)
[7:36] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[7:37] * haomaiwang (~haomaiwan@112.193.131.164) has joined #ceph
[7:39] * jtaguinerd (~Adium@112.198.79.39) has joined #ceph
[7:41] * rwheeler (~rwheeler@107.17.67.64) has joined #ceph
[7:41] <jtaguinerd> hi guys. i am using the latest release for firefly (0.82). any one from here encountered the issue with reweight getting back to 1 when you restart the OSD?
[7:41] <jtaguinerd> i thought it was fixed already because that's what it said from the release notes
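(Two different weights are involved, which is easy to mix up; osd id and values below are illustrative. The CRUSH weight set with 'crush reweight' is persistent, while the 0..1 override set with 'osd reweight' has historically been reset to 1 when an out OSD comes back in, which sounds like the behaviour being described:)
    ceph osd crush reweight osd.3 1.0    # persistent CRUSH weight (the WEIGHT column in ceph osd tree)
    ceph osd reweight 3 0.8              # temporary 0..1 override (the REWEIGHT column)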
[7:44] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[7:44] * haomaiwa_ (~haomaiwan@121.48.184.24) Quit (Ping timeout: 480 seconds)
[8:01] * sarob (~sarob@2601:9:1d00:c7f:8026:f50e:7230:bf24) has joined #ceph
[8:09] * sarob (~sarob@2601:9:1d00:c7f:8026:f50e:7230:bf24) Quit (Ping timeout: 480 seconds)
[8:15] * DLange (~DLange@dlange.user.oftc.net) Quit (Quit: some kernel work required. CU soon. Hopefully :))
[8:24] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[8:25] * DLange (~DLange@dlange.user.oftc.net) has joined #ceph
[8:28] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Read error: Operation timed out)
[8:31] * rendar (~I@host173-6-dynamic.55-79-r.retail.telecomitalia.it) has joined #ceph
[8:32] * bitserker (~toni@70.59.79.188.dynamic.jazztel.es) has joined #ceph
[8:45] * hasues1 (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) has joined #ceph
[8:45] * hasues (~hazuez@108-236-232-243.lightspeed.knvltn.sbcglobal.net) Quit (Read error: Connection reset by peer)
[8:47] * haomaiwang (~haomaiwan@112.193.131.164) Quit (Ping timeout: 480 seconds)
[8:47] <MACscr> if i have a ssd cache pool and then a regular sata pool behind that, do i need journals for both?
[8:47] * imriz (~imriz@82.81.163.130) has joined #ceph
[8:52] * zigo (quasselcor@atl.apt-proxy.gplhost.com) Quit (Ping timeout: 480 seconds)
[8:56] * zigo (quasselcor@ipv6-ftp.gplhost.com) has joined #ceph
[9:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[9:03] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has joined #ceph
[9:06] * rwheeler (~rwheeler@107.17.67.64) Quit (Ping timeout: 480 seconds)
[9:07] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[9:08] * zidarsk8 (~zidar@89-212-142-10.dynamic.t-2.net) has left #ceph
[9:10] * BManojlovic (~steki@cable-94-189-160-74.dynamic.sbb.rs) has joined #ceph
[9:14] * rwheeler (~rwheeler@107.17.67.64) has joined #ceph
[9:20] * madkiss (~madkiss@2001:6f8:12c3:f00f:61e2:20ed:ccfb:4bac) has joined #ceph
[9:26] * lincolnb (~lincoln@c-67-165-142-226.hsd1.il.comcast.net) Quit (Read error: Operation timed out)
[9:32] * lincolnb (~lincoln@c-67-165-142-226.hsd1.il.comcast.net) has joined #ceph
[9:38] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[9:40] * Muhlemmer (~kvirc@cable-90-50.zeelandnet.nl) has joined #ceph
[9:42] <longguang> what do you mean 'behind that'?
[9:57] <lupu> he has a regular sata pool behind a ssd cache tier pool
[10:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[10:09] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[10:10] <lupu> MACscr: i might be wrong but my understanding is that the cache pool does not differ from 'regular' pools in any way
[10:13] * vbellur (~vijay@122.167.104.219) Quit (Read error: Operation timed out)
[10:20] * vilobhmm (~vilobhmm@c-50-152-188-98.hsd1.ca.comcast.net) has left #ceph
[10:26] * vbellur (~vijay@122.178.240.55) has joined #ceph
[10:39] <longguang> how to create cache tier pool?
[10:40] <longguang> i only know ssd can be the primary osd of pgs.
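(A minimal Firefly cache-tier setup looks roughly like this; pool names and pg counts are illustrative, and it assumes a CRUSH rule already restricts hot-ssd to the SSD OSDs:)
    ceph osd pool create cold-sata 512
    ceph osd pool create hot-ssd 128
    ceph osd tier add cold-sata hot-ssd           # attach the cache pool to the backing pool
    ceph osd tier cache-mode hot-ssd writeback
    ceph osd tier set-overlay cold-sata hot-ssd   # route client traffic through the cache
    ceph osd pool set hot-ssd hit_set_type bloom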
[10:42] * lupu (~lupu@46.102.93.169) Quit (Quit: Leaving.)
[10:43] * lupu (~lupu@46.102.93.169) has joined #ceph
[10:52] * amaron (~amaron@cable-178-148-239-68.dynamic.sbb.rs) has joined #ceph
[11:00] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[11:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[11:05] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[11:20] * beardo (~sma310@beardo.cc.lehigh.edu) Quit (Read error: Operation timed out)
[11:27] * LeaChim (~LeaChim@host86-161-90-122.range86-161.btcentralplus.com) has joined #ceph
[11:36] <sherry> scuttle|afk: ping
[11:37] <sherry> MACscr: yes, u need journal for every single OSD u have
[11:41] <sherry> jtaguinerd: in my osd tree, the weights are what is given by me but reweight shows 1 when my OSD is up!
[11:48] * amaron (~amaron@cable-178-148-239-68.dynamic.sbb.rs) Quit (Ping timeout: 480 seconds)
[12:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[12:05] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[12:09] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[12:26] * dome-house (~g@pool-173-60-206-213.lsanca.fios.verizon.net) has left #ceph
[12:55] <Anticimex> if i've understood the main criticism against using btrfs for osd, it's that the metadata structures decay over time, so writes take longer and longer
[12:56] <Anticimex> i'm right now thinking: in a ssd cache tier + rotating drive tier setup, if the decay of btrfs isn't too fast, it could be used at least for the cache tier osds, with rolling scripted reformats every n weeks or so
[13:01] * sarob (~sarob@2601:9:1d00:c7f:69b1:686:d872:b35b) has joined #ceph
[13:02] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[13:04] <Gugge-47527> i would think the fragmentation on btrfs would matter much less on ssd
[13:04] <Anticimex> are there other production-readiness-stoppers on btrfs for ceph?
[13:05] * Anticimex is googling around
[13:05] <Anticimex> what i'm trying to achieve is to not need separate OSD journal
[13:06] <Anticimex> for the ssd cache tier at least
[13:06] <Gugge-47527> do you have 10Gbit network?
[13:07] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) has joined #ceph
[13:09] <Gugge-47527> if you dont, im pretty sure any decent ssd will be faster even with the journal colocated :)
[13:09] * sarob (~sarob@2601:9:1d00:c7f:69b1:686:d872:b35b) Quit (Ping timeout: 480 seconds)
[13:11] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[13:15] <Anticimex> will be 40G
[13:15] <Anticimex> journal hurts performance for the tier
[13:17] <Anticimex> without http://wiki.ceph.com/Planning/Sideboard/osd%3A_clone_from_journal_on_btrfs though, even btrfs parallel write to journal and osd will mean 2x writes. hmm
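(A sketch of the relevant ceph.conf knobs only; on btrfs the filestore can journal in parallel instead of write-ahead, which is the mode being discussed here. Values are illustrative.)
    [osd]
        osd mkfs type = btrfs
        osd mount options btrfs = rw,noatime
        filestore journal parallel = true    ; default on btrfs, write-ahead on xfs/ext4
        osd journal size = 5120              ; MB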
[13:29] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: Textual IRC Client: www.textualapp.com)
[13:36] <Anticimex> another question: how are IOPs propagated from a ssd cache pool to a spinner pool below? is there some coalescing, i.e. if many small writes to the ssd cache pool ended up in a small number of objects (rbd above), are these objects then potentially flushed in $very_few iops to the regular pool below?
[13:37] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) has joined #ceph
[13:53] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[13:58] * rwheeler (~rwheeler@107.17.67.64) Quit (Read error: Operation timed out)
[13:59] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[14:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[14:02] * rwheeler (~rwheeler@107.17.67.64) has joined #ceph
[14:03] * cronix1 (~cronix@5.199.139.166) Quit (Read error: Operation timed out)
[14:09] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[14:20] * joao (~joao@a79-168-5-220.cpe.netcabo.pt) has joined #ceph
[14:20] * ChanServ sets mode +o joao
[14:27] * diegows (~diegows@190.190.5.238) has joined #ceph
[14:40] * sputnik13 (~sputnik13@wsip-68-105-248-60.sd.sd.cox.net) Quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
[14:51] * diegows (~diegows@190.190.5.238) Quit (Ping timeout: 480 seconds)
[15:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[15:01] * bkopilov (~bkopilov@nat-pool-tlv-t.redhat.com) Quit (Read error: Operation timed out)
[15:07] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[15:18] * rwheeler (~rwheeler@107.17.67.64) Quit (Read error: Operation timed out)
[15:20] * imjustmatthew (~imjustmat@pool-74-110-226-158.rcmdva.fios.verizon.net) has joined #ceph
[15:27] * jtaguinerd (~Adium@112.198.79.39) Quit (Read error: Connection reset by peer)
[15:37] * ScOut3R (~ScOut3R@catv-89-133-44-70.catv.broadband.hu) has joined #ceph
[15:51] <Gugge-47527> Anticimex: as far as i know only full objects are read/written from the pool below
[15:52] <Gugge-47527> Anticimex: so when you make a small write, the cache needs to fetch the full object from below, and update that object, and then later it will write that object back to the pool below
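(When and how much the cache flushes back to the backing pool is driven by per-pool settings such as these, added in Firefly; pool name and values are illustrative:)
    ceph osd pool set hot-ssd target_max_bytes 200000000000
    ceph osd pool set hot-ssd cache_target_dirty_ratio 0.4    # start flushing dirty objects at 40% full
    ceph osd pool set hot-ssd cache_target_full_ratio 0.8     # start evicting clean objects at 80% full
    ceph osd pool set hot-ssd cache_min_flush_age 600
    ceph osd pool set hot-ssd cache_min_evict_age 1800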
[15:53] * ScOut3R (~ScOut3R@catv-89-133-44-70.catv.broadband.hu) Quit (Ping timeout: 480 seconds)
[15:58] * diegows (~diegows@190.190.5.238) has joined #ceph
[16:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[16:07] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[16:09] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[16:09] * madkiss (~madkiss@2001:6f8:12c3:f00f:61e2:20ed:ccfb:4bac) Quit (Quit: Leaving.)
[16:11] <Anticimex> right, that actually makes a lot of sense. i have read the blueprints and associated doc on cache pools. i just have to start testing it soon. :)
[16:11] <Anticimex> (before ordering racks of drives)
[16:15] * steveeJ (~junky@HSI-KBW-46-223-54-149.hsi.kabel-badenwuerttemberg.de) Quit (Remote host closed the connection)
[16:16] * steveeJ (~junky@HSI-KBW-46-223-54-149.hsi.kabel-badenwuerttemberg.de) has joined #ceph
[16:30] * bkopilov (~bkopilov@213.57.17.98) has joined #ceph
[16:41] * aldavud (~aldavud@217-162-119-191.dynamic.hispeed.ch) Quit (Ping timeout: 480 seconds)
[16:43] * JCL (~JCL@2601:9:5980:39b:bcf1:8f20:1986:d04d) has joined #ceph
[16:58] * dmsimard (~dmsimard@198.72.123.202) has joined #ceph
[17:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[17:01] * i_m (~ivan.miro@nat-5-carp.hcn-strela.ru) has joined #ceph
[17:04] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[17:25] * dmsimard (~dmsimard@198.72.123.202) Quit (Quit: Signed off)
[17:30] * diegows (~diegows@190.190.5.238) Quit (Read error: Operation timed out)
[17:38] * imriz (~imriz@82.81.163.130) Quit (Read error: Operation timed out)
[17:49] * mrjack_ (mrjack@office.smart-weblications.net) has joined #ceph
[17:54] * marrusl (~mark@209-150-43-182.c3-0.wsd-ubr2.qens-wsd.ny.cable.rcn.com) has joined #ceph
[17:56] * danieagle (~Daniel@179.184.165.184.static.gvt.net.br) has joined #ceph
[17:58] * kevinc (~kevinc__@client65-78.sdsc.edu) has joined #ceph
[18:01] * sarob (~sarob@2601:9:1d00:c7f:b46e:45b7:9e00:ccee) has joined #ceph
[18:04] <iggy> Anticimex: all I can say about using btrfs is, make sure you are using the most recent kernel you can get
[18:08] * kevinc (~kevinc__@client65-78.sdsc.edu) Quit (Quit: This computer has gone to sleep)
[18:09] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) has joined #ceph
[18:09] * sarob (~sarob@2601:9:1d00:c7f:b46e:45b7:9e00:ccee) Quit (Ping timeout: 480 seconds)
[18:11] <mrjack_> hmm.. for some reason, one osd refuses to start, looks like the fs of the osd got corrupted?! => http://pastebin.com/sJhx3ExU fsck of that ext4 osd runs without errors.. any hints? can i safely remove 2.360_head from osd data dir?
[18:13] <MACscr> is the ceph-users list manually moderated? I posted to it 9 hours ago and it hasnt shown up yet
[18:14] <mrjack_> MACscr: i don't think so, but imho you have to be subscribed?
[18:14] <MACscr> yes, im subscribed
[18:17] * amaron (~amaron@cable-178-148-239-68.dynamic.sbb.rs) has joined #ceph
[18:21] * bitserker (~toni@70.59.79.188.dynamic.jazztel.es) Quit (Quit: Leaving.)
[18:21] <Anticimex> iggy: what's the real gain with parallel write to journal + osd anyway? i was surprised by that. i thought there "was no journal", i.e. only one write happening
[18:24] * diegows (~diegows@190.190.5.238) has joined #ceph
[18:25] * lupu (~lupu@46.102.93.169) Quit (Quit: Leaving.)
[18:25] * lupu (~lupu@46.102.93.169) has joined #ceph
[18:29] * longguang (~chatzilla@123.126.33.253) Quit (Max SendQ exceeded)
[18:29] <iggy> the journal allows bunching writes together
[18:31] <MACscr> ok, so if im doing 3 disks on 3 nodes. 2 disks on each node are 512GB SSD's (with capacitors) and a 2TB SATA (i plan to add another for each node as well). I plan to use the SSD's for a cache pool. How should I setup the journaling and what sizes?
[18:32] * longguang (~chatzilla@123.126.33.253) has joined #ceph
[18:33] <MACscr> oh yeah, so its 1 OSD per disk and one journal per OSD, so i need journals on all disks, right?
[18:40] * wmat (wmat@wallace.mixdown.ca) has joined #ceph
[18:43] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[18:48] * cronix1 (~cronix@5.199.139.166) Quit (Read error: Operation timed out)
[18:48] <Anticimex> iggy: yeah, i do get 'why journal'
[18:48] <Anticimex> MACscr: journals can be placed anywhere, but they eat IOPS and disk throughput
[18:49] <Anticimex> where "can be" does not suggest all placings will result in good performance
[18:51] <MACscr> well i guess im more looking for guidance =P
[18:51] <MACscr> my HBA's are only SATA 2, so im limited on throughput there
[18:51] <Anticimex> first you have to know what you are optimizing towards - high iops or cheap and large (slow) storage
[18:52] <MACscr> high iops
[18:52] <MACscr> going to be used for kvm rbd
[18:52] <Anticimex> depending on budget, do mix of journals, like (max) 6 spinner's osds per intel dc s3700 (~8 GB/journal, so 100G variant sufficient) and spinners
[18:53] <Anticimex> for a single tier setup. journals will catch some of it
[18:53] <Anticimex> but reads will still come direct from spinners
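(Translated into config, that sizing advice would look roughly like this; host and device names are illustrative, with the journal partition on the SSD:)
    [osd]
        osd journal size = 8192                  ; MB per journal, the ~8 GB mentioned above

    ceph-deploy osd create node1:sdb:/dev/sdc1   # data on the spinner, journal on an SSD partition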
[18:53] <mrjack_> imho the intel dc s3700 suck if you use the small ones... need at least 400gb version for reasonable io
[18:53] <Anticimex> recent ceph have ability to put a pool on top of another pool
[18:53] <MACscr> "ok, so if im doing 3 disks on 3 nodes. 2 disks on each node are 512MB SSD's (with capacitors) and a 2TB SATA (i plan to add another for each node as well). I plan to use the SSD's for a cache pool."
[18:53] <Anticimex> mrjack_: please elaborate? performance not up to par?
[18:54] <Anticimex> i've read reviews on anandtech that gave fairly good performance
[18:54] <MACscr> i am using ceph firefly
[18:54] <mrjack_> Anticimex: it is, but the limit was somewhere at 100mb/sec
[18:54] <Anticimex> mostly limited by 6Gbps bus
[18:54] <mrjack_> for the smallest one
[18:54] <Anticimex> hmm, i've seen this clearly for the s3500 where they cheap on the chips
[18:54] <mrjack_> oh wait, maybe i mix them both up
[18:54] <Anticimex> but s3700 is overprovisioned to give it the full writes per day stats
[18:55] * loicd reading http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/ to fix inconsistent PG in a cluster
[18:55] <mrjack_> Anticimex: yeah sorry, you are right, 3700 is okay, 3500 sucks
[18:55] <Anticimex> MACscr: so you can have two layers of pools, one of ssd with journal + osd on same device, and then put that as a cache on top of another pool that consists of spinners
[18:56] <Anticimex> MACscr: i don't know what performs best there, and it also depends on how many spinners, etc... in terms of where to put spinners' journal in that config. either 1:1 on local drive, or still on the ssds
[18:56] <mrjack_> Anticimex but even the 100gb s3700 is slow on writes.. one should at least take the 200gb edition..
[18:56] <Anticimex> mrjack_: interesting feedback, i must check it out
[18:56] <mrjack_> Anticimex: we use kingston ssdnow v300 in raid1 for journals
[18:57] <MACscr> Anticimex: i will only have 2 at most spinners per host, so 6 total
[18:57] <Anticimex> sounds hipster
[18:57] <Anticimex> :)
[18:57] <mrjack_> Anticimex: the 100gb version: 100GB 75K/19K random R/W 4K IOPS
[18:57] <Anticimex> MACscr: ok, full colocation of ceph with compute hosts then?
[18:57] <mrjack_> the 200gb makes 75k/32k
[18:57] <Anticimex> mrjack_: ah, check
[18:58] <Anticimex> mrjack_: what about throughput?
[18:58] <MACscr> Anticimex: yes
[18:58] <Anticimex> will 100G still saturate 6G?
[18:58] <Anticimex> they don't really, i believe, on larger versions
[18:58] <mrjack_> Anticimex: but, the kingston ssdnow v300 60gb costs about 35 eur, makes 85k/60k
[18:58] * Kedsta (Ked@cpc6-pool14-2-0-cust202.15-1.cable.virginm.net) Quit (Ping timeout: 480 seconds)
[18:58] <Anticimex> i wonder when ssds come out for sas 3 (12Gbps)
[18:58] <mrjack_> Anticimex: no, only read, but not on heavy random io write
[18:58] <Anticimex> mrjack_: what write endurance?
[18:59] <Anticimex> mrjack_: no, max throughput on sequential write i mean. the drive limits with whatever io pattern is required to reach the limit
[18:59] <mrjack_> Anticimex: at least 1 year until now... don't know exactly, that's why i put them in raid1 and mix them sometimes when i am at the datacenter
[18:59] <Anticimex> k
[19:00] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[19:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[19:01] <MACscr> raid1? i though you werent supposed to use raid with ceph
[19:01] <MACscr> thought
[19:02] <mrjack_> MACscr: you can, though.. and i don't want to rebuild 8tb of data just because of losing a journal osd - that's why i put them in raid1
[19:05] <loicd> will deep scrub eventually repair errors that can be repaired with ceph pg repair ? I think pg repair is the same as deep scrub but I'm not 100% sure.
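(Roughly: a deep scrub only detects and reports inconsistencies, while pg repair re-scrubs the PG and then tries to fix what it finds, in that era by copying from the primary, which may itself hold the bad copy. The pg id below is illustrative:)
    ceph health detail        # lists PGs flagged active+clean+inconsistent
    ceph pg deep-scrub 2.5    # re-check; reports errors but does not change data
    ceph pg repair 2.5        # scrub plus an attempt to repair the inconsistencies found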
[19:05] <MACscr> you are doing raid1 on just the journals?
[19:06] <mrjack_> yes
[19:09] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[19:09] * beardo (~sma310@beardo.cc.lehigh.edu) has joined #ceph
[19:17] <MACscr> ok, so im thinking with my setup, i should just do journals on their own disks. since the spinners are mostly going to be used for cold storage and the SSD's will be able to hold 90% of the data, i dont think i need to put the spinners journal on the SSD's
[19:19] <mrjack_> why not make the ssds 10% larger and put everything on ssd? :)
[19:27] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[19:27] * chrisjones (~chrisjone@12.237.137.162) has joined #ceph
[19:28] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[19:31] * cronix1 (~cronix@5.199.139.166) Quit (Read error: Operation timed out)
[19:33] <MACscr> ha. Ok, maybe 75%
[19:33] <MACscr> and that number will decrease
[19:34] <MACscr> as more data is added
[19:45] <MACscr> hmm, i didnt think the MON part did much writing and my mon and osd are on the same systems and my operating systems are just on usb flash drives
[19:45] <MACscr> seems i have another item i really need to play out
[19:48] * cookednoodles (~eoin@eoin.clanslots.com) Quit (Quit: Ex-Chat)
[19:51] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[19:52] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[19:52] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[19:52] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[19:59] <MACscr> mrjack_: http://www.hastexo.com/resources/hints-and-kinks/solid-state-drives-and-ceph-osd-journals
[19:59] <MACscr> good article
[20:00] <MACscr> might be useful for you too
[20:01] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[20:09] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[20:10] * alexxy[home] (~alexxy@2001:470:1f14:106::2) has joined #ceph
[20:10] * alexxy (~alexxy@2001:470:1f14:106::2) Quit (Read error: Connection reset by peer)
[20:11] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[20:15] <mrjack> MACscr: thanks but no need for me, i got my cluster up and running for two years now.. :)
[20:20] * cronix1 (~cronix@5.199.139.166) Quit (Ping timeout: 480 seconds)
[20:23] <MACscr> lol, ok
[20:23] <MACscr> mrjack: what do you think about my concern about MON and my flash drives
[20:26] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[20:27] * sarob_ (~sarob@2601:9:1d00:c7f:d5a7:6e26:5eeb:98a1) has joined #ceph
[20:27] * rendar (~I@host173-6-dynamic.55-79-r.retail.telecomitalia.it) Quit (Ping timeout: 480 seconds)
[20:30] * rendar (~I@host173-6-dynamic.55-79-r.retail.telecomitalia.it) has joined #ceph
[20:33] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[20:35] * sarob_ (~sarob@2601:9:1d00:c7f:d5a7:6e26:5eeb:98a1) Quit (Ping timeout: 480 seconds)
[21:01] * sarob (~sarob@2601:9:1d00:c7f:b56b:a5b6:d2c3:16da) has joined #ceph
[21:09] * sarob (~sarob@2601:9:1d00:c7f:b56b:a5b6:d2c3:16da) Quit (Ping timeout: 480 seconds)
[21:13] * dinosaurpt (~dinosaurp@a79-169-174-225.cpe.netcabo.pt) has joined #ceph
[21:13] * iggy (~iggy@theiggy.com) Quit (Quit: leaving)
[21:14] * analbeard (~shw@host81-147-14-90.range81-147.btcentralplus.com) has joined #ceph
[21:32] * Muhlemmer (~kvirc@cable-90-50.zeelandnet.nl) Quit (Ping timeout: 480 seconds)
[21:33] * cookednoodles (~eoin@eoin.clanslots.com) has joined #ceph
[21:35] * iggy (~iggy@theiggy.com) has joined #ceph
[21:54] * madkiss (~madkiss@p5099fdaa.dip0.t-ipconnect.de) has joined #ceph
[21:59] * zidarsk81 (~zidar@89-212-142-10.dynamic.t-2.net) has joined #ceph
[22:01] * sarob (~sarob@2601:9:1d00:c7f:2553:dde3:adda:7b11) has joined #ceph
[22:05] <MACscr> hmm, my ceph-deploy create osd didnt seem to go that well http://pastie.org/pastes/9338319/text?key=bpkdz3yloprdwbgwqx1lw
[22:06] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) has joined #ceph
[22:06] * FL1SK (~quassel@159.118.92.60) Quit (Quit: http://quassel-irc.org - Chat comfortably. Anywhere.)
[22:08] <MACscr> ah, ceph -s has to be used from a mon system, not the deploy
[22:09] * sarob (~sarob@2601:9:1d00:c7f:2553:dde3:adda:7b11) Quit (Ping timeout: 480 seconds)
[22:09] <MACscr> though im wondering if i should manually prepare the disks instead because of those warnings
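(The manual, per-disk version of what ceph-deploy osd create does, plus pushing the admin keyring so ceph -s works from nodes other than the monitors; hosts and devices are illustrative:)
    ceph-deploy disk zap node1:sdb                 # destroys the partition table on sdb
    ceph-deploy osd prepare node1:sdb:/dev/sdc1    # data on sdb, journal on SSD partition sdc1
    ceph-deploy osd activate node1:/dev/sdb1:/dev/sdc1
    ceph-deploy admin node1 node2 node3            # copies ceph.conf and the client.admin keyring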
[22:20] * KevinPerks (~Adium@cpe-174-098-096-200.triad.res.rr.com) Quit (Quit: Leaving.)
[22:27] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[22:31] <MACscr> how do i assign an osd to a pool?
[22:31] <MACscr> i need two pools
[22:31] <MACscr> seems all of them are in one right now
[22:39] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[22:42] * cronix1 (~cronix@5.199.139.166) Quit (Read error: Operation timed out)
[22:43] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) has joined #ceph
[22:51] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[23:00] * cronix1 (~cronix@5.199.139.166) Quit (Ping timeout: 480 seconds)
[23:02] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) has joined #ceph
[23:04] <darkfader> MACscr: i only know during creation time
[23:04] <darkfader> have a look at the "adding an osd" manual
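(Pools don't own OSDs directly; the mapping goes through CRUSH rules. A rough way to split SSD and SATA OSDs into two pools, with names and ruleset ids illustrative:)
    ceph osd getcrushmap -o map.bin
    crushtool -d map.bin -o map.txt              # add ssd/sata roots and move the OSDs under them
    crushtool -c map.txt -o map.new
    ceph osd setcrushmap -i map.new
    ceph osd crush rule create-simple ssd-rule ssd host
    ceph osd crush rule create-simple sata-rule sata host
    ceph osd pool set hot-ssd crush_ruleset 1    # ruleset ids from: ceph osd crush rule dump
    ceph osd pool set cold-sata crush_ruleset 2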
[23:07] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[23:09] * sarob (~sarob@c-76-102-72-171.hsd1.ca.comcast.net) Quit (Read error: Operation timed out)
[23:09] * lupu (~lupu@46.102.93.169) Quit (Ping timeout: 480 seconds)
[23:10] * dinosaurpt (~dinosaurp@a79-169-174-225.cpe.netcabo.pt) Quit (Quit: Leaving)
[23:14] * steveeJ (~junky@HSI-KBW-46-223-54-149.hsi.kabel-badenwuerttemberg.de) Quit (Quit: Leaving)
[23:14] * dis (~dis@109.110.66.170) Quit (Ping timeout: 480 seconds)
[23:22] * bitserker (~toni@70.59.79.188.dynamic.jazztel.es) has joined #ceph
[23:24] * cronix1 (~cronix@5.199.139.166) Quit (Ping timeout: 480 seconds)
[23:26] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) has joined #ceph
[23:31] * Nacer (~Nacer@c2s31-2-83-152-89-219.fbx.proxad.net) Quit (Remote host closed the connection)
[23:33] * zidarsk81 (~zidar@89-212-142-10.dynamic.t-2.net) has left #ceph
[23:34] * wschulze (~wschulze@cpe-69-206-251-158.nyc.res.rr.com) Quit (Quit: Leaving.)
[23:42] * cronix1 (~cronix@5.199.139.166) has joined #ceph
[23:50] * lollipop (~s51itxsyc@23.94.38.19) Quit (Read error: Connection reset by peer)
[23:51] * cronix1 (~cronix@5.199.139.166) Quit (Ping timeout: 480 seconds)
[23:53] * Midnightmyth (~quassel@93-167-84-102-static.dk.customer.tdc.net) Quit (Ping timeout: 480 seconds)
[23:54] * lollipop (~s51itxsyc@23.94.38.19) has joined #ceph
[23:57] * bitserker (~toni@70.59.79.188.dynamic.jazztel.es) Quit (Ping timeout: 480 seconds)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.