#ceph IRC Log

IRC Log for 2012-11-02

Timestamps are in GMT/BST.

[0:03] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) Quit (Quit: Leseb)
[0:07] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[0:07] <dec> I don't think I understand this live modification ....
[0:07] <dec> for me to add a new mon, the mon will need to be running...
[0:07] <dec> how do I get it running without adding it?
[0:09] <joshd> http://ceph.com/docs/master/cluster-ops/add-or-rm-mons/
[0:12] * maxim (~pfliu@222.128.144.42) Quit (Ping timeout: 480 seconds)
[0:24] * jlogan2 (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) has joined #ceph
[0:27] <dec> thanks; so I did a 'ceph mon remove c' and stopped mon.c on the node it runs on, then did a 'ceph mon add c [ipv6]:6789' from one node, changed mon.c's IP address in ceph.conf on every node and started mon.c
[0:27] <dec> does that sound about right?
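For reference, the sequence dec describes amounts to roughly the following (a sketch only; it assumes mon.c keeps its existing data directory and only its address changes, and the IPv6 address is a placeholder):

    ceph mon remove c                        # drop mon.c from the monmap
    service ceph stop mon.c                  # on the node that runs mon.c
    ceph mon add c [2001:db8::c]:6789        # re-add it at the new address
    # update mon.c's address in ceph.conf on every node
    service ceph start mon.c                 # back on the node that runs mon.c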
[0:28] * vata (~vata@208.88.110.46) Quit (Quit: Leaving.)
[0:28] * jlogan1 (~Thunderbi@2600:c00:3010:1:3880:bbab:af7:6407) Quit (Ping timeout: 480 seconds)
[0:30] <joshd> I think that might be enough
[0:30] <joshd> if ceph -s reports mon c as part of the quorum, you're good
[0:33] <dec> it does show it in there
[0:34] <dec> however I've got 1179 stuck inactive PGs after that change
[0:34] <dec> ... and they don't seem to be changing
[0:34] <dec> (I'm new to this :P)
[0:34] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: ["Textual IRC Client: www.textualapp.com"])
[0:35] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[0:37] <joshd> can you pastebin the output of 'ceph -s' and 'ceph health detail'?
[0:40] <dec> http://privatepaste.com/0fb32e6de5
[0:42] <joshd> I've got to go, but perhaps dmick or sjust can help you
[0:42] <dec> ok thanks
[0:43] <dec> it's probably getting late over there I guess, assuming most are in the US
[0:44] <dmick> not that late; 16:45
[0:44] <dmick> but toward the end of the day
[0:44] <dmick> lots of people are traveling this week
[0:44] <dec> ah right
[0:45] <dmick> so you can try ceph -w
[0:45] <dmick> which will show you what the cluster is doing as it tries to catch up, whenever there's a significant event
[0:46] <dec> ceph -w doesn't seem to be showing anything much :)
[0:46] <dec> it gives a regular 'ceph -s' style output and then sits there
[0:49] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Quit: Leaving.)
[0:52] * Ryan_Lane (~Adium@216.38.130.163) Quit (Quit: Leaving.)
[0:54] <dmick> hm
[0:54] <dmick> it's a little interesting that two mons are using ipv4 and one is using ipv6; I wasn't watching the whole conversation, but is that what you expect?
[0:55] <dec> yeah - I was trying to migrate it all to ipv6, joshd suggested removing the old mons and re-adding them on ipv6
[0:55] <dmick> ok
[0:55] <dec> but I've just blown it away and am going to rebuild it with ipv6 to start with :)
[0:55] <dmick> they appear to have reached agreement, so it's not a problem, just something I noticed
[0:55] <dmick> ok
[0:56] <dec> is 'public network' the correct config setting to set the osd listen address?
[0:58] <dmick> http://ceph.com/docs/master/config-cluster/ceph-conf/#networks
[0:58] <dec> yep, just found that - thanks
[0:58] <dmick> you can use public if you only have one net
[0:58] <dmick> if you want a "front" and "back", that's what public and cluster are for
[0:58] <dec> there's only one net here at the moment
[0:58] <dmick> public is right then
[0:59] <dec> we're assuming, so far, that 20gbps per server is sufficient
[0:59] <dmick> you have a 20Gb interface?
[0:59] <dec> 2x 10gbps
[0:59] <dmick> (or they're bonded)
[0:59] <dmick> ok
[1:00] <dmick> yeah, front/back is more about segregating the traffic, particularly if back is faster. it's fine to have one net
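For context, the front/back split dmick describes maps onto ceph.conf roughly like this (a sketch; the subnets are placeholders, and with a single net only 'public network' is needed):

    [global]
        public network  = 10.0.1.0/24    ; client-facing ("front") traffic
        cluster network = 10.0.2.0/24    ; OSD replication and heartbeat ("back") traffic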
[1:09] * BManojlovic (~steki@212.200.241.231) Quit (Quit: Ja odoh a vi sta 'ocete...)
[1:16] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[1:17] * tnt (~tnt@20.35-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[1:19] * buck (~buck@bender.soe.ucsc.edu) has left #ceph
[1:20] * maxim (~pfliu@222.128.144.42) has joined #ceph
[1:22] * jtang2 (~jtang@79.97.135.214) has joined #ceph
[1:27] * jtang1 (~jtang@79.97.135.214) Quit (Ping timeout: 480 seconds)
[1:29] * Cube (~Cube@12.248.40.138) has joined #ceph
[1:34] * Cube1 (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[1:39] * Cube (~Cube@12.248.40.138) Quit (Ping timeout: 480 seconds)
[1:47] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[1:53] * pentabular (~sean@70.231.143.235) has left #ceph
[2:09] * adjohn (~adjohn@69.170.166.146) Quit (Quit: adjohn)
[2:16] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[2:24] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Ping timeout: 480 seconds)
[2:30] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[2:30] * jlogan2 (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) Quit (Ping timeout: 480 seconds)
[2:33] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[2:33] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[2:39] * maxim (~pfliu@222.128.144.42) Quit (Quit: Ex-Chat)
[2:48] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[2:51] * stp (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[2:52] * stp (~stp@188-193-211-236-dynip.superkabel.de) Quit ()
[3:03] * lxo (~aoliva@lxo.user.oftc.net) Quit (Quit: later)
[3:36] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[3:38] * lxo (~aoliva@lxo.user.oftc.net) has joined #ceph
[3:38] * loicd (~loic@magenta.dachary.org) has joined #ceph
[3:51] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Ping timeout: 480 seconds)
[3:53] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[4:00] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Quit: Leaving)
[4:03] <houkouonchi-work> dec: guessing you've probably already tuned stuff, but you'd better have a huge MTU if you're using bonding mode 0, or the out-of-order packets will likely net you less speed than a single 10 gig interface (as is the case even with 1 gbit and a standard 1500 MTU)
[4:13] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[4:15] * chutzpah (~chutz@199.21.234.7) Quit (Quit: Leaving)
[4:34] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[4:35] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[4:37] <dec> houkouonchi-work: we have 9000 byte MTUs, but nevertheless I realised after I said that earlier that we actually are only pinning traffic out of one of the two 10Gb interfaces
[4:39] * KindOne (KindOne@h187.46.28.71.dynamic.ip.windstream.net) has joined #ceph
[4:41] <dec> dmick: that public network config we talked about doesn't seem to let you specify an ipv6 addr
[4:42] <dec> '2012-11-02 14:11:01.537634 7f20ab03c760 -1 unable to parse network: 2001:db8::b5ff:fe00:7f'
[4:42] <dmick> hum
[4:43] <dec> I tried specifying with brackets around it [...] too
[4:43] <dmick> it's supposed to...
[4:44] <dmick> using inet_pton(AF_INET6, looks like
[4:44] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[4:44] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[4:45] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[4:46] <dec> I don't know if I'm reading it correctly, but I think common/config_opts.h indicates I should be using 'public addr' not 'public network'
[4:47] <dec> (the docs say to use 'public network')
[4:48] <dmick> my understanding is that addresses can be chosen from existing interfaces; setting "network" allows choosing "the address of the interface that's on that network"
[4:49] <dec> OK, so if I specify network I would have to specify a whole network range, not an individual address
[4:49] <dmick> see pick_addresses()
[4:49] <dmick> well yeah. a network address, not a host address
[4:50] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[4:50] <dmick> but that wouldn't affect the parsing, I don't think
[4:52] * loicd (~loic@magenta.dachary.org) has joined #ceph
[4:55] * nhm (~nh@67-220-20-222.usiwireless.com) has joined #ceph
[4:55] * nhm_ (~nh@184-97-251-146.mpls.qwest.net) Quit (Read error: Connection reset by peer)
[4:56] <dmick> a small test program parses that address
[4:56] <dmick> confused
[4:56] <dmick> can you show your public network config file line?
[4:59] <dmick> dec?
[5:01] * nhm_ (~nh@184-97-251-146.mpls.qwest.net) has joined #ceph
[5:03] * nhm (~nh@67-220-20-222.usiwireless.com) Quit (Ping timeout: 480 seconds)
[5:04] <dmick> yo dec: where you at?
[5:05] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[5:08] <dec> sorry - went to get coffee
[5:09] <dec> I changed all the OSDs to: [osd.x] public addr = 2001:db8::1
[5:09] <dec> and put their known ipv6 addr there
[5:10] <dec> they've all started up properly, but I'm seeing the same issue as I saw earlier with thousands of PGs 'peering'
[5:10] <dmick> peering is good; you want that
[5:10] <dec> oh, OK cool
[5:10] <dec> ceph -w shows a scrub happening of all the PGs on osd.3
[5:11] <dmick> so can you reproduce the line that failed to parse?
[5:11] <dmick> (I understand you'd like to get the cluster back too but)
[5:11] <dec> I don't mind, it's just a test cluster
[5:11] <dmick> (want not to lose that bug, if it's there)
[5:11] <dec> I can reproduce it with 'public network = 2001:db8::1'
[5:11] <dmick> that's in the global osd section, or each sub-osd section, or?...
[5:12] <dec> each sub-osd section
[5:12] <dmick> testing with a local cluster
[5:12] <dec> I think this doc page just needs updating to say that the per-osd address is configured with 'public addr' not 'public network'
[5:12] <dec> http://ceph.com/docs/master/config-cluster/ceph-conf/#networks
[5:12] <dmick> but it's not
[5:13] <dmick> the problem is not the keyword
[5:13] <dmick> they mean different things
[5:13] <dmick> the problem is that for some reason parsing the actual address is failing
[5:14] <dec> is that because 'public network' is parsed as a string, not an addr
[5:14] <dec> ?
[5:15] <dmick> oh sorry.
[5:15] <dmick> it's because there's no slash. you need to specify a network (i.e. with CIDR notation)
[5:15] <dmick> like 10.0.0.0/8
[5:16] <dec> yeah that's what I was saying, I figured that if I gave it a network /mask, it would be fine
[5:16] <dmick> yeah, and I agreed, but
[5:16] <dmick> didn't notice the early out in the routine that requires it
[5:17] <dmick> so yeah. put the net address in [osd], and then you shouldn't have to put addresses for each osd.N
[5:17] <dec> right, sounds good;
[5:17] <dec> i'll test that when this has finished building
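In other words, a rough sketch of the two forms (the IPv6 prefix and address are placeholders): 'public network' takes a CIDR prefix and can sit once in [global] or [osd], while 'public addr' names a single host address per daemon:

    [osd]
        public network = 2001:db8::/64      ; one prefix covers every OSD
    ; or, per daemon:
    [osd.0]
        public addr = 2001:db8::1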
[5:17] <dec> I'm sure it didn't take this long to build the cluster the first time I did it
[5:17] <dmick> is ceph -w showing progress now?
[5:17] <dmick> (oh yeah; now it's moving data around to recover out-of-date OSDs)
[5:17] <dec> there's no data on there! :)
[5:18] <dmick> then it shouldn't take long :)
[5:18] <dec> actually... that's odd. it tells me I've used 20GB (which is what was used before I blew it all away)
[5:18] <dec> I mustn't have cleaned it up correctly
[5:18] <dmick> when you say "blew it all away"
[5:18] <dmick> what exactly do you mean?
[5:19] <dec> I shut it all down, then rm -rf /var/lib/ceph/{mon,osd}/*/*
[5:19] <dec> then also formatted/mkfs'd the disks that the OSDs live on
[5:19] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[5:19] <dec> I assume there's state kept outside of these locations
[5:19] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[5:19] <dmick> not that I know of, unless you've changed paths in the .conf
[5:20] <dec> I haven't
[5:20] <dmick> does, say, "rados lspools" respond?
[5:20] <dec> yep, with data metadata and rbd
[5:21] <dmick> ok, so it's healthy enough to use while recovering
[5:21] <dmick> and you can rados -p <pool> ls to see if there are objects there
[5:21] <dec> I had a 20G RBD (empty) image in there before I destroyed it, but 'rbd ls' is just hanging
[5:21] <dec> 'rados -p data ls' is hanging too
[5:21] <dmick> ah. so not quite healthy enough yet
[5:22] <dmick> what's ceph -s saying?
[5:22] <dec> still the same as it was before, 3000+ peering PGs
[5:22] <dec> pgmap v99: 3648 pgs: 56 creating, 81 active+clean, 3193 peering, 318 active+degraded; 0 bytes data, 21672 MB used, 49490 GB / 49512 GB avail
[5:27] <dec> yeah something funny is going on
[5:27] <dec> just out of curiosity I destroyed all the data again and created a new cephfs
[5:28] <dec> initially it came up with: pgmap v5: 3648 pgs: 3648 creating; 0 bytes data, 0 KB used, 0 KB / 0 KB avail
[5:28] <dec> which I expected, since it was completely wiped
[5:28] <dec> however after a few moments it's come back with: pgmap v11: 3648 pgs: 3648 peering; 0 bytes data, 21664 MB used, 49490 GB / 49512 GB avail
[5:28] <dec> still showing 20GB used...
[5:28] <dec> ps auxwf
[5:28] <dec> whoops :)
[5:31] <dec> there's no OSD state kept in strange xattrs of the root filesystem on an OSD or something?
[5:31] <dmick> I sure don't think so
[5:32] <dec> it gets more weird: I just shut down 6 out of 18 OSDs
[5:32] <dec> and it still sees 18 up/in!
[5:32] <dec> mon.0 [INF] osdmap e5: 18 osds: 18 up, 18 in
[5:33] <dmick> there's some grace period. right now the other OSDs are heartbeating and getting no answer
[5:33] <dec> oh ok
[5:33] <dmick> about 30s IIRC
[5:33] <dec> I'll keep watching ceph -w
[5:38] <Q310> sigh
[5:38] <Q310> silly horizon dashboard not implementing rbd boot instances from images yet..
[5:39] <dec> whoa, mon.a on one host has gone nuts - it's sending hundreds/thousands of SYN packets out to the OSDs on other machines every second
[5:39] <dmick> odd
[5:40] <dec> actually it looks like it's an osd, not the mon
[5:40] <dec> osd.0 trying to connect to other osds to peer PGs?
[5:41] <dec> I can tell it's osd.0 because it's using 300% CPU
[5:41] <dec> and has 40,000 sockets open
[5:41] <dec> heh, and nothing has been able to detect that I killed 6/18 osd
[5:41] <dec> what on earth have I done :)
[5:42] <dmick> maybe check some logs. something's clearly not working as planned
[5:43] <dec> ah yes, I see thousands of these in osd.0's log: http://privatepaste.com/554dc4d151
[5:44] <dmick> did you maybe screw up an address?
[5:45] <dec> doesn't look like it...
[5:45] <dec> nope, they're all correct
[5:45] <dmick> yeah, actually that's just a port
[5:46] <dmick> I've seen that happen temporarily when restarting a cluster
[5:46] <dmick> but it was with data that was still there
[5:46] <dec> is temporarily more than ~15 minutes? :)
[5:46] <dmick> no
[5:46] <dmick> but then mine was a toy cluster on one machine
[5:47] <dmick> what are you using to init and start the cluster?
[5:47] <dec> mkcephfs
[5:47] <dec> and 'service ceph start' on RHEL
[5:47] <dmick> it's sure acting like it had leftover state somewhere
[5:48] <dec> in that log message I pasted, what comes after the slash?
[5:48] <dec> [IP]:port/???
[5:48] <dec> is that PID?
[5:48] <dmick> good question
[5:48] <dec> :)
[5:50] <dec> yeah there's some weird mon state hanging around by the looks of it
[5:50] <dmick> ah, that's the "nonce"
[5:50] <dec> what/where is that?
[5:50] <dmick> which is something like "a number that's unique per connection"
[5:50] <dec> ahh, right
[5:50] <dmick> inline ostream& operator<<(ostream& out, const entity_addr_t &addr)
[5:50] <dmick> {
[5:50] <dmick> return out << addr.addr << '/' << addr.nonce;
[5:50] <dmick> }
[5:51] <dmick> msg_types.h
[5:51] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[5:51] <dec> I even just dropped a mon, and I get a new election and a quorum of 2 nodes, but it still shows three mons listed:
[5:51] <dec> monmap e1: 3 mons at {a=[2001:db8::2:225:b5ff:fe00:7f]:6789/0,b=[2001:db8::2:225:b5ff:fe00:4f]:6789/0,c=[2001:db8::2:225:b5ff:fe00:2f]:6789/0}, election epoch 10, quorum 0,1 b,c
[5:52] <dec> and I don't get a mon missing warning
[5:52] <dmick> yeah, it still knows about the mon that's down
[5:52] <dec> in my previous testing, when I killed a mon it told me that one was down
[5:52] <dmick> only two in the quorum tho
[5:52] <dec> and gave me a health warning
[5:52] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) has joined #ceph
[5:53] <dmick> I haven't done these kinds of experiments personally, so I'm not sure how long things like that ought to take
[5:53] <dmick> but yeah, this is odd
[6:14] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[6:22] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[6:25] * yanzheng (~zhyan@jfdmzpr02-ext.jf.intel.com) Quit (Quit: Leaving)
[6:54] <dec> BTW, 0.53 borks trying to build on our EL6 build machine
[6:55] <dec> (we're testing 0.52 at the moment)
[6:56] <dmick> how so?
[6:56] <dec> {standard input}: Assembler messages:
[6:56] <dec> {standard input}:316721: Warning: end of file not at end of a line; newline inserted
[6:56] <dec> {standard input}:317865: Error: unknown pseudo-op: `.lbe2'
[6:56] <dec> {standard input}:317865: Error: open CFI at the end of file; missing .cfi_endproc directive
[6:56] <dec> g++: Internal error: Killed (program cc1plus)
[6:56] <dec> I'm trying to work out where that is
[6:56] <dmick> tf?
[6:56] <dec> yeah
[6:56] <dmick> that's pretty amazing
[6:57] <dmick> sounds like out of mem or something
[6:57] <dec> nope!
[7:05] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[7:06] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[7:13] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) has joined #ceph
[7:19] * tnt (~tnt@20.35-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[7:23] * Cube (~Cube@cpe-76-95-223-199.socal.res.rr.com) Quit (Quit: Leaving.)
[7:26] * sjustlaptop (~sam@68-119-138-53.dhcp.ahvl.nc.charter.com) Quit (Ping timeout: 480 seconds)
[7:29] <masterpe> On my way to https://www.42on.com/events/
[7:39] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) Quit (Quit: Leaving.)
[7:46] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[7:47] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[7:48] <iltisanni> Hi. What bugs are known with cephfs? (I heard it's not ready for production yet) And when do you think the major bugs will be fixed?
[7:54] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) Quit (Quit: Leaving.)
[7:57] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[7:57] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:22] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[8:34] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[8:42] * tnt (~tnt@20.35-67-87.adsl-dyn.isp.belgacom.be) Quit (Ping timeout: 480 seconds)
[8:49] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Remote host closed the connection)
[8:50] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[8:52] <dec> dmick: heh - I just tried rebuilding this cluster as 0.53 and it's still having some sort of schizophrenic attack!
[8:53] <dec> although 0.53 seems to give me better messages in the 'ceph -w' output
[8:54] <dec> when operating on ipv6, the osd heartbeats seem to fail
[8:56] <dec> is it normal for an OSD to listen on more than 1 port?
[8:56] <dec> I see one here listening on 6800, 6822 and 6823
[9:02] <dec> and I *still* see 20GB used, even on a brand new empty cluster
[9:03] <dec> even though 'rados -p <pool> ls' on every pool and 'rbd ls' all show no data
[9:05] * loicd (~loic@207.209-31-46.rdns.acropolistelecom.net) has joined #ceph
[9:07] * verwilst (~verwilst@dD5769628.access.telenet.be) has joined #ceph
[9:09] * iltisanni (d4d3c928@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[9:10] * BManojlovic (~steki@91.195.39.5) has joined #ceph
[9:10] * loicd (~loic@207.209-31-46.rdns.acropolistelecom.net) Quit (Remote host closed the connection)
[9:13] * iltisanni (d4d3c928@ircip3.mibbit.com) has joined #ceph
[9:14] * loicd (~loic@207.209-31-46.rdns.acropolistelecom.net) has joined #ceph
[9:17] * sage (~sage@76.89.177.113) Quit (Ping timeout: 480 seconds)
[9:18] * tnt (~tnt@212-166-48-236.win.be) has joined #ceph
[9:23] * jtang2 (~jtang@79.97.135.214) Quit (Quit: Leaving.)
[9:27] * joao (~JL@92.67.123.57) has joined #ceph
[9:41] <dweazle> live from ceph workshop event \o/
[9:42] * sage (~sage@76.89.177.113) has joined #ceph
[9:43] <ninkotech> iltisanni: see http://tracker.newdream.net/projects/ceph/issues?set_filter=1&tracker_id=1
[9:43] <ninkotech> iltisanni: i think no one will tell you a date
[9:44] <joao> dweazle, welcome :)
[9:44] <ninkotech> the only way to really control software projects is -- by directing focus
[9:44] <ninkotech> timelines do fail horribly with complex software
[9:45] <dweazle> joao: you here too? :)
[9:45] <joao> I am, by the bar
[9:46] <tnt> oh damn, I missed that. I should have gone to that event.
[9:46] <dweazle> joao: how's your wifi? i'm on 3g right now, wifi is totally unusable
[9:46] <dweazle> tnt: there's still time!
[9:46] <joao> wifi's working just fine
[9:46] * maxim (~pfliu@111.192.241.105) has joined #ceph
[9:46] <dweazle> joao: then i guess i'm just too far from the access point
[9:47] <dweazle> (front left)
[9:48] <tnt> dweazle: huh ... it's not exactly next door :)
[9:48] <dweazle> tnt: if you travel fast enough, time will slow down
[9:49] <joao> dweazle, Thomas just told me that there's also another wifi network from the venue
[9:49] <joao> if you're having issues with the wifi try that
[9:51] <tnt> one of the great remaining problems in computer science: wifi at conferences ...
[9:52] <tnt> you gotta love the link theming on the schedule: https://www.42on.com/events/ orange on a slightly brighter shade of orange.
[9:54] <joao> dweazle, yeah, am having some trouble loading some web pages too
[9:54] <joao> I guess it's because they were unable to get more than a meager 10Mb link for the event
[9:55] <darkfaded> joao: i also love it when they only have a single /24 or even smaller
[9:55] <darkfaded> and then offer rooms for 300 people
[9:55] <dweazle> joao: ah thanks
[9:56] <joao> darkfaded, lol
[9:56] <joao> and add some smartphones and tablets to the mix, and you don't even need 300 people in the venue :p
[9:58] * sagelap (~sage@92.67.123.57) has joined #ceph
[10:11] * Fruit (wsl@2001:980:3300:2:216:3eff:fe10:122b) has joined #ceph
[10:11] * Leseb (~Leseb@62.212.134.195) has joined #ceph
[10:12] * tore_ (~tore@110.50.71.129) Quit (Remote host closed the connection)
[10:14] * tziOm (~bjornar@92.67.123.57) has joined #ceph
[10:15] <Robe> morning!
[10:15] * jtang1 (~jtang@2001:770:10:500:5dab:8f0d:d3ec:5032) has joined #ceph
[10:19] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[10:25] * Leseb (~Leseb@62.212.134.195) Quit (Quit: Leseb)
[10:25] * Leseb (~Leseb@62.212.134.195) has joined #ceph
[10:27] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[10:38] * gucki (~smuxi@46-127-158-51.dynamic.hispeed.ch) has joined #ceph
[10:46] * yoshi (~yoshi@p37219-ipngn1701marunouchi.tokyo.ocn.ne.jp) Quit (Remote host closed the connection)
[10:55] * joao (~JL@92.67.123.57) Quit (Ping timeout: 480 seconds)
[11:01] * Leseb (~Leseb@62.212.134.195) Quit (Ping timeout: 480 seconds)
[11:06] * deepsa_ (~deepsa@122.172.168.153) has joined #ceph
[11:08] * deepsa (~deepsa@122.172.170.221) Quit (Ping timeout: 480 seconds)
[11:08] * deepsa_ is now known as deepsa
[11:21] * sagelap (~sage@92.67.123.57) Quit (Ping timeout: 480 seconds)
[11:38] * stingray (~stingray@stingr.net) Quit (Quit: changing a wheel)
[11:46] * yanzheng (~zhyan@134.134.139.76) has joined #ceph
[11:55] * sagelap (~sage@62.212.134.195) has joined #ceph
[11:58] * gucki (~smuxi@46-127-158-51.dynamic.hispeed.ch) Quit (Remote host closed the connection)
[12:00] * bla123 (~chatzilla@p5DCD61B4.dip0.t-ipconnect.de) has joined #ceph
[12:00] * sagelap (~sage@62.212.134.195) Quit (Read error: Connection reset by peer)
[12:00] * sagelap (~sage@62.212.134.195) has joined #ceph
[12:01] <bla123> i'm trying out ceph on 4 virtual machines (2 in ceph, 2 clients). however, the changes one client makes won't sync to the other until re-mount
[12:05] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) Quit (Ping timeout: 480 seconds)
[12:06] <bla123> sync does not help, writing from another client before re-mount breaks the filesystem. what else can i try?
[12:06] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[12:06] * loicd1 (~loic@207.209-31-46.rdns.acropolistelecom.net) has joined #ceph
[12:06] <Qten> bla123: rbd volumes are like an iscsi volume; you need a clustered filesystem
[12:06] * loicd (~loic@207.209-31-46.rdns.acropolistelecom.net) Quit (Read error: Connection reset by peer)
[12:06] <Qten> if you're trying to do that
[12:07] <Qten> i.e. the issue you're seeing is related to ext4/xfs/ntfs/fat/fat32/etc
[12:07] * yanzheng (~zhyan@134.134.139.76) Quit (Remote host closed the connection)
[12:08] <Qten> however if you mounted the same volume on 2 machines via something like vmfs you'd be fine
[12:10] <Qten> if you're using linux you could use GFS or OCFS
[12:11] <Qten> depends on what your end goal is / what you want to do, i guess :)
[12:12] <Qten> bla123: if, for example, you were trying to just mount a shared space, i.e. kind of like a fileshare, you could try cephfs, however that's not production ready i believe, whereas RBD is
[12:13] <Qten> condered 100% production ready
[12:13] <bla123> Qten: thanks a lot, somehow i came to think that rbd volumes were already clustered
[12:13] * tore_ (~tore@110.50.71.129) has joined #ceph
[12:14] <Qten> bla123: they are "clusted" in the sense they are put across all available nodes/disks etc
[12:14] <Qten> bla123: but not a clusted filesystem
[12:14] <Qten> *clustered even
[12:14] <Qten> silly keyboard
[12:14] * tziOm (~bjornar@92.67.123.57) Quit (Ping timeout: 480 seconds)
[12:15] <bla123> Qten: ok, i was looking for a clustered filesystem then
[12:15] <Qten> cephfs
[12:15] <nhm_> morning #ceph
[12:15] <bla123> Qten: ok, i will dig into cephfs then and look up "condered", because i never heard of that one
[12:15] <Qten> haha
[12:16] <Qten> sorry typeo
[12:16] <Qten> considered
[12:16] <bla123> ah, considered? ;)
[12:16] <Qten> been a long day =)
[12:17] <bla123> i see ;)
[12:32] <ramsay_za> a few others to look at are ocfs2 or clvm
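To make the distinction above concrete, a rough sketch (the image name and size are placeholders, and this assumes the kernel rbd client):

    rbd create test --size 10240         # 10 GB image
    rbd map test                         # shows up as a block device, e.g. /dev/rbd0
    mkfs.ext4 /dev/rbd0                  # fine for one client at a time
    # mounting the same ext4 image read-write from two clients will corrupt it;
    # shared access needs a cluster filesystem (OCFS2/GFS2) on top of the RBD,
    # or CephFS instead of an RBD image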
[12:32] * sagelap (~sage@62.212.134.195) Quit (Ping timeout: 480 seconds)
[12:44] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) has joined #ceph
[13:07] * maxim (~pfliu@111.192.241.105) Quit (Ping timeout: 480 seconds)
[13:24] * KindTwo (KindOne@h116.26.131.174.dynamic.ip.windstream.net) has joined #ceph
[13:26] * KindOne (KindOne@h187.46.28.71.dynamic.ip.windstream.net) Quit (Ping timeout: 480 seconds)
[13:26] * KindTwo is now known as KindOne
[13:27] * deepsa (~deepsa@122.172.168.153) Quit ()
[13:27] * loicd1 (~loic@207.209-31-46.rdns.acropolistelecom.net) Quit (Quit: Leaving.)
[13:37] <tnt> Mmm, apt-get source ceph doesn't work, the repo doesn't have the source packages in there.
[13:39] * sagelap (~sage@62.212.134.195) has joined #ceph
[13:40] * Leseb (~Leseb@62.212.134.195) has joined #ceph
[13:54] * loicd (~loic@magenta.dachary.org) has joined #ceph
[13:57] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[13:57] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[13:58] * yanzheng (~zhyan@101.83.211.139) has joined #ceph
[14:00] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[14:01] * mtk (~mtk@ool-44c35983.dyn.optonline.net) has joined #ceph
[14:03] * zhyan_ (~zhyan@101.82.33.7) has joined #ceph
[14:05] * nhorman (~nhorman@hmsreliant.think-freely.org) has joined #ceph
[14:07] * yanzheng (~zhyan@101.83.211.139) Quit (Ping timeout: 480 seconds)
[14:11] * zhyan_ (~zhyan@101.82.33.7) Quit (Ping timeout: 480 seconds)
[14:23] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) has joined #ceph
[14:25] * Leseb (~Leseb@62.212.134.195) Quit (Quit: Leseb)
[14:28] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) has joined #ceph
[14:32] * sagelap (~sage@62.212.134.195) Quit (Ping timeout: 480 seconds)
[14:33] * mtk (~mtk@ool-44c35983.dyn.optonline.net) Quit (Remote host closed the connection)
[14:33] * mtk (r9uVl19f_A@panix2.panix.com) has joined #ceph
[14:53] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[14:53] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit ()
[14:58] * PerlStalker (~PerlStalk@perlstalker-1-pt.tunnel.tserv8.dal1.ipv6.he.net) has joined #ceph
[15:08] * janjaap (~chatzilla@62.212.134.195) has joined #ceph
[15:12] * janjaap (~chatzilla@62.212.134.195) has left #ceph
[15:13] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[15:20] <dec> Some part of the OSD stats seems to be broken here. I can't work out how it builds these, but the osd_sum.kb_used stat in my pgmap is wrong; it reports 21GB used on a freshly reinstalled cluster.
[15:21] <dec> it seems to be update_osd_stat() in osd/OSD.cc, but I can't work out where it's getting the stat data from
[15:24] * hhoover (~hhoover@of2-nat1.sat6.rackspace.com) has joined #ceph
[15:30] <dec> actually, it's not wrong... it's just based on a statfs of all of the OSD filesystems, and given they each have a 1GB journal and I have 18 of them, ~20GB "used" is correct
[15:30] <dec> ignore me :)
[15:30] <Robe> :)
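(For the record, the arithmetic works out: 21672 MB across 18 OSDs is roughly 1.2 GB each, consistent with a 1 GB journal plus filesystem metadata per OSD.)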
[15:35] * Leseb (~Leseb@62.212.134.195) has joined #ceph
[15:44] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) has joined #ceph
[15:45] * sagelap (~sage@62.212.134.195) has joined #ceph
[15:47] * CristianDM (~CristianD@host173.186-124-185.telecom.net.ar) Quit (Ping timeout: 480 seconds)
[15:47] * CristianDM (~CristianD@201-213-232-83.net.prima.net.ar) has joined #ceph
[15:50] * elder_ (~elder@c-71-195-31-37.hsd1.mn.comcast.net) has joined #ceph
[15:50] * elder (~elder@c-71-195-31-37.hsd1.mn.comcast.net) Quit (Read error: Connection reset by peer)
[15:50] <elder_> That was weird. Just got a power drop.
[15:50] * Leseb (~Leseb@62.212.134.195) Quit (Quit: Leseb)
[15:51] <Robe> power drop?
[15:51] <elder_> My power just failed momentarily, killing my machine. My battery backup evidently doesn't work.
[15:51] * Leseb (~Leseb@62.212.134.195) has joined #ceph
[15:52] <Robe> ah
[15:52] <Robe> time to hit the facility guys
[15:52] <elder_> I work at home.
[15:52] <elder_> I am the facility guy.
[15:52] <Robe> hahaha
[15:52] <Robe> ok :)
[15:52] <Robe> go tyler durden on yourself!
[15:53] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) Quit (Quit: Leaving.)
[15:53] * Oliver2 (~oliver1@62.212.134.195) has joined #ceph
[15:54] <tnt> Is it possible to create an S3 user that can't create buckets?
[15:56] * Leseb (~Leseb@62.212.134.195) Quit ()
[15:57] <slang> anyone know how to get github to render my README.md as markdown?
[15:57] <slang> I moved README -> README.md on the autobuild-ceph repo, and it renders markdown if I view the README.md file, but the main README is still txt
[15:58] <slang> there must be a bit I need to flip on github somewhere...
[16:01] * noob2 (a5a00214@ircip3.mibbit.com) has joined #ceph
[16:02] <noob2> has anyone explored the idea of using vm's for monitors instead of physical boxes?
[16:02] <slang> oh
[16:02] <slang> it just magically started working
[16:03] * slang peers at github
[16:03] <elder_> noob2, it shouldn't really matter, but you'd want them on separate physical boxes anyway.
[16:03] <noob2> right
[16:03] <Robe> slang: eventually consistent!
[16:03] <noob2> in my testing with ceph it seems like the monitors don't do all that much so i thought why not vm's :)
[16:07] <dweazle> noob2: well, at least don't put your monitors on a vm backed by rbd :)
[16:07] <tnt> Can I safely erase the .rgw.* and .users.* pools and just create them manually? They were created with pg_num=8, which is a bit low ...
[16:07] <noob2> lol
[16:07] <noob2> yeah that'd be bad
[16:07] * Leseb (~Leseb@62.212.134.195) has joined #ceph
[16:08] <tnt> noob2: I use VMs ...
[16:08] <noob2> awesome
[16:08] <noob2> that cuts the cost down a bunch on deploying this
[16:09] <tnt> I even have OSD on VMs ... there is a small perf impact but not that much and I'm not after super-high-perf so it's good for me.
[16:09] <noob2> yeah i'm testing osd's on vm's right now just to see how it works
[16:09] <tnt> (obviously they still have dedicated drives and pinned cpu and such)
[16:09] <noob2> right
[16:10] <tnt> then I just run rbd backed VMs on the same physical boxes, using the memory and cpu not used by the osd.
[16:12] <noob2> that's great
[16:12] <noob2> i like it
[16:13] <tnt> and xen live migration works well on RBD so that's nice :p
[16:14] <nhm_> elder: I have a spare UPS with a bad battery. Want it? ;)
[16:15] <elder_> nhm_, we can trade.
[16:16] <nhm_> elder_: Mine emits a high pitched siren when plugged in...
[16:17] <elder_> A replacement battery is about $30 at Batteries Plus. But you may get one that works as well as mine just did.
[16:17] <nhm_> and keeps doing it for a while after it's unplugged. :D
[16:20] * BManojlovic (~steki@91.195.39.5) Quit (Remote host closed the connection)
[16:26] * joao (~JL@92.67.123.57) has joined #ceph
[16:31] <tnt> Is there a way to run a 'df' on the whole cluster at once to get usage by osd?
[16:32] <tnt> I thought I saw that somewhere but can't find it anymore ...
[16:32] * jlogan1 (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) has joined #ceph
[16:32] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[16:34] * jjgalvez (~jjgalvez@cpe-76-175-17-226.socal.res.rr.com) has joined #ceph
[16:34] * jmlowe (~Adium@2001:18e8:2:28a2:e116:78a7:e6e0:bfa0) has joined #ceph
[16:34] * jlogan1 (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) Quit ()
[16:36] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) Quit ()
[16:38] * jlogan1 (~Thunderbi@2600:c00:3010:1:9b2:ed42:a1f6:a6ec) has joined #ceph
[16:46] * Leseb (~Leseb@62.212.134.195) Quit (Quit: Leseb)
[16:48] <PerlStalker> Strange issue: ceph -s shows that the status is HEALTH_OK, but two of the three hosts hang when I run rbd ls
[16:48] <PerlStalker> The third works just fine
[16:51] <Oliver2> @tnt: perhaps a combination of rados df, ceph osd dump, ceph pg dump?
[16:51] <cephalobot`> Oliver2: Error: "tnt:" is not a valid command.
[16:52] <Oliver2> tnt: perhaps a combination of rados df, ceph osd dump, ceph pg dump?
[16:53] <tnt> Oliver2: ah yes, ceph osd dump actually has the summary at the end
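A crude alternative to the dump commands above is to run df against the OSD data directories on each host directly; a sketch, where the hostnames and path are assumptions based on the defaults:

    for h in ceph1 ceph2 ceph3; do
        ssh $h df -h /var/lib/ceph/osd/*
    done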
[16:53] * scalability-junk (~stp@188-193-211-236-dynip.superkabel.de) Quit (Quit: Leaving)
[16:54] <Oliver2> greetnx from Amsterdam ;) Out for the drinks now at the end...
[16:56] <nhm_> Oliver2: woo!
[16:59] <nhm_> Oliver2: I just had a spiked morning coffee to help you celebrate. ;)
[17:01] * CristianDM (~CristianD@201-213-232-83.net.prima.net.ar) Quit ()
[17:01] * rweeks (~rweeks@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[17:04] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Ping timeout: 480 seconds)
[17:06] <zynzel> c
[17:06] <zynzel> wrong window ;)
[17:08] * sagelap (~sage@62.212.134.195) Quit (Ping timeout: 480 seconds)
[17:10] * Oliver2 (~oliver1@62.212.134.195) Quit (Quit: Leaving.)
[17:16] <rweeks> would be c++ anyway.
[17:16] * rweeks grins
[17:16] <nhm_> rweeks: You and Alex can have a cage match fight. ;)
[17:16] <rweeks> hehe
[17:17] <rweeks> nah. I don't have enough strong opinions about it one way or the other.
[17:20] * Cube1 (~Cube@cpe-76-95-223-199.socal.res.rr.com) has joined #ceph
[17:21] * jmlowe (~Adium@2001:18e8:2:28a2:e116:78a7:e6e0:bfa0) Quit (Quit: Leaving.)
[17:23] * Oliver2 (~oliver1@62.212.134.195) has joined #ceph
[17:23] <Oliver2> nhm_… bye 4 now ;)
[17:23] * Oliver2 (~oliver1@62.212.134.195) has left #ceph
[17:38] * adjohn (~adjohn@108-225-130-229.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[17:43] * davidz (~Adium@ip68-96-75-123.oc.oc.cox.net) has joined #ceph
[17:46] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) has joined #ceph
[17:50] * tnt (~tnt@212-166-48-236.win.be) Quit (Ping timeout: 480 seconds)
[17:56] * MikeMcClurg (~mike@firewall.ctxuk.citrix.com) Quit (Ping timeout: 480 seconds)
[17:57] * joao (~JL@92.67.123.57) Quit (Remote host closed the connection)
[18:01] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) has joined #ceph
[18:05] * jtang1 (~jtang@2001:770:10:500:5dab:8f0d:d3ec:5032) Quit (Quit: Leaving.)
[18:06] * tnt (~tnt@20.35-67-87.adsl-dyn.isp.belgacom.be) has joined #ceph
[18:11] * sagewk (~sage@2607:f298:a:607:bcf7:887f:817:c0d7) Quit (Ping timeout: 480 seconds)
[18:13] * jmlowe (~Adium@c-71-201-31-207.hsd1.in.comcast.net) has joined #ceph
[18:14] * synapsr (~Adium@c-69-181-244-219.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[18:15] * chutzpah (~chutz@199.21.234.7) has joined #ceph
[18:20] * sagewk (~sage@2607:f298:a:607:e973:a80c:de9b:db1d) has joined #ceph
[18:23] * bla123 (~chatzilla@p5DCD61B4.dip0.t-ipconnect.de) Quit (Quit: ChatZilla 0.9.89 [Firefox 15.0.1/20120905151427])
[18:36] * BManojlovic (~steki@212.200.240.120) has joined #ceph
[18:50] * tnt (~tnt@20.35-67-87.adsl-dyn.isp.belgacom.be) Quit (Quit: Lost terminal)
[19:18] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[19:20] * synapsr (~Adium@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[19:20] * Ryan_Lane (~Adium@c-67-160-217-184.hsd1.ca.comcast.net) has joined #ceph
[19:29] * aliguori (~anthony@cpe-70-123-145-75.austin.res.rr.com) Quit (Quit: Ex-Chat)
[19:37] * Steki (~steki@212.200.243.207) has joined #ceph
[19:41] * BManojlovic (~steki@212.200.240.120) Quit (Ping timeout: 480 seconds)
[19:42] * loicd (~loic@90.84.146.208) has joined #ceph
[19:43] <elder_> joshd, do you know who is doing all the 4K reads right after a device gets mapped?
[19:44] <elder_> I just happened to report them when they fell back to the parent and got 167 reads, having done nothing but map the device.
[19:45] <elder_> Probably something having to do with reading partition tables, in part.
[19:48] * Fruit (wsl@2001:980:3300:2:216:3eff:fe10:122b) has left #ceph
[19:51] * synapsr (~Adium@c-98-234-186-68.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[19:56] * Steki (~steki@212.200.243.207) Quit (Ping timeout: 480 seconds)
[19:58] * aliguori (~anthony@32.97.110.59) has joined #ceph
[20:02] <joshd> elder_: not sure exactly, but I'd guess something like a udev helper
[20:02] <elder_> Each device gets 85 reads. I'm looking at it.
[20:02] <elder_> check_partition() reports "unknown partition type"
[20:02] <elder_> And that's after doing a bunch of reads.
[20:03] <elder_> Connected to register_disk(), it appears.
[20:15] * yeming (~user@180.168.36.70) Quit (Read error: Connection reset by peer)
[20:15] * miroslav (~miroslav@173-228-38-131.dsl.dynamic.sonic.net) Quit (Quit: Leaving.)
[20:17] * hhoover (~hhoover@of2-nat1.sat6.rackspace.com) Quit (Quit: ["Bye"])
[20:27] * loicd (~loic@90.84.146.208) Quit (Ping timeout: 480 seconds)
[20:40] * verwilst (~verwilst@dD5769628.access.telenet.be) Quit (Quit: Ex-Chat)
[20:42] * loicd (~loic@magenta.dachary.org) has joined #ceph
[20:46] * nwatkins (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[20:59] <slang> nwatkins: could you review wip-fchown for me?
[21:01] <dmick> elder_: it's the stuff looking for uuids
[21:01] <dmick> um...I knew its name yesterday...
[21:01] <dmick> blk...part...something?...
[21:01] <nwatkins> slang: sure thing
[21:01] <elder_> dmick, it's under check_partition()
[21:02] <dmick> yeah, I mean the userland agent that initiated it
[21:02] <dmick> ultimately driven by udev, in any event
[21:02] <elder_> The first one is calling msdos_partition(), and I haven't looked but presume the other reads are attempts by other partition types to claim the discovered disk as its own.
[21:03] <joshd> blkid? blockdev?
[21:04] <dmick> yeah, blkid is the one I was thinking about I think, from 60-persistent-storage.rules
[21:04] <dmick> I stopped it in gdb, and Yehuda taught me how to find the struct task * from the process stack, and I found the command line
[21:05] <elder_> Task is at the base of the page(s) containing the stack.
[21:06] <elder_> I'm a little nervous about stack consumption with layered images too, by the way.
[21:09] <dmick> why, this ain't no RISC proc :)
[21:10] <dmick> but it might be good to keep an eye peeled for unnecessary automatics, I agree
[21:11] <elder_> It's an ongoing problem in the Linux kernel.
[21:11] <elder_> Fixed stack size.
[21:11] <dmick> it's an ongoing problem in every kernel :)
[21:11] <dmick> what is the default task stack size, anyway?
[21:11] <elder_> And network interrupts can burn you.
[21:11] <elder_> 8K? I don't know any more.
[21:12] <dmick> I think Solaris went to 16K for 64-bit procs
[21:12] <dmick> IIRC
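For reference, elder_'s point above about the task sitting at the base of the stack corresponds roughly to how the kernel located the current task on x86 at the time; a sketch, not exact kernel code:

    /* thread_info lives at the bottom of the kernel stack: mask the stack
       pointer down to THREAD_SIZE alignment, then follow its task pointer */
    struct thread_info *ti = (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
    struct task_struct *task = ti->task;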
[21:30] * nhorman (~nhorman@hmsreliant.think-freely.org) Quit (Quit: Leaving)
[21:33] * Leseb (~Leseb@5ED01FAC.cm-7-1a.dynamic.ziggo.nl) has joined #ceph
[21:35] * scalability-junk (~stp@dslb-084-056-037-212.pools.arcor-ip.net) has joined #ceph
[21:43] <nwatkins> slang: looks good to me, passes tests.
[21:44] <slang> nwatkins: cool. thanks!
[21:51] <elder_> Well I definitely need to sort out the details of disk initialization. As things stand, the disk as well as each parent are getting read when they're added.
[21:51] <elder_> We only want the original mapped disk to get read.
[21:51] <joshd> elder_: why does it matter if the others are read?
[21:52] <elder_> Hmmm.
[21:53] <elder_> I suppose when it really comes down to it, it won't hurt much.
[21:53] <elder_> It's not pretty, but it most likely won't cause harm.
[21:56] <dmick> It's reading the parent because it has to, to get the block, right?
[21:56] <elder_> No.
[21:57] <elder_> I mean it's directly reading the parent (scanning for partitions, for example) *in addition* to anything that might happen due to the parent-child relationship.
[22:09] <dmick> oh, because it's a full-fledged device ATM
[22:09] <dmick> but you're going to fix that, right? :)
[22:10] * noob2 (a5a00214@ircip3.mibbit.com) Quit (Quit: http://www.mibbit.com ajax IRC Client)
[22:12] <elder_> That's what I started to say, sort of.
[22:12] <elder_> But Josh asserted it might not be a problem.
[22:13] <joshd> it comes down to figuring out how caching will work. it may require full-fledged devices.
[22:14] * scuttlemonkey (~scuttlemo@c-69-244-181-5.hsd1.mi.comcast.net) Quit (Quit: This computer has gone to sleep)
[22:26] <elder_> I'm going to proceed as if none of what I already have in place needs to change.
[22:27] <elder_> But I have to quit now. I'll be back on a bit over the weekend but some of this will have to wait until next week.
[22:27] <dmick> well it's not a short-term problem, but if it has side effects like "dev nodes in the filesystem", my opinion is that it needs to change eventually, but that's eventually
[22:27] <dmick> still want to get some time to understand gendisk better
[22:27] <slang> dmick: are you setting -nostdinc somehow?
[22:29] <dmick> no, but apparently the build machines don't have Ceph installed at all, and of course the tree should be building against its own headers
[22:29] <slang> dmick: /usr/include is already in my gcc search path
[22:29] <dmick> yeah, but that's not the point
[22:29] <slang> dmick: maybe I misunderstood your email
[22:29] <dmick> I mean $(SRC)/src/include
[22:29] <slang> ah
[22:30] <dmick> i.e.: I had a file that did #include <rbd/librbd.h>. It built on my machine, but it was getting the file from /usr/include. It failed on the build machine.
[22:30] <slang> dmick: I figured that was why we installed in /usr/local
[22:30] <slang> at least for cephfs stuff
[22:31] <dmick> no, that's just the default, and do_autogen.sh sets /usr, and the packages install to /usr
[22:31] <dmick> but even then
[22:31] <slang> dmick: adding -I$(SRCDIR)/src/include makes sense though
[22:31] <dmick> you have the bootstrapping issue of "I just changed the header file, and I need that copy to build"
[22:32] * slang nods
[22:32] <dmick> so far it's been done by #include "include/rbd/librbd.h", but that seems like the wrong place to do that
[22:34] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[22:34] * nwatkins (~Adium@soenat3.cse.ucsc.edu) Quit (Read error: Connection reset by peer)
[22:36] * nwatkins (~Adium@soenat3.cse.ucsc.edu) has joined #ceph
[22:36] * nwatkins1 (~Adium@soenat3.cse.ucsc.edu) Quit (Read error: Connection reset by peer)
[22:39] <slang> dmick: -Iinclude won't work for out-of-tree builds, that's partly why I was confused
[22:39] <slang> dmick: it needs to include the autoconf srcdir var as the root of the path probably
[22:47] * miroslav (~miroslav@c-98-234-186-68.hsd1.ca.comcast.net) has joined #ceph
[22:48] * brandi (~brandi@9YYAAKEJL.tor-irc.dnsbl.oftc.net) has joined #ceph
[22:50] * scalability-junk (~stp@dslb-084-056-037-212.pools.arcor-ip.net) Quit (Ping timeout: 480 seconds)
[22:55] <dmick> oh
[22:55] <dmick> yes, I meant things that build out of tree against /usr/include
[22:55] <dmick> other things
[22:56] <dmick> librados apps, etc.
[22:56] <dmick> but yes, $(SRCwhatever)/include
[22:56] * slang nods
[22:57] * silversu_ (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) has joined #ceph
[22:58] <dmick> $(top_srcdir)/src/include, perhaps
[22:58] <dmick> like leveldb does
[22:58] <dmick> or perhaps $(srcdir)/include. Not sure if it matters.
[22:59] <dmick> easily tested I suppose
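The fix being discussed would look roughly like the following in automake terms (a sketch; the exact variable and file are assumptions):

    # in src/Makefile.am
    AM_CPPFLAGS += -I$(top_srcdir)/src/include
    # so that sources can write
    #     #include "rbd/librbd.h"
    # and resolve it to the in-tree header rather than /usr/include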
[22:59] * silversurfer (~silversur@124x35x68x250.ap124.ftth.ucom.ne.jp) Quit (Read error: Connection reset by peer)
[23:00] * ajm (~ajm@adam.gs) Quit (Quit: ajm)
[23:00] * ajm (~ajm@adam.gs) has joined #ceph
[23:02] * aliguori (~anthony@32.97.110.59) Quit (Remote host closed the connection)
[23:02] * tryggvil (~tryggvil@16-80-126-149.ftth.simafelagid.is) has joined #ceph
[23:28] * MikeMcClurg (~mike@cpc10-cmbg15-2-0-cust205.5-4.cable.virginmedia.com) has joined #ceph
[23:43] * synapsr (~Adium@2620:149:f01:201:e417:11eb:834b:57e3) has joined #ceph
[23:44] * synapsr (~Adium@2620:149:f01:201:e417:11eb:834b:57e3) Quit ()
[23:53] * PerlStalker (~PerlStalk@perlstalker-1-pt.tunnel.tserv8.dal1.ipv6.he.net) Quit (Quit: Good night everybody)
[23:55] * loicd (~loic@magenta.dachary.org) Quit (Quit: Leaving.)
[23:59] * mdxi (~mdxi@74-95-29-182-Atlanta.hfc.comcastbusiness.net) Quit (Read error: Connection reset by peer)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.