#ceph IRC Log


IRC Log for 2010-07-22

Timestamps are in GMT/BST.

[0:14] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[0:32] * Osso (osso@AMontsouris-755-1-7-189.w86-212.abo.wanadoo.fr) Quit (Quit: Osso)
[1:43] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (Remote host closed the connection)
[2:14] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[2:29] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (Ping timeout: 480 seconds)
[7:58] * JaSa (~justanoth@ has joined #ceph
[8:02] <JaSa> Hey folks, i've been banging my head against a wall trying to get the shadow_copy or shadow_copy2 vfs modules working with ceph exported via samba. Does anybody have any tips or thoughts on that?
[8:22] * eternaleye (~quassel@184-76-53-210.war.clearwire-wmx.net) Quit (Ping timeout: 480 seconds)
[8:26] <wido> sagewk: ok, i see. Something went wrong i think with my pool creation.
[8:26] <wido> had some default rules (commit you made yesterday) to set the replication to three by default.
[8:31] * mtg (~mtg@vollkornmail.dbk-nb.de) has joined #ceph
[8:34] * JaSa (~justanoth@ Quit (Quit: Quit)
[8:38] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[8:40] * Jiaju (~jjzhang@ has joined #ceph
[8:40] * atg (~atg@please.dont.hacktheinter.net) Quit (Remote host closed the connection)
[8:41] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[9:05] * atg (~atg@please.dont.hacktheinter.net) Quit (Quit: No Ping reply in 180 seconds.)
[9:06] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[9:19] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[9:38] * allsystemsarego (~allsystem@ has joined #ceph
[9:50] * atg (~atg@please.dont.hacktheinter.net) Quit (Quit: No Ping reply in 180 seconds.)
[9:51] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[10:01] * pcish (7a74347d@ircip1.mibbit.com) Quit (Ping timeout: 480 seconds)
[10:15] * Yoric (~David@ has joined #ceph
[10:25] * atg (~atg@please.dont.hacktheinter.net) Quit (Remote host closed the connection)
[10:26] * atg (~atg@please.dont.hacktheinter.net) has joined #ceph
[10:54] <andret> darkfader: thanks for the monitoring-page in the wiki
[12:07] * Osso (osso@AMontsouris-755-1-7-189.w86-212.abo.wanadoo.fr) has joined #ceph
[12:08] * Osso_ (osso@AMontsouris-755-1-7-189.w86-212.abo.wanadoo.fr) has joined #ceph
[12:08] * Osso (osso@AMontsouris-755-1-7-189.w86-212.abo.wanadoo.fr) Quit (Read error: Connection reset by peer)
[12:08] * Osso_ is now known as Osso
[12:20] * T5 (jh@server19.xlhost.de) has joined #ceph
[12:54] * Jiaju (~jjzhang@ Quit (Quit: Away for now)
[14:24] * T5 (jh@server19.xlhost.de) Quit (Remote host closed the connection)
[15:04] * deksai (~chris@71-13-57-82.dhcp.bycy.mi.charter.com) has joined #ceph
[15:08] * Yoric_ (~David@ has joined #ceph
[15:13] * Yoric (~David@ Quit (Ping timeout: 480 seconds)
[15:13] * Yoric_ is now known as Yoric
[15:48] * f4m8 is now known as f4m8_
[16:20] * ghaskins_mobile (~ghaskins_@66-189-114-103.dhcp.oxfr.ma.charter.com) Quit (Quit: This computer has gone to sleep)
[16:26] * mtg (~mtg@vollkornmail.dbk-nb.de) Quit (Quit: Leaving)
[16:43] * noahdesu (~noahdesu@c-76-113-30-117.hsd1.nm.comcast.net) has joined #ceph
[16:49] * JaSa (~justanoth@ has joined #ceph
[16:51] * nwatkins (~nwatkins@c-76-113-30-117.hsd1.nm.comcast.net) has joined #ceph
[16:52] * nwatkins (~nwatkins@c-76-113-30-117.hsd1.nm.comcast.net) Quit ()
[16:58] * JaSa (~justanoth@ Quit (Quit: Quit)
[17:18] * noahdesu (~noahdesu@c-76-113-30-117.hsd1.nm.comcast.net) Quit (Ping timeout: 480 seconds)
[17:52] * Yoric (~David@ Quit (Quit: Yoric)
[17:58] * eternaleye (~quassel@184-76-53-210.war.clearwire-wmx.net) has joined #ceph
[17:59] <wido> hi
[17:59] <wido> little typo in debian/control line 67
[17:59] <wido> "originally designed for mapping data objects to storage servesr" should be "servers"
[19:19] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[19:55] <yehudasa> wido: you there?
[19:56] <sagewk> wido: thanks, fixed
[20:07] * yehudasa1 (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:08] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (Quit: Leaving)
[20:09] * yehudasa1 (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit ()
[20:10] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:18] * noahdesu (~noahdesu@c-76-113-30-117.hsd1.nm.comcast.net) has joined #ceph
[20:37] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) Quit (Read error: Connection reset by peer)
[20:39] * noahdesu (~noahdesu@c-76-113-30-117.hsd1.nm.comcast.net) Quit (Ping timeout: 480 seconds)
[20:57] <wido> sagewk: is yehudasa at the office with you?
[20:57] <sagewk> yep
[20:57] <wido> he's offline right now, but i think he had a question regarding my VLC problem i have with the RADOS gateway
[20:58] * yehudasa (~yehudasa@ip-66-33-206-8.dreamhost.com) has joined #ceph
[20:58] <wido> i'm using VLC 1.0.6 from Ubuntu
[20:58] <yehudasa> hmm
[20:58] <yehudasa> you're using ubunto 10.04?
[20:58] <yehudasa> ubuntu
[21:00] <wido> yes
[21:00] <wido> but, you noticed VLC not doing any requests, that's what i saw too
[21:00] <yehudasa> wido: yesterday when you saw the range variable in the message header that showed something other than '0', did you see that on wireshark?
[21:01] <wido> no, VLC did not do another request when i clicked through the timeline
[21:01] <yehudasa> oh, so where did you see that range?
[21:01] <wido> but today i just clicked in the timeline and about 2 minutes later, it jumped to there and went on
[21:01] <wido> only when i was running plain Apache
[21:01] <wido> just serving the file
[21:02] <yehudasa> I see
[21:02] <wido> so it takes very long for the RADOS gateway to respond to that, as if the client downloads everything up to the point where i clicked, but i'm not sure, haven't tested that
[21:03] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[21:03] <yehudasa> so when running with plain apache, vlc sends a second request when you try to reposition the movie?
[21:04] <wido> indeed
[21:05] <yehudasa> do you have a wireshark capture file that shows that?
[21:06] <wido> yes, but the problem is, there is a password of me in it
[21:06] <wido> http://zooi.widodh.nl/ceph/Screenshot-wlan0.pcap.png
[21:06] <wido> the IPv6 request is to my gateway, where the IPv4 is to the regular Apache server
[21:07] <Yoric> trahi
[21:09] <yehudasa> are all these HTTP GET operations triggered by you moving the movie time bar?
[21:09] <wido> yes, just clicking in the timebar somewhere
[21:09] <wido> right now my gateway is down due to a bug (mds), so i can't test it right now
[21:10] <wido> a good test would be to see if when i play the movie from the gateway and i click, there is traffic during the time i click and the movie jumps to that part. If there is traffic, that would mean that VLC simply downloads the movie until that particular point
[21:11] <wido> rather than doing a new request. But the question then is, why doesn't it do the new request on the gateway, while it does with Apache
[21:12] <yehudasa> what does the http conversation look like when going to the vanilla apache?
[21:13] <yehudasa> like right-click on the GET request and do a 'follow TCP stream'
[21:13] <wido> wait a second, i'll place the movie somewhere, make a pcap and upload it
[21:13] <wido> so you can see what VLC does
[21:13] <yehudasa> great
[21:16] <wido> yehudasa: http://zooi.widodh.nl/ceph/wlan0_1.pcap
[21:17] <yehudasa> thanks
[21:17] <wido> filter on http and check the traffic to 2a00:f10:103:1::2810:80
[21:17] <wido> for /members/wido
[21:18] <wido> i clicked through the timeline a few times to generate some requests, that works fine, in about < 2 secs VLC starts from where i clicked
[21:18] <yehudasa> yeah, I see that now
[21:26] <yehudasa> a major difference that I see is that the vanilla apache responds with 206-Partial Content response
[21:26] <yehudasa> not sure why it decides to do so
[21:27] <wido> would you expect a 207?
[21:27] <wido> oh, no, it's 206 indeed. If you read the RFC: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
[21:27] <yehudasa> I'd expect 200, but am now reading about the 206..
[21:28] <wido> a 206 is indeed normal here, since you ask for a byte range
[21:28] <yehudasa> yeah
[21:28] <yehudasa> so that's probably the solution
[21:29] <yehudasa> I'll check the Amazon S3 REST specifications to see if they have anything to say about that
[21:31] <yehudasa> no, can't find anything specific
[21:31] <yehudasa> I'll fix it to return the 206 in case a range was specified
[21:32] <wido> i found something i think
[21:32] <wido> http://developer.amazonwebservices.com/connect/message.jspa?messageID=94232
[21:32] <wido> the last message: "S3 also supports reading in chunks, so the reader plug-in could do this if it wanted to, but because S3 does not send an "Accept-Ranges: bytes" header, the reader plug-in does not try to do this"
[21:32] <wido> so the client thinks range is not supported, and it doesn't do that request
[21:34] <yehudasa> yeah, so I wonder whether we need to do both: send the 206 and the 'accept-ranges'
[21:35] <wido> yes, in the response header you should send the Accept-Ranges: bytes
[21:35] <yehudasa> however, amazon don't send that
[21:35] <wido> and then respond to a request with a range with a 206 instead of a 200
[21:36] <wido> indeed
[21:36] <wido> but that's pretty weird imho, since with large files and crappy internet connections, you would want to resume your download if it got stuck
[21:37] <wido> or use a download accelerator which does multiple requests all with a specific range
[21:37] <yehudasa> yeah
[21:38] <wido> i don't think you would break any S3 compatibility with implementing this
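
The behaviour proposed above (advertise `Accept-Ranges: bytes`, and answer a GET that carries a `Range` header with 206 instead of 200) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the RADOS gateway's code: `parse_byte_range` and `build_response` are hypothetical names, the object is assumed to be fully in memory, and only a single byte range is handled (multi-range requests fall back to a full 200 response).

```python
import re


def parse_byte_range(header, size):
    """Return (start, end), inclusive, for a single 'bytes=' range.

    Returns None when there is no usable range (absent, malformed,
    multi-range), and raises ValueError when the range cannot be
    satisfied for an object of `size` bytes.
    """
    if not header:
        return None
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not match:
        return None  # malformed or multi-range: serve the full body
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the object.
        length = int(last)
        if length == 0:
            raise ValueError("unsatisfiable range")
        start, end = max(size - length, 0), size - 1
    else:
        start = int(first)
        if last and int(last) < start:
            return None  # invalid spec: ignore the header
        end = min(int(last), size - 1) if last else size - 1
    if start >= size or start > end:
        raise ValueError("unsatisfiable range")
    return start, end


def build_response(body, range_header=None):
    """Return (status, headers, payload) for a GET on `body`."""
    size = len(body)
    headers = {"Accept-Ranges": "bytes"}  # tell clients that seeking is possible
    try:
        byte_range = parse_byte_range(range_header, size)
    except ValueError:
        headers["Content-Range"] = f"bytes */{size}"
        headers["Content-Length"] = "0"
        return 416, headers, b""
    if byte_range is None:
        headers["Content-Length"] = str(size)
        return 200, headers, body
    start, end = byte_range
    chunk = body[start:end + 1]
    headers["Content-Range"] = f"bytes {start}-{end}/{size}"
    headers["Content-Length"] = str(len(chunk))
    return 206, headers, chunk
```

With this shape, a player that seeks (or a client resuming a stalled download) sends something like `Range: bytes=7-` and gets back only the tail of the object together with a `Content-Range` header describing it.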
[21:41] <wido> i'm opening a S3 account to see what that does
[21:50] <wido> yehudasa: Amazon doesn't send an Accept-Ranges header
[21:51] <yehudasa> does it send 206 response?
[21:51] <wido> i can't test it, since VLC doesn't get the Accept-Ranges response header, it never sends a new request
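
The question of whether a server answers with 206 can be checked without VLC by sending a ranged GET by hand. A minimal Python sketch, assuming a plain unauthenticated URL (a real S3 or gateway request would also need signing, which is not shown); `classify_range_support` and `probe` are hypothetical helper names:

```python
import urllib.request


def classify_range_support(status, headers):
    """Summarise range support from a response to a ranged GET.

    `headers` maps lower-cased header names to their values.
    """
    advertised = headers.get("accept-ranges", "").strip().lower() == "bytes"
    # A server can honour Range without advertising it, so check the
    # status code and Content-Range separately from Accept-Ranges.
    honoured = status == 206 and "content-range" in headers
    return {"advertised": advertised, "honoured": honoured}


def probe(url, first=0, last=0):
    """Request bytes first..last of `url` and classify the reply."""
    request = urllib.request.Request(
        url, headers={"Range": f"bytes={first}-{last}"}
    )
    with urllib.request.urlopen(request) as response:
        headers = {name.lower(): value for name, value in response.headers.items()}
        return classify_range_support(response.status, headers)
```

A result of `advertised: False, honoured: True` would match the situation described in the forum post: ranged reads work, but a client that waits for `Accept-Ranges` never tries them.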
[21:54] <wido> but for now, the gateway responds like Amazon does, although i still think partial content would be cool :-)
[21:54] <yehudasa> well.. we can probably add it
[22:05] <wido> i think so
[22:08] <wido> yehudasa: http://developer.amazonwebservices.com/connect/message.jspa?messageID=66774
[22:09] <wido> but i'm still thinking about why S3 does not send the "Accept-Ranges: bytes" response header
[22:16] <wido> sagewk: about the MDS crash, if you up the debug level to 20, it seems the MDS is becoming so slow, that it shuts itself down
[22:16] <wido> only with a lower loglevel it really crashes
[22:17] <sagewk> it just takes a long time to get through journal replay. got the logs leading up to the crash. looking at the core now..
[22:17] <wido> ok, great
[22:18] <wido> i'm going afk, ttyl
[22:25] <sagewk> wido: the mds crash is just out of memory, AFAICS. you have only 4gb and no swap
[22:26] <wido> oh, really? 4GB not enough?
[22:26] <sagewk> the memory use during recovery is governed more by journal size than the normal cache pruning limits. adding swap to that machine should do the trick. once it's up it'll be less of an issue.
[22:27] <wido> ok, great. Is some form of logging possible for this? which would give a hint for that
[22:27] <sagewk> we'll probably use tcmalloc for servers after v0.21 (way more efficient with memory), but don't want to merge that at this stage
[22:27] <wido> since a lot of crashes seem to be oom, but are pretty hard to find
[22:28] <wido> i'll add some swap then tomorrow
[22:28] <sagewk> ok. yeah we'll look at catching those exceptions
[22:29] <wido> btw, would i really benefit from placing the metadata pool on SSD?
[22:29] <wido> or would that only speed up a MDS recovery?
[22:30] <wido> but i really have to go, i'll add some swap (a lot) and then try again
[22:30] <sagewk> it'll speed up access to cold parts of the hierarchy as well.. yes!
[22:30] <sagewk> ok, ttyl!
[22:31] <wido> sagewk: cool, might be funny when you use a SSD for journaling and the metadata pool only
[22:31] <wido> and yehudasa should i open an issue for the range thing?
[22:31] * deksai (~chris@71-13-57-82.dhcp.bycy.mi.charter.com) Quit (Read error: Connection reset by peer)
[22:31] <wido> i might want to give it a try myself tomorrow to implement it, just see how far i get
[22:32] <wido> really afk now
[22:33] * deksai (~chris@71-13-57-82.dhcp.bycy.mi.charter.com) has joined #ceph
[22:33] <yehudasa> wido: yes
[22:41] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[22:50] * allsystemsarego (~allsystem@ Quit (Quit: Leaving)
[23:11] * eternaleye (~quassel@184-76-53-210.war.clearwire-wmx.net) Quit (Ping timeout: 480 seconds)
[23:21] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[23:47] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit (Quit: Yoric)
[23:48] <darkfader> i had also asked about using an ssd, i'll see if there's a good spot in wiki for it
[23:49] <darkfader> btw, did you see there is a review for btrfs with ssd mode (which is slower almost all times :(
[23:55] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) has joined #ceph
[23:59] * Yoric (~David@dau94-10-88-189-211-192.fbx.proxad.net) Quit ()

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.