#ceph IRC Log

IRC Log for 2012-02-10

Timestamps are in GMT/BST.

[0:27] * joao (~joao@89.181.154.123) has joined #ceph
[0:27] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[0:52] * lollercaust (~paper@85.Red-83-41-151.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[0:59] <yehudasa_> jdwilson: set 'debug rgw = 20' and also 'debug ms = 1'
[0:59] <yehudasa_> jdwilson: also maybe set 'log dir = /var/log/radosgw'
[1:04] * ninkotech_lite (~dp@ip-85-160-202-232.eurotel.cz) Quit (Quit: Konversation terminated!)
[1:06] * verwilst (~verwilst@d51A5B5DF.access.telenet.be) Quit (Quit: Ex-Chat)
[1:09] <joshd> mosu001: is there anything in your mds logs?
[1:11] <mosu001> There are no logs for the second MDS which I'm assuming is the standby one
[1:11] <mosu001> For the first MDS there is a crash from before I started repairing the system
[1:12] <mosu001> 2011-07-29 22:02:26.169701 7f5d23f57710 -- 192.168.1.4:6800/10411 >> 192.168.1.3:0/10913760 pipe(0xbcdc80 sd=12 pgs=2 cs=1 l=0).fault initiating reconnect
[1:12] <mosu001> 2011-07-29 22:02:26.334061 7f5d23d55710 -- 192.168.1.4:6800/10411 >> 192.168.1.3:0/3758642648 pipe(0xc26c80 sd=13 pgs=0 cs=0 l=0).accept peer addr is really 192.168.1.3:0/3758642648 (socket is 192.168.1.3:34280/0)
[1:12] <mosu001> *** Caught signal (Aborted) **
[1:12] <mosu001> in thread 0x7f5d2435b710
[1:12] <mosu001> ceph version 0.31 (commit:9019c6ce64053ad515a493e912e2e63ba9b8e278)
[1:12] <mosu001> 1: cmds() [0x8070d9]
[1:12] <mosu001> 2: (()+0xf2e0) [0x7f5d299132e0]
[1:12] <mosu001> 3: (gsignal()+0x35) [0x7f5d2849e9e5]
[1:12] <mosu001> 4: (abort()+0x186) [0x7f5d2849fee6]
[1:12] <mosu001> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f5d28cf8cdd]
[1:12] <mosu001> 6: (()+0xbdef6) [0x7f5d28cf6ef6]
[1:12] <mosu001> 7: (()+0xbdf23) [0x7f5d28cf6f23]
[1:12] <mosu001> 8: (()+0xbe02e) [0x7f5d28cf702e]
[1:12] <mosu001> 9: (ceph::buffer::create_page_aligned(unsigned int)+0x95) [0x787205]
[1:12] <mosu001> 10: (SimpleMessenger::Pipe::read_message(Message**)+0x1301) [0x7a33a1]
[1:12] <mosu001> 11: (SimpleMessenger::Pipe::reader()+0xb8a) [0x7aceba]
[1:12] <mosu001> 12: (SimpleMessenger::Pipe::Reader::entry()+0xd) [0x4b609d]
[1:12] <mosu001> 13: (()+0x6a4f) [0x7f5d2990aa4f]
[1:12] <mosu001> 14: (clone()+0x6d) [0x7f5d2853c82d]
[1:12] <mosu001> Even the IP addresses in the lines before the crash are wrong...
[1:12] <mosu001> Actually, I didn't think IP addresses had to be specified for the MDS servers?
[1:14] <Tv|work> mosu001: everything contacts the monitors and learns its addresses from there
[1:14] <mosu001> The crash is the last entry in the log, so it appears not to have been working since before I upgraded my version of ceph and repaired the OSDs
[1:14] <Tv|work> mosu001: that you're seeing unexpected ip addresses is probably meaningful
[1:14] <mosu001> Maybe I did not copy across the right ceph.conf file and that caused the crash?
[1:15] <mosu001> However, I think everything is OK now, but the MDS is stuck in replay
[1:15] <Tv|work> mosu001: is your clock horribly wrong?
[1:15] <mosu001> It may well have been at the time but I am using ntpd -q before starting ceph
[1:15] <mosu001> The crash was from a long time ago
[1:16] <mosu001> but I've just had time to start fixing things
[1:16] <Tv|work> mosu001: there's a high chance the bug's already been fixed
[1:16] <Tv|work> since 0.31
[1:17] <mosu001> I'm using ceph 0.34 now, but I should update this to latest
[1:17] <mosu001> I thought my scripts had done this, but apparently not...
[1:18] <mosu001> I'll update ceph source and rebuild, etc., and then get back in touch if I still have problems
[1:18] <mosu001> Thanks!
[1:18] <mosu001> ls
[1:32] * mosu001 (~mosu001@awlaptop1.esc.auckland.ac.nz) Quit (Quit: Leaving)
[1:43] * joao (~joao@89.181.154.123) Quit (Quit: joao)
[1:43] * yoshi (~yoshi@p8031-ipngn2701marunouchi.tokyo.ocn.ne.jp) has joined #ceph
[1:57] * jeffhung (~jeffhung@60-250-103-120.HINET-IP.hinet.net) has joined #ceph
[2:01] * Tv|work (~Tv|work@aon.hq.newdream.net) Quit (Ping timeout: 480 seconds)
[2:27] * livemoon (~deadsun@221.133.228.74) has joined #ceph
[3:06] * amichel (~amichel@salty.uits.arizona.edu) has joined #ceph
[3:16] <amichel> Is there any reason anyone can think of off the top of their heads that I shouldn't be able to make a ceph osd on zfs? I'm getting a strange error out of mkcephfs and I think it's zfs related but I can't figure out what exactly the problem is.
[3:18] * The_Bishop (~bishop@e177091002.adsl.alicedsl.de) Quit (Quit: Who the hell is this Peer? If I catch him, I'll reset his connection!)
[3:20] <joshd> amichel: make sure you have xattrs enabled
[3:20] <amichel> I do, it's mounted with xattr
[3:21] <joshd> what's the error you're getting?
[3:21] <amichel> "unable to open journal: open() failed: (22) Invalid argument"
[3:22] <amichel> Which is weird cause it definitely makes the journal file
[3:25] <joshd> does zfs not support O_DIRECT or something?
[3:25] <amichel> That would be odd
[3:25] <amichel> Lemme see
[3:25] <joshd> if you strace -f you can see the flags it's using
[3:26] <amichel> Ha, you're right
[3:26] <amichel> https://github.com/zfsonlinux/zfs/issues/224
[3:28] <amichel> Guess I'll just move the journals off to a different FS for the moment
[3:31] <joshd> you can set "journal dio = false" in the osd section of your ceph.conf, and it should work (using fsync or fdatasync instead of directio)
[3:33] <amichel> Hey, that worked much better
[4:03] <amichel> Well it worked much better for a little bit anyway
[4:04] <amichel> Replaced the zpools with btrfs stripes, works like a champ, so it must all be zfs related
[4:04] <amichel> Man, I was really hoping to just have an easy life with zpools
[4:06] <amichel> Well, thanks for helping me track some of it down, at least! Have a good one.
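The O_DIRECT question in the exchange above can also be checked without strace. What follows is a minimal Python sketch, assuming a Linux host. `supports_o_direct` is a hypothetical helper name and not part of Ceph, and the exact flags ceph-osd passes when opening its journal are not shown in this log, so the probe only tests whether the filesystem accepts O_DIRECT at all.

```python
import errno
import os
import tempfile


def supports_o_direct(directory):
    """Return True if a file in `directory` can be opened with O_DIRECT."""
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        # This platform does not expose an O_DIRECT flag (e.g. not Linux).
        return False
    # Create a scratch file first, then reopen it with O_DIRECT added.
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        try:
            fd = os.open(path, os.O_RDWR | o_direct)
        except OSError as exc:
            if exc.errno == errno.EINVAL:
                # Same errno (22) as in the "Invalid argument" error quoted above.
                return False
            raise
        os.close(fd)
        return True
    finally:
        # Always remove the scratch file.
        os.unlink(path)
```

Called on the directory that holds the OSD journal, a result of False would be consistent with the "(22) Invalid argument" error, and would suggest trying the "journal dio = false" setting joshd mentions.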
[4:06] * amichel (~amichel@salty.uits.arizona.edu) Quit (Quit: Bad news, everyone!)
[4:23] * joshd (~joshd@aon.hq.newdream.net) Quit (Quit: Leaving.)
[4:25] <livemoon> amichel: does ceph support zfs?
[5:31] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) Quit (Quit: Leaving.)
[5:51] * bchrisman (~Adium@c-76-103-130-94.hsd1.ca.comcast.net) has joined #ceph
[5:53] * dmick (~dmick@aon.hq.newdream.net) Quit (Quit: Leaving.)
[19:38] -kilo.oftc.net- *** Looking up your hostname...
[19:38] -kilo.oftc.net- *** Checking Ident
[19:38] -kilo.oftc.net- *** Found your hostname
[19:38] -kilo.oftc.net- *** No Ident response
[19:38] * CephLogBot (~PircBot@rockbox.widodh.nl) has joined #ceph
[19:43] <nhm> hi mom
[19:47] * lollercaust (~paper@172.Red-193-153-66.dynamicIP.rima-tde.net) has joined #ceph
[19:58] * dmick (~dmick@aon.hq.newdream.net) has joined #ceph
[20:20] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[20:24] <Tv|work> good news: 1) serial console works, after i patch the rootfs to start a getty 2) i know why the keyboard trouble is there; i need to patch the rootfs with a new kernel to make it go away (and be all better in other respects)
[20:26] <dmick> usb issue?
[20:42] <yehudasa_> gregaf1, nhm: the wireshark plugin hasn't been touched for more than 2 years, maybe closer to 3. I think the biggest change in protocol that broke everything is the auth handshake.
[20:57] * fghaas (~florian@85-127-86-65.dynamic.xdsl-line.inode.at) Quit (Remote host closed the connection)
[21:04] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) Quit (Quit: Leaving.)
[21:21] * fghaas (~florian@85-127-86-65.dynamic.xdsl-line.inode.at) has joined #ceph
[21:31] * MarkDude (~MT@76-218-98-3.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[21:35] * lollercaust (~paper@172.Red-193-153-66.dynamicIP.rima-tde.net) Quit (Quit: Leaving)
[21:39] * izdubar (~MT@76-218-98-3.lightspeed.sntcca.sbcglobal.net) has joined #ceph
[21:42] * MarkDude (~MT@76-218-98-3.lightspeed.sntcca.sbcglobal.net) Quit (Ping timeout: 480 seconds)
[21:47] * izdubar (~MT@76-218-98-3.lightspeed.sntcca.sbcglobal.net) Quit (Quit: Leaving)
[21:50] * andreask (~andreas@chello062178013131.5.11.vie.surfer.at) has joined #ceph
[21:55] <jmlowe1> is this normal? osd.0[8928]: 7fcb85ff9700 -- xxx.xxx.xxx.xxx:6802/8927 >> xxx.xxx.xxx.xxx:6811/10812 pipe(0x3cb6500 sd=30 pgs=4 cs=1 l=0).fault with nothing to send, going to standby
[21:55] * fghaas (~florian@85-127-86-65.dynamic.xdsl-line.inode.at) has left #ceph
[22:01] <jmlowe1> Also, anybody interested in a btrfs stacktrace from syslog?
[22:09] <joshd> jmlowe1: I think that messenger warning is harmless
[22:10] * nhorman (~nhorman@99-127-245-201.lightspeed.rlghnc.sbcglobal.net) Quit (Quit: Leaving)
[22:10] <joshd> jmlowe1: what's the stacktrace?
[22:10] <jmlowe1> http://pastebin.com/WLPBLPKY
[22:11] <jmlowe1> probably nothing, the btrfs guys just said slow path warnings aren't interesting
[22:12] <joshd> yeah, apparently it's been hit before with ceph: http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/4777
[22:26] <nhm> oooh, Tesla Model X looks nice.
[22:33] <jdwilson> hey guys -- topology question -- if i've got two servers with 4 drives each, would it be better for performance / stability if i opt to go with two osds (one per machine) and raid10 the drives or to have 8 osds?
[22:40] <gregaf1> raid10 is a lot of drive waste to put under the replication we already do
[22:47] <nhm> jdwilson: let me know what you end up going with and how it works out.
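The "drive waste" point above can be put in numbers. This is a rough Python sketch, not a sizing tool: the 1 TB drive size and the 2x replication level are assumptions for illustration (neither is stated in the log), and `usable_capacity_tb` is a hypothetical helper name. It only accounts for RAID10 mirroring and Ceph replication, ignoring journals and filesystem overhead.

```python
def usable_capacity_tb(servers, drives_per_server, drive_tb, replicas, raid10):
    """Estimate usable capacity in TB for a simple cluster layout."""
    # Raw capacity across all drives in all servers.
    raw = servers * drives_per_server * drive_tb
    if raid10:
        # RAID10 mirrors every drive, so only half the raw space is visible.
        raw /= 2
    # Ceph then stores `replicas` copies of every object on top of that.
    return raw / replicas


# jdwilson's two options: 2 servers x 4 drives, assumed 1 TB each, assumed 2 copies.
print(usable_capacity_tb(2, 4, 1.0, 2, raid10=True))
print(usable_capacity_tb(2, 4, 1.0, 2, raid10=False))
```

Under these assumptions the RAID10 layout leaves 2.0 TB usable out of 8 TB of disks, while eight separate OSDs leave 4.0 TB, which is the waste gregaf1 is pointing at.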
[23:01] <nhm> ugh, github down again.
[23:22] * aa (~aa@r200-40-114-26.ae-static.anteldata.net.uy) Quit (Remote host closed the connection)
[23:30] <Tv|work> note to self: don't serve same filesystem image as read-write to two machines :-/
[23:42] * lollercaust (~paper@172.Red-193-153-66.dynamicIP.rima-tde.net) has joined #ceph
[23:46] <Tv|work> "console: Serial Device 2 is currently in use" grrrrrr give me something to throw
[23:50] <Tv|work> and btw that is why i think there should be some process between the serial port and the end user, that mediates access (and can also log all output, as a side)

These logs were automatically created by CephLogBot on irc.oftc.net using the Java IRC LogBot.