Friday, September 21, 2007

Quote

"ALL men of whatsoever quality they be, who have done anything of excellence, or which may properly resemble excellence, ought, if they are persons of truth and honesty, to describe their life with their own hand; but they ought not to attempt so fine an enterprise till they have passed the age of forty."

Benvenuto Cellini, one of Tom Sawyer's favorite authors, from Cellini's autobiography.

Tuesday, September 18, 2007

Windows Vista, Sockets, Java NIO, and TIME_WAIT

Blogging on behalf of Shrideep Pallickara:

When you are working with NIO sockets on Windows Vista, you may run into a
problem where connecting to a specific host/port combination sometimes
succeeds and sometimes fails with the following bind-related exception:
java.net.SocketException: Invalid argument: sun.nio.ch.Net.setIntOption
...


This happens because a closed connection goes into the TIME_WAIT state,
which you can check with a tool such as netstat. It can take up to a couple
of minutes for the connection to leave this state. If you try to connect to
the same host/port combination in the meantime, the attempt may fail because
the previous connection is still in TIME_WAIT.

To get around this problem, configure your socket so that it can reuse
addresses (the SO_REUSEADDR option). With this fix you can bind a socket to
the specified address even if a connection in the TIME_WAIT state is still
using the socket's address or port.

Please note that on the client side you need to set this option BEFORE the
socket is bound. The code below shows how to do this for NIO. You will need
to wrap this code fragment in the appropriate try-catch block.


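SocketChannel sc = SocketChannel.open();
// Allow reuse of an address still in TIME_WAIT; set this before connecting.
sc.socket().setReuseAddress(true);
sc.socket().setKeepAlive(true);
sc.configureBlocking(false);

// _hostName and _portNum identify the remote host/port to connect to.
InetSocketAddress ia = new InetSocketAddress(_hostName, _portNum);
// Non-blocking connect: complete it later with sc.finishConnect().
sc.connect(ia);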

You will need a similar configuration on the server side of the socket as
well.
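
For the server side, a minimal sketch might look like the following. It is
not taken from the original application: the class name ReuseAddressServer
and the helper openServer are made up for illustration, and port 0 is used
in main only so that the example can bind to any free port. The one point it
is meant to show is the ordering: setReuseAddress(true) is called before
bind().

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class ReuseAddressServer {

    // Hypothetical helper: opens a non-blocking server channel on the given
    // port with address reuse enabled before the socket is bound.
    static ServerSocketChannel openServer(int port) throws IOException {
        ServerSocketChannel ssc = ServerSocketChannel.open();
        // Set the option before bind(); changing it after the socket is
        // bound has undefined behaviour according to the Java API docs.
        ssc.socket().setReuseAddress(true);
        ssc.configureBlocking(false);
        ssc.socket().bind(new InetSocketAddress(port));
        return ssc;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port (an assumption for this demo).
        ServerSocketChannel ssc = openServer(0);
        System.out.println("Listening on port " + ssc.socket().getLocalPort());
        ssc.close();
    }
}
```

In a real server you would pass your own port number to the helper and
register the returned channel with a Selector for OP_ACCEPT.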

Wednesday, September 12, 2007

Restarting the iSCSI SAN

The iSCSI SAN periodically dies and the file systems become inaccessible. The symptoms in the Logwatch report can look like this:

 --------------------- Kernel Begin ------------------------

WARNING: Kernel Errors Present
connection0:0: iscsi: detected conn error (1011) ...: 3 Time(s)
Buffer I/O error on device sdc1, ...: 29 Time(s)
EXT2-fs error (device sdc1): e ...: 171 Time(s)
end_request: I/O error, dev sdc, sector ...: 2421 Time(s)
lost page write due to I/O error on sdc1 ...: 29 Time(s)
sd 3:0:0:0: SCSI error: return code = 0 ...: 2421 Time(s)
sd 3:0:0:0: SCSI error: return code ueu ...: 1 Time(s)

To restart, follow these steps:

1. Log into the local NavSphere Express web utility running on the SAN.
2. Restart the SAN via NavSphere.
3. Reboot the Linux server that mounts the SAN. If you don't want to reboot, try restarting the iSCSI daemon and remounting the file system. Use fdisk -l to make sure that the iSCSI devices are visible before you remount.

service iscsi restart
fdisk -l
mount /my/iscsi/partition