Wednesday, June 18, 2008

WTF, doesn't Solaris speak Linux? (The NAS Project, part 3)

[covering software installation, configuration, troubleshooting]

When we last left our hero, he had just successfully powered on his newly assembled NAS and was rejoicing. The emotion was short-lived.
Prior to ordering everything, I searched the Solaris Hardware Compatibility List and didn't turn up anything untoward.
I still had some issues, however. By far, the biggest stumbling blocks for me were getting the network adaptors functioning and having the operating system recognize my SATA drives.

But I'm getting ahead of myself. The first step was to actually install the OS. I had downloaded the OpenSolaris 2008.05 ISO and burned it to disc. It's a live CD, so I was able to boot into it and then run the install. Unfortunately, I must have had some corruption in the download or the burn, because the install didn't work correctly. I tried again, and this time the download was good (I verified the hash, which I should have done originally - shame on me). But the burn didn't work -- great, another coaster. The third time was the charm, though.
Opensolaris was installed and booting.

Then it freaked out. It would start, but as soon as the bios would post the video would go out. Or the video would go out when I logged into the gui. Or it would start, then the video would go out when the screensaver kicked in, and never come back.

After much hair pulling & a helpful suggestion from the LogicSupply tech support, I tracked the problem down to the memory being in a bad spot. I moved the RAM chip to the other slot, ran a memory test from a bootable disc (http://www.memtest.org/) which showed 100% error free, and everything worked great after that.

The next problem I tackled was getting it to see my network... "N" is the first part of NAS, after all. The network adaptors on my motherboard are Broadcom BCM5787M (vendor id: 14e4, device id: 1693), and not supported natively in opensolaris yet.
After much searching & forum reading and attempting various & sundry things, I stumbled across this thread:
http://opensolaris.org/jive/thread.jspa?messageID=195224
It was during all of this trial and error that I came to realize networking and the various commands in opensolaris were different enough from linux to cause me a decent amount of confusion.

Anyhow, I ended up downloading & installing the archived drivers, and afterward my system said the broadcom devices were "up" -- I just wasn't getting an IP.
Finally, after a week and a half of tweaking & reading through the various forums, I've managed to get everything working & have it persist through restarts.
It still grumbles about configuration errors & forcing things into maintenance mode when I boot, but things seem to be working fine.

NOTE: Don't bother with the network gui. Don't even go into it.
NOTE: In opensolaris, your network adaptors are named for their drivers. Since I was using the broadcom driver, my two adaptors are bcme0 and bcme1.
NOTE: hostnames and IPs have been changed to protect the innocent :)
NOTE: the stuff below is mostly from memory. I'll try to double-check it before I post but may not get to it. UPDATE: Double checked, and I think this is complete.

First, I disabled the NWAM (Network Auto-Magic) service, which didn't work for me either:
# svcadm disable /network/physical:nwam

I enabled the normal network service:
# svcadm enable /network/physical:default

[EDIT] I forgot to mention the magic bean... the thing that makes all of our networking stuff gel. This is specific to the hardware/drivers, so it isn't necessary for anyone who isn't using the bcme driver.
# svccfg -s network/physical:default setenv DLPI_DEVONLY 1
# svcadm refresh network/physical:default
# svccfg -s network/physical:nwam setenv DLPI_DEVONLY 1
# svcadm refresh network/physical:nwam
# reboot


http://www.opensolaris.org/jive/thread.jspa?threadID=61541&tstart=50

Without doing this, you can have an IP address & snoop the network traffic, but you won't be able to ping, respond to pings, or do anything else.

I put my router's IP in the /etc/defaultrouter file:
# cat /etc/defaultrouter
192.168.0.1

I edited my /etc/hosts to add a static IP for the adaptor/box:
# cat /etc/hosts
127.0.0.1 localhost
192.168.0.10 supernas

I added my DNS entries to /etc/resolv.conf (one for my router's IP which should do DNS lookups, as well as one for OpenDNS):
# cat /etc/resolv.conf
nameserver 192.168.0.1
nameserver 208.67.222.222

I copied /etc/nsswitch.dns over to /etc/nsswitch.conf and then edited the .conf file so the hosts line read:
# cat /etc/nsswitch.conf | grep hosts
hosts: files dns

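I added a persistent default route to my router (Solaris route takes the gateway address directly after the destination, without a "gateway" keyword):
# route -p add default 192.168.0.1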

I edited /etc/netmasks (for various complicated reasons, I have a couple different subnets at home. Most people will use the default 255.255.255.0):
# cat /etc/netmasks
192.168.0.0 255.255.254.0

I put my system's hostname & some configuration commands in the /etc/hostname.[interface] file:
# cat /etc/hostname.bcme0
supernas netmask + broadcast + up
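
In that file, "netmask +" tells ifconfig to look the mask up in /etc/netmasks, and "broadcast +" derives the broadcast address from the address and that mask. To see what the 255.255.254.0 mask works out to, here is a small sketch of the arithmetic. The function name broadcast_of is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the broadcast address for NETWORK NETMASK,
# both given as dotted quads. The body runs in a subshell so the IFS
# change does not leak into the caller.
broadcast_of() (
    IFS=.
    set -- $1 $2    # split into 8 octets: $1..$4 network, $5..$8 netmask
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
)

broadcast_of 192.168.0.0 255.255.254.0    # prints 192.168.1.255
```

So with this mask the box treats everything from 192.168.0.0 through 192.168.1.255 as one local network, which is why the entry matters when you have more than a single /24 at home.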

And that seemed to do it!

The next big problem I had was the fact that opensolaris couldn't see the really cool hot-swap sata drives, no matter how much tweaking I did.
You can find some of my problems in this thread I started (the thread got away from me, but the first few posts are still valid): http://opensolaris.org/jive/thread.jspa?threadID=62493

With the backplane in place & the drives socketed in their bays, opensolaris never recognized them as being anything other than "empty" no matter what bios setting I used (AHCI, legacy, etc.). If I directly connected them to the mobo it recognized them just fine, however. Drat -- I had really wanted to have the hot-swap capability.
Oh well, maybe the chenbro chipset will be supported in a later release of opensolaris.
I decided to just directly connect the drives to the motherboard and removed the backplanes (after some case disassembly). BTW -- if anyone is developing drivers for this backplane, I'd be willing to temporarily donate one of the backplane boards to the cause, assuming I could get it back once they're supported :)
I then test fit the drive caddies to see how the cables would fit through the new holes. Nope... had to notch the little metal plate that held the backplanes.
Once that was done, everything fit nicely and opensolaris saw the drives just fine.
I checked the "location" of the new drives:
# cfgadm | grep sata
Ap_Id                  Type  Receptacle  Occupant    Condition
sata0/0::dsk/c6t0d0    disk  connected   configured  ok
sata0/1::dsk/c6t1d0    disk  connected   configured  ok
sata0/2::dsk/c6t2d0    disk  connected   configured  ok
sata0/3::dsk/c6t3d0    disk  connected   configured  ok

(or something similar - that's not the exact output)
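
The part after "dsk/" in each attachment point is the device name that zpool wants. As a rough sketch, the names can be pulled out of the listing with awk. The function name sata_disks is made up for this example, and it assumes the five-column layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `cfgadm` output on stdin and print the disk
# names (e.g. c6t0d0) of SATA attachment points that are both connected
# and configured. Assumes the column order shown in the listing above.
sata_disks() {
    awk '$1 ~ /^sata[0-9]+\/[0-9]+::dsk\// && $3 == "connected" && $4 == "configured" {
        sub(/^.*::dsk\//, "", $1)   # strip everything up to and including dsk/
        print $1
    }'
}
```

On the NAS itself this would be used as "cfgadm | sata_disks". Empty bays have no "::dsk/" part, so they are skipped.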

From that point, creating my 2+ terabyte storage pool was a matter of:
# zpool create tank raidz c6t0d0 c6t1d0 c6t2d0 c6t3d0
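
A note on the size: single-parity raidz spends roughly one drive's worth of space on parity, so usable capacity is about (number of drives - 1) x drive size. A quick sketch of that arithmetic follows. The 750 GB drive size is an assumption for illustration only, and raidz_usable_gb is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: rough usable size of a single-parity raidz vdev.
# Usage: raidz_usable_gb DRIVES DRIVE_SIZE_GB
raidz_usable_gb() {
    echo $(( ($1 - 1) * $2 ))
}

raidz_usable_gb 4 750    # prints 2250 (750 GB is an assumed drive size)
```

The number zfs actually reports will be somewhat lower once metadata overhead and the GB vs. GiB difference are taken into account.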

Then I downloaded and installed the SMB server, and created a zfs filesystem for sharing stuff (which is covered in several places online, so I won't go into it here).

And the story comes to its conclusion... I have a 2TB RAID storage server sitting on my network, keeping our photos safe and happily giving movies/media to TVersity (on my primary computer) to serve to my PS3.

Next steps:
  • Before anyone mentions that my stuff still isn't 100% safe -- I'm in the process of figuring out off-site storage for the photos. I may write something (or find something someone else has written) to synchronize my photo filesystem with flickr.com.

  • Obtain another 2.5" hard drive and add it to the operating system's zpool as a mirror (I'll probably have to unplug the optical for this).

  • Get a low-cost managed switch and set my two gigabit network adaptors up with link aggregation.


[edited because I accidentally labeled this "Part 2" instead of "Part 3"; I also forgot a couple commands in the configuration part]

5 comments:

Anonymous said...

what a disaster. Sun should really have a centralized web page that tracks the freedom status of each piece of Solaris, and of OpenSolaris/Indiana. You should be able to type in a pathname and a build number, and get either a link to the source code used to build that object, or a message that it's not available. That would be incredible. obviously that's kind of a dream, but the fact that this sort of software map doesn't exist in any form makes me distrustful as well as inconvenienced, because I constantly see Sun benefitting from common people's assumption that Solaris is far more open than it actually is.

They assume all integrated drivers are open-source and only non-integrated downloadable drivers are closed. or they assume that all drivers in opensolaris are open, while some drivers in SXCE are closed. nope.

The thread you quoted says there are two drivers for your card---one that is open-source and integrated, and one that is closed-source and not integrated. You're using the closed-source driver.

On my desktop I'm using an open-source non-integrated driver from Masayuki Murayama. And of course there are many integrated, closed-source drivers that Sun developed under NDA or simply never got around to opening, in both SXCE and opensolaris. The ones in opensolaris are redistributable while some in SXCE can only be downloaded directly from Sun with click-wrap and your SDC web-user-login. but there is still huge amounts of proprietary crap in opensolaris that's not in linux.

Sun aside, in general this broadcom chip has been a pain in my ass on every OS. My initial impression like six years ago was that it was technically superior to the intel gigabit chip, but there are just so many regressions between revisions, of chips, of drivers, of chips with fiber PHY instead of copper, of the chip on non-i386 architecture, it's a real time-waster. especially when the problems are so subtle, like ``poor performance'' or ``drops out for a few seconds every two hours.'' I bet the broadcom chip generates more mailing list and forum postings than any other ethernet MAC ever.

Anonymous said...

Hi Benjamin,

thank you very much for your detailed report. I am very interested to hear about your experiences with the NAS, now that it is running. I am planning to build a similar box with the same case and OS (maybe NexentaCore) so any reports about performance or further issues are appreciated.

Udo

Anonymous said...

Many many thanks for posting this. I have an almost identical setup but using the KINO-690AM2 board and I've had intermittent ethernet problems since I upgraded from b78. Generally a warm or cold reboot would bring the ethernet up, but it could take several goes to do it. Adding the setenv has fixed that.

Re your other problem, my Chenbro and the KINO have had no problems with Solaris recognising the backplane.

Andy

Anonymous said...

Update: from snv_94 the 5787 is supported by bge.

Works fine on my nas.

BA Ellison said...

fantastic, thanks Andy! Now I have to figure out how to upgrade.