My current troubles…

March 24th, 2007

Wow … I just realized that it’s been a long time since I wrote a post here. As usual, I blame this on being busy. I’ve been doing a lot of scripting and programming at work now for various projects, which has been fun. It has, however, gotten me stuck for long periods of time in front of the computer doing the usual write/test/debug cycle.

Back on the “home front,” as it were, I have other things that have been eating my time, to wit:

1. Windows XP is steadfastly refusing to boot on my laptop. I fear I’ve borked it up badly enough that I’m going to have to do a restore off the IBM rescue partition, which may or may not nuke my Kubuntu install … ugh.

2. Debian is steadfastly refusing to install on my test NSLU2 (it worked fine on the one I actually have in production). I need to find some time to debug this.

3. I’m working on putting together specs of a new and very high end desktop for myself. Extravagant? Yes! But I feel like I’ve earned it by now.

The NetBSD boot process - part 4

February 19th, 2007

I know that this is way late, but I’ve finally found some time to sit down and write some more of my guide to the NetBSD boot process. I should note before I go much further that the NetBSD project itself has a kernel internals guide and a kernel programming FAQ that will be of interest to anyone trying to understand the inner workings. I haven’t had much time to pore over them in great depth, but they look really good.

Anyhow, as promised last time, we’ll start with the init386 routine in sys/arch/i386/i386/machdep.c. The first thing that happens here is a call to cpu_probe_features (which is in identcpu.c). This function calls a couple of other routines that use assembler magic to read the CPU ID and feature set, using the cpuid instruction on processors that support it (both the Intel and AMD documentation describe how this is done for each of their particular chips). The next 3 lines of code set up the skeleton for process 0. This includes setting up its memory area and PCB (process control block, which as I understand it holds the CPU registers, etc. for the process). This is then put in the cpu_info_primary structure, which is a cpu_info struct. The cpu_info struct is defined in sys/arch/i386/include/cpu.h and includes information as to the owner of the CPU, its ID, APIC ID, trap information, and other low-level stuff. Therefore, in init386 we immediately set up process 0 (the swapper) to own the boot processor. However, we have not actually switched control over to this process.
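To make the “feature set” part a bit more concrete, here is a minimal sketch of how a feature word could be decoded once it has been read from the processor. The bit positions are the standard ones for the EDX register returned by cpuid leaf 1; the table is only a small subset, and the function name is my own invention, not anything from the NetBSD source.

```python
# Subset of the feature bits reported in EDX by cpuid leaf 1.
# The names and bit numbers follow the Intel/AMD manuals; the table is
# deliberately incomplete.
CPUID_EDX_FEATURES = {
    0: "FPU",
    4: "TSC",
    5: "MSR",
    6: "PAE",
    9: "APIC",
    23: "MMX",
    25: "SSE",
    26: "SSE2",
}


def decode_features(edx):
    """Return the names of the known feature bits set in edx, lowest bit first."""
    return [
        name
        for bit, name in sorted(CPUID_EDX_FEATURES.items())
        if edx & (1 << bit)
    ]


# Hypothetical feature word with only the FPU, TSC and MMX bits set.
sample_edx = (1 << 0) | (1 << 4) | (1 << 23)
print(decode_features(sample_edx))
```

The real kernel code does this sort of test in C with bit masks, but the idea is the same: one bit per capability.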

Assuming that we’re not on an Xbox, the next step in the procedure is to initialize the x86 bus space. On a PC, communication with various devices occurs over a bus (examples being a PCI Express bus, the vanilla PCI bus, or for older systems the ISA bus). It is worth noting that both NetBSD and OpenBSD (not sure about FreeBSD) have the concept of a device tree. This arranges devices in a logical tree-like fashion. For example, a simplified device tree for my test NetBSD system has mainbus0 as the root, with cpu0 and pci0 hanging off of it, the various ATA buses hanging off pci0, and the actual hard drive and CD devices hanging off of them. (I had planned to draw this using some ASCII art, but WordPress’s HTML editing feature sucks so I could not get it to look right. I think I’ll try to draw it in Xfig and attach.)
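As a rough illustration of the tree idea, the simplified layout described above can be modeled as nested dictionaries and printed with indentation. This is only a sketch: the device names below mainbus0, cpu0 and pci0 (atabus0, atabus1, wd0, cd0) are assumptions for the example, and a real system has more layers than this.

```python
# Hypothetical, simplified device tree. Only mainbus0, cpu0 and pci0 come
# from the description in the text; the children of pci0 are assumed names.
DEVICE_TREE = {
    "mainbus0": {
        "cpu0": {},
        "pci0": {
            "atabus0": {"wd0": {}},
            "atabus1": {"cd0": {}},
        },
    },
}


def render(tree, depth=0):
    """Return the tree as a list of lines, indented two spaces per level."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


print("\n".join(render(DEVICE_TREE)))
```

Each device is attached to exactly one parent, which is what makes the tree a convenient way to think about how the buses hang together.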

Due to architectural similarities, the x86_bus_space_init function is shared with the amd64 port and thus not under the sys/arch/i386 directory. Instead, it is in sys/arch/x86/x86/bus_space.c. Really, all this function does is set up the memory mapping for the I/O ports and the memory maps for devices (there are two extents created, one for the ports and one for the memory maps, which I assume are used for things like memory mapped I/O). These are very generic, and I imagine the memory actually gets associated with devices when the aforementioned device tree is populated.
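For anyone who has not met the term, an extent is essentially a bookkeeping structure for a numeric range that remembers which sub-ranges have been handed out. The toy class below is my own sketch of that idea and is not the kernel’s extent API; the class name, method name, and the serial port example are all assumptions made for illustration.

```python
class Extent:
    """Toy range manager: tracks which sub-ranges of [start, end] are taken."""

    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end
        self.regions = []  # sorted list of (first, last) pairs, inclusive

    def alloc_region(self, start, size):
        """Reserve [start, start + size - 1]; raise ValueError if impossible."""
        end = start + size - 1
        if size <= 0 or start < self.start or end > self.end:
            raise ValueError("region lies outside the extent")
        for first, last in self.regions:
            if start <= last and first <= end:
                raise ValueError("region overlaps an existing allocation")
        self.regions.append((start, end))
        self.regions.sort()


# Two extents, as in the text: one for I/O ports, one for device memory.
# The x86 I/O port space is 16 bits wide; the memory range assumes a
# 32-bit physical address space.
ioport_ex = Extent("ioport", 0x0, 0xFFFF)
iomem_ex = Extent("iomem", 0x0, 0xFFFFFFFF)

# Hypothetical driver claiming the eight ports of a legacy serial port.
ioport_ex.alloc_region(0x3F8, 8)
```

The point of keeping two separate extents is that port numbers and memory addresses are different namespaces, so a claim in one can never collide with a claim in the other.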

This is getting long enough for the moment, so we’ll finish with the init386 function next time. As a preview, most of the rest of this function deals with setting up and populating the physical memory map. If you’ll recall from last time, we’ve set up the basic structures for the virtual memory subsystem and now it seems like it’s time to actually get into the nuts and bolts of getting memory ready for allocation by kernel and user processes.

Wireless troubles

February 12th, 2007

I’ve been having some strange trouble with my home wireless network lately. Everything seems to work OK in Windows XP, but Kubuntu Linux just can’t see my access point despite having the correct ESSID and WEP key. I think it might be Kubuntu itself screwing up, so I plan to try with Knoppix at some point. My card is a well supported Atheros that has worked perfectly well in the past. This is probably my number one minor irritant at the moment (combined with the usual major irritants at work).

I still don’t have a lot of energy for writing blog posts. Hopefully I’ll get back on with it fairly soon.

Busy at work again…

January 25th, 2007

Sorry for the delay in the next NetBSD kernel article. I’ve been busy at work some more. Today I took down one of our production file servers, swapped the SCSI card back into a different control node, and brought it back on line (the control node swapped back in was the original which had suffered a power supply failure some weeks ago but is now fixed—meanwhile a backup node was in its place).

I also have a number of longer term projects which are coming to critical points but I will try to have something written up this weekend.

NSLU2 update

January 16th, 2007

Over the long weekend I installed Debian on my NSLU2 (the little NAS device I’ve blogged about earlier). The installation was very easy. The only problem I had was resolved when I realized that I was using the wrong installer image. I was able to put the system into recovery mode and use upslug2 to re-flash using the correct Debian install firmware (for Etch RC1). During the install I selected the file server packages and got NFS and Samba set up and working just as they were using the Unslung firmware.

The main problem is I still get corruption when copying large (> 200 MB) files over NFS. This doesn’t occur over SCP, so I wonder if it’s something in the NFS stack. I need to check if it’s using UDP or TCP by default. If UDP, I will switch to TCP. The corruption is also accompanied by some NFS errors in dmesg. If I can’t figure it out, I will ask on the most excellent nslu2-linux mailing list. Anyone who has one of these should definitely check out nslu2-linux.org.
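One way to confirm this kind of corruption is to compare a checksum of the original file against the copy that came across NFS. The following is a minimal sketch using Python’s standard hashlib; the function names are my own and the choice of SHA-256 is arbitrary.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MB chunks so large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """Return True if both files have identical contents (by SHA-256)."""
    return sha256_of(original) == sha256_of(copy)
```

Running copies_match on the local file and the copy on the NFS mount, once after an NFS copy and once after an SCP copy, would show whether only the NFS path damages the data.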

I was hoping to have another NetBSD internals article this weekend, but didn’t get around to writing it … maybe in a week or so.

NetBSD kernel internals: booting part 3

December 30th, 2006

Hi All! It’s time to get back to discussing the boot process of NetBSD. In my last post, I described how the boot code jumps to the kernel. We left off with the jump to the kernel entry point in sys/arch/i386/i386/locore.S, so we’ll pick it up there.

Note that locore is all assembly. The first thing this file does is check whether it was booted by a multiboot compliant bootloader (e.g. GRUB) or NetBSD’s own bootloader. This is important because the way the bootloaders pass information to the kernel differs. However, the multiboot structure won’t actually be parsed until later, after the kernel has relocated itself in its virtual address space. As a brief diversion, I will say the multiboot specification is a very handy thing based on my experience writing a couple of little toy OS initialization routines for the i386 processor. The multiboot specification defines where in memory the kernel can look for hardware information passed from the bootloader. This information includes stuff like different memory regions, total RAM installed, boot device, etc.
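To give a flavor of what a multiboot kernel looks at, here is a small sketch based on the Multiboot specification (version 0.6.x): the bootloader leaves a magic value in EAX and a pointer to an information structure whose first word is a set of flags saying which of the following fields are valid. The function name and the returned dictionary are my own; only the magic number, the flag bits, and the field order come from the specification.

```python
import struct

# Value a multiboot-compliant bootloader leaves in EAX.
MULTIBOOT_BOOTLOADER_MAGIC = 0x2BADB002

# Flag bits in the first word of the multiboot information structure.
MBI_FLAG_MEMORY = 1 << 0   # mem_lower and mem_upper are valid
MBI_FLAG_BOOTDEV = 1 << 1  # boot_device is valid


def parse_multiboot_info(eax, info):
    """Decode the first four fields of the info structure, or return None.

    `info` is the raw bytes of the structure; None is returned when the
    magic value shows the kernel was not started by a multiboot loader.
    """
    if eax != MULTIBOOT_BOOTLOADER_MAGIC:
        return None
    flags, mem_lower, mem_upper, boot_device = struct.unpack_from("<IIII", info)
    result = {}
    if flags & MBI_FLAG_MEMORY:
        result["mem_lower_kb"] = mem_lower
        result["mem_upper_kb"] = mem_upper
    if flags & MBI_FLAG_BOOTDEV:
        result["boot_device"] = boot_device
    return result
```

A real kernel does this in assembly or C on a structure in physical memory, of course; the sketch only shows the shape of the check and why the magic value has to be tested first.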

Anyhow, we’ll assume here that the kernel ascertains it was booted from its own bootloader (the process I described in parts 1 and 2 of this little series). In this case we call the C function native_loader, which places the arguments passed from the NetBSD bootloader into a known part of the address space. Again this is basic memory and device info. After this is done, locore.S goes through a rather complex process of determining what kind of processor the machine has. Naturally, this is an architecture-dependent operation, but for x86 processors there is an identification string that the code checks to see what kind of processor it is running on.

After this, locore goes and sets up the kernel address space. Copying and pasting from a comment in the source code, the address space looks something like:

text | data | bss | [syms] | page dir | proc0 kstack

Where the kernel text is at the lowest address and the kernel stack for process 0 (the kernel swapper thread) is at the highest address. The highlight of this is calculating the size for and setting up the boot tables (not sure what those do yet) and the page directory, and creating the initial page tables (page tables deal with virtual to physical address translation, and managing them is one of the critical functions of any protected mode OS kernel). Once the page tables for the kernel are set up, the appropriate access permissions are placed on the different segments (read only for the kernel text and read write for almost everything else). Once this is done, the page directory entries (PDEs) for the kernel are created and the processor is told to begin using the newly created page tables, thus completing the setup of the kernel virtual memory space. This is done by loading the address of the page directory into the cr3 control register, setting the paging bit in the processor control register cr0, pushing the appropriate address in the virtual space onto the stack, and doing a ret instruction to jump to it.

Following this jump, there is some more relocating to do, and the multiboot boot header is safely relocated at this point if necessary (see above). Then the code calls the init386 C routine in sys/arch/i386/i386/machdep.c. However, this is getting long for a blog post, so I’ll start with that next time.
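Since the page directory and page tables came up above, here is a small worked example of how a 32-bit virtual address is split up under classic two-level i386 paging with 4 KB pages (no PAE). The sample address is an assumption I picked for illustration, not a value taken from the NetBSD source, and the function name is my own.

```python
PAGE_SHIFT = 12       # 4 KB pages: the low 12 bits are the offset in the page
PDE_SHIFT = 22        # each page directory entry covers 4 MB
INDEX_MASK = 0x3FF    # 10 bits, so 1024 entries per directory or table
OFFSET_MASK = 0xFFF

CR0_PG = 1 << 31      # the paging-enable bit in cr0


def split_vaddr(va):
    """Return (page directory index, page table index, page offset)."""
    pde_index = (va >> PDE_SHIFT) & INDEX_MASK
    pte_index = (va >> PAGE_SHIFT) & INDEX_MASK
    offset = va & OFFSET_MASK
    return pde_index, pte_index, offset


# Example address, assumed for illustration only.
pde, pte, off = split_vaddr(0xC0100000)
print(pde, pte, off)  # prints: 768 256 0
```

The processor does this split in hardware on every memory access once the paging bit is set, using cr3 to find the page directory; that is why the directory has to be fully populated before cr0 is touched.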

I would really enjoy hearing feedback on this little series of articles. In particular, if anyone familiar with the NetBSD code sees any mistakes please let me know so I can publish an erratum. As a bit of background, I’m trying to take up studying the NetBSD kernel as a little hobby to refresh my memory of my graduate OS course and in the hopes of being able to contribute to the BSD projects. I also feel that while there are book length studies of the FreeBSD, Linux, and Windows NT kernels, a little study of a kernel like NetBSD might be a useful resource :-) .

Happy Holidays!

December 23rd, 2006

I know … I’m massively behind on writing. I blame an absolutely terrible month at work in terms of work load (I think I get busier all the time; don’t we all). Still, I have a few days over Christmas so I’m hoping to pump another article out on my NetBSD booting walkthrough. I just built the latest version of -CURRENT (4.99.7) so I should be ready to go.

Damn, I’m good

November 23rd, 2006

A timeline of my early Thanksgiving morn

1:30 AM: I notice that my shell session on my OpenBSD router has stopped responding and I can no longer SSH in, although the machine is still forwarding packets.

1:45 AM: I try to hard reset the machine. It does not come back up.

1:50 AM: Attaching a keyboard and monitor, I rapidly discover that the problem is a failed hard drive that seems to have corrupted the /usr and /var partitions to the point that the kernel panics when trying to mount either of them. From the debug messages, the root cause appears to be disk damage.

2:05 AM: I swap the network cards into a spare machine that had been running an old FreeBSD 7 snapshot. VERY fortunately I had purchased a complete install set of OpenBSD 3.9 so I had an actual CD with all the sets (this was done for just this eventuality). I commence the OpenBSD install.

2:25 AM: After wasting a few minutes remembering how to run disklabel, the install is finished. Note: I have started running ntpd so I set it to start on boot.

2:27 AM: First boot; notice I have the network cards in backwards. I swap the Ethernet cable and all is well.

2:30 AM: I create my own account so I can stop using root. For a change, I remember to put myself in the wheel group.

2:31 AM: After a keyboard swap, I ssh to my main workstation and grab the most recent backup of this host’s /etc directory. I swap in the necessary config files to get the PPPoE working properly, remembering to change the interface names to reflect the now swapped cards.

2:32 AM: Success #1, I can ping hosts on the Internet.

2:33 AM: Forgot to change the interfaces in pf.conf, so my first attempt to start PF freezes my ssh connection. Back to the console to unlock the machine and change the offending setting.

2:35 AM: PF is up but strangely packets aren’t being forwarded.

2:42 AM: After several minutes of head scratching, I realize I am a dolt and forgot to use sysctl to turn on packet forwarding in the kernel. This accomplished, I am back online!

There—an hour and 12 minutes from system hardware failure to back up and running on a new host. This proved to me I know my OpenBSD (although having a backup of /etc so I didn’t have to rewrite config files from scratch was a huge help—always make back-ups). Frankly, given the beat up old hardware I use for these tasks, I know these situations will come up (this has happened once before and caused a lot more pain).

NetBSD boot part 2: the kernel load

November 19th, 2006

I have been away in Tampa, FL for the past week at Supercomputing ‘06, which was a real blast. I say this because there was free alcohol, although that was hardly the sole reason. Anyhow, I imagine I will write a post about that later. Meanwhile, I wanted to continue my little guide through the NetBSD kernel. In the last little piece, we discussed the MBR and the primary and secondary boot loaders. Now let’s go on and discuss what happens when the initial bootstrap has completed and the time comes to start the kernel.

The /boot program itself locates a suitable kernel image and then calls the bootit function, which essentially just calls exec_netbsd. This function does not return unless there’s some sort of error. The exec_netbsd function is located in sys/arch/i386/stand/lib/exec.c (for i386 machines). This code gathers some memory information and calls the loadfile function (the code for which is in sys/lib/libsa/loadfile.c) to actually load the kernel file (which is just a normal ELF binary). This gets loaded into the address passed from bootit, and the exec_netbsd function gathers some additional info for the kernel into the btinfo_symtab structure (this is put into a well known address so that the kernel can access it). The exec_netbsd function then calls startprog. This function sets up the kernel stack, passes the arguments to the kernel onto it, sets up all of the data segments and then actually jumps to the kernel. It does this by pushing the entry address for the kernel onto the newly-created stack and then using lret to cause the processor to jump to it. Note that at this point the processor has already entered protected mode and the data space is now the standard flat 4 GB pageable address space.
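The entry address that gets pushed onto the stack comes from the ELF header of the kernel file. As a rough sketch of where that number lives, the following reads the e_entry field out of a 32-bit little-endian ELF header using the field layout from the ELF specification. The function name is my own, and this is not how loadfile itself is written; it is only meant to show the header layout.

```python
import struct

# Elf32_Ehdr layout: e_ident[16], e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
# e_shnum, e_shstrndx (52 bytes in total).
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")

ELFCLASS32 = 1   # e_ident[4]: 32-bit objects
ELFDATA2LSB = 1  # e_ident[5]: little-endian encoding


def elf32_entry_point(header_bytes):
    """Return e_entry from the first 52 bytes of a 32-bit LSB ELF file."""
    if len(header_bytes) < ELF32_HEADER.size:
        raise ValueError("header too short")
    fields = ELF32_HEADER.unpack_from(header_bytes)
    ident = fields[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != ELFCLASS32 or ident[5] != ELFDATA2LSB:
        raise ValueError("not a 32-bit little-endian ELF file")
    return fields[4]  # e_entry
```

Feeding it the first 52 bytes of an i386 kernel image should give back the address that the bootloader ultimately jumps to.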

The main i386 kernel entry appears to be in sys/arch/i386/i386/locore.S (it actually took me a little bit of time to find this). This code sets up the initial mappings and page entries before calling the kernel’s main C function (in sys/kern/init_main.c). I’ll detail that in my next post, which hopefully won’t take 2 weeks to show up.

NetBSD kernel hacking

November 7th, 2006

I’ve decided to do some more tinkering around with *nix kernels, this time taking a bit of a look at NetBSD. I have built a bleeding edge (4.99.3—no really) NetBSD box using an old PII 233 with 64 MB of RAM (I really get use out of old hardware!) and have started digging through some of the guts.

NetBSD’s boot loader is an interesting study. The MBR bootsector code (in sys/arch/i386/stand/mbr for the i386 architecture) jumps to the bootstrap code at the beginning of the BSD partition (in sys/arch/i386/stand/bootxx). The code in pbr.S (the partition boot record) loads the primary bootstrap code into memory. There is a bit of ASM here (bootxx.S) that then calls the boot1 function (in boot1.c). This code apparently figures out the on-disk location of the secondary bootstrap program (which is just a file on the / BSD slice called /boot [not to be confused with the /boot directory on some OSes]). This secondary bootstrap program then is responsible for loading up the kernel image.
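To illustrate the very first step, here is a minimal sketch of how the classic MBR partition table can be scanned for a NetBSD partition. It relies on the standard MBR layout (four 16-byte entries at offset 446, the 0x55 0xAA signature at offset 510) and on 0xA9 being the partition type ID used by NetBSD. The function name is my own, and the real MBR code is of course a few hundred bytes of assembly rather than Python.

```python
import struct

SECTOR_SIZE = 512
PART_TABLE_OFFSET = 446   # start of the four 16-byte partition entries
SIGNATURE_OFFSET = 510    # the 0x55 0xAA boot signature lives here
NETBSD_PART_TYPE = 0xA9   # MBR partition type ID used by NetBSD

# One entry: status, CHS start (skipped), type, CHS end (skipped),
# first sector (LBA), sector count.
PART_ENTRY = struct.Struct("<B3xB3xII")


def find_netbsd_partition(mbr):
    """Return (index, first_lba, sector_count) of the first NetBSD
    partition in a 512-byte MBR, or None if there is none."""
    if len(mbr) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    if mbr[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature")
    for index in range(4):
        offset = PART_TABLE_OFFSET + index * PART_ENTRY.size
        status, ptype, first_lba, count = PART_ENTRY.unpack_from(mbr, offset)
        if ptype == NETBSD_PART_TYPE:
            return index, first_lba, count
    return None
```

The sector at first_lba is where the partition boot record mentioned above would be read from.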

Anyhow, that’s a quick tour of how things work “under the hood” in a NetBSD boot, as mostly figured out by me looking through the source code (so hopefully I didn’t get too much of it wrong ;-) ). I’m planning to post more entries like this as I continue to explore the NetBSD source.

Firefox 2.0

October 29th, 2006

I installed Firefox 2.0 the other day. So far it seems to work pretty well and in particular I LOVE the in-line spell-check. It’s very handy for things like GMail chat and writing blog posts (not that I do that very much). With any luck, it will reduce the number of stupid typos I always seem to put into things.

One thing I noticed was that some of the fonts were off in various Web pages that I like to visit. Going to freshmeat.net and installing the webcore-fonts seems to have solved this problem. I had them installed on my work workstation for a while and I guess I just never bothered putting them on my home system. It’s interesting that I didn’t seem to need them under 1.5.

Also, the Flash 9 beta didn’t work well with Firefox 2 for me (kept crashing), despite being rock solid under 1.5. One of my friends at work reported no problems, however, so I’ll have to have a closer look at it sometime.

Back in the saddle

September 30th, 2006

After a hectic month getting new machines in and the computer room rearranged, I got a couple of weeks off to see family in South Dakota and Colorado. We also went down into New Mexico and saw Carlsbad Caverns and Bandelier National Monument. That was probably the most fun for me of the whole trip. Maybe I should consider giving up tech and becoming a park ranger :-) .

Anyhow, I’ve been sort of considering what to do with the blog, since it just kind of runs on fumes now. Work is still hectic, though, so I doubt I’ll be able to do something before the end of the year.

No entries

September 4th, 2006

I’ve had an extremely busy/stressful month of work, which is why there are no entries here. There will probably be more at some point, but not right away.

Install CentOS 4.3 completely diskless

August 9th, 2006

The other day at work I received a machine with no floppy or CD-ROM. Of course, the NIC supported PXE booting so I set to work trying to install CentOS 4.3 via the network. Unfortunately, the documentation isn’t terribly good for people who haven’t done this sort of thing before, so I decided to write this little guide. It assumes that your NIC is capable of talking PXE.

Oracle on Ubuntu

August 6th, 2006

I managed to get Oracle going on Kubuntu. My only experience with Oracle is with a little work we did in my J2EE class back in grad school at Chicago, but I’m interested in learning more. I’m not really going to do anything more with this other than play around, but it is somewhat cool to say I have Oracle running on my laptop.

Looking for a Linux NetFlow collector that doesn’t suck

July 30th, 2006

This is my latest quest. I’ve been trying to convert my old Debian rig into a NetFlow collector to process the flows generated by pfflowd on my OpenBSD router. Right now I’m running an old-ass (3.0) version of NTop. This has the problem that only active hosts (like my workstation) stay in the database. The other hosts on my LAN mysteriously disappear from the records when they’re not sending out packets.

I tried this piece of free-as-in-beer software from ManageEngine, and while the setup was pretty slick (except for me being a moron about MySQL), it didn’t actually detect any of the NetFlow packets, despite the fact that a tcpdump showed them being sent to the box. I guess there must be some little inconsistency about the way pfflowd does things that made it choke. Pity.

I’ve tried the flow-tools and flowscan, but (a) they’re rather primitive and (b) I’ve never actually managed to get them working correctly. The flow-collector actually does collect the flows but flowscanner chokes making the pretty pictures. This is with the stock Debian stable package. I played with compiling them myself on NetBSD some time back but without much of a different result.

I’m going to keep looking, but at this point I’m getting increasingly tempted to write my own (sounds like another project I’ll start and never finish).

Kubuntu

July 16th, 2006

I decided I really wanted to give something Ubuntu-ish a try, so I installed Kubuntu Dapper Drake on my Thinkpad T42p. So far, I’m fairly impressed by it, although I did find the partitioner in the GUI installer to be somewhat confusing (I think I nuked the suspend partition by mistake, but it doesn’t really bother me since I tend not to use software suspend). Otherwise, Kubuntu detected my wireless card (based on the Atheros chipset) correctly and got the wireless LED on my laptop working (it did not work under Slack 10). I do notice that network performance seems slightly slower under Kubuntu than Slackware. I have heard that disabling IPv6 support can help that, so I want to try that idea soon.

KDE 3.5.2 is impressive as usual (I run it on Slackme under Slack 10.2 so it is nothing new to me). The display was set up to the proper resolution (1280×1024) automatically, which was also a nice touch. In terms of applications, most basic things seem to be on board except that Firefox must be installed by hand (via apt-get, as this is a Debian-based system after all). I’m typing this in Konqueror while waiting for that to happen.

I’ll post if I notice anything further interesting about Kubuntu…

Quick note re Digg colorizer script

July 1st, 2006

I know that the update to Digg 3.0 broke my script. I am going to try and fix it this weekend.

I’m not dead

July 1st, 2006

It’s the truth. I’ve just been preoccupied. My life has taken a rather interesting (by my low standards at least) turn in the last couple of weeks. I’ve been doing a lot of serious thinking about the future as a result. I can’t really say too much about it (I really don’t like my blog to be a window on my soul—I guess I’m too private or whatever).

Plus, I’ve been working on all the planning at work. Maintaining a Beowulf cluster, planning for upgrades, making sure the physical infrastructure is in place, etc. is a challenging job. I’ve got my plans pretty much set for the latest upgrades (hardware to watch: dual core Opterons) but it never ends (I’m already wondering when AMD will launch a response to Woodcrest). It’s going to be an interesting few months for me.

GreaseMonkey scripts

June 17th, 2006

I’ve taken it upon myself to learn a little bit about GreaseMonkey, which is a method of easily writing extensions for Firefox. I actually managed to write a script that works halfway respectably, and I’ve gone ahead and registered at userscripts.org, where you can see my page. My one script so far colorizes digg.com links based on category.

I’ve also updated my home page, mostly to share this wonderful and exciting news.