In September 2005, I had the opportunity to exchange e-mail with Halil Demirezen on the subject of getting to know the FreeBSD kernel. Since the answer seemed of more general interest, I thought I'd post it on my web site in case it was useful to anyone else.
From rwatson at FreeBSD.org Tue Sep 6 14:10:10 2005
Date: Tue, 6 Sep 2005 14:10:10 +0100 (BST)
From: Robert Watson <rwatson at FreeBSD dot org>
To: Halil Demirezen <halil at enderunix dot org>
Subject: Re: One dummy question but important for me.

On Tue, 6 Sep 2005, Halil Demirezen wrote:

> It is one little sample of hundred thousands of silly questions. But for
> an individual it is really important one. I have been dealing with
> kernel (/usr/src/sys/) in FreeBSD. However, this is really a big
> challenge. My graduate thesis was a little i386 kernel that boots from
> Floppy Disks (3 1/2). Run in protected mode. A shell little memory
> management and multiprocessed one.
>
> I am coping with FreeBSD kernel to understand and to hack it. Where to
> start? It is really difficult to understand the essentials. I am sure i
> will spend my time on it. But what are the first essentials for a kernel
> developer to be in the developing. I know it is easy for one who started
> developing FreeBSD from the beginning. But as the code grows, It starts
> to be impossible to begin. I tried to start from booting process. But
> the kernel structure got lost in mind. When I try to understand one
> source file, I need to open several to unite the all.

...

Halil,

Well, the FreeBSD kernel is a huge piece of software -- I've been working with it fairly continuously for about 8-10 years now, and only feel like I really understand a fraction of it. On the other hand, it turns out that partial understanding is still good enough to do some useful work :-).

There are a couple of ways to go about learning about the structure of the kernel. If you don't already have a copy of McKusick and Neville-Neil's The Design and Implementation of the FreeBSD Operating System, you should see if you can get one. While it's hardly an introductory guide to the kernel, it does a good job of covering many important subsystems in depth.
I find it more useful as reference material than reading material, but it's quite helpful.

Another way is to go at it from the perspective of reading source code. If you've skimmed or read the above book, I would recommend using one of a number of source code browsing web sites to start exploring the kernel. There are a couple of places you might begin, depending on what you want to learn. Here are a few you might try:

http://fxr.watson.org/fxr/ident?i=mi_startup

mi_startup() is the machine-independent system boot function. It runs after the low-level hardware has been initialized, and is responsible for running the functions registered with SYSINIT(), the boot-time registration mechanism. Various kernel components use SYSINIT() macros to declare code that has to run at boot. These are ordered by subsystem ID, and then by an order value within each subsystem. You can find a list of the subsystems and their ordering here:

http://fxr.watson.org/fxr/ident?i=sysinit_sub_id

You may also be interested in the proc0 initialization functions -- process 0 becomes the kernel process and swapper, and is set up by mi_startup() via sysinit:

http://fxr.watson.org/fxr/ident?i=proc0_init
http://fxr.watson.org/fxr/ident?i=proc0_post

Another important starting point is the kernel code for process 1. This is the process that runs init(8), which in turn kicks off user space:

http://fxr.watson.org/fxr/ident?i=create_init
http://fxr.watson.org/fxr/ident?i=start_init
http://fxr.watson.org/fxr/ident?i=kick_init

The start_init() function runs in process 1, which is created by create_init(), itself run as a sysinit in process 0. Once init is created, it is started by kick_init(), also run by process 0 as a sysinit. Prior to mi_startup(), machine-dependent boot code runs.
The details of that machine-dependent code vary a lot by platform, but it is pretty much always named "locore", and is the entry point for the kernel loaded by the boot loader:

http://fxr.watson.org/fxr/source/i386/i386/locore.s

It is generally responsible for getting the kernel set up to execute, laying out kernel memory, preparing stacks, and so on. There's a book by Jolitz called "Kernel Source Code Secrets", in two volumes, which describes the early 386BSD kernel. While the code has changed quite a lot since then, it can still make useful reading for understanding locore and a number of other parts of the kernel.

Sysinits are particularly interesting for understanding the boot process because they rely on kernel linker sets and hook mechanically into both the boot process and the module load process. Sysinit functions are run in the order defined by the sysinit_sub_id enumeration, and you'll find that many kernel subsystems register their components using macros wrapped around SYSINIT(). For example, VFS_SET() is a macro that declares the initializers for virtual file system implementations, allowing the file systems to register themselves for later use:

http://fxr.watson.org/fxr/ident?i=VFS_SET

Another way to view the kernel and explore the source is from the perspective of its steady state: offering services to user space. The most useful starting point here is:

http://fxr.watson.org/fxr/source/kern/syscalls.master

This is the master system call definition file, from which other files, such as init_sysent.c, are generated. Each system call listed in syscalls.master is implemented by a kernel function of the same name. For example, sync() is declared in syscalls.master, and implemented by a function named sync() in vfs_syscalls.c:

http://fxr.watson.org/fxr/ident?i=sync

So you can start there to investigate the code paths taken by particular system calls. Of course, system calls aren't the only way to enter the kernel.
A variety of traps exist, some leading to the VM system, others to signal handlers, math emulators, interrupt handlers, and so on. The low-level trap code is machine-dependent, but is usually implemented by a function named trap():

http://fxr.watson.org/fxr/ident?i=trap

It calls into many other things, including the system call vector for the process (generally derived from syscalls.master, or from an emulator's variant of it), the page fault handler, and so on:

http://fxr.watson.org/fxr/ident?i=trap_pfault

On return from a system call (and in various other circumstances), userret() is also interesting:

http://fxr.watson.org/fxr/ident?i=userret

Hopefully this is a useful starting point for browsing?

Robert N M Watson