This is the printer-friendly version of 'reference information' section.




Files & Filesystems

Executable File Types


The number of different executable file types is as many and varied as the number of different image and sound file formats. Every Operating System seems to have several executable file types unique to itself. This part of the FAQ will give a brief rundown on the various types you will come across.

A quick intro to a few terms:

  • TEXT is the actual executable code area,
  • DATA is "initialised" data,
  • BSS is "un-initialised" data.

The BSS (historically "Block Started by Symbol") need not be present in an executable file. At load time, the loader still allocates memory for it and fills this memory with zeroes (this is assumed by C programs, for instance).

If you're looking for comprehensive information, consider using the Programmer's File Format Collection and the Linkers and Loaders online book. You can also check Pierre's Library.

EXE (DOS "MZ")

DOS-MZ was introduced with MS-DOS (though not with DOS v1) as a companion to the simpler DOS COM file format. DOS-MZ was designed to be run in real mode and reflects this: its relocation table consists of SEGMENT:OFFSET pairs. It is a very simple format that can be run at any offset, and it does not distinguish between TEXT, DATA and BSS. Since it was designed to run in real mode, the combined size of code, data and BSS is limited to 1 MB.

Operating Systems that use it: DOS, Win*, Linux DOS Emu, Amiga DOS Emu

EXE (Win 3.xx "NE")

The executable format designed for Windows 3.x was the "NE" (New Executable) format. Again a 16-bit format, it lifted the maximum size restrictions of DOS-MZ.

Operating Systems that use it: Windows 3.xx

EXE (OS/2 "LE")

The "LE" (Linear Executable) format was designed by Microsoft for IBM's OS/2 operating system. It supports both 16-bit and 32-bit segments.

Operating Systems that use it: OS/2, Watcom Compiler/Extender (DOS)

EXE (Win 9x/NT "PE")

Included from PeBinaries

With Windows 95/NT, a new executable file type was required. Thus was born the "PE" (Portable Executable) format, which is still in use. Unlike its predecessors, WIN-PE is a true 32-bit file format that supports relocatable code. It does distinguish between TEXT, DATA, and BSS. It is, in fact, a bastardised version of the COFF format.

If you have set up a Cygwin environment on your Windows machine, "PE" is the target format of your Cygwin GCC toolchain. This causes the unaware some headaches when trying to link parts built under Cygwin with parts built under Linux or BSD (which use the ELF target by default). (Hint: you have to build a GCC cross-compiler...)

Operating Systems that use it: Windows 95/98/NT, the Mobius.


Additional resources

  • MSDN documentation about PE format

ELF

Included from ElfBinaries

ELF (Executable and Linkable Format) was published as part of the Unix System V ABI and adopted by Sun for Solaris. A very versatile file format, it was later picked up by many other operating systems for use as both executable files and shared library files. It does distinguish between TEXT, DATA and BSS.

Documentation on ELF can be obtained e.g. at http://www.linuxbase.org/spec/refspecs/elf/, ftp://tsx.mit.edu/pub/linux/packages/GCC/ELF.doc.tar.gz, or various other sources.

Today, ELF is considered the standard format on Unix-like systems. While it has some drawbacks (e.g., using up one of the scarce general purpose registers of the IA32 when using position-independent code), it is well supported and documented.

Operating Systems that use it: Solaris, IRIX and IRIX64, Linux, *BSD, many many others...

A.OUT

A.OUT is the "original" binary format for Unix machines. It is considered obsolete today because of several shortcomings. However, as it is extremely simple and supported by many compilers/assemblers, it may be a good choice if you're willing to develop your own format or have more information than 'raw binary' for your bootloader.
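
Most of the formats above can be told apart by a few signature bytes at fixed positions. The following Python sketch is illustrative only: the function name identify_executable is made up for this example, and it relies on the commonly documented signatures ("MZ" at offset 0, a 32-bit pointer at offset 0x3C leading to an "NE", "LE" or "PE\0\0" header, and "\x7fELF" for ELF). A.OUT is left out because its magic numbers vary between systems.

```python
import struct


def identify_executable(data: bytes) -> str:
    """Guess the executable format from its signature bytes (sketch only)."""
    if data[:4] == b"\x7fELF":
        return "ELF"
    if data[:2] == b"MZ":
        # NE, LE and PE files start with a DOS-MZ stub; the 32-bit
        # little-endian field at 0x3C points to the newer header.
        if len(data) >= 0x40:
            (new_header,) = struct.unpack_from("<I", data, 0x3C)
            signature = data[new_header:new_header + 4]
            if signature == b"PE\x00\x00":
                return "PE"
            if signature[:2] in (b"NE", b"LE"):
                return signature[:2].decode("ascii")
        return "MZ"
    return "unknown"
```

A plain DOS program may hold arbitrary bytes at 0x3C, so anything that does not lead to a known signature is simply reported as "MZ".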

File Systems

Included from Filesystems

Tell me about Filesystems

Filesystems are the machine's way of ordering your data on readable and/or writable media. They provide a logical way to access the stuff that you have down on disk so that you can read or modify it. Which file system you use depends upon what you want to do with it. For example, Windows uses the FAT32 or NTFS filesystem. If your disk is really huge, then there's no point using FAT32 because the FAT system was designed in the days when nobody had disks as big as we do now. At the same time, there's no point using an NTFS filesystem on a tiny disk, because it was designed to work with large volumes of data - the overhead would be pointless for, say, reading a 1.44 MB floppy disk.

There are many different kinds of filesystems around, from the well-known to the more obscure ones. The most unfortunate thing about filesystems is that every hobbyist OS programmer thinks that the filesystem they design is the ultimate technology, when in reality it's usually just a bad copy of DOS FAT with a change here and there. The world doesn't need another crap filesystem. Investigate all the possibilities before you decide to roll your own.

FAT And its Variants

The File Allocation Table (FAT) filesystem was introduced with DOS v1.0 (and possibly CP/M), supposedly written by Bill Gates. FAT is a very simple filesystem which is nothing more than a singly linked list of clusters. FAT filesystems use very little memory and are among the most basic filesystems in existence today.

There are two versions of this simple FAT: FAT12 and FAT16. FAT12 was designed for floppy disks and can manage a maximum size of 16 MB using 12-bit cluster numbers. FAT16 was designed for early hard disks and could handle a maximum size of 64K clusters * cluster_size. The larger the hard disk, the larger the cluster size had to be, which led to large amounts of "slack space" on the disk.

FAT12 and FAT16 filesystems have fixed-size "8.3" filenames and limited support for file attributes. You could also read the FAT12 document, or check out the FAT tutorial reported by Kemp.
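
The size limits and the slack space problem come down to simple arithmetic. The sketch below is only an illustration with made-up helper names; it treats every cluster number as usable, so it gives an upper bound (a few values are in fact reserved as markers), and the 4 KB and 32 KB cluster sizes are assumptions chosen to show the effect.

```python
def max_volume_bytes(fat_bits: int, cluster_size: int) -> int:
    """Upper bound on volume size: number of cluster numbers * cluster size."""
    return (1 << fat_bits) * cluster_size


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes wasted in the last, partly filled cluster of a file."""
    return -file_size % cluster_size


# FAT12 with 4 KB clusters: 4096 clusters * 4096 bytes = 16 MB
fat12_limit = max_volume_bytes(12, 4096)
# FAT16 with 32 KB clusters: 65536 clusters * 32768 bytes = 2 GB
fat16_limit = max_volume_bytes(16, 32 * 1024)
# The same small file wastes far more space when clusters are large
small_cluster_slack = slack_bytes(3099, 512)
large_cluster_slack = slack_bytes(3099, 32 * 1024)
```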

VFAT

VFAT is an extension of FAT12 and FAT16 that adds the ability to use long filenames (up to 255 characters). First introduced by Windows 95, it uses a kludge whereby long filenames are stored in extra directory entries marked with the "volume label" attribute, placed in sequential directory entries alongside the regular 8.3 entry. (This is a bit of an oversimplification, but close enough.)

FAT32

FAT32 was introduced with Windows 95 OSR2 ("Windows 95B") and Windows 98. FAT32 solved some of FAT's problems. No more limit of 64K clusters! FAT32, as its name suggests, uses 32-bit cluster numbers and can nominally handle 4 G clusters per partition. This enables very large hard disks to still maintain very small cluster sizes and thus reduce slack space.

Is FAT32 really able to handle 4G clusters? Last time I looked at a FAT table, it seemed to have the last 4 bits cleared all the time, including the 'end of chain' tag, giving me the feeling that it was somehow more a "FAT28" than a "FAT32" -- PypeClicker

Correct - FAT32 is actually only FAT28. Top 4 bits are currently "Undefined" (according to Microsoft's specification). Note that this does not mean 'top four bits should be "0"' - they should be honored. This means that if you need to alter a FAT entry, and the top 4 bits contain 1001, you should write it back to disk containing 1001 and not 0000. -- djhayman
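
The rule about honoring the top 4 bits can be sketched in a few lines of Python. The names below are invented for this example and the code works on a single 32-bit entry value; reading and writing the table on disk is not shown.

```python
FAT32_VALUE_MASK = 0x0FFFFFFF     # low 28 bits: the actual cluster number
FAT32_RESERVED_MASK = 0xF0000000  # top 4 bits: undefined, must be preserved


def fat32_next_cluster(entry: int) -> int:
    """Extract the 28-bit cluster number from a raw FAT32 entry."""
    return entry & FAT32_VALUE_MASK


def fat32_update_entry(old_entry: int, new_value: int) -> int:
    """Store a new cluster number while keeping the top 4 bits as found."""
    return (old_entry & FAT32_RESERVED_MASK) | (new_value & FAT32_VALUE_MASK)


# An entry whose top 4 bits are 1001 keeps them after the update.
updated = fat32_update_entry(0x9ABCDEF0, 0x00000123)
```

With only 28 usable bits, the number of addressable clusters is bounded by 2^28 rather than 2^32.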

Inode-based File Systems

Inodes (information nodes) are a crucial design element in most Unix filesystems: each file is made of data blocks (the sectors that contain your raw data bits), index blocks (containing pointers to data blocks so that you know which sector is the nth in the sequence), and one inode block.

The inode is the root of the index blocks, and can also be the sole index block if the file is small enough. Moreover, as unix filesystems support hard links (the same file may appear several times in the directory tree), inodes are a natural place to store metadata such as file size, owner, creation/access/modification times, locks, etc.

HPFS (High Performance Filesystem)

HPFS was designed by IBM/Microsoft for IBM's then-new operating system, OS/2. It was designed to be fast, remove the shortcomings of FAT, support long filenames and small cluster sizes, avoid fragmentation as much as possible, and support more attributes.

HPFS is the precursor to NTFS and is, in a nutshell, NTFS minus the security features embedded in NTFS. Instead of storing cluster chains in a singly linked list, HPFS stores its information in sorted B-trees. This makes searching for files very fast.

Instead of keeping the directory tables and other descriptors at the start of the disk, HPFS bands them at regular intervals throughout the disk and in the middle of the disk, with the theory being that the heads only have to move half as much in any direction.

More information about HPFS can be found at: http://www.wotsit.org/download.asp?f=hpfs

NTFS (New Technology Filesystem)

NTFS is the native filesystem of Windows NT. It is much like HPFS, but supports security features in the filesystem such as access control. Since Windows NT is entirely Unicode, NTFS is a Unicode filesystem, each "character" being 16 bits wide. NTFS adds quite a bit more to HPFS than just security features, though. First, it adds quite a bit of built-in redundancy -- with HPFS, wiping out one sector in the wrong place can render an entire volume inaccessible. Second, it adds support for multiple hard links to a file (up until now, the only easy access has been via the POSIX subsystem, but NT 5/Win2K adds this to Win32 as well). Third, it supports an arbitrary number of file forks a la MacOS (except MacOS always has exactly 2 forks per file). Fourth, HPFS decrees that a cluster is always 512 bytes, and a cluster is always one sector. For the sake of performance and compatibility with some (especially Japanese) machines, NTFS allows sectors of other sizes. It also supports clusters of more than one sector, which tends to help performance a little.

NTFS is probably one of the most difficult file systems to deal with, mainly because of the lack of hacking experience and reliable documents about it. A stable read-only driver has been in the Linux source code base since kernel 2.4, while an experimental read-write driver comes with Linux 2.6.

The best information found about it so far is Andrew Tanenbaum's article. The Linux NTFS project also has some information about it, at http://linux-ntfs.sourceforge.net/ntfs/index.html. (Use the next / previous links at the top of the pages, or use the glossary.) You are welcome to add more.

ext2fs (Second Extended Filesystem)

The Second Extended Filesystem (ext2fs) was the default filesystem of Linux prior to the advent of the journaling file systems ext3fs and ReiserFS. It has native support for UNIX ownership / access rights, symbolic and hard links and other Unix-native properties. Like HPFS, it tries to minimize head movement by distributing data across the disk. Also, by using "groups", it minimizes the impact of fragmentation. It is another "inode" based system. An ext2fs partition is made up of blocks, which normally are 1K each. The first block (the boot block) is zeroed; all the other blocks are divided into so-called block groups (normally, between 256 and 8192 blocks form a group). Each block group contains:

  • a copy of the superblock (a very useful structure containing information about the filesystem as a whole);
  • the block group descriptors, which record where each group's bitmaps and inode table are located;
  • the block bitmap, which tells which blocks of the group are in use;
  • the inode bitmap, which tells which entries of the inode table are in use;
  • the inode table, which contains the inodes themselves;
  • the data blocks referenced by the inodes.

The first inode is a special one: it is the bad blocks inode, which references all the damaged blocks of the partition. The second inode is the root directory, and the fifth is reserved for the bootloader. Inode 11 is the first one available for ordinary files.

Windows users can access ext2fs partitions with explore2fs.

Additional information about ext2fs:

ext3fs (Third Extended File System)

ext3fs is basically ext2fs with journaling added. If your ext3fs partition does not need journal replay, it can even be accessed with a 'simple' ext2fs driver.

ReiserFS

ReiserFS - homepage at http://www.namesys.com - is a file system that is free if your OS is free, and comes at a licensing cost if your OS is not free. It has excellent performance on large directories and small files, using "dancing trees" instead of B-trees, and does meta-data journaling to improve file-system stability across system crashes.

Unlike "classic" filesystems, Reiser allows you to have files that occupy less than one sector on the disk (i.e. it can store several tiny files or tails of files on the same sector) through its tree organization.

As of version 3.6, ReiserFS supports:

  • max number of files - 2^32 - 3 => 4 G - 3
  • max number of files a dir can have - 2^32 - 4 => 4 G - 4, but in practice this value is limited by the hash function. The r5 hash allows about 1 200 000 file names without collisions
  • max file size - 2^60 bytes => 1 E, but the page cache limits this to 16 T on architectures with 32 bit int
  • max number of links to a file - 2^32 => 4 G
  • max filesystem size - 2^32 (4K) blocks => 16 T

Network-based File Systems

All these file systems are a way to create a large, distributed storage system from a collection of "back end" systems. That means you cannot (for instance) format a disk in 'NFS'; instead you mount a 'virtual' NFS partition that reflects what's on another machine. Note that a new generation of file systems is under heavy research, based on the latest P2P, cryptography and error correction techniques (such as the Ocean Store Project or Archival Intermemory).

NFS

NFS was invented by Sun Microsystems. It became widespread largely because it's quite easy to implement. In return for its simplicity, it tends to give relatively poor performance and a nearly complete lack of safety. These are both largely due to its connectionless nature. When you request data from a file, the server sends you the requested information, but does NOT keep track of which clients have which files open. To keep you from seeing (terribly) out-of-date information from a file, the data you read has an "expiration date". If you refer to the data for more than, say, a minute, it will expire and your client will request the data from the server again, whether it's changed or not. If you write data to the file, you have no way of knowing whether somebody else has updated the information between your reading and writing your data, so you may overwrite things they've written with older data. To ensure at least a little bit of safety, the server is supposed to actually commit data you write to disk before it returns to you.

In other words, NFS works pretty well for read-only access to things like executables on a server. For things like on-line databases, it's essentially a disaster waiting to happen (and it usually doesn't wait very long).

More recent versions of the NFS spec have cured most of these problems, but support for these updates is still (years later) somewhat uneven.

AFS

AFS is the Andrew File System aka Advanced File System, similar to NFS to about the same degree that a tricycle is similar to a fighter jet -- they're both typically one-person vehicles. AFS is a drastically more robust design than NFS, and is intended for MUCH larger networks. OTOH, it's also much more difficult to implement completely -- to the point that it's not likely to be of much interest to most hobbyists and such writing a new OS.

RFS

RFS (Remote File System) was introduced in UNIX System V to compete with NFS and such. Unlike NFS, RFS is a connection-oriented system, so if, for example, two different machines access a file on a server, they get about the same semantics as if two processes on a single machine accessed the file. Note that NFS and RFS are both built on top of some sort of local file system, which determines things like inodes and such.

Unclassifiable and Other Stuff

BeFS

BeFS is the new filesystem for the Be Operating system. It is very much like the MacOS Filesystem, supporting multiple forks in a 64bit filesystem. One very useful feature it shares with the AmigaOS FFS is the ability for an application to set a "notify callback", i.e. being notified when a file or directory changes.

Get way more information about BeFS in Practical File System Design

FFS (Amiga)

The Amiga Fast File System, to put it bluntly, is not - or rather, it's fast only when compared to the OFS, the Original File System of AmigaOS 1.x.

There are many bright design ideas making the AmigaOS a very special thing, but the file system was not exactly part of it. It is prone to invalidation, holds redundant data, and its directory structure is comparatively slow to traverse. It also lacks any concept of multi-user environments.

Perhaps the only good thing with the Amiga FFS was the concept of the Rigid Disk Block (RDB) - a special area at the beginning of a disk, holding not only the partitioning information. It was also possible to store a file system there - a module that would tell a different AmigaOS machine how to read a partition if it was not formatted in FFS format but something else.

Those interested in its internals should try to find a copy of "The Amiga Guru Book" by Ralph Babel, which holds a complete reference of its rather complex block structure. (It also has a complete reference of the DOS library, as well as interesting information on various internals of the Amiga architecture. It is long out of print, but perhaps you can still find copies on eBay.) The old FAQ also held some info on the internals, which is preserved in the AmigaFFS Document.

FFS / UFS (BSD)

Not to be confused with the Amiga FFS, the BSD FFS / UFS is commonly used on hard disks for the *BSD and derivatives. What is usually called a "partition" is called a "slice" in *BSD, which is in turn subdivided into "partitions" - a naming pattern that leads to some confusion, and to rather cryptic device names (ad0s1c for the third partition on the first slice of the primary master ATA hard drive...).

XFS

XFS is Silicon Graphics' "Next Generation Journalled 64-Bit Filesystem With Guaranteed Rate I/O", designed for IRIX-based systems. XFS uses the standard inodes, bitmaps and blocks, and is compatible with EFS and NFS filesystems.

According to the XFS white paper it has:

  • Scalable features and performance from small to truly huge data (petabytes)
  • Huge numbers of files (millions)
  • Exceptional performance: 500+ MBytes/second
  • Designed with log/database (journal) technology as a fundamental part not just an extension to an existing filesystem
  • Mission-critical reliability

BFS

BFS (UnixWare Boot File System) is a SCO specification for a KISS filesystem used at bootstrap. It only offers one directory and, due to the way information about blocks is stored, only one file can be open for writing at a time.

http://www.penguin.cz/~mhi/fs/bfs/bfs-structure.html

From what i see, it also means BFS will have to do nasty things if a file must be extended after some other file has been created -- PypeClicker

Agreed, but it's not a general-purpose filesystem. One tends not to extend things like the kernel image or modules. -- Strib

FAT12 (for floppies)

Included from FAT12 document

The Microsoft version might have information left out in this document. Make sure you read Long Filenames Specification too.

Note that the FAT filesystem is covered by software patents.


File Allocation Table (FAT 12)

This paper concentrates on the FAT12 system only. It is broken down into several sections. Following a brief introduction on File Allocation Tables, the paper goes into a step by step instruction on how to read an MS-DOS File Allocation Table for a diskette (FAT12). The sections are in order:

  • Introduction
  • FAT12 (Diskette)
  • Reading the Boot Sector
  • Reading the Directory
  • Finding the Beginning of the Boot, FAT, Directory, and Open Space
  • File Allocation Table Entry Cluster Values
  • Location of File in Open Space Area
  • A printed copy of the file that is used to dissect the FAT12 table
  • A printed copy of the Boot Sector and Directory
  • A printed copy of the File Allocation Table
  • A printed copy of the Beginning of the File in the Open Space Area
  • A printed copy of the Ending of the File in the Open Space Area

What the Introduction calls the data area is called the Open Space Area in the instructional part of the paper. Finally, although this paper goes into quite a bit of detail, it is by no means complete.

Introduction

The File Allocation Table (FAT) is a table stored on every hard or floppy disk that indicates the status and location of all data clusters that are on the disk. The File Allocation Table can be considered to be the "table of contents" of a disk. If the file allocation table is damaged or lost, then a disk is unreadable. In a file server the FAT data is sometimes kept in the computer RAM for quick access and is easily lost if the system crashes as the result of a power failure.

The File Allocation Table is maintained by the operating system and provides a map of the clusters (the basic unit of logical storage on a disk) that a file has been stored in. When you write a new file, the file is stored in one or more clusters that are not necessarily next to each other; they may be rather widely scattered over the disk. A typical cluster size is 2,048 Bytes, 4,096 Bytes or 8,192 Bytes. The operating system creates FAT entries for the new file that record where each cluster is located and their sequential order. When you read a file, the operating system reassembles the file from clusters and places it as an entire file where you want to read it.

The hard disk is physically arranged by cylinders, heads, and sectors; that is how it is addressed by the hardware controller and the ROM BIOS, which address it at a physical level. For the operating system and other programs, however, this is cumbersome, since the physical number of cylinders, heads, and sectors varies from disk to disk. It would be convenient to view the disk as simply a large continuous block of sectors with simple sequential addresses.

MS-DOS does, in fact, view the sectors on a disk as a one-dimensional array of sectors numbered from 0 to n-1, where n is the total number of sectors on the disk. It therefore must translate from logical sector numbers to physical cylinder-head-sector (CHS) addresses. In doing so, MS-DOS sequentially numbers all the sectors of head 0, cylinder 0, then all the sectors of head 1, cylinder 0, and so on for each head, and then repeats this for each cylinder, to the end of the disk.
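
The numbering order described above can be written down as a pair of conversions. This is a minimal Python sketch with made-up function names; it assumes the usual convention that physical sector numbers start at 1 while cylinders, heads and logical sector numbers start at 0, and the example uses the 1.44 MB diskette geometry (2 heads, 18 sectors per track).

```python
def lba_to_chs(lba: int, heads: int, sectors_per_track: int):
    """Translate a logical sector number into (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1  # physical sectors count from 1


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Inverse translation: (cylinder, head, sector) to logical sector."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


first = lba_to_chs(0, 2, 18)       # the boot sector
second_head = lba_to_chs(18, 2, 18)  # first sector of head 1, cylinder 0
last = lba_to_chs(2879, 2, 18)     # last sector of a 2880-sector diskette
```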

Furthermore, MS-DOS logically divides this array of sectors into five distinct areas, which are, in the order they appear on the disk,

  • The partition table,
  • The boot record,
  • The File Allocation Table (FAT),
  • The root directory, and
  • The data area.

    The first four areas of the disk, collectively called the system area, are used by MS-DOS to keep track of the contents of the disk. The largest area of the disk, the data area, is where all user files and data reside. For the data area MS-DOS uses a special numbering scheme called cluster numbering, which is in addition to, but independent of, logical sector numbers.

    The boot record occupies one sector, and is always placed in logical sector number (LSN) 0, which is physically cylinder 0, head 0, sector 1, the first sector of the first head of the first cylinder on the disk. This is the easiest sector on the disk for the computer to locate when it begins running.

    The File Allocation Table (FAT) is an array of integers in which each element represents one cluster in the data area. For each cluster in the data area the corresponding entry in the FAT contains a code which indicates the status of the cluster. The cluster may be available for use, it may be reserved by the operating system, it may be unavailable due to a bad sector on the disk, or it may be in use by a file.

    MS-DOS maintains a hierarchical directory structure in which there is one entry for every file on the disk.

    Data area is where all user files and data reside.

FAT 12 (Diskette)

Boot Sector

Bytes      Offset  +0 +1 +2 +3 +4 +5 +6 +7   Meaning
0 - 2      0000    eb 3c 90 .. .. .. .. ..   Jump to start of boot code
3 - 10     0000    .. .. .. 6d 6b 64 6f 73   OEM identifier (mkdosfs)
           0008    66 73 00 .. .. .. .. ..   (program/OS being used to format)
11 - 12    0008    .. .. .. 00 02 .. .. ..   The number of Bytes per sector (512)
13         0008    .. .. .. .. .. 01 .. ..   Number of sectors per allocation unit (cluster)
14 - 15    0008    .. .. .. .. .. .. 01 00   Number of reserved sectors
16         0010    02 .. .. .. .. .. .. ..   Number of FATs on the diskette
17 - 18    0010    .. e0 00 .. .. .. .. ..   Number of directory entries (BIOS Parameters)
19 - 20    0010    .. .. .. 40 0b .. .. ..   The total sectors in the logical volume
21         0010    .. .. .. .. .. f0 .. ..   Media descriptor type
22 - 23    0010    .. .. .. .. .. .. 09 00   Number of sectors per FAT
24 - 25    0018    12 00 .. .. .. .. .. ..   Number of sectors per track
26 - 27    0018    .. .. 02 00 .. .. .. ..   Number of heads or sides on the diskette
28 - 31    0018    .. .. .. .. 00 00 00 00   Number of hidden sectors
32 - 35    0020    00 00 00 00 .. .. .. ..   Large sector count (used when Bytes 19 - 20 are zero)
36         0020    .. .. .. .. 00 .. .. ..   Drive number
37         0020    .. .. .. .. .. 00 .. ..   Flags
38         0020    .. .. .. .. .. .. 29 ..   Signature (must be 0x28 or 0x29)
39 - 42    <disk-dependent>                  Volume ID 'serial' number (ignore this)
43 - 53    0028    .. .. .. 20 20 20 20 20   Volume label,
           0030    20 20 20 20 20 20 .. ..   padded with spaces
54 - 61    0030    .. .. .. .. .. .. 46 41   System identifier ("FAT12"),
           0038    54 31 32 20 20 20 .. ..   padded with spaces
62 - 509   0038    .. .. .. .. .. .. 0e 1f   Start of bootstrap routine
           0040    be 5b 7c ac 22 c0 74 0b   Bootstrap routine (cont'd)
510 - 511  01f8    .. .. .. .. .. .. 55 aa   BIOS boot signature (dw 0xAA55)

Reading the Boot Sector

Bytes (0-2)
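The first three Bytes, EB 3C and 90, disassemble to JMP SHORT 3E followed by NOP: the displacement 3C is counted from the end of the two-Byte jump instruction, so the jump lands on Byte 62 (3E hex), the start of the bootstrap routine. The reason for this is to jump over the disk format information. Since the first sector of the disk is loaded into RAM at location 0x0000:0x7c00 and executed, without this jump the processor would attempt to execute data that isn't code.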
Bytes (3 - 10)
Bytes 3 - 10 hold the OEM identifier, the name and version of the program or operating system that formatted the disk. In the boot sector above these eight Bytes are 6D 6B 64 6F 73 66 73 and 00, which read "mkdosfs" followed by a zero Byte. The official FAT Specification from Microsoft says that this field is really meaningless and is ignored by MS FAT drivers; however, it does recommend the value "MSWIN4.1", as some third-party drivers supposedly check it and expect it to have that value. Older versions of DOS also report MSDOS5.1, and a Linux-formatted floppy will likely carry "mkdosfs" here. If the string is less than 8 bytes, it is padded with zeroes.
Bytes (11 - 12)
The next two Bytes (11 - 12), 00 and 02, give the number of Bytes per sector. Multi-Byte values are stored least significant Byte first, so the first thing you do when reading a pair of Bytes is reverse them to read 02 00. 0200 is the number of Bytes per sector in hexadecimal, or 512 Bytes per sector in decimal.
Byte 13
Byte 13 is the number of sectors per allocation (cluster). In this case it is one.
Bytes (14 - 15)
These two Bytes, 01 and 00, indicate the number of reserved sectors. Again you must reverse the Bytes to 00 01. There is one reserved sector.
Byte 16
Byte 16 indicates the number of FATs on the diskette. There are two.
Bytes (17 -18)
This indicates the number of directory entries. Reversing the Bytes E0 00 to 00 E0 and converting the number to decimal we have 224 directory entries.
Bytes (19 - 20)
The total sectors in the logical volume. Reversing 40 and 0B to 0B 40 and converting the number to decimal we have 2880 sectors in the logical volume. If that value is 0, it means there are more than 65535 sectors in the volume, and the actual count is stored in "Large Sectors" (Bytes 32 - 35).
Byte 21
This Byte (F0) indicates the media descriptor type, which is here a 1.44MB floppy.
Bytes (22 - 23)
These two Bytes, 09 and 00, indicate the number of sectors per FAT. Reversing 09 and 00 to 00 09, we see that we have nine sectors per FAT.
Bytes (24 - 25)
Number of sectors per track. Reversing the Bytes 12 and 00 to 00 12, there are eighteen sectors per track.
Bytes (26 - 27)
These two Bytes indicate the number of heads or sides on the diskette. Reversing the Bytes 02 and 00 to 00 02, we see that there are two sides to the diskette.
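Bytes (28 - 31)
Number of hidden sectors. All four Bytes read zero, so there are no hidden sectors.

The remaining fields (Bytes 32 - 61) are listed in the table above; in this boot sector the bootstrap routine starts at Byte 62.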
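
The field-by-field reading above can be automated with Python's struct module, which handles the Byte reversal (little-endian order) for you. This is only a sketch: the names BootSector, LAYOUT and parse_boot_sector are invented for this example, only Bytes 0 - 35 are decoded, and the sample sector is rebuilt from the values in the Boot Sector table with the bootstrap code left as zeroes.

```python
import struct
from collections import namedtuple

BootSector = namedtuple("BootSector", [
    "jump", "oem", "bytes_per_sector", "sectors_per_cluster",
    "reserved_sectors", "fat_count", "root_entries", "total_sectors",
    "media", "sectors_per_fat", "sectors_per_track", "heads",
    "hidden_sectors", "large_sectors",
])

# "<" selects little-endian with no padding; the fields cover Bytes 0 - 35.
LAYOUT = struct.Struct("<3s8sHBHBHHBHHHII")


def parse_boot_sector(sector: bytes) -> BootSector:
    """Decode the first 36 Bytes of a FAT12 boot sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 55 AA boot signature")
    return BootSector(*LAYOUT.unpack_from(sector, 0))


# Sample data taken from the table; everything else is zero-filled.
sample = bytes.fromhex(
    "eb3c90" "6d6b646f73667300" "0002" "01" "0100" "02" "e000"
    "400b" "f0" "0900" "1200" "0200" "00000000" "00000000"
)
sample = sample.ljust(510, b"\x00") + b"\x55\xaa"
bs = parse_boot_sector(sample)
```

Here bs.bytes_per_sector, bs.root_entries and bs.total_sectors come out as 512, 224 and 2880, matching the values worked out by hand.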

Directory

ToDo this information is weak, lacks clarifications about padding (how is A.B exactly encoded), what time and dates refers, etc.
Bytes      Meaning
0 - 10     File name with extension
11         Attributes of the file
12 - 21    Reserved Bytes
22 - 23    Indicate the time
24 - 25    Indicate the date
26 - 27    Indicate the entry cluster value
28 - 31    Indicate the size of the file

Reading the Directory

We had a thread where it shows that root directory might not be that simple. It seems like we should read all entries, skipping entries marked as 'volume label' if any. -- PypeClicker

Bytes (0 - 10)
Starting with Byte 2620 (hex), the first 11 Bytes (0 - 10) are the name of the file with extension. If the 11-byte string is PROCESSATXT, then the 8.3 filename is PROCESSA.TXT, since the first 8 bytes of the string are the filename and the last 3 are the extension. If the filename is shorter than 8 bytes or the extension is shorter than 3, padding spaces are added, e.g. a file name of LOADER.RC would be encoded simply as "LOADER  RC " (that's two spaces after LOADER and one after RC).
Byte 11
This Byte lists the attributes of the file. To read this you must convert the hexadecimal Byte to binary. In this case 20 (hex) is converted to 0010 0000. Each of the eight bits represents an attribute of the file. When a bit is on, indicated by a one, the file has that attribute. Starting with the rightmost bit, which is bit zero, and working over to the leftmost bit, the 7th bit, the attributes are: read only, hidden, system file, volume label, sub-directory and archive; the last two bits, the 6th and 7th, are reserved. In this particular file it is the 5th bit that is on, meaning that it is an archive file.
Bytes (12 - 21)
These are the reserved Bytes.
Bytes (22 - 23)
These two Bytes, 4E and 7B, indicate the time the file was made. To retrieve the time reverse the Bytes to 7B 4E and convert to binary 0111 1011 0100 1110. The hour is read from the first five bits, the minutes are read from the next six bits, and the seconds are read from the last five bits. So our time Bytes are read like this 01111 011010 01110. Reading the Bytes; the hour is 15, the minutes are 26, and the seconds are 14. Important, the seconds must be multiplied by 2 to get the true second reading. So the time that the file was created was 15:26:28 military time or 3:26:28 PM.
Hour 5 bits
Minutes 6 bits
Seconds 5 bits
Bytes (24 - 25)
These two Bytes, 96 and 26, indicate the date the file was made. To retrieve the date reverse the Bytes to 26 96 and convert to binary 0010 0110 1001 0110. The year is read from the first seven bits, the month is read from the next four bits, and the day is read from the last five bits. So our date Bytes are read like this 0010011 0100 10110. Reading the Bytes; the year is 19, the month is 4, and the day is 22. The number for the year must be added with 1980 to get the correct year the file was made. So the date that the file was made was April 22, 1999.
Year 7 bits
Month 4 bits
Day 5 bits
Bytes (26 -27)
These two Bytes, 02 and 00, indicate the entry cluster value for both the FAT and the Open Space Area. More about this in the last two sections (File Allocation Table Entry Cluster Values and Location of File in the Open Space Area).
Bytes (28 - 31)
These four Bytes 1B, 0C, 00, 00 indicate the size of the file. Reversing the Bytes to 00 00 0C 1B and converting the number to decimal the size of the file is 3099 Bytes.
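"Reversing the Bytes" is simply reading a little-endian number. A small sketch, with helper names chosen for this example only:

```c
#include <stdint.h>

/* Reads a 16-bit little-endian value: the first Byte is the low one. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Reads a 32-bit little-endian value, lowest Byte first. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Applied to the example entry, the Bytes 02 00 read as entry cluster 2 and the Bytes 1B 0C 00 00 read as a size of 3099.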

Finding the Beginning of the Boot, FAT, Directory, and Open Space

Boot Sector

As stated in the introduction, the Boot Sector is always placed in logical sector number (LSN) 0, at Byte 0000.

File Allocation Table (FAT)

The File Allocation Table begins after the Boot Sector. To find the starting Byte, find the length of the Boot Sector which is one sector multiplied by the number of Bytes per sector (Bytes 11 and 12 of the Boot Sector). The File Allocation Table begins at 0200 (hex).

Directory

The Directory begins after both the Boot Sector and the File Allocation Tables. To find the starting Byte, find the number of File Allocation Tables on the diskette (Byte 16 of the Boot Sector) and multiply this number by the number of sectors per FAT (Bytes 22 and 23 of the Boot Sector). Add the number of Boot Sectors (which is one) to give the total number of sectors taken by the FATs and the Boot Sector. Multiply this total number of sectors by the number of Bytes per sector (Bytes 11 and 12 of the Boot Sector), giving you the starting Byte of the Directory.

(2 * 9) + 1 = 19 sectors; 19 sectors * 512 Bytes/ sector = 9728 Bytes (decimal) or 2600 Bytes (hex). The start of the Directory is 2600.

Open Space

The Open Space begins after the Directory. To find the beginning of the Open Space you need to find the size of the Directory in Bytes and add that to the beginning Byte of the Directory (2600). To find the size of the Directory multiply the number of directory entries (Bytes 17 and 18 of the Boot Sector) by the size of one directory entry, which is 32 Bytes (decimal).

224 directory entries * 32 Bytes/ directory entry = 7168 Bytes (decimal) or 1C00 (hex)

1C00 Bytes (hex) + 2600 Bytes (hex) = 4200 Bytes (hex). The start of the Open Space is 4200.
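The three starting offsets worked out above can be computed in one place. This is a sketch under the same assumptions as the text (one Boot Sector, 32 Bytes per directory entry); the struct and function names are made up for this example.

```c
#include <stdint.h>

/* Byte offsets of the three areas that follow the Boot Sector. */
struct fat_layout {
    uint32_t fat_start;   /* first Byte of the first FAT */
    uint32_t dir_start;   /* first Byte of the Directory */
    uint32_t data_start;  /* first Byte of the Open Space */
};

struct fat_layout fat_compute_layout(uint32_t bytes_per_sector,
                                     uint32_t boot_sectors,
                                     uint32_t fat_count,
                                     uint32_t sectors_per_fat,
                                     uint32_t dir_entries)
{
    struct fat_layout l;
    l.fat_start  = boot_sectors * bytes_per_sector;
    l.dir_start  = (boot_sectors + fat_count * sectors_per_fat)
                   * bytes_per_sector;
    l.data_start = l.dir_start + dir_entries * 32u;  /* 32 Bytes per entry */
    return l;
}
```

Called with the diskette's values (512, 1, 2, 9, 224) it yields 0200, 2600 and 4200 (hex), matching the hand calculation.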

File Allocation Table Entry Cluster Values

Starting with the entry cluster value (Bytes 26 and 27 in the Directory), find the values (02 and 00) and reverse them to read 00 02 (hex). The result is **2 (hex or decimal).

Because the value 2 is the same in hexadecimal and decimal, no conversion is necessary; just remember that the next step is done in decimal. Multiply this number 2 by 1.5, giving 3 (decimal or hex). Now go to the File Allocation Table and retrieve the 3rd (0203) and 4th (0204) Bytes. Remember to start your counting from zero. Take the two Bytes 03 and 40 and reverse them to 40 03. Because 3 is a whole integer, AND the binary value of the hexadecimal number 4003 (0100 0000 0000 0011) with the binary value of the hexadecimal number 0FFF (0000 1111 1111 1111). The result is **3 (hex or decimal).

Convert the result above to decimal (if necessary) and multiply by 1.5, giving 4.5. So now we extract the 4th (0204) and 5th (0205) Bytes from the File Allocation Table, which are 40 and 00. Reverse the hexadecimal numbers to read 00 40. Because 4.5 is not a whole integer we right shift 0040 by four bits (one hexadecimal digit) to read 0004. The result is **4 (hex or decimal).

Multiply 4 (decimal) by 1.5, giving 6. Now we go to the 6th (0206) and 7th (0207) Bytes in the File Allocation Table, which are 05 and 60. Reverse the numbers to read 60 05. Because 6 is a whole integer, AND the binary value of the hexadecimal number 6005 (0110 0000 0000 0101) with the binary value of the hexadecimal number 0FFF (0000 1111 1111 1111). The result is **5 (hex or decimal).

Multiply 5 (decimal) by 1.5, giving 7.5. Now we read the 7th (0207) and 8th (0208) Bytes in the File Allocation Table, which are 60 and 00. Reversing the numbers we have 00 60. Because 7.5 is a fractional number we right shift 0060 to read 0006. The result is **6 (hex or decimal).

Multiply 6 (decimal) by 1.5, giving 9. Read the 9th (0209) and 10th (020A) Bytes in the File Allocation Table, which are 07 and 80. Reverse the numbers to read 80 07. Because 9 is a whole integer, AND the binary value of the hexadecimal number 8007 (1000 0000 0000 0111) with the binary value of the hexadecimal number 0FFF (0000 1111 1111 1111). The result is **7 (hex or decimal).

Take the decimal result above and multiply by 1.5, giving 10.5. Read the 10th (020A) and 11th (020B) Bytes in the File Allocation Table; the numbers are 80 and 00. Reversing the numbers we have 00 80. Because 10.5 is not a whole integer we right shift 0080 to 0008. The result is **8 (hex or decimal).

    • Each result marked with a double asterisk (**) is also used, in decimal form, to calculate the Location of File in Open Space Area (Next Section).

Multiply 8 (decimal) by 1.5, giving 12. Now extract the 12th (020C) and 13th (020D) Bytes in the File Allocation Table. These numbers are FF 0F. Reverse the numbers to read 0F FF. This value 0FFF (hex) indicates the end of this file.
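The multiply-by-1.5 walk above can be sketched as a pair of C functions. The function names are invented for this example. One assumption goes beyond the text: the loop stops on any value from FF0 (hex) upwards, since in FAT12 those values are reserved, bad-cluster or end-of-file markers, whereas the text only shows 0FFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the 12-bit FAT entry for `cluster`.
   `fat` points at the first Byte of the File Allocation Table. */
uint16_t fat12_next(const uint8_t *fat, uint16_t cluster)
{
    uint32_t offset = cluster + cluster / 2u;      /* whole part of cluster * 1.5 */
    uint16_t pair = (uint16_t)(fat[offset] | (fat[offset + 1] << 8));
    if (cluster & 1u)
        return pair >> 4;       /* offset was fractional: keep the upper 12 bits */
    return pair & 0x0FFF;       /* offset was whole: keep the lower 12 bits */
}

/* Follows the chain starting at `first`, storing up to `max` cluster
   numbers in `out`. Returns how many were stored. */
size_t fat12_chain(const uint8_t *fat, uint16_t first,
                   uint16_t *out, size_t max)
{
    size_t n = 0;
    uint16_t c = first;
    while (c >= 2 && c < 0xFF0 && n < max) {
        out[n++] = c;
        c = fat12_next(fat, c);
    }
    return n;
}
```

Run against the FAT Bytes used in the walk-through, starting from cluster 2, this yields the seven clusters 2 to 8, which agrees with a 3099-Byte file on 512-Byte clusters.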

Location of File in Open Space Area

To find the location of the file in the Open Space Area take the decimal results of the File Allocation Table entry cluster values, as denoted by the double asterisks, and subtract 2. Then multiply by the number of Bytes per sector, which is indicated in Bytes 11 and 12 in the Boot Sector (this works because this diskette has one sector per cluster). In this case the Bytes per sector value is 512 (decimal). Finally take this value in Bytes, convert it to hexadecimal, and add it onto the starting location of the Open Space Area, which in this case is 4200.

(2 - 2) sectors * 512 Bytes per sector = 0 Bytes (decimal); 0 Bytes (hex) + 4200 Bytes (hex) = 4200. The entry value of the first cluster is 4200.

(3 - 2) sectors * 512 Bytes per sector = 512 Bytes (decimal); 200 Bytes (hex) + 4200 Bytes (hex) = 4400. The entry value of the second cluster is 4400.

(4 - 2) sectors * 512 Bytes per sector = 1024 Bytes (decimal); 400 Bytes (hex) + 4200 Bytes (hex) = 4600. The entry value of the third cluster is 4600.

(5 - 2) sectors * 512 Bytes per sector = 1536 Bytes (decimal); 600 Bytes (hex) + 4200 Bytes (hex) = 4800. The entry value of the fourth cluster is 4800.

(6 - 2) sectors * 512 Bytes per sector = 2048 Bytes (decimal); 800 Bytes (hex) + 4200 Bytes (hex) = 4A00. The entry value of the fifth cluster is 4A00.

(7 - 2) sectors * 512 Bytes per sector = 2560 Bytes (decimal); A00 Bytes (hex) + 4200 Bytes (hex) = 4C00. The entry value of the sixth cluster is 4C00.

(8 - 2) sectors * 512 Bytes per sector = 3072 Bytes (decimal); C00 Bytes (hex) + 4200 Bytes (hex) = 4E00. The entry value of the seventh cluster is 4E00.
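The same arithmetic as a function, as a rough sketch. The name fat_cluster_offset is made up here, and the sectors_per_cluster parameter is an addition of this example: the calculation above effectively uses 1.

```c
#include <stdint.h>

/* Byte offset of `cluster` on the disk. `data_start` is the first Byte
   of the Open Space Area. Clusters are numbered from 2. */
uint32_t fat_cluster_offset(uint32_t data_start, uint16_t cluster,
                            uint32_t bytes_per_sector,
                            uint32_t sectors_per_cluster)
{
    return data_start
         + (uint32_t)(cluster - 2) * sectors_per_cluster * bytes_per_sector;
}
```

For the example file, clusters 2 to 8 map to 4200, 4400, 4600, 4800, 4A00, 4C00 and 4E00 (hex).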


Links to more information about FAT

AmigaFFS document

Included from AmigaFFS Document

1.1.1 Root Block

The root of the tree is the root block, which is at a fixed place on the disk. The root is like any other directory, except that it has no parent, and its secondary type is different. AmigaDOS stores the name of the disk volume in the name field of the root block.

Each filing system block contains a checksum, chosen so that the sum (ignoring overflow) of all the words in the block is zero.

          +---------------+
        0 |  T. SHORT     | Type
          |---------------|
        1 |       0       | header key (always 0)
          |---------------|
        2 |       0       | Highest seq number (always 0)
          |---------------|
        3 |   HT SIZE     | Hashtable size (=blocksize -56)
          |---------------|
        4 |       0       |
          |---------------|
        5 |   CHECKSUM    |
          |---------------|
        6 |     hash      |
          |     table     |
          /               /
          \               \
  SIZE-51 |               |
          |---------------|
  SIZE-50 |  BMFLAG       | TRUE if bitmap on disk is valid
          |---------------|
  SIZE-49 |   bitmap      | Used to indicate the blocks
  SIZE-24 |    pages      | containing the bitmap
          |---------------|
  SIZE-23 |    DAYS       | Volume last altered date and time
          |---------------|
  SIZE-22 |    MINS       |
          |---------------|
  SIZE-21 |    TICKS      |
          |---------------|
  SIZE-20 |     DISK      | Volume name as a BCPL string
          |     NAME      | of <= 30 characters
          |---------------|
  SIZE-7  |   CREATEDAYS  | Volume creation date and time
          |---------------|
  SIZE-6  |   CREATEMINS  |
          |---------------|
  SIZE-5  |  CREATETICKS  |
          |---------------|
  SIZE-4  |       0       | Next entry on this hash chain
          |---------------| (always 0)
  SIZE-3  |       0       | Parent directory (always 0)
          |---------------|
  SIZE-2  |       0       | Extension (always 0)
          |---------------|
  SIZE-1  |    ST.ROOT    | Secondary type indicates root block
          +---------------+
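The checksum rule stated before the diagram (all words of a block sum to zero, ignoring overflow) can be sketched as follows. Two things are assumed here rather than stated in the text: a "word" is taken to be 32 bits stored big-endian, as on the 68000, and the function names are invented for this example. The checksum slot is word 5, as in the diagrams.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: words are 32 bits, most significant byte first. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Sum of all words in the block; unsigned arithmetic wraps on overflow. */
uint32_t amiga_block_sum(const uint8_t *block, size_t size_bytes)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 4 <= size_bytes; i += 4)
        sum += read_be32(block + i);
    return sum;
}

/* A block is consistent when its words sum to zero. */
int amiga_block_valid(const uint8_t *block, size_t size_bytes)
{
    return amiga_block_sum(block, size_bytes) == 0;
}

/* Recomputes word 5 (the CHECKSUM slot) so that the block sums to zero. */
void amiga_block_fix_checksum(uint8_t *block, size_t size_bytes)
{
    write_be32(block + 5 * 4, 0);
    uint32_t sum = amiga_block_sum(block, size_bytes);
    write_be32(block + 5 * 4, 0u - sum);
}
```

After amiga_block_fix_checksum() a block passes amiga_block_valid(); changing any single byte afterwards makes the check fail.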

1.1.2 User Directory Blocks

          +---------------+
        0 |   T.SHORT     | Type
          |---------------|
        1 |   OWN KEY     | Header Key (pointer to self)
          |---------------|
        2 |       0       | Highest Seq Number (always 0)
          |---------------|
        3 |       0       |
          |---------------|
        4 |       0       |
          |---------------|
        5 |  CHECKSUM     |
          |---------------|
        6 |               |
          |    hash table |
          /               /
          \               \
  SIZE-51 |               |
          |---------------|
  SIZE-50 |    Spare      |
          |---------------|
  SIZE-48 |    PROTECT    |  Protection bits
          |---------------|
  SIZE-47 |       0       | Unused (always 0)
          |---------------|
  SIZE-46 |               |
          |   COMMENT     | Stored as a BCPL string
  SIZE-24 |               |
          |---------------|
  SIZE-23 |     DAYS      | Creation date and time
          |---------------|
  SIZE-22 |     MINS      |
          |---------------|
  SIZE-21 |    TICKS      |
          |---------------|
  SIZE-20 | DIRECTORY NAME| Stored as a BCPL string <=30 chars
          |---------------|
  SIZE-4  | HASHCHAIN     | Next entry with same hash value
          |---------------|
  SIZE-3  |    PARENT     | back pointer to parent directory
          |---------------|
  SIZE-2  |      0        | Extension (always 0)
          |---------------|
  SIZE-1  |  ST.USERDIR   | secondary type
          +---------------+

User directory blocks have type T.SHORT and secondary type ST.USERDIRECTORY. The six information words at the start of the block also indicate the block's own key (that is, the block number) as a consistency check and the size of the hash table. The 50 information words at the end of the block contain the date and time of creation, the name of the directory, a pointer to the next file or directory on the hash chain, and a pointer to the directory above.

To find a file or sub-directory, you must first apply a hash function to its name. This hash function yields an offset in the hash table, which holds the key of the first block on a chain linking those with the same hash value (or 0, if there are none). AmigaDOS reads the block with this key and compares the name of the block with the required name. If the names do not match, it reads the next block on the chain, and so on.
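The chain walk described above can be sketched with an in-memory stand-in for the disk. Everything here is hypothetical scaffolding: struct toy_header keeps only the two fields the search needs, blocks[] is indexed by key in place of real disk reads, the hash slot is passed in because the text does not give the hash function, and names are compared exactly with strcmp().

```c
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a header block: just the name and the HASHCHAIN word. */
struct toy_header {
    char     name[31];    /* names are at most 30 characters */
    uint32_t hashchain;   /* key of the next block with the same hash, or 0 */
};

/* Looks `name` up in one directory.
   hash_table: the directory's hash table, `slot` the offset the hash gave.
   blocks:     array indexed by block key (key 0 is never a valid block).
   Returns the key of the matching block, or 0 if there is none. */
uint32_t dir_lookup(const uint32_t *hash_table, uint32_t slot,
                    const struct toy_header *blocks, const char *name)
{
    uint32_t key = hash_table[slot];
    while (key != 0) {
        if (strcmp(blocks[key].name, name) == 0)
            return key;                  /* names match: found it */
        key = blocks[key].hashchain;     /* otherwise try the next on the chain */
    }
    return 0;                            /* chain ended without a match */
}
```

A real implementation would read each block from disk by its key and verify its checksum before trusting the HASHCHAIN field.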

1.1.3 File Header Block

           +------------+
        0  |   T.SHORT  | Type
           |------------|
        1  |   OWN KEY  | Header Key
           |------------|
        2  | HIGHEST SEQ| Total number of data blocks in file
           |------------|
        3  |  DATA SIZE | Number of data block slots used
           |------------|
        4  | FIRST DATA | First data block
           |------------|
        5  |  CHECKSUM  |
           |------------|
        6  |            |
           /            /
           \            \
           | DATA BLK 3 |
           | DATA BLK 2 | List of data block keys
  SIZE-51  | DATA BLK 1 |
           |------------|
  SIZE-50  |  Spare     |
           |------------|
  SIZE-48  |   PROTECT  | Protection bits
           |------------|
  SIZE-47  |  BYTESIZE  | Total size of file in bytes
           |------------|
  SIZE-46  |            |
           |  COMMENT   | Comment as a BCPL string
  SIZE-24  |            |
           |------------|
  SIZE-23  |    DAYS    | Creation date and time
           |------------|
  SIZE-22  |    MINS    |
           |------------|
  SIZE-21  |    TICKS   |
           |------------|
  SIZE-20  | FILE NAME  | Stored as BCPL string <= 30 chars
           |------------|
  SIZE-4   |  HASHCHAIN | Next entry with same hash value
           |------------|
  SIZE-3   |   PARENT   | Back pointer to the parent directory
           |------------|
  SIZE-2   |  EXTENSION | Zero or pointer to the first extension
           |------------| block
  SIZE-1   |  ST. FILE  | Secondary type
           +------------+

Each terminal file starts with a file header block, which has type T.SHORT and secondary type ST.FILE. The start and end of the block contain name, time, and redundancy information similar to that in a directory block. The body of the file consists of data blocks with sequence numbers from 1 upwards. AmigaDOS stores the addresses of these blocks in consecutive words downwards from offset SIZE-51 in the block. In general, AmigaDOS does not use all the space for this list and the last data block is not full.
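Because the list grows downwards from SIZE-51, the word holding the key of data block number n sits at SIZE-50-n. A small sketch of that index calculation, with a made-up function name; the lower bound of 6 comes from the diagram, where the list area starts at word 6.

```c
/* Word offset, inside a file header block of `size` words, of the key
   of data block number `seq` (sequence numbers start at 1).
   Returns -1 if that key does not fit in this block's list. */
int data_key_slot(int size, int seq)
{
    int slot = size - 50 - seq;   /* seq 1 -> SIZE-51, seq 2 -> SIZE-52, ... */
    if (seq < 1 || slot < 6)
        return -1;
    return slot;
}
```

With size = 128 words the list has room for 72 keys, in words 77 down to 6, which agrees with the table size of blocksize-56 quoted in the root block diagram.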

1.1.4 File List Block

If there are more blocks in the file than can be specified in the block list, then the EXTENSION field is non-zero and points to another disk block which contains a further data block list. The following figure explains the structure of the file list block.

           +-------------+
        0  |   T. LIST   | Type
           |-------------|
        1  |   OWN KEY   | Header Key
           |-------------|
        2  | BLOCK COUNT | =number of data blocks in block list
           |-------------|
        3  | DATA SIZE   | Same as above
           |-------------|
        4  | FIRST DATA  | First Data Block
           |-------------|
        5  |  CHECKSUM   |
           |-------------|
        6  |             |
           /             /
           \             \
           | BLOCK N+3   |
           | BLOCK N+2   | Extended list of data block keys
  SIZE-51  | BLOCK N+1   |
           |-------------|
  SIZE-50  |      info   | (unused)
           |-------------|
  SIZE-4   |     0       | Next in hash list (always 0)
           |-------------|
  SIZE-3   |   PARENT    | File header block of this file
           |-------------|
  SIZE-2   |  EXTENSION  | Next extension block
           |-------------|
  SIZE-1   |   ST. FILE  | Secondary type
           +-------------+

There are as many file extension blocks as required to list the data blocks that make up the file. The layout of the block is very similar to that of a file header block, except that the type is different and the date and filename fields are not used.

1.1.5 Data Block

           +-------------+
        0  |   T. DATA   | type
           |-------------|
        1  |   HEADER    | header key
           |-------------|
        2  |   SEQ NUM   | Sequence number
           |-------------|
        3  |  DATA SIZE  |
           |-------------|
        4  |  NEXT DATA  | next data block
           |-------------|
        5  |  CHECKSUM   |
           |-------------|
        6  |             |
           |             |
           |             |
           |             |
           |    DATA     |
           |             |
           |             |
           |             |
           +-------------+

Data blocks contain only six words of filing system information. These six words refer to the following:

  • type (T.DATA)
  • pointer to the file header block
  • sequence number of the data block
  • number of words of data
  • pointer to the next data block
  • checksum

Normally, all data blocks except the last are full (that is, they have a data size = blocksize-6). The last data block has a forward pointer of 0.
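Since every full data block carries blocksize-6 words of data, the number of data blocks a file needs follows from its size. A sketch, assuming 4-byte words (the text does not state the word width) and using an invented function name:

```c
#include <stdint.h>

/* Number of data blocks needed for `file_bytes` bytes when each block
   is `block_words` words long and the first 6 words are filing system
   information. Assumption: one word is 4 bytes. */
uint32_t data_blocks_needed(uint32_t file_bytes, uint32_t block_words)
{
    if (block_words <= 6)
        return 0;                                   /* no room for data at all */
    uint32_t payload = (block_words - 6) * 4u;      /* data bytes per block */
    return file_bytes / payload + (file_bytes % payload != 0);
}
```

For 128-word blocks the payload is 488 bytes, so a 3099-byte file occupies 7 data blocks, the last one only partly filled.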




Hardware::CPU

The IA32 Architecture Family

Included from The IA32 Architecture Family

(Information taken from the Intel manuals to give an overview over the individual generation's capabilities.)

Intel Processors

These processors from Intel use the CPUID string "GenuineIntel".

Intel 386

Successor to the 80286, the Intel 386 is the first processor of the IA32 architecture. It has 32 bit wide registers, supports 4 kByte paging, and a flat memory model in addition to the segmented memory model of the 80286.

Intel 486

The 486 integrates an 80x87 FPU on-chip, and supports power saving functions (System Management Mode, StopClock, AutoHaltPowerdown).

Pentium

The Pentium supports 4 MByte paging in addition to the usual 4 kByte paging, integrates an APIC and (in later steppings) MMX SIMD registers (single instruction, multiple data). It also supports 2-way multiprocessing.

Pentium Pro

The Pentium Pro supports PAE (36 bit address space), but does not have the MMX registers of the Pentium.

Pentium II

The Pentium II again supports MMX (as well as PAE), as well as additional low-power states: AutoHALT, Stop-Grant, Sleep, and DeepSleep.

Pentium II Xeon

The Xeon supports 4/8/+ way multiprocessing.

Pentium III

The Pentium III supports SSE (128 bit packed single FP SIMD).

Pentium IV / Pentium M

The Pentium IV as well as the (mobile) Pentium M both support SSE2; the Pentium IV also supports Hyper-Threading (one-chip multiprocessing).

Advanced Micro Devices Intel-compatible Processors

The biggest competitor to Intel at this time (2004 August). They came into being slightly after Cyrix with a 5k86 (a 486 compatible similar to the 5x86; don't confuse them) and then followed it up with the K6 processor. This one was faster than the Pentiums, and more popular than the Cyrix ones, both because AMD didn't give it a performance rating (afaik) and because it didn't overheat (as was claimed, untruly, of the Cyrixes).

The CPUID identifier string is "AuthenticAMD".

K6

The K6 processor was a very nice processor, being Pentium compatible and doing anything the Pentium could. It became unpopular because it was too damn fast: it did a LOOPcc within 2 cycles, where the Pentiums took 18 cycles.

Because software couldn't handle the sheer speed, it made errors, and thus caused frequent "Blue Screen"s. Microsoft issued a K6 patch which was mandatory for all K6 users.

K6-2

AMD learned a lesson there: don't make your processor too fast in some instructions. Not that they were put off by that; they just added 16 wait states to the execution of LOOPcc, slowing it to the speed of a Pentium. AMD didn't stop there, however. They added a special case (speculation, might be coincidence) for the DEC (E)CX; Jcc combination, which is semantically equivalent to the LOOPcc instruction. Because of this equivalence, and because LOOP was the faster of the two on Intels, the LOOP instruction was always used and nobody used the DEC/Jcc combo. AMD kept the original speed for this combo and specified in their optimization manuals that it was the preferred method over the LOOPcc instruction.

It also featured a new technology, 3DNow!, which was MMX using floating point numbers, multiplexed (again) on the floating point registers. The K6-2 was quite popular, and scaled higher than the P1 ever did. It was largely compatible with the P2, but (afaik) not completely.

K6-3

They started this design off with the concept of not making it underpowered in any place, and to make it at least P2 compatible. It was fully P2 compatible.

The K6-3 was not too popular, mainly because the K6-2 did very well and people didn't see why they should buy a more expensive K6-3 for the same number of megahertz. This of course was a joke, same as it is to call a 2GHz Opteron slower than a 2.2GHz Celeron.

A little known fact about the K6-3 is that it is in fact an Athlon, minus a few instructions, and minus one very important piece. The K6-3 suffered from a bottleneck at the instruction decode unit (which converts the X86 instructions to native instructions). It could only handle 2 in a cycle, which it made during about 20-30% of the cycles for average software. For optimized software you could bring it to 100% easily, and still want another channel. This wasn't too weird, because it did have 3 execution units of each type (ALU / MMX / loadstore) which were not used much at all. Note that these units are units executing the native instructions, so making 3 of each is not a stupid idea. They needed a new front end, and of course a new copy of instructions from Intel.

Athlon (first try)

The first models of the Athlon were distinct: for the first time a competitor to Intel actually had a faster processor, without Intel having a backup plan. It was pitted against the PIII, which at that time was Intel's top model and best-running one too. The Athlon beat them to the 1GHz mark, and by that time the 1GHz mark had become completely irrelevant. It just meant that they had a new unit to mark their processors with. Intel missed the point here, and they did until very recently. The GHz myth had been broken: the Athlon at 1.1GHz was still faster than the PIII at 1.3GHz, and people knew. They didn't go for a P3 if a faster Athlon was available at a lower clock speed, and at a lower price.

Athlon XP / MP / Duron (new style)

AMD switched to a big offensive, trying to persuade buyers to demand AMD CPUs instead of being OK with Intels. Each new version of these processors was just a tad better than the previous one and supported slightly more instructions (the Athlons started without even SSE1; from model 6 onwards, both Athlon and Duron supported it). The processors also advanced very slightly in every other direction, making each new type just a tad faster than the previous one. At the end of the GHz wars (about a year ago) the fastest Athlon was running at 2.2GHz, but outperformed the better half of the 3GHz P4s.

AMD64 based CPU's

This is slightly offtopic here, but still quite relevant, since these processors all support the entire IA32 family natively. AMD created a new processor, with 64-bit (actually 48-bit, but who notices those 16 bits?) memory addressing and 64-bit calculations, that is very compatible with the old style CPUs. So compatible, that the core for 32-bit and 64-bit is essentially equal, aside from the size of calculations and the support of a few encodings that were in effect redundant. They removed a few 1-byte opcodes (about 20 in total, including all 1-byte INC and 1-byte DEC instructions) to make place for a new REX prefix. They modified it to use 16 registers instead of 8, added a load of new names, got the old software working, and optimized the 32-bit performance to unprecedented levels. These CPUs outperform the P4 at any clock speed in almost any calculation-intensive program (about 1 in 20 programs being the exception). This made them very popular, but also very expensive. The cheapest nowadays is around 180 dollars, or euros.

Other CPU vendors making similar chips

Cyrix

Cyrix was a well-known CPU vendor from the 386 years (and slightly before) up to the Pentium II times, when it more or less vanished inside Via. Via now uses the name as a CPU name (not making it clearer), but this section is about the Cyrix CPUs. The processors supporting CPUID report the identifier string "CyrixInstead".

Cyrix 387

This isn't actually a processor, but it is the most famous Cyrix product. It was the fastest coprocessor for the 386 to be found, and was even very usable alongside a 486SX. These were the main source of income for Cyrix.

Cyrix 5x86

A processor that performed as a 486 and was socket-compatible. It is not Pentium compatible, as it lacks required instructions (such as cmpxchg8b).

Cyrix 6x86MX / M1

This processor is, even though the name suggests otherwise, compatible with the 586 (Pentium). It didn't contain any of the MMX or PPro features but is nevertheless very nice. It performed slightly better per cycle, and was thus given ratings. This was the time they were loathed for rating their processors.

Cyrix M2

The M2 was a Pentium MMX compatible processor, also using ratings, which gave it a bad name to start with. It was again socket-compatible with the Pentium MMX and the older Pentiums (without MMX). It supported a few features from the Pentium Pro, among them the very usable CMOVcc set. This however wasn't well known at the time, and nobody seemed to care.

There were possibly more, but the current author can't recall which ones. Suffice it to say, if there were any they were unpopular and soon gone. The company was bought by Via.

Rise Technologies

I've only heard about this company making Pentium-compatible chips, without MMX, and I don't know any detail beyond the CPUID identifier string. It just stuck. The string was "RiseRiseRise", that is, the same in all 3 dwords (making a search for it very easy).


partially related thread: AT,XT and PC


Category: CollectedKnowledge, HardWareCpu

What is v8086 mode ?

Included from What is v8086 mode?

Virtual 8086 mode is a sub-mode of ProtectedMode. In short, in virtual 8086 mode the CPU (in protected mode) runs an "emulated" 16-bit 'segmented' model (real mode) machine.

I don't enable V86 myself. Why should i care ?

The most common problem with v86 mode is that you can't enter ProtectedMode inside a v86 task. In other words, if you are running Windows or have emm386 in memory, you can't do a "raw" switch into protected mode (it causes an exception, iirc). DOS extenders worked around that problem using either VCPI or DPMI interfaces to switch into pmode (actually, promoting their V86 task as a 'regular' user task). For an OS programmer such interfaces are simply useless as they're part of another OS.

There are a few other, more "technical" problems people have when using v86 mode, mostly because v86 has some instructions "emulated" by what's known as a v86-monitor program. As the CPU is in protected mode, some instructions are high up on the security/protection level, and running those directly would cause no end of trouble for the OS.

Such technicalities are beyond the scope of a simple FAQ. If you wish to learn more about virtual mode, I suggest you read the corresponding chapter of the HollyIntelManual.

How do i detect v8086 ?

EFLAGS.VM is NEVER pushed onto the stack if the V86 task uses PUSHFD. You should check if CR0.PE=1 and then assume it's V86 if that bit is set.

detect_v86:
        smsw    ax
        and     eax,1           ;CR0.PE bit
        ret

VM mode detection is mainly useful when writing DOS extenders or other programs that could be started either in plain real mode or in virtual mode from a protected mode system. An 'ordinary' bootloader shouldn't worry about this since the BIOS will not set up VM86 to read the bootsector ;)

I heard it could help me. How can i support it ?

Indeed, VM86 can be of high interest if you need to access BIOS functions while you're in ProtectedMode. This is mainly useful for setting up a video mode. As many modern cards/chipsets lack support for the VBE3 protected mode interface, setting up a VM86 task that performs the proper 'set video mode' call remains the usual method.

TimRobinson has provided a very nice tutorial about VM86 mode. BeyondInfinity also has a working implementation (combined VM86+VBE task). See VirtualMonitor page for more implementation considerations.

Argh! My kernel is below 1MB! what can i do ?

TimRobinson and many others suggest that you put your kernel at a 'high' logical address (e.g. 0xC0000000) to prevent VM86 tasks from interfering with it. This is especially important when your kernel is large and leaves no room for VM86 code below 1MB, or when you plan to run 'full programs' within your VM86 box.

If all you need is a BIOS interrupt wrapper, then you can easily do the following:

  1. ensure that your 16bits code is on a separate page from any 32 bits code
  2. enable paging
  3. make kernel pages unwritable (and unreadable ?) for DPL3 and allow user-access only to those pages that contains your 16 bits code and pages that contains BIOS code or data.

Can i use VM86 for disk access ?

Theoretically yes, though it is probably not a GoodIdea(tm): most BIOS disk access involves IRQ handlers and DMA transfers which you can hardly control from your VM monitor, and the VM86 task may stall while the BIOS waits for an interrupt response, where a 'good' driver would have left the CPU free for other processes.

Remember your old Win9x system freezing when doing a disk access? That was most of the time due to an INT13-through-VM86 problem.


Categories: FAQ, HardWareCpu


Related forum threads

Creating vm86 task; VM86 and INT10h; kernel location & VM86

AMD K6 WriteBack Optimisations

Included from AMD K6 WriteBack Optimisations

I wrote and tested this on my own K6 (K6-200) and it works OK. There are two different methods for enabling writeback mode, and I was unable to find anyone with a K6-2 (CXT core) or K6-3 to test the second one. It should work fine on K6-2 CXT and K6-3 processors.

With some tweaking, it can be put into anyone's OS.

You call AMD_K6_writeback with the CPUID results family, model and stepping, only when you are sure you have an AMD cpu.

Here's the code, using InlineAssembly

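/* ULONG, union REGS, kprintf() and mem_end are expected to be provided
   by the surrounding kernel. */
void AMD_K6_write_msr(ULONG msr, ULONG v1, ULONG v2, union REGS *regs);
void AMD_K6_read_msr(ULONG msr, union REGS *regs);

void AMD_K6_writeback(int family, int model, int stepping)
{
    /* mem_end == top of memory in bytes */
    int mem=(mem_end>>20)/4; /* memory size in 4MB units */
    int c;
    union REGS regs;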

    if(family==5)
    {
        c=model;

        /* model 8 stepping 0-7 use old style, 8-F use new style */
        if(model==8)
        {
            if(stepping<8)
                c=7;
            else
                c=9;
        }

        switch(c)
        {
        /* old style write back */
        case 6:
        case 7:
            AMD_K6_read_msr(0xC0000082, &regs);
            if(((regs.x.eax>>1)&0x7F)==0)
                kprintf("AMD K6 : WriteBack currently disabled\n");
            else
                kprintf("AMD K6 : WriteBack currently enabled (%luMB)\n",
                    ((regs.x.eax>>1)&0x7F)*4);

            kprintf("AMD K6 : Enabling WriteBack to %luMB\n",
                (unsigned long)mem*4);
            /* the limit, in 4MB units, goes into bits 7:1 of EAX */
            AMD_K6_write_msr(0xC0000082, ((ULONG)(mem&0x7F)<<1), 0, &regs);
            break;

        /* new style write back */
        case 9:
            AMD_K6_read_msr(0xC0000082, &regs);
            if(((regs.x.eax>>22)&0x3FF)==0)
                kprintf("AMD K6 : WriteBack Disabled\n");
            else
                kprintf("AMD K6 : WriteBack Enabled (%luMB)\n",
                    ((regs.x.eax>>22)&0x3FF)*4);

            kprintf("AMD K6 : Enabling WriteBack to %luMB\n",
                (unsigned long)mem*4);
            /* the limit, in 4MB units, goes into bits 31:22 of EAX */
            AMD_K6_write_msr(0xC0000082, ((ULONG)(mem&0x3FF)<<22), 0, &regs);
            break;
        default:    /* dont set it on Unknowns + k5's */
            break;
        }
    }
}

void AMD_K6_write_msr(ULONG msr, ULONG v1, ULONG v2, union REGS *regs)
{
    (void)regs; /* WRMSR returns nothing; kept for existing callers */

    /* A register may not be both an operand and a clobber, so only
       "memory" is listed as clobbered. */
    asm __volatile__ (
        "pushfl\n"
        "cli\n"
        "wbinvd\n"
        "wrmsr\n"
        "popfl\n"
        : /* no outputs */
        : "a" (v1),
          "d" (v2),
          "c" (msr)
        : "memory");
}

void AMD_K6_read_msr(ULONG msr, union REGS *regs)
{
    /* RDMSR returns the value in EDX:EAX only, so EBX and ECX are not
       outputs. A register may not be both an operand and a clobber. */
    asm __volatile__ (
        "pushfl\n"
        "cli\n"
        "wbinvd\n"
        "xorl %%eax, %%eax\n"
        "xorl %%edx, %%edx\n"
        "rdmsr\n"
        "popfl\n"
        : "=a" (regs->x.eax),
          "=d" (regs->x.edx)
        : "c" (msr)
        : "memory");
}

Categories: CollectedKnowledge, HardWareCpu

How can I tell CPU speed ?

Included from How can I tell CPU speed ?

General Overview

In order to tell the CPU speed, we need two things:

  1. a way to tell that a given (precise) amount of time has elapsed.
  2. a way to know how many clock cycles a portion of code took.

Once these two sub-problems are solved, the CPU speed follows from this pseudo-code:

prepare_a_timer(X milliseconds ahead);
while (timer has not fired) {
  inc iterations_counter;
}
cpuspeed_mhz = (iterations_counter * clock_cycles_per_iteration) / (X * 1000);
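
To make the arithmetic concrete, here is a minimal C sketch of the final division. The function name and its parameters are made up for illustration; the only fact it relies on is that MHz is the same thing as clock cycles per microsecond.

```c
#include <stdint.h>

/* Hypothetical helper: turns a measured cycle count into MHz.
 * iterations * cycles_per_iteration is the number of clock cycles spent,
 * elapsed_ms * 1000 is the number of microseconds they were spent in. */
uint32_t cpu_speed_mhz(uint64_t iterations,
                       uint32_t cycles_per_iteration,
                       uint32_t elapsed_ms)
{
    uint64_t cycles = iterations * cycles_per_iteration;

    if (elapsed_ms == 0)
        return 0;                       /* no time elapsed, nothing to report */
    return (uint32_t)(cycles / ((uint64_t)elapsed_ms * 1000u));
}
```

For instance, 5,000,000 iterations of a 2-cycle loop in 10 ms is 10,000,000 cycles in 10,000 microseconds, i.e. 1000 MHz.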

Note that, except for very special cases, using a busy-loop (even a calibrated one) to introduce delays is a bad idea. Keep it for the very small delays (nanoseconds or microseconds) that you must respect when programming hardware.

Also note that PC emulators (like BOCHS, for instance) are rarely realtime and that you shouldn't be surprised if your clock appears to run faster than expected on those emulators.

Waiting for a given amount of time

There are two circuits in a PC that allow you to deal with time: the PIT (Programmable Interval Timer, 8253/8254) and the RTC (Real Time Clock). The PIT is probably the better of the two for this task.

The PIT has two operating modes that can be useful for telling the CPU speed:

  1. the periodic interrupt mode (command byte 0x36, square wave generator), in which a signal is emitted to the interrupt controller at a fixed frequency. This is especially interesting on PIT channel 0, which is bound to IRQ0 on a PC.
  2. the free-running countdown mode (command byte 0x34, rate generator), in which the PIT decrements a counter at its top speed (1.19318 MHz) until the counter expires, then reloads it and starts over. With a reload value of 0 (meaning 0x10000) it behaves like a one-shot countdown as long as you read it before it wraps.

    Channel 0 also raises IRQ0 each time the counter reloads in 0x34 mode, so keep interrupts disabled if you only want to poll.

Note that theoretically, the countdown mode could be used with a polling approach, reading the current count on the channel's data port, but I/O bus cycles have unpredictable latency and one should make sure the timestamp counter is not affected by this approach.
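
As a rough sketch of where the 0x36 and 0x34 values come from, the helpers below build a PIT command byte from its fields and compute the reload value for a wanted interrupt rate. The function names are hypothetical, and 1193182 Hz is the nominal input clock (the text rounds it to 1.19318 MHz); nothing here touches the hardware, you still have to write the results to ports 0x43 and 0x40 yourself.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182u   /* nominal PIT input clock (assumption) */

/* Command byte layout: bits 7-6 channel, bits 5-4 access mode
 * (3 = low byte then high byte), bits 3-1 operating mode, bit 0 BCD. */
uint8_t pit_command(unsigned channel, unsigned access, unsigned mode)
{
    return (uint8_t)(((channel & 3u) << 6) |
                     ((access  & 3u) << 4) |
                     ((mode    & 7u) << 1));
}

/* Reload value for a wanted rate in Hz. The PIT treats 0 as 65536,
 * which is the slowest rate it can do (about 18.2 Hz). */
uint16_t pit_divisor(uint32_t hz)
{
    uint32_t d;

    if (hz == 0)
        return 0;                           /* slowest possible rate */
    d = (PIT_INPUT_HZ + hz / 2) / hz;       /* rounded to nearest */
    if (d < 1)
        d = 1;                              /* faster than the PIT can go */
    if (d > 65535u)
        d = 0;                              /* 65536 is encoded as 0 */
    return (uint16_t)d;
}
```

pit_command(0, 3, 3) gives 0x36 and pit_command(0, 3, 2) gives 0x34; pit_divisor(100) gives 11932, the reload value for a 100 Hz tick.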

ToDo: check if there's code that programs the PIT in the FAQ already

Knowing how many cycles your loop takes

This step depends on your CPU. On 286, 386 and 486, each instruction took a well-known and deterministic number of clock cycles to execute. This allowed the programmer to tell exactly how many cycles a loop iteration took by looking up the timing of each instruction (see HelpPC) and summing them up.

Since the multi-pipelined architecture of the Pentium, however, such numbers are no longer communicated (for a major part because the same instruction can have variable timings depending on its surroundings, which makes the timing almost useless).

It is possible to create code which is exceptionally pipeline hostile such as:

xor eax,edx
xor edx,eax
xor eax,edx
xor edx,eax
...

A simple xor instruction takes one cycle, and it's guaranteed that the processor cannot overlap these instructions, as each instruction's operands depend on the result of the previous one. One can check that, for a small count (tested from 16 to 64), RDTSC will show that the cycle count is almost exactly (sometimes off by one) the instruction count. Unfortunately, when making the chain longer you'll start experiencing code cache misses, which will ruin the whole process.

E.g. looping on a chain of 1550 XORs may require a hundred iterations before it stabilizes around 1575 clock cycles on an AMD x86-64, and I'm still waiting for it to stabilize on my Pentium3.

Despite this inaccuracy, it gives relatively good results across the whole processor generation given a reasonably accurate timer. If very accurate measurements are needed, the next method should prove more useful.

A Pentium developer has a much better tool to tell timings: the Time Stamp Counter, an internal counter that can be read using the special RDTSC instruction.

rdtscpm1.pdf explains how that feature can be used for performance monitoring and should provide the necessary information on how to access the TSC on a Pentium.

How do i know if i have access to RDTSC instruction or not ?

The presence of the Time Stamp Counter (and thus the availability of RDTSC instruction) can be detected through the CPUID instruction. When calling cpuid with eax=1, you'll receive the features flags in edx. TSC is the bit #4 of that field.
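
A tiny C sketch of that test, assuming you already ran CPUID with eax=1 and saved edx somewhere (the function and macro names are made up for illustration):

```c
#include <stdint.h>

#define CPUID1_EDX_TSC (1u << 4)    /* bit #4 of the feature flags */

/* Takes the edx value returned by CPUID with eax=1.
 * Returns 1 if RDTSC is available, 0 otherwise. */
int cpu_has_tsc(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID1_EDX_TSC) != 0;
}
```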

Included from CpuIdWarning

Note that before using the CPUID instruction, you should also make sure the processor supports it by testing the 'ID' bit in eflags (mask 0x200000). This bit is modifiable only when the CPUID instruction is supported; on systems that don't support CPUID, writing a '1' at that place will have no effect.

In the case of a processor that does not support CPUID, you'll have to use more eflags-based tests to tell if you're running on a 486, 386, etc. and then pick up one of the 'calibrated loops' for that architecture (8086 through 80486 may have variable instruction timings).

Do you have code that works ?

There is a RealMode Intel-copyrighted example in the above-mentioned application note ... Here is another piece of code, submitted by DennisCGC, that gives the total measured frequency of a Pentium processor.

Some notes:

  • irq0_count is a variable which is incremented each time the timer interrupt is called.
  • in this code it's assumed that the PIT is programmed to 100 Hz (of course, I give the formula to calculate it for other rates).
  • it's assumed that the CPUID instruction is supported.

AsmExample:

;get_speed
;first do a cpuid command, with eax=1
mov  eax,1
cpuid
test edx,0x10           ; test bit #4. Do we have TSC ?
jz   detect_end         ; no ?, go to detect_end
;wait until the timer interrupt has been called.
;(interrupts must be enabled, or irq0_count never changes)
mov  ebx, [irq0_count]
wait_irq0:
cmp  ebx, [irq0_count]
jz   wait_irq0
rdtsc                   ; read time stamp counter
mov  [tscLoDword], eax
mov  [tscHiDword], edx
add  ebx, 2             ; Set time delay value ticks.
; remember: so far ebx = [irq0_count]-1, so the next tick is
; two steps ahead of the current ebx ;)
wait_for_elapsed_ticks:
cmp  ebx, [irq0_count] ; Have we hit the delay?
jnz  wait_for_elapsed_ticks
rdtsc
sub eax, [tscLoDword]  ; Calculate TSC
sbb edx, [tscHiDword]
; f(total_ticks_per_Second) =  (1 / total_ticks_per_Second) * 1,000,000
; This adjusts for MHz.
; so for this: f(100) = (1/100) * 1,000,000 = 10000
mov ebx, 10000
div ebx
; ax contains measured speed in MHz
mov [mhz], ax
detect_end:

See the Intel manual (see links) for more information.

-- bug reports are welcome. IM to DennisCGC

Can i do it if i have no interrupts support (yet) ?

I'd be tempted to say 'yes', though I haven't tested it nor heard of it elsewhere so far. Here is the trick:
disable();         // disable interrupts (if not done already)
outb(0x43,0x34);   // PIT channel 0, low byte then high byte, mode 2
outb(0x40,0);
outb(0x40,0);      // reload value 0 means 0x10000: the counter reads 0x10000 - n after n ticks
long long stsc=CPU::readTimeStamp();
for (volatile int i=0x1000;i>0;i--);   // volatile keeps the compiler from dropping the empty loop
long long etsc=CPU::readTimeStamp();
outb(0x43,0x00);   // latch the current count of channel 0
byte lo=inb(0x40);
byte hi=inb(0x40);

Now, we know that

  1. ticks=(0x10000 - (hi*256+lo)) periods of 1/1193180 seconds have elapsed at least and no more than ticks+1.
  2. etsc-stsc clock cycles have elapsed during the same time.

Thus (etsc-stsc)*1193180 / ticks should be your CPU speed in Hz ...
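
The last step can be sketched in C as follows. It is only the arithmetic, under the assumptions made above (counter loaded with 0, read back before it wraps); the function name is hypothetical and 1193180 is the PIT frequency used in the text.

```c
#include <stdint.h>

#define PIT_HZ 1193180u     /* PIT input frequency as used in the text */

/* stsc/etsc: time stamp counter before and after the loop.
 * lo/hi: the two bytes read back from port 0x40 after latching.
 * Returns the estimated CPU speed in Hz, or 0 if no PIT tick elapsed
 * (or exactly 0x10000 ticks did, which cannot be told apart). */
uint64_t cpu_hz_from_pit(uint64_t stsc, uint64_t etsc, uint8_t lo, uint8_t hi)
{
    uint32_t remaining = ((uint32_t)hi << 8) | lo;
    uint32_t ticks;

    if (remaining == 0)
        return 0;                       /* ambiguous reading, give up */
    ticks = 0x10000u - remaining;       /* PIT periods elapsed */
    return (etsc - stsc) * PIT_HZ / ticks;
}
```

With only a handful of ticks elapsed the result is very coarse (one tick more or less changes it a lot), so in practice you would want a loop long enough to get a few hundred ticks at least.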

As far as I can tell, 0x1000 iterations lead to 10 PIT ticks on a 1GHz CPU and a bit less than 0x8000 ticks on the same CPU running BOCHS. This certainly means that on very high speed systems, the discovered speed may not be accurate at all, or worse, less than 1 tick could occur ...

This technique is currently under evaluation in the forum

-- hope you like my technique /PypeClicker

Asking the SMBios for CPU speed

The SMBios (System Management BIOS) Specification addresses how motherboard and system vendors present management information about their products in a standard format by extending the BIOS interface on Intel architecture systems. The information is intended to allow generic instrumentation to deliver this information to management applications that use DMI, CIM or direct access, eliminating the need for error prone operations like probing system hardware for presence detection.

SMBios Processor Information

A Processor information (type 4) structure describes features of the CPU as detected by the SMBios. The exact structure is depicted in section 3.3.5 (p 39) of the standard. In it you will find the processor type, family, manufacturer etc. but also

  • the External Clock (bus) frequency, which is a word at offset 0x12,
  • the Maximum CPU speed in MHz, which is a word at offset 0x14 (e.g. 0xe9 is a 233MHz processor),
  • the Current CPU speed in MHz, (word at offset 0x16).
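
Assuming you already have a pointer to the start of a type 4 structure, reading those three words could look like the sketch below. The struct and function names are made up; the offsets are the ones listed above, and the first two bytes of every SMBIOS structure are its type and the length of its formatted area.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical container for the three values of interest. */
struct smbios_cpu_speed {
    uint16_t external_clock_mhz;    /* word at offset 0x12 */
    uint16_t max_speed_mhz;         /* word at offset 0x14 */
    uint16_t current_speed_mhz;     /* word at offset 0x16 */
};

/* SMBIOS data is little-endian. */
uint16_t smbios_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* s points at the start of an SMBIOS structure.
 * Returns 0 on success, -1 if this is not a usable type 4 structure. */
int smbios_cpu_speed(const uint8_t *s, struct smbios_cpu_speed *out)
{
    if (s == NULL || out == NULL)
        return -1;
    if (s[0] != 4 || s[1] < 0x18)   /* wrong type, or too short to hold offset 0x16 */
        return -1;
    out->external_clock_mhz = smbios_le16(s + 0x12);
    out->max_speed_mhz      = smbios_le16(s + 0x14);
    out->current_speed_mhz  = smbios_le16(s + 0x16);
    return 0;
}
```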

How do i get that structure ?

SMBios provides a Get SMBIOS Information function that tells you how many structures exist. You can then use the Get SMBIOS Structure function to read the processor information.

As an alternative, you can locate the SMBIOS Entry Point and then manually traverse the SMBIOS structure table, looking for type 4.

All this is depicted in the 'Accessing SMBIOS Information' section of the standard (p 11).

The SMBIOS Entry Point structure, described below, can be located by application software by searching for the anchor-string on paragraph (16-byte) boundaries within the physical memory address range 000F0000h to 000FFFFFh. This entry point encapsulates an intermediate anchor string that is used by some existing DMI browsers.

00-03   Anchor String ("_SM_", i.e. 5F 53 4D 5F)
04      Checksum
05      Length
06      major version
07      minor version
08-09   max structure size
0A      entry point revision
0B-0F   formatted area
10-14   "_DMI_" signature
15      intermediate checksum
16-17   structure table length
18-1B   structure table (physical) address
1C-1D   number of SMBIOS structures
1E      SMBIOS revision (BCD)
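
The search itself is short. Here is a sketch in C that scans a memory area on 16-byte boundaries for the anchor string and checks that the bytes of the entry point sum to zero. The function name is hypothetical, and in a real kernel `area` would be whatever pointer maps physical 000F0000h with `size` 0x10000.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the SMBIOS entry point inside area, or NULL. */
const uint8_t *smbios_find_entry(const uint8_t *area, size_t size)
{
    size_t off;

    for (off = 0; off + 0x1F <= size; off += 16) {
        const uint8_t *p = area + off;
        uint8_t len, sum = 0;
        size_t i;

        if (memcmp(p, "_SM_", 4) != 0)
            continue;                       /* no anchor on this paragraph */
        len = p[5];                         /* entry point length */
        if (len < 0x1E || off + len > size)
            continue;                       /* implausible length */
        for (i = 0; i < len; i++)
            sum = (uint8_t)(sum + p[i]);    /* all bytes must add up to 0 */
        if (sum == 0)
            return p;
    }
    return NULL;
}
```

Once found, the dword at offset 18h gives the physical address of the structure table and the word at 1Ch the number of structures to walk through.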

I don't feel like re-explaining the PnP calling convention etc. as chances are it will be useless in ProtectedMode ...

-- Thanks to DasCandy for bringing this information to my knowledge ;)


Categories: HowTo, HardWareCpu


Links

Related threads in the forum:

http://www.mega-tokyo.com/forum/index.php?board=1;action=display;threadid=5849
http://www.mega-tokyo.com/forum/index.php?board=1;action=display;threadid=767
http://www.mega-tokyo.com/forum/index.php?board=1;action=display;threadid=922
http://www.mega-tokyo.com/forum/index.php?board=1;action=display;threadid=8949, featuring info on bogomips, how linux does it and durand's code.

Other resources

ftp://download.intel.com/support/processors/procid/

especially section 12: "Operating Frequency" on page 29 of 24161815.pdf

Searching for SMBIOS should give you info on that too, it contains entries about the CPU, including current speed.

Tell me about x86 64 bits CPU ...

Included from Tell me about x86 64 bits CPU ...

This page tries to clarify things about x86-64 CPUs (AMD64 and Intel's equivalent EM64T implementation). IA-64 (Itanium) is really a different beast and is not addressed here.

Features

What does Long Mode offer ?

Long mode extends general registers to 64 bits (RAX, RBX, RIP, RSP, RFLAGS, etc), and adds an additional 8 integer registers (R8, R9, ..., R15) plus 8 more SSE registers (XMM8 to XMM15) to the CPU. Linear addresses are extended to 64 bit (however, a given CPU may implement less than this) and the physical address space is extended to 52 bits (a given CPU may implement less than this). In essence long mode adds another mode to the CPU
  • Real mode
  • Legacy mode (32 bit protected mode)
  • Long mode (64 bit protected mode)
  • System Management mode

Long mode does not support hardware task switching or virtual 8086 tasks, and most of the segment register details are ignored (a flat memory model is required). In long mode the current CS determines if the code currently running is 64 bit code (true long mode) or 32 bit code (compatibility mode), or even 16-bit protected mode code (still in compatibility mode).

The first 64 bit CPUs from both Intel and AMD will support 40 bit physical addresses and 48 bit linear addresses.

Setting up ...

How do I detect if the CPU is 64 bits ?

You can find that out by checking CPUID. All AMD64 compliant processors have the longmode-capable-bit turned on in the extended feature flags (bit 29) in EDX, after calling CPUID with EAX=0x80000001. There are also other bits required by long mode, but you can see those yourself in CPUID at AMD general purpose instruction reference
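
As a sketch, the check boils down to the following C, assuming you saved eax from CPUID with EAX=0x80000000 (the highest extended function supported) and edx from CPUID with EAX=0x80000001. The function and macro names are made up for illustration.

```c
#include <stdint.h>

#define CPUID_EXT1_EDX_LM (1u << 29)    /* long mode capable bit */

/* max_ext_leaf: eax returned by CPUID with EAX=0x80000000.
 * ext1_edx:     edx returned by CPUID with EAX=0x80000001.
 * Returns 1 if the CPU is long mode capable, 0 otherwise. */
int cpu_has_long_mode(uint32_t max_ext_leaf, uint32_t ext1_edx)
{
    if (max_ext_leaf < 0x80000001u)
        return 0;           /* extended feature flags are not implemented */
    return (ext1_edx & CPUID_EXT1_EDX_LM) != 0;
}
```

Checking the highest extended function first matters on older CPUs, which may return garbage for functions they do not implement.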

How do i enable Long Mode ?

The steps for enabling long mode are
  • Disable paging
  • Set the PAE enable bit in CR4
  • Load CR3 with the physical address of the PML4
  • Enable long mode by setting the EFER.LME flag in MSR 0xC0000080
  • Enable paging

Now the CPU will be in compatibility mode, and instructions are still 32-bit. To enter 64-bit mode, load CS (e.g. with a far jump) with a GDT code segment whose D/B bit (bit 22, 2nd dword) is clear (as it would be for a 16-bit code segment) and whose L bit (bit 21, 2nd dword) is set. Once that is done, the CPU is in 64-bit long mode.
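
To make the bit positions concrete, here is a small C sketch with the flags named in the steps above and a helper that builds a ring 0 64-bit code segment descriptor. The macro and function names are made up; base and limit are left at zero on the assumption that they are ignored for 64-bit code segments.

```c
#include <stdint.h>

#define MSR_EFER     0xC0000080u
#define EFER_LME     (1u << 8)      /* long mode enable */
#define CR4_PAE      (1u << 5)      /* physical address extension */
#define CR0_PG       (1u << 31)     /* paging */

#define DESC_HI_L    (1u << 21)     /* 2nd dword: 64-bit code segment */
#define DESC_HI_DB   (1u << 22)     /* 2nd dword: must stay clear when L is set */

/* Builds the 8-byte GDT entry for a ring 0, 64-bit code segment. */
uint64_t gdt_code64(void)
{
    uint32_t hi = 0;

    hi |= 1u << 15;         /* P: segment present */
    hi |= 1u << 12;         /* S: code/data descriptor (not a system one) */
    hi |= 0xAu << 8;        /* type: code, execute/read */
    hi |= DESC_HI_L;        /* L = 1, D/B left at 0 */
    return (uint64_t)hi << 32;
}
```

The result is 0x00209A0000000000, a value you will see hard-coded in many long mode GDTs.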

Are there restrictions on 32-bit code running in Legacy Mode ?

x86-64 processors can operate in a legacy mode: they still start in real mode, and protected mode is still available (along with the associated v8086 mode). This means an x86 operating system, even DOS, will still run just fine. The only difference is that physical addresses can be up to 52 bits (or as many bits as implemented by the CPU) when PAE is used.

However, there is nothing like Virtual8086 Mode (16 bits support) once in long/compatibility mode.

Can i enable Long Mode directly ?

Protected mode must be entered before activating long mode. A minimal protected-mode environment must be established to allow long-mode initialization to take place. This environment must include the following:

  • A protected-mode IDT for vectoring interrupts and exceptions to the appropriate handlers while in protected mode.
  • The protected-mode interrupt and exception handlers referenced by the IDT.
  • Gate descriptors for each handler must be loaded in the IDT.

    --AMD64 docs, volume 2, section 14.4 (Enabling Protected Mode), 24593 Rev. 3.10 February 2005

That being said, we have a thread where Brendan shows how you can enable 64-bit long mode with no 32-bit IDT and no 32-bit segments ... Be assured, however, that any paging-related exception that occurs in long mode before you enable 64-bit IDT will cause the processor to reset due to a triple fault ...

64bit Environment Models

There are three 64bit programming models you need to consider: LP64, ILP64 and LLP64. Each model has its own pitfalls. The I/L/P stand for Int, Long, Pointer, and the 64 means that's how many bits are in each.

Thus LP64 means Longs and Pointers are 64 bits wide. LL is a special case and means long-long...

DataTypes

This table lists the breakdown of sizes in the various programming models.

 Datatype     LP64   ILP64   LLP64   ILP32   LP32    (sizes in bits)
 char            8       8       8       8      8
 short          16      16      16      16     16
 _int           32      --      32      --     --
 int            32      64      32      32     16
 long           64      64      32      32     32
 long long      --      --      64      --     --
 pointer        64      64      64      32     32
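
If you want to know which model your own compiler uses, a sketch like the following will tell you. It simply looks up the int/long/pointer rows of the table above; the function names are made up for illustration.

```c
#include <limits.h>

/* Maps the sizes (in bits) of int, long and pointer to a model name. */
const char *data_model(unsigned int_bits, unsigned long_bits, unsigned ptr_bits)
{
    if (ptr_bits == 64) {
        if (int_bits == 64 && long_bits == 64) return "ILP64";
        if (int_bits == 32 && long_bits == 64) return "LP64";
        if (int_bits == 32 && long_bits == 32) return "LLP64";
    } else if (ptr_bits == 32) {
        if (int_bits == 32 && long_bits == 32) return "ILP32";
        if (int_bits == 16 && long_bits == 32) return "LP32";
    }
    return "unknown";
}

/* Same thing, for whatever compiler builds this file. */
const char *this_compiler_model(void)
{
    return data_model((unsigned)(sizeof(int)    * CHAR_BIT),
                      (unsigned)(sizeof(long)   * CHAR_BIT),
                      (unsigned)(sizeof(void *) * CHAR_BIT));
}
```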

64bit OS Modes

The following table lists what some current 64bit OS have as a programming model.

 OS   Mode 
 Windows XP-64 / IA64   LLP64 
 Linux   LP64 
 Solaris   LP64 
 DEC OSF/1 Alpha   LP64 
 SGI Irix   LP64 
 HP UX 11   LP64 

Categories: CollectedKnowledge, HardWareCpu


Learn More

Protected Mode (glossary)

Included from ProtectedMode

Glossary -- ProtectedMode

Protected mode is the 32 bit 'native' operating mode of Intel processors (and clones) since the 80386. It allows the developer to work with several virtual address spaces, each of which has 4GB of addressable memory and allows the system to enforce strict memory protection as well as restricting the available instruction set (so that your application cannot control the hard disk directly while the kernel can ;)

Protected mode unleashes the real power of your CPU, so you better get informed about it if you are considering writing an OS. However, it will prevent you from using virtually any of the BIOS interrupts (unless you have a V86 monitor).

Whether the CPU is in RealMode or in protected mode is defined by the lowest bit of the CR0 register, so basically

    ;; make sure interrupts are disabled, etc.
    mov eax, cr0
    or al,1
    mov cr0,eax

takes you to protected mode ... however you'll discover that there are many other things to be done before and after that operation to switch gracefully to pmode rather than resetting the CPU...

Plenty of information about protected mode can be found on both OSRC and Bona Fide, including in-detail tutorials and realmode/pmode switch programs. Our BabyStep series could also help you a bit :)


You may like http://home.swipnet.se/smaffy/asm/info/embedded_pmode.pdf if you're looking for a pragmatic tutorial on pmode.

Real Mode (glossary)

Included from RealMode

Glossary -- RealMode: the 16-bit operating mode in which the x86 CPU runs when it boots. That mode exists mainly for backward compatibility and provides very little help for the modern developer (no memory protection, only 1MB of addressable memory, no virtual memory support). BIOS and DOS are typically real-mode stuff. All the rest you know (windows, linux, DukeNukem3D, zsnes, dos4gw, djgpp ...) are ProtectedMode OS/applications/dosextenders respectively.

Additional information about address formation in RealMode can be found in Perica's tutorial on Bona Fide.

Unreal Mode (glossary)

Included from UnrealMode

What is unreal mode ? (for the Glossary)

Basically, unreal mode consists of breaking the 64KB limit of real mode segments, while still keeping 16-bit instructions and segment*16+offset address formation. You can find much more about it in OSRC.

When should i use unreal mode ?

unreal mode is recommended in the two following cases :

  1. you're trying to extend a legacy 16-bit DOS program so that it can deal with larger data and neither vm86 nor xms is suitable for your needs
  2. you're trying to load something that will run in 32-bit mode and which is larger than 640KB (so you cannot load it in conventional memory) and you don't want to bother with a disk driver called from pmode yet, and you do not wish to switch between real and protected mode for copying chunks from the conventional memory buffer to the high memory areas ...

Of course, unreal mode is kinda useless as long as you don't have the A20Line enabled.

How do i set up unreal mode ?

See BabyStep7 :)


related threads:

PowerPC (a step in non-Intel world)

Included from PowerPC

The PowerPC CPU architecture is significantly different from the IA32. Yet still, the architecture of your OS need not differ too much: While the way you address memory on the lowest levels might be different, or the way your SIMD unit operates, you still have a bootloader, a scheduler, a dispatcher, a memory manager etc. etc.

You will be able to transcribe most of the documentation 1:1 to the PowerPC, unless they handle the low levels (interrupt handling, real vs. protected mode etc.).

For the low levels, you'll of course need Motorola (or IBM) docs.

There are good books on the PowerPC architecture (including MMU and stuff) too, but they're harder to find because there's less demand for them.

But CPU docs won't suffice. You also need info on the motherboard / chipset / boot sequence etc. - and that's where it gets tricky, since there is no such thing as "the PowerPC architecture" - Apple doesn't like giving away that information, and there are many other platforms that might or might not be compatible.

A Linux-on-PPC project exists, and is probably a good place to look for PowerPC info...

/usr/src/linux/arch/ppc/boot/prep/head.S
/*
* Boot loader philosophy:
*      ROM loads us to some arbitrary location
*      Move the boot code to the link address (8M)
*      Call decompress_kernel()
*        Relocate the initrd, zimage and residual data to 8M
*        Decompress the kernel to 0
*      Jump to the kernel entry
*            -- Cort
*/

The BIOS in the latest Apple Macintosh machines is named "Open Firmware". Open Firmware was initially SPARC stuff targeted at Sun stations. Extension cards and other hardware of the like that are OpenFirmware compliant should carry Forth-written initialisation code in their ROM. I suppose the main boot ROM then scans memory for such ROMs and interprets the Forth code on them.

There are a few PowerPC emulation projects, namely:


Related Threads: OS for PowerPC






Hardware::Memory

Why cant I access all my memory ?

Included from Why cant I access all my memory?

You have enabled ProtectedMode or UnrealMode and still weird things occur when you try to load your kernel above the 1MB barrier ... Or maybe you just read about HighMemoryArea and found that you could be loading a small kernel just above 1MB because 0xFFFF*16 + 0x1234 == 0x101224 (see perica's tutorial about RealMode for details)...

But it doesn't work.

The main reason you can't access all your memory (or only "odd" megabytes) is because you need to enable the A20 line on your bus.

Note: This should not happen if you are using a modern (circa 1997 or later) machine. The modern IBM PC and compatibles do not have all those separate chips physically on-board (keyboard controller, interrupt controller and friends); they actually have one chip which emulates all these IBM PC and compatible chips called the Super I/O chip. The Super I/O chip enables the A20 line for you.

What is the A20 line ?

Included from A20Line

The A20 line takes a bit of explaining.

When the IBM-AT was introduced, it was able to access up to sixteen megabytes of memory (instead of the 1 MByte of the IBM-XT). But to remain compatible with the IBM-XT, a quirk in the XT architecture (memory wraparound) had to be duplicated in the AT. To achieve this, address line 20 (A20, the 21st line on the address bus, counting from A0) was turned off.

Some programs (and bioses) were indeed expecting that 0xFFFF * 16 + 0x1234 would be 0x001224.
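
A small C sketch shows the wraparound. The function name is hypothetical; it computes the address a real mode segment:offset pair ends up at, with address bit 20 forced to zero when the gate is off.

```c
#include <stdint.h>

/* Linear address reached by seg:off in real mode.
 * With A20 disabled, bit 20 of the address is forced to 0,
 * so everything just above 1MB wraps around to low memory. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t addr = ((uint32_t)seg << 4) + off;

    if (!a20_enabled)
        addr &= ~(1u << 20);
    return addr;
}
```

real_mode_linear(0xFFFF, 0x1234, 1) is 0x101224, while the same pair with the gate off lands at 0x001224, which is what those old programs rely on.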

The A20 line is controlled by the keyboard controller unit, which is usually a derivative of the 8042 chip. By programming that chip accurately, you can either enable or disable bit #20 on the address bus.

When your PC boots, the A20 gate is always disabled, but some BIOSes do enable it for you, as do some high-memory managers (HIMEM.SYS) or bootloaders (GRUB).


This is a Glossary page, see Why cant I access all my memory? for more info.

How do i enable the A20 gate (Pre Pentium)

To enable the A20 line, you have to use some hardware IO using the Keyboard Controller chip (8042 chip) and enable it. Good documentation exists for the 8042 chip but here is my source for enabling the A20 in C code.

The flow chart for this is:

   1. Disable interrupts
   2. Wait until the keyboard controller's input buffer is empty
   3. Send command to disable the keyboard
   4. Wait until the input buffer is empty
   5. Send command to read the output port
   6. Wait until the output buffer is full (the reply is available)
   7. Read and save the byte from the data port
   8. Wait until the input buffer is empty
   9. Send command to write the output port
  10. Wait until the input buffer is empty
  11. Send the saved byte ORed with 2 (GATE A20 to ON)
  12. Wait until the input buffer is empty
  13. Enable the keyboard
  14. Enable interrupts

It's especially important to wait for the 8042 everytime the flowchart tells it. Going too fast may lead to discarded command/data bytes on some chipsets.

Here is my C source;

/* Init the A20 by Dark Fiber */
void init_A20(void)
{
        UCHAR   a;

        disable_ints();

        kyb_wait_until_done();
        kyb_send_command(0xAD);         // disable keyboard

        kyb_wait_until_done();
        kyb_send_command(0xD0);         // read output port

        kyb_wait_until_done();          // NOTE: must wait for "output buffer full" (status bit 0) here
        a=kyb_get_data();

        kyb_wait_until_done();
        kyb_send_command(0xD1);         // write output port

        kyb_wait_until_done();
        kyb_send_data(a|2);

        kyb_wait_until_done();
        kyb_send_command(0xAE);         // enable keyboard

        enable_ints();
}

Or in ASM if you wish

;;
;; NASM 32bit assembler
;;

[bits 32]
[section .text]

enable_A20:
        cli

        call    a20wait
        mov     al,0xAD
        out     0x64,al

        call    a20wait
        mov     al,0xD0
        out     0x64,al

        call    a20wait2
        in      al,0x60
        push    eax

        call    a20wait
        mov     al,0xD1
        out     0x64,al

        call    a20wait
        pop     eax
        or      al,2
        out     0x60,al

        call    a20wait
        mov     al,0xAE
        out     0x64,al

        call    a20wait
        sti
        ret

a20wait:
.l0:    in      al,0x64
        test    al,2
        jz      .l2
        jmp     .l0
.l2:    ret


a20wait2:
.l0:    in      al,0x64
        test    al,1
        jnz     .l2
        jmp     .l0
.l2:    ret

According to The Undocumented PC (Frank Van Gilluwe) the AT keyboard controller can also accept a Disable and Enable A20 line command directly (using the A20 wait subroutine from above)-

call a20wait
mov  al, 0xDD           ; Disable A20
out  0x64, al
ret

call a20wait
mov  al, 0xDF           ; Enable A20
out  0x64, al
ret

How do i enable the A20 gate (Post Pentium)

Fortunately, from the Pentium era onwards, most systems have a FAST A20 option (bit 1 of I/O port 0x92) that bypasses the keyboard controller altogether. To set the A20 line, there is no need for delay loops or polling, just 3 simple instructions.

        in al, 0x92
        or al, 2
        out 0x92, al

However, this is not supported everywhere and there is no reliable way to tell whether it will have any effect on a given system. Even worse, on some systems it may actually do something else, like blanking the screen, so in my opinion you shouldn't use it unless the BIOS told you that FAST A20 is available... and you still have to write the code for systems that do not support FAST A20. The good news is that enabling the A20 line is one of the things GRUB can do for you, and it does it well :)


This is a TroubleShooting and HowTo page about HardWareMemory

How do I determine the amount of RAM ?

Included from How do I determine the amount of RAM?

What and why ...

Determining how much memory you have is one of the first things that most people implement in their kernel. In the "old" days of operating systems this was very easy, as few people had more than 64MB of memory.

Why did I mention 64MB of memory? Because the CMOS stores the extended memory size as a 16-bit count of kilobytes, so it can only hold values up to 64MB. So what are you going to do if you encounter machines with 128MB? There are some functions in the BIOS for memory handling. The first set of calls, supported by all BIOSes, only returns what is in the CMOS, thus making it irrelevant. There are some more advanced calls, but they are not supported by every BIOS. Other methods you can use are directly probing memory, and using your motherboard chipset registers to determine memory. The major drawback of the latter method is that you must know what kind of chipset the user has on his motherboard....

Something to note, not that you may need to know it, is that on old machines (maybe even new ones), it is possible to have less than 640KB of base memory and still have extended memory beyond the 1MB mark. But on today's machines, where most people have single 64MB DIMMs, you won't come across this.

A map of low (conventional) memory

There are regions that are used for the system and that the BIOS will never report to you because they're assuming well-known standards. Quoting The Workings of: x86-16/32 RealMode Addressing by Perica Senjak (minor corrections by BrendanTrotter):

        start        end      size  region/exception       description

Low Memory (the first MiB)
     00000000 - 000003FF       400  RAM                    Real-Mode IVT (Interrupt Vector Table)
     00000400 - 000004FF       100  RAM                    BDA (BIOS data area)
     00000500 - 0009FBFF  <= 9F700  RAM/free for use       Conventional memory (<= 9F700 Byte)
     00007C00 - 00007DFF       200  RAM                    Operating System BootSector
     0009FC00 - 0009FFFF       400  RAM                    EBDA (Extended BIOS data area)
     000A0000 - 000FFFFF     60000  various                ROM Area (384 KiB)

    Standard usage of the ROM Area:
     000A0000 - 000BFFFF     20000  video RAM               VGA Mem (128 KiB)
     000A0000 - 000AFFFF     10000  video RAM                VGA framebuffer (64 KiB)
     000B0000 - 000B7FFF      8000  video RAM                text monochrom  (32 KiB)
     000B8000 - 000BFFFF      8000  video RAM                text color      (32 KiB)
     000C0000 - 000C7FFF      8000  ROM                     Video BIOS* (32 KiB is typical size)
     000F0000 - 000FFFFF     10000  ROM                     Motherboard BIOS* (64 KiB is typical size)

High Memory (everything after the first MiB)
     00100000 - FEBFFFFF  FEB00000  RAM?/free for use?     Extended memory
     00F00000 - 00FFFFFF    100000  ?                       ISA 15-16MB (only with ISA bus?)
     FEC00000 - FFFFFFFF   1400000  various                PnP NVRAM?, LAPIC, ...

Counting RAM using the BIOS

You can determine RAM size with the BIOS via two different calls.

The first call is built into nearly every BIOS; the latter is only found in newer BIOSes (from Ralf Brown's Interrupt List):

 --------B-1588-------------------------------
 INT 15 - SYSTEM - GET EXTENDED MEMORY SIZE (286+)
   AH = 88h
   Return: CF clear if successful
       AX = number of contiguous KB starting at absolute address 100000h
     CF set on error
       AH = status
         80h invalid command (PC,PCjr)
         86h unsupported function (XT,PS30)

Notes: DOS TSRs which wish to allocate extended memory for themselves often hook this call, and return a reduced memory size. They are then free to use the memory between the new and old sizes at will. If your OS boots from DOS (e.g. lin-loader for linux), this may become more relevant.

The standard BIOS only returns memory between 1MB and 16MB; use AH=C7h for memory beyond 16MB. Not all BIOSes correctly return the carry flag, making this call unreliable unless one first checks whether it is supported through a mechanism other than calling the function and testing CF.

In these times though, these functions are 'poor' to say the least. A newer function that does the same thing is:

INT 15h
AX = E801h

Return:
  CF clear if successful
  AX = extended memory between 1M and 16M, in K (max 3C00h = 15MB)
  BX = extended memory above 16M, in 64K blocks
  CX = configured memory 1M to 16M, in K
  DX = configured memory above 16M, in 64K blocks
  CF set on error
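
Turning those register values into a byte count is simple arithmetic. The sketch below is only that arithmetic (the function name is made up, and you still need real mode code to issue the INT 15h). It also assumes that a BIOS returning AX = BX = 0 put the useful values in CX/DX instead, which is a workaround commonly recommended rather than something stated above.

```c
#include <stdint.h>

/* Approximate top of RAM in bytes from the INT 15h AX=E801h results:
 * the first MB, plus the KB count between 1M and 16M,
 * plus the 64K blocks above 16M. */
uint64_t e801_total_bytes(uint16_t ax, uint16_t bx, uint16_t cx, uint16_t dx)
{
    if (ax == 0 && bx == 0) {   /* assumption: fall back to the configured values */
        ax = cx;
        bx = dx;
    }
    return 0x100000ull
         + (uint64_t)ax * 1024u
         + (uint64_t)bx * 65536u;
}
```

E.g. AX=3C00h and BX=0700h means 15MB below 16M and 112MB above, so 128MB in total. Note this tells you nothing about holes; for that you need the E820h memory map.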

This function has been around since about 1994, so all systems from after then up to now should have this function.

Note: For optimum compatibility with all systems you should use the functions in the order: E820h, E881h, E801h and resort to 88h/C7h if everything else fails.

 SeeAlso:AH=87h,AH=8Ah"Phoenix",AH=C7h,AX=DA88h,AX=E801h,AX=E820h

 --------b-15E820-----------------------------
 INT 15 - newer BIOSes - GET SYSTEM MEMORY MAP
   AX = E820h
   EAX = 0000E820h
   EDX = 534D4150h ('SMAP')
   EBX = continuation value or 00000000h to start at beginning of map
   ECX = size of buffer for result, in bytes (should be >= 20 bytes)
   ES:DI -> buffer for result (see #00560)
   Return: CF clear if successful
     EAX = 534D4150h ('SMAP')
     ES:DI buffer filled
     EBX = next offset from which to copy or 00000000h if all done
     ECX = actual length returned in bytes
     CF set on error
       AH = error code (86h) (see #00475 at INT 15/AH=80h)

Notes:

  • this function is now supported by most newer BIOSes
  • a maximum of 20 bytes will be transferred at one time, even if ECX is higher; some BIOSes ignore the value of ECX on entry, and always copy 20 bytes
  • some BIOSes expect the high word of EAX to be clear on entry, I.e. EAX=0000E820h

  • if this function is not supported, an application should fall back to AX=E802h, AX=E801h, and then AH=88h
  • the BIOS is permitted to return a nonzero continuation value in EBX and indicate that the end of the list has already been reached by returning with CF set on the next iteration
  • this function will return base memory and ISA/PCI memory contiguous with base memory as normal memory ranges; it will indicate chipset-defined address holes which are not in use and motherboard memory-mapped devices, and all occurrences of the system BIOS as reserved
  • standard PC address ranges will not be reported

 SeeAlso:AH=C7h,AX=E801h"Phoenix",AX=E881h,MEM xxxxh:xxx0h"ACPI"

 Format of Phoenix BIOS system memory map address range descriptor:
 Offset Size Description (Table 00559)
  00h QWORD base address
  08h QWORD length in bytes
  10h DWORD type of address range (see #00560)

 (Table 00560)
 Values for System Memory Map address type:
  01h memory, available to OS
  02h reserved, not available (e.g. system ROM, memory-mapped device)
  03h ACPI Reclaim Memory (useable by OS after reading ACPI tables)
  04h ACPI NVS Memory (OS is required to save this memory between NVS
         sessions)
  other not defined yet -- treat as Reserved
  SeeAlso: #00559

ACPI 3.0 Notes:

  • Version 3.0 of the ACPI standard extends "int 15h, AX=E820"
  • A type of 05h has been defined for "memory in which errors have been detected"
  • A 32-bit set of flags ("Extended Attributes") has been appended to the end of the 20 byte structure, making it 24 bytes.
  • Bit 0 of the Extended Attributes indicates if the entire entry should be ignored (if the bit is clear). This is going to be a huge compatibility problem because most current OSs won't read this bit and won't ignore the entry.
  • Bit 1 of the Extended Attributes indicates if the entry is non-volatile (if the bit is set) or not. The standard states that "Memory reported as non-volatile may require characterization to determine its suitability for use as conventional RAM.".
  • The remaining 30 bits of the Extended Attributes are undefined.
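
The descriptor layout and the type values above can be put to work roughly as follows. This is only a minimal sketch, not code taken from any particular kernel: the names e820_entry_t, e820_entry_usable and e820_usable_bytes are made up for illustration, and it assumes the entries were already collected into an array by repeated INT 15h, AX=E820h calls, with the length returned in ECX recorded for each entry.

```c
#include <stdint.h>
#include <stddef.h>

#define E820_TYPE_AVAILABLE 1u    /* 01h: memory, available to OS */
#define E820_ATTR_VALID     0x1u  /* ACPI 3.0 bit 0: clear means "ignore this entry" */

/* One address range descriptor, including the ACPI 3.0 extension (hypothetical name). */
typedef struct {
    uint64_t base;      /* offset 00h */
    uint64_t length;    /* offset 08h */
    uint32_t type;      /* offset 10h */
    uint32_t ext_attr;  /* offset 14h, only meaningful if the BIOS returned 24 bytes */
} __attribute__((packed)) e820_entry_t;

/* returned_len is the ECX value the BIOS returned for this entry. */
static int e820_entry_usable(const e820_entry_t *e, uint32_t returned_len)
{
    if (e->length == 0)
        return 0;                                   /* zero-sized slot */
    if (returned_len >= 24 && !(e->ext_attr & E820_ATTR_VALID))
        return 0;                                   /* ACPI 3.0 says: ignore */
    return e->type == E820_TYPE_AVAILABLE;          /* anything else is treated as reserved */
}

/* Sums the length of every entry that the OS may use as normal RAM. */
static uint64_t e820_usable_bytes(const e820_entry_t *map,
                                  const uint32_t *returned_len, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        if (e820_entry_usable(&map[i], returned_len[i]))
            total += map[i].length;
    return total;
}
```

Checking the returned length before looking at the Extended Attributes keeps the code working on BIOSes that only ever fill in 20 bytes.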

BIOS reports xxx, is this normal ?

The 'standard' gap between 0xA0000 and 0xFFFFF will never be reported by BIOS.

ToDo does anybody have examples of output to show ? It seems that many people on the FAQ find GRUB or BIOS memory maps confusing, for instance because they appear to have zero-sized zero-typed slots. -- see this thread to find out...

Asking GRUB the amount of RAM

GRUB, or any bootloader implementing the Multiboot Specification, provides a convenient way of detecting the amount of RAM your machine has. Rather than re-invent the wheel, you can ride on the hard work that others have done by using the multiboot_info structure. When GRUB runs, it loads this structure into memory and leaves its address in the EBX register.

To utilize this structure, first include the file multiboot.h in your kernel's main file. Then, make sure that when you load your _main function from your assembly loader, you push EBX onto the stack. BareBones has this already done for you.

The key to memory detection lies in the multiboot_info struct. Since you've already pushed its address onto the stack, you get access to it by defining your start function as such:

   _main (multiboot_info_t* mbd, unsigned int magic) {...}

Now you can check that bit 0 of mbd->flags is set, and then safely refer to mbd->mem_lower for conventional memory (i.e. physical addresses between 0 and 640KB) and mbd->mem_upper for high memory (i.e. starting at 1MB). Both are given in "real" kilobytes, i.e. blocks of 1024 bytes each.
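
As a rough illustration of that check, the sketch below declares only the first three fields of the Multiboot information structure (a real kernel would use multiboot_info_t from multiboot.h instead). The names mbi_head_t and mbi_reported_kb are hypothetical. Keep in mind that mem_upper only describes the memory up to the first hole above 1MB, so the sum is a lower bound rather than an exact total.

```c
#include <stdint.h>

#define MBI_FLAG_MEM 0x1u  /* flags bit 0: mem_lower and mem_upper are valid */

/* First three fields of the Multiboot information structure (illustrative type). */
typedef struct {
    uint32_t flags;
    uint32_t mem_lower;  /* KB of memory starting at address 0 */
    uint32_t mem_upper;  /* KB of memory starting at 1MB */
} mbi_head_t;

/* Returns the reported amount of RAM in KB, or 0 if the fields are not valid. */
static uint64_t mbi_reported_kb(const mbi_head_t *mbd)
{
    if (!(mbd->flags & MBI_FLAG_MEM))
        return 0;
    return (uint64_t)mbd->mem_lower + mbd->mem_upper;
}
```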

If this is still not yet enough for you, you may check bit 6 of mbd->flags and use mbd->mmap_addr to access the BIOS-provided memory map. Quoting specifications,

If bit 6 in the flags word is set, then the mmap_* fields are valid, and indicate the address and length of a buffer containing a memory map of the machine provided by the BIOS. mmap_addr is the address, and mmap_length is the total size of the buffer. The buffer consists of one or more of the following size/structure pairs (size is really used for skipping to the next pair):

             +-------------------+
     -4      | size              |
             +-------------------+
     0       | base_addr_low     |
     4       | base_addr_high    |
     8       | length_low        |
     12      | length_high       |
     16      | type              |
             +-------------------+

where size is the size of the associated structure in bytes, which can be greater than the minimum of 20 bytes. base_addr_low is the lower 32 bits of the starting address, and base_addr_high is the upper 32 bits, for a total of a 64-bit starting address. length_low is the lower 32 bits of the size of the memory region in bytes, and length_high is the upper 32 bits, for a total of a 64-bit length. type is the variety of address range represented, where a value of 1 indicates available RAM, and all other values currently indicated a reserved area.

So in order to use the GRUB memory map you declare an appropriate structure, get the pointer to the first instance, grab whatever address and length information you want, and finally skip to the next memory map instance by adding size+4 to the pointer. The extra 4 accounts for GRUB treating base_addr_low as offset 0 in the structure, so that size does not count the size field itself. You must also use mmap_length to make sure you don't run past the end of the buffer.

typedef struct multiboot_memory_map {
	unsigned int size;
	unsigned int base_addr_low,base_addr_high;
	// You can also use: unsigned long long int base_addr; if supported.
	unsigned int length_low,length_high;
	// You can also use: unsigned long long int length; if supported.
	unsigned int type;
} multiboot_memory_map_t;

int main(multiboot_info_t* mbt, unsigned int magic) {
	...
	/* mmap_addr is a 32-bit physical address, so it has to be cast to a pointer */
	multiboot_memory_map_t* mmap = (multiboot_memory_map_t*) mbt->mmap_addr;
	while ((unsigned int) mmap < mbt->mmap_addr + mbt->mmap_length) {
		...
		/* size does not include the size field itself, hence the extra 4 bytes */
		mmap = (multiboot_memory_map_t*) ((unsigned int) mmap + mmap->size + sizeof(unsigned int));
	}
	...
}
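
The same walk can also be written so that it runs outside a kernel, which is handy for testing against a captured memory map. The following is a sketch under a few assumptions: mmap_available_bytes is a made-up name, buf and len stand in for mmap_addr and mmap_length, the machine is little-endian like the PC, and a memcpy is available (a freestanding kernel has to bring its own). It adds up all type 1 entries and stops at the first malformed record instead of running off the end of the buffer.

```c
#include <stdint.h>
#include <string.h>

/* Sums the length of all type 1 (available RAM) entries in a Multiboot
 * memory map. Each record is a 4-byte size field followed by size bytes. */
static uint64_t mmap_available_bytes(const uint8_t *buf, uint32_t len)
{
    uint64_t total = 0;
    uint32_t pos = 0;

    /* 4 bytes of size plus the minimum structure of 20 bytes must still fit */
    while (len - pos >= 4 + 20) {
        uint32_t size, type;
        uint64_t length;

        memcpy(&size, buf + pos, 4);
        if (size < 20 || size > len - pos - 4)
            break;                              /* malformed or truncated record */

        memcpy(&length, buf + pos + 4 + 8, 8);  /* length_low and length_high */
        memcpy(&type, buf + pos + 4 + 16, 4);
        if (type == 1)
            total += length;

        pos += size + 4;                        /* size excludes the size field */
    }
    return total;
}
```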

Counting RAM by direct probing

WE DISCOURAGE YOU FROM DIRECTLY PROBING MEMORY

Use BIOS to get a memory map, or use GRUB.

When perfectly implemented, directly probing memory may allow you to detect the amount of memory even on systems where the BIOS fails to provide the appropriate support (or without even worrying about whether your BIOS can do it or not). Depending on how it is coded, it may or may not take into account holes in system memory (the 15/16MB mark à la OS/2) or memory-mapped devices such as SVGA cards with frame buffers, etc.

However, your BIOS knows things about your motherboard and PCI devices that you don't. Probing memory-mapped PCI devices may have unpredictable results and possibly damage your system, so once again we discourage its use.

That being said, here is an example of how it should work (note the interrupt disable and the cache invalidation using InlineAssembly to keep memory consistent):

/*
 * void count_memory (void)
 *
 * probes memory above 1mb
 *
 * last mod : 05sep98 - stuart george
 *            08dec98 - ""     ""
 *            21feb99 - removed dummy calls
 *
 */
void count_memory(void)
{
        register ULONG *mem;
        ULONG   mem_count, a;
        USHORT  memkb;
        UCHAR   irq1, irq2;
        ULONG   cr0;

        /* save IRQ's */
        irq1=inb(0x21);
        irq2=inb(0xA1);

        /* kill all irq's */
        outb(0x21, 0xFF);
        outb(0xA1, 0xFF);

        mem_count=0;
        memkb=0;

        // store a copy of CR0
        __asm__ __volatile__("movl %%cr0, %0" : "=r"(cr0));

        // write-back and invalidate the cache
        __asm__ __volatile__("wbinvd");

        // plug cr0 with PE/CD/NW
        // cache disable(486+), no-writeback(486+), 32bit mode(386+)
        __asm__ __volatile__("movl %0, %%cr0" ::
                             "r" (cr0 | 0x00000001 | 0x40000000 | 0x20000000));

        do
        {
                memkb++;
                mem_count+=1024*1024;
                mem=(ULONG*)mem_count;

                a=*mem;

                *mem=0x55AA55AA;

                // the empty asm calls tell gcc not to rely on whats in its registers
                // as saved variables (this gets us around GCC optimisations)
                asm("":::"memory");
                if(*mem!=0x55AA55AA)
                        mem_count=0;
                else
                {
                        *mem=0xAA55AA55;
                        asm("":::"memory");
                        if(*mem!=0xAA55AA55)
                                mem_count=0;
                }

                asm("":::"memory");
                *mem=a;
        }while(memkb<4096 && mem_count!=0);

        // restore the original CR0
        __asm__ __volatile__("movl %0, %%cr0" :: "r" (cr0));

        mem_end=memkb<<20;
        mem=(ULONG*)0x413;
        bse_end=((*mem)&0xFFFF)<<6;

        outb(0x21, irq1);
        outb(0xA1, irq2);
}

Related Threads
  • Grub memory map, featuring real examples of GRUB/BIOS reported memory map.

Categories: HowTo, HardWareMemory, UsingBios




Hardware::Interrupts

How do I know if an IRQ or exception is firing ?

Included from How do I know if an IRQ or exception is firing?

When the PC boots up, the PIC IRQs are mapped to interrupts 8 to 15 (0x08 to 0x0F) and 0x70 to 0x77.

When your kernel switches into protected mode, the PICs remain unchanged, so the low 8 IRQs end up on the same interrupt vectors as CPU exceptions 8 to 15.

If one of these fires, you can tell whether it is a CPU exception or an IRQ by testing some bits in the PIC status register, but there is a much easier way: remap the PIC to use different interrupts than the CPU exceptions! See Can I remap the PIC? for more information on how this is done.


Categories: HardWareIrq

What is the PIC ?

Included from What is the PIC?

The PIC is a "Programmable Interrupt Controller" and is one of THE important chips (datasheets and more on OSRC); without it, x86 would not be an interrupt-driven architecture.

There needs to be a way for peripherals and other devices external to the CPU to tell the system that an event has happened or needs to happen. Examples of this: hard disk IO, modem/serial ports, keyboard.

Without the PIC interface, you would have to poll all the devices in the system to see if they want to do anything (signal an event). With the PIC, your system can run along nicely until a device wants to signal an event, which means you don't waste time going to the devices; you let the devices come to you when they are ready.

In the beginning, the age of the IBM XT, we had only 1 PIC chip giving us 8 hardware interrupt lines, but the 8259A PIC chip has a neat ability: it can cascade!

Cascading means you can daisy chain PIC chips together. This is what happened with the introduction of the IBM AT: a second PIC chip was cascaded onto the first, giving us a total of 15 hardware lines... Why 15 and not 16? That's because when you cascade chips, the PIC needs to use one of the interrupt lines to signal to the other chip.

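Thus, in an AT, IRQ line 2 of the first chip is used to receive the signal from the second chip... But to confuse things more, the bus line that used to be IRQ 2 is wired to IRQ 9 on the second chip, and the BIOS redirects IRQ 9 to the IRQ 2 handler. So when you get an IRQ 9, software written for a device on IRQ 2 still gets called.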


Categories: CollectedKnowledge, HardWareIrq

Related thread: edge/level-triggered interrupts

Can I remap the PIC ?

Included from Can I remap the PIC?

What's the PIC anyway ?

There are two PIC "chips" (8259A) in PC design which are organized in a master/slave scheme: everything the slave controller has to report goes through a line of the master controller (hardwired on master's channel 2 in the PC design).

What's that noise about remapping ?

PICs can be configured to use a "vector offset" that is added to their IRQ line numbers to form interrupt vectors. The Master and Slave PICs each have their own offset, independent of one another.

The default (BIOS-defined) vector offsets are 8 for Master PIC and 0x70 for Slave PIC:

  • Master: IRQ 0..7 -> INT 8..0xF
  • Slave: IRQ 8..15 -> INT 0x70..0x77

These default values don't suit the needs of ProtectedMode programming: there's a collision between IRQs 0..7 (mapped to INT 8..0xF) and processor exceptions (INT 0..0x1F are reserved).

from Intel manual v.3 p.5-2

"Vectors in the range 0 through 31 are reserved by the IA-32 architecture for architecture-defined exceptions and interrupts. Not all of the vectors in this range have a currently defined function. The unassigned vectors in this range are reserved. Do not use the reserved vectors.

The vectors in the range 32 to 255 are designated as user-defined interrupts and are not reserved by the IA-32 architecture. These interrupts are generally assigned to external I/O devices.. "

It's thus recommended to change the PIC's offsets (also known as remapping the PIC) so that IRQs use non-reserved vectors. A common choice is to move them to the beginning of the available range (IRQs 0..0xF -> INT 0x20..0x2F). For that, we need to set Master's offset to 0x20 and Slave's to 0x28.

This can be done by calling remap_pics(0x20, 0x28); (see code below).

Note however that,

  • each PIC vector offset must be divisible by 8, as the 8259A uses the lower 3 bits for the interrupt number of a particular interrupt (0..7).
  • the only way to change the vector offsets used by the PIC is to re-initialize it, which explains why the code is "so long" and contains plenty of things that apparently have no reason to be there.
  • if you plan to return to real mode (for any purpose), you really must restore the PIC to its former configuration.
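
The mapping from IRQ lines to interrupt vectors, and the constraints on the offsets, can be sketched in a few lines of C. The helper names pic_offset_ok and irq_to_vector are made up for this example; they only do the arithmetic and do not touch the hardware.

```c
#include <stdint.h>

/* A vector offset is suitable for protected mode if it is a multiple of 8
 * and all 8 of its vectors lie in the user-defined range 0x20..0xFF. */
static int pic_offset_ok(unsigned offset)
{
    return (offset & 7) == 0 && offset >= 0x20 && offset <= 0xF8;
}

/* Interrupt vector raised for IRQ 0..15, given the offsets of both PICs.
 * Returns -1 for an IRQ number that does not exist. */
static int irq_to_vector(unsigned irq, unsigned master_off, unsigned slave_off)
{
    if (irq > 15)
        return -1;
    return irq < 8 ? (int)(master_off + irq)
                   : (int)(slave_off + (irq - 8));
}
```

With the BIOS defaults, irq_to_vector(1, 0x08, 0x70) gives 9, which is exactly the collision with exception 9 described above; with the offsets 0x20 and 0x28 the keyboard ends up on vector 0x21.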

Programming the PIC chips

Each chip (master and slave) has a command port and a data port (given in the table below). When no command is issued, the data port allows us to access the interrupt mask of the PIC.

PIC ports:

             command   data
Master PIC   0x20      0x21
Slave PIC    0xA0      0xA1

A common command for the PIC is the end of interrupt command (code 0x20), but there's also the "initialize" command (code 0x11), which makes the PIC wait for 3 extra "initialization words" on the data port. Those data bytes

  • give the PIC its vector offset (ICW2),
  • tell it how it is wired to master/slaves (ICW3),
  • give additional information about the environment (ICW4).

Code

/* reinitialize the PIC controllers, giving them specified vector offsets
   rather than 8 and 0x70, as configured by default */

#define PIC1            0x20           /* IO base address for master PIC */
#define PIC2            0xA0           /* IO base address for slave PIC */
#define PIC1_COMMAND    PIC1
#define PIC1_DATA       (PIC1+1)
#define PIC2_COMMAND    PIC2
#define PIC2_DATA       (PIC2+1)
#define PIC_EOI         0x20            /* End - of - interrupt command code */

#define ICW1_ICW4       0x01            /* ICW4 (not) needed */
#define ICW1_SINGLE     0x02            /* Single (cascade) mode */
#define ICW1_INTERVAL4  0x04            /* Call address interval 4 (8) */
#define ICW1_LEVEL      0x08            /* Level triggered (edge) mode */
#define ICW1_INIT       0x10            /* Initialization - required! */

#define ICW4_8086       0x01            /* 8086/88 (MCS-80/85) mode */
#define ICW4_AUTO       0x02            /* Auto (normal) EOI */
#define ICW4_BUF_SLAVE  0x08            /* Buffered mode/slave */
#define ICW4_BUF_MASTER 0x0C            /* Buffered mode/master */
#define ICW4_SFNM       0x10            /* Special fully nested (not) */

/*
  arguments:
    offset1 - vector offset for master PIC
      vectors on the master become offset1..offset1+7
    offset2 - same for slave PIC: offset2..offset2+7
 */
void remap_pics(int offset1, int offset2)
{
        UCHAR   a1, a2;

        a1=inb(PIC1_DATA);   // save masks
        a2=inb(PIC2_DATA);

        outb(PIC1_COMMAND, ICW1_INIT+ICW1_ICW4);  // starts the initialization sequence
        io_wait();
        outb(PIC2_COMMAND, ICW1_INIT+ICW1_ICW4);
        io_wait();
        outb(PIC1_DATA, offset1);                    // define the PIC vectors
        io_wait();
        outb(PIC2_DATA, offset2);
        io_wait();
        outb(PIC1_DATA, 4);                       // ICW3: tell master there is a slave PIC at IRQ2 (0000 0100)
        io_wait();
        outb(PIC2_DATA, 2);                       // ICW3: tell slave its cascade identity (0000 0010)
        io_wait();

        outb(PIC1_DATA, ICW4_8086);
        io_wait();
        outb(PIC2_DATA, ICW4_8086);
        io_wait();

        outb(PIC1_DATA, a1);   // restore saved masks.
        outb(PIC2_DATA, a2);
}
Q
What does that io_wait() function do ?
A
It forces the CPU to wait a little before going on, so that the PIC has time to react. Simply jumping forward a few times or doing a small loop is usually enough. The exact timing doesn't really matter.

Note that even the Linux kernel is a bit odd in this respect: it delays either with a couple of jumps or by writing to a 'dummy' port (0x80), and a REAL_SLOW_IO flag makes the delay four times longer.

Q
Am I the only one to think ICW4_8086|ICW4_BUF_MASTER and ICW4_8086|ICW4_BUF_SLAVE should be sent to the PIC instead of raw 1?

-- PypeClicker

A
Yes, you are. ;) Under normal circumstances, the PIC chip uses the SP/EN pin as an input pin to determine whether it is master or slave. Setting the BUF bit (3) in the PIC ICW4 causes the chip to instead use the SP/EN pin as an output pin for activating external buffers. Since PC-style computers are not wired up this way, that bit should never be set, although it probably doesn't do any harm other than requiring you to then use the M/S bit (2) to tell each PIC its function (since it's no longer using the SP/EN pin to tell). This information comes from The Indispensable PC Hardware Book by Hans-Peter Messmer.

-- DaidalosGuy


Further reading

  • "The Indispensable PC Hardware Book" by Hans-Peter Messmer explains it nicely.
  • The datasheet from Intel - "8259A Programmable Interrupt Controller"

    • Utterly fails to explain what is going on. Has pinout and electrical signal levels, in case you need them. It should have the words "Abandon hope.." written in large, burning letters on the cover..

Categories: HowTo, HardWareIrq

So whats the NMI then ?

Included from So whats the NMI then?

The NMI ("Non Maskable Interrupt") is a hardware-driven interrupt much like the PIC interrupts, but the NMI goes directly to the CPU, not via the PIC controller.

Luckily, you CAN have control over the NMI -- otherwise you could be in deep trouble.

The NMI is "turned on" (set high) by the memory module when a memory parity error occurs.

You have to be careful about disabling the NMI and the PIC for extended periods of time: your system will hang unless it has a failsafe timer! (You've always got one, as long as you don't kill the PIT timer.)

/* enable the NMI */
void NMI_enable(void)
{
        outb(0x70, inb(0x70)&0x7F);
}

/* disable the NMI */
void NMI_disable(void)
{
        outb(0x70, inb(0x70)|0x80);
}

Comments

Is it really wise to turn off NMI? okay, if you get an NMI while you're switching from RealMode to ProtectedMode, you could get a Triple Fault, which would reset the system, but isn't a system reset wished when content from memory is unreliable by that time ? -- Pype.

Is there much you can do if an NMI occurs? I guess if you got the error while reading from something that was copied from a disk at some point in the past and not modified since then you could read it from the disk again and continue with the (hopefully) good copy, but if it's something that has changed or been created dynamically then you don't really have an easy way to recover other than essentially starting from scratch. -- Kemp.

Of course, you're assuming that the kernel code hasn't been corrupted; if the kernel is damaged then you'll most likely triple fault anyway. The best course of action is probably to request the user perform a RAM diagnostic with MemTest or something (and hope you don't crash before you can get that far). Then again... Windows keeps a "Hardware Damaged" flag for every physical page of RAM. Unfortunately this would mean aborting the program that was running when the NMI occurred, then checking each page the program was using at the time for what are basically "bad sectors" and flagging them so they aren't used again -- AR

So if an NMI occurs you should assume it'll continue happening at that location in future rather than it being a freak occurrence? -- Kemp

I'm not an engineer so I wouldn't know for certain, but I would be inclined to think that if an NMI occurs on a page when you retest the program that caused the original NMI then it would seem rather likely that the chip is faulty, I don't know if Windows persists the flags across reboots but I doubt it since AFAIK it's impossible to tell if the RAM has been replaced while the computer was off. If the NMI doesn't occur again while checking the pages then the fault isn't severe and could be written off, just if it is a persistent problem then the page should probably be disabled (in the interest of preventing random program crashes) -- AR


Categories: CollectedKnowledge, HardWareIrq

Interrupt Service Routines (ISR)

Included from InterruptServiceRoutines

What is an Interrupt Service Routine (ISR)?

The x86 architecture is an interrupt-driven system. External events trigger an interrupt: the normal control flow is interrupted and an interrupt service routine (ISR) is called.

Such events can be triggered by hardware or software. An example of a hardware interrupt is the keyboard: Every time you press a key, the keyboard triggers IRQ1 (Interrupt Request 1), and the corresponding interrupt handler is called. Timers, and disk request completion are other possible sources of hardware interrupts.

Software driven interrupts are triggered by the int opcode; e.g. the services provided by MS-DOS are called by the software triggering INT 21h and passing the applicable parameters in CPU registers.

For the system to know which interrupt service routine to call when a certain interrupt occurs, offsets to the ISR's are stored in the interrupt descriptor table (IDT) when you're in ProtectedMode, or in the interrupt vector table (IVT) when you're in RealMode.

More about ISR, IRQ and stuff can be found on Bona Fide.

What makes an ISR so special?

An ISR is called directly by the CPU, and the protocol for calling an ISR differs from calling e.g. a C function. Most importantly, an ISR has to end with the iret opcode, whereas usual C functions end with ret or retf. The obvious but nevertheless wrong solution leads to one of the most "popular" triple-fault errors among OS programmers.

The Problem

Many people shy away from Assembler, and want to do as much as possible in their favourite high-level language. GCC (as well as other compilers) allows you to add inline Assembler, so many programmers are tempted to write an ISR like this:

/* How NOT to write an interrupt handler           */
void interrupt_handler(void)
{
    __asm__("pushal"); /* Save registers.          */
    /* do something */
    __asm__("popal");  /* Restore registers.       */
    __asm__("iret");   /* This will triple-fault! */
}

This cannot work. The compiler adds stack handling code before and after your function, which together with the iret results in Assembler code resembling this:

push   %ebp
mov    %esp,%ebp
sub    $<size of local variables>,%esp
pushal
# C code comes here
popal
iret
# 'leave' if you use local variables, 'pop %ebp' otherwise.
leave
ret

It should be obvious how this messes up the stack (ebp gets push'ed but never pop'ed). Don't do this. Instead, these are your options.

Solutions

Plain Assembler

Learn enough about Assembler to write your interrupt handlers in it. ;-)

Two-Stage Assembler Wrapping

Write an Assembler wrapper calling the C function to do the real work, and then doing the iret.

/* filename : isr_wrapper.s */
.globl   _isr_wrapper
.align   4

_isr_wrapper:

    pushal
    cld    /* C code following the SysV ABI expects DF to be clear on function entry */
    call    _interrupt_handler
    popal
    iret


/* filename : interrupt_handler.c */
void interrupt_handler(void)
{
    /* do something */
}

Compiler Specific Directives

Some compilers for some processors have directives allowing you to declare a routine interrupt, offer a #pragma interrupt, or a dedicated macro. Borland C, Watcom C/C++, Microsoft C 6.0 and Free Pascal Compiler 1.9.* and up offer this, while VisualC++ and GCC don't:

/* Borland C */
void interrupt interrupt_handler(void)
{
    /* do something */
}
/* Watcom C/C++ */
void _interrupt interrupt_handler(void)
{
    /* do something */
}

Actually, VisualC++ can be used to write interrupt handlers, by making them naked: adding _declspec(naked) to your function causes the compiler to leave out the stack handling code. This means you are free to set up and release stack space however you want. Just be careful that you put in a return statement, because the compiler won't; it will just allow execution to continue past the end of your code on into garbage. Also, if you plan to use local variables or function arguments in the C code, you need to set up the stack frame the way the compiler expects it. This is not such a problem in interrupts though: since they are non-reentrant, you can simply use static variables.

/* Microsoft Visual C++ */
void _declspec(naked) interrupt_handler()
{
    _asm pushad;

    /* do something */

    _asm{
        popad
        iretd
    }
}

Black Magic

Look at the faulty code above, where the proper C function exit code was skipped, screwing up the stack. Now, consider this code snippet, where the exit code is added manually:

/* BLACK MAGIC - Strongly Discouraged! */
void interrupt_handler() {
    __asm__("pushal");
    /* do something */
    __asm__("popal; leave; iret"); /* BLACK MAGIC! */
}

The corresponding output would look somewhat like this:

push   %ebp
mov    %esp,%ebp
sub    $<size of local variables>,%esp
pushal
# C code comes here
popal
leave
iret
leave # dead code
ret   # dead code

This assumes that leave is the correct end-of-function handling - you are doing the function return code "by hand", and leave the compiler-generated handling as "dead code". Needless to say, such assumptions on compiler internals are dangerous. This code can break on a different compiler, or even a different version of the same compiler. It is therefore strongly discouraged, and listed only for completeness.


Categories: FAQ, HardWareIrq

See Also Help!? I can't get interrupts working


Related Threads

  • brendan providing a great intro on IRQ, ISR and similar stuff
  • how to know the interrupt number in a generic handler

Help!? I can't get interrupts working

Included from Help!? I can't get interrupts working

This page is a sort of TroubleShooting manual to help you get through common problems with interrupt handling encountered by guests and members of the forum.

Make sure you have collected enough information about your own situation (for instance by running your code in Bochs).

ISR problems

My handler doesn't get called!? (ASM)

For this test, you need to call the interrupt yourself, by software. Don't try to get IRQ handled right from the start before you're sure your IDT setup is correct. You need to have:

  • your IDT loaded and filled properly.
  • your IDT's linear address loaded in a structure together with the table's limit (its size in bytes, minus 1). Be especially cautious if you have a HigherHalfKernel design or did not set up IdentityPaging.
  • a valid Code selector and offset in the descriptor, proper type, etc.
  • a handling code at the defined offset.

    -- see test code below

My Handler doesn't get called (C) !?

If you're programming the IDT setup in C, make sure the IDTR structure has been correctly understood by your compiler. As Intel's 6-byte structure infringes most compilers' packing rules, you'll need to use either bitfields or packing pragmas. Use sizeof() and offsetof() to make sure the expected definition is used (a runtime test would be fine).
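
One way to get the layout right with GCC is sketched below. It uses GCC's packed attribute rather than a pragma, and checks the layout at compile time with C11 _Static_assert instead of at runtime. The type and function names (idtr_t, idt_gate_t, idt_make_gate) are made up for this example; the field values follow the test code at the end of this page, where the word 0x8E00 is stored at offset 4 of the descriptor.

```c
#include <stdint.h>
#include <stddef.h>

/* The 6-byte operand of lidt. Without packing, most compilers would
 * insert 2 bytes of padding in front of base. */
typedef struct {
    uint16_t limit;  /* size of the IDT in bytes, minus 1 */
    uint32_t base;   /* linear address of the IDT */
} __attribute__((packed)) idtr_t;

/* One 8-byte interrupt gate descriptor (32-bit protected mode). */
typedef struct {
    uint16_t offset_low;   /* handler address, bits 0..15 */
    uint16_t selector;     /* code segment selector */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* 0x8E: present, DPL 0, 32-bit interrupt gate */
    uint16_t offset_high;  /* handler address, bits 16..31 */
} __attribute__((packed)) idt_gate_t;

_Static_assert(sizeof(idtr_t) == 6, "IDTR must be 6 bytes");
_Static_assert(offsetof(idtr_t, base) == 2, "base must directly follow limit");
_Static_assert(sizeof(idt_gate_t) == 8, "a gate must be 8 bytes");

/* Builds an interrupt gate for a handler at the given linear offset. */
static idt_gate_t idt_make_gate(uint32_t handler, uint16_t selector)
{
    idt_gate_t g;
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.zero        = 0;
    g.type_attr   = 0x8E;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}
```

If one of the assertions fails, the build stops, which is easier to diagnose than a handler that silently never gets called.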

My handler is called but it doesn't return !?

Try running it in Bochs and see whether you get an exception report. Program all your exception handlers to behave like test1 (see the end of this page), but displaying a character that identifies the fault. Exceptions occurring at the end of an interrupt handler are usually due to a wrong stack operation within the handler.

  • don't try to return from an exception (unless you solved its cause). Returning from a division by zero, for instance, makes no sense at all
  • pop everything you push, but no more
  • make sure you didn't forget the CPU-pushed error code (for exceptions 8, 10 to 14 and 17)
  • make sure your handler doesn't trash unexpected registers. For exceptions and hardware IRQ handlers, no registers at all should be modified.

Another common source of error at this point comes from misimplementation of ISR in C. Check the InterruptServiceRoutines page for enlightenment ...

IRQ problems

Now that you're sure an interrupt can be called and can return, you're ready to enable hardware interrupts. As a first step, it is suggested that you enable the keyboard interrupt only, as you'll have almost complete control of what it does. Use the mask feature of the PIC to enable/disable individual IRQs.

   outb(0x21,0xfd);
   outb(0xa1,0xff);
   enable(); // asm("sti");

I'm receiving EXC9 instead of IRQ1 when striking a key !?

You missed the PIC vector reprogramming step. Check the Can I remap the PIC? page. Note that if you remap the PIC vectors beyond the end of the IDT, you'll get a GPF exception instead of any interrupt.

I do not receive any IRQ

Make sure you receive software interrupts first. Also make sure you enabled the IRQ you are interested in on the PIC mask, and that you enabled the cascading line (bit #2 of the master) if you're waiting for a slave IRQ.

I can only receive one IRQ

Each IRQ needs to be acknowledged to the PIC manually. You need to have outb(0x20,0x20) within any master handler, and both outb(0x20,0x20); outb(0xa0,0x20); within any slave handler.
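
The rule can be captured in a small helper that tells a generic handler which command ports need the EOI. This is only a sketch: pic_eoi_ports is a hypothetical name, and the slave is listed first here simply because that is the common convention; they are the same two writes as in the text above.

```c
#include <stdint.h>

#define PIC1_COMMAND 0x20  /* master PIC command port */
#define PIC2_COMMAND 0xA0  /* slave PIC command port */
#define PIC_EOI      0x20  /* end-of-interrupt command code */

/* Stores the command ports that must receive PIC_EOI for the given IRQ
 * (0..15) in ports[] and returns how many there are. */
static int pic_eoi_ports(unsigned irq, uint16_t ports[2])
{
    int n = 0;

    if (irq > 15)
        return 0;
    if (irq >= 8)
        ports[n++] = PIC2_COMMAND;  /* the slave raised it, acknowledge it too */
    ports[n++] = PIC1_COMMAND;      /* the master is always involved */
    return n;
}
```

A handler would then loop over the returned ports and call outb(port, PIC_EOI) for each of them.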

When i try to enable timer, keyboard doesn't work anymore

A common mistake is that people reload the mask with 0xFE when they want to add timer, but doing this actually enables only the timer and disables the keyboard (bit #1 of 0xFE is set!) The correct value for enabling both keyboard and timer is 0xFC.
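
To avoid this kind of mistake, the mask can be modified one bit at a time instead of being reloaded with a hand-computed constant. A minimal sketch, with made-up helper names, that only does the bit arithmetic:

```c
#include <stdint.h>

/* In the PIC mask register a set bit disables a line and a clear bit
 * enables it. line is the line number on that PIC (0..7). */
static uint8_t pic_mask_enable(uint8_t mask, unsigned line)
{
    return (uint8_t)(mask & ~(1u << (line & 7)));
}

static uint8_t pic_mask_disable(uint8_t mask, unsigned line)
{
    return (uint8_t)(mask | (1u << (line & 7)));
}
```

Adding the timer then becomes outb(0x21, pic_mask_enable(inb(0x21), 0)); which leaves the keyboard bit as it was.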

I keep getting an IRQ7 for no apparent reason

This is a known problem that cannot be prevented from happening, although there is a workaround. When any IRQ7 is received, simply read the In-Service Register (outb(0x20, 0x0B); unsigned char isr = inb(0x20);) and check if bit 7 (isr & 0x80) is set. If it isn't, the interrupt is spurious: return from it without sending an EOI.

For more information, including a more detailed explanation, see Brendan's post in this thread.

IDT problems in assembly

what does "shift operator may only be applied to scalar values" mean ?

You're trying to load a 16-bit field (a part of the IDT descriptor) with a reference to a 32-bit label that is subject to relocation. Try to replace

isr_label:
   iret
bad_stuff dw isr_label & 0xFFFF
          dw 0xdead
          dw 0xbeef
          dw isr_label >> 16

by something that extracts a 'pure value' from the address (e.g. the difference of two addresses is a pure value, and in NASM $$ means the start of the section)

%define BASE_OF_SECTION SOME_CONSTANT_YOU_SHOULD_KNOW
isr_label:
   iret
good_stuff dw (BASE_OF_SECTION + isr_label - $$) & 0xFFFF
           dw 0xcafe
           dw 0xbabe
           dw (BASE_OF_SECTION + isr_label - $$) >> 16

The role of BASE_OF_SECTION is to adjust the pure offset to the real situation (usually as defined in your linker script), e.g. if your kernel gets loaded at 1MB, you'll set it to 0x100000 to keep the CPU happy.


AsmExample / test1

int_handler:
    mov ax, LINEAR_DATA_SELECTOR
    mov gs, ax
    mov dword [gs:0xB8000],') : '
    hlt

idt:
    resd 50*2

idtr:
    dw (50*8)-1
    dd LINEAR_ADDRESS(idt)

test1:
    lidt [idtr]
    mov eax,int_handler
    mov [idt+49*8],ax
    mov word [idt+49*8+2],CODE_SELECTOR
    mov word [idt+49*8+4],0x8E00
    shr eax,16
    mov [idt+49*8+6],ax
    int 49

should display a smiley on the top-left corner ... then the CPU is halted indefinitely.

Tell me about APIC ...

Included from Tell me about APIC

What's the APIC ?

APIC (Advanced Programmable Interrupt Controller) is the Intel standard for the "new" PIC. It's used in multiprocessor systems and is an integral part of all Intel (and compatible) processors from the P6 family (Pentium Pro) onwards.

The Intel APIC specification is used in modern processors, and bit 9 of the standard feature flags (returned in EDX by CPUID function 1) lets you see whether a CPU has an APIC. The APIC itself is used for more up-to-date interrupt redirection, and for sending interrupts between processors. These things weren't possible using the older PIC specification.

Local APIC and IO-APIC

In an APIC-based system, each CPU is made of a core and a local APIC. The local APIC is responsible for handling CPU-specific interrupt configuration. Among other things, it contains the Local Vector Table (LVT) that translates events such as "internal clock" and other "local" interrupt sources into an interrupt vector (e.g. the LocalINT1 pin could raise an NMI exception by storing "2" in the corresponding entry of the LVT).

(Information about the local APIC can be found in the "System Programming Guide" of Intel processors.)

In addition, the I/O APIC (e.g. intel 82093AA) is part of the chipset and provides multi-processor interrupt management, incorporating both static and dynamic symmetric interrupt distribution across all processors. In systems with multiple I/O subsystems, each subsystem can have its own set of interrupts.

Each interrupt pin is individually programmable as either edge or level triggered. The interrupt vector and interrupt steering information can be specified per interrupt. An indirect register accessing scheme optimizes the memory space needed to access the I/O APIC's internal registers. To increase system flexibility when assigning memory space usage, the I/O APIC's two-register memory space is re-locatable, but defaults to 0xFEC00000.

The original I/O APIC specification/datasheet is available from Intel at http://www.intel.com/design/chipsets/datashts/290566.htm and an updated version can be found at http://developer.intel.com/design/chipsets/specupdt/290710.htm.

The Intel Standards for the APIC can be found on the Intel site under the name of Multiprocessor Specification at http://developer.intel.com/design/pentium/datashts/24201606.pdf.

Inter-Processor Interrupts

Inter-Processor Interrupts (IPIs) are generated by a local APIC and can be used as basic signalling for scheduling coordination, multi-processor bootstrapping, etc.

APIC configuration

The local APIC is enabled at boot-time and can be disabled by clearing bit 11 of IA32_APIC_BASE_MSR (the CPU then receives its interrupts directly from an 8259-compatible PIC, which is usually done prior to a reboot), and the I/O APIC can be programmed in legacy mode (so that it reacts as an 8259 device).

The local APIC registers are memory-mapped in physical page FEE00xxx (as seen in table 8-1 of intel P4 SPG). Note that there's a ModelSpecificRegister that specifies the actual APIC base.

Make sure you enable the APIC, by OR-ing the Spurious Interrupt Register (0x00F0) with 0x100, before you try to configure anything else ;)

#define IA32_APIC_BASE_MSR 0x1B
#define IA32_APIC_BASE_MSR_ENABLE 0x800
#define CPUID_FLAG_APIC (1 << 9)   /* bit 9 of the CPUID feature flags (EDX) */

/** returns a 'true' value if the CPU supports APIC
 *  and if the local APIC hasn't been disabled in MSRs
 *  note that this requires CPUID to be supported.
 */
boolean cpuHasAPIC()
{
   dword a,d;
   cpuid(1,&a,&d);
   return d&CPUID_FLAG_APIC;
}

/** defines the physical address for local APIC registers
 */
void cpuSetAPICBase(phys_addr apic)
{
   dword a=(apic&0xfffff000) | IA32_APIC_BASE_MSR_ENABLE;
#ifdef __PHYSICAL_MEMORY_EXTENSION__
   dword d=(apic>>32) & 0x0f;
#else
   dword d=0;
#endif

   cpuSetMSR(IA32_APIC_BASE_MSR, a,d);
}

/** determines the physical address of the APIC registers page
 *  make sure you map it to virtual memory ;)
 */
phys_addr cpuGetAPICBase()
{
   dword a,d;
   cpuGetMSR(IA32_APIC_BASE_MSR,&a,&d);
#ifdef __PHYSICAL_MEMORY_EXTENSION__
   return (a&0xfffff000)|((phys_addr)(d&0x0f)<<32);
#else
   return (a&0xfffff000);
#endif
}
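
The "enable the APIC first" step mentioned above (OR-ing the Spurious Interrupt Register with 0x100) could look like the following sketch. It assumes the local APIC register page has already been mapped and that 'base' points to it; lapicEnable and the two constants are names invented here for illustration.

```c
#include <stdint.h>

#define LAPIC_SPURIOUS_REG 0x00F0   /* byte offset inside the APIC page */
#define LAPIC_SOFT_ENABLE  0x100    /* value to OR in, as stated above  */

/* 'base' is assumed to be the address at which the local APIC register
 * page is visible; registers are accessed as whole dwords. */
static void lapicEnable(volatile uint32_t *base)
{
    volatile uint32_t *spurious = base + LAPIC_SPURIOUS_REG / 4;
    *spurious = *spurious | LAPIC_SOFT_ENABLE;
}
```

The read-modify-write keeps whatever spurious vector number is already stored in the low bits of the register.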

IO APIC Configuration

The IO APIC, like the VGA controller and the CMOS, has a two-register address space - an address register at IOAPICBASE+0 and a data register at IOAPICBASE+0x10. All accesses must be done on dword boundaries. Here's some example code that illustrates this:

dword cpuReadIoApic(void *ioapicaddr, dword reg)
{
   volatile dword *ioapic = (volatile dword *)ioapicaddr;
   ioapic[0] = reg;
   return ioapic[4];
}

void cpuWriteIoApic(void *ioapicaddr, dword reg, dword value)
{
   volatile dword *ioapic = (volatile dword *)ioapicaddr;
   ioapic[0] = reg;
   ioapic[4] = value;
}

Note the placement of the volatile keyword: written before the type, it means that the value pointed to, not the pointer, is volatile. It prevents the compiler from reordering or optimizing away the memory accesses, as Visual C does, which is a Bad Thing.
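
As a small usage sketch of the indirect scheme, here is how one might ask the I/O APIC how many redirection entries it has. The register index 0x01 (IOAPICVER) and the position of the "maximum redirection entry" field are taken from the 82093AA datasheet as far as I remember it, so double-check them; ioapicRead and ioapicEntryCount are hypothetical names.

```c
#include <stdint.h>

#define IOAPIC_REG_VERSION 0x01  /* IOAPICVER register index (assumed) */

/* Same indirect access scheme as above: index at +0x00, data at +0x10. */
static uint32_t ioapicRead(volatile uint32_t *ioapic, uint32_t reg)
{
    ioapic[0] = reg;
    return ioapic[4];
}

/* Number of redirection entries = "maximum redirection entry" + 1,
 * read from bits 16..23 of IOAPICVER. */
static unsigned ioapicEntryCount(volatile uint32_t *ioapic)
{
    uint32_t ver = ioapicRead(ioapic, IOAPIC_REG_VERSION);
    return ((ver >> 16) & 0xFF) + 1;
}
```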


Categories: CollectedKnowledge, HardWareIrq

Related threads

APIC timer "mapping IO APIC"

Tell me about OPIC ...

Included from Tell me about OPIC

No info on OPIC currently available... other than it's a non-Intel standard.


Starting points for research:

  • The K6/2/3 use the OPIC (Open PIC) which no one else other than Cyrix (AFAIK) bothered with.
  • The K6-2 has a built-in OPIC.
  • I seem to remember that there was very little difference between OPIC and APIC in the first place, but AMD could not use APIC because of licensing problems.

Categories: ToDo, HardWareIrq






Hardware::Video

How do I output text to the screen in protected mode ?

Included from How do I output text to the screen in protected mode?

Basics

Working on the assumption that you are in protected mode and not using the BIOS to do screen writes, you will have to do screen writes direct to "video" memory yourself.

This is quite easy to do, the text screen video memory for colour monitors resides at 0xB8000, and for monochrome monitors it is at address 0xB0000.

Text mode memory takes two bytes for every "character" on the screen. One is the ASCII code byte and the other the attribute byte. So HeLlo is stored as

0x000b8000: 'H', colourforH
0x000b8002: 'e', colourfore
0x000b8004: 'L', colourforL
0x000b8006: 'l', colourforl
0x000b8008: 'o', colourforo

The attribute byte carries the foreground colour in its lowest 4 bits and the background colour in the next 3 bits (bits 4-6). The interpretation of bit 7 depends on how you (or the BIOS) configured the hardware (see VgaResources for additional info).

For instance, using 0x00 as attribute means black-on-black (you'll see nothing). 0x07 is lightgrey-on-black (dos default), 0x1F is white-on-blue (Win9x's blue-screen-of-death), 0x2a is for green-monochrome nostalgics :P
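
To avoid hand-computing such values, a pair of tiny helpers can build them. This is just a sketch: vga_attr and vga_cell are made-up names, and bit 7 is simply left clear since its meaning depends on the hardware setup.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of any standard header. */
static uint8_t vga_attr(uint8_t fg, uint8_t bg)
{
    /* foreground in bits 0..3, background in bits 4..6 (bit 7 left clear) */
    return (uint8_t)((fg & 0x0F) | ((bg & 0x07) << 4));
}

static uint16_t vga_cell(char c, uint8_t attr)
{
    /* character code in the low byte, attribute in the high byte */
    return (uint16_t)((uint8_t)c | ((uint16_t)attr << 8));
}
```

With these, vga_attr(15, 1) gives the white-on-blue attribute 0x1F, and a whole character cell can be stored with one 16-bit write.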

For colour video cards, you have 32kb of text video memory to use (0xB8000-0xBFFFF), and since 80x25 mode (80x25x2==4000 bytes per screen) does not use all of it, you have what are known as 'pages': in 80x25 screen mode you have 8 display pages to use.

When you print to any other page than 0, it will not appear on screen until that page is enabled or "copied" into the page 0 memory space.

Writing strings

If you have a pointer to video memory and want to write a string, here is how you might do it:

        /* note this example will always write to the top
           line of the screen */
        void write_string(int colour, char *string)
        {
                char *video=(char*)0xB8000;
                while(*string!=0)
                {
                        *video=*string;
                        string++;
                        video++;
                        *video=colour;
                        video++;
                }
        }

Okay for strings, but how do I print numbers ?

Just like in any environment: convert the number into a string, then print the string. E.g. since 1234 = 4 + 10*3 + 100*2 + 1000*1, if you repeatedly divide 1234 by ten and keep the remainder of each division, you get all the digits (the last one first):

1234 = 123*10 + 4
123 = 12*10 + 3
12 = 1*10 + 2
1 = 1

The digits to be displayed are '1','2','3','4'. If you know the numerical value of number%10, you simply have to add it to the character '0' to get the correct character (e.g. '0'+4 == '4').
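
Put together, the divide-by-ten idea could look like this minimal sketch (unsigned decimal only; utoa10 is a name invented here, not a standard function). The resulting string can then be handed to write_string.

```c
/* Converts 'value' to decimal text in 'buf' (at least 11 bytes for a
 * 32-bit unsigned value) and returns buf. */
static char *utoa10(unsigned int value, char *buf)
{
    char tmp[10];
    int n = 0, i = 0;

    /* value % 10 yields the digits from least to most significant */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* so copy them out in reverse order */
    while (n > 0)
        buf[i++] = tmp[--n];
    buf[i] = '\0';
    return buf;
}
```

The do/while form makes sure that 0 is printed as "0" rather than as an empty string.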

(see more on the forum)

How do I print formatted messages (a la printf) ?

If you're working with C, you may want to print any number of arguments, and you may have looked at the stdarg.h file from other Operating Systems (e.g. linux 0.1 and Thix 0.3.x). These macro definitions may be a bit hard to understand, as they're basically C voodoo using pointers, casts and sizeof. Beware before porting them to 16-bit systems, though.

va_start() points to the first variable argument. va_arg() advances the pointer, then evaluates to the previous argument (comma operator). va_end() doesn't do anything.

E.g. to implement va_start(), you take the address of the last 'known' argument and advance (yes, it's a + since you're rewinding the stack) to the next stack item. The voodoo comes from the fact that things are automatically aligned on a 32-bit boundary on the stack, even if they are just 1 byte. Under gcc, the __builtin_next_arg() function may help you. Versions 3.x even seem to have __builtin_va_start(), __builtin_va_end() and __builtin_va_arg(), so no black magic is required at all.

va_arg() usually requires a bit more black magic since you have two things to do:

  • advance to the next item
  • return the value of the (previously current) item.
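
If your compiler ships its own stdarg.h (gcc does, even for freestanding code), using the macros is straightforward. Below is a deliberately tiny sketch of a printf-like function that understands only %c, %s, %u and %%. The names mini_printf, put_char and out_buf are invented for this example, and the output goes to a memory buffer where a kernel would rather write to video memory.

```c
#include <stdarg.h>

/* Hypothetical output sink: appends to a buffer so the sketch can run
 * anywhere; a kernel would write to video memory instead. */
static char out_buf[128];
static int  out_len;

static void put_char(char c)
{
    if (out_len < (int)sizeof(out_buf) - 1) {
        out_buf[out_len++] = c;
        out_buf[out_len] = '\0';
    }
}

static void put_uint(unsigned int v)
{
    if (v >= 10)
        put_uint(v / 10);          /* print the higher digits first */
    put_char((char)('0' + v % 10));
}

/* Supports only %c, %s, %u and %%. */
static void mini_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);             /* point at the first variable argument */
    for (; *fmt; fmt++) {
        if (*fmt != '%') { put_char(*fmt); continue; }
        fmt++;
        if (*fmt == '\0') break;   /* lone '%' at the end of the string */
        switch (*fmt) {
        case 'c': put_char((char)va_arg(ap, int)); break;
        case 's': { const char *s = va_arg(ap, const char *);
                    while (*s) put_char(*s++); } break;
        case 'u': put_uint(va_arg(ap, unsigned int)); break;
        default:  put_char(*fmt); break;   /* "%%" and unknown codes */
        }
    }
    va_end(ap);
}
```

Note that a char argument is fetched as an int: small types are promoted when passed through "...", which is the same alignment effect described above.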

You can get stdarg.h, _printf.h and doprintf.c from geezer/osd.

Uh ? I get nothing displayed at all ...

Keep in mind that this way of writing to video memory will only work if the screen has been correctly set up for 80x25 video mode (which is mode 03). You can do this either by initializing every VGA register manually or by calling the Set Video Mode service of the BIOS Int10h while you're still in real mode (in your bootsector, for instance). Most BIOSes do that initialization for you, but some others (mainly on laptops) do not. Check out Ralf Brown's Interrupt List for details. Note also that some modes that are reported as "both text&graphic" by mode lists are actually graphic modes with BIOS functions that plot fonts when you call char/message output through Int10h (which means you'll end up with plain graphic mode once in ProtectedMode).

More hints for non-working implementations on Help! I cannot print to screen !?


Categories: HowTo, HardWareVga

How do I detect if I have a colour or monochrome monitor ?

Included from How do I detect if I have a colour or monochrome monitor?

Detecting if you have a colour or monochrome video card is trivial. You can use the values stored in the BIOS data segment to determine this.

Here is some C source example of how I do it.

        /* video card mono/colour detection by Dark Fiber
         * returns 0=mono, 1=colour
         */
        int detect_video_type(void)
        {
                int rc;
                char c=(*(volatile USHORT*)0x410)&0x30;

                /* c can be 0x00 or 0x20 for colour, 0x30 for mono */
                if(c==0x30)
                {
                        rc=0;   // mono
                }
                else
                {
                        rc=1;   // colour
                }
                return rc;
        }

Categories: HowTo, HardWareVga

How do I move the cursor when I print ?

Included from How do I move the cursor when I print?

Without access to BIOS calls and functions, moving the cursor requires using video hardware control. Luckily for us, it is a simple procedure.

Note, this quick example ASSUMES 80x25 screen mode.

        /* void update_cursor(int row, int col)
         * by Dark Fiber
         */
        void update_cursor(int row, int col)
        {
                unsigned short position=(row*80) + col;

                // cursor LOW port to vga INDEX register
                outb(0x3D4, 0x0F);
                outb(0x3D5, (unsigned char)(position&0xFF));
                // cursor HIGH port to vga INDEX register
                outb(0x3D4, 0x0E);
                outb(0x3D5, (unsigned char )((position>>8)&0xFF));
        }

Keep in mind that in/out to VGA hardware is a slow operation. So using the hardware registers to remember the current character location (row, col) is bad practice, and updating the position after each displayed character is poor practice (updating it only when a line/string is complete is wiser, and hiding it until a user prompt is required is wisest).


Categories: HowTo

How do I draw graphics ?

Included from How do I draw things in protected mode?

Now that you know how you can easily write text to the screen using HardWareVga support, you might be wondering how you'll be able to display nice images, windows, menus, icons, fancy cursors and buttons, etc.

Well, to quote Curufir, "Switch to a graphical mode and write directly in video memory".

Which are the graphics modes ?

Well, the VGA (and VESA) modes can be selected using the standard BIOS interrupt 0x10. Int 0x10 seems like a decent enough reference for int 0x10 (no VESA extension), while VESA contains the various VESA standards.

Plain VGA is limited to 640x480 in 16 colours; VESA (depending on your card) can offer much higher resolutions.

Included from GettingVbeModeInfo

VESA stopped assigning codes for video modes long ago. Instead, they standardized a much better solution: you can query the video card for the modes it supports, and query it about the attributes of each mode. In your OS, you can have a function that you call with a desired width, height, and depth, and that returns the video mode number for it (or the closest match). Then, just set that mode.

You'll want to look in the VESA VBE docs for these functions:

INT 0x10, AX=0x4F00
Get Controller Info. This is the one that returns the array of all supported video modes.

struct vbeControllerInfo {
   char signature[4];             // == "VESA"
   short version;                 // == 0x0300 for VBE 3.0
   short oemString[2];            // isa vbeFarPtr
   unsigned char capabilities[4];
   short videomodes[2];           // isa vbeFarPtr
   short totalMemory;             // as # of 64KB blocks
};

vbeInfoBlock *vib = dos_alloc(512);
v86_bios(0x10, {ax:0x4f00, es:SEG(vib), di:OFF(vib)},&out);
if (out.ax!=0x004f) die("something wrong with VBE get info");

INT 0x10, AX=0x4F01
Get Mode Info. Call this for each member of the mode array to find out the details of that mode.

struct vbeModeInfo {
  word attributes;
  byte winA,winB;
  word granularity;
  word winsize;
  word segmentA, segmentB;
  VBE_FAR(realFctPtr);
  word pitch; // bytes per scanline

  word Xres, Yres;
  byte Wchar, Ychar, planes, bpp, banks;
  byte memory_model, bank_size, image_pages;
  byte reserved0;

  byte red_mask, red_position;
  byte green_mask, green_position;
  byte blue_mask, blue_position;
  byte rsv_mask, rsv_position;
  byte directcolor_attributes;

  dword physbase;  // your LFB address ;)
  dword reserved1;
  short reserved2;
};

INT 0x10, AX=0x4F02
Set Video Mode. Call this with the mode number you decide to use.

Will it work with Bochs ?

For VBE to work in Bochs you need the "VGABIOS-lgpl" BIOS and have a version of Bochs that was compiled with the "--enable-vbe" option... See Vesa Information in Bochs thread for more info

How to pick the mode i wish ?

Here's a sample code, assuming you have a VirtualMonitor already ... Basically, you will scan the 'modes list' referenced by the vbeInfoBlock.videomodes[] and then call 'get mode info' for each mode. You can then compare width, height and colordepth of each mode with the desired one.


UInt16 findMode(int x, int y, int d)
{
  struct vbeControllerInfo *ctrl = (struct vbeControllerInfo *)0x2000;
  struct vbeModeInfo *inf = (struct vbeModeInfo *)0x3000;
  UInt16 *modes;
  int i;
  UInt16 best = 0x13;
  int pixdiff, bestpixdiff = DIFF(320 * 200, x * y);
  int depthdiff, bestdepthdiff = 8 >= d ? 8 - d : (d - 8) * 2;

  strncpy(ctrl->signature, "VBE2", 4);
  intV86(0x10, "ax,es:di", 0x4F00, 0, ctrl); // Get Controller Info
  if ( (UInt16)v86.tss.eax != 0x004F ) return best;

  modes = (UInt16*)REALPTR(ctrl->videomodes); // far pointer to the mode list
  for ( i = 0 ; modes[i] != 0xFFFF ; ++i ) {
      intV86(0x10, "ax,cx,es:di", 0x4F01, modes[i], 0, inf); // Get Mode Info

      if ( (UInt16)v86.tss.eax != 0x004F ) continue;

      // Check if this is a graphics mode with linear frame buffer support
      if ( (inf->attributes & 0x90) != 0x90 ) continue;

      // Check if this is a packed pixel or direct color mode
      if ( inf->memory_model != 4 && inf->memory_model != 6 ) continue;

      // Check if this is exactly the mode we're looking for
      if ( x == inf->Xres && y == inf->Yres &&
          d == inf->bpp ) return modes[i];

      // Otherwise, compare to the closest match so far, remember if best
      pixdiff = DIFF(inf->Xres * inf->Yres, x * y);
      depthdiff = (inf->bpp >= d)? inf->bpp - d : (d - inf->bpp) * 2;
      if ( bestpixdiff > pixdiff ||
          (bestpixdiff == pixdiff && bestdepthdiff > depthdiff) ) {
        best = modes[i];
        bestpixdiff = pixdiff;
        bestdepthdiff = depthdiff;
      }
  }
  if ( x == 640 && y == 480 && d == 1 ) return 0x11;
  return best;
}
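
findMode relies on a REALPTR helper that is not shown in this FAQ. Its job is presumably to turn the real-mode far pointer returned by the BIOS into a linear address; a plausible sketch, assuming the pointer is stored as two 16-bit words (offset first, then segment), would be:

```c
#include <stdint.h>

/* Assumed layout of a real-mode far pointer as stored by the BIOS:
 * offset in the first word, segment in the second word. */
static uint32_t farptr_to_linear(const uint16_t farptr[2])
{
    uint16_t offset  = farptr[0];
    uint16_t segment = farptr[1];
    return ((uint32_t)segment << 4) + offset;
}
```

The result is only directly usable as a C pointer if the first megabyte is identity-mapped in your address space.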

Initial Thread: VESA, higher modes, reply by Dreamsmith (aka DaidalosGuy)

How to do the switch ?

The cleanest way to set up your video mode is to go through the video BIOS. It can be performed through the regular Int 0x10 interface, or through the (optional) ProtectedMode interface offered by VBE3. As you can guess, Int 0x10 requires a 16-bit environment, so you can only use it in RealMode or V86Mode.

Practically, the options are:

  • you set up the mode you want at early stage (in the bootloader) before entering protected mode.
  • you switch back to UnrealMode for setting the proper video mode
  • you set up a V86 monitor that will execute the mode-switching code
  • you use the PMID from VBE3
  • you run some software code translation tool to produce pmode code out of bios rmode code. (SANiK is on the catch)
  • you let GRUB do the switch for you (currently only works with patched GRUB)

How do I locate the video memory ?

For standard VGA video modes the video memory will be at address 0xA0000, 0xB0000 or 0xB8000. To find out which one, look at the following table (quoting http://www.uv.tietgen.dk/staff/mlha/PC/Prog/ASM/INT/INT10.htm): "text" means 0xB8000 (0xB0000 for the MDA monochrome mode 07), the CGA graphics modes (04 to 06) also use 0xB8000, and the EGA/VGA graphics modes are at 0xA0000. Note that most EGA modes (and high-res VGA modes) use several bit planes, so you won't be able to use all the colours by simply writing to video memory :(

00: text 40*25 16 color (mono)
01: text 40*25 16 color
02: text 80*25 16 color (mono)
03: text 80*25 16 color
04: CGA 320*200 4 color
05: CGA 320*200 4 color (m)
06: CGA 640*200 2 color
07: MDA monochrome text 80*25
08: PCjr
09: PCjr
0A: PCjr
0B: reserved
0C: reserved
0D: EGA 320*200 16 color
0E: EGA 640*200 16 color
0F: EGA 640*350 mono
10: EGA 640*350 16 color
11: VGA 640*480 mono
12: VGA 640*480 16 color
13: VGA 320*200 256 color

For VESA modes, the framebuffer address is stored in the mode info block. Once you get that block, read it from the physbase field.

How do i plot pixels to video memory ?

where ?

Let's say you want to plot a red pixel in the middle of your screen. The first thing you have to know is where the middle of the screen is. In 320x200x8 (mode 13), this will be at 100*320+160 = 32160. In general, your screen can be described by:

  • width -- how many pixels you have on a horizontal line
  • height -- how many horizontal lines of pixels are present
  • pitch -- how many bytes of VRAM you should skip to go one pixel down
  • depth -- how many bits of colour you have
  • "pixelwidth" -- how many bytes of VRAM you should skip to go one pixel right.

"pitch" and "width" may seem redundant at first sight but they aren't. It's not rare once you go to higher (and exotic) resolutions to have e.g. 8K bytes per line while your screen is actually 1500 pixels wide (32-bits per pixel). The good news is that it allows smooth horizontal scrolling (which is mainly useful for 2D games :P )

Pitch and pixel width are usually announced by VESA mode info. Once you know them, you can calculate the place where you plot your pixel as:

unsigned char *pixel = vram + y*pitch + x*pixelwidth;
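
Wrapped into a function, the formula might look like this sketch. The screen structure and pixel_address() are hypothetical names; in a real driver the fields would be filled from the mode info (or hard-coded for a fixed mode such as mode 13).

```c
#include <stdint.h>

/* Hypothetical description of the current mode. */
struct screen {
    uint8_t *vram;        /* start of the (mapped) frame buffer */
    unsigned pitch;       /* bytes from one line to the next    */
    unsigned pixelwidth;  /* bytes from one pixel to the next   */
};

static uint8_t *pixel_address(const struct screen *s, unsigned x, unsigned y)
{
    return s->vram + y * s->pitch + x * s->pixelwidth;
}
```

For mode 13 (pitch 320, pixelwidth 1), the pixel at x=160, y=100 ends up 32160 bytes after the start of video memory, matching the computation above.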

what ?

The second thing to know is what value you should write for "red". This depends on your screen setup, again. In EGA mode, you have a fixed palette featuring dark-red (colour 4) and light-red (colour 12). Yet, EGA requires you to plot each bit of that value on a different bit plane, so refer to EGA programming tutorials if you really want such modes supported. In conventional 320x200x8 VGA mode, you have the same colours 4 and 12 as in EGA, so you would plot your red pixel with

*pixel = 4;

Yet, in VGA, the palette is reprogrammable (as you can learn in FreeVGA documents), so virtually any value between 0..255 could be 'red' if you program the palette so :P

Finally, in VESA modes, you usually have truecolor or hicolor, and in both of them, you have to give independent red, green and blue values for each pixel. The mode info will (again) tell you how the RGB components are organized in the pixel bits. E.g. you will have xRRRRRGGGGGBBBBB for 15-bit mode, meaning that #ff0000 red is 0x7C00 there, and #808080 grey is 0x4210 (pick up a pencil, draw the bits and see for yourself ;)

/* only valid for 800x600x16M */
static void putpixel(unsigned char* screen, int x,int y, int color)
{
  unsigned where=x*3+y*2400;
  screen[where]=color&255;         // BLUE
  screen[where+1]=(color>>8)&255;  // GREEN
  screen[where+2]=(color>>16)&255; // RED
}

/* only valid for 800x600x32bpp */
static void putpixel(unsigned char* screen, int x,int y, int color)
{
  unsigned where=x*4+y*3200;
  screen[where]=color&255;          // BLUE
  screen[where+1]=(color>>8)&255;   // GREEN
  screen[where+2]=(color>>16)&255;  // RED
}
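
For hicolor modes, the pixel value has to be assembled from the component sizes and positions instead. The sketch below assumes those numbers come from the mode info block (red_mask/red_position and friends are taken here to be the size in bits and the bit position of each component); pack_component and pack_rgb are names made up for this example.

```c
#include <stdint.h>

/* Scales an 8-bit component down to 'size' bits and moves it to 'pos'.
 * 'size' must be between 1 and 8. */
static uint32_t pack_component(uint8_t value, unsigned size, unsigned pos)
{
    return (uint32_t)(value >> (8 - size)) << pos;
}

/* Sizes and positions are meant to come from the mode info block. */
static uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b,
                         unsigned rsize, unsigned rpos,
                         unsigned gsize, unsigned gpos,
                         unsigned bsize, unsigned bpos)
{
    return pack_component(r, rsize, rpos) |
           pack_component(g, gsize, gpos) |
           pack_component(b, bsize, bpos);
}
```

With 5-bit components at positions 10, 5 and 0 this produces the xRRRRRGGGGGBBBBB layout; with 8-bit components at 16, 8 and 0 it degenerates to the usual 0xRRGGBB value.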

a few optimizations

It can be tempting from here to write fill_rect, draw_hline, draw_vline, etc. from calls to putpixel ... don't. Drawing a filled rectangle means you access successive pixels and then advance by "pitch - rect_width" to fill the next line. If you do a "for(y=100;y<200;y++) for(x=100;x<200;x++) putpixel (screen,x,y,RED);" loop, you'll recompute 'where' 10,000 times. Even if the compiler has done a good job of translating y*3200 into adds and shifts instead of a multiplication, it's silly to run that so many times while you could do

static void fillrect(unsigned char *vram, unsigned char r, unsigned char g, unsigned char b, unsigned char w, unsigned char h) {
  unsigned char *where = vram;
  int i,j;
  for (i=0;i<h;i++) {        // one iteration per line
    for (j=0;j<w;j++) {      // one iteration per pixel of the line
      //putpixel(vram,64+j,64+i,(r<<16)+(g<<8)+b);

      where[j*4]=b;          // same byte order as putpixel: BLUE
      where[j*4+1]=g;        // GREEN
      where[j*4+2]=r;        // RED
    }
    where+=3200;             // next line (800x600x32bpp)
  }
}

That should be enough to get you started coding (or googling for) a decent video library.

How do i draw text ?

Once in graphic mode, you no longer have the BIOS or the hardware to draw fonts for you. The basic idea is to have font data for each character and use it to plot (or not to plot) pixels. There are plenty of ways to store those fonts depending on whether they have multiple colors or not, alpha channel or not etc. What you will basically have, however is:

// holding what you need for every character of the set
font_char* font_data[CHARS];

// rendering one of the characters, given its font_data
draw_char(screen, where, font_char*);

draw_string(screen, where, char* input)
{
  while(*input) {
    // index the font by the character code, not by the pointer
    draw_char(screen,where,font_data[(unsigned char)*input]);
    where+=char_width;
    input++;
  }
}

font encoding

The most common encoding that allows you not to overwrite the background over which you draw your text is the font bitmap, that is, an "A" character will e.g. be encoded as

...XX... = 0*128+0*64+0*32+1*16+1*8+0*4+0*2+0*1 = 0x18
..XXXX.. = 0x3C
.XX..XX. = 0x66
.XXXXXX. = 0x7E
.XX..XX. = 0x66
.XX..XX. = 0x66
........ = 0x00
........ = 0x00

In which case you test each bit of the font data to tell whether it's 1 or 0 and only put the pixel if it's 1. For larger fonts you might want to use RLE encoding instead, for instance. Finally, state-of-the-art true-type fonts will require you to support the "freetype" library.
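
For the 8x8 bitmap case, the bit test can be sketched as follows. This assumes an 8-bits-per-pixel buffer for simplicity; draw_glyph8x8 and its parameters are invented for the example.

```c
#include <stdint.h>

/* Draws one 8x8 glyph into an 8-bits-per-pixel buffer. Only the pixels
 * whose font bit is 1 are written, so the background is preserved. */
static void draw_glyph8x8(uint8_t *dest, unsigned pitch,
                          const uint8_t glyph[8], uint8_t colour)
{
    for (unsigned row = 0; row < 8; row++) {
        for (unsigned col = 0; col < 8; col++) {
            /* bit 7 (value 128) is the leftmost pixel of the row */
            if (glyph[row] & (0x80u >> col))
                dest[row * pitch + col] = colour;
        }
    }
}
```

Here 'dest' points to the top-left pixel of the character cell and 'pitch' is the number of bytes per screen line, as in the pixel plotting section.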

Eeek! All that C code ! What about my 100% ASM project ?

http://bos.asmhackers.net/forum/viewtopic.php?id=65 covers basically the same, and is ASM-oriented (as expected from asmhackers :)






Hardware::Plug'n'Play

Where can I find programming info on (ISA) PNP ?

Included from Where can I find programming info on PNP?

You can get the official Plug-and-Play documentation from the Microsoft ftp site. These documents are .EXE (self extracting) MS-Word format files. The documents are the industry (MS) specifications for PNP on BIOS, SCSI, Peripherals, etc.

ftp://ftp.microsoft.com/developr/drg/Plug-and-Play/Pnpspecs/

Craig Hart also has a good page on PNP programming at http://members.datafast.net.au/dft0802/

You could also check out http://linux-sxs.org/programming/interfac.html


Categories: HardWarePnp

I heard you can do PNP calls with the BIOS in Protected Mode ...

Included from I heard you can do PNP calls with the BIOS in Protected Mode?

Yes, just like PCI bios32 calls you can do PNP calls in pmode.

Once you have the BIOS32 service directory (see PCI example routine) you can call it with the PNP Auto Config magic. Note the InlineAssembly again for registers interfacing.

void bios32_scan_pnp_entry(void)
{
        ULONG   cseg_size, offset, base_addr;

        /* call the BIOS32 BSD for the PCI address
           BSD calls terminate in RETF not RET */

        /* eax is loaded with "$ACF" magic */
        asm("movl       $0x46434124, %%eax\n"
                "lcall _bios32_call\n"
                : "=c" (cseg_size),
                "=d" (offset),
                "=b" (base_addr)
                :
                : "eax", "ebp", "memory" );

        /* setup two new selectors of pnp_code32, pnp_data32, etc. */
}

Once you have determined that PNP BIOS calls exist for pmode applications, you can call the PCI v2.0c+ calls (see INT 0x1A, function 0xB400 to 0xB407 in RalfBrown's INT List).

Note, not many BIOSes seem to support PNP Bios32 calls, so you may have to resort to using pmode16 calls directly to the PNP BIOS (requiring a 286 TSS).


Categories: HardWarePnp

Where can I find programming info on PCI ?

Included from Where can I find programming info on PCI?

Web resources

The official PCI web site is located at www.pcisig.com

This PCI Special Interest Group site contains a little information on PCI as Microsoft Word 6 document files, which are the official specifications. It also offers a few test/diagnostic programs in C for working with PCI that you can download, and keeps a list of Vendor ID numbers (but not product ID numbers).

Recommended books on PCI Programming

The two books below are based on the PCI spec and supply much information about device types, interrupt rerouting, Hot-plug /Power management specs etc.

PCI System Architecture - from Mindshare Inc
                          by Tom Shanley and Don Anderson
                          ISBN 0-201-30974-2 ($US 40 in 2004)
                          publisher Addison Wesley

PCI-X System Architecture - from Mindshare Inc
                          by Tom Shanley
                          ISBN 0-201-72682-3 ($US 45 in 2004)
                          publisher Addison Wesley

What can i expect from the PCI docs ?

The PCI documentation tells us how one can list the PCI hardware (both extension cards and built-in components such as hard drive controllers, PCI-to-PCMCIA bridges, etc.) and what resources that hardware uses for communication with the CPU. However, how you should use those resources to program the hardware depends on the very hardware you try to program (i.e. PCI has no info about how an AC97 PCI soundcard works: it can just tell you that it is a soundcard and that it will use memory-mapped region [ 0x1234..0x5678 ] and interrupt 10 to communicate with you).

Do I have to pay $xxx for an open-standard specification

Official PCI documents (PCI_Local_Bus_Spec_v2.x.pdf) unfortunately come at a high fee. Free-pirate hosting links are not welcome here (of course). Before ordering the documents, keep in mind that they're describing the whole standard, not just the programming information you need. That means you'd end up with a 300+ pages hard-cover book full of electric & mechanics specifications, bus timing, burst frame structure etc. which are completely meaningless for you.


What should a 'PCI core' component offer ?

Primitives for accessing configuration space

Quoting 'PCICFG.TXT' from RalfBrown PCI utility
PCI devices have an address which is broken down into a PCI-bus number (usually 0), a device number within that bus (0-31), and a function number within the device (0-7). (...) There are two incompatible methods for accessing the ports; on most chipsets, -b1 is the proper method ("PCI Configuration Space Access Mechanism #1"). On some chipsets (mostly the earliest ones supporting PCI), you will need to use -b2 ("PCI Configuration Space Access Mechanism #2"). USING THE INCORRECT METHOD CAN HANG OR RESET YOUR COMPUTER!, which explains why using BIOS service for accessing PCI makes sense ...

Configuration mode 2 is deprecated, afaik, so it won't be covered here. The following code snippet (under RalfBrown tutorial license ;) reads the value of register reg in the bus:device:func PCI configuration space. Accessing a datum with PCI mechanism 1 consists of sending its 'address' to port 0xCF8 and then either reading or writing the value through port 0xCFC.

    DWORD addr = 0x80000000L | (((DWORD)(bus & 0xFF)) << 16) |
                 ((((unsigned)device) & 0x1F) << 11) |
                 ((((unsigned)func) & 0x07) << 8) | (reg & 0xFC) ;
    DWORD orig = inpd(0xCF8) ;     // get current state
    outpd(0xCF8,addr) ;            // set up addressing to config data
    value = inpd(0xCFC) ;          // get requested DWORD of config data
    outpd(0xCF8,orig) ;            // restore configuration control

Note: the PCI BIOS interface provides such primitives too.
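
The address computation is easy to isolate into a pure function, which also makes it easy to check on paper. The following is a sketch with invented names (pciConfigAddress, pciExtractWord); the port I/O itself is left out since it depends on your own inpd/outpd primitives.

```c
#include <stdint.h>

#define PCI_CONFIG_ADDRESS 0xCF8
#define PCI_CONFIG_DATA    0xCFC

/* Builds the dword written to port 0xCF8 for mechanism #1. */
static uint32_t pciConfigAddress(unsigned bus, unsigned device,
                                 unsigned func, unsigned reg)
{
    return 0x80000000u                      /* enable bit */
         | ((uint32_t)(bus    & 0xFF) << 16)
         | ((uint32_t)(device & 0x1F) << 11)
         | ((uint32_t)(func   & 0x07) << 8)
         | (uint32_t)(reg & 0xFC);          /* dword-aligned register */
}

/* Since only whole dwords are addressed, a word inside that dword is
 * selected by shifting the value read from port 0xCFC. */
static uint16_t pciExtractWord(uint32_t dword, unsigned reg)
{
    return (uint16_t)(dword >> ((reg & 2) * 8));
}
```

For instance, the vendor ID (word at 0x00) and the device ID (word at 0x02) both come out of the same dword read at register 0.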

Bus enumeration

There are two possible approaches for device/driver handling:

driver push
the user tells the system to use the driver and the system checks that there's a device for it. This check requires that you scan devices, looking for a specific vendor/subclass.
driver pull
the system scans for devices and loads drivers for the devices that exist.

So in both approaches, enumeration is required. The basics of device enumeration consist of probing each bus:device:function in the configuration space and seeing whether we have something consistent.

  • if the device does not exist, reading the PCI_VENDORID field (word at 0x00) will return either 0x0000 or 0xffff, which are both invalid vendors.
  • if the device exists and has only one function, the PCI_HEADER_TYPE (byte at 0x0E) will have its highest bit cleared.
  • if the device exists and has multiple functions, the PCI_HEADER_TYPE has its highest bit set. In that case, the scan is repeated for all the 8 functions.
  • The bus number is 8 bits wide, so there is a maximum of 256 PCI busses. Initially, the scan starts on bus #0. Some devices may act as bridges. This can be detected because PCI_HEADER_TYPE (ignoring its highest bit) has the value 1. Bridges have a byte at 0x1a that tells the highest bus number behind the bridge.

Note that devices do not necessarily have contiguous numbers. E.g. a bus may have device 4 present even if there are no devices 1, 2, or 3. This is especially common on bus #0 due to on-chip devices.


// disclaimer: this is untested code summarizing
// http://my.execpc.com/~geezer/osd/pnp/pci32.c

// it doesn't take pci-to-pci bridges into account
// and blindly scans all the possible busses.


int nb_bus=1;

for (bus=0;bus<nb_bus;bus++) {
  for (dev=0;dev<32;dev++) {     // device numbers are 0..31
    byte htype=pciGetCfgByte(bus,dev,0,PCI_HEADER_TYPE);
    word vendor_id=pciGetCfgWord(bus,dev,0,PCI_VENDORID);
    int nbfuncs=1;

    if (vendor_id==0 || vendor_id==0xffff) continue;
    if (htype & 0x80) nbfuncs=8;

    for (fn=0;fn<nbfuncs;fn++) {
      // functions of a multi-function device need not all exist
      vendor_id=pciGetCfgWord(bus,dev,fn,PCI_VENDORID);
      if (vendor_id==0 || vendor_id==0xffff) continue;

      /** we have a device. Get vendor (0x00:2), product (0x02:2), class (0x09:3)
       *  information and create a DeviceNfo with it for later retrieval
       *  Don't forget to insert bus:dev:fn in the DeviceNfo so that we don't need
       *  to bother with enumeration again
       */
    }
  }
}

Primitives for decoding BARs

Address registers (telling what memory range or I/O range is used by the device) cannot be used exactly as read, because their low bits are flags. According to sigops's chapter, the lowest bit you read tells you whether you're facing a memory resource (bit0==0) or an I/O resource (bit0==1). You can tell the size of the region by writing 0xffffffff to the BAR and reading it back. Once you're done, don't forget to write back the old value.

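void pciDecodeBar(pci_device* pci, pci_bar_register bar)
{
   unsigned old  = pciReadDword(pci, bar);
   unsigned io   = old & 1;               /* bit0: 1 = I/O, 0 = memory */
   unsigned mask = io ? ~0x03u : ~0x0fu;  /* low bits are flags, not address */
   unsigned addr = old & mask;
   unsigned size;

   pciWriteDword(pci, bar, 0xffffffff);
   size = pciReadDword(pci, bar) & mask;
   pciWriteDword(pci, bar, old);          /* restore the original value */

   if (size == 0) return;                 /* BAR not implemented */
   size = 1 + ~size;
   if (io) size &= 0xffff;                /* I/O BARs may only decode 16 bits */

   if (io)
      kprint("i/o %4x..%4x", addr, addr+size-1);
   else
      kprint("memory %8x..%8x, %s", addr, addr+size-1,
             (old&8)?"prefetchable":"no prefetch");
}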
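
The size arithmetic is easy to get wrong, so here is a small sketch that can be checked without touching hardware. `bar_info` and `bar_decode` are hypothetical names: the function takes the BAR value as first read plus the value read back after writing 0xffffffff. It assumes 32-bit BARs only (64-bit memory BARs are not handled).

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     is_io;         /* bit 0 set: I/O port range */
    bool     prefetchable;  /* bit 3, meaningful for memory BARs only */
    uint32_t base;
    uint32_t size;          /* 0 means the BAR is not implemented */
} bar_info;

/* `original` is the BAR as first read, `probe` is what it read back
 * after 0xffffffff was written to it. */
bar_info bar_decode(uint32_t original, uint32_t probe)
{
    bar_info info;
    info.is_io = (original & 1u) != 0;

    /* I/O BARs keep 2 flag bits, memory BARs keep 4 */
    uint32_t mask = info.is_io ? ~0x3u : ~0xFu;

    info.prefetchable = !info.is_io && (original & 0x8u) != 0;
    info.base = original & mask;

    uint32_t bits = probe & mask;
    if (bits == 0) {
        info.size = 0;                  /* no writable address bits at all */
    } else {
        if (info.is_io && (bits & 0xFFFF0000u) == 0)
            bits |= 0xFFFF0000u;        /* I/O BAR implementing only 16 bits */
        info.size = ~bits + 1u;         /* lowest writable bit gives the size */
    }
    return info;
}
```

For instance, a memory BAR that reads back 0xFFFF0000 decodes a 64KB window, and an I/O BAR that reads back 0x0000FFE1 decodes 0x20 ports.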

``BARs'' ? You mean i should go for a drink ?

For auto-configuration, PCI has a special space called "PCI Configuration Space". This space is accessed using I/O ports. There are 2 different configuration space access mechanisms, but for the most common one you set the address in one I/O port (the bus, device, function and offset) and read or write the data in another I/O port.
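
For the common mechanism (#1), the address port is 0xCF8 and the data port is 0xCFC. Here is a sketch of how the address dword is put together; `pci_config_address` is a name made up for this example and the actual port writes are left out so that the packing can be checked on its own.

```c
#include <stdint.h>

#define PCI_CONFIG_ADDRESS 0xCF8   /* write the packed address here */
#define PCI_CONFIG_DATA    0xCFC   /* then read or write the data here */

/* Packs bus:device:function and a register offset for mechanism #1. */
uint32_t pci_config_address(unsigned bus, unsigned dev, unsigned fn, unsigned offset)
{
    return 0x80000000u                 /* bit 31: enable configuration access */
         | ((bus    & 0xFFu) << 16)    /* 8-bit bus number */
         | ((dev    & 0x1Fu) << 11)    /* 5-bit device number */
         | ((fn     & 0x07u) << 8)     /* 3-bit function number */
         | (offset  & 0xFCu);          /* dword-aligned register offset */
}
```

To reach a single byte or word, you would write this value to 0xCF8 and then access port 0xCFC + (offset & 3) with the matching width.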

Anyway, in each PCI device's configuration space there's normally one or more BARs (or "Base Address Registers"), which can be used to set or find the address (in physical memory or in I/O space) for each resource the card uses.

The BARs could be used for memory mapped I/O, I/O port ranges and/or a boot ROM.

They aren't used for IRQs though - there are different fields in PCI configuration space for that: one for the interrupt pin the card uses (INTA# to INTD#, which is fixed/static) and another for the IRQ it is routed to (which is set by the BIOS during boot, and only exists as a means to pass this information to operating systems - the hardware itself ignores it).


Categories: CollectedKnowledge, HardWareBus


related threads

PCI related , calling BIOS32 service, finding PCI device, Clicker 0.8.20 -- pci

most of the links posted on these threads look broken, though (unfortunately)

additional material

there was an article called "pentium_vme_article.pdf" formerly on the Web, but it disappeared. PciSectionOfPentiumVme is revived here ...

I heard you can do PCI calls with the BIOS in Protected Mode ...

Included from I heard you can do PCI calls with the BIOS in Protected Mode?

True! You can call the PCI BIOS functions from pmode. It's quite easy to do, and it does not require any mode switching back into real mode.

How do i access BIOS32 PCI ?

The first thing to do is locate the BIOS32 service directory entry point. This is done by scanning for the 4 bytes of "magic" that is _32_ (0x5F32335F when read as a little-endian dword). The BIOS32 SD can lie in memory anywhere from 0xE0000 up to (but not including) 0x100000, and it always lies on a paragraph (16-byte) boundary.

CALL xxxxh:xxxxh - BIOS32 Service Directory
InstallCheck:   scan paragraph boundaries E000h to FFFFh for signature
                string "_32_", followed by a valid header structure
                (see #F0021)

Notes:  a 32-bit-code alternate PCI BIOS entry point may be found (if
        supported) by requesting the entry point for the API with
        identifier "$PCI".
        an alternate entry point for INT 1A/AH=B4h may be found (if
        supported) by requesting the entry point for the API with
        identifier "$ACF"
        other known identifiers are "$WDS" and "MPTN"
SeeAlso: INT 1A/AX=B100h

Format of BIOS32 Service Directory header structure:
Offset  Size    Description     (Table F0021)
 00h  4 BYTEs   signature "_32_"
 04h    DWORD   physical address of BSD entry point (see #F0022)
 08h    BYTE    header structure version number (currently 00h)
 09h    BYTE    header structure length in paragraphs (currently 01h)
 0Ah    BYTE    checksum (8-bit sum of all bytes in structure,
                including this one, should equal zero)
 0Bh  5 BYTEs   reserved (0)

(Table F0022)
Call BIOS32 Service Directory entry point with:
        EBX = function
            00000000h get service entry point
                EAX = service identifier
                    46434124h ("FCA$") Plug-and-Play Auto-Configuration
                    49435024h ("ICP$") PCI BIOS
                    4E54504Dh ("NTPM") ??? MPTN [PhoenixBIOS4 Rev. 6.0]
                    54435724h ("SDW$") ??? WDS$ [PhoenixBIOS4 Rev. 6.0]
                Return: AL = status
                            00h successful
                                 EBX = base address of handler's code seg
                                 ECX = size of code segment
                                 EDX = offset of handler in code seg
                            80h unknown service identifier
            else
                Return: AL = 81h invalid function
Notes:  the BSD handler assumes that it is running in a 32-bit code
        segment

        the returned entry points for PCI BIOS and Auto-Config
        must be called with the same registers as the real-mode INT
        1Ah interface, including the value B1h or B4h in AH (AMI BIOS
        v1.00.05.AX1 returns the same entry point for both interfaces
        and uses AH to distinguish which API is desired)

        some references indicate that only BL is used for the function
        number, though at least one implementation actually checks the
        entire EBX register; for maximum compatibility, the upper 24
        bits of EBX should be cleared when calling the entry point
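
As a quick cross-check of the tables above, here is a sketch of the two pure-data steps: packing a service identifier the way it goes into EAX, and validating a candidate header. `bios32_service_id` and `bios32_header_valid` are hypothetical helper names, and the header is taken from a plain byte buffer so that the check can be tried outside a kernel.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Packs a 4-character identifier such as "$PCI" into a little-endian dword. */
uint32_t bios32_service_id(const char id[4])
{
    return  (uint32_t)(uint8_t)id[0]
         | ((uint32_t)(uint8_t)id[1] << 8)
         | ((uint32_t)(uint8_t)id[2] << 16)
         | ((uint32_t)(uint8_t)id[3] << 24);
}

/* Checks one paragraph-aligned candidate against Table F0021. */
bool bios32_header_valid(const uint8_t *p)
{
    if (memcmp(p, "_32_", 4) != 0)
        return false;                  /* wrong signature */

    unsigned len = p[0x09] * 16u;      /* length field is in paragraphs */
    if (len == 0)
        return false;                  /* an empty structure sums to 0 trivially */

    uint8_t sum = 0;
    for (unsigned i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);   /* 8-bit sum, checksum byte included */
    return sum == 0;
}
```

A header whose signature matches but whose bytes do not sum to zero is rejected, which is exactly the case where a scan must keep going instead of stopping.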

Here is an example of detecting the BIOS32 service directory.

typedef struct BIOS32
{
        ULONG   magic                   __attribute__ ((packed));
        ULONG   phys_bsd_entry          __attribute__ ((packed));
        UCHAR   vers                    __attribute__ ((packed));
        UCHAR   prg_lens                __attribute__ ((packed));
        UCHAR   crc                     __attribute__ ((packed));
        UCHAR   reserved[5]             __attribute__ ((packed));
} BIOS32;

BIOS32 *master_bios32;
ULONG bios32_call;

/* returns 1 if a valid BIOS32 service directory was found, 0 otherwise */
UCHAR search_pci_bios(void)
{
        UCHAR           *p=(UCHAR*)0xE0000;
        BIOS32          *x;
        UCHAR           flag=0;
        UCHAR           crc;
        int             i;

        master_bios32=NULL;
        bios32_call=0;
        while(flag==0 && (ULONG)p<0x100000)
        {
                x=(BIOS32*)p;
                if(x->magic==0x5F32335F)                /* _32_ */
                {
                        for(i=0, crc=0; i<(x->prg_lens*16); i++)
                                crc+=*(p+i);
                        if(crc==0 && x->prg_lens!=0)
                        {
                                flag=1;
                                master_bios32=x;
                                bios32_call=master_bios32->phys_bsd_entry;
                        }
                }
                /* always advance, or a bad checksum would loop forever */
                p+=0x10;
        }
        return flag;
}

Once you have located the service directory, you can do a far call (InlineAssembly helps with passing registers) to its physical entry point and ask it whether PCI BIOS32 calls exist. (PCI v2.0c+)

void bios32_scan_pci_entry(void)
{
        ULONG   cseg_size, offset, base_addr, status;

        /* call the BIOS32 BSD for the PCI address
           BSD calls terminate in RETF not RET, hence the pushed CS */

        /* eax is loaded with "$PCI" magic, ebx=0 asks for a service entry point */
        asm volatile("pushl  %%cs\n"
                "call   *%%esi\n"
                : "=a" (status),
                  "=b" (base_addr),
                  "=c" (cseg_size),
                  "=d" (offset)
                : "0" (0x49435024),
                  "1" (0),
                  "S" (bios32_call)
                : "memory", "cc" );

        if ((status & 0xff) != 0)
                return;         /* AL=80h: no PCI BIOS behind this directory */

        /* setup two new selectors of pci_code32, pci_data32, etc. */
}

Once you have determined that PCI v2.0c+ BIOS calls exist for pmode applications, you can use them (see INT 0x1A, function 0xB101 down to 0xB18F in RalfBrown's INT List).

But ... It doesn't work ?

Watch out! The BIOS32 may be present even if no PCI bios is available ... This is especially the case on the BOCHS virtual PC. Check every step, and make sure you didn't stop your search too early (i.e. finding the magic _32_ string means nothing if the checksum is wrong) ...

And what is it good for ?

Good question. Afaik, the PCI BIOS offers 'select device where vendor_ID==x' and 'select device where function==x' queries, as well as calls to read/write a byte/word/dword of PCI configuration space. Version 2.1 of the standard also adds functions that provide the O.S. or the PNP driver with PCI Interrupt Routing Options.

configuration mechanisms

As explained on Where can I find programming info on PCI?, there are 2 incompatible configuration mechanisms for PCI. Using the wrong one will cause chaos, and the PCI BIOS can help you by telling you which one to use

(int 0x1a, ax=0xb101, mechanism i supported if al&i is set). See RBIL for details.

telling the number of buses that exists

When enumerating devices, you'll want to know how many busses there are. The same int 0x1a, ax=0xb101 call returns the number of the last PCI bus in cl, so there are cl+1 busses. On each bus, you'll have to poll each of the 32 devices (each of which may have up to 8 functions).
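
Here is a sketch that decodes the registers returned by that installation check. It assumes the register layout documented in RBIL for INT 0x1A, AX=0xB101 (AH=0 on success, EDX holding the "PCI " signature, AL bits 0 and 1 for the two mechanisms, CL the number of the last bus); `pci_bios_info` and `pci_bios_decode` are names invented for the example.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     present;      /* a PCI BIOS answered the call */
    bool     mechanism1;   /* configuration mechanism #1 usable */
    bool     mechanism2;   /* configuration mechanism #2 usable */
    unsigned bus_count;    /* number of busses to scan */
} pci_bios_info;

/* eax, ecx, edx: register values after INT 0x1A, AX=0xB101 */
pci_bios_info pci_bios_decode(uint32_t eax, uint32_t ecx, uint32_t edx)
{
    pci_bios_info info = { false, false, false, 0 };
    uint8_t ah = (uint8_t)(eax >> 8);
    uint8_t al = (uint8_t)eax;
    uint8_t cl = (uint8_t)ecx;

    if (ah != 0 || edx != 0x20494350u)   /* "PCI " read as a dword */
        return info;

    info.present    = true;
    info.mechanism1 = (al & 0x01) != 0;
    info.mechanism2 = (al & 0x02) != 0;
    info.bus_count  = cl + 1u;           /* CL is the LAST bus number */
    return info;
}
```

The resulting bus_count is what an enumeration loop would use as its upper bound.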

Where do i get a more complete reference ?

The PCI BIOS Specification, revision 2.1, could be what you need. It details all the standardized calls.


Categories: HowTo, HardWareBus, UsingBios

And what about USB ?

Included from And what about USB ?

How can I access my USB keyboard/mouse ?

Keyboards and mice are what the USB standard calls HID (Human Interface Device) class devices, and follow a special superset of the USB standard. Once you have a driver for a HID device, all USB HID devices will work with it, including mice, keyboards, joysticks, game controllers, and so forth. The HID standard is built on top of lower-level USB APIs to send and receive data packets across the wire, but provides a translation layer that interprets the USB data so that the HID layer could conceivably be implemented on top of other protocols (like PS/2 or serial).

For early stages, you can probably ignore the fact that those devices are USB. Virtually every chipset offers a good old PS/2 emulation of the USB human interface devices, so you can use I/O port 0x60 like the rest of us ...

How can I boot from a USB flash disk ?

This is only possible with a BIOS that supports booting from USB devices and with sticks that support being booted from. If both do, http://www.ncsu.edu/project/runt/ suggests that simply putting your 512 bytes in the boot sector of the stick's FAT partition would be enough ...

How to add USB support in my kernel ?

As a first step, you'll need to provide a driver for the USB Host Controller, which appears as a PCI device (iirc) and will allow you to enumerate devices on the USB bus and send packets to the identified devices. Documentation for the Open Host Controller Interface, Universal Host Controller Interface and Enhanced Host Controller Interface is freely available. Your USB-aware chipset should support one of these three. Some computers support more than one of these standards (there are computers with UHCI for USB 1.1 and EHCI for high-speed USB 2.0 using the same USB ports)

any information about whether both OHCI & EHCI can be found on a single motherboard is welcome, as well as information about whether an EHCI driver will work on UHCI hardware

Once you know what devices are present, you'll have to identify a device-specific driver that will match that hardware. For some devices (webcams, scanners, etc), this may require a vendor-specific driver while other devices (USB keychains, HID etc) have to adhere to class standards. This means that one can write a generic USB storage driver that will work with all possible keychains, embedded mp3 players, flash card readers, etc.
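
Matching on class codes can be sketched as a simple lookup. The class numbers below are the standard USB interface class codes (0x03 HID, 0x07 printer, 0x08 mass storage, 0x09 hub, 0xFF vendor specific); the `usb_driver` enum and `usb_pick_driver` are hypothetical names, and a real kernel would probably keep a table of registered drivers instead of a switch.

```c
#include <stdint.h>

typedef enum {
    USB_DRV_NONE,      /* no generic class driver: nothing we can do */
    USB_DRV_HID,       /* keyboards, mice, joysticks ... */
    USB_DRV_PRINTER,
    USB_DRV_STORAGE,   /* keychains, card readers, mp3 players ... */
    USB_DRV_HUB,
    USB_DRV_VENDOR     /* needs a driver matched on vendor/product ID */
} usb_driver;

/* Picks a generic driver from an interface's class code. */
usb_driver usb_pick_driver(uint8_t interface_class)
{
    switch (interface_class) {
    case 0x03: return USB_DRV_HID;
    case 0x07: return USB_DRV_PRINTER;
    case 0x08: return USB_DRV_STORAGE;
    case 0x09: return USB_DRV_HUB;
    case 0xFF: return USB_DRV_VENDOR;
    default:   return USB_DRV_NONE;
    }
}
```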

USB uses a tree topology, with non-leaf devices in the tree providing hub class services. The USB 2.0 specification allows the tree to be up to 7 tiers deep (the root hub counts as the first) with at most 127 devices per host controller, since device addresses are 7 bits wide. The "root" of the tree is a hub built into the host controller rather than a separate device. Before you can start talking to devices on the USB, you will want to be able to enumerate all devices. Fortunately, the USB standard requires that all devices, including hubs, are self-describing.

Don't forget that USB devices are hot-pluggable, that is they can be added and removed at any time while the system is running.

Where can I find additional information about USB?

  • At USB.org, or more particularly The USB 2.0 Specification, where you can download the official Universal Serial Bus Revision 2.0 specification, which defines the hardware and software. This is by far the best place to start, although not light reading. You may also find a wealth of information regarding the HID standard here.
  • In the Linux kernel (though things tend to be confusing there, and you have to be careful about educating yourself from Linux sources if your project isn't GPL'ed).
  • In Intel chipsets manuals, for instance.
  • Usb in a Nutshell may also interest you. It looks like a really good tutorial giving all the required knowledge to understand any other USB documentation/sourcecode in a couple of HTML pages ...

    any link to a description of USBstorage, USBprinter, HID, USB vendor list, etc. is of course welcome ;)

  • Notes for a USB Tutorial (comments are welcome)

Categories: CollectedKnowledge, HardWareBus


Related Thread on the forum

USB stick

discusses the feasibility of booting from a USB memory stick

USB driver

suggests a few startup links about USB

support USB keyboard

explained by Schol-R-Lea

Collecting links about USB

PypeClicker and Df collection of links about USB

USB Tutorial

Included from USB Tutorial

Disclaimer: Hi, this is PypeClicker typing. I'm using this page as a scratchpad for a yet-to-come USB programming tutorial.


What happens on a USB cable ?

USB looks much more like a network protocol than like RS232 (good old serial cables). Data transmissions follow formatted packets rather than being made of plain characters. However, unlike network protocols like Ethernet, Token Ring, etc., all communications are directed by the host (i.e. your computer), and even 'interrupts' are actually polled by the host.

More information about this section can be found in Usb In A Nutshell (ch. 3)

USB Packets

These are the smallest formatted units that travel on the USB cable. Each USB packet has at least a SYNC header and an EOP trailer, which carry no information but help identify packet boundaries. Each USB packet must also have a PID field (Packet IDentifier) that tells the role of the packet in a transfer or a transaction.

USB transaction

A transaction is a "single sentence" between the host and a device. Transactions may be used to send/read data from the device, initiate transfers, set up parameters, get information, etc. Each transaction is typically made of 3 packets:

  1. a token packet that always comes from the host and carries the address of the device/endpoint concerned in the transaction. USB defines 4 types of tokens: SETUP (initiates a control transaction), IN (initiates a device->host data transaction), OUT (initiates a host->device data transaction) and SOF (marks the start of a frame). Tokens have a small fixed size.
  2. a data packet which carries the transaction payload. Depending on the USB version, transfer speed and type of data packet used (DATA0, DATA1, DATA2 or MDATA), the maximum payload may be 8, 64 or 1024 bytes.
  3. a handshake packet which informs the data emitter of the reception status. Handshake may be ACK (data received correctly), NAK (unable to receive/emit data right now, or no data to send) or STALL (device state invalid, host intervention required)
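
On the wire, the PID is a 4-bit code sent together with its bitwise complement as a check field, and the two low bits of the code tell which family (token, data, handshake) a packet belongs to. A small sketch using the PID codes of the USB 2.0 specification; the enum and function names are made up for this example.

```c
#include <stdint.h>
#include <stdbool.h>

enum usb_pid {
    USB_PID_OUT   = 0x1, USB_PID_IN    = 0x9, USB_PID_SOF   = 0x5, USB_PID_SETUP = 0xD,
    USB_PID_DATA0 = 0x3, USB_PID_DATA1 = 0xB, USB_PID_DATA2 = 0x7, USB_PID_MDATA = 0xF,
    USB_PID_ACK   = 0x2, USB_PID_NAK   = 0xA, USB_PID_STALL = 0xE
};

typedef enum { USB_PKT_SPECIAL, USB_PKT_TOKEN, USB_PKT_HANDSHAKE, USB_PKT_DATA } usb_pkt_kind;

/* Builds the 8-bit PID field: low nibble is the PID, high nibble its complement. */
uint8_t usb_pid_byte(enum usb_pid pid)
{
    return (uint8_t)(((~pid & 0xF) << 4) | (pid & 0xF));
}

/* A received PID field is valid only if both nibbles are complementary. */
bool usb_pid_valid(uint8_t byte)
{
    return ((byte >> 4) ^ (byte & 0xF)) == 0xF;
}

/* The two low bits of the PID select the packet family. */
usb_pkt_kind usb_pid_kind(enum usb_pid pid)
{
    switch (pid & 0x3) {
    case 0x1: return USB_PKT_TOKEN;
    case 0x3: return USB_PKT_DATA;
    case 0x2: return USB_PKT_HANDSHAKE;
    default:  return USB_PKT_SPECIAL;
    }
}
```

A host controller normally does this in hardware, so a driver rarely builds PID bytes itself, but the codes show up in host controller transfer descriptors and in bus analyzer traces.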

USB transfers

Transactions are used to build more complex protocols like control transfer, interrupt transfer (single small sized status poll), isochronous transfer (periodic non-acknowledged packet transmission, like a sample to play on speakers) and bulk transfer (non-predictable large-sized transfers like a page to print)

More on USB transfers can be found on USB in a nutshell, ch 4

further reading required to know how much on transfers will be required by the implementor

USB Configuration

The configuration part of the USB standard occurs through the notion of descriptors. Descriptors are blocks of information retrieved through the host controller which define things like vendor/product identifiers for each device and class/subclass/protocol codes for each device's interface. Based on this information, the Operating System can assign the proper driver to each device/interface.
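
As an illustration, here is a sketch that pulls the interesting fields out of a raw device descriptor. It assumes the standard 18-byte device descriptor layout of the USB 2.0 specification, with multi-byte fields stored little-endian; `usb_device_info` and `usb_parse_device_descriptor` are names invented for this example.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    uint16_t usb_version;        /* bcdUSB, e.g. 0x0200 for USB 2.0 */
    uint8_t  dev_class;
    uint8_t  dev_subclass;
    uint8_t  dev_protocol;
    uint8_t  max_packet0;        /* max packet size on endpoint 0 */
    uint16_t vendor_id;
    uint16_t product_id;
    uint8_t  num_configurations;
} usb_device_info;

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns false if `raw` does not look like a complete device descriptor. */
bool usb_parse_device_descriptor(const uint8_t *raw, size_t len, usb_device_info *out)
{
    if (len < 18 || raw[0] < 18 || raw[1] != 0x01)   /* type 1 = DEVICE */
        return false;

    out->usb_version        = le16(raw + 2);
    out->dev_class          = raw[4];
    out->dev_subclass       = raw[5];
    out->dev_protocol       = raw[6];
    out->max_packet0        = raw[7];
    out->vendor_id          = le16(raw + 8);
    out->product_id         = le16(raw + 10);
    out->num_configurations = raw[17];
    return true;
}
```

A device class of 0 means the class information is given per interface instead, so driver matching then has to look at the interface descriptors.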

A list of structures defining USB descriptors is available on Clicker's CVS (usbdescr.h)

note that Clicker is (L)GPL. If someone finds some equivalent text in public domain, feel free to post it here

Endpoints

Configuration communications are performed on endpoint 0. The USB device will usually define more endpoints, used as communication channels between the device and the host, and link those endpoints to interfaces. Each interface can be assigned class, subclass and protocol codes (identifying what the device is able to do). For instance a digital camera could offer a couple of endpoints to implement the USBstorage interface (for the camera's memory card) and other endpoints for a webcam interface.
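
Each of those extra endpoints is described by a 7-byte endpoint descriptor. A sketch of decoding one, assuming the standard layout of the USB 2.0 specification (bEndpointAddress at offset 2, bmAttributes at offset 3, wMaxPacketSize at offset 4); the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    USB_XFER_CONTROL     = 0,
    USB_XFER_ISOCHRONOUS = 1,
    USB_XFER_BULK        = 2,
    USB_XFER_INTERRUPT   = 3
} usb_xfer_type;

typedef struct {
    uint8_t       number;      /* endpoint number, 0..15 */
    bool          is_in;       /* true: device->host, false: host->device */
    usb_xfer_type type;
    uint16_t      max_packet;  /* largest payload of a single data packet */
} usb_endpoint_info;

/* Returns false if `raw` does not look like an endpoint descriptor. */
bool usb_parse_endpoint_descriptor(const uint8_t *raw, size_t len, usb_endpoint_info *out)
{
    if (len < 7 || raw[0] < 7 || raw[1] != 0x05)   /* type 5 = ENDPOINT */
        return false;

    out->number     = raw[2] & 0x0F;               /* low 4 bits: endpoint number */
    out->is_in      = (raw[2] & 0x80) != 0;        /* bit 7: direction */
    out->type       = (usb_xfer_type)(raw[3] & 0x03);
    out->max_packet = (uint16_t)((raw[4] | (raw[5] << 8)) & 0x07FF);
    return true;
}
```

A keyboard would typically expose one interrupt IN endpoint, while a storage device exposes a bulk IN and a bulk OUT endpoint.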

Reading descriptors

The host has to explicitly request descriptors from the device before it can manipulate them. This is achieved by a control transfer that sends a GET_DESCRIPTOR request as the host->device data and receives the descriptor as the data from the device. The whole transfer will look like

  1. command transaction:
    Setup_TOKEN(addr=device, endpoint=0) : host -> device
    DATA[ RequestType=0x80, Request=0x06, value=DESCR_TYPE|DESCR_SELECTOR,
           index=0, length=wished_length] : h->d, 8 bytes
    ACK : host <- device
  2. response transaction:
    IN_TOKEN(addr=device, endpoint=0) : host -> device
    DATA(the_descriptor) : host <- device (as many bytes as requested)
    ACK : host -> device
  3. confirm transaction:
    OUT_TOKEN(addr=device, endpoint=0) : host -> device
    DATA() : host -> device (empty)
    ACK : host <- device

The DATA packet in the command transaction is called the "Setup Packet" (according to Beyond Logic), and carries almost all of the 'interesting' stuff:

  • the RequestType of 0x80 (DeviceToHost=0x80| StandardRequest=0| DeviceTargetted=0)
  • the Request (command) 0x06 for "GET_DESCRIPTOR"
  • the value, which per the USB 2.0 specification holds the descriptor type in its high byte and the descriptor index in its low byte
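
Putting the fields together, here is a sketch that builds the 8-byte Setup Packet for GET_DESCRIPTOR. It assumes the field layout of the USB 2.0 specification (16-bit fields little-endian, descriptor type in the high byte of the value and descriptor index in the low byte); `usb_build_get_descriptor` and the USB_DESC_* constants are names chosen for this example.

```c
#include <stdint.h>

#define USB_REQ_GET_DESCRIPTOR  0x06
#define USB_DESC_DEVICE         0x01
#define USB_DESC_CONFIGURATION  0x02
#define USB_DESC_STRING         0x03

/* Fills `out` with a standard, device-targeted, device->host GET_DESCRIPTOR. */
void usb_build_get_descriptor(uint8_t out[8], uint8_t type, uint8_t index, uint16_t length)
{
    out[0] = 0x80;                      /* RequestType: device->host, standard, device */
    out[1] = USB_REQ_GET_DESCRIPTOR;    /* Request */
    out[2] = index;                     /* value, low byte: which descriptor of that type */
    out[3] = type;                      /* value, high byte: descriptor type */
    out[4] = 0;                         /* index (a language ID for string descriptors) */
    out[5] = 0;
    out[6] = (uint8_t)(length & 0xFF);  /* length: how many bytes we want back */
    out[7] = (uint8_t)(length >> 8);
}
```

Asking for the 18-byte device descriptor gives the bytes 80 06 00 01 00 00 12 00, which is what travels in the DATA packet of the command transaction.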

categories: HardWareBus
