; try3.asm version 2.19.2014

; boot.asm
; bin version

;from NASM manual:
; The BITS directive specifies whether NASM should generate code designed to run on a processor operating in 16-bit mode, 32-bit mode or 64-bit mode.
; You do not need to specify BITS 32 merely in order to use 32-bit instructions in a 16-bit DOS program; if you do, the assembler will generate incorrect code because it will be writing code targeted at a 32-bit platform, to be run on a 16-bit one:
; When NASM is in BITS 16 mode, instructions which use 32-bit data are prefixed with an 0x66 byte, and those referring to 32-bit addresses have an 0x67 prefix. In BITS 32 mode, the reverse is true: 32-bit instructions require no prefixes, whereas instructions using 16-bit data need an 0x66 and those working on 16-bit addresses need an 0x67.
; When NASM is in BITS 64 mode, most instructions operate the same as they do for BITS 32 mode.
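; To make the prefix rule concrete, here is how two instructions encode under each setting.
; This is a hand-worked sketch, not output copied from a tool; the bytes can be confirmed
; by assembling with a listing file (nasm -l).
;   source line             BITS 16                 BITS 32
;   mov ax, 0x1234          B8 34 12                66 B8 34 12
;   mov eax, 0x12345678     66 B8 78 56 34 12       B8 78 56 34 12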
[BITS 16]

; from the Programmer’s Reference Manual
;The segment containing the currently executing sequence of instructions is known as the current code segment;
;it is specified by means of the CS register. The 80386 fetches all instructions from this code segment, using
;as an offset the contents of the instruction pointer. CS is changed implicitly as the result of intersegment
;control-transfer instructions (for example, CALL and JMP), interrupts, and exceptions.
;The instruction pointer register (EIP) contains the offset address, relative to the start of the current code
;segment, of the next sequential instruction to be executed. The instruction pointer is not directly visible
;to the programmer; it is controlled implicitly by control-transfer instructions, interrupts, and exceptions.
;As Figure 2-9 shows, the low-order 16 bits of EIP is named IP and can be used by the processor as a unit.
;This feature is useful when executing instructions designed for the 8086 and 80286 processors.
; from http://www.supernovah.com/Tutorials/BootSector2.php
;As stated earlier, we cannot be sure if the BIOS set us up with the starting address of 0x7C0:0x0 or 0x0:0x7C00.
;We will use the second segment offset pair to execute our boot sector so we know for sure how the CPU will access
;our code. To do this, our very first instruction will be a far jump that simply jumps to the next instruction.
;The trick is, if we specify a segment, even if it is 0x0, the jmp will be a far jump and the CS register will be
;loaded with the value 0x0 and the IP register will be loaded with the address of the next instruction to be
;executed.
;[BITS 16]
;[ORG 0x7C00]
;jmp 0x0:Start
;Start:
; This code will set the CS segment to 0x0 and set the IP register to the very next instruction, which will be slightly past 0x7C00, ….

; universal-loop
;     {
;       start-ORG-nguye^n-thu?y: maintain-gi`n-giu+~ba?o-to^`n (“muo^n loa`i ddu+o+.c so^’ng la^u bi`nh thu+o+`ng; everyone live long and well”); // in “gia ba?o”, “ba?o” ~ maintain as in “ba?o thu?/to^`n” …
; try/if   ;// tin messages …. the try/if is the “gia” of “gia ba?o” …
;  maintain-gi`n-giu+~-ba?o-to^`n (“muo^n loa`i va` messageA va` messageB va` messageNEW va` tinLA`NH va`… ddu+o+.c so^’ng la^u bi`nh thu+o+`ng; everyone live long and well”); // the message “stack” is loaded or push-pop with messages …; // push-and-pop-or-sent-and-receive (&messageNEW-hay-tinLA`NH); // tin and shakespeare’s version of “all roads lead to rome”: “doubt thou the stars are fire doubt truth to be a liar but never doubt I loved ‘muo^n loa`i ddu+o+.c so^’ng la^u bi`nh thu+o+`ng; everyone live long and well'”:  1/19/2014 Sunday Service … Gospel ~ Good News Tin La\nh …”Gia Ba?o”:  the “gia” attempts to reach an agreement with the “ba?o” …// salinger on internet news: push/pop/create stack/heap by an expansion assignment (“muo^n loa`i” <= “muo^n loa`i va` messageA va` messageB va` messageC va` ….”)
; ;catch/else  ;// unmaintainable tin/messages or kho’ tin hay kho^ng tin no messages … SBTN Uye^n Thi. commercial for MBR [master boot record] “kho’ tin nhu+ng co’ tha^.t …”
; ; go-to-jump-tro+?-ve^` start-ORG-nguye^n-thu?y: maintain-gi`n-giu+~-ba?o-to^`n (“muo^n loa`i ddu+o+.c so^’ng la^u bi`nh thu+o+`ng; everyone live long and well”);
; go-to-jump-tro+?-ve^` start-ORG-nguye^n-thu?y: maintain-gi`n-giu+~-ba?o-to^`n (“muo^n loa`i ddu+o+.c so^’ng la^u bi`nh thu+o+`ng; everyone live long and well”);
;     }

; irish-catholic Pat Benatar song “heartbreaker, dreammaker, don’t you mess around with me …” ….
; perhaps “there’s beggary in a love that can be reckoned” when love is unconditional–gia ba?o chu’ hoa`ng to^n an hoa`ng phi hu`ng and 10 commandments–but
; the ten commandments say there’s a love that’s conditional … and the 10 commandments describe the limits or conditions of that love …
; from http://wiki.osdev.org/Babystep2:
;some say that the bootloader is loaded at [metaphorical address] 0000:7C00, while others say 07C0:0000.
;This is in fact the same [real] address: 16 * 0x0000 + 0x7C00 = 16 * 0x07C0 + 0x0000 = 0x7C00.
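; A minimal assemble-time check of the arithmetic above (a sketch; it emits no bytes).
; Both segment:offset pairs must resolve to the same linear address, 0x7C00.
%if (0x0000 * 16 + 0x7C00) != (0x07C0 * 16 + 0x0000)
%error "0000:7C00 and 07C0:0000 should name the same linear address"
%endif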

%define ORIGIN ;  ….. comment this out to use “org 0”

%ifdef ORIGIN
[ORG 0x7c00]
; segment:offset … ds:offset or cs:offset … 0:offset-from-0x7C00 … that is, labels in the code that follows are addressed as 0:0x7C00+offset-from-start-of-file

;Following code will set the CS segment to 0x0 and set the IP register to the very next instruction, which will be slightly past 0x7C00, ….
jmp 0x0:start  ; far jump: implicitly loads CS with 0x0 and IP (the instruction pointer) with the offset of start
; jmp start  ; near jump: changes only IP and leaves CS as the BIOS set it

%else ;
[ORG 0]
; segment:offset … ds:offset or cs:offset …  0x07C0:offset-from-0 … that is, labels in the code that follows are addressed as 0x07C0:0+offset-from-start-of-file

;Following code will set the CS segment to 0x07C0 and set the IP register to the very next instruction, which will be slightly past 0x0, ….
jmp 0x07C0:start ; far jump: implicitly loads CS with 0x07C0 and IP (the instruction pointer) with the offset of start
; jmp start  ; near jump: changes only IP and leaves CS as the BIOS set it

%endif ; ORIGIN

; The BIOS loads the boot sector at linear address 0x7C00 whichever ORG is chosen
; (0x0000*16 + 0x7C00 = 0x07C0*16 + 0x0000), so the linear bounds do not depend on ORIGIN.
%define MEMORYSEGMENTREALLOWBOUND 0x7C00
%define SEGMENTSIZE   512
%define MEMORYSEGMENTREALUPPERBOUND (MEMORYSEGMENTREALLOWBOUND + SEGMENTSIZE)

; data segment
;section datasegment align=16 ; start= follows=
;segment datasegment align=16 ; start= follows=
; align 16
datasegment   dw      123

;%define TRYIVT 0 ; try out ivt codes … comment this out to exclude ivt codes
%ifdef TRYIVT
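; interrupt vector table
; NOTE: 1024 bytes cannot fit in a 512-byte boot sector; with TRYIVT defined, the
; "times 510-($-$$)" line at the end of the file gets a negative count and NASM reports an error.
ivt:        times 1024 db 0 ; interrupt vector table: reserve space to push-pop BIOS' ivt table
ivtend:
;the ivtr structure:
ivtr    DW 0 ; For limit storage
DD 0 ; For base storage

; interrupt descriptor table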
;idt:
;idt_end:
;the idtr structure:
;idtr   DW 0 ; For limit storage
; DD 0 ; For base storage
%endif ; TRYIVT

; stack segment
; section stacksegment align=16 ; start= follows=
; segment stacksegment align=16 ; start= follows=
; align 16
stacksegment  resb 64
stacktop:

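; set up the data, stack, etc. segment registers
; A segment register holds a segment number, not a label's offset, so DS and SS are
; loaded with the segment that ORG assumes; the labels then serve as offsets within it.
start:
%ifdef ORIGIN
xor     AX,AX           ; ORG 0x7C00: labels are offsets from segment 0x0000
%else
mov     AX,0x07C0       ; ORG 0: labels are offsets from segment 0x07C0
%endif
mov     DS,AX
mov     SS,AX           ; loading SS holds off interrupts until after the next instruction
mov     SP,stacktop     ; the stack grows down from stacktop into the 64 bytes at stacksegment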

;%define TRYIVT 1 ; non-zero
%ifdef TRYIVT

;interrupts are a type of (messageA + messageB + messageC + messageD + tinLa`nh + …):
; from http://wiki.osdev.org/Interrupt_Vector_Table
; The IVT is typically located at 0000:0000H, and is 400H bytes in size (4 bytes for each interrupt). Although the default address can be changed using the LIDT instruction on newer CPUs, this is usually not done because it is both inconvenient and incompatible with other implementations and/or older software (e.g. MS-DOS programs). However, note that the code must remain in the first MiB of RAM.
; format of the ivt table entries [1024/4=256 entries] is
; +-----------+-----------+
; |  Segment  |  Offset   |
; +-----------+-----------+
; 4           2           0
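; A short sketch of how that layout is used. IVTENTRY is a hypothetical helper name, not
; part of the quoted page: entry N starts at linear address N*4, with the offset word
; first and the segment word two bytes later.
%define IVTENTRY(N) ((N) * 4)
; For example, reading the INT 10h vector could look like this (left as comments so that
; nothing extra is assembled into the sector; assumes ES, BX and DX may be clobbered):
;       xor     ax, ax
;       mov     es, ax                          ; ES = 0x0000, where the IVT lives
;       mov     bx, [es:IVTENTRY(0x10)]         ; handler offset  (bytes 0x40-0x41)
;       mov     dx, [es:IVTENTRY(0x10) + 2]     ; handler segment (bytes 0x42-0x43)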
; from https://www.uop.edu.jo/issa/Assembly/programming.pdf
;ivt table is 1k in real mode, 2k in protected mode
;ivt entry is 4 bytes in real mode, 8 bytes in protected mode
;size of the pointer to ivt table is 4 bytes for addresses from 00000000 to 000003FF, is 8 bytes in protected mode

;%define BIVTSTART 0x0; Start of BIOS ivt data area
;struc   tBIOSIVT                      ; its structure
;        .SEGMENT      RESW    1
;        .OFFSET       RESW    1
;endstruc

; the ivt table defined in the data segment “datasegment” above
;;.ivt:        times 1024 db 0 ; interrupt vector table: reserve space to push-pop BIOS’ ivt table
;;.ivtend:
;changeivt:
; ;mov CX,0x0400 ;// = 1024
; mov CX, 256 ; setup loop counter
;.loadivtwithbiosivt: mov byte [ivt + cx*4] [0000:cx*4]
; loop .loadivtwithbiosivt
; mov byte [ivt + 0] [0000:0000] ; since "loop" exits when CX is 0, 0th entry must be done manually
; jmp .exit
;.loadbiosivtwithivt: mov byte [0000:cx*4] [ivt + cx*4]
; loop .loadbiosivtwithivt
; mov byte [0000:0000] [ivt + 0] ; since "loop" exits when CX is 0, 0th entry must be done manually
; jmp .exit
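
; The commented-out loops above are pseudocode: x86 has no memory-to-memory mov, and 16-bit
; addressing cannot scale CX by 4. A minimal working sketch of the first direction (BIOS IVT
; into the ivt buffer) using rep movsw follows. It is an illustration under these assumptions:
; DS already holds the segment that contains the ivt label, SS:SP is a valid stack, the code
; runs inline (no ret), and AX, CX, SI, DI and ES may be clobbered.
        push    ds
        mov     ax, ds
        mov     es, ax          ; ES:DI = destination, the ivt buffer in this segment
        mov     di, ivt
        xor     ax, ax
        mov     ds, ax          ; DS:SI = source, the BIOS table at 0000:0000
        xor     si, si
        mov     cx, 1024 / 2    ; 256 entries * 4 bytes = 1024 bytes = 512 words
        cld                     ; copy upward
        rep movsw
        pop     ds              ; restore the data segment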

; from http://wiki.osdev.org/GDT_Tutorial
;gdtr   DW 0 ; For limit storage
; DD 0 ; For base storage
;GDT:
;GDT_end:
;setGdt:
;   xor EAX, EAX ; zero EAX register for use as scratch
;   mov AX, DS  ; the data segment “datasegment”
;   shl EAX, 4  ; The linear address should here be computed as segment * 16 + offset. shift left 4 ~ multiply by 16
;   add EAX, ”GDT” ; add offset to GDT structure in segment “datasegment”
;   mov [gdtr + 2], eax ; initialize gdtr’s base storage to segment:offset address of GDT structure
;   mov EAX, ”GDT_end”
;   sub EAX, ”GDT” ; size of GDT structure = GDT end – GDT begin
;   mov [gdtr], AX ; initialize gdtr’s to size of GDT structure = GDT end – GDT begin
;   lgdt [gdtr]  ; set the gdt with lgdt
;   ret

; the idt or ivt table defined in the data segment “datasegment” above
;;.ivt:        times 1024 db 0 ; interrupt vector table: reserve space to push-pop BIOS’ ivt table
;;.ivtend:
;;ivtend:
;; interrupt descriptor table
;idt:
;idt_end:
;the idtr or ivtr structures defined in the data segment “datasegment” above:
;idtr   DW 0 ; For limit storage
; DD 0 ; For base storage
;ivtr   DW 0 ; For limit storage
; DD 0 ; For base storage
;.setidt:  ; set the interrupt descriptor table IDT
.setivt:   ; set the interrupt vector table IVT
; NOTE: ivt must first be filled with valid handler pointers (for example, a copy of the
; BIOS table at 0000:0000); otherwise the next interrupt jumps through a zero vector.
xor EAX, EAX ; zero EAX register for use as scratch
mov AX, DS  ; the data segment "datasegment"
shl EAX, 4  ; linear address = segment * 16 + offset; shift left 4 ~ multiply by 16
;  add EAX, idt ; add offset of the IDT structure within that segment
add EAX, ivt ; add offset of the IVT structure within that segment
;  mov [idtr + 2], eax ; initialize idtr's base storage to the linear address of the IDT structure
mov [ivtr + 2], eax ; initialize ivtr's base storage to the linear address of the IVT structure
;  mov EAX, idt_end
mov EAX, ivtend ; the label defined above is "ivtend", not "ivt_end"
;  sub EAX, idt ; size of IDT structure = IDT end - IDT begin
sub EAX, ivt ; size of IVT structure = IVT end - IVT begin
dec EAX ; the limit field holds size - 1 (0x03FF for a 1024-byte table)
;  mov [idtr], AX ; initialize idtr's limit storage
mov [ivtr], AX ; initialize ivtr's limit storage
;  lidt [idtr]  ; set the idt with lidt
lidt [ivtr]  ; load IDTR with lidt (lgdt would load the GDT register instead)
;.exit:
; no ret here: this code is reached by falling through from start, not by a call,
; so execution continues with the "call main" below
%endif ; TRYIVT

; to use the stack, use “call” and “ret” instead of “jmp”
; effectively, the illegal “mov eip, label” ~ legal “jmp label”
; or just let the program flows, without the jmp, to instructions that follow
; jmp main ; jmp Loads EIP with the specified address
call main ; call = push + jmp; ret = pop + jmp
; from http://wiki.osdev.org/Babystep2:
; In real mode, addresses are calculated as segment * 16 + offset. Since offset can be much larger than 16, there are many pairs
; of segment and offset that point to the same address.
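; arguments are parenthesized so that expressions such as REALADDRESS(a+b, c) expand correctly
%define REALADDRESS(SEGMENTNO,OFFSETNO) ((SEGMENTNO)*16+(OFFSETNO))

; for use in %if expressions: true when the address lies inside [low bound, upper bound)
%define VERIFYSEGMENTADDRESSBOUND(SEGMENTADDRESSTOVERIFY, OFFSETADDRESSTOVERIFY) \
((REALADDRESS(SEGMENTADDRESSTOVERIFY,OFFSETADDRESSTOVERIFY) >= MEMORYSEGMENTREALLOWBOUND) \
&& (REALADDRESS(SEGMENTADDRESSTOVERIFY,OFFSETADDRESSTOVERIFY) < MEMORYSEGMENTREALUPPERBOUND))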
; generate some virtual segment:offset address for use with a real address …
; TO DO: align the generated addresses to “natural” byte boundaries …
; %define GENERATESEGMENTADDRESS(REALADDRESSNO, &GENSEGMENTNO, &GENOFFSETNO) …………….
; %define GENERATEVIRTUALSEGMENTADDRESS(REALADDRESSNO, VIRTUALOFFSETADDRESSINPUT) (((REALADDRESSNO) - (VIRTUALOFFSETADDRESSINPUT))/16)
; %define GENERATEOFFSETNO(REALADDRESSNO, VIRTUALSEGMENTADDRESSINPUT) ((REALADDRESSNO) - (VIRTUALSEGMENTADDRESSINPUT) * 16)

; from http://geezer.osdevbrasil.net/johnfine/segments.htm:
;The way it really works
; Each segment register is really four registers:
;•A selector register
;•A base register
;•A limit register
;•An attribute register
;
;In all modes, every access to memory that uses a segment register uses the base, limit, and attribute portions of the segment register and does not use the selector portion.
;Every direct access to a segment register (PUSHing it on the stack, MOVing it to a general register etc.) uses only the selector portion. The base, limit, and attribute portions are either very hard or impossible to read (depending on CPU type). They are often called the “hidden” part of the segment register because they are so hard to read.
;Intel documentation refers to the hidden part of the segment register as a “descriptor cache”. This name obscures the actual behavior of the “hidden” part.
; In real mode (or V86 mode), when you write any 16-bit value to a segment register, the value you write goes into the selector and 16 times that value goes into the base. The limit and attribute are not changed.
;In pmode, any write to a segment register causes a descriptor to be fetched from the GDT or LDT and unpacked into the base, limit and attribute portion of the segment register. (Special exception for the NULL Selector).
;When the CPU switchs between real mode and pmode, the segment registers do not automatically change. The selectors still contain the exact bit pattern that was loaded into them in the previous mode. The hidden parts still contain the values they contained before, so the segment registers can still be used to access whatever segments they refered to before the switch.

;Writes to a segment register
;When I refer to “writing to a segment register”, I mean any action that puts a 16-bit value into a segment register.
;The obvious example is something like:
;  MOV  DS,AX
;However the same rules apply to many other situations, including:
;•POP to a segment register.
;•FAR JMP or CALL puts a value in CS.
;•IRET or FAR RET puts a value in CS.
;•Both hardware and software interrupts put a value in CS.
;•A ring transition puts a value in both SS and CS.
;•A task switch loads all the segment registers from a TSS.

main:
; to use the stack, use “call” and “ret” instead of “jmp”
; jmp screensetup ; or just let the program flows, without the jmp, to instructions that follow

call screensetup
call clearscreenpixels
call sayhello
;call exit
call hang
ret ; return

; from http://www.supernovah.com/Tutorials/BootSector4.php:
;Video Memory
;As previously stated, what is printed to the screen is simply controlled by a special section of memory called
;the video memory (or VGA memory). This section of memory is then periodically copied to the video device
;memory which is then presented to the screen by the Digital Analog Converter (DAC). Currently we are in text
;mode 03h which is a form of EGA. The video memory for text mode 3h begins at 0xB8000. Text mode 03h is 80 characters wide
;and 25 characters tall. This gives us 2000 total characters (80 * 25). Each character consists of 2 bytes which
;yields 4000 bytes of memory in total. So this means that text mode 03h stores it’s video information (the information that is
;printed to the screen) at the memory address 0xB8000 and it takes up 4000 bytes of memory.
;Printing Character to the Screen
;The first we must do in order to print character to the screen is to get a segment register setup that points
;to the memory location 0xB8000 [= 753664 = 47104 * 16]. Remember that segments in real mode have the lower four bits implicitly
;set to zero and because each hex digit represents four bits we can easily drop the right most zero on the
;memory address when storing it in a segment register. We will use the ES segment register because we
;still want to access our data with the DS segment so we don’t run into problems when using instructions that
;implicitly use the DS segment by default.
;mov AX,0xB800 ;// = 47104
;mov ES,AX
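
; A small sketch of how a (row, column) pair maps to an offset inside that segment,
; assuming the 80x25 text mode described above. CELLOFFSET is a hypothetical helper
; name, not something defined by the quoted tutorial. Each cell is 2 bytes: the
; character at the even offset and its attribute at the following odd offset.
%define CELLOFFSET(ROW,COL) ((((ROW) * 80) + (COL)) * 2)
; Example use (left commented out so it does not change the generated code):
;       mov     byte [ES:CELLOFFSET(1,0)],'A'           ; character, row 1, column 0 (offset 160)
;       mov     byte [ES:CELLOFFSET(1,0) + 1],0x70      ; its attribute byte (offset 161)
; Assemble-time sanity check: the last attribute byte is at 80 * 25 * 2 - 1 = 3999.
%if (CELLOFFSET(24,79) + 1) != (80 * 25 * 2 - 1)
%error "last attribute byte should be at offset 3999"
%endif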

;screen output …
;for the screen, the messages in (“muo^n loa`i” <= “muo^n loa`i va` messageA va` messageB va` messageC va` ….”) are pixels …
;(“muo^n loa`i va` pixel1 va` pixel2 va` … ddu+o+.c so^’ng la^u bi`nh thu+o+`ng; everyone live long and well”)

screensetup: ; point ES to video memory
.setupvideosegment:
mov AX,0xB800 ;// = 47104
mov ES,AX
; to use the stack, use “call” and “ret” instead of “jmp”
; or just let the program flows, without the jmp, to instructions that follow
;jmp clearscreenpixels
ret  ; return

;Clearing the Background
;Clearing the background is rather trivial. The goal is to set all of the attribute bytes to the background color
;you wish to clear it to. The basic idea is to create a loop that will set every other byte, starting at the first
;attribute byte, to the background color we wish to clear to. We must also be sure to only clear all of the attributes that
;are used to represent the string. In other words, be sure not to go past the last attribute byte. The last attribute byte is
;found at 80 * 25 * 2 - 1. The 80 is the width and the 25 is the height. The 2 is there because two bytes make up each
;character; one for the character and one for the attribute. Finally the 1 is subtracted because our first attribute byte is
;actually the second byte at the beginning. The 1 simply takes into account that we start our count at one instead of zero.

;The right most hex digit sets the lower four bits of the attribute byte. The lower four bits control the character color while the upper
;four bits (the left most hex digit) control the background color and flash bit. We set the background and flash bits (upper four bits) to 0h
; because 0h corresponds to the color black with no flashing.
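
; The layout just described can be written as a one-line macro. This is only a sketch:
; TEXTATTR is a hypothetical name, and the colour numbers are the indexes from the table
; below. The background nibble includes the flash bit (bit 7 of the attribute byte).
%define TEXTATTR(BACKGROUND,FOREGROUND) ((((BACKGROUND) & 0x0F) << 4) | ((FOREGROUND) & 0x0F))
; Light gray (7) background with black (0) text gives the 70h that clearscreenpixels writes.
%if TEXTATTR(7,0) != 0x70
%error "TEXTATTR(7,0) should equal 70h"
%endif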

;color            index  hex   64-color palette index
;Black            0      00h   0
;Blue             1      01h   1
;Green            2      02h   2
;Cyan             3      03h   3
;Red              4      04h   4
;Magenta          5      05h   5
;Brown            6      06h   20
;Light Gray       7      07h   7
;Dark Gray        8      08h   56
;Bright Blue      9      09h   57
;Bright Green     10     0Ah   58
;Bright Cyan      11     0Bh   59
;Bright Red       12     0Ch   60
;Bright Magenta   13     0Dh   61
;Bright Yellow    14     0Eh   62
;Bright White     15     0Fh   63

 

clearscreenpixels:
mov CX,80 * 25 * 2 - 1 ; offset of the last attribute byte (3999)
mov BX,1
.Loopthroughscreenpixels:
cmp BX,CX
ja .finishclearscreenpixels ;CF = 0 and ZF = 0
;ja Loads EIP with the specified address, if first operand of previous CMP instruction is greater than the second. ja is the same as jg, except that it performs an unsigned comparison.

mov byte [ES:BX],70h ;Set background to light gray
;and the text to black
;with no flashing text
add BX,2
jmp .Loopthroughscreenpixels ; jmp Loads EIP with the specified address

.finishclearscreenpixels:
; to use the stack, use “call” and “ret” instead of “jmp”
; or just let the program flows, without the jmp, to instructions that follow
;jmp exit
;jmp sayhello
ret

sayhello:
mov byte [ES:0],'H'
mov byte [ES:2],'o'
mov byte [ES:4],'p'
mov byte [ES:6],'e'
mov byte [ES:8],' '
mov byte [ES:10],'W'
mov byte [ES:12],'e'
mov byte [ES:14],'l'
mov byte [ES:16],'l'

; to use the stack, use “call” and “ret” instead of “jmp”
; or just let the program flows, without the jmp, to instructions that follow
;jmp exit
ret

 

exit:
; to use the stack, use “call” and “ret” instead of “jmp”
; or just let the program flows, without the jmp, to instructions that follow
; jmp hang

hang:
hlt       ; stop the CPU until the next interrupt arrives
jmp hang  ; then halt again; a bare "jmp $" would also hang, but by spinning

times 510-($-$$) db 0 ; pad with zeros up to byte 510, leaving 2 bytes for the boot signature; $ = address of the current line (the "times"), $$ = start of the current section, which here is the start of the file
db 0x55
db 0xAA
;********************************************
;*** NOTE ***
; from NASM manual:
;NASM gives special treatment to symbols beginning with a period. A label beginning with a single period is treated as a local label, which means that it is associated with the previous non-local label. So, for example:
;label1  ; some code
;.loop
;        ; some more code
;        jne     .loop
;        ret
;label2  ; some code
;.loop
;        ; some more code
;        jne     .loop
;        ret
;In the above code fragment, each JNE instruction jumps to the line immediately before it, because the two definitions of .loop are kept separate by virtue of each being associated with the previous non-local label.

;from http://wiki.osdev.org/Interrupts
;  if IRQ 6 is sent to the PIC by a device, the PIC would tell the CPU to service INT 0Eh, which presumably has code for interacting with whatever device sent the interrupt in the first place. Of course, there can be trouble when two or more devices share an IRQ; if you wonder how this works, check out Plug and Play.

; from http://www.techmasala.com/2006/03/31/foundation-stone-3-bios-part-2-the-interrupt-vector-table/:
;Foundation stone #3 – BIOS part 2 – The interrupt vector table
;by Ramesh on Friday,March 31, 2006 @ 9:50 am
;In my post Foundation stone #2 we saw that BIOS is the one that takes in charge when you switch on your PC. After collecting the inventory of available and properly working hardware, the BIOS sets up what is called as the Interrupts area. An interrupt is a signal to the processor that there is something that needs its attention. As such each and every piece of hardware that is put together in your PC is useless unless it is orchestrated well. Take for example the keyboard, if the attention is not given at the right time when you press a key and reciprocated accordingly wherever you are then you can call the thing that is sitting in front of you as dumb
;So when the BIOS is done with the inventory of hardware, it initializes a memory space of 1024 bytes starting at 0000:0000h (this is a representation of memory location in the form of segment:offset in hexadecimal). An interrupt is a small routine or code that has the necessary details of the interrupt and occupies 4 bytes. So starting at memory location 0000:0000h interrupts are stored. So a total of 256 interrupts can be stored in a the allotted 1024 bytes but all is not being initialized by the BIOS. There are different types of interrupts, hardware interrupts, software interrupts, user interrupts and so on. The BIOS fills up the hardware interrupts and the software interrupts are mostly added by the OS.
;The Interrupt Vector Table (IVT) is a mapping of the interrupt number and the memory location in the form of segment:offset. This memory location contains the  interrupt code for that particular interrupt. It is the responsibility of the OS to keep track of the IVT and monitor for interrupt and notify the processor. So what happens when you press a key or release a key, the keyboard send signals that contain information on what key was pressed or released. This gets stored in the memory location assigned for the keyboard interrupt (traditionally interrupt 09h is for keyboard). The OS which is constantly looking for these interrupts immediately captures the information and sends it for processing accordingly. The interrupt number and other details could differ from one BIOS manufacturer to other. You can get a lot of information about BIOS and interrupts from the BIOS central site.

; conventionally [c.f. http://en.wikipedia.org/wiki/Conventional_memory, http://en.wikipedia.org/wiki/Power-on_self-test] people agree upon the following memory map … from http://www.supernovah.com/Tutorials/Assembly2.php:
;Default Memory
;When the computer boots, the BIOS loads the memory with a lot of different data. This data resides in different places throughout memory and we are only left with 630Kb of memory to work with in the middle of everything. Here is a table showing the map of the memory directly after the computer boots:
;All ranges are inclusive
;Address Range (in hex)   Size        Type        Description
;0 - 3FF                  1Kb         Ram         Real Mode Interrupt Vector Table (IVT)
;400 - 4FF                256 bytes   Ram         BIOS Data Area (BDA)
;500 - 9FBFF              630Kb       Ram         Free Memory
;9FC00 - 9FFFF            1Kb         Ram         Extended BIOS Area (EBDA)
;A0000 - BFFFF            128Kb       Video Ram   VGA Frame Buffer
;C0000 - C7FFF            32Kb        Rom         Video Bios
;C8000 - EFFFF            160kb       Rom         Misc.
;F0000 - FFFFF            64Kb

; from NASM manual
;Multi-line macros are much more like the type of macro seen in MASM and TASM: a multi-line macro definition in NASM looks something like this.
;%macro  prologue 1
;        push    ebp
;        mov     ebp,esp
;        sub     esp,%1
;%endmacro

; from http://www.husseinsspace.com/teaching/udw/1996/asmnotes/chaptwo.htm:
;The SHR/SHL instructions
;format:
;SHR destination,1
;SHR destination,CL
; SHL destination,1
; SHL destination,CL
;SHR shifts the destination right bitwise either 1 position or a number of positions determined by the current value of the CL register. SHL shifts the destination left bitwise either 1 position or a number of positions determined by the current value of the CL register. The vacant positions are filled by zeros.
;example:
;shr ax,1
; shl ax,1
;The first example effectively divides ax by 2 and the second example effectively multiplies ax by 2. These commands are faster than using DIV and MUL for arithmetic involving powers of 2.
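; The same relationship can be checked at assemble time with NASM's own shift operators
; (a sketch that emits no bytes): one shift doubles or halves a value, and a shift by 4 is
; the "multiply by 16" that turns a real-mode segment number into a linear base address.
%if ((6 << 1) != 12) || ((6 >> 1) != 3) || ((0x07C0 << 4) != 0x7C00)
%error "shift arithmetic check failed"
%endif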

;****************************
; from Intel Programmer’s Reference Manual
;10.1 Processor State After Reset
;The contents of EAX depend upon the results of the power-up self test. The self-test may be requested externally by assertion of BUSY# at the end of RESET. The EAX register holds zero if the 80386 passed the test. A nonzero value in EAX after self-test indicates that the particular 80386 unit is faulty. If the self-test is not requested, the contents of EAX after RESET is undefined.
;DX holds a component identifier and revision number after RESET as Figure 10-1 illustrates. DH contains 3, which indicates an 80386 component. DL contains a unique identifier of the revision level.
;Control register zero (CR0) contains the values shown in Figure 10-2 . The ET bit of CR0 is set if an 80387 is present in the configuration (according to the state of the ERROR# pin after RESET). If ET is reset, the configuration either contains an 80287 or does not contain a coprocessor. A software test is required to distinguish between these latter two possibilities.
;The remaining registers and flags are set as follows:
;   EFLAGS             =00000002H
;   IP                 =0000FFF0H
;   CS selector        =000H
;   DS selector        =0000H
;   ES selector        =0000H
;   SS selector        =0000H
;   FS selector        =0000H
;   GS selector        =0000H
;   IDTR:
;              base    =0
;              limit   =03FFH
;All registers not mentioned above are undefined.
;These settings imply that the processor begins in real-address mode with interrupts disabled.
;10.2 Software Initialization for Real-Address Mode
;In real-address mode a few structures must be initialized before a program can take advantage of all the features available in this mode.
;10.2.1 Stack
;No instructions that use the stack can be used until the stack-segment register (SS) has been loaded. SS must point to an area in RAM.
;10.2.2 Interrupt Table
;The initial state of the 80386 leaves interrupts disabled; however, the processor will still attempt to access the interrupt table if an exception or nonmaskable interrupt (NMI) occurs. Initialization software should take one of the following actions:
;• Change the limit value in the IDTR to zero. This will cause a shutdown if an exception or nonmaskable interrupt occurs. (Refer to the 80386 Hardware Reference Manual to see how shutdown is signalled externally.)
;• Put pointers to valid interrupt handlers in all positions of the interrupt table that might be used by exceptions or interrupts.
;• Change the IDTR to point to a valid interrupt table.
;
;10.2.3 First Instructions
;After RESET, address lines A{31-20} are automatically asserted for instruction fetches. This fact, together with the initial values of CS:IP, causes instruction execution to begin at physical address FFFFFFF0H. Near (intrasegment) forms of control transfer instructions may be used to pass control to other addresses in the upper 64K bytes of the address space. The first far (intersegment) JMP or CALL instruction causes A{31-20} to drop low, and the 80386 continues executing instructions in the lower one megabyte of physical memory. This automatic assertion of address lines A{31-20} allows systems designers to use a ROM at the high end of the address space to initialize the system.

; from http://en.wikipedia.org/wiki/Interrupt_descriptor_table
;In the 8086 processor, the IDT resides at a fixed location in memory from address 0x0000 to 0x03ff, and consists of 256 four-byte real mode pointers (256 × 4 = 1024 bytes of memory). In the 80286 and later, the size and locations of the IDT can be changed in the same way as it is done in protected mode, though it does not change the format of it. A real mode pointer is defined as a 16-bit segment address and a 16-bit offset into that segment. A segment address is expanded internally by the processor to 20 bits thus limiting real mode interrupt handlers to the first 1 megabyte of addressable memory. The first 32 vectors are reserved for the processor’s internal exceptions, and hardware interrupts may be mapped to any of the vectors by way of a programmable interrupt controller.
; A commonly used x86 real mode interrupt is INT 10, the Video BIOS code to handle primitive screen drawing functions such as pixel drawing and changing the screen resolution.
; from http://software.intel.com/en-us/articles/introduction-to-x64-assembly
;   XOR EAX, EAX ; zero out eax
;   MOV  ECX, 10  ; loop 10 times
;Label:   ; this is a label in assembly
;   INC  EAX    ; increment eax
;   LOOP  Label  ; decrement ECX, loop if not 0

; from https://courses.engr.illinois.edu/ece390/books/artofasm/CH06/CH06-5.html#HEADING5-294
;                mov     ecx, 255
;ArrayLp:        mov     Array[ecx], cl
;                loop    ArrayLp
;                mov     Array[0], 0
;The last instruction is necessary because the loop does not repeat when cx is zero. Therefore, the last element of the array that this loop processes is Array[1], hence the last instruction.
; The loop instruction does not affect any flags.
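; A 16-bit sketch of the same idiom, adapted to this file's real-mode setting. The names
; LOOPSKETCH, fillarray and array are hypothetical, and the block is only assembled if
; LOOPSKETCH is defined, because code placed after the boot signature would fall outside
; the 512-byte sector. BX carries the index because 16-bit addressing cannot use CX.
%ifdef LOOPSKETCH
fillarray:
        mov     cx, 255
.next:  mov     bx, cx
        mov     [array + bx], cl        ; array[cx] = cl for cx = 255 down to 1
        loop    .next                   ; decrement CX, repeat while CX != 0
        mov     byte [array], 0         ; index 0 is never reached by the loop
        ret
array:  times 256 db 0
%endif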

; 2.17.2014 chu’ Ha^n telephoned about obtaining literature on American Philosophy and on US Census Data particularly
; US Census Data on black population expansion into US and into the world …
; following day:  couple resembling co^ Be^ and David Lowe seen at Post Office when we tried to mail chu’ Kha’s preserved fruit to father in Michigan
; from http://randomascii.wordpress.com/2012/12/29/the-surprising-subtleties-of-zeroing-a-register/
; also see http://navet.ics.hawaii.edu/~casanova/courses/ics312_spring14/slides/ics312_bits_2.pdf
;Tabula rasa
;The x86 instruction set does not have a special purpose instruction for zeroing a register. An obvious way of dealing with this would be to move a constant zero into the register, like this:
;mov eax, 0
;That works, and it is fast. Benchmarking this will typically show that it has a latency of one cycle: the result can be used in a subsequent instruction on the next cycle. Benchmarking will also show that this has a throughput of three-per-cycle. The Sandybridge documentation says that this is the maximum integer throughput possible, and yet we can do better.
;It’s too big
;The x86 instruction used to load a constant value such as zero into eax consists of a one-byte opcode (0xB8) and the constant to be loaded. The problem, in this scenario, is that eax is a 32-bit register, so the constant is 32-bits, so we end up with a five-byte instruction:
;B8 00 00 00 00       mov         eax, 0
;Instruction size does not directly affect performance – you can create lots of benchmarks that will prove that it is harmless – but in most real programs the size of the code does have an effect on performance. The cost is extremely difficult to measure, but it appears that instruction-cache misses cost 10% or more of performance on many real programs. All else being equal, reducing instruction sizes will reduce i-cache misses, and therefore improve performance to some unknown degree.
;Smaller alternatives
;Many RISC architectures have a zero register in order to optimize this particular case, but x86 does not. The recommended alternative for years has been to use xor eax, eax. Any register exclusive ored with itself gives zero, and this instruction is just two bytes long:
;33 C0                xor         eax, eax
;Careful micro-benchmarking will show that this instruction has the same one-cycle latency and three-per-cycle throughput of mov eax, 0 and it is 60% smaller (and recommended by Intel), so all is well.
;Suspicious minds
;If you really understand how CPUs work then you should be concerned with possible problems with using xor eax, eax to zero the eax register. One of the main limitations on CPU performance is data dependencies. While a Sandybridge processor can potentially execute three integer instructions on each cycle, in practice its performance tends to be lower because most instructions depend on the results of previous instructions, and are therefore serialized. The xor eax, eax instruction is at risk for such serialization because it uses eax as an input. Therefore it cannot (in theory) execute until the last instruction that wrote to eax completes. For example, consider this code fragment below:
;1: add eax, 1
;2: mov ebx, eax
;3: xor eax, eax
;4: add eax, ecx
;Ideally we would like our awesome out-of-order processor to execute instructions 1 and 3 in parallel. There is a literal data dependency between them, but a sufficiently advanced processor could detect that this dependency is artificial. The result of the xor instruction doesn’t depend on the value of eax, it will always be zero.
;It turns out that x86 processors have for years handled xor of a register with itself specially. Every out-of-order Intel and AMD processor that I am aware of can detect that there is not really a data dependency and it can execute instructions 1 and 3 in parallel. Which is great. The CPUs use register renaming to ‘create’ a new eax for the sequence of instructions starting with instruction 3.
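; A sketch (not from the quoted article) of the same size comparison under the [BITS 16] setting used in this file, where every 32-bit operand costs an extra 0x66 prefix byte. The byte values are what NASM is expected to emit; check them against a listing file (nasm -l).
;        mov     ax, 0           ; B8 00 00              3 bytes
;        xor     ax, ax          ; 31 C0                 2 bytes
;        mov     eax, 0          ; 66 B8 00 00 00 00     6 bytes
;        xor     eax, eax        ; 66 31 C0              3 bytes
; 31 C0 and the 33 C0 quoted above are two valid encodings of the same instruction. Unlike mov, xor also modifies the flags, so it should not be placed between a compare and the conditional jump that tests it.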
; from http://stackoverflow.com/questions/4909563/why-should-code-be-aligned-to-even-address-boundaries-on-x86
;Because the (16 bit) processor can fetch values from memory only at even addresses, due to its particular layout: it is divided in two “banks” of 1 byte each, so half of the data bus is connected to the first bank and the other half to the other bank. Now, suppose these banks are aligned (as in my picture), the processor can fetch values that are on the same “row”.
;  bank 1   bank 2
;+--------+--------+
;|  8 bit | 8 bit  |
;+--------+--------+
;|        |        |
;+--------+--------+
;| 4      | 5      | <-- the CPU can fetch only values on the same "row"
;+--------+--------+
;| 2      | 3      |
;+--------+--------+
;| 0      | 1      |
;+--------+--------+
; \      / \      /
;  |    |   |    |
;  |    |   |    |
; data bus  (to uP)

;Now, because of this fetch limitation, if the CPU is forced to fetch a value located at an odd address (suppose 3), it has to fetch the values at 2 and 3, then the values at 4 and 5, throw away values 2 and 5, then join 4 and 3 (you are talking about x86, which has a little endian memory layout).
; That’s why it is better to have code (and data!) on even addresses.
;PS: On 32 bit processors, code and data should be aligned on addresses which are divisible by 4 (since there are 4 banks).
;Hope I was clear. 🙂
;answered Feb 5 ’11 at 23:02
;BlackBear
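; A sketch (not from the quoted answer) of requesting the alignment discussed above with NASM's align directive. The labels are hypothetical, and the lines are commented out so that they do not change the assembled boot sector.
;        align   2               ; pad (with NOPs by default) up to an even address
;jump_target:
;        xor     ax, ax
;        align   4, db 0         ; pad with zero bytes up to a multiple of 4
;dword_var:      dd 0
; align works relative to the start of the section, so the result is only aligned in memory if the load address (0x7C00 for a boot sector) is itself a multiple of the requested alignment.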

; from http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
;Conclusion: On recent Intel processors, data alignment does not make processing measurably faster. Data alignment for speed is a myth.
;Acknowledgement: I am grateful to Owen Kaser for pointing me to the references on this issue.