CHAPTER 8 NUMBERS AND EXPRESSIONS Numbers and Bases A86 supports a variety of formats for numbers. In non-computer life, we write numbers in a decimal format. There are ten digits, 0 through 9, that we use to describe numbers; and each digit position is ten times as significant as the position to its right. The number ten is called the "base" of the decimal format. Computer programmers often find it convenient to use other bases to specify numbers used in their programs. The most commonly-used bases are two (binary format), sixteen (hexadecimal format), and eight (octal format). The hexadecimal format requires sixteen digits. The extra six digits beyond 0 through 9 are denoted by the first six letters of the alphabet: A for ten, B for eleven, C for twelve, D for thirteen, E for fourteen, and F for fifteen. In A86, a number must always begin with a digit from 0 through 9, even if the base is hexadecimal. This is so that A86 can distinguish between a number and a symbol that happens to have digits in its name. If a hexadecimal number would begin with a letter, you precede the letter with a zero. For example, hex A0, which is the same as decimal 160, would be written 0A0. Because it is necessary for you to append leading zeroes to many hex numbers, and because you never have to do so for decimal numbers, I decided to make hexadecimal the default base for numbers with leading zeroes. Decimal is still the default base for numbers beginning with 1 through 9. Large numbers can be given as the operands to DD, DQ, or DT directives. For readability, you may freely intersperse underscore characters anywhere with your numbers. The default base can be overridden, with a letter or letters at the end of the number: B or xB for binary, O or Q for octal, H for hexadecimal, and D or xD for decimal. Examples: 077Q octal, value is 8*7 + 7 = 63 in decimal notation 123O octal if the "O" is a letter: 64 + 2*8 + 3 = 83 decimal 1230 decimal 1230: shows why you should use "Q" for octal!! 01234567H large constant 0001_0000_0000_0000_0003R real number specified in hexadecimal 100D superfluous D indicates decimal base 0100D hex number 100D, which is 4096 + 13 = 5009 in decimal 0100xD decimal 100, since xD overrides the default hex format 0110B hex 110B, which is 4096 + 256 + 11 = 4363 in decimal 0110xB binary 4+2 = 6 in decimal notation 110B also binary 4+2 = 6, since "B" is not a decimal digit 8-2 The last five examples above illustrate why an "x" is sometimes necessary before the base-override letter "B" or "D". If that letter can be interpreted as a hex digit, it is; the "x" forces an override interpretation for the "B" or "D". By the way, the usage of lower case for x and upper case for the following override letter is simply a recommendation; A86 treats upper-and lower-case letters equivalently. The RADIX Directive The above-mentioned set of defaults (hex if leading zero, decimal otherwise) can be overridden with the RADIX directive. The RADIX directive consists of the word RADIX followed by a number from 2 to 16. The default base for the number is ALWAYS decimal, regardless of any (or no) previous RADIX commands. The number gives the default base for ALL subsequent numbers, up to (but not including) the next RADIX command. If there is no number following RADIX, then A86 returns to its initial mixed default of hex for leading zeroes, decimal for other leading digits. For compatibility with IBM's assembler, RADIX can appear with a leading period; although I curse the pinhead designer who put that period into IBM's language. As an alternative to the RADIX directive, I provide the D switch, which causes A86 to start with decimal defaults. You can put +D into the A86 command invocation, or into the A86 environment variable. The first RADIX command in the program will override the D switch setting. Following are examples of radix usage. The numbers in the comments are all in decimal notation. DB 10,010 ; produces 10,16 if RADIX was not seen yet ; and +D switch was not specified RADIX 10 DB 10,010 ; produces 10,10 RADIX 16 DB 10,010 ; produces 16,16 RADIX 2 DB 10,01010 ; produces 2,10 RADIX 3 ; for Martian programmers in Heinlein novels DB 10,100 ; produces 3,9 RADIX DB 10,010 ; produces 10,16 8-3 Floating Point Initializations A86 allows floating point numbers as the operands to DD, DQ, and DT directives. The numbers are encoded according to the IEEE standard, followed by the 8087 and 287 coprocessors. The format for floating point constants is as follows: First, there is a decimal number containing a decimal point. There must be a decimal point, or else the number is interpreted as an integer. There must also be at least one decimal digit, either to the left or right of the decimal point, or else the decimal point is interpreted as an addition (structure element) operator. Optionally, there may follow immediately after the decimal number the letter E followed by a decimal number. The E stands for "exponent", and means "times 10 raised to the power of". You may provide a + or - between the E and its number. Examples: 0.1 constant one-tenth .1 the same 300. floating point three hundred 30.E1 30 * 10**1; i.e., three hundred 30.E+1 the same 30.E-1 30 * 10**-1; i.e., three 30E1 not floating point: hex integer 030E1 1.234E20 scientific notation: 1.234 times 10 to the 20th 1.234E-20 a tiny number: 1.234 divided by 10 to the 20th Overview of Expressions Most of the operands that you code into your instructions and data initializations will be simple register names, variable names, or constants. However, you will regularly wish to code operands that are the results of arithmetic calculations, performed either by the machine when the program is running (for indexing), or by the assembler (to determine the value to assemble into the program). A86 has a full set of operators that you can use to create expressions to cover these cases: * Arithmetic Operators byte isolation and combination (HIGH, LOW, BY) addition and subtraction (+,-) multiplication and division (* , /, MOD) shifting operators (SHR, SHL, BIT) * Logical Operators (AND, OR, XOR, NOT) * Boolean Negation Operator (!) * Relational Operators (EQ, LE, LT, GE, GT, NE) * String Comparison Operators (EQ, NE, =) 8-4 * Attribute Operators/Specifiers size specifiers (B=BYTE,W=WORD,F=FAR,SHORT,LONG) attribute specifiers (OFFSET,NEAR,brackets) segment addressing specifier (:) compatibility operators (PTR,ST) built-in value specifiers (TYPE,THIS,$) * Special Data Duplication Operator (DUP) --see Chapter 9 for a description Types of Expression Operands Numbers and Label Addresses A number or constant (16-bit number) can be used in most expressions. A label (defined with a colon) is also treated as a constant and so can be used in expressions, except when it is a forward reference. Variables A variable stands for a byte- or word-memory location. You may add or subtract constants from variables; when you do so, the constant is added to the address of the variable. You typically do this when the variable is the name of a memory array. Index Expressions An index expression consists of a combination of a base register [BX] or [BP], and/or an index register [SI] or [DI], with an optional constant added or subtracted. You will usually want to precede the bracketed expression with B, W, or F; to specify the kind of memory unit (byte, word, or far pointer) you are referring to. The expression stands for the memory unit whose address is the run-time value(s) of the base and/or index registers added to the constant. See the Effective Address section and the beginning of this chapter for more details on indexed memory. Arithmetic Operators HIGH/LOW Syntax: HIGH operand LOW operand These operators are called the "byte isolation" operators. The operand must evaluate to a 16-bit number. HIGH returns the high order byte of the number; LOW the low order byte. For example, MOV AL,HIGH(01234) ; AL = 012 TENHEX EQU LOW(0FF10) ; TENHEX = 010 8-5 These operators can be applied to each other. The following identities apply: LOW LOW Q = LOW Q LOW HIGH Q = HIGH Q HIGH LOW Q = 0 HIGH HIGH Q = 0 BY Syntax: operand BY operand This operator is a "byte combination" operator. It returns the word whose high byte is the left operand, and whose low byte is the right operand. For example, the expression 3 BY 5 is the same as hexadecimal 0305. The BY operator is exclusive to A86. I added it to cover the following situation: Suppose you are initializing your registers to immediate values. Suppose you want to initialize AH to the ASCII value 'A', and AL to decimal 10. You could code this as two instructions MOV AH,'A' and MOV AL,10; but you realize that a single load into the AX register would save both program space and execution time. Without the BY operator, you would have to code MOV AX,0410A, which disguises the types of the individual byte operands you were thinking about. With BY, you can code it properly: MOV AX,'A' BY 10. Addition (combination) Syntax: operand + operand operand.operand operand PTR operand operand operand As shown in the above syntax, addition can be accomplished in four ways: with a plus sign, with a dot operator, with a PTR operator, and simply by juxtaposing two operands next to each other. The dot and PTR operators are provided for compatibility with Intel/IBM assemblers. The dot is used in structure field notation; PTR is used in expressions such as BYTE PTR 0. (See Chapter 12 for recommendations concerning PTR.) If either operand is a constant, the answer is an expression with the typing of the other operand, with the offsets added. For example, if BVAR is a byte variable, then BVAR + 100 is the byte variable 100 bytes beyond BVAR. Other examples: DB 100+17 ; simple addition CTRL EQU -040 MOV AL,CTRL'D' ; a nice notation for control-D! MOV DX,[BP].SMEM ; --where SMEM was in an unindexed structure DQ 10.0 + 7.0 ; floating point addition 8-6 Subtraction Syntax: operand - operand The subtraction operator may have operands that are: a. both absolute numbers b. variable names that have the same type The result is an absolute number; the difference between the two operands. Subtraction is also allowed between floating point numbers; the answer is the floating point difference. Multiplication and Division Syntax: operand * operand (multiplication) operand / operand (division) operand MOD operand (modulo) You may only use these operators with absolute or floating point numbers, and the result is always the same type. Either operand may be a numeric expression, as long as the expression evaluates to an absolute or floating point number. Examples: CMP AL,2 * 4 ; compare AL to 8 MOV BX,0123/16 ; BX = 012 DT 1.0 / 7.0 Shifting Operators Syntax: operand SHR count (shift right) operand SHL count (shift left) BIT count (bit number) The shift operators will perform a "bit-wise" shift of the operand. The operand will be shifted "count" bits either to the right or the left. Bits shifted into the operand will be set to 0. The expression "BIT count" is equivalent to "1 SHL count"; i.e., BIT returns the mask of the single bit whose number is "count". The operands must be numeric expressions that evaluate to absolute numbers. Examples: MOV BX, 0FACBH SHR 4 ; BX = 0FACH OR AL,BIT 6 ; AL = AL OR 040; 040 is the mask for bit 6 8-7 Logical Operators Syntax: operand OR operand operand XOR operand operand AND operand NOT operand The logical operators may only be used with absolute numbers. They always return an absolute number. Logical operators operate on individual bits. Each bit of the answer depends only on the corresponding bit in the operand(s). The functions performed are as follows: 1. OR: An answer bit is 1 if either or both of the operand bits is 1. An answer bit is 0 only if both operand bits are 0. Example: 11110000xB OR 00110011xB = 11110011xB 2. XOR: This is "exclusive OR." An answer bit is 1 if the operand bits are different; an answer bit is 0 if the operand bits are the same. Example: 11110000xB XOR 00110011xB = 11000011xB 3. AND: An answer bit is 1 only if both operand bits are 1. An answer bit is 0 if either or both operand bits are 0. Example: 11110000xB AND 00110011xB = 00110000xB 4. NOT: An answer bit is the opposite of the operand bit. It is 1 if the operand bit is 0; 0 if the operand bit is 1. Example: NOT 00110011xB = 11001100xB Boolean Negation Operator Syntax: ! operand The exclamation-point operator, rather than reversing each individual bit of the operand, considers the entire operand as a boolean variable to be negated. If the operand is non-zero (any of the bits are 1), the answer is 0. If the operand is zero, the answer is 0FFFF. 8-8 Because ! is intended to be used in conditional assembly expressions (described in Chapter 11), there is also a special action when ! is applied to an undefined name: the answer is the defined value 0FFFF, meaning it is TRUE that the symbol is undefined. Similarly, when ! is applied to some defined quantity other than an absolute constant, the answer is 0, meaning it is FALSE that the operand is undefined. Relational Operators Syntax: operand EQ operand (equal) operand NE operand (not equal) operand LT operand (less than) operand LE operand (less or equal) operand GT operand (greater than) operand GE operand (greater or equal) The relational operators may have operands that are: a. both absolute numbers b. variable names that have the same type The result of a relational operation is always an absolute number. They return an 8-or 16-bit result of all 1's for TRUE and all 0's for FALSE. Examples: MOV AL, 3 EQ 0 ; AL = 0 (false) MOV AX, 2 LE 15 ; AX = 0FFFFH (true) String Comparison Operators Syntax: string EQ string (equal) string NE string (not equal) string = string (equal ignoring case) In order to subsume the string comparison facilities offered by That Other Assembler's special conditional-assembly directives IFIDN and IFDIF, A86 allows the relational operators EQ and NE to accept string arguments. For this syntax to be accepted by A86, both strings must be bounded using the same delimiter (either single quotes for both strings, or double quotes for both strings). For a match (EQ returns TRUE or NE returns FALSE), the strings must be the same length, and every character must match exactly. 8-9 An additional A86-exclusive feature is the = operator, which returns TRUE if the characters of the strings differ only in the bit masked by the value 020. Thus you may use = to compare a macro parameter to a string containing nothing but letters. The comparison will be TRUE whether the macro parameter is upper-case or lower-case. No checking is made to detect non-letters, so if you use = on strings containing non-letters, you may get some false TRUE results. Also, = is accepted when it is applied to non-strings as well-- the corresponding values are interpreted as two-byte strings, with the 020 bits masked away before comparison. Attribute Operators/Specifiers B,W,D,Q,T memory variable specifiers Syntax: B operand Q operand operand B operand Q W operand T operand operand W operand T D operand operand D B, W, D, F, Q, and T convert the operand into a byte, word, doubleword, far, quadword, and ten-byte variable, respectively. The operand can be a constant, or a variable of the other type. Examples: ARRAY_PTR: DB 100 DUP (?) WVAR DW ? MOV AL,ARRAY_PTR B ; load first byte of ARRAY_PTR array into AL MOV AL,WVAR B ; load the low byte of WVAR into AL MOV AX,W[01000] ; load AX with the memory word at loc. 01000 LDS BX,D[01000] ; load DS:BX with the doubleword at loc. 01000 JMP F[01000] ; jump far to the 4-byte location at 01000 FLD T[BX] ; load ten-byte number at [BX] to 87 stack For compatibility with Intel/IBM assemblers, A86 accepts the more verbose synonyms BYTE, WORD, DWORD, FAR, QWORD, and TBYTE for B,W,D,F,Q,T, respectively. SHORT and LONG Operators Syntax: SHORT label LONG label 8-10 The SHORT operator is used to specify that the label referenced by a JMP instruction is within 127 bytes of the end of the instruction. The LONG operator specifies the opposite: that the label is not within 127 bytes. The appropriate operator can (and sometimes must) be used if the label is forward referenced in the instruction. When a non-local label is forward referenced, the assembler assumes that it will require two bytes to represent the relative offset of the label (so the instruction including the opcode byte will be three bytes). By correctly using the SHORT operator, you can save a byte of code when you use a forward reference. If the label is not within the specified range, an error will occur. The following example illustrates the use of the SHORT operator. JMP FWDLAB ; three byte instruction JMP SHORT FWDLAB ; two byte instruction JMP >L1 ; two byte instruction assumed for a local label Because the assembler assumes that a forward reference local label is SHORT, you may sometimes be forced to override this assumption if the label is in fact not within 127 bytes of the JMP. This is why LONG is provided: JMP LONG >L9 ; three byte instruction If you are bothered by this possibility, you can specify the +L switch, which causes A86 to pessimistically generate the three byte JMP for all forward references, unless specifically told not to with SHORT. NOTE that LONG will have effect only on the operand to an unconditional JMP instruction; not to conditional jumps. This is because the conditional jumps don't have 3-byte forms; the only conditional jumps are short ones. If you run into this problem, then chances are your code is getting out of control--time to rearrange, or to break off some of the intervening code into separate procedures. If you insist upon leaving the code intact, you can replace the conditional jump with an "IF cond JMP". OFFSET Operator Syntax: OFFSET var-name OFFSET is used to convert a variable into the constant pointer to the variable. For example, if you have declared XX DW ?, and you want to load SI with the pointer to the variable XX, you can code: MOV SI,OFFSET XX. The simpler instruction MOV SI,XX moves the variable contents of XX into SI, not the constant pointer to XX. 8-11 NEAR Operator Syntax: NEAR operand NEAR converts the operand to have the type of a code label, as if it were defined by appearing at the beginning of a program line with a colon after it. NEAR is provided mainly for compatibility with Intel/IBM assemblers. Square Brackets Operator Syntax: [operand] Square brackets around an operand give the operand a memory variable type. Square brackets are generally used to enclose the names of base and index registers: BX, BP, SI, and DI. When the size of the memory variable can be deduced from the context of the expression, square brackets are also used to turn numeric constants into memory variables. Examples: MOV B[BX+50],047 ; move imm value 047 into mem byte at BX+50 MOV AL,[050] ; move byte at memory location 050 into AL MOV AL,050 ; move immediate value 050 into AL Colon Operator Syntax: constant:operand segreg:operand seg_or_group_name:operand The colon operator is used to attach a segment register value to an operand. The segment register value appears to the left of the colon; the rest of the operand appears to the right of the colon. There are three forms to the colon operator. The first form has a constant as the segment register value. This form is used to create an operand to a long (inter-segment) JMP or CALL instruction. An example of this is the instruction JMP 0FFFF:0, which jumps to the cold-boot reset location of the 86 processor. The only context other than JMP or CALL in which this first form is legal, is as the operand to a DD directive or an EQU directive. The EQU case has a further restriction: the offset (the part to the right of the colon) must have a value less than 256. This is because there simply isn't room in a symbol table entry for a segment register value AND a 2-byte offset. I don't think you will be hurt by this restriction, since references to other segments are usually to jump tables at the beginning of those segments. 8-12 The second form has a segment register name to the left of the colon. This is the segment override form, provided for compatibility with Intel/IBM assemblers. A86 will generate a segment override byte when it sees this form, unless the operand to the right of the colon already has a default segment register that is the same as the given override. I prefer the more explicit method of overrides, exclusive to A86: simply place the segment register name before the instruction mnemonic. For example, I prefer ES MOV AL,[BX] to MOV AL,ES:[BX]. The third form has a segment or group name before the colon. This form is ignored by A86; it is provided for compatibility with Turbo C, which likes to include spurious DGROUP: overrides, to satisfy MASM's ASSUME-checking. ST Operator ST is ignored whenever it occurs in an expression. It is provided for compatibility with Intel and IBM assemblers. For example, you can code FLD ST(0),ST(1), which will be taken by A86 as FLD 0,1. TYPE Operator Syntax: TYPE operand The TYPE operator returns 1 if the operand is a byte variable; 2 if the operand is a word variable; 4 if the operand is a doubleword variable; 8 if the operand is a quadword variable; 10 if the operand is a ten-byte variable; and the number of bytes allocated by the structure if the operand is a structure name (see STRUC in the next chapter). A common usage of the TYPE operator is to represent the number of bytes of a named structure. For example, if you have declared a structure named LINE (as described in the next chapter) that defines 82 bytes of storage, then two ways you might refer to the value symbolically are as follows: MOV CX,TYPE LINE ; loads the size of LINE into CX DB TYPE LINE DUP ? ; allocates an area of memory for a LINE THIS and $ Specifiers THIS returns the value of the current location counter. It is provided for compatibility with Intel/IBM assemblers. The dollar sign $ is the more standard and familiar specifier for this purpose; it is equivalent to THIS NEAR. THIS is typically used with the BYTE and WORD specifiers to create alternate-typed symbols at the same memory location: 8-13 BVAR EQU THIS BYTE WVAR DW ? I don't recommend the use of THIS. If you wish to retain Intel compatibility, you can use the less verbose LABEL directive: BVAR LABEL BYTE WVAR DW ? If you are not concerned with compatibility to lesser assemblers, A86 offers a variety of less verbose forms. The most concise is DB without an operand: BVAR DB WVAR DW ? If this is too cryptic for you, there is always BVAR EQU B[$]. Operator Precedence Consider the expression 1 + 2 * 3. When A86 sees this expression, it could perform the multiplication first, giving an answer of 1+6 = 7; or it could do the addition first, giving an answer of 3*3 = 9. In fact, A86 does the multiplication first, because A86 assigns a higher precedence to multiplication than it does addition. The following list specifies the order of precedence A86 assigns to expression operators. All expressions are evaluated from left to right following the precedence rules. You may override this order of evaluation and precedence through the use of parentheses ( ). In the example above, you could override the precedence by parenthesizing the addition: (1+2) * 3. Some symbols that we have referred to as operators, are treated by the assembler as operands having built-in values. These include B, W, F, $, and ST. In a similar vein, a segment override term (a segment register name followed by a colon) is recorded when it is scanned, but not acted upon until the entire containing expression is scanned and evaluated. If two operators are adjacent, the rightmost operator must have precedence; otherwise, parentheses must be used. For example, the expression BIT ! 1 is illegal because the leftmost operator BIT has the higher precedence of the two adjacent operators BIT and "!". You can code BIT (! 1). --Highest Precedence-- 8-14 1. Parenthesized expressions 2. Period 3. OFFSET, SEG, TYPE, and PTR 4. HIGH, LOW, and BIT 5. Multiplication and division: *, /, MOD, SHR, SHL 6. Addition and subtraction: +,- a. unary b. binary 7. Relational: EQ, NE, LT, LE, GT, GE = 8. Logical NOT and ! 9. Logical AND 10. Logical OR and XOR 11. Colon for long pointer, SHORT, LONG, and BY 12. DUP --Lowest Precedence--