This chapter discusses the MASM compile-time language, including the very important macro expansion facilities. A macro is an identifier that the assembler will expand into additional text (often many lines of text), allowing you to abbreviate large amounts of code with a single identifier. MASM’s macro facility is actually a computer language inside a computer language; that is, you can write short little programs inside a MASM source file whose purpose is to generate other MASM source code to be assembled by MASM.
This language inside a language, also known as a compile-time language, consists of macros (the compile-time language equivalent of a procedure), conditionals (if statements), loops, and other statements. This chapter covers many of the MASM compile-time language features and shows how you can use them to reduce the effort needed to write assembly language code.
MASM is actually two languages rolled into a single program. The runtime language is the standard x86-64/MASM assembly language you’ve been reading about in all the previous chapters. This is called the runtime language because the programs you write execute when you run the executable file. MASM contains an interpreter for a second language, the MASM compile-time language (CTL). MASM source files contain instructions for both the MASM CTL and the runtime program, and MASM executes the CTL program during assembly (compilation). Once MASM completes assembly, the CTL program terminates (see Figure 13-1).
The CTL application is not a part of the runtime executable that MASM emits, although the CTL application can write part of the runtime program for you, and, in fact, this is the major purpose of the CTL. Using automatic code generation, the CTL gives you the ability to easily and elegantly emit repetitive code. By learning how to use the MASM CTL and applying it properly, you can develop assembly language applications as rapidly as high-level language applications (even faster because MASM’s CTL lets you create very high-level-language constructs).
You may recall that Chapter 1 began with the typical first program most people write when learning a new language, the “Hello, world!” program. Listing 13-1 provides the basic “Hello, world!” program written in the MASM compile-time language.
; Listing 13-1
; CTL "Hello, world!" program.
echo Listing 13-1: Hello, world!
end
Listing 13-1: The CTL “Hello, world!” program
The only CTL statement in this program is the echo statement. The end statement is needed just to keep MASM happy.
The echo statement displays the textual representation of its argument list during the assembly of a MASM program. Therefore, if you compile the preceding program with the command
ml64 /c listing13-1.asm
the MASM assembler will immediately print the following text:
Listing 13-1: Hello, world!
Other than displaying the text associated with the echo parameter list, the echo statement has no effect on the assembly of the program. It is invaluable for debugging CTL programs, displaying the progress of the assembly, and displaying assumptions and default actions that take place during assembly.
Though assembly language calls to print also emit text to the standard output, there is a big difference between the following two groups of statements in a MASM source file:
echo "Hello World"
call print
byte "Hello World", nl,0
The first statement prints "Hello World" (and a newline) during the assembly process and has no effect on the executable program. The last two lines don’t affect the assembly process (other than the emission of code to the executable file). However, when you run the executable file, the second set of statements prints the string Hello World followed by a newline sequence.
The .err directive, like echo, will display a string to the console during assembly, though this must be a text string (delimited by < and >). The .err statement displays the text as part of a MASM error diagnostic. Furthermore, the .err statement increments the error count, and this will cause MASM to stop the assembly (without producing an object file or linking) after processing the current source file. You would normally use the .err statement to display an error message during assembly if your CTL code discovers something that prevents it from creating valid code. For example:
.err <Statement must have exactly one operand>
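In practice, .err usually sits inside a compile-time test (the if statement is described later in this chapter). The following is a minimal sketch of that pairing; the symbol maxItems and its limit are hypothetical and do not come from this chapter's listings:

```masm
; Hypothetical configuration constant, used only for illustration.
maxItems = 300

; Stop the assembly with a diagnostic if the value cannot fit in a byte.
if maxItems GT 255
    .err <item limit must be in the range 0 to 255>
endif
```

With maxItems set to 300 the condition is true, so MASM reports the message as an error; with a value of 255 or less, MASM skips the .err line entirely.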
Just as the runtime language does, the compile-time language supports constants and variables. You declare compile-time constants by using the textequ or equ directives. You declare compile-time variables by using the = directive (compile-time assignment statement). For example:
inc_by equ 1
ctlVar = 0
ctlVar = ctlVar + inc_by
The MASM CTL supports constant expressions in the CTL assignment statement. See “MASM Constant Declarations” in Chapter 4 for a discussion of constant expressions (which are also the CTL expressions and operators).
In addition to the operators and functions appearing in that chapter, MASM includes several additional CTL operators, functions, and directives you will find useful. The following subsections describe these.
The first operator is the ! operator. When placed in front of another symbol, this operator tells MASM to treat that character as text rather than as a special symbol. For example, !; creates a text constant consisting of the semicolon character, rather than a comment that causes MASM to ignore all text after the ; symbol (for C/C++ programmers, this is similar to the backslash escape character, \, in a string constant).
The second useful CTL operator is %. The percent operator causes MASM to evaluate the expression following it and replace that expression with its value. For example, consider the following code sequence:
num10 = 10
text10 textequ <10>
tn11 textequ %num10 + 1
If you assemble this sequence in an assembly language source file and direct MASM to produce an assembly listing, it will report the following for these three symbols:
num10 . . . . . . . . . . . . . Number 0000000Ah
text10 . . . . . . . . . . . . . Text 10
tn11 . . . . . . . . . . . . . . Text 11
The num10 symbol is properly reported as a numeric value (decimal 10), text10 as a text symbol (containing the string 10), and tn11 as a text symbol (as you would expect, because this code sequence uses the textequ directive to define it). However, rather than containing the string %num10 + 1, tn11 contains 11: MASM evaluates the expression num10 + 1 to produce the numeric value 11, which it then converts to text data. (By the way, to put a percent sign in a text string, use the text sequence <!%>.)
If you place the % operator in the first column of a source line, MASM will translate all numeric expressions on that line to textual form. This is handy with the echo directive. It causes echo to display the value of numeric equates rather than simply displaying the equate names.
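As a small, hedged illustration of the two uses of % together, the following sketch first converts a numeric equate to a text symbol and then echoes it. The name num10Txt is hypothetical, introduced only for this example:

```masm
num10    =       10
num10Txt textequ %num10         ; Text symbol holding the string 10

; The leading % asks MASM to expand num10Txt before echo displays the line.
% echo The value is num10Txt
```

During assembly this should display The value is 10. Without the leading %, echo would display the name num10Txt unchanged.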
The catstr directive has the following syntax:
identifier catstr string1, string2, ...
The identifier is an (up to this point) undefined symbol. The string1 and string2 operands are textual data surrounded by < and > symbols. This statement stores the concatenation of the strings into identifier. Note that identifier is a text object, not a string object. If you specify the identifier in your code, MASM will substitute the text string for the identifier and try to process that text data as though it were part of your source code input.
The catstr statement allows two or more operands separated by commas. The catstr directive will concatenate the text values in the order they appear in the operand field. The following statement generates the textual data Hello, World!:
helloWorld catstr <Hello>, <, >, <World!!>
Two exclamation marks are necessary in this example, because ! is an operator telling MASM to treat the next symbol as text rather than as an operator. With only one ! symbol, MASM thinks that you’re attempting to include a > symbol as part of the string and reports an error (because there is no closing >). Putting !! in the text string tells MASM to treat the second ! symbol as a text character.
The instr directive searches for the presence of one string within another. The syntax for the directive is
identifier instr start, source, search
where identifier is a symbol into which MASM will put the offset of the search string within the source string. The search begins at position start within source. Unconventionally, the first character in source has the position 1 (not 0). The following example searches for World within the string Hello World (starting at character position 1, which is the index of the H character):
WorldPosn instr 1, <Hello World>, <World>
This statement defines WorldPosn as a number with the value 7 (as the string World is at position 7 in Hello World if you start counting from position 1).
The sizestr directive computes the length of a string. The syntax for the directive is
identifier sizestr string
where identifier is the symbol into which MASM will store the string’s length, and string is the string literal whose length this directive computes. As an example,
hwLen sizestr <Hello World>
defines the symbol hwLen as a number and sets it to the value 11.
The substr directive extracts a substring from a larger string. The syntax for this directive is
identifier substr source, start, len
where identifier is the symbol that MASM will create (type TEXT, initialized with the substring characters), source is the source string from which MASM will extract the substring, start is the starting position in the string to begin the extraction, and len is the length of the substring to extract. The len operand is optional; if it is absent, MASM will assume you want to use the remainder of the string (starting at position start) for the substring. Here’s an example that extracts Hello from the string Hello World:
hString substr <Hello World>, 1, 5
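The string directives are often chained, with the result of one feeding the next. The following sketch (all symbol names are hypothetical) locates World, extracts the remainder of the string from that position, and measures the result. It assumes, as the MASM documentation describes, that a text symbol may appear wherever a text operand is expected:

```masm
src     textequ <Hello World>
wPos    instr   1, src, <World>     ; wPos = 7 (positions start at 1)
tail    substr  src, wPos           ; len omitted: tail = World
tailLen sizestr tail                ; tailLen = 5
```

Because the len operand of substr is omitted, tail receives everything from position 7 to the end of the string.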
MASM’s compile-time language provides an if statement, if, that lets you make decisions at assembly time. The if statement has two main purposes. The traditional use of if is to support conditional assembly, allowing you to include or exclude code during an assembly, depending on the status of various symbols or constant values in your program. The second use is to support the standard if-statement decision-making process in the MASM compile-time language. This section discusses these two uses for the MASM if statement.
The simplest form of the MASM compile-time if statement uses the following syntax:
if constant_boolean_expression
Text
endif
At compile time, MASM evaluates the expression after the if. This must be a constant expression that evaluates to an integer value. If the expression evaluates to true (nonzero), MASM continues to process the text in the source file as though the if statement were not present. However, if the expression evaluates to false (zero), MASM treats all the text between the if and the corresponding endif clause as though it were a comment (that is, it ignores this text), as shown in Figure 13-2.
The identifiers in a compile-time expression must all be constant identifiers or a MASM compile-time function call (with appropriate parameters). Because MASM evaluates these expressions at assembly time, they cannot contain runtime variables.
The MASM if statement supports optional elseif and else clauses that behave in an intuitive fashion. The complete syntax for the if statement looks like the following:
if constant_boolean_expression1
Text
elseif constant_boolean_expression2
Text
else
Text
endif
If the first Boolean expression evaluates to true, MASM processes the text up to the elseif clause. It then skips all text (that is, treats it like a comment) until it encounters the endif clause. MASM continues processing the text after the endif clause in the normal fashion.
If the first Boolean expression evaluates to false, MASM skips all the text until it encounters an elseif, else, or endif clause. If it encounters an elseif clause (as in the preceding example), MASM evaluates the Boolean expression associated with that clause. If it evaluates to true, MASM processes the text between the elseif and the else clauses (or to the endif clause if the else clause is not present). If, during the processing of this text, MASM encounters another elseif or, as in the preceding example, an else clause, then MASM ignores all further text until it finds the corresponding endif. If both the first and second Boolean expressions in the previous example evaluate to false, MASM skips their associated text and begins processing the text in the else clause.
You can create a nearly infinite variety of if statement sequences by including zero or more elseif clauses and optionally supplying the else clause.
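To make the clause selection concrete, here is a minimal sketch of a three-way compile-time decision. The symbol bufSize and the thresholds are hypothetical, chosen only to show which branch MASM processes:

```masm
bufSize = 256                   ; Hypothetical compile-time constant

if bufSize LT 64
    echo bufSize is small
elseif bufSize LT 1024
    echo bufSize is medium      ; Only this branch is processed
else
    echo bufSize is large
endif
```

With bufSize equal to 256, the first expression is false and the second is true, so MASM processes only the elseif text and treats the other two branches as comments.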
A traditional use of conditional assembly is to develop software that you can easily configure for several environments. For example, the fcomip instruction makes floating-point comparisons easy, but this instruction is available only on Pentium Pro and later processors. To use this instruction on the processors that support it and fall back to the standard floating-point comparison on the older processors, most engineers use conditional assembly to embed the separate sequences in the same source file (instead of writing and maintaining two versions of the program). The following example demonstrates how to do this:
; Set true (1) to use FCOMIxx instrs.
PentProOrLater = 0
.
.
.
if PentProOrLater
fcomip st(0), st(1) ; Compare ST1 to ST0 and set flags
else
fcomp ; Compare ST1 to ST0
fstsw ax ; Move the FPU condition code bits
sahf ; into the FLAGS register
endif
As currently written, this code fragment will compile the three-instruction sequence in the else clause and ignore the code between the if and else clauses (because the constant PentProOrLater is false). By changing the value of PentProOrLater to true, you can tell MASM to compile the single fcomip instruction rather than the three-instruction sequence.
Though you need to maintain only a single source file, conditional assembly does not let you create a single executable that runs efficiently on all processors. When using this technique, you will still have to create two executable programs (one for Pentium Pro and later processors, one for the earlier processors) by compiling your source file twice: during the first assembly, you must set the PentProOrLater constant to false; during the second assembly, you must set it to true.
If you are familiar with conditional assembly in other languages, such as C/C++, you may be wondering if MASM supports a statement like C’s #ifdef statement. The answer is yes, it does. Consider the following modification to the preceding code that uses this directive:
; Note: uncomment the following line if you are compiling this
; code for a Pentium Pro or later CPU.
; PentProOrLater = 0 ; Value and type are irrelevant
.
.
.
ifdef PentProOrLater
fcomip st(0), st(1) ; Compare ST1 to ST0 and set flags
else
fcomp ; Compare ST1 to ST0
fstsw ax ; Move the FPU condition code bits
sahf ; into the FLAGS register
endif
Another common use of conditional assembly is to introduce debugging and testing code into your programs. A typical debugging technique that many MASM programmers use is to insert print statements at strategic points throughout their code; this enables them to trace through their code and display important values at various checkpoints.
A big problem with this technique, however, is that they must remove the debugging code prior to completing the project. Two further problems follow from this: a forgotten debugging statement can end up producing output in the final application, and a statement that has been removed often has to be reinserted later to track down a different problem.
Conditional assembly can provide a solution to these problems. By defining a symbol (say, debug) to control debugging output in your program, you can activate or deactivate all debugging output by modifying a single line of source code. The following code fragment demonstrates this:
; Set to true to activate debug output.
debug = 0
.
.
.
if debug
echo *** DEBUG build
mov edx, i
call print
byte "At point A, i=%d", nl, 0
else
echo *** RELEASE build
endif
As long as you surround all debugging output statements with an if statement like the preceding one, you don’t have to worry about debugging output accidentally appearing in your final application. By setting the debug symbol to false, you can automatically disable all such output. Likewise, you don’t have to remove all your debugging statements from your programs after they’ve served their immediate purpose. By using conditional assembly, you can leave these statements in your code because they are so easy to deactivate. Later, if you decide you need to view this same debugging information again, you can reactivate it by setting the debug symbol to true.
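A variation on this idea, sketched here under the assumption that you want to control debug from outside the source file, replaces the unconditional assignment with an ifndef test (the counterpart of ifdef). The command-line usage shown in the comment relies on the ml64 /D option and is an assumption about how you build, not something this chapter's build scripts do:

```masm
; Supply a default only if debug has not already been defined,
; for example on the command line:  ml64 /c /Ddebug=1 myfile.asm
ifndef debug
debug = 0
endif
```

With this guard in place, the if debug tests in the rest of the file work unchanged, and a release build needs no edits to the source.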
Although program configuration and debugging control are two of the more common, traditional uses for conditional assembly, don’t forget that the if statement provides the basic conditional statement in the MASM CTL. You will use the if statement in your compile-time programs the same way you would use an if statement in MASM or another language. Later sections in this chapter present lots of examples of using the if statement in this capacity.
MASM’s while..endm, for..endm, and forc..endm statements provide compile-time loop constructs. The while statement tells MASM to process the same sequence of statements repetitively during assembly. This is handy for constructing data tables as well as providing a traditional looping structure for compile-time programs.
The while statement uses the following syntax:
while constant_boolean_expression
Text
endm
When MASM encounters the while statement during assembly, it evaluates the constant Boolean expression. If the expression evaluates to false, MASM will skip over the text between the while and the endm clauses (the behavior is similar to the if statement if the expression evaluates to false). If the expression evaluates to true, MASM will process the statements between the while and endm clauses and then “jump back” to the start of the while statement in the source file and repeat this process, as shown in Figure 13-3.
To understand how this process works, consider the program in Listing 13-2.
; Listing 13-2
; CTL while loop demonstration program.
option casemap:none
nl = 10
.const
ttlStr byte "Listing 13-2", 0
.data
ary dword 2, 3, 5, 8, 13
include getTitle.inc
include print.inc
.code
; Here is the "asmMain" function.
public asmMain
asmMain proc
push rbx
push rbp
mov rbp, rsp
sub rsp, 56 ; Shadow storage
i = 0
while i LT lengthof ary ; 5
mov edx, i ; This is a constant!
mov r8d, ary[i * 4] ; Index is a constant
call print
byte "array[%d] = %d", nl, 0
i = i + 1
endm
allDone: leave
pop rbx
ret ; Returns to caller
asmMain endp
end
Listing 13-2: while..endm demonstration
Here’s the build command and program output for Listing 13-2:
C:\>build listing13-2
C:\>echo off
Assembling: listing13-2.asm
c.cpp
C:\>listing13-2
Calling Listing 13-2:
array[0] = 2
array[1] = 3
array[2] = 5
array[3] = 8
array[4] = 13
Listing 13-2 terminated
The while loop repeats five times during assembly. On each repetition of the loop, the MASM assembler processes the statements between the while and endm directives. Therefore, the preceding program is really equivalent to the code fragment shown in Listing 13-3.
.
.
.
mov edx, 0 ; This is a constant!
mov r8d, ary[0] ; Index is a constant
call print
byte "array[%d] = %d", nl, 0
mov edx, 1 ; This is a constant!
mov r8d, ary[4] ; Index is a constant
call print
byte "array[%d] = %d", nl, 0
mov edx, 2 ; This is a constant!
mov r8d, ary[8] ; Index is a constant
call print
byte "array[%d] = %d", nl, 0
mov edx, 3 ; This is a constant!
mov r8d, ary[12] ; Index is a constant
call print
byte "array[%d] = %d", nl, 0
mov edx, 4 ; This is a constant!
mov r8d, ary[16] ; Index is a constant
call print
byte "array[%d] = %d", nl, 0
Listing 13-3: Program equivalent to the code in Listing 13-2
As you can see in this example, the while statement is convenient for constructing repetitive-code sequences, especially for unrolling loops.
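The other use mentioned earlier, constructing data tables, follows the same pattern. The sketch below emits a table of the first eight squares; the names squares and idx are hypothetical and are not part of Listing 13-2:

```masm
        .data
squares label   dword           ; Hypothetical table name
idx     =       0
        while   idx LT 8
        dword   idx * idx       ; Emits 0, 1, 4, 9, 16, 25, 36, 49
idx     =       idx + 1
        endm
```

MASM evaluates idx * idx during assembly on each pass, so the object file contains only the eight finished dword values and no code to compute them.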
MASM provides two forms of the for..endm loop. These two loops take the following general form:
for identifier, <arg1, arg2, ..., argn>
.
.
.
endm
forc identifier, <string>
.
.
.
endm
The first form of the for loop (plain for) repeats the code once for each of the arguments specified between the < and > brackets. On each repetition of the loop, it sets identifier to the text of the current argument: on the first iteration of the loop, identifier is set to arg1, and on the second iteration it is set to arg2, and so on, until the last iteration, when it is set to argn. For example, the following for loop will generate code that pushes the RAX, RBX, RCX, and RDX registers onto the stack:
for reg, <rax, rbx, rcx, rdx>
push reg
endm
This for loop is equivalent to the following code:
push rax
push rbx
push rcx
push rdx
The forc compile-time loop repeats the body of its loop for each character appearing in the string specified by the second argument. For example, the following forc loop generates a hexadecimal byte value for each character in the string:
forc hex, <0123456789ABCDEF>
hexNum catstr <0>,<hex>,<h>
byte hexNum
endm
The for loop will turn out to be a lot more useful than forc. Nevertheless, forc is handy on occasion. Most of the time when you’re using these loops, you’ll be passing them a variable set of arguments rather than a fixed string. As you’ll soon see, these loops are handy for processing macro parameters.
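As a companion to the earlier push example, the following sketch restores the same four registers. Because the stack is last-in, first-out, the argument list simply names the registers in the reverse order:

```masm
; Restore the registers pushed by: for reg, <rax, rbx, rcx, rdx>
for reg, <rdx, rcx, rbx, rax>
    pop reg
endm
```

This expands to pop rdx, pop rcx, pop rbx, and pop rax, in that order.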
Macros are objects that a language processor replaces with other text during compilation. Macros are great devices for replacing long, repetitive sequences of text with much shorter sequences of text. In addition to the traditional role that macros play (for example, #define in C/C++), MASM’s macros also serve as the equivalent of a compile-time language procedure or function.
Macros are one of MASM’s main features. The following sections explore MASM’s macro-processing facilities and the relationship between macros and other MASM CTL control constructs.
MASM supports a straightforward macro facility that lets you define macros in a manner that is similar to declaring a procedure. A typical, simple macro declaration takes the following form:
macro_name macro arguments
Macro body
endm
The following code is a concrete example of a macro declaration:
neg128 macro
neg rdx
neg rax
sbb rdx, 0
endm
Execution of this macro’s code will compute the two’s complement of the 128-bit value in RDX:RAX (see the description of extended-precision neg in “Extended-Precision Negation Operations” in Chapter 8).
To execute the code associated with neg128, you specify the macro’s name at the point you want to execute these instructions. For example:
mov rax, qword ptr i128
mov rdx, qword ptr i128[8]
neg128
This intentionally looks just like any other instruction; the original purpose of macros was to create synthetic instructions to simplify assembly language programming.
Though you don’t need to use a call instruction to invoke a macro, from the point of view of your program, invoking a macro executes a sequence of instructions just like calling a procedure. You could implement this simple macro as a procedure by using the following procedure declaration:
neg128p proc
neg rdx
neg rax
sbb rdx, 0
ret
neg128p endp
The following two statements will both negate the value in RDX:RAX:
neg128
call neg128p
The difference between these two (the macro invocation versus the procedure call) is that macros expand their text inline, whereas a procedure call emits a call to the corresponding procedure elsewhere in the text. That is, MASM replaces the invocation neg128 directly with the following text:
neg rdx
neg rax
sbb rdx, 0
On the other hand, MASM replaces the procedure call to neg128p with the machine code for the call instruction:
call neg128p
Choose between a macro and a procedure call based on efficiency. Macros are slightly faster than procedure calls because you don’t execute the call and corresponding ret instructions, but they can make your program larger because a macro invocation expands to the text of the macro’s body on each invocation. If the macro body is large and you invoke the macro several times throughout your program, it will make your final executable much larger. Also, if the body of your macro executes more than a few simple instructions, the overhead of a call and ret sequence has little impact on the overall execution time of the code, so the execution time savings are nearly negligible. On the other hand, if the body of a procedure is very short (like the preceding neg128 example), the macro implementation can be faster and doesn’t expand the size of your program by much. A good rule of thumb is as follows:
Use macros for short, time-critical program units. Use procedures for longer blocks of code and when execution time is not as critical.
Macros have several other disadvantages compared with procedures. Macros cannot have local (automatic) variables, macro parameters work differently than procedure parameters, macros don’t support (runtime) recursion, and macros are a little more difficult to debug than procedures (just to name a few disadvantages). Therefore, you shouldn’t really use macros as a substitute for procedures except when performance is absolutely critical.
Like procedures, macros allow you to define parameters that let you supply different data on each macro invocation, which lets you write generic macros whose behavior can vary depending on the parameters you supply. By processing these macro parameters at compile time, you can write sophisticated macros.
Macro parameter declaration syntax is straightforward. You supply a list of parameter names as the operands in a macro declaration:
neg128 macro reg64HO, reg64LO
neg reg64HO
neg reg64LO
sbb reg64HO, 0
endm
When you invoke a macro, you supply the actual parameters as arguments to the macro invocation:
neg128 rdx, rax
MASM automatically associates the type text with macro parameters. This means that during a macro expansion, MASM substitutes the text you supply as the actual parameter everywhere the formal parameter name appears. The semantics of pass by textual substitution are a little different from pass by value or pass by reference, so exploring those differences here is worthwhile.
Consider the following macro invocations, using the neg128 macro from the previous section:
neg128 rdx, rax
neg128 rbx, rcx
These two invocations expand into the following code:
; neg128 rdx, rax
neg rdx
neg rax
sbb rdx, 0
; neg128 rbx, rcx
neg rbx
neg rcx
sbb rbx, 0
Macro invocations do not make a local copy of the parameters (as pass by value does), nor do they pass the address of the actual parameter to the macro. Instead, a macro invocation of the form neg128 rdx, rax is equivalent to the following:
reg64HO textequ <rdx>
reg64LO textequ <rax>
neg reg64HO
neg reg64LO
sbb reg64HO, 0
The text objects immediately expand their string values inline, producing the former expansion for neg128 rdx, rax.
Macro parameters are not limited to memory, register, or constant operands as are instruction or procedure operands. Any text is fine as long as its expansion is legal wherever you use the formal parameter. Similarly, formal parameters may appear anywhere in the macro body, not just where memory, register, or constant operands are legal. Consider the following macro declaration and sample invocations that demonstrate how you can expand a formal parameter into a whole instruction:
chkError macro instr, jump, target
instr
jump target
endm
chkError <cmp eax, 0>, jnl, RangeError ; Example 1
.
.
.
chkError <test bl, 1>, jnz, ParityError ; Example 2
; Example 1 expands to:
cmp eax, 0
jnl RangeError
; Example 2 expands to:
test bl, 1
jnz ParityError
We use the < and > brackets to treat the full cmp and test instructions as a single string (normally, the comma in these instructions would split them into two macro parameters).
In general, MASM assumes that all text between commas constitutes a single macro parameter. If MASM encounters any opening bracketing symbols (left parentheses, left braces, or left angle brackets), then it will include all text up to the appropriate closing symbol, ignoring any commas that may appear within the bracketing symbols. Of course, MASM does not consider commas (and bracketing symbols) within a string constant as the end of an actual parameter. So the following macro and invocation are perfectly legal:
_print macro strToPrint
call print
byte strToPrint, nl, 0
endm
.
.
.
_print "Hello, world!"
MASM treats the string Hello, world! as a single parameter because the comma appears inside a literal string constant, just as your intuition suggests.
You can run into some issues when MASM expands your macro parameters, because parameters are expanded as text, not values. Consider the following macro declaration and invocation:
Echo2nTimes macro n, theStr
echoCnt = 0
while echoCnt LT n * 2
call print
byte theStr, nl, 0
echoCnt = echoCnt + 1
endm
endm
.
.
.
Echo2nTimes 3 + 1, "Hello"
This example generates code that prints Hello five times at run time rather than the eight times you might intuitively expect. This is because the preceding while statement expands to
while echoCnt LT 3 + 1 * 2
The actual parameter for n is 3 + 1; because MASM expands this text directly in place of n, you get an erroneous text expansion. At compile time MASM computes 3 + 1 * 2 as the value 5 rather than as the value 8 (which you would get if MASM passed this parameter by value rather than by textual substitution).
The common solution to this problem when passing numeric parameters that may contain compile-time expressions is to surround the formal parameter in the macro with parentheses; for example, you would rewrite the preceding macro as follows:
Echo2nTimes macro n, theStr
echoCnt = 0
while echoCnt LT (n) * 2
call print
byte theStr, nl, 0
echoCnt = echoCnt + 1
endm ; while
endm ; macro
Now, the invocation expands to the following code that produces the intuitive result:
while echoCnt LT (3 + 1) * 2
call print
byte "Hello", nl, 0
echoCnt = echoCnt + 1
endm
If you don’t have control over the macro definition (perhaps it’s part of a library module you use, and you can’t change the macro definition because doing so could break existing code), there is another solution to this problem: use the MASM % operator before the argument in the macro invocation so that the CTL interpreter evaluates the expression before expanding the parameters. For example:
Echo2nTimes %3 + 1, "Hello"
This will cause MASM to properly generate eight calls to the print procedure (and associated data).
As a general rule, MASM treats macro arguments as optional arguments. If you define a macro that specifies two arguments and invoke that macro with only one argument, MASM will not (normally) complain about the invocation. Instead, it will simply substitute the empty string for the expansion of the second argument. In some cases, this is acceptable and possibly even desirable.
However, suppose you left off the second parameter in the neg128 macro given earlier. That would compile to a neg instruction with a missing operand and MASM would report an error; for example:
neg128 macro arg1, arg2 ; Line 6
neg arg1 ; Line 7
neg arg2 ; Line 8
sbb arg1, 0 ; Line 9
endm ; Line 10
; Line 11
neg128 rdx ; Line 12
Here’s the error that MASM reports:
listing14.asm(12) : error A2008:syntax error : in instruction
neg128(2): Macro Called From
listing14.asm(12): Main Line Code
The (12) is telling us that the error occurred on line 12 in the source file. The neg128(2) line is telling us that the error occurred on line 2 of the neg128 macro. It’s a bit difficult to see what is actually causing the problem here.
One solution is to use conditional assembly inside the macro to test for the presence of both parameters. At first, you might think you could use code like this:
neg128 macro reg64HO, reg64LO
if reg64LO eq <>
.err <neg128 requires 2 operands>
endif
neg reg64HO
neg reg64LO
sbb reg64HO, 0
endm
.
.
.
neg128 rdx
Unfortunately, this fails for a couple of reasons. First of all, the eq
operator doesn’t work with text operands. MASM will expand the text operands before attempting to apply this operator, so the if
statement in the preceding example effectively becomes
if eq
because MASM substitutes the empty string for both the operands around the eq
operator. This, of course, generates a syntax error. Even if there were non-blank textual operands around the eq
operator, this would still fail because eq
expects numeric operands. MASM solves this issue by introducing several additional conditional if
statements intended for use with text operands and macro arguments. Table 13-1 lists these additional if
statements.
Table 13-1: Text-Handling Conditional if Statements
Statement | Text operand(s) | Meaning |
ifb * | arg | If blank: true if arg evaluates to an empty string. |
ifnb | arg | If not blank: true if arg evaluates to a non-empty string. |
ifdif | arg1, arg2 | If different: true if arg1 and arg2 are different (case-sensitive). |
ifdifi | arg1, arg2 | If different: true if arg1 and arg2 are different (case-insensitive). |
ifidn | arg1, arg2 | If identical: true if arg1 and arg2 are exactly the same (case-sensitive). |
ifidni | arg1, arg2 | If identical: true if arg1 and arg2 are exactly the same (case-insensitive). |
* |
You use these conditional if
statements exactly like the standard if
statement. You can also follow these if
statements with an elseif
or else
clause, but there are no elseifb
, elseifnb
, . . . , variants of these if
statements (only a standard elseif
with a Boolean expression may follow these statements).
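As a brief illustration of these directives, the following sketch uses ifidni to special-case one register name. The zeroReg macro is a hypothetical helper invented for this example (it does not appear elsewhere in this book), and it assumes the caller always passes a 64-bit general-purpose register:

```masm
; zeroReg - Hypothetical macro (an illustration only).
; Assumes reg is a 64-bit general-purpose register.

zeroReg     macro   reg
            ifidni  <reg>, <rax>

            ; Case-insensitive match on "rax": writing EAX
            ; also clears the upper 32 bits of RAX.

            xor     eax, eax
            else
            xor     reg, reg
            endif
            endm

            zeroReg RAX     ; Assembles as: xor eax, eax
            zeroReg rbx     ; Assembles as: xor rbx, rbx
```

Because ifidni ignores case, both rax and RAX select the first branch; ifidn would treat them as different strings.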
The following snippet demonstrates how to use the ifb statement to ensure that the neg128 macro receives both of its arguments. There is no need to check whether reg64HO is also blank; if reg64HO is blank, reg64LO will also be blank, and the ifb statement will report the appropriate error:
neg128 macro reg64HO, reg64LO
ifb <reg64LO>
.err <neg128 requires 2 operands>
endif
neg reg64HO
neg reg64LO
sbb reg64HO, 0
endm
Be very careful about using ifb
in your programs. It is easy to pass in a text symbol to a macro and wind up testing whether the name of that symbol is blank rather than the text itself. Consider the following:
symbol textequ <>
neg128 rax, symbol ; Generates an error
The neg128
invocation has two arguments, and the second one is not blank, so the ifb
directive is happy with the argument list. However, inside the macro when neg128
expands reg64LO
after the neg
instruction, the expansion is the empty string, producing an error (which is what the ifb
was supposed to prevent).
A different way to handle missing macro arguments is to explicitly tell MASM that an argument is required with the :req
suffix on the macro definition line. Consider the following definition for the neg128
macro:
neg128 macro reg64HO:req, reg64LO:req
neg reg64HO
neg reg64LO
sbb reg64HO, 0
endm
With the :req
option present, MASM reports the following if you are missing one or more of the macro arguments:
listing14.asm(12) : error A2125:missing macro argument
Another way to handle missing macro arguments is to define default values for those arguments. Consider the following definition for the neg128 macro:
neg128 macro reg64HO:=<rdx>, reg64LO:=<rax>
neg reg64HO
neg reg64LO
sbb reg64HO, 0
endm
The :=
operator tells MASM to substitute the text constant to the right of the operator for the associated macro argument if an actual value is not present on the macro invocation line. Consider the following two invocations of neg128
:
neg128 ; Defaults to "RDX, RAX" for the args
neg128 rbx ; Uses RBX:RAX for the 128-bit register pair
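Assuming the neg128 definition with default values shown above, these two invocations should expand to roughly the following instruction sequences (a sketch of the expansion, not actual listing output):

```masm
; Expansion of "neg128" (both defaults used):

            neg     rdx
            neg     rax
            sbb     rdx, 0

; Expansion of "neg128 rbx" (only reg64LO defaults):

            neg     rbx
            neg     rax
            sbb     rbx, 0
```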
It is possible to tell MASM to allow a variable number of arguments in a macro invocation:
varParms macro varying:vararg
Macro body
endm
.
.
.
varParms 1
varParms 1, 2
varParms 1, 2, 3
varParms
Within the macro, MASM will create a text object of the form <arg1, arg2, ..., argn> and assign this text object to the associated parameter name (varying, in the preceding example). You can use the MASM for loop to extract the individual values of the varying argument. For example:
varParms macro varying:vararg
for curArg, <varying>
byte curArg
endm ; End of FOR loop
endm ; End of macro
varParms 1
varParms 1, 2
varParms 1, 2, 3
varParms <5 dup (?)>
Here’s the listing output for an assembly containing this example source code:
00000000 .data
varParms macro varying:vararg
for curArg, <varying>
byte curArg
endm ; End of FOR loop
endm ; End of macro
varParms 1
00000000 01 2 byte 1
varParms 1, 2
00000001 01 2 byte 1
00000002 02 2 byte 2
varParms 1, 2, 3
00000003 01 2 byte 1
00000004 02 2 byte 2
00000005 03 2 byte 3
varParms <5 dup (?)>
00000006 00000005 [ 2 byte 5 dup (?)
00
]
A macro can have, at most, one vararg parameter. If a macro has more than one parameter and also has a vararg parameter, the vararg parameter must be the last parameter in the list.
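The following sketch shows a fixed parameter followed by a vararg parameter. The byteTable macro and the primes label are hypothetical names chosen for this illustration:

```masm
; byteTable - Hypothetical macro (an illustration only).
; tblName is an ordinary parameter; values collects
; every remaining argument.

byteTable   macro   tblName, values:vararg
tblName     equ     this byte
            for     curVal, <values>
            byte    curVal
            endm                ; End of FOR loop
            endm                ; End of macro

            .data
            byteTable primes, 2, 3, 5, 7
```

This invocation should emit the label primes followed by the 4 bytes 2, 3, 5, and 7.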
Inside a macro, you can use the & operator to replace a macro parameter name (or other text symbol) with its actual value. This operator is active anywhere, even within string literals. Consider the following example:
expand macro parm
byte '&parm', 0
endm
.data
expand a
The macro invocation in this example expands to the following code:
byte 'a', 0
If, for some reason, you need the string '&parm'
to be emitted within a macro (that has parm
as one of its parameters), you will have to work around the expansion operator. Note that '!&parm'
will not escape the &
operator. One solution that works in this specific case is to rewrite the byte
directive:
expand macro parm
byte '&', 'parm', 0
endm
Now the &
operator is not causing the expansion of parm
inside a string.
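The & operator is also handy for gluing a parameter onto other text to build a new identifier. In this sketch, declItem and the item base name are hypothetical names used only for illustration:

```masm
; declItem - Hypothetical macro (an illustration only).
; Joins its two arguments into a single identifier.

declItem    macro   baseName, suffix
baseName&suffix dword 0
            endm

            .data
            declItem item, 1    ; Expands to: item1 dword 0
            declItem item, 2    ; Expands to: item2 dword 0
```

Without the &, MASM would see the single identifier baseNamesuffix, which matches neither parameter name, so no substitution would take place.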
Consider the following macro declaration:
jzc macro target
jnz NotTarget
jc target
NotTarget:
endm
This macro simulates an instruction that jumps to the specified target location if the zero flag is set and the carry flag is set. Conversely, if either the zero flag or the carry flag is clear, this macro transfers control to the instruction immediately following the macro invocation.
There is a serious problem with this macro. Consider what happens if you use this macro more than once in your program:
jzc Dest1
.
.
.
jzc Dest2
.
.
.
The preceding macro invocations expand to the following code:
jnz NotTarget
jc Dest1
NotTarget:
.
.
.
jnz NotTarget
jc Dest2
NotTarget:
.
.
.
These two macro invocations both emit the same label, NotTarget
, during macro expansion. When MASM processes this code, it will complain about a duplicate symbol definition.
MASM’s solution to this problem is to allow the use of local symbols within a macro. Local macro symbols are unique to a specific invocation of a macro. You must explicitly tell MASM which symbols must be local by using the local
directive:
macro_name macro optional_parameters
local list_of_local_names
Macro body
endm
The list_of_local_names is a sequence of one or more MASM identifiers separated by commas. Whenever MASM encounters one of these names in a particular macro invocation, it automatically substitutes a unique name for that identifier. For each macro invocation, MASM substitutes a different name for the local symbol.
You can correct the problem with the jzc
macro by using the following macro code:
jzc macro target
local NotTarget
jnz NotTarget
jc target
NotTarget:
endm
Now whenever MASM processes this macro, it will automatically associate a unique symbol with each occurrence of NotTarget
. This will prevent the duplicate symbol error that occurs if you do not declare NotTarget
as a local symbol.
MASM generates symbols of the form ??nnnn, where nnnn is a (unique) four-digit hexadecimal number, for each local symbol. So, if you see symbols such as ??0000 in your assembly listings, you know where they came from.
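For example, with NotTarget declared local, two invocations of jzc might expand along the following lines. The specific numbers are an assumption; they depend on how many local symbols MASM has already generated earlier in the assembly:

```masm
; Possible expansion of "jzc Dest1":

            jnz     ??0000
            jc      Dest1
??0000:

; Possible expansion of "jzc Dest2":

            jnz     ??0001
            jc      Dest2
??0001:
```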
A macro definition can have multiple local
directives, each with its own list of local names. However, if you have multiple local
statements in a macro, they should all immediately follow the macro
directive.
The MASM exitm
directive (which may appear only within a macro) tells MASM to immediately terminate the processing of the macro. MASM will ignore any additional lines of text within the macro. If you think of a macro as a procedure, exitm
is the return statement.
The exitm
directive is useful in a conditional assembly sequence. Perhaps after checking for the presence (or absence) of certain macro arguments, you might want to stop processing the macro to avoid additional errors from MASM. For example, consider the earlier neg128
macro:
neg128 macro reg64HO, reg64LO
ifb <reg64LO>
.err <neg128 requires 2 operands>
exitm
endif
neg reg64HO
neg reg64LO
sbb reg64HO, 0
endm
Without the exitm
directive inside the conditional assembly, this macro would attempt to assemble the neg reg64LO
instruction, generating another error because reg64LO
expands to the empty string.
Originally, MASM’s macro design allowed programmers to create substitute mnemonics. A programmer could use a macro to replace a machine instruction or other statement (or sequence of statements) in an assembly language source file. Macros could create only whole lines of output text in the source file. This prevented programmers from using macro invocations such as the following:
mov rax, some_macro_invocation(arguments)
Today, MASM supports additional syntax that allows you to create macro functions. A MASM macro function definition looks exactly like a normal macro definition with one addition: you use an exitm
directive with a textual argument to return a function result from the macro. Consider the upperCase
macro function in Listing 13-4.
; Listing 13-4
; Macro function demonstration program.
option casemap:none
nl = 10
.const
ttlStr byte "Listing 13-4", 0
; upperCase macro function.
; Converts text argument to a string, converting
; all lowercase characters to uppercase.
upperCase macro theString
local resultString, thisChar, sep
resultStr equ <> ; Initialize function result with ""
sep textequ <> ; Initialize separator char with ""
forc curChar, theString
; Check to see if the character is lowercase.
; Convert it to uppercase if it is, otherwise
; output it to resultStr as is. Concatenate the
; current character to the end of the result string
; (with a ", " separator, if this isn't the first
; character appended to resultStr).
if ('&curChar' GE 'a') and ('&curChar' LE 'z')
resultStr catstr resultStr, sep, %'&curChar'-32
else
resultStr catstr resultStr, sep, %'&curChar'
endif
; First time through, sep is the empty string. For all
; other iterations, sep is the comma separator between
; values.
sep textequ <, >
endm ; End for
exitm <resultStr>
endm ; End macro
; Demonstration of the upperCase macro function:
.data
chars byte "Demonstration of upperCase"
byte "macro function:"
byte upperCase(<abcdEFG123>), nl, 0
.code
externdef printf:proc
; Return program title to C++ program:
public getTitle
getTitle proc
lea rax, ttlStr
ret
getTitle endp
; Here is the "asmMain" function.
public asmMain
asmMain proc
push rbx
push rbp
mov rbp, rsp
sub rsp, 56 ; Shadow storage
lea rcx, chars ; Prints characters converted to uppercase
call printf
allDone: leave
pop rbx
ret ; Returns to caller
asmMain endp
end
Listing 13-4: Sample macro function
Whenever you invoke a MASM macro function, you must always follow the macro name with a pair of parentheses enclosing the macro’s arguments. Even if the macro has no arguments, an empty pair of parentheses must be present. This is how MASM differentiates standard macros and macro functions.
Earlier versions of MASM included functions for directives such as sizestr
(using the name @sizestr
). Recent versions of MASM have removed these functions. However, you can easily write your own macro functions to replace these missing functions. Here’s a quick replacement for the @sizestr
function:
; @sizestr - Replacement for the MASM @sizestr function
; that Microsoft removed from MASM.
@sizestr macro theStr
local theLen
theLen sizestr <theStr>
exitm <&theLen>
endm
The & operator in the exitm directive forces the @sizestr macro to expand the text associated with the theLen local symbol inside the < and > string delimiters before returning the value to whoever invoked the macro function. Without the & operator, the @sizestr macro will return text of the form ??0002 (the unique symbol MASM creates for the local symbol theLen).
Although programmers typically use macros to expand to a sequence of machine instructions, there is absolutely no requirement that a macro body contain any executable instructions. Indeed, many macros contain only compile-time language statements (for example, if
, while
, for
, =
assignments, and the like). By placing only compile-time language statements in the body of a macro, you can effectively write compile-time procedures and functions using macros.
The following unique
macro is a good example of a compile-time function that returns a string result:
unique macro
local theSym
exitm <theSym>
endm
Whenever your code references this macro, MASM replaces the macro invocation with the text theSym
. MASM generates unique symbols such as ??0000
for local macro symbols. Therefore, each invocation of the unique
macro will generate a sequence of symbols such as ??0000
, ??0001
, ??0002
, and so forth.
The MASM compile-time language allows you to write short programs that write other programs—in particular, to automate the creation of large or complex assembly language sequences. The following subsections provide simple examples of such compile-time programs.
Earlier, this book suggested that you could write programs to generate large, complex lookup tables for your assembly language programs (see the discussion of tables in “Generating Tables” in Chapter 10). Chapter 10 provides C++ programs that generate tables to paste into assembly programs. In this section, we will use the MASM compile-time language to construct data tables during assembly of the program that uses the tables.
One common use for the compile-time language is to build ASCII character lookup tables for alphabetic case manipulation with the xlat instruction at runtime. Listing 13-5 demonstrates how to construct an uppercase conversion table and a lowercase conversion table. Note the use of a macro as a compile-time procedure to reduce the complexity of the table-generating code.
; Listing 13-5
; Creating lookup tables with macros.
option casemap:none
nl = 10
.const
ttlStr byte "Listing 13-5", 0
fmtStr1 byte "testString converted to UC:", nl
byte "%s", nl, 0
fmtStr2 byte "testString converted to LC:", nl
byte "%s", nl, 0
testString byte "This is a test string ", nl
byte "Containing UPPERCASE ", nl
byte "and lowercase chars", nl, 0
emitChRange macro start, last
local index, resultStr
index = start
while index lt last
byte index
index = index + 1
endm
endm
; Lookup table that will convert lowercase
; characters to uppercase. The byte at each
; index contains the value of that index,
; except for the bytes at indexes "a" to "z".
; Those bytes contain the values "A" to "Z".
; Therefore, if a program uses an ASCII
; character's numeric value as an index
; into this table and retrieves that byte,
; it will convert the character to uppercase.
lcToUC equ this byte
emitChRange 0, 'a'
emitChRange 'A', %'Z'+1
emitChRange %'z'+1, 100h
; As above, but this table converts uppercase
; to lowercase characters.
UCTolc equ this byte
emitChRange 0, 'A'
emitChRange 'a', %'z'+1
emitChRange %'Z'+1, 100h
.data
; Store the destination strings here:
toUC byte 256 dup (0)
TOlc byte 256 dup (0)
.code
externdef printf:proc
; Return program title to C++ program:
public getTitle
getTitle proc
lea rax, ttlStr
ret
getTitle endp
; Here is the "asmMain" function.
public asmMain
asmMain proc
push rbx
push rdi
push rsi
push rbp
mov rbp, rsp
sub rsp, 56 ; Shadow storage
; Convert the characters in testString to uppercase:
lea rbx, lcToUC
lea rsi, testString
lea rdi, toUC
jmp getUC
toUCLp: xlat
mov [rdi], al
inc rsi
inc rdi
getUC: mov al, [rsi]
cmp al, 0
jne toUCLp
; Display the converted string:
lea rcx, fmtStr1
lea rdx, toUC
call printf
; Convert the characters in testString to lowercase:
lea rbx, UCTolc
lea rsi, testString
lea rdi, TOlc
jmp getLC
toLCLp: xlat
mov [rdi], al
inc rsi
inc rdi
getLC: mov al, [rsi]
cmp al, 0
jne toLCLp
; Display the converted string:
lea rcx, fmtStr2
lea rdx, TOlc
call printf
allDone: leave
pop rsi
pop rdi
pop rbx
ret ; Returns to caller
asmMain endp
end
Listing 13-5: Generating case-conversion tables with the compile-time language
Here’s the build command and sample output for the program in Listing 13-5:
C:\>build listing13-5
C:\>echo off
Assembling: listing13-5.asm
c.cpp
C:\>listing13-5
Calling Listing 13-5:
testString converted to UC:
THIS IS A TEST STRING
CONTAINING UPPERCASE
AND LOWERCASE CHARS
testString converted to LC:
this is a test string
containing uppercase
and lowercase chars
Listing 13-5 terminated
Chapter 7 points out that you can unroll loops to improve the performance of certain assembly language programs. However, this requires a lot of extra typing, especially if you have many loop iterations. Fortunately, MASM’s compile-time language facilities, especially the while
loop, come to the rescue. With a small amount of extra typing plus one copy of the loop body, you can unroll a loop as many times as you please.
If you simply want to repeat the same code sequence a certain number of times, unrolling the code is especially trivial. All you have to do is wrap a MASM while..endm
loop around the sequence and count off the specified number of iterations. For example, if you wanted to print Hello World
10 times, you could encode this as follows:
count = 0
while count LT 10
call print
byte "Hello World", nl, 0
count = count + 1
endm
Although this code looks similar to a high-level language while
loop, remember the fundamental difference: the preceding code simply consists of 10 straight calls to print
in the program. Were you to encode this using an actual loop, there would be only one call to print
and lots of additional logic to loop back and execute that single call 10 times.
Unrolling loops becomes slightly more complicated if any instructions in that loop refer to the value of a loop control variable or another value, which changes with each iteration of the loop. A typical example is a loop that zeroes the elements of an integer array:
xor eax, eax ; Set EAX and RBX to 0
xor rbx, rbx
lea rcx, array
whlLp: cmp rbx, 20
jae loopDone
mov [rcx][rbx * 4], eax
inc rbx
jmp whlLp
loopDone:
In this code fragment, the loop uses the value of the loop control variable (in RBX) to index into array. Simply copying mov [rcx][rbx * 4], eax 20 times is not the proper way to unroll this loop. You must substitute an appropriate constant index in the range 0 to 76 (the corresponding loop indices, times 4) in place of rbx * 4 in this example. Correctly unrolling this loop should produce the following code sequence:
mov [rcx][0 * 4], eax
mov [rcx][1 * 4], eax
mov [rcx][2 * 4], eax
mov [rcx][3 * 4], eax
mov [rcx][4 * 4], eax
mov [rcx][5 * 4], eax
mov [rcx][6 * 4], eax
mov [rcx][7 * 4], eax
mov [rcx][8 * 4], eax
mov [rcx][9 * 4], eax
mov [rcx][10 * 4], eax
mov [rcx][11 * 4], eax
mov [rcx][12 * 4], eax
mov [rcx][13 * 4], eax
mov [rcx][14 * 4], eax
mov [rcx][15 * 4], eax
mov [rcx][16 * 4], eax
mov [rcx][17 * 4], eax
mov [rcx][18 * 4], eax
mov [rcx][19 * 4], eax
You can easily do this using the following compile-time code sequence:
iteration = 0
while iteration LT 20
mov [rcx][iteration * 4], eax
iteration = iteration + 1
endm
If the statements in a loop use the loop control variable’s value, it is possible to unroll such loops only if those values are known at compile time. You cannot unroll loops when user input (or other runtime information) controls the number of iterations.
Of course, if the code sequence loaded RCX with the address of array
immediately prior to this loop, you could also use the following while
loop to save the use of the RCX register:
iteration = 0
while iteration LT 20
mov array[iteration * 4], eax
iteration = iteration + 1
endm
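If you unroll loops like this in several places, you could wrap the while loop in a macro. The following is a minimal sketch; the zeroDwords macro name and its parameters are hypothetical, and it assumes that count is a constant known at assembly time and that EAX already contains 0:

```masm
; zeroDwords - Hypothetical macro (an illustration only).
; Emits count "mov" instructions that store EAX into
; successive dword elements of arrayName.

zeroDwords  macro   arrayName, count
            local   iteration
iteration   =       0
            while   iteration LT (count)
            mov     arrayName[iteration * 4], eax
iteration   =       iteration + 1
            endm                ; End of WHILE loop
            endm                ; End of macro

            xor     eax, eax
            zeroDwords array, 20
```

The parentheses around count are a defensive habit in case the argument is an expression rather than a simple constant. Declaring iteration as a local symbol keeps the macro from disturbing any iteration symbol defined elsewhere in the source file.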
Calling procedures (functions) in assembly language is a real chore. Loading registers with parameters, pushing values onto the stack, and other activities are a complete distraction. High-level language procedure calls are far more readable and easier to write than the same calls to an assembly language function. Macros provide a good mechanism to call procedures and functions in a high-level-like manner.
Of course, the most trivial example is a call to an assembly language procedure that has no arguments at all:
someProc macro
call _someProc
endm
_someProc proc
.
.
.
_someProc endp
.
.
.
someProc ; Call the procedure
This simple example demonstrates a naming convention this book will use for calling procedures via macro invocation: if you would normally name the procedure someProc, change the procedure’s name to _someProc and then use someProc as the macro name.
While the advantage to using a macro invocation of the form someProc versus a call to the procedure using call someProc might seem somewhat dubious, keeping all procedure calls consistent (by using macro invocations for all of them) helps make your programs more readable.
The next step up in complexity is to call a procedure with a single parameter. Assuming you’re using the Microsoft ABI and passing the parameter in RCX, the simplest solution is something like the following:
someProc macro parm1
mov rcx, parm1
call _someProc
endm
.
.
.
someProc Parm1Value
This macro works well if you’re passing a 64-bit integer by value. If the parameter is an 8-, 16-, or 32-bit value, you would swap CL, CX, or ECX for RCX in the mov instruction.
If you’re passing the first argument by reference, you would swap an lea
instruction for the mov
instruction in this example. As reference parameters are always 64-bit values, the lea
instruction would usually take this form:
lea rcx, parm1
Finally, if you’re passing a real4
or real8
value as the parameter, you’d swap one of the following instructions for the mov
instruction in the previous macro:
movss xmm0, parm1 ; Use this for real4 parameters
movsd xmm0, parm1 ; Use this for real8 parameters
As long as the actual parameter is a memory variable or an appropriate integer constant, this simple macro definition works quite well, covering a very large percentage of the real-world cases.
For example, to call the C Standard Library printf() function with a single argument (the format string) using the current macro scheme, you’d write the macro as follows:
cprintf macro parm1
lea rcx, parm1
call printf
endm
So you can invoke this macro as
cprintf fmtStr
where fmtStr
is (presumably) the name of a byte
object in your .data
section containing the printf
format string.
For a more high-level-like syntax for our procedure calls, we should allow something like the following:
cprintf "This is a printf format string"
Unfortunately, the way the macro is currently written, this will generate the following (syntactically incorrect) statement:
lea rcx, "This is a printf format string"
We could modify this macro to allow this invocation by rewriting it as follows:
cprintf macro parm1
local fmtStr
.data
fmtStr byte parm1, nl, 0
.code
lea rcx, fmtStr
call printf
endm
Invoking this macro by using a string constant as the argument expands to the following code:
.data
fmtStr byte "This is a printf format string", nl, 0
.code
lea rcx, fmtStr ; Technically, fmtStr will really be something
call printf ; like ??0001
The only problem with this new form of the macro is that it no longer accepts invocations such as
cprintf fmtStr
where fmtStr
is a byte object in the .data
section. We’d really like to have a macro that can accept both forms.
The trick to this is the opattr operator (see Table 4-1 in Chapter 4). This operator returns an integer value with certain bits set based on the type of expression that follows. In particular, bit 2 will be set if the expression following is a constant (immediate) value, and bit 1 will be set if the expression is a memory variable or other relocatable data label. Therefore, bit 2 will be clear if a variable such as fmtStr appears as the argument, and it will be set if you pass a short string literal as the argument (opattr actually returns the value 0 for string literals that are longer than 8 characters, just so you know). Now consider the code in Listing 13-6.
; Listing 13-6
; opattr demonstration.
option casemap:none
nl = 10
.const
ttlStr byte "Listing 13-6", 0
fmtStr byte nl, "Hello, World! #2", nl, 0
.code
externdef printf:proc
; Return program title to C++ program:
public getTitle
getTitle proc
lea rax, ttlStr
ret
getTitle endp
; cprintf macro:
; cprintf fmtStr
; cprintf "Format String"
cprintf macro fmtStrArg
local fmtStr, attr, isConst
attr = opattr fmtStrArg
isConst = (attr and 4) eq 4
if (attr eq 0) or isConst
.data
fmtStr byte fmtStrArg, nl, 0
.code
lea rcx, fmtStr
else
lea rcx, fmtStrArg
endif
call printf
endm
atw = opattr "Hello World"
bin = opattr "abcdefghijklmnopqrstuvwxyz"
; Here is the "asmMain" function.
public asmMain
asmMain proc
push rbx
push rdi
push rsi
push rbp
mov rbp, rsp
sub rsp, 56 ; Shadow storage
cprintf "Hello World!"
cprintf fmtStr
allDone: leave
pop rsi
pop rdi
pop rbx
ret ; Returns to caller
asmMain endp
end
Listing 13-6: opattr
operator in a macro
Here’s the build command and sample output for Listing 13-6:
C:\>build listing13-6
C:\>echo off
Assembling: listing13-6.asm
c.cpp
C:\>listing13-6
Calling Listing 13-6:
Hello World!
Hello, World! #2
Listing 13-6 terminated
This cprintf
macro is far from perfect. For example, the C/C++ printf()
function allows multiple arguments that this macro does not handle. But this macro does demonstrate how to handle two different calls to printf
based on the type of the argument you pass cprintf
.
Expanding the macro-calling mechanism from one parameter to two or more (assuming a fixed number of parameters) is fairly easy. All you need to do is add more formal parameters and handle those arguments in your macro definition. Listing 13-7 is a modification of Listing 9-11 in Chapter 9 that uses macro invocations for calls to r10ToStr
, e10ToStr
, and some fixed calls to printf
(for brevity, as this is a very long program, only the macros and a few invocations are included).
.
. ; About 1200 lines from Listing 9-10.
.
; r10ToStr - Macro to create an HLL-like call for the
; _r10ToStr procedure.
; Parameters:
; r10 - Must be the name of a real4, real8, or
; real10 variable.
; dest - Must be the name of a byte buffer to hold
; string result.
; wdth - Output width for the string. Either an
; integer constant or a dword variable.
; dPts - Number of positions after the decimal
; point. Either an integer constant or
; a dword variable.
; fill - Fill char. Either a character constant
; or a byte variable.
; mxLen - Maximum length of output string. Either
; an integer constant or a dword variable.
r10ToStr macro r10, dest, wdth, dPts, fill, mxLen
fld r10
; dest is a label associated with a string variable:
lea rdi, dest
; wdth is either a constant or a dword var:
mov eax, wdth
; dPts is either a constant or a dword var
; holding the number of decimal point positions:
mov edx, dPts
; Process fill character. If it's a constant,
; directly load it into ECX (which zero-extends
; into RCX). If it's a variable, then move with
; zero extension into ECX (which also zero-
; extends into RCX).
; Note: bit 2 from opattr is 1 if fill is
; a constant.
if ((opattr fill) and 4) eq 4
mov ecx, fill
else
movzx ecx, fill
endif
; mxLen is either a constant or a dword var.
mov r8d, mxLen
call _r10ToStr
endm
; e10ToStr - Macro to create an HLL-like call for the
; _e10ToStr procedure.
; Parameters:
; e10 - Must be the name of a real4, real8, or
; real10 variable.
; dest - Must be the name of a byte buffer to hold
; string result.
; wdth - Output width for the string. Either an
; integer constant or a dword variable.
; xDigs - Number of exponent digits.
; fill - Fill char. Either a character constant
; or a byte variable.
; mxLen - Maximum length of output string. Either
; an integer constant or a dword variable.
e10ToStr macro e10, dest, wdth, xDigs, fill, mxLen
fld e10
; dest is a label associated with a string variable:
lea rdi, dest
; wdth is either a constant or a dword var:
mov eax, wdth
; xDigs is either a constant or a dword var
; holding the number of exponent digits:
mov edx, xDigs
; Process fill character. If it's a constant,
; directly load it into ECX (which zero-extends
; into RCX). If it's a variable, then move with
; zero extension into ECX (which also zero-
; extends into RCX).
; Note: bit 2 from opattr is 1 if fill is
; a constant.
if ((opattr fill) and 4) eq 4
mov ecx, fill
else
movzx ecx, fill
endif
; mxLen is either a constant or a dword var.
mov r8d, mxLen
call _e10ToStr
endm
; puts - A macro to print a string using printf.
; Parameters:
; fmt - Format string (must be a byte
; variable or string constant).
; theStr - String to print (must be a
; byte variable, a register,
; or a string constant).
puts macro fmt, theStr
local strConst, bool
lea rcx, fmt
if ((opattr theStr) and 2)
; If memory operand:
lea rdx, theStr
elseif ((opattr theStr) and 10h)
; If register operand:
mov rdx, theStr
else
; Assume it must be a string constant.
.data
strConst byte theStr, 0
.code
lea rdx, strConst
endif
call printf
endm
public asmMain
asmMain proc
push rbx
push rsi
push rdi
push rbp
mov rbp, rsp
sub rsp, 64 ; Shadow storage
; F output:
r10ToStr r10_1, r10str_1, 30, 16, '*', 32
jc fpError
puts fmtStr1, r10str_1
r10ToStr r10_1, r10str_1, 30, 15, '*', 32
jc fpError
puts fmtStr1, r10str_1
.
. ; Similar code to Listing 9-10 with macro
. ; invocations rather than procedure calls.
; E output:
e10ToStr e10_1, r10str_1, 26, 3, '*', 32
jc fpError
puts fmtStr3, r10str_1
e10ToStr e10_2, r10str_1, 26, 3, '*', 32
jc fpError
puts fmtStr3, r10str_1
.
. ; Similar code to Listing 9-10 with macro
. ; invocations rather than procedure calls.
Listing 13-7: Macro call implementation for converting floating-point values to strings
Compare the HLL-like calls to these three functions against the original procedure calls in Listing 9-11:
; F output:
fld r10_1
lea rdi, r10str_1
mov eax, 30 ; fWidth
mov edx, 16 ; decimalPts
mov ecx, '*' ; Fill
mov r8d, 32 ; maxLength
call r10ToStr
jc fpError
lea rcx, fmtStr1
lea rdx, r10str_1
call printf
fld r10_1
lea rdi, r10str_1
mov eax, 30 ; fWidth
mov edx, 15 ; decimalPts
mov ecx, '*' ; Fill
mov r8d, 32 ; maxLength
call r10ToStr
jc fpError
lea rcx, fmtStr1
lea rdx, r10str_1
call printf
.
. ; Additional code from Listing 9-10.
.
; E output:
fld e10_1
lea rdi, r10str_1
mov eax, 26 ; fWidth
mov edx, 3 ; expDigits
mov ecx, '*' ; Fill
mov r8d, 32 ; maxLength
call e10ToStr
jc fpError
lea rcx, fmtStr3
lea rdx, r10str_1
call printf
fld e10_2
lea rdi, r10str_1
mov eax, 26 ; fWidth
mov edx, 3 ; expDigits
mov ecx, '*' ; Fill
mov r8d, 32 ; maxLength
call e10ToStr
jc fpError
lea rcx, fmtStr3
lea rdx, r10str_1
call printf
.
. ; Additional code from Listing 9-10.
.
Clearly, the macro version is easier to read (and, as it turns out, easier to debug and maintain too).
Some procedures expect a varying number of parameters; the C/C++ printf()
function is a good example. Some procedures, though they might support only a fixed number of arguments, could be better written using a varying argument list. For example, consider the print
procedure that has appeared throughout the examples in this book; its string parameter (which follows the call to print
in the code stream) is, technically, a single-string argument. Consider the following macro implementation for a call to print
:
print macro arg
call _print
byte arg, 0
endm
You could invoke this macro as follows:
print "Hello, World!"
The only problem with this macro is that you will often want to supply multiple arguments in its invocation, such as this:
print "Hello, World!", nl, "It's a great day!", nl
Unfortunately, this macro will not accept this list of parameters. However, this seems like a natural use of the print macro, so it makes a lot of sense to modify the print macro to handle multiple arguments and combine them as a single string after the call to the _print function. Listing 13-8 provides such an implementation.
; Listing 13-8
; HLL-like procedure calls with
; a varying parameter list.
option casemap:none
nl = 10
.const
ttlStr byte "Listing 13-8", 0
.code
externdef printf:proc
include getTitle.inc
; Note: don't include print.inc here
; because this code uses a macro for
; print.
; print macro - HLL-like calling sequence for the _print
; function (which is, itself, a shell for
; the printf function).
; If print appears on a line by itself (no arguments),
; then emit a string consisting of a single newline
; character (and zero-terminating byte). If there are
; one or more arguments, emit each argument and append
; a single 0 byte after all the arguments.
; Examples:
; print
; print "Hello, World!"
; print "Hello, World!", nl
print macro arg1, optArgs:vararg
call _print
ifb <arg1>
; If print is used by itself, print a
; newline character:
byte nl, 0
else
; If we have one or more arguments, then
; emit each of them:
byte arg1
for oa, <optArgs>
byte oa
endm
; Zero-terminate the string.
byte 0
endif
endm
_print proc
push rax
push rbx
push rcx
push rdx
push r8
push r9
push r10
push r11
push rbp
mov rbp, rsp
sub rsp, 40
and rsp, -16
mov rcx, [rbp + 72] ; Return address
call printf
mov rcx, [rbp + 72]
dec rcx
skipTo0: inc rcx
cmp byte ptr [rcx], 0
jne skipTo0
inc rcx
mov [rbp + 72], rcx
leave
pop r11
pop r10
pop r9
pop r8
pop rdx
pop rcx
pop rbx
pop rax
ret
_print endp
p macro arg
call _print
byte arg, 0
endm
; Here is the "asmMain" function.
public asmMain
asmMain proc
push rbx
push rdi
push rsi
push rbp
mov rbp, rsp
sub rsp, 56 ; Shadow storage
print "Hello world"
print
print "Hello, World!", nl
allDone: leave
pop rsi
pop rdi
pop rbx
ret ; Returns to caller
asmMain endp
end
Listing 13-8: Varying arguments’ implementation of print macro
Here’s the build command and output for the program in Listing 13-8:
C:\>build listing13-8
C:\>echo off
Assembling: listing13-8.asm
c.cpp
C:\>listing13-8
Calling Listing 13-8:
Hello world
Hello, World!
Listing 13-8 terminated
With this new print macro, you can now call the _print procedure in an HLL-like fashion by simply listing the arguments in the print invocation:
print "Hello World", nl, "How are you today?", nl
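This will generate a sequence of byte directives, one per argument, that places all the individual string components in consecutive memory locations, effectively concatenating them into a single zero-terminated string.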
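As a rough sketch of what the assembler ends up seeing, here is a hand expansion of that invocation, worked out from the macro definition in Listing 13-8 (it is not assembler output, and it assumes nl is defined as in that listing). Each argument gets its own byte directive; because the directives are adjacent, _print sees one contiguous string:

```masm
; Hand expansion of:
;
;   print "Hello World", nl, "How are you today?", nl
;
; using the print macro from Listing 13-8.

        call    _print
        byte    "Hello World"           ; arg1
        byte    nl                      ; First optArgs entry
        byte    "How are you today?"    ; Second optArgs entry
        byte    nl                      ; Third optArgs entry
        byte    0                       ; Zero-terminating byte
```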
Note, by the way, that it is possible to pass a string containing multiple arguments to the original (single-argument) version of print. By rewriting the macro invocation
print "Hello World", nl
as
print <"Hello World", nl>
you get the desired output. MASM treats everything between the < and > brackets as a single argument. However, it’s a bit of a pain to have to constantly put these brackets around multiple arguments (and your code is inconsistent, as single arguments don’t require them). The print macro implementation with varying arguments is a much better solution.
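To see why the brackets work with the single-argument macro, consider this hand expansion (again a sketch derived from the macro definition, not assembler output). The bracketed text replaces arg as a unit, so the commas end up inside a single byte directive:

```masm
; Hand expansion of:
;
;   print <"Hello World", nl>
;
; using the original single-argument print macro
; (byte arg, 0).

        call    _print
        byte    "Hello World", nl, 0
```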
At one time, MASM provided a special directive, invoke, that you could use to call a procedure and pass it parameters (it worked with the proc directive to determine the number and type of parameters a procedure expected). When Microsoft modified MASM to support 64-bit code, it removed the invoke statement from the MASM language.
However, some enterprising programmers have written MASM macros to simulate the invoke directive in 64-bit versions of MASM. The invoke macro not only is useful in its own right but also provides a great example of how to write advanced macros to call procedures. For more information on the invoke macro, visit https://www.masm32.com/ and download the MASM32 SDK. This includes a set of macros (and other utilities) for 64-bit programs, including the invoke macro.
The previous sections provided examples of macro parameter processing used to determine the type of a macro argument in order to determine the type of code to emit. By carefully examining the attributes of an argument, a macro can make various choices concerning how to deal with that argument. This section presents some more advanced techniques you can use when processing macro arguments.
Clearly, the opattr compile-time operator is one of the most important tools you can use when looking at macro arguments. This operator uses the following syntax:
opattr expression
Note that a generic address expression follows opattr; you are not limited to a single symbol.
The opattr operator returns an integer value that is a bit mask specifying the opattr attributes of the associated expression. If the expression following opattr contains forward-referenced symbols or is an illegal expression, opattr returns 0. Microsoft’s documentation indicates that opattr returns the values shown in Table 13-2.
Table 13-2: opattr Return Values
Bit | Meaning
0 | There is a code label in the expression.
1 | The expression is relocatable.
2 | The expression is a constant expression.
3 | The expression uses direct (PC-relative) addressing.
4 | The expression is a register.
5 | The expression contains no undefined symbols (obsolete).
6 | The expression is a stack-segment memory expression.
7 | The expression references an external symbol.
8–11 | Language type*: 0 = no language type, 1 = C, 2 = SYSCALL, 3 = STDCALL, 4 = Pascal, 5 = FORTRAN, 6 = BASIC
* 64-bit code generally doesn’t support a language type, so these bits are usually 0.
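If you test these bits often, it may help to give them names. The following is a minimal sketch; the opaXXX equate names are hypothetical (they are not defined by MASM or by this book's listings), and the values simply restate the bit positions in Table 13-2:

```masm
; Hypothetical names for the opattr bits in Table 13-2:

opaCodeLabel    =       001h    ; Bit 0
opaRelocatable  =       002h    ; Bit 1
opaConstant     =       004h    ; Bit 2
opaDirect       =       008h    ; Bit 3
opaRegister     =       010h    ; Bit 4
opaDefined      =       020h    ; Bit 5
opaStackRel     =       040h    ; Bit 6
opaExternal     =       080h    ; Bit 7
opaLangMask     =       0F00h   ; Bits 8 to 11

; Example: test bit 4 of the value opattr returns for RAX.

raxAttr         =       opattr rax
                if      (raxAttr and opaRegister) ne 0
                echo    RAX has the register bit set
                endif
```

Testing a single bit this way is fine for the register bit; other bits, as the discussion of the constant bit shows, need more care.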
Quite honestly, Microsoft’s documentation does not do the best job explaining how MASM sets the bits. For example, consider the following MASM statements:
codeLabel:
opcl = opattr codeLabel ; Sets opcl to 25h or 0010_0101b
opconst = opattr 0 ; Sets opconst to 24h or 0010_0100b
The opconst symbol has bits 2 and 5 set, just as you would expect from Table 13-2. However, opcl has bits 0, 2, and 5 set; 0 and 5 make sense, but bit 2 (the expression is a constant expression) does not make sense. If, in a macro, you were to test only bit 2 to determine if the operand is a constant (as, I must admit, I have done in earlier examples in this chapter), you could get into trouble when bit 2 is set and you assume that it is a constant.
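The following fragment sketches the pitfall. It assumes it appears in a .code section and that opattr reports 25h for a code label, as described above; someLabel and lblAttr are illustrative names only:

```masm
; A naive "is it a constant?" test that checks only bit 2.

someLabel:
lblAttr     =       opattr someLabel    ; 25h, per the text above

            if      (lblAttr and 04h) ne 0
            echo    Bit 2 is set even though someLabel is a code label
            endif
```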
Probably the wisest thing to do is to mask off bits 0 to 7 (or maybe just bits 0 to 6) and compare the result against an 8-bit value rather than a simple mask. Table 13-3 lists some common values you can test against.
Table 13-3: 8-Bit Values for opattr Results
Value | Meaning
0 | Undefined (forward-referenced) symbol or illegal expression
34 / 22h | Memory access of the form [reg + const]
36 / 24h | Constant
37 / 25h | Code label (proc name or symbol with a : suffix) or offset code_label form
38 / 26h | Expression of the form offset label, where label is a variable in the .data section
42 / 2Ah | Global symbol (for example, symbol in .data section)
43 / 2Bh | Memory access of the form [reg + code_label], where code_label is a proc name or symbol with a : suffix
48 / 30h | Register (general-purpose, MM, XMM, YMM, ZMM, floating-point/ST, or other special-purpose register)
98 / 62h | Stack-relative memory access (memory addresses of the form [rsp + xxx] and [rbp + xxx])
165 / 0A5h | External code symbol (37 / 25h with bit 7 set)
171 / 0ABh | External data symbol (43 / 2Bh with bit 7 set)
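One way to apply the mask-and-compare approach is a small macro function. The sketch below is modeled on the isReg macro function that appears later in this chapter; the name isConst is hypothetical, and the 24h comparison value comes straight from Table 13-3:

```masm
; isConst - Hypothetical macro function. Masks the opattr
;           result to bits 0 to 7 and compares the result
;           against 24h (the "Constant" entry in Table 13-3).
;           Returns a nonzero value for an integer constant
;           and 0 otherwise.

isConst     macro   parm
            local   result
result      textequ %(((opattr &parm) and 0FFh) eq 24h)
            exitm   <&result>
            endm

; Possible usage:

            if      isConst(1234)
            echo    1234 is an integer constant
            endif
```

Because the comparison uses the whole 8-bit value, a code label (25h) no longer passes the test. String literals and floating-point constants still fail it, for the reasons discussed next.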
Perhaps the biggest issue with opattr, as has already been pointed out, is that it believes that constant expressions are integers that can fit into 64 bits. This creates a problem for two important constant types: string literals (longer than 8 characters) and floating-point constants. opattr returns 0 for both.
Although opattr won’t help us determine whether an operand is a string, we can use MASM’s string-processing operations to test the first character of an operand to see if it is a quote. The following code does just that:
; testStr is a macro function that tests its
; operand to see if it is a string literal.
testStr macro strParm
local firstChar
ifnb <strParm>
firstChar substr <strParm>, 1, 1
ifidn firstChar,<!">
; First character was ", so assume it's
; a string.
exitm <1>
endif ; ifidn
endif ; ifnb
; If we get to this point in the macro,
; we definitely do not have a string.
exitm <0>
endm
Consider the following two invocations of the testStr macro:
isAStr = testStr("String Literal")
notAStr = testStr(someLabel)
MASM will set the symbol isAStr to the value 1, and notAStr to the value 0.
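A macro function like testStr is most useful as a guard inside another macro. The following sketch shows one possible use; the putStr macro is hypothetical, it relies on the _print procedure from Listing 13-8, and it assumes a simple double-quoted literal (no embedded commas or angle brackets):

```masm
; putStr - Hypothetical macro that insists on a string
;          literal argument before emitting it after a
;          call to _print (see Listing 13-8).

putStr      macro   strArg
            if      testStr(strArg) eq 0
            .err    <Expected a string literal>
            exitm
            endif

            call    _print
            byte    strArg, 0
            endm

; Possible usage:
;
;           putStr  "Hello"         ; Accepted
;           putStr  someLabel       ; Reports an error
```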
Real constants are another literal type that MASM’s opattr operator doesn’t support. Again, writing a macro to test for a real constant can resolve that issue. Sadly, parsing real numbers isn’t as easy as checking for a string constant: there is no single leading character that we can use to say, “Hey, we’ve got a floating-point constant here.” The macro will have to explicitly parse the operand character by character and validate it.
To begin with, here is a grammar that defines a MASM floating-point constant:
Sign ::= (+|-)
Digit ::= [0-9]
Mantissa ::= (Digit)+ | '.' (Digit)+ | (Digit)+ '.' Digit*
Exp ::= (e|E) Sign? Digit Digit? Digit? Digit?
Real ::= Sign? Mantissa Exp?
A real number consists of an optional sign followed by a mantissa and an optional exponent. A mantissa contains at least one digit; it can also contain a decimal point with a digit to its left or right (or both). However, a mantissa cannot consist of a decimal point by itself.
The macro function to test for a real constant should be callable as follows:
isReal = getReal(some_text)
where some_text is the textual data we want to test to see if it’s a real constant. The macro for getReal could be the following:
; getReal - Parses a real constant.
; Returns:
; true - If the parameter contains a syntactically
; correct real number (and no extra characters).
; false - If there are any illegal characters or
; other syntax errors in the numeric string.
getReal macro origParm
local parm, curChar, result
; Make a copy of the parameter so we don't
; delete the characters in the original string.
parm textequ &origParm
; Must have at least one character:
ifb parm
exitm <0>
endif
; Extract the optional sign:
if isSign(parm)
curChar textequ extract1st(parm) ; Skip sign char
endif
; Get the required mantissa:
if getMant(parm) eq 0
exitm <0> ; Bad mantissa
endif
; Extract the optional exponent:
result textequ getExp(parm)
exitm <&result>
endm ; getReal
Testing for real constants is a complex process, so it’s worthwhile to go through this macro (and all subservient macros) step by step:
1. Make a copy of the original parameter string. During processing, getReal will delete characters from the parameter string while parsing the string. This macro makes a copy to prevent disturbing the original text string passed in to it.
2. Check for a blank parameter. If the caller passes in an empty string, the result is not a valid real constant, and getReal must return false. It’s important to check for the empty string right away because the rest of the code makes the assumption that the string is at least one character long.
3. Call the isSign macro function. This function (its definition appears a little later) returns true if the first character of its argument is a + or - symbol; otherwise, it returns false.
4. If the first character is a sign character, invoke the extract1st macro:
curChar textequ extract1st(parm) ; Skip sign char
The extract1st macro returns the first character of its argument as the function result (which this statement assigns to the curChar symbol) and then deletes the first character of its argument. So if the original string passed to getReal was +1, this statement puts + into curChar and deletes the first character in parm (producing the string 1). The definition for extract1st appears a little later in this section.
getReal doesn’t actually use the sign character assigned to curChar. The purpose of this extract1st invocation was strictly for the side effect of deleting the first character in parm.
5. Call getMant. This macro function will return true if the prefix of its string argument is a valid mantissa. It will return false if the mantissa does not contain at least one numeric digit. Note that getMant will stop processing the string on the first non-mantissa character it encounters (including a second decimal point, if there are two or more decimal points in the mantissa). The getMant function doesn’t care about illegal characters; it leaves it up to getReal to look at the remaining characters after the return from getMant to determine if the whole string is valid. As a side effect, getMant deletes all leading characters from the parameter string that it processes.
6. Call the getExp macro function to process any (optional) trailing exponent. The getExp macro is also responsible for ensuring that no garbage characters follow (which results in a parse failure).
The isSign macro is fairly straightforward. Here’s its implementation:
; isSign - Macro function that returns true if the
; first character of its parameter is a
; "+" or "-".
isSign macro parm
local FirstChar
ifb <parm>
exitm <0>
endif
FirstChar substr parm, 1, 1
ifidn FirstChar, <+>
exitm <1>
endif
ifidn FirstChar, <->
exitm <1>
endif
exitm <0>
endm
This macro uses the substr operation to extract the first character from the parameter and then compares this against the sign characters (+ or -). It returns true if it is a sign character, and false otherwise.
The extract1st macro function returns the first character of the argument passed to it as the function result. As a side effect, this macro function also deletes that character from the argument. Here’s the implementation of extract1st:
extract1st macro parm
local FirstChar
ifb <%parm>
exitm <>
endif
FirstChar substr parm, 1, 1
if @sizestr(%parm) GE 2
parm substr parm, 2
else
parm textequ <>
endif
exitm <FirstChar>
endm
The ifb directive checks whether the parameter string is empty. If it is, extract1st immediately returns the empty string without further modification to its parameter.
Note the % operator before the parm argument. The parm argument actually expands to the name of the string variable holding the real constant. This turns out to be something like ??0005 because of the copy made of the original parameter in the getReal function. Were you to simply specify ifb <parm>, the ifb directive would see <??0005>, which is not blank. Placing the % operator before the parm symbol tells MASM to evaluate the expression (which is just the ??0005 symbol) and replace it by the text it evaluates to (which, in this case, is the actual string).
If the string is not blank, extract1st uses the substr directive to extract the first character from the string and assigns this character to the FirstChar symbol. The extract1st macro function will return this value as the function result.
Next, the extract1st function has to delete the first character from the parameter string. It uses the @sizestr function (whose definition appears a little earlier in this chapter) to determine whether the character string contains at least two characters. If so, extract1st uses the substr directive to extract all the characters from the parameter, starting at the second character position. It assigns this substring back to the parameter passed in. If extract1st is processing the last character in the string (that is, if @sizestr returns 1), then the code cannot use the substr directive because the index would be out of range. The else section of the if directive sets the parameter to the empty string if @sizestr returns a value less than 2.
The next getReal subservient macro function is getMant. This macro is responsible for parsing the mantissa component of the floating-point constant. The implementation is the following:
getMant macro parm
local curChar, sawDecPt, rpt
sawDecPt = 0
curChar textequ extract1st(parm) ; Get 1st char
ifidn curChar, <.> ; Check for dec pt
sawDecPt = 1
curChar textequ extract1st(parm) ; Get 2nd char
endif
; Must have at least one digit:
if isDigit(curChar) eq 0
exitm <0> ; Bad mantissa
endif
; Process zero or more digits. If we haven't already
; seen a decimal point, allow exactly one of those.
; Do loop at least once if there is at least one
; character left in parm:
rpt = @sizestr(%parm)
while rpt
; Get the 1st char from parm and see if
; it is a decimal point or a digit:
curChar substr parm, 1, 1
ifidn curChar, <.>
rpt = sawDecPt eq 0
sawDecPt = 1
else
rpt = isDigit(curChar)
endif
; If char was legal, then extract it from parm:
if rpt
curChar textequ extract1st(parm) ; Get next char
endif
; Repeat as long as we have more chars and the
; current character is legal:
rpt = rpt and (@sizestr(%parm) gt 0)
endm ; while
; If we've seen at least one digit, we've got a valid
; mantissa. We've stopped processing on the first
; character that is not a digit or the 2nd "." char.
exitm <1>
endm ; getMant
A mantissa must have at least one decimal digit. It can have zero or one occurrence of a decimal point (which may appear before the first digit, at the end of the mantissa, or in the middle of a string of digits). The getMant macro function uses the local symbol sawDecPt to keep track of whether it has seen a decimal point already. The function begins by initializing sawDecPt to false (0).
A valid mantissa must have at least one character (because it must have at least one decimal digit). So the next thing getMant does is extract the first character from the parameter string (and place this character in curChar). If the first character is a period (decimal point), the macro sets sawDecPt to true and extracts the next character. In either case, curChar must now hold a decimal digit; if it does not, getMant returns false. The isDigit macro function performs this test: it returns true if the first character of its argument is one of the characters 0 to 9. The definition for isDigit will appear shortly.
The getMant function uses a while directive to process all the remaining characters in the mantissa. A local variable, rpt, controls the execution of the while loop. The code just before the loop sets rpt to the number of characters remaining in the parameter string, so the body of the while loop executes for the first time only if the string length is still greater than zero.
The while loop grabs the first character from the current parameter string (without deleting it just yet) and tests it against a decimal digit or a . character. If it’s a decimal digit, the loop will remove that character from the parameter string and repeat. If the current character is a period, the code first checks whether it has already seen a decimal point (using sawDecPt). If this is a second decimal point, the loop terminates and the function returns true (later code will deal with the second . character). If the code has not already seen a decimal point, the loop sets sawDecPt to true, removes the period from the parameter string, and continues with the loop execution.
The while loop repeats as long as it sees decimal digits or a single decimal point and the string length remains greater than zero. Once the loop completes, the getMant function returns true. The only way getMant returns false is if it does not see at least one decimal digit (either at the beginning of the string or immediately after the decimal point at the beginning of the string).
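To make the second-decimal-point case concrete, here is a hand trace of getMant for the text 1.1.0 (the badr3 test string in Listing 13-9). It was worked out by reading the macro source, not produced by MASM:

```masm
; Hand trace of getMant for the text 1.1.0
;
; Action                        curChar  sawDecPt  parm afterward
; ----------------------------  -------  --------  --------------
; extract1st (1st character)    1        0         .1.0
; while: first ".", accepted    .        1         1.0
; while: digit, accepted        1        1         .0
; while: second ".", loop ends  .        1         .0
;
; getMant returns 1 and leaves .0 in parm. getExp then
; extracts the "." character, isE rejects it, and getReal
; returns 0.
```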
The isDigit macro function is a brute-force function that tests the first character of its argument against the 10 decimal digits. This function does not remove any characters from its argument. The source code for isDigit is the following:
isDigit macro parm
local FirstChar
if @sizestr(%parm) eq 0
exitm <0>
endif
FirstChar substr parm, 1, 1
ifidn FirstChar, <0>
exitm <1>
endif
ifidn FirstChar, <1>
exitm <1>
endif
ifidn FirstChar, <2>
exitm <1>
endif
ifidn FirstChar, <3>
exitm <1>
endif
ifidn FirstChar, <4>
exitm <1>
endif
ifidn FirstChar, <5>
exitm <1>
endif
ifidn FirstChar, <6>
exitm <1>
endif
ifidn FirstChar, <7>
exitm <1>
endif
ifidn FirstChar, <8>
exitm <1>
endif
ifidn FirstChar, <9>
exitm <1>
endif
exitm <0>
endm
The only thing worth commenting on is the % operator in @sizestr (for reasons explained earlier).
Now we arrive at the last helper function appearing in getReal: the getExp (get exponent) macro function. Here’s its implementation:
getExp macro parm
local curChar
; Return success if no exponent present.
if @sizestr(%parm) eq 0
exitm <1>
endif
; Extract the next character, return failure
; if it is not an "e" or "E" character:
curChar textequ extract1st(parm)
if isE(curChar) eq 0
exitm <0>
endif
; Extract the next character:
curChar textequ extract1st(parm)
; If an optional sign character appears,
; remove it from the string:
if isSign(curChar)
curChar textequ extract1st(parm) ; Skip sign char
endif ; isSign
; Must have at least one digit:
if isDigit(curChar) eq 0
exitm <0>
endif
; Optionally, we can have up to three additional digits:
if @sizestr(%parm) gt 0
curChar textequ extract1st(parm) ; Skip 1st digit
if isDigit(curChar) eq 0
exitm <0>
endif
endif
if @sizestr(%parm) gt 0
curChar textequ extract1st(parm) ; Skip 2nd digit
if isDigit(curChar) eq 0
exitm <0>
endif
endif
if @sizestr(%parm) gt 0
curChar textequ extract1st(parm) ; Skip 3rd digit
if isDigit(curChar) eq 0
exitm <0>
endif
endif
; If we get to this point, we have a valid exponent.
exitm <1>
endm ; getExp
Exponents are optional in a real constant. Therefore, the first thing this macro function does is check whether it has been passed an empty string. If so, it immediately returns success. Once again, the % operator must appear before the parm argument, this time in the @sizestr invocation.
If the parameter string is not empty, the first character in the string must be an E or e character. This function returns false if this is not the case. Checking for an E or e is done with the isE helper function, whose implementation is the following (note the use of ifidni, which is case-insensitive):
isE macro parm
local FirstChar
if @sizestr(%parm) eq 0
exitm <0>
endif
FirstChar substr parm, 1, 1
ifidni FirstChar, <e>
exitm <1>
endif
exitm <0>
endm
Next, the getExp function looks for an optional sign character. If it encounters one, it deletes the sign character from the beginning of the string.
At least one decimal digit, and at most four decimal digits, must follow the e or E and sign characters. The remaining code in getExp handles that.
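Putting the pieces together, the following hand trace follows getReal and its helper macro functions through the text -1.5e3. Like any hand trace, it is a sketch based on reading the macro source rather than actual MASM output; parm is getReal's working copy of the text:

```masm
; Hand trace of getReal for the text -1.5e3
;
; Step                           Consumed  parm afterward
; -----------------------------  --------  --------------
; getReal: isSign, extract1st    -         1.5e3
; getMant: extract1st            1         .5e3
; getMant: while, "." accepted   .         5e3
; getMant: while, digit          5         e3
; getMant: while, "e" rejected   (none)    e3
; getExp:  extract1st, isE       e         3
; getExp:  extract1st, isDigit   3         (empty)
;
; parm is empty after the first exponent digit, so getExp
; skips its three optional digit checks and returns 1.
; getReal returns that value as its own result.
```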
Listing 13-9 is a demonstration of the macro snippets appearing throughout this section. Note that this is a pure compile-time program; all its activity takes place while MASM assembles this source code. It does not generate any executable machine code.
; Listing 13-9
; This is a compile-time program.
; It does not generate any executable code.
; Several useful macro functions:
; mout - Like echo, but allows "%" operators.
; testStr - Tests an operand to see if it
; is a string literal constant.
; @sizestr - Handles missing MASM function.
; isDigit - Tests first character of its
; argument to see if it's a decimal
; digit.
; isSign - Tests first character of its
; argument to see if it's a "+"
; or a "-" character.
; extract1st - Removes the first character
; from its argument (side effect)
; and returns that character as
; the function result.
; getReal - Parses the argument and returns
; true if it is a reasonable-
; looking real constant.
; Test strings and invocations for the
; getReal macro:
; Note: actual macro code appears in previous code snippets
; and has been removed from this listing for brevity.
mant1 textequ <1>
mant2 textequ <.2>
mant3 textequ <3.4>
rv4 textequ <1e1>
rv5 textequ <1.e1>
rv6 textequ <1.0e1>
rv7 textequ <1.0e + 1>
rv8 textequ <1.0e - 1>
rv9 textequ <1.0e12>
rva textequ <1.0e1234>
rvb textequ <1.0E123>
rvc textequ <1.0E + 1234>
rvd textequ <1.0E - 1234>
rve textequ <-1.0E - 1234>
rvf textequ <+1.0E - 1234>
badr1 textequ <>
badr2 textequ <a>
badr3 textequ <1.1.0>
badr4 textequ <e1>
badr5 textequ <1ea1>
badr6 textequ <1e1a>
% echo get_Real(mant1) = getReal(mant1)
% echo get_Real(mant2) = getReal(mant2)
% echo get_Real(mant3) = getReal(mant3)
% echo get_Real(rv4) = getReal(rv4)
% echo get_Real(rv5) = getReal(rv5)
% echo get_Real(rv6) = getReal(rv6)
% echo get_Real(rv7) = getReal(rv7)
% echo get_Real(rv8) = getReal(rv8)
% echo get_Real(rv9) = getReal(rv9)
% echo get_Real(rva) = getReal(rva)
% echo get_Real(rvb) = getReal(rvb)
% echo get_Real(rvc) = getReal(rvc)
% echo get_Real(rvd) = getReal(rvd)
% echo get_Real(rve) = getReal(rve)
% echo get_Real(rvf) = getReal(rvf)
% echo get_Real(badr1) = getReal(badr1)
% echo get_Real(badr2) = getReal(badr2)
% echo get_Real(badr3) = getReal(badr3)
% echo get_Real(badr4) = getReal(badr4)
% echo get_Real(badr5) = getReal(badr5)
% echo get_Real(badr5) = getReal(badr5)
end
Listing 13-9: Compile-time program with test code for getReal macro
Here’s the build command and (compile-time) program output:
C:\>ml64 /c listing13-9.asm
Microsoft (R) Macro Assembler (x64) Version 14.15.26730.0
Copyright (C) Microsoft Corporation. All rights reserved.
Assembling: listing13-9.asm
get_Real(1) = 1
get_Real(.2) = 1
get_Real(3.4) = 1
get_Real(1e1) = 1
get_Real(1.e1) = 1
get_Real(1.0e1) = 1
get_Real(1.0e + 1) = 1
get_Real(1.0e - 1) = 1
get_Real(1.0e12) = 1
get_Real(1.0e1234) = 1
get_Real(1.0E123) = 1
get_Real(1.0E + 1234) = 1
get_Real(1.0E - 1234) = 1
get_Real(-1.0E - 1234) = 1
get_Real(+1.0E - 1234) = 1
get_Real() = 0
get_Real(a) = 0
get_Real(1.1.0) = 0
get_Real(e1) = 0
get_Real(1ea1) = 0
get_Real(1ea1) = 0
Although the opattr operator provides a bit to tell you that its operand is an x86-64 register, that’s the only information opattr provides. In particular, opattr’s return value won’t tell you which register it has seen; whether it’s a general-purpose, XMM, YMM, ZMM, MM, ST, or other register; or the size of that register. Fortunately, with a little work on your part, you can determine all this information by using MASM’s conditional assembly statements and other operators.
To begin with, here’s a simple macro function, isReg, that returns true (nonzero) or false (0) depending on whether its operand is a register. This is a simple shell around the opattr operator that returns the setting of bit 4:
isReg macro parm
local result
result textequ %(((opattr &parm) and 10h) eq 10h)
exitm <&result>
endm
While this function provides some convenience, it doesn’t really provide any information that the opattr operator doesn’t already provide. We want to know what register appears in the operand as well as the size of that register.
Listing 13-10 (available online at http://artofasm.randallhyde.com/) provides a wide range of useful macro functions and equates for processing register operands in your own macros. The following paragraphs describe some of the more useful equates and macros.
Listing 13-10 contains a set of equates that map register names to numeric values. These equates use symbols of the form regXXX, where XXX is the register name (all uppercase). Examples include the following: regAL, regSIL, regR8B, regAX, regBP, regR8W, regEAX, regEBP, regR8D, regRAX, regRSI, regR15, regST, regST0, regMM0, regXMM0, and regYMM0.
There is also a special equate for the symbol regNone that represents a non-register entity. These equates give numeric values in the range 1 to 117 to each of these symbols (regNone is given the value 0).
The purpose behind all these equates (and, in general, assigning numeric values to registers) is to make it easier to test for specific registers (or ranges of registers) within your macros by using conditional assembly.
A useful set of macros appearing in Listing 13-10 converts textual forms of the register names (that is, AL, AX, EAX, RAX, and so forth) to their numeric form (regAL, regAX, regEAX, regRAX, and so on). The most generic macro function to do this is whichReg(register). This function accepts a text object and returns the appropriate regXXX value for that text. If the text passed as an argument is not one of the valid general-purpose, FPU, MMX, XMM, or YMM registers, whichReg returns the value regNone. Here are some examples of calls to whichReg:
alVal = whichReg(al)
axTxt textequ <ax>
axVal = whichReg(axTxt)
aMac macro parameter
local regVal
regVal = whichReg(parameter)
if regVal eq regNone
.err <Expected a register argument>
exitm
endif
.
.
.
endm
The whichReg macro function accepts any of the x86-64 general-purpose, FPU, MMX, XMM, or YMM registers. In many situations, you might want to limit the set of registers to a particular subset of these. Therefore, Listing 13-11 (also available online at http://artofasm.randallhyde.com/) provides the following macro functions:
isGPReg(text)
Returns a nonzero register value for any of the general-purpose (8-, 16-, 32-, or 64-bit) registers. Returns regNone (0) if the argument is not one of these registers.
is8BitReg(text)
Returns a nonzero register value for any of the general-purpose 8-bit registers. Otherwise, it returns regNone (0).
is16BitReg(text)
Returns a nonzero register value for any of the general-purpose 16-bit registers. Otherwise, it returns regNone (0).
is32BitReg(text)
Returns a nonzero register value for any of the general-purpose 32-bit registers. Otherwise, it returns regNone (0).
is64BitReg(text)
Returns a nonzero register value for any of the general-purpose 64-bit registers. Otherwise, it returns regNone (0).
isFPReg(text)
Returns a nonzero register value for any of the FPU registers (ST and ST(0) to ST(7)). Otherwise, it returns regNone (0).
isMMReg(text)
Returns a nonzero register value for any of the MMX registers (MM0 to MM7). Otherwise, it returns regNone (0).
isXMMReg(text)
Returns a nonzero register value for any of the XMM registers (XMM0 to XMM15). Otherwise, it returns regNone (0).
isYMMReg(text)
Returns a nonzero register value for any of the YMM registers (YMM0 to YMM15). Otherwise, it returns regNone (0).
If you need other register classifications, it’s easy to write your own macro functions to return an appropriate value. For example, if you want to test whether a particular register is one of the Windows ABI parameter registers (RCX, RDX, R8, or R9), you could create a macro function like the following:
isWinParm macro theReg
local regVal, isParm
regVal = whichReg(theReg)
isParm = (regVal eq regRCX) or (regVal eq regRDX)
isParm = isParm or (regVal eq regR8)
isParm = isParm or (regVal eq regR9)
if isParm
exitm <%regVal>
endif
exitm <%regNone>
endm
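A classification function such as isWinParm would typically guard the body of another macro. The sketch below shows one possible use; the pushParm macro is hypothetical, and it assumes the whichReg macro function and the regXXX equates from Listing 13-10 behave as described above:

```masm
; pushParm - Hypothetical macro that pushes a register
;            onto the stack, but only if that register
;            is one of the Windows ABI parameter
;            registers (RCX, RDX, R8, or R9).

pushParm    macro   theReg
            if      isWinParm(theReg) eq regNone
            .err    <Expected RCX, RDX, R8, or R9>
            exitm
            endif

            push    theReg
            endm

; Possible usage:
;
;           pushParm rcx            ; Emits push rcx
;           pushParm rax            ; Reports an error
```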
If you’ve converted a register in text form to its numeric value, at some point you might need to convert that numeric value back to text so you can use that register as part of an instruction. The toReg(reg_num) macro in Listing 13-10 accomplishes this. If you supply it a value in the range 1 to 117 (the numeric values for the registers), this macro will return the text that corresponds to that register value. For example:
mov toReg(1), 0 ; Equivalent to mov al, 0
(Note that regAL = 1.)
If you pass regNone to the toReg macro, toReg returns an empty string. Any value outside the range 0 to 117 will produce an undefined symbol error message.
When working in macros, where you’ve passed a register as an argument, you may find that you need to convert that register to a larger size (for example, convert AL to AX, EAX, or RAX; convert AX to EAX or RAX; or convert EAX to RAX). Listing 13-11 provides several macros to do the up conversion. These macro functions accept a register number as their parameter input and produce a textual result holding the actual register name:
reg8To16
Converts an 8-bit general-purpose register to its 16-bit equivalent
reg8To32
Converts an 8-bit general-purpose register to its 32-bit equivalent
reg8To64
Converts an 8-bit general-purpose register to its 64-bit equivalent
reg16To32
Converts a 16-bit general-purpose register to its 32-bit equivalent
reg16To64
Converts a 16-bit general-purpose register to its 64-bit equivalent
reg32To64
Converts a 32-bit general-purpose register to its 64-bit equivalent
Another useful macro function in Listing 13-10 is the regSize(reg_value) macro. This function returns the size (in bytes) of the register value passed as an argument. Here are some example calls:
alSize = regSize(regAL) ; Returns 1
axSize = regSize(regAX) ; Returns 2
eaxSize = regSize(regEAX) ; Returns 4
raxSize = regSize(regRAX) ; Returns 8
stSize = regSize(regST0) ; Returns 10
mmSize = regSize(regMM0) ; Returns 8
xmmSize = regSize(regXMM0) ; Returns 16
ymmSize = regSize(regYMM0) ; Returns 32
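As a rough illustration of how the classification and up-conversion macros might be combined, the following sketch defines a hypothetical zext8 macro (it is not part of the book's listings). It assumes that is8BitReg, reg8To32, and the regXX equates from the register listing have already been defined, that is8BitReg returns regNone for anything that is not an 8-bit register, and that reg8To32 accepts the numeric register value as described above.

```masm
; zext8 - Hypothetical helper, shown only as a sketch.
;         Zero-extends an 8-bit register into its 32-bit
;         equivalent. Relies on is8BitReg, reg8To32, and
;         the regXX equates being defined earlier in the
;         source file (assumption).

zext8       macro   theReg
            local   regVal

; Convert the register text to its numeric form:

regVal      =       is8BitReg(theReg)

; Reject anything that is not an 8-bit register and
; skip emitting the instruction:

            if      regVal eq regNone
            .err    <Expected an 8-bit register>
            exitm
            endif

; reg8To32 turns the number back into register text:

            movzx   reg8To32(regVal), theReg
            endm
```

Under those assumptions, an invocation such as zext8 cl would be expected to emit movzx ecx, cl.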
The macros and equates in Listing 13-10 come in handy when you are writing macros to handle generic code. For example, suppose you want to create a putInt macro that accepts an arbitrary 8-, 16-, or 32-bit register operand and that will print that register’s value as an integer. You would like to be able to pass any arbitrary (general-purpose) register and sign-extend it, if necessary, before printing. Listing 13-12 is one possible implementation of this macro.
; Listing 13-12
; Demonstration of putInt macro.
; putInt - This macro expects an 8-, 16-, or 32-bit
; general-purpose register argument. It will
; print the value of that register as an
; integer.
putInt macro theReg
local regVal, sz
regVal = isGPReg(theReg)
; Before we do anything else, make sure
; we were actually passed a register:
if regVal eq regNone
.err <Expected a register>
endif
; Get the size of the register so we can
; determine if we need to sign-extend its
; value:
sz = regSize(regVal)
; If it was a 64-bit register, report an
; error:
if sz gt 4
.err <64-bit register not allowed>
endif
; If it's a 1- or 2-byte register, we will need
; to sign-extend the value into EDX:
if (sz eq 1) or (sz eq 2)
movsx edx, theReg
; If it's a 32-bit register, but is not EDX, we need
; to move it into EDX (don't bother emitting
; the instruction if the register is EDX;
; the data is already where we want it):
elseif regVal ne regEDX
mov edx, theReg
endif
; Print the value in EDX as an integer:
call print
byte "%d", 0
endm
option casemap:none
nl = 10
.const
ttlStr byte "Listing 13-12", 0
Note: several thousand lines of code omitted here
for brevity. This includes most of the text from
Listing 13-11 plus the putInt macro
.code
include getTitle.inc
include print.inc
public asmMain
asmMain proc
push rbx
push rbp
mov rbp, rsp
sub rsp, 56 ; Shadow storage
call print
byte "Value 1:", 0
mov al, 55
putInt al
call print
byte nl, "Value 2:", 0
mov cx, 1234
putInt cx
call print
byte nl, "Value 3:", 0
mov ebx, 12345678
putInt ebx
call print
byte nl, "Value 4:", 0
mov edx, 1
putInt edx
call print
byte nl, 0
allDone: leave
pop rbx
ret ; Returns to caller
asmMain endp
end
Listing 13-12: putInt macro function test program
Here’s the build command and sample output for Listing 13-12:
C:\>build listing13-12
C:\>echo off
Assembling: listing13-12.asm
c.cpp
C:\>listing13-12
Calling Listing 13-12:
Value 1:55
Value 2:1234
Value 3:12345678
Value 4:1
Listing 13-12 terminated
Though Listing 13-12 is a relatively simple example, it should give you a good idea of how you could make use of the macros in Listing 13-10.
A compile-time constant array is an array that exists only at compile time—data for the array does not exist at runtime. Sadly, MASM doesn’t provide direct support for this useful CTL data type. Fortunately, it’s possible to use other MASM CTL features to simulate compile-time arrays.
This section considers two ways to simulate compile-time arrays: text strings and a list of equates (one equate per array element). The list of equates is probably the easiest implementation, so this section considers that first.
In Listing 13-11 (available online), a very useful function converts all the text in a string to uppercase (toUpper). The register macros use this macro to convert register names to uppercase characters (so that register name comparisons are case-insensitive). The toUpper macro is relatively straightforward. It extracts each character of a string and checks whether that character’s value is in the range a to z, and if it is, it uses that character’s value as an index into an array (indexed from a to z) to extract the corresponding array element value (which will have the values A to Z for each element of the array). Here’s the toUpper macro:
; toUpper - Converts alphabetic characters to uppercase
; in a text string.
toUpper macro lcStr
local result
; Build the result string in "result":
result textequ <>
; For each character in the source string,
; convert it to uppercase.
forc eachChar, <lcStr>
; See if we have a lowercase character:
if ('&eachChar' ge 'a') and ('&eachChar' le 'z')
; If lowercase, convert it to the symbol "lc_*" where "*"
; is the lowercase character. The equates below will map
; this character to uppercase:
eachChar catstr <lc_>,<eachChar>
result catstr result, &eachChar
else
; If it wasn't a lowercase character, just append it
; to the end of the string:
result catstr result, <eachChar>
endif
endm ; forc
exitm result ; Return result string
endm ; toUpper
The “magic” statements, which handle the array access, are these two statements:
eachChar catstr <lc_>,<eachChar>
result catstr result, &eachChar
The eachChar catstr operation produces a string of the form lc_a, lc_b, . . . , lc_z whenever this macro encounters a lowercase character. The result catstr operation expands a label of the form lc_a, . . . , to its value and concatenates the result to the end of the result string (which is a register name). Immediately after the toUpper macro in Listing 13-11, you will find the following equates:
lc_a textequ <A>
lc_b textequ <B>
lc_c textequ <C>
lc_d textequ <D>
lc_e textequ <E>
lc_f textequ <F>
lc_g textequ <G>
lc_h textequ <H>
lc_i textequ <I>
lc_j textequ <J>
lc_k textequ <K>
lc_l textequ <L>
lc_m textequ <M>
lc_n textequ <N>
lc_o textequ <O>
lc_p textequ <P>
lc_q textequ <Q>
lc_r textequ <R>
lc_s textequ <S>
lc_t textequ <T>
lc_u textequ <U>
lc_v textequ <V>
lc_w textequ <W>
lc_x textequ <X>
lc_y textequ <Y>
lc_z textequ <Z>
Therefore, lc_a will expand to the character A, lc_b will expand to the character B, and so forth. This sequence of equates forms the lookup table (array) that toUpper uses. Think of the array as being named lc_, with the index into the array being the suffix of the array’s name (a to z). The toUpper macro accesses element lc_[character] by appending character to lc_ and then expanding the text equate lc_character (expansion happens by applying the & operator to the eachChar string the macro produces).
Note the following two things. First, the array index doesn’t have to be an integer (or ordinal) value. Any arbitrary string of characters will suffice. Second, if you supply an index that isn’t within bounds (a to z), the toUpper macro will attempt to expand a symbol of the form lc_xxxx that results in an undefined identifier. Therefore, MASM will report an undefined symbol error should you attempt to supply an index that is not within range. This will not be an issue for the toUpper macro because toUpper validates the index (using a conditional if statement) prior to constructing the lc_xxxx symbol.
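The point that an index can be any string is perhaps easiest to see with a table that has nothing to do with characters. The following is a minimal sketch, not taken from the book's listings: the clr_ equates and the colorOf macro function are hypothetical names invented for illustration. It pastes the index onto the array name in the same way the equate-based lookup described above does.

```masm
; Hypothetical compile-time array named "clr_", indexed
; by a color name rather than by an integer:

clr_red     =       0FF0000h
clr_green   =       000FF00h
clr_blue    =       00000FFh

; colorOf - Hypothetical macro function that returns
;           element clr_[colorName] by appending the
;           index text to the array name.

colorOf     macro   colorName
            local   element
element     catstr  <clr_>, <colorName>
            exitm   <element>
            endm

; Intended to be equivalent to "greenVal = clr_green":

greenVal    =       colorOf(green)
            end
```

As with lc_, an index that has no matching equate (for example, colorOf(purple)) would be expected to produce an undefined symbol error, because nothing in this sketch validates the index first.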
Listing 13-11 also provides an example of another way to implement a compile-time array: using a text string to hold array elements and using substr to extract elements of the array from that string. The isXXBitReg macros (is8BitReg, is16BitReg, and so forth) pass along a couple of arrays of data to the more generic lookupReg macro. Here’s the is16BitReg macro:
all16Regs catstr <AX>,
<BX>,
<CX>,
<DX>,
<SI>,
<DI>,
<BP>,
<SP>,
<R8W>,
<R9W>,
<R10W>,
<R11W>,
<R12W>,
<R13W>,
<R14W>,
<R15W>
all16Lens catstr <2>, <0>, ; AX
<2>, <0>, ; BX
<2>, <0>, ; CX
<2>, <0>, ; DX
<2>, <0>, ; SI
<2>, <0>, ; DI
<2>, <0>, ; BP
<2>, <0>, ; SP
<3>, <0>, <0>, ; R8W
<3>, <0>, <0>, ; R9W
<4>, <0>, <0>, <0>, ; R10W
<4>, <0>, <0>, <0>, ; R11W
<4>, <0>, <0>, <0>, ; R12W
<4>, <0>, <0>, <0>, ; R13W
<4>, <0>, <0>, <0>, ; R14W
<4>, <0>, <0>, <0> ; R15W
is16BitReg macro parm
exitm lookupReg(parm, all16Regs, all16Lens)
endm ; is16BitReg
The all16Regs string is a list of register names (all concatenated together into one string). The lookupReg macro will search for a user-supplied register (parm) in this string of register names by using the MASM instr directive. If instr does not find the register in the list of names, parm is not a valid 16-bit register and instr returns the value 0. If it does locate the string held by parm in all16Regs, then instr returns the (nonzero) index into all16Regs where the match occurs. By itself, a nonzero index does not indicate that lookupReg has found a valid 16-bit register. For example, if the user supplies PR as a register name, the instr directive will return a nonzero index into the all16Regs string (the index of the last character of the SP register, with the R coming from the first character of the R8W register name). Likewise, if the caller passes the string R8 to is16BitReg, the instr directive will return the index to the first character of the R8W entry, but R8 is not a valid 16-bit register.
Although instr can reject a register name (by returning 0), additional validation is necessary if instr returns a nonzero value; this is where the all16Lens array comes in. The lookupReg macro uses the index that instr returns as an index into the all16Lens array. If the entry is 0, the index into all16Regs is not a valid register index (it’s an index to a string that is not at the start of a register name). If the index into all16Lens points at a nonzero value, lookupReg compares this value against the length of the parm string. If they are equal, parm holds an actual 16-bit register name; if they are not equal, parm is too long or too short and is not a valid 16-bit register name. Here’s the full lookupReg macro:
; lookupReg - Given a (suspected) register and a lookup table, convert
; that register to the corresponding numeric form.
lookupReg macro theReg, regList, regIndex
local regUpper, regConst, inst, regLen, indexLen
; Convert (possible) register to uppercase:
regUpper textequ toUpper(theReg)
regLen sizestr <&theReg>
; Does it exist in regList? If not, it's not a register.
inst instr 1, regList, &regUpper
if inst ne 0
regConst substr &regIndex, inst, 1
if &regConst eq regLen
; It's a register (in text form). Create an identifier of
; the form "regXX" where "XX" represents the register name.
regConst catStr <reg>,regUpper
ifdef &regConst
; Return "regXX" as function result. This is the numeric value
; for the register.
exitm regConst
endif
endif
endif
; If the parameter string wasn't in regList, then return
; "regNone" as the function result:
exitm <regNone>
endm ; lookupReg
Note that lookupReg also uses the register value constants (regNone, regAL, regBL, and so on) as an associative compile-time array (see the regConst definitions).
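The two-step check (instr to find a match, then the parallel length string to validate it) can be traced by hand on a shortened table. The sketch below is only an illustration: miniRegs and miniLens are hypothetical four-register stand-ins for all16Regs and all16Lens, and the values in the comments come from counting character positions in those two strings.

```masm
; Hypothetical four-entry tables (AX, BX, SP, R8W).
; miniLens holds each name's length at the position
; where that name starts, and 0 everywhere else.

miniRegs    textequ <AXBXSPR8W>
miniLens    textequ <202020300>

; "SP" starts at position 5, and miniLens holds 2
; there, which matches the length of "SP": accepted.

foundSP     instr   1, miniRegs, <SP>
lenSP       substr  miniLens, foundSP, 1

; "PR" is found at position 6 (the P of SP followed
; by the R of R8W), but miniLens holds 0 there, so
; the match does not start a register name: rejected.

foundPR     instr   1, miniRegs, <PR>
lenPR       substr  miniLens, foundPR, 1

; "R8" is found at position 7, where miniLens holds
; 3. The candidate is only two characters long, so
; the lengths differ: rejected.

foundR8     instr   1, miniRegs, <R8>
lenR8       substr  miniLens, foundR8, 1
            end
```

Storing one digit per character position is what lets a single substr call answer both questions at once: whether the match begins on a register boundary, and how long the register name at that boundary is.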
One advanced use of macros is to have a macro invocation create one or more new macros. If you nest a macro declaration inside another macro, invoking that (enclosing) macro will expand the enclosed macro definition and define that macro at that point. Of course, if you invoke the outside (enclosing) macro more than once, you could wind up with a duplicate macro definition unless you take care in the construction of the new macro (that is, by assigning it a new name with each new invocation of the outside macro). In a few cases, being able to generate macros on the fly can be useful.
Consider the compile-time array examples from the previous section. If you want to create a compile-time array by using the multiple equates method, you will have to manually define equates for all the array elements before you can use that array. This can be tedious, especially if the array has a large number of elements. Fortunately, it’s easy to create a macro to automate this process for you.
The following macro declaration accepts two arguments: the name of an array to create and the number of elements to put into the array. This macro generates a list of definitions (using the = directive, rather than the textequ directive) with each element initialized to 0:
genArray macro arrayName, elements
local index, eleName, getName
; Loop over each element of the array:
index = 0
while index lt &elements
; Generate an "=" statement to define a single
; element of the array, for example:
; aryXX = 0
; where "XX" is the index (0 to (elements - 1)).
eleName catstr <&arrayName>,%index,< = 0>
; Expand the text just created with the catstr directive.
eleName
; Move on to next array index:
index = index + 1
endm ; while
endm ; genArray
For example, the following macro invocation creates 10 array elements, named ary0 to ary9:
genArray ary, 10
You can access the array elements directly by using the names ary0, ary1, ary2, . . . , ary9. If you want to access these array elements programmatically (perhaps in a compile-time while loop), you would have to use the catstr directive to create a text equate that has the array name (ary) concatenated with the index. Wouldn’t it be more convenient to have a macro function that creates this text equate for you? It’s easy enough to write a macro that does this:
ary_get macro index
local element
element catstr <ary>,%index
exitm <element>
endm
With this macro, you can easily access elements of the ary array by using the macro invocation ary_get(index). You could also write a macro to store a value into a specified element of the ary array:
ary_set macro index, value
local assign
assign catstr <ary>, %index, < = >, %value
assign
endm
These two macros are so useful, you’d probably want to include them with each array you create with the genArray macro. So why not have the genArray macro write these macros for you? Listing 13-13 provides an implementation of genArray that does exactly this.
; Listing 13-13
; This is a compile-time program.
; It does not generate any executable code.
option casemap:none
genArray macro arrayName, elements
local index, eleName, getName
; Loop over each element of the array:
index = 0
while index lt &elements
; Generate an "=" statement to define a single
; element of the array, for example:
; aryXX = 0
; where "XX" is the index (0 to (elements - 1)).
eleName catstr <&arrayName>,%index,< = 0>
; Expand the text just created with the catstr directive:
eleName
; Move on to next array index:
index = index + 1
endm ; while
; Create a macro function to retrieve a value from
; the array:
getName catstr <&arrayName>,<_get>
getName macro theIndex
local element
element catstr <&arrayName>,%theIndex
exitm <element>
endm
; Create a macro to assign a value to
; an array element.
setName catstr <&arrayName>,<_set>
setName macro theIndex, theValue
local assign
assign catstr <&arrayName>, %theIndex, < = >, %theValue
assign
endm
endm ; genArray
; mout - Replacement for echo. Allows "%" operator
; in operand field to expand text symbols.
mout macro valToPrint
local cmd
cmd catstr <echo >, <valToPrint>
cmd
endm
; Create an array ("ary") with ten elements:
genArray ary, 10
; Initialize each element of the array to
; its index value:
index = 0
while index lt 10
ary_set index, index
index = index + 1
endm
; Print out the array values:
index = 0
while index lt 10
value = ary_get(index)
mout ary[%index] = %value
index = index + 1
endm
end
Listing 13-13: A macro that writes another pair of macros
Here’s the build command and sample output for the compile-time program in Listing 13-13:
C:\>ml64 /c /Fl listing13-13.asm
Microsoft (R) Macro Assembler (x64) Version 14.15.26730.0
Copyright (C) Microsoft Corporation. All rights reserved.
Assembling: listing13-13.asm
ary[0] = 0
ary[1] = 1
ary[2] = 2
ary[3] = 3
ary[4] = 4
ary[5] = 5
ary[6] = 6
ary[7] = 7
ary[8] = 8
ary[9] = 9
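Printing is only one use for an accessor macro function; a compile-time loop can also compute with the elements. The following is a small sketch that sums an array at assembly time. To keep it self-contained, it defines four elements by hand (with hypothetical values) and a hand-written ary_get in place of the macros genArray would generate.

```masm
; Four hand-written elements standing in for an array
; that genArray would normally create (the values are
; hypothetical):

ary0        =       10
ary1        =       20
ary2        =       30
ary3        =       40

; Same shape as the accessor shown earlier:

ary_get     macro   theIndex
            local   element
element     catstr  <ary>, %theIndex
            exitm   <element>
            endm

; Add up the elements during assembly:

sum         =       0
index       =       0
            while   index lt 4
value       =       ary_get(index)
sum         =       sum + value
index       =       index + 1
            endm

; At this point sum should hold 10 + 20 + 30 + 40 = 100.

            end
```

Because sum is an ordinary numeric equate once the loop finishes, it could then be used anywhere a constant is legal, such as an immediate operand or a data initializer.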
When writing compile-time programs, keep in mind that MASM is interpreting these programs during assembly. This can have a huge impact on the time it takes MASM to assemble your source files. Indeed, it is quite possible to create infinite loops that will cause MASM to (seemingly) hang up during assembly. Consider the following trivial example:
true = 1
while true
endm
Any attempt to assemble a MASM source file containing this sequence will lock up the system until you press ctrl-C (or use another mechanism to abort the assembly process).
Even without infinite loops, it is easy to create macros that take a considerable amount of time to process. If you use such macros hundreds (or even thousands) of times in a source file (as is common for some complex print-type macros), it could take a while for MASM to process your source files. Be aware of this (and be patient if MASM seems to hang up—it could simply be your compile-time programs taking a while to do their job).
If you think a compile-time program has entered an infinite loop, the echo directive (or macros like mout, appearing throughout this chapter) can help you track down the infinite loop (or other bugs) in your compile-time programs.
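One defensive pattern, sketched below, is to pair a loop whose termination is not obvious with an iteration counter that prints a message and abandons the loop once a limit is exceeded. The names maxIters, iters, and n, the limit of 1000, and the loop body are all hypothetical; mout is repeated from Listing 13-13 so the fragment stands alone.

```masm
; mout - Same macro as in Listing 13-13.

mout        macro   valToPrint
            local   cmd
cmd         catstr  <echo >, <valToPrint>
            cmd
            endm

maxIters    =       1000        ; Hypothetical safety limit
iters       =       0
n           =       27          ; Hypothetical starting value

            while   n ne 1

; Count this pass and bail out if the loop has run
; longer than expected:

iters       =       iters + 1
            if      iters gt maxIters
            mout    Giving up after %iters passes with n = %n
            exitm
            endif

; Loop body whose termination is hard to see at a
; glance: halve even values, triple-and-add-one odd
; values.

            if      n and 1
n           =       3*n + 1
            else
n           =       n / 2
            endif
            endm

            end
```

The exitm directive ends expansion of the enclosing while block, so assembly continues with the statement after endm rather than hanging. Avoid commas in the message passed to mout, because MASM would treat the text after a comma as a second macro argument.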
Although this chapter has spent a considerable amount of time describing various features of MASM’s macro support and compile-time language features, the truth is this chapter has barely described what’s possible with MASM. Sadly, Microsoft’s documentation all but ignores the macro facilities of MASM. Probably the best place to learn about advanced macro programming with MASM is the MASM32 forum at http://www.masm32.com/board/index.php.
Although it is an older book, covering MASM version 6, The Waite Group’s Microsoft Macro Assembler Bible by Nabajyoti Barkakati and this author (Sams, 1992) does go into detail about the use of MASM’s macro facilities (as well as other directives that are poorly documented these days). Also, the MASM 6.x manual can still be found online at various sites. While this manual is woefully outdated with respect to the latest versions of MASM (it does not, for example, cover any of the 64-bit instructions or addressing modes), it does a decent job of describing MASM’s macro facilities and many of MASM’s directives. Just keep in mind when reading the older documentation that Microsoft has disabled many features that used to be present in MASM.
What does the % operator do?
What does the & operator do?
What does the catstr directive do?
What does the instr directive do?
What does the sizestr directive do?
What does the substr directive do?
:req suffix).