13
Macros and the MASM Compile-Time Language

This chapter discusses the MASM compile-time language, including the very important macro expansion facilities. A macro is an identifier that the assembler will expand into additional text (often many lines of text), allowing you to abbreviate large amounts of code with a single identifier. MASM’s macro facility is actually a computer language inside a computer language; that is, you can write short little programs inside a MASM source file whose purpose is to generate other MASM source code to be assembled by MASM.

This language inside a language, also known as a compile-time language, consists of macros (the compile-time language equivalent of a procedure), conditionals (if statements), loops, and other statements. This chapter covers many of the MASM compile-time language features and shows how you can use them to reduce the effort needed to write assembly language code.

13.1 Introduction to the Compile-Time Language

MASM is actually two languages rolled into a single program. The runtime language is the standard x86-64/MASM assembly language you’ve been reading about in all the previous chapters. This is called the runtime language because the programs you write execute when you run the executable file. MASM contains an interpreter for a second language, the MASM compile-time language (CTL). MASM source files contain instructions for both the MASM CTL and the runtime program, and MASM executes the CTL program during assembly (compilation). Once MASM completes assembly, the CTL program terminates (see Figure 13-1).

Figure 13-1: Compile-time versus runtime execution

The CTL application is not a part of the runtime executable that MASM emits, although the CTL application can write part of the runtime program for you, and, in fact, this is the major purpose of the CTL. Using automatic code generation, the CTL gives you the ability to easily and elegantly emit repetitive code. By learning how to use the MASM CTL and applying it properly, you can develop assembly language applications as rapidly as high-level language applications (even faster because MASM’s CTL lets you create very high-level-language constructs).

13.2 The echo and .err Directives

You may recall that Chapter 1 began with the typical first program most people write when learning a new language, the “Hello, world!” program. Listing 13-1 provides the basic “Hello, world!” program written in the MASM compile-time language.

; Listing 13-1
 
; CTL "Hello, world!" program.

echo    Listing 13-1: Hello, world!
end

Listing 13-1: The CTL “Hello, world!” program

The only CTL statement in this program is the echo statement.¹ The end statement is needed just to keep MASM happy.

The echo statement displays the textual representation of its argument list during the assembly of a MASM program. Therefore, if you compile the preceding program with the command

ml64 /c listing13-1.asm

MASM will print the following text during assembly:

Listing 13-1: Hello, world!

Other than displaying the text associated with the echo parameter list, the echo statement has no effect on the assembly of the program. It is invaluable for debugging CTL programs, displaying the progress of the assembly, and displaying assumptions and default actions that take place during assembly.

Though assembly language calls to print also emit text to the standard output, there is a big difference between the following two groups of statements in a MASM source file:

echo "Hello World"

call print
byte "Hello World", nl,0

The first statement prints "Hello World", quotes included (and a newline), during the assembly process and has no effect on the executable program. The last two lines don’t affect the assembly process (other than the emission of code to the executable file). However, when you run the executable file, the second set of statements prints the string Hello World followed by a newline sequence.

The .err directive, like echo, will display a string to the console during assembly, though this must be a text string (delimited by < and >). The .err statement displays the text as part of a MASM error diagnostic. Furthermore, the .err statement increments the error count, and this will cause MASM to stop the assembly (without assembling or linking) after processing the current source file. You would normally use the .err statement to display an error message during assembly if your CTL code discovers something that prevents it from creating valid code. For example:

.err <Statement must have exactly one operand>

13.3 Compile-Time Constants and Variables

Just as the runtime language does, the compile-time language supports constants and variables. You declare compile-time constants by using the textequ or equ directives. You declare compile-time variables by using the = directive (compile-time assignment statement). For example:

inc_by equ 1
ctlVar = 0
ctlVar = ctlVar + inc_by

13.4 Compile-Time Expressions and Operators

The MASM CTL supports constant expressions in the CTL assignment statement. See “MASM Constant Declarations” in Chapter 4 for a discussion of constant expressions (which are also the CTL expressions and operators).

In addition to the operators and functions appearing in that chapter, MASM includes several additional CTL operators, functions, and directives you will find useful. The following subsections describe these.

13.4.1 The MASM Escape (!) Operator

The first operator is the ! operator. When placed in front of another symbol, this operator tells MASM to treat that character as text rather than as a special symbol. For example, !; creates a text constant consisting of the semicolon character, rather than a comment that causes MASM to ignore all text after the ; symbol (for C/C++ programmers, this is similar to the backslash escape character, \, in a string constant).

13.4.2 The MASM Evaluation (%) Operator

The second useful CTL operator is %. The percent operator causes MASM to evaluate the expression following it and replace that expression with its value. For example, consider the following code sequence:

num10   =        10
text10  textequ  <10>
tn11    textequ  %num10 + 1

If you assemble this sequence in an assembly language source file and direct MASM to produce an assembly listing, it will report the following for these three symbols:

num10  . . . . . . . . . . . . .        Number   0000000Ah
text10 . . . . . . . . . . . . .        Text     10
tn11 . . . . . . . . . . . . . .        Text     11

The num10 is properly reported as a numeric value (decimal 10), text10 as a text symbol (containing the string 10), and tn11 as a text symbol (as you would expect, because this code sequence uses the textequ directive to define it). However, rather than containing the string %num10 + 1, MASM evaluates the expression num10 + 1 to produce the numeric value 11, which MASM then converts to text data. (By the way, to put a percent sign in a text string, use the text sequence <!%>.)

If you place the % operator in the first column of a source line, MASM will translate all numeric expressions on that line to textual form. This is handy with the echo directive. It causes echo to display the value of numeric equates rather than simply displaying the equate names.
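As a small sketch of this technique, the following fragment converts a numeric equate to a text equate with the % operator and then displays it on an echo line that begins with %. The symbol names bufSize and bufSizeTxt are hypothetical, and routing the value through a text equate is a conservative choice made for this sketch rather than something the preceding text requires.

```masm
bufSize     =        256
bufSizeTxt  textequ  %bufSize           ; bufSizeTxt is the text 256

; The leading % tells MASM to expand bufSizeTxt before echo
; displays the line.

%           echo     Buffer size is bufSizeTxt bytes
            end
```

Assembling this file with ml64 /c should display the buffer size as a number during assembly rather than the symbol name.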

13.4.3 The catstr Directive

The catstr directive has the following syntax:

identifier   catstr  string1, string2, ...

The identifier is an (up to this point) undefined symbol. The string1 and string2 operands are textual data surrounded by < and > symbols. This statement stores the concatenation of the two strings into identifier. Note that identifier is a text object, not a string object. If you specify the identifier in your code, MASM will substitute the text string for the identifier and try to process that text data as though it were part of your source code input.

The catstr statement allows two or more operands separated by commas. The catstr directive will concatenate the text values in the order they appear in the operand field. The following statement generates the textual data Hello, World!:

helloWorld catstr <Hello>, <, >, <World!!>

Two exclamation marks are necessary in this example, because ! is an operator telling MASM to treat the next symbol as text rather than as an operator. With only one ! symbol, MASM thinks that you’re attempting to include a > symbol as part of the string and reports an error (because there is no closing >). Putting !! in the text string tells MASM to treat the second ! symbol as a text character.
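Because the result of catstr is a text object, one use for it is building identifiers out of smaller pieces. The following is a minimal sketch under that assumption; the names count_total, baseName, varName, and sumInit are hypothetical and exist only for illustration.

```masm
            .data
count_total dword    0

            .code
baseName    textequ  <count>
varName     catstr   baseName, <_total>  ; varName is the text count_total

sumInit     proc
            mov      eax, varName        ; Assembles as: mov eax, count_total
            ret
sumInit     endp
            end
```

Here the first catstr operand is a previously defined text symbol rather than a literal in angle brackets; MASM substitutes its text before concatenating.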

13.4.4 The instr Directive

The instr directive searches for the presence of one string within another. The syntax for the directive is

identifier  instr  start, source, search

where identifier is a symbol into which MASM will put the offset of the search string within the source string. The search begins at position start within source. Unconventionally, the first character in source has the position 1 (not 0). The following example searches for World within the string Hello World (starting at character position 1, which is the index of the H character):

WorldPosn  instr 1, <Hello World>, <World>

This statement defines WorldPosn as a number with the value 7 (as the string World is at position 7 in Hello World if you start counting from position 1).

13.4.5 The sizestr Directive

The sizestr directive computes the length of a string.² The syntax for the directive is

identifier  sizestr  string

where identifier is the symbol into which MASM will store the string’s length, and string is the string literal whose length this directive computes. As an example,

hwLen sizestr <Hello World>

defines the symbol hwLen as a number and sets it to the value 11.

13.4.6 The substr Directive

The substr directive extracts a substring from a larger string. The syntax for this directive is

identifier substr source, start, len

where identifier is the symbol that MASM will create (type TEXT, initialized with the substring characters), source is the source string from which MASM will extract the substring, start is the starting position in the string to begin the extraction, and len is the length of the substring to extract. The len operand is optional; if it is absent, MASM will assume you want to use the remainder of the string (starting at position start) for the substring. Here’s an example that extracts Hello from the string Hello World:

hString substr <Hello World>, 1, 5
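The instr, sizestr, and substr directives are often used together. The following sketch, which reuses the Hello World string from the preceding examples, locates a word and then extracts the text before it and from it onward. The symbol names (src, srcLen, wPosn, firstWord, secondWord) are hypothetical.

```masm
src         textequ  <Hello World>
srcLen      sizestr  src                 ; Number: 11
wPosn       instr    1, src, <World>     ; Number: 7
firstWord   substr   src, 1, wPosn - 2   ; Text: Hello
secondWord  substr   src, wPosn          ; Text: World (len omitted)

; The leading % expands the two text symbols on the echo line.

%           echo     firstWord secondWord
            end
```

The length wPosn - 2 skips both the W and the space that precedes it; omitting the len operand in the last substr takes the remainder of the string.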

13.5 Conditional Assembly (Compile-Time Decisions)

MASM’s compile-time language provides an if statement, if, that lets you make decisions at assembly time. The if statement has two main purposes. The traditional use of if is to support conditional assembly, allowing you to include or exclude code during an assembly, depending on the status of various symbols or constant values in your program. The second use is to support the standard if-statement decision-making process in the MASM compile-time language. This section discusses these two uses for the MASM if statement.

The simplest form of the MASM compile-time if statement uses the following syntax:

if constant_boolean_expression  
      Text  
endif

At compile time, MASM evaluates the expression after the if. This must be a constant expression that evaluates to an integer value. If the expression evaluates to true (nonzero), MASM continues to process the text in the source file as though the if statement were not present. However, if the expression evaluates to false (zero), MASM treats all the text between the if and the corresponding endif clause as though it were a comment (that is, it ignores this text), as shown in Figure 13-2.

Figure 13-2: Operation of a MASM compile-time if statement

The identifiers in a compile-time expression must all be constant identifiers or a MASM compile-time function call (with appropriate parameters). Because MASM evaluates these expressions at assembly time, they cannot contain runtime variables.

The MASM if statement supports optional elseif and else clauses that behave in an intuitive fashion. The complete syntax for the if statement looks like the following:

if constant_boolean_expression1
      Text  
elseif constant_boolean_expression2
      Text  
else 
      Text  
endif

If the first Boolean expression evaluates to true, MASM processes the text up to the elseif clause. It then skips all text (that is, treats it like a comment) until it encounters the endif clause. MASM continues processing the text after the endif clause in the normal fashion.

If the first Boolean expression evaluates to false, MASM skips all the text until it encounters an elseif, else, or endif clause. If it encounters an elseif clause (as in the preceding example), MASM evaluates the Boolean expression associated with that clause. If it evaluates to true, MASM processes the text between the elseif and the else clauses (or to the endif clause if the else clause is not present). If, during the processing of this text, MASM encounters another elseif or, as in the preceding example, an else clause, then MASM ignores all further text until it finds the corresponding endif. If both the first and second Boolean expressions in the previous example evaluate to false, MASM skips their associated text and begins processing the text in the else clause.

You can create a nearly infinite variety of if statement sequences by including zero or more elseif clauses and optionally supplying the else clause.
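As a brief illustration of the full if..elseif..else..endif form, the following sketch selects a data declaration based on a compile-time constant and reports an error for unsupported values. The names elemSize and table are hypothetical.

```masm
elemSize    =       4                   ; Hypothetical setting: 4 or 8

            .data
            if      elemSize EQ 8
table       qword   16 dup (0)
            elseif  elemSize EQ 4
table       dword   16 dup (0)
            else
            .err    <elemSize must be 4 or 8>
            endif
            end
```

MASM assembles exactly one of the three clauses; changing elemSize to any value other than 4 or 8 makes the assembly fail with the .err message.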

A traditional use of conditional assembly is to develop software that you can easily configure for several environments. For example, the fcomip instruction makes floating-point comparisons easy, but this instruction is available only on Pentium Pro and later processors. To use this instruction on the processors that support it and fall back to the standard floating-point comparison on the older processors, most engineers use conditional assembly to embed the separate sequences in the same source file (instead of writing and maintaining two versions of the program). The following example demonstrates how to do this:

; Set true (1) to use FCOMIxx instrs.

PentProOrLater = 0
          . 
          . 
          . 
        if PentProOrLater

          fcomip st(0), st(1) ; Compare ST1 to ST0 and set flags

        else 

          fcomp               ; Compare ST1 to ST0
          fstsw ax            ; Move the FPU condition code bits
          sahf                ; into the FLAGS register

        endif

As currently written, this code fragment will compile the three-instruction sequence in the else clause and ignore the code between the if and else clauses (because the constant PentProOrLater is false). By changing the value of PentProOrLater to true, you can tell MASM to compile the single fcomip instruction rather than the three-instruction sequence.

Though you need to maintain only a single source file, conditional assembly does not let you create a single executable that runs efficiently on all processors. When using this technique, you will still have to create two executable programs (one for Pentium Pro and later processors, one for the earlier processors) by compiling your source file twice: during the first assembly, you must set the PentProOrLater constant to false; during the second assembly, you must set it to true.

If you are familiar with conditional assembly in other languages, such as C/C++, you may be wondering if MASM supports a statement like C’s #ifdef statement. The answer is yes, it does. Consider the following modification to the preceding code that uses MASM’s ifdef directive:

; Note: uncomment the following line if you are compiling this 
; code for a Pentium Pro or later CPU. 

; PentProOrLater = 0       ; Value and type are irrelevant
          . 
          . 
          . 
ifdef PentProOrLater 

     fcomip st(0), st(1)   ; Compare ST1 to ST0 and set flags

else 

     fcomp                 ; Compare ST1 to ST0
     fstsw ax              ; Move the FPU condition code bits
     sahf                  ; into the FLAGS register

endif
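Because ifdef tests only whether a symbol exists, the symbol does not have to be defined in the source file at all. The following sketch assumes you define it with the ml64 /D command line option, which lets you build both variants without editing the source. The filename fcmp.asm and the procedure name fpCompare are hypothetical.

```masm
; Build for older CPUs:            ml64 /c fcmp.asm
; Build for Pentium Pro or later:  ml64 /c /DPentProOrLater fcmp.asm

            .code
fpCompare   proc
            ifdef   PentProOrLater

            fcomip  st(0), st(1)    ; Compare ST1 to ST0 and set flags

            else

            fcomp                   ; Compare ST1 to ST0
            fstsw   ax              ; Move the FPU condition code bits
            sahf                    ; into the FLAGS register

            endif
            ret
fpCompare   endp
            end
```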

Another common use of conditional assembly is to introduce debugging and testing code into your programs. A typical debugging technique that many MASM programmers use is to insert print statements at strategic points throughout their code; this enables them to trace through their code and display important values at various checkpoints.

A big problem with this technique, however, is that they must remove the debugging code prior to completing the project. Two further problems are as follows:

Programmers often forget to remove some debugging statements, and this creates defects in the final program.
After removing a debugging statement, these programmers often discover that they need that same statement to debug a different problem at a later time. Hence, they are constantly inserting and removing the same statements over and over again.

Conditional assembly can provide a solution to this problem. By defining a symbol (say, debug) to control debugging output in your program, you can activate or deactivate all debugging output by modifying a single line of source code. The following code fragment demonstrates this:

; Set to true to activate debug output.

debug   =    0
          . 
          . 
          . 
     if debug

        echo *** DEBUG build

        mov  edx, i
        call print
        byte "At point A, i=%d", nl, 0 

     else

     echo *** RELEASE build

     endif

As long as you surround all debugging output statements with an if statement like the preceding one, you don’t have to worry about debugging output accidentally appearing in your final application. By setting the debug symbol to false, you can automatically disable all such output. Likewise, you don’t have to remove all your debugging statements from your programs after they’ve served their immediate purpose. By using conditional assembly, you can leave these statements in your code because they are so easy to deactivate. Later, if you decide you need to view this same debugging information again, you can reactivate it by setting the debug symbol to true and reassembling.

Although program configuration and debugging control are two of the more common, traditional uses for conditional assembly, don’t forget that the if statement provides the basic conditional statement in the MASM CTL. You will use the if statement in your compile-time programs the same way you would use an if statement in MASM or another language. Later sections in this chapter present lots of examples of using the if statement in this capacity.

13.6 Repetitive Assembly (Compile-Time Loops)

MASM’s while..endm, for..endm, and forc..endm statements provide compile-time loop constructs.³ The while statement tells MASM to process the same sequence of statements repetitively during assembly. This is handy for constructing data tables as well as providing a traditional looping structure for compile-time programs.

The while statement uses the following syntax:

while constant_boolean_expression
      Text  
endm

When MASM encounters the while statement during assembly, it evaluates the constant Boolean expression. If the expression evaluates to false, MASM will skip over the text between the while and the endm clauses (the behavior is similar to the if statement if the expression evaluates to false). If the expression evaluates to true, MASM will process the statements between the while and endm clauses and then “jump back” to the start of the while statement in the source file and repeat this process, as shown in Figure 13-3.

Figure 13-3: MASM compile-time while statement operation

To understand how this process works, consider the program in Listing 13-2.

; Listing 13-2
 
; CTL while loop demonstration program.

        option  casemap:none

nl          =       10

            .const
ttlStr      byte    "Listing 13-2", 0
           
            .data
ary         dword   2, 3, 5, 8, 13

            include getTitle.inc
            include print.inc
            
            .code
            
; Here is the "asmMain" function.
        
            public  asmMain
asmMain     proc
            push    rbx
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56           ; Shadow storage

i           =       0            
            while   i LT lengthof ary ; 5  

            mov     edx, i            ; This is a constant!
            mov     r8d, ary[i * 4]   ; Index is a constant
            call    print
            byte    "array[%d] = %d", nl, 0
              
i           =       i + 1
            endm 
             
allDone:    leave
            pop     rbx
            ret     ; Returns to caller
asmMain     endp
            end

Listing 13-2: while..endm demonstration

Here’s the build command and program output for Listing 13-2:

C:\>build listing13-2

C:\>echo off
 Assembling: listing13-2.asm
c.cpp

C:\>listing13-2
Calling Listing 13-2:
array[0] = 2
array[1] = 3
array[2] = 5
array[3] = 8
array[4] = 13
Listing 13-2 terminated

The while loop repeats five times during assembly. On each repetition of the loop, the MASM assembler processes the statements between the while and endm directives. Therefore, the preceding program is really equivalent to the code fragment shown in Listing 13-3.

.
.
.
mov     edx, 0          ; This is a constant!
mov     r8d, ary[0]     ; Index is a constant
call    print
byte    "array[%d] = %d", nl, 0

mov     edx, 1          ; This is a constant!
mov     r8d, ary[4]     ; Index is a constant
call    print
byte    "array[%d] = %d", nl, 0

mov     edx, 2          ; This is a constant!
mov     r8d, ary[8]     ; Index is a constant
call    print
byte    "array[%d] = %d", nl, 0

mov     edx, 3          ; This is a constant!
mov     r8d, ary[12]    ; Index is a constant
call    print
byte    "array[%d] = %d", nl, 0

mov     edx, 4          ; This is a constant!
mov     r8d, ary[16]    ; Index is a constant
call    print
byte    "array[%d] = %d", nl, 0

Listing 13-3: Program equivalent to the code in Listing 13-2

As you can see in this example, the while statement is convenient for constructing repetitive-code sequences, especially for unrolling loops.
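The while statement is equally useful for building data tables. The following minimal sketch emits a table of the first eight squares; the names squares and sq are hypothetical.

```masm
            .data
squares     label   dword
sq          =       0
            while   sq LT 8
            dword   sq * sq             ; Emits 0, 1, 4, ..., 49
sq          =       sq + 1
            endm
            end
```

Each iteration emits one dword directive, so the result is the same as writing eight dword declarations by hand.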

MASM provides two forms of the for..endm loop. These two loops take the following general form:

for identifier, <arg1, arg2, ..., argn> 
  . 
  . 
  . 
endm 

forc identifier, <string>
  . 
  . 
  . 
endm

The first form of the for loop (plain for) repeats the code once for each of the arguments specified between the < and > brackets. On each repetition of the loop, it sets identifier to the text of the current argument: on the first iteration of the loop, identifier is set to arg1, and on the second iteration it is set to arg2, and so on, until the last iteration, when it is set to argn. For example, the following for loop will generate code that pushes the RAX, RBX, RCX, and RDX registers onto the stack:

for  reg, <rax, rbx, rcx, rdx>
push reg
endm

This for loop is equivalent to the following code:

push rax
push rbx
push rcx
push rdx

The forc compile-time loop repeats the body of its loop for each character appearing in the string specified by the second argument. For example, the following forc loop generates a hexadecimal byte value for each character in the string:

        forc   hex, <0123456789ABCDEF>
hexNum  catstr <0>,<hex>,<h>
        byte   hexNum
        endm

The for loop will turn out to be a lot more useful than forc. Nevertheless, forc is handy on occasion. Most of the time when you’re using these loops, you’ll be passing them a variable set of arguments rather than a fixed string. As you’ll soon see, these loops are handy for processing macro parameters.
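A common companion to the earlier push example is a second for loop that pops the same registers in reverse order. The following is a sketch only; the procedure name saveRegs is hypothetical, and the comment in the middle stands in for whatever code would use the registers.

```masm
            .code
saveRegs    proc
            for     reg, <rax, rbx, rcx, rdx>
            push    reg
            endm

            ; ... code that modifies the registers goes here ...

            for     reg, <rdx, rcx, rbx, rax>
            pop     reg
            endm
            ret
saveRegs    endp
            end
```

Because the stack is last in, first out, the argument list for the pop loop must be the reverse of the list for the push loop.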

13.7 Macros (Compile-Time Procedures)

Macros are objects that a language processor replaces with other text during compilation. Macros are great devices for replacing long, repetitive sequences of text with much shorter sequences of text. In addition to the traditional role that macros play (for example, #define in C/C++), MASM’s macros also serve as the equivalent of a compile-time language procedure or function.

Macros are one of MASM’s main features. The following sections explore MASM’s macro-processing facilities and the relationship between macros and other MASM CTL control constructs.

13.8 Standard Macros

MASM supports a straightforward macro facility that lets you define macros in a manner that is similar to declaring a procedure. A typical, simple macro declaration takes the following form:

macro_name macro arguments 
      Macro body
          endm

The following code is a concrete example of a macro declaration:

neg128 macro 

       neg rdx 
       neg rax 
       sbb rdx, 0 

       endm

Execution of this macro’s code will compute the two’s complement of the 128-bit value in RDX:RAX (see the description of extended-precision neg in “Extended-Precision Negation Operations” in Chapter 8).

To execute the code associated with neg128, you specify the macro’s name at the point you want to execute these instructions. For example:

mov    rax, qword ptr i128 
mov    rdx, qword ptr i128[8] 
neg128

This intentionally looks just like any other instruction; the original purpose of macros was to create synthetic instructions to simplify assembly language programming.

Though you don’t need to use a call instruction to invoke a macro, from the point of view of your program, invoking a macro executes a sequence of instructions just like calling a procedure. You could implement this simple macro as a procedure by using the following procedure declaration:

neg128p  proc 

         neg   rdx
         neg   rax
         sbb   rdx, 0
         ret

neg128p  endp

The following two statements will both negate the value in RDX:RAX:

neg128
call   neg128p

The difference between these two (the macro invocation versus the procedure call) is that macros expand their text inline, whereas a procedure call emits a call to the corresponding procedure elsewhere in the text. That is, MASM replaces the invocation neg128 directly with the following text:

neg  rdx
neg  rax
sbb  rdx, 0

On the other hand, MASM replaces the procedure call neg128p with the machine code for the call instruction:

call neg128p

You should choose macro versus procedure call based on efficiency. Macros are slightly faster than procedure calls because you don’t execute the call and corresponding ret instructions, but they can make your program larger because a macro invocation expands to the text of the macro’s body on each invocation. If the macro body is large and you invoke the macro several times throughout your program, it will make your final executable much larger. Also, if the body of your macro executes more than a few simple instructions, the overhead of a call and ret sequence has little impact on the overall execution time of the code, so the execution time savings are nearly negligible. On the other hand, if the body of a procedure is very short (like the preceding neg128 example), the macro implementation can be faster and doesn’t expand the size of your program by much. A good rule of thumb is as follows:

Use macros for short, time-critical program units. Use procedures for longer blocks of code and when execution time is not as critical.

Macros have several other disadvantages compared with procedures: macros cannot have local (automatic) variables, macro parameters work differently than procedure parameters, macros don’t support (runtime) recursion, and macros are a little more difficult to debug than procedures (just to name a few). Therefore, you shouldn’t really use macros as a substitute for procedures except when performance is absolutely critical.

13.9 Macro Parameters

Like procedures, macros allow you to define parameters that let you supply different data on each macro invocation, which lets you write generic macros whose behavior can vary depending on the parameters you supply. By processing these macro parameters at compile time, you can write sophisticated macros.

Macro parameter declaration syntax is straightforward. You supply a list of parameter names as the operands in a macro declaration:

neg128  macro reg64HO, reg64LO

        neg   reg64HO
        neg   reg64LO
        sbb   reg64HO, 0

        endm

When you invoke a macro, you supply the actual parameters as arguments to the macro invocation:

neg128  rdx, rax

13.9.1 Standard Macro Parameter Expansion

MASM automatically associates the type text with macro parameters. This means that during a macro expansion, MASM substitutes the text you supply as the actual parameter everywhere the formal parameter name appears. The semantics of pass by textual substitution are a little different from pass by value or pass by reference, so exploring those differences here is worthwhile.

Consider the following macro invocations, using the neg128 macro from the previous section:

neg128 rdx, rax
neg128 rbx, rcx

These two invocations expand into the following code:

; neg128 rdx, rax 

     neg rdx 
     neg rax 
     sbb rdx, 0

; neg128 rbx, rcx 

     neg rbx 
     neg rcx 
     sbb rbx, 0

Macro invocations do not make a local copy of the parameters (as pass by value does), nor do they pass the address of the actual parameter to the macro. Instead, a macro invocation of the form neg128 rdx, rax is equivalent to the following:

reg64HO  textequ <rdx> 
reg64LO  textequ <rax> 

         neg    reg64HO  
         neg    reg64LO  
         sbb    reg64HO, 0

The text objects immediately expand their string values inline, producing the former expansion for neg128 rdx, rax.

Macro parameters are not limited to memory, register, or constant operands as are instruction or procedure operands. Any text is fine as long as its expansion is legal wherever you use the formal parameter. Similarly, formal parameters may appear anywhere in the macro body, not just where memory, register, or constant operands are legal. Consider the following macro declaration and sample invocations that demonstrate how you can expand a formal parameter into a whole instruction:

chkError macro instruction, jump, target

         instruction 
         jump  target 

         endm

     chkError <cmp eax, 0>, jnl, RangeError  ; Example 1
          .
          .
          . 
     chkError <test bl, 1>, jnz, ParityError ; Example 2

; Example 1 expands to:

     cmp  eax, 0 
     jnl  RangeError 

; Example 2 expands to:

     test bl, 1 
     jnz  ParityError

We use the < and > brackets to treat the full cmp and test instructions as a single string (normally, the comma in these instructions would split them into two macro parameters).

In general, MASM assumes that all text between commas constitutes a single macro parameter. If MASM encounters any opening bracketing symbols (left parentheses, left braces, or left angle brackets), then it will include all text up to the appropriate closing symbol, ignoring any commas that may appear within the bracketing symbols. Of course, MASM does not consider commas (and bracketing symbols) within a string constant as the end of an actual parameter. So the following macro and invocation are perfectly legal:

_print macro strToPrint 

       call print
       byte strToPrint, nl, 0 

      endm 
       . 
       . 
       . 
      _print "Hello, world!"

MASM treats the string Hello, world! as a single parameter because the comma appears inside a literal string constant, just as your intuition suggests.

You can run into some issues when MASM expands your macro parameters, because parameters are expanded as text, not values. Consider the following macro declaration and invocation:

Echo2nTimes macro n, theStr
echoCnt     =     0
            while echoCnt LT n * 2

            call  print
            byte  theStr, nl, 0

echoCnt     =     echoCnt + 1
            endm
            endm
             . 
             . 
             . 
            Echo2nTimes  3 + 1, "Hello"

This example emits five calls to print (so the program displays Hello five times at runtime) rather than the eight you might intuitively expect. This is because the preceding while statement expands to

while  echoCnt LT 3 + 1 * 2

The actual parameter for n is 3 + 1; because MASM expands this text directly in place of n, you get an erroneous text expansion. At compile time MASM computes 3 + 1 * 2 as the value 5 rather than as the value 8 (which you would get if MASM passed this parameter by value rather than by textual substitution).

The common solution to this problem when passing numeric parameters that may contain compile-time expressions is to surround the formal parameter in the macro with parentheses; for example, you would rewrite the preceding macro as follows:

Echo2nTimes macro n, theStr
echoCnt     =     0
            while echoCnt LT (n) * 2

            call  print
            byte  theStr, nl, 0

echoCnt     =     echoCnt + 1
            endm  ; while
            endm  ; macro

Now, the invocation expands to the following code that produces the intuitive result:

while  echoCnt LT (3 + 1) * 2 
call   print
byte   "Hello", nl, 0
echoCnt = echoCnt + 1
endm

If you don’t have control over the macro definition (perhaps it’s part of a library module you use, and you can’t change the macro definition because doing so could break existing code), there is another solution to this problem: use the MASM % operator before the argument in the macro invocation so that the CTL interpreter evaluates the expression before expanding the parameters. For example:

Echo2nTimes  %3 + 1, "Hello"

This will cause MASM to properly generate eight calls to the print procedure (and associated data).

13.9.2 Optional and Required Macro Parameters

As a general rule, MASM treats macro arguments as optional arguments. If you define a macro that specifies two arguments and invoke that macro with only one argument, MASM will not (normally) complain about the invocation. Instead, it will simply substitute the empty string for the expansion of the second argument. In some cases, this is acceptable and possibly even desirable.

However, suppose you left off the second parameter in the neg128 macro given earlier. That would compile to a neg instruction with a missing operand and MASM would report an error; for example:

neg128      macro   arg1, arg2      ; Line 6
            neg     arg1            ; Line 7
            neg     arg2            ; Line 8
            sbb     arg1, 0         ; Line 9
            endm                    ; Line 10
                                    ; Line 11
            neg128  rdx             ; Line 12

Here’s the error that MASM reports:

listing14.asm(12) : error A2008:syntax error : in instruction
 neg128(2): Macro Called From
  listing14.asm(12): Main Line Code

The (12) is telling us that the error occurred on line 12 in the source file. The neg128(2) line is telling us that the error occurred on line 2 of the neg128 macro. It’s a bit difficult to see what is actually causing the problem here.

One solution is to use conditional assembly inside the macro to test for the presence of both parameters. At first, you might think you could use code like this:

neg128  macro reg64HO, reg64LO

        if   reg64LO eq <>
        .err <neg128 requires 2 operands>
        endif

        neg  reg64HO
        neg  reg64LO
        sbb  reg64HO, 0
        endm
         .
         .
         .
        neg128 rdx

Unfortunately, this fails for a couple of reasons. First of all, the eq operator doesn’t work with text operands. MASM will expand the text operands before attempting to apply this operator, so the if statement in the preceding example effectively becomes

        if   eq

because MASM substitutes the empty string for both the operands around the eq operator. This, of course, generates a syntax error. Even if there were non-blank textual operands around the eq operator, this would still fail because eq expects numeric operands. MASM solves this issue by introducing several additional conditional if statements intended for use with text operands and macro arguments. Table 13-1 lists these additional if statements.

Table 13-1: Text-Handling Conditional if Statements

Statement	Text operand(s)	Meaning
`ifb`^*	`arg`	If blank: true if `arg` evaluates to an empty string.
`ifnb`	`arg`	If not blank: true if `arg` evaluates to a non-empty string.
`ifdif`	`arg1,` `arg2`	If different: true if `arg1` and `arg2` are different (case-sensitive).
`ifdifi`	`arg1,` `arg2`	If different: true if `arg1` and `arg2` are different (case-insensitive).
`ifidn`	`arg1,` `arg2`	If identical: true if `arg1` and `arg2` are exactly the same (case-sensitive).
`ifidni`	`arg1,` `arg2`	If identical: true if `arg1` and `arg2` are exactly the same (case-insensitive).
^* `ifb` `arg` is shorthand for `ifidn <arg>, <>`.

You use these conditional if statements exactly like the standard if statement. You can also follow these if statements with an elseif or else clause, but there are no elseifb, elseifnb, . . . , variants of these if statements (only a standard elseif with a Boolean expression may follow these statements).
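
As a minimal sketch of a text-handling conditional in use, the following clr64 macro compares its argument against a register name without regard to letter case. The clr64 name and its behavior are illustrative assumptions; the macro does not appear elsewhere in this chapter.

```masm
; clr64 (hypothetical): zero a 64-bit register.
; If the caller names RAX (in any letter case), emit the
; 32-bit form, which also clears the upper 32 bits of RAX.

clr64   macro  reg
        ifidni <reg>, <rax>
        xor    eax, eax
        else
        xor    reg, reg
        endif
        endm

        clr64  RAX      ; Expands to: xor eax, eax
        clr64  rbx      ; Expands to: xor rbx, rbx
```

Because ifidni compares text rather than values, the test succeeds for rax, RAX, or Rax, but not for a text symbol that merely expands to rax.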

The following snippet demonstrates how to use the ifb statement to ensure that the neg128 macro has exactly two arguments. There is no need to check whether reg64HO is also blank; if reg64HO is blank, reg64LO will also be blank, and the ifb statement will report the appropriate error:

neg128  macro reg64HO, reg64LO

        ifb  <reg64LO>
        .err <neg128 requires 2 operands>
        endif

        neg  reg64HO
        neg  reg64LO
        sbb  reg64HO, 0
        endm

Be very careful about using ifb in your programs. It is easy to pass in a text symbol to a macro and wind up testing whether the name of that symbol is blank rather than the text itself. Consider the following:

symbol      textequ <>
            neg128  rax, symbol     ; Generates an error

The neg128 invocation has two arguments, and the second one is not blank, so the ifb directive is happy with the argument list. However, inside the macro when neg128 expands reg64LO after the neg instruction, the expansion is the empty string, producing an error (which is what the ifb was supposed to prevent).

A different way to handle missing macro arguments is to explicitly tell MASM that an argument is required with the :req suffix on the macro definition line. Consider the following definition for the neg128 macro:

neg128  macro reg64HO:req, reg64LO:req
        neg   reg64HO
        neg   reg64LO
        sbb   reg64HO, 0
        endm

With the :req option present, MASM reports the following if you are missing one or more of the macro arguments:

listing14.asm(12) : error A2125:missing macro argument

13.9.3 Default Macro Parameter Values

One way to handle missing macro arguments is to define default values for those arguments. Consider the following definition for the neg128 macro:

neg128  macro reg64HO:=<rdx>, reg64LO:=<rax>
        neg   reg64HO
        neg   reg64LO
        sbb   reg64HO, 0
        endm

The := operator tells MASM to substitute the text constant to the right of the operator for the associated macro argument if an actual value is not present on the macro invocation line. Consider the following two invocations of neg128:

neg128       ; Defaults to "RDX, RAX" for the args
neg128 rbx   ; Uses RBX:RAX for the 128-bit register pair

13.9.4 Macros with a Variable Number of Parameters

It is possible to tell MASM to allow a variable number of arguments in a macro invocation:

varParms  macro varying:vararg 

      Macro body

          endm 
           . 
           . 
           . 
          varParms 1 
          varParms 1, 2 
          varParms 1, 2, 3 
          varParms

Within the macro, MASM will create a text object of the form <arg1, arg2, ..., argn> and assign this text object to the associated parameter name (varying, in the preceding example). You can use the MASM for loop to extract the individual values of the varying argument. For example:

varParms  macro varying:vararg 
          for   curArg, <varying>
          byte  curArg
          endm  ; End of FOR loop
          endm  ; End of macro
  
          varParms 1 
          varParms 1, 2 
          varParms 1, 2, 3
          varParms <5 dup (?)>

Here’s the listing output for an assembly containing this example source code:

 00000000                        .data
                       varParms  macro varying:vararg
                                 for   curArg, <varying>
                                 byte  curArg
                                 endm  ; End of FOR loop
                                 endm  ; End of macro

                                 varParms 1
 00000000  01         2          byte  1
                                 varParms 1, 2
 00000001  01         2          byte  1
 00000002  02         2          byte  2
                                 varParms 1, 2, 3
 00000003  01         2          byte  1
 00000004  02         2          byte  2
 00000005  03         2          byte  3
                                 varParms <5 dup (?)>
 00000006  00000005 [ 2          byte  5 dup (?)
            00
           ]

A macro can have, at most, one vararg parameter. If a macro has more than one parameter and also has a vararg parameter, the vararg parameter must be the last argument.
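
The following sketch combines one ordinary parameter with a trailing vararg parameter. The defTable macro and the primes label are hypothetical names chosen only for illustration.

```masm
; defTable (hypothetical): the fixed tblName parameter comes
; first; the vararg parameter collects everything after it.

defTable macro  tblName, values:vararg
tblName  label  dword
         for    curVal, <values>
         dword  curVal
         endm   ; End of FOR loop
         endm   ; End of macro

         .data
         defTable primes, 2, 3, 5, 7

; The invocation above expands to:
;
; primes  label  dword
;         dword  2
;         dword  3
;         dword  5
;         dword  7
```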

13.9.5 The Macro Expansion (&) Operator

Inside a macro, you can use the & operator to replace a macro parameter name (or other text symbol) with its actual value. This operator is active anywhere, even within string literals. Consider the following example:

expand      macro   parm
            byte    '&parm', 0
            endm    
            
            .data
            expand  a

The macro invocation in this example expands to the following code:

byte 'a', 0

If, for some reason, you need the string '&parm' to be emitted within a macro (that has parm as one of its parameters), you will have to work around the expansion operator. Note that '!&parm' will not escape the & operator. One solution that works in this specific case is to rewrite the byte directive:

expand      macro   parm
            byte    '&', 'parm', 0
            endm

Now the & operator is not causing the expansion of parm inside a string.
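
Beyond string literals, a common use of the & operator is to splice a parameter into a longer identifier. The sketch below assumes a hypothetical defCounter macro; the symbol names it creates are illustrative only.

```masm
; defCounter (hypothetical): builds two symbol names from
; the text passed for baseName. The & marks where the
; parameter name ends and the literal suffix begins.

defCounter macro  baseName
baseName&_count   dword  0
baseName&_name    byte   '&baseName', 0
           endm

           .data
           defCounter widget

; The invocation above expands to:
;
; widget_count    dword  0
; widget_name     byte   'widget', 0
```

Without the & operator, MASM would treat baseName_count as a single identifier and would not substitute the parameter.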

13.10 Local Symbols in a Macro

Consider the following macro declaration:

jzc    macro  target

       jnz    NotTarget 
       jc     target 
NotTarget: 
       endm

This macro simulates an instruction that jumps to the specified target location if the zero flag is set and the carry flag is set. Conversely, if either the zero flag or the carry flag is clear, this macro transfers control to the instruction immediately following the macro invocation.

There is a serious problem with this macro. Consider what happens if you use this macro more than once in your program:

jzc Dest1 
  . 
  . 
  . 
jzc Dest2 
  . 
  . 
  .

The preceding macro invocations expand to the following code:

         jnz NotTarget 
         jc Dest1 
NotTarget: 
          . 
          . 
          . 
         jnz NotTarget 
         jc Dest2 
NotTarget: 
          . 
          . 
          .

These two macro invocations both emit the same label, NotTarget, during macro expansion. When MASM processes this code, it will complain about a duplicate symbol definition.

MASM’s solution to this problem is to allow the use of local symbols within a macro. Local macro symbols are unique to a specific invocation of a macro. You must explicitly tell MASM which symbols must be local by using the local directive:

macro_name    macro  optional_parameters 
              local  list_of_local_names
         Macro body
              endm

The list_of_local_names is a sequence of one or more MASM identifiers separated by commas. Whenever MASM encounters one of these names in a particular macro invocation, it automatically substitutes a unique name for that identifier. For each macro invocation, MASM substitutes a different name for the local symbol.

You can correct the problem with the jzc macro by using the following macro code:

jzc      macro   target
         local   NotTarget

         jnz     NotTarget
         jc      target
NotTarget: 

         endm

Now whenever MASM processes this macro, it will automatically associate a unique symbol with each occurrence of NotTarget. This will prevent the duplicate symbol error that occurs if you do not declare NotTarget as a local symbol.

MASM generates symbols of the form ??nnnn, where nnnn is a (unique) four-digit hexadecimal number, for each local symbol. So, if you see symbols such as ??0000 in your assembly listings, you know where they came from.

A macro definition can have multiple local directives, each with its own list of local names. However, if you have multiple local statements in a macro, they should all immediately follow the macro directive.
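
As a further sketch of the local directive, here is a hypothetical absRAX macro (the name is an assumption made for this example) that needs a private branch target:

```masm
; absRAX (hypothetical): replace RAX with its absolute value.

absRAX  macro
        local  isNonNeg
        test   rax, rax
        jns    isNonNeg     ; Skip the neg if RAX >= 0
        neg    rax
isNonNeg:
        endm

        absRAX              ; isNonNeg becomes one ??nnnn symbol
        absRAX              ; isNonNeg becomes a different ??nnnn symbol
```

Because each invocation receives its own generated label, the macro can appear any number of times in the same source file.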

13.11 The exitm Directive

The MASM exitm directive (which may appear only within a macro) tells MASM to immediately terminate the processing of the macro. MASM will ignore any additional lines of text within the macro. If you think of a macro as a procedure, exitm is the return statement.

The exitm directive is useful in a conditional assembly sequence. Perhaps after checking for the presence (or absence) of certain macro arguments, you might want to stop processing the macro to avoid additional errors from MASM. For example, consider the earlier neg128 macro:

neg128  macro reg64HO, reg64LO

        ifb   <reg64LO>
        .err  <neg128 requires 2 operands>
        exitm
        endif

        neg   reg64HO
        neg   reg64LO
        sbb   reg64HO, 0
        endm

Without the exitm directive inside the conditional assembly, this macro would attempt to assemble the neg reg64LO instruction, generating another error because reg64LO expands to the empty string.

13.12 MASM Macro Function Syntax

Originally, MASM’s macro design allowed programmers to create substitute mnemonics. A programmer could use a macro to replace a machine instruction or other statement (or sequence of statements) in an assembly language source file. Macros could create only whole lines of output text in the source file. This prevented programmers from using macro invocations such as the following:

mov rax, some_macro_invocation(arguments)

Today, MASM supports additional syntax that allows you to create macro functions. A MASM macro function definition looks exactly like a normal macro definition with one addition: you use an exitm directive with a textual argument to return a function result from the macro. Consider the upperCase macro function in Listing 13-4.

; Listing 13-4
 
; Macro function demonstration program.

        option  casemap:none

nl          =       10

            .const
ttlStr      byte    "Listing 13-4", 0
           
; upperCase macro function.
 
; Converts text argument to a string, converting
; all lowercase characters to uppercase.

upperCase   macro   theString
            local   resultString, thisChar, sep
resultStr   equ     <> ; Initialize function result with ""
sep         textequ <> ; Initialize separator char with ""

            forc    curChar, theString
            
; Check to see if the character is lowercase.
; Convert it to uppercase if it is, otherwise
; output it to resultStr as is. Concatenate the
; current character to the end of the result string
; (with a ", " separator, if this isn't the first
; character appended to resultStr).

            if      ('&curChar' GE 'a') and ('&curChar' LE 'z')
resultStr   catstr  resultStr, sep, %'&curChar'-32
            else
resultStr   catstr  resultStr, sep, %'&curChar'
            endif
            
; First time through, sep is the empty string. For all
; other iterations, sep is the comma separator between
; values.

sep         textequ <, >
            endm    ; End for
            
            exitm   <resultStr>
            endm    ; End macro

; Demonstration of the upperCase macro function:
            
            .data
chars       byte    "Demonstration of upperCase "
            byte    "macro function:"
            byte    upperCase(<abcdEFG123>), nl, 0
            
            .code
            externdef printf:proc
            
; Return program title to C++ program:

            public  getTitle
getTitle    proc
            lea     rax, ttlStr
            ret
getTitle    endp
                    
; Here is the "asmMain" function.

            public  asmMain
asmMain     proc
            push    rbx
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56         ; Shadow storage

            lea     rcx, chars      ; Prints characters converted to uppercase
            call    printf

allDone:    leave
            pop     rbx
            ret     ; Returns to caller
asmMain     endp
            end

Listing 13-4: Sample macro function

Whenever you invoke a MASM macro function, you must always follow the macro name with a pair of parentheses enclosing the macro’s arguments. Even if the macro has no arguments, an empty pair of parentheses must be present. This is how MASM differentiates standard macros and macro functions.
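
A much smaller macro function may make the invocation syntax easier to see. The max2 function and the bufSize symbol below are hypothetical; this is only a sketch of the exitm-with-argument technique, assuming both arguments are constant expressions.

```masm
; max2 (hypothetical): compile-time function that returns
; the text of the larger of two constant expressions.

max2    macro  a, b
        if     (a) GT (b)
        exitm  <(a)>
        endif
        exitm  <(b)>
        endm

bufSize =      max2(64, 100)       ; Same as: bufSize = (100)
        mov    ecx, max2(3 + 1, 2) ; Same as: mov ecx, (3 + 1)
```

The parentheses around a and b guard against the operator-precedence problem that textual parameter expansion can cause.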

Earlier versions of MASM included functions for directives such as sizestr (using the name @sizestr). Recent versions of MASM have removed these functions. However, you can easily write your own macro functions to replace these missing functions. Here’s a quick replacement for the @sizestr function:

; @sizestr - Replacement for the MASM @sizestr function
;            that Microsoft removed from MASM.

@sizestr    macro   theStr
            local   theLen
theLen      sizestr <theStr>
            exitm   <&theLen>
            endm

The & operator in the exitm directive forces the @sizestr macro to expand the text associated with the theLen local symbol inside the < and > string delimiters before returning the value to whoever invoked the macro function. Without the & operator, the @sizestr macro will return text of the form ??0002 (the unique symbol MASM creates for the local symbol theLen).

13.13 Macros as Compile-Time Procedures and Functions

Although programmers typically use macros to expand to a sequence of machine instructions, there is absolutely no requirement that a macro body contain any executable instructions. Indeed, many macros contain only compile-time language statements (for example, if, while, for, = assignments, and the like). By placing only compile-time language statements in the body of a macro, you can effectively write compile-time procedures and functions using macros.

The following unique macro is a good example of a compile-time function that returns a string result:

unique macro 
       local  theSym
       exitm  <theSym>
       endm

Whenever your code references this macro (as unique(), because it is a macro function), MASM replaces the macro invocation with the unique symbol it generated for theSym. MASM generates unique symbols such as ??0000 for local macro symbols. Therefore, successive invocations of the unique macro will generate a sequence of symbols such as ??0000, ??0001, ??0002, and so forth.

13.14 Writing Compile-Time “Programs”

The MASM compile-time language allows you to write short programs that write other programs—in particular, to automate the creation of large or complex assembly language sequences. The following subsections provide simple examples of such compile-time programs.

13.14.1 Constructing Data Tables at Compile Time

Earlier, this book suggested that you could write programs to generate large, complex lookup tables for your assembly language programs (see the discussion of tables in “Generating Tables” in Chapter 10). Chapter 10 provides C++ programs that generate tables to paste into assembly programs. In this section, we will use the MASM compile-time language to construct data tables during assembly of the program that uses the tables.

One common use for the compile-time language is to build ASCII character lookup tables for alphabetic case manipulation with the xlat instruction at runtime. Listing 13-5 demonstrates how to construct an uppercase conversion table and a lowercase conversion table.⁴ Note the use of a macro as a compile-time procedure to reduce the complexity of the table-generating code.

; Listing 13-5
 
; Creating lookup tables with macros.

        option  casemap:none

nl          =       10

            .const
ttlStr      byte    "Listing 13-5", 0
fmtStr1     byte    "testString converted to UC:", nl
            byte    "%s", nl, 0
            
fmtStr2     byte    "testString converted to LC:", nl
            byte    "%s", nl, 0

testString  byte    "This is a test string ", nl
            byte    "Containing UPPERCASE ", nl
            byte    "and lowercase chars", nl, 0

emitChRange macro   start, last
            local   index, resultStr
index       =       start
            while   index lt last
            byte    index
index       =       index + 1
            endm
            endm

; Lookup table that will convert lowercase
; characters to uppercase. The byte at each
; index contains the value of that index,
; except for the bytes at indexes "a" to "z".
; Those bytes contain the values "A" to "Z".
; Therefore, if a program uses an ASCII
; character's numeric value as an index
; into this table and retrieves that byte,
; it will convert the character to uppercase.

lcToUC      equ             this byte
            emitChRange     0, 'a'
            emitChRange     'A', %'Z'+1
            emitChRange     %'z'+1, 100h    ; Upper bound is exclusive

; As above, but this table converts uppercase
; to lowercase characters.
            
UCTolc      equ             this byte
            emitChRange     0, 'A'
            emitChRange     'a', %'z'+1
            emitChRange     %'Z'+1, 100h    ; Upper bound is exclusive

            .data

; Store the destination strings here:

toUC        byte    256 dup (0)
TOlc        byte    256 dup (0)     

            .code
            externdef printf:proc
            
; Return program title to C++ program:

            public  getTitle
getTitle    proc
            lea     rax, ttlStr
            ret
getTitle    endp

; Here is the "asmMain" function.

            public  asmMain
asmMain     proc
            push    rbx
            push    rdi
            push    rsi
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56         ; Shadow storage
            
; Convert the characters in testString to uppercase:

            lea     rbx, lcToUC
            lea     rsi, testString
            lea     rdi, toUC
            jmp     getUC
            
toUCLp:     xlat
            mov     [rdi], al
            inc     rsi
            inc     rdi
getUC:      mov     al, [rsi]
            cmp     al, 0
            jne     toUCLp
            
; Display the converted string:

            lea     rcx, fmtStr1
            lea     rdx, toUC
            call    printf
                    
; Convert the characters in testString to lowercase:

            lea     rbx, UCTolc
            lea     rsi, testString
            lea     rdi, TOlc
            jmp     getLC
            
toLCLp:     xlat
            mov     [rdi], al
            inc     rsi
            inc     rdi
getLC:      mov     al, [rsi]
            cmp     al, 0
            jne     toLCLp
            
; Display the converted string:

            lea     rcx, fmtStr2
            lea     rdx, TOlc
            call    printf
                              
allDone:    leave
            pop     rsi
            pop     rdi
            pop     rbx
            ret     ; Returns to caller
asmMain     endp
            end

Listing 13-5: Generating case-conversion tables with the compile-time language

Here’s the build command and sample output for the program in Listing 13-5:

C:\>build listing13-5

C:\>echo off
 Assembling: listing13-5.asm
c.cpp

C:\>listing13-5
Calling Listing 13-5:
testString converted to UC:
THIS IS A TEST STRING
CONTAINING UPPERCASE
AND LOWERCASE CHARS

testString converted to LC:
this is a test string
containing uppercase
and lowercase chars

Listing 13-5 terminated
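
The same approach works for tables whose entries are computed rather than copied. As a minimal sketch (the squares table and the sqIndex symbol are hypothetical names), the following fragment builds a 16-entry table in which entry i holds i * i:

```masm
; Hypothetical 16-entry table: squares[i] = i * i.

            .const
squares     label   dword
sqIndex     =       0
            while   sqIndex LT 16
            dword   sqIndex * sqIndex
sqIndex     =       sqIndex + 1
            endm
```

At runtime, code could load the address of squares into a register with lea and then index the table with a scaled-index addressing mode such as [rdx][rbx * 4].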

13.14.2 Unrolling Loops

Chapter 7 points out that you can unroll loops to improve the performance of certain assembly language programs. However, this requires a lot of extra typing, especially if you have many loop iterations. Fortunately, MASM’s compile-time language facilities, especially the while loop, come to the rescue. With a small amount of extra typing plus one copy of the loop body, you can unroll a loop as many times as you please.

If you simply want to repeat the same code sequence a certain number of times, unrolling the code is especially trivial. All you have to do is wrap a MASM while..endm loop around the sequence and count off the specified number of iterations. For example, if you wanted to print Hello World 10 times, you could encode this as follows:

count = 0
while count LT 10
     call print
     byte "Hello World", nl, 0 

count = count + 1
endm

Although this code looks similar to a high-level language while loop, remember the fundamental difference: the preceding code simply consists of 10 straight calls to print in the program. Were you to encode this using an actual loop, there would be only one call to print and lots of additional logic to loop back and execute that single call 10 times.

Unrolling loops becomes slightly more complicated if any instructions in that loop refer to the value of a loop control variable or another value that changes with each iteration of the loop. A typical example is a loop that zeroes the elements of an integer array:

        xor eax, eax   ; Set EAX and RBX to 0
        xor rbx, rbx
        lea rcx, array
whlLp:  cmp rbx, 20
        jae loopDone
        mov [rcx][rbx * 4], eax
        inc rbx
        jmp whlLp

loopDone:

In this code fragment, the loop uses the value of the loop control variable (in RBX) to index into array. Simply copying mov [rcx][rbx * 4], eax 20 times is not the proper way to unroll this loop. You must substitute an appropriate constant index in the range 0 to 76 (the corresponding loop indices, times 4) in place of rbx * 4 in this example. Correctly unrolling this loop should produce the following code sequence:

mov  [rcx][0 * 4], eax
mov  [rcx][1 * 4], eax
mov  [rcx][2 * 4], eax
mov  [rcx][3 * 4], eax
mov  [rcx][4 * 4], eax
mov  [rcx][5 * 4], eax
mov  [rcx][6 * 4], eax
mov  [rcx][7 * 4], eax
mov  [rcx][8 * 4], eax
mov  [rcx][9 * 4], eax
mov [rcx][10 * 4], eax 
mov [rcx][11 * 4], eax 
mov [rcx][12 * 4], eax 
mov [rcx][13 * 4], eax 
mov [rcx][14 * 4], eax 
mov [rcx][15 * 4], eax 
mov [rcx][16 * 4], eax 
mov [rcx][17 * 4], eax 
mov [rcx][18 * 4], eax 
mov [rcx][19 * 4], eax

You can easily do this using the following compile-time code sequence:

iteration = 0
while iteration LT 20 
     mov [rcx][iteration * 4], eax
     iteration = iteration + 1
endm

If the statements in a loop use the loop control variable’s value, it is possible to unroll such loops only if those values are known at compile time. You cannot unroll loops when user input (or other runtime information) controls the number of iterations.

Of course, if the code sequence loaded RCX with the address of array immediately prior to this loop, you could also use the following while loop to save the use of the RCX register:

iteration = 0
while iteration LT 20 
     mov array[iteration * 4], eax
     iteration = iteration + 1
endm
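
The same pattern extends to loop bodies with more than one instruction. The following sketch unrolls a four-element copy; the src and dst arrays and the copyIdx symbol are assumptions introduced only for this example.

```masm
; Hypothetical arrays used only for this illustration.

            .data
src         qword   4 dup (?)
dst         qword   4 dup (?)

            .code

; Place the following inside a procedure. It emits four
; load/store pairs with constant offsets 0, 8, 16, and 24.

copyIdx     =       0
            while   copyIdx LT 4
            mov     rax, src[copyIdx * 8]
            mov     dst[copyIdx * 8], rax
copyIdx     =       copyIdx + 1
            endm
```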

13.15 Simulating HLL Procedure Calls

Calling procedures (functions) in assembly language is a real chore. Loading registers with parameters, pushing values onto the stack, and other activities are a complete distraction. High-level language procedure calls are far more readable and easier to write than the same calls to an assembly language function. Macros provide a good mechanism to call procedures and functions in a high-level-like manner.

13.15.1 HLL-Like Calls with No Parameters

Of course, the most trivial example is a call to an assembly language procedure that has no arguments at all:

someProc  macro
          call    _someProc
          endm

_someProc proc
            .
            .
            .
_someProc endp
            .
            .
            .
          someProc   ; Call the procedure

This simple example demonstrates a couple of conventions this book will use for calling procedures via macro invocation:

If the procedure and all calls to the procedure occur within the same source file, place the macro definition immediately before the procedure to make it easy to find. (Chapter 15 discusses the placement of the macro if you call the procedure from several different source files.)
If you would normally name the procedure someProc, change the procedure’s name to _someProc and then use someProc as the macro name.

While the advantage to using a macro invocation of the form someProc versus a call to the procedure using call someProc might seem somewhat dubious, keeping all procedure calls consistent (by using macro invocations for all of them) helps make your programs more readable.

13.15.2 HLL-Like Calls with One Parameter

The next step up in complexity is to call a procedure with a single parameter. Assuming you’re using the Microsoft ABI and passing the parameter in RCX, the simplest solution is something like the following:

someProc  macro   parm1
          mov     rcx, parm1
          call    _someProc
          endm
           .
           .
           .
          someProc Parm1Value

This macro works well if you’re passing a 64-bit integer by value. If the parameter is an 8-, 16-, or 32-bit value, you would swap CL, CX, or ECX for RCX in the mov instruction.⁵

If you’re passing the first argument by reference, you would swap an lea instruction for the mov instruction in this example. As reference parameters are always 64-bit values, the lea instruction would usually take this form:

lea     rcx, parm1

Finally, if you’re passing a real4 or real8 value as the parameter, you’d swap one of the following instructions for the mov instruction in the previous macro:

movss  xmm0, parm1  ; Use this for real4 parameters
movsd  xmm0, parm1  ; Use this for real8 parameters

As long as the actual parameter is a memory variable or an appropriate integer constant, this simple macro definition works quite well, covering a very large percentage of the real-world cases.

For example, to call the C Standard Library printf() function with a single argument (the format string) using the current macro scheme, you’d write the macro as follows:⁶

cprintf  macro  parm1
         lea    rcx, parm1
         call   printf
         endm

So you can invoke this macro as

cprintf fmtStr

where fmtStr is (presumably) the name of a byte object in your .data section containing the printf format string.

For a more high-level-like syntax for our procedure calls, we should allow something like the following:

cprintf "This is a printf format string"

Unfortunately, the way the macro is currently written, this will generate the following (syntactically incorrect) statement:

lea   rcx, "This is a printf format string"

We could modify this macro to allow this invocation by rewriting it as follows:

cprintf  macro  parm1
         local  fmtStr
         .data
fmtStr   byte   parm1, nl, 0
         .code
         lea    rcx, fmtStr
         call   printf
         endm

Invoking this macro by using a string constant as the argument expands to the following code:

         .data
fmtStr   byte   "This is a printf format string", nl, 0
         .code
         lea    rcx, fmtStr  ; Technically, fmtStr will really be something
         call   printf       ; like ??0001

The only problem with this new form of the macro is that it no longer accepts invocations such as

cprintf fmtStr

where fmtStr is a byte object in the .data section. We’d really like to have a macro that can accept both forms.

13.15.3 Using opattr to Determine Argument Types

The trick to this is the opattr operator (see Table 4-1 in Chapter 4). This operator returns an integer value with certain bits set based on the type of expression that follows. In particular, bit 2 will be set if the expression following is an immediate (constant) value, whereas an expression that is relocatable or otherwise references memory sets bit 1 instead. Therefore, bit 2 will be clear if a variable such as fmtStr appears as the argument, and it will be set if you pass a short string literal as the argument (opattr actually returns the value 0 for string literals that are longer than 8 characters, just so you know). Now consider the code in Listing 13-6.

; Listing 13-6
 
; opattr demonstration.

        option  casemap:none

nl          =       10

            .const
ttlStr      byte    "Listing 13-6", 0
           
fmtStr      byte    nl, "Hello, World! #2", nl, 0

            .code
            externdef printf:proc
            
; Return program title to C++ program:

            public  getTitle
getTitle    proc
            lea     rax, ttlStr
            ret
getTitle    endp

; cprintf macro:
 
;           cprintf fmtStr
;           cprintf "Format String"

cprintf     macro   fmtStrArg
            local   fmtStr, attr, isConst
            
attr        =       opattr fmtStrArg
isConst     =       (attr and 4) eq 4
            if      (attr eq 0) or isConst
            .data   
fmtStr      byte    fmtStrArg, nl, 0
            .code
            lea     rcx, fmtStr
            
            else
            
            lea     rcx, fmtStrArg
            
            endif
            call    printf
            endm
 
atw         =       opattr "Hello World"
bin         =       opattr "abcdefghijklmnopqrstuvwxyz"
            
; Here is the "asmMain" function.

            public  asmMain
asmMain     proc
            push    rbx
            push    rdi
            push    rsi
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56         ; Shadow storage

            cprintf "Hello World!"
            cprintf fmtStr
             
allDone:    leave
            pop     rsi
            pop     rdi
            pop     rbx
            ret     ; Returns to caller
asmMain     endp
            end

Listing 13-6: opattr operator in a macro

Here’s the build command and sample output for Listing 13-6:

C:\>build listing13-6

C:\>echo off
 Assembling: listing13-6.asm
c.cpp

C:\>listing13-6
Calling Listing 13-6:
Hello World!
Hello, World! #2
Listing 13-6 terminated

This cprintf macro is far from perfect. For example, the C/C++ printf() function allows multiple arguments that this macro does not handle. But this macro does demonstrate how to handle two different calls to printf based on the type of the argument you pass cprintf.
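To make the branch selection concrete, here is a hand trace of how the if test in cprintf should resolve. This is a sketch rather than assembler output: the attribute values are taken from the opattr discussion in this chapter (Table 13-3), and the "Hi" invocation is a hypothetical third case that does not appear in Listing 13-6.

```masm
; Hand trace of the test "(attr eq 0) or isConst" in cprintf
; (comments only; nothing here is assembled).
;
;   cprintf "Hello World!"   attr = 0     literal longer than 8 chars
;                            (attr eq 0) is true:
;                            emit the string to .data, lea rcx, <local>
;
;   cprintf fmtStr           attr = 2Ah   data symbol, bit 2 clear
;                            both tests are false:
;                            lea rcx, fmtStr
;
;   cprintf "Hi"             attr = 24h   short literal = constant
;   (hypothetical)           isConst is true:
;                            emit the string to .data, lea rcx, <local>
```

The test for 0 is what makes long literals work; testing bit 2 alone would send them down the else branch and produce an illegal lea instruction.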

13.15.4 HLL-Like Calls with a Fixed Number of Parameters

Expanding the macro-calling mechanism from one parameter to two or more (assuming a fixed number of parameters) is fairly easy. All you need to do is add more formal parameters and handle those arguments in your macro definition. Listing 13-7 is a modification of Listing 9-11 in Chapter 9 that uses macro invocations for calls to r10ToStr, e10ToStr, and some fixed calls to printf (for brevity, as this is a very long program, only the macros and a few invocations are included).

           .
           .     ; About 1200 lines from Listing 9-10.
           .

; r10ToStr - Macro to create an HLL-like call for the 
;            _r10ToStr procedure.

; Parameters:

;   r10    - Must be the name of a real4, real8, or 
;            real10 variable.
;   dest   - Must be the name of a byte buffer to hold 
;            string result.

;   wdth   - Output width for the string. Either an
;            integer constant or a dword variable.

;   dPts   - Number of positions after the decimal
;            point. Either an integer constant or
;            a dword variable.

;   fill   - Fill char. Either a character constant
;            or a byte variable.

;   mxLen  - Maximum length of output string. Either
;            an integer constant or a dword variable.

r10ToStr     macro   r10, dest, wdth, dPts, fill, mxLen
             fld     r10
            
; dest is a label associated with a string variable:

             lea     rdi, dest
            
; wdth is either a constant or a dword var:

             mov     eax, wdth
            
; dPts is either a constant or a dword var
; holding the number of decimal point positions:

            mov     edx, dPts
            
; Process fill character. If it's a constant, 
; directly load it into ECX (which zero-extends
; into RCX). If it's a variable, then move with
; zero extension into ECX (which also zero-
; extends into RCX).
 
; Note: bit 2 from opattr is 1 if fill is 
; a constant.
            
            if      ((opattr fill) and 4) eq 4
            mov     ecx, fill
            else
            movzx   ecx, fill
            endif

; mxLen is either a constant or a dword var.
            
            mov     r8d, mxLen
            call    _r10ToStr
            endm

; e10ToStr - Macro to create an HLL-like call for the 
;            _e10ToStr procedure.

; Parameters:

;   e10   - Must be the name of a real4, real8, or 
;           real10 variable.
;   dest  - Must be the name of a byte buffer to hold 
;           string result.

;   wdth  - Output width for the string. Either an
;           integer constant or a dword variable.

;   xDigs - Number of exponent digits.

;   fill  - Fill char. Either a character constant
;           or a byte variable.

;   mxLen - Maximum length of output string. Either
;           an integer constant or a dword variable.

e10ToStr    macro   e10, dest, wdth, xDigs, fill, mxLen
            fld     e10
            
; dest is a label associated with a string variable:

            lea     rdi, dest
            
; wdth is either a constant or a dword var:

            mov     eax, wdth
            
; xDigs is either a constant or a dword var
; holding the number of exponent digits:

            mov     edx, xDigs
            
; Process fill character. If it's a constant, 
; directly load it into ECX (which zero-extends
; into RCX). If it's a variable, then move with
; zero extension into ECX (which also zero-
; extends into RCX).

; Note: bit 2 from opattr is 1 if fill is 
; a constant.
            
            if      ((opattr fill) and 4) eq 4
            mov     ecx, fill
            else
            movzx   ecx, fill
            endif

; mxLen is either a constant or a dword var.
            
            mov     r8d, mxLen
            call    _e10ToStr
            endm
 
; puts - A macro to print a string using printf.
 
; Parameters:
 
;   fmt    - Format string (must be a byte
;            variable; the macro loads its
;            address with lea).

;   theStr - String to print (must be a
;            byte variable, a register,
;            or a string constant).

puts         macro   fmt, theStr
             local   strConst, bool
            
             lea     rcx, fmt
            
             if      ((opattr theStr) and 2)
            
; If memory operand:

             lea     rdx, theStr
            
             elseif  ((opattr theStr) and 10h)
            
; If register operand:

             mov     rdx, theStr
            
             else 
            
; Assume it must be a string constant.

            .data
strConst    byte    theStr, 0
            .code
            lea     rdx, strConst
            
            endif
            
            call    printf
            endm
        
            public  asmMain
asmMain     proc
            push    rbx
            push    rsi
            push    rdi
            push    rbp
            mov     rbp, rsp
            sub     rsp, 64         ; Shadow storage
            
; F output:

            r10ToStr r10_1, r10str_1, 30, 16, '*', 32
            jc      fpError
            puts    fmtStr1, r10str_1
            
            r10ToStr r10_1, r10str_1, 30, 15, '*', 32
            jc      fpError
            puts    fmtStr1, r10str_1
             .
             .    ; Similar code to Listing 9-10 with macro
             .    ; invocations rather than procedure calls.
; E output:

            e10ToStr e10_1, r10str_1, 26, 3, '*', 32
            jc      fpError
            puts    fmtStr3, r10str_1

            e10ToStr e10_2, r10str_1, 26, 3, '*', 32
            jc      fpError
            puts    fmtStr3, r10str_1
             .
             .    ; Similar code to Listing 9-10 with macro
             .    ; invocations rather than procedure calls.

Listing 13-7: Macro call implementation for converting floating-point values to strings

Compare the HLL-like calls to these three functions against the original procedure calls in Listing 9-11:

; F output:

fld     r10_1
lea     rdi, r10str_1
mov     eax, 30         ; fWidth
mov     edx, 16         ; decimalPts
mov     ecx, '*'        ; Fill
mov     r8d, 32         ; maxLength
call    r10ToStr
jc      fpError

lea     rcx, fmtStr1
lea     rdx, r10str_1
call    printf

fld     r10_1
lea     rdi, r10str_1
mov     eax, 30         ; fWidth
mov     edx, 15         ; decimalPts
mov     ecx, '*'        ; Fill
mov     r8d, 32         ; maxLength
call    r10ToStr
jc      fpError

lea     rcx, fmtStr1
lea     rdx, r10str_1
call    printf
.
.   ; Additional code from Listing 9-10.
.
; E output:

fld     e10_1
lea     rdi, r10str_1
mov     eax, 26         ; fWidth
mov     edx, 3          ; expDigits
mov     ecx, '*'        ; Fill
mov     r8d, 32         ; maxLength
call    e10ToStr
jc      fpError

lea     rcx, fmtStr3
lea     rdx, r10str_1
call    printf

fld     e10_2
lea     rdi, r10str_1
mov     eax, 26         ; fWidth
mov     edx, 3          ; expDigits
mov     ecx, '*'        ; Fill
mov     r8d, 32         ; maxLength
call    e10ToStr
jc      fpError

lea     rcx, fmtStr3
lea     rdx, r10str_1
call    printf
.
.   ; Additional code from Listing 9-10.
.

Clearly, the macro version is easier to read (and, as it turns out, easier to debug and maintain too).
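The puts macro in Listing 13-7 picks one of three code sequences based on the opattr bits of its second argument. The following hand expansions sketch what each case should look like. They are reconstructed from the macro definition rather than taken from a MASM listing file; the rsi and "1.5" invocations are hypothetical, and ??0000 stands in for whatever unique name MASM generates for the local symbol strConst.

```masm
; puts fmtStr1, r10str_1     ; Memory operand (opattr bit 1 set)

            lea     rcx, fmtStr1
            lea     rdx, r10str_1
            call    printf

; puts fmtStr1, rsi          ; Register operand (opattr bit 4 set)

            lea     rcx, fmtStr1
            mov     rdx, rsi
            call    printf

; puts fmtStr1, "1.5"        ; Neither bit set: assume a string constant

            lea     rcx, fmtStr1
            .data
??0000      byte    "1.5", 0
            .code
            lea     rdx, ??0000
            call    printf
```

Note that the macro tests for a memory operand first, so a string constant is simply whatever is left after the memory and register tests fail.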

13.15.5 HLL-Like Calls with a Varying Parameter List

Some procedures expect a varying number of parameters; the C/C++ printf() function is a good example. Some procedures, though they might support only a fixed number of arguments, could be better written using a varying argument list. For example, consider the print procedure that has appeared throughout the examples in this book; its string parameter (which follows the call to print in the code stream) is, technically, a single-string argument. Consider the following macro implementation for a call to print:

print       macro   arg
            call    _print
            byte    arg, 0
            endm

You could invoke this macro as follows:

print  "Hello, World!"

The only problem with this macro is that you will often want to supply multiple arguments in its invocation, such as this:

print  "Hello, World!", nl, "It's a great day!", nl

Unfortunately, this macro will not accept this list of parameters. However, this seems like a natural use of the print macro, so it makes a lot of sense to modify the print macro to handle multiple arguments and combine them as a single string after the call to the _print function. Listing 13-8 provides such an implementation.

; Listing 13-8

; HLL-like procedure calls with
; a varying parameter list.

        option  casemap:none

nl          =       10

            .const
ttlStr      byte    "Listing 13-8", 0

            .code
            externdef printf:proc
            
            include getTitle.inc
            
; Note: don't include print.inc here
; because this code uses a macro for
; print.

; print macro - HLL-like calling sequence for the _print
;               function (which is, itself, a shell for
;               the printf function).

; If print appears on a line by itself (no arguments), 
; then emit a string consisting of a single newline 
; character (and zero-terminating byte). If there are 
; one or more arguments, emit each argument and append 
; a single 0 byte after all the arguments.

; Examples:

;           print
;           print   "Hello, World!"
;           print   "Hello, World!", nl

print       macro   arg1, optArgs:vararg
            call    _print
            
            ifb     <arg1>

; If print is used by itself, print a
; newline character:
            
            byte    nl, 0
            
            else
            
; If we have one or more arguments, then
; emit each of them:
            
            byte    arg1

            for     oa, <optArgs>
            
            byte    oa
            
            endm

; Zero-terminate the string.

            byte    0
            
            endif
            endm
 
_print      proc
            push    rax
            push    rbx
            push    rcx
            push    rdx
            push    r8
            push    r9
            push    r10
            push    r11
            
            push    rbp
            mov     rbp, rsp
            sub     rsp, 40
            and     rsp, -16
            
            mov     rcx, [rbp + 72]   ; Return address
            call    printf
            
            mov     rcx, [rbp + 72]
            dec     rcx
skipTo0:    inc     rcx
            cmp     byte ptr [rcx], 0
            jne     skipTo0
            inc     rcx
            mov     [rbp + 72], rcx
            
            leave
            pop     r11
            pop     r10
            pop     r9
            pop     r8
            pop     rdx
            pop     rcx
            pop     rbx
            pop     rax
            ret
_print      endp
              
p           macro   arg
            call    _print
            byte    arg, 0
            endm      
            
; Here is the "asmMain" function.
        
            public  asmMain
asmMain     proc
            push    rbx
            push    rdi
            push    rsi
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56         ; Shadow storage

            print   "Hello world"
            print
            print   "Hello, World!", nl
            
allDone:    leave
            pop     rsi
            pop     rdi
            pop     rbx
            ret     ; Returns to caller
asmMain     endp
            end

Listing 13-8: Varying arguments’ implementation of print macro

Here’s the build command and output for the program in Listing 13-8:

C:\>build listing13-8

C:\>echo off
 Assembling: listing13-8.asm
c.cpp

C:\>listing13-8
Calling Listing 13-8:
Hello world
Hello, World!
Listing 13-8 terminated

With this new print macro, you can now call the _print procedure in an HLL-like fashion by simply listing the arguments in the print invocation:

print "Hello World", nl, "How are you today?", nl

This will generate a sequence of byte directives (one per argument, followed by a zero terminator) that together form a single string after the call to _print.
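As a sketch of that expansion (worked out by hand from the macro definition in Listing 13-8, not copied from a MASM listing file), the invocation above should produce code along these lines:

```masm
; Hand expansion of:
;   print "Hello World", nl, "How are you today?", nl

            call    _print
            byte    "Hello World"           ; arg1
            byte    nl                      ; optArgs, first item
            byte    "How are you today?"    ; optArgs, second item
            byte    nl                      ; optArgs, third item
            byte    0                       ; Zero terminator
```

Because the byte directives are adjacent, the assembler lays the data out contiguously, so _print sees one zero-terminated string immediately after its return address.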

Note, by the way, that it is possible to pass a string containing multiple arguments to the original (single-argument) version of print. By rewriting the macro invocation

print "Hello World", nl

as

print <"Hello World", nl>

you get the desired output. MASM treats everything between the < and > brackets as a single argument. However, it’s a bit of a pain to have to constantly put these brackets around multiple arguments (and your code is inconsistent, as single arguments don’t require them). The print macro implementation with varying arguments is a much better solution.

13.16 The invoke Macro

At one time, MASM provided a special directive, invoke, that you could use to call a procedure and pass it parameters (it worked with the proc directive to determine the number and type of parameters a procedure expected). When Microsoft modified MASM to support 64-bit code, it removed the invoke statement from the MASM language.

However, some enterprising programmers have written MASM macros to simulate the invoke directive in 64-bit versions of MASM. The invoke macro not only is useful in its own right but also provides a great example of how to write advanced macros to call procedures. For more information on the invoke macro, visit https://www.masm32.com/ and download the MASM32 SDK. This includes a set of macros (and other utilities) for 64-bit programs, including the invoke macro.

13.17 Advanced Macro Parameter Parsing

The previous sections provided examples of macro parameter processing used to determine the type of a macro argument in order to determine the type of code to emit. By carefully examining the attributes of an argument, a macro can make various choices concerning how to deal with that argument. This section presents some more advanced techniques you can use when processing macro arguments.

Clearly, the opattr compile-time operator is one of the most important tools you can use when looking at macro arguments. This operator uses the following syntax:

opattr expression

Note that a generic address expression follows opattr; you are not limited to a single symbol.

The opattr operator returns an integer value that is a bit mask specifying the opattr attributes of the associated expression. If the expression following opattr contains forward-referenced symbols or is an illegal expression, opattr returns 0. Microsoft’s documentation indicates that opattr returns the values shown in Table 13-2.

Table 13-2: opattr Return Values

Bit	Meaning
0	There is a code label in the expression.
1	The expression is relocatable.
2	The expression is a constant expression.
3	The expression uses direct (PC-relative) addressing.
4	The expression is a register.
5	The expression contains no undefined symbols (obsolete).
6	The expression is a stack-segment memory expression.
7	The expression references an external symbol.
8–11	Language type^*
	Value	Language
	0	No language type
	1	C
	2	SYSCALL
	3	STDCALL
	4	Pascal
	5	FORTRAN
	6	BASIC
^* 64-bit code generally doesn’t support a language type, so these bits are usually 0.

Quite honestly, Microsoft’s documentation does not do the best job explaining how MASM sets the bits. For example, consider the following MASM statements:

codeLabel:
opcl       =  opattr codeLabel ; Sets opcl to 25h or 0010_0101b
opconst    =  opattr 0         ; Sets opconst to 24h or 0010_0100b

The opconst has bits 2 and 5 set, just as you would expect from Table 13-2. However, opcl has bits 0, 2, and 5 set; 0 and 5 make sense, but bit 2 (the expression is a constant expression) does not make sense. If, in a macro, you were to test only bit 2 to determine if the operand is a constant (as, I must admit, I have done in earlier examples in this chapter), you could get into trouble when bit 2 is set and you assume that it is a constant.

Probably the wisest thing to do is to mask off bits 0 to 7 (or maybe just bits 0 to 6) and compare the result against an 8-bit value rather than a simple mask. Table 13-3 lists some common values you can test against.

Table 13-3: 8-Bit Values for opattr Results

Value	Meaning
0	Undefined (forward-referenced) symbol or illegal expression
34 / 22h	Memory access of the form `[reg + const]`
36 / 24h	Constant
37 / 25h	Code label (proc name or symbol with a `:` suffix) or `offset code_label` form
38 / 26h	Expression of the form `offset label`, where `label` is a variable in the `.data` section
42 / 2Ah	Global symbol (for example, symbol in `.data` section)
43 / 2Bh	Memory access of the form `[reg + code_label]`, where `code_label` is a proc name or symbol with `:` suffix
48 / 30h	Register (general-purpose, MM, XMM, YMM, ZMM, floating-point/ST, or other special-purpose register)
98 / 62h	Stack-relative memory access (memory addresses of the form `[rsp + xxx]` and `[rbp + xxx]`)
165 / 0A5h	External code symbol (37 / 25h with bit 7 set)
171 / 0ABh	External data symbol (43 / 2Bh with bit 7 set)
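One way to apply this advice is to wrap the comparison in small macro functions. The following is a minimal sketch: isConstArg and isRegArg are hypothetical names (they are not part of MASM or of the listings in this chapter), and the code assumes that the values in Table 13-3 hold for your version of MASM.

```masm
; Hypothetical helpers: compare the low 8 bits of the opattr
; result against Table 13-3 values instead of testing one bit.

isConstArg  macro   arg
            if      ((opattr arg) and 0FFh) eq 24h
            exitm   <1>             ; Integer constant
            endif
            exitm   <0>
            endm

isRegArg    macro   arg
            if      ((opattr arg) and 0FFh) eq 30h
            exitm   <1>             ; Register
            endif
            exitm   <0>
            endm

; Possible use inside a macro that has a "fill" parameter:
;
;           if      isConstArg(fill)
;           mov     ecx, fill
;           else
;           movzx   ecx, fill
;           endif
```

With this test, a code label (25h) no longer passes for a constant, which is the failure mode of checking bit 2 alone. The helpers expect a nonblank argument; a blank one leaves opattr without an expression.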

Perhaps the biggest issue with opattr, as has already been pointed out, is that it believes that constant expressions are integers that can fit into 64 bits. This creates a problem for two important constant types: string literals (longer than 8 characters) and floating-point constants. opattr returns 0 for both.⁸

13.17.1 Checking for String Literal Constants

Although opattr won’t help us determine whether an operand is a string, we can use MASM’s string-processing operations to test the first character of an operand to see if it is a quote. The following code does just that:

; testStr is a macro function that tests its
; operand to see if it is a string literal.
            
testStr     macro   strParm
            local   firstChar
            
            ifnb    <strParm>
firstChar   substr  <strParm>, 1, 1

            ifidn   firstChar,<!">
            
; First character was ", so assume it's
; a string.

            exitm   <1>
            endif   ; ifidn
            endif   ; ifnb
            
; If we get to this point in the macro,
; we definitely do not have a string.

            exitm   <0>
            endm

Consider the following two invocations of the testStr macro:

isAStr  = testStr("String Literal")
notAStr = testStr(someLabel)

MASM will set the symbol isAStr to the value 1, and notAStr to the value 0.
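As a sketch of how testStr might replace the opattr test in a cprintf-style macro, consider the following. The cputs name is hypothetical, and the code assumes that testStr is defined as shown above and that printf is declared external, as in the earlier listings.

```masm
; cputs - Hypothetical macro that accepts either a
;         double-quoted string literal or the name
;         of a byte variable.

cputs       macro   strArg
            local   strLit

            if      testStr(strArg)

; String literal: emit it to .data and load its address.

            .data
strLit      byte    strArg, 0
            .code
            lea     rcx, strLit

            else

; Anything else is assumed to be a memory operand.

            lea     rcx, strArg

            endif
            call    printf
            endm
```

You could then write cputs "Hello" or cputs fmtStr. Because testStr looks only for a leading double quote, a literal delimited by apostrophes would fall into the else branch.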

13.17.2 Checking for Real Constants

Real constants are another literal type that MASM’s opattr operator doesn’t support. Again, writing a macro to test for a real constant can resolve that issue. Sadly, parsing real numbers isn’t as easy as checking for a string constant: there is no single leading character that we can use to say, “Hey, we’ve got a floating-point constant here.” The macro will have to explicitly parse the operand character by character and validate it.

To begin with, here is a grammar that defines a MASM floating-point constant:

Sign     ::= (+|-) 
Digit    ::= [0-9]
Mantissa ::= (Digit)+ | '.' (Digit)+ | (Digit)+ '.' Digit*
Exp      ::= (e|E) Sign? Digit? Digit? Digit?
Real     ::= Sign? Mantissa Exp?

A real number consists of an optional sign followed by a mantissa and an optional exponent. A mantissa contains at least one digit; it can also contain a decimal point with a digit to its left or right (or both). However, a mantissa cannot consist of a decimal point by itself.
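A few worked derivations may help when reading the grammar. They are written as MASM comments and show only what the grammar accepts or rejects, which is not necessarily what every MASM directive accepts.

```masm
; Derivations under the grammar above (comments only).
;
;   -1.5e+10   Sign     = "-"
;              Mantissa = "1.5"   (digits, '.', digits)
;              Exp      = "e+10"  (e, sign, two digits)
;
;   .25        No sign
;              Mantissa = ".25"   ('.' followed by digits)
;              No exponent
;
;   7.         No sign
;              Mantissa = "7."    (digits, '.', no trailing digits)
;              No exponent
;
;   .          Rejected: the mantissa has no digit.
;
;   1.2.3      Rejected: a mantissa contains at most one
;              decimal point, and ".3" is not an exponent.
```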

The macro function to test for a real constant should be callable as follows:

isReal = getReal(some_text)

where some_text is the textual data we want to test to see if it’s a real constant. The macro for getReal could be the following:

; getReal - Parses a real constant.

; Returns:
;    true  - If the parameter contains a syntactically
;            correct real number (and no extra characters).
;    false - If there are any illegal characters or
;            other syntax errors in the numeric string.

getReal      macro   origParm
             local   parm, curChar, result
            
; Make a copy of the parameter so we don't
; delete the characters in the original string.

parm         textequ &origParm

; Must have at least one character:

            ifb     parm
            exitm   <0>
            endif
            
; Extract the optional sign:

            if      isSign(parm)
curChar     textequ extract1st(parm)        ; Skip sign char
            endif
            
; Get the required mantissa:

            if      getMant(parm) eq 0
            exitm   <0>                     ; Bad mantissa
            endif

; Extract the optional exponent:

result      textequ getExp(parm)    
            exitm   <&result>       
            
            endm    ; getReal

Testing for real constants is a complex process, so it’s worthwhile to go through this macro (and all subservient macros) step by step:

1. Make a copy of the original parameter string. During processing, getReal will delete characters from the parameter string while parsing the string. This macro makes a copy to prevent disturbing the original text string passed in to it.
2. Check for a blank parameter. If the caller passes in an empty string, the result is not a valid real constant and getReal must return false. It’s important to check for the empty string right away because the rest of the code makes the assumption that the string is at least one character long.
3. Call the isSign macro function. This function (its definition appears a little later) returns true if the first character of its argument is a + or - symbol; otherwise, it returns false.
4. If the first character is a sign character, invoke the extract1st macro:
```
curChar     textequ extract1st(parm)        ; Skip sign char
```
The extract1st macro returns the first character of its argument as the function result (which this statement assigns to the curChar symbol) and then deletes the first character of its argument. So if the original string passed to getReal was +1, this statement puts + into curChar and deletes the first character in parm (producing the string 1). The definition for extract1st appears a little later in this section.

getReal doesn’t actually use the sign character assigned to curChar. The purpose of this extract1st invocation is strictly the side effect of deleting the first character in parm.
5. Invoke getMant. This macro function will return true if the prefix of its string argument is a valid mantissa. It will return false if the mantissa does not contain at least one numeric digit. Note that getMant will stop processing the string on the first non-mantissa character it encounters (including a second decimal point, if there are two or more decimal points in the mantissa). The getMant function doesn’t care about illegal characters; it leaves it up to getReal to look at the remaining characters after the return from getMant to determine if the whole string is valid. As a side effect, getMant deletes all leading characters from the parameter string that it processes.
6. Invoke the getExp macro function to process any (optional) trailing exponent. The getExp macro is also responsible for ensuring that no garbage characters follow (which results in a parse failure).

The isSign macro is fairly straightforward. Here’s its implementation:

; isSign - Macro function that returns true if the
;          first character of its parameter is a
;          "+" or "-".

isSign      macro   parm
            local   FirstChar
            ifb     <parm>
            exitm   <0>
            endif

FirstChar   substr  parm, 1, 1
            ifidn   FirstChar, <+>
            exitm   <1>
            endif
            ifidn   FirstChar, <->
            exitm   <1>
            endif
            exitm   <0>
            endm

This macro uses the substr operation to extract the first character from the parameter and then compares this against the sign characters (+ or -). It returns true if it is a sign character, and false otherwise.

The extract1st macro function returns the first character of the argument passed to it as the function result. As a side effect, this macro function also deletes that first character from the argument. Here’s extract1st’s implementation:

extract1st  macro   parm
            local   FirstChar
            ifb     <%parm>
            exitm   <>
            endif
FirstChar   substr  parm, 1, 1
            if      @sizestr(%parm) GE 2
parm        substr  parm, 2
            else
parm        textequ <>
            endif

            exitm   <FirstChar>
            endm

The ifb directive checks whether the parameter string is empty. If it is, extract1st immediately returns the empty string without further modification to its parameter.

Note the % operator before the parm argument. The parm argument actually expands to the name of the string variable holding the real constant. This turns out to be something like ??0005 because of the copy made of the original parameter in the getReal function. Were you to simply specify ifb <parm>, the ifb directive would see <??0005>, which is not blank. Placing the % operator before the parm symbol tells MASM to evaluate the expression (which is just the ??0005 symbol) and replace it by the text it evaluates to (which, in this case, is the actual string).
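The difference is easy to see in isolation. The following is a minimal sketch that assumes the behavior just described; emptyTxt is an illustrative name, and the echo messages are there only to show which block assembles.

```masm
emptyTxt    textequ <>              ; Text macro holding an empty string

            ifb     <emptyTxt>      ; Tests the name itself: not blank
            echo    This message should not appear
            endif

            ifb     <%emptyTxt>     ; % expands the text macro first: blank
            echo    The text is empty
            endif
```

Under that assumption, only the second echo message appears during assembly.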

If the string is not blank, extract1st uses the substr directive to extract the first character from the string and assigns this character to the FirstChar symbol. The extract1st macro function will return this value as the function result.

Next, the extract1st function has to delete the first character from the parameter string. It uses the @sizestr function (whose definition appears a little earlier in this chapter) to determine whether the character string contains at least two characters. If so, extract1st uses the substr directive to extract all the characters from the parameter, starting at the second character position. It assigns this substring back to the parameter passed in. If extract1st is processing the last character in the string (that is, if @sizestr returns 1), then the code cannot use the substr directive because the index would be out of range. In that case, the else section of the if directive assigns the empty string to the parameter.

The next getReal subservient macro function is getMant. This macro is responsible for parsing the mantissa component of the floating-point constant. The implementation is the following:

getMant     macro   parm
            local   curChar, sawDecPt, rpt
sawDecPt    =       0
curChar     textequ extract1st(parm)        ; Get 1st char
            ifidn   curChar, <.>            ; Check for dec pt
sawDecPt    =       1
curChar     textequ extract1st(parm)        ; Get 2nd char
            endif
            
; Must have at least one digit:

            if      isDigit(curChar) eq 0
            exitm   <0>                     ; Bad mantissa
            endif
            
; Process zero or more digits. If we haven't already
; seen a decimal point, allow exactly one of those.
 
; Do loop at least once if there is at least one
; character left in parm:

rpt         =       @sizestr(%parm)
            while   rpt
            
; Get the 1st char from parm and see if
; it is a decimal point or a digit:

curChar     substr  parm, 1, 1
            ifidn   curChar, <.>
rpt         =       sawDecPt eq 0
sawDecPt    =       1
            else
rpt         =       isDigit(curChar)
            endif

; If char was legal, then extract it from parm:

            if      rpt
curChar     textequ extract1st(parm)        ; Get next char
            endif
            
; Repeat as long as we have more chars and the
; current character is legal:

rpt         =       rpt and (@sizestr(%parm) gt 0)
            endm    ; while
            
; If we've seen at least one digit, we've got a valid
; mantissa. We've stopped processing on the first 
; character that is not a digit or the 2nd "." char.

            exitm   <1>
            endm    ; getMant

A mantissa must have at least one decimal digit. It can have zero or one occurrence of a decimal point (which may appear before the first digit, at the end of the mantissa, or in the middle of a string of digits). The getMant macro function uses the local symbol sawDecPt to keep track of whether it has seen a decimal point already. The function begins by initializing sawDecPt to false (0).

A valid mantissa must have at least one character (because it must have at least one decimal digit). So the next thing getMant does is extract the first character from the parameter string (and place this character in curChar). If the first character is a period (decimal point), the macro sets sawDecPt to true.

The getMant function uses a while directive to process all the remaining characters in the mantissa. A local variable, rpt, controls the execution of the while loop. After verifying that the mantissa begins with a decimal digit (optionally preceded by a period), getMant sets rpt to the number of characters remaining in the parameter string, so the loop body executes only if there is more text to examine. The isDigit macro function tests the first character of its argument and returns true if it’s one of the characters 0 to 9. The definition for isDigit will appear shortly.

If the first character in the parameter was a dot (.) or a decimal digit, the getMant function removes that character from the beginning of the string and executes the body of the while loop for the first time if the new parameter string length is greater than zero.

The while loop grabs the first character from the current parameter string (without deleting it just yet) and tests it against a decimal digit or a . character. If it’s a decimal digit, the loop will remove that character from the parameter string and repeat. If the current character is a period, the code first checks whether it has already seen a decimal point (using sawDecPt). If this is a second decimal point, the function returns true (later code will deal with the second . character). If the code has not already seen a decimal point, the loop sets sawDecPt to true and continues with the loop execution.

The while loop repeats as long as the parameter string has a length greater than zero and the next character is a decimal digit or the first decimal point. Once the loop completes, the getMant function returns true. The only way getMant returns false is if it does not see at least one decimal digit (either at the beginning of the string or immediately after the decimal point at the beginning of the string).

The isDigit macro function is a brute-force function that tests the first character of its argument against the 10 decimal digits. This function does not remove any characters from its argument. The source code for isDigit is the following:

isDigit     macro   parm
            local   FirstChar
            if      @sizestr(%parm) eq 0
            exitm   <0>
            endif
            
FirstChar   substr  parm, 1, 1
            ifidn   FirstChar, <0>
            exitm   <1>
            endif
            ifidn   FirstChar, <1>
            exitm   <1>
            endif
            ifidn   FirstChar, <2>
            exitm   <1>
            endif
            ifidn   FirstChar, <3>
            exitm   <1>
            endif
            ifidn   FirstChar, <4>
            exitm   <1>
            endif
            ifidn   FirstChar, <5>
            exitm   <1>
            endif
            ifidn   FirstChar, <6>
            exitm   <1>
            endif
            ifidn   FirstChar, <7>
            exitm   <1>
            endif
            ifidn   FirstChar, <8>
            exitm   <1>
            endif
            ifidn   FirstChar, <9>
            exitm   <1>
            endif
            exitm   <0>
            endm

The only thing worth commenting on is the % operator in @sizestr (for reasons explained earlier).
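
The ten ifidn tests could also be collapsed into a single search of a digit string by using the instr directive. The following is an untested sketch, not the version used in Listing 13-9; the name isDigit2 and the local symbol digitPos are hypothetical, and the macro relies on the same @sizestr helper that isDigit uses:

```masm
; isDigit2 - Hypothetical alternative to isDigit.
;            Returns 1 if the first character of
;            the text in parm is a decimal digit,
;            0 otherwise.

isDigit2    macro   parm
            local   FirstChar, digitPos
            if      @sizestr(%parm) eq 0
            exitm   <0>
            endif

FirstChar   substr  parm, 1, 1

; instr yields the (nonzero) index of FirstChar
; within the digit string, or 0 if it is absent:

digitPos    instr   1, <0123456789>, FirstChar
            if      digitPos ne 0
            exitm   <1>
            endif
            exitm   <0>
            endm
```

Like isDigit, this version does not remove any characters from its argument.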

Now we arrive at the last helper function appearing in getReal: the getExp (get exponent) macro function. Here’s its implementation:

getExp      macro   parm
            local   curChar
            
; Return success if no exponent present.

            if      @sizestr(%parm) eq 0
            exitm   <1>
            endif

; Extract the next character, return failure
; if it is not an "e" or "E" character:
            
curChar     textequ extract1st(parm)
            if      isE(curChar) eq 0
            exitm   <0>
            endif

; Extract the next character:
            
curChar     textequ extract1st(parm)

; If an optional sign character appears,
; remove it from the string:

            if      isSign(curChar)
curChar     textequ extract1st(parm)        ; Skip sign char
            endif                           ; isSign
            
; Must have at least one digit:

            if      isDigit(curChar) eq 0
            exitm   <0>
            endif
            
; Optionally, we can have up to three additional digits:

            if      @sizestr(%parm) gt 0
curChar     textequ extract1st(parm)        ; Skip 1st digit
            if      isDigit(curChar) eq 0
            exitm   <0>
            endif
            endif

            if      @sizestr(%parm) gt 0
curChar     textequ extract1st(parm)        ; Skip 2nd digit
            if      isDigit(curChar) eq 0
            exitm   <0>
            endif
            endif

            if      @sizestr(%parm) gt 0
curChar     textequ extract1st(parm)        ; Skip 3rd digit
            if      isDigit(curChar) eq 0
            exitm   <0>
            endif
            endif

; If we get to this point, we have a valid exponent.

            exitm   <1>     
            endm    ; getExp

Exponents are optional in a real constant. Therefore, the first thing this macro function does is check whether it has been passed an empty string. If so, it immediately returns success. Once again, the @sizestr(%parm) invocation must have the % operator before the parm argument.

If the parameter string is not empty, the first character in the string must be an E or e character. This function returns false if this is not the case. Checking for an E or e is done with the isE helper function, whose implementation is the following (note the use of ifidni, which is case-insensitive):

isE         macro   parm
            local   FirstChar
            if      @sizestr(%parm) eq 0
            exitm   <0>
            endif
            
FirstChar   substr  parm, 1, 1
            ifidni  FirstChar, <e>
            exitm   <1>
            endif
            exitm   <0>
            endm

Next, the getExp function looks for an optional sign character. If it encounters one, it deletes the sign character from the beginning of the string.

At least one decimal digit, and at most four decimal digits, must follow the e or E and the optional sign character. The remaining code in getExp handles that.

Listing 13-9 is a demonstration of the macro snippets appearing throughout this section. Note that this is a pure compile-time program; all its activity takes place while MASM assembles this source code. It does not generate any executable machine code.

; Listing 13-9

; This is a compile-time program.
; It does not generate any executable code.

; Several useful macro functions:

; mout       - Like echo, but allows "%" operators.

; testStr    - Tests an operand to see if it
;              is a string literal constant.

; @sizestr   - Handles missing MASM function.

; isDigit    - Tests first character of its
;              argument to see if it's a decimal
;              digit.

; isSign     - Tests first character of its
;              argument to see if it's a "+"
;              or a "-" character.

; extract1st - Removes the first character
;              from its argument (side effect)
;              and returns that character as
;              the function result.

; getReal    - Parses the argument and returns
;              true if it is a reasonable-
;              looking real constant.

; Test strings and invocations for the
; getReal macro:

 Note: actual macro code appears in previous code snippets
   and has been removed from this listing for brevity 
            
mant1       textequ <1>
mant2       textequ <.2>
mant3       textequ <3.4>
rv4         textequ <1e1>
rv5         textequ <1.e1>
rv6         textequ <1.0e1>
rv7         textequ <1.0e+1>
rv8         textequ <1.0e-1>
rv9         textequ <1.0e12>
rva         textequ <1.0e1234>
rvb         textequ <1.0E123>
rvc         textequ <1.0E+1234>
rvd         textequ <1.0E-1234>
rve         textequ <-1.0E-1234>
rvf         textequ <+1.0E-1234>
badr1       textequ <>
badr2       textequ <a>
badr3       textequ <1.1.0>
badr4       textequ <e1>
badr5       textequ <1ea1>
badr6       textequ <1e1a>

% echo get_Real(mant1) = getReal(mant1)
% echo get_Real(mant2) = getReal(mant2)
% echo get_Real(mant3) = getReal(mant3)
% echo get_Real(rv4)   = getReal(rv4)
% echo get_Real(rv5)   = getReal(rv5)
% echo get_Real(rv6)   = getReal(rv6)
% echo get_Real(rv7)   = getReal(rv7)
% echo get_Real(rv8)   = getReal(rv8)
% echo get_Real(rv9)   = getReal(rv9)
% echo get_Real(rva)   = getReal(rva)
% echo get_Real(rvb)   = getReal(rvb)
% echo get_Real(rvc)   = getReal(rvc)
% echo get_Real(rvd)   = getReal(rvd)
% echo get_Real(rve)   = getReal(rve)
% echo get_Real(rvf)   = getReal(rvf)
% echo get_Real(badr1) = getReal(badr1)
% echo get_Real(badr2) = getReal(badr2)
% echo get_Real(badr3) = getReal(badr3)
% echo get_Real(badr4) = getReal(badr4)
% echo get_Real(badr5) = getReal(badr5)
% echo get_Real(badr6) = getReal(badr6)
        end

Listing 13-9: Compile-time program with test code for getReal macro

Here’s the build command and (compile-time) program output:

C:\>ml64 /c listing13-9.asm
Microsoft (R) Macro Assembler (x64) Version 14.15.26730.0
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: listing13-9.asm
get_Real(1) = 1
get_Real(.2) = 1
get_Real(3.4) = 1
get_Real(1e1)  = 1
get_Real(1.e1) = 1
get_Real(1.0e1) = 1
get_Real(1.0e+1) = 1
get_Real(1.0e-1) = 1
get_Real(1.0e12) = 1
get_Real(1.0e1234) = 1
get_Real(1.0E123) = 1
get_Real(1.0E+1234) = 1
get_Real(1.0E-1234) = 1
get_Real(-1.0E-1234) = 1
get_Real(+1.0E-1234) = 1
get_Real() = 0
get_Real(a) = 0
get_Real(1.1.0) = 0
get_Real(e1) = 0
get_Real(1ea1) = 0
get_Real(1e1a) = 0

13.17.3 Checking for Registers

Although the opattr operator provides a bit to tell you that its operand is an x86-64 register, that’s the only information opattr provides. In particular, opattr’s return value won’t tell you which register it has seen; whether it’s a general-purpose, XMM, YMM, ZMM, MM, ST, or other register; or the size of that register. Fortunately, with a little work on your part, you can determine all this information by using MASM’s conditional assembly statements and other operators.

To begin with, here’s a simple macro function, isReg, that returns true (nonzero) or false (0) depending on whether its operand is a register. This is a simple shell around the opattr operator that tests bit 4 of opattr’s result:

isReg       macro   parm
            local   result
result      textequ %(((opattr &parm) and 10h) eq 10h)
            exitm   <&result>
            endm
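
As a usage sketch, a macro can call isReg to reject non-register operands before emitting any code. The macro name negReg below is hypothetical and exists only for illustration:

```masm
; negReg - Hypothetical macro that negates its
;          operand, but only accepts a register.

negReg      macro   operand
            if      isReg(operand) eq 0
            .err    <negReg expects a register operand>
            exitm
            endif
            neg     operand
            endm
```

An invocation such as negReg ecx would then emit neg ecx, while an operand that opattr does not classify as a register would produce the error message instead.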

While this function provides some convenience, it doesn’t really provide any information that the opattr operator doesn’t already provide. We want to know what register appears in the operand as well as the size of that register.

Listing 13-10 (available online at http://artofasm.randallhyde.com/) provides a wide range of useful macro functions and equates for processing register operands in your own macros. The following paragraphs describe some of the more useful equates and macros.

Listing 13-10 contains a set of equates that map register names to numeric values. These equates use symbols of the form regXXX, where XXX is the register name (all uppercase). Examples include the following: regAL, regSIL, regR8B, regAX, regBP, regR8W, regEAX, regEBP, regR8D, regRAX, regRSI, regR15, regST, regST0, regMM0, regXMM0, and regYMM0.

There is also a special equate for the symbol regNone that represents a non-register entity. These equates give numeric values in the range 1 to 117 to each of these symbols (regNone is given the value 0).

The purpose behind all these equates (and, in general, assigning numeric values to registers) is to make it easier to test for specific registers (or ranges of registers) within your macros by using conditional assembly.

A useful set of macros appearing in Listing 13-10 converts textual forms of the register names (that is, AL, AX, EAX, RAX, and so forth) to their numeric form (regAL, regAX, regEAX, regRAX, and so on). The most generic macro function to do this is whichReg(register). This function accepts a text object and returns the appropriate regXXX value for that text. If the text passed as an argument is not one of the valid general-purpose, FPU, MMX, XMM, or YMM registers, whichReg returns the value regNone. Here are some examples of calls to whichReg:

alVal  =       whichReg(al)
axTxt  textequ <ax>
axVal  =       whichReg(axTxt)

aMac   macro   parameter
       local   regVal
regVal =       whichReg(parameter)
       if      regVal eq regNone
       .err    <Expected a register argument>
       exitm
       endif
         .
         .
         .
       endm

The whichReg macro function accepts any of the x86-64 general-purpose, FPU, MMX, XMM, or YMM registers. In many situations, you might want to limit the set of registers to a particular subset of these. Therefore, Listing 13-11 (also available online at http://artofasm.randallhyde.com/) provides the following macro functions:

isGPReg(text) Returns a nonzero register value for any of the general-purpose (8-, 16-, 32-, or 64-bit) registers. Returns regNone (0) if the argument is not one of these registers.
is8BitReg(text) Returns a nonzero register value for any of the general-purpose 8-bit registers. Otherwise, it returns regNone (0).
is16BitReg(text) Returns a nonzero register value for any of the general-purpose 16-bit registers. Otherwise, it returns regNone (0).
is32BitReg(text) Returns a nonzero register value for any of the general-purpose 32-bit registers. Otherwise, it returns regNone (0).
is64BitReg(text) Returns a nonzero register value for any of the general-purpose 64-bit registers. Otherwise, it returns regNone (0).
isFPReg(text) Returns a nonzero register value for any of the FPU registers (ST, and ST(0) to ST(7)). Otherwise, it returns regNone (0).
isMMReg(text) Returns a nonzero register value for any of the MMX registers (MM0 to MM7). Otherwise, it returns regNone (0).
isXMMReg(text) Returns a nonzero register value for any of the XMM registers (XMM0 to XMM15). Otherwise, it returns regNone (0).
isYMMReg(text) Returns a nonzero register value for any of the YMM registers (YMM0 to YMM15). Otherwise, it returns regNone (0).

If you need other register classifications, it’s easy to write your own macro functions to return an appropriate value. For example, if you want to test whether a particular register is one of the Windows ABI parameter registers (RCX, RDX, R8, or R9), you could create a macro function like the following:

isWinParm  macro  theReg
           local  regVal, isParm
regVal      =     whichReg(theReg)
isParm      =     (regVal eq regRCX) or (regVal eq regRDX)
isParm      =     isParm or (regVal eq regR8)
isParm      =     isParm or (regVal eq regR9)

            if    isParm
            exitm <%regVal>
            endif
            exitm <%regNone>
            endm
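
A macro that wants to accept only a parameter register could then test the result of isWinParm in the same way the earlier aMac example tests whichReg. The following is a minimal sketch; the macro name saveParm and its dest parameter are hypothetical:

```masm
; saveParm - Hypothetical macro that copies one of
;            the Windows ABI parameter registers to
;            a 64-bit destination operand.

saveParm    macro   theReg, dest
            local   regVal
regVal      =       isWinParm(theReg)
            if      regVal eq regNone
            .err    <Expected RCX, RDX, R8, or R9>
            exitm
            endif
            mov     dest, theReg
            endm
```

For example, saveParm rcx, [rbp+16] would be expected to emit mov [rbp+16], rcx, whereas saveParm rax, [rbp+16] would report the error.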

If you’ve converted a register in text form to its numeric value, at some point you might need to convert that numeric value back to text so you can use that register as part of an instruction. The toReg(reg_num) macro in Listing 13-10 accomplishes this. If you supply it a value in the range 1 to 117 (the numeric values for the registers), this macro will return the text that corresponds to that register value. For example:

mov toReg(1), 0    ; Equivalent to mov al, 0

(Note that regAL = 1.)

If you pass regNone to the toReg macro, toReg returns an empty string. Any value outside the range 0 to 117 will produce an undefined symbol error message.

When working in macros, where you’ve passed a register as an argument, you may find that you need to convert that register to a larger size (for example, convert AL to AX, EAX, or RAX; convert AX to EAX or RAX; or convert EAX to RAX). Listing 13-11 provides several macros to do the up conversion. These macro functions accept a register number as their parameter input and produce a textual result holding the actual register name:

reg8To16 Converts an 8-bit general-purpose register to its 16-bit equivalent⁸
reg8To32 Converts an 8-bit general-purpose register to its 32-bit equivalent
reg8To64 Converts an 8-bit general-purpose register to its 64-bit equivalent
reg16To32 Converts a 16-bit general-purpose register to its 32-bit equivalent
reg16To64 Converts a 16-bit general-purpose register to its 64-bit equivalent
reg32To64 Converts a 32-bit general-purpose register to its 64-bit equivalent
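
As a sketch of how these conversion functions might be used, the following hypothetical macro (zeroExt8) zero-extends an 8-bit register into the 32-bit register that contains it. It assumes that is8BitReg and reg8To32 behave as described above (for example, that reg8To32 returns EAX for regAL); it has not been tested against the online listings:

```masm
; zeroExt8 - Hypothetical macro that zero-extends an
;            8-bit general-purpose register into its
;            32-bit equivalent.

zeroExt8    macro   theReg
            local   regVal
regVal      =       is8BitReg(theReg)
            if      regVal eq regNone
            .err    <Expected an 8-bit register>
            exitm
            endif
            movzx   reg8To32(regVal), theReg
            endm
```

Under those assumptions, zeroExt8 cl would emit movzx ecx, cl. Because writing a 32-bit register also clears the upper 32 bits of the corresponding 64-bit register, this zero-extends the value through all 64 bits.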

Another useful macro function in Listing 13-10 is the regSize(reg_value) macro. This function returns the size (in bytes) of the register value passed as an argument. Here are some example calls:

alSize    =  regSize(regAL)   ; Returns 1
axSize    =  regSize(regAX)   ; Returns 2
eaxSize   =  regSize(regEAX)  ; Returns 4
raxSize   =  regSize(regRAX)  ; Returns 8
stSize    =  regSize(regST0)  ; Returns 10
mmSize    =  regSize(regMM0)  ; Returns 8
xmmSize   =  regSize(regXMM0) ; Returns 16
ymmSize   =  regSize(regYMM0) ; Returns 32

The macros and equates in Listing 13-10 come in handy when you are writing macros to handle generic code. For example, suppose you want to create a putInt macro that accepts an arbitrary 8-, 16-, or 32-bit register operand and that will print that register’s value as an integer. You would like to be able to pass any arbitrary (general-purpose) register and sign-extend it, if necessary, before printing. Listing 13-12 is one possible implementation of this macro.

; Listing 13-12
 
; Demonstration of putInt macro.
 
; putInt - This macro expects an 8-, 16-, or 32-bit
;          general-purpose register argument. It will
;          print the value of that register as an
;          integer.

putInt      macro   theReg
            local   regVal, sz
regVal      =       isGPReg(theReg)

; Before we do anything else, make sure
; we were actually passed a register:

            if      regVal eq regNone
            .err    <Expected a register>
            exitm
            endif

; Get the size of the register so we can
; determine if we need to sign-extend its
; value:
            
sz          =       regSize(regVal)

; If it was a 64-bit register, report an
; error:

            if      sz gt 4
            .err    <64-bit register not allowed>
            exitm
            endif

; If it's a 1- or 2-byte register, we will need
; to sign-extend the value into EDX:
            
            if      (sz eq 1) or (sz eq 2)
            movsx   edx, theReg
            
; If it's a 32-bit register, but is not EDX, we need
; to move it into EDX (don't bother emitting
; the instruction if the register is EDX;
; the data is already where we want it):
 
            elseif  regVal ne regEDX
            mov     edx, theReg
            endif
            
; Print the value in EDX as an integer:

            call    print
            byte    "%d", 0
            endm

        option  casemap:none

nl          =       10

            .const
ttlStr      byte    "Listing 13-12", 0

 Note: several thousand lines of code omitted here
    for brevity. This includes most of the text from
    Listing 13-11 plus the putInt macro

            .code
            
            include getTitle.inc
            include print.inc
            public  asmMain
asmMain     proc
            push    rbx
            push    rbp
            mov     rbp, rsp
            sub     rsp, 56         ; Shadow storage
            
            call    print
            byte    "Value 1:", 0
            mov     al, 55
            putInt  al
            
            call    print
            byte    nl, "Value 2:", 0
            mov     cx, 1234
            putInt  cx
            
            call    print
            byte    nl, "Value 3:", 0
            mov     ebx, 12345678
            putInt  ebx
            
            call    print
            byte    nl, "Value 4:", 0
            mov     edx, 1
            putInt  edx
            call    print
            byte    nl, 0

allDone:    leave
            pop     rbx
            ret     ; Returns to caller
asmMain     endp
            end

Listing 13-12: putInt macro test program

Here’s the build command and sample output for Listing 13-12:

C:\>build listing13-12

C:\>echo off
 Assembling: listing13-12.asm
c.cpp

C:\>listing13-12
Calling Listing 13-12:
Value 1:55
Value 2:1234
Value 3:12345678
Value 4:1
Listing 13-12 terminated

Though Listing 13-12 is a relatively simple example, it should give you a good idea of how you could make use of the macros in Listing 13-10.

13.17.4 Compile-Time Arrays

A compile-time constant array is an array that exists only at compile time—data for the array does not exist at runtime. Sadly, MASM doesn’t provide direct support for this useful CTL data type. Fortunately, it’s possible to use other MASM CTL features to simulate compile-time arrays.

This section considers two ways to simulate compile-time arrays: text strings and a list of equates (one equate per array element). The list of equates is probably the easiest implementation, so this section considers that first.

Listing 13-11 (available online) contains a very useful macro function, toUpper, that converts all the text in a string to uppercase. The register macros use toUpper to convert register names to uppercase characters (so that register name comparisons are case-insensitive). The toUpper macro is relatively straightforward. It extracts each character of a string and checks whether that character’s value is in the range a to z. If it is, the macro uses that character as an index into an array (indexed from a to z) to extract the corresponding array element (the elements hold the values A to Z). Here’s the toUpper macro:

; toUpper - Converts alphabetic characters to uppercase
;           in a text string.

toUpper     macro   lcStr
            local   result

; Build the result string in "result":

result      textequ <>

; For each character in the source string, 
; convert it to uppercase.

            forc    eachChar, <lcStr>

; See if we have a lowercase character:

            if      ('&eachChar' ge 'a') and ('&eachChar' le 'z')
            
; If lowercase, convert it to the symbol "lc_*" where "*"
; is the lowercase character. The equates below will map
; this character to uppercase:

eachChar    catstr  <lc_>,<eachChar>
result      catstr  result, &eachChar

            else
            
; If it wasn't a lowercase character, just append it
; to the end of the string:

result      catstr  result, <eachChar>

            endif
            endm            ; forc
            exitm   result  ; Return result string
            endm            ; toUpper

The “magic” statements, which handle the array access, are these two statements:

eachChar    catstr  <lc_>,<eachChar>
result      catstr  result, &eachChar

The eachChar catstr operation produces a string of the form lc_a, lc_b, . . . , lc_z whenever this macro encounters a lowercase character. The result catstr operation expands a label of the form lc_a, . . . , to its value and concatenates the result to the end of the result string (which is a register name). Immediately after the toUpper macro in Listing 13-11, you will find the following equates:

lc_a        textequ <A>
lc_b        textequ <B>
lc_c        textequ <C>
lc_d        textequ <D>
lc_e        textequ <E>
lc_f        textequ <F>
lc_g        textequ <G>
lc_h        textequ <H>
lc_i        textequ <I>
lc_j        textequ <J>
lc_k        textequ <K>
lc_l        textequ <L>
lc_m        textequ <M>
lc_n        textequ <N>
lc_o        textequ <O>
lc_p        textequ <P>
lc_q        textequ <Q>
lc_r        textequ <R>
lc_s        textequ <S>
lc_t        textequ <T>
lc_u        textequ <U>
lc_v        textequ <V>
lc_w        textequ <W>
lc_x        textequ <X>
lc_y        textequ <Y>
lc_z        textequ <Z>

Therefore, lc_a will expand to the character A, lc_b will expand to the character B, and so forth. This sequence of equates forms the lookup table (array) that toUpper uses. In effect, the array is named lc_, and the index into the array is the suffix of the array’s name (a to z). The toUpper macro accesses element lc_[character] by appending character to lc_ and then expanding the text equate lc_character (expansion happens by applying the & operator to the eachChar string the macro produces).

Note the following two things. First, the array index doesn’t have to be an integer (or ordinal) value. Any arbitrary string of characters will suffice.⁹ Second, if you supply an index that isn’t within bounds (a to z), the toUpper macro will attempt to expand a symbol of the form lc_xxxx that results in an undefined identifier. Therefore, MASM will report an undefined symbol error should you attempt to supply an index that is not within range. This will not be an issue for the toUpper macro because toUpper validates the index (using a conditional if statement) prior to constructing the lc_xxxx symbol.

Listing 13-11 also provides an example of another way to implement a compile-time array: using a text string to hold array elements and using substr to extract elements of the array from that string. The isXXBitReg macros (is8BitReg, is16BitReg, and so forth) pass along a couple of arrays of data to the more generic lookupReg macro. Here’s the is16BitReg macro:¹⁰

all16Regs   catstr <AX>,
                   <BX>,
                   <CX>,
                   <DX>,
                   <SI>,
                   <DI>,
                   <BP>,
                   <SP>,
                   <R8W>,
                   <R9W>,
                   <R10W>,
                   <R11W>,
                   <R12W>,
                   <R13W>,
                   <R14W>,
                   <R15W>

all16Lens   catstr <2>, <0>,           ; AX
                   <2>, <0>,           ; BX
                   <2>, <0>,           ; CX 
                   <2>, <0>,           ; DX
                   <2>, <0>,           ; SI
                   <2>, <0>,           ; DI
                   <2>, <0>,           ; BP
                   <2>, <0>,           ; SP
                   <3>, <0>, <0>,      ; R8W
                   <3>, <0>, <0>,      ; R9W
                   <4>, <0>, <0>, <0>, ; R10W
                   <4>, <0>, <0>, <0>, ; R11W
                   <4>, <0>, <0>, <0>, ; R12W
                   <4>, <0>, <0>, <0>, ; R13W
                   <4>, <0>, <0>, <0>, ; R14W
                   <4>, <0>, <0>, <0>  ; R15W

is16BitReg  macro   parm
            exitm   lookupReg(parm, all16Regs, all16Lens)
            endm    ; is16BitReg

The all16Regs string is a list of register names (all concatenated together into one string). The lookupReg macro will search for a user-supplied register (parm) in this string of register names by using the MASM instr directive. If instr does not find the register in the list of names, parm is not a valid 16-bit register and instr returns the value 0. If it does locate the string held by parm in all16Regs, then instr returns the (nonzero) index into all16Regs where the match occurs. By itself, a nonzero index does not indicate that lookupReg has found a valid 16-bit register. For example, if the user supplies PR as a register name, the instr directive will return a nonzero index into the all16Regs string (the index of the last character of the SP register, with the R coming from the first character of the R8W register name). Likewise, if the caller passes the string R8 to is16BitReg, the instr directive will return the index to the first character of the R8W entry, but R8 is not a valid 16-bit register.

Although instr can reject a register name (by returning 0), additional validation is necessary if instr returns a nonzero value; this is where the all16Lens array comes in. The lookupReg macro uses the index that instr returns as an index into the all16Lens array. If the entry is 0, the index into all16Regs is not a valid register index (it’s an index to a string that is not at the start of a register name). If the index into all16Lens points at a nonzero value, lookupReg compares this value against the length of the parm string. If they are equal, parm holds an actual 16-bit register name; if they are not equal, parm is too long or too short and is not a valid 16-bit register name. Here’s the full lookupReg macro:

; lookupReg - Given a (suspected) register and a lookup table, convert
;             that register to the corresponding numeric form.

lookupReg   macro   theReg, regList, regIndex
            local   regUpper, regConst, inst, regLen, indexLen

; Convert (possible) register to uppercase:

regUpper    textequ toUpper(theReg)
regLen      sizestr <&theReg>

; Does it exist in regList? If not, it's not a register.

inst        instr   1, regList, &regUpper
            if      inst ne 0

regConst    substr  &regIndex, inst, 1
            if      &regConst eq regLen
            
; It's a register (in text form). Create an identifier of
; the form "regXX" where "XX" represents the register name.

regConst    catstr  <reg>,regUpper

            ifdef   &regConst

; Return "regXX" as function result. This is the numeric value
; for the register.

            exitm   regConst
            endif
            endif
            endif

; If the parameter string wasn't in regList, then return
; "regNone" as the function result:

            exitm   <regNone>
            endm    ; lookupReg

Note that lookupReg also uses the register value constants (regNone, regAL, regBL, and so on) as an associative compile-time array (see the regConst definitions).
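
The same pair of string arrays can describe any other set of registers. As an illustration, here is a sketch of tables for the Windows ABI parameter registers; the names winParmRegs, winParmLens, and isWinParm2 are hypothetical (they do not appear in the online listings), and the sketch assumes lookupReg and the regXXX equates are already defined:

```masm
; Names of the Windows ABI parameter registers,
; concatenated into the string "RCXRDXR8R9":

winParmRegs catstr <RCX>, <RDX>, <R8>, <R9>

; Parallel array of lengths: the name's length at
; the index where each name starts, 0 elsewhere.

winParmLens catstr <3>, <0>, <0>,      ; RCX
                   <3>, <0>, <0>,      ; RDX
                   <2>, <0>,           ; R8
                   <2>, <0>            ; R9

isWinParm2  macro   parm
            exitm   lookupReg(parm, winParmRegs, winParmLens)
            endm    ; isWinParm2
```

With these tables, R8 matches at index 7 of winParmRegs, where winParmLens holds 2 (the length of R8), so the lookup succeeds. The string DX matches at index 5, where winParmLens holds 0, so lookupReg rejects it.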

13.18 Using Macros to Write Macros

One advanced use of macros is to have a macro invocation create one or more new macros. If you nest a macro declaration inside another macro, invoking that (enclosing) macro will expand the enclosed macro definition and define that macro at that point. Of course, if you invoke the outside (enclosing) macro more than once, you could wind up with a duplicate macro definition unless you take care in the construction of the new macro (that is, by assigning it a new name with each new invocation of the outside macro). In a few cases, being able to generate macros on the fly can be useful.

Consider the compile-time array examples from the previous section. If you want to create a compile-time array by using the multiple equates method, you will have to manually define equates for all the array elements before you can use that array. This can be tedious, especially if the array has a large number of elements. Fortunately, it’s easy to create a macro to automate this process for you.

The following macro declaration accepts two arguments: the name of an array to create and the number of elements to put into the array. This macro generates a list of definitions (using the = directive, rather than the textequ directive) with each element initialized to 0:

genArray    macro   arrayName, elements
            local   index, eleName, getName
            
; Loop over each element of the array:

index       =       0
            while   index lt &elements
            
; Generate an "=" statement to define a single
; element of the array, for example:

; aryXX = 0

; where "XX" is the index (0 to (elements - 1)).

eleName     catstr  <&arrayName>,%index,< = 0>

; Expand the text just created with the catstr directive.

            eleName
            
; Move on to next array index:

index       =       index + 1
            endm    ; while
            
            endm    ; genArray

For example, the following macro invocation creates 10 array elements, named ary0 to ary9:

genArray ary, 10

You can access the array elements directly by using the names ary0, ary1, ary2, . . . , ary9. If you want to access these array elements programmatically (perhaps in a compile-time while loop), you would have to use the catstr directive to create a text equate that has the array name (ary) concatenated with the index. Wouldn’t it be more convenient to have a macro function that creates this text equate for you? It’s easy enough to write a macro that does this:

ary_get     macro   index
            local   element
element     catstr  <ary>,%index
            exitm   <element>
            endm

With this macro, you can easily access elements of the ary array by using the macro invocation ary_get(index). You could also write a macro to store a value into a specified element of the ary array:

ary_set     macro   index, value
            local   assign
assign      catstr  <ary>, %index, < = >, %value
            assign
            endm

These two macros are so useful, you’d probably want to include them with each array you create with the genArray macro. So why not have the genArray macro write these macros for you? Listing 13-13 provides an implementation of genArray that does exactly this.


; Listing 13-13

; This is a compile-time program.
; It does not generate any executable code.

        option  casemap:none

genArray    macro   arrayName, elements
            local   index, eleName, getName
            
; Loop over each element of the array:

index       =       0
            while   index lt &elements
            
; Generate an "=" statement to define a single
; element of the array, for example:

; aryXX = 0

; where "XX" is the index (0 to (elements - 1)).

eleName     catstr  <&arrayName>,%index,< = 0>

; Expand the text just created with the catstr directive:

            eleName
            
; Move on to next array index:

index       =       index + 1
            endm    ; while
            
; Create a macro function to retrieve a value from
; the array:

getName     catstr  <&arrayName>,<_get>

getName     macro   theIndex
            local   element
element     catstr  <&arrayName>,%theIndex
            exitm   <element>
            endm
            
; Create a macro to assign a value to
; an array element.

setName     catstr  <&arrayName>,<_set>

setName     macro   theIndex, theValue
            local   assign
assign      catstr  <&arrayName>, %theIndex, < = >, %theValue
            assign
            endm

            endm    ; genArray

; mout - Replacement for echo. Allows "%" operator
;        in operand field to expand text symbols.

mout        macro   valToPrint
            local   cmd
cmd         catstr  <echo >, <valToPrint>
            cmd
            endm

; Create an array ("ary") with ten elements:

            genArray ary, 10
            
; Initialize each element of the array to
; its index value:

index       =       0
            while   index lt 10
            ary_set index, index
index       =       index + 1
            endm
            
; Print out the array values:

index       =       0
            while   index lt 10
            
value       =       ary_get(index)
            mout    ary[%index] = %value
index       =       index + 1
            endm
            
            end

Listing 13-13: A macro that writes another pair of macros

Here’s the build command and sample output for the compile-time program in Listing 13-13:

C:\>ml64 /c /Fl listing13-13.asm
Microsoft (R) Macro Assembler (x64) Version 14.15.26730.0
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: listing13-13.asm
ary[0] = 0
ary[1] = 1
ary[2] = 2
ary[3] = 3
ary[4] = 4
ary[5] = 5
ary[6] = 6
ary[7] = 7
ary[8] = 8
ary[9] = 9
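
Because genArray builds the accessor names from its arrayName argument, each array it creates gets its own pair of macros. The following fragment is a sketch of that idea. It assumes the code is placed in Listing 13-13 just before the end directive, so that genArray and mout are already defined. The array name squares and the symbol sqValue are hypothetical; index and value are the same compile-time variables the listing already uses.

```masm
; Sketch only: assumes genArray and mout from Listing 13-13
; are defined earlier in the same source file. "squares" and
; "sqValue" are illustrative names.

; Create a second array. genArray also writes the
; squares_get and squares_set macros:

            genArray squares, 5

; Store the square of each index in the array:

index       =       0
            while   index lt 5
sqValue     =       index * index
            squares_set index, sqValue
index       =       index + 1
            endm

; Print the values, following the pattern used for "ary":

index       =       0
            while   index lt 5
value       =       squares_get(index)
            mout    squares[%index] = %value
index       =       index + 1
            endm
```

The square is computed into a separate compile-time variable first so that each macro argument is a single symbol, just as in the ary_set invocation in the listing.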

13.19 Compile-Time Program Performance

When writing compile-time programs, keep in mind that MASM is interpreting these programs during assembly. This can have a huge impact on the time it takes MASM to assemble your source files. Indeed, it is quite possible to create infinite loops that will cause MASM to (seemingly) hang up during assembly. Consider the following trivial example:

true        =       1
            while   true
            endm

Any attempt to assemble a MASM source file containing this sequence will cause MASM to hang until you press ctrl-C (or use another mechanism to abort the assembly process).

Even without infinite loops, it is easy to create macros that take a considerable amount of time to process. If you use such macros hundreds (or even thousands) of times in a source file (as is common for some complex print-type macros), it could take a while for MASM to process your source files. Be aware of this, and be patient if MASM seems to hang up; it could simply be your compile-time programs taking a while to do their job.

If you think a compile-time program has entered an infinite loop, the echo directive (or macros like mout, appearing throughout this chapter) can help you track down the infinite loop (or other bugs) in your compile-time programs.
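
The following fragment is one possible way to apply that advice; treat it as a sketch, not a prescribed technique. It traces a loop variable with echo on every pass and adds a pass counter that abandons the loop with exitm if it runs longer than expected. All the names here (n, passes, maxPasses, and traceText) are hypothetical and chosen only for this example.

```masm
; Sketch only: a compile-time loop instrumented for debugging.
; n, passes, maxPasses, and traceText are illustrative names.

maxPasses   =       100
passes      =       0
n           =       1000

            while   n gt 0

; Convert the current value of n to text, then display it.
; The leading % on the echo line expands the text equate:

traceText   textequ %n
%           echo    Loop value is now traceText

; The work of the loop (here, simply halving n):

n           =       n / 2

; Safety valve: leave the loop if it has run too long.

passes      =       passes + 1
            if      passes ge maxPasses
            echo    Giving up: too many passes through the loop
            exitm
            endif

            endm
```

If the trace shows the same value on every pass, the loop's controlling expression is probably never changing. Once the loop behaves, the echo lines and the pass counter can be removed or wrapped in a conditional assembly block.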

13.20 For More Information

Although this chapter has spent a considerable amount of time describing various features of MASM’s macro support and compile-time language features, the truth is this chapter has barely described what’s possible with MASM. Sadly, Microsoft’s documentation all but ignores the macro facilities of MASM. Probably the best place to learn about advanced macro programming with MASM is the MASM32 forum at http://www.masm32.com/board/index.php.
Although it is an older book, covering MASM version 6, The Waite Group’s Microsoft Macro Assembler Bible by Nabajyoti Barkakati and this author (Sams, 1992) does go into detail about the use of MASM’s macro facilities (as well as other directives that are poorly documented these days). Also, the MASM 6.x manual can still be found online at various sites. While this manual is woefully outdated with respect to the latest versions of MASM (it does not, for example, cover any of the 64-bit instructions or addressing modes), it does a decent job of describing MASM’s macro facilities and many of MASM’s directives. Just keep in mind when reading the older documentation that Microsoft has disabled many features that used to be present in MASM.

13.21 Test Yourself

1. What does CTL stand for?
2. When do CTL programs execute?
3. What directive would you use to print a message (not an error) during assembly?
4. What directive would you use to print an error message during assembly?
5. What directive would you use to create a CTL variable?
6. What is the MASM macro escape character operator?
7. What does the MASM % operator do?
8. What does the MASM macro & operator do?
9. What does the catstr directive do?
10. What does the MASM instr directive do?
11. What does the sizestr directive do?
12. What does the substr directive do?
13. What are the main (four) conditional assembly directives?
14. What directives could you use to create compile-time loops?
15. What directive would you use to extract the characters from a MASM text object in a loop?
16. What directives do you use to define a macro?
17. How do you invoke a macro in a MASM source file?
18. How do you specify macro parameters in a macro declaration?
19. How do you specify that a macro parameter is required?
20. How do you specify that a macro parameter is optional?
21. How do you specify a variable number of macro arguments?
22. Explain how you can manually test whether a macro parameter is present (without using the :req suffix).
23. How can you define local symbols in a macro?
24. What directive would you use (generally inside a conditional assembly sequence) to immediately terminate macro expansion without processing any additional statements in the macro?
25. How would you return a textual value from a macro function?
26. What operator could you use to test a macro parameter to see if it is a machine register versus a memory variable?


^1. %out is a synonym for echo (just in case you see %out in any MASM source files).
^2. If you’re wondering, MASM already uses the length reserved word for other purposes.
^3. endm stands for end macro, in case you’re wondering. MASM considers all CTL looping instructions variants of the MASM macro facility. irp and irpc are synonyms for for and forc, respectively.
^4. On modern processors, using a lookup table is probably not the most efficient way to convert between alphabetic cases. However, this is just an example of filling in the table using the compile-time language. The principles are correct, even if the code is not exactly the best it could be.
^5. Some people will even use movzx ecx, parm1 for 8- or 16-bit values to ensure the HO bits of ECX and RCX are all 0 upon entry into the procedure.
^6. We don’t get to pick the name of the function here. We must call the printf function; we cannot arbitrarily name it _printf in our code. Therefore, this macro uses the identifier cprintf (for call printf).
^7. MASM will treat a sequence of one to eight characters as an integer value. So short strings (eight characters or fewer) work fine as expressions.
^8. Registers AH, BH, CH, and DH get converted to the same registers as AL, BL, CL, and DL, respectively.
^9. Technically, this type of data structure is a dictionary, or associative array. However, it serves as a perfectly good array for our purposes.
^10. This macro has a couple of slight modifications (using catstr rather than textequ) to make it more readable within this book. Functionally, it is the same as the macro appearing in the actual source code.