Please note that index links point to the approximate location of each term.
Numbers
8-bit excess-127 exponent, 88
8-bit registers, 10
16-bit integer variables, 54
16-bit registers, 10
16-byte-aligned addresses, 606
32-bit integer variables, 54
32-bit registers, 10
32-byte alignment within a segment, 605
64-byte alignment within a segment, 605
64-byte memory alignment, 607
80x86 memory addressing modes, 105
96-bit rcl and rcr operations, 484
128-bit comparisons, 461
128-bit decimal output (conversion to string), 508
256-bit by 64-bit division, 468
8087 FPU, 317
Symbols
%1
(batch file parameter), 34
/c
MASM command line option, 9
.code
section, 108
.const
declaration section, 109
.data
declaration section, 108
.data?
declaration section, 110
.data
directive, 14
.err
CTL statement, 748
!
escape operator (MASM macros), 750
#IA exception (invalid arithmetic operation), 673
.inc files (include files), 848
+infinity, 90
–infinity, 90
.lib files, 869
$
operator, 154
%
operator in the first column of a source line, 751
%
operator (MASM macros), 750
– (unary negation, within a constant expression), 153
+ (within a constant expression), 153
[ ] (within a constant expression), 153
* (within a constant expression), 153
/ (within a constant expression), 153
A
ABI (application binary interface), 27, 261
ABI (Microsoft) register usage, 38
abs
external symbol type, 851
absolute value (floating-point), 349
absolute value (SIMD), 659
access fields of a struct/record, 199
accessing
an element of a single dimensional array, 182
data on the stack, 142
data via a pointer, 162
elements of an array, 183
elements of multidimensional arrays, 196
elements of three- and four-dimensional arrays, 191
fields of a struct/record via a pointer, 199
fields of a union, 206
local variables, 235
record/struct fields, 199
reference parameters, 256
subfields of a nested structure, 200
value parameters, 253
accumulated errors in a floating-point calculation, 315
activation record
construction at runtime, 228
definition, 228
adding 1 to a register or memory location, 149
add
instruction, 21
addition (extended-precision), 454
addition (horizontal, packed), 650
addition (SIMD), 648
addition (vertical, packed), 649
addpd
instruction, 669
addps
instruction, 669
addresses, 9
address expressions, 130
addressing modes, 122
indirect, 124
indirect-plus-offset, 125
register indirect, 124
scaled-indexed, 126
scaling factor, 126
address of an object, 22
addsd
instruction, 371
addss
instruction, 371
Advanced Vector Extensions (AVX), 596
aggregate data types, 174
AH register, 10
copying AH to FLAGS register, 86, 350
AL/AX/EAX register usage in string instructions, 826
algorithm to convert a string to an integer, 546
aliases, 207
align
directive, 121
aligned data movement instructions (SSE/AVX), 610
aligning
bit strings, 710
data in a segment, 605
data objects on the stack or heap, 607
within a record, 204
alignment
data alignment, 119
variable alignment, 121
within a record, 204
allocating storage for arrays, 194. See also arrays
allocating storage for uninitialized arrays, 183
AL register, 10
anatomy of a MASM program, 5
ANDN (and not) operation, 645
andnpd
instruction, 645
AND operation, 55
AND operator, 153
andpd
instruction, 645
anonymous
unions, 208
variables, 125
application binary interface (ABI), 27, 261
application programming interface (API), 35
arbitrary alignment within a segment, 605
arctangent, 361
arithmetic
idioms, 310
logical systems, 310
operators within a constant expression, 153
shift right, 77
arithmetic shifts (SSE/AVX), 647
arrays, 191
accessing elements of an array, 183
accessing elements of multidimensional arrays, 196
allocating storage for a multidimensional array, 194
arrays of arrays, 192
arrays of structs, 203
base address, 182
bubble sort, 188
column-major ordering, 193
declarations, 182
definition, 181
dup
operator, 182
four-dimensional array access (row major), 191
indexing operator, 181
initialized arrays, 183
LARGEADDRESSAWARE
, 183
row-major ordering, 190
sorting, 185
three-dimensional array access (row major), 191
two or more dimensions, 189
uninitialized storage, 183
array variables, 182
ASCII
codes for numeric digits, 95
groups, 94
assembly language procedures, xxviii, 22
assembly-time initialization of structures, 200
assigning, 299
constant to a variable, 299
one variable to another, 299
automatic allocation, 240
automatic code generation, 748
automatic (local) variables, 235
automatic variables, 234
in a procedure, 234
average computation (SIMD), 657
avoiding branches by using calculations, 409
AVX
aligned data movement instructions, 610
AVX-512 memory alignment, 607
AVX, AVX2, AVX-256, AVX-512, 596
AVX/SSE comparison synonyms, 673
extensions, 596
floating-point arithmetic (SIMD), 668
floating-point conversions, 679
instruction operands, 606
memory alignment requirements, 606
packed byte data types, 597
packed dword data types, 598
packed qword data types, 598
packed word data types, 597
programming model, 596
sign extension, 666
unaligned memory access, 606, 612
zero extension, 665
AX register, 10
B
backspace, 93
base address (of an array), 182
Base Pointer register (RBP), 230
Basic Multilingual Plane (Unicode BMP), 97
batch files, 33
BCD (binary coded decimal), 91
arithmetic, 486
numbers, 51
BH register, 10
biased (excess) exponents, 88
big-endian data organization, 115
big-endian to little-endian conversion, 116
binary
data types, 51
digits, 44
formats, 45
numbering system, 43
point (binary fractions), 87
binary-coded decimal (BCD), 91
arithmetic, 487
numbers, 51
representation, 91
binary search, 422
bit
complement, 708
counting, 739
data, 707
fields, 79
inversion, 708
mask, 708
offset, 708
packed data, 79
pattern search, 743
runs, 708
sets, 708
arrays, 733
extraction, 742
merging, 741
reversal, 739
test for 1 bits, 714
bit-by-bit operations, 58
bit string alignment, 710
bit string masking, 58
bitwise operations, 58
blank macro arguments, 767
BL register, 10
BMP (Unicode Basic Multilingual Plane), 97
Boolean
evaluation
complete, 400
short-circuit, 401
expressions, 308
logical systems, 310
values, 51
BP register, 10
bracketing characters in macro parameters, 764
branch out of range, 393
branch-prediction hardware, 448
break
statement, 438
bsf
instruction, 737
bsr
instruction, 737
bswap
instruction, 116
btc
instruction, 715
bt
instruction, 715
btoStr
(byte to string) function, 493
btr
instruction, 715
bts, btc, and btr instructions and CPU performance, 716
bts
instruction, 715
bubble sort, 185
busy bit (FPU), 324
BX register, 10
byte, 52
alignment in a segment, 605
data directive, 53
directive, 15
byte-sized lanes, 598
byte strings, 825
byte vectors (packed bytes), 597
C
C++ compiler, 4
callee register preservation, 222
caller register preservation, 222
call indirect, 278
calling assembly code from C/C++, 4
calling C/C++ code from assembly, 4
call
instruction, 22, 216, 218
carriage return, 93
carry flag
and, or, and xor instruction effect, 712
as a bit accumulator, 716
setting after an arithmetic operation, 71
case
labels (noncontiguous), 418
case-sensitive identifiers, 8
catstr
directive, 751
cbw
instruction, 288
C/C++ Standard Library, 4
cd
command, 930
cdecl
calling convention, 262
cdqe
instruction, 288
cdq
instruction, 288
central processing unit, 9
change sign (floating-point), 349
char
data type, 96
declaring characters in a MASM program, 96
character
data type, 92
literal constants, 95
strings, 174
chdir
command, 930
checking a bit to see if it is zero or one, 298
checking to see if a macro argument is blank, 767
checking whether a bit string contains all 1 bits, 714
choosing an alignment value for variables, 121
CH register, 10
C integer types, 454
class argument for segment directive, 605
cld
instruction, 86
clearing
bits, 708
clearing bits prior to comparing them, 709
FPU exception bits, 363
CLI (command line interpreter), xxx
cd
command, 930
del
command, 932
cli
instruction, 86
clipping (saturation), 68
closeHandle
function, 890
CL register, 10
in rotate operations, 79
in shl instruction, 75
cls
command, 931
cmd.exe (command line interpreter), xxx
cmovae
instruction, 395
cmova
instruction, 395
cmovbe
instruction, 395
cmovb
instruction, 395
cmove
instruction, 395
cmovge
instruction, 395
cmovg
instruction, 395
cmovnp
instruction, 395
cmovpe
instruction, 395
cmovp
instruction, 395
cmovle
instruction, 395
cmovl
instruction, 395
cmovnae
instruction, 395
cmovna
instruction, 395
cmovnbe
instruction, 395
cmovnb
instruction, 395
cmovne
instruction, 395
cmovnge
instruction, 395
cmovng
instruction, 395
cmovnle
instruction, 395
cmovnl
instruction, 395
cmovno
instruction, 395
cmovns
instruction, 394
cmovnz
instruction, 394
cmovo
instruction, 394
cmovpo
instruction, 395
cmovs
instruction, 394
cmovz
instruction, 394
cmpeqps
instruction, 674
cmpeqsd
instruction, 373
cmpeqss
instruction, 372
cmpleps
instruction, 674
cmplesd
instruction, 373
cmpless
instruction, 372
cmpltps
instruction, 674
cmpltsd
instruction, 373
cmpltss
instruction, 372
cmpneps
instruction, 674
cmpnesd
instruction, 373
cmpness
instruction, 372
cmpnleps
instruction, 674
cmpnless
instruction, 372
cmpnltps
instruction, 674
cmpnltsd
instruction, 373
cmpnltss
instruction, 372
cmpordps
instruction, 674
cmpordsd
instruction, 373
cmpordss
instruction, 372
cmppd
instruction, 671
cmpsd
instruction, 372
cmpss
instruction, 372
cmps
string instruction, 832
cmpunordps
instruction, 674
cmpunordsd
instruction, 373
cmpunordss
instruction, 372
coalescing bit strings, 728
code planes (Unicode), 97
code points (Unicode), 96
code sections, 108
code snippets, xxviii
coercion, 157
collecting disparate bits into a bit string, 728
collecting macro parameters, 764
column-major ordering, 193
formula, 193
command line, xxx
command line assembler, 6
command line interpreter. See CLI
common C++ data type sizes, 35
commutative operators, 307
comparing
a register to zero, 298
bits, 708
dates, 85
strings, 825
comparison for less than (packed/vector/SIMD), 662
comparison operators in a constant expression, 153
comparison results (SIMD), 663, 678
comparisons
dates, 85
floating point, 323
SIMD, 660
comparison synonyms (AVX/SSE), 673
compile-time
decisions, 752
expressions and operators, 750
language, 748
loops, 756
procedures, 760
compile-time function
sizeof
, 207
compile-time language. See CTL
compile-time statement
echo
, 748
else
, 753
elseif
, 753
.err
, 748
forc
, 756
if
, 752
while
, 756
compile-time versus runtime expressions, 155–156
complete Boolean evaluation, 400
complex arithmetic expressions, 302
complex string functions, 837
composite data types, 174
computation via table lookup, 584
computing
arctangent, 362
cos, 361
cosine, 361
log2(x), 362
log2(x) plus one, 362
sine, 361
tangent, 361
2^x minus one, 361
computing the address of a memory variable, 22
computing the length of a string at assembly time, 176
concatenation of text values in MASM, 751
conditional
compilation, 752
jmp
aliases, 392
jmp
instructions (opposite conditions), 391–392
statements, 396
conditional jump instructions, 70
conditional jumps
ja
, 391
jae
, 391
jb
, 391
jbe
, 391
je
, 391
jg
, 391
jge
, 391
jl
, 391
jle
, 391
jna
, 391
jnae
, 391
jnb
, 391
jnbe
, 391
jne
, 391
jng
, 391
jnge
, 391
jnl
, 391
jnle
, 391
jno
, 391
jnp
, 391
jns
, 391
jnz
, 391
jo
, 391
jp
, 391
jpe
, 391
jpo
, 391
js
, 391
jz
, 391
conditional move (if carry), 716
conditional move instructions, 394
condition code
flags, 12
FPU condition codes, 322
settings after cmp instruction, 294
conditioning inputs, 589
configuring software for several environments, 754
constant
0.0 (FPU load instruction), 360
expressions in CTL statements, 750
log2(10), 361
log2(e), 361
log10(2), 361
loge(2), 361
pi, 360
constant declarations, 18, 149
constant expression evaluation, 156
constant expressions, 164
constant values, 18
construction of an activation record, 228
continue
statement, 438
control characters, 93
conversions (floating-point instructions), 328
converting
32-bit integers to floating-point, 679
arithmetic expressions to postfix notation, 366
ASCII digit code (0 to 9) to its corresponding integer value, 95
BCD to floating-point, 329
between big-endian and little-endian forms, 116
binary to hexadecimal, 48
binary value (0 to 9) to its ASCII character representation, 95
break
statements to pure assembly, 438
complex expressions to assembly, 302
continue
statements to pure assembly, 439
double-precision floating-point values to single-precision, 680
floating-point expressions to assembly, 364
floating-point values to a decimal string, 527
floating-point values to an integer, 319, 679
with truncation, 680
floating-point values to exponential form, 537
forever
statements to pure assembly, 436
for
statements to pure assembly, 437
hexadecimal digit to a character, 493
hexadecimal to binary, 47
if
statements to pure assembly, 396
integer to floating-point, 328
larger integer object to a smaller one (via saturation), 667
noncommutative arithmetic operators to assembly, 305
numbers to strings using fbstp, 503
postfix notation to assembly, 367
repeat..until
loop to pure assembly, 434
simple expressions to assembly, 300
single-precision floating-point values to double-precision, 680
strings to integers, 546
while
loops to pure assembly, 433
copy command (CLI), 931
copying
arbitrary number of bytes using the movsd instruction, 831
overlapping arrays using the movs string instructions, 830
cosine, 361
counting bits, 739
cpuid
instruction, 599
CPU registers, 10
cqo
instruction, 288
creating lookup tables, 590
CTL (compile-time language), 748
conditional assembly, 752
decisions, 752
else
, 753
elseif
, 753
endif
, 753
endm
, 756
forc
, 756
for
loop, 756
if
statement, 752
instr
operator, 751
loops, 756
macros, 760
!
operator, 750
%
operator, 750
procedures (compile-time), 760
sizestr
operator, 752
substring
operator, 752
while
statement, 756
cvtdq2pd
instruction, 679
cvtdq2ps
instruction, 679
cvtpd2dq
instruction, 679
cvtpd2ps
instruction, 680
cvtps2dq
instruction, 680
cvtps2pd
instruction, 680
cvttpd2dq
instruction, 680
cvttps2dq
instruction, 680
cwde
instruction, 288
cwd
instruction, 288
CX register, 10
D
dangling pointers, 169
data alignment, 119
in a segment, 605
Microsoft ABI, 144
data declaration directives, 15
data representation, 147
data type coercion, 157
data types associated with SSE/AVX move instructions, 622
data type sizes (C++), 35
date
command (CLI), 931
date comparison, 85
date/time stamp of a file in a make operation, 865
db
directive, 15
dd
directive, 15
debugging CTL programs, 749
debugging with conditional compilation, 755
decimal arithmetic, 453, 486, 581
decimal numbering system, 44
decimal (signed) to string conversion (extended-precision), 513
decimal string-to-integer conversion, 546
decimal string-to-numeric conversion (extended-precision), 569
decimal-to-string conversion, 500
dec
instruction, 149
decisions in MASM, 397
declarations
.code
section, 108
.const
, 109
.data
, 108
.data?
, 110
typedef
, 156
declaring character variables in a MASM program, 96
declaring constants, 18
declaring parameters with the proc directive, 255
default macro parameter values, 768
default segment alignment, 605
defining read-only data in a user-defined segment, 605
definite loop, 437
del
command (CLI), 932
delimiter characters, 546
delimiting macro parameters, 764
denormal exception flag (DE, SSE), 369
denormalized
exception (FPU), 320
floating-point values, 325
values, 90
denormal mask (DM, SSE), 370
denormals are zero (DAZ, SSE), 370
dependencies (in a makefile), 864
destructuring, 407
determining which CPU a piece of software is running on, 599
DH register, 10
dialog box (example code), 879
differences in the imul instructions, 291
different-size operands, 485
dir
command, 932
direction flag and the string instructions, 826
directives, 6
?
, 15
align
, 121
catstr
, 751
db
, 15
dd
, 15
dq
, 15
dt
, 15
dw
, 15
else
, 753
elseif
, 753
endif
, 753
endp
, 216
ends
(for structs), 198
extern
, 850
if
, 753
ifb
, 767
ifdef
, 849
ifdif
, 767
ifdifi
, 767
ifidn
, 767
ifidni
, 767
ifnb
, 767
include
, 848
instr
, 751
label
, 156
local
(in procedures), 237
macro
, 760
option epilogue
, 238
option prologue
, 238
real4
, 15
real8
, 15
real10
, 15
sdword
, 15
sizestr
, 752
sqword
, 15
struct
, 198
substr
, 752
sword
, 15
tbyte
, 15
textequ
, 151
typedef
, 156
while
, 756
direct jump instructions, 382
DI register, 10
disadvantages of macros (versus procedures), 762
displacements, 113
displaying equate values during assembly, 751
distributing bit strings, 728
div and idiv instructions, 291, 466
divide-by-zero exception (FPU), 320
divide-by-zero mask (ZM, SSE), 370
division without div or idiv, 312
divpd
instruction, 670
divps
instruction, 670
divsd
instruction, 371
divss
instruction, 371
DL register, 10
domain conditioning, 589
dot notation for accessing struct/record fields, 199
dot operator, 199
double-precision floating-point format, 88
double-precision (floating-point) lanes, 599
double-precision vector types, 597
double word, 51, 54. See also dword
double-word strings, 825
dq
directive, 15
dt
directive, 15
dtoStr
(double word to string) function, 493
duplicate include files/operations (preventing), 849
duplicating data in an XMM/YMM register, 620
dword
alignment within a segment, 605
dword-sized lanes, 598
vectors (packed dwords), 598
DX register, 10
dyadic operations, 55
dynamic
type systems, 209
E
e10toStr
function, 537
EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP registers, 10
echo
CTL statement, 748
effective address, 125
EFLAGS register, 12
else
compile-time statement, 753
else
directive, 753
elseif
compile-time statement, 753
elseif
directive, 753
else
statement, 397
empty macro arguments, 767
endian byte organization, 114
endian conversions, 116
endif
directive, 753
endm
compile-time statement, 756, 759
endp
directive, 216
ends
directive (for structs), 198
ends
(end segment) directive, 604
enumerated data constants in MASM, 156
epiloguedef
option, 239
epilogue (operand for option directive), 238
eq
operator, 153
equality (macro arguments), 767
equates, 149
erase
command (CLI), 932
escape character in MASM expressions, 750
exception-handling in C++, 30
exceptions
divide by zero (FPU), 320
flags (FPU), 322
FPU exception bits, 363
masks (FPU), 320
overflow (FPU), 320
excess-1023 exponent, 88
excess (biased) exponents, 88
exclusive-or operation, 55, 57
executing a loop backward, 445
exponent of a floating-point number, 88
expressions, 302
and temporary values, 307
extended-precision
addition, 454
AND, 479
arithmetic, 453
comparisons, 458
conversions
decimal-to-string (signed), 513
decimal-to-string (unsigned), 566
string-to-numeric, 555
unsigned integer-to-string, 508
division, 466
floating-point format, 89
formatted I/O, 514
I/O, 491
multiplication, 461
neg
, 477
NOT, 480
numeric conversion routines, 546
OR, 479
rotates, 484
shifts, 480
shifts and the flags, 482
XOR, 480
external directives, 849
external symbols, 850
external symbol types, 851
externdef
directive, 24, 849, 851
extracting
bits, 708
bit strings, 742
sign bits from SSE/AVX floating-point values, 676
extractps
instruction, 643
F
f2xm1
instruction, 361
fabs
instruction, 349
facade code, 27
fadd
instruction, 330
faddp
instruction, 330
false precision, 315
false (representation), 308
FASTCALL calling convention, 263
fbld
instruction, 329, 488, 566
fbstp
instruction, 329, 488, 503, 566
fchs
instruction, 349
fclex
instruction, 363
fcomi
instruction, 357
fcomip
instruction, 357
fcos
instruction, 361
fdiv
instruction, 343
fdivp
instruction, 343
fdivr
instruction, 343
fdivrp
instruction, 343
ficom
instruction, 322
ficomp
instruction, 322
field, 197
field access (of a record/struct) via a pointer, 199
field alignment within a record, 204
fild
instruction, 328
finit
instruction, 363
fist
instruction, 328
fistp
instruction, 328
fisttp
instruction, 328
flags, 12
and
instruction, 712
cmp
instruction effect on flags, 293
copying AH register to flags, 86, 350
direction, 826
lahf
instruction, 86
or
instruction, 712
overflow, 293
sign, 293
xor
instruction, 712
zero, 293
flag settings for the logical instructions (and, or, xor, and not), 71
FLAGS register, 12
fld1
instruction, 360
fld
instruction, 326
fldl2e
instruction, 361
fldl2t
instruction, 361
fldlg2
instruction, 361
fldln2
instruction, 361
fldpi
instruction, 360
fldz
instruction, 360
floating-point
arithmetic, 317
calculations, 317
SIMD, 671
control register, 317
conversion to integer, 319, 328
conversion to string, 519, 527
exponential form, 537
data registers, 317
data types, 324
division, 343
exchange registers, 327
FPU (floating-point unit), 11, 317
multiplication, 339
negation, 349
normalized format, 325
overflow, 316
overflow exception, 320
partial remainder, 348
precision control, 320
pushing a value onto the FPU stack, 326
pushing the constant 1.0 onto the FPU stack, 360
remainder, 348
rounding control, 319
status register, 317
string conversion (to real), 570
string output, 519
subtraction, 334
underflow, 316
unordered comparisons, 357, 360
unit. See FPU
values, 54
as parameters, 244
flush to zero (FZ, SSE), 370
fmul
instruction, 339
fmulp
instruction, 339
fnclex
instruction, 363
fninit
instruction, 363
fnstsw
instruction, 364
forcing
a zero result, 56
bits to one, 58
bits to zero, 58
for and endm compile-time statement, 756, 759
for
loops, 437
format specifiers (printf), 24
formatted numeric-to-string conversions, 514
formula for two-dimensional row-major access, 191
FORTRAN programming language, 424
four-dimensional array element access, 191
fpatan
instruction, 362
fprem1
instruction, 348
fprem
instruction, 348
fptan
instruction, 361
FPU (floating-point unit), 11, 317
busy bit, 324
condition code bits, 322
control register, 318
data movement instructions, 326
data registers, 317
data types, 324
denormalized result exception, 320
divide-by-zero exception, 320
exception bits, 363
exception flags, 322
exception masks, 320
floating-point unit, 317
invalid operation exception, 320
overflow exception, 320
popping the FPU stack, 326
precision exception, 321
registers, 317
rounding control, 319
round-up and round-down, 319
stack fault flag, 322
status word, 321
top of stack pointer, 324
truncate during computations, 319
underflow exception, 321
free (memory deallocation) function, 170
frndint
instruction, 349
fsincos
instruction, 361
fsin
instruction, 361
fst
instruction, 326
fstp
instruction, 326
fstsw
instruction, 321, 350, 364
fsub
instruction, 334
fsubp
instruction, 334
fsubr
instruction, 334
fsubrp
instruction, 334
fucom
instruction, 323
fucomp
instruction, 323
fucompp
instruction, 323
function
computation via table lookup, 584
results, 270
fxam
instruction, 323
fxch
instruction, 327
fyl2x
instruction, 362
fyl2xp1
instruction, 362
G
general protection fault, 107
general purpose registers, 10, 12
ge
operator, 153
getLastError
function, 891
getStdErrHandle
function, 883
GetStdHandle
(Win32 API function), 875
getStdInHandle
function, 884
getStdOutHandle
function, 883
getting the address of a variable, 22
granularity (MMU pages), 111
greater-than comparisons on SSE CPUs, 673
GT
operator, 153
guard digits/bits, 314
H
haddpd
instruction, 671
haddps
instruction, 671
handling SIMD comparisons, 663
heap variable address alignment, 607
Hello, world!
compile-time program, 748
MASM program, 6
stand-alone version, 874
hexadecimal
digit-to-character conversion, 493
hexadecimal-to-string conversion, 492
using table lookup, 497
numbers, 51
output (extended-precision), 499
string-to-numeric conversion, 556
high32
operator, 153
high
operator, 153
high-order (HO), 46
byte, 53
nibble, 52
word, 54
highword
operator, 153
HO (high-order), 46
horizontal addition, 650
and subtraction (floating-point), 671
hsubpd
instruction, 671
hsubps
instruction, 671
hybrid programs (assembly and C/C++), 7
I
i128toStr
function, 513
identifiers, 8
idiom, 685
machine idiosyncrasies, 310
idiv
instruction, 291, 407, 466
IEEE
floating-point standard, 86, 318, 320
ifb
directive, 767
if
compile-time statement, 752
if
conditional statement, 396
ifdef
directive, 849
ifdif
directive, 767
ifdifi
directive, 767
if
directive, 753
ifidn
directive, 767
ifidni
directive, 767
ifnb
directive, 767
imul
instruction, 148, 289, 461
inc
instruction, 149
include
directive, 848
inclusive-or operation, 56
indirect
addressing modes, 124
indirect and scaled-indexed addressing modes, 106
indirect-plus-offset addressing mode, 125
calls, 278
jump
instructions, 383
through a memory pointer, 389
induction variables, 449
infinite loops, 433
infinite-precision arithmetic, 313
infinity (IEEE representation), 90
infix notation, 364
initialized arrays, 183
initializing struct fields, 200
initializing the FPU, 363
input conditioning, 589
input/output (I/O), 9
input redirection, 927
inserting
a bit into a bit array, 734
a bit set into another bit string, 710
a bit string into a larger bit string, 718
insertps
instruction, 643
instr
directive, 751
instructions
add
, 21
addpd
, 669
addps
, 669
addsd
, 371
addss
, 371
andnpd
, 645
andpd
, 645
bsf
, 737
bsr
, 737
bswap
, 116
bt
, 715
btc
, 715
btr
, 715
bts
, 715
cbw
, 288
cdq
, 288
cdqe
, 288
cld
, 86
cli
, 86
cmova
, 395
cmovae
, 395
cmovb
, 395
cmovbe
, 395
cmove
, 395
cmovg
, 395
cmovge
, 395
cmovl
, 395
cmovle
, 395
cmovna
, 395
cmovnae
, 395
cmovnb
, 395
cmovnbe
, 395
cmovne
, 395
cmovng
, 395
cmovnge
, 395
cmovnl
, 395
cmovnle
, 395
cmovno
, 395
cmovnp
, 395
cmovns
, 394
cmovnz
, 394
cmovo
, 394
cmovp
, 395
cmovpe
, 395
cmovpo
, 395
cmovs
, 394
cmovz
, 394
cmpeqps
, 674
cmpeqsd
, 373
cmpeqss
, 372
cmpleps
, 674
cmplesd
, 373
cmpless
, 372
cmpltps
, 674
cmpltsd
, 373
cmpltss
, 372
cmpneps
, 674
cmpnesd
, 373
cmpness
, 372
cmpnleps
, 674
cmpnless
, 373
cmpnltps
, 674
cmpnltsd
, 373
cmpnltss
, 372
cmpordps
, 674
cmpordsd
, 373
cmpordss
, 373
cmppd
, 671
cmps
, 832
cmpsd
, 372
cmpss
, 372
cmpunordps
, 674
cmpunordsd
, 373
cmpunordss
, 372
cqo
, 288
cvtdq2pd
, 679
cvtdq2ps
, 679
cvtpd2dq
, 679
cvtpd2ps
, 680
cvtps2dq
, 680
cvtps2pd
, 680
cvttpd2dq
, 680
cvttps2dq
, 680
cwd
, 288
cwde
, 288
dec
, 149
divpd
, 670
divps
, 670
divsd
, 371
divss
, 371
extractps
, 643
f2xm1
, 361
fabs
, 349
fadd
, 330
faddp
, 330
fchs
, 349
fclex
, 363
fcomi
, 357
fcomip
, 357
fcos
, 361
fdiv
, 343
fdivp
, 343
fdivr
, 343
fdivrp
, 343
ficom
, 322
ficomp
, 322
fild
, 328
finit
, 363
fist
, 328
fistp
, 328
fisttp
, 328
fld
, 326
fld1
, 360
fldl2e
, 361
fldl2t
, 361
fldlg2
, 361
fldln2
, 361
fldpi
, 360
fldz
, 360
floating-point comparisons, 350
floating-point conversions, 328
fmul
, 339
fmulp
, 339
fnclex
, 363
fninit
, 363
fnstsw
, 364
fpatan
, 362
fprem
, 348
fprem1
, 348
fptan
, 361
FPU data movement, 326
frndint
, 349
fsin
, 361
fsincos
, 361
fst
, 326
fstp
, 326
fsub
, 334
fsubp
, 334
fsubr
, 334
fsubrp
, 334
fucom
, 323
fucomp
, 323
fxam
, 323
fxch
, 327
fyl2x
, 362
fyl2xp1
, 362
haddpd
, 671
haddps
, 671
hsubpd
, 671
hsubps
, 671
inc
, 149
indirect jumps, 383
insertps
, 643
imul
, 291
jp
, 391
jpe
, 391
jpo
, 391
lahf
, 86
lddqu
, 622
ldmxcsr
, 370
leave
, 234
lods
, 836
maxpd
, 670
maxps
, 670
maxsd
, 371
maxss
, 371
minpd
, 670
minps
, 670
minsd
, 371
minss
, 371
movapd
, 610
movaps
, 610
movddup
, 621
movdqa
, 610
movdqu
, 612
movhlps
, 619
movhpd
, 617
movhps
, 617
movlhps
, 619
movlpd
, 615
movlps
, 615
movmskpd
, 676
movmskps
, 676
movs
, 826
movsb
, 826
movshdup
, 620
movsldup
, 620
movss
, 370
movsw
, 826
movupd
, 612
movups
, 612
mulpd
, 670
mulps
, 670
mulsd
, 371
mulss
, 371
neg
, 478
orpd
, 645
pabsb
, 659
pabsd
, 659
pabsw
, 659
packssdw
, 667
packsswb
, 667
packusdw
, 667
packuswb
, 667
paddb
, 648
paddd
, 649
paddq
, 649
pavgb
, 657
pavgw
, 657
pclmulqdq
, 656
pcmpeqb
, 660
pcmpeqd
, 660
pcmpeqq
, 660
pcmpeqw
, 660
pcmpgtb
, 660
pcmpgtd
, 660
pcmpgtq
, 660
pcmpgtw
, 660
pextrb
, 641
pextrd
, 642
pextrq
, 642
pextrw
, 642
phaddd
, 650
phaddw
, 650
pinsrd
, 642
pinsrq
, 642
pinsrw
, 642
pmaxsb
, 657
pmaxsd
, 658
pmaxsq
, 658
pmaxsw
, 657
pmaxub
, 658
pmaxud
, 658
pmaxuq
, 658
pmaxuw
, 658
pminsb
, 658
pminsd
, 658
pminsw
, 658
pminub
, 658
pminud
, 658
pminuq
, 658
pminuw
, 658
pmovmskb
, 662
pmovsxbd
, 666
pmovsxbq
, 666
pmovsxbw
, 666
pmovsxdq
, 666
pmovsxwd
, 666
pmovsxwq
, 666
pmovzxbd
, 665
pmovzxbq
, 665
pmovzxbw
, 665
pmovzxdq
, 665
pmovzxwd
, 665
pmovzxwq
, 665
pmuldq
, 656
pmulld
, 655
pmuludq
, 656
popf
, 140
popfd
, 140
pshufb
, 625
pshufd
, 626
pshufhw
, 628
pshuflw
, 628
psignb
, 659
psignd
, 660
psignw
, 659
pslldq
, 647
psllw
, 647
psrldq
, 647
psubb
, 654
psubd
, 653
psubq
, 653
psubw
, 654
ptest
, 646
punpckhbw
, 637
punpckhdq
, 637
punpckhqdq
, 637
punpcklbw
, 637
punpckldq
, 637
punpcklqdq
, 637
punpcklwd
, 637
pushf
, 140
pushfq
, 140
pushw
, 134
rcpss
, 372
repe prefix on cmpsb, cmpsw, cmpsd, and cmpsq, 827
repne prefix on cmpsb, cmpsw, cmpsd, and cmpsq, 827
rep prefix on movsb, movsw, movsd, and movsq, 826
rol
, 78
ror
, 78
rsqrtps
, 670
rsqrtss
, 372
scas
, 835
seta
, 296
setae
, 296
setb
, 296
setbe
, 296
sete
, 296
setg
, 296
setge
, 297
setl
, 297
setna
, 296
setnae
, 296
setnb
, 296
setnbe
, 296
setne
, 296
setng
, 297
setnge
, 297
setnl
, 297
setnle
, 296
setno
, 295
setnp
, 295
setns
, 295
seto
, 295
setp
, 295
setpe
, 295
setpo
, 295
sets
, 295
shld
, 482
shrd
, 482
shufpd
, 630
shufps
, 630
sqrtpd
, 670
sqrtps
, 670
sqrtsd
, 372
sqrtss
, 372
stc
, 716
std
, 86
sti
, 86
stmxcsr
, 370
stos
, 835
sub
, 21
subpd
, 669
subps
, 669
subsd
, 371
subss
, 371
unpckhpd
, 633
unpckhps
, 633
unpcklpd
, 633
unpcklps
, 633
vaddpd
, 669
vaddps
, 669
vandnpd
, 645
vandpd
, 645
vcvtdq2pd
, 679
vcvtdq2ps
, 679
vcvtpd2dq
, 679
vcvtpd2ps
, 680
vcvtps2dq
, 680
vcvtps2pd
, 680
vcvttpd2dq
, 680
vcvttps2dq
, 680
vdivpd
, 670
vdivps
, 670
vextractps
, 643
vhaddpd
, 671
vhaddps
, 671
vhsubpd
, 671
vhsubps
, 671
vinsertps
, 643
vlddqu
, 622
vmaxpd
, 670
vmaxps
, 670
vminpd
, 670
vminps
, 670
vmovapd
, 610
vmovaps
, 610
vmovd
, 609
vmovddup
, 621
vmovdqa
, 610
vmovdqu
, 612
vmovhlps
, 619
vmovhpd
, 618
vmovhps
, 618
vmovlhps
, 619
vmovlpd
, 615
vmovlps
, 615
vmovmskpd
, 676
vmovmskps
, 676
vmovq
, 609
vmovshdup
, 620
vmovsldup
, 620
vmovupd
, 612
vmovups
, 612
vmulpd
, 670
vmulps
, 670
vorpd
, 645
vpabsb
, 659
vpabsd
, 659
vpabsw
, 659
vpackssdw
, 667
vpacksswb
, 667
vpackusdw
, 667
vpackuswb
, 667
vpaddb
, 649
vpaddd
, 649
vpaddq
, 649
vpavgb
, 657
vpavgw
, 657
vpclmulqdq
, 656
vpcmpeqb
, 661
vpcmpeqd
, 661
vpcmpeqq
, 661
vpcmpeqw
, 661
vpcmpgtb
, 661
vpcmpgtd
, 661
vpcmpgtq
, 661
vpcmpgtw
, 661
vpextrb
, 642
vpextrd
, 642
vpextrq
, 642
vpextrw
, 642
vphaddd
, 650
vphaddw
, 650
vpinsrd
, 643
vpinsrq
, 643
vpinsrw
, 643
vpmaxsb
, 657
vpmaxsd
, 658
vpmaxsq
, 658
vpmaxsw
, 657
vpmaxub
, 658
vpmaxud
, 658
vpmaxuq
, 658
vpmaxuw
, 658
vpminsb
, 658
vpminsd
, 658
vpminsw
, 658
vpminub
, 658
vpminud
, 658
vpminuq
, 658
vpminuw
, 658
vpmovmskb
, 662
vpmovsxbd
, 666
vpmovsxbq
, 666
vpmovsxbw
, 666
vpmovsxdq
, 666
vpmovsxwd
, 666
vpmovsxwq
, 666
vpmovzxbd
, 665
vpmovzxbq
, 665
vpmovzxbw
, 665
vpmovzxdq
, 665
vpmovzxwd
, 665
vpmovzxwq
, 665
vpmuldq
, 656
vpmulld
, 655
vpmuludq
, 656
vpshufb
, 625
vpshufd
, 626
vpshufhw
, 628
vpshuflw
, 628
vshufps
, 632
vpsignb
, 659
vpsignd
, 660
vpsignw
, 659
vpslldq
, 647
vpsllw
, 647
vpsrldq
, 647
vpsubb
, 654
vpsubd
, 653
vpsubq
, 653
vpsubw
, 654
vptest
, 646
vpunpckhbw
, 640
vpunpckhdq
, 641
vpunpckhqdq
, 641
vpunpckhwd
, 640
vpunpcklbw
, 640
vpunpckldq
, 640
vpunpcklqdq
, 641
vrsqrtps
, 670
vshufpd
, 632
vsqrtpd
, 670
vsqrtps
, 670
vsubpd
, 669
vsubps
, 669
vunpckhpd
, 633
vunpckhps
, 633
vunpcklpd
, 633
vunpcklps
, 633
vxorpd
, 645
xchg
, 116
xlat
, 584
xorpd
, 645
integer
addition (SIMD), 648
arithmetic (SIMD), 648
average computation (SIMD), 657
comparisons (SIMD), 660
conversions (SIMD), 664
integer portion of a floating-point number, 349
integer-to-floating-point conversion, 328
integer-to-string conversion (extended precision, unsigned), 508
integer-to-string conversion (signed), 507
less-than comparison (SIMD), 662
multiplication (SIMD), 654
signed remainder/modulo, 407
subtraction (SIMD), 653
integer types in C, 454
integer unpack instructions (SSE/AVX), 637
interleaving comparison results (SIMD), 664
imul
instruction, 291
invalid arithmetic operation (IA), 673
invalid operation exception flag (IE, SSE), 369
invalid operation exception (FPU), 320
invalid operation mask (IM, SSE), 370
invariant computations, 446
inverting
bits in a bit string, 57
selected bits in a bit set, 712
I/O (input/output), 9
iSize
function, 516
J
jc
instruction, 70, 74, 390, 716
je
instruction, 72, 74, 390–391
jnc
instruction, 70, 74, 390, 716
jne
instruction, 72, 74, 390–391
jnge
instruction, 74, 390, 392
jnp
instruction, 390
jnz
instruction, 70, 74, 298, 390
jpe
instruction, 390
jp
instruction, 390
jpo
instruction, 390
jump instructions, 382
jz
instruction, 70, 74, 298, 390
K
KCS Floating-Point Standard, 87
L
label
declaration, 114
label
directive, 156
labels, 378
in a procedure, 219
lahf
instruction, 86
lanes (elements of an SSE/AVX packed array), 598
LARGEADDRESSAWARE
, 127
and arrays, 183
large address unaware applications, 127
large parameters, 258
last-in, first-out (LIFO) data structures, 137
last set bit, 736
lddqu
instruction, 622
ldmxcsr
instruction, 370
leaf function, 278
leave
instruction, 234
left
rotates, 78
shifts, 75
left-associative operators, 304
lengthof
operator, 153
length of text string in MASM textual constants, 752
length-prefixed strings, 175
le
operator, 153
less-than comparison (SIMD), 662
lexical scope, 378
lexicographical ordering, 833
library file, 869
library module, 853
lifetime of a local variable, 234
LIFO (last in, first out), 137
linear search, 422
line feed, 93
listings, xxviii
literal constant, 18
little-endian data organization, 114
little-endian to big-endian conversion, 116
LO (low-order), 46
load effective address, 378
instruction, 22
loading data into an SSE/AVX register, 610
loading single-precision vectors into SSE/AVX registers, 612
loading the flags register from AH, 86
loading the FPU control word, 363
local directive (in procedures), 237
local symbols in procedures, 378
local symbols (statement labels) in a procedure, 219
local variable access, 235
local variable address alignment, 607
local variables, 234
lods
instruction, 836
log2(e), 361
log2(x), 362
logical
exclusive-or operation, 55, 57
operations on binary numbers, 57
operations on bits, 55
operators within a constant expression, 153
shift right, 77
logical systems
arithmetic, 310
Boolean, 310
loops
invariant computations, 446
loop-control variables, 433
register usage, 442
termination, 443
unraveling/unrolling, 447
loops in the MASM compile-time language, 756
low32
operator, 154
low-level control structures, 378
low
operator, 153
low-order (LO), 46
byte, 53
nibble, 52
word, 54
lowword
operator, 153
lt
operator, 153
M
machine code encoding, 73
machine idioms, 310
machine state (preservation), 220
machine state, saving the, 220
macro
default parameter values, 768
optional parameters, 766
parameter delimiters, 764
parameter expansion, 762
parameter expansion issues, 765
parameters, 762
required parameters, 766
macroarchitecture, 622
macro
directive, 760
macros, 760
make dependencies, 864
makefiles, 34
makefile syntax, 863
making symbols case-sensitive in MASM, 8
malloc
(C Standard Library function), 166
manipulating bits in memory, 707
mantissa, 87
mask (bits), 708
masking
bit strings, 58
masking in bits, 58
masking out bits, 58
MASM (Microsoft Macro Assembler)
dup
operator in a data declaration, 31
enumerated constants, 156
pointers, 162
procedures, 22
structures (struct
), 198
support for ASCII characters, 95
variables, 14
masm32.com website, 874
MASM /c
command line option, 9
MASM/C++ hybrid programs, 7
maximum instructions (SIMD), 657
maxpd
instruction, 670
maxps
instruction, 670
maxsd
instruction, 372
maxss
instruction, 371
memory, 9
allocation, 105
indirect jump through memory, 389
organization, 106
read operation, 14
subsystem, 13
write operation, 13
memory access violation exception, 169
memory addresses, 9
memory alignment requirements (SSE/AVX/SIMD), 606
memory leaks, 171
memory management unit (MMU), 111
merging bit strings, 741
merging source files during assembly, 848
microarchitecture, 622
Microsoft ABI, 35
data alignment boundary, 144
register usage, 38
volatile registers, 38
Microsoft Macro Assembler. See MASM
Microsoft Visual C++ (MSVC), 9, 920
minimal procedures, 218
minimum instructions (SIMD), 657
minpd
instruction, 670
minps
instruction, 670
minsd
instruction, 371
minss
instruction, 371
misaligned data and the system cache, 121
mkActRec
(macro), 882
MMU (memory management unit), 111
MMX (Multimedia Extensions), 624
MMX register set, 11
mnemonic, 289
modulo
floating-point remainder, 348
integer remainder, 407
modulo-n counters, 312
mod
(within a constant expression), 153
monadic operations, 57
more
command (CLI), 932
movapd
instruction, 610
movapd
operands (MASM), 611
movaps
instruction, 610
movaps
operands (MASM), 611
movddup
instruction, 621
movdqa
instruction, 610
movdqa
operands (MASM), 611
movdqu
instruction, 612
move
command (CLI), 933
movhlps
instruction, 619
movhpd
instruction, 617
movhps
instruction, 617
moving string data, 825
mov
instruction operands, 20
movlhps
instruction, 619
movlpd
instruction, 615
movlps
instruction, 615
movmskpd
instruction, 676
movmskps
instruction, 676
movsb
instruction, 827
movshdup
instruction, 620
movs
instruction, 827
movs
instruction performance, 831
movsldup
instruction, 620
movss
instruction, 370
movsw
instruction, 827
movsx
instruction, 288
movupd
instruction, 612
movups
instruction, 612
MSVC (Microsoft Visual C++), 9, 920
mulpd
instruction, 670
mulps
instruction, 670
mulsd
instruction, 371
mulss
instruction, 371
multi-byte data structure organization (in memory), 114
multilingual planes (Unicode), 97
Multimedia Extensions (MMX), 624
multiple data values in a single data declaration, 16
multiplication, 148, 289, 291, 461
floating-point, 339
multiplying
by a reciprocal to simulate division, 312
register value by ten, 311
without mul
or imul
, 310
multiprecision
addition, 454
comparisons, 458
subtraction, 457
N
naming a segment, 604
NaN (not a number), 90, 296, 320
natural data alignment boundary, 144
neg128
(macro), 760
negating large values, 478
negation (floating-point), 349
neg
instruction, 478
ne
operator, 153
nested array constants, 195
nested dup
operator, 195
nested structs, 200
nested subfield access (of a structure), 200
newLn
function, 886
nibble, 51
N/No N rule, 392
noncommutative binary operators, 308
nonvolatile registers, 265
nonvolatile registers (Microsoft ABI), 39
normalized floating-point numbers, 89, 325
NOT operator, 153
NULL
pointer references, 107
numbering system, 44
binary, 44
decimal, 44
hexadecimal, 46
positional, 44
numeric
conversion from string, 546
memory addresses, 9
numeric-to-string conversion performance, 507
numeric-to-string conversions, 491
representation, 48
O
octal words, 55
offsets, 113
one’s complement format, 87
opattr
operator, 154
opcode, 123
open
function, 888
openNew
function, 889
operation code (opcode), 123
operations
AND, 309
NOT, 309
on binary numbers, 57
rotation, 74
shift arithmetic right, 77
shifts, 74
operator precedence, 303
operators, 195
$
, 154
AND, 153
dot (structure/record field access), 199
eq
, 153
ge
, 153
gt
, 153
high
, 153
high32
, 153
highword
, 153
le
, 153
lengthof
, 153
logical operators, 153
low
, 153
low32
, 154
lowword
, 153
lt
, 153
ne
, 153
NOT, 153
opattr
, 154
OR, 153
size
, 154
sizeof
, 154
this
, 154
type, 159
opposite jumps, 392
optional macro parameters, 766
epilogue
operand, 238
prologue
operand, 238
OR operation, 55
OR operator, 153
orpd
instruction, 645
output redirection (standard output), 926
overflow exception flag (OE, SSE), 369
overflow exception (FPU), 320
overflow flag
setting after an arithmetic operation, 71
overflow mask (OM, SSE), 370
overlaid registers (XMM/YMM), 623
oword, 51
P
pabsb
instruction, 659
pabsd
instruction, 659
pabsw
instruction, 659
packed
absolute value (integer), 659
addition, 648
arrays of bit strings, 733
byte data types, 597
data, 79
decimal arithmetic, 488
double (precision) arithmetic instructions, 668
dword data types, 598
floating-point arithmetic, 668
integer comparisons, 660
integer multiplication, 654
memory operands (SSE/AVX), 606
operands for SSE/AVX instructions, 606
qword data types, 598
shifts, 647
sign extension, 666
sign transfer, 659
(SIMD) integer comparison for less than, 662
single (precision) arithmetic instructions, 668
word data types, 597
zero extension, 665
packing and unpacking bit strings, 717
packssdw
instruction, 667
packsswb
instruction, 667
packusdw
instruction, 667
packuswb
instruction, 667
paddb
instruction, 649
paddd
instruction, 649
paddq
instruction, 649
page (256-byte) alignment within a segment, 605
pages (memory management), 111
paragraph memory alignment, 606
paragraph (para/16-byte) alignment within a segment, 605
parameter declarations with the proc
directive, 255
parameter expansion in macros, 762
parameters, 240
variable length, 248
partial remainder, 348
pass by reference
efficiency, 243
passing
large objects as parameters, 258
parameters by reference, 241
parameters by value, 241
parameters in registers, 243
parameters in the code stream, 246
parameters on the stack, 249
pavgb
instruction, 657
pavgw
instruction, 657
pclmulqdq
instruction, 656
pcmpeqb
instruction, 660
pcmpeqd
instruction, 660
pcmpeqq
instruction, 660
pcmpeqw
instruction, 660
pcmpgtb
instruction, 660
pcmpgtd
instruction, 660
pcmpgtq
instruction, 660
pcmpgtw
instruction, 660
PC-relative addressing mode, 122
performance improvements for loops, 443
performance of numeric-to-string conversion, 507
performance of the string instructions, 837
pextrb
instruction, 641
pextrd
instruction, 641
pextrq
instruction, 641
pextrw
instruction, 641
phaddd
instruction, 650
phaddsw
instruction, 650
phaddw
instruction, 650
pi
(FPU load instruction), 360
pinsrb
instruction, 642
pinsrd
instruction, 642
pinsrq
instruction, 642
pinsrw
instruction, 642
pmaxsb
instruction, 657
pmaxsd
instruction, 658
pmaxsq
instruction, 658
pmaxsw
instruction, 657
pmaxub
instruction, 658
pmaxud
instruction, 658
pmaxuq
instruction, 658
pmaxuw
instruction, 658
pminsb
instruction, 658
pminsd
instruction, 658
pminsq
instruction, 658
pminsw
instruction, 658
pminub
instruction, 658
pminud
instruction, 658
pminuq
instruction, 658
pminuw
instruction, 658
pmovmskb
instruction, 662
pmovmskd
simulation, 663
pmovmskw
simulation, 663
pmovmskq
simulation, 663
pmovsxbd
instruction, 666
pmovsxbq
instruction, 666
pmovsxbw
instruction, 666
pmovsxdq
instruction, 666
pmovsxwq
instruction, 666
pmovzxbd
instruction, 665
pmovzxbq
instruction, 665
pmovzxbw
instruction, 665
pmovzxdq
instruction, 665
pmovzxwd
instruction, 666
pmovzxwq
instruction, 665
pmuldq
instruction, 656
pmulld
instruction, 655
pmuludq
instruction, 656
pointer constants and pointer constant expressions, 164
pointer data access, 162
pointer problems, 167
pointers, 161
popfd
instruction, 140
popf
instruction, 140
popping the FPU stack, 326
postfix notation, 364
conversion to assembly language, 367
precedence
of arithmetic operators, 303
rules, 303
precision, 314
control bits (FPU), 320
control during floating-point computations, 320
exception (FPU), 321
precision exception flag (PE, SSE), 369
precision mask (PM, SSE), 370
preserving
machine state, 220
in loops, 442
printf
format specifiers, 24
problems with macro parameter expansion, 765
parameter declarations, 255
procedural parameters, 280
passing procedures as parameters, 280
procedure invocation, 216
procedure pointers, 278
effect on the stack, 278
in MASM, 22
processing SIMD comparison results, 678
proc
external symbol type, 851
program counter in a section, 154
programming in the large, 847
programming language
FORTRAN, 424
program size and object/library files, 870
prolog (standard entry sequence code), 239
option
, 239
prologue
(operand for option
directive), 238
pshufb
instruction, 625
pshufd
instruction, 626
pshufhw
instruction, 628
pshuflw
instruction, 628
psignb
instruction, 659
psignd
instruction, 660
psignw
instruction, 659
pslldq
instruction, 647
psllw
instruction, 647
psrldq
instruction, 647
psubb
instruction, 654
psubd
instruction, 653
psubq
instruction, 653
psubw
instruction, 654
ptest
instruction, 646
punpckhbw
instruction, 637
punpckhdq
instruction, 637
punpckhqdq
instruction, 637
punpckhwd
instruction, 637
punpcklbw
instruction, 637
punpckldq
instruction, 637
punpcklqdq
instruction, 637
punpcklwd
instruction, 637
pushf
instruction, 140
pushfq
instruction, 140
pushing a value onto the floating-point stack, 326
pushing the constant 1.0 onto the FPU stack, 360
pushw
instruction, 134
puts
function, 885
Q
qtoStr
(quad word to string) function, 493
quad words, 55
quad-word strings, 825
question mark in a data declaration directive, 15
quicksort, 272
qword, 51
qword data declarations, 55
qword
directive, 15
qword-sized lanes, 599
qword vectors (packed qwords), 598
R
R8B, R9B, R10B, R11B, R12B, R13B, R14B, and R15B registers, 10
R8D, R9D, R10D, R11D, R12D, R13D, R14D, and R15D registers, 10
R8W, R9W, R10W, R11W, R12W, R13W, R14W, and R15W registers, 10
radix, 46
range of a function, 586
RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, R8, R9, R10, R11, R12, R13, R14, and R15 registers, 10
rcpss
instruction, 372
RCX register usage in string instructions, 826
RDI register usage in string instructions, 826
rd/rmdir
commands (CLI), 933
read
function, 887
reading from memory, 13
readLine()
function, 30
readLn
function, 893
readonly
segment argument, 605
variables as constants, 150
real4
directive, 15
real8
directive, 15
real10
directive, 15
real values as parameters, 244
rearranging bytes in an XMM/YMM register, 625
rearranging expressions
in if
statements to improve performance, 406
to make them more efficient, 406
record, 197
declarations, 198
field access, 199
field alignment, 204
record/struct field access via pointer, 200
recursion, 271
recursively converting numbers to strings, 500
reference parameters, 241, 256
register
8-bit, 10
16-bit, 10
32-bit, 10
64-bit, 10
addressing modes, 122
as procedure parameters, 243
comparison to zero, 298
FPU, 317
indirect addressing mode, 124
indirect jump instruction, 383
overlaying, 10
preservation
callee, 222
caller, 222
usage in loops, 442
usage in string instructions, 826
usage in the Microsoft ABI, 38
remainder
floating point, 348
signed integer, 407
removing unwanted data from the stack, 140
ren/rename
commands, 933
repe
prefix on cmpsb
, cmpsw
, cmpsd
, and cmpsq
instructions, 827
repetitive compilation, 756
repne
prefix on cmpsb
, cmpsw
, cmpsd
, and cmpsq
instructions, 827
rep
prefix on movsb
, movsw
, movsd
, and movsq
instructions, 826
rep
/repe
/repz
and repnz
/repne
string instruction prefixes, 826
required macro parameters, 766
restrictions in simple switch statement implementations, 414
return address, 218
returning a result to a C++ program from an assembly language function, 30
reverse
division (floating-point), 343
Polish notation (RPN), 364
subtraction (floating-point), 334
reversing bits in a bit string, 739
right
rotates, 78
shifts, 75
right-associative operators, 304
RIP-relative addressing mode, 123
rol
instruction, 78
ror
instruction, 78
rotate
left, 77
operations, 74
right, 77
rounding
control (FPU), 319
control (SSE), 370
floating-point numbers, 349
floating-point value to an integer, 349
round-up and round-down options during floating-point computations, 319
row-major array access for three-dimensional arrays, 191
row-major ordering, 190
RPN (reverse Polish notation), 364. See also postfix notation
RSI register usage in string instructions, 826
rsqrtps
instruction, 670
rsqrtss
instruction, 372
rstrActRec
(macro), 883
run of zeros bit string, 708
runtime
language, 748
memory organization, 106
runtime versus compile-time expressions, 155
S
saturation addition (horizontal), 650, 652
saturation (SSE/AVX/SIMD), 667
saving the machine state, 220
sbyte
directive, 15
scalar data types, 597
scaled-indexed addressing mode, 126
scaling factor, 126
scas
instruction, 835
scope of a local variable, 234
sdword
directive, 15
searching
for a bit, 736
for a bit pattern, 743
for a substring within another string in MASM textual constants, 751
for the first (or last) set bit, 737
section location counter, 154
segment
alignment option, 605
alignment (powers of 2), 605
class argument, 605
declarations, 604
directive, 604
directive align option (for 32-byte alignment), 606
faults, 107
faults on unaligned memory accesses (SSE/AVX), 606
names, 604
registers, 10
separate assembly, 854
separate compilation, 847, 854
setae
instruction, 296
seta
instruction, 296
setbe
instruction, 296
setb
instruction, 296
setcc
instructions, 295
sete
instruction, 296
setge
instruction, 297
setg
instruction, 296
setl
instruction, 297
setnae
instruction, 296
setna
instruction, 296
setnbe
instruction, 296
setnb
instruction, 296
setne
instruction, 296
setnge
instruction, 297
setng
instruction, 297
setnle
instruction, 296
setnl
instruction, 297
setno
instruction, 295
setnp
instruction, 295
setns
instruction, 295
seto
instruction, 295
set on condition instructions, 295
setpe
instruction, 295
setp
instruction, 295
setpo
instruction, 295
sets
instruction, 295
setting bits, 708
shadow storage (for parameters), 255, 264
shift
arithmetic right operation, 77
left operation, 75
operations, 74
operations (SSE/AVX), 647
right operation, 76
shift and rotate instructions, 709, 716
shld
instruction, 482
short-circuit
Boolean evaluation, 401
short-circuit versus complete Boolean evaluation, 403
shrd
instruction, 482
shuffle instructions, 625
shufpd
instruction, 630
shufps
instruction, 630
side effects, 403
sign
bit, 62
contraction, 67
extension prior to division, 305
sign and zero flag settings after mul
and imul
instructions, 291
signed
comparison flag settings, 294
comparisons, 296
decimal input (extended-precision), 569
decimal output (extended-precision), 513
division, 292
integer remainder/modulo, 407
integer-to-string conversion, 507
multiplication, 148, 289, 291, 461
numbers, 62
signed and unsigned numbers, 62
sign extension (SIMD/SSE/AVX), 666
sign flag
setting after an arithmetic operation, 71
sign flag and the and
, or
, and xor
instructions, 712
significant digits, 314
sign transfer, 659
SIMD (single instruction, multiple data), 11, 55, 595
arithmetic/logical operations, 644
bitwise instructions, 645
comparison instructions (floating-point), 671
comparison results (processing multiple comparisons), 663
floating-point arithmetic operations, 668
floating-point conversions, 679
integer absolute value, 659
integer addition, 648
integer arithmetic instructions, 648
integer average instructions, 657
integer comparison instructions, 660
integer conversions, 664
integer minimum and maximum, 657
integer multiplication, 654
integer sign-transfer instructions, 659
integer subtraction, 653
memory alignment requirements, 606
programming model, 596
saturation, 667
SIMD string instructions, 838
SIMD zero-extension instructions, 665
simple assignments (conversion to assembly language), 299
simulating div
, 312
sine, 361
single-instruction, multiple-data (SIMD) instructions. See SIMD
single-instruction, single-data (SISD) instructions. See SISD
single-precision floating-point format, 87
single-precision (floating-point) lanes, 598
single-precision vector types, 597
SI register, 10
SISD (single instruction, single data), 595
sizeof
function (applied to UNIONs), 207
sizeof
operator, 154
size
operator, 154
sizestr
directive, 752
software configuration via conditional compilation, 754
sorting, 185
bubble sort, 185
quicksort, 272
special-purpose application-accessible registers, 10
special-purpose kernel-mode registers, 10
specifying a variable name and type without allocating storage, 114
SP register, 10
sqrtpd
instruction, 670
sqrtps
instruction, 670
sqrtsd
instruction, 372
sqrtss
instruction, 372
sqword
directive, 15
SSE (Streaming SIMD Extensions), 596, 624
aligned data movement instructions, 610
denormal exception flag (DE), 369
denormal mask (DM), 370
denormals are zero (DAZ), 370
divide-by-zero mask (ZM), 370
floating-point arithmetic (SIMD), 668
floating-point conversions, 679
flush to zero (FZ), 370
instruction operands, 606
invalid operation mask (IM), 370
memory alignment requirements, 606
overflow exception flag (OE), 369
underflow exception flag (UE), 369
overflow mask (OM), 370
packed byte data types, 597
packed dword data types, 598
packed qword data types, 598
packed word data types, 597
precision exception flag (PE), 369
precision mask (PM), 370
programming model, 596
rounding control, 370
sign extension, 666
string instructions, 838
unaligned memory access, 606, 612
underflow mask (UM), 370
zero exception flag (ZE), 369
zero extension, 665
SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2, 596
SSE/AVX comparison synonyms, 673
SSE/SSE2 instruction set, 11
ST0, 318
ST1, 318
stack, 134
stack fault flag (FPU), 322
stack manipulation by procedure calls, 224
stack operations
popf
, 140
popfd
, 140
pushf
, 140
pushfd
, 140
pushw
, 134
stack pointer register, 13
stack segment, 134
stack variable address alignment, 607
standard entry sequence (to a procedure), 231
standard exit sequence (from a procedure), 233
standard input redirection, 927
standard macro parameter expansion, 762
standard macros, 760
standard output redirection, 926
state machine, 424
statement labels, 378
statements
break
, 438
conditional, 396
continue
, 438
else
, 397
for
, 437
if
, 396
repeat..until
, 433
while
, 433
state variable, 424
static variable declaration section, 108
status register (FPU), 321, 364
stc
instruction, 716
STDCALL calling convention, 263
stdin_getc
function, 892
stdin_read
function, 891
std
instruction, 86
sti
instruction, 86
stmxcsr
instruction, 370
store data from an SSE/AVX register into memory, 610
storing AH register into flags, 86, 350
storing single-precision vectors from SSE/AVX registers to memory, 612
storing the FPU control word, 321
storing the FPU status word, 321, 350, 364
stos
instruction, 835
streaming data types, 596
streaming SIMD extensions. See SSE
strength-reduction optimizations, 311
strfill
procedure, 244
strings, 174
comparisons, 825
descriptors, 176
equality test for macro/text arguments, 767
instruction performance, 837
length, 174
length calculated at assembly time, 176
length operator in MASM textual constants, 752
length-prefixed, 175
SSE instructions, 838
zero-terminated, 174
string-to-decimal conversion (unsigned), 563
string-to-floating-point conversion, 570
string-to-integer conversion, 546
string-to-numeric conversion (hexadecimal), 556
string-to-numeric conversions, 546
string-to-numeric conversion (signed, extended-precision), 569
strtoh128
function, 561
strtoh
function, 557
strtoi
function, 550
strToR10
function, 573
strtou128
function, 567
struct
arrays, 203
struct
assembler directive, 198
struct
declarations, 198
struct
directive, 198
struct/record
field access via pointer, 199
structs, 197
nested, 200
structure field access, 199
structure field initialization, 200
sub
instruction, 21
subpd
instruction, 669
subps
instruction, 669
subregisters, 623
subsd
instruction, 371
subss
instruction, 371
substr
directive, 752
substring operator (MASM text strings), 752
substring search in MASM textual constants, 751
subtraction
floating-point, 334
subtract with borrow, 457, 716
swapping bytes in a multi-byte object, 116
swapping registers on the FPU stack, 327
switch
statement, 410
sword
directive, 15
synthesizing
break
statements in assembly language, 438
continue
statements in assembly language, 439
forever..endfor
loops in assembly language, 436
for
statements in assembly language, 437
repeat..until
loops in assembly language, 434
while
loops in assembly language, 433
system bus, 9
T
tables and table lookups, 583
table lookup computations, 584
table lookup (hexadecimal-to-string conversion), 497
tag field, 209
taking the address of a statement label, 378
tangent, 361
tbyte
directive, 15
tbyte
values (BCD), 488
temporary values in an expression, 307
temporary variables, 306
test for zero (floating-point), 360
testing a floating-point operand for zero, 322, 360
testing bits, 708
testing to see if a macro argument is the empty string, 767
testing two text objects for equality, 767
text delimiters, 151
textequ
directive, 151
this
operator, 154
three-dimensional array element access (row-major), 191
time
command (CLI), 933
top of stack pointer (FPU), 324
trampoline, 393
transcendental function instructions, 361
translate arithmetic expressions into assembly language, 287
translate instruction, 585
tricky programming, 310
true (representation), 308
truncation during FPU calculations, 319
truth table, 55
try..catch
statement (C++), 30
two-dimensional row-major ordered array formula (for accessing array elements), 191
two’s complement
numbering system, 54
numeric representation, 62
operation, 63
type checking, 20
type coercion, 157
type declaration section, 156
typedef
directive, 156
type operator, 159
U
unaligned loads (to XMM/YMM registers), 622
unaligned SSE/AVX data movements, 612
unaligned SSE/AVX memory accesses, 606
unary operator (conversion to assembly language), 301
unconditional jump instruction, 69
underflow, 316
underflow exception flag (UE, SSE), 369
underflow exception (FPU), 321
underflow mask (UM, SSE), 370
Unicode
BMP (Basic Multilingual Plane), 97
UTF-8 encoding, 98
UTF-16 encoding, 98
UTF-32 encoding, 98
code planes, 97
code points, 96
encodings, 97
multilingual planes, 97
uninitialized pointers, 168
unions, 206
accessing fields of a union, 206
anonymous, 208
definition, 206
syntax (declaration), 206
unordered comparisons, 90, 360, 373, 673
floating-point, 357
unpacking bit strings, 717
unpack instructions, 625
unpckhpd
instruction, 633
unpckhps
instruction, 633
unpcklpd
instruction, 633
unpcklps
instruction, 633
unraveling loops, 447
unrolling loops, 448
unsigned
comparisons, 296
decimal input (extended-precision), 566
decimal output, 500
division, 291
integer-to-string conversion (extended-precision), 508
numbers, 62
string-to-decimal conversion, 563
untyped reference parameters, 284
using echo to display equate values, 751
uSize
function, 514
UTF-8 encoding, 98
UTF-16 encoding (Unicode), 98
UTF-32 encoding (Unicode), 98
utoStrSize
function, 517
V
vaddpd
instruction, 669
vaddps
instruction, 669
vandnpd
instruction, 645
vandpd
instruction, 645
variable-length parameters, 248
variable names, 14
variables in MASM, 14
variant objects, 209
variant types, 209
vcvtdq2pd
instruction, 679
vcvtdq2ps
instruction, 679
vcvtpd2dq
instruction, 679
vcvtpd2ps
instruction, 680
vcvtps2dq
instruction, 680
vcvtps2pd
instruction, 680
vcvttpd2dq
instruction, 680
vcvttps2dq
instruction, 680
vdivpd
instruction, 670
vdivps
instruction, 670
vector
absolute value (integer), 659
addition, 648
data types, 597
floating-point arithmetic, 668
instructions, 595
integer comparisons, 660
integer multiplication, 654
memory operands, 606
operands for SSE/AVX instructions, 606
shifts, 647
sign extension, 666
sign transfer, 659
(SIMD) integer comparison for less than, 662
zero extension, 665
vertical addition, 649
vextractps
instruction, 643
vhaddpd
instruction, 671
vhaddps
instruction, 671
vhsubpd
instruction, 671
vhsubps
instruction, 671
vinsertps
instruction, 643
vlddqu
instruction, 622
vmaxpd
instruction, 670
vmaxps
instruction, 670
vminpd
instruction, 670
vminps
instruction, 670
vmovapd
instruction, 610
vmovapd
operands (MASM), 611
vmovaps
instruction, 610
vmovaps
operands (MASM), 611
vmovddup
instruction, 621
vmovd
instruction, 609
vmovdqa
instruction, 610
vmovdqa
operands (MASM), 611
vmovdqu
instruction, 612
vmovhlps
instruction, 619
vmovhpd
instruction, 618
vmovhps
instruction, 618
vmovlhps
instruction, 619
vmovlpd
instruction, 615
vmovlps
instruction, 615
vmovmskpd
instruction, 676
vmovmskps
instruction, 676
vmovq
instruction, 609
vmovshdup
instruction, 620
vmovsldup
instruction, 620
vmovupd
instruction, 612
vmovups
instruction, 612
vmulpd
instruction, 670
vmulps
instruction, 670
volatile registers, 265
Microsoft ABI, 38
von Neumann architecture, 9
vorpd
instruction, 645
vpabsb
instruction, 659
vpabsd
instruction, 659
vpabsw
instruction, 659
vpackssdw
instruction, 667
vpacksswb
instruction, 667
vpackusdw
instruction, 667
vpackuswb
instruction, 667
vpaddb
instruction, 649
vpaddd
instruction, 649
vpaddq
instruction, 649
vpavgb
instruction, 657
vpavgw
instruction, 657
vpclmulqdq
instruction, 656
vpcmpeqb
instruction, 661
vpcmpeqd
instruction, 661
vpcmpeqq
instruction, 661
vpcmpeqw
instruction, 661
vpcmpgtb
instruction, 661
vpcmpgtd
instruction, 661
vpcmpgtq
instruction, 661
vpcmpgtw
instruction, 661
vpextrb
instruction, 642
vpextrd
instruction, 642
vpextrq
instruction, 642
vpextrw
instruction, 642
vphaddd
instruction, 650
vphaddw
instruction, 650
vpinsrb
instruction, 642
vpinsrd
instruction, 643
vpinsrq
instruction, 643
vpinsrw
instruction, 643
vpmaxsb
instruction, 657
vpmaxsd
instruction, 658
vpmaxsq
instruction, 658
vpmaxsw
instruction, 657
vpmaxub
instruction, 658
vpmaxud
instruction, 658
vpmaxuq
instruction, 658
vpmaxuw
instruction, 658
vpminsb
instruction, 658
vpminsd
instruction, 658
vpminsw
instruction, 658
vpminub
instruction, 658
vpminud
instruction, 658
vpminuq
instruction, 658
vpminuw
instruction, 658
vpmovmskb
instruction, 662
vpmovsxbd
instruction, 666
vpmovsxbq
instruction, 666
vpmovsxbw
instruction, 666
vpmovsxdq
instruction, 666
vpmovsxwd
instruction, 666
vpmovsxwq
instruction, 666
vpmovzxbd
instruction, 665
vpmovzxbq
instruction, 665
vpmovzxbw
instruction, 665
vpmovzxdq
instruction, 665
vpmovzxwd
instruction, 665
vpmovzxwq
instruction, 665
vpmuldq
instruction, 656
vpmulld
instruction, 655
vpmuludq
instruction, 656
vpshufb
instruction, 625
vpshufd
instruction, 626
vpshufhw
instruction, 628
vpshuflw
instruction, 628
vpsignb
instruction, 659
vpsignd
instruction, 660
vpsignw
instruction, 659
vpslldq
instruction, 647
vpsllw
instruction, 647
vpsrldq
instruction, 647
vpsubd
instruction, 653
vpsubq
instruction, 653
vpsubsb
instruction, 654
vpsubw
instruction, 654
vptest
instruction, 646
vpunpckhbw
instruction, 640
vpunpckhdq
instruction, 641
vpunpckhqdq
instruction, 641
vpunpckhwd
instruction, 640
vpunpcklbw
instruction, 640
vpunpckldq
instruction, 640
vpunpcklqdq
instruction, 641
vpunpcklwd
instruction, 640
vrsqrtps
instruction, 670
vshufpd
instruction, 632
vshufps
instruction, 632
vsqrtpd
instruction, 670
vsqrtps
instruction, 670
vsubpd
instruction, 670
vsubps
instruction, 670
vunpckhpd
instruction, 633
vunpckhps
instruction, 633
vunpcklpd
instruction, 633
vunpcklps
instruction, 633
vxorpd
instruction, 645
W
while
directive, 756
while..endm
compile-time statement, 756
while
statement, 433
Win32 API, 876
Windows command line, xxx
word
16-bit variables, 54
alignment in a segment, 605
strings, 825
vectors (packed words), 597
word-sized lanes, 598
wrapper code, 882
WriteFile
(Win32 API function), 875
write
function, 884
wtoStr
(word to string) function, 493
X
xchg
instruction, 116
xlat
instruction, 584
XMM registers, 11
xor
instruction, 58, 309, 709, 712
xorpd
instruction, 645
Y
Y2K, 85
YMM registers, 11
Z
zero and sign flag settings after mul
and imul
, 291
zero-divide exception (FPU), 320
zero exception flag (ZE, SSE), 369
zero-extension, 292
zero-extension (SIMD), 665
zero flag
setting after a multiprecision OR, 479
setting after an arithmetic operation, 71
settings after mul
and imul
instructions, 291
zero-terminated strings, 174