Hey there peoples. It has been my observation that assembly programming in Euphoria is still a little difficult for some people. So I have decided to write this little tutorial to help Euphoria programmers write super-fast machine code for their programs. I intend to explain how asm works using Euphoria examples, and to give a better explanation of my asm.e translator. --Pete

The first thing you should know is that the cpu is programmed using registers. There are four general registers called eax, ebx, ecx, edx. These are each 32 bits wide but can be used in many different ways. The lower 16 bits can be accessed as a word-sized register (ax for eax, bx for ebx, etc.) The lowest 8 bits can be accessed in the byte-sized l-register (al for eax, bl for ebx, etc.) The next 8 bits can be accessed in the byte-sized h-register (ah for eax, bh for ebx, etc.) There are four other registers, esi, edi, esp, and ebp, which are used mainly for pointer operations, plus the instruction pointer ip (called eip in 32-bit code). The segment registers, cs, ds, es, fs, gs, and ss, are used for selecting segments and other stuff you usually don't have to worry about.

eax - this register is used for almost everything
ebx - so's this one
ecx - this one can also be used for counting in a loop
edx - this one can be used for port operations

esi - a pointer to memory as the source in a movs (string move) operation
edi - a pointer to memory as the destination in a movs (string move) operation
esp - the stack pointer - don't mess with it!
ebp - just another pointer
ip - pointer to the current instruction being executed in the code segment (eip in 32-bit code)

cs - code segment
ds - data segment (or source segment for movs operations)
es - extra segment (or destination segment for movs operations)
fs - extra segment (funky segment)
gs - extra segment (groovy segment)
ss - stack segment

Any of the general and pointer registers can actually be used for whatever purpose you want, as long as they are used in the correct context. Changing esp is not recommended because the return address lives on the stack, so it will piss off the cpu (read: crash your program) if esp is not restored before your machine code finishes. Changing cs will also cause the cpu to do some pretty weird stuff, like executing instructions that you never programmed.

[High and Low bytes in a register explained]
The low and high bytes are the lower 8 bits and upper 8 bits of a 16-bit register. Clear as mud, right? Here are some examples:

say AX = #10FF
the high byte AH is #10, which is the same as floor(AX/#100) and the low byte AL is #FF, which is the same as and_bits(AX,#FF)

now say you have a low byte AL = #20 and high byte AH = #4C

AX = AH*#100 + AL
   = #4C*#100 + #20
   = #4C00 + #20
   = #4C20

But if you're dealing with 32-bit registers, the lower byte is the same, but the upper byte isn't the upper-most anymore. If EAX=#10FF09C2, then AH=and_bits(floor(EAX/#100),#FF) = #09. As before, AL = and_bits(EAX,#FF).

So, EAX = and_bits(EAX,#FFFF0000) + AH*#100 + AL

To access the upper word in EAX, you may have to use a shift operation to move its bytes down into AH and AL, e.g. SHR EAX,16. Note that the shift throws away the old lower word.

Note: these examples also work with the other general registers (BX,CX,DX or EBX,ECX,EDX with BH,BL,CH,CL,DH,DL)
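
The byte arithmetic above can be checked with a short sketch. It is written in Python rather than Euphoria or assembly, purely as a model: the helper names (low_byte, high_byte, join_bytes, upper_word) are made up for illustration, `&` and `>>` stand in for and_bits() and the floor division, and 0x10FF is Python's spelling of #10FF.

```python
# Model of how AL, AH and AX sit inside EAX (illustration only, not asm.e code).

def low_byte(reg):
    """AL: the lowest 8 bits, like and_bits(reg, #FF)."""
    return reg & 0xFF

def high_byte(reg):
    """AH: bits 8..15, like and_bits(floor(reg / #100), #FF)."""
    return (reg >> 8) & 0xFF

def join_bytes(ah, al):
    """AX rebuilt from its two halves: AH*#100 + AL."""
    return ah * 0x100 + al

def upper_word(eax):
    """The value left in AX after SHR EAX,16."""
    return (eax >> 16) & 0xFFFF

eax = 0x10FF09C2
al = low_byte(eax)
ah = high_byte(eax)
# Rebuilding EAX: keep the upper word, then add AH and AL back in.
rebuilt = (eax & 0xFFFF0000) + join_bytes(ah, al)
print(hex(al), hex(ah), hex(rebuilt), hex(upper_word(eax)))
```

The same helpers would apply unchanged to the other general registers, since nothing here depends on which register the value came from.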


On to the instructions!

 ASM              Euphoria

mov x, y          x = y 

add x, y          x = x + y

sub x, y          x = x - y

lea x,[m]         x = m
  where m = a*s + b + c
  where a = register (index)
        s = 1,2,4,8 (scale)
        b = register (base)
        c = integer constant

lea means "load effective address"

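An effective address is the part of an instruction with the brackets []. It means to use the memory at the address given by the value between the brackets, so you can do pointer operations. (lea is the one exception: it only calculates the address and never touches the memory there.)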

mov al, [1000]     is like  "al = peek(1000)"  in Euphoria
mov eax, [edi]     is like  "eax = peek4u(edi)"  in Euphoria
mov [edi], eax     is like  "poke4(edi, eax)"  in Euphoria
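
Here is a small model of those three instructions, written in Python as a stand-in for real memory. The bytearray `mem` and the helpers peek, peek4u and poke4 are assumptions that only mimic the Euphoria routines of the same names. The one real x86 detail to notice is that a 32-bit value is stored lowest byte first.

```python
import struct

mem = bytearray(64)  # pretend memory; addresses are just indexes 0..63

def peek(addr):
    """Like mov al, [addr]: read one byte."""
    return mem[addr]

def peek4u(addr):
    """Like mov eax, [addr]: read 4 bytes as an unsigned little-endian dword."""
    return struct.unpack_from("<I", mem, addr)[0]

def poke4(addr, value):
    """Like mov [addr], eax: write the low 32 bits of value, lowest byte first."""
    struct.pack_into("<I", mem, addr, value & 0xFFFFFFFF)

edi = 16
poke4(edi, 0x10FF09C2)   # mov [edi], eax
eax = peek4u(edi)        # mov eax, [edi]
al = peek(edi)           # mov al, [edi] picks up the lowest byte
print(hex(eax), hex(al))
```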

Note: you cannot have two effective addresses in the same instruction, i.e. mov [edi], [esi] will not work. This is why some tutorials say you cannot directly move memory to memory.

An effective address can be made up of the following three parts:

  an integer offset:  [1000]
  a base register  :  [eax]
  an index register:  [ebx*4]

  the integer offset can be any value in the range from -2^31 to 2^31-1.
  the base register can be any of the 8 general-purpose 32-bit registers
    (eax, ebx, ecx, edx, esi, edi, ebp, esp).
  the index register can be any of those except esp, multiplied by
    one of the following constants: 1,2,4,8

  these can be combined in a variety of ways:
    [eax + eax*4 + 100]
    [ebx + 10000]
    [ecx + edx]     -- edx*1 is implied
    [ebp*8]

    etc.
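
The address arithmetic can be sketched as one small function. This is a Python model, not anything from asm.e: the name effective_address and its keyword arguments are invented for the example, and the register values are picked arbitrarily.

```python
# Sketch of effective-address arithmetic: base + index*scale + offset, wrapped to 32 bits.

VALID_SCALES = (1, 2, 4, 8)

def effective_address(base=0, index=0, scale=1, offset=0):
    """Return the 32-bit address computed for [base + index*scale + offset]."""
    if scale not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + offset) & 0xFFFFFFFF

# Register values picked arbitrarily for the demonstration.
eax, ebx, ecx, edx, ebp = 10, 0x2000, 0x3000, 7, 3

print(effective_address(base=eax, index=eax, scale=4, offset=100))  # [eax + eax*4 + 100]
print(effective_address(base=ebx, offset=10000))                    # [ebx + 10000]
print(effective_address(base=ecx, index=edx))                       # [ecx + edx]
print(effective_address(index=ebp, scale=8))                        # [ebp*8]
```

Masking with 0xFFFFFFFF mimics the way a 32-bit address wraps around, which is also why a negative offset works.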

now back to lea...

  lea eax, [eax + ebx*2 + 4]

    is about the same as

  mov eax, eax + ebx*2 + 4

    except that you can't have complex calculations like that in a mov
    instruction.  The benefits of lea are pretty obvious.

  a lea instruction can be used to multiply any register by certain small
  constants, without using any other registers.

    lea eax, [eax*2]        -- eax = eax * 2
    lea eax, [eax + eax*2]  -- eax = eax * 3
    lea eax, [eax*4]        -- eax = eax * 4
    lea eax, [eax + eax*4]  -- eax = eax * 5
    lea eax, [eax*8]        -- eax = eax * 8
    lea eax, [eax + eax*8]  -- eax = eax * 9
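
  where do those particular constants come from? With the same register as
  base and index, [eax*s] gives s and [eax + eax*s] gives 1 + s, for each
  scale s. The Python sketch below (a model only; the lea helper is an
  invented name) lists every multiplier a single lea can reach this way:

```python
# Which multipliers can one lea reach when base and index are the same register?

def lea(base=0, index=0, scale=1, offset=0):
    """Model of lea: compute base + index*scale + offset, never touch memory."""
    assert scale in (1, 2, 4, 8)
    return (base + index * scale + offset) & 0xFFFFFFFF

eax = 7  # any small nonzero value works for the demonstration
multipliers = set()
for s in (1, 2, 4, 8):
    multipliers.add(lea(index=eax, scale=s) // eax)            # lea eax, [eax*s]
    multipliers.add(lea(base=eax, index=eax, scale=s) // eax)  # lea eax, [eax + eax*s]

print(sorted(multipliers))
```

  6 and 7 are missing from the result, so those take two lea instructions
  or a different approach.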

  multiple lea commands can be used to perform more complex calculations
  that would normally require more instructions and extra registers.  Let's
  use my mode 19 (the 320x200 VGA graphics mode) address calculation as an
  example:

    mov ebx, x
    mov edx, y

    lea edx, [edx + edx*4]     -- y = y + y*4
                               -- y = y*5
    lea edx, [edx*8 + #14000]  -- y = (y*5)*8 + #14000
                               -- y = y*40 + #14000
    lea edi, [ebx + edx*8]     -- a = x + (y*40 + #14000)*8
                               -- a = x + y*320 + #A0000

  see how it works?
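
  if you want to convince yourself, the three lea steps can be replayed in
  a Python model and compared against the plain formula x + y*320 + #A0000.
  The lea helper and mode19_address are invented names for this sketch, not
  part of asm.e:

```python
def lea(base=0, index=0, scale=1, offset=0):
    """Model of lea: compute base + index*scale + offset, wrapped to 32 bits."""
    assert scale in (1, 2, 4, 8)
    return (base + index * scale + offset) & 0xFFFFFFFF

def mode19_address(x, y):
    ebx = x                                         # mov ebx, x
    edx = y                                         # mov edx, y
    edx = lea(base=edx, index=edx, scale=4)         # edx = y*5
    edx = lea(index=edx, scale=8, offset=0x14000)   # edx = y*40 + #14000
    edi = lea(base=ebx, index=edx, scale=8)         # edi = x + y*320 + #A0000
    return edi

# Compare against the straightforward formula for a few pixel positions.
for x, y in [(0, 0), (160, 100), (319, 199)]:
    assert mode19_address(x, y) == 0xA0000 + y * 320 + x
    print(x, y, hex(mode19_address(x, y)))
```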