Lame excuses and incoherent ramblings

July 26, 2001

Well, I know it's been a while since I've posted anything on my Euphoria site, so I thought I'd let people know what's up. I'm not dead or anything, just been busy working/playing and have hardly done any coding in my spare time. My job as a tech support engineer at a software development toolset company has been educational. I've had the opportunity to work with a large, complex application development environment that supports languages such as Ada, C, and C++. It has been particularly enlightening learning Ada, a really great language in many respects, but burdened by an overly complex language specification that's about as much fun to read as tax law. There are many features in Ada that I would love to see in Euphoria, the package mechanism being one, generics (similar to C++ templates) being another. I think Kat would like Ada too because it has built-in support for tasking. Ada and Euphoria have similar goals: easy-to-read source code (visibility through verbosity), run-time safety, and fast execution. One of the things I hate about Ada is that the typing is extremely strong: you can't assign anything to anything unless they are exactly the same named type.

I've also been playing Final Fantasy VIII for the PC. The graphics suck. At least until you have an Nvidia GeForce card and the right drivers to force 3x3 antialiasing in all applications. Then the graphics become quite pretty. The story is pretty standard for Final Fantasy, but the prerendered cutscenes are really phenomenal. It's worth the 5 CDs that the game ships with.

It's nice to see that Robert Craig is moving closer and closer to releasing the source to Euphoria. The C translation code seems pretty cool, but the generated code is pretty awful. Peu didn't do any better, though. I still think the best way to mix Eu and C would be to have the interpreter/translator portion actually written in the Eu language, with the core routines existing only in C object code form. That way Euphoria programmers could redefine the language, adding the features they want, and recompile to make their own interpreter. I think Rob sees this too, now.

It also might be nice to have a just-in-time compiler for Eu. I could probably do it for Intel architecture, but there's a lot of wicked stuff I would want it to do. Subscripting sequences can be checked quickly using the x86 BOUND instruction, which raises an interrupt if the index is out of range. The same could probably be done for type testing... the code could be written with an integer-based path, with BOUND instructions at key points to make sure that the data at that point is actually an integer. If a BOUND instruction raises the interrupt, the handler could look at which instructions were supposed to be executed, perform the equivalent sequence- or float-based operation, change the return address to jump over the integer operation, and away we go. This might be pretty quick for integer code, but I don't know how much overhead sequences and floats would incur. Of course, all of this only applies when the compiler cannot determine the operand type at compile time.
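The fast-path-plus-trap idea can be mimicked at a high level. What follows is only a rough sketch in Python, not JIT output: an exception stands in for the interrupt, and the names bound, BoundTrap, general_add and add are all made up for illustration. It assumes Euphoria's 31-bit integer range and ignores overflow of the result.

```python
# Assumed Euphoria integer range (31-bit): -1073741824 .. 1073741823
EU_MIN, EU_MAX = -(1 << 30), (1 << 30) - 1


class BoundTrap(Exception):
    """Stands in for the interrupt raised by a failed bound check."""


def bound(value):
    # Hypothetical guard: passes only for an in-range integer.
    if type(value) is not int or not (EU_MIN <= value <= EU_MAX):
        raise BoundTrap
    return value


def general_add(a, b):
    # Slow path: handles floats and (nested) sequences element by element.
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            raise ValueError("sequence lengths do not match")
        return [general_add(x, y) for x, y in zip(a, b)]
    if isinstance(a, list):
        return [general_add(x, b) for x in a]
    if isinstance(b, list):
        return [general_add(a, y) for y in b]
    return a + b


def add(a, b):
    try:
        # Integer fast path: guards first, then a plain add.
        return bound(a) + bound(b)
    except BoundTrap:
        # "Interrupt handler": redo the operation the general way.
        return general_add(a, b)
```

In this sketch add(2, 3) never leaves the fast path, while add([1, 2], 10) trips the guard and falls back to the element-wise version. How expensive the real trap would be is exactly the open question above.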

I'm thinking about rewriting someone's Euphoria source code processor to see how much optimization information can be extracted. One thing I already had in Peu was determining which functions don't rely on variables outside of the function itself. That way, if the function is called with constant arguments, it can be evaluated at compile time and the function call replaced by the constant result. This made Peu kick the crap out of ex for silly benchmarks that append a constant to another constant several thousand times: Peu would optimize out the append and just do the assign. Another thing Euphoria would benefit greatly from is inlining routines. This is extremely difficult and hasn't been done by any processors that I know of.
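A minimal sketch of that constant-folding pass, in Python rather than Euphoria, and not Peu's actual code. The tuple-based expression format, the PURE table and the fold function are all invented for illustration; a real processor would have to work out which functions are pure by analyzing their bodies.

```python
def append(seq, x):
    # Pure: the result depends only on the arguments.
    return seq + [x]


# Assumed output of the purity analysis: functions that touch no outside variables.
PURE = {"append": append}


def fold(expr):
    """Fold calls to pure functions whose arguments are all constants.

    Expressions are tuples: ("const", value), ("var", name),
    or ("call", function_name, [argument expressions]).
    """
    if expr[0] != "call":
        return expr
    _, fname, args = expr
    args = [fold(a) for a in args]  # fold innermost calls first
    if fname in PURE and all(a[0] == "const" for a in args):
        # Evaluate now and replace the call by its constant result.
        return ("const", PURE[fname](*(a[1] for a in args)))
    return ("call", fname, args)


nested = ("call", "append",
          [("call", "append", [("const", [1, 2]), ("const", 3)]),
           ("const", 4)])
folded = fold(nested)  # the whole expression collapses to one constant
```

A call with a variable argument, such as ("call", "append", [("var", "s"), ("const", 3)]), comes back unchanged, since it cannot be evaluated until run time.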

Then there's the many faces of sequences. Eu has only ever supported 32-bit elements in sequences. While this makes the source code very clean and fast, it tends to eat up a lot of memory. If you're just storing ASCII strings, you're effectively wasting 75% of your memory. The same goes for 256-color graphics (and more for 16-color, but who uses that anyway?). Peu had support for byte-sized sequences, which worked, but introduced tons of bug potential. I'm also thinking about bit-sequences (or bit vectors stored as an integer array) that would allow bitwise operations on an integer, but using sequence notation. The compiler would have to be very smart to determine when bit sequences could be used, and which shift/rotate instructions could be used. It might not be worth it, because the Euphoria code would have to really conform to certain rules to be optimized efficiently, and writing the compiler code to handle suboptimal situations makes me shudder. Bit vectors could be created as the result of a boolean operation involving a sequence. My favorite boolean operation trick is the upper/lowercase conversion:
    upper = string + ('A' - 'a') * (string >= 'a' and string <= 'z')
But the generated code is hideous:
two boolean sequence to integer comparison operations creating two new sequences
one boolean sequence to sequence 'and' operation resulting in another sequence
one sequence to integer MULTIPLY operation resulting in a new sequence (the subtract can be done at compile-time)
one sequence to sequence add operation resulting in the final uppercase result
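To make the cost concrete, here is a rough Python sketch that follows the steps listed above, building one list per intermediate sequence. The function name to_upper_stepwise and the variable names are made up; this only illustrates the shape of the work, not the code any Euphoria implementation actually generates.

```python
def to_upper_stepwise(s):
    # A Euphoria string is already a sequence of character codes.
    codes = [ord(c) for c in s]

    ge_a = [int(c >= ord('a')) for c in codes]          # temporary 1: string >= 'a'
    le_z = [int(c <= ord('z')) for c in codes]          # temporary 2: string <= 'z'
    mask = [int(x and y) for x, y in zip(ge_a, le_z)]   # temporary 3: the 'and'

    delta = ord('A') - ord('a')                         # folded at compile time
    offsets = [delta * m for m in mask]                 # temporary 4: the multiply

    result = [c + o for c, o in zip(codes, offsets)]    # final sequence: the add
    return "".join(chr(c) for c in result)
```

That is four throwaway sequences, each as long as the input, just to produce one result.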

It would be cool if this could be compiled as an iterative operation somehow... it might benefit from being a bit vector operation... all we would have to do is optimize expressions of the form (seq op constant * bit_vector), or just go all out and iterate all known sequence expressions anyway. You would have a lot to worry about if the sequences are mixed 32-bit and 8-bit or even 1-bit. Checks would have to be made to ensure that the sequences are compatible, or at least to know exactly what type each one is, so that the conversion could be added to the inner loop.
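For comparison, this is roughly what the iterative version of the uppercase expression might look like: one pass, no intermediate sequences. It's a hand-written Python sketch with a made-up function name (to_upper_fused), standing in for what a compiler would have to derive automatically.

```python
def to_upper_fused(s):
    delta = ord('A') - ord('a')  # still a compile-time constant
    out = []
    for c in map(ord, s):
        # The whole expression is evaluated per element:
        # comparison, 'and', multiply and add in a single step.
        is_lower = ord('a') <= c <= ord('z')
        out.append(c + delta * is_lower)
    return "".join(map(chr, out))
```

The per-element boolean here is exactly the one bit a bit vector would store, which is why the two ideas fit together.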

I've been spoiled on fancy debuggers. The Ada one I use lets you switch back and forth between assembly and source, and even intersperses the source code with the assembly statements.

Slice operations have always been a good opportunity for optimization. Instead of copying to a new sequence, why not just increase the reference count of the sequence and keep a pointer into the sequence at the slice location? As long as the sequences aren't changed much, they will never have to be copied again. Some weird stuff happens when you slice a slice, because the original sequence needs to have its reference count consistent with the number of slices pointing to it. Since a slice should never have another sequence reference it (because it doesn't actually own the memory that it's pointing to), the reference count for a slice could actually be the pointer to the parent sequence...
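A small sketch of the bookkeeping, in Python with invented names (Seq, Slice, release); it is not how any Euphoria implementation stores sequences. Slices use 1-based inclusive bounds like s[start..stop]. A slice of a slice points straight at the owning sequence, so the owner's reference count always matches the number of views on it. Copy-on-write for modified sequences is left out.

```python
class Seq:
    """Owns its storage; refcount includes every slice viewing it."""

    def __init__(self, items):
        self.data = list(items)
        self.refcount = 1


class Slice:
    """A view with no storage of its own: owner + offset + length."""

    def __init__(self, source, start, stop):
        is_view = isinstance(source, Slice)
        src_len = source.length if is_view else len(source.data)
        # Euphoria-style bounds; start == stop + 1 gives an empty slice.
        if not (1 <= start <= stop + 1 and stop <= src_len):
            raise IndexError("slice bounds out of range")
        # A slice of a slice refers to the original owner, never to the view.
        self.parent = source.parent if is_view else source
        base = source.offset if is_view else 0
        self.offset = base + (start - 1)
        self.length = stop - start + 1
        self.parent.refcount += 1

    def __getitem__(self, i):  # 1-based subscript
        if not 1 <= i <= self.length:
            raise IndexError(i)
        return self.parent.data[self.offset + i - 1]

    def to_list(self):
        return self.parent.data[self.offset:self.offset + self.length]

    def release(self):
        # Dropping a view gives back one reference on the owner.
        self.parent.refcount -= 1


s = Seq([10, 20, 30, 40, 50])
a = Slice(s, 2, 5)   # views 20, 30, 40, 50
b = Slice(a, 2, 3)   # views 30, 40, but points directly at s
```

Note that the Slice object carries no reference count of its own; the slot where one would live holds the parent pointer instead.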