Or “How I doubled the speed of Neat”.
If you look at the related_post_gen benchmark, you’ll notice that my language, Neat, has moved up in the rankings. How was this achieved? Was it the application of new high-level optimizer passes that exploit language particulars and uncover hidden optimization potential?
No, it was as simple as changing how parameters are passed. Instead of being passed as one parameter consisting of a struct with three pointers, arrays are now passed as three separate pointer parameters. And that’s all it took.
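To make the change concrete, here is a minimal sketch in C of the two calling styles. The struct layout, the field names and the functions `sum_struct` and `sum_fields` are hypothetical illustrations, not Neat's actual runtime code.

```c
#include <stddef.h>

/* Hypothetical layout of a refcounted array: three pointers,
   which is 24 bytes on a 64-bit target. */
struct Array {
    void *base;   /* assumed: start of the object holding the reference count */
    int  *start;  /* first element */
    int  *end;    /* one past the last element */
};

/* Before: the array travels as a single struct parameter. */
long sum_struct(struct Array array)
{
    long sum = 0;
    for (int *p = array.start; p != array.end; p++)
        sum += *p;
    return sum;
}

/* After: the same three pointers travel as separate parameters,
   so each one is eligible to be passed in a register. */
long sum_fields(void *base, int *start, int *end)
{
    (void) base; /* unused in this sketch; refcounting code would need it */
    long sum = 0;
    for (int *p = start; p != end; p++)
        sum += *p;
    return sum;
}
```

Both functions compute the same result; only the way the caller hands over the three pointers differs.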
The puzzle of why Neat trailed behind D has been resolved. A profile showed an excessive amount of stack shuffling around function calls. A closer inspection pointed at arrays, which in Neat consist of three pointers. Unlike D, which relies on a garbage collector, Neat uses reference counting. So in addition to start and end, a Neat array also carries a pointer to the base of the array object, where the reference count is stored. In other words, Neat arrays are slower than D arrays because of the additional 8 bytes they carry: 24 bytes instead of 16 on a 64-bit target.
The System V AMD64 ABI specification (PDF) spells out when a struct is passed in memory. It states that if the size of the aggregate exceeds 16 bytes, the whole argument is passed in memory. To pass a struct in memory, the caller reserves a struct-sized slot on the stack, fills it with the values to be passed, and then calls the function, which reads its argument from that slot.
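As a rough illustration of that threshold, the sketch below applies the size rule to three layouts. It assumes a 64-bit target with 8-byte pointers. The helper `passed_in_memory` and the struct names `GcSlice` and `RcArray` are hypothetical, and the rule is deliberately simplified: the real classification algorithm has further cases, for example for wide SIMD vector types and unaligned fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* D-style slice: length plus pointer, 16 bytes on a 64-bit target. */
struct GcSlice { size_t length; int *ptr; };

/* Refcounted array as described above: three pointers, 24 bytes. */
struct RcArray { void *base; int *start; int *end; };

/* Three doubles, 24 bytes. */
struct Vector { double x, y, z; };

/* Simplified rule (an assumption of this sketch): an aggregate larger
   than 16 bytes is passed in memory rather than in registers. */
static bool passed_in_memory(size_t aggregate_size)
{
    return aggregate_size > 16;
}
```

Under these assumptions, `passed_in_memory(sizeof(struct GcSlice))` is false, while the same check on `struct RcArray` and `struct Vector` is true: the one extra pointer pushes the array over the limit.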
LLVM is an excellent optimizer, but this arrangement leaves it little room to work with. There is no escaping the stack: the value must go there, so space has to be reserved for it, and it has to be copied out of whatever register it currently lives in. On top of that, keeping track of which parts of the stack are occupied and which are free for reuse is a task LLVM is not particularly good at.
The issue becomes apparent through this benchmark:
==========
harness.h:
==========
#define TYPE double
struct Vector { TYPE x, y, z; };
struct Vector vector_add_struct(struct Vector left, struct Vector right);
struct Vector vector_add_fields(
    TYPE left_x, TYPE left_y, TYPE left_z,
    TYPE right_x, TYPE right_y, TYPE right_z);
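The header only declares the two functions. A minimal sketch of matching definitions might look like the following; this is an assumption for illustration, not the author's implementation. For the benchmark to measure the calling convention, the definitions would presumably need to live in a separate translation unit, so that the compiler cannot inline the calls.

```c
#define TYPE double
struct Vector { TYPE x, y, z; };

/* Both operands arrive as 24-byte structs, i.e. in memory. */
struct Vector vector_add_struct(struct Vector left, struct Vector right)
{
    struct Vector result = {
        left.x + right.x, left.y + right.y, left.z + right.z
    };
    return result;
}

/* The same six values arrive as individual scalar parameters. */
struct Vector vector_add_fields(
    TYPE left_x, TYPE left_y, TYPE left_z,
    TYPE right_x, TYPE right_y, TYPE right_z)
{
    struct Vector result = {
        left_x + right_x, left_y + right_y, left_z + right_z
    };
    return result;
}
```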
==========
harness.c:
==========
#include <stdio.h>
#include <stdlib.h>
#include "harness.h"

int main(int argc, const char *argv[])
{
    /* argv[1] selects the calling style, argv[2] the iteration count */
    int mode = atoi(argv[1]);
    int length = atoi(argv[2]);
    struct Vector result = {0};
    if (mode == 0)
    {
        /* pass both vectors as structs */
        for (int i = 0; i < length; i++)
            result = vector_add_struct(result, (struct Vector) {i, i, i});
    }
    else
    {
        /* pass every field as its own parameter */
        for (int i = 0; i < length; i++)
            result = vector_add_fields(result.x, result.y, result.z, i, i, i);
…

