Optimization Techniques

If your code runs slow or causes hitches on some computers, here are several optimization techniques you can apply to your code to make it run better.

Keep the log clean

Every line that is written to the log file takes time. Writing lots of them to the log may slow down your mod to a crawl and can inflate the game's log to several megabytes after a short time already.

Fix Accessed None and other log warnings: Fix all Accessed None and other log warnings you find in the log after executing your mod. (You can safely assume that your code is responsible for any script warning you find in the log after executing it, even those that point to somewhere in Epic's code. Compare with the log when not playing your mod if in doubt.)
Remove debug logging: Make sure to remove any debugging log statements in release versions of your mods. If you think you may need the log statements in future again, just comment them out.

Avoid iterators

Avoid using any iterators in frequently executed functions like Tick or PostRender because they are, for the most part, extremely slow.

Instead try using linked lists or dynamic arrays for the actors you work with. You could either fill these at the start of the match, e.g. with a single iterator loop, or maintain the list when spawning and destroying actors like the SpawnNotify in UT or the Interaction lists maintained by the InteractionMaster in UT2003.

Optimize iterator use

If you can't avoid using them, at least try to optimize them.

VisibleActors and VisibleCollidingActors are good examples of this. Every actor that is within the radius has a FastTrace called on it. Traces use a lot of CPU resources and if the radius is large you will be calling a lot of traces. You can spare yourself some traces by checking anything else than needs to be checked on top of the trace first.

For example, if you are looking for all visible pawns within 2000 units that have over 50 health, we can check the health before we check the trace – that way we don't bother doing the expensive visibility check if it was going to fail the health check anyway. For example:

// this way is slow
foreach VisibleActors(class'Pawn', P, Radius, Location, True)
  if (P.Health > 50)
    // do something

// this way is faster
foreach RadiusActors(class'Pawn', P, Radius, Location, True)
  if (P.Health > 50 && FastTrace(P.Location, Location))
    // do something

Another example of an even slower implementation that needs to check all actors, not just Pawns:

foreach VisibleCollidingActors(class'Actor', thisActor, Radius, Location)
{
    if (thisActor.bStatic || thisActor.Physics == PHYS_None)
        continue;  // skip this actor
    // do the actual stuff
}

Within a radius of e.g. 1500 you could easily find over 300 actors. The loop will execute a FastTrace (see Actor/Methods) for every single actor in the collision hash within the specified area. However, a lot of those actors are most probably static and/or have Physics == None. FastTrace requires much more time than the checks used within the loop, but those checks would catch almost as many actors. A simple optimization would be executing the FastTrace after the other two checks. This can be done by using the CollidingActors (still faster than RadiusActors with up to 2000 UU radius) iterator instead:

ForEach CollidingActors(class'Actor', thisActor, Radius, Location) {
    if ( thisActor.bStatic || thisActor.Physics == PHYS_None || !FastTrace(Other.Location, Location) )
        continue; // skip this actor

    // do the actual stuff
}

This will execute much faster, but why?

Imagine those 300 actors, let's say 200-250 of them are StaticMeshes placed in the map. Those are all static actors which will be caught by the first part of the if statement. Some other actors might be e.g. gibs lying on the floor. Those actors have Physics == PHYS_None and will be caught by the second part of the if statement. Typically over 90% of the actors will fail to pass those first two tests, leaving only about 30 actors for the FastTrace check in this example. This means we only have to do 30 FastTraces instead of 300. Now imagine you want to run this loop every Tick. A high number of FastTraces can slow down the game by 50% or even more, while about 10-30 of those Traces can only be noticed by checking the frame rate.

Disable engine events when you don't need them

Use the Disable function to deactivate certain engine events when you don't need them. (That only applies for events you provide an UnrealScript implementation for; if you haven't overwritten an engine event in your class, disabling it makes no difference.)

event PostBeginPlay
{
    // all events are enabled by default, so disable Tick event to start with
    Disable('Tick');
}

event Trigger(Actor Sender, Pawn Instigator)
{
    // tell engine to call Tick event from now on
    Enable('Tick');
}

event Tick(float DeltaTime)
{
    // do something -- executed only after the actor has been triggered
}

Using Disable and Enable is more efficient than using a bool variable and doing an UnrealScript-level check in the Tick function. (And it's more elegant as well.)

Re-use objects

Creating actors and objects is a relatively expensive operation. If you can, design your code so that you spawn an actor or object you need only once, save a reference to it and use it later again.

The object pool makes re-using non-Actor objects convenient and straightforward: Allocating an object of a given class either takes an existing one from the pool or automatically creates a new one if none exists yet; freeing an object doesn't destroy it but puts it into the pool.

Precache materials and static meshes

Precache any new materials or static meshes you use to avoid hitches when they're displayed the first time. Overwrite the UpdatePrecacheMaterials and UpdatePrecacheStaticMeshes functions (defined in Actor) to do that:

simulated function UpdatePrecacheMaterials()
{
    Super.UpdatePrecacheMaterials();
    Level.AddPrecacheMaterial(Texture'MyUserInterfaceTexture');  // hard-coded texture reference
    Level.AddPrecacheMaterial(MapperSpecifiedMaterial);          // specified by mapper in UnrealEd
}

simulated function UpdatePrecacheStaticMeshes()
{
    Super.UpdatePrecacheStaticMeshes();
    Level.AddPrecacheStaticMesh(StaticMesh'MyStaticMesh');
}

Optimize expressions

Place operators in an optimized order – this seems trivial but can be important for things that get called a lot, especially for replication statements.

Expressions in UnrealScript will terminate prematurely if applicable, so you can take advantage of this. Similar to the optimizations for iterators, if using "or" expressions, evaluate the most likely or least expensive things first. That way, if it is true, it doesn't have to waste time on things that usually will be false anyway. If it is an "and" expression, evaluate the least likely thing first – that way you won't pass one check only to get stopped by the second as often.

The only real exception to this is if it is necessary to avoid Accessed Nones – obviously it's more likely that a Controller's Pawn will have over 10 health than that a controller will not have a pawn at all, but for obvious reasons you need to confirm that the controller has a pawn before attempting to read a variable from it.

Avoid Nesting functions

Although it seems like it would use less memory to nest function calls, I've timed different variations on nested and non-nested calls, and a non-nested call consistently ran at twice the speed as a nested one:

// Runs relatively slow.
function int NestedFunction()
{
  return DoSomething( DoSomethingElse( DoEvenMore( 5 ) ) );
};

// Runs WAY faster.
function AFunction()
{
  local int Value;

  Value = DoEvenMore( 5 );
  Value = DoSomethingElse( Value );
  return DoSomething( Value );
};

I suspect this is an indication that the interpreter's "stack" is not used very efficiently.

Use native functionality instead of UnrealScript code

Use native functions instead of scripted ones whenever possible.

UnrealScript runs a lot slower than the native functions do – it's usually better to use a native function instead of writing your own, even if the native function does a bunch of stuff you don't need. The wasted functionality is nothing compared to the added speed of native functions.

Execute code only as often as needed

Timeslicing less important calls in Tick or Timer can increase speed.

You could, for instance, use a boolean variable and store the previous deltatime so that a less critical function can be called only half the time, with the cumulative deltatime. You can also use an incrementing integer and a cumulative Deltatime float to call functions even less often. This makes the program look more complicated, unfortunately, but can decrease the strain on the CPU.

EntropicLqd: Under those circumstances couldn't you use the SetTimer(..) function to reduce the number of times the Timer() function is called?

Foxpaw: Yes, but you only have one Timer. If a superclass uses it you won't be able to use it unless your timing needs are the exact same as those of the superclass. Furthermore, you can only use it for one thing then. Say, for instance, you wanted to make your own physics system. You wanted to update location every tick, velocity every 3 ticks, and rotation every 5 ticks. Furthermore, you want to check collision only every 13 ticks. This would be a bit difficult to do in Timer.

Find bottlenecks by measuring execution time

You can use the Clock and UnClock functions (defined in Actor, moved to Object for DeusEx) to measure the time a part of your code spends executing, in milliseconds. Use this to find the sections of your code that require performance optimzation most urgently, and to compare different ways of doing something performance-wise.

function MyFunction()
{
    local float ExecutionTime;

    Clock(ExecutionTime);
    // do something
    UnClock(ExecutionTime);


    Log("Time spent executing something:" @ ExecutionTime @ "ms");
}

Clock is a very useful function when you know exactly what you want to measure. It is also handy because it returns the elapsed time to you, so you can use it to take averages, etc. You could also even use it for some kind of gameplay timer, though it sometimes "rolls over" which might not make it the best for that.

If you want to test a lot of code, adding a lot of clock, unclock, and log statements can be tedious. Enter Stopwatch. Stopwatch is a much more powerful function than Clock, but works very differently. The stopwatch is global, and it's either "on" or "off."

StopWatch( false ); starts the stopwatch. StopWatch( true ); disables and resets the timer. When you stop the timer, a line will be printed in the log stating something like: Time=41.768ms. This is the time that the stopwatch was at when it was stopped. This makes it appear to be very much like clock.. but the log statement that shows the stop time is not what you can really do with StopWatch.

The magic of stopwatch is that when it is running it timestamps log entries. Each is stamped with the time elapsed since the timer was started. It is great for finding out the time taken throughout a function, without writing a veritable pile of clock and unclock statements, as well as adding temporary variables for them, and the works! But I digress. Here's an example:

function SuperFantasticFunction()
{
  local int i;

  StopWatch( false );

  log( "Beginning Execution" );
  // ... some code goes here ...
  log( "Initialization Complete" );
  // ... more code ...
  log( "Precomputation Complete" );
  // ... yet more code ...
  log( "Entering Loop" );
  for (i=0;i<5;i++)
  {
    // .. something done in a loop
    log( "Iteration "$(i+1) );
  }

  log( "Loop Completed" );
  // ... some finishing code ...
  log( "Function Completed" );

  StopWatch( true );
}

Produces something like the following in the log: (except stopwatch actually has a few more digits of precision.)

  ScriptLog: 0.00 ms: Beginning Execution 
  ScriptLog: 0.12 ms: Initialization Complete
  ScriptLog: 0.45 ms: Precomputation Complete
  ScriptLog: 0.76 ms: Entering Loop
  ScriptLog: 1.02 ms: Iteration 1 
  ScriptLog: 1.35 ms: Iteration 2
  ScriptLog: 1.68 ms: Iteration 3
  ScriptLog: 1.92 ms: Iteration 4
  ScriptLog: 2.24 ms: Iteration 5
  ScriptLog: 2.24 ms: Loop Completed
  ScriptLog: 3.16 ms: Function Completed
  Time=3.16 ms

Unroll your loops

Setting up a loop can take extra time (especially if Mychaeel is correct and loop iterations are counted). In some cases, the number of iterations are known, and a loop doesn't need to be used at all:

for (int i = 0; i < 5; i++)
{
    //Some Code.
}

can be changed to

//Some Code.
//Some Code.
//Some Code.
//Some Code.
//Some Code.

Dante: What if the engine doesn't count loop iterations but counts every instruction ? Then you might save 3 called instructions with the unrolled loop. Sometimes it looks very ugly, leaving the for(i=0...5) solution the better one.

Sordith: We could second guess how the scripting engine works until we fill up the server's hard drive and not really get anywhere. I think I tested this one (along with all of the techniques I posted), but I can't remember the results. I'll test again later and post some sample numbers. You are correct that sometimes (read most of the time) it looks very ugly, and the time saved during execution may not be worth the time lost when working with the ugly code. I didn't mention that because the questions of what and when to optimize could easily take it's own page.

Refine your algorithms

Different operations take different amounts of time. Generally speaking, addition, subtraction, assignment, and shifting take small amounts time. Multiplication takes a slightly longer time, while division takes the most time. Using floating point numbers also increases execution time. Integer multiplication and division by powers of 2 can be converted into shifts. Division by a floating point number can be converted into multiplication by a floating point number (x/0.5 == x * (1/0.5)).

Sordith: Don't like the way this one reads, but can't seem to spit it out more clearly.

Mychaeel: I don't believe UnrealScript's compiler optimizes constant subexpressions, so "1/0.5" does probably involve a division at run time. Also, while you're technically right of course, I don't believe that this technique has noticable impact on the execution speed of UnrealScript code – you have to assume that UnrealScript byte code is executed at least one order of magnitude slower than native code, and on top of that every operation in an expression translates to a function call with all of its overhead anyway (instead of being resolved to an inline operation).

Sordith: This is true. Optimising outside of a loop will rarely have noticable results. These techniques should be used where working with a large number of objects, or when doing calculations every frame, and probably not even then unless you need the extra time.

I was thinking more along the lines of:

local float mult = 1/0.5;
for (int i = 0; i < 1000; i++)
{
    a *= mult;
    b *= mult;
    c *= mult;
}

also, if you want to divide by a constant number, you can change it to multiplication by a constant number without the overhead of the first division or the temp variable.

Be careful with temporary variables

If you add a local variable to a function, that variable will be created when that function is call, even if the variable isn't used. With a normal int or float that isn't much of a problem, but if the variable is another class, every variable inside that class will also be created (and initialized?).

Wormbo: Actually all variables (class variables and local ones) are initialized with their corresponding null value, which is None for all object and class variables. (They are just references anyway.) It's probably more important to keep track of the structs and static arrays since all their elements are actually created. (and again initialized with null values) This is the reason why you can declare recursive structs but can't actually use them.
Anyway, the compiler will warn you about any unused local variable it finds.