The search for background code execution in FG

**sirnoobsauce** · April 21st, 2023, 16:09

I recently did a bunch of experimentation to find a way to run non-blocking code (or as close to non-blocking as I could manage) and figured it might be useful information for the rest of the extension/ruleset development community. This is mainly directed at other folks who work on extensions/rulesets and have ever wanted the option to run async code in the style of an executor or async/await calls.

I will explain the mechanism below, but I put the async code in a GitHub repo in case anyone out there also wants to be able to schedule background jobs in FG extensions or rulesets:

https://github.com/bakermd86/AsyncLoop

I should note that while I did zip this up into an ext file, I don't really think it makes a lot of sense to package this as its own standalone extension. I just did that in case anyone wants to try the included example. If you want to make use of this in your own extensions/rulesets, you probably just want to check out the code from GitHub and include it directly in your own ext/pak file.

Usage is relatively straightforward, although depending on what you want to do it can get as complicated as you want. It works a bit like a map() call in Python or JS. You provide a job name, a target function, a list of arguments, and two optional arguments: a callback function and a boolean to override the status display setting. Each entry in the argument table is then passed to the target function, and any output from those calls is aggregated and passed to your callback function at the end. This is the function signature:

Code:

function scheduleAsync(callName, targetFn, callArgs, callbackFn, silent)

The repo includes an examples.lua script showing a simple example where you can run the same operation (multiplying some large number of integers) both synchronously and asynchronously to compare:

Code:

	...
    Comm.registerSlashHandler("asyncMathExample", doMathAsync, "/asyncMathExample <number of args>")
	...
	
function doMathAsync(sCommand, sParam)
    local numbers = {}
    for i=1,tonumber(sParam) do
        local _x = math.random(1, 1000)
        local _y = math.random(1, 1000)
        table.insert(numbers, { x=_x, y=_y })
    end
    AsyncLib.scheduleAsync("doMath", wrapDoMath, numbers, mathCallback)
    AsyncLib.startAsync()
end

function mathCallback(callName, asyncResults, asyncCount, asyncCpuTime)
    Debug.chat("Got " .. asyncCount .. " results in " .. asyncCpuTime .. "s of CPU time")
end

function wrapDoMath(callArg)
    return doMath(callArg.x, callArg.y)
end

function doMath(x, y)
    local z = x * y
    Debug.chat(x .. " x " .. y .. " = " .. z)
    return z
end

The argument list (numbers) is a list of tables, and the target function (wrapDoMath) takes those tables as its arguments. The signature for the callback function (mathCallback) is:

Code:

callName (str), asyncResults (table), asyncCount (int), asyncTime (float)

callName is whatever you provided, asyncResults is a list of the outputs from your target function, asyncCount is how many input arguments were passed (which can differ from the number of results), and asyncTime (named asyncCpuTime in the example above) is the CPU time of the execution (which can differ from the wall time).
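As a rough illustration of that callback signature, here is a minimal sketch of a callback that totals the collected results. The name sumCallback is made up for this example, and it assumes asyncResults is a plain array-like table; it returns a string only so it can be tried outside FG, where you would more likely pass the text to Debug.chat().

```lua
-- Hypothetical callback (not part of AsyncLoop): totals the collected results.
-- Assumes asyncResults is an array-like table of numbers.
function sumCallback(callName, asyncResults, asyncCount, asyncTime)
    local total = 0
    for _, value in ipairs(asyncResults) do
        total = total + value
    end
    -- asyncCount counts input arguments, so it may be larger than #asyncResults.
    return string.format("%s: %d results from %d args, total %d, %.3fs CPU",
        callName, #asyncResults, asyncCount, total, asyncTime)
end
```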

More complex tasks can be achieved using table arguments that contain data on their state (like an instance of a class, although Lua doesn't really have classes but that's a separate topi...). The event loop will check for the presence of the boolean value "isActive" in the input arguments, and if present will re-run the same argument against the target function until isActive is false. This allows for jobs to function like coroutines, with suspended execution and resumption, even though the FG Lua environment does not include coroutines.
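To make the isActive behaviour concrete, here is a minimal sketch of what one step of such a loop could look like. This is not the AsyncLoop implementation; runQueueStep, countdown, and demoCountdown are hypothetical names, and the queue is just a plain Lua table.

```lua
-- Hypothetical single step of an event loop; NOT the actual AsyncLoop code.
-- queue: array of pending arguments, results: array collecting outputs.
function runQueueStep(queue, targetFn, results)
    local callArg = queue[1]
    if callArg == nil then
        return false -- nothing left to do
    end
    local result = targetFn(callArg)
    if result ~= nil then
        table.insert(results, result)
    end
    -- A table argument stays at the head of the queue while it reports isActive.
    if not (type(callArg) == "table" and callArg.isActive) then
        table.remove(queue, 1)
    end
    return true
end

-- Example job: counts down across several calls and reports once at the end.
function countdown(job)
    job.remaining = job.remaining - 1
    job.isActive = job.remaining > 0
    if not job.isActive then
        return "done"
    end
end

-- Drives the loop to completion; the real trigger would be a UI event instead.
function demoCountdown()
    local queue = { { remaining = 3 } }
    local results = {}
    local steps = 0
    while runQueueStep(queue, countdown, results) do
        steps = steps + 1
    end
    return steps, results
end
```

In this sketch the same table is handed to countdown() three times and produces a single result, which is also why the argument count and the result count can differ.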

An example of this more complex multi-call use case can be seen in the search indexer in my record browser extension:
https://www.fantasygrounds.com/forum...G-5e-SWD-other
https://github.com/bakermd86/FoogleBrowser

Code:

function initIndexer(indexer)
    indexer.childNodes = walkChildren(indexer.node)
    indexer.isStarted = true
    indexer.isActive = true
end
...
...
function updateOnIndex(indexer)
    ...
    ...
    indexer.isActive = false
end

function runIndexer(indexer)
    if not indexer.isStarted then
        initIndexer(indexer)
    elseif #indexer.childNodes == 0 then
        updateOnIndex(indexer)
    else
        indexNextChild(indexer)
    end
end

function newIndexer(node, recordType, isLibrary)
    local indexer = {}
    indexer.node = node or ""
    indexer.recordType = recordType or ""
    indexer.nodeType = DB.getType(node)
    indexer.nodeStr = DB.getPath(indexer.node)
    indexer.isActive = false
    indexer.isStarted = false
    indexer.isLibrary = isLibrary or false
    indexer.isReindex = false
    indexer.node_results = {}
    indexer.childNodes = {}
    return indexer
end

In this example, the tables used as the arguments are created by the newIndexer() function and contain state data that is used to chain execution across multiple invocations of the event loop. This was necessary because each individual record can take longer to index than would be acceptable in some computing environments.

By including the isActive boolean, the event loop will pass the same table to the runIndexer() function continuously until the indexer has gone through all of its childNodes and sets isActive to false. I removed all the actual functional code relevant to the indexing itself; this is just showing how a table can be used to perform more complex multi-step tasks using this mechanism.
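For a version of the same pattern that runs end to end, here is a small self-contained sketch with the same three phases (start, work, finish) as runIndexer(). Everything in it is hypothetical: newSumJob, runSumJob, and the perCall field are made up for illustration, and driveJob is only a stand-in for the real event loop.

```lua
-- Hypothetical stateful job, shaped like the indexer table above.
function newSumJob(values, perCall)
    return {
        values = values,
        perCall = perCall or 2, -- items handled per invocation (assumed knob)
        nextIndex = 1,
        total = 0,
        isStarted = false,
        isActive = false,
    }
end

function runSumJob(job)
    if not job.isStarted then
        -- First call: mark the job as running so it gets passed in again.
        job.isStarted = true
        job.isActive = true
    elseif job.nextIndex > #job.values then
        -- Nothing left: clear isActive and hand back the final result.
        job.isActive = false
        return job.total
    else
        -- Middle calls: handle one small slice, then return to the caller.
        local last = math.min(job.nextIndex + job.perCall - 1, #job.values)
        for i = job.nextIndex, last do
            job.total = job.total + job.values[i]
        end
        job.nextIndex = last + 1
    end
end

-- Stand-in for the event loop: call once, then repeat while isActive is set.
function driveJob(job, stepFn)
    local calls, result = 0, nil
    repeat
        result = stepFn(job)
        calls = calls + 1
    until not job.isActive
    return result, calls
end
```

With five values and perCall = 2, the job needs five calls in total: one to start, three to work through the slices, and one to finish.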

**sirnoobsauce** · April 21st, 2023, 16:12

Now in case anyone is interested in the details of how and why this works, I will go into a bit of detail.

This all came about because of the aforementioned browser extension. The search was initially an afterthought, but enough people seemed interested in the search specifically that I put some thought into a proper search function.

Immediately I ran into the issue that building an index can take a long time if there are a lot of modules. The Lua scripting environment that is available in FG does not implement any sort of non-blocking execution (coroutines are not included). No matter what code you write, execution is always blocking. While your code is executing, the entire client UI will hang.

I immediately got interested in seeing if there was a way to work around that, because it was just a fun problem to play around with after putting the kids to bed.

I don't have access to FG source code, but I am guessing that they invoke any Lua method calls from the main client UI event thread. That in itself is not too hard to overcome, as long as you can break up whatever long-running task you want to do into small chunks. That is why it uses this map/iterable model.
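As one way to picture the "small chunks" idea, here is a minimal sketch that splits a large numeric range into chunk tables suitable for use as the argument list. makeChunks and sumSquares are hypothetical helpers for illustration, not part of AsyncLoop.

```lua
-- Hypothetical helper: split the range 1..total into chunks of at most chunkSize.
function makeChunks(total, chunkSize)
    local chunks = {}
    for first = 1, total, chunkSize do
        local last = math.min(first + chunkSize - 1, total)
        table.insert(chunks, { first = first, last = last })
    end
    return chunks
end

-- Target function for one chunk: sums the squares of first..last.
-- Each call touches only a small slice, so it returns quickly.
function sumSquares(chunk)
    local sum = 0
    for i = chunk.first, chunk.last do
        sum = sum + i * i
    end
    return sum
end
```

Going by the signature shown earlier, the chunk list would be passed as callArgs, along the lines of AsyncLib.scheduleAsync("sumSquares", sumSquares, makeChunks(1000000, 1000), myCallback), where myCallback is a placeholder that adds up the partial sums.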

The bigger issue is finding a reliable and stable way to trigger the events. I experimented with a lot of different options, but most of them don't really work. Things like DB handlers, OOBMessage handlers, event handlers, etc. are all processed sequentially. So if you (for example) fire an OOBMessage to your own client locally, the handler for that OOBMessage is not processed in a separate Lua invocation; it runs directly on the same stack. That is fine for handling a single event, but if you want to chain them one after another to continuously trigger an event loop, it means that the UI will block until the recursive Lua calls cause a stack overflow.
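The failure mode is easy to reproduce in plain Lua. The sketch below uses a made-up synchronous dispatcher (fire) as a stand-in for any mechanism that runs the handler on the caller's stack; it is not FG's OOBMessage API, just a model of the behaviour described above.

```lua
-- Toy model: a dispatcher that runs handlers synchronously on the caller's stack.
-- fire() and handlers are hypothetical stand-ins, not FG APIs.
function demoChainedHandlers()
    local handlers = {}
    local depth = 0

    local function fire(name)
        handlers[name]() -- the handler runs before fire() returns
    end

    handlers.tick = function()
        depth = depth + 1
        fire("tick") -- meant as "schedule the next tick", but it nests a new call
    end

    -- pcall catches the error so the demo itself does not crash.
    local ok, err = pcall(fire, "tick")
    return ok, depth, tostring(err)
end
```

Each "next tick" sits on top of the previous one, so control never gets back to the host until the stack limit is hit and the chain dies with an error.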

The workaround I found was to use the UI events as the trigger. There are probably other ways to get this same effect, but this was the best one I found. The key element is an interaction between setSize() and the sizelimits on dynamically sized windows.

So normally, if you were to write a window with an onSizeChanged handler that calls setSize(), one of two things will happen:

1) At some point, setSize() will set the size equal to the current size, and the recursion will stop because the handler will not be called
2) If you are careful to set a new size each time, then the recursion will continue until you get a stack overflow and it errors out

The trick I found to avoid these is to just create a window with a minimum dynamic size, then have an onSizeChanged handler that sets the size below the minimum window size:

Code:

    <windowclass name="async_trigger">
		...
        <sizelimits>
            <minimum width="50" height="50"/>
            <dynamic />
        </sizelimits>

Code:

function onInit()
    self.onSizeChanged = sizeTrigger
    setSize(25, 25)
...
function sizeTrigger()
    if not AsyncLib.eventLoop() then
        emptyRuns = emptyRuns + 1
        if emptyRuns > 15 then
            self.onSizeChanged = closeSafe
        end
    end
    setSize(25, 25)
end

I don't have access to the FGU source to see the exact logic in their event thread. But I am guessing that setSize() allows you to set windows to sizes below their minimum size, but then some check that runs after the setSize() call is done finds that the window is below the minimum, and resizes it back to the minimum. If you look at the stack for the setSize() calls, you can see that it does directly call onSizeChanged() from the setSize() native call the first time:

Code:

[4/21/2023 4:07:16 PM] 
stack traceback:
	[string "AsyncLib:lib_async/scripts/async_lib.lua"]:117: in function 'eventLoop'
	[string "AsyncLib:.._async/scripts/async_trigger.lua"]:13: in function <[string "AsyncLib:.._async/scripts/async_trigger.lua"]:12>
	[C]: in function 'setSize'
	[string "AsyncLib:.._async/scripts/async_trigger.lua"]:19: in function <[string "AsyncLib:.._async/scripts/async_trigger.lua"]:12>

But the recursion breaks because the handler just sets the size to the same size it already is (25x25), thereby avoiding a stack overflow and allowing execution to return from the Lua handler to the application event thread.

However, because the size is now below the minimum size (50x50) a second onSizeChanged() event is called from the main application, and this 2-step repeats:

Code:

[4/21/2023 4:07:16 PM] 
stack traceback:
	[string "AsyncLib:lib_async/scripts/async_lib.lua"]:117: in function 'eventLoop'
	[string "AsyncLib:.._async/scripts/async_trigger.lua"]:13: in function <[string "AsyncLib:.._async/scripts/async_trigger.lua"]:12>
[4/21/2023 4:07:16 PM] 
stack traceback:
	[string "AsyncLib:lib_async/scripts/async_lib.lua"]:117: in function 'eventLoop'
	[string "AsyncLib:.._async/scripts/async_trigger.lua"]:13: in function <[string "AsyncLib:.._async/scripts/async_trigger.lua"]:12>
	[C]: in function 'setSize'
	[string "AsyncLib:.._async/scripts/async_trigger.lua"]:19: in function <[string "AsyncLib:.._async/scripts/async_trigger.lua"]:12>
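The two-step pattern in these traces can be reproduced with a toy model of the guessed host behaviour. To be clear, this is only a simulation of the guess described above and none of it is FG code: newMockWindow, fire, and hostFrame are invented names, and the assumption is that setSize() notifies synchronously on a real change while the host clamps to the minimum size once per frame.

```lua
-- Toy model of the GUESSED host behaviour; none of this is FG code.
function newMockWindow(minW, minH)
    local win = { w = minW, h = minH, minW = minW, minH = minH, depth = 0, maxDepth = 0 }

    -- Runs the handler synchronously and tracks how deeply calls are nested.
    function win.fire()
        win.depth = win.depth + 1
        win.maxDepth = math.max(win.maxDepth, win.depth)
        if win.onSizeChanged then win.onSizeChanged() end
        win.depth = win.depth - 1
    end

    -- Like the observed setSize(): notifies only when the size really changes.
    function win.setSize(w, h)
        if w == win.w and h == win.h then return end
        win.w, win.h = w, h
        win.fire()
    end

    -- Guessed host-side check, once per UI frame: clamp to the minimum, notify.
    function win.hostFrame()
        if win.w < win.minW or win.h < win.minH then
            win.w, win.h = win.minW, win.minH
            win.fire()
        end
    end

    return win
end

function demoTwoStep(frames)
    local win = newMockWindow(50, 50)
    local runs = 0
    win.onSizeChanged = function()
        runs = runs + 1     -- stands in for AsyncLib.eventLoop()
        win.setSize(25, 25) -- always ask for a size below the minimum
    end
    win.setSize(25, 25)     -- kick-off, as in onInit()
    for _ = 1, frames do
        win.hostFrame()
    end
    return runs, win.maxDepth
end
```

In this model every host frame produces two handler runs (one from the clamp, one nested inside setSize()), and the nesting depth never grows past two, which matches the shape of the traces.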

This two-step pattern allows the event loop to run, then break out and return to the main application event thread in between calls. It allows the client to remain responsive and functional, as long as the individual tasks being run can be completed within a few dozen milliseconds each. There is also an async weight setting that will set the event loop to call multiple arguments for each invocation, to speed up scheduling if the PC can handle it.

That defaults to automatic, where the scheduler adjusts itself every second to shoot for returning control to the UI thread 30 times / second.
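One plausible way to do that kind of self-adjustment is sketched below. This is a guess at the general idea and not the actual AsyncLoop logic: nextWeight and TARGET_FPS are hypothetical names, and the rule simply sizes each batch so that one event-loop invocation fits into 1/30 of a second based on the last second's measurements.

```lua
-- Hypothetical auto-tuning rule; the actual AsyncLoop logic may differ.
TARGET_FPS = 30

-- callsLastSecond: target-function calls completed during the last second
-- cpuSeconds: CPU time those calls consumed
-- Returns how many calls to batch into a single event-loop invocation.
function nextWeight(callsLastSecond, cpuSeconds)
    if callsLastSecond <= 0 or cpuSeconds <= 0 then
        return 1 -- no measurements yet: start conservatively
    end
    local secondsPerCall = cpuSeconds / callsLastSecond
    local budget = 1 / TARGET_FPS -- time allowed per invocation
    return math.max(1, math.floor(budget / secondsPerCall))
end
```

With 1000 calls costing 0.5 s of CPU in the last second, each call averages 0.5 ms, so the sketch batches 66 calls per invocation; a call that takes 100 ms on its own can never fit the budget, so the weight bottoms out at 1.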

Anyways, this is maybe too niche to be of much interest to the broader community, but I figured I'd share in case anyone else is looking to run longer-running bits of code without having to worry about locking the UI. It was certainly a fun little problem to work on the past week or two.

**Moon Wizard** · April 21st, 2023, 16:14

One caveat for anyone writing "background execution code" is that you have to consider how much work the "background code" does on each pass, so that it does not take too much time and slow down execution in the rest of the application. All existing code in the standard FG rulesets is triggered by data/UI/network events, rather than update loops.

Regards,
JPG

**sirnoobsauce** · April 22nd, 2023, 20:08

Originally Posted by Moon Wizard

One caveat for anyone writing "background execution code" is that you have to consider how much work the "background code" does on each pass, so that it does not take too much time and slow down execution in the rest of the application. All existing code in the standard FG rulesets is triggered by data/UI/network events, rather than update loops.

Regards,
JPG

Yeah, that is the reason for the map-style approach. For anyone out there who is curious: there is no actual async execution possible; any time user Lua code is running, the UI will be completely frozen. That is why I explain above that each individual execution of whatever you want to run has to be on the order of tens of milliseconds or less, so that control returns to the underlying client code often enough for it to process UI events.

Ideally, you want each individual call to be incredibly short (1 ms or less), and the "Automatic" priority logic can then loop through records with a target of returning control to the underlying client application code 30 times / second. If you were to pass in some function that takes 10 seconds to run, this isn't going to do anything for you. It will still just block fully for 10 seconds.
