Qt
Internal/Contributor docs for the Qt SDK. <b>Note:</b> These are NOT official API docs; those are found <a href='https://doc.qt.io/'>here</a>.
|
Since Qt 6.8, V4 uses an incremental, precise mark-and-sweep gc algorithm. It is neither generational nor moving.
In the mark phase, each heap-item can be in one of three states:
Items are black if they have their corresponding bit set in the black-object bitmap. They are grey if they are stored at least once in the MarkStack, a stack data structure. Items are white if they are neither grey nor black. Note that black items must never be pushed to the MarkStack (otherwise we could easily end up with endless cycles), but items already on the MarkStack can be black: If an item has been pushed multiple times before it has been popped, this can happen. It causes some additional work to revisit its fields, but that is safe, as after popping the item will be black, and thus we won't keep on repeatedly pushing the same item on the mark stack.
The roots consist of
All roots are added to the MarkStack. Then, during mark phase, entries are:
To avoid that values that were on the heap during the start of the gc cycle, then moved to the stack before they could be visited and consequently freed even though they are still live, the stack is rescanned before the sweep phase.
To avoid that unmarked heap-items are moved from one heap item (or the stack) to an already marked heap-item (and consequently end up hidden from the gc), we employ a Dijkstra style write barrier: Any item that becomes reachable from another heap-item is marked grey (unless already black).
While a gc cycle is ongoing, allocations are black, meaning every allocated object is considered to be live (until the next gc cycle is started). This is currently required as compilation units linked to the engine while the gc is running are not protected by the write barrier or another mechanism. It also helps to reduce the amount of work to be done when rescanning the JS stack (as it helps to ensure that most items are already black at that point).
To facilitate incremental garbage collection, the gc algorithm is divided into the following stages:
V4_NEEDS_DESTROY
. There is one phase for the various allocators (identifier table, block allocator, huge item allocator, IC allocator)To avoid constantly having to query the timer, even interruptible phases run for a fixed amount of steps before checking whether there's a timemout.
Most steps are straight-forward, only the persistent and weak value phases require some explanation as to why it's safe to interrupt the process: The important thing to note is that we never remove elements from the structure while we're undergoing gc, and that we only ever append at the end. So we will see any new values that might be added.
As shown in the diagram above, the handling of persistent values is interruptible (both for "real" persistent values, and also for weak vaules which also are stored in a PersistentValueStorage
data structure. This is done by storing the PersistentValueStorage::Iterator
in the gc state machine. That in turn raises two questions: Is the iterator safe against invalidation? And do we actually keep track of newly added persistent values?
The latter part is easy to answer: Any newly added weak value is marked when we are in a gc cycle, so the marking part is handled. Sweeping only cares about unmarked values, so that's safe too. To answer the question about iterator validity, we have to look at the PersistentValueStorage
data structure. Conceptionally, it's a forward-list of Page
s (arrays of QV4::Value
). Pages are ref-counted, and only unliked from the list if the ref-count reaches zero. Moreover, iterators also increase the ref-count. Therefore, as long as we iterate over the list, we don't risk having the pointer point to a deleted Page
– even if all values in it have been freed. Freeing values is unproblematic for the gc – it will simply encounter undefined
values, something it is already prepared to handle. Pages are also kept in the list while they are not deleted, so iteration works as expected. The only adjustment we need to do is to disable an optimization: When searching for a Page with an empty slot, we have to (potentially) travese the whole PersistentValueStorage
. To avoid that, the first Page with empty slots is normally moved to the front of the list. However, that would mean that we could potentially skip over it during the marking phase. We sidestep that issue by simply disabling the optimization. This area could easily be improved in the future by keeping track of the first page with free slots in a different way.
Some parts of the engine have to deviate from the general scheme described in the overview, as they don't integrate with the normal WriteBarrier. They are wrapped in the callback of the QV4::WriteBarrier::markCustom
function, so that they can easily be found via "Find references".
QJSValue::encode
. QJSValues act as roots, and don't act as normal heap-items. When the current value of a QJSValue is overwritten with another heap-item, we also mark the new object. That aligns nicely with the gc barrier.PersistentValue::set
.SharedInternalClassDataPrivate<PropertyKey>::set
(for PropertyKeys that had already been allocated), and on the fact that we allocate black for newly created PropertyKeys.QV4::Heap::InternalClass
also requires special handling, as it uses plain members to Heap objects, notably to the prototype and to the parent internal class. As the class is somewhat special in any case (due to the usage of SharedInternalClassData
and especially due to the usage of SharedInternalClassData<PropertyKey>
, see notes on PropertyKey above), we use some bespoke sections for now. This could probably be cleaned up.A story for another day
Your explanation is in another castle.