Chapter 9. MC-3020 Collections

Lists are used by MC-3020 to keep track of collections of instances in the system. Collections of instances appear in several contexts including pools of instances of classes, sets of instances participating in an association and sets of instances resulting from SELECT MANY statements.

MC-3020 Collections

MC-3020 manages collections of instances using various collection mechanisms which are selectable through marking. As of version 3, MC-3020 provides two flavors of collection container, singly linked lists (slists) and doubly linked lists (dlists). Each container strategy has advantages and disadvantages with regard to code size and speed. In general, dlists are faster but take more storage. In the following sections details of these constructs are outlined.

Containers

Collections of class instances (objects) must be supported by any model compiler. Processing performs operations on these sets of data. Instances can be collected and organized in several ways. At minimum, a model compiler must be able to support:

  • selection from the existing instances of any specified class

  • selection from instances of any associated class

  • the analysis variable type inst_ref_set

We refer to the first case as instance extents. The second are referred to as association extents. The third are transient set variables (often called selection extents).

As of version 3, MC-3020 maintains collections in linked lists. Linked lists are flexible and relatively light weight (small, simple code).

Sets

Operations on sets include Insert, Remove, Clear, Copy, Cardinality, IsEmpty, Equality, Contains and Iterate. Some variations on these methods may also be included for various optimizations.

SELECT MANY makes a copy of an instance extent or association extent into a transient set variable. Cursors are used to iterate through sets of instances.

SELECT ANY returns the first element in the instance extent or association extent.

A set generates a different type of variable than does an element in a set. This allows for sets to have attributes such as a head (and perhaps a tail and cardinality). Set elements have attributes such as a link to the next element and a pointer to the substance of the data they contain.

Set Symmetry

Consistency between the three flavors of collections makes the model compiler simpler. Simpler usually means smaller code, but there are some exceptions to this general rule. Up through version 2.2 of MC-3020, instance extents, association extents and transient sets were treated almost identically. In versions above 2.2, instance extents get special treatment so that the model compiler can optimize for size and speed based upon characteristics true only of instance extents.

Instance extent sets are treated special in that their containers are never returned to the free pool and remain permanently attached to the instance data. Since the instances basically move between active (animate) and inactive (inanimate) lists, there is no need to detach the container as in association extents and transient set variables. Although this breaks the symmetry, it enables the generation of smaller and faster code.

MC-3020 applies these specializations as marking options to enhance the set operation performance of the model compiler generated code.

Singly Linked Lists

Singly linked lists are simple and small. For most of the set operations, singly linked lists are fast because of their simplicity. The exception to this is the delete operation. A delete operation on a singly linked set requires the address of the container of the data element being deleted. Delete also requires the address of the container preceding the deleted data and the address of the container following the deleted item. As input we have none of these three addresses but only the address of the data (object) being deleted.

In MC-3020 marked with slist containers, this meant that a search for the data element is required. This search begins at the head of the list and proceeds down the list until the data element is found. Variables track the current node and previous nodes as the list is traversed. Once the data element is found, these node variables contain 2 of the 3 three needed container addresses. A simple dereference of current->next yields the final container address. The singly linked list delete links the previous container to the next container thus unlinking the current node.

As collections get large, this linear search involved in set item deletion gets burdensome.

Figure 9.1.  Singly Linked Lists of Instances

Singly Linked Lists of Instances

Doubly Linked Lists

Adding a more capable container node structure alleviates the delete instance problem. With a prev pointer together with the next pointer, it is an easy matter to learn the address of the previous and following containers given the address of the container containing the data to be deleted. Note however, that we are not given the address of the container but the address of the data itself. Thus, unless a clever mechanism exists for deriving (divining) the container from the data handle, we are relegated to linear searches once again.

Figure 9.2.  Doubly Linked Lists of Instances

Doubly Linked Lists of Instances

If data allocation of instances and containers is performed in a special way, there does exist a mechanism for deriving the address of the container from the address of the data.

In MC-3020 versions beyond 2.2, a doubly linked list container is available. Instances and containers are allocated in memory in a way that relates the instances to their containers mathematically. One container pool is allocated for each pool of class instances. These are parallel arrays. Remember that the container stays attached to the instance for the duration of system run time, since instances from the extents are simply moved from inactive to active and back for create and delete respectively.

During initialization, containers are linked to instance data in parallel. Container 0 is the container for instance index 0 of any particular class. Container 1 is the container for the instance at index two in the class pool array for this class. In other words, the array indexes for the containers are the same as the array indexes of the instances for which they serve.

Figure 9.3.  Parallel Container/Object Arrays

Parallel Container/Object Arrays

Thus, if we know the index of an instance data element, we know the index of the instance extent container pointing to it. We can derive the index of any pointer to an array element through pointer arithmetic.

    index = pointer_to_array_element - address_of( array_element[0] )
    

We now have all the information needed to derive the address of an instance container, its previous container and next container all in constant time. The delete operation on an instance no longer involves a linear search of the extent.

Note that this "magic" only applies to sets managing instance extents. Deletions from association extents, UNRELATES, are not helped and still involve a search of the set. UNRELATES from very large association extents (hundreds or thousands) can suffer from linear search overhead.

Performance Considerations

A comparison of the two collection mechanisms in light of deleting instances shows the power of the constant time address derivation. Once instance extents grow larger than one hundred, the doubly linked collection mechanism with address derivation begins to show benefit. At extent sizes of ten thousand (10,000), the singly linked collection mechanism becomes two orders of magnitude slower than the doubly linked counterpart.

Merged Containers

Another clever way to relate the instance containers to the instances themselves is to allocate them inside of the instances themselves. Under such a scheme, the base instance class would have the container class as its first element. This would amount to overloading the class with container members as data elements, namely next, prev, and object. Such a scheme renders trivial the mathematics of deriving the address of the instance from the address of the container. They are exactly the same. The address of the container is the address of the instance (and vice versa).

Figure 9.4.  Containers Merged Into Instance Data

Containers Merged Into Instance Data

Such a scheme suffers a weakness in light of preexisting instance (PEI) support. The merged containers scheme adds two or three pointer size data elements to every class instance. To populate the instances with data, it would be necessary to populate the container pointers with the instances. Even if zero, this would mean 4 to 12 bytes of additional constant initializer data per instance in the system.

In light of this weakness, MC-3020 has opted for the parallel array approach to mathematically relating the containers to the instances. Even though MC-3020 does not (currently) use merged containers, there remains a great deal of merit in the approach.