Private Practice: Taming Templates


In this episode, we’ll look at a way to tame the code expansion caused by excessive template use.

Templates are, of course, a way to create new classes and function from a template. If that template isn’t carefully designed, it can easily cause a lot of code to be created, because each function and each class instantiated (=generated by the compiler) from a template is a separate entity that shares absolutely nothing with other instantiations of the same template, except the part of the name that comes before the first <.

Bjarne Stroustrup writes about this problem in his excellent book “The Design and Evolution of C++”:

Avoiding unnecessary space overheads caused by too many instantiations was considered a […] problem […]. The rules requiring “late” instantiation of template functions […] ensure that code is not replicated when a template is used with the same template arguments in different translation units. I considered it unlikely [however] that early (or even late) template implementations would be able to look at instantiations of a class for different template arguments and figure out when all or part of the instantiated code could be shared.
[…]
People […] have found that replicated code can cost megabytes of code space even in moderate size programs.

He goes on to present derivation as a solution for this problem. Basically, he creates a pvector (pointer vector) class template by deriving from vector<void*> and implementing all functionality in terms of that private base class, so that functions in pvector at most need to add a cast.

Instead of looking at this ridiculously simple example, I’d like to look at three class templates from KDTools: KDAutoPointer, KDThreadRunner and KDGenericFactory.

KDAutoPointer is a mixture between QPointer and std::auto_ptr: It owns the contained object, but relinquishes ownership if the object is otherwise deleted. Like QPointer, it needs the contained object to derive QObject.

The first implementation looked like this:

template <typename T>
class KDAutoPointer {
    QPointer<T> o;
public:
    typedef T element_type;
    typedef T value_type;
    typedef T * pointer;

    KDAutoPointer() : o() {}
    explicit KDAutoPointer( T * obj ) : o( obj ) {}
    KDAutoPointer( KDAutoPointer & other ) : o( other.release() ) {}
    template <typename U>;
    KDAutoPointer( KDAutoPointer<U> & other ) : o( other.release() ) {}

    KDAutoPointer & operator=( KDAutoPointer & other ) {
        reset( other.release() );
        return *this;
    }

    // ...

    ~KDAutoPointer() { delete o; }

    void swap( KDAutoPointer & other ) {
        const QPointer<T> copy( o );
        o = other.o;
        other.o = copy;
    }

    T * get() const { return o; }
    T * release() { T * copy = o; o = 0; return copy; }

    void reset( T * other=0 ) { if ( o != other ) { delete o; o = other; } }

    T & operator*() const { return *get(); }
    T * operator->() const { return get(); }

    KDAB_IMPLEMENT_SAFE_BOOL_OPERATOR( get() )
};

Since T must always derive from QObject, we can go the pvector path, and implement KDAutoPointer<T> in terms of KDAutoPointer<QObject>, which we’ll call KDAutoPointerBase:

class KDAutoPointerBase {
protected:
    QPointer<QObject> o;

    KDAutoPointerBase() : o() {}
    explicit KDAutoPointerBase( QObject * obj ) : o( obj ) {}
    KDAutoPointerBase( KDAutoPointerBase & other ) : o( other.release() ) {}
    ~KDAutoPointerBase() { delete o; }

    void swap( KDAutoPointerBase & other ) {
        const QPointer<QObject> copy( o );
        o = other.o;
        other.o = copy;
    }

    QObject * release() { QObject * copy = o; o = 0 ; return copy; }
    void reset( QObject * other ) { if ( o != other ) { delete o; o = other; } }

public:
    KDAB_IMPLEMENT_SAFE_BOOL_OPERATOR( o );
};

template <typename T>
class KDAutoPointer : KDAutoPointerBase {
public:
    typedef T element_type;
    typedef T value_type;
    typedef T * pointer;

    KDAutoPointer() : KDAutoPointerBase() {}
    explicit KDAutoPointer( T * obj ) : KDAutoPointerBase( obj ) {}
    KDAutoPointer( KDAutoPointer & other ) : KDAutoPointerBase( other.release() ) {}
    template <typename U>
    KDAutoPointer( KDAutoPointer<U> & other ) : KDAutoPointerBase( other.release() ) { T * t = other.get(); ( void )t; }

    KDAutoPointer & operator=( KDAutoPointer & other ) {
        this->reset( other.release() );
        return *this;
    }

    // ...

    ~KDAutoPointer() {}

    void swap( KDAutoPointer & other ) {
        KDAutoPointerBase::swap( other );
    }

    T * get() const { QObject * obj = o; return static_cast<T*>( obj ); }
    T * release() { return static_cast<T*>( KDAutoPointerBase::release() ); }

    void reset( T * other=0 ) { KDAutoPointerBase::reset( other ); }

    T & operator*() const { return *get(); }
    T * operator->() const { return get(); }

    KDAB_USING_SAFE_BOOL_OPERATOR( KDAutoPointer )
};

It takes an analytical look to see commonalities and factor them out. A particularly easy prey are template arguments only used as pointers. Most of the time, you can factor code out if you mentally derive the class template from its own instantiation for the template argument’s required base class (QObject, in the KDAutoPointer case) or with void (the pvector case).

Let’s try on another one.

KDThreadRunner is an implementation of the pattern described by Bradley T. Hughes in his classic You’re doing it wrong.. blog post.

The original code looked like this:

template <class T> // T must derive from QObject
class KDThreadRunner : public QThread
{
public:
    explicit KDThreadRunner( QObject* parent = 0 )
        : QThread(parent), m_waitStarted()
    {
    }

    T* startThread()
    {
        start();
        m_waitStarted.acquire(); // wait for T to be created by run()
        return m_impl;
    }

protected:
    virtual void run()
    {
        // impl is created in the thread so that m_impl-&gt;thread()==this
        m_impl = new T(this);
        // Tell startThread that we have created T
        m_waitStarted.release();

        exec();

        delete m_impl;
    }
private:
    QSemaphore m_waitStarted;
    T* m_impl;
};

Here, we can’t hope to move all functionality into a base class (non-template). Among other things, we’ll need to keep the new T(this) in the class template. The solution here is to apply Extract Method to startThread() and run() to separate the template-dependent from the common code, and only push down the common code:

class KDThreadRunnerBase : public QThread
{
public:
    ~KDThreadRunnerBase();

protected:
    explicit KDThreadRunnerBase( QObject* parent = 0 );

    void doStart();
    void doExec();

    QObject* m_impl;

private:
    QSemaphore m_waitStarted;
};

template <class T>
class KDThreadRunner : public KDThreadRunnerBase
{
public:
    explicit KDThreadRunner( QObject* parent = 0 )
        : KDThreadRunnerBase(parent) {}

    T* startThread()
    {
        doStart();
        return static_cast<T*>(m_impl);
    }

protected:
    virtual void run()
    {
        // impl is created in the thread so that m_impl->thread()==this
        m_impl = new T(this);
        doExec();
    }
};

As a benefit, the base class can now be implemented out-of-line.

An even more complex, but particularly rewarding, example is KDGenericFactory:

template< typename T_Product, typename T_Identifier=QString, template<typename U, typename V> class T_Map=QHash >
class MAKEINCLUDES_EXPORT KDGenericFactory
{
public:
    virtual ~KDGenericFactory()
    {
    }

    typedef T_Product* (*FactoryFunction)();

    template< typename T >
    void registerProduct( const T_Identifier & name )
    {
#ifdef Q_CC_MSVC
        FactoryFunction function = &KDGenericFactory::create<T>;
        registerProductionFunction( name, function );
#else
        registerProductionFunction( name, &create<T> );
#endif
    }

    void unregisterProduct( const T_Identifier & name )
    {
        map.remove( name );
    }

    unsigned int productCount() const
    {
        return map.size();
    }

    QList< T_Identifier > availableProducts() const
    {
        return map.keys();
    }

    T_Product* create( const T_Identifier & name ) const
    {
        const typename T_Map< T_Identifier, FactoryFunction >::const_iterator it = map.find( name );
        if( it == map.end() )
            return 0;
        return (*it)();
    }

protected:
    void registerProductionFunction( const T_Identifier & name, FactoryFunction createfun )
    {
        map.insert( name, createfun );
    }

private:
    template< typename T >
    static T_Product* create()
    {
        return new T;
    }

    T_Map< T_Identifier, FactoryFunction > map;
};

Here, we have three template arguments: T_Product, which can be any class’ hierarchy’s base class, T_Identifier, which is often QString, but could be anything that can be a key type in an associative container, and T_Map which is the associative container to use (can only be QHash or QMap). Let’s do the exercise and look at the class template after replacing T_Product with void:

template< typename T_Identifier=QString, template<typename U, typename V> class T_Map=QHash >
class MAKEINCLUDES_EXPORT KDGenericFactoryBase
{
public:
    virtual ~KDGenericFactory()
    {
    }

    typedef void* (*FactoryFunction)();

    template< typename T >
    void registerProduct( const T_Identifier & name )
    {
        FactoryFunction function = &KDGenericFactoryBase::create<T>;
        registerProductionFunction( name, function );
    }

    void unregisterProduct( const T_Identifier & name )
    {
        map.remove( name );
    }

    unsigned int productCount() const
    {
        return map.size();
    }

    QList< T_Identifier > availableProducts() const
    {
        return map.keys();
    }

    void* create( const T_Identifier & name ) const
    {
        const typename T_Map< T_Identifier, FactoryFunction >::const_iterator it = map.find( name );
        if( it == map.end() )
            return 0;
        return (*it)();
    }

protected:
    void registerProductionFunction( const T_Identifier & name, FactoryFunction createfun )
    {
        map.insert( name, createfun );
    }

private:
    template< typename T >
    static void* create()
    {
        return new T;
    }

    T_Map< T_Identifier, FactoryFunction > map;
};

The difference here is that the result is still a class template. But who cares? Just inherit from it and implement the general case in terms of the base class (after the micro-optimisation of making the destructor protected and non-virtual):

// KDGenericFactoryBase as above, but with protected non-virtual dtor

template< typename T_Product, typename T_Identifier=QString, template<typename U, typename V> class T_Map=QHash >
class MAKEINCLUDES_EXPORT KDGenericFactory : KDGenericFactoryBase<T_Identifier,T_Map>
{
    typedef KDGenericFactoryBase<T_Identifier,T_Map> base;
public:
    virtual ~KDGenericFactory() // still virtual!!
    {
    }

    typedef T_Product* (*FactoryFunction)();

    using base::registerProduct;
    using base::unregisterProduct;
    using base::productCount;
    using base::availableProducts;

    T_Product* create( const T_Identifier & name ) const
    {
        return static_cast<T_Product*>( base::create( name ) );
    }

protected:
    void registerProductionFunction( const T_Identifier & name, FactoryFunction createfun )
    {
        typename base::FactoryFunction fun = reinterpret_cast<typename base::FactoryFunction>( createfun );
        base::registerProductionFunction( name, fun );
    }
};

So far, this has been the same procedure as in the other cases, except that our base class is still a class template (albeit with fewer template arguments). The cool thing is that the base class template is now independent of the product created. So if you only ever use QString and QHash as further KDGenericFactory template arguments, we just saved you an awful lot of code.

But with C++11, we can do even better. Realising that there are only a handful likely candidates for the template arguments of KDGenericFactoryBase (QHash and QMap for T_Map and, say, QString and qint64 for T_Identifier), we can provide explicit instantiation of these in the cpp file:

template class KDGenericFactoryBase<QString,QHash>;
template class KDGenericFactoryBase<QString,QMap>;
template class KDGenericFactoryBase<qint64,QHash>;
template class KDGenericFactoryBase<qint64,QMap>;

(that's C++03, nothing new there) and prevent the C++11 compiler from instantiating KDGenericFactoryBase with these arguments in the header file:

// don't instantiate these, expect their definitions at link time (C++11-only):
extern template class KDGenericFactoryBase<QString,QHash>;
extern template class KDGenericFactoryBase<QString,QMap>;
extern template class KDGenericFactoryBase<qint64,QHash>;
extern template class KDGenericFactoryBase<qint64,QMap>;

One can even go yet another step further and implement KDGenericFactoryBase for T_Identifiers of integral (std::is_integral) or enum type (std::is_enum) in terms of KDGenericFactory<qint64,T_Map>, but this is left as an exercise to the reader🙂

[[EDIT: 20111128]]

The KDGenericFactory case is much more complicated than the other examples. Did you spot the bug?

Consider a simple T_Product such as

class Fruit {
public:
    virtual ~Fruit();
};

and implementations

class Apple : public Fruit { /*...*/ };
class Pear  : public Fruit { /*...*/ };
class Orange : public QObject, public Fruit { /*...*/ }; // inherits QObject

Orange needs to inherit QObject (maybe it needs signals and slots). Due to moc limitations, QObject needs to be the first base class of a multiply-inherited derived class. But that means that if you cast an Orange* to a void*, you’ll get the address of its QObject sub-object. If you then cast this pointer value back to Fruit*, the result will differ from the value you get when you cast an Orange* to a Fruit* directly.

But that’s exactly what the above KDGenericFactory<Fruit,T_Identifier,T_Map> does: in (static) KDGenericFactoryBase<T_Identifier,T_Map>::create<T>(), the new T is (implicitly) cast to void* and returned. The returned value is then cast (explicitly) “back” to Fruit* in KDGenericFactory<Fruit,T_Identifier,T_Map>::create( const T_Identifier & ) const.

To fix this bug, we need to make sure that the result of new T is cast to T_Product* (here: Fruit*) first, then to void*. To do this, we can simply move the static create<T>() and the registerProduct<T>() function templates back up into KDGenericFactory:

template< typename T_Identifier, template< typename U, typename V > class T_Map >
class KDTOOLSCORE_EXPORT KDGenericFactoryBase
{
protected:
    ~KDGenericFactoryBase() {}

    typedef void* (*FactoryFunction)();

    void unregisterProduct( const T_Identifier& name )
    {
        map.remove( name );
    }

    unsigned int productCount() const
    {
        return map.size();
    }

    QList< T_Identifier > availableProducts() const
    {
        return map.keys();
    }

    void * create( const T_Identifier & name ) const
    {
        const typename T_Map< T_Identifier, FactoryFunction >::const_iterator it = map.find( name );
        if( it == map.end() )
            return 0;
        return (*it)();
    }

protected:
    void registerProductionFunction( const T_Identifier& name, FactoryFunction createfun )
    {
        map.insert( name, createfun );
    }

private:
    T_Map< T_Identifier, FactoryFunction > map;
};

extern template class KDGenericFactoryBase<QString,QHash>;
extern template class KDGenericFactoryBase<QString,QMap>;
extern template class KDGenericFactoryBase<int,QHash>;
extern template class KDGenericFactoryBase<int,QMap>;

template< typename T_Product, typename T_Identifier = QString, template< typename U, typename V > class T_Map = QHash >
class KDGenericFactory : KDGenericFactoryBase<T_Identifier,T_Map>
{
    typedef KDGenericFactoryBase<T_Identifier,T_Map> base;
public:
    virtual ~KDGenericFactory()
    {
    }

    typedef T_Product* (*FactoryFunction)();

    template< typename T >
    void registerProduct( const T_Identifier& name ) // moved up from KDGenericFactoryBase again
    {
        FactoryFunction function = &KDGenericFactory::create<T>;
        registerProductionFunction( name, function );
    }

    using base::unregisterProduct;
    using base::productCount;
    using base::availableProducts;

    T_Product* create( const T_Identifier& name ) const
    {
        return static_cast<T_Product*>( base::create( name ) );
    }

protected:
    void registerProductionFunction( const T_Identifier& name, FactoryFunction createfun )
    {
        typename base::FactoryFunction fun = reinterpret_cast<typename base::FactoryFunction>( createfun );
        base::registerProductionFunction( name, fun );
    }

private:
    template< typename T >
    static T_Product * create() // now returns as T_Product*, not void*
    {
        return new T;
    }
};

Voilá!

Of course, this bug was deliberately hidden to see if someone would find and comment on it🙂

4 Responses to Private Practice: Taming Templates

  1. andrej says:

    The angle brackets and ampersands in your code blocks have all been mangled – they seem to have been doubly escaped.

  2. Peter Kümmel says:

    Isn’t it possible to suppress any compiler generated instantiations by moving the definition of the template functions into a source file and explicitly instantiate each type ‘by hand’ there?
    Then “extern template class” wouldn’t be needed.

    • marcmutz says:

      Yes, that is an alternative.

      The difference is that extern template lets you suppress instantiation for only a selected subset of arguments. They’re meant to complement explicit template instantiations (in particular, in libraries) so compilers don’t instantiate templates that the linker will throw away anyway.

      Also, in general, you cannot instantiate all templates in their cpp file (again consider a library), so you’d need to provide a header with the declaration (for using the template), and then another one (gcc calls these tcc, I think) with the template definition (so users can include that in their own files to instantiate templates with their custom types).

      Consider that in the KDGenericFactoryBase example, we don’t want to restrict the types of T_Identifier (or even T_Map, even though there are only two classes that meet the required concept atm), we just want to collect the common instantiations in the library instead of in user code, as an optimisation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: