Pimp My Pimpl — Reloaded


This is a translation of a two-part article that originally appeared on Heise Developer.

Much has been written about this funnily-named idiom, alternatively known as d-pointer, compiler firewall, or Cheshire Cat. After a first article on Heise Developer presented the classic Pimpl idiom and its benefits, this second part focuses on removing some of the drawbacks that inevitably occur when using Pimpls.

Part Two

 

The Shallow-const Problem

A first gotcha, easily overlooked, has to do with C++’s concept of shallow const. When using Pimpl, all methods access the data fields of the class only through d:

SomeThing & Class::someThing() const {
    return d->someThing;
}

Only a closer look reveals that the code evades a C++ safety feature: Since the method is declared const, this inside someThing() is of type const Class *, hence d is of type Class::Private * const. That, however, does not suffice to prevent write access to data fields of Class::Private, because even though d is const, *d is not.

Remember: In C/C++, const is not deep, but shallow:

const int * pci;        // pointer to const int
int * const cpi;        // const pointer to int
const int * const cpci; // const pointer to const int

*pci  = 1;  // error: *pci is const
*cpi  = 1;  // ok! *cpi isn't const
*cpci = 1;  // error: *cpci is const

int i;
pci  = &i;  // ok
cpi  = &i;  // error: cpi is const
cpci = &i;  // error: cpci is const

When using Pimpl, therefore, both const and non-const methods can write to the data fields of the object. In the version without Pimpl, the compiler actively prevents that.
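To make the hole tangible, here is a minimal sketch of a pimpled class whose const method modifies state without any complaint from the compiler. The names Counter, peek() and readsAfterTwoPeeks() are made up for this illustration and do not appear in the original article.

```cpp
#include <cassert>

// Hypothetical example class; all names are illustrative only.
class Counter {
public:
    Counter() : d( new Private ) {}
    ~Counter() { delete d; }

    // Declared const, yet it modifies state reachable through d.
    int peek() const {
        ++d->reads;      // compiles: d is 'Private * const', but *d is not const
        return d->value;
    }
    int reads() const { return d->reads; }

private:
    Counter( const Counter & );             // copying disabled to keep the sketch short
    Counter & operator=( const Counter & );

    struct Private {
        Private() : value( 42 ), reads( 0 ) {}
        int value;
        int reads;
    };
    Private * d;
};

// Calls a const method twice on a const object and reports how often
// the supposedly read-only object was modified.
inline int readsAfterTwoPeeks() {
    const Counter c;
    assert( c.reads() == 0 ); // nothing has touched the object yet
    c.peek();
    c.peek();
    return c.reads();
}
```

Had reads been a direct data member of Counter, the increment inside peek() would have been rejected at compile time.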

This hole in the type system will usually be undesirable and should be closed. This is possible using deep_const_ptr or a pair of d_func() methods. The former is a simple smart pointer class, which retrofits deep const for selected pointer variables. Its class definition, reduced to the essentials, might look as follows:

template <typename T>
class deep_const_ptr {
    T * p;
public:
    explicit deep_const_ptr( T * t ) : p( t ) {}

    const T & operator*() const { return *p; }
    T & operator*() { return *p; }

    const T * operator->() const { return p; }
    T * operator->() { return p; }
};

By overloading const and non-const versions of operator*() and operator->(), the constness of d is forwarded to *d. Simply replacing Private * d; with deep_const_ptr<Private> d; closes the hole effectively. But there’s no need for a smart pointer here: The const/non-const overloading trick also works with methods on Class directly:

class Class {
    // ...
private:
    const Private * d_func() const { return _d; }
    Private * d_func() { return _d; }
private:
    Private * _d;
};

Instead of accessing _d in method implementations, one always uses d_func():

void Class::f() const {
     const Private * d = d_func();
     // use 'd' ...
}

Of course, nothing prevents the direct use of _d here; something that isn’t possible when using deep_const_ptr. This variant, therefore, requires a little more programmer discipline. In addition, the developer can extend deep_const_ptr such that its destructor deletes the payload for him, while he himself is responsible for deleting _d. In return, the d_func() variant scores points when dealing with polymorphic class hierarchies, as will be shown later.
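The extension hinted at above, a deep_const_ptr that also deletes its payload, could look roughly like the following sketch. The name owning_deep_const_ptr, the helper overloads isConst() and the classes Payload and Holder are assumptions made for this example; the sketch also assumes a non-null payload.

```cpp
#include <cassert>

// Sketch of an owning variant; the name owning_deep_const_ptr is hypothetical.
template <typename T>
class owning_deep_const_ptr {
    T * p;
    // Non-copyable, so the payload is never deleted twice.
    owning_deep_const_ptr( const owning_deep_const_ptr & );
    owning_deep_const_ptr & operator=( const owning_deep_const_ptr & );
public:
    explicit owning_deep_const_ptr( T * t ) : p( t ) { assert( p ); }
    ~owning_deep_const_ptr() { delete p; }

    const T & operator*() const { return *p; }
    T & operator*() { return *p; }
    const T * operator->() const { return p; }
    T * operator->() { return p; }
};

// Overload pair used only to report which constness the caller sees.
inline bool isConst( const int & ) { return true; }
inline bool isConst( int & ) { return false; }

struct Payload { // stands in for Class::Private
    Payload() : value( 0 ) {}
    int value;
};

class Holder { // stands in for the public class
    owning_deep_const_ptr<Payload> d;
public:
    Holder() : d( new Payload ) {}
    void set( int v ) { d->value = v; }  // non-const method: write access allowed
    int get() const { return d->value; } // const method: read-only view of *d
    // Writing 'd->value = 1;' inside a const method would no longer compile.
    bool seesConstInConst() const { return isConst( d->value ); }
    bool seesConstInNonConst() { return isConst( d->value ); }
};
```

In a real Pimpl setting the payload type is incomplete in the public header, so the destructor of the public class would have to be defined in the implementation file, where the Private class is complete.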

Accessing The Public Class

A further obstacle arises when a developer actually wishes to move all private functions from the public to the Private class: He is missing a way to call (non-static) public or protected methods of the public class from methods on Private, since the link of the public class to its Private class is unidirectional:

class Class::Private {
public:
    Private() : ... {}
    // ...
    void callPublicFunc() { /*???*/Class::publicFunc(); }
};

Class::Class()
    : d( new Private ) {}

The problem can be solved by introducing a back-link (the name chosen here, q, originates from Qt):

class Class::Private {
    Class * const q; // back-link
public:
    explicit Private( Class * qq ) : q( qq ), ... {}
    // ...
    void callPublicFunc() { q->publicFunc(); }
};

Class::Class()
  : d( new Private( this ) ) {}

When using the back-link it is imperative to bear in mind that the initialisation of d is not ensured until the Private constructor has finished executing. One should avoid calling (Class) methods which require a valid d pointer during the execution of the Private constructor, lest it rains crashes or undefined behaviour.

The security-minded developer will therefore initialise the back-link to zero first, and only implant the reference to the public class after construction is complete. For the assignment in the Class constructor to compile, q can then be neither const nor private:

class Class::Private {
public:
    Class * q; // back-link; set by Class after construction
    explicit Private( /*Class * qq*/ ) : q( 0 ), ... {}
    // ...
};

Class::Class()
    : d( new Private/*( this )*/ )
{
    // establish back-link:
    d->q = this;
}

Despite these restrictions, a considerable amount of the initialisation code of a class can usually be moved into the Private constructor, which is valuable in classes with overloaded constructors. It should not be left unmentioned that the q-pointer, too, can be talked into propagating constness using deep_const_ptr or, in the case of class hierarchies, q_func() functions.
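As a compilable illustration of the back-link, the following sketch shows a class with two constructors that share their initialisation through the Private constructor and establish the back-link afterwards. Widget, name(), greeting() and makeGreeting() are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <string>

class Widget { // hypothetical public class
public:
    Widget();
    explicit Widget( const std::string & name );
    ~Widget();
    std::string name() const;
    std::string greeting() const;
private:
    Widget( const Widget & );             // copying disabled for brevity
    Widget & operator=( const Widget & );
    class Private;
    Private * d;
};

class Widget::Private {
public:
    Widget * q; // back-link, zero until the Widget constructor sets it
    std::string name;

    // Shared initialisation for all Widget constructors lives here.
    explicit Private( const std::string & n ) : q( 0 ), name( n ) {}

    // Calls back into the public class through the back-link.
    std::string makeGreeting() const {
        assert( q ); // the back-link must have been established
        return "Hello, " + q->name();
    }
};

Widget::Widget() : d( new Private( "unnamed" ) ) { d->q = this; }
Widget::Widget( const std::string & name ) : d( new Private( name ) ) { d->q = this; }
Widget::~Widget() { delete d; }
std::string Widget::name() const { return d->name; }
std::string Widget::greeting() const { return d->makeGreeting(); }
```

Because q is still zero while the Private constructor runs, any accidental use of the back-link during construction trips the assertion instead of silently calling into a half-built Widget.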

Having re-added missing functionality now, the rest of the article will show how to attenuate the Pimpl overhead with a trick from the depths of the magic bag.

Raising Efficiency Through Recycling

As a good C++ programmer, the reader will have become sceptical as he read the introductory remarks on the classical Pimpl idiom in Part One. In particular, the additional dynamic memory allocation will have caused headaches, the more so if the class otherwise allocates little or no additional memory.

Even though such impressions should first be verified with a profiler, it cannot hurt acceptance of the technique to find mechanisms that soften the potential performance trap. Part One already mentioned the direct embedding of data fields, with which dynamic memory allocations can be saved. In the following, we will look at an additional, much more involved, technique: the recycling of the d-pointer.

In a polymorphic class hierarchy, the problem of additional dynamic memory allocations caused by Pimpl multiplies by the depth of the hierarchy: Each class in the hierarchy has its own “pimple”, even though for some it might be completely empty (for example, if one only reimplemented virtual functions, but added no additional data members).

The developer can fight the proliferation of d-pointers (and of the dynamic memory allocations associated with them) by re-using the base-class d-pointer in derived classes:

// base.h:
class Base {
    // ...
public:
    Base();
protected:
    class Private;
    explicit Base( Private * d );
    Private * d_func() { return _d; }
    const Private * d_func() const { return _d; }
private:
    Private * _d;
};

// base.cpp:
Base::Base()
    : _d( new Private )
{
    // ...
}

Base::Base( Private * d )
    : _d( d )
{
    // ...
}

The addition of the protected constructor alongside the public ones allows derived classes to implant their own d-pointer into the base class. The code also employs const forwarding using (now protected) d_func()-functions — and not deep_const_ptr — to allow derived classes (read-)access to _d.

// derived.h:
class Derived : public Base {
public:
    Derived();
    // ...
protected:
    class Private;
    Private * d_func();             // can't be implemented inline here
    const Private * d_func() const; // ditto
};

// derived.cpp:
Derived::Private * Derived::d_func() {
    return static_cast<Private*>( Base::d_func() );
}

const Derived::Private * Derived::d_func() const {
    return static_cast<const Private*>( Base::d_func() );
}

Derived::Derived()
    : Base( new Private ) {}

The author of Derived now uses the newly added Base constructor to implant a Derived::Private instead of a Base::Private into Base::_d (note the resolution of the unqualified name Private in the different contexts). He also implements the Derived::d_func() overloads in terms of the Base::d_func() ones, but returns his own Private class instead.

For the Base constructor call to work, Derived::Private needs to inherit from Base::Private:

class Derived::Private : public Base::Private {
    // ...
};

To actually be able to perform this inheritance, three conditions must be met: First, the developer has to declare the Base::Private destructor virtual, otherwise he’ll be caught up in undefined behaviour when the Base destructor deletes the Private class hierarchy.

Furthermore, he must implement both classes in the same library, since the Private classes are usually not exported: they carry no __declspec(dllexport) on Windows and are marked visibility=hidden in ELF binaries (Executable and Linkable Format). Export would be necessary, however, if Derived lay in a different library than Base. In exceptional cases, the Private classes of central classes may be exported: Nokia engineers, for example, have exported the classes QObjectPrivate (from QtCore) and QWidgetPrivate (from QtGui), because so many classes from modules other than QtCore and QtGui derive from QObject and QWidget. Doing so, however, ties such libraries intrinsically to each other at the version level, such that end users can normally exchange them only in conjunction with each other: In general, a libQtGui.so.4.5.0 will not work if the runtime environment links it against a libQtCore.so.4.6.0.

Third, the definition of Base::Private can no longer be hidden in the implementation file of the base class (base.cpp), since the definition of Derived::Private requires it. So where to put the Base::Private definition? The developer can hardly put it into the public header file (base.h), for then he could just as well dispense with Pimpl altogether. The answer lies in the creation of a second, private header file. Qt and KDE established the classname_p.h naming scheme for this purpose (_priv, _i and _impl suffixes are also common). Besides the Base::Private definition, the header file may also hold inline definitions of Base methods, for example the new constructor:

inline Base::Base( Private * d ) : _d( d ) {}

And in derived_p.h:

inline Derived::Derived( Private * d ) : Base( d ) {}
inline const Derived::Private * Derived::d_func() const {
    return static_cast<const Private*>( Base::d_func() );
}
inline Derived::Private * Derived::d_func() {
    return static_cast<Private*>( Base::d_func() );
}

Strictly speaking, the definitions as shown above violate the ODR (One Definition Rule), since the d_func() functions are inline in those translation units which include derived_p.h, but extern in all others.

In practice, however, that is not a problem, since all users of d_func() have to include derived_p.h, too. To be on the safe side, declare Derived::d_func() inline already in the class definition; current compilers don’t mind the missing definition.
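Collapsed into a single translation unit, the recycling scheme described so far might be sketched as follows. The data members baseValue and derivedValue and the counter g_privateCount are assumptions added for illustration; the counter merely makes visible that a Derived object costs one Private allocation instead of two.

```cpp
#include <cassert>

static int g_privateCount = 0; // counts constructed Private objects, illustration only

class Base {
public:
    Base();
    virtual ~Base();
    int baseValue() const;
protected:
    class Private;
    explicit Base( Private * d ); // lets derived classes implant their own Private
    Private * d_func() { return _d; }
    const Private * d_func() const { return _d; }
private:
    Base( const Base & );             // copying disabled for brevity
    Base & operator=( const Base & );
    Private * _d;
};

class Base::Private {
public:
    Private() : baseValue( 1 ) { ++g_privateCount; }
    virtual ~Private() {} // virtual: Base deletes derived Private objects through _d
    int baseValue;
};

Base::Base() : _d( new Private ) {}
Base::Base( Private * d ) : _d( d ) { assert( _d ); }
Base::~Base() { delete _d; }
int Base::baseValue() const { return d_func()->baseValue; }

class Derived : public Base {
public:
    Derived();
    int derivedValue() const;
protected:
    class Private;
    Private * d_func();
    const Private * d_func() const;
};

class Derived::Private : public Base::Private {
public:
    Private() : derivedValue( 2 ) {}
    int derivedValue;
};

Derived::Private * Derived::d_func() {
    return static_cast<Private*>( Base::d_func() );
}
const Derived::Private * Derived::d_func() const {
    return static_cast<const Private*>( Base::d_func() );
}
Derived::Derived() : Base( new Private ) {} // one allocation serves both classes
int Derived::derivedValue() const { return d_func()->derivedValue; }
```

In a real project the two Private definitions would live in base_p.h and derived_p.h rather than next to the public classes.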

In practice, one hides the non-negligible code noise introduced by this technique in preprocessor macros. Qt, for example, has a Q_DECLARE_PRIVATE macro that class definitions can use, as well as Q_D, which declares a local d pointer in method implementations and initialises it with a call to d_func().
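To give an impression of how such macros can reduce the noise, here is a sketch with home-grown helpers. PIMPL_DECLARE_PRIVATE, PIMPL_DEFINE_D_FUNC and PIMPL_D are hypothetical names and are not Qt's actual definitions; PIMPL_D additionally assumes a C++11 compiler for auto. The Shape and Circle classes are likewise invented for the example.

```cpp
#include <cassert>

// Hypothetical helper macros (not Qt's definitions).
// Declares the Private class and the d_func() pair inside a derived class.
#define PIMPL_DECLARE_PRIVATE          \
    class Private;                     \
    Private * d_func();                \
    const Private * d_func() const;

// Defines the d_func() pair in the implementation file.
#define PIMPL_DEFINE_D_FUNC( Class, BaseClass )                     \
    Class::Private * Class::d_func() {                              \
        return static_cast<Private*>( BaseClass::d_func() );        \
    }                                                               \
    const Class::Private * Class::d_func() const {                  \
        return static_cast<const Private*>( BaseClass::d_func() );  \
    }

// Declares a local 'd' whose constness follows that of the calling method.
#define PIMPL_D auto * const d = d_func()

class Shape {
public:
    Shape();
    virtual ~Shape();
    int id() const;
protected:
    class Private;
    explicit Shape( Private * d );
    Private * d_func() { return _d; }
    const Private * d_func() const { return _d; }
private:
    Shape( const Shape & );             // copying disabled for brevity
    Shape & operator=( const Shape & );
    Private * _d;
};

class Circle : public Shape {
public:
    Circle();
    void setRadius( int r );
    int radius() const;
protected:
    PIMPL_DECLARE_PRIVATE
};

class Shape::Private {
public:
    Private() : id( 7 ) {}
    virtual ~Private() {}
    int id;
};

class Circle::Private : public Shape::Private {
public:
    Private() : radius( 0 ) {}
    int radius;
};

Shape::Shape() : _d( new Private ) {}
Shape::Shape( Private * d ) : _d( d ) { assert( _d ); }
Shape::~Shape() { delete _d; }
int Shape::id() const { PIMPL_D; return d->id; }

PIMPL_DEFINE_D_FUNC( Circle, Shape )
Circle::Circle() : Shape( new Private ) {}
void Circle::setRadius( int r ) { PIMPL_D; d->radius = r; }
int Circle::radius() const { PIMPL_D; return d->radius; }
```

Inside a const method the local d is a pointer to const Private, so the deep-const protection of the d_func() pair is preserved.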

One downside still remains, though: If the developer wants to combine d-pointer recycling with back-links, some complications ensue. To begin with, meticulous attention must be paid not to dereference the Derived pointer passed to the Private constructor, either directly or indirectly, until the whole class hierarchy has finished constructing.

Derived::Private::Private( Derived * qq )
  : Base::Private( qq ) // ok, not dereferencing
{
    q->setFoo( ... ); // dereferences q -> crash!
}

For not only has Derived not finished constructing at the moment of the Private constructor call, neither has — and that is the difference to the non-polymorphic case discussed above — Base: Its constructor hasn’t been entered yet; the constructor argument is still under construction.

In this case, too, it helps to initialise the back-link to 0 first. The task to set the back-link then falls to the most-derived class, that is, the one that implants its concrete Private object into the hierarchy. In the case of Derived, this would look as follows:

Derived::Derived()
    : Base( new Private/*( this )*/ )
{
    d_func()->_q = this;
}

The author customarily rolls parts of the initialisation that require the back-link into a separate Private::init() function (=two-step construction of Private), called (only) by the constructor whose own Private class is being used.

Derived::Derived( Private * d )
    : Base( d )
{
    // does _not_ call d->init()!
}
Derived::Derived()
    : Base( new Private )
{
    d_func()->init( this );
}
void Derived::Private::init( Derived * qq ) {
    Base::Private::init( qq ); // sets _q
    // my initialisation goes here
}

Furthermore, each Private class needs to declare its own back-link, or else “q_func()” methods that take care of casting the base-class back-link. The code needed for this is left as an exercise to you, gentle reader. The solution can be found on the Heise FTP server in the form of a “pimped” Shape-hierarchy.
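One possible shape of such a solution, using the q_func() variant, is sketched below. It is not the code from the Heise FTP server; the methods touchBase() and touchDerived() and the call counters are assumptions that stand in for real initialisation work needing the back-link.

```cpp
#include <cassert>

class Base {
public:
    Base();
    virtual ~Base();
    int baseCalls() const;
    void touchBase();
protected:
    class Private;
    explicit Base( Private * d ); // does _not_ call d->init()
    Private * d_func() { return _d; }
    const Private * d_func() const { return _d; }
private:
    Base( const Base & );             // copying disabled for brevity
    Base & operator=( const Base & );
    Private * _d;
};

class Derived : public Base {
public:
    Derived();
    int derivedCalls() const;
    void touchDerived();
protected:
    class Private;
    Private * d_func();
    const Private * d_func() const;
};

class Base::Private {
public:
    Private() : _q( 0 ), baseCalls( 0 ) {}
    virtual ~Private() {}
    // Second construction step: set the back-link, then use it.
    void init( Base * qq ) { _q = qq; _q->touchBase(); }
    Base * q_func() { assert( _q ); return _q; }
    Base * _q; // the only stored back-link in the hierarchy
    int baseCalls;
};

class Derived::Private : public Base::Private {
public:
    Private() : derivedCalls( 0 ) {}
    void init( Derived * qq ) {
        Base::Private::init( qq ); // sets _q
        q_func()->touchDerived();  // safe: Base and Derived are fully constructed
    }
    // Casts the base-class back-link instead of storing a second pointer.
    Derived * q_func() { return static_cast<Derived*>( Base::Private::q_func() ); }
    int derivedCalls;
};

Base::Base() : _d( new Private ) { _d->init( this ); }
Base::Base( Private * d ) : _d( d ) { assert( _d ); }
Base::~Base() { delete _d; }
int Base::baseCalls() const { return d_func()->baseCalls; }
void Base::touchBase() { ++d_func()->baseCalls; }

Derived::Private * Derived::d_func() {
    return static_cast<Private*>( Base::d_func() );
}
const Derived::Private * Derived::d_func() const {
    return static_cast<const Private*>( Base::d_func() );
}
Derived::Derived() : Base( new Private ) { d_func()->init( this ); }
int Derived::derivedCalls() const { return d_func()->derivedCalls; }
void Derived::touchDerived() { ++d_func()->derivedCalls; }
```

Only the constructor that creates the Private object calls init(), so each initialisation step runs exactly once, and only after the d-pointer has been implanted into Base.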

Findings

As a well-known C++ idiom, Pimpl allows class authors to separate class interface and implementation to an extent not directly provided for by C++. As a positive side-effect, the use of d-pointers speeds up compilation runs, eases implementation of transaction semantics, and allows, through extended means of composition, implementations that potentially are more runtime-efficient.

Not everything is shiny when using d-pointers, though: In addition to the extra Private class and its dynamic memory allocation, the modified semantics of const methods as well as potential initialisation-order errors are cause for concern. For both of these, this article has presented remedies, which, however, require considerably more coding effort. Because of the increased complexity involved in these, the “fully pimped” Pimpl, including recycling and back-links, can be recommended only for a few selected classes or projects.

However, projects that do not shy away from the effort will be rewarded with intriguing interface stability, allowing far-reaching implementation changes.

Literature

  • John Lakos; Large-Scale C++ Software Design; Addison-Wesley Longman, 1996
  • Herb Sutter; Exceptional C++: 47 Engineering Puzzles, Programming Problems, and Solutions; Addison-Wesley Longman, 2000
  • Herb Sutter, Andrei Alexandrescu; C++ Coding Standards: 101 Rules, Guidelines and Best Practices; Addison-Wesley Longman, 2004
  • Marc Mutz; Pimp my Pimpl; C++: Vor- und Nachteile des d-Zeiger-Idioms, Teil 1; article on heise Developer (English translation available)

7 Responses to Pimp My Pimpl — Reloaded

  4. wjh says:

    In the following excerpt from your article, I believe that member q of Class::Private would technically need to be public (not private as is currently the case) in order for the assignment “d->q = this;” in the Class constructor to compile:

    >>>>>
    The security-minded developer will therefore initialise the back-links to zero first, and only implant them the reference to the public class after construction is complete:

    class Class::Private {
    Class * const q; // back-link
    public:
    explicit Private( /*Class * qq*/ ) : q( 0 ), … {}
    // …
    };

    Class::Class()
    : d( new Private/*( this )*/ )
    {
    // establish back-link:
    d->q = this;
    }
    <<<<<

    • Matt says:

      As long as it is constant (Class * const q) it doesn’t matter that you change encapsulation because you won’t be able to assign any value to that in constructor.
