Pimp My Pimpl


This is a translation of a two-part article that originally appeared in Heise Developer. You can find the originals here:

You can find Part Two here: https://marcmutz.wordpress.com/translated-articles/pimp-my-pimpl-reloaded/


Much has been written about this funnily-named idiom, alternatively known as d-pointer, compiler firewall, or Cheshire Cat. Heise Developer highlights some angles of this practical construct that go beyond the classic technique.

Part One

This article first recapitulates the classic Pimpl idiom (pointer-to-implementation), points out its advantages, and goes on to develop further idioms on its basis. The second part will concentrate on how to mitigate the disadvantages that inevitably arise through Pimpl use.

The Classic Idiom

Every C++ programmer has probably stumbled across a class definition akin to the following:

class Class {
    // ...
private:
    class Private; // forward declaration
    Private * d;   // hide impl details
};

Here, the programmer moved the data fields of his class Class into a nested class Class::Private. Instances of Class merely contain a pointer d to their Class::Private instance.

To understand why the class author used this indirection, we need to take a step back and look at the C++ module system. In contrast to many other languages, C++, as a language of C descent, has no built-in support for modules (such support was proposed for C++0x, but did not make it into the final standard). Instead, one factors module function declarations (but not usually their definitions) into header files, and makes them available to other modules using the #include preprocessor directive. This, however, leaves header files playing a double role: on the one hand, they serve as the module interface; on the other, they are the declaration site for potentially internal implementation details.

In the days of C, this worked well: implementation details of functions are encapsulated perfectly by the declaration/definition split, and structs could either be merely forward-declared (in which case they were private) or defined directly in the header file (in which case they were public). In “object-oriented C”, class Class from above might look like the following:

struct Class;                           // forward declaration
typedef struct Class * Class_t;         // -> only private members
void Class_new( Class_t * cls );        // Class::Class()
void Class_release( Class_t cls );      // Class::~Class()
int Class_f( Class_t cls, double num ); // int Class::f( double )
//...

Unfortunately, that doesn’t work in C++. Methods must be declared inside the class. Since classes without methods are rather boring, class definitions usually appear in C++ header files. And since classes, unlike namespaces, cannot be reopened, the header file must contain declarations for all data fields as well as all methods:

class Class {
public:
    // ... public methods ... ok
private:
    // ... private data & methods ... don't want these here
};

The problem is evident: the module interface (header file) necessarily contains implementation details, which is always a bad idea. That is why one resorts to a rather ugly trick and factors all implementation details (data fields as well as private methods) into a separate class:

// --- class.h ---
class Class {
public:
    Class();
    ~Class();

    // ... public methods ...

    void f( double n );

private:
    class Private;
    Private * d;
};
// -- class.cpp --
#include <class.h>

class Class::Private {
public:
    // ... private data & methods ...
    bool canAcceptN( double num ) const { return num != 0 ; }
    double n;
};

Class::Class()
    : d( new Private ) {}

Class::~Class() {
    delete d;
}

void Class::f( double n ) {
    if ( d->canAcceptN( n ) )
        d->n = n;
}

Since Class::Private is used only in the declaration of a pointer variable, i.e. “in name only” (Lakos) and not “in size”, a forward declaration suffices, as in the C case. All methods of Class now access private methods and data members of Class::Private through d only.

In this way, one gains the convenience of a fully encapsulating module system in C++, too. Because of the indirection, the developer pays for these benefits with an additional memory allocation (new Class::Private), an extra indirection on every access to data fields and private methods, and by giving up inline methods (at least public ones) altogether. As the second part will show, the semantics of const methods also change.
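
The change in const semantics can be previewed with a minimal sketch. The class name Counter and its methods are hypothetical and not taken from the article, both file sections are merged into one listing so that it compiles on its own, and a C++11 compiler is assumed:

```cpp
#include <cassert>

// --- counter.h (sketch) ---
class Counter {
public:
    Counter();
    ~Counter();
    Counter( const Counter & ) = delete;             // copying omitted for brevity
    Counter & operator=( const Counter & ) = delete;

    void touch() const;   // const, yet it changes state behind d
    int hits() const;

private:
    class Private;
    Private * d;
};

// --- counter.cpp (sketch) ---
class Counter::Private {
public:
    int hits = 0;
};

Counter::Counter() : d( new Private ) {}
Counter::~Counter() { delete d; }

void Counter::touch() const {
    // In a const method, d has type "Private * const": the pointer is const,
    // but the Private object it points to is not, so this compiles.
    ++d->hits;
}

int Counter::hits() const { return d->hits; }

// Usage: a const object is modified through a const method.
void demoShallowConst() {
    const Counter c;
    c.touch();
    assert( c.hits() == 1 );
}
```

Without the d-pointer, the compiler would reject the increment inside touch(), because hits would be a direct member of a const object.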

Before the second part of this article addresses the issue of how to rectify, or at least mitigate, the above downsides, the remainder of this article will first shed some light on the idiom’s benefits.

Benefits of the Pimpl Idiom

They are substantial. Encapsulating all implementation details yields a slim and long-term stable interface (header file). The former leads to more readable class definitions; the latter helps maintain binary compatibility even through extensive changes to the implementation.

For instance, Nokia’s “Qt Development Frameworks” department (formerly Trolltech) has carried out profound changes to the widget rendering at least twice during the development of their “Qt 4” class library without the need to so much as relink programs using Qt 4.
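
One ingredient of this binary compatibility is that the size of the public class does not depend on its private data. The following sketch illustrates this under stated assumptions: the namespaces v1 and v2 are hypothetical stand-ins for two releases of the same library, the extra fields in v2 are invented for illustration, and a C++11 compiler is assumed.

```cpp
#include <cassert>
#include <string>

namespace v1 {   // hypothetical first release
class Class {
public:
    Class();
    ~Class();
    Class( const Class & ) = delete;
    Class & operator=( const Class & ) = delete;
private:
    class Private;
    Private * d;
};

class Class::Private {
public:
    double n = 0;
};

Class::Class() : d( new Private ) {}
Class::~Class() { delete d; }
} // namespace v1

namespace v2 {   // hypothetical later release: same header, reworked internals
class Class {
public:
    Class();
    ~Class();
    Class( const Class & ) = delete;
    Class & operator=( const Class & ) = delete;
private:
    class Private;
    Private * d;
};

class Class::Private {
public:
    double n = 0;
    std::string label;      // invented extra state
    int cache[64] = {};     // invented extra state
};

Class::Class() : d( new Private ) {}
Class::~Class() { delete d; }
} // namespace v2

// The public class still consists of a single pointer in both releases.
static_assert( sizeof( v1::Class ) == sizeof( v2::Class ),
               "public class size must not depend on Private" );

void checkLayout() {
    v1::Class before;
    v2::Class after;
    assert( sizeof( before ) == sizeof( after ) );
}
```

Client code compiled against the first header therefore reserves the right amount of storage for objects of the second release, too. Full binary compatibility needs more than this (for example unchanged exported symbols and virtual tables), so the sketch covers only the size aspect.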

Particularly in larger projects, the tendency of the Pimpl Idiom to dramatically speed up builds should not be underestimated. This is accomplished both by a reduction of #include directives in header files and through the considerably reduced frequency of changes to header files of Pimpl classes in general. In “Exceptional C++”, Herb Sutter reports regular doubling of compilation speeds; John Lakos even claims build speed-ups of two orders of magnitude.

Another virtue of the design: classes with d-pointers are well-suited for transaction-oriented, exception-safe code. For instance, the developer may use the Copy-Swap Idiom (Sutter/Alexandrescu: C++ Coding Standards, Item 56) to create a transactional (all-or-nothing) copy assignment operator:

class Class {
    // ...
    void swap( Class & other ) {
        std::swap( d, other.d );
    }
    Class & operator=( const Class & other ) {
        // this may fail, but doesn't change *this
        Class copy( other );
        // this cannot fail, commits to *this:
        swap( copy );
        return *this;
    }

Implementation of C++0x move operations is trivial as well (and, in particular, identical across all Pimpl classes):

    // C++0x move semantics:
    Class( Class && other ) : d( other.d ) { other.d = 0; }
    Class & operator=( Class && other ) {
        std::swap( d, other.d );
        return *this;
    }
    // ...
};

Both member swap and assignment operators may be implemented inline in this model, without compromising the class’ encapsulation; developers should make good use of this fact.
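
To see the all-or-nothing behaviour in action, the fragments above can be assembled into one compilable sketch. Several parts are assumptions made for illustration and do not come from the article: the deep-copying copy constructor, the accessor n(), and the hook setFailOnCopy(), which makes the next copy of the private data throw. A C++11 compiler is assumed.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// --- class.h (sketch) ---
class Class {
public:
    explicit Class( double n = 0 );
    Class( const Class & other );     // deep copy, may throw
    Class( Class && other ) noexcept : d( other.d ) { other.d = nullptr; }
    ~Class();

    void swap( Class & other ) noexcept { std::swap( d, other.d ); }
    Class & operator=( const Class & other ) {
        Class copy( other );          // may fail, but doesn't change *this
        swap( copy );                 // cannot fail, commits to *this
        return *this;
    }
    Class & operator=( Class && other ) noexcept { swap( other ); return *this; }

    double n() const;                 // hypothetical accessor
    void setFailOnCopy( bool on );    // hypothetical hook to provoke a failure
private:
    class Private;
    Private * d;                      // null only in a moved-from object
};

// --- class.cpp (sketch) ---
class Class::Private {
public:
    explicit Private( double value ) : n( value ), failOnCopy( false ) {}
    Private( const Private & other ) : n( other.n ), failOnCopy( other.failOnCopy ) {
        if ( failOnCopy )
            throw std::runtime_error( "copy failed" );
    }
    double n;
    bool failOnCopy;
};

Class::Class( double n ) : d( new Private( n ) ) {}
Class::Class( const Class & other )
    : d( other.d ? new Private( *other.d ) : nullptr ) {}
Class::~Class() { delete d; }
double Class::n() const { return d->n; }
void Class::setFailOnCopy( bool on ) { d->failOnCopy = on; }

// Usage: a failing assignment leaves its target untouched.
void demoAllOrNothing() {
    Class a( 1.0 ), b( 2.0 );
    a = b;                            // succeeds
    assert( a.n() == 2.0 );

    Class c( 3.0 );
    b.setFailOnCopy( true );
    try {
        c = b;                        // the copy throws before swap() runs
        assert( false );
    } catch ( const std::runtime_error & ) {
        assert( c.n() == 3.0 );       // c is unchanged
    }
}
```

In this sketch a moved-from object holds a null d-pointer, so it may only be destroyed or assigned to; calling n() on it would dereference null.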

Extended Means of Composition

The last benefit worth mentioning is the option to cut down on some of the extra dynamic memory allocations through direct aggregation of data fields. Without Pimpl, aggregation would customarily have been through a pointer in order to decouple classes from each other (a kind of Pimpl per data field). By “pimpling” the whole class once, the need to hold private data fields of complex type only through pointers can be dispensed with.

For instance, the idiomatic Qt dialog class

class QLineEdit;
class QLabel;
class MyDialog : public QDialog {
    // ...
private:
    // idiomatic Qt:
    QLabel    * m_loginLB;
    QLineEdit * m_loginLE;
    QLabel    * m_passwdLB;
    QLineEdit * m_passwdLE;
};

turns into

#include <QLabel>
#include <QLineEdit>

class MyDialog::Private {
    // ...
    // not idiomatic Qt, but fewer heap allocations:
    QLabel    loginLB;
    QLineEdit loginLE;
    QLabel    passwdLB;
    QLineEdit passwdLE;
};

Qt aficionados may argue that the QDialog destructor already destroys the child widgets; direct aggregation would therefore trigger a double-delete. Indeed, usage of this technique poses the threat of allocation sequence errors (double-delete, use-after-free, etc.), particularly if data fields are also owned by the class, and vice versa. The transformation shown, however, is safe here, since Qt always allows children to be deleted before their parents: a child that is destroyed first removes itself from its parent’s list of children, and, provided the MyDialog destructor deletes d, the aggregated widgets are gone before the QDialog base class destructor runs.

This approach is especially effective in case data fields aggregated this way are themselves instances of “pimpled” classes. This is the case in the example shown, and usage of the Pimpl Idiom saves four dynamic memory allocations of size sizeof(void*) while incurring only one additional (larger) allocation. This can lead to more efficient use of the heap, since small allocations regularly create especially high overhead in the allocator.
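
The allocation arithmetic can be checked with a small sketch that does not depend on Qt. Everything in it is a hypothetical stand-in: Widget plays the role of a pimpled class such as QLabel, Counted is a helper that counts heap allocations through a class-specific operator new, and std::unique_ptr replaces Qt's parent-child ownership. A C++11 compiler is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper: counts heap allocations of the classes deriving from it.
struct Counted {
    static int allocations;
    static void * operator new( std::size_t size ) {
        ++allocations;
        return ::operator new( size );
    }
    static void operator delete( void * p ) { ::operator delete( p ); }
};
int Counted::allocations = 0;

// Stand-in for a pimpled widget class (not the Qt API).
class Widget : public Counted {
public:
    Widget() : d( new Private ) {}
    ~Widget() { delete d; }
    Widget( const Widget & ) = delete;
    Widget & operator=( const Widget & ) = delete;
private:
    struct Private : Counted { int state = 0; };
    Private * d;
};

// Aggregation by pointer: one allocation per field, plus one per field's Private.
class DialogByPointer {
public:
    DialogByPointer()
        : loginLB( new Widget ), loginLE( new Widget ),
          passwdLB( new Widget ), passwdLE( new Widget ) {}
private:
    std::unique_ptr<Widget> loginLB, loginLE, passwdLB, passwdLE;
};

// Pimpl with direct aggregation: one allocation for Private, plus one per
// field's own Private.
class DialogPimpl {
public:
    DialogPimpl() : d( new Private ) {}
    ~DialogPimpl() { delete d; }
    DialogPimpl( const DialogPimpl & ) = delete;
    DialogPimpl & operator=( const DialogPimpl & ) = delete;
private:
    struct Private : Counted {
        Widget loginLB, loginLE, passwdLB, passwdLE;
    };
    Private * d;
};

// Counts the heap allocations caused by constructing one Dialog on the stack.
template <typename Dialog>
int allocationsFor() {
    const int before = Counted::allocations;
    { Dialog dialog; (void)dialog; }
    return Counted::allocations - before;
}

void demoAllocationCounts() {
    assert( allocationsFor<DialogByPointer>() == 8 );  // 4 widgets + 4 Privates
    assert( allocationsFor<DialogPimpl>() == 5 );      // 1 Private + 4 Privates
}
```

The four pointer-sized widget allocations disappear and a single larger one takes their place, which matches the count given above.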

In addition, the compiler is much more likely to “de-virtualise” calls to virtual functions in this scenario, i.e. it removes the double indirection caused by the virtuality of the function call. This requires interprocedural optimisation when using aggregation by pointer. Whether this indeed constitutes a win in runtime performance, given the extra indirection through the d-pointer, has to be checked as needed by profiling concrete classes.

In case profiling shows that the dynamic memory allocation turns into a bottleneck, the “Fast Pimpl” Idiom (Exceptional C++, Item 30) may provide relief. In this variant, a fast allocator, e.g. boost::singleton_pool, is used to create Private instances instead of global operator new().
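
The mechanism can be sketched with class-specific allocation functions on the Private class. The sketch does not use boost::singleton_pool; FixedPool is a deliberately simple, hypothetical free-list allocator written for this example (not thread-safe, and it never returns memory to the system), and privateBlocksFromSystem() is a hypothetical hook added only to make the effect observable. A C++11 compiler is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size free-list pool standing in for a real fast allocator.
template <std::size_t Size>
class FixedPool {
    union Node {
        Node * next;
        alignas( std::max_align_t ) unsigned char storage[Size];
    };
    static Node *& head() { static Node * h = nullptr; return h; }
public:
    // Number of blocks that had to be requested from global operator new.
    static std::size_t & blocksFromSystem() { static std::size_t n = 0; return n; }

    static void * allocate() {
        if ( Node * node = head() ) {      // reuse a recycled block
            head() = node->next;
            return node;
        }
        ++blocksFromSystem();
        return ::operator new( sizeof( Node ) );
    }
    static void deallocate( void * p ) {
        if ( !p ) return;
        Node * node = static_cast<Node *>( p );
        node->next = head();               // push the block onto the free list
        head() = node;
    }
};

// --- class.h (sketch) ---
class Class {
public:
    Class();
    ~Class();
    Class( const Class & ) = delete;
    Class & operator=( const Class & ) = delete;
    static std::size_t privateBlocksFromSystem();   // hypothetical, for the demo
private:
    class Private;
    Private * d;
};

// --- class.cpp (sketch) ---
class Class::Private {
public:
    double n = 0;
    // Class-specific allocation functions route "new Private" to the pool.
    static void * operator new( std::size_t ) {
        return FixedPool<sizeof( Private )>::allocate();
    }
    static void operator delete( void * p ) {
        FixedPool<sizeof( Private )>::deallocate( p );
    }
};

Class::Class() : d( new Private ) {}
Class::~Class() { delete d; }
std::size_t Class::privateBlocksFromSystem() {
    return FixedPool<sizeof( Private )>::blocksFromSystem();
}

// Usage: repeated construction and destruction recycles a single block.
void demoReuse() {
    for ( int i = 0; i < 100; ++i ) { Class c; }
    assert( Class::privateBlocksFromSystem() == 1 );
}
```

The public header is unaffected by the choice of allocator, so it can be swapped or removed later without touching client code.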

Interim Findings

As a well-known C++ idiom, Pimpl allows class authors to separate class interface and implementation to an extent not directly provided for by C++. As a positive side-effect, the use of d-pointers speeds up compilation runs, eases implementation of transaction semantics, and allows, through extended means of composition, implementations that potentially are more runtime-efficient.

Not everything is shiny when using d-pointers, though: in addition to the extra Private class and its dynamic memory allocation, the modified const method semantics as well as potential allocation sequence errors are cause for concern.

For some of these, the author will show solutions in the second part of the article. Complexity will increase further, though, so that for each concrete case one has to verify anew that the benefits of the idiom outweigh the downsides. If in doubt, this needs to be done per class in question. As usual, there can be no blanket judgements.

Outlook

The second and last part of this article will take a closer look under the hood of Pimpl, uncover the rust-streaked areas, and pimp the idiom using a whole array of accessories.

17 Responses to Pimp My Pimpl

  1. Pingback: Translated: Pimp My Pimpl (part 1) « -Wmarc

  2. Pingback: Heise Developer: Pimp My Pimpl (part 1) « -Wmarc

  3. Joe says:

    Good job! Awesome article!

    I found typo:
    void Class_new( Class_t * cls ); // Class::Class()
    probably must be:
    void Class_new( Class_t cls ); // Class::Class()
    because Class_t is already pointer.
    It’s not your typo, just retyped from original.

    P.S. Think I make translation of this article into russian, so it will be amazing to have part 2 in English as I don’t know Deutsch.

    • marcmutz says:

      No, the code is correct as it is. It needs to be a pointer-to-Class_t, so the “constructor” can assign to it:

      Class_t c = 0;
      Class_new( &c );
      // now c != 0 (hopefully)
      // ...
      Class_release( c );
      c = 0;
      

      Alternatively, Class_new() could return the new pointer, but C class libraries usually reserve the return value for error codes.

    • marcmutz says:

      > P.S. Think I make translation of this article into russian, so it will be amazing to have part 2 in English as I don’t know Deutsch.

      Be my guest. The only requirement I have is that you link back to both the German (on heise.de) and English versions (this one), and let me link to your Russian one from here.
      Yes, I intend to translate the second part, too, at some point.

  4. Joe says:

    >typedef Class * Class_t;

    I guess it should be:
    typedef struct Class * Class_t;

  5. Joe says:

    I suggest to turn code


    #include <QLabel>
    #include <QLineEdit>

    class MyDialog::Private {
    // ...
    // not idiomatic Qt, but less heap allocations:
    QLabel loginLB;
    QLineEdit loginLE;
    QLabel passwdLB;
    QLineEdit passwdLE;
    };

    to this one:


    #include <QLabel>
    #include <QLineEdit>

    class MyDialog::Private {
    // ...
    public:
    // not idiomatic Qt, but less heap allocations:
    QLabel loginLB;
    QLineEdit loginLE;
    QLabel passwdLB;
    QLineEdit passwdLE;
    };

    to explicitly show that data fields locates at public section

    • marcmutz says:

      > to explicitly show that data fields locates at public section

      I don’t think they should be public. Esp. if you do the Qt-style DerivedPrivate : public BasePrivate from part II, you don’t want all data members to be at most protected. Just make Class::Private a friend of Class.

  6. Great part one. Can’t wait for part two, in English. In my opinion d-ptr a.k.a Pimpl is not nearly used enough.

    Another area when I’ve used this technique is when using actors (or active objects) to make it explicitly clear what belongs to the API for “users” and what are worker thread internals…

  7. Joe says:

    There is link to my translation: http://habrahabr.ru/blogs/cpp/118010/
    It contains link to original (deutsch) article and your translation, as I promised.

  8. Crocozaurus says:

    Thanks for this (still) very useful article. FYI, both links to part 2 are broken, the correct link is:
    https://marcmutz.wordpress.com/translated-articles/pimp-my-pimpl-reloaded/

  9. Pingback: Stepanov-Regularity and Partially-Formed Objects vs. C++ Value Types - KDAB

  10. Pingback: How to use the Qt's PIMPL idiom? - PhotoLens
