Pimp My Pimpl — Reloaded
This is a translation of a two-part article that originally appeared on Heise Developer. You can find the originals here:
- Part One: http://www.heise.de/developer/artikel/C-Vor-und-Nachteile-des-d-Zeiger-Idioms-Teil-1-1097781.html
- Part Two: http://www.heise.de/developer/artikel/C-Vor-und-Nachteile-des-d-Zeiger-Idioms-Teil-2-1136104.html
You can find Part One here:
Pimp My Pimpl — Reloaded
Much has been written about this funnily-named idiom, alternatively known as d-pointer, compiler firewall, or Cheshire Cat. After the first article on Heise Developer presented the classic Pimpl idiom and its benefits, this second part focuses on removing some of the drawbacks that inevitably occur when using Pimpls.
Part Two
The Shallow-const Problem
A first gotcha, easily overlooked, has to do with C++’s concept of shallow const. When using Pimpl, all methods access data fields of the class merely through d:
SomeThing & Class::someThing() const { return d->someThing; }
Only a closer look reveals that the code evades a C++ safety feature: since the method is declared const, this inside someThing() is of type const Class*, hence d is of type Class::Private * const. That, however, does not suffice to prevent write access to data fields of Class::Private, because even though d is const, *d is not.
Remember: In C/C++, const is not deep, but shallow:
    const int * pci;         // pointer to const int
    int * const cpi;         // const pointer to int
    const int * const cpci;  // const pointer to const int
    *pci = 1;   // error: *pci is const
    *cpi = 1;   // ok! *cpi isn't const
    *cpci = 1;  // error: *cpci is const
    int i;
    pci = &i;   // ok
    cpi = &i;   // error: cpi is const
    cpci = &i;  // error: cpci is const
When using Pimpl, therefore, both const and non-const methods can write to the data fields of the object. In the version without Pimpl, the compiler actively prevents that.
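The hole can be reproduced in a few lines. The following is a minimal sketch, not code from the original article: the class Counter, its field value and the helper incrementThroughConstRef() are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the article's Class, reduced to one data field.
class Counter {
    struct Private { int value = 0; };
    Private * d;                       // raw d-pointer: constness is shallow
public:
    Counter() : d( new Private ) {}
    ~Counter() { delete d; }
    Counter( const Counter & ) = delete;
    Counter & operator=( const Counter & ) = delete;

    int value() const { return d->value; }

    // Declared const, yet it compiles: inside a const method d has type
    // 'Private * const', so d itself is read-only but *d stays writable.
    void sneakyIncrement() const { ++d->value; }
};

// Mutates a Counter through a const reference and returns the new value.
inline int incrementThroughConstRef( const Counter & c ) {
    const int before = c.value();
    c.sneakyIncrement();
    assert( c.value() == before + 1 );
    return c.value();
}
```

If value were a direct member of Counter instead of living behind d, the compiler would reject sneakyIncrement().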
This hole in the type system will usually be undesirable and should be closed. This is possible using deep_const_ptr or a pair of d_func() methods. The former is a simple smart pointer class, which retrofits deep const for selected pointer variables. Its class definition, reduced to the essentials, might look as follows:
    template <typename T>
    class deep_const_ptr {
        T * p;
    public:
        explicit deep_const_ptr( T * t ) : p( t ) {}
        const T & operator*() const { return *p; }
        T & operator*() { return *p; }
        const T * operator->() const { return p; }
        T * operator->() { return p; }
    };
By overloading const and non-const versions of operator*() and operator->(), the constness of d is forwarded to *d. Simply replacing Private * d; with deep_const_ptr<Private> d; closes the hole effectively. But there’s no need for a smart pointer here: the const/non-const overloading trick also works with methods on Class directly:
    class Class {
        // ...
    private:
        const Private * d_func() const { return _d; }
        Private * d_func() { return _d; }
    private:
        Private * _d;
    };
Instead of accessing _d in method implementations, one always uses d_func():
    void Class::f() const {
        const Private * d = d_func();
        // use 'd' ...
    }
Of course, nothing prevents the direct use of _d here, something that isn’t possible when using deep_const_ptr. This variant, therefore, requires a little more programmer discipline. In addition, the developer can extend deep_const_ptr such that its destructor deletes the payload automatically, whereas deleting _d remains the developer’s own responsibility. In return, the d_func() variant scores points when dealing with polymorphic class hierarchies, as will be shown later.
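The destructor extension mentioned above could look roughly like this. It is only a sketch under assumptions: the deleted copy operations, the Payload struct and the writeThenRead() helper are not from the article. The two static_asserts check at compile time that constness is forwarded.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Sketch of an owning deep_const_ptr: the same const-forwarding operators as
// in the article, plus a destructor that deletes the payload.
template <typename T>
class deep_const_ptr {
    T * p;
public:
    explicit deep_const_ptr( T * t ) : p( t ) {}
    ~deep_const_ptr() { delete p; }
    deep_const_ptr( const deep_const_ptr & ) = delete;             // single owner
    deep_const_ptr & operator=( const deep_const_ptr & ) = delete;

    const T & operator*() const { return *p; }
    T & operator*() { return *p; }
    const T * operator->() const { return p; }
    T * operator->() { return p; }
};

struct Payload { int value = 0; };   // hypothetical stand-in for Class::Private

// Through a const deep_const_ptr only a const Payload is reachable ...
static_assert( std::is_same<decltype( *std::declval<const deep_const_ptr<Payload> &>() ),
                            const Payload &>::value, "const access is read-only" );
// ... while a non-const one still grants write access.
static_assert( std::is_same<decltype( *std::declval<deep_const_ptr<Payload> &>() ),
                            Payload &>::value, "non-const access is writable" );

// Writes through a non-const pointer, then reads back through a const view.
inline int writeThenRead( int v ) {
    deep_const_ptr<Payload> d( new Payload );
    d->value = v;
    const deep_const_ptr<Payload> & cd = d;
    // cd->value = 0;   // would not compile: cd-> yields 'const Payload *'
    return cd->value;
}
```

One caveat when using such an owning pointer as a Pimpl member: the delete needs the complete Private type, so the destructor of the public class has to be defined in the implementation file, where Private is fully defined.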
Accessing The Public Class
A further obstacle arises when a developer actually wishes to move all private functions from the public class to the Private class: there is no way to call (non-static) public or protected methods of the public class from methods on Private, since the link from the public class to its Private class is unidirectional:
    class Class::Private {
    public:
        Private() : ... {}
        // ...
        void callPublicFunc() { /*???*/Class::publicFunc(); }
    };
    Class::Class()
        : d( new Private ) {}
The problem can be solved by introducing a back-link (the name chosen here, q, originates from Qt):
    class Class::Private {
        Class * const q; // back-link
    public:
        explicit Private( Class * qq ) : q( qq ), ... {}
        // ...
        void callPublicFunc() { q->publicFunc(); }
    };
    Class::Class()
        : d( new Private( this ) ) {}
When using the back-link it is imperative to bear in mind that the initialisation of d is not ensured until the Private constructor has finished executing. One should avoid calling (Class) methods which require a valid d pointer during the execution of the Private constructor; otherwise, crashes or undefined behaviour ensue.
The security-minded developer will therefore initialise the back-link to zero first, and set it to the public class only after construction is complete. For the assignment in the Class constructor to compile, q can then be neither const nor private:
    class Class::Private {
    public:
        Class * q; // back-link; set by Class after construction
        explicit Private( /*Class * qq*/ ) : q( 0 ), ... {}
        // ...
    };
    Class::Class()
        : d( new Private/*( this )*/ )
    {
        // establish back-link:
        d->q = this;
    }
Despite these restrictions, a considerable amount of the initialisation code of a class can usually be moved into the Private constructor, which is valuable in classes with overloaded constructors. It should not be left unmentioned that the q-pointer, too, can be talked into propagating constness using deep_const_ptr or, in the case of class hierarchies, q_func() functions.
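A const-propagating back-link might be sketched as follows. The class Widget, its members and the setBackLink() helper are hypothetical names, not part of the article; the back-link is set in two steps as described above.

```cpp
#include <cassert>

class Widget {                       // hypothetical public class
    class Private;
    Private * d;
public:
    Widget();
    ~Widget();
    Widget( const Widget & ) = delete;
    Widget & operator=( const Widget & ) = delete;

    int size() const;
    void setSize( int s );
    int doubledSize() const;         // implemented in Private, read-only
    void reset();                    // implemented in Private, modifying
};

class Widget::Private {
    Widget * _q = nullptr;           // back-link, set after construction
public:
    int size = 0;

    void setBackLink( Widget * qq ) { _q = qq; }

    // const/non-const pair: a const Private only hands out a const Widget
    const Widget * q_func() const { return _q; }
    Widget * q_func() { return _q; }

    void reset() { q_func()->setSize( 0 ); }
    int doubledSize() const { return 2 * q_func()->size(); }
};

Widget::Widget() : d( new Private ) { d->setBackLink( this ); }
Widget::~Widget() { delete d; }
int Widget::size() const { return d->size; }
void Widget::setSize( int s ) { d->size = s; }
int Widget::doubledSize() const { return d->doubledSize(); }
void Widget::reset() { d->reset(); }
```

Calling q_func()->setSize() from Private::doubledSize() would be rejected by the compiler, because the const overload of q_func() returns a const Widget *.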
With the missing functionality restored, the rest of the article shows how to reduce the Pimpl overhead with a trick from the depths of the magic bag.
Raising Efficiency Through Recycling
As a good C++ programmer, the reader will have become sceptical while reading the introductory remarks on the classical Pimpl idiom in Part One. In particular, the additional dynamic memory allocation will have caused a headache, the more so if classes otherwise allocate little or no additional memory.
Even though such impressions should first be verified with a profiler, it cannot hurt acceptance of the technique to find mechanisms that soften the potential performance trap. Part One already mentioned the direct embedding of data fields, with which dynamic memory allocations can be saved. In the following, we will look at an additional, much more involved, technique: the recycling of the d-pointer.
In a polymorphic class hierarchy, the problem of additional dynamic memory allocations caused by Pimpl multiplies by the depth of the hierarchy: Each class in the hierarchy has its own “pimple”, even though for some it might be completely empty (for example, if one only reimplemented virtual functions, but added no additional data members).
The developer can fight the proliferation of d-pointers (and of the dynamic memory allocations associated with them) by re-using the base-class d-pointer in derived classes:
    // base.h:
    class Base {
        // ...
    public:
        Base();
    protected:
        class Private;
        explicit Base( Private * d );
        Private * d_func() { return _d; }
        const Private * d_func() const { return _d; }
    private:
        Private * _d;
    };
    // base.cpp:
    Base::Base()
        : _d( new Private )
    {
        // ...
    }
    Base::Base( Private * d )
        : _d( d )
    {
        // ...
    }
The addition of the protected constructor alongside the public ones allows derived classes to implant their own d-pointer into the base class. The code also employs const forwarding using (now protected) d_func() functions, rather than deep_const_ptr, to allow derived classes (read-)access to _d.
    // derived.h:
    class Derived : public Base {
    public:
        Derived();
        // ...
    protected:
        class Private;
        Private * d_func();             // can't be implemented inline here
        const Private * d_func() const; // ditto
    };
    // derived.cpp:
    Derived::Private * Derived::d_func() {
        return static_cast<Private*>( Base::d_func() );
    }
    const Derived::Private * Derived::d_func() const {
        return static_cast<const Private*>( Base::d_func() );
    }
    Derived::Derived()
        : Base( new Private ) {}
The author of Derived now uses the newly added Base constructor to implant a Derived::Private instead of a Base::Private into Base::_d (note the resolution of the unqualified name Private in the different contexts). He also implements the Derived::d_func() overloads in terms of the Base::d_func() ones, but returns his own Private class instead.
For the Base constructor call to work, Derived::Private needs to inherit from Base::Private:
    class Derived::Private : public Base::Private {
        // ...
    };
To actually be able to perform this inheritance, three conditions must be met: First, the developer has to declare the Base::Private destructor virtual, otherwise he’ll be caught up in undefined behaviour when the Base destructor deletes the Private class hierarchy.
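Put together in a single file, the recycling scheme might look as follows. This is a sketch rather than the article's actual code: the data members, the accessor functions and the liveCount instrumentation (with Base::livePrivates()) are assumptions added only to make the effect observable. It shows that a Derived object owns exactly one Private object for both class levels, and that the virtual destructor lets Base delete it correctly.

```cpp
#include <cassert>

class Base {
public:
    Base();
    virtual ~Base();
    Base( const Base & ) = delete;
    Base & operator=( const Base & ) = delete;
    int baseValue() const;
    static int livePrivates();       // instrumentation for this sketch only
protected:
    class Private;
    explicit Base( Private * d );    // lets derived classes implant their d
    Private * d_func() { return _d; }
    const Private * d_func() const { return _d; }
private:
    Private * _d;
};

class Base::Private {
public:
    static int liveCount;            // counts Private objects currently alive
    int baseValue = 1;
    Private() { ++liveCount; }
    virtual ~Private() { --liveCount; }   // virtual: deleted via Base::Private *
};
int Base::Private::liveCount = 0;

Base::Base() : _d( new Private ) {}
Base::Base( Private * d ) : _d( d ) {}
Base::~Base() { delete _d; }
int Base::baseValue() const { return d_func()->baseValue; }
int Base::livePrivates() { return Private::liveCount; }

class Derived : public Base {
public:
    Derived();
    int derivedValue() const;
protected:
    class Private;
    Private * d_func();
    const Private * d_func() const;
};

class Derived::Private : public Base::Private {
public:
    int derivedValue = 2;
};

Derived::Private * Derived::d_func() {
    return static_cast<Private *>( Base::d_func() );
}
const Derived::Private * Derived::d_func() const {
    return static_cast<const Private *>( Base::d_func() );
}
Derived::Derived() : Base( new Private ) {}   // here Private is Derived::Private
int Derived::derivedValue() const { return d_func()->derivedValue; }
```

In a real project these pieces would be distributed over public headers, private headers and implementation files rather than kept in one translation unit.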
Furthermore, he must implement both classes in the same library, since the Private classes are usually not exported: they carry no __declspec(dllexport) on Windows and are compiled with visibility=hidden in ELF (Executable and Linkable Format) binaries. Export would be necessary, however, if Derived lay in a different library than Base. In exceptional cases, the Private classes of central classes may be exported: Nokia engineers, for example, have exported the classes QObjectPrivate (from QtCore) and QWidgetPrivate (from QtGui), which are so central to the Qt library, since so many classes from modules other than QtCore and QtGui derive from QObject and QWidget. Doing so, however, ties such libraries intrinsically to each other at the version level, such that end users can normally exchange them only in conjunction with each other: In general, a libQtGui.so.4.5.0 will not work if the runtime environment links it against a libQtCore.so.4.6.0.
Third, the definition of Base::Private can no longer be hidden in the implementation file of the base class (base.cpp), since the definition of Derived::Private requires it. So where to put the Base::Private definition? The developer can hardly put it into the header file (base.h), since then he could just as well do away with Pimpl altogether. The answer lies in the creation of a second, private header file. Qt and KDE established the classname_p.h naming scheme for this purpose (_priv, _i and _impl suffixes are also common). Besides the Base::Private definition, the header file may also hold inline definitions of Base methods, for example the new constructor:
    inline Base::Base( Private * d ) : _d( d ) {}
And in derived_p.h:
    inline Derived::Derived( Private * d ) : Base( d ) {}
    inline const Derived::Private * Derived::d_func() const {
        return static_cast<const Private*>( Base::d_func() );
    }
    inline Derived::Private * Derived::d_func() {
        return static_cast<Private*>( Base::d_func() );
    }
Strictly speaking, the definitions as shown above violate the ODR (One Definition Rule), since the d_func() functions are inline in those translation units which include derived_p.h, but extern in all others.
In practice, however, that is not a problem, since all users of d_func() have to include derived_p.h, too. To be on the safe side, declare Derived::d_func() inline already in the class definition; current compilers don’t mind the missing definition.
In practice, one hides the non-negligible code noise introduced by this technique in preprocessor macros. Qt, for example, has a Q_DECLARE_PRIVATE macro that class definitions can use, as well as Q_D, which declares a local d pointer in method implementations and initialises it with a call to d_func().
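To illustrate the idea, here is a rough sketch of what such macros could look like. PIMPL_DECLARE_PRIVATE and PIMPL_D are hypothetical names, and their definitions are not Qt's actual ones; the classes Label and LabelPrivate are likewise made up for the example.

```cpp
#include <cassert>

// Hypothetical macros in the spirit of Q_DECLARE_PRIVATE / Q_D; these are
// NOT Qt's actual definitions.
#define PIMPL_DECLARE_PRIVATE( Class ) \
    Class##Private * d_func() { return d_ptr; } \
    const Class##Private * d_func() const { return d_ptr; }

// 'auto' picks up the constness of whichever d_func() overload is called,
// so the same macro serves const and non-const methods.
#define PIMPL_D auto * const d = d_func()

class LabelPrivate;                  // hypothetical private class

class Label {
    LabelPrivate * d_ptr;
    PIMPL_DECLARE_PRIVATE( Label )
public:
    Label();
    ~Label();
    Label( const Label & ) = delete;
    Label & operator=( const Label & ) = delete;
    int length() const;
    void setLength( int n );
};

class LabelPrivate {
public:
    int length = 0;
};

Label::Label() : d_ptr( new LabelPrivate ) {}
Label::~Label() { delete d_ptr; }

int Label::length() const {
    PIMPL_D;                         // d is 'const LabelPrivate * const' here
    return d->length;
}

void Label::setLength( int n ) {
    PIMPL_D;                         // d is 'LabelPrivate * const' here
    d->length = n;
}
```

Writing d->length = n inside Label::length() would fail to compile, so the macro keeps the deep-const guarantee of the d_func() pair.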
One downside still remains, though: If the developer wants to combine d-pointer recycling with back-links, some complications ensue. To begin with, meticulous attention needs to be paid not to dereference (neither directly nor indirectly) the Derived pointer passed to the Private constructor, until the whole class hierarchy has finished constructing.
    Derived::Private::Private( Derived * qq )
        : Base::Private( qq ) // ok, not dereferencing
    {
        q->setFoo( ... ); // dereferences q -> crash!
    }
For not only has Derived not finished constructing at the moment of the Private constructor call; neither has Base, and that is the difference from the non-polymorphic case discussed above: its constructor hasn’t been entered yet; the constructor argument is still under construction.
In this case, too, it helps to initialise the back-link to 0 first. The task of setting the back-link then falls to the most-derived class, that is, the one that implants its concrete Private object into the hierarchy. In the case of Derived, this would look as follows:
    Derived::Derived()
        : Base( new Private/*( this )*/ )
    {
        d_func()->_q = this;
    }
The author customarily rolls parts of the initialisation that require the back-link into a separate Private::init() function (i.e. two-step construction of Private), called (only) by the constructor whose own Private class is being used.
    Derived::Derived( Private * d )
        : Base( d )
    {
        // does _not_ call d->init()!
    }
    Derived::Derived()
        : Base( new Private )
    {
        d_func()->init( this );
    }
    void Derived::Private::init( Derived * qq ) {
        Base::Private::init( qq ); // sets _q
        // my initialisation goes here
    }
Furthermore, each Private class needs to declare its own back-link, or else “q_func()” methods that take care of casting the base-class back-link. The code needed for this is left as an exercise to you, gentle reader. The solution can be found on the Heise FTP server in the form of a “pimped” Shape hierarchy.
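One possible shape of such q_func() methods is sketched below. It is not the solution from the Heise FTP server: the classes Shape and Circle, their members and the helper functions initCalls() and backLinkIsThis() are assumptions made up for this illustration. It combines d-pointer recycling, a two-step init() and a single back-link that derived Private classes cast.

```cpp
#include <cassert>

class Shape {                                // hypothetical base class
public:
    Shape();
    virtual ~Shape();
    Shape( const Shape & ) = delete;
    Shape & operator=( const Shape & ) = delete;
    int initCalls() const;
protected:
    class Private;
    explicit Shape( Private * d );           // does not call init()
    Private * d_func() { return _d; }
    const Private * d_func() const { return _d; }
private:
    Private * _d;
};

class Shape::Private {
    Shape * _q = nullptr;                    // back-link, null until init()
public:
    int initCalls = 0;
    virtual ~Private() {}
    void init( Shape * qq ) { _q = qq; ++initCalls; }
    Shape * q_func() { return _q; }
    const Shape * q_func() const { return _q; }
};

Shape::Shape() : _d( new Private ) { _d->init( this ); }
Shape::Shape( Private * d ) : _d( d ) {}
Shape::~Shape() { delete _d; }
int Shape::initCalls() const { return d_func()->initCalls; }

class Circle : public Shape {                // hypothetical derived class
public:
    Circle();
    int radius() const;
    void setRadius( int r );
    bool backLinkIsThis() const;
protected:
    class Private;
    Private * d_func();
    const Private * d_func() const;
};

class Circle::Private : public Shape::Private {
public:
    int radius = 0;
    void init( Circle * qq ) {
        Shape::Private::init( qq );          // sets the shared back-link
        q_func()->setRadius( 1 );            // safe: Shape and _d are set up
    }
    // cast the base-class back-link instead of storing a second pointer
    Circle * q_func() { return static_cast<Circle *>( Shape::Private::q_func() ); }
    const Circle * q_func() const {
        return static_cast<const Circle *>( Shape::Private::q_func() );
    }
};

Circle::Private * Circle::d_func() {
    return static_cast<Private *>( Shape::d_func() );
}
const Circle::Private * Circle::d_func() const {
    return static_cast<const Private *>( Shape::d_func() );
}
Circle::Circle() : Shape( new Private ) { d_func()->init( this ); }
int Circle::radius() const { return d_func()->radius; }
void Circle::setRadius( int r ) { d_func()->radius = r; }
bool Circle::backLinkIsThis() const { return d_func()->q_func() == this; }
```

The static_cast in Circle::Private::q_func() is only valid because a Circle::Private is always implanted by a Circle (or a class derived from it); that invariant has to be maintained by convention.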
Findings
As a well-known C++ idiom, Pimpl allows class authors to separate class interface and implementation to an extent not directly provided for by C++. As a positive side-effect, the use of d-pointers speeds up compilation runs, eases implementation of transaction semantics, and allows, through extended means of composition, implementations that potentially are more runtime-efficient.
Not everything is shiny when using d-pointers, though: In addition to the extra Private class and its dynamic memory allocation, modified const method semantics as well as potential initialisation-order errors are cause for concern. For both of these, this article has presented remedies, which, however, cause a lot more coding effort. Because of the increased complexity involved in these, the “fully pimped” Pimpl, including recycling and back-links, can be recommended only for a few selected classes or projects.
However, projects that do not shy away from the effort will be rewarded with intriguing interface stability, allowing far-reaching implementation changes.
Literature
- John Lakos; Large-Scale C++ Software Design; Addison-Wesley Longman, 1996
- Herb Sutter; Exceptional C++: 47 Engineering Puzzles, Programming Problems, and Solutions; Addison-Wesley Longman, 2000
- Herb Sutter, Andrei Alexandrescu; C++ Coding Standards: 101 Rules, Guidelines and Best Practices; Addison-Wesley Longman, 2004
- Marc Mutz; Pimp my Pimpl; C++: Vor- und Nachteile des d-Zeiger-Idioms, Teil 1; article on heise Developer (English translation available)
In the following excerpt from your article, I believe that member q of Class::Private would technically need to be public (not private as is currently the case) in order for the assignment “d->q = this;” in the Class constructor to compile:
>>>>>
The security-minded developer will therefore initialise the back-links to zero first, and only implant them the reference to the public class after construction is complete:
class Class::Private {
Class * const q; // back-link
public:
explicit Private( /*Class * qq*/ ) : q( 0 ), … {}
// …
};
Class::Class()
: d( new Private/*( this )*/ )
{
// establish back-link:
d->q = this;
}
<<<<<
As long as it is constant (Class * const q) it doesn’t matter that you change encapsulation because you won’t be able to assign any value to that in constructor.