A case against direct initialisation — the sequel


A few days ago, I wrote about why I consider the commonly seen rule to prefer direct initialisation over copy initialisation false advice.

Sometimes, you feel something, based on your experience, and can’t quite pinpoint it, or recall where the feeling comes from. Yet, you’re perfectly certain it makes sense. I thought I had collected enough reasons to explain why I don’t like direct initialisation (with one argument, you don’t have much choice with more than one :)), but as you can see from the comments, not everyone agrees they’re sufficient.

Today, I recalled a fourth reason to despise direct initialisation. Try the following program in Qt 3:

#define QT_NO_CAST_ASCII
#include <qstring.h>

int main() {
    QString s1 = "foo";
    QString s2( "bar" );
    return 0;
}

You would rightfully expect that both of the statements in main() fail to compile. After all, by defining QT_NO_CAST_ASCII, I (thought I) disabled the const char* → QString conversion for good. Sure enough, that’s what happens for the first statement. But the compiler always tries to make sense of your feeble scribblings into a text editor, and in this case, it found that it could make the second statement compile by going via std::string(const char*) and QString(std::string). Oops.

Of course, this is really a bug in QString, as the QString(std::string) constructor should have been under QT_NO_CAST_ASCII protection, too. But given that even the Trolls managed to let that one slip through, before you throw the first stone, consider whether you yourself would have managed to call that bug.

To summarise: If you use direct initialisation, T t(u), you effectively allow the compiler to chain two user-defined conversions to go from a U u to a T: an implicit conversion of u to the parameter type of some T constructor, followed by that constructor call itself. The mediator type is not visible in the actual source code. With copy initialisation, T t = u, only one user-defined conversion is allowed (as usual), and it must be either U::operator T() or a non-explicit T(U).
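The same effect can be reproduced without Qt. What follows is only a minimal sketch, assuming a C++11 compiler for the type traits: Mediator and Target are hypothetical stand-ins for std::string and QString, not the real classes, and demo() is an illustrative helper.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for std::string: implicitly constructible from const char*.
struct Mediator {
    static int conversions;                    // counts const char* -> Mediator conversions
    Mediator(const char*) { ++conversions; }
};
int Mediator::conversions = 0;

// Hypothetical stand-in for QString: implicitly constructible from Mediator only.
struct Target {
    Target(const Mediator&) {}
};

// `Target t = "bar";` is ill-formed: it would need two user-defined
// conversions in a row (const char* -> Mediator -> Target).
static_assert(!std::is_convertible<const char*, Target>::value,
              "copy initialisation from const char* is rejected");

// `Target t("bar");` is well-formed: the constructor call is explicit, and its
// argument may still undergo one implicit user-defined conversion.
static_assert(std::is_constructible<Target, const char*>::value,
              "direct initialisation from const char* is accepted");

void demo() {
    Mediator::conversions = 0;
    Target t("bar");                     // silently builds a temporary Mediator first
    (void)t;
    assert(Mediator::conversions == 1);  // the hidden hop through Mediator did happen
}
```

Nothing in the line Target t("bar") mentions Mediator; the counter is the only evidence that the detour took place.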

I don’t know about you, but I don’t like my compiler second-guessing me at every opportunity, and I would hate for it to make an error of mine compile by inserting a double conversion via an expensive third type. So, I take my compiler on the short leash and give it copy initialisation statements wherever possible.


A case against direct initialisation


I would like to start by extending an apology to You-Know-Who-You-Are :). Even though your coding style prompted me to revisit it, I have been pondering this issue for some time now. It was only when I went back to the text books to read up on it that I finally understood how to describe this in accessible terms. Take solace in the fact that the “you” above is the plural form :).

Over the years, I have repeatedly come across colleagues who are obviously familiar with the Sutter book “More Exceptional C++” and its Item 36 (or his internet column, Guru of the Week #1), which contains the guideline:

Prefer using the form “T t(u)” over “T t = u” for variable initialisation.

While I myself try to follow Sutter/Alexandrescu/Meyers in my everyday coding, I’ve always been slightly uncomfortable with a few of their guidelines, such as the suggestion to add virtual in front of reimplemented virtuals. That particular issue is fodder for another blog post. Today, I’d like to talk about why I never follow the guideline to prefer direct over copy initialisation, unless I have to use direct initialisation.

Direct Initialisation, of course, is the construction of variables by calling one of their constructors (T t(u)), while Copy Initialisation is the construction of variables by first converting the initialiser to a temporary of the variable’s type, and then copy-constructing the variable from that temporary (T t = u, which is conceptually equivalent to T t( T(u) ), except that the conversion has to be implicit).

At face value, direct initialisation is preferable, because it avoids the extra copy constructor call. However, the standard allows compilers to elide the copy constructor call (the copy constructor still has to be accessible, though), and every half-decent compiler will implement that optimisation, thus making the two all but identical at runtime.
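To see the elision at work, one can count copy constructor calls. This is just a sketch: Tracked and copiesMade() are hypothetical names made up for the illustration. Since C++17 the language guarantees that no copy happens here; under earlier standards it is merely permitted, so the count could in principle differ with elision switched off.

```cpp
#include <cassert>

// Hypothetical class that counts how often its copy constructor runs.
struct Tracked {
    static int copies;
    int value;
    Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
};
int Tracked::copies = 0;

// Returns the number of copy constructor calls made by the two initialisations.
int copiesMade() {
    Tracked::copies = 0;
    Tracked direct(1);   // direct initialisation
    Tracked copy = 2;    // copy initialisation: conceptually Tracked copy( Tracked(2) )
    assert(direct.value == 1 && copy.value == 2);
    return Tracked::copies;
}
```

With the copy elided, copiesMade() reports no copies at all, which is why the runtime argument for direct initialisation falls flat.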

So, if avoiding premature pessimisation isn’t the reason to prefer direct initialisation, what is? Interestingly, there appear to be no reasons (left anymore). While the original GotW cited “works in more cases” as a reason, the corresponding MEC++ Item #36, written a few years later, does not give any rationale for the guideline anymore. Even more interestingly, The Good Book, written yet some time later, doesn’t even include that guideline! Clearly, something’s wrong with that guideline. Don’t you, too, sometimes wish the Good Book had an appendix on “items that didn’t make it, and why?” :).

As far as I have made out, there are three main reasons not to prefer to use direct initialisation, two objective, and one highly subjective.

Let’s start with the subjective one, so you end up remembering the objective ones better 🙂

It just looks plain weird.

Especially for people coming from C, which has no syntax for direct initialisation.

Ok, with the subjective reason out of the way, let’s look at the two objective reasons:

First, direct initialisation sometimes ends up looking like a declaration, and the standard requires that everything that looks like a declaration is also parsed as one. That’s the issue behind C++’s most vexing parse (coined in Meyers: Effective STL, Item 6):

const U u;      // ok, defines a variable u of type U, and default-initialises it
const T t(u);   // ok, defines a variable t of type T, initialised from u
const T t(U()); // oops, declares a function t, taking a pointer to a function returning U as argument and returning const T!
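The compiler can be asked directly what it made of such a line. The following is a sketch assuming C++11 type traits, with U and T as placeholder types and vexing, viaCopy and viaParens as names made up for the illustration:

```cpp
#include <type_traits>

struct U {};
struct T {
    bool built;
    T(const U&) : built(true) {}
};

// Meant as "a T built from a temporary U", but parsed as a function declaration.
T vexing(U());
static_assert(std::is_function<decltype(vexing)>::value,
              "vexing is a function, not an object");
static_assert(std::is_same<decltype(vexing), T(U (*)())>::value,
              "its parameter is a pointer to a function returning U");

// Copy initialisation cannot be mistaken for a declaration of a function.
const T viaCopy = T(U());
static_assert(!std::is_function<decltype(viaCopy)>::value, "viaCopy is an object");

// The usual workaround for direct initialisation: an extra pair of parentheses.
const T viaParens((U()));
static_assert(!std::is_function<decltype(viaParens)>::value, "viaParens is an object");
```

The copy initialisation form needs no trick to be read as intended, whereas the direct form only works once you remember the extra parentheses.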

Second, direct initialisation disables the protection provided by explicit constructors. Consider the classical mistake that explicit constructors are to prevent you from making:

class Stack {
public:
    explicit Stack( size_t maxSize );
    // ...
};

Stack * stack = 0; // ok
// same line, but forgot the '*'
Stack stack = 0; // error: Stack(size_t) is explicit -> good
// same line, now with direct initialisation:
Stack stack(0); // oops, compiles!

In other words, copy initialisation, by virtue of potentially involving an implicit call to a conversion constructor (remember that T t = u is really T t( T(u) )), cannot be used when the conversion constructor is explicit. Direct initialisation, on the other hand, by making an explicit call to the conversion constructor (T t(u)), will succeed whether or not the constructor is explicit.
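The difference can also be stated as a pair of compile-time checks. This is a sketch assuming C++11; the member maxSize_ and the accessor are filled in only to make the Stack example complete and are not part of the original class.

```cpp
#include <cstddef>
#include <type_traits>

class Stack {
public:
    explicit Stack(std::size_t maxSize) : maxSize_(maxSize) {}
    std::size_t maxSize() const { return maxSize_; }
private:
    std::size_t maxSize_;   // illustrative member, assumed for this sketch
};

// `Stack stack = 0;` needs an implicit int -> Stack conversion: rejected.
static_assert(!std::is_convertible<int, Stack>::value,
              "copy initialisation respects explicit");

// `Stack stack(0);` calls the constructor explicitly: accepted.
static_assert(std::is_constructible<Stack, int>::value,
              "direct initialisation bypasses explicit");
```

std::is_convertible models what copy initialisation accepts and std::is_constructible what direct initialisation accepts, so the two assertions mirror the two spellings of the mistake.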

So, if you always prefer direct initialisation over copy initialisation, you’re more likely to hit “C++’s most vexing parse”, and to suffer from unintentional use of explicit constructors. Don’t go there. Prefer copy initialisation.

That said, there are, of course, situations where you should use direct initialisation. E.g. when calling a constructor with more than one argument:

// this is overly verbose:
const QDate date = QDate( 2010, 8, 16 );
// better:
const QDate date( 2010, 8, 16 );

Likewise, when you want to call an explicit constructor, you have to use direct initialisation, too:

std::vector<std::string> v( 10 );

If you default to using copy initialisation, direct initialisation stands out in your code, and marks places where something potentially dangerous or expensive happens.

Fun with exceptions


Here’s a guideline for you: If your library uses exceptions, and you intend users of your library to catch them, don’t implement them inline in the header.

Example:

class LIB_EXPORT MyException : public std::exception {
public:
    explicit MyException( ... );
    // ... (but no dtor)
};

Here, the compiler will generate a MyException destructor for you, since you didn’t provide one. Compiler-synthesised destructors are public, inline, have an empty body, and the same exception specification as the base class’ destructor. So, the above is equivalent to:

class LIB_EXPORT MyException : public std::exception {
public:
    explicit MyException( ... );
    ~MyException() throw() {}
    // ...
};

Unfortunately (well, actually fortunately, but not for our scenario), the destructor of std::exception is virtual, so ~MyException() throw() {} (explicitly written by you or synthesised by the compiler) is the definition (as opposed to declaration) of the first virtual function of the class MyException. Most C++ compilers take that as a cue to emit the MyException virtual function table, as well as its RTTI information at that point.

Oops. Does that mean we’ll end up with duplicate vtables and RTTI information for a class whose first virtual function is defined inline?

Yes, it does. That’s not normally a big problem, since the linker will merge them at link time. However, at least my GCC 4.3 doesn’t do this across shared library boundaries.

Thus, we end up with a situation where the throw site uses a different vtable (and std::type_info instance) for the exception class than the user of the library (which has its own copy helpfully provided by the compiler). The result is that you can’t catch the exception in client code anymore:

// in the client
try {
    doSomethingThatThrowsMyException();
} catch ( const MyException & e ) { // never hit
    log( "Caught MyException: %s", e.what() );
    return;
} catch ( const std::exception & e ) { // hit instead
    log( "Caught unexpected exception %s: %s", typeid(e).name(), e.what() );
    return;
}

Now, consider the surprise, the sheer terror, of a developer using your library who finds he has been thrown a MyException that he can’t catch.

Don’t go there. Define your exception classes out-of-line!
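As a sketch of what “out-of-line” means here, assuming C++11: the constructor signature, the message_ member and demo() are hypothetical, and LIB_EXPORT is defined as an empty placeholder where a real library would put its export attribute. Squeezed into a single file, this cannot reproduce the shared-library problem; it only shows the shape of the fix.

```cpp
#include <cassert>
#include <exception>
#include <string>

#define LIB_EXPORT  // placeholder: a real library expands this to its export attribute

// --- myexception.h (hypothetical header) ---
class LIB_EXPORT MyException : public std::exception {
public:
    explicit MyException(const std::string& message);  // assumed signature
    ~MyException() noexcept override;                  // declared here, defined elsewhere
    const char* what() const noexcept override;
private:
    std::string message_;
};

// --- myexception.cpp (hypothetical source file, compiled into the library) ---
MyException::MyException(const std::string& message) : message_(message) {}

// The first virtual function is now defined out-of-line, so compilers that key
// vtable and RTTI emission on it emit them in this one translation unit only.
MyException::~MyException() noexcept {}

const char* MyException::what() const noexcept { return message_.c_str(); }

// --- client code ---
void demo() {
    bool caught = false;
    try {
        throw MyException("boom");
    } catch (const MyException& e) {   // the handler we want to reach
        caught = std::string(e.what()) == "boom";
    } catch (const std::exception&) {  // the fallback we want to avoid
        caught = false;
    }
    assert(caught);
}
```

The only change against the inline version is that the destructor body has moved from the header into the library’s source file.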

[Update 2010-12-02: The problem only occurs if you throw the exception from out-of-line code. If you only throw the exception from inline code, everything is peachy.]