BC/SC Gotcha: reimplementing a virtual function


This might surprise you: Adding a reimplementation of a base class virtual function to a derived class might be binary incompatible. Here’s why.

Assume you have the following classes in a library that maintains binary compatibility across releases:

// release 1.0
struct base { virtual void f() {} };
struct derived : base {}; // doesn't reimplement base::f()

And in an application using the above library:

derived d;
d.f(); // ok, calls base::f()

Q: Is d.f() a virtual function call?
A: No. Not if your compiler is any good.

When the compiler can prove the dynamic type of an object, it will resolve the polymorphic calls at compile time. In other words: It will de-virtualise the call.

In our case, the compiler can prove that d’s dynamic type is derived, so it just inserts the function call to base::f(). Don’t believe me? Ask your compiler:

struct base {
    virtual ~base();
    virtual void f() const;
};

struct derived : base {
    //void f() const;
};

void foo1( derived d ) {
    d.f();
}

void foo2( const derived & d ) {
    d.f();
}

My GCC gives me (with -O) these assembly listings for foo1() and foo2(), resp.:

foo1():

_Z4foo17derived:
.LFB2:
    subq $8, %rsp
.LCFI1:
    call _ZNK4base1fEv
    addq $8, %rsp
    ret

foo2():

_Z4foo2RK7derived:
.LFB3:
    subq $8, %rsp
.LCFI0:
    movq (%rdi), %rax
    call *16(%rax)
    addq $8, %rsp
    ret

Even if you don’t speak AMD64 assembler, you will recognise that foo1() calls base::f() directly (call _ZNK4base1fEv), while foo2() first loads the object’s virtual function table pointer and then calls indirectly through that table (call *16(%rax)).
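Why can the compiler de-virtualise in foo1() but not in foo2()? A by-value parameter is always a complete object of exactly type derived, while a reference may refer to something further down the hierarchy. Here is a minimal sketch of that difference. The class more_derived and the helper check_slicing() are hypothetical additions of mine, and f() returns a string here (instead of void) purely so that the result can be inspected:

```cpp
#include <cassert>
#include <string>

struct base {
    virtual ~base() = default;
    virtual std::string f() const { return "base::f"; }
};

struct derived : base {};  // doesn't reimplement base::f()

// Hypothetical class one level further down, for illustration only.
struct more_derived : derived {
    std::string f() const override { return "more_derived::f"; }
};

// By value: d is a copy whose dynamic type is exactly derived,
// so the call can be resolved at compile time.
std::string foo1(derived d) { return d.f(); }

// By reference: d may refer to any class derived from derived,
// so the call has to go through the virtual function table.
std::string foo2(const derived& d) { return d.f(); }

void check_slicing() {
    more_derived m;
    assert(foo1(m) == "base::f");          // m was sliced down to a derived
    assert(foo2(m) == "more_derived::f");  // dynamic type is preserved
}
```

Calling check_slicing() from a main() of your own should pass both assertions.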

Now, assume we’re adding a reimplementation of base::f() to derived:

// release 1.1
struct derived : base {
    /* reimp */ void f() {}
};

But we won’t recompile the application. What happens is that the application still calls base::f(). Yes, even across DSOs. The linker will replace the direct function call with an indirect one into the DSO, for sure. But it won’t second-guess the compiler and insert a virtual function table lookup again.
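You can mimic the situation in a single translation unit, without actually building two library releases. In the following sketch, the explicitly qualified call d.base::f() stands in for the direct call that the old application binary contains; that substitution is my assumption, as are the helper names stale_call(), virtual_call() and check_stale_call(). Again, f() returns a string only to make the outcome visible:

```cpp
#include <cassert>
#include <string>

struct base {
    virtual ~base() = default;
    virtual std::string f() const { return "base::f"; }
};

// release 1.1: derived now reimplements f()
struct derived : base {
    std::string f() const override { return "derived::f"; }
};

// Stand-in for a call site that was compiled against release 1.0 and
// de-virtualised: a qualified call bypasses the virtual function table.
std::string stale_call(const derived& d) { return d.base::f(); }

// Call site where the dynamic type is unknown: ordinary virtual dispatch.
std::string virtual_call(const base& b) { return b.f(); }

void check_stale_call() {
    derived d;
    assert(stale_call(d) == "base::f");       // old code path
    assert(virtual_call(d) == "derived::f");  // new code path
}
```

The same object answers differently depending on which call site you ask.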

So now that the mechanism is clear (hopefully): Is this a binary compatibility issue?

That depends entirely on what the new functions do. If they (even ever-so-slightly) reinterpret the meaning of data in the class, class invariants might be in jeopardy.

Consider this: Code in the application that can prove that the dynamic type of an instance is derived will not call the new reimplementations. Code that cannot, will. An instance that is manipulated by the old functions in one place and by the new ones in another might have a hard time maintaining its class invariants if the two sets of functions expect different invariants.
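To make the invariant problem concrete, here is a small sketch under assumptions of my own: base already carries two members, value and cache, in release 1.0 (so the object layout does not change), and the release 1.1 reimplementation starts relying on cache == 2 * value. All names are hypothetical, and the qualified call once more stands in for a de-virtualised call site in the old binary:

```cpp
#include <cassert>

struct base {
    virtual ~base() = default;
    virtual void set_value(int v) { value = v; }
    int value = 0;
    int cache = 0;  // present since release 1.0, so the layout is unchanged
};

// release 1.1 (hypothetical): derived reinterprets cache and expects
// the invariant cache == 2 * value to hold at all times.
struct derived : base {
    void set_value(int v) override { value = v; cache = 2 * v; }
    bool invariant_holds() const { return cache == 2 * value; }
};

// Call site that goes through the virtual function table.
bool invariant_after_virtual_call() {
    derived d;
    base& b = d;
    b.set_value(21);  // dispatches to derived::set_value()
    return d.invariant_holds();
}

// Stand-in for a de-virtualised call site in the old binary.
bool invariant_after_stale_call() {
    derived d;
    d.base::set_value(21);  // bypasses derived::set_value()
    return d.invariant_holds();
}
```

Here invariant_after_virtual_call() returns true, while invariant_after_stale_call() returns false: value was updated, but cache was not.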

Note how this is similar to changing functions with inline linkage: Depending on whether the compiler decides to inline the code at one particular call site or not, the old copy or the new copy of the code will be called. The difference is one of awareness: If you change a method with inline linkage, you are usually aware of the possible problem. When adding a reimplementation of a virtual function, you’re probably not. Up to now you weren’t, that is :).

To summarise: Adding a reimplementation of a virtual function to a class can lead to clients of the class either calling or not calling the new implementation, depending on whether the compiler can prove at compile time that static and dynamic type of an instance are the same. For the same instance, this will typically succeed in one function and fail in another. Problems arise when the reimplementation does not take into account that it might be bypassed.

About marcmutz
Marc Mutz is a Principal Software Engineer with The Qt Company.

5 Responses to BC/SC Gotcha: reimplementing a virtual function

  1. Diederik van der Boor says:

    Hi, interesting read, but I think not fully correct.

    Two gotchas:

    1. foo1( derived d ) never receives an object pointer, of any higher class, so there is nothing to de-virtualise. Any class you pass to it, will have the “derived” part taken out, and copied (full memory copy) as new object instance.

    2. adding virtual functions is not binary compatible any way.

    Please correct me if I’m wrong.

    • Gof says:

      1. derived is the derived, so one would expect derived::f() to be called regardless of the ‘virtuality’

      2. re-implementing virtual functions is not necessarily adding.

  2. Tim Northover says:

    I think the “virtual”s may be clouding the issue here.

    The foo that takes a Derived by value has a true object of type Derived (obtained via Derived::Derived(const Derived&) if a cast from SuperDerived was used). Whether f is virtual in Base, Derived, both or neither the compiler can statically determine which function needs to be called so it emits a static call to either Base::f or Derived::f if it exists.

    Virtual functions only come into play when you call a member function via a reference or pointer. In this case a Derived& could either be a real Derived or something else. In your first example Base::f is virtual so it’s called via the vtable. In the second case Derived::f is no longer virtual so it would appear as if an optimisation has taken place. Really it’s a subtle distinction — calling d.f() when d actually refers to a SuperDerived will still (by design) call Derived::f because it’s no longer virtual in children of Derived.

    I think the references and virtual-tables are really just masking the major issue that modules compiled with a previous version of Derived shouldn’t necessarily work with newer ones. In particular the foo2 version works via Base’s virtual table (sitting in Derived) so would probably go horribly wrong if f had covariant return types or virtual inheritance was used.

  3. Gof says:

    Note that if the virtual function that you are reimplementing is not in the first base class, it is never BC.

    Example: code in version 1.0

    struct A { virtual void f() {} };
    struct B { virtual void g() {} };
    struct C : A, B { virtual void h() {} }; //does not reimplement f or g;

    Then in version 1.1, you change C

    struct C : A, B {
        virtual void h() {}
        virtual void g() {} // added reimplementation of g()
    };

    this will change the layout of the virtual table of C by adding a new entry of g into the first table. Hence, if you inherit from C in your application, the virtual table of this derived class will appear corrupted.

    • marcmutz says:

      this will change the layout of the virtual table of C by adding a new entry of g into the first table. Hence, if you inherit from C in your application, the virtual table of this derived class will appear corrupted.

      This doesn’t sound logical, but you seem to be right. At least the assembly of

      struct D : C {};
      void foo( D & d ) { d.g(); }
      

      changes when adding C::g.
