CRTP Pattern with Most Derived Rebind, streambuf as Example


1 The Name

CRTP(Curiously Recurring Template Pattern) is a C++ idiom in which a class X derives from a class template instantiation using X itself as template argument.
CRTP is used for static polymorphism, to avoid virtual functions, and give the compiler a chance to do the best optimization. Call of virtual function doesn't cost much, esp. with correctly predicted branch. but virtual function stop the inline optimization, and cost very much for floating point numeric algorithms.
CRTP is used for equipping object with interfaces based on a very smaller set of interfaces.

2 streambuf Example

std::streambuf is a suitable example. Most of the public/protected functions is implemented based on virtual functions overflowunderflowuflow. Functions like xsgetn and xsputn with default implementation, based on overflowunderflowuflow. The pointer gptrpptr are null if it's not modified by the derived object. Every time in the loop, xsgetn / xsputn check the pointer range. If it's not empty(the pointer range modified by the derived object), memcpy can be used for performance. If it's empty(or all exhausted), uflow / overflow is called.
In some instances(char oriented), gptr and pptr are always null, we can override the implementation without the check. And in some instances, the whole buffer is contiguous, then overflow / uflow always return eof. We can override the virtual functions with these conditions. But not the non-virtual one.
CRTP is suit to solve these two problems, without dynamic cost.
template <class Derived, class Char, class Traits>
struct streambuf_base {
  // ....

  Derived & get_derived() {
    return *this;
  }

  const Derived & get_derived() const {
    return *this;
  }

  int_type snextc() {
    int_type r = traits_type::eof();
    if (get_derived().sbumpc() != traits_type::eof())
      r = get_derived().sgetc();
    return r;
  }

  // ....
};
The default snextc implementation uses sbumpc and sgetc, with get_dervied prefix, the functions are resolved from derived to template base. Only if the derived doesn't override these functions, the template base's default are taken.
For example, stringbuf, alike to std::stringbuf, can implement snextcsbumpc and sgetc, just with one pointer comparation. All other default implementations will use this new-implemented version, thanks to get_derived prefix.

3 The Problem

The above is fine with just static polymorphism. The problems raise if we want the streambuf_base be compatible with std::streambuf, to reuse all routines who only accepts std::streambuf. So we must inherit from std::streambuf.
We can't inherit from std::streambuf in derived. If we do, any functions not re-implemented by derived will be ambiguous.
Inherit from std::streambuf in streambuf_base is fine. But then the default implementations in streambuf_base will override all in std::streambuf. It's ok in these case, but not acceptable if we inherit from another base, which provides more resonable implementations than streambuf_base.
So, what we want is an order of functions resolution. from derived to ancestor(std::streambuf in former case), and at last streambuf_base.

4 The Solution

First, we must modify the streambuf_base to take another Ancestor type parameters, and provides dummy as default.
We can check whether Ancestor is dummy in every streambuf_base's function. If Ancestor is not dummy, we just call Ancestor's version. Otherwise, the default implementation is taken.
template <class Derived, class Char, class Traits, class Ancestor=dummy>
struct streambuf_base : Ancestor {
  static const bool is_dummy = std::is_same<Ancestor, dummy>::value;
  typedef typename std::conditional<
    is_dummy, streambuf_base, Ancestor>::type ancestor_type;

  int_type snextc() {
    if (!is_dummy) return ancestor_type::snextc();
    int_type r = traits_type::eof();
    if (get_derived().sbumpc() != traits_type::eof())
      r = get_derived().sgetc();
    return r;
  }
};
This introduces some other problems. If Ancestor is another one derived from streambuf_base, the Ancestor::get_derived is wrong. So if Ancestor::snextcis taken, Ancestor::snextc will call another sbumpc and sgetc, not the most-derived object's one.
One solution to this problem is using virtual functions. But make all functions be virutal is not a good solution.
Another solution is to redefine the get_derived. Add another MostDerived paramter to streambuf_base, and add most_derived_rebind meta functor. For every Ancestor, we check whether Ancestor::most_derived_rebind exists. If it exists, streambuf_base inherited from Ancestor::most_derived_rebind::template apply::type instead of Ancestor.
template <class Derived,
          class Ancestor,
          bool=has_nested_type_most_derived_rebind<Ancestor>::value>
struct streambuf_base_rebind_ancestor {
  typedef Base type;
};

template <class Derived, class Base>
struct streambuf_base_rebind_ancestor<Derived, Ancestor, true> {
  typedef typename Ancestor::most_derived_rebind::template apply<Derived>::type type;
};

template <class Derived, class Char, class Traits,
  class Ancestor=dummy,
  class MostDerived=Derived>
struct streambuf_base : streambuf_base_rebind_ancestor<MostDerived, Ancestor>::type {
  typedef typename streambuf_base_rebind_ancestor<
    MostDerived, Ancestor>::type _ancestor_type;
  typedef typename std::conditional<
    std::is_same<_ancestor_type, dummy>::value,
    streambuf_base, _ancestor_type>::type ancestor_type;

  MostDerived & get_derived() {
    return *this;
  }

  const MostDerived & get_derived() const {
    return *this;
  }

  struct most_derived_rebind {
    template <class T>
    struct apply {
      typedef streambuf_base<Derived, Char, Traits, Ancestor, T> type;
    };
  };
};

5 The Result

With above streambuf_base, we can implements text format input/output routines in very higher performance, even faster than strtod / strtol in experiments(locale is ignored).
The floating pointer converion is done by double-conversion, in shorted mode. For printf/scanf and stringstream, the precision is 20.
The random sequence for floating is generated by random() / M_PI and for long, just random().
For fscanf and fprintf, the FILE* is obtained by fmemopen and open_memstream.
Random generated input speed(in nano-seconds)
implements / typelongfloatdouble
streambuf_base45.1380623474121387.06188110351565193.94798608398446
strtox56.3869073638916299.91599140930177246.5969282989502
fscanf201.75892387390138244.81490730285645437.06397615051276
istringstream168.8839549407959382.54306460571297674.1460767364503
Random generated output speed(in nano-seconds)
implements / typelongfloatdouble
streambuf_base109.96006123352053221.0190715637207285.9951219940186
snprintf290.83284875488281263.60215902709961915.7261646881104
fprintf279.305025695800741298.27311700439442073.194082229614
ostringstream122.751863525390631086.2168742828371651.6398792114258