Learn STL - What is std::valarray and how does it work?

valarray is probably the least known standard library container, it uses expression templates to optimize compute optimizations. This technique is the foundation of most numerical libraries such as Eigen, PyTorch, and TensorFlow.

Expression templates improve the performance of compute operations as memory-bandwidth matters more than computational time. In naive libraries, there will be binary functions that perform operations such as addition or multiplication. This is equivalent of writing,

learn::vector<double> values(30000, 0.4);

for (auto& value: values) {
    value += 3.0;
}

for (auto& value: values) {
    value *= 2.1;
}

for (auto& value: values) {
    value *= value;
}

, this snippet loops through the data three times. But what does that mean? For each operation, the pre-fetcher has to get data from memory or cache, the operation is performed, and then its written back to memory. This is repeated 3 times. If this was written in a single loop; it only has to fetch the data once.

learn::vector<double> values(30000, 0.4);

for (auto& value: values) {
    const auto tmp = 2.1 * (value + 3.0);
    value = tmp * tmp;
}

. As the size of values increases, the single loop version will tend to 3x quicker than the naive version. This is the problem that expression templates solve. For smaller vectors and matrices of <10 values the effect is negligible.

How its used

To construct a valarray we can pass the value and number of elements, annoyingly in the opposite order as vector. Operations are performed with valarray as if it was a value. We can see this in the snippet below,

learn::valarray<double> values(0.4, 30000);

const auto tmp = 2.1 * value + 3.0;
values = tmp * tmp;

, this is equivalent to a single loop version. In some languages optimized for compute such as Fortran; array operations are performed this way by default.

The valarray can then be accessed like other containers,

const double a = values[12];

How it works

valarray and all operations on it rely on and inherit from detail::expression. An expression consists of a .size() and a operator[] which is indexed from 0 to .size(),

namespace learn {
namespace detail {
template <typename ObjectT, typename ValueT>
class expression {
  public:
    using object = typename std::decay<ObjectT>::type;
    using value_type = ValueT;
    using size_type = std::size_t;

    value_type operator[](const size_type index) const {
        return static_cast<const object&>(*this)[index];
    }
    size_type size() const { return static_cast<const object&>(*this).size(); }

    operator object&() { return static_cast<object&>(*this); }
    operator const object&() { return static_cast<const object&>(*this); }
};

// more detail stuff...
} // namespace detail

. expression can be used for lazy evaluation and to construct expressions. To construct operator+ for valarray, we use binary_op which takes a left-hand and right-hand expression. binary_op uses the Curiously Recurring Template Pattern to inherit from expression. This lets binary operations be used together to form more complex expressions. We can see how binary_op can be used to construct operator+ by passing learn::plus as a template type,

template <class LhsT, class RhsT, class Op>
class binary_op : public expression<binary_op<LhsT, RhsT, Op>, typename LhsT::value_type> {
  public:
    using Lhs = typename std::decay<LhsT>::type;
    using Rhs = typename std::decay<RhsT>::type;
    using size_type = std::size_t;
    using value_type = typename Lhs::value_type;

    static_assert(std::is_same<typename Lhs::value_type, typename Rhs::value_type>::value);

    binary_op(const LhsT& lhs, const RhsT& rhs) : lhs_(lhs), rhs_(rhs) {}

    value_type operator[](const size_type index) const { return Op()(lhs_[index], rhs_[index]); }
    size_type size() const { return lhs_.size(); }

  private:
    const Lhs& lhs_;
    const Rhs& rhs_;
};
}  // namespace detail

template <typename Lhs, typename Rhs>
auto operator+(const Lhs& lhs, const Rhs& rhs) {
    return detail::binary_op<Lhs, Rhs, plus<typename Lhs::value_type>>(lhs, rhs);
}

. The expression is then evaluated when valarray is constructed from detail::expression. The expression is evaluated in a single for-loop, avoiding the performance costs associated with naive operator overload implementations.

template <typename ValueT>
class valarray : public detail::expression<valarray<ValueT>, ValueT> {
  public:
    using value_type = ValueT;
    using size_type = std::size_t;
    using reference = value_type&;
    using const_reference = const value_type&;
    // other type definitions

    // other constructors

    template <typename Expr>
    valarray(const detail::expression<Expr, ValueT>& expression) {
        data_.reserve(expression.size());

        for (size_type i = 0; i != expression.size(); ++i) {
            data_.emplace_back(expression[i]);
        }
    }

    size_type size() const { return data_.size(); }

    reference operator[](const size_type index) { return data_[index]; }
    const_reference operator[](const size_type index) const { return data_[index]; }

  private:
    using Storage = vector<value_type>;
    Storage data_;
};

} // namespace learn