Cache, Y Combinator in C++17 Lambda

The benefit of this approach is that the uncached function stays simple, and the caching logic is not mixed deeply into the implementation. The demo implementation only supports a single key, but that could easily be extended to a tuple.

#include <cstdint>
#include <iostream>
#include <memory>
#include <unordered_map>
#include <utility>

#define CE(...)                                                                \
  {                                                                            \
    const auto&& Result_ = (__VA_ARGS__);                                      \
    std::cout << __LINE__ << "] " << #__VA_ARGS__ << " = " << Result_ << "\n"; \
  }

int main(int argc, char* argv[]) {
  auto fib = [](auto&& impl, std::uint64_t n) -> std::uint64_t {
    if (n < 2) return n;
    return impl(impl, n - 2) + impl(impl, n - 1);
  };

  auto add_cache = [](auto&& impl_do, auto cache_ptr) {
    return [cache_ptr, impl_do](
               auto& impl, const auto& key) -> decltype(impl_do(impl_do, key)) {
      auto [it, is_new] = cache_ptr->try_emplace(key);
      if (is_new) {
        it->second = impl_do(impl, key);
        std::cerr << "cached key=" << key << " value=" << it->second << "\n";
      }
      return it->second;
    };
  };

  std::unordered_map<std::uint64_t, std::uint64_t> cache;
  auto fib_cache = add_cache(fib, &cache);

  CE(fib_cache(fib_cache, 40));
  CE(fib_cache(fib_cache, 50));

  // fix ties the recursive knot, so callers need not pass impl explicitly.
  auto fix = [](auto func) {
    return [func](const auto&... args) {
      return func(func, std::forward<decltype(args)>(args)...);
    };
  };
  CE(fix(fib_cache)(40));

  return 0;
}

A Simple Refinement to WebKit's WTF::Lock

There is a very good article about locking on the WebKit blog: Locking in WebKit. It has a lot of details about the implementation and benchmarks.

TL;DR: The Lock is based on a user-land implementation of the Linux futex idea, built on top of a platform mutex and condition variable.

After reading the article, I checked out the recent WebKit code and tried the benchmark LockSpeedTest.cpp with some other locks; this is the result:

The weizi_mutex (the red line, i.e. the second line at the right end) is my toy mutex written after reading the article, with one refinement:

In unlock, in order to let other threads barge in as early as possible, it clears the holding bit first, and then tries to wake another thread if the parking bit is set and the hold bit is still clear.
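A minimal sketch of that unlock path, assuming a WTF::Lock-style state word where bit 0 means "held" and bit 1 means "some thread is parked". The bit names and the unpark_one hook are my own illustration; the parking-lot machinery itself is not shown:

```cpp
#include <atomic>
#include <cstdint>

// Assumed bit layout, following the design described in "Locking in WebKit":
// bit 0 = lock held, bit 1 = at least one thread is parked.
constexpr uint8_t kIsHeldBit = 1;
constexpr uint8_t kHasParkedBit = 2;

struct ToyLock {
  std::atomic<uint8_t> word{0};

  void lock() {
    // Fast path only; a real lock would spin a few times and then park.
    uint8_t expected = 0;
    while (!word.compare_exchange_weak(expected, expected | kIsHeldBit,
                                       std::memory_order_acquire)) {
      expected &= uint8_t(~kIsHeldBit);  // retry once the held bit clears
    }
  }

  void unlock() {
    // Refinement: drop the held bit first, so another thread can barge in
    // immediately, ...
    uint8_t old =
        word.fetch_and(uint8_t(~kIsHeldBit), std::memory_order_release);
    // ... then wake a parked thread only if someone is parked and the lock
    // has not already been re-acquired in the meantime.
    if ((old & kHasParkedBit) &&
        !(word.load(std::memory_order_relaxed) & kIsHeldBit)) {
      unpark_one();
    }
  }

  void unpark_one() {
    // Placeholder: a real implementation would wake one waiter from the
    // parking lot (a table of mutex + condition_variable keyed by address).
  }
};
```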

Inexact Sum of Floating Point Numbers

It is common to need the sum of many numbers. Usually, a sum over $n$ numbers involves $n - 1$ roundings.

Write $fl(x)$ for the floating-point rounding of $x$ (usually done in hardware: the closest representable number, with ties broken to even). Then the sum of $n$ numbers $x_1, x_2, \ldots, x_n$ is usually calculated as

$$S(x_1, x_2, \ldots, x_n) = fl(x_n + fl(x_{n-1} + \cdots ))$$

This calculation is acceptable in most cases, since floating-point types like double have quite high precision. Things become worse when the number of values is large and the values span a wide range, especially when many of them cancel each other and the result is very small compared to some of the inputs. Traditionally, we could use the Kahan summation algorithm to improve the precision.
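For reference, the Kahan summation mentioned above keeps a running compensation term for the low-order bits lost in each addition. This is the standard textbook version, not code from this post:

```cpp
#include <vector>

// Kahan (compensated) summation: c carries the low-order bits that the
// running sum has already rounded away.
double kahan_sum(const std::vector<double>& xs) {
  double sum = 0.0, c = 0.0;
  for (double x : xs) {
    double y = x - c;    // re-inject the previously lost bits
    double t = sum + y;  // big + small: low bits of y may be lost here
    c = (t - sum) - y;   // recover exactly what was just lost
    sum = t;
  }
  return sum;
}
```

For example, summing {1e16, 1, 1, 1, 1} naively gives 1e16 because each +1 is rounded away, while kahan_sum recovers the exact 10000000000000004.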

In this post, I try a simple algorithm which gives the correctly rounded floating-point value of the exact sum.

A two-number sum can be represented exactly as the floating-point sum plus a residual, i.e. $a + b = fl(a + b) + e$, where $e$ is also a floating-point number. Knuth's TAOCP discusses a simple algorithm to find this $e$.
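That error term can be computed with the classic two-sum trick; this is the standard branch-free version, under the usual assumption of round-to-nearest arithmetic:

```cpp
#include <utility>

// Knuth's two-sum: returns {s, e} with s = fl(a + b) and a + b == s + e
// holding exactly.
std::pair<double, double> two_sum(double a, double b) {
  double s = a + b;
  double bb = s - a;  // the portion of b that was absorbed into s
  double e = (a - (s - bb)) + (b - bb);  // what each operand lost in rounding
  return {s, e};
}
```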

Following this idea, and noting that floating-point formats have a fixed-width significand (mantissa) and a bounded exponent range, the sum can be represented exactly as a sum of a limited number of terms. That is, for numbers $x_1, x_2, \ldots, x_n$, we can find up to $L$ numbers $y_j$ such that

$$\sum_i x_i = \sum_{j=1}^{L} y_j.$$

We can assume the set of $y_j$ is minimal: none of them are zero, and no two overlap in precision, i.e. for any $i$ and $j$, $fl(y_i + y_j) = y_i$ or $y_j$, since we could always rewrite $y_i + y_j$ as $fl(y_i + y_j) + e$. Based on this, we can bound $L$ by the exponent span divided by the significand width.

To calculate a sum, we maintain a series of numbers (of length up to $L$), starting from empty, and add the inputs one by one. When the length exceeds the limit $L$, we merge overlapping pairs $y_i + y_j$ into $fl(y_i + y_j) + e$; some of the $e$, or even some of the $fl(y_i + y_j)$, are zero and can be dropped, which bounds the length by $L$ again. After all numbers are added, we merge all overlaps, and the number in the series with the largest absolute value is the final result.
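A compact sketch of that loop, in my reading of the scheme above (it resembles Shewchuk's expansion sums and Python's math.fsum): each input is propagated through the series with two-sum, non-zero residuals are kept, and the compressed series is folded from the smallest term up at the end.

```cpp
#include <utility>
#include <vector>

// Knuth's two-sum: s = fl(a + b), with a + b == s + e exactly.
static std::pair<double, double> two_sum(double a, double b) {
  double s = a + b;
  double bb = s - a;
  return {s, (a - (s - bb)) + (b - bb)};
}

// Sketch: keep a short series of non-overlapping terms whose exact sum
// equals the exact sum of all inputs seen so far.
double exact_sum(const std::vector<double>& xs) {
  std::vector<double> terms;  // the y_j, roughly in increasing magnitude
  for (double x : xs) {
    std::vector<double> next;
    for (double y : terms) {
      auto [s, e] = two_sum(x, y);
      if (e != 0.0) next.push_back(e);  // keep the residual, drop zeros
      x = s;                            // carry the rounded sum upward
    }
    if (x != 0.0) next.push_back(x);
    terms = std::move(next);
  }
  // Fold from the smallest term up; the last, largest term dominates.
  double result = 0.0;
  for (double y : terms) result += y;
  return result;
}
```

For instance, exact_sum({1e100, 1.0, -1e100}) returns 1.0, where both naive and Kahan summation lose the 1.0 entirely.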


Download all PDF files from Google Scholar alert emails

I have gathered too many Google Scholar alert emails. When I am with my laptop, I have much more fun things to do than reading papers. But when I am with my e-reader, there are only a small number of older documents. To keep the documents on the e-reader updated, I added a cron job to parse the Scholar alert emails in my Gmail inbox, download all the PDF files, and push them into Google Drive, so that next time I can simply sync these folders to my e-reader again.

The alert-email parsing is a pure Python script, but the shell script that downloads the mails from Gmail and syncs files between local storage and Google Drive depends on some personal binaries, which you would need to replace to use it, sorry. The csv_sql tool can simply be replaced by sqlite3, pjobs by parallel from moreutils, and the sync between local storage and Google Drive is optional.

A naive security model of secret sharing

I have used secret sharing to distribute my private encrypted data for many years, and I have a demo project on my GitHub. FYI, this demo project is not the one I have actually used over these years, since it is just a demo and is not safe.


In secret sharing, every share is a part of the original message; we denote the $i$-th part by $P_i$.

Probability Model

To protect our data, we have to prove that the secret sharing scheme is safe. A naive definition of sharing safety can be:

Given any message distribution, the likelihood of a part is independent of any other $K - 1$ parts.

The statement above can be formalized as: for any given message distribution, and any parts $P_{i_j}$ of a message drawn from this distribution,

$$P(P_{i_0}|P_{i_1}, P_{i_2}, \ldots, P_{i_{K-1}}) = P(P_{i_0}).$$
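As a concrete example of a scheme satisfying this independence property, here is a standard K-out-of-K XOR sharing sketch; this is my own illustration, not the demo project's code. The first $K - 1$ parts are uniform random bytes, so any $K - 1$ parts are distributed identically for every message:

```cpp
#include <cstdint>
#include <random>
#include <vector>

using Bytes = std::vector<uint8_t>;

// K-out-of-K XOR sharing: parts 1..K-1 are uniform random, and the last
// part is the message XORed with all of them. Any K-1 parts are uniformly
// distributed regardless of the message, so P(P_i | other parts) = P(P_i).
std::vector<Bytes> share(const Bytes& msg, int k) {
  std::mt19937 rng(std::random_device{}());
  std::vector<Bytes> parts(k, Bytes(msg.size()));
  Bytes acc = msg;  // running XOR of the message with the random parts
  for (int i = 0; i + 1 < k; ++i) {
    for (size_t j = 0; j < msg.size(); ++j) {
      parts[i][j] = uint8_t(rng());
      acc[j] ^= parts[i][j];
    }
  }
  parts[k - 1] = acc;  // XOR of all parts reproduces the message
  return parts;
}

Bytes reconstruct(const std::vector<Bytes>& parts) {
  Bytes msg(parts.at(0).size(), 0);
  for (const auto& p : parts)
    for (size_t j = 0; j < msg.size(); ++j) msg[j] ^= p[j];
  return msg;
}
```

Note this is all-or-nothing (all K parts are required); threshold schemes such as Shamir's need polynomial interpolation instead of plain XOR.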

The easiest way to achieve this is to use a weaker, securely shared transformation $T$, which is not part of the message, such that under the transformation any $K - 1$ parts are valid for any message. That is, for any message $M$ and any $K - 1$ parts $P_0, P_1, \ldots, P_{K-2}$, there is a transformation $T$ such that $S_i(M|T) = P_i$, where $S_i$ is the $i$-th sharing function under transformation $T$. If $T$ can be recovered from $K - 1$ parts, security is still not guaranteed.

But it is easy to prove that an invertible transformation can be securely shared, in the sense that:

Given any $K - 1$ rows of an invertible transformation matrix, the space for the last row is isomorphic to $F^{K-1}$, where $F$ is the underlying field.

That is, we lose one random dimension. If this is not acceptable, we can use a chain of transformations, and the chain converges to the truly random secure model.


In practice, only one transformation, plus a randomly accumulated random vector, gives quite high entropy for every part, which I verified with gzip.

If you have a different view of this secret sharing model, please kindly let me know, so that I am not at a risk I do not know about.

CMake to mimic Google Build language

Google uses a description language for builds and dependency management. It is very convenient: in order to use a library, you just include its headers in your C++ file and add the library's path to the 'deps' list.

I'm using CMake for personal tiny projects. One CMake limitation that annoys me a lot is the lack of target namespaces: we cannot define the same target name in different directories. So I thought we could define some Google-BUILD-like functions for CMake and make every target depend on its directory. In my personal tiny projects, I'm using weizi as the root of the source directory, and in every subdirectory we can refer to targets in the current directory directly, and to targets in other directories by full path.

  LIBS /options_parser/options_parser)
  LIBS /options_parser/options_parser
  LIBS /test/test_lite_main)

We cannot use '/' in target names directly; one choice is to use '.' to join the directory components.

Real symmetric matrix exponent

Computing a general matrix exponential is not trivial; scipy's implementation (see help(scipy.linalg.expm)) refers to the paper A New Scaling and Squaring Algorithm for the Matrix Exponential, and here is a revisited version.

But in most optimization cases, the matrices are real and symmetric, often even positive definite. We can build an easy-to-understand method based on the SVD. Let us assume that

$$A = USV^T$$

is the SVD of a symmetric matrix $A$. Then we have

$$A = USV^T, A^T = VSU^T = A$$
$$A^2 = USV^TVSU^T = US^2U^T$$
$$A^3 = US^2U^TUSV^T = US^3V^T$$
$$A^{2k} = US^{2k}U^T$$
$$A^{2k+1} = US^{2k+1}V^T$$

This suggests that

$$e^A = \sum_k \dfrac{A^k}{k!} = \sum_k U\dfrac{S^{2k}}{(2k)!}U^T + \sum_k U\dfrac{S^{2k+1}}{(2k+1)!}V^T$$

which leads to

$$e^A = \dfrac{U(e^S + e^{-S})U^T + U(e^S - e^{-S})V^T}{2}$$

If $A$ is a positive definite matrix, then $U = V$, so the above formula simplifies to $e^A = U e^S U^T$.
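To make this concrete, here is a small self-contained sketch for a 2x2 symmetric matrix $[[a, b], [b, c]]$, using the closed-form eigendecomposition $A = U S U^T$ (for a symmetric matrix, the SVD above differs from it only in the signs carried by $V$). This is my own illustration, not scipy's algorithm:

```cpp
#include <array>
#include <cmath>

using Mat2 = std::array<std::array<double, 2>, 2>;

// e^A for symmetric A = [[a, b], [b, c]], via A = U S U^T:
// U is a rotation by theta, S = diag(m + d, m - d), and e^A = U e^S U^T.
Mat2 expm_sym2(double a, double b, double c) {
  double theta = 0.5 * std::atan2(2.0 * b, a - c);  // rotation angle of U
  double cs = std::cos(theta), sn = std::sin(theta);
  double m = 0.5 * (a + c);                                 // mean eigenvalue
  double d = std::sqrt(0.25 * (a - c) * (a - c) + b * b);   // half the spread
  double e1 = std::exp(m + d), e2 = std::exp(m - d);        // e^S diagonal
  return {{{cs * cs * e1 + sn * sn * e2, cs * sn * (e1 - e2)},
           {cs * sn * (e1 - e2), sn * sn * e1 + cs * cs * e2}}};
}
```

For a diagonal input diag(ln 2, ln 3) this returns diag(2, 3), and for [[0, t], [t, 0]] it returns [[cosh t, sinh t], [sinh t, cosh t]], matching the series definition of $e^A$.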