
The Online Algorithmic Complexity Calculator

Overview

This introductory video provides a brief overview for non-experts, explaining how both the Coding Theorem Method ($\textit{CTM}$) and the Block Decomposition Method ($\textit{BDM}$) approximate universal measures of algorithmic complexity.

The Online Algorithmic Complexity Calculator (OACC) is a tool developed by the Algorithmic Nature Group to provide reliable estimations of non-computable functions. This is achieved through various numerical methods based upon the mathematical theory of algorithmic probability and algorithmic randomness. The estimations have applications in a wide range of disciplines, from molecular biology and cognitive science to time series research (e.g. finance) and graph theory (for a list of examples, see Publications).

The OACC provides numerical approximations (upper bounds) of Algorithmic (Kolmogorov-Chaitin) Complexity (AC) for short strings ($\textit{CTM}$), for strings of any length ($\textit{BDM}_{1D}$), and for binary arrays ($\textit{BDM}_{2D}$), which can represent the adjacency matrices of unweighted graphs. These techniques are not merely an alternative to the widespread use of lossless compression algorithms to approximate AC; they are approaches grounded directly in the theory of AC. Lossless compression algorithms (e.g. BZIP2, GZIP, LZ) that are based upon entropy rate are no more related to AC than Shannon Entropy itself, and can exploit nothing but statistical regularities.

Advantages of CTM and BDM over Entropy and Lossless Compression

The following plots quantify the power of $\textit{CTM}$ and $\textit{BDM}$ to identify objects that look statistically random (high Entropy) but are actually causal in nature, being produced by short computer programs, i.e. recursively generated by a mechanism that Entropy and lossless compression algorithms based on entropy rate overlook. For example, the strings 001010110101 and 001101011010 (and their binary reversals and complements) were found to have very high (near maximal) Entropy, yet they have low algorithmic (Kolmogorov) complexity because a small computer program (Turing machine) can generate them. This means that Shannon Entropy would have over-estimated the random nature of those strings (for every micro-state coarse-graining). Strings like these produce the 'causal gaps' in the plots below.

A: X axis: a random sample of binary strings of length 100 bits sorted by increasing number of non-zeros. Y axis: normalized values between 0 and 1 for each of the 3 measures ($\textit{BDM}$, Compress and Entropy) used to approximate algorithmic complexity $K$. The lossless compression curve (here the algorithm used is Compress, but similar behaviour occurs with LZ, LZW, etc.) closely follows the classical Bernoulli distribution for the Entropy of the set of strings. This is unsurprising, because implementations of lossless compression algorithms are essentially Entropy-rate estimators (up to a fixed window length). The $\textit{BDM}$ distribution, however, approximates the expected theoretical semi-exponential distribution for $K$, assigning lower values to strings that are causally generated but look statistically random to Entropy and compression.
B: Same test with bitstrings of length 1000 ($\textit{BDM}$ starts approximating Shannon Entropy if $\textit{CTM}$ is not updated).
C: $\textit{BDM}$ provides a much finer-grained distribution, whereas Shannon Entropy and lossless compression algorithms retrieve only a limited number (about 5) of differentiated 'complexity' values for all strings of length 12.
D: Number of strings (of length 12) among all $2^{12}$ bitstrings with maximum or near-maximum Shannon Entropy but low algorithmic complexity as estimated by $\textit{CTM}$/$\textit{BDM}$. Arrows pointing to what we call 'causal gaps' show the power gained by using $\textit{CTM}$/$\textit{BDM}$ over Shannon Entropy and compression in identifying causally generated strings and objects that may not have any statistical patterns but can be recursively/algorithmically produced by a short computer program. More information on $\textit{CTM}$ is available in this paper and on Entropy and $\textit{BDM}$ in this other one.

Choosing evaluation parameters for Block Decomposition

If the string that you want to evaluate is shorter than $13$ symbols, you should use the Coding Theorem Method ($\textit{CTM}$) to estimate its algorithmic complexity. Otherwise, you should use the Block Decomposition Method ($\textit{BDM}$), which requires specifying block size and block overlap values if you do not want to use the default (optimal) values. The key is to always compare values obtained with the same block size and block overlap, and not across different settings (as the size and overlap may under- or over-estimate complexity).

The following is a $\textit{BDM}$ partitioning example for block size $= 6$ and block overlap $= 1$, illustrating the meaning of block size and block overlap in the estimation of the complexity of a long string:

$\textit{BDM}$ is defined by $$\textit{BDM} = \sum_{i=1}^{n} \left( \textit{CTM}(\textit{block}_i) + \log_{2}(n_i) \right),$$ where the sum runs over the $n$ distinct blocks of the partition, $\textit{CTM}(\textit{block}_i)$ denotes the approximated algorithmic complexity of the pattern in block $i$, and $n_i$ denotes the number of occurrences (multiplicity) of that block.

You should always pick the largest available block size, as it provides better approximations to algorithmic complexity. In contrast, the smallest block size ($=1$) approximates Shannon Entropy. You can pick any overlap value that is smaller than your block size. For example, say your string is $111001010111$ and you use block size $=6$. If the overlap is $0$, then $\textit{BDM}$ will look up the known $\textit{CTM}$ values of $111001$ and $010111$ and add them up, outputting $29.9515$ bits. Alternatively, you can choose block overlap $=1$, for which the strings whose $\textit{CTM}$ values are added are $111001$, $101011$, and $010111$. This second evaluation will output $32.9672$ bits. A sketch of this partitioning and of the $\textit{BDM}$ sum is given below.
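
The following is a minimal Python sketch of this partitioning and aggregation, under stated assumptions: the `ctm6` lookup table holds made-up placeholder values (real $\textit{CTM}$ values come from the precomputed data distributed with the calculator), and the end-anchored handling of the final block when overlap $> 0$ is only one way to reproduce the example above; the calculator's exact boundary scheme may differ.

```python
from collections import Counter
from math import log2

def partition(string, block_size, overlap):
    """Slide a window of block_size over the string with step = block_size - overlap."""
    step = block_size - overlap
    starts = range(0, len(string) - block_size + 1, step)
    blocks = [string[i:i + block_size] for i in starts]
    # With a non-zero overlap, anchor one extra block at the end of the string if the
    # last full window falls short of it; with overlap 0, any leftover shorter than
    # block_size is simply discarded (see the boundary discussion below).
    if overlap > 0 and blocks and starts[-1] + block_size < len(string):
        blocks.append(string[-block_size:])
    return blocks

def bdm(string, block_size, overlap, ctm):
    """BDM = sum over distinct blocks of CTM(block) + log2(multiplicity)."""
    counts = Counter(partition(string, block_size, overlap))
    return sum(ctm[block] + log2(n) for block, n in counts.items())

# Placeholder CTM values for illustration only (not the real 6-bit CTM values):
ctm6 = {"111001": 15.0, "010111": 15.0, "101011": 15.0}
print(partition("111001010111", 6, 0))   # ['111001', '010111']
print(partition("111001010111", 6, 1))   # ['111001', '101011', '010111']
print(bdm("111001010111", 6, 0, ctm6))   # 30.0 with these placeholder values
```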

Overlapping helps deal with the leftovers of the block partitioning when the string length is not a multiple of the block size; otherwise, leftover blocks on the borders with length less than the block size are discarded and not considered in the complexity estimation. However, overlapping leads to overestimation and is thus not advised. We have proven in this paper that the error from the underestimation incurred when leaving the borders out of the evaluation converges and is bounded by a fraction of the overall complexity. There are different schemes for dealing with these boundary conditions; the online calculator only implements the most basic of them, but we have proven that the error is bounded, and thus the output values are reliable for most comparative purposes. In general, overlapping blocks produce overestimations of complexity, and non-overlapping blocks lead to an underestimation only for objects with dimensions that are not a multiple of the block size.

You should always compare results with the same chosen parameters (unless you estimate the error as we did in this paper and then make corrections or take the deviations into consideration).

The same rule holds for matrices. Strings may be binary or non-binary, but arrays are currently supported only in binary form. One can always translate any alphabet into binary, albeit with some loss of precision due to the extra granularity introduced in the translation.

Estimating Algorithmic Probability (AP)

For more technical details, see the papers listed in the Bibliography section below. In a nutshell, we calculate a function $D(n,m)(s)$, which estimates the Algorithmic Probability of a string $s$ from the set of halting Turing machines with $n$ states and $m$ symbols, denoted by $(n,m)$. We use the standard model of Turing machines used by Tibor Rado in the definition of the Busy Beaver problem, but we have also proven that radical changes to the model produce similar estimations. Beyond the known values of the Busy Beaver problem, we have also shown that educated choices of reasonable halting times can be made to reach certainty up to any arbitrary statistical significance level.

Formally, $$D(n, m)(s)=\frac{|\{(T,b)\in(n, m)\times \{0,1,\ldots,m-1\} : T(b) = s\}|}{|\{(T,b)\in(n, m)\times \{0,1,\ldots,m-1\} : T(b) \text{ halts}\}|},$$ where $T(b)$ denotes the output produced upon halting by Turing machine $T$ when run on a blank tape filled with symbol $b$, and $|A|$ denotes the cardinality of the set $A$.

For $(n,2)$ with $n < 5$, the known Busy Beaver values give the maximum number of steps that a halting machine can run. But for $n \geq 5$, or for $(n,m)$ with $m > 2$, no Busy Beaver values are known, and the size of the machine spaces makes a complete exploration to calculate $D(n,m)$ impossible for arbitrary $n$ and $m$; however, an educated choice of timeouts can be made and samples produced (see the Bibliography section below).
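
The following Python sketch illustrates the sampling idea under simplified assumptions of ours: machines are drawn at random in a Rado-style $(n,m)$ formalism with a single halting state, each is run on tapes filled with every blank symbol $b$ under a fixed step cutoff, and the output is taken to be the tape region visited by the head. It is a schematic stand-in for the actual CTM computations described in the papers, not a reproduction of them.

```python
import random
from collections import defaultdict

def random_machine(n, m=2):
    """A random Turing machine in an (n, m) Rado/Busy Beaver-style formalism:
    for each (state, symbol) pair choose a symbol to write, a head move and a
    next state, where state 0 is the halting state."""
    return {(s, k): (random.randrange(m), random.choice((-1, 1)), random.randrange(n + 1))
            for s in range(1, n + 1) for k in range(m)}

def run(machine, blank=0, max_steps=500):
    """Run the machine on a tape filled with `blank`. If it halts within
    `max_steps`, return the symbols in the tape region visited by the head;
    otherwise return None."""
    tape = defaultdict(lambda: blank)
    pos, state, visited = 0, 1, {0}
    for _ in range(max_steps):
        write, move, state = machine[(state, tape[pos])]
        tape[pos] = write
        pos += move
        visited.add(pos)
        if state == 0:  # halting state reached
            lo, hi = min(visited), max(visited)
            return "".join(str(tape[i]) for i in range(lo, hi + 1))
    return None  # did not halt within the cutoff

def sample_D(n, m=2, samples=50_000, max_steps=500):
    """Estimate D(n, m)(s): the fraction of halting (machine, blank symbol)
    pairs whose output is s, over a random sample of machines."""
    counts, halting = defaultdict(int), 0
    for _ in range(samples):
        machine = random_machine(n, m)
        for blank in range(m):  # the pairs (T, b) of the definition above
            out = run(machine, blank=blank, max_steps=max_steps)
            if out is not None:
                halting += 1
                counts[out] += 1
    return {s: c / halting for s, c in counts.items()}

# Example (tiny sample; the real CTM computations exhaust or heavily sample the spaces):
# frequencies = sample_D(n=4, samples=10_000)
```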

Approximating Algorithmic Complexity (K) by CTM and BDM

The function $D(n,m)$ is an approximation to Levin's Universal Distribution $\mathfrak{m}(s)$, and it can be used to approximate $K(s)$, the algorithmic complexity of a string $s$, by means of the Coding Theorem: $$K(s) \simeq -\log_2 \mathfrak{m}(s).$$
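
In practice, the conversion from an estimated output frequency to a complexity estimate is a single negative base-2 logarithm; the frequency below is a made-up placeholder, not an actual $D(n,m)$ value:

```python
from math import log2

d_value = 1.8e-4             # hypothetical D(n, m)(s) for some string s
k_estimate = -log2(d_value)  # K(s) ≈ -log2 D(n, m)(s), via the Coding Theorem
print(round(k_estimate, 4))  # ≈ 12.4397 bits
```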

The greater the value of $n$ used to calculate $D(n,m)$, the better the approximation to $K(s)$ for a string $s$ over an alphabet of $m$ symbols. Due to the uncomputability of $D(n,m)$, we work with samples and runtime cutoffs. For the simulation of Turing machines we use a C++ simulator running on a medium-sized supercomputer.

$\textit{BDM}$ extends the power of $\textit{CTM}$ as explained in this paper.

Approximating the complexity of non-binary strings or arrays

The calculator allows the use of more than 2 symbols, i.e. non-binary alphabets. However, one must take into consideration some factors that may appear non-obvious. When evaluating a sequence such as 123456789, you may find that it does not have lower complexity than other permutations of the same digits. This is how Shannon Entropy would also behave, yet we claim to go beyond Entropy. The reason is that, for a computer program, each individual digit label, e.g. the symbol 8, has no meaning. To us, 8 is the number one below 9 and one above 7, so one can reconstruct all the natural numbers from it. For a Turing machine, 9 is just a symbol, not a number. Yet one would like to identify that 123456789... has low algorithmic complexity, since it is produced by a function such as f(x) = x+1, which can certainly be encoded in a single line of computer code. The easiest way to achieve this is, perhaps surprisingly, to transform each digit into its binary representation. The result looks like 110111001011101111000..., which is simply a binary counter, and many more Turing machines and computer programs produce such a counter than produce most permutations. This is because the binary conversion injects the numerical meaning of the digits into the binary sequence.
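
A minimal sketch of this digit-to-binary transformation (the unpadded binary expansion of each digit is our choice of encoding here; others are possible):

```python
def digits_to_binary(sequence):
    """Concatenate the (unpadded) binary expansion of each decimal digit."""
    return "".join(format(int(d), "b") for d in str(sequence))

print(digits_to_binary(123456789))  # 1101110010111011110001001
```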

Approximating Bennett's Logical Depth

The $\textit{CTM}$ allows us not only to build an empirical distribution of computer program outputs from the smallest to larger program sizes; once a string is generated for the first time among all the computer programs of the smallest size, we also know which Turing machine is the smallest one producing that string, together with its runtime. We take the runtime of that Turing machine as an estimation of Bennett's Logical Depth ($LD$) by $\textit{CTM}$, and we extend the power of $\textit{CTM}$ to estimate $LD$ with a multiplicative variation of the $\textit{BDM}$ formula (a schematic sketch of the $\textit{CTM}$-based estimate follows the plot description below). Despite the fact that $LD$ is neither lower nor upper semi-computable (and therefore truly non-computable), estimations by $\textit{CTM}$ and $\textit{BDM}$ do produce the characteristic concave distribution, assigning algorithmically random strings low logical depth, thereby conforming with the theoretical expectation, unlike Shannon Entropy and lossless compression:

Unlike approximations to algorithmic complexity by lossless compression (top left plot), $LD$-based values using $\textit{CTM}$ and $\textit{BDM}$ conform to the theoretical expectation regarding $LD$ behaviour. The behaviour is confirmed across all the $\textit{BDM}$ variations corresponding to different boundary-condition schemes, and it could not be reproduced using either Shannon Entropy (or Entropy rate) or lossless compression algorithms. Further information is available in this paper.
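
As a schematic illustration of the $\textit{CTM}$-based estimate only (the multiplicative $\textit{BDM}$ extension for $LD$ is not sketched here), assume a hypothetical record of the enumeration listing, for each output string, the number of states of the machine that produced it and its runtime; the $LD$ estimate for a string is then the runtime of the smallest machine found producing it:

```python
def ld_ctm_table(run_records):
    """From hypothetical records (output_string, machine_states, runtime_steps),
    keep for each string the runtime of the smallest machine that produced it
    (ties broken by the shorter runtime), as a CTM-based Logical Depth estimate."""
    best = {}
    for s, n_states, runtime in run_records:
        if s not in best or (n_states, runtime) < best[s]:
            best[s] = (n_states, runtime)
    return {s: runtime for s, (n_states, runtime) in best.items()}

# Placeholder records for illustration only:
records = [("0101", 3, 12), ("0101", 2, 9), ("0000", 2, 4)]
print(ld_ctm_table(records))  # {'0101': 9, '0000': 4}
```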

Calculating your ability to behave randomly by producing a random-looking sequence and grid

In an article published in PLoS Computational Biology we showed that the ability to produce algorithmic randomness peaks at age 25. The article was widely covered by the world media. The data produced by more than 3400 people generating random data can be found here. You can test your own ability using this calculator. The results in the paper are given as '$\textit{z-scores}$'. To obtain your result and compare it to those in the paper, you only need to apply the following formula to the output of this calculator: your $\textit{z-score} = (K-m)/s$, where $K$ is the output that you obtain from this calculator for a sequence (in the tab 'For short strings' choose the $\textit{CTM}$ value) or for a grid (in the tab 'For binary arrays' choose the $\textit{BDM}$ value). The values of $m$ and $s$, and the parameters to choose in the calculator, appear in the following table; a worked example follows the table.

Your $\textit{z-score}$ calculation
item   task    $m$     $s$     alphabet   overlap   length
1      Toss    32.5    1.56    2          0         12
2      Guess   34.7    1.07    5          0         10
3      Spot    41.41   0.93    9          0         10
4      Roll    36.62   1.04    6          0         10
5      Grid    17.01   1.08    2D         0         3 x 3
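
For example, here is a sketch of the conversion for the 'Toss' task (a binary sequence of length 12, evaluated with the $\textit{CTM}$ value from the 'For short strings' tab); the value of $K$ below is a made-up placeholder for whatever the calculator returns for your sequence:

```python
K = 33.2                  # hypothetical CTM output for your 12-symbol 'Toss' sequence
m, s = 32.5, 1.56         # row 1 ('Toss') of the table above
z_score = (K - m) / s
print(round(z_score, 2))  # ≈ 0.45
```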
 

Graph complexity and perturbation analysis to move networks towards and away from randomness

Our algorithmic causal calculus article explains the details of the methods, and we have prepared an animated video with some of the basic ideas behind these methods in an application to steering biological (genetic regulatory) networks:

The method implemented in this version of the calculator does not allow the removal of nodes in networks of fewer than 5 nodes.

Numerical Limitations

The numerical limitations of CTM are the ultimate uncomputability of the Universal Distribution and the constant involved in the Invariance Theorem, which we have nevertheless quantified and estimated to be apparently under control, with values converging even in the face of variations of the computational model. Our papers cover these limitations, and their consequences should be taken into account.

For BDM, the limitations are explored in this paper; they are related to boundary conditions and to the limitations of CTM itself. The paper also shows that when CTM is not updated, BDM starts approximating Shannon Entropy in the long range, yet the local estimations of CTM still shed light on the algorithmic causal nature of even large objects.

For the algorithmic perturbation analysis of graphs and networks, as described in our algorithmic causal calculus article and implemented in this calculator, the current version (3.0) does not correct for graph isomorphisms; one has to take this into consideration by taking the minimum value of the information shift among all nodes/edges in the same group orbit. The reason is the limitation of the Block Decomposition Method (BDM) in dealing with the boundaries of adjacency matrices whose dimensions are not a multiple of the BDM block size (e.g. 4x4). In version 3.5 we aim to make this correction automatically by calculating the graph automorphism group using publicly available programs such as nauty or saucy, which you can use yourself in the meantime to make these corrections. In any case, we have demonstrated that the error vanishes for large networks, and we have also proven that the BDM approximation of the algorithmic complexity of a labelled graph (an instance of its automorphism group) is a good approximation of the unlabelled version, because the group can be computed from any instance by a brute-force program of (small) fixed length (even if doing so is computationally intractable). So, in practice, the approximations provided are reasonably good, robust and stable, even if with some irregularities for some elements.
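
As a rough illustration of how $\textit{BDM}$ is applied to an adjacency matrix, here is a minimal Python sketch that partitions the matrix into non-overlapping 4x4 blocks, discards border leftovers (the source of the boundary issue mentioned above), and sums CTM values plus the logarithm of the multiplicities; `ctm4x4` is a hypothetical lookup table standing in for the precomputed two-dimensional CTM data, and no isomorphism correction is attempted.

```python
from collections import Counter
from math import log2

def bdm2d(matrix, ctm4x4, d=4):
    """2D BDM sketch: cut a binary adjacency matrix into non-overlapping d x d
    blocks (border leftovers smaller than d x d are discarded) and sum
    CTM(block) + log2(multiplicity) over the distinct blocks."""
    rows, cols = len(matrix), len(matrix[0])
    blocks = []
    for i in range(0, rows - d + 1, d):
        for j in range(0, cols - d + 1, d):
            blocks.append(tuple(tuple(matrix[i + a][j + b] for b in range(d))
                                for a in range(d)))
    counts = Counter(blocks)
    return sum(ctm4x4[block] + log2(n) for block, n in counts.items())

# Usage sketch: `adjacency` is a 0/1 matrix (list of lists) and `ctm4x4` maps
# 4x4 blocks (tuples of tuples) to their hypothetical CTM values.
# complexity = bdm2d(adjacency, ctm4x4)
```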

Version history and future of the OACC

We keep expanding the calculator's methodological and numerical capabilities.

Bibliography

Delahaye J.-P. and Zenil H. (2012) Numerical Evaluation of the Complexity of Short Strings: A Glance Into the Innermost Structure of Algorithmic Randomness, Applied Mathematics and Computation 219, pp. 63-77.

Soler-Toscano F., Zenil H., Delahaye J.-P. and Gauvrit N. (2014) Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines. PLoS ONE 9(5): e96223.

Zenil H., Hernández-Orozco S., Kiani N.A., Soler-Toscano F., Rueda-Toicen A. (2017) A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity, arXiv:1609.00110 [cs.IT].

Zenil H., Soler-Toscano F., Dingle K. and Louis A. (2014) Correlation of Automorphism Group Size and Topological Properties with Program-size Complexity Evaluations of Graphs and Complex Networks, Physica A: Statistical Mechanics and its Applications, vol. 404, pp. 341–358.


Zenil H., Kiani N.A., Marabita F., Deng Y., Elias S., Schmidt A., Ball G., Tegnér J. (2017) An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems, arXiv:1709.05429 [q-bio.OT].

See also the Publications section for more articles and to find out what to cite for the methods you use.

Content on this site is licensed under a Creative Commons Attribution 3.0 Unported License (CC BY 3.0).

Contact info: hector.zenil at algorithmicnaturelag dot org
If you use results from the OACC in a publication, please visit How to Cite.

Algorithmic Nature Group - LABORES