Artificial Hydrocarbon Networks

Description

Artificial Hydrocarbon Networks (AHN) is a supervised learning method inspired by chemical hydrocarbon compounds and derived from artificial organic networks (AON). Because it derives from AON, it also provides modularity and organization of information, inheritance of packaging information, and structural stability.

As the name implies, this algorithm is composed only of hydrogen and carbon atoms. Linking these two elements in different arrangements forms different molecules. As in AON, the union of two or more molecules forms a compound, and different compounds combined create mixtures.

The most useful properties of AHN for regression and classification problems are stability, robustness, data packaging and parameter interpretability. Stability means that slight changes in the input produce only minimal changes in the output response. Robustness means that the algorithm can deal with noisy and uncertain data, effectively acting as a filtering system. Data packaging enables the clustering of similar data by computing molecular structures and their similarity, so data is not just stored or packaged, it is also grouped by its tendency. Parameter interpretability means that the coefficients and weights involved in the algorithm (molecular centers, hydrogen values, stoichiometric coefficients, intermolecular distances, etc.) can serve as metadata to partially understand the underlying information or to extract features.

Fundamentals

Below is a brief explanation of the components and interactions used to build an artificial hydrocarbon network.

CH-Molecules

In artificial hydrocarbon networks, the basic unit of information is the CH-molecule. It is made of two or more atoms related to one another so as to define a behavioral function $ \varphi_i(x) $ of an input vector $x = \{x_1,\dots,x_k\}$. A CH-molecule consists of one carbon atom with value $v_C$ surrounded by $d$ hydrogen atoms with values $h_i \in \mathbb{C}$, where $1 \leq d \leq 4$, and its behavior is expressed as: $$ \varphi_i(x) = v_C \sum_{r=1}^{k}\prod_{i=1}^{d}(x_r - h_{i,r}).$$ Inside the product, the index $i$ runs over the hydrogen atoms of a single molecule, and $r$ runs over the components of the input.
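The molecular behavior can be evaluated directly from its definition. The following is a minimal sketch, not a reference implementation: the function name `ch_molecule` and the layout of `hydrogens` (one row per hydrogen atom, one column per input feature) are assumptions made for illustration, and real-valued hydrogens are used for simplicity.

```python
from math import prod

def ch_molecule(x, v_c, hydrogens):
    """Evaluate phi(x) = v_C * sum_r prod_i (x_r - h_{i,r}).

    x         : sequence of k input features
    v_c       : carbon value (scalar)
    hydrogens : d rows (assumed layout), each holding k hydrogen values h_{i,r}
    """
    d = len(hydrogens)
    if not 1 <= d <= 4:
        raise ValueError("a CH-molecule has between 1 and 4 hydrogen atoms")
    total = 0
    for r, x_r in enumerate(x):
        # Product over the d hydrogen atoms for feature r
        total += prod(x_r - h[r] for h in hydrogens)
    return v_c * total

# One feature, two hydrogens: 2 * (3 - 1) * (3 - 2) = 4
value_1d = ch_molecule([3.0], 2.0, [[1.0], [2.0]])

# Two features: 2 * ((3 - 1)(3 - 2) + (0 - 1)(0 + 1)) = 2 * (2 - 1) = 2
value_2d = ch_molecule([3.0, 0.0], 2.0, [[1.0, 1.0], [2.0, -1.0]])
```

For a single feature, the behavior is a polynomial of degree $d$ whose roots are the hydrogen values and whose leading coefficient is the carbon value.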

Unsaturated CH-molecules ($d < 4$), denoted $CH_d$, can be joined together to form chains of molecules called artificial hydrocarbon compounds.

Artificial Hydrocarbon Compounds

A compound behavior function $\psi(x)$ is then defined in terms of the $n$ unsaturated molecules via their molecular behaviors $\{\varphi_1(x),\dots,\varphi_n(x)\}$, as $$\psi(x) = \psi(x;\varphi_1,\dots,\varphi_n). $$

The simplest compound behavior $\psi$ is the linear, saturated chain compound represented as $$ CH_3 - CH_2 - \cdots - CH_2 -CH_3,$$ and expressed as $$ \psi(x) = \{\varphi_i(x) \mid i = \arg\min_t \|x-M_{c,t}\|\},$$ where $M_{c,t}$ is the center of the $t$-th molecule, that is, the point of the input space around which that molecule has the most influence. In other words, the compound evaluates the molecule whose center is closest to the input $x$.
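The selection rule of the linear chain compound can be sketched as a nearest-center lookup. This sketch assumes that the $\arg\min$ is taken over the Euclidean distance between the input and each center; the function name `compound` and the two example molecular behaviors are hypothetical.

```python
from math import dist

def compound(x, molecules, centers):
    """Return phi_i(x) for the molecule whose center M_{c,i} is nearest to x."""
    nearest = min(range(len(centers)), key=lambda t: dist(x, centers[t]))
    return molecules[nearest](x)

# Two hypothetical molecular behaviors over a one-dimensional input
molecules = [lambda x: 10 * x[0], lambda x: -x[0]]
centers = [[0.0], [5.0]]

low = compound([1.0], molecules, centers)   # center 0.0 is nearest
high = compound([4.0], molecules, centers)  # center 5.0 is nearest
```

Under this reading, the centers partition the input space and each molecule models only its own region.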

Intermolecular Distances

Two adjacent CH-molecules with centers $M_{c,j-1}$ and $M_{c,j}$ are separated by a length $r_j = \|M_{c,j}-M_{c,j-1}\|$ for all $j=1,\dots,n-1$, with initial condition $M_{c,0}=0$. This length is called the intermolecular distance.
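As a small illustration, the intermolecular distances are the norms of the differences between consecutive centers. The sketch below assumes that the centers are given as a list indexed from $M_{c,0}$ and that the norm is Euclidean; the function name is hypothetical.

```python
from math import dist

def intermolecular_distances(centers):
    """Return [r_1, ..., r_{n-1}] with r_j = ||M_{c,j} - M_{c,j-1}||."""
    return [dist(centers[j], centers[j - 1]) for j in range(1, len(centers))]

# Three centers in a two-dimensional input space, starting at M_{c,0} = 0
centers_2d = [[0.0, 0.0], [3.0, 4.0], [3.0, 10.0]]
distances = intermolecular_distances(centers_2d)
```

Here the two distances are 5 and 6, since the first pair of centers forms a 3-4-5 triangle and the second pair differs only in the second coordinate.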

Mixtures

Several artificial hydrocarbon compounds can interact in definite ratios (weights), the so-called stoichiometric coefficients $\alpha_i$, forming a mixture with behavior $S(x)$: $$S(x) = \sum_{i=1}^{c}\alpha_i \psi_i(x).$$
Formally, an artificial hydrocarbon network is a mixture of artificial hydrocarbon compounds, each one computed using a chemical-based heuristic rule, expressed in the so-called AHN-algorithm.
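The mixture behavior is a weighted sum of compound behaviors, which can be written in a few lines. In this sketch the function name `mixture` and the two example compounds are hypothetical; only the weighted sum itself comes from the definition of $S(x)$.

```python
def mixture(x, compounds, alphas):
    """Evaluate S(x) = sum_i alpha_i * psi_i(x)."""
    if len(compounds) != len(alphas):
        raise ValueError("one stoichiometric coefficient per compound is required")
    return sum(alpha * psi(x) for alpha, psi in zip(alphas, compounds))

# Two hypothetical compound behaviors over a one-dimensional input
example_compounds = [lambda x: x[0] ** 2, lambda x: 1.0]
example_alphas = [0.5, 2.0]

# 0.5 * 16 + 2.0 * 1 = 10
s_value = mixture([4.0], example_compounds, example_alphas)
```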

Algorithm

The AHN-algorithm provides a chemical-based heuristic rule to build and train artificial hydrocarbon networks.

The aim of this algorithm is to find the inner parameters (hydrogen and carbon values, centers of molecules, intermolecular distances, stoichiometric parameters), as well as to find a suitable topology of mixtures (number of compounds) and of artificial hydrocarbon compounds (number of molecules, number of hydrogens per molecule).

The general AHN-algorithm is as follows:

Inputs: data in tuples $(x,y)$, maximum number of compounds $c_{max} \geq 1$, maximum number of molecules $n_{max} \geq 2$, tolerance value $\epsilon > 0$, learning rate $0 < \eta < 1$.
Outputs: structure and values of mixture $S(x)$.
  1. Set $i =1$.
  2. Set the residual $R_i = y$.
  3. While $i \leq c_{max}$ and $\|R_i\| > \epsilon$ do
    1. Set $n = 2$ molecules.
    2. While $n \leq n_{max}$ and $m > 1$ do
      • Build a compound with $n$ molecules.
      • Find hydrogen and carbon atom values and centers, $h_{i,r}$, $v_C$ and $M_{c,\cdot}$.
      • Update intermolecular distances $r_j$ using a gradient descent approach and learning rate $\eta$.
      • Compute the enthalpy value $m$.
      • If $m > 1$, then $n = n+1$.
    3. Set the compound behavior $\psi_i(x)$.
    4. Compute the residual $R_{i+1} = R_i - \psi_i(x)$.
    5. Update $i = i +1$.
  4. Compute the stoichiometric coefficients $\alpha_i$.
  5. Set the mixture function $S(x)$.
  6. Return the mixture $S(x)$.
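
The outer loop of the listing, which fits each new compound to the residual left by the previous ones, can be sketched as follows. This is a simplified illustration under stated assumptions, not the AHN-algorithm itself: `fit_compound` is a hypothetical stand-in for the whole inner loop (steps 3.1 to 3.3), the example `fit_mean` simply fits a constant, the loop is allowed to build up to $c_{max}$ compounds, and the computation of the stoichiometric coefficients (step 4) is omitted.

```python
from math import sqrt

def norm(values):
    """Euclidean norm of a list of numbers."""
    return sqrt(sum(v * v for v in values))

def train_outer_loop(xs, ys, fit_compound, c_max, eps):
    """Fit compounds to successive residuals.

    fit_compound(xs, residual) is a hypothetical stand-in for the inner
    loop of the listing; it must return a callable psi(x).
    """
    compounds = []
    residual = list(ys)                                   # step 2: R_1 = y
    while len(compounds) < c_max and norm(residual) > eps:
        psi = fit_compound(xs, residual)                  # steps 3.1 to 3.3
        residual = [r - psi(x) for r, x in zip(residual, xs)]  # step 3.4
        compounds.append(psi)                             # step 3.5
    return compounds, residual

def fit_mean(xs, residual):
    """Toy stand-in (assumption): a compound that outputs the mean residual."""
    mean = sum(residual) / len(residual)
    return lambda x: mean

xs = [[0.0], [1.0], [2.0]]

# Constant targets: one compound removes the whole residual
flat_compounds, flat_residual = train_outer_loop(xs, [2.0, 2.0, 2.0], fit_mean, 3, 1e-9)

# Linear targets: the constant stand-in cannot reduce the residual further,
# so the loop stops when the compound limit is reached
compounds, residual = train_outer_loop(xs, [1.0, 2.0, 3.0], fit_mean, 2, 1e-9)
```

The two runs show the two stopping conditions: the first ends because the residual norm falls below the tolerance, the second because the maximum number of compounds is reached.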

More details can be found in the book:
Artificial Organic Networks - Artificial Intelligence Based on Carbon Networks (Springer, 2014).

Resources

Here you will find current resources about the AHN-algorithm:

Web Service - Try an online demo of artificial hydrocarbon networks.
Java - Implementation of the AHN-algorithm in Java classes.
MATLAB - Implementation of the AHN-algorithm in M-scripts.
R - Implementation of the AHN-algorithm in R.
LabVIEW - Implementation of the Artificial Organic Networks Toolbox in LabVIEW.