This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
Introduction
As a shared model for information representation, ontology has been introduced into nearly all fields of computer science. Acting as a conceptual semantic framework, ontology is highly effective and is widely employed in engineering applications in fields such as biology, medicine, pharmacy, materials science, mechanical engineering and chemistry (for instance, see Coronnello et al. [2], Vishnu et al. [3], Roantree et al. [4], Kim and Park [5], Hinkelmann et al. [6], Pesaranghader et al. [7], Daly et al. [8], Agapito et al. [9], Umadevi et al. [10] and Cohen [11]).
An ontology model can be regarded as a graph G = (V, E), in which each vertex v expresses a concept and each edge e = vivj represents a direct link between two concepts vi and vj. The aim of ontology similarity computation is to learn a similarity function Sim : V × V → ℝ+ ∪ {0} which maps each pair of vertices to a real-valued score. The purpose of ontology mapping is to build links between two or more different ontologies based on the similarity between their concepts. Let two graphs G1 and G2 express two ontologies O1 and O2, respectively. The target is to determine, for each v ∈ V(G1), a set Sv ⊆ V(G2) whose vertices are semantically highly similar to the concept corresponding to v. Hence, one may compute the similarity S(v, vj) for each vj ∈ V(G2), select a parameter 0 < M < 1, and let Sv consist of the vertices vj with S(v, vj) ≥ M. From this perspective, the essence of ontology mapping is to yield a similarity function S and to determine a suitable parameter M according to the specific application.
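Under these definitions, obtaining Sv from a learned similarity function and a threshold M is straightforward. Below is a minimal sketch; the toy similarity matrix and the function name are our own illustration, not part of the paper's method:

```python
import numpy as np

def ontology_mapping(sim, threshold):
    """For each vertex v in G1, collect the vertices of G2 whose
    similarity score S(v, vj) reaches the threshold M.

    sim: (n1, n2) array with sim[i, j] = S(vi, vj), vi in G1, vj in G2.
    threshold: the parameter M with 0 < M < 1.
    Returns a dict: index of v in G1 -> list of indices forming Sv in G2.
    """
    n1, n2 = sim.shape
    return {i: [j for j in range(n2) if sim[i, j] >= threshold]
            for i in range(n1)}

# Toy similarity matrix between a 2-vertex G1 and a 3-vertex G2.
S = np.array([[0.9, 0.2, 0.6],
              [0.1, 0.8, 0.3]])
print(ontology_mapping(S, 0.5))  # {0: [0, 2], 1: [1]}
```

In practice the choice of M trades precision against recall: a larger M yields smaller, more reliable sets Sv.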
Several effective learning tricks have been proposed for ontology similarity measuring and ontology mapping. Gao and Zhu [12] studied gradient learning algorithms for ontology similarity computing and ontology mapping. Gao and Xu [13] obtained a stability analysis for ontology learning algorithms. Gao et al. [14] presented an ontology sparse vector learning approach for ontology similarity measuring and ontology mapping based on the ADAL trick. Gao et al. [15] studied ontology optimization tactics based on distance calculating techniques. More theoretical analysis of ontology learning algorithms can be found in Gao et al. [16].
In this paper, we propose a new ontology learning trick based on affine transformation. Furthermore, we demonstrate the efficiency of the algorithm in biological and chemical applications via experiments.
Setting
Let V be an instance space. We use a p-dimensional vector to express the semantic information of each vertex in the ontology graph. Specifically, let v = {v1, ···, vp} be the vector corresponding to a vertex v. To ease the presentation, we slightly abuse notation and use v to represent both the ontology vertex and its corresponding vector. In this learning setting, the aim of an ontology algorithm is to yield an ontology function f : V → ℝ; the similarity between two vertices can then be determined according to the difference between their corresponding real numbers. Obviously, the ontology function can be regarded as a dimensionality reduction operator f : ℝp → ℝ.
In recent years, the application of ontology algorithms has faced many challenges. In the fields of chemistry and biology, the situation can become very complex since we need to deal with high-dimensional or big data. Against this background, sparse vector learning algorithms have been introduced into biological and chemical ontology computation (see Afzali et al. [17], Khormuji and Bazrafkan [18], Ciaramella and Borzi [19], Lorincz et al. [20], Saadat et al. [21], Yamamoto et al. [22], Lorintiu et al. [23], Mesnil and Ruzzene [24], Gopi et al. [25], and Dowell and Pinson [26] for more details). For example, suppose we aim to find which genes cause a certain genetic disease: there are millions of genes in the human body, so the computation task is complex and tough, yet in fact only a few classes of genes cause this kind of genetic disease. A sparse vector learning algorithm can effectively help scientists pinpoint these genes among the mass of candidate disease genes.
One computational model of the ontology function via a sparse vector is expressed by

$f(v) = \sum_{i=1}^{p} v_i w_i + \delta,$

where w = {w1, ···, wp} is a sparse vector used to shrink irrelevant components to zero and δ is a noise term. Using this model, the key to determining the ontology function f is to learn the optimal sparse vector w.
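The model above can be sketched numerically; the dimension, the choice of nonzero components and the noise scale below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 10                        # dimension of a vertex's semantic vector
w = np.zeros(p)
w[[1, 4]] = [2.0, -1.5]       # sparse: only two relevant components

v = rng.normal(size=p)        # semantic vector of one ontology vertex
delta = 0.01 * rng.normal()   # small noise term

f_v = v @ w + delta           # the ontology function value f(v)
```

Because w is sparse, only the components v1 and v4 of the semantic vector influence f(v); all other attributes are shrunk away.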
For example, the standard framework with a penalty term via the l1-norm of the unknown sparse vector w ∈ ℝp can be stated as:

$\min_{\mathbf{w} \in \mathbb{R}^p} \; l(\mathbf{w}) + \lambda \|\mathbf{w}\|_1,$

where λ > 0 is a balance parameter and l is the principal function measuring the error of w. The balance term λ||w||1 controls the sparsity of the sparse vector w.
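As an illustration of this l1-penalized framework, the sketch below takes the squared loss as the principal function l and solves the problem with ISTA (proximal gradient with soft-thresholding), a standard solver for such objectives; the data and parameter values are hypothetical:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(V, y, lam, n_iter=500):
    """Minimize 0.5*||y - V w||^2 + lam*||w||_1 by ISTA."""
    L = np.linalg.norm(V, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(V.shape[1])
    for _ in range(n_iter):
        grad = V.T @ (V @ w - y)           # gradient of the squared loss
        w = soft_threshold(w - grad / L, lam / L)
    return w

rng = np.random.default_rng(1)
V = rng.normal(size=(50, 20))
w_true = np.zeros(20)
w_true[[0, 3]] = [1.0, -2.0]               # only two relevant components
y = V @ w_true + 0.01 * rng.normal(size=50)

w_hat = lasso_ista(V, y, lam=1.0)
print(np.nonzero(w_hat)[0])                # indices of the recovered components
```

Larger λ drives more components of w exactly to zero, which is what makes the learned ontology function interpretable.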
where ρ ≥ 0 is also a balance parameter. Let $\tilde{\mathbf{V}} = \begin{pmatrix} \mathbf{V} \\ \sqrt{\rho}\,\mathbf{I} \end{pmatrix}$ and $\tilde{\mathbf{y}} = \begin{pmatrix} \mathbf{y} \\ \mathbf{0} \end{pmatrix}$. Then the ontology sparse vector learning problem (4) can be expressed as
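The equivalence behind this augmentation can be checked numerically: stacking √ρ I under V and zeros under y folds the ridge-type term ρ||w||² into the least-squares loss. A quick sketch, assuming the loss is ½||y − Vw||² + ½ρ||w||² (up to the paper's scaling conventions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, rho = 30, 8, 0.5
V = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.normal(size=p)          # any candidate sparse vector

# Augmented design matrix and response, as defined above.
V_tilde = np.vstack([V, np.sqrt(rho) * np.eye(p)])
y_tilde = np.concatenate([y, np.zeros(p)])

# Squared loss on the augmented data ...
lhs = 0.5 * np.sum((y_tilde - V_tilde @ w) ** 2)
# ... equals squared loss plus the ridge term on the original data.
rhs = 0.5 * np.sum((y - V @ w) ** 2) + 0.5 * rho * np.sum(w ** 2)
print(np.isclose(lhs, rhs))  # True
```

The identity holds for every w, so the augmented problem and problem (4) share the same minimizer.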
Set $\mathcal{D} = \left\{ \theta : \left| \mathbf{v}_i^T \theta \right| \le 1,\; i \in \{1, 2, \cdots, p\} \right\}$ as the feasible set of ontology problem (8). Obviously, 𝒟 can be regarded as the intersection of a collection of closed half spaces, which is a closed convex set, and 𝒟 ≠ ∅ since 0 ∈ 𝒟. By means of (8), the projection of $\frac{\mathbf{y}}{\lambda}$ onto 𝒟 is the dual optimal solution θ*, which is stated as $\theta^* = \mathbb{P}_{\mathcal{D}}\!\left(\frac{\mathbf{y}}{\lambda}\right)$.
Next, we present our dual framework of the ontology problem, which can be formulated as a projection problem. Set

It is not hard to check that the dual optimal solution of the ontology problem is the projection of $\frac{\tilde{\mathbf{y}}}{\lambda}$ onto $\bar{\mathcal{H}}$.
In the following, we show an equivalent optimization model of our ontology sparse vector problem. The discussion splits into two cases according to whether $\bar{\mathbf{v}}_p$ equals zero or not.
If $\bar{\mathbf{v}}_p = 0$, then we can drop the condition $\bar{\mathbf{v}}_p^T \theta = 0$ and the ontology framework reduces to
This implies that the dual optimal solution of the ontology problem is the projection of $\frac{\tilde{\mathbf{y}}^{\perp}}{\lambda}$ onto the feasible set $\mathcal{H}^{\perp}$. Finally, we obtain the final version of the ontology sparse vector learning problem, which has the same optimal solution as ontology problem (20):
Experiments
In this section, we test the feasibility of our new algorithm via the following four simulation experiments on ontology similarity measure and ontology mapping. After obtaining the sparse vector w, the ontology function is given by $f_{\mathbf{w}}(v) = \sum_{i=1}^{p} v_i w_i$, in which we ignore the noise term.
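A direct implementation of this ontology function, together with one plausible way to rank vertices by closeness of their one-dimensional scores (the ranking rule is our own illustration; the paper only specifies f_w):

```python
import numpy as np

def ontology_scores(X, w):
    """Apply the learned ontology function f_w(v) = sum_i v_i w_i
    to every vertex vector (the rows of X), ignoring the noise term."""
    return X @ w

def most_similar(X, w, idx, N):
    """Return the N vertices whose f_w scores are closest to that of
    vertex idx -- one way to turn the scores into a similarity ranking."""
    s = ontology_scores(X, w)
    order = np.argsort(np.abs(s - s[idx]))
    return [j for j in order if j != idx][:N]

# Three toy vertex vectors and a learned sparse vector (hypothetical).
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
w = np.array([1.0, -1.0])
print(most_similar(X, w, 0, 1))  # [1]: its score is closest to vertex 0's
```

The closest N vertices returned this way are what the P@N criterion compares against the expert's list.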
Ontology similarity measure experiment on biology data
In biology, the “GO” ontology (denoted by O1, available at http://www.geneontology.org; Fig. 1 presents the basic structure of O1) is a widely used database for gene researchers. We apply this data set in our first experiment. We use P@N (precision ratio; see Craswell and Hawking [27] for more details) to measure the effectiveness of the experiment. In the first step, the N concepts closest to each vertex (those with the highest similarity) were determined by experts. In the second step, the first N concepts for each vertex on the ontology graph are determined by the algorithm, and the precision ratios are obtained. In addition to our ontology learning algorithm, the approaches of Huang et al. [29], Gao and Liang [30] and Gao et al. [16] are applied to the “GO” ontology, and the precision ratios inferred from these tricks are compared. Partial experiment results are given in Tab. 1.
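The P@N criterion itself is simple to compute: it is the fraction of the algorithm's first N concepts that also appear among the expert's N closest concepts. A sketch with hypothetical concept names:

```python
def precision_at_n(expert_top, computed_top, N):
    """P@N: fraction of the algorithm's first N concepts that also
    appear among the expert's N closest concepts for a vertex."""
    return len(set(expert_top[:N]) & set(computed_top[:N])) / N

# Hypothetical ranked concept lists for one vertex.
expert = ["cell", "membrane", "nucleus", "ribosome"]
computed = ["cell", "nucleus", "cytoplasm", "membrane"]
print(precision_at_n(expert, computed, 3))  # 2/3: "cell" and "nucleus" match
```

Averaging this quantity over all vertices gives the precision ratios reported in the tables.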
The experiment results of ontology similarity measure
From Tab. 1, taking N = 3, 5, 10 or 20, the precision ratio obtained by our sparse vector ontology learning algorithm is higher than the precision ratios computed by the algorithms of Huang et al. [29], Gao and Liang [30] and Gao et al. [16]. Moreover, these precision ratios increase apparently as N increases. Thus, we conclude that the ontology learning algorithm described in our paper is superior to those proposed by Huang et al. [29], Gao and Liang [30] and Gao et al. [16].
Ontology mapping experiment on physical data
Physical ontologies O2 and O3 (the structures of O2 and O3 are shown in Fig. 2 and Fig. 3, respectively) are used in our second experiment, which aims to test the utility of ontology mapping. The ontology mapping between O2 and O3 is determined by means of our new ontology learning algorithm, and the P@N criterion is applied again to test the quality of the experiment. Huang et al. [29], Gao and Liang [30] and Gao et al. [31] also applied their ontology algorithms to the “Physical” ontology, and we compare the precision ratios obtained from the four methods. Several experiment results are given in Tab. 2.
It can be seen that our algorithm is more efficient than the ontology learning algorithms proposed in Huang et al. [29], Gao and Liang [30] and Gao et al. [31], in particular when N is sufficiently large.
Ontology similarity measure experiment on plant data
In this part, the “PO” ontology O4 (available at http://www.plantontology.org; Fig. 4 shows the basic structure of O4) is used to test the efficiency of our new ontology learning algorithm for ontology similarity calculating. This ontology is well known in plant science and can be used as a dictionary for scientists to learn and search concepts and botanical features. The P@N standard is used again in this experiment. Furthermore, the ontology learning approaches in Wang et al. [28], Huang et al. [29] and Gao and Liang [30] are applied to the “PO” ontology for comparison. The accuracies of these ontology learning algorithms are computed, and parts of the results are compared and presented in Tab. 3.
The experiment results of ontology similarity measure
Tab. 3 reveals that the precision ratio obtained by our ontology sparse vector learning algorithm is higher than the precision ratios of the ontology learning algorithms of Wang et al. [28], Huang et al. [29] and Gao and Liang [30] when N = 3, 5 or 10. Furthermore, these precision ratios increase apparently as N increases. Therefore, we conclude that the ontology sparse vector learning algorithm described in our paper is superior to the tricks recommended in Wang et al. [28], Huang et al. [29] and Gao and Liang [30].
Ontology mapping experiment on humanoid robotics data
Humanoid robotics ontologies (denoted by O5 and O6, constructed by Gao and Zhu [12]; the structures of O5 and O6 are shown in Fig. 5 and Fig. 6, respectively) are employed in our last experiment. Humanoid robotics ontologies express humanoid robotics concepts in an orderly and clear way, and this experiment aims to determine the ontology mapping between O5 and O6. Again, we use the P@N criterion to measure the accuracy of the data obtained in the experiment. Besides our ontology learning algorithm, the ontology algorithms proposed in Gao and Lan [32], Gao and Liang [30] and Gao et al. [31] are also applied to the humanoid robotics ontologies, and the precision ratios obtained from the four ontology learning algorithms are compared. Partial experiment results are given in Tab. 4.
The experiment results presented in Tab. 4 imply that our ontology sparse vector learning algorithm performs more efficiently than the ontology learning algorithms of Gao and Lan [32], Gao and Liang [30] and Gao et al. [31], especially when N is sufficiently large.
Conclusion
In this paper, an affine transformation based ontology computation technique is presented. The technique is suitable for similarity measuring and ontology mapping in biological and chemical ontology engineering applications. The main approach rests on an affine transformation and its theoretical derivation. Finally, simulation data show that our ontology scheme achieves high efficiency in the biology, physics, plant and humanoid robotics fields. The ontology sparse vector learning algorithm proposed in our paper illustrates promising application prospects in multiple disciplines.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.