Associating concepts and synsets

At this point, we have, on the one hand, an ontology whose concept labels are disambiguated and, on the other hand, the possible senses found in each HTML page of the Web site, with their relative frequency and their evaluated convenience. In the next step, indexes are matched with the concepts of the ontologies. For each sense attached to a concept, we search for the same sense in the index. If it exists, the corresponding web pages and coefficients are attached to the concept.

So far, our process retrieves the pages containing concepts of the ontology. However, it does not take the weighted frequency of synsets into account. Consequently, a concept that appears only once in a page is enough for this page to be referenced by the ontology. For this reason, we added a frequency threshold: a concept is considered only if its weighted frequency (section 4.1) is greater than or equal to this threshold. In the next section, we present several indexation processes obtained by varying the threshold.
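The matching step and the threshold can be illustrated with a minimal sketch. It assumes, purely for illustration, that the disambiguated ontology is a mapping from concept labels to synset identifiers and that the index maps each page to the weighted frequency of its synsets. The names `associate`, `concept_synsets`, `page_index` and the sample values are hypothetical, and a single weight stands in for the coefficients stored in the index.

```python
from collections import defaultdict


def associate(concept_synsets, page_index, threshold=0.0):
    """Attach pages to ontology concepts through shared synsets.

    concept_synsets: {concept label: synset id} (assumed layout)
    page_index: {page: {synset id: weighted frequency}} (assumed layout)
    Returns {concept label: {page: weighted frequency}}.
    """
    # Invert the ontology so that each synset points at its concepts.
    concepts_by_synset = defaultdict(list)
    for concept, synset in concept_synsets.items():
        concepts_by_synset[synset].append(concept)

    associations = {concept: {} for concept in concept_synsets}
    for page, synsets in page_index.items():
        for synset, weight in synsets.items():
            # Keep a synset only if its weighted frequency is greater
            # than or equal to the threshold.
            if weight < threshold:
                continue
            for concept in concepts_by_synset.get(synset, []):
                associations[concept][page] = weight
    return associations


# Hypothetical data: two concepts, two pages.
ontology = {"Person": "person.n.01", "Course": "course.n.01"}
index = {
    "a.html": {"person.n.01": 0.4, "course.n.01": 0.05},
    "b.html": {"person.n.01": 0.02},
}

unfiltered = associate(ontology, index)
filtered = associate(ontology, index, threshold=0.1)
```

With a threshold of 0, every page that mentions a concept at least once is attached to it. With a threshold of 0.1, only `a.html` remains attached to `Person`, and `Course` is no longer attached to any page.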

To evaluate the appropriateness of an ontology with respect to a set of HTML pages, four typical coefficients are calculated:

Finally, a relevant ontology is one whose coefficients are close to 1.0. A high covering degree implies that a large proportion of the pages contain concepts of the ontology. A high direct indexing degree implies that many concepts of the ontology can be found in the pages. It is important that both coefficients be high together, because either one can be high on its own. For instance, a site may have a single page that contains the whole ontology (this gives an indexing degree of 1 and a low covering degree). Conversely, every page may contain a general concept such as "Entity" in its head (this gives a weak indexing degree but a covering degree equal to 1).
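The two situations described above can be reproduced with a small sketch. The formulas are assumptions inferred from the informal description only: the covering degree is taken to be the share of pages containing at least one concept, and the direct indexing degree the share of concepts found in at least one page. The exact definitions used in the paper may differ, and the other coefficients are not covered here. All names and data are hypothetical.

```python
def covering_degree(associations, pages):
    """Share of pages containing at least one ontology concept (assumed formula)."""
    if not pages:
        return 0.0
    covered = set()
    for hits in associations.values():
        covered.update(hits)
    return len(covered & set(pages)) / len(pages)


def direct_indexing_degree(associations):
    """Share of ontology concepts found in at least one page (assumed formula)."""
    if not associations:
        return 0.0
    found = sum(1 for hits in associations.values() if hits)
    return found / len(associations)


pages = ["p1", "p2", "p3", "p4"]

# Case 1: a single page contains the whole ontology.
one_page = {"Entity": {"p1"}, "Person": {"p1"}, "Course": {"p1"}}

# Case 2: every page contains only the general concept "Entity".
entity_everywhere = {"Entity": set(pages), "Person": set(), "Course": set()}

print(covering_degree(one_page, pages), direct_indexing_degree(one_page))
print(covering_degree(entity_everywhere, pages),
      direct_indexing_degree(entity_everywhere))
```

Under these assumed definitions, the first case yields an indexing degree of 1 with a covering degree of 0.25, and the second yields a covering degree of 1 with an indexing degree of one third.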

The indexation process can also highlight index entries that do not match any concept of the ontologies. In this case, we may search for ontologies related to these entries. In the future, the indexation process can be rerun either when the site content changes notably or when the ontologies in use are updated. In that case, the process can be restricted to the modified pages.