CISMeF: Catalog and Index of French-language Health Internet resources. A quality-controlled subject gateway

CISMeF Latest Presentation 
Slides
Stéfan J. Darmoni 2009

Document written by the CISMeF team.

Abstract

In 2007, the Internet is already a major source of health information. CISMeF ([French] acronym for Catalog and Index of French Language Health Resources on the Internet) is a quality-controlled health gateway that catalogs and indexes the most important and quality-controlled sources of institutional health information in French. The objective of CISMeF is to assist health professionals and consumers in their search for electronic information available on the Internet. In December 2007, the number of indexed resources totaled over 41,300, with a mean of 80 new resources each week. CISMeF was initiated by the Rouen University Hospital (RUH) in February 1995. Its Uniform Resource Locators (URLs) are http://www.chu-rouen.fr/cismef and http://www.cismef.org. CISMeF uses two standard tools for organizing information: the MeSH thesaurus of the Medline bibliographic database and several metadata element sets, including the Dublin Core. Resources included in CISMeF are described by the following: title, author or creator, subject and keywords, description, publisher, date, resource type, format, identifier, and language. To index resources, CISMeF uses four different concepts: "meta-term", keyword, subheading, and resource type. CISMeF contains a thematic index, including medical specialities, and an alphabetic index. CISMeF respects the Net Scoring and the HON Code, criteria to assess the quality of health information on the Internet. The CISMeF project offers a valuable tool for the French-speaking health community: over 30,000 different computers visit the Web site each working day.

MeSH keywords: Abstracting and indexing; Cataloging; France; Internet; MEDLINE; National Library of Medicine (U.S.); Subject Headings; Support, Non-U.S. Gov't; Vocabulary controlled

1. Introduction

In 2007, the Internet can be considered a major source of scientific and health information [1]. For the healthcare professional and the health consumer, access to accurate information on the Internet is not easy; a profusion of directories and search engines has therefore appeared in this new medium [2]. However, directories, such as Nomade [http://www.nomade.fr/], Carrefour [http://www.carrefour.net/] or Yahoo [http://www.yahoo.com], and search engines, such as Google [http://www.google.com], do not give the end-user a clear and organized presentation of the available health information. This limits the use of these tools. There is no catalogue of resources for a given medical speciality, e.g., neurology. These directories and search engines contain a large number of sites, but the organization and the hierarchy of the data are not adapted to health and medicine. In this field, there is a specific need for a specialized classification. Browsing up and down the tree of such a classification gives the user access to more information.

Furthermore, a rigorous human classification is mandatory to coherently organize heterogeneous resources, such as patient associations, electronic journals, mailing lists and clinical guidelines. In a previous study, we demonstrated that manually built catalogues were less sensitive but far more specific than search engines [3]. The main problem for the end-user is therefore to find useful information. Automated indexing has other drawbacks: difficulty in indexing non-text media, problems with word indexing (context, syntax, morphology, content, polysemy, synonyms and granularity), and lack of structured information (keywords, controlled thesaurus, metadata, e.g., resource types).

In this context, several quality-controlled health gateways have been developed. Quality-controlled subject gateways were defined by Koch [21] as Internet services which apply a comprehensive set of quality measures to support systematic resource discovery. Considerable manual effort is used to process a selection of resources which meet quality criteria and to display an extensive description and indexing of these resources with standards-based metadata. Regular checking and updating ensure optimal collection management. The main goal is to provide high-quality subject access by indexing resources with controlled vocabularies and by offering a deep classification structure for advanced searching and browsing. CISMeF ([French] acronym for Catalog and Index of French Language Health Resources on the Internet) is a quality-controlled health gateway that catalogs and indexes the most important and quality-controlled sources of institutional health information in French. The objective of CISMeF is to assist health professionals and consumers in their search for electronic information available on the Internet. In December 2007, the number of indexed resources totaled over 41,300, with a mean of 80 new resources each week. CISMeF was initiated by the Rouen University Hospital (RUH) in February 1995. Its Uniform Resource Locators (URLs) are http://www.chu-rouen.fr/cismef and http://www.cismef.org (S.J.D. and B.T. are the two co-webmasters).

The scope of CISMeF covers healthcare disciplines and medical sciences. This means that besides medical doctors, who were historically the first target, other health professionals, such as nurses, midwives, veterinarians, physiotherapists and nutritionists, will find resources devoted to their professions. More recently, patients and the general public have found valuable and reliable health information specially written for them. CISMeF describes and indexes a large number of health resources. The main health areas of the indexed resources in CISMeF are:

This resource guide is necessary because (a) there is an extensive amount of information potentially accessible to the health professional; (b) it is often difficult to separate information for the health professional from patient information; and above all (c) the absolute requirement in medicine is to know the source and the quality of the information available on the Internet.

In 2007 the CISMeF team is composed of four medical librarians, two medical informaticians, two research engineers, and three PhD students majoring in Computer Science.

2. Material and methods

2.1 Hardware and software

CISMeF was implemented in February 1995 on a Sun running Sun Unix. We changed the machine and the operating system in October 1999 from Unix to Linux. CISMeF is a Web site currently using the Apache HTTP server. The Webtrends Log Analyzer (version 4.5.2) is used to evaluate the use of the Web site after excluding requests from our Hospital Information System (3,500 computers).

2.2 Standards

From 1995 to 1999, CISMeF was entirely based on static HTML. Since June 2000, we have been developing Doc'CISMeF, a search tool. We originally used the HTML 2.0 standard in order to be readable by the vast majority of Web browsers. We now use the newer HTML 3.2 standard and XML.

To organize information, CISMeF uses two standard tools: a controlled vocabulary which 'encapsulates' the MeSH (Medical Subject Headings) thesaurus from the Medline bibliographic database [4] (US National Library of Medicine) and several metadata element sets, including the Dublin Core metadata format [5, 19], IEEE 1484 LOM and HIDDEL. Resources included in CISMeF are described by the following: title, author or creator, subject and keywords, description, publisher, date, resource type, format, identifier, and language. The use of specific metainformation is crucial in order to improve the recall and precision of Internet searches [26]. As proposed by Hoelzer et al. [26], CISMeF uses XML and RDF to meet these requirements. This structure enables us to place the project at an overlap between the current informal Web and the forthcoming Semantic Web.
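To illustrate how a description built from these ten elements could be serialized in XML, here is a minimal sketch using only the Python standard library. The namespace is the public Dublin Core element set; the `record` wrapper element, the dictionary layout, the sample title and the mapping of "author or creator" to `creator` and "resource type" to `type` are assumptions made for the example, not the actual CISMeF export format.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set, version 1.1
ET.register_namespace("dc", DC_NS)

# The ten descriptive elements listed above, as Dublin Core element names.
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "date", "type", "format", "identifier", "language")

def to_dublin_core_xml(record: dict) -> str:
    """Serialize a resource description (hypothetical dict layout) to XML."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for name in DC_ELEMENTS:
        values = record.get(name, [])
        if isinstance(values, str):
            values = [values]
        # Dublin Core elements are optional and repeatable.
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical record: a clinical guideline about carbon monoxide poisoning.
example = {
    "title": "Example guideline on carbon monoxide poisoning",
    "subject": ["carbon monoxide poisoning"],
    "type": ["clinical guidelines"],
    "language": "fr",
    "format": "text/html",
}
xml_text = to_dublin_core_xml(example)
```

Elements missing from the dictionary are simply omitted from the output, which matches the optional nature of Dublin Core elements.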

We have organized our catalogue with the MeSH thesaurus [4], which contains around 25,000 terms in its 2008 version and eleven levels of classification. The MeSH was selected because it meets the aims of the medical librarians and is well known by health professionals. This thesaurus is precise, rigorous and annually updated. We also use the French translation of this thesaurus [http://dicdoc.kb.inserm.fr:2010/basismesh/mesh.html], performed by the French Medlars Center, the National Institute for Health and Medical Research (INSERM, and more specifically the DIC-DOC Network). In some rare cases, a resource cannot be perfectly indexed using the MeSH thesaurus: we then use "manual mapping" to find the nearest MeSH term, e.g. a resource dealing with dysmelia has been indexed with the MeSH term ectromelia. Thanks to the VUMeF project, we have added 8,000 CISMeF French synonyms to the MeSH thesaurus and translated 5,000 MeSH Scope Notes into French. All this specific information is available in the CISMeF MeSH Terminology Server, which is also a cross-lingual tool to access PubMed [25].

The MeSH terms are organized into hierarchies going from the most general at the top of the hierarchy to the most specific at the bottom. For example, the MeSH term hepatitis is more general than the MeSH term hepatitis viral A. The qualifiers, also organized into hierarchies, make it possible to specify which particular aspect of a keyword is addressed, and thus to focus on a sub-field of the keyword. For example, the association of the keyword hepatitis with the qualifier diagnosis (noted hepatitis/diagnosis) restricts hepatitis to its diagnosis aspect. The “is-a” relations between concepts are extracted from the MeSH text files to define the subsumption relationships in the CISMeF keywords hierarchy.
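The hierarchy and the keyword/qualifier notation can be sketched in a few lines of Python. This is a simplified illustration, not the CISMeF implementation: the `PARENT` table is a hypothetical three-term excerpt, each term is given a single parent (a real MeSH term can belong to several trees), and all function names are invented for the example.

```python
# Toy "is-a" hierarchy: child term -> parent term (hypothetical excerpt).
PARENT = {
    "hepatitis viral A": "hepatitis",
    "hepatitis": "liver diseases",
}

def is_subsumed_by(term, ancestor):
    """True if `term` equals `ancestor` or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)  # climb one level; None at the top
    return False

def explode(term):
    """Return `term` plus all of its narrower terms."""
    all_terms = set(PARENT) | set(PARENT.values())
    return {t for t in all_terms if is_subsumed_by(t, term)}

def parse_pair(pair):
    """Split 'keyword/qualifier' notation; the qualifier is None when absent."""
    keyword, _, qualifier = pair.partition("/")
    return keyword.strip(), (qualifier.strip() or None)
```

With this table, `explode("hepatitis")` returns the term itself and hepatitis viral A, and `parse_pair("hepatitis/diagnosis")` separates the keyword from its qualifier.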

However, the MeSH thesaurus was originally intended to index scientific articles for the Index Medicus and for the MEDLINE database. In order to customize it to the broader field of health Internet resources, we have been developing several enhancements [22] to the MeSH thesaurus, with the introduction of two new concepts, respectively metaterms (MT) and resource types (RT). The CISMeF terminology is shown in Figure 1. CISMeF resource types (RT) are an extension of the publication types of MEDLINE.

The CISMeF terminology is exploited for several tasks: resource indexing performed manually, resource categorization performed automatically, visualization and navigation through the concept hierarchies with a CISMeF Terminology Server (URL: http://www.chu-rouen.fr/terminologiecismef/), and information retrieval using the Doc’CISMeF search engine.

Figure 1: Semantic links between CISMeF metaterms and MeSH terms, MeSH subheadings and CISMeF resource types

2.2.1 Metaterms

CISMeF metaterms [9] correspond to medical specialties (e.g. cardiology), types of medical procedures (e.g. surgery) or health topics (e.g. diagnosis, therapy), which have semantic links with one or more MeSH terms, subheadings and RTs.

The idea of creating metaterms arose from the need to optimize information retrieval (in particular to maximize the recall) in CISMeF (Doc’CISMeF search engine; URL: http://doccismef.chu-rouen.fr/servlets/Simple) and to cope with the relatively restrictive nature of medical specialties as MeSH terms, when searching for 'guidelines in cardiology' or 'databases in virology', where cardiology and virology are metaterms and guidelines and databases are resource types. The MeSH thesaurus does not provide a global view of a medical specialty. Therefore, in the CISMeF terminology, metaterms can be considered as “metaconcepts”. Metaterms have been manually selected by the chief medical librarian (BT). The semantic links between metaterms and MeSH terms, MeSH subheadings and CISMeF resource types are based on his know-how and on the expertise of medical specialists of the Rouen University Hospital. There is a 0-to-N relation between CISMeF metaterms and MeSH terms, MeSH subheadings and CISMeF resource types.

 

Each metaterm has a semantic link with the corresponding MeSH term, e.g. the metaterm cardiology has a semantic link with the MeSH term cardiology. For instance, the queries 'guidelines in cardiology' and 'databases in psychiatry' where cardiology and psychiatry are only MeSH keywords get few or no answers.

Introducing cardiology and psychiatry as metaterms is an efficient strategy to get more results: instead of exploding one single MeSH tree (e.g. psychiatry as a MeSH term), using metaterms results in an automatic expansion of the queries by exploding other related MeSH terms, such as psychiatric hospital, which belongs to a completely different tree structure within the MeSH, or CISMeF trees (for resource types), as well as the current tree (e.g. psychiatric hospital as a MeSH keyword or mental health dispensary as a resource type will be exploded in the case of the psychiatry query). For example, the metaterm psychiatry has the following semantic links: [MeSH Terms] "behavioral symptoms"; "community mental health centers"; "diagnostic and statistical manual of mental disorders"; "hospitals, psychiatric"; "mental disorders"; "mental health services"; "mentally ill persons"; "psychiatric department, hospital"; "psychiatric somatic therapies"; "psychiatric status rating scales"; "psychiatry"; "psychological techniques"; "psychophysiologic disorders"; "psychotherapy"; "psychotropic drugs"; "schizophrenic psychology"; [CISMeF resource types] "community mental health centers"; "hospitals, psychiatric".
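The query expansion described above can be sketched as follows. This is an illustrative model only: the `METATERM_LINKS` table is an abridged copy of the psychiatry links listed above, the dictionary layout and function names are assumptions, and the further explosion of each linked term into its narrower terms is left out for brevity.

```python
# Semantic links of one metaterm, abridged from the psychiatry example above.
METATERM_LINKS = {
    "psychiatry": {
        "mesh_terms": {"psychiatry", "mental disorders", "hospitals, psychiatric",
                       "psychotherapy", "psychotropic drugs"},
        "resource_types": {"community mental health centers",
                           "hospitals, psychiatric"},
    },
}

def expand_query_term(term):
    """Expand a metaterm into its linked terms; other terms pass through unchanged."""
    links = METATERM_LINKS.get(term)
    if links is None:
        # Not a known metaterm: treat it as a plain MeSH term.
        return {"mesh_terms": {term}, "resource_types": set()}
    return {key: set(values) for key, values in links.items()}

def matches(resource, expansion):
    """A resource matches if it shares a MeSH term or a resource type with the expansion."""
    return bool(resource["mesh_terms"] & expansion["mesh_terms"]
                or resource["resource_types"] & expansion["resource_types"])
```

A resource indexed only with psychotherapy is retrieved by the expanded psychiatry query, whereas a search on the single MeSH term psychiatry would miss it unless psychotherapy were in the same tree.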

 

In January 2007, the number of metaterms in the CISMeF terminology was 110. The comprehensive list of metaterms is available at the following URL: http://doccismef.chu-rouen.fr/liste_des_meta_termes_anglais.html

Major Topics exist in the Medline database and the CISMeF catalogue for keywords and qualifiers. A term is said to be “major” if the concept it represents is discussed throughout the whole document, or on the contrary "minor" if it is referred to only in a few paragraphs. Major terms are marked in Medline and CISMeF by a star. In CISMeF, Major Topics are extended to resource types and metaterms. This task is manually performed by CISMeF medical librarians for resource types, and automatically performed for metaterms: a metaterm is “major” for a CISMeF resource if and only if at least one keyword, qualifier or resource type semantically linked to this metaterm is major for the same CISMeF resource (otherwise, the metaterm is minor). In a comparative study of six European health gateways performed by Abad Garcia et al. [29], CISMeF was rated second, but it was criticized because “failure on precision may be due to exhaustive indexing” [29]. To optimize the precision of our health gateway, we have introduced a major modification of the CISMeF information retrieval algorithm: when a query is mapped to one or several terms of the CISMeF terminology (CISMeF metaterms, MeSH terms, MeSH subheadings, CISMeF resource types), the resources with Major Topics are displayed first (e.g. for the query 'guidelines in cardiology', resources with Major Topic cardiology as a metaterm and Major Topic guideline as a resource type are displayed first).
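The two rules in this paragraph (automatic assignment of major status to metaterms, and display of Major Topic resources first) can be expressed as a short sketch. The data layout, the function names and the sample term sets are hypothetical; the ranking shown assumes that a resource is promoted only when every query term is a Major Topic for it, which is one reading of the 'guidelines in cardiology' example.

```python
def metaterm_is_major(linked_terms, major_terms):
    """A metaterm is major iff at least one term linked to it is major for the resource."""
    return bool(linked_terms & major_terms)

def rank_major_first(resources, query_terms):
    """Stable sort: resources where every query term is a Major Topic come first."""
    def sort_key(resource):
        return 0 if query_terms <= resource["major"] else 1
    return sorted(resources, key=sort_key)

# Hypothetical resources; "major" holds the terms starred for each resource.
resources = [
    {"id": "r1", "major": {"guidelines"}},
    {"id": "r2", "major": {"cardiology", "guidelines"}},
]
ranked = rank_major_first(resources, {"cardiology", "guidelines"})
```

Because `sorted` is stable, resources within each group keep their original relative order, so any prior ordering (for example by date) is preserved.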

The optimization of information retrieval was demonstrated in 2007 by a formal study by Gehanno et al. [28], which showed that the corresponding MeSH terms had the same precision as the eponymous metaterms, whereas the recall of the MeSH term was 0.44 and the recall of the metaterm was 1. To construct a taxonomy of medicine, the publishing division of the American Medical Association (AMA) took as its precedent the simplified access to MeSH via CISMeF metaterms [27].

Metaterms have semantic links with the three other levels of the CISMeF terminology: MeSH terms, qualifiers and resource types. Each metaterm has a semantic link with one or more keywords, subheadings and resource types. Each term of the CISMeF terminology can have a set of synonyms and can belong to several trees. For example, on the "oncology" page, the sites of general interest on this specialty were indexed and described using the MeSH keyword "medical oncology", followed by a list of starting points of MeSH terms. The MeSH terms are: (a) antineoplastic agents, (b) medical oncology, (c) neoplasms, (d) tumor markers, biological, and (e) oncology service, hospital. The subheading is "secondary" and the resource type is "oncology service, hospital". The controlled list of metaterms is available at the following URL: http://www.chu-rouen.fr/ssf/santspe.html.

2.2.2. Resource types

As defined by the Dublin Core Metadata Initiative (URL: http://www.dublincore.org/documents/dcmi-terms/), an RT is used to categorize the nature or genre of the content of the resource. MeSH (term/subheading) pairs describe the topic of the resource. RT is one of the fifteen Dublin Core repeatable and optional elements. Compared to the publication types of MEDLINE, the CISMeF RTs are more diverse, with specific RTs dedicated to electronic health resources, such as association, patient information, community networks, or clinical guidelines. For example, in the case of a clinical guideline about carbon monoxide intoxication, ‘carbon monoxide poisoning’ is the MeSH term and ‘clinical guidelines’ is the resource type. CISMeF RTs are organized similarly to MeSH terms and subheadings, in a hierarchical structure with subsumption relationships (allowing the explode property) and a maximum depth of five levels. The MEDLINE publication types were mainly a flat list until 2005 (see URL: http://www.nlm.nih.gov/mesh/pubtypes2005.html). Since 2006, MEDLINE publication types have also had a hierarchical structure.

 

In preparation for a PhD thesis aiming at the automatic indexing of bimodal resources (text + image) in CISMeF, we have introduced new resource types, which are all medical image types. Most of the medical image types (n=112) are MeSH terms from the diagnostic imaging term and its sub-tree of narrower terms. The overall number of resource types in December 2007 is 295. The controlled list of resource types is available at the following URL: http://www.chu-rouen.fr/documed/typeeng.html. This RT list has been manually built and maintained by the CISMeF team since 1997.

Nonetheless, this list is largely inspired by the MeSH thesaurus, as 76% of the resource types are deliberately ambiguous because they are also MeSH terms (e.g. magnetic resonance imaging). The objective of this ambiguity is to maximize the recall, and thus the number of search answers, when the request contains this kind of ambiguous term: the Doc’CISMeF search ORs the answers for the MeSH term and the answers for the RT. Furthermore, to be as close to a standard as possible, 11% are also MEDLINE publication types (e.g. technical report).
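The OR behaviour for ambiguous terms amounts to a set union of two result lists. The sketch below assumes two hypothetical inverted indexes (one keyed by MeSH term, one keyed by resource type) and invented resource identifiers; it only illustrates the principle.

```python
def search_ambiguous_term(term, index_by_mesh, index_by_rt):
    """Return resources indexed with `term` as a MeSH term OR as a resource type."""
    return index_by_mesh.get(term, set()) | index_by_rt.get(term, set())

# Hypothetical inverted indexes: term -> set of resource identifiers.
index_by_mesh = {"magnetic resonance imaging": {"res-1", "res-2"}}
index_by_rt = {"magnetic resonance imaging": {"res-2", "res-3"}}

hits = search_ambiguous_term("magnetic resonance imaging", index_by_mesh, index_by_rt)
```

A resource that discusses magnetic resonance imaging in its text and one that merely contains such an image are both returned, and a resource present in both indexes is returned only once.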

 

The medical image types list has been reviewed and updated manually by one medical imaging expert (JND). Medical image types are used in the CISMeF database to index a health resource containing images. For example, a teaching resource about choledocholithiasis, which includes ultrasonography images, will be indexed with the RTs ultrasonography and teaching material.

If this teaching resource contains a paragraph describing the ultrasonography of choledocholithiasis, the librarian will use the MeSH (term/subheading) pair (choledocholithiasis/ultrasonography). If necessary, two other subheadings can be used: ‘radiography’ and ‘radionuclide imaging’.

In the case of other medical image types, MeSH terms will be used. For example, if the resource about choledocholithiasis includes text about magnetic resonance imaging, it will be indexed with the two MeSH terms 'choledocholithiasis' and 'magnetic resonance imaging'. If there is an image of magnetic resonance imaging, it will be indexed with the RT 'magnetic resonance imaging'.

Affiliation of resource type

Until the introduction of image types, we had not considered using the existing RTs for affiliation to a specific aspect of a MeSH term or of a (term/subheading) pair [30]. In April 2004, the creation of medical image types led us to propose a refinement of the CISMeF terminology and thus a refinement of manual indexing procedures: for a number of specific RTs, mainly image types but not exclusively (e.g. multiple choice quiz also applies here), we proposed to affiliate an RT to a MeSH term or to a MeSH (term/subheading) pair. Thus we can obtain a [MeSH (term/subheading)\CISMeF RT] triplet, where the backslash character ‘\’ represents the RT affiliation (the slash character ‘/’ represents in MEDLINE the affiliation of a subheading to a term). This approach can be viewed as an extension of the affiliation of a subheading to a MeSH term.

Therefore, a teaching resource about choledocholithiasis, which includes ultrasonography images, should be indexed with the (term\RT) pair (choledocholithiasis\ultrasonography). If the teaching resource specifies, for example, that ultrasonography images are valuable for the diagnosis of choledocholithiasis, the [MeSH (term/subheading)\CISMeF RT] triplet [(choledocholithiasis/diagnosis)\ultrasonography] should be used. Another triplet can be seen in the following description of a resource indexed in the CISMeF catalogue; the triplet “atrial fibrillation/diagnosis\electrocardiography” indicates that this resource contains an EKG image that contributes to the diagnosis of atrial fibrillation.

Atrial fibrillation (The) –

French pre-residency program examination : question 236 –

Pr Vanzetto G (Grenoble University), M. Defaye P. (Grenoble University)

[publisher : Joseph Fourier University, Medical School ; definition, etiology, physiopathology, anatomical sequels, diagnosis, evolution and prognosis, treatment ; printable version, references, prerequisites, exercises ; language : French ; format : xml ; access : free ; date : 2005 ; visited in November 2006]. Grenoble-Fr

keywords : cardiology / education ; *atrial fibrillation ; atrial fibrillation / diagnosis \ electrocardiography ; atrial fibrillation / etiology ; atrial fibrillation / physiopathology ; atrial fibrillation / therapy ; prognosis ; signs and symptoms
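The keyword line of the record above combines the three notations: a leading star for a Major Topic, a slash for the subheading, and a backslash for the affiliated resource type. The following parser is a minimal sketch of how such a line could be decomposed; the `IndexEntry` structure and the function names are assumptions, and the sketch handles only the syntax visible in this example.

```python
from typing import NamedTuple, Optional

class IndexEntry(NamedTuple):
    term: str
    subheading: Optional[str]
    resource_type: Optional[str]
    major: bool

def parse_entry(text):
    """Parse one entry of the form: [*]term [/ subheading] [backslash RT]."""
    text = text.strip()
    major = text.startswith("*")      # a leading star marks a Major Topic
    text = text.lstrip("*")
    head, _, rt = text.partition("\\")        # backslash: RT affiliation
    term, _, subheading = head.partition("/")  # slash: subheading affiliation
    return IndexEntry(term.strip(), subheading.strip() or None,
                      rt.strip() or None, major)

def parse_keywords(line):
    """Split a ';'-separated keyword line into parsed entries."""
    return [parse_entry(part) for part in line.split(";") if part.strip()]

entries = parse_keywords(
    r"*atrial fibrillation ; atrial fibrillation / diagnosis \ electrocardiography ; prognosis"
)
```

The second entry is parsed as the full triplet (term atrial fibrillation, subheading diagnosis, resource type electrocardiography), while a (term\RT) pair such as choledocholithiasis\ultrasonography yields an entry with no subheading.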

 

2.3 Methodology and realization of the catalogue

CISMeF is a quality-controlled subject gateway. Koch defines a quality-controlled subject gateway as an Internet service that applies a set of quality measures to support systematic resource discovery. Considerable manual effort is used to secure a selection of resources that meet quality criteria and to display a rich description of these resources with standards-based metadata. Regular checking and updating ensure good collection management. A main goal is to provide high-quality subject access by indexing resources with controlled vocabularies and by offering a deep classification structure for advanced searching and browsing.

Each of the following elements proposed by Koch, which characterize a typical quality-controlled subject gateway, is implemented in CISMeF:

To allow interoperability with other Internet services, gateways apply open standards. CISMeF uses two standard tools for organizing information: the MeSH (Medical Subject Headings) thesaurus from the US National Library of Medicine, and several metadata element sets: (a) the Dublin Core metadata format to describe and index all the health resources included in CISMeF, (b) some elements from IEEE 1484 Learning Object Metadata for teaching resources, (c) specific metadata for evidence-based medicine resources, which also qualify the health content, and (d) the HIDDEL metadata set, which will be used to enhance transparency, trust and quality of health information on the Internet in the EU-funded MedCIRCLE project.

2.3.1 Why index only French-speaking Health resources

In November 1994, when the RUH was first connected to the Internet, we rapidly observed the absence of a specific catalogue of French-speaking health resources. By contrast, several very good catalogues of English-speaking resources, such as DDRT (Diseases, Disorders and Related Topics [http://www.mic.ki.se/Diseases/index.html], Medical Library and Medical Information Center, Karolinska Institute, Stockholm, Sweden) or MedWeb (Health Science Center Library of Emory University, US [http://www.medweb.emory.edu/MedWeb/]), were already in operation (see also our selection of these catalogues at the following URL: http://www.chu-rouen.fr/ssm/listemed.html). Therefore, since its creation, CISMeF has only catalogued and indexed French-speaking health resources, independent of their origin.

The CISMeF method entails a four-fold process: resource collection, filtering, description and indexing. One deputy medical librarian (J.P.) performs the resource collection and the information watch. The editorial board filters and selects the resources. Three deputy medical librarians (Catherine Letord, Gaétan Kerdelhué and Josette Piot) describe and index resources. The chief medical librarian (B.T.) is a ‘super-indexer’ in charge of checking the indexing.

From 1995 to 2006, CISMeF was exclusively manually indexed by a team of four indexers, who are medical librarians. There are regular meetings with one medical informatician (S.J.D.) for double-checking.

Since 2002, automatic indexing tools have been developed by the CISMeF team, using primarily natural language processing (NLP) and K-nearest neighbours (KNN) methods [23], followed by a simpler bag-of-words algorithm [24]. The latter was successfully evaluated in the context of teaching resources. The CISMeF team then decided to use this algorithm in daily practice for most Internet resources (except guidelines, which are still manually indexed because this type of resource needs in-depth indexing). Three levels of indexing were defined in the CISMeF catalogue: (a) level 1 or Core-CISMeF (N=17,806): totally manually indexed resources (e.g. guidelines); (b) level 2 or supervised resources (N=5,257): these resources are rated by the CISMeF editorial board as less important than level 1 and therefore do not need in-depth indexing (e.g. technical reports, teaching resources designed at the national level, documents for patients from medical specialties). Supervision means that these resources are first automatically indexed, and this indexing is then reviewed by a CISMeF medical librarian; (c) level 3 or automatically indexed resources (N=18,023): the CISMeF editorial board has rated these resources as less important than level 1 and level 2 (e.g. teaching resources designed at the medical school level, patient association Web sites).
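The three indexing levels can be summarized as a small data model. This sketch only restates the workflow and the counts given above; the enum, the step labels and the function name are hypothetical and do not describe the internal CISMeF tooling.

```python
from enum import Enum

class IndexingLevel(Enum):
    MANUAL = 1      # level 1, Core-CISMeF: totally manually indexed
    SUPERVISED = 2  # level 2: automatic indexing reviewed by a librarian
    AUTOMATIC = 3   # level 3: automatic indexing only

# Number of resources reported above for each level.
LEVEL_COUNTS = {
    IndexingLevel.MANUAL: 17_806,
    IndexingLevel.SUPERVISED: 5_257,
    IndexingLevel.AUTOMATIC: 18_023,
}

def indexing_steps(level):
    """List the processing steps a resource goes through at a given level."""
    if level is IndexingLevel.MANUAL:
        return ["manual indexing"]
    steps = ["automatic indexing"]
    if level is IndexingLevel.SUPERVISED:
        steps.append("librarian review")
    return steps

total = sum(LEVEL_COUNTS.values())  # 41,086 resources across the three levels
```

The three counts add up to 41,086, which is consistent with the figure of over 41,000 indexed resources quoted elsewhere in this document.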

The CISMeF team then wanted to create a level 4, which could be defined as the exhaustive set of automatically indexed pages from the CISMeF publishers. The latter are defined as the publishers that have at least one Internet resource included in the CISMeF catalogue. Instead of reinventing the wheel, we decided to use a customized version of a generic search engine (Google): Google Coop.

2.3.2. Resource collection

The resource collection is performed on a daily basis and is partially automated. French-speaking directories and search engines, such as Carrefour, Ecila, Eureka, Francite, Nomade and Toile du Quebec, are checked, especially their "what's new" pages (see http://www.chu-rouen.fr/documed/docum.html#VEILLE). A total of 1,779 health webmasters have sent us an e-mail or a specific form asking to be indexed in CISMeF. Indexing priority is given to Internet sites of institutions and scientific societies. Resources include sites and high-quality documents, especially those stemming from evidence-based medicine, practice guidelines, consensus development conferences, teaching and education resources, and consumer health information.

2.3.3. Filtering and selection: how the information is validated

In order to include only reliable resources, CISMeF uses the main criteria (e.g. source, description, disclosure, last update) of the Net Scoring [6] and of the HON Code to assess the quality of health information on the Internet. There are 49 criteria, which fall into eight categories: credibility, content, hyperlinks, design, interactivity, quantitative aspects, ethics and accessibility. All the criteria were chosen by expert consensus. Some of these criteria were inspired by a US white paper [7] and some are included in the Dublin Core metadata format. The description of a site should permit the evaluation of the quality of its content. Some resources are not included in the catalogue because they do not meet basic criteria, particularly ethical ones. The quality of health information is a key point, especially for patients and their families.

2.3.4. Description and index

Cataloguing a site is necessary because it helps the end-user to estimate, in advance, the type of information and to evaluate its content. This process also saves time for the end-user.

Until 2001, each keyword in CISMeF was "de facto" a MeSH Major Topic, with a mean of 1.87 MeSH terms per resource. Since 2001, each keyword can be used in CISMeF as major or minor, as in Medline, with a mean of 2.5 MeSH terms and a mean of 1.61 qualifiers per resource. A MeSH subheading permits a focus on a sub-field of a MeSH term, e.g., chloride/toxicity. We also use a French translation of the MeSH subheadings [http://dicdoc.kb.inserm.fr:2010/basismesh99/ind_fr_eng.html]; subheadings are used less systematically in CISMeF than in Medline. The CISMeF resource type [8] is a generalization of the publication type of Medline. We have added types which are specific to the resources available on the Internet, such as association, patient information and community networks. The controlled list of resource types is available in Table 1. The resource type describes the nature of the resource and MeSH describes the subject of the resource. For example, in the case of a clinical guideline about carbon monoxide intoxication, ‘carbon monoxide poisoning’ is the MeSH keyword and ‘clinical guidelines’ is the resource type. In CISMeF, each description also contains the geographic localization, including the city, the province or state, and the country of origin.

Example of a description of a document indexed in CISMeF:

Alcool et risque de cancers Etat des lieux des données scientifiques et recommandations de santé publique [publisher site : INCA Institut National du Cancer] general introduction, alcohol metabolism and associated genetic polymorphisms, alcohol and cancers of the upper aerodigestive tract (UADT), alcohol and liver cancer, breast cancer, colorectal cancer, other cancers, public health issues, recommendations, epidemiological note, appendix ; 60 pages [language : French ; format : html ; access : free and open ; non-sponsored site ; dated 01/11/2007 ; visited on 13/12/2007]. -Fr
keywords : *alcoholic beverages /adverse effects ; ethanol /adverse effects ; ethanol /metabolism ; *risk factors ; *alcohol-related disorders ; alcohol-related disorders /epidemiology ; alcohol-related disorders /genetics ; alcohol-related disorders /prevention and control ; *neoplasms ; colorectal neoplasms /etiology ; head and neck neoplasms /etiology ; liver neoplasms /etiology ; breast neoplasms /etiology
type : *technical report ; *public health recommendation

 2.3.5. Structure of the CISMeF catalogue

CISMeF contains a thematic index, including medical specialties and biological sciences  [http://www.chu-rouen.fr/ssf/santspe.html] and an alphabetic index [http://www.chu-rouen.fr/ssf/santpath.html] (see Figure 1: CISMeF homepage screenshot). Both indexes use the Medline thesaurus. A brief description of each site indexed in CISMeF is systematically added.

In CISMeF, each MeSH term corresponds to a static HTML document, which is organized first by MeSH subheading and then, for each subheading, by resource type (e.g. CISMeF mercury resources screenshot).

The alphabetic index uses the MeSH terms in English and their French translation, which permits bilingual search. The thematic index contains 112 meta-terms, which correspond to medical specialties in France, such as Aerospace Medicine. In addition, a general index is available, which contains MeSH synonyms in French and allows permuted use. Currently, we have indexed over 41,000 resources with over 15,800 MeSH terms. A mean of 80 resources is indexed each week.

2.3.6. CISMeF Metadata element set

Metadata are documentation about documents and objects, or structured information about information. On the Internet, metadata refers to descriptive information about Web resources and is used to improve information retrieval. Metadata represents the content, structure and logistical information of any information object, including electronic resources, and is used for data discovery and control. It is important for customization, as it helps select suitable resources for a particular user and his/her particular needs. Metadata can greatly enhance information retrieval and enable accurate matches while remaining transparent and invisible to the user, but this depends on the quality of the metadata used.

The DCMI is a metadata element set intended to facilitate the discovery of electronic resources (http://dublincore.org). Originally conceived for an author-generated description of Web resources, the DCMI is now used by museums, libraries, government agencies, and commercial organizations alike.

There is a need for an interoperable infrastructure for Digital Libraries, quality-controlled subject gateways, and other Web-based services that rely on cross-institutional and cross-border co-operation. Agreement on a metadata standard that serves as a starting point for information exchange in specific domains and provides common ground for cross-domain interoperability is a crucial element of this infrastructure. The main metadata standard with this cross-domain perspective is the Dublin Core, now recommended across Europe for use in many sectors as the standard of choice to ensure interoperability between resource discovery systems on the Internet.

There exist two types of metadata:

Stand-alone metadata (annotations) are external comments, notes, or remarks that can be attached to any Web document or to a selected part of a document. They are classified as metadata because they give additional information about an existing piece of data (the annotated resource).

In CISMeF, we do not impose any structural changes on the peripheral resources or their hosting servers: after the resources have been collected and filtered, the selected ones are indexed and described by an annotation (an HTML or XML file) created by the librarians. The metadata allows the resource to be indexed by its informational content, using a set of keywords, qualifiers and resource types from the CISMeF terminology.

The metadata in CISMeF is composed of several elements:

2.3.6.1 The Dublin Core Metadata Initiative (DCMI)

The Dublin Core Metadata Initiative (DCMI) is a project of the Online Computer Library Center (OCLC) and the National Center for Supercomputing Applications (NCSA). The Dublin Core (DC) was conceived at a workshop convened by OCLC and NCSA in 1995 in Dublin, Ohio, USA. Its 15-element set emerged from a series of international invitational workshops held since 1995, at which broad consensus was reached among experts in resource description, networking, encoding standards, information retrieval and a range of subject disciplines. The DC is intended to aid the discovery of electronic resources. The current DC describes the following elements, covering resource content, intellectual property and instantiation: title, creator, subject (using a controlled vocabulary), description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage and rights. Each element is optional and repeatable.

The construction of an interdisciplinary, international consensus around a core element set is the central feature of the DCMI which benefits from active participation and promotion in over 20 countries in North America, Europe, Australia, and Asia. The DCMI is intended to be used by non-cataloguers as well as resource description specialists.

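DCMI describes two broad classes of DC qualifiers: element refinements, which narrow the meaning of an element, and encoding schemes, which identify the scheme used to interpret an element value.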

Resources included in CISMeF are described by 11 of 15 items taken from version 1.1 of the DCMI (http://dublincore.org/documents/dces/). These are: author or creator, date, description, format, identifier, language, publisher, resource type, rights, subject and keywords, and title. CISMeF does not use the 4 other DCMI items (contributor, coverage, relation, source).

To capture more information for each health resource indexed in CISMeF, another element set was developed locally to meet specific search and retrieval needs. The following eight fields, specific to CISMeF, are added to the data and metadata: institution, city, province or state, country, target or audience, type of access, cost and sponsorship. Some of these fields (e.g., cost) are also present in LOM.

Since 2000, CISMeF has also included a database and a search tool, which generates an HTML (or XML) page for every indexed resource. These metadata elements were written and updated manually by the CISMeF team from 1995 to 1999 and are now created and updated automatically from the CISMeF database.
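
As an illustration of how a database record could be rendered into page metadata, here is a minimal sketch. It assumes the common convention of `<meta name="DC.xxx">` tags for Dublin Core in HTML; the `CISMeF.` prefix, the field spellings and the function `to_meta_tags` are hypothetical and are not taken from the actual CISMeF pages.

```python
from html import escape

# The 11 Dublin Core elements used by CISMeF, in DC 1.1 spelling.
DC_ELEMENTS_USED = {
    "creator", "date", "description", "format", "identifier", "language",
    "publisher", "type", "rights", "subject", "title",
}
# The eight locally defined fields (hypothetical spellings).
CISMEF_ELEMENTS = {
    "institution", "city", "province_or_state", "country",
    "audience", "access_type", "cost", "sponsorship",
}


def to_meta_tags(record):
    """Render a record as a list of HTML <meta> tags; elements are repeatable."""
    tags = []
    for name, value in record.items():
        if name in DC_ELEMENTS_USED:
            prefix = "DC"
        elif name in CISMEF_ELEMENTS:
            prefix = "CISMeF"  # hypothetical namespace prefix
        else:
            raise ValueError(f"unknown metadata element: {name}")
        values = value if isinstance(value, list) else [value]
        for item in values:
            content = escape(item, quote=True)
            tags.append(f'<meta name="{prefix}.{name}" content="{content}">')
    return tags


# Values taken from the example description given earlier in this document.
record = {
    "title": "Alcool et risque de cancers",
    "date": "2007-11-01",
    "language": "fr",
    "format": "html",
    "type": ["rapport technique", "recommandation de santé publique"],
    "cost": "gratuit",
}
tags = to_meta_tags(record)
```

Because each element is repeatable, the two resource types of the example produce two separate `DC.type` tags.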

2.3.6.2 The IEEE 1484 Learning Objects Metadata (LOM)

The IEEE 1484 Learning Object Metadata (LOM) (http://ltsc.ieee.org/doc/wg12/LOM3.6.html) version 3.6 contains over 60 elements in the following nine categories: General, Lifecycle, Meta-metadata, Technical, Educational, Rights, Relation, Annotation, Classification. LOM metadata includes the 15 DCMI elements.

CISMeF is one of the search tools of the French Medical Virtual University (FMVU) Consortium, which includes 8 medical schools. This consortium was created to experiment with the various tools and methods necessary to build a virtual university (http://www.umvf.prd.fr). To describe and index teaching resources, the consortium has decided to use in its search tools only the 11 elements of the LOM Educational category, because they are the most specific. A feasibility study showed that the CISMeF team spends an average of 30 minutes describing and indexing a teaching resource with the Dublin Core set and needs 30 more minutes for the LOM Educational subset. The LOM cost field is also used in CISMeF.

2.3.6.3 The Evidence Based Medicine Metadata

CISMeF uses two specific metadata elements for EBM resources and, more broadly, for ‘sensitive’ information. Sensitive information is defined as information found in documents published on the Internet that could be used in a medical decision. These two metadata elements are: (a) an indication of the level of evidence, which we proposed as the main criterion for the quality of health information content, and (b) the method used to calculate the level of evidence, as more than twenty are currently used in the literature. CISMeF is a quality-controlled health gateway that explicitly indicates whether the level of evidence is mentioned for each indexed 'sensitive' document. Furthermore, this criterion is searchable using the Doc'CISMeF search tool.

Example:

asthma[MeSH term] AND guidelines[resource type] => 42 resources

see

 http://doccismef.chu-rouen.fr/servlets/Simple?Mot=asthma+guidelines&aff=4&tri=20&datt=1&debut=0

(asthma[MeSH term] AND guidelines[resource type]) LIMIT to those explicitly indicating the level of evidence => 7 resources

 http://doccismef.chu-rouen.fr/servlets/Logique?Mot=asthma.mc&chreco=1&chpreuve=1&moisdeb=0&anneedeb=0&moisfin=0&anneefin=0&aff=4&tri=20&datt=1&debut=0&rechercher.x=41&rechercher.y=19
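
The logic of these two queries can be sketched as a conjunctive filter with an optional limit on the level-of-evidence flag. This is a toy illustration and not the Doc'CISMeF implementation: the function `search`, the field names and the four-record catalogue are hypothetical.

```python
def search(catalogue, mesh_term, resource_type, require_evidence_level=False):
    """Return ids of resources matching a MeSH term AND a resource type.

    When require_evidence_level is True, keep only resources that
    explicitly indicate a level of evidence.
    """
    hits = []
    for res in catalogue:
        if mesh_term not in res["mesh_terms"]:
            continue
        if resource_type not in res["resource_types"]:
            continue
        if require_evidence_level and not res["evidence_level_indicated"]:
            continue
        hits.append(res["id"])
    return hits


# Hypothetical catalogue entries, for illustration only.
catalogue = [
    {"id": 1, "mesh_terms": {"asthma"}, "resource_types": {"guidelines"},
     "evidence_level_indicated": True},
    {"id": 2, "mesh_terms": {"asthma"}, "resource_types": {"guidelines"},
     "evidence_level_indicated": False},
    {"id": 3, "mesh_terms": {"asthma"}, "resource_types": {"patient information"},
     "evidence_level_indicated": False},
    {"id": 4, "mesh_terms": {"pain"}, "resource_types": {"guidelines"},
     "evidence_level_indicated": True},
]
all_guidelines = search(catalogue, "asthma", "guidelines")
with_evidence = search(catalogue, "asthma", "guidelines", require_evidence_level=True)
```

As in the example above, the limited query always returns a subset of the unrestricted one.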

2.3.6.4 The HIDDEL (Health Information, Disclosure, Description and Evaluation Language) metadata

HIDDEL is a standard vocabulary/metadata language developed in the MedCERTAIN project. HIDDEL is designed to be used by information providers, to describe and disclose properties of e-health services (self-rating), and by third parties, e.g. subject gateways, to express third-party opinions about health information providers.

CISMeF is a member of the MedCIRCLE project which is a collaboration of trusted European health subject gateways, medical associations, accreditation, certification, or rating services, which share the common goal of evaluating, describing, or annotating health information. This project began in March 2002 and will last 18 months.

As a quality-controlled subject gateway, CISMeF will use HIDDEL only as a third-party. Some elements of the HIDDEL are close to Dublin Core (e.g. HIDDEL.Identity and DC.Author) but these elements will be repeated in the two metadata element sets to allow multiple interoperability. Most of the HIDDEL elements are common with the Net Scoring previously used by CISMeF and some are already present in the CISMeF database (e.g. HIDDEL.policies).

This metadata element set will be useful for cross-searching distributed and heterogeneous subject gateways. We have successfully tested the interoperability of the CISMeF metadata element set with the FMVU e-learning platform using the XML version of CISMeF resource pages (Example: http://doccismef.chu-rouen.fr/xml/00008637.xml including the DTD http://doccismef.chu-rouen.fr/xml/dtdNL.dtd).

3. Results

CISMeF is an efficient and end-user-friendly solution for finding French-language health resources worldwide on the Internet. Seventy per cent of these resources are located in France, 16% are from Canada (in particular the province of Quebec), 4% from Switzerland and Belgium, and 3% from Africa. For more information, see our internal statistics and external statistics via www.alexa.com

This Web site was initially, and remains principally, oriented toward health professionals, although the general public may also access it. Many sites are devoted to both. There are no HTML documents with restricted access on the CISMeF Web site. Thus our traditional "end-users" are now not only healthcare practitioners but also patients, their families and anyone seeking health information [9]. Training should take into consideration the information needs of the lay person as well as those of the medical professional. Training sessions on the use of our catalogue have been offered to patients' associations, especially to handicapped people, at RUH since February 1999.

CISMeF has three priority axes: evidence based medicine, teaching and patient information. CISMeF also includes a list of clinical guidelines and consensus development conferences, hospitals, medical universities, health institutions, medical libraries, medical publishers, electronic journals, electronic textbooks, databases, teaching and CME, mailing lists, research laboratories and institutes, pharmaceutical firms, health and patient associations, and commercial companies in the health sector.

Since February 1995, some new features have been added to optimize the navigability and the access to the information for the end-user: (a) use of an internal search engine (full-text and boolean searches), (b) a general index, and (c) a "what's new" page to easily display the newly indexed sites on a weekly basis. Since January 1997 it has also included an archive of the what's new pages. Two guides to use CISMeF are also on line, one for basic search and one for advanced search. CISMeF is accessible by the lowest common denominator of current browser technology.

3.1. Use patterns of the Web site

Use of the Web site has increased approximately linearly with time since February 1995. Our Web server software, which provides documents to users on request, does not know the identities of individual users (such as their e-mail addresses); the only identifying data available are the IP addresses of the machines from which users connect to the site.

Analysis of a representative period, the month of November 2007, showed that every working day approximately 30,000 unique machines visited our site (excluding ours). For more information, see our internal statistics and external statistics via www.alexa.com
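
Counting unique machines per working day from IP addresses can be sketched as follows. This is an assumption-laden illustration rather than the actual CISMeF log analysis: the function name, the input format (date, IP address pairs) and the sample addresses are hypothetical, and working days are assumed to be Monday to Friday.

```python
from collections import defaultdict
from datetime import date


def unique_machines_per_working_day(hits, own_addresses):
    """Count distinct client IP addresses per working day.

    hits: iterable of (date, ip_address) pairs extracted from the server log.
    own_addresses: IP addresses of the site's own machines, to be excluded.
    """
    by_day = defaultdict(set)
    for day, ip in hits:
        if ip in own_addresses:
            continue
        if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            continue
        by_day[day].add(ip)
    return {day: len(ips) for day, ips in by_day.items()}


# Hypothetical log extract using documentation-range IP addresses.
hits = [
    (date(2007, 11, 5), "192.0.2.1"),
    (date(2007, 11, 5), "192.0.2.1"),     # same machine, counted once
    (date(2007, 11, 5), "192.0.2.2"),
    (date(2007, 11, 5), "198.51.100.7"),  # one of our own machines
    (date(2007, 11, 3), "192.0.2.3"),     # a Saturday, ignored
    (date(2007, 11, 6), "192.0.2.1"),
]
counts = unique_machines_per_working_day(hits, own_addresses={"198.51.100.7"})
```

Note that an IP address identifies a machine, not a person, so this figure is only an approximation of the number of users.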

We also use the following indicator, the Web Impact Factor (WIF) [10], to measure the current impact and potential future usage of CISMeF: the number of sites that have at least one hyperlink to our site [http://www.chu-rouen.fr/dsii/html/pointeur.html]. Currently, our Web impact factor is over 800 (Altavista indicates more than 2,800 pages after exclusion of our internal links), including the most prestigious health resource catalogues. Approximately 130 press articles have reported on our Web site [http://www.chu-rouen.fr/dsii/html/presse.html]. In March 2007, the daily newspaper "Le Monde" rated CISMeF number 5 among health Web sites in France.
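
The difference between the number of linking pages and the number of linking sites can be made concrete with a short sketch. It assumes that a "site" is identified by the host name of the referring page and that a list of referring page URLs is already available (for example from a search engine); the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def count_linking_sites(referring_pages, own_hosts):
    """Count distinct external hosts having at least one page linking to us."""
    hosts = set()
    for url in referring_pages:
        host = urlsplit(url).hostname  # lower-cased host name, or None
        if host and host not in own_hosts:
            hosts.add(host)
    return len(hosts)


# Hypothetical referring pages: four pages, but only two external sites.
referring_pages = [
    "http://library.example.org/health/links.html",
    "http://library.example.org/health/french.html",
    "http://portal.example.net/medicine.html",
    "http://www.chu-rouen.fr/ssf/santspe.html",  # internal link, excluded
]
linking_sites = count_linking_sites(referring_pages, own_hosts={"www.chu-rouen.fr"})
```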

4. Discussion

The Internet facilitates communication among health professionals and with the general public, and also improves access to information. However, most medical resources available on the Internet have only a "marketing dimension" (description of the institution), and only a minority offer valid information content.

Several tools in the retrieval of health information on the Internet have been distinguished and structured:

CISMeF uses DCMI differently according to the "browse" (CISMeF MeSH Page, Figure 1) or "search" (CISMeF resource page, Figure 2) strategy chosen by the end-user. The choice of the Dublin Core was prompted by its institutional origin and its notoriety in the academic world. Several other health sites are now using the Dublin Core: the Australian Department of Health and Aged Care (http://www.health.gov.au/), the Better Health Channel (http://www.betterhealth.vic.gov.au/), the National Health and Medical Research Council (URL: http://www.nhmrc.health.gov.au/), and more recently the US National Library of Medicine (NLM) (http://www.nlm.nih.gov/tsd/cataloging/metadata/index.htm) (see a comprehensive list of health sites using DCMI at the following http://www.chu-rouen.fr/documed/dc.html).

OMNI indexes approximately 4,500 resources, mostly from UK, CISMeF about 7,000, mostly from France, MedHunt and HON around 40,000. OMNI and MedWebPlus are also using the UMLS metathesaurus to provide a conceptual network to the subject headings. OMNI, HON, and CliniWeb have also developed a structured database (dynamic HTML) which permits better searches. OMNI and CISMeF are using the Dublin Core metadata format, which is expected to become the dominant metadata format for Internet resource description [13].

It is quite difficult, especially for students, patients and the general public to evaluate the quality of the medical Web sites, which, in a majority of cases, are not peer-reviewed.

One main objective of CISMeF is to promote best medical practice and teaching. Therefore, we index high-quality documents available on the Internet on a priority basis. The organization of the CISMeF data model permits the discovery "by chance" of other "neighbour" sites and documents: e.g., a search about hemiplegia may permit the discovery of sites about paraplegia (relation of proximity) or, more generally, going up in the tree, about paralysis (relation of hierarchy). It is also possible to find more information using the "see also" relation, which links medical terms that are not in the same tree, e.g., in the page about pain, see also terminal care. This "see also" relation seems very difficult to generate automatically because it is based on human knowledge and not on a statistical model. The relation may be asymmetrical: it is symmetrical between terminal care and pain, but not between pain and bioethics, where the link is only significant from pain to bioethics and not the opposite.
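
The three relations described here (proximity, hierarchy, and "see also") can be sketched with two small mappings. This is an illustrative model built only from the examples in the text; the names `PARENT`, `SEE_ALSO` and the helper functions are hypothetical and do not reflect the actual CISMeF data structures.

```python
# Hierarchy: each term points to its parent in the tree.
PARENT = {"hemiplegia": "paralysis", "paraplegia": "paralysis"}

# "See also" is stored as a directed relation, so it may be asymmetrical.
SEE_ALSO = {
    "pain": {"terminal care", "bioethics"},
    "terminal care": {"pain"},
}


def neighbours(term):
    """Terms sharing the same parent (relation of proximity)."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return sorted(t for t, p in PARENT.items() if p == parent and t != term)


def broader(term):
    """The parent term (relation of hierarchy), or None at the top."""
    return PARENT.get(term)


def is_symmetric(a, b):
    """True when 'see also' holds in both directions between a and b."""
    return b in SEE_ALSO.get(a, ()) and a in SEE_ALSO.get(b, ())
```

Storing "see also" as a directed mapping keeps the pain to bioethics link without implying the reverse link.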

Further challenges that CISMeF needs to address in the next months are to expand sites and high-quality documents, especially patient information, and to collaborate more closely with similar services, particularly in Europe (DDRT, HON and OMNI).

The major drawback of CISMeF is its weak technical level: CISMeF uses only static HTML and does not yet use a database, which would allow more complex searches, e.g. guidelines in hepatitis, combining the explode command for keywords and qualifiers but also for metaterms and resource types. We plan to build this CISMeF database in the first semester of 2000.

The major feature of CISMeF is its information structure model. CISMeF conceptually encapsulates the MeSH structure (category, keyword, qualifier) by adding two levels: one on top of it (metaterm) and one below it (resource type). This model is completely generic. It can help to design equivalent health catalogs in the various languages into which the MeSH thesaurus has already been translated.

One key to the success of CISMeF is the typology of its webmasters: one medical librarian and one medical informatician [15-16]. The CISMeF model was designed by these two individuals, illustrating the synergy between the two professions. Based on this experience, we suggest using this webmaster typology to design health Web sites.

We have not formally and directly assessed how end-users use CISMeF. During CISMeF training sessions, we observed that patients mostly use the internal search engine to access health information. Few end-users employ the CISMeF model to perform better searches: in September 1999, 7% of the pages loaded concerned MeSH categories or trees. In the near future, we will measure its real usefulness for different communities (MDs, nurses, and patients) in the different French-speaking countries by a questionnaire based on the Net Scoring. Some indirect elements indicate the success of CISMeF: CESIM surveys, CISMeF use patterns and its Web impact factor. In order to enhance its quality, CISMeF respects the Net Scoring, e.g. its two webmasters personally answer each request.

CISMeF is a part of a wider project at RUH: digital library [1] and virtual university [17]. We have already developed some parts of this digital library: access to Medline and 45 electronic full-text English journals on Intranet (OVID provider) [18] plus access to 40 electronic full-text French journals on Extranet (Masson publisher). We plan to extend this library, giving access to electronic textbooks (e.g., the Harrison on the Internet). Our project of virtual university is to develop specific tools for students: a bank of multiple choice questions and a bank for standardized clinical examinations.

5. Conclusion

To help healthcare professionals and health consumers to more easily locate high-quality health information on the Internet, catalogues must use standard tools to describe and index resources.

Acknowledgement

CISMeF is supported by several partners

References

See the publications of the CISMeF team and the publications of the GCSIS, LITIS lab (EA 4108), University of Rouen

1.      Schatz BR. Information Retrieval in Digital Libraries: Bringing Search to the Net. Science 1997;275:327-34.

2.      Flannery MR. Cataloging Internet resources. Bull Med Libr Assoc 1995;83(2):211-5.

3.      Darmoni SJ, Thirion B. Indexing the Web ? A comparative study of three medical Web servers on the Internet: Cliniweb ,"Diseases, Disorders and Related Topics". Omni. In: Proceeding of the 1st European Congress of the Internet in Medicine. 1996: 5-6. 

4.      National Library of Medicine. Fact Sheet Medline. 6 July 1998 [Web document, accessed 11 Jan 1999] Available from Internet: <http://www.nlm.nih.gov/pubs/factsheets/medline.html>

5.      Weibel S, Juha H. DC-5: The Helsinki Metadata Workshop; A Report on the Workshop and Subsequent Developments. D-Lib Magazine 1998 February. Available from Internet: <http://www.dlib.org/dlib/february98/02weibel.html>.

6.      Centrale Santé. Net Scoring : critères de qualité de l'information de santé sur l'Internet 20 Apr 1998 [Web document, accessed 27 Apr 1999]. Available from Internet: <http://www.chu-rouen.fr/dsii/publi/critqualv2.html>.

7.      Ambre J, Guard R, Perveiler FM, Renner J, Rippen H. Health Information Technology Institute. Working Draft White Paper: Criteria for Assessing the Quality of Health Information 8 Apr 1999. [Web document, accessed 22 May 2000]. Available from Internet: <http://hitiweb.mitretek.org/docs/policy.pdf>.

8.      Darmoni SJ, Thirion B. A standard metadata scheme for health resources. J Am Med Inform Assoc 2000 Jan-Feb;7(1):108-9.

9.      Thirion B, Darmoni SJ. Simplified access to MeSH Tree Structures on CISMeF. Bull Med Libr Assoc 1999; Oct;87(4):480-1.

10.  Ingwersen P. The calculation of WEB impact factor. Journal of Documentation 1998;54(2):236-43.

11.  National Library of Medicine. Fact Sheet UMLS Metathesaurus. 12 Aug 1998 [Web document, accessed 27 Apr 1999]. Available from Internet: <http://www.nlm.nih.gov/pubs/factsheets/online_indexing_system.html>

12.  Hersh WR. Brown KE. Donohoe LC. Campbell EM. Horacek AE. CliniWeb: managing clinical information on the World Wide Web. Journal of the American Medical Informatics Association 1996;3(4):273-80.

13.  Norman F. Organising Medical Networks' information (OMNI). Med. Inf. 1998;23:43-51.

14.  Boyer C, Baujard O, Baujard V, Aurel S, Selby M, Appel RD. Health On the Net automated database of Health and medical information. International Journal of Medical Informatics 1997;47(1-2):27-9.

15.  Braude RM. Medical librarianship and medical informatics: a call for the disciplines to join hands to train tomorrow's leaders. J Am Med Inform Assoc 1994 Nov-Dec;1(6):467-8.

16.  Braude RM, Florance V, Frisse M, Fuller S. The organization of the digital library. Acad Med 1995 Apr;70(4):286-91

17.  P. LeBeux, F. Duff, A. Fresnel, Y. Berland, R. Beuscart, A. Burgun, JM. Brunetaud, G. Chatellier, SJ. Darmoni, R. Duvauferrier, M. Fieschi, P. Gillois, F. Guille, F. Kohler, D. Pagonis, B. Pouliquen, G. Soula, J. Weber. The French Virtual Medical University. In: Proceedings of MIE 2000, Sixteenth International Congress of the European Federation for Medical Informatics, Hanover, Germany.

18.  Darmoni SJ, Benichou J, Thirion B, Fuss J. A study comparing centralized CD-ROM and decentralized intranet access to MEDLINE. Bull Med Libr Assoc 2000 Apr;88(2):152-6

19.  Darmoni SJ, Thirion B, Leroy JP, Douyère M, Piot J. - The use of Dublin Core metadata in a structured health resource guide on the Internet. - Bulletin of the Medical Library Association 2001; July;89(3) 297-301.

20.  Boyer C, Gaudinat A, Baujard V, Geissbühler A. Health on the Net Foundation: assessing the quality of health web pages all over the world. Medinfo. 2007;12(Pt 2):1017-21.

21.   Koch T: Quality-controlled subject gateways: definitions, typologies, empirical overview, Subject gateways. Online Information Review 2000: 24(1): 24-34.

22.  Douyère M, Soualmia LF, Névéol A, Rogozan A, Dahamna B, Leroy JP, Thirion B, Darmoni SJ: Enhancing the MeSH thesaurus to retrieve French online health resources in a quality-controlled gateway. Health Info Libr J 2004 Dec: 21(4):253-61.

23.  Névéol A, Rogozan A, Darmoni SJ. Automatic indexing of online health resources for a French quality controlled gateway.  Information Management & Processing 2006:1; 695-709.

24.  Névéol A, Pereira S, Kerdelhué G, Dahamna B, Joubert M, Darmoni SJ. Evaluation of a simple method for the automatic assignment of MeSH descriptors to health resources in a French online catalogue. Medinfo 2007;129:407-11.

25.  Benoit Thirion, Susanne Pereira, Aurélie Névéol, Badisse Dahamna, Stéfan J. Darmoni. - French MeSH Browser: a cross-language tool to access MEDLINE/PubMed. - AMIA 2007; 1132.

26.   Hoelzer S, Schweiger RK, Boettcher H, Rieger J, Dudeck J. Indexing of Internet resources in order to improve the provision of problem-relevant medical information. Stud Health Technol Inform. 2002;90:174-7.

27.   Mc Gregor B. Constructing a concise medical taxonomy. J Med Libr Assoc. 2005 January; 93(1): 121–123.

28.  Gehanno JF,  Thirion B, Darmoni SJ, Evaluation of Meta-concepts for Information Retrieval in a Quality-Controlled Health Gateway. AMIA Symp. 2007 (in press).

29.  Abad Garcia F, Gonzalez Teruel A, Bayo Calduch P, de Ramon Frias R, Castillo Blasco L. A comparative study of  six European databases of medically oriented Web resources. J Med Libr Assoc. 2005;93(4):467-79.

30.  Darmoni SJ, Thirion B, Ionut-Florea F, Rogazan A, Letord C, Kerdelhué G, Dacher JN. Affiliation of a resource type to a MeSH term in a quality-controlled health gateway. Medinfo 2007, Twelfth World Congress on Health and Medical Informatics (poster). [affiliation_medinfo_2007.pdf]

 

28-08-2009 

