CISMeF Latest Presentation
Slides
Stéfan J. Darmoni 2009
Document written by the CISMeF team.
Abstract
In 2007, the Internet is already a major source of
health information. CISMeF ([French] acronym
for Catalog and Index of French Language
Health Resources on the Internet) is a quality-controlled health gateway to
catalog and index the most important and quality-controlled sources of
institutional health information in French. The objective of CISMeF is to
assist health professionals and consumers in their search for electronic
information available on the Internet. In December 2007, the number of indexed
resources totaled over 41,300 with a mean of 80 new resources each week. CISMeF
was initiated by the Rouen University Hospital (RUH) in February 1995.
Its Uniform Resource Locators (URLs) are http://www.chu-rouen.fr/cismef and http://www.cismef.org. CISMeF uses
two standard tools for organizing information: the Medline bibliographic
database MeSH thesaurus and several metadata element sets, including the Dublin
Core. Resources included in CISMeF are described by the following: title,
author or creator, subject and keywords, description, publisher, date, resource
type, format, identifier, and language. To index resources, CISMeF uses four
different concepts: "meta-term", keyword, subheading, and resource
type. CISMeF contains a thematic index, including medical specialities and an
alphabetic index. CISMeF respects the Net Scoring and the HON Code, criteria to
assess the quality of health information on the Internet. The CISMeF project
offers a valuable tool for the French-speaking health community: over
30,000 different computers visit the Web site each working day.
MeSH keywords: Abstracting and indexing;
Cataloging; France; Internet; MEDLINE; National Library of Medicine (U.S.);
Subject Headings; Support, Non-U.S. Gov't; Vocabulary controlled
1. Introduction
In 2007, the Internet can be considered a major
source of scientific and health information [1]. For the healthcare
professional and the health consumer, access to accurate information on the
Internet is not easy; hence the profusion of directories and search
engines available in this new medium [2]. However, directories, such as Nomade [http://www.nomade.fr/], Carrefour [http://www.carrefour.net/] or Yahoo [http://www.yahoo.com], and search
engines, such as Google [http://www.google.com], do not give
the end-user a clear and organized presentation of the available
health information, which limits their usefulness. There is no
catalogue of resources for a given medical speciality, e.g., neurology. These
directories and search engines contain a large number of sites, but the
organization and the hierarchy of the data are not adapted to health and
medicine. This field has a specific need for a specialized
classification: browsing up and down the classification tree
gives access to more information.
Furthermore, a rigorous human
classification is mandatory to coherently organize heterogeneous resources,
such as patient associations, electronic journals, mailing lists and clinical
guidelines. In a previous study, we demonstrated that manually built catalogues
were less sensitive but far more specific than search engines [3]. The main problem
for the end-user is therefore to find useful information. Automated indexing
has other drawbacks: difficulty in indexing non-text media, problems with word
indexing (contact, syntax, morphology, content, polysemy, synonyms and
granularity), lack of structured information (keywords, controlled thesaurus,
metadata e.g., resource types).
In this context, several quality-controlled
health gateways have been developed. Quality-controlled subject gateways were
defined by Koch [21] as Internet services which apply a comprehensive set of
quality measures to support systematic resource discovery. Considerable
manual effort is used to process a selection of resources which meet quality
criteria and to display an extensive
description and indexing of these resources with standards-based metadata. Regular checking and updating ensure optimal
collection management. The main goal is to provide a high quality of subject
access through indexing resources using controlled vocabularies and by offering
a deep classification structure for advanced searching and browsing. CISMeF ([French] acronym for Catalog and Index of French Language Health Resources on the Internet) is
a quality-controlled health gateway to catalog and index the most important and
quality-controlled sources of institutional health information in French. The
objective of CISMeF is to assist health professionals and consumers in their
search for electronic information available on the Internet. In December
2007, the number of indexed resources totaled over 41,300 with a mean of 80 new
resources each week. CISMeF was initiated by the Rouen University
Hospital (RUH) in February 1995. Its Uniform Resource Locators (URLs) are http://www.chu-rouen.fr/cismef and http://www.cismef.org (S.J.D. and
B.T. are the two co-webmasters).
The scope of CISMeF covers healthcare disciplines and
medical sciences. This means that besides medical doctors, who were
historically the first target, other health professionals, such as nurses,
midwives, veterinarians, physiotherapists and nutritionists, will find resources
devoted to their professions. More recently, patients and the general public
have found valuable and reliable health information specially written for them.
CISMeF describes and indexes a large number of health resources. The main
health areas of the indexed resources in CISMeF are:
This resource guide is necessary because (a) there is
an extensive amount of information potentially accessible for the health
professional; (b) it is often difficult to easily separate the information for
the health professional from patient information; and above all (c) the
absolute requirement in medicine is to know the source and the quality of the
information available on the Internet.
In 2007 the CISMeF team is
composed of four medical librarians, two medical informaticians, two research engineers,
and three PhD students majoring in Computer Science.
2. Material and methods
2.1 Hardware and software
CISMeF was implemented in February 1995 on a Sun
running Sun Unix. We changed the machine and the operating system in October
1999 from Unix to Linux. CISMeF is a Web site currently using the Apache HTTP
server. The Webtrends Log Analyzer program, version 4.5.2, evaluates the use of
the Web site after excluding requests by our Hospital Information System (3,500
computers).
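The exclusion of in-house requests described above can be illustrated with a minimal sketch. It is not the Webtrends implementation: it assumes an Apache-style access log whose first field is the client IP address, and the `10.0.0.0/8` range is a hypothetical stand-in for the Hospital Information System network.

```python
from ipaddress import ip_address, ip_network

# Hypothetical internal range standing in for the Hospital Information System.
INTERNAL_NETWORK = ip_network("10.0.0.0/8")


def distinct_external_hosts(log_lines):
    """Return the set of client addresses that lie outside the internal network."""
    hosts = set()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            address = ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address: skip the line
        if address not in INTERNAL_NETWORK:
            hosts.add(address)
    return hosts


# Invented log lines, for illustration only.
sample_log = [
    '10.1.2.3 - - [13/Dec/2007:10:00:00 +0100] "GET /cismef/ HTTP/1.0" 200 5120',
    '192.0.2.10 - - [13/Dec/2007:10:00:05 +0100] "GET /cismef/ HTTP/1.0" 200 5120',
    '192.0.2.10 - - [13/Dec/2007:10:01:00 +0100] "GET /ssf/santspe.html HTTP/1.0" 200 2048',
    '198.51.100.7 - - [13/Dec/2007:10:02:00 +0100] "GET /cismef/ HTTP/1.0" 200 5120',
]
external_count = len(distinct_external_hosts(sample_log))
```

Counting distinct addresses rather than requests corresponds to the "different computers" figure quoted in the abstract.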
2.2 Standards
From 1995 to 1999, CISMeF was entirely based on static
HTML. Since June 2000, we have developed Doc'CISMeF, which is a search tool. We
originally used the HTML 2.0 standard, in order to be readable by the vast
majority of Web browsers. We now use the newer HTML 3.2 standard and
XML.
To organize information, CISMeF uses two standard
tools: a controlled vocabulary which 'encapsulates'
the MeSH (Medical Subject Headings) thesaurus from the Medline bibliographic
database [4] (US National Library of Medicine) and several metadata element
sets, including the Dublin Core metadata format [5, 19], IEEE 1484 LOM and HIDDEL.
Resources included in CISMeF are described by the following Dublin Core elements:
title, author or creator, subject and keywords, description, publisher, date,
resource type, format, identifier, and language. The
use of specific metainformation is crucial in order to improve the recall and
precision of Internet searches [26]. As
proposed by Hoelzer et al. [26], CISMeF uses XML and RDF to
meet these requirements. This structure enables us to place the project at the
overlap between the current informal Web and the forthcoming Semantic Web.
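To make the element list above concrete, here is a minimal sketch that serializes a resource description as XML using the Dublin Core element namespace. It does not reproduce the actual CISMeF XML or RDF schema: the `record` wrapper element, the `dublin_core_record` helper and the `example.org` identifier are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <record> element; every Dublin Core element is optional and repeatable."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            element.text = value
    return record


record = dublin_core_record({
    "title": ["Atrial fibrillation"],
    "creator": ["Vanzetto G", "Defaye P"],
    "subject": ["atrial fibrillation", "atrial fibrillation/diagnosis"],
    "publisher": ["Joseph Fourier University, Medical School"],
    "date": ["2005"],
    "type": ["teaching material"],  # assumed resource type
    "format": ["xml"],
    "identifier": ["http://example.org/atrial-fibrillation"],  # hypothetical URL
    "language": ["fr"],
})
xml_text = ET.tostring(record, encoding="unicode")
```

Because each element may be repeated, several creators or subjects are simply emitted as sibling elements.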
We have
organized our catalogue with the MeSH thesaurus [4], which contains
around 25,000 terms in its 2008 version and eleven levels of
classification. The MeSH was selected because it
responds to the aims of the medical librarians and it is well known by the
health professionals. This thesaurus is precise, rigorous and annually
updated. We also use the French translation of this thesaurus [http://dicdoc.kb.inserm.fr:2010/basismesh/mesh.html], performed by
the French Medlars Center, the National Institute for Health and Medical
Research (INSERM, and more specifically the DIC-DOC Network). In some rare
cases, a resource cannot be perfectly indexed using the MeSH thesaurus: we
then use "manual mapping" to find the nearest MeSH term, e.g. a
resource dealing with dysmelia has been indexed with the MeSH term ectromelia. Thanks
to the VUMeF project, we have added
8,000 CISMeF French synonyms to the MeSH thesaurus and translated 5,000 MeSH
Scope Notes into French. All this specific information is available in the CISMeF MeSH
Terminology Server, which is also a cross-lingual tool to access PubMed
[25].
The MeSH terms are organized into hierarchies going from the most
general at the top of the hierarchy to the most specific at the bottom of
the hierarchy. For example, the MeSH term hepatitis is more general than
the MeSH term hepatitis viral A. The qualifiers, also organized into
hierarchies, make it possible to specify which particular aspect of a keyword is
addressed, and thus to focus on a sub-field of the keyword. For example, the
association of the keyword hepatitis with the qualifier diagnosis (noted
hepatitis/diagnosis) restricts hepatitis to its diagnosis aspect.
The “is-a” relations between concepts are extracted from the MeSH text
files to define the subsumption relationships in the CISMeF keywords hierarchy.
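The hierarchy and qualifier mechanisms just described can be sketched as follows. This is an illustration only: the `NARROWER` table contains a single link taken from the example in the text (the intermediate levels of the real MeSH tree are omitted), and the function names `explode` and `matches` are assumptions, not CISMeF identifiers.

```python
# Toy fragment of the keyword hierarchy: broader term -> narrower terms.
NARROWER = {
    "hepatitis": ["hepatitis viral A"],
}


def explode(term, narrower=NARROWER):
    """Return the term together with all of its descendants ("is-a" closure)."""
    found, seen, stack = [], set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a term may belong to several trees
        seen.add(current)
        found.append(current)
        stack.extend(narrower.get(current, []))
    return found


def matches(indexing, term, qualifier=None):
    """Check a resource indexing, given as (keyword, qualifier or None) pairs.

    The keyword may be the queried term or any narrower term; when a
    qualifier is given, the pair must carry exactly that qualifier.
    """
    terms = set(explode(term))
    return any(
        keyword in terms and (qualifier is None or q == qualifier)
        for keyword, q in indexing
    )


resource = [("hepatitis viral A", "diagnosis")]
found_by_general_term = matches(resource, "hepatitis")
found_with_qualifier = matches(resource, "hepatitis", "diagnosis")
found_with_other_qualifier = matches(resource, "hepatitis", "therapy")
```

A query on the general term retrieves resources indexed with the more specific one, while the qualifier narrows the match to one aspect of the keyword.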
However, the
MeSH thesaurus was originally intended to index scientific articles for the
Index Medicus and for the MEDLINE database. In order to customize it to the
broader field of health Internet resources, we have been developing several
enhancements [22] to the MeSH thesaurus, with the
introduction of two new concepts, respectively metaterms (MT) and resource types (RT). The CISMeF terminology is shown
in Figure 1. CISMeF resource types (RT) are an
extension of the publication types of MEDLINE.
The CISMeF terminology is exploited for several tasks: resource indexing
performed manually, resource categorization performed automatically,
visualization and navigation through the concept hierarchies, a CISMeF
Terminology Server (URL: http://www.chu-rouen.fr/terminologiecismef/),
and information retrieval using the Doc’CISMeF search engine.

Figure 1: Semantic links
between CISMeF metaterms and MeSH terms, MeSH subheadings and CISMeF resource
types
2.2.1 Metaterms
CISMeF metaterms [9] correspond to medical specialties (e.g. cardiology), types of medical
procedures (e.g. surgery) or health topics (e.g. diagnosis, therapy), which have
semantic links with one or more MeSH terms, subheadings and RTs.
The idea of creating metaterms arose from the need to optimize information
retrieval (in particular to
maximize recall) in CISMeF (Doc’CISMeF search
engine; URL: http://doccismef.chu-rouen.fr/servlets/Simple) and to cope with
the relatively restrictive nature of these medical specialties as MeSH terms, when searching
for 'guidelines in cardiology' or 'databases in virology', where cardiology
and virology are metaterms and guidelines and databases
are resource types. The MeSH thesaurus does not
provide a global view of a medical specialty. Therefore, in the CISMeF
terminology, metaterms can be considered as “metaconcepts”. Metaterms have been
manually selected by the chief medical librarian (BT). The semantic links
between metaterms and MeSH terms, MeSH subheadings and CISMeF resource types
are based on his know-how and on the expertise of medical specialists of the Rouen
University Hospital. There is a 0-to-N relation between CISMeF metaterms and
MeSH terms, MeSH subheadings and CISMeF resource types.
Each metaterm has a semantic link
with the corresponding MeSH term, e.g. the metaterm cardiology has
a semantic link with the MeSH term cardiology. The queries
'guidelines in cardiology' and 'databases in psychiatry', where cardiology and
psychiatry are only MeSH keywords, get few or no answers.
Introducing cardiology and
psychiatry as metaterms is an efficient strategy to get more results:
instead of exploding one single MeSH tree (e.g. psychiatry as a
MeSH term), using metaterms results in an automatic expansion of the queries by
exploding other related MeSH terms, such as psychiatric hospital, which belongs
to a completely different tree structure within the MeSH, or CISMeF trees (for
resource types), as well as the current tree (e.g. psychiatric hospital as
a MeSH keyword or mental health dispensary as a resource type will be
exploded in the case of the psychiatry query). For example, the
metaterm psychiatry has the following semantic links: [MeSH terms] "behavioral
symptoms"; "community mental health centers"; "diagnostic and
statistical manual of mental disorders"; "hospitals, psychiatric";
"mental disorders"; "mental health services"; "mentally
ill persons"; "psychiatric department, hospital"; "psychiatric
somatic therapies"; "psychiatric status rating scales"; "psychiatry";
"psychological techniques"; "psychophysiologic disorders";
"psychotherapy"; "psychotropic drugs"; "schizophrenic
psychology"; [CISMeF resource types] "community mental health
centers"; "hospitals, psychiatric".
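The query expansion described above can be sketched in a few lines. This is a simplified illustration, not the Doc'CISMeF implementation: only a subset of the psychiatry links listed above is used, the three resources are invented, and the explosion of each linked term into its narrower terms is left out for brevity.

```python
# Subset of the semantic links of the metaterm "psychiatry" listed in the text.
METATERM_LINKS = {
    "psychiatry": {
        "mesh_terms": ["psychiatry", "mental disorders",
                       "hospitals, psychiatric", "psychotherapy"],
        "resource_types": ["community mental health centers",
                           "hospitals, psychiatric"],
    },
}


def search(resources, mesh_terms, resource_types=()):
    """Return titles of resources indexed with any given MeSH term or RT."""
    wanted_terms, wanted_types = set(mesh_terms), set(resource_types)
    return [
        r["title"] for r in resources
        if wanted_terms & set(r["keywords"]) or wanted_types & set(r["types"])
    ]


# Invented resources, for illustration only.
resources = [
    {"title": "A", "keywords": ["psychiatry"], "types": ["teaching material"]},
    {"title": "B", "keywords": ["mental disorders"], "types": ["clinical guidelines"]},
    {"title": "C", "keywords": ["France"], "types": ["hospitals, psychiatric"]},
]

# "psychiatry" used as a plain MeSH keyword.
as_mesh_term = search(resources, ["psychiatry"])
# "psychiatry" used as a metaterm: the query is expanded through its links.
as_metaterm = search(resources, **METATERM_LINKS["psychiatry"])
```

The metaterm query retrieves resources B and C, which the plain MeSH keyword misses; this is the gain in recall that motivated the introduction of metaterms.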
In January 2007, the number of
metaterms in the CISMeF terminology was 110. The comprehensive list of
metaterms is available at the following URL: http://doccismef.chu-rouen.fr/liste_des_meta_termes_anglais.html
Major Topics exist in the Medline
database and the CISMeF catalogue for keywords and qualifiers. A term is said
to be “major” if the concept it represents is discussed throughout the whole
document, and "minor" if it is referred to only in a
few paragraphs. Major terms are marked in Medline and CISMeF by a star. In
CISMeF, Major Topics are extended to resource types and metaterms. This task is
manually performed by CISMeF medical librarians for resource types, and automatically
performed for metaterms: a metaterm is “major” for a CISMeF resource if and
only if at least one keyword, qualifier or resource type semantically linked to
this metaterm is major for the same CISMeF resource (otherwise, the metaterm is
minor). In a comparative study of six European health gateways performed by
Abad Garcia et al. [29], CISMeF was rated second, but it was
criticized because “failure on precision may be due to exhaustive indexing” [29].
To optimize the precision of our health gateway, we have introduced a major modification
of the CISMeF information retrieval algorithm: when a query is mapped to
one or several terms of the CISMeF terminology (CISMeF metaterms, MeSH terms, MeSH
subheadings, CISMeF resource types), the resources with Major Topics are
displayed first (e.g. for the query 'guidelines in
cardiology', resources with Major Topic cardiology as a metaterm and Major
Topic guideline as a resource type are displayed first).
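The "if and only if" rule for major metaterms and the major-first display order can be sketched as follows. The semantic links and the three resources are hypothetical, and the function names are assumptions; only the rule itself comes from the text.

```python
# Hypothetical semantic links; the real ones are curated by the CISMeF librarians.
METATERM_LINKS = {
    "cardiology": {"cardiology", "atrial fibrillation", "electrocardiography"},
}


def metaterm_is_major(metaterm, resource, links=METATERM_LINKS):
    """A metaterm is major iff at least one term linked to it is major for the resource."""
    return bool(links[metaterm] & resource["major"])


def search_by_metaterm(metaterm, resources, links=METATERM_LINKS):
    """Return matching resources, those with the metaterm as Major Topic first."""
    linked = links[metaterm]
    hits = [r for r in resources if linked & (r["major"] | r["minor"])]
    # sorted() is stable and False sorts before True, so major hits come first.
    return sorted(hits, key=lambda r: not metaterm_is_major(metaterm, r, links))


# Invented resources: sets of major (starred) and minor indexing terms.
resources = [
    {"title": "Resource A", "major": {"prognosis"}, "minor": {"atrial fibrillation"}},
    {"title": "Resource B", "major": {"atrial fibrillation"}, "minor": {"prognosis"}},
    {"title": "Resource C", "major": {"hepatitis"}, "minor": set()},
]
ranked = [r["title"] for r in search_by_metaterm("cardiology", resources)]
```

Resource B is listed before Resource A because a term linked to cardiology is major for B and only minor for A; Resource C is not retrieved at all.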
The optimization of information retrieval was confirmed in 2007 by a formal
study by Gehanno et al. [28], which showed that the corresponding MeSH terms had
the same precision as the eponymous metaterm, whereas the recall of the MeSH term
was 0.44 and the recall of the metaterm was 1. To construct a taxonomy of
medicine, the publishing division of the American Medical Association (AMA) took as
its precedent the simplified access to MeSH via CISMeF metaterms [27].
Metaterms have semantic links with the three other
levels of CISMeF terminology: MeSH terms, qualifiers and resource types. Each
metaterm has a semantic link with one or more keywords, subheadings and
resource types. Each term of the CISMeF terminology can have a set of synonyms
and can belong to several trees. For example, on the "oncology" page,
the sites of general interest on this specialty were indexed and described,
using the MeSH keyword "medical oncology", followed by a
list of starting points of MeSH terms. The MeSH terms are: (a) antineoplastic
agents, (b) medical oncology, (c) neoplasms, (d) tumor markers, biological,
and (e) oncology service, hospital. The subheading is "secondary"
and the resource type is "oncology service, hospital". The controlled
list of metaterms is available at the following URL:
http://www.chu-rouen.fr/ssf/santspe.html.
2.2.2. Resource types
As defined by the Dublin
Core Metadata Initiative (URL:
http://www.dublincore.org/documents/dcmi-terms/), an RT is used to categorize the nature or genre of the content of
the resource. MeSH (term/subheading) pairs describe the topic of the resource.
RT is one of the fifteen Dublin Core repeatable and optional elements. Compared
to the publication types of MEDLINE, the CISMeF RTs are more diverse, with
specific RTs dedicated to electronic health resources, such as association,
patient information, community networks, or clinical guidelines.
For example, in the case of a clinical guideline
about carbon monoxide intoxication, ‘carbon monoxide poisoning’ is the
MeSH term and ‘clinical guidelines’ is the resource type. CISMeF RTs are
organized similarly to MeSH terms and subheadings, in a hierarchical structure
with subsumption relationships (allowing the explode property) and a maximum of
five levels of depth. The MEDLINE publication
types were mainly a flat list until 2005 (see URL:
http://www.nlm.nih.gov/mesh/pubtypes2005.html). Since 2006, MEDLINE publication
types have also had a hierarchical structure.
In preparation for a PhD
thesis on the automatic indexing of bimodal resources (text + image) in
CISMeF, we have introduced new resource types, which are all medical image
types. Most of the medical image types (n=112) are MeSH terms from the
diagnostic imaging term and its sub-tree of narrower terms. The
overall number of resource types in December 2007 is 295. The controlled list
of resource types is available at the following URL: http://www.chu-rouen.fr/documed/typeeng.html.
This RT list has been manually built and maintained by the CISMeF team since 1997.
Nonetheless, this list is
largely inspired by the MeSH thesaurus: 76% of the RTs are deliberately ambiguous
because they are also MeSH terms (e.g. magnetic resonance imaging). The
objective of this ambiguity is to maximize recall, and thus the number of search
answers, when the request contains this kind of ambiguous term (the Doc’CISMeF
search engine ORs the answers for the MeSH term and the answers for the RT).
Furthermore, to be as close to a standard as
possible, 11% are also MEDLINE publication types (e.g. technical report).
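The OR behaviour for ambiguous terms amounts to a set union of the two answer lists. The sketch below assumes two hypothetical inverted indexes (`MESH_INDEX` and `RT_INDEX`) mapping a label to resource identifiers; the identifiers are invented.

```python
# Hypothetical inverted indexes: label -> identifiers of the indexed resources.
MESH_INDEX = {"magnetic resonance imaging": {"r1", "r2"}}
RT_INDEX = {
    "magnetic resonance imaging": {"r2", "r3"},
    "technical report": {"r4"},
}


def search_term(term, mesh_index=MESH_INDEX, rt_index=RT_INDEX):
    """OR the answers found for the label as a MeSH term and as a resource type."""
    return mesh_index.get(term, set()) | rt_index.get(term, set())


answers = sorted(search_term("magnetic resonance imaging"))
```

A resource indexed with the label in both roles (here `r2`) is returned once, and a label that exists in only one index still yields its answers.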
The medical image types
list has been reviewed and updated manually by one medical imaging expert
(JND). Medical image types are used in the CISMeF database to index a health
resource containing images. For example, a teaching resource about
choledocholithiasis, which includes
ultrasonography images, will be indexed with the RTs ultrasonography and teaching material.
If this
teaching resource contains a paragraph describing the ultrasonography of choledocholithiasis,
the librarian will use the MeSH
(term/subheading) pair (choledocholithiasis/ultrasonography). If necessary, two other subheadings can be used:
‘radiography’ and ‘radionuclide
imaging’.
In the
case of other medical image types, MeSH terms will be used. For example, if the
resource about choledocholithiasis includes
text about magnetic resonance imaging, it will be indexed with the two MeSH
terms 'choledocholithiasis' and 'magnetic resonance
imaging'. If there is an image of
magnetic resonance imaging, it will be indexed with the RT 'magnetic
resonance imaging'.
Until the introduction of
image types, we had not considered using the existing RTs for affiliation to a
specific aspect of a MeSH term or (term/subheading) pair [30]. In April 2004,
the creation of medical image types led us to propose
a refinement of the CISMeF terminology and thus a refinement of manual indexing
procedures: for a number of specific RTs, mainly image types but not
exclusively (e.g. multiple choice quiz also applies here), we proposed to affiliate an RT to a MeSH term or to a MeSH
(term/subheading) pair. Thus we can obtain a [MeSH (term/subheading)\CISMeF RT]
triplet, where the backslash character ‘\’ represents the RT affiliation (the
slash character ‘/’ represents in MEDLINE the affiliation of a subheading to a term).
This approach can be viewed as an extension of the affiliation of a subheading
to a MeSH term.
Therefore, a teaching
resource about choledocholithiasis, which
includes ultrasonography images, should be indexed with the (term\RT) pair (choledocholithiasis\ultrasonography). If the teaching resource specifies
for example that ultrasonography images are valuable for the diagnosis of choledocholithiasis,
the [MeSH (term/subheading)\CISMeF RT] triplet [(choledocholithiasis/diagnosis)\ultrasonography] should be used. Another triplet can
be viewed in the following description of a resource indexed in the CISMeF
catalogue; the triplet “atrial
fibrillation/diagnosis\electrocardiography” indicates that this
resource contains an EKG image to contribute to the diagnosis of atrial
fibrillation.
Atrial fibrillation (The) –
French pre-residency program examination : question
236 –
Pr Vanzetto G (Grenoble University), M. Defaye P.
(Grenoble University)
[publisher : Joseph Fourier University, Medical
School ; definition, etiology, physiopathology, anatomical sequels, diagnosis,
evolution and prognosis, treatment ; printable version, references,
pre-necessary, exercises ; language : French ; format : xml ; access : free ;
date : 2005 ; visited on November 2006]. Grenoble-Fr
keywords : cardiology /
education ; *atrial fibrillation ; atrial fibrillation / diagnosis \
electrocardiography ; atrial fibrillation / etiology ; atrial fibrillation
/ physiopathology ; atrial fibrillation / therapy ; prognosis ; signs and
symptoms
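The notation used in the keyword field above (star for a Major Topic, slash for a subheading, backslash for an affiliated resource type) is regular enough to be parsed mechanically. The following sketch is an illustration only; the `IndexEntry` type and the function names are assumptions, not part of CISMeF.

```python
from typing import NamedTuple, Optional


class IndexEntry(NamedTuple):
    term: str
    subheading: Optional[str]
    resource_type: Optional[str]
    major: bool


def parse_entry(text):
    """Parse one entry such as '*term', 'term / subheading' or 'term / subheading \\ RT'."""
    text = text.strip()
    major = text.startswith("*")  # a leading star marks a Major Topic
    if major:
        text = text[1:]
    pair, _, resource_type = text.partition("\\")  # backslash: RT affiliation
    term, _, subheading = pair.partition("/")      # slash: subheading affiliation
    return IndexEntry(
        term.strip(),
        subheading.strip() or None,
        resource_type.strip() or None,
        major,
    )


def parse_keywords(field):
    """Split a semicolon-separated keyword field into parsed entries."""
    return [parse_entry(part) for part in field.split(";") if part.strip()]


keywords = (
    r"cardiology / education ; *atrial fibrillation ; "
    r"atrial fibrillation / diagnosis \ electrocardiography ; "
    r"atrial fibrillation / etiology ; atrial fibrillation / physiopathology ; "
    r"atrial fibrillation / therapy ; prognosis ; signs and symptoms"
)
entries = parse_keywords(keywords)
```

The third entry is parsed as the triplet (atrial fibrillation, diagnosis, electrocardiography), and the second entry is the only one flagged as major.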
2.3 Methodology and realization of the catalogue
CISMeF is a quality-controlled subject gateway. Koch
defines a quality-controlled subject gateway as an Internet service which
applies a set of quality measures to support systematic resource discovery.
Considerable manual effort is used to secure a selection of resources which
meet quality criteria and to display a rich description of these resources with
standards-based metadata. Regular checking and updating ensure good collection
management. A main goal is to provide high-quality subject access
through indexing resources using controlled vocabularies and by offering a deep
classification structure for advanced searching and browsing.
Each of the following elements proposed by Koch, which
characterize a typical quality-controlled subject gateway, is implemented in
CISMeF:
To allow interoperability with other Internet services,
gateways apply open standards. CISMeF uses two standard tools for organizing
information: the MeSH (Medical Subject Headings) thesaurus from the US National
Library of Medicine, and several metadata element sets: (a) the Dublin Core
metadata format to describe and index all the health resources included in
CISMeF, (b) some elements from IEEE 1484 Learning Object Metadata for teaching
resources, (c) specific metadata for evidence-based medicine resources, which
also qualify the health content, and (d) the HIDDEL metadata set, which will be used
to enhance transparency, trust and quality of health information on the
Internet in the EU-funded MedCIRCLE project.
2.3.1 Why index only French-speaking Health resources
In November 1994, when the RUH was first connected to
the Internet, we rapidly observed the absence of a specific catalogue of
French-speaking health resources. By contrast, several very good catalogues
of English-speaking resources, such as DDRT (Diseases, Disorders and Related
Topics [http://www.mic.ki.se/Diseases/index.html], Medical
Library and Medical Information Center, Karolinska Institute, Stockholm,
Sweden) or MedWeb (Health Science Center Library of Emory University, US [http://www.medweb.emory.edu/MedWeb/]), were already
in operation (see also our selection of these catalogues at the following URL: http://www.chu-rouen.fr/ssm/listemed.html). Therefore,
since its creation, CISMeF has only catalogued and indexed French-speaking
health resources, regardless of their origin.
The CISMeF method entails a four-fold process:
resource collection, filtering, description and indexing. One deputy medical
librarian (J.P.) performs the resource collection and the information watch.
The editorial board filters and selects the resources. Three deputy
medical librarians (Catherine Letord, Gaétan Kerdelhué and Josette Piot)
describe and index resources. The chief medical librarian (B.T.) is a
‘super-indexer’ in charge of checking the indexing.
From 1995 to 2006, CISMeF was exclusively
manually indexed by a team of four indexers, who are medical librarians.
There are regular meetings with one medical informatician (S.J.D.) for
double-checking.
Since 2002, automatic indexing tools have been developed by the CISMeF team,
initially using natural language processing (NLP) and K-nearest neighbours
(KNN) methods [23], followed by a simpler bag-of-words algorithm [24]. The
latter was successfully evaluated in the context of teaching resources. The
CISMeF team then decided to use this algorithm in daily practice for
most Internet resources (except guidelines, which are still manually
indexed because this type of resource needs in-depth indexing). Three levels of
indexing were defined in the CISMeF catalogue: (a) level 1 or Core-CISMeF (N=17,806): totally manually indexed resources (e.g.
guidelines); (b) level 2 or supervised resources (N=5,257): these
resources are rated by the CISMeF editorial board as less important than level
1 and therefore do not need in-depth indexing (e.g. technical reports,
teaching resources designed at the national level, documents for patients from
medical specialties). Supervision means that these resources are first
automatically indexed, then this indexing is reviewed by a CISMeF medical
librarian; (c) level 3 or automatically indexed resources (N=18,023): the CISMeF
editorial board has rated these resources as less important than level 1 and
level 2 (e.g. teaching resources designed at the medical school level, patient
association Web sites).
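To give an idea of what a bag-of-words approach to automatic indexing looks like, here is a generic sketch. It is not the algorithm of reference [24], whose details are not given here: it simply proposes a thesaurus term when all the words of the term occur in the text, regardless of order. The function names, the sample title and the term list are assumptions.

```python
import re


def bag_of_words(text):
    """Lower-case the text and return the set of alphabetic words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def suggest_terms(text, thesaurus_terms):
    """Propose every thesaurus term whose words all occur in the text."""
    bag = bag_of_words(text)
    suggestions = []
    for term in thesaurus_terms:
        words = bag_of_words(term)
        if words and words <= bag:  # subset test; word order is ignored
            suggestions.append(term)
    return suggestions


# Invented title and a small illustrative term list.
title = "Clinical guidelines for poisoning by carbon monoxide in emergency departments"
terms = ["carbon monoxide poisoning", "atrial fibrillation", "hepatitis"]
suggested = suggest_terms(title, terms)
```

Under the supervised (level 2) workflow described above, suggestions of this kind would then be reviewed by a medical librarian, whereas at level 3 they would be kept as produced.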
The CISMeF team then wanted to create a level 4, which could be defined as
the exhaustive set of automatically indexed pages from the CISMeF publishers. The
latter are defined as the
publishers that have at least one Internet resource included in the CISMeF
catalogue. Instead of reinventing the wheel, we decided to use a
customized version of a generic search engine (Google): Google Coop.
2.3.2. Resource collection
The resource collection is performed on a daily basis
and is partially automated. French-speaking directories and search engines are
checked, such as Carrefour, Ecila, Eureka, Francite, Nomade, Toile du Quebec,
especially their "what's new" pages (see http://www.chu-rouen.fr/documed/docum.html#VEILLE). A total of
1,779 health webmasters have sent us an e-mail or a specific form asking to be indexed
in CISMeF. Indexing priority is given to Internet sites of institutions and
scientific societies. Resources include sites and high quality documents,
issued especially from evidence-based medicine, practice guidelines, consensus
development conferences, teaching and education resources and consumer health
information.
2.3.3. Filtering and selection: how the information is validated
In order to include only reliable resources, CISMeF uses
the main criteria (e.g. source, description, disclosure, last update) of the Net Scoring [6] and
of the HON Code
to assess the quality of health information on the Internet.
The Net Scoring comprises 49 criteria, which fall into eight categories: credibility, content,
hyperlinks, design, interactivity, quantitative aspects, ethics and
accessibility. All the criteria were chosen by an expert consensus. Some of
these criteria have been inspired by a US white paper [7] and some criteria are
included in the Dublin Core metadata format. The description of a site should
permit the evaluation of the quality of its content. Some resources are not
included in the catalogue because they do not meet basic criteria, particularly
ethical ones. The quality of health information is a key point,
especially for patients and their families.
2.3.4. Description and index
Cataloguing a site is necessary because it helps
the end-user to estimate, in advance, the type of information and to evaluate its
content. This process also saves time for the end-user.
Until 2001, each keyword in CISMeF was "de
facto" a MeSH Major Topic, with a mean of 1.87 MeSH terms per resource.
Since 2001, each keyword can be used in CISMeF, as in Medline, as major or minor,
with a mean of 2.5 MeSH terms and a mean of 1.61 qualifiers per resource. A MeSH
subheading permits a focus on a sub-field of a MeSH term, e.g.,
chloride/toxicity. We also use a French translation of the MeSH subheadings [http://dicdoc.kb.inserm.fr:2010/basismesh99/ind_fr_eng.html], which are
less systematically used in CISMeF than in Medline. The CISMeF resource type
[8] is a generalization of the publication type of Medline. We have added types
which are specific to the resources available on the Internet, such as
association, patient information and community networks. The controlled list of
resource types is available in Table 1. The resource type describes
the nature of the resource and MeSH describes the subject of the resource. For
example, in the case of a clinical guideline about carbon monoxide intoxication,
‘carbon monoxide poisoning’ is the MeSH keyword and ‘clinical guidelines’ is
the resource type. In CISMeF, each description also contains the geographic
location, including the city, the province or state, and the country of origin.
Example of a description of a document indexed in
CISMeF:
Alcool et risque de cancers : état des lieux des données scientifiques et recommandations de santé publique (Alcohol and cancer risk: review of the scientific data and public health recommendations)
[publisher site : INCA, Institut National du Cancer ; general introduction,
alcohol metabolism and associated genetic polymorphisms, alcohol and cancers of
the upper aerodigestive tract (UADT), alcohol and liver cancer, breast cancer,
colorectal cancer, other cancers, public health issues, recommendations,
epidemiological note, appendix ; 60 pages ; language : French ; format : html ;
access : free ; non-sponsored site ; dated 01/11/2007 ; visited on 13/12/2007]. -Fr
keywords : *alcoholic beverages / adverse effects ; ethanol / adverse effects ;
ethanol / metabolism ; *risk factors ; *alcohol-related disorders ;
alcohol-related disorders / epidemiology ; alcohol-related disorders / genetics ;
alcohol-related disorders / prevention and control ; *neoplasms ;
colorectal neoplasms / etiology ; head and neck neoplasms / etiology ;
liver neoplasms / etiology ; breast neoplasms / etiology
type : *technical report ; *public health recommendation
2.3.5. Structure of the CISMeF catalogue
CISMeF contains a thematic index, including medical
specialties and biological sciences [http://www.chu-rouen.fr/ssf/santspe.html] and an
alphabetic index [http://www.chu-rouen.fr/ssf/santpath.html] (see Figure
1: CISMeF homepage screenshot). Both indexes use the
Medline thesaurus. A brief description of each site indexed in CISMeF is
systematically added.
In CISMeF, each MeSH term corresponds to a HTML static
document, which is organized first with MeSH subheading, then for each
subheading, the resource types (e.g CISMeF mercury resources screenshot).
The alphabetic index uses the MeSH terms in English
and their French translation, which permits bilingual search. The thematic
index contains 112 meta-terms, which correspond to medical specialties in
France, such as Aerospace Medicine. In addition, a general index is available,
which contains MeSH synonyms in French and allows permuted use.
Currently, we have indexed over 41,000 resources with over 15,800 MeSH terms. A
mean of 80 resources are indexed each week.
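A bilingual lookup and a permuted index can be sketched as follows. The reading of "permuted" as "every word of a term is an entry point to the terms containing it" is an assumption, and the three term pairs and both function names are illustrative only.

```python
from collections import defaultdict

# Illustrative English -> French MeSH label pairs (a tiny, assumed sample).
mesh_terms = {
    "Liver Neoplasms": "tumeurs du foie",
    "Breast Neoplasms": "tumeurs du sein",
    "Risk Factors": "facteur risque",
}

def build_bilingual_index(terms):
    """Map each lowercased English or French label to its (English, French) pair."""
    index = {}
    for english, french in terms.items():
        index[english.lower()] = (english, french)
        index[french.lower()] = (english, french)
    return index

def build_permuted_index(terms):
    """Map every word of every French label to the labels that contain it."""
    permuted = defaultdict(set)
    for french in terms.values():
        for word in french.lower().split():
            permuted[word].add(french)
    return permuted

bilingual = build_bilingual_index(mesh_terms)
permuted = build_permuted_index(mesh_terms)
```

With this layout, a query typed in either language resolves to the same pair, and a single word such as "tumeurs" leads to every label that contains it.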
2.3.6. CISMeF Metadata element set
Metadata are documentation about documents and objects,
or structured information about information. On the Internet, metadata refers
to descriptive information about Web resources and is used to improve
information retrieval. Metadata represents the content, structure and
logistical information of any information object, including electronic
resources, and is used for data discovery and data control. Metadata is also
important for customization, as it helps to select suitable resources for a
particular user and his or her particular needs. Metadata can greatly enhance
information retrieval and enable accurate matches while remaining totally
transparent and invisible to the user, but this depends on the quality of the
metadata used.
The Dublin Core Metadata Initiative (DCMI) maintains a metadata element set
intended to facilitate the discovery of electronic resources
(http://dublincore.org). Originally conceived for an author-generated
description of Web resources, the DCMI is now used by museums, libraries,
government agencies, and commercial organizations alike.
There is a need for an interoperable infrastructure
for Digital Libraries, quality-controlled subject gateways, and other Web-based
services that rely on cross-institutional and cross-border co-operation.
Agreement on a metadata standard that serves as a starting point for
information exchange in specific domains and provides common ground for
cross-domain interoperability is a crucial element of this infrastructure. The main
metadata standard with this cross-domain perspective is the Dublin Core, now
recommended across Europe for use in many sectors as the standard of choice to
ensure interoperability between resource discovery systems on the Internet.
There exist two types of metadata:
Stand-alone metadata (annotations) are external
comments, notes or remarks that can be attached to any Web document or to a
selected part of the document. They are classified as metadata because they give
additional information about an existing piece of data (the annotated resource).
In CISMeF we do not impose any structural changes on
the peripheral resources or their hosting servers: after the resources have
been collected and filtered, the selected resources are indexed and described by
an annotation (an HTML or XML file) created by the librarians. The metadata
allows the resource to be indexed by its informational content, using a set of
keywords, qualifiers and resource types taken from the CISMeF terminology.
The metadata in CISMeF is composed of several
element sets:
2.3.6.1 The Dublin Core Metadata Initiative (DCMI)
The Dublin Core Metadata Initiative (DCMI) is a
project from the Online Computer Library Center (OCLC) and the National Center
for Supercomputing Applications (NCSA). The Dublin Core (DC) was conceived at a
workshop convened by OCLC and NCSA in 1995 in Dublin, Ohio, USA. The 15-element
set emerged from a series of international invitational workshops that have
been held since 1995, at which broad consensus was reached among experts in
resource description, networking, encoding standards, information retrieval and
a range of subject disciplines. The DCMI is intended to aid the discovery of
electronic resources. The current DC describes the following elements, covering
resource content, intellectual property and instantiation: title, creator,
subject (using a controlled vocabulary), description, publisher, contributor,
date, type, format, identifier, source, language, relation, coverage and
rights. Each element is optional and repeatable.
The construction of an interdisciplinary,
international consensus around a core element set is the central feature of the
DCMI which benefits from active participation and promotion in over 20
countries in North America, Europe, Australia, and Asia. The DCMI is intended
to be used by non-cataloguers as well as resource description specialists.
DCMI describes two broad classes of DC qualifiers: element refinements and
encoding schemes.
Resources included in CISMeF are described by 11 of the 15
elements taken from version 1.1 of the DCMI
(http://dublincore.org/documents/dces/). These are: author or
creator, date, description, format, identifier, language, publisher, resource
type, rights, subject and keywords, and title. CISMeF does not use the 4 other
DCMI elements (contributor, coverage, relation, source).
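The 11-of-15 count can be checked with simple set arithmetic. The element names below use the Dublin Core 1.1 labels ("creator", "type", "subject"), which the text renders as "author or creator", "resource type" and "subject and keywords"; the variable names are illustrative.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# The four elements that CISMeF does not use, as listed in the text.
CISMEF_UNUSED = {"contributor", "coverage", "relation", "source"}

# What remains is the subset used to describe CISMeF resources.
CISMEF_DC = DC_ELEMENTS - CISMEF_UNUSED
```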
To capture more information for each health resource
indexed in CISMeF, another element set was developed locally to meet specific
search and retrieval needs. The following eight fields are added to the data
and metadata and are specific to CISMeF: institution, city, province or state,
country, target or audience, type of access, cost and sponsorship. Some of
these fields (e.g., cost) are also present in LOM.
Since 2000, CISMeF has also included a database and a
search tool, which generates an HTML (or XML) page for every indexed
resource. These metadata elements were manually written and updated by
the CISMeF team from 1995 to 1999 and are now automatically created and
updated from the CISMeF database.
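Generating a resource page header from a database record could look like the sketch below. It assumes the common convention of embedding Dublin Core in HTML `<meta>` tags with a `DC.` prefix; the `CISMeF.cost` name, the sample values and the function `render_meta_tags` are hypothetical and are not taken from the actual CISMeF pages.

```python
from html import escape

def render_meta_tags(record):
    """Return one <meta> line per (name, value) pair; elements may repeat."""
    lines = []
    for name, values in record.items():
        for value in values:
            lines.append(
                f'<meta name="{escape(name)}" content="{escape(value)}">'
            )
    return "\n".join(lines)

# Hypothetical record: each element maps to a list because DC elements
# are optional and repeatable.
record = {
    "DC.title": ["Alcool et risque de cancers"],
    "DC.language": ["fr"],
    "DC.format": ["html"],
    "DC.subject": ["neoplasms", "risk factors"],
    "CISMeF.cost": ["free"],  # assumed name for a CISMeF-specific field
}

html_head = render_meta_tags(record)
```

Escaping the values matters because titles and descriptions may contain quotation marks or ampersands.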
2.3.6.2 The IEEE 1484 Learning Objects Metadata (LOM)
The IEEE 1484 Learning Object Metadata (LOM)
(http://ltsc.ieee.org/doc/wg12/LOM3.6.html) version 3.6 contains over 60
elements in the following nine categories: General, Lifecycle,
Meta-metadata, Technical, Educational, Rights, Relation,
Annotation, Classification. LOM metadata includes the 15
DCMI elements.
CISMeF is one of the search tools of the French Medical
Virtual University (FMVU) Consortium, which includes 8 Medical Schools. This
consortium was created to experiment with the various tools and methods
necessary to build a virtual university (http://www.umvf.prd.fr). To describe
and index teaching resources, this consortium has decided to use in its search
tools only the 11 elements of the LOM Educational category because they are the
most specific. In addition, a feasibility study showed that the CISMeF team
spends an average of 30 minutes describing and indexing a teaching resource
with the Dublin Core set and needs 30 more minutes for the LOM Educational
subset. The cost field of the LOM is used in CISMeF.
2.3.6.3 The Evidence Based Medicine Metadata
CISMeF uses two specific metadata elements for EBM
resources and, more broadly, for 'sensitive' information. Sensitive information
is defined as information found in documents published on the Internet which
could be used in a medical decision. These two metadata elements are: (a) an
indication of the level of evidence, which we proposed as the main criterion
for the quality of health information content, and (b) the method used to
calculate the level of evidence, as more than twenty are currently used in the
literature. CISMeF is a quality-controlled health gateway that explicitly
indicates whether the level of evidence is mentioned for each indexed
'sensitive' document. Furthermore, this criterion is searchable using the
Doc'CISMeF search tool.
Example:
asthma[MeSH term] AND guidelines[resource type] => 42 resources
(see http://doccismef.chu-rouen.fr/servlets/Simple?Mot=asthma+guidelines&aff=4&tri=20&datt=1&debut=0)
(asthma[MeSH term] AND guidelines[resource type])
LIMIT to those explicitly indicating the level of evidence => 7 resources
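The logic of such a query can be sketched in a few lines. This is not the Doc'CISMeF implementation: the `Resource` class, the `search` function and the four toy records are hypothetical and only illustrate a Boolean AND followed by a limit on the level-of-evidence flag.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    title: str
    mesh_terms: frozenset
    resource_types: frozenset
    evidence_level_stated: bool = False

def search(catalogue, mesh_term, resource_type, require_evidence_level=False):
    """Return resources matching mesh_term AND resource_type, optionally
    limited to those that explicitly state a level of evidence."""
    hits = [
        r for r in catalogue
        if mesh_term in r.mesh_terms and resource_type in r.resource_types
    ]
    if require_evidence_level:
        hits = [r for r in hits if r.evidence_level_stated]
    return hits

# Toy catalogue (hypothetical records).
catalogue = [
    Resource("Guideline A", frozenset({"asthma"}), frozenset({"guidelines"}), True),
    Resource("Guideline B", frozenset({"asthma"}), frozenset({"guidelines"})),
    Resource("Course C", frozenset({"asthma"}), frozenset({"teaching material"})),
    Resource("Guideline D", frozenset({"pain"}), frozenset({"guidelines"}), True),
]

all_hits = search(catalogue, "asthma", "guidelines")
limited_hits = search(catalogue, "asthma", "guidelines", require_evidence_level=True)
```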
2.3.6.4 The HIDDEL (Health Information, Disclosure, Description and Evaluation Language) metadata
HIDDEL is a standard vocabulary and metadata language
developed in the MedCERTAIN project. HIDDEL is designed to be used by
information providers, to describe and disclose properties of e-health services
(self-rating), and by third parties, e.g. subject gateways, to express
third-party opinions about health information providers.
CISMeF is a member of the MedCIRCLE project which is a
collaboration of trusted European health subject gateways, medical
associations, accreditation, certification, or rating services, which share the
common goal of evaluating, describing, or annotating health information. This
project began in March 2002 and will last 18 months.
As a quality-controlled subject gateway, CISMeF will
use HIDDEL only as a third party. Some elements of HIDDEL are close to
Dublin Core elements (e.g. HIDDEL.Identity and DC.Author), but these elements
will be repeated in the two metadata element sets to allow multiple
interoperability. Most of the HIDDEL elements are shared with the Net Scoring
previously used by CISMeF, and some are already present in the CISMeF database
(e.g. HIDDEL.policies).
This metadata element set will be useful for
cross-searching distributed and heterogeneous subject gateways. We have
successfully tested the interoperability of the CISMeF metadata element set with
the FMVU e-learning platform using the XML version of CISMeF resource pages
(Example: http://doccismef.chu-rouen.fr/xml/00008637.xml including the
DTD http://doccismef.chu-rouen.fr/xml/dtdNL.dtd).
3. Results
CISMeF is an efficient and end-user-friendly solution
for finding French-language health resources worldwide on the Internet. Seventy
per cent of these resources are located in France, 16% are from Canada, in
particular the Province of Quebec, 4% from Switzerland and Belgium and 3% from
Africa. For more information, see our internal statistics and external
statistics via www.alexa.com.
This Web site is principally and initially oriented
for the health professional, although the general public may also have access
to it. Many sites are devoted to both. There are no HTML documents with
restricted access in the CISMeF Web site. Thus our traditional
"end-users" are now not only healthcare practitioners but also
patients, their families and anyone seeking health information [9]. Training should
take into consideration the information needs of the lay person as well as
those of the medical professional. Training sessions on the use of our
catalogue have been offered to Patients' Associations, especially to
handicapped people at RUH since February 1999.
CISMeF has three priority axes: evidence based
medicine, teaching and patient information. CISMeF also includes a list of
clinical guidelines and consensus development conferences, hospitals, medical
universities, health institutions, medical libraries, medical publishers,
electronic journals, electronic textbooks, databases, teaching and CME, mailing
lists, research laboratories and institutes, pharmaceutical firms, health and
patient associations, and commercial companies in the health sector.
Since February 1995, some new features have been added
to optimize the navigability and the access to the information for the
end-user: (a) use of an internal search engine (full-text and boolean
searches), (b) a general index, and (c) a "what's new" page to easily
display the newly indexed sites on a weekly basis. Since January 1997 it has
also included an archive of the what's new pages. Two guides to use CISMeF are
also on line, one for basic search and one for advanced search. CISMeF is
accessible by the lowest common denominator of current browser technology.
3.1. Use patterns of the Web site
Use of the Web site has increased approximately linearly
over time since February 1995. Our Web server software, which
provides documents to users on request, does not know the identities of
individual users (such as e-mail addresses); the only identifying data available
are the IP addresses of the machines from which the users connect to the site.
Analysis of a representative period, the month of
November 2007, showed that every working day approximately 30,000 unique
machines visited our site (excluding ours). For more information, see our internal statistics and external statistics via
www.alexa.com
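Counting unique machines per day from server logs can be sketched as below. The input format (pairs of date string and IP address), the function name and the sample addresses, which are taken from the reserved documentation ranges, are all assumptions; real log parsing is omitted.

```python
from collections import defaultdict

def unique_visitors_per_day(entries, own_addresses=()):
    """Count distinct IP addresses per day, ignoring our own machines.

    entries: iterable of (date_string, ip_address) pairs.
    """
    excluded = set(own_addresses)
    seen = defaultdict(set)
    for day, ip in entries:
        if ip not in excluded:
            seen[day].add(ip)
    return {day: len(ips) for day, ips in seen.items()}

# Hypothetical log extract.
log = [
    ("2007-11-05", "192.0.2.10"),
    ("2007-11-05", "192.0.2.10"),    # same machine, second request
    ("2007-11-05", "198.51.100.7"),
    ("2007-11-05", "203.0.113.1"),   # one of our own machines
    ("2007-11-06", "192.0.2.10"),
]

daily_counts = unique_visitors_per_day(log, own_addresses={"203.0.113.1"})
```

Such a count measures machines rather than people: several users behind one address are counted once, and one user on several machines is counted several times.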
We also use the following indicator, the Web Impact
Factor (WIF) [10], to measure the current impact and potential future usage
of CISMeF: the number of sites which have at least one hyperlink to our site
[http://www.chu-rouen.fr/dsii/html/pointeur.html]. Currently,
our Web impact factor is over 800 (Altavista indicates more than 2,800 pages
after exclusion of our internal links), including the most prestigious health
resource catalogues. Approximately 130 press articles have reported on
our Web site [http://www.chu-rouen.fr/dsii/html/presse.html]. In March
2007, the daily newspaper "Le Monde" ranked CISMeF fifth among
health Web sites in France.
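The difference between linking pages and linking sites can be made concrete with a short sketch. It follows the definition given in the text (number of sites with at least one hyperlink to ours) and treats each distinct host name as one site, which is a simplifying assumption; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit

def count_linking_sites(linking_page_urls, own_host):
    """Count distinct external hosts among pages that link to own_host."""
    hosts = set()
    for url in linking_page_urls:
        host = urlsplit(url).hostname
        if host and host != own_host:
            hosts.add(host)
    return len(hosts)

# Hypothetical list of pages known to contain a hyperlink to our site.
linking_pages = [
    "http://a.example.org/links.html",
    "http://a.example.org/other.html",   # same site, second page
    "http://b.example.net/",
    "http://www.chu-rouen.fr/cismef/",   # internal link, excluded
]

linking_sites = count_linking_sites(linking_pages, "www.chu-rouen.fr")
```

Here three external or internal pages collapse to two linking sites, which mirrors why the page count reported by a search engine is larger than the site count.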
4. Discussion
The Internet facilitates communication among
health professionals and with the general public, and also improves
access to information. However, most medical resources available on the
Internet only have a "marketing dimension" (description of the
institution), and only a minority have valid information content.
Several tools for the retrieval of health information
on the Internet have been distinguished and structured:
CISMeF uses DCMI differently according to the
"browse" (CISMeF MeSH Page, Figure 1) or "search" (CISMeF
resource page, Figure 2) strategy chosen by the end-user. The choice of the
Dublin Core was prompted by its institutional origin and its notoriety in the
academic world. Several other health sites are now using the Dublin Core: the
Australian Department of Health and Aged Care (http://www.health.gov.au/), the
Better Health Channel (http://www.betterhealth.vic.gov.au/), the National
Health and Medical Research Council (URL: http://www.nhmrc.health.gov.au/), and
more recently the US National Library of Medicine (NLM)
(http://www.nlm.nih.gov/tsd/cataloging/metadata/index.htm) (see a comprehensive
list of health sites using DCMI at the following
http://www.chu-rouen.fr/documed/dc.html).
OMNI indexes approximately 4,500 resources, mostly
from UK, CISMeF about 7,000, mostly from France, MedHunt and HON around 40,000.
OMNI and MedWebPlus are also using the UMLS metathesaurus to provide a
conceptual network to the subject headings. OMNI, HON, and CliniWeb have also
developed a structured database (dynamic HTML) which permits better searches.
OMNI and CISMeF are using the Dublin Core metadata format, which is expected to
become the dominant metadata format for Internet resource description [13].
It is quite difficult, especially for students,
patients and the general public to evaluate the quality of the medical Web
sites, which, in a majority of cases, are not peer-reviewed.
One main objective of CISMeF is to promote best
medical practice and teaching. Therefore, we index high-quality documents
available on the Internet on a priority basis. The organization of the CISMeF
data model permits the discovery "by chance" of other
"neighbour" sites and documents: e.g., a search about hemiplegia may
lead to the discovery of sites about paraplegia (relation of proximity) or, going
up the tree, about paralysis more generally (relation of hierarchy). It is also
possible to find more information using the "see also" relation, which
allows links between medical terms that are not in the same tree: e.g.,
the page about pain has a "see also" link to terminal care. This "see also"
relation seems very difficult to generate automatically because it is based on
human knowledge and not on a statistical model. This relation may be
asymmetrical: it is symmetrical between terminal care and pain, but not between
pain and bioethics, where the link is only significant from pain to
bioethics and not the reverse.
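These three relations can be sketched with two small mappings: a child-to-parent map for the hierarchy and a directed map for "see also". The data structures and function names are illustrative assumptions; only the term examples (hemiplegia, paraplegia, paralysis, pain, terminal care, bioethics) come from the text.

```python
# Hierarchy: each term points to its parent in the tree.
parent = {
    "hemiplegia": "paralysis",
    "paraplegia": "paralysis",
}

# "See also" is stored as a directed relation, so it may be asymmetrical.
see_also = {
    "pain": {"terminal care", "bioethics"},
    "terminal care": {"pain"},
}

def neighbours(term):
    """Terms sharing the same parent (relation of proximity)."""
    return {
        other for other, p in parent.items()
        if p == parent.get(term) and other != term
    }

def is_symmetric(a, b):
    """True when 'see also' links exist in both directions."""
    return b in see_also.get(a, set()) and a in see_also.get(b, set())
```

Storing "see also" as a directed map keeps the pain-to-bioethics link without implying the reverse one.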
Further challenges that CISMeF needs to address in the
next months are to expand sites and high-quality documents, especially patient
information, and to collaborate more closely with similar services,
particularly in Europe (DDRT, HON and OMNI).
The major drawback of CISMeF is its weak technical
level: CISMeF uses only static HTML and does not yet use a database, which would
allow more complex searches, e.g. guidelines in hepatitis, combining the
explode command for keywords and qualifiers but also for metaterms and resource
types. We plan to build this CISMeF database in the first semester of 2000.
The major feature of CISMeF is its information
structure model. CISMeF conceptually encapsulates the MeSH structure (category,
keyword, qualifier) by adding two levels: one on top of it (metaterm) and one
below it (resource type). This model is completely generic. It can help to
design equivalent health catalogs in any language into which the MeSH
thesaurus has already been translated.
One key success of CISMeF is the typology of its
webmasters: one medical librarian and one medical informatician [15-16]. The
CISMeF model was designed by these two individuals illustrating the synergy
between these two professions. According to this experience, we suggest the use
of this webmaster typology to design health web sites.
We did not formally and directly assess how end-users
use CISMeF. During CISMeF training sessions, we observed that patients mostly
use the internal search engine to access health information. Few end-users
employ the CISMeF model to perform better searches: in September 1999, 7% of
the pages loaded dealt with MeSH categories or trees. In the near future,
we will measure its real usefulness for different communities (MDs, nurses, and
patients) in the different French-speaking countries by a questionnaire based
on the Net Scoring. Some indirect elements indicate the success of CISMeF: CESIM
surveys, CISMeF use patterns and its Web impact factor. In order to enhance its
quality, CISMeF respects the Net Scoring, e.g. its two webmasters personally
answer each request.
CISMeF is a part of a wider project at RUH: digital
library [1] and virtual university [17]. We have already developed some parts
of this digital library: access to Medline and 45 electronic full-text English
journals on Intranet (OVID provider) [18] plus access to 40 electronic
full-text French journals on Extranet (Masson publisher). We plan to extend
this library, giving access to electronic textbooks (e.g., the Harrison on the
Internet). Our project of virtual university is to develop specific tools for
students: a bank of multiple choice questions and a bank for standardized
clinical examinations.
5. Conclusion
To help healthcare professionals and health consumers
to more easily locate high-quality health information on the Internet,
catalogues must use standard tools to describe and index resources.
Acknowledgement
CISMeF is supported by several partners.
References
See the publications of the CISMeF team and the publications of
the GCSIS, LITIS lab (EA 4108), University of Rouen.
1. Schatz BR. Information Retrieval in Digital Libraries: Bringing Search to
the Net. Science 1997;275:327-34.
2. Flannery MR. Cataloging Internet resources. Bull Med Libr Assoc
1995;83(2):211-5.
3. Darmoni SJ, Thirion B. Indexing the Web? A comparative study of three
medical Web servers on the Internet: CliniWeb, "Diseases, Disorders and Related
Topics", OMNI. In: Proceedings of the 1st European Congress of the Internet in
Medicine. 1996: 5-6.
4. National Library of Medicine. Fact Sheet Medline. 6 July 1998 [Web document,
accessed 11 Jan 1999]. Available from Internet:
<http://www.nlm.nih.gov/pubs/factsheets/medline.html>.
5. Weibel S, Juha H. DC-5: The Helsinki Metadata Workshop; A Report on the
Workshop and Subsequent Developments. D-Lib Magazine 1998 February. Available
from Internet: <http://www.dlib.org/dlib/february98/02weibel.html>.
6. Centrale Santé. Net Scoring : critères de qualité de l'information de santé
sur l'Internet. 20 Apr 1998 [Web document, accessed 27 Apr 1999]. Available from
Internet: <http://www.chu-rouen.fr/dsii/publi/critqualv2.html>.
7. Ambre J, Guard R, Perveiler FM, Renner J, Rippen H. Health Information
Technology Institute. Working Draft White Paper: Criteria for Assessing the
Quality of Health Information. 8 Apr 1999 [Web document, accessed 22 May 2000].
Available from Internet: <http://hitiweb.mitretek.org/docs/policy.pdf>.
8. Darmoni SJ, Thirion B. A standard metadata scheme for health resources.
J Am Med Inform Assoc 2000 Jan-Feb;7(1):108-9.
9. Thirion B, Darmoni SJ. Simplified access to MeSH Tree Structures on CISMeF.
Bull Med Libr Assoc 1999 Oct;87(4):480-1.
10. Ingwersen P. The calculation of WEB impact factor. Journal of Documentation
1998;54(2):236-43.
11. National Library of Medicine. Fact Sheet UMLS Metathesaurus. 12 Aug 1998
[Web document, accessed 27 Apr 1999]. Available from Internet:
<http://www.nlm.nih.gov/pubs/factsheets/online_indexing_system.html>.
12. Hersh WR, Brown KE, Donohoe LC, Campbell EM, Horacek AE. CliniWeb: managing
clinical information on the World Wide Web. J Am Med Inform Assoc
1996;3(4):273-80.
13. Norman F. Organising Medical Networks' information (OMNI). Med. Inf.
1998;23:43-51.
14. Boyer C, Baujard O, Baujard V, Aurel S, Selby M, Appel RD. Health On the
Net automated database of Health and medical information. International Journal
of Medical Informatics 1997;47(1-2):27-9.
15. Braude RM. Medical librarianship and medical informatics: a call for the
disciplines to join hands to train tomorrow's leaders. J Am Med Inform Assoc
1994 Nov-Dec;1(6):467-8.
16. Braude RM, Florance V, Frisse M, Fuller S. The organization of the digital
library. Acad Med 1995 Apr;70(4):286-91.
17. LeBeux P, Duff F, Fresnel A, Berland Y, Beuscart R, Burgun A, Brunetaud JM,
Chatellier G, Darmoni SJ, Duvauferrier R, Fieschi M, Gillois P, Guille F,
Kohler F, Pagonis D, Pouliquen B, Soula G, Weber J. The French Virtual Medical
University. In: Proceedings of MIE 2000, Sixteenth International Congress of the
European Federation for Medical Informatics, Hanover, Germany.
18. Darmoni SJ, Benichou J, Thirion B, Fuss J. A study comparing centralized
CD-ROM and decentralized intranet access to MEDLINE. Bull Med Libr Assoc 2000
Apr;88(2):152-6.
19. Darmoni SJ, Thirion B, Leroy JP, Douyère M, Piot J. The use of Dublin Core
metadata in a structured health resource guide on the Internet. Bull Med Libr
Assoc 2001 July;89(3):297-301.
20. Boyer C, Gaudinat A, Baujard V, Geissbühler A. Health on the Net
Foundation: assessing the quality of health web pages all over the world.
Medinfo 2007;12(Pt 2):1017-21.
21. Koch T. Quality-controlled subject gateways: definitions, typologies,
empirical overview, Subject gateways. Online Information Review
2000;24(1):24-34.
22. Douyère M, Soualmia LF, Névéol A, Rogozan A, Dahamna B, Leroy JP, Thirion
B, Darmoni SJ. Enhancing the MeSH thesaurus to retrieve French online health
resources in a quality-controlled gateway. Health Info Libr J 2004
Dec;21(4):253-61.
23. Névéol A, Rogozan A, Darmoni SJ. Automatic indexing of online health
resources for a French quality controlled gateway. Information Management &
Processing 2006:1; 695-709.
24. Névéol A, Pereira S, Kerdelhué G, Dahamna B, Joubert M, Darmoni SJ.
Evaluation of a simple method for the automatic assignment of MeSH descriptors
to health resources in a French online catalogue. Medinfo 2007;129:407-11.
25. Thirion B, Pereira S, Névéol A, Dahamna B, Darmoni SJ. French MeSH Browser:
a cross-language tool to access MEDLINE/PubMed. AMIA 2007; 1132.
26. Hoelzer S, Schweiger RK, Boettcher H, Rieger J, Dudeck J. Indexing of
Internet resources in order to improve the provision of problem-relevant medical
information. Stud Health Technol Inform 2002;90:174-7.
27. Mc Gregor B. Constructing a concise medical taxonomy. J Med Libr Assoc 2005
January;93(1):121-3.
28. Gehanno JF, Thirion B, Darmoni SJ. Evaluation of Meta-concepts for
Information Retrieval in a Quality-Controlled Health Gateway. AMIA Symp. 2007
(in press).
29. Abad Garcia F, Gonzalez Teruel A, Bayo Calduch P, de Ramon Frias R,
Castillo Blasco L. A comparative study of six European databases of medically
oriented Web resources. J Med Libr Assoc 2005;93(4):467-79.
30. Darmoni SJ, Thirion B, Ionut-Florea F, Rogozan A, Letord C, Kerdelhué G,
Dacher JN. Affiliation of a resource type to a MeSH term in a
quality-controlled health gateway. Medinfo 2007, Twelfth World Congress on
Health and Medical Informatics (poster). [affiliation_medinfo_2007.pdf]
28-08-2009