VOC meetings

  • VOC autumn meeting 2009
  • VOC spring meeting 2009
  • VOC autumn meeting 2008
  • VOC spring meeting 2008
  • VOC autumn meeting 2007
  • VOC spring meeting 2007
  • Earlier VOC meetings 1998-2006
  • VOC Jubilee Meeting 2009

    The VOC celebrates its 20th anniversary with a Jubilee Meeting on November 12-13, 2009. The meeting promises an exciting programme of lectures by distinguished researchers. Reflecting the rich variety of VOC interests, the theme ‘Everything is in Flux’ will be approached from different perspectives. The programme, described in more detail below, also leaves room for social interaction. The Meeting will take place at the beautifully located Wageningse Berg in Wageningen (www.hoteldewageningseberg.nl).

    The meeting is open to both VOC members and non-members. You are advised to book early, as accommodation is limited. Early-bird rates are available for those registering before August 10th 2009. After September 10th, we cannot guarantee accommodation at De Wageningse Berg, and you will have to arrange your own accommodation.

    Registration fees:

                        Early Bird Registration      Late Registration
                        (before August 10th 2009)    (after August 10th 2009)
    VOC Members         200 euro*                    250 euro*
    Non-Members         240 euro*                    290 euro*

    *Participants who do not need accommodation at De Wageningse Berg, or who register after September 10th, receive a reduction of 60 euro.

    The registration fee for the Jubilee meeting includes:

    • Lunches and refreshment breaks during the Jubilee meeting
    • Dinner on the evening of 12 November
    • Accommodation at ‘De Wageningse Berg’ (if you register before September 10th) for the night of 12 November

    To register, download the registration form here. Send the completed form by email to m.e.timmerman at rug.nl.

    For specific questions, please contact the VOC chair, Ron Wehrens (r.wehrens at science.ru.nl).

    The complete program including abstracts can be found in the latest newsletter.

    Program

    Classification from a Philosophy of Science Perspective


    Other times, other suffering. On the changing classification of despondency.

    Trudy Dehue (University of Groningen; www.rug.nl/staff/g.c.g.dehue/index)


    Biological and Social Networks


    Gene Sets, False Discovery Rates, and High-Dimensional Prediction.

    Korbinian Strimmer (University of Leipzig, strimmerlab.org/korbinian.html)


    From metabolomics data to biological networks and back.

    Age Smilde (University of Amsterdam, www.bdagroup.nl/index.php/people)


    Modelling interdependent actors: Cross-sectional and longitudinal approaches in social network analysis.

    Christian Steglich (University of Groningen, www.ppsw.rug.nl/~steglich/)


    Biostatistics


    Targeted Maximum Likelihood Machine Learning: Applications to Causal effect/Variable Importance Assessment and Prediction with Censored Data.

    Mark van der Laan (University of California, Berkeley; www.stat.berkeley.edu/~laan)


    Characterisation and inference of biological networks.

    Marcel Reinders (Delft University of Technology; ict.ewi.tudelft.nl/~marcel/)


    Spectral decomposition and fuzzy clustering of network data with an application in genetics.

    Cajo ter Braak (Wageningen University and Research Centre, www.biometris.wur.nl/UK/Staff/Cajo+ter+Braak)


    Gene and QTL networks

    Ritsert Jansen* (University of Groningen, gbic.biol.rug.nl/~rjansen)


    Psychometrics

    Classification models to retrieve the sequential process basis of person-in-context behavior.

    Iven van Mechelen (University of Leuven, ppw.kuleuven.be/okp/people/Iven_Van_Mechelen)


    Classification through time: Markov models for time series data.

    Ingmar Visser (University of Amsterdam; users.fmg.uva.nl/ivisser)


    Sudden change and types.

    Han van der Maas (University of Amsterdam; users.fmg.uva.nl/hvandermaas)


    Some applications of stochastic differential equations in psychological research.

    Francis Tuerlinckx (University of Leuven; ppw.kuleuven.be/okp/people/Francis_Tuerlinckx)


    Latent variable modeling of strategy choice and strategy accuracy in primary school mathematics.

    Marian Hickendorff (Leiden University; www.socialsciences.leiden.edu/psychology/organisation/ms/staff/hickendorff.html)


    Spring meeting 2009

    The spring meeting of the VOC will be held at Tilburg University on Friday 17th April. The topic of the meeting is 'Mixture Models'. Presenters from different research areas will present state-of-the-art work in mixture modeling. Those who would like to participate are welcome and are kindly requested to register at the VOC website by using this link. Participation is free; lunch is available for 10 euros and must be requested upon registration. Registration deadline: April 10th.

    Location: The meeting will take place in room WZ-101 of the Warande building on the campus of Tilburg University. For directions on how to get there, see http://www.uvt.nl/contact.

    The program is as follows:

    9.45 Welcome
    10.00 Geert Molenberghs A Latent-Variable Mixture Model as a Basis for Sensitivity Analysis in Incomplete Longitudinal Data
    10.45 Coffee & VOC annual member meeting
    11.15 Gerrit Gort Codominant scoring of AFLP: an application of normal mixture models
    11.50 Andries van der Ark A new reliability coefficient based on latent class analysis
    12.30 Lunch
    13.45 Carlos Hernandez Timing and Speed of New Product Price Landings
    14.20 Tomoki Tokuda Bayesian mixture modelling with variable selection
    14.55 Tea
    15.15 Christian Hennig Merging normal mixture components
    16.00 Drinks

    Abstracts

    A Latent-Variable Mixture Model as a Basis for Sensitivity Analysis in Incomplete Longitudinal Data

    Geert Molenberghs

    Standard methodology used to analyze incomplete longitudinal data has long been based on methods such as “last observation carried forward” (LOCF) and “complete case analysis” (CC). Since these are based on extremely strong assumptions (even the strong MCAR assumption does not suffice to guarantee that an LOCF analysis is valid) and their validity can be called into question, there is a tendency to shift towards more generally valid methodology. In the so-called selection model framework, under MAR, valid inference can be obtained through a likelihood-based or Bayesian analysis, including the linear, generalized linear, and non-linear mixed models, without the need for modelling the dropout process. In addition, weighted generalized estimating equations (WGEE) can be used under MAR. As a consequence, incomplete longitudinal data, of both a Gaussian and a non-Gaussian nature, can easily be analyzed under the MAR assumption using standard statistical software tools. However, missingness not at random (MNAR) can never be entirely excluded, and one should therefore supplement an MAR analysis with a suitably chosen set of sensitivity analyses. Existing methods are based on local influence or on the pattern-mixture modelling framework. In this presentation, we propose a flexible class of models, based on a combination of latent variables and random effects that govern both the measurement and missingness processes. The method will be presented, estimation will be discussed, and its position within the family of sensitivity analysis tools will be considered.

    Reference

    Beunckens, C., Molenberghs, G., Verbeke, G., and Mallinckrodt, C. (2008). A latent-class mixture model for incomplete longitudinal Gaussian data. Biometrics, 64, 96-105.

    Geert Molenberghs is Professor of Biostatistics at Universiteit Hasselt and Katholieke Universiteit Leuven in Belgium. He received the B.S. degree in mathematics (1988) and a Ph.D. in biostatistics (1993) from Universiteit Antwerpen. He has published on surrogate markers in clinical trials, and on categorical, longitudinal, and incomplete data. He was Joint Editor of Applied Statistics (2001-2004) and Co-Editor of Biometrics (2007-2009). He was President of the International Biometric Society (2004-2005), and received the Guy Medal in Bronze from the Royal Statistical Society and the Myrto Lefkopoulou award from the Harvard School of Public Health. Geert Molenberghs is founding director of the Center for Statistics. He is also the director of the Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat). Together with Geert Verbeke, he has authored several books on the use of linear mixed models for the analysis of longitudinal and incomplete data and has taught numerous short and longer courses on the topic at universities as well as in industry, in Europe, North America, Latin America, and Australia. Geert Molenberghs repeatedly received the American Statistical Association's Excellence in Continuing Education Award (2002, 2004, 2005, 2008). He is an elected Fellow of the American Statistical Association and an elected member of the International Statistical Institute.

    Codominant scoring of AFLP: an application of normal mixture models

    Gerrit Gort

    AFLP is a DNA fingerprinting technique frequently used in the plant sciences. AFLP results in an electrophoretic gel containing patterns of bands for different genotypes. In the presentation we will show AFLPs from 94 tomato genotypes, as studied in the Centre for BioSystems Genomics program. The bands represent DNA fragments. AFLP profiles are usually interpreted in a binary way, that is, bands are scored as either present or absent. The gels, however, reveal more information if the intensities of the bands are scored, since the intensity of a band can be used as a measure of the amount of DNA. In the case of diploid organisms, like tomato, 3 groups of genotypes are expected: genotypes with two copies of a DNA fragment (homozygous), with one copy (heterozygous), and with no copies (homozygous absent). We fit normal mixture models with 3 components to the band intensities using the EM algorithm. Inferring the zygosity of a genotype in this way is called codominant scoring.
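    To illustrate the mixture step, the sketch below fits a univariate three-component normal mixture by EM and assigns each genotype to the component with the highest posterior probability. It is a minimal Python sketch, not the speaker's implementation: the simulated intensities, the starting values and the function names (em_normal_mixture, normal_pdf) are hypothetical, and real AFLP intensities may need transformations or constraints (for instance on the component means) that are not shown here.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_normal_mixture(data, means, sds, weights, n_iter=200):
    """Fit a univariate normal mixture by EM from the given starting values.

    Returns the fitted means, standard deviations, mixing weights and the
    posterior membership probabilities (responsibilities) from the last E-step.
    """
    means, sds, weights = list(means), list(sds), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted updates of weights, means and sds
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = math.sqrt(max(var, 1e-6))  # avoid a collapsing component
    return means, sds, weights, resp


# Hypothetical band intensities for 94 genotypes with 0, 1 or 2 fragment copies
random.seed(1)
true_copies = [0] * 30 + [1] * 34 + [2] * 30
data = [random.gauss(float(c), 0.15) for c in true_copies]

means, sds, weights, resp = em_normal_mixture(
    data, means=[0.2, 0.9, 1.7], sds=[0.5, 0.5, 0.5], weights=[1 / 3, 1 / 3, 1 / 3]
)
# Codominant score: the component with the highest posterior probability
scores = [max(range(3), key=lambda j: r[j]) for r in resp]
```

    Starting the three means in increasing order keeps components 0, 1 and 2 aligned with zero, one and two fragment copies in this toy example; with real data, poor starting values and label switching need more care.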

    Gerrit Gort is assistant professor of Statistics at Biometris, Wageningen University. His main task is teaching statistics courses at the BSc and MSc level, such as Introduction Statistics, Advanced Statistics, and Modern Statistics for the Life Sciences. He is furthermore involved in statistics courses for PhD students, such as Linear Models, Generalized Linear Models, Mixed Models, and Bayesian Statistics, and in statistical consultancy for PhD students and staff of Wageningen University. Presently he is finalizing his PhD on statistical properties of AFLP.

    A new reliability coefficient based on latent class analysis

    L. Andries van der Ark

    We used the latent class model to estimate the reliability of test scores. Unlike well-known reliability coefficients such as Cronbach's alpha and Guttman's lambda 2, the new reliability coefficient is not a lower bound to the reliability but an unbiased estimate of the reliability, under the theoretical condition that the latent class model fits perfectly. In practice, a first problem is that only a limited number of latent classes can be estimated, and a second problem is that computation time increases rapidly with the number of latent classes. We studied the bias and computation time of the new coefficient under several conditions with respect to the number of items and the number of latent classes, and compared them with the bias and computation time of existing reliability coefficients. Tentative results indicate that the new coefficient has less bias than other reliability coefficients (alpha, lambda 2, MS statistic), even when a latent class model with a limited number of latent classes is used.

    Andries van der Ark is associate professor at the Department of Methodology and Statistics, Tilburg University. His primary research interests include item response theory, latent class analysis, and missing data analysis.

    Timing and Speed of New Product Price Landings

    Carlos Hernandez

    In this paper we examine how new products are priced over time. Specifically, we develop a model to describe the often-observed sharp decreases in prices, focusing on the main features of such a decrease: its timing and its speed. Many high-tech products, information goods and durable goods exhibit exactly one significant price cut some time after their launch. We call this a price landing, and we propose a model for prices that has the timing and the speed of price landings as its main parameters. Prior literature suggests that price landings might be driven by sales, product line pricing, competitors’ sales or simply by time. We propose a mixture specification to find out which of these explanations best describes the pricing patterns we observe in our data. Price landings will obviously differ across products, even if the same driving force applies. We therefore explicitly allow for heterogeneity in the timing and speed of the landings within each mixture component, which leads to a hierarchical mixture model. To our knowledge, no comparable empirical study of price landings is available. We estimate our model using a rich dataset containing the sales and prices of 1195 newly released video games. In contrast with previous literature, our findings suggest that price landings are best described not by product line pricing or sales, but mainly by competition and time itself. Finally, we find substantial heterogeneity in the timing and speed of landings across firms and product types.

    Carlos Hernandez is a PhD candidate at ERIM (Erasmus Research Institute of Management). He is supervised by Prof. Philip Hans Franses and Dr. Dennis Fok, and is currently writing his thesis, which focuses on marketing models for new products.

    Bayesian mixture modelling with variable selection

    Tomoki Tokuda

    A general problem in clustering high-dimensional data is that the inclusion of irrelevant variables can mask the 'true' group structure. For an effective clustering of observations, some form of variable selection is therefore essential. In this presentation, I will discuss a Bayesian multivariate normal mixture method with variable selection for high-dimensional data, proposed by Tadesse, Sha and Vannucci (JASA, 2005). We found that this method has three drawbacks. First, it is not scale-invariant (i.e., changing the unit of one variable may influence the results); second, its results are sensitive to the number of irrelevant variables; third, it may get trapped in a one-cluster solution. These drawbacks may considerably hamper the use of the method in practice. A way to overcome them will be proposed, together with simulation results. Furthermore, the performance will be compared with the method of Steinley & Brusco (Psychometrika, 2008), which is based on the k-means algorithm combined with a so-called clusterability index for screening possibly discriminating variables. In an earlier comparison, that method outperformed various alternative clustering methods with variable selection.

    Tomoki Tokuda is a doctoral student in the Department of Psychology, KU Leuven. He received a B.S. in Mathematics from Nagoya University in Japan and an M.S. in Statistics from KU Leuven.

    Merging normal mixture components

    Christian Hennig

    Normal mixture models are often used for cluster analysis. Usually, every component of the mixture is interpreted as a cluster. This, however, is often not appropriate. A mixture of two normal components can be unimodal and quite homogeneous. In particular, mixtures of several normals may be needed to approximate homogeneous non-normal distributions. Even if there are non-normal subpopulations in the data, the normal mixture model is still a good tool for clustering because of its flexibility. This presentation is about methods to decide whether, after a normal mixture has been fitted, several mixture components should be merged in order to be interpreted as a single cluster. Note that this cannot be formulated as a statistical estimation problem, because the likelihood and the general fitting quality of the model do not depend on whether single mixture components or sets of mixture components are interpreted as clusters. So any method depends on a specification of what the user wants to regard as a "cluster". There are at least two different cluster concepts, namely identifying clusters with modes (and therefore merging unimodal mixtures) and identifying clusters with clear patterns in the data (which, for example, means that scale mixtures, though unimodal, should not necessarily be merged). Furthermore, it has to be specified how strong a separation is required between different clusters. The methods proposed and compared in this presentation are all hierarchical. Starting from an estimated mixture, pairs of components (and later pairs of already merged mixtures) are merged until the members of each remaining pair are separated enough to be interpreted as different clusters. This separation can be measured in many different ways. It can be checked whether mixtures are (approximately) unimodal using the ridgeline approach of Ray and Lindsay (2006) or the dip test (Tantrum, Murua and Stuetzle 2003). The misclassification probability between mixtures can be estimated using estimated a posteriori probabilities, the Bhattacharyya distance or a modified version of the prediction strength (Tibshirani and Walther 2005).
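    As a rough illustration of one of the separation measures mentioned above, the sketch below computes the Bhattacharyya distance between two univariate normal components and merges them when exp(-distance), the Bhattacharyya coefficient, exceeds a threshold. This is a simplified sketch under stated assumptions: it handles only single univariate normal components (not already merged mixtures), and the function names and the threshold of 0.1 are hypothetical choices rather than values from the talk.

```python
import math


def bhattacharyya_normal(m1, s1, m2, s2):
    """Bhattacharyya distance between N(m1, s1^2) and N(m2, s2^2)."""
    v1, v2 = s1 ** 2, s2 ** 2
    return 0.25 * (m1 - m2) ** 2 / (v1 + v2) + 0.5 * math.log((v1 + v2) / (2.0 * s1 * s2))


def should_merge(m1, s1, m2, s2, threshold=0.1):
    """Merge two components when their Bhattacharyya coefficient exceeds threshold.

    The coefficient exp(-distance) lies in (0, 1]; multiplied by the square root
    of the product of the two mixing proportions, it gives an upper bound on the
    misclassification probability between the components. The threshold is a
    hypothetical user choice.
    """
    return math.exp(-bhattacharyya_normal(m1, s1, m2, s2)) > threshold


# Hypothetical components: heavily overlapping, well separated, and a scale mixture
print(should_merge(0.0, 1.0, 0.5, 1.0))   # True
print(should_merge(0.0, 1.0, 10.0, 1.0))  # False
print(should_merge(0.0, 1.0, 0.0, 5.0))   # True
```

    The last case shows that this criterion merges a scale mixture, which under the pattern-based cluster concept described above is not necessarily what the user wants.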

    Christian Hennig has been a lecturer at the Department of Statistical Science, University College London, since 2005. He studied Mathematics at the University of Hamburg and Statistics at the University of Dortmund. He obtained his PhD in 1997 at the University of Hamburg with a thesis on linear regression clustering, under the supervision of Prof. Konrad Behnen. He worked as a university assistant at the University of Hamburg in 1997-2001 and 2003-2005, and at ETH Zürich in 2001-2003. He has published on robust statistics, cluster analysis, mixture models, data visualisation, classification, philosophy of statistics, and applications of statistics in biogeography, astronomy, musicology, psychology, biology and chemistry. A current topic of interest is the exploration of the practical and philosophical implications of subjective decisions in statistics (particularly cluster analysis).

    Autumn meeting 2008

    The autumn meeting of the VOC will take place on Friday 28th November. The topic of the meeting is 'propensity scores'.

    Those who would like to participate are welcome and are kindly requested to register at the VOC website before November 24th by using this link. Participation is free.

    The meeting will take place in room T 3-24 of the T-building at the Woudenstein location of the Erasmus University Rotterdam. For details on how to reach the T-building of the Woudenstein complex of the Erasmus University Rotterdam, see http://www.eur.nl/adressen/wegwijzer/.

    The program of the meeting is as follows:

    10.00 Welcome & Coffee
    10.30 Saskia le Cessie Propensity scores, an introduction
    11.30 Fannie Cobben The use of response propensities in survey research
    12.15 Lunch
    13.45 Arjan Blokland The (collateral) effects of imprisonment
    14.30 Edwin Martens Preference for propensity scores when estimating an average treatment effect in case of a dichotomous outcome
    15.15 Tea
    15.45 Stef van Buuren Pooling outcomes after quintile stratification
    16.20 Drinks

    Propensity scores, an introduction

    Saskia le Cessie

    Using propensity scores to deal with confounding has become very popular in recent years. By estimating the probability of receiving a certain treatment (the propensity), one can adjust for observed imbalance between treatment groups. In this talk, the basic concepts of propensity scores are considered. We discuss in which situations propensity scores are useful. We also consider how propensity scores can be constructed, answering questions such as whether all possible variables related to the treatment should be included in the score. Finally, we compare different ways of using propensity scores: propensity matching, stratification, inverse probability weighting, and using the propensity score as a covariate. We show that the different approaches can yield quite different results.
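    One of the approaches listed above, inverse probability weighting, can be sketched in its simplest form: with a single discrete confounder, the propensity score is estimated as the proportion treated within each level of that confounder, and each subject is weighted by the inverse of the probability of the treatment actually received. This is a hypothetical Python example (function name and data invented for illustration); in practice the propensity score is usually estimated by logistic regression on several covariates, and every stratum must contain both treated and untreated subjects.

```python
from collections import defaultdict


def ipw_ate(records):
    """Inverse-probability-weighted estimate of the average treatment effect.

    records: list of (stratum, treated, outcome) tuples with treated in {0, 1}.
    The propensity score is estimated as the share of treated subjects within
    each level of a single discrete confounder; every level must contain
    treated and untreated subjects.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [number of subjects, number treated]
    for stratum, treated, _ in records:
        counts[stratum][0] += 1
        counts[stratum][1] += treated
    propensity = {s: n_treated / n for s, (n, n_treated) in counts.items()}

    n = len(records)
    # Weight treated subjects by 1/e(x) and untreated subjects by 1/(1 - e(x))
    mean_treated = sum(y / propensity[s] for s, t, y in records if t == 1) / n
    mean_control = sum(y / (1.0 - propensity[s]) for s, t, y in records if t == 0) / n
    return mean_treated - mean_control


# Hypothetical data: older patients are treated more often and have higher outcomes
records = (
    [("old", 1, 5.0)] * 8 + [("old", 0, 3.0)] * 2
    + [("young", 1, 2.0)] * 2 + [("young", 0, 0.0)] * 8
)
estimate = ipw_ate(records)  # close to 2.0, the effect within each stratum
```

    In this toy data the treatment effect is 2 in both strata, while the naive difference in means (4.4 - 0.6 = 3.8) is inflated because the stratum with higher outcomes is also treated more often; the weighted estimate recovers the within-stratum effect.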

    Saskia le Cessie is an associate professor at the Department of Medical Statistics and Bio-informatics and at the Department of Clinical Epidemiology of the Leiden University Medical Center. Her research interests are in statistical methods for epidemiological research. She is a consultant for the Comprehensive Cancer Center West and a member of the scientific board of the Dutch Arthritis Association (Reumafonds). She has been an associate editor of Applied Statistics (JRSS-C) and has served on the Editorial Advisory Committee and the Council of the International Biometric Society. She has co-authored over 150 publications in medical and statistical journals.

    The use of response propensities in survey research

    Fannie Cobben

    Since the introduction of the propensity score method (Rosenbaum and Rubin, 1983), the idea of using estimated response probabilities has recently received much attention in the survey literature. Rosenbaum and Rubin (1983) introduced this method for estimating the effect of medical treatments. Harris Interactive uses the propensity score method with estimated response probabilities to solve problems in voluntary internet panels caused by undercoverage and self-selection. Here, the estimated response probabilities are regarded as so-called response propensities.

    The use of (inverse) response probabilities to correct for nonresponse was already suggested by Horvitz and Thompson in 1952. They propose to adjust the inclusion probabilities in their well-known Horvitz-Thompson estimator for the occurrence of (selective) nonresponse. Among others, Bethlehem (1988) and Särndal et al. (1992) describe how the Horvitz-Thompson estimator can be adapted to the situation of nonresponse by using estimated response probabilities.

    In my presentation I describe a number of methods for using response propensities to correct for nonresponse bias. Next, I describe the application of these methods to the 2002 Permanent Onderzoek Leefsituatie (Permanent Survey on Living Conditions). I will discuss the differences between the methods, and how they differ from the traditional nonresponse correction method of linear weighting.
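    The adjustment described in this abstract can be sketched as follows: each respondent's sampling weight 1/pi_i is further divided by an estimated response propensity rho_i, so that groups with low response are weighted up. This is a minimal, hypothetical Python sketch of a response-propensity-adjusted Horvitz-Thompson estimator of a population total; it assumes the propensities have already been estimated (for example by logistic regression) and are strictly positive, and it is not necessarily the exact method presented in the talk.

```python
def ht_total_with_response_propensities(respondents):
    """Response-propensity-adjusted Horvitz-Thompson estimate of a population total.

    respondents: list of (y, pi, rho) tuples, where pi is the sampling inclusion
    probability and rho the estimated response propensity (both > 0). Each
    respondent represents 1 / (pi * rho) population members.
    """
    return sum(y / (pi * rho) for y, pi, rho in respondents)


# Hypothetical respondents from a 1% sample; y = 1 if the person has some property
respondents = [
    (1, 0.01, 0.5),  # group with low response: weight 200
    (0, 0.01, 0.5),
    (1, 0.01, 0.8),  # group with high response: weight 125
    (1, 0.01, 0.8),
]
total = ht_total_with_response_propensities(respondents)  # about 450
```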

    Fannie Cobben studied econometrics at the Vrije Universiteit (VU) in Amsterdam, specializing in Statistical Econometrics, or econometrics in the narrow sense. After graduating, she joined the Methodology and Quality Division of Statistics Netherlands (Centraal Bureau voor de Statistiek) in 2004. She is currently working on her PhD thesis on the analysis and correction of nonresponse in surveys of persons, under the supervision of Jelke Bethlehem.

    The (collateral) effects of imprisonment

    Arjan Blokland

    At the outset of the new millennium, 2.5 million individuals were confined in prisons or jails across North America and Western Europe, and in most countries incarceration rates are at or near all-time highs.

    A growing international literature has been attentive to the collateral consequences of the increased use of imprisonment. The potential irony of mass imprisonment is that, to the extent that it has unintended adverse effects on life outcomes that are correlated with criminal offending, large-scale growth in the incarceration rate may actually exacerbate the crime problem in the long run by stigmatizing an ever larger class of individuals. Using data from the Netherlands-Based Criminal Career and Life-course Study, we examined the effect of first-time adult imprisonment on criminal recidivism and life circumstances in the years following the imprisonment. Unadjusted comparisons of those imprisoned and those not imprisoned will be biased because imprisonment is not meted out randomly: selection processes tend to make the imprisoned group disproportionately crime-prone compared to the non-imprisoned group. In this study, group-based trajectory modeling was combined with risk set matching to balance a variety of measurable indicators of criminal propensity.

    Arjan Blokland (PhD) is a researcher at the Netherlands Institute for the Study of Crime and Law Enforcement (NSCR) in Leiden and a senior researcher at Parnassia Addiction Research Centre (PARC) in The Hague. In 2006 he received a VENI grant for his work on specialization in offending. He currently chairs the European Developmental and Life-course Criminology working group. His main area of research is life-course criminology, focusing on the development of criminal careers, the influence of life-course transitions on criminal behavior, drug use and crime, and the (un)intended consequences of interventions.

    Preference for propensity scores when estimating an average treatment effect in case of a dichotomous outcome

    Edwin Martens

    In observational studies with a dichotomous outcome, a multivariable logistic regression analysis is often used to adjust for confounding and to estimate an adjusted treatment effect. This adjusted treatment effect is in general an overestimation of the treatment effect that is intended in most circumstances. The method of propensity scores, on the other hand, results in a treatment effect that is in general closer to the effect that would have been found if the study had been randomized. The larger the number of confounders or the larger the treatment effect, the more the propensity score method is to be preferred over a multivariable logistic regression analysis.
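    One general reason why the logistic regression estimate differs from the effect a randomized study would estimate is that the odds ratio is non-collapsible: even without any confounding, the odds ratio conditional on a covariate differs from the marginal (population-averaged) odds ratio. The hypothetical Python example below, with invented risks, has two equally large strata that both have a conditional odds ratio of 4, while the marginal odds ratio is about 3.45. It only illustrates this general property and is not taken from the talk.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def marginal_odds_ratio(strata):
    """strata: list of (weight, risk_untreated, risk_treated); weights sum to 1.

    Averages the risks over the strata, as a randomized comparison of the whole
    population would, and returns the odds ratio of the averaged risks.
    """
    p0 = sum(w * p_untreated for w, p_untreated, _ in strata)
    p1 = sum(w * p_treated for w, _, p_treated in strata)
    return odds(p1) / odds(p0)


# Hypothetical risks: two equally large strata, treatment independent of stratum
strata = [(0.5, 0.5, 0.8), (0.5, 0.2, 0.5)]
conditional = [odds(p1) / odds(p0) for _, p0, p1 in strata]  # 4 in both strata
marginal = marginal_odds_ratio(strata)  # 169/49, about 3.45
```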

    After 12 years of socio-economic research at the Erasmus University of Rotterdam, Edwin Martens has worked as a biostatistician at Utrecht University since 2000. In 2007 he completed his PhD on the methods of propensity scores and instrumental variables.

    Pooling outcomes after quintile stratification

    Stef van Buuren

    Propensity score methods offer both theoretical and practical advantages over conventional regression techniques to control for bias in observational studies. Quintile stratification is a popular technique in which exposed and non-exposed subjects are divided into five homogeneous strata based on their propensity score. Exposed and non-exposed subjects are compared within each stratum, which leads to five results instead of one. The relevant literature pays surprisingly little attention to the problem of how to aggregate these results into one overall estimate. I will outline pooling methods for differences in means and proportions and for the odds ratio, and illustrate these methods on real data.
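    Two standard ways of combining stratum-specific results are sketched below: a weighted average of the within-stratum differences in means (weights proportional to stratum size) and the Mantel-Haenszel pooled odds ratio. These are well-known textbook estimators offered as a hypothetical illustration with invented data; the pooling methods proposed in the talk may differ, for instance in how the variance of the pooled estimate is obtained.

```python
def pooled_mean_difference(strata):
    """strata: list of (n_stratum, mean_exposed, mean_unexposed).

    Weights each within-stratum difference by the stratum's share of subjects.
    """
    n = sum(ns for ns, _, _ in strata)
    return sum(ns / n * (m1 - m0) for ns, m1, m0 in strata)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio.

    tables: one (a, b, c, d) tuple per stratum, where
      a = exposed with event,     b = exposed without event,
      c = non-exposed with event, d = non-exposed without event.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical quintile strata of equal size
means_by_quintile = [(40, 5.0, 3.0), (40, 4.0, 3.0), (40, 6.0, 4.5), (40, 3.0, 3.0), (40, 7.0, 5.5)]
tables = [(20, 10, 10, 10), (10, 10, 5, 10), (4, 2, 5, 5), (6, 3, 8, 8), (12, 6, 4, 4)]

diff = pooled_mean_difference(means_by_quintile)  # about 1.2
pooled_or = mantel_haenszel_or(tables)  # about 2.0; every stratum has odds ratio 2
```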

    Stef van Buuren develops and applies quantitative methods in medicine and social science, with an emphasis on childhood growth and incomplete data. Van Buuren is professor of applied statistics in prevention at the University of Utrecht. More information can be found at http://www.stefvanbuuren.nl.

    Spring meeting 2008

    The spring meeting of the VOC is organized in cooperation with the Netherlands Forensic Institute (Nederlands Forensisch Instituut, NFI). The topic of the meeting is 'Statistical methods in criminology and law enforcement'. The NFI has kindly agreed to host the meeting. For security reasons, those who would like to participate are kindly requested to register before April 10 and to bring an ID (passport, driving license or other valid ID) to be shown at the reception. Please note that without a valid ID you will not be allowed to enter the NFI building.

    Registration can be done by using this link. Note that after April 10th it is not possible to register for this meeting. Participation is free and the lunch and drinks are kindly offered by the NFI.

    For details on how to reach the NFI, see http://www.forensischinstituut.nl/contact/. Given the limited parking facilities at the NFI, participants who will arrive by car are kindly requested to use the car park on the Wereldhave site (“terrein van Wereldhave”), which is located near the NFI building.

    The program of the meeting is as follows:

    10.00 Coffee & Registration
    10.30 Colin Aitken Evidence evaluation using the likelihood ratio statistic
    11.20 Catrien Bijleveld Some examples from criminological research practice
    11.55 Annabel Bolck XTC classification and evaluation
    12.30 Lunch
    14.00 Peter van der Heijden Estimating the prevalence of rule transgression using data collected by randomized response
    14.50 Maarten Cruyff Population size estimation using zero truncated Poisson models
    15.25 Tea
    15.55 Andre Hoogstrate Evidence: value, confidence and statistics
    16.25 VOC annual member meeting
    16.45 Drinks

    Abstracts


    Evidence evaluation using the likelihood ratio

    Colin Aitken

    Likelihood ratios provide a natural way of computing the value of evidence under competing propositions, in that one can report that the evidence is so many times more likely if one proposition is true than if the other is true. Likelihood ratios have been developed for multivariate hierarchical random effects models, with graphical models used to help with the curse of dimensionality. An example of the application of this methodology to forensic casework is given.
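    As a simplified illustration of the likelihood ratio idea, the sketch below assumes a univariate two-level normal model: source means vary between sources as N(mu, tau2), and a measurement from a given source is N(theta, sigma2). If a control measurement x and a recovered measurement y share a source, they are correlated through the common theta; if they come from different sources, they are independent. The model, parameter values and function names are assumptions for illustration only; the multivariate hierarchical and graphical models mentioned above are considerably richer.

```python
import math


def bivariate_normal_pdf(x, y, mean, var, cov):
    """Density at (x, y) of a bivariate normal with common mean and variance."""
    det = var * var - cov * cov
    dx, dy = x - mean, y - mean
    quad = (var * dx * dx - 2.0 * cov * dx * dy + var * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def likelihood_ratio(x, y, mu, tau2, sigma2):
    """LR for 'x and y come from the same source' versus 'from different sources'.

    Assumed two-level model: a source mean theta ~ N(mu, tau2) and a measurement
    from that source ~ N(theta, sigma2).
    """
    var = tau2 + sigma2
    same_source = bivariate_normal_pdf(x, y, mu, var, tau2)  # shared theta: covariance tau2
    different_sources = bivariate_normal_pdf(x, y, mu, var, 0.0)  # independent thetas
    return same_source / different_sources


# Hypothetical measurements: small within-source relative to between-source variation
print(likelihood_ratio(1.0, 1.0, mu=0.0, tau2=1.0, sigma2=0.01))  # above 1
print(likelihood_ratio(1.0, 1.5, mu=0.0, tau2=1.0, sigma2=0.01))  # below 1
```

    With a small within-source variance relative to the between-source variance, close values of x and y give a likelihood ratio above 1, supporting the same-source proposition, while more distant values give a ratio below 1.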

    Colin Aitken is Professor of Forensic Statistics at The University of Edinburgh. He has a PhD from the University of Glasgow. His research interests lie at the interface of statistics, law and forensic science and he has published many papers and is co-author of two books on the subject. He is Editor-in-Chief of Law, Probability and Risk and Chairman of the Statistics and Law working group of the Royal Statistical Society. He is a Chartered Statistician and Fellow of the Forensic Science Society.

    Some examples from criminological research practice

    Catrien Bijleveld

    In this presentation I give a number of examples of methodological applications in criminological research. Starting with an overview of the development of research methodology in criminology in the Netherlands over the past 25 years, I next focus on experimental studies, and on studies on international crimes and gross human rights violations, in which there is extreme data paucity.

    Catrien Bijleveld is professor of research methods in Criminology at the Department of Criminal Law and Criminology at the Free University of Amsterdam and Senior Researcher at the Netherlands Institute for the Study of Crime and Law Enforcement (NSCR), where she coordinates a research group focusing on life course, crime and intervention. She strongly advocates the use of quantitative as well as qualitative measurements and statistical models in the study of crime. Her research focuses on criminal careers, intergenerational continuity in offending, effectiveness of interventions and international crimes.

    XTC classification and evaluation

    Annabel Bolck

    The Netherlands is one of the largest XTC-producing countries, and consequently much research is done in this field. One of the research questions the NFI was involved in concerned the classification of XTC tablets by their origin: "Is it possible to discriminate between XTC tablets from different batches/factories/production methods?" In this presentation I will show that parts of this research question can be answered, but also that many problems are involved. Finally, I will show how this research can also help to answer a typical forensic casework question concerning comparison investigation. Many questions in forensic casework concern comparisons: shoes with shoeprints, fingermarks with fingerprints, DNA found at the crime scene with that of a suspect, and so on. What does it mean when many similarities are found between the compared objects or subjects? What is the strength of this evidence? A comparison problem in drug research is whether tablets from two different consignments (found at different locations at different times) come from the same batch. This problem has its own specific aspects, but is also related to the classification problem discussed earlier.

    Annabel Bolck studied Econometrics at the University of Groningen. In 1996 she finished her PhD in Chemometrics, also in Groningen. After two and a half years as a postdoctoral researcher at the Faculty of Social Sciences at Tilburg University, she worked for four years as a statistician at two universities abroad (in Fiji and Australia). In September 2002 she returned to the Netherlands and joined the Netherlands Forensic Institute (NFI) as a statistician, where she still works.

    Estimating the prevalence of rule transgression using data collected by randomized response

    Peter G.M. van der Heijden and co-workers

    In criminology, self-report studies are a means to obtain prevalence estimates of rule transgressions, violations of the law, and so on. In such surveys, individuals are interviewed about their behaviour. An obvious problem, of course, is that for reasons such as social desirability people do not always answer honestly about their behaviour.

    For this reason, randomized response was introduced about forty years ago to collect data on sensitive issues. Many forms of randomized response are possible. In our own research we have predominantly used the following form: a respondent is asked to throw two dice. When the dice sum to 2, 3 or 4 (probability P1 = 1/6), the respondent is asked to answer 'yes' irrespective of his own true answer to the question. When the dice sum to 11 or 12 (probability P2 = 1/12), the respondent is asked to answer 'no', again irrespective of his own true answer. When the dice sum to 5 through 10 (probability P3 = 3/4), the respondent is asked to answer the question truthfully. The interviewer, who does not see the outcome of the dice, only hears the answer 'yes' or 'no', so the respondent can answer safely. Yet, because the randomization mechanism is known, it is possible to estimate the prevalence of the sensitive behaviour.
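
    Because the design probabilities are known, the probability of a 'yes' answer is P1 + P3 * pi, where pi is the prevalence of the sensitive behaviour, so pi can be recovered from the observed proportion of 'yes' answers. The Python sketch below shows this simple moment estimator for the two-dice design; the function names are hypothetical, and it covers only the basic univariate estimate, not the multivariate models or non-compliance corrections developed by the research group.

```python
from fractions import Fraction
from itertools import product


def design_probabilities():
    """Return (P1, P2, P3) = P(forced yes), P(forced no), P(truthful) by enumerating two fair dice."""
    counts = {"yes": 0, "no": 0, "truth": 0}
    for a, b in product(range(1, 7), repeat=2):
        total = a + b
        if total <= 4:          # sums 2, 3, 4 -> forced 'yes'
            counts["yes"] += 1
        elif total >= 11:       # sums 11, 12 -> forced 'no'
            counts["no"] += 1
        else:                   # sums 5..10 -> answer truthfully
            counts["truth"] += 1
    return tuple(Fraction(counts[k], 36) for k in ("yes", "no", "truth"))


def estimate_prevalence(n_yes, n_total):
    """Moment estimator: P(yes) = P1 + P3 * pi, hence pi = (lambda - P1) / P3."""
    p1, _, p3 = design_probabilities()
    lam = n_yes / n_total                       # observed proportion of 'yes'
    pi_hat = (lam - float(p1)) / float(p3)
    # Standard error: binomial variance of the observed proportion, scaled by 1 / P3.
    se = (lam * (1 - lam) / n_total) ** 0.5 / float(p3)
    return pi_hat, se
```

    For example, if 300 of 1,000 respondents answer 'yes', the estimate is (0.3 - 1/6) / 0.75, about 0.178. In small samples the moment estimate can fall outside [0, 1], which is one reason likelihood-based estimation is often preferred.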

    Our research group has worked in this area for about 10 years, and I will give an overview of our results. These are:

    (i) a 'best practice' for asking randomized response questions

    (ii) a meta-analysis showing that randomized response is the most valid method for answering questions about sensitive topics

    (iii) adapting existing models for multivariate data, such as logistic regression, item response theory and models for count data, so that they can handle randomized response data

    (iv) extending these models to allow for respondents who do not follow the randomized response design.

    We present these results and illustrate them using surveys on social benefit fraud that we conducted every two years from 1998 to 2006 for the Ministry of Social Affairs.


    Peter G.M. van der Heijden obtained his PhD at Leiden University in 1986 and became professor of statistics at the Faculty of Behavioural and Social Sciences at Utrecht University in 1992. There he founded the Department of Methodology and Statistics, which is now one of the largest departments in this field in the Netherlands. His research interests focus on the analysis of categorical data, in particular randomized response and population size estimation using capture-recapture. He carries out a great deal of contract research for ministries, in which he applies and further develops these methods. Email: p.g.m.vanderheijden@fss.uu.nl; for recent publications see www.fss.uu.nl/ms/vanderheijden

    Population size estimation using zero-truncated Poisson models

    Maarten Cruyff

    The size of delinquent populations is usually unknown. For such populations, police registration files usually exist that record the number of apprehensions of each individual population member. Under the assumption that the number of apprehensions is generated by a homogeneous Poisson process, these capture-recapture data allow the population size to be estimated. In most applications, however, the assumption of homogeneity does not hold. The presentation gives an overview of different models that allow for a heterogeneous Poisson process. These models have been applied to estimate the size of populations such as drunk drivers, illegal immigrants, illegal gun owners and perpetrators of domestic violence. Some examples will be presented to compare the performance of these models.
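
    Under the homogeneous Poisson assumption, only people apprehended at least once appear in the register, so the observed counts follow a zero-truncated Poisson distribution. A minimal Python sketch, assuming one apprehension count per registered person: estimate lambda by maximum likelihood, then inflate the observed number n by the estimated probability of being seen at least once. For contrast it also computes Chao's lower-bound estimator, a common heterogeneity-robust alternative; neither is presented as one of the specific heterogeneous models discussed in the talk, and the function names are hypothetical.

```python
import math
from collections import Counter


def ztp_lambda(counts, tol=1e-12):
    """MLE of lambda for a zero-truncated Poisson: solves lambda / (1 - exp(-lambda)) = mean."""
    ybar = sum(counts) / len(counts)
    if ybar <= 1:
        raise ValueError("all units observed once: lambda is not identifiable")
    # The truncated mean increases in lambda, is about 1 near 0 and exceeds ybar at lambda = ybar.
    lo, hi = 1e-12, ybar
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < ybar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ztp_population_size(counts):
    """N_hat = n / P(count > 0) under a homogeneous Poisson process."""
    lam = ztp_lambda(counts)
    return len(counts) / -math.expm1(-lam)


def chao_lower_bound(counts):
    """Chao's estimator n + f1^2 / (2 f2), using those seen once (f1) and twice (f2)."""
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    if f2 == 0:
        raise ValueError("need at least one unit observed twice")
    return len(counts) + f1 ** 2 / (2 * f2)
```

    With 100 people apprehended once, 20 twice and 5 three times, Chao's estimator gives 125 + 100^2 / 40 = 375. A large gap between the two estimates is one informal sign that the homogeneity assumption is questionable.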

    Maarten Cruyff studied psychology at Leiden University and started in 2002 as a PhD student at the Department of Methodology & Statistics of Utrecht University, where he currently works as an assistant professor. His fields of interest are the multivariate analysis of randomized response data and capture-recapture analysis.

    Evidence: value, confidence and statistics

    André Hoogstrate

    This presentation covers two topics. First, the Knowledge and expertise centre for intelligent data analysis (Kecida), which is currently being set up at the Netherlands Forensic Institute, will be introduced. Second, I will discuss several aspects of the increasing use of statistics and statistical modelling in law and law enforcement. The focus will be on the evidential value of evidence. After a short introduction to the likelihood ratio or Bayesian approach, the link will be made with the diagnostic value. Finally, some ongoing research will be presented.
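
    In the likelihood ratio approach, the evidential value of a finding E is LR = P(E | Hp) / P(E | Hd), the probability of the evidence under the prosecution hypothesis Hp divided by that under the defence hypothesis Hd, and the odds form of Bayes' theorem updates prior odds to posterior odds. The diagnostic analogue is the positive likelihood ratio of a test, sensitivity / (1 - specificity). The short Python sketch below illustrates both relations; the function names and numbers are illustrative assumptions, not taken from the talk.

```python
def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """LR = P(E | Hp) / P(E | Hd): how much more probable the evidence is under Hp than under Hd."""
    return p_e_given_hp / p_e_given_hd


def posterior_probability(prior_prob, lr):
    """Odds form of Bayes' theorem: posterior odds = LR * prior odds."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)


def positive_lr(sensitivity, specificity):
    """Diagnostic-test analogue: LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)
```

    For example, with a prior probability of 0.01 (prior odds 1:99), evidence with LR = 99 brings the posterior probability to 0.5. The separation matters in court: the expert reports the LR, while the prior odds belong to the judge.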

    Dr. André J. Hoogstrate received his PhD in Econometrics from the University of Maastricht in 1998. From 1998 to 2001 he was affiliated with Tilburg University and Center. Since 2000 he has held several positions at the Netherlands Forensic Institute. Currently he is project manager of the Knowledge and expertise centre for intelligent data analysis (Kecida).

    Fall Meeting 2007 (Najaarsbijeenkomst 2007)

    We would like to inform you about our VOC Fall meeting 2007. The meeting will take place on Friday, November 16, 2007 at Leiden University. We are happy to announce our speakers, who will cover the central theme 'Applications in biostatistics' from different perspectives: Richard Gill (Leiden University; keynote), Jos Twisk (Free University, Medical Center; keynote), Hein Putter (Leiden University), Luc Bonneux (Netherlands Interdisciplinary Demographic Institute), Frans Oort (University of Amsterdam), Caspar Looman (Erasmus MC Rotterdam).

    Those who would like to participate are welcome and are kindly requested to register at the VOC website using this link, or by sending an e-mail to meeting@voc.ac. Participation is free; lunch is available for 10 euros and must be requested when registering. Registration deadline: November 12th.

    The program is as follows:

    10.15 COFFEE
    10.45 Jos Twisk The analysis of recurrent event data: An overview
    11.30 Caspar Looman Decomposition techniques for health expectancy: Linking causes for disability (diseases) to disability prevalence
    12.00 Frans Oort Response shift and measurement bias
    12.30 LUNCH
    14.00 Hein Putter Joint analysis of multiple longitudinal outcomes: Application of a latent class model
    14.30 Luc Bonneux From bias to politics
    15.00 TEA
    15.25 Richard Gill Lies, damned lies and legal truths: statistics and data-analysis in the courtroom
    16.10 DRINKS

    Location: The meeting takes place in Room CH10 of the temporary building known as "het chalet". You can reach this building by walking across the car park from the Pieter de la Court building. Directions.

    ABSTRACTS

    Jos Twisk (Vrije Universiteit in Amsterdam)

    The analysis of recurrent event data: An overview

    The purpose of this presentation is to give an overview of different easily applicable statistical techniques to analyse recurrent event data. These techniques include naive techniques that are mostly used in epidemiological studies and longitudinal techniques such as Cox regression for recurrent events, generalised estimating equations (GEE), and random coefficient analysis. The different techniques are illustrated with a dataset from a randomised controlled trial regarding the treatment of lateral epicondylitis. It is striking to see that the different statistical techniques lead to different results and different conclusions regarding the effectiveness of the different intervention strategies. It is concluded that if one is interested in a particular short term or long term result, simple naive techniques are appropriate. However, if the development of a particular outcome is of interest, statistical techniques that consider the recurrent events and additionally correct for the dependency of the observations are necessary.

    Jos Twisk studied human movement science at the Vrije Universiteit in Amsterdam, and after graduating in 1990 he started to work at the same faculty, where he joined the research team of the Amsterdam Growth and Health Study. In 1995 he finished his PhD thesis, which was related to this longitudinal study. In the same year, he moved with the AGHLS from the faculty of human movement science to the EMGO-Institute. After his PhD, he supervised several projects within the AGHLS and participated as a teacher and coordinator in several postdoctoral courses given at the EMGO-Institute. In this period, he specialised in the methodological field of longitudinal data analysis and multilevel analysis and wrote two textbooks on these subjects (both published by Cambridge University Press). In 2000 he moved to the department of clinical epidemiology and biostatistics of the VU university medical centre. In 2005 he became head of the department of Methodology and applied biostatistics at the Institute of Health Sciences of the Vrije Universiteit in Amsterdam. He is also head of the expertise centre of applied longitudinal data analysis, an interfaculty centre of the Institute of Health Sciences and the Medical Centre of the Vrije Universiteit in Amsterdam. His main activities are statistical and methodological consultancy (both in the clinic and at the university) and teaching.

    Caspar Looman & Wilma Nusselder (Department of Public Health, Erasmus MC Rotterdam)

    Decomposition techniques for Health Expectancy: Linking causes for disability (diseases) to disability prevalence

    Health expectancy is an extension of the concept of life expectancy. It makes it possible to compare the health of populations not only by mortality, but also by differences in the health of the living, for instance in levels of disability. To investigate these differences further, it is useful to link poor health (disability) to its causes, i.e., to different diseases. For ordinary life expectancy (based on mortality) causes of death can be used, but for health expectancy (HLE) surveys have to be used, and these are often only cross-sectional.

    After a short introduction to the concepts of life tables and (healthy) life expectancy, we will present a method to quantify the contribution of diseases to the prevalence of disability. A regression model with additive hazards does the trick, but because of the necessary interactions with age, we propose a parsimonious model in which the impacts of different diseases and the age patterns are combined in a factorial way.
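
    As background on how cross-sectional survey data enter a health expectancy, the widely used Sullivan method weights the person-years of a period life table by the proportion of each age group free of disability. The Python sketch below assumes age intervals of equal width, linear survival within each interval and a final interval in which everyone dies; it illustrates the life-table concept only and is not the decomposition model presented in the talk. The function name is hypothetical.

```python
def sullivan(q, prevalence, width=5.0):
    """Return (life expectancy, disability-free life expectancy) at the start of the first interval.

    q[i]          probability of dying within age interval i (the last entry should be 1.0)
    prevalence[i] cross-sectional disability prevalence in interval i
    """
    alive = 1.0          # l_x: survivors at the start of the interval (radix 1)
    le = dfle = 0.0
    for qi, pi in zip(q, prevalence):
        survivors_next = alive * (1.0 - qi)
        # L_x: person-years lived in the interval, assuming deaths spread evenly over it
        person_years = width * (alive + survivors_next) / 2.0
        le += person_years
        dfle += person_years * (1.0 - pi)   # only disability-free years count towards DFLE
        alive = survivors_next
    return le, dfle
```

    Attributing the gap between the two results to specific diseases is exactly where a model linking diseases to disability prevalence is needed.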

    Caspar Looman studied rural engineering at Wageningen University, but already during his studies he switched to data analysis and statistics and was further trained by Cajo ter Braak. In 1985 he joined the Department of Public Health of the Erasmus University Rotterdam as a consulting statistician, a position he still holds.

    Frans Oort (University of Amsterdam)

    Response shift and measurement bias

    In research on health-related quality of life (HRQL; i.e., self-perceived physical, mental, and social health), we have observed that in some studies severely ill patients report better HRQL than healthy people. Moreover, we have observed that patients who are objectively deteriorating actually report improving HRQL. In HRQL research such unexpected results are often attributed to response shifts, caused by changes in patients’ frames of reference.

    In an attempt to clarify what response shift is and what it is not, it has been formally defined as a violation of measurement invariance. Measurement invariance is defined as f(X|T) = g(X|T,V), where f and g are distribution functions, X are test scores (e.g., scales of a self-report questionnaire), T are the attributes that we want to measure (e.g., quality of life), and V are possible violators of measurement invariance (e.g., other attributes than T). Variables X, T, and V may be nominal, ordinal, interval or ratio, they may be latent or manifest, and interrelationships may be linear or non-linear. Measurement bias is defined as a violation of measurement invariance. With longitudinal data, time of measurement occasion can be taken as V, and response shift can be defined as a special case of measurement bias.

    The measurement bias definition of response shift implies that response shift can be detected by testing measurement invariance across measurement occasions. The choice of a particular test depends on the nature, measurement levels, and interrelationships of variables X, T, and V. For example, if the X variables are observed and the T variables are latent, then structural equation modelling can be used to detect response shifts. Many more methods have been described in the research on measurement bias. However, most of this research is geared to item bias (or differential item functioning), with item responses for X and group membership for V.

    A possible drawback of the response shift definition is its association with a measurement perspective that may seem shallow to people who want to consider response shift as an explanation for unexpected effects on self-reported health that are much larger than those that can be explained by measurement bias. In addition, advocates of the so-called formative measurement model may object to the reflective measurement model that is used in response shift detection.

    So: what do we gain by considering response shift as a special case of measurement bias? And what do we lose?

    Frans Oort studied Psychology at the University of Amsterdam, graduated in 1989, and got his Ph.D. in 1996. He is especially interested in non-standard applications of structural equation modelling (SEM). SEM includes the latent variable modelling of mean and covariance structures. His thesis was about the application of SEM to item response theory and test construction. At Leiden University he studied the application of SEM to three-mode data, such as multi-trait multi-method data, and multivariate longitudinal data. In 1999, he returned to the University of Amsterdam, to work as a statistician at the department of Medical Psychology of the Academic Medical Centre. In 2005 he was appointed as associate professor of Methods and Statistics at the Department of Education of the University of Amsterdam. Current interests include the integration of SEM with multi-level models, generalised linear models, exploratory factor models, and item response models. The focus of present research is “unbiased measurement” of psychological attributes in psychological and educational research (e.g., “quality of life”). Many problems in psychometrics, such as item bias, test bias, response shift, culture bias, gender bias, response styles and tendencies, social desirability, etc., can be described as violations of “measurement invariance”. This enables a single general approach to these various problems, using SEM to test measurement invariance hypotheses.

    Hein Putter (Leiden University Medical Center)

    Joint analysis of multiple longitudinal outcomes: Application of a latent class model

    We address the problem of joint analysis of two series of longitudinal measurements. The typical way of approaching this problem is as a joint mixed effects model for the two outcomes. Apart from the large number of parameters needed to specify such a model, perhaps the biggest drawback of this approach is the difficulty in interpreting the results of the model, particularly when the main interest is in the relation between the two longitudinal outcomes. Here we propose an alternative approach to this problem. We apply a latent class model to the longitudinal data of the first outcome. We then use the posterior class membership probabilities that follow from this latent class model and multiple imputation to study the relation between the latent classes and the other outcome(s). We apply the method to data from 195 consecutive lung cancer patients in two out-patient clinics of lung diseases in The Hague. At four pre-defined time points, a validated semi-structured interview measuring the level of denial (the DCI) as well as a number of validated questionnaires measuring emotional and physical functioning and QOL were administered. The aim was to study the relation between denial on the one hand and socio-demographic and illness-related characteristics and a large number of emotional and physical functioning scales on the other hand. Our approach clearly revealed an interesting phenomenon: whereas no difference between classes could be detected for objective measures of health, patients in classes representing higher levels of denial consistently scored significantly higher on subjective measures of health.
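
    The imputation step can be sketched as follows: draw a class for every patient from his or her posterior membership probabilities, estimate the quantity of interest on each completed data set, and pool the results with Rubin's rules. This minimal Python sketch assumes two latent classes, posterior probabilities already obtained from a fitted latent class model, and a difference in mean outcome as the quantity of interest; the function name and these choices are illustrative, not the speakers' exact analysis.

```python
import random
import statistics


def pooled_class_difference(posteriors, outcome, m=20, seed=1):
    """Multiple-imputation estimate (and SE) of mean(outcome | class 1) - mean(outcome | class 0).

    posteriors[i] is P(subject i belongs to class 1 | first outcome), two classes assumed.
    """
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        # Impute class membership by drawing from the posterior probabilities.
        cls = [1 if rng.random() < p else 0 for p in posteriors]
        g1 = [y for y, c in zip(outcome, cls) if c == 1]
        g0 = [y for y, c in zip(outcome, cls) if c == 0]
        if len(g1) < 2 or len(g0) < 2:
            continue  # skip draws that leave a class (almost) empty
        estimates.append(statistics.fmean(g1) - statistics.fmean(g0))
        variances.append(statistics.variance(g1) / len(g1) + statistics.variance(g0) / len(g0))
    k = len(estimates)
    # Rubin's rules: total variance = within + (1 + 1/k) * between
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates) if k > 1 else 0.0
    return q_bar, (within + (1 + 1 / k) * between) ** 0.5
```

    The between-imputation term is what carries the uncertainty about class membership into the final standard error; treating the most likely class as known would ignore it.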

    Hein Putter obtained his PhD in mathematical statistics under the supervision of Willem van Zwet. At present he is working at the department of Medical Statistics and Bioinformatics at the Leiden University Medical Center. His research interests include statistical genetics and survival analysis.

    Luc Bonneux (NIDI)

    From bias to politics

    The bias of a bowl in (English) bowling is the asymmetry that allows the bowl to roll along a curved path. In epidemiology, bias is any systematic error that leads the results away from the true values. Statistics may disentangle chance and random noise from the true signal, but it is powerless against bias, because the error is systematic.

    The list of possible biases is very long. They fall into three families: selection bias, information bias and confounding.

    Selection bias means that the study sample is not representative of the study population. For example, are non-smoking partners of smokers representative of the population of non-smokers? The results documenting the fatal consequences of passive smoking critically rely on these populations. If they are not representative, the results are untrue.

    Information bias means that the information collected in the study sample differs from the information that would be collected in the study population. For example, in cancer screening trials the information available on the screened and control populations is very different. If that leads to distortion in the registration of the cause of death, the cause-of-death registers are systematically biased. The results of cancer screening trials critically rely on cause-of-death registrations as their main outcome.

    Confounding bias means that the relation between study exposure and outcome is systematically distorted by some other exposure that is related to both the study exposure and the outcome and accounts for the observed relation. For example, people who drink more alcohol show higher lung cancer rates. But drinkers also tend to smoke more often. If alcohol drinkers are separated by smoking status, lung cancer is no longer associated with drinking within the smoking strata; the association is with smoking.
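
    The alcohol and lung cancer example can be made concrete with a small, entirely hypothetical cohort in which lung cancer risk depends only on smoking (10% for smokers, 1% for non-smokers), while drinkers are mostly smokers. The Python sketch below computes the crude risk ratio for drinking and the risk ratios within smoking strata; all counts are invented for illustration.

```python
# Hypothetical cohort: (smoker, drinker) -> (lung cancer cases, number of people).
# Risk depends only on smoking, but drinking is concentrated among smokers.
cohort = {
    (True, True): (100, 1000),
    (True, False): (50, 500),
    (False, True): (2, 200),
    (False, False): (10, 1000),
}


def risk_ratio_drinking(cells):
    """Risk ratio of drinkers vs non-drinkers, pooling the given (smoker, drinker) cells."""
    drink = [cohort[c] for c in cells if c[1]]
    no_drink = [cohort[c] for c in cells if not c[1]]
    risk_drink = sum(cases for cases, _ in drink) / sum(n for _, n in drink)
    risk_no_drink = sum(cases for cases, _ in no_drink) / sum(n for _, n in no_drink)
    return risk_drink / risk_no_drink


crude = risk_ratio_drinking(list(cohort))                                       # about 2.1
among_smokers = risk_ratio_drinking([(True, True), (True, False)])             # 1.0
among_nonsmokers = risk_ratio_drinking([(False, True), (False, False)])        # 1.0
```

    Pooled over everyone, drinkers appear to have more than twice the risk (102/1200 versus 60/1500); within each smoking stratum the risk ratio is exactly 1.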

    However, when adjusting for confounding, your strata may lack information on selected subjects. Epidemiological studies of the long-term health effects of particulate matter compare inhabitants of “dirty” cities with inhabitants of “clean” cities. They meticulously adjust for socio-economic variables. But are socio-economic variables in dirty cities comparable to those in clean cities? And are we measuring the effect of particulate matter, or of specific pollution cocktails in dirty towns?

    Biases are the best friends of epidemiologists, as they permit lifelong debate. They cannot be resolved by more statistical analysis, more studies or more power. If non-smokers exposed to tobacco smoke are not representative of the population of non-smokers, if systematic information bias in screening studies leads to differential miscoding, if people living in dirty cities are not comparable to people living in clean cities, then multiplying the same study designs only succeeds in multiplying the same systematic errors and propagating the wrong results.

    The major epidemiological defence has always been randomisation, blinding and distrust of small signals in observational studies. However, the political and, increasingly, financial interests of research organisations and academies have led to scientism: small signals are called “scientific results” and used as fascist hammers to bludgeon the political opposition. The examples cited were chosen to highlight the contradictions between scientific fishing in muddy waters and transparent policy making by informed debate. Tobacco smoke is filthy, air pollution is undesirable and the benefits of cancer screening are always small, whatever the study design or analysis.

    Luc Bonneux studied tropical medicine in Antwerp and worked in a basic health unit in Zaire, now Congo. From 1986 he worked as a physician and researcher in the department of microbiology of the Institute of Tropical Medicine in Antwerp, and in 1988 he obtained an MSc in epidemiology at the London School of Hygiene and Tropical Medicine. From 1989 he worked at the department of public health of Erasmus MC in Rotterdam on the thesis "Degenerative disease in an aging population: Models and conjectures" together with Jan Barendregt. After a period of freelance epidemiological projects, he now works at the Netherlands Interdisciplinary Demographic Institute.

    Richard Gill (Leiden University)

    Lies, damned lies and legal truths: statistics and data-analysis in the courtroom

    In the legal proceedings against the nurse Lucia de B. it seems that every deadly sin in the statistical book was committed. How does one classify nurses as serial killers or not? How do you order them in degrees of murderousness? Is roster-data (time of medical events, time of shifts on duty) useful for this? Does it make any sense at all to use statistics in court, or would it be better to ban its use entirely?

    Richard Gill studied Statistics at the University of Cambridge. He obtained his Ph.D. in mathematics and natural sciences at the Free University in Amsterdam in 1979. In 2006 he was appointed as full professor of Mathematical Statistics at the University of Leiden. At present, he is a member of the biomathematics Leiden node of the NDNS+ cluster (the PLUS for probability and statistics), and advisor/project-coordinator in mathematical statistics at EURANDOM (Eindhoven). He is proud and honoured [and not a little humbled] to be president of the Dutch society for Statistics and Operations Research, VVS-OR. Current interests include quantum statistics, statistics in molecular biology and genetics, causality, graphical models, statistics of health and tobacco, statistics and law, statistical and computational learning, missing data, and censoring.

    Spring Meeting 2007 (Voorjaarsbijeenkomst 2007): Joint VOC and BNVKI meeting on data mining

    Important Announcement: The talk of Thorsten Joachims has been cancelled. Paul Eilers will be his replacement. You can find his title and abstract in the program below.

    The scientific interests of VOC and BNVKI members overlap considerably. Therefore, both societies would like to stimulate communication between their members and are organizing a joint meeting in Utrecht on Friday, April 27, 2007, around the broad theme of data mining. The location is the Faculty Club, Room Kanunikkenzaal, Achter de Dom 7, Utrecht. We have an interesting mix of VOC and BNVKI speakers: Patrick Groenen, Christophe Croux, Bernard De Baets, Koen Vanhoof, Lambert Schomaker, and Paul Eilers. Those who would like to participate are welcome and are kindly requested to register at the VOC website using this link, or by sending an e-mail to meeting@voc.ac. Participation is free; lunch is available for 17 euros.

    10.00 COFFEE
    10.40 Patrick Groenen Minimization for Support Vector Machines by Iterative Majorization
    11.20 Christophe Croux Robust Discrimination: an influence function approach
    12.00 Bernard De Baets Monotone distribution classifiers
    12.40 LUNCH
    13.50 Koen Vanhoof Aggregation operators' measures
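    14.30 Lambert Schomaker Machine learning or pattern recognition? On classification methods in writer identification and handwritten manuscript retrieval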
    15.10 TEA
    15.40 Paul Eilers Statistical Classification for Reliable High-volume Genetic Measurements
    16.45 DRINKS

    ABSTRACTS

    Patrick Groenen (Erasmus Universiteit Rotterdam)

    Minimization for Support Vector Machines by Iterative Majorization

    Over the last few years there has been increasing interest in support vector machines (SVMs) for the two-group classification problem. SVM classification is currently among the best prediction methods available. A nice property of the standard SVM is that it can be expressed as the minimization of a quadratic loss function. However, as the solution is often obtained by switching to the dual problem, it is not easy to understand what the method is doing. In this presentation, we stick to the original loss function and give an intuitive explanation of what makes an SVM work. It turns out that an SVM is closely related to multiple regression with optimal scaling. In addition, we present a new iterative algorithm for solving the SVM that is based on iterative majorization. We discuss the advantages and disadvantages of this algorithm and compare its performance with that of standard methods for SVMs.
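
    The majorization idea can be illustrated on one SVM variant. Assuming a squared hinge loss with a ridge penalty, L(c, w) = sum_i max(0, 1 - y_i (c + x_i'w))^2 + lambda ||w||^2, each error term is bounded by a quadratic that touches it at the current estimate, so every iteration reduces to a penalized least-squares regression on updated targets and the loss can never increase. This pure-Python sketch is an illustration under that assumption, not the speakers' algorithm; the function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small square systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def svm_loss(beta, X, y, lam):
    """Squared hinge loss plus ridge penalty on the weights (intercept beta[0] unpenalized)."""
    c, w = beta[0], beta[1:]
    errors = sum(max(0.0, 1 - yi * (c + sum(wj * xj for wj, xj in zip(w, xi)))) ** 2
                 for xi, yi in zip(X, y))
    return errors + lam * sum(wj * wj for wj in w)


def svm_majorization(X, y, lam=1.0, iters=100):
    """Minimize svm_loss by iterative majorization; labels y must be -1 or +1."""
    Z = [[1.0] + list(xi) for xi in X]
    p = len(Z[0])
    # Z'Z + lam * diag(0, 1, ..., 1): the same matrix in every iteration
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j and i > 0 else 0.0)
          for j in range(p)] for i in range(p)]
    beta = [0.0] * p
    history = []
    for _ in range(iters):
        fitted = [sum(b * zj for b, zj in zip(beta, z)) for z in Z]
        # Majorizing quadratic: move y_i * fitted_i towards max(y_i * fitted_i, 1)
        targets = [yi * max(yi * fi, 1.0) for yi, fi in zip(y, fitted)]
        rhs = [sum(z[i] * t for z, t in zip(Z, targets)) for i in range(p)]
        beta = solve(A, rhs)
        history.append(svm_loss(beta, X, y, lam))
    return beta, history
```

    Correctly classified points beyond the margin keep their current fitted value as target, while the others are pulled towards the margin; since the system matrix never changes, it could be factored once in a more efficient implementation.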

    Patrick Groenen is full professor in statistics at the Econometric Institute, Erasmus University Rotterdam. He has published several papers on multivariate analysis methods and multidimensional scaling in the international literature. He is also co-author of a textbook on multidimensional scaling. His research interests include visualization, multidimensional scaling, (nonlinear) multivariate analysis, support vector machines, and majorization. He is chairman of the VOC until this meeting.

    Christophe Croux (Catholic University Leuven)

    Robust Discrimination: an influence function approach

    A discriminant rule allows an observation to be assigned to a certain group, depending on the characteristics of the observation. Classification rules are constructed from a so-called training sample, in which outliers may be present. Hence, robust discriminant rules have been developed. The performance of a classification rule is typically measured by the error rate, i.e., the percentage of incorrectly predicted observations. We study the sensitivity of the error rate with respect to the observations in the training sample by means of an influence function approach. Such an approach allows us to quantify the robustness of classifiers. Besides being robust, a classification rule should also be efficient, where efficiency is measured by how close the classifier's error rate is to the lowest possible error rate, the Bayes error rate. We show that, in the setting of discriminant analysis, the second-order influence function of the error rate can be used to compute this classification efficiency.
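
    The link between a classification rule and the Bayes error rate is easiest to see in the simplest setting. The Python sketch below assumes two equally likely univariate normal groups with common standard deviation and a threshold rule; the optimal threshold lies midway between the means and attains the Bayes error Phi(-Delta / 2), with Delta the standardized distance between the means. It only illustrates the error rates involved, not the influence-function-based efficiency measure of the talk, and the function names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def error_rate(threshold, mu0, mu1, sigma):
    """Error rate of 'assign to group 1 if x > threshold' for equally likely groups
    N(mu0, sigma^2) and N(mu1, sigma^2), with mu0 < mu1."""
    miss0 = 1 - PHI.cdf((threshold - mu0) / sigma)  # group-0 observation sent to group 1
    miss1 = PHI.cdf((threshold - mu1) / sigma)      # group-1 observation sent to group 0
    return 0.5 * (miss0 + miss1)


def bayes_error(mu0, mu1, sigma):
    """Lowest achievable error rate: Phi(-Delta / 2) with Delta = |mu1 - mu0| / sigma."""
    return PHI.cdf(-abs(mu1 - mu0) / (2 * sigma))
```

    With means 0 and 2 and unit variance, the Bayes error is about 0.159; a threshold estimated at 1.5 instead of 1 (for example because outliers in the training sample distorted the group means) gives a strictly higher error rate.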

    Reference: Croux, C., Filzmoser, P., & Joossens, K. (to appear). Classification Efficiencies for Robust Linear Discriminant Analysis. Statistica Sinica.

    Christophe Croux is Professor of Statistics and Econometrics at the Catholic University Leuven (Belgium). His research interests are robust statistics, multivariate data analysis, classification, computational statistics, and applied time series analysis. He serves on the editorial board of Computational Statistics and Data Analysis and the Journal of the American Statistical Association.

    Bernard De Baets (Ghent University)

    Monotone distribution classifiers

    We present a rigorous framework for supervised ranking, a specific type of supervised classification:

    (1) objects are assigned labels belonging to a totally ordered set of labels;

    (2) objects are not described in terms of attributes, but in terms of (true) criteria;

    (3) the labels are assigned in a monotone way: objects with equal or higher scores on all criteria do not receive a lower overall score, or, in other words, are not assigned a lower label.

    The purpose of supervised ranking is then to produce such a monotone classifier on the basis of a learning sample. Real-world data sets of this kind are usually pervaded by two undesirable phenomena: doubt and reversed preference. We focus on distribution classifiers. The monotonicity constraint then naturally leads to the notion of stochastic dominance. We confine ourselves to an ordinal setting and present a general framework from which several instance-based supervised ranking algorithms can be derived, such as the Ordinal Stochastic Dominance Learner.
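
    Condition (3) can be checked directly on a learning sample by comparing all pairs of objects. The Python sketch below lists the pairs that violate monotonicity, assuming that "doubt" means identical criterion scores with different labels and "reversed preference" means an object that scores at least as high on all criteria but receives a lower label; this reading of the two terms, and the function names, are assumptions for illustration.

```python
from itertools import combinations


def dominates(a, b):
    """True if object a scores at least as high as object b on every criterion."""
    return all(x >= y for x, y in zip(a, b))


def monotonicity_violations(data):
    """Return (doubt, reversed_pref) lists of index pairs.

    data: list of (criteria_tuple, label) pairs with labels from a totally ordered set.
    doubt:         identical criterion scores but different labels
    reversed_pref: (i, j) where object i dominates object j but label_i < label_j
    """
    doubt, reversed_pref = [], []
    for i, j in combinations(range(len(data)), 2):
        (a, la), (b, lb) = data[i], data[j]
        if la == lb:
            continue  # equal labels never violate monotonicity
        if a == b:
            doubt.append((i, j))
        elif dominates(a, b) and la < lb:
            reversed_pref.append((i, j))
        elif dominates(b, a) and lb < la:
            reversed_pref.append((j, i))
    return doubt, reversed_pref
```

    The check is quadratic in the number of objects, which is acceptable for moderately sized learning samples.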

    Bernard De Baets (1966) holds an MSc and PhD in Mathematics (1988, 1995), and a Postgraduate degree in Knowledge Technology (1991). He is a Government of Canada Award holder (1988-89) and is Honorary Professor (2006) of Budapest Tech (Hungary). He is a Professor in Applied Mathematics (1999) at Ghent University, where he leads the research unit Knowledge-based Systems (KERMIT) at the Faculty of Bioscience Engineering. The activities of KERMIT concern the principles and practice of the extraction, representation and management of knowledge by means of intelligent techniques. He is co-editor-in-chief of "Fuzzy Sets and Systems", area editor of "4OR" and is on the editorial board of 9 other journals. He coordinates EUROFUSE (the EURO Working Group on Fuzzy Sets), and is a member of the Board of Directors of EUSFLAT, of the Administrative Board of the Belgian OR Society, of the Technical Committee on Artificial Intelligence and Expert Systems of IASTED, and of the Executive Board of the International Computational Intelligence Society.

    Koen Vanhoof (Universiteit Hasselt)

    Aggregation Operators’ Measures

    Over the last ten years, a whole range of aggregation operators (AGOPs) has been developed and extensively studied. Past research on the applications of AGOPs has mainly focused on an AGOP's power to represent a domain or its decision-making strength. However, as this study shows, certain AGOP measures that play a role in the AGOP's behaviour (i.e., behavioural parameters) can be of great importance to practitioners, especially when these behavioural parameters seem to be proxies for domain-specific concepts that are difficult to measure directly or to derive statistically. In such cases, these behavioural parameters become much more than just another mathematical measure. The claim that a behavioural parameter can be interpreted as a proxy for a domain concept has to be validated both theoretically and empirically. Theoretical validation implies a close match between the AGOP's mathematical and behavioural properties on the one hand and the theoretical information fusion process of the domain on the other. Empirical validation is present when the behavioural parameter's empirical results correspond with existing domain knowledge. In case studies, both theoretical and empirical validation are provided to support the basic assumption that aggregation operator measures enable us to obtain superior consumer information with substantial managerial relevance.
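
    A well-known example of an AGOP with a behavioural parameter is Yager's ordered weighted averaging (OWA) operator, whose "orness" measures how close the aggregation is to the maximum (orness 1) rather than the minimum (orness 0). The Python sketch below computes both; the abstract does not say which operators the case studies used, so OWA serves here only as an illustrative assumption.

```python
def owa(values, weights):
    """Ordered weighted average: the weights are applied to the values sorted in descending order."""
    if len(values) != len(weights) or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("need one weight per value, with weights summing to 1")
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))


def orness(weights):
    """Yager's orness: 1 for the maximum, 0 for the minimum, 0.5 for the arithmetic mean."""
    n = len(weights)
    if n < 2:
        raise ValueError("orness needs at least two weights")
    # Weight i (0-based, on the i-th largest value) contributes (n - 1 - i) / (n - 1).
    return sum((n - 1 - i) * w for i, w in enumerate(weights)) / (n - 1)
```

    Read as a behavioural parameter, the orness of weights fitted to, say, consumer ratings would indicate how strongly the overall judgement is driven by the best-rated attributes.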

    Koen Vanhoof obtained a master's degree in Physics in 1982, a master's degree in Computer Science in 1985 and a Ph.D. in Computer Science in 1988 at the Katholieke Universiteit Leuven. His major research interests lie in the areas of data mining, statistics, knowledge engineering and modelling, computational intelligence methods, decision support systems and soft computing applications to information management, marketing and finance. He has authored or co-authored over 20 peer-reviewed journal articles, about 6 book chapters and 60 conference papers on his research topics. He is co-editor of the International Journal of Information Theory and Applications. He has been appointed as guest professor at the Jagiellonian University (Cracow, Poland), the University of Antwerp (Antwerp, Belgium), the University of Maastricht (Maastricht, the Netherlands), the University of Economics (Sofia, Bulgaria), the Technical University (St. Petersburg, Russian Federation) and the Academy of Economics (Wroclaw, Poland). He is Head of the Data mining research group at Hasselt University in Belgium.

    Lambert Schomaker (University of Groningen)

    Machine learning or pattern recognition? On classification methods in writer identification and handwritten manuscript retrieval

    The automatic recognition of handwriting and writer identification constitute a considerable challenge to science and engineering. As in speech recognition, promising academic results in handwriting recognition degrade drastically when the methods are applied in the real world. In many application domains, academic results cannot be replicated because labeled data for supervised training are lacking. Consequently, there is a continued need for unsupervised clustering methods and an increasing use of plain techniques such as nearest-neighbour search and non-probabilistic pattern matching. Examples will be presented from the field of handwritten historical manuscript retrieval and from a branch of behavioral biometrics: writer identification. An extensive study using very large data sets showed that, in the end, more is gained from human time spent on feature design (i.e., traditional pattern recognition) than from time spent comparing the experimental results of machine-learning methods. Furthermore, many traditional methods appear ill-suited in practice to handle massive data sets (dozens of gigabytes) with large numbers of instances and of feature dimensions. Our results in handwriting biometrics are among the best achieved to date, with data sets of up to 900 writers. Results in handwritten manuscript retrieval are more recent and very promising. Here, the challenge is to bootstrap a search engine for any new historical handwritten collection from a 'zero-knowledge' starting point, without any class labels or any reliable segmentation into individual words. In brief: what to do when statistics are not (yet) present?
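
    As a rough illustration of the plain nearest-neighbour search mentioned in this abstract, the sketch below ranks known writers by their distance to a query sample. It assumes that each handwriting sample has already been reduced to a normalized feature histogram and uses the chi-square distance; the feature representation, the choice of distance and the function names are illustrative assumptions, not a description of the speaker's system.

```python
from typing import List, Sequence, Tuple


def chi_square(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Chi-square distance between two non-negative feature histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    # eps keeps empty bins (a + b == 0) from causing a division by zero
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def rank_writers(query: Sequence[float],
                 reference: List[Tuple[str, Sequence[float]]]) -> List[Tuple[str, float]]:
    """Return (writer, distance) pairs for all reference samples, nearest first."""
    scored = [(writer, chi_square(query, feats)) for writer, feats in reference]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical three-bin histograms; real feature vectors are far longer.
    reference = [("writer_a", [0.5, 0.3, 0.2]), ("writer_b", [0.1, 0.2, 0.7])]
    print(rank_writers([0.45, 0.35, 0.2], reference))
```

    Returning a full ranking rather than a single label suits retrieval-style use, where a human expert may inspect the top few candidates; no training step is needed, which is one reason such methods remain attractive when labeled data are scarce.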

    Prof. dr. Lambert Schomaker (born 19-2-1957) is the Research Director of the Department of Artificial Intelligence at the University of Groningen. His research concerns pattern-recognition problems in handwriting recognition, writer identification, handwritten-manuscript retrieval and related topics. Recent work involves large-scale historical handwriting retrieval on high-performance computers. He is a member of the IEEE Computer Society, an active member of the IAPR and a member of the BNVKI.

    Paul Eilers (Utrecht University)

    Statistical Classification for Reliable High-volume Genetic Measurements

    Single nucleotide polymorphisms (SNPs, pronounced "snips") are mutations in which only one of the bases (A, C, G or T) that make up our DNA has changed. SNPs occur very frequently (one million or more across the whole genome), and they can reveal a lot about differences in DNA, between people or between tumors and normal tissue. Modern technology allows (hundreds of) thousands of SNPs to be measured at the same time. Unfortunately, depending on the method used and the quality of the biological samples, the measurements are not perfect. Advanced statistical classification methods are very useful for improving the determination of genotypes and for quantifying their reliability. I will describe the application of mixtures of regression models in this context.
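
    The talk concerns mixtures of regression models; the sketch below swaps in a simpler technique, a plain Gaussian mixture, only to show the general idea of mixture-based genotype calling. It fits a one-dimensional, three-component mixture by expectation-maximization (EM) to a per-sample allele contrast, assumed to lie near -1, 0 and +1 for the AA, AB and BB genotypes, and reports each sample's largest posterior probability as a reliability score. The contrast scale, the starting values and all names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

GENOTYPES = ("AA", "AB", "BB")  # assumed order: low, middle, high contrast


def _log_normal(x: float, mu: float, sd: float) -> float:
    """Log density of N(mu, sd^2) at x."""
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)


def _responsibilities(x: Sequence[float], mu: Sequence[float], sd: Sequence[float],
                      pi: Sequence[float]) -> List[List[float]]:
    """Posterior component probabilities per sample (log-sum-exp for stability)."""
    out = []
    for xi in x:
        logs = [math.log(p) + _log_normal(xi, m, s) for m, s, p in zip(mu, sd, pi)]
        top = max(logs)
        w = [math.exp(l - top) for l in logs]
        total = sum(w)
        out.append([v / total for v in w])
    return out


def fit_mixture(x: Sequence[float], init_means=(-1.0, 0.0, 1.0),
                init_sd: float = 0.3, iters: int = 100):
    """Fit a 1-D Gaussian mixture by EM; returns (means, sds, weights)."""
    n = len(x)
    mu = list(init_means)
    sd = [init_sd] * len(mu)
    pi = [1.0 / len(mu)] * len(mu)
    for _ in range(iters):
        resp = _responsibilities(x, mu, sd, pi)  # E-step
        for j in range(len(mu)):                 # M-step
            nj = sum(r[j] for r in resp)
            pi[j] = max(nj / n, 1e-12)
            if nj < 1e-9:
                continue  # empty component: keep its previous mean and sd
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var = sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x)) / nj
            sd[j] = max(math.sqrt(var), 1e-3)  # floor avoids degenerate components
    return mu, sd, pi


def call_genotypes(x: Sequence[float], mu, sd, pi) -> List[Tuple[str, float]]:
    """Most probable genotype per sample, with its posterior as a reliability score."""
    calls = []
    for r in _responsibilities(x, mu, sd, pi):
        j = max(range(len(r)), key=r.__getitem__)
        calls.append((GENOTYPES[j], r[j]))
    return calls


if __name__ == "__main__":
    contrasts = [-0.95, -1.05, -0.9, 0.02, -0.05, 0.1, 0.97, 1.02, 1.1, 0.45]  # made up
    params = fit_mixture(contrasts)
    print(call_genotypes(contrasts, *params))
```

    Samples whose largest posterior is low could be reported as no-calls rather than forced into a genotype; where to set that threshold is a separate, application-specific choice.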

    When SNP results are used to compare tumors and normal tissue, new challenges arise. Here we also have spatial information: the locations of the SNPs on the chromosomes. Because genomic changes tend to be spatially correlated, the task then becomes one of classifying changed segments.

    Thorsten Joachims (Cornell University, USA): Cancelled

    Efficient Training of SVMs for Structured Outputs

    This talk explores a large-margin approach to predicting multivariate objects such as trees, clusterings, or alignments. Such problems arise, for example, when a natural language parser needs to predict the correct parse tree for a given sentence, when one needs to determine the co-reference relationships of noun phrases in a document, or when predicting the alignment between two proteins. In particular, the talk will show how training such complex prediction rules can be formulated as a convex program. This leads to a Support Vector Machine (SVM) that generalizes conventional classification SVMs to a large range of structured outputs and multivariate loss functions. While the resulting optimization problems are convex quadratic programs, they have an exponential (or infinite) number of constraints. Nevertheless, the talk will show how cutting-plane methods can be used to solve these optimization problems efficiently. A by-product is a linear-time training algorithm for conventional linear binary classification SVMs as well. The algorithm is implemented in the SVM-Struct software, and empirical results will be given for several application examples.

    Thorsten Joachims is an Associate Professor in the Department of Computer Science at Cornell University. In 2001, he completed his dissertation, titled "The Maximum-Margin Approach to Learning Text Classifiers: Methods, Theory, and Algorithms", advised by Prof. Katharina Morik at the University of Dortmund. He also received his Diplom in Computer Science from Dortmund in 1997, with a thesis on WebWatcher, a browsing assistant for the Web. From 1994 to 1996 he was a visiting scientist at Carnegie Mellon University with Prof. Tom Mitchell. His research interests center on a synthesis of theory and system building in the field of machine learning, with a focus on Support Vector Machines and machine learning with text. He authored the SVM-Light algorithm and software for support vector learning.



    Webmaster: Michel van de Velden, Erasmus Universiteit, Rotterdam
    Last update: 23-02-2007