This is the homepage of the lecture Empirical Evaluation in Informatics (Vorlesung "Empirische Bewertung in der Informatik") and its corresponding tutorial (Übung).
Description
As an engineering discipline, Informatics is constantly developing
new artifacts such as methods, languages/notations or
concrete software systems. In most cases, the functional efficiency and effectiveness
of these solutions for the intended purpose are not obvious --
especially not in comparison to other already existing solutions
for the same or similar purpose.
For this reason, methods for evaluating the efficacy of these solutions
must be a routine part of Informatics -- a fact which unfortunately
has only slowly become recognized.
Evaluation is needed by those who create new solutions (that is, in research and development),
but also by the users, who need to evaluate the expected efficacy specifically for their situation.
These evaluations need to be empirical (that is, based on observation),
because the problems are nearly always too complicated
for an analytical (that is, a purely thought-based) approach.
This lecture presents the most important empirical evaluation methods
and explains where these have been used (using examples) and should be used,
how to use them and what to consider when doing so.
Administration
Lecturers
Requirements/target group, classification, credit points etc.:
see the entry in the KVV course syllabus
Registration
- For the tutorials every participant needs to have registered in the KVV.
- Subscribe to »Empirische Bewertung in der Informatik S17«
Dates
- The lecture is held every Monday from 10:15 to 11:45 in room 049, Takustr. 9
- The first lecture is on Monday, 2017-04-24
- The tutorial takes place every Monday from 12:15 to 13:45 in room 049, Takustr. 9
- The first tutorial is on Monday, 2017-04-24
- Attention: The first practice sheet is due on 2017-04-23, i.e. before the first lecture!
- Written exam: Monday, 2017-07-24, 09:59, Takustr. 9, room 006
- Post-exam review (Klausureinsicht): Monday, 2017-10-16, 15:59 until at least 16:30, room 049, Takustr. 9
Language
- The course language is German, but the actual slides and practice sheets are in English.
- The exam will be formulated in German, but answers may be given in English, too.
Examination modalities
Necessary criteria for obtaining the credit points:
- Completion of at least 80% of the tasks on the practice sheets
- Active participation in the tutorial
- Passing the written examination (a dictionary may be used)
Content
Note concerning links
*Most of the linked documents and videos can only be accessed from the FU Berlin network* (from outside you will receive a 403/Forbidden error: "You don't have permission to access ..."; use a VPN connection in this case).
Lecture topics
The lecture is divided into three sections:
- Introduction (3 weeks): Introduces the basic ideas of empiricism and discusses quality characteristics for empirical studies (lectures 1 to 3).
- Methods (7 weeks): Presents basic aspects of and approaches to various empirical methods and illustrates them with concrete examples from the scientific literature.
- Data analysis (2 weeks): Empirical studies always generate raw data first, which may be partly qualitative and partly quantitative. The research results arise only from the analysis and interpretation of these data. The analysis of quantitative data is so comprehensive a topic that an entire degree program (statistics) can be dedicated to it.
This section gives a first introduction to the analysis of quantitative data. (The completely different analysis of qualitative data is beyond the scope of this lecture.)
Please note:
The entries below are listed thematically, but not entirely chronologically (see specific dates to be sure).
Our schedule is a bit bumpy due to unfortunate floating holidays in 2017.
- (24.4.2017) Introduction - The role of empiricism:
(Video 2017-04, video has many temporary freezes)
- Term "empirical evaluation";
theory, construction, empiricism;
status of empiricism in Informatics
- Hypothetical examples of use
- quality criteria: reliability, relevance
- Note: scale types
- (01.05.2017 self-study in CW 18) The scientific method:
(Video 2014-04)
- Science and methods for gaining insights;
classification of Informatics
- The scientific method;
variables, hypotheses, control;
internal and external validity;
validity, reliability, relevance
- (22.05.2017) How to lie with statistics:
(Video 2014-05)
- When looking at somebody else's conclusions from data: What is actually meant? What specifically? How can they know it? What is not said?
- Does the measurement distort the meaning? Is the sample biased? And so on.
- Material: book on the topic;
Study on alternative ink;
article with arguments against hypothesis testing:
"The earth is round (p < 0.05)".
- (08.05.2017) Empirical approach:
(Video 2015-05)
- steps: formulate aim and question; select method and design study;
create study situation; collect data;
evaluate findings; interpret results.
- example: N-version programming
(article,
reply to the criticisms against it)
- (15.05.2017) Survey:
(Video 2014-05)
- example: relevance of different topics in Informatics education
(article)
- method: selection of aims; selection of group to be interviewed;
design and validation of the questionnaire;
execution of the survey; evaluation; interpretation
- (29.05.2017) Controlled experiment:
(Video 2014-05)
- example 1: flow charts versus pseudo-code
(article,
criticized prior work)
- method: control and constancy; problems with reaching constancy;
techniques for reaching constancy
- example 2: use of design pattern documentation
(article)
- (05.06.2017 self-study in CW 23) Quasi experiment:
(Video 2015-06)
- example 1: comparison of 7 programming languages
(article,
detailed technical report)
- method: like controlled experiment, but with
incomplete control (mostly: no randomization)
- example 2: influence of work place conditions on productivity
(article)
- (12.06.2017) Data analysis - basic terminology:
(Video 2014-06)
- (19.06.2017) Data analysis - techniques:
(Slide video 2017-06, no webcam)
- Samples and populations; average value;
variability; comparison of samples:
significance test, confidence interval; bootstrap; relations between variables:
plots, linear models, correlations, local models (loess)
- Article: "A tour through the visualization zoo"
- (26.06.2017) Benchmarking:
(Video 2014-06)
- example 1: SPEC CPU2000
(article)
- Benchmark = measurement + task + comparison; problems (costs,
task selection, overfitting); quality characteristics (accessibility,
effort, clarity, portability, scalability, relevance)
(article)
- example 2: TREC
(article)
- (03.07.2017) Case study:
(Video 2014-07)
- example 1: Familiarization with a software team
(article)
- method: characteristics of case studies; what is the 'case'?;
use of many data types; triangulation;
validity dimensions
- example 2: An unconventional method for
requirements inspections
(article)
- (10.7.2017) Other methods:
(Video 2014-07)
- The method landscape; simulation; software archeology
(studies on the basis of existing data); literature study
- example simulation: scaling of P2P file sharing
(article)
- example software archeology: code decay
(article)
- example literature study: a model of the effectiveness of reviews
(article)
- (17.7.2017) Summary and advice:
(Slide video 2017-07, no webcam)
- Role of empiricism; quality criteria; generic method;
advantages and disadvantages of the methods; practical advice (for data analysis;
for conclusion-drawing; for final presentation); outlook
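Of the data analysis techniques listed above (lecture of 19.06.2017), the bootstrap is one that is easy to sketch in code. The following is only a minimal illustration under stated assumptions, not course material: it uses Python's standard library rather than the R software used in the tutorials, the function name `bootstrap_mean_diff_ci` is hypothetical, and the two samples are invented numbers. It estimates a percentile confidence interval for the difference between two sample means by resampling each sample with replacement.

```python
import random
import statistics


def bootstrap_mean_diff_ci(sample_a, sample_b, n_resamples=2000,
                           confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original sizes.
        resample_a = rng.choices(sample_a, k=len(sample_a))
        resample_b = rng.choices(sample_b, k=len(sample_b))
        diffs.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    diffs.sort()
    # Cut off an equal share of the resampled differences at both ends.
    tail = (1.0 - confidence) / 2.0
    lower = diffs[int(tail * n_resamples)]
    upper = diffs[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical task completion times in minutes (invented for illustration).
group_a = [31, 28, 35, 40, 27, 33, 30, 38]
group_b = [25, 29, 24, 31, 22, 27, 26, 30]

observed = statistics.mean(group_a) - statistics.mean(group_b)
low, high = bootstrap_mean_diff_ci(group_a, group_b)
print(f"observed difference: {observed:.2f} min, "
      f"95% CI: [{low:.2f}, {high:.2f}]")
```

If the resulting interval excludes zero, the data are hard to reconcile with "no difference between the groups"; with samples this small, the interval will usually be wide, which is itself informative.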
Aims of the tutorials
- Tutorials 1 to 3 (concerning R)
- To get to know the possibilities of a free, comprehensive and modern statistics software and gain basic skills with it.
- To get to know a new way of thinking for programming ("programming with data") and practice it.
- To realize how enlightening a data analysis may be in some cases and how useless in others.
- Tutorials 4 to 13 (project: empirical study)
- To have gone through the design process of an empirical study oneself and to realize how many aspects must be considered.
- To experience how many good ideas you may have and how many others possibly are still missing.
- To realize how important it is to work accurately (because a correction of mistakes is often impossible and usually causes a huge amount of extra work).
- To have had the gee-whiz experience of analyzing data which nobody else in this world has seen so far.
Practice sheets
(These links will be added continuously as the course proceeds.)
- Preparation for the tutorial: install R
- Practice sheet 1: get to know the data evaluation software "R" (due on 2017-04-23)
- Practice sheet 2: practice R, part 1
- data.zip (contains junit.tsv, junit20.tsv, zile.tsv, jikes.tsv)
- Practice sheet 3: practice R, part 2
- Practice sheet 4: practice R, part 3; Survey: topic
- Practice sheet 5: Survey: method and first version of questionnaire
- Practice sheet 6: Survey: revise questionnaire
- Practice sheet 7: Survey: polish questionnaire
- Practice sheet 8: Survey: pilot test, recruit participants
- Practice sheet 9: Survey: interim report, Evaluating research: Experiments
- Practice sheet 10: Survey: validate data, prepare analysis
- Practice sheet 11: Evaluating research: Quantitative Data Analysis
- Practice sheet 12: Survey: data analysis, final presentation
- Practice sheet 13: Survey: final report
Changes over the years
- 2004: Lecture first held.
- 2005: Lecture: only minor changes. Tutorial: broader choice of topics for the surveys.
- 2010: Lecture and tutorial both held in English.
Literature