Visualization

Generation and plotting functions

mlr's visualization capabilities rely on generation functions, which produce the data for plots, and plotting functions, which draw this output using either ggplot2 or ggvis (the latter is currently experimental).

This separation allows users to easily make custom visualizations by taking advantage of the generation functions. The only data transformation that is handled inside plotting functions is reshaping. The reshaped data is also accessible by calling the plotting functions and then extracting the data from the ggplot2::ggplot object.

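The functions are named accordingly: generation functions start with generate and end with Data, plotting functions start with plot, and ggvis plotting functions additionally end with GGVIS.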
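
As a quick illustration of this convention, the sketch below derives the expected plotting function name from a generation function name with a regular expression. toPlotName is a hypothetical helper, not part of mlr, and the pattern is inferred from the table of functions at the end of this page.

```r
# Hypothetical helper (not part of mlr): map a generation function name
# to the plotting function name suggested by the naming convention.
toPlotName = function(gen.name, ggvis = FALSE) {
  # Strip the "generate" prefix and the "Data" suffix to get the purpose
  purpose = sub("^generate(.*)Data$", "\\1", gen.name)
  paste0("plot", purpose, if (ggvis) "GGVIS" else "")
}

toPlotName("generateThreshVsPerfData")
#> [1] "plotThreshVsPerf"

toPlotName("generateLearningCurveData", ggvis = TRUE)
#> [1] "plotLearningCurveGGVIS"
```

The pattern does not cover every case: plotROCCurves, for instance, has no generation function of its own.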

In the example below we create a plot of classifier performance as a function of the decision threshold for the binary classification problem sonar.task. The generation function generateThreshVsPerfData creates an object of class ThreshVsPerfData which contains the data for the plot in its element $data.

library(mlr)
lrn = makeLearner("classif.lda", predict.type = "prob")
n = getTaskSize(sonar.task)
mod = train(lrn, task = sonar.task, subset = seq(1, n, by = 2))
pred = predict(mod, task = sonar.task, subset = seq(2, n, by = 2))
d = generateThreshVsPerfData(pred, measures = list(fpr, fnr, mmce))

class(d)
#> [1] "ThreshVsPerfData"

head(d$data)
#>         fpr       fnr      mmce  threshold
#> 1 1.0000000 0.0000000 0.4615385 0.00000000
#> 2 0.3541667 0.1964286 0.2692308 0.01010101
#> 3 0.3333333 0.2321429 0.2788462 0.02020202
#> 4 0.3333333 0.2321429 0.2788462 0.03030303
#> 5 0.3333333 0.2321429 0.2788462 0.04040404
#> 6 0.3125000 0.2321429 0.2692308 0.05050505
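
The data above is in wide format, with one column per measure. The reshaping that plotting functions perform presumably turns this into long format, with one row per threshold and measure, which is the layout faceted plots usually expect. A minimal base R sketch, using the first three rows of the output above; the column names measure and performance are assumptions, and mlr's internal reshaping may differ:

```r
# First three rows of head(d$data), copied from the output above
wide = data.frame(
  threshold = c(0, 0.01010101, 0.02020202),
  fpr = c(1, 0.3541667, 0.3333333),
  fnr = c(0, 0.1964286, 0.2321429),
  mmce = c(0.4615385, 0.2692308, 0.2788462)
)

measure.cols = c("fpr", "fnr", "mmce")

# Stack the measure columns: one row per (threshold, measure) pair
long = data.frame(
  threshold = rep(wide$threshold, times = length(measure.cols)),
  measure = rep(measure.cols, each = nrow(wide)),
  performance = unlist(wide[measure.cols], use.names = FALSE)
)

print(long)
```

The result has nine rows (three thresholds times three measures) and can be handed to any plotting system that groups or facets by the measure column.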

For plotting we can use the built-in mlr function plotThreshVsPerf.

plotThreshVsPerf(d)

Note that by default the Measure names are used to annotate the plot.

fpr$name
#> [1] "False positive rate"

fpr$id
#> [1] "fpr"

This applies not only to plotThreshVsPerf, but also to most other plotting functions that show performance measures. You can use the ids instead of the names by setting pretty.names = FALSE.
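
The effect of pretty.names can be pictured as a choice between two fields of each measure. The sketch below uses plain lists in place of mlr Measure objects and a hypothetical helper getLabels; it is not mlr's implementation, and the names given for fnr and mmce are illustrative.

```r
# Plain lists standing in for mlr Measure objects; only the fields
# id and name are assumed. The fnr and mmce names are illustrative.
measures = list(
  list(id = "fpr", name = "False positive rate"),
  list(id = "fnr", name = "False negative rate"),
  list(id = "mmce", name = "Mean misclassification error")
)

# Hypothetical helper: choose the annotation label for each measure
getLabels = function(measures, pretty.names = TRUE) {
  field = if (pretty.names) "name" else "id"
  vapply(measures, function(m) m[[field]], character(1))
}

getLabels(measures)
getLabels(measures, pretty.names = FALSE)
```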

Instead of using the built-in function plotThreshVsPerf, we could also create the plot manually from the output of generateThreshVsPerfData, in this case to plot only one measure.

library(ggplot2)
ggplot(d$data, aes(threshold, fpr)) + geom_line()

The decoupling of generation and plotting functions is especially practical for users who prefer traditional graphics or lattice. Here is a lattice plot which gives a result similar to that of plotThreshVsPerf.

# strip.custom is also prefixed, since lattice is not attached here
lattice::xyplot(fpr + fnr + mmce ~ threshold, data = d$data, type = "l", ylab = "performance",
  outer = TRUE, scales = list(relation = "free"),
  strip = lattice::strip.custom(factor.levels = sapply(d$measures, function(x) x$name)))

Let's conclude with a brief look at a second example. Here we use plotPartialPrediction, but extract the data from the plot object and use it to create a traditional graphics::plot in addition to the ggplot2 plot.

sonar = getTaskData(sonar.task)
pd = generatePartialPredictionData(mod, sonar, "V11")
plt = plotPartialPrediction(pd)
head(plt$data)
#>   Class Probability Feature     Value
#> 1     M   0.9295997     V11 0.7342000
#> 2     M   0.9084961     V11 0.6558333
#> 3     M   0.8792694     V11 0.5774667
#> 4     M   0.8232852     V11 0.4991000
#> 5     M   0.7387962     V11 0.4207333
#> 6     M   0.6557857     V11 0.3423667

plt

plot(Probability ~ Value, data = plt$data, type = "b")

List of available functions

The table below shows the currently available generation and plotting functions. It also references the tutorial pages that provide in-depth descriptions of the listed functions.

Note that some plots, e.g., plotTuneMultiCritResult, are not described here since they lack a generation function. Both plotThreshVsPerf and plotROCCurves operate on the result of generateThreshVsPerfData.
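
One way to see why both plotting functions can share a generation function: threshold-indexed error rates already contain ROC coordinates, since the true positive rate equals 1 minus the false negative rate. A minimal base R sketch, using rows copied from the head(d$data) output shown earlier; this only illustrates the relationship and is not how plotROCCurves is implemented:

```r
# Rows 1, 2, 3 and 6 of head(d$data), copied from the earlier output
perf = data.frame(
  threshold = c(0, 0.01010101, 0.02020202, 0.05050505),
  fpr = c(1, 0.3541667, 0.3333333, 0.3125),
  fnr = c(0, 0.1964286, 0.2321429, 0.2321429)
)

# ROC coordinates: x = false positive rate, y = true positive rate = 1 - fnr
roc = data.frame(
  threshold = perf$threshold,
  fpr = perf$fpr,
  tpr = 1 - perf$fnr
)

print(roc)
```

Calling plot(tpr ~ fpr, data = roc, type = "l") would then draw this part of the curve with traditional graphics.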

The ggvis functions are experimental and subject to change, though they should work. Most of them generate interactive shiny applications that start automatically and run locally.

generation function            ggplot2 plotting function  ggvis plotting function     tutorial page
-----------------------------  -------------------------  --------------------------  --------------------------
generateThreshVsPerfData       plotThreshVsPerf           plotThreshVsPerfGGVIS       Performance
generateThreshVsPerfData       plotROCCurves              --                          ROC Analysis
generateCritDifferencesData    plotCritDifferences        --                          Benchmark Experiments
generateFilterValuesData       plotFilterValues           plotFilterValuesGGVIS       Feature Selection
generateLearningCurveData      plotLearningCurve          plotLearningCurveGGVIS      Learning Curves
generatePartialPredictionData  plotPartialPrediction      plotPartialPredictionGGVIS  Partial Prediction Plots
generateCalibrationData        plotCalibration            --                          Classifier Calibration Plots