KDD Cup 2004 - Download PERF Software

UPDATE 5th of July: One of the participants pointed out that on some platforms, PERF might underestimate APR by a small amount when there are certain numbers of ties. Most likely, this will not affect your results. However, we have put a more robust version, 5.11, on the KDD-Cup Web page. If you want to be absolutely certain, use the new code. Note again that this only affects APR on the protein problem. However, if you are still using a version of PERF prior to 5.10, you should download and use the new version. See the FAQ, or the README file in the download, for more detail.

We will use the program perf to measure the performance of the predictions you submit on the eight performance metrics. You do not need to use perf, but using perf will ensure that the metrics you optimize for are defined the same way we will measure them.

Perf calculates a variety of performance metrics suitable for boolean classification problems. Metrics include: accuracy, root-mean-squared-error, cross-entropy, precision, recall, precision/recall break-even point and F-score, area under the ROC curve, lift, weighted cost, top 1, top 10, rank of lowest positive case, q-score, several measures of probability calibration, etc.

Perf is available for download already compiled for a number of platforms:

Each directory contains a subdirectory with sample test data and sample output. You can use this to test that perf works on your platform, and to see what the input format to perf is.

We recently made a number of changes to perf for the KDD-CUP, so please let us know if you find bugs.

Perf is a stand-alone C program that is easy to compile on different platforms. Here is the source code and makefile:

perf.src.tar.gz

Perf can read from the standard input, or from files containing target values and predicted values. When reading from the standard input, the input to perf is a series of lines, each of which contains a target value and a predicted value separated by whitespace. Perf reads the entire input corresponding to the targets and predictions for a train or test set, and then calculates the performance measures you request. Here is a short example of the input file. The first column is the target class (0 or 1). The second column is the probability the model predicts that the case is in class 1. The input format allows several kinds of separators between the two columns (e.g. spaces, tabs, commas).

    1 0.80962
    0 0.48458
    1 0.65812
    0 0.16117
    0 0.47375
    0 0.26587
    1 0.71517
    1 0.63866
    0 0.36296
    1 0.89639
    0 0.35936
    0 0.22413
    0 0.36402
    1 0.41459
    1 0.83148
    0 0.23271
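As an illustration of the input format described above, here is a minimal Python sketch that writes targets and predictions as one pair per line and reads them back. The function names `write_perf_input` and `read_perf_input` are hypothetical helpers for this example and are not part of perf. The five-decimal formatting simply mirrors the sample above and is an assumption, not a requirement stated here.

```python
import io


def write_perf_input(targets, predictions, out):
    """Write one 'target prediction' pair per line, separated by a space."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    for t, p in zip(targets, predictions):
        if t not in (0, 1):
            raise ValueError(f"target must be 0 or 1, got {t!r}")
        out.write(f"{t} {p:.5f}\n")


def read_perf_input(lines):
    """Parse lines into (targets, predictions); commas are treated as separators."""
    targets, predictions = [], []
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue  # skip blank lines
        targets.append(int(fields[0]))
        predictions.append(float(fields[1]))
    return targets, predictions


# Write the first three rows of the sample above to an in-memory buffer.
buf = io.StringIO()
write_perf_input([1, 0, 1], [0.80962, 0.48458, 0.65812], buf)
print(buf.getvalue(), end="")
```

In practice the buffer would be replaced by an open file whose contents are then passed to perf.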

The software can calculate a variety of performance measures, but you won't need most of them for the competition. For the KDD-CUP 2004 competition the performance measures we are interested in are:

For the Particle Physics Problem:
- ACC: accuracy
- ROC: area under the ROC curve (aka AUC)
- CXE: cross-entropy
- SLQ 0.01: Stanford Linear Accelerator Q-score (more on this later)
For the Protein Matching Problem:
- TOP1: how often is a correct match (a homolog) ranked first
- RMS: root-mean-squared-error (similar to optimizing squared error)
- RKL: rank of the last matching case (rank of the last positive case)
- APR: average precision
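
To make some of these measures concrete, here is a rough Python sketch of the textbook definitions of ACC, RMS, CXE, and ROC area. It is not perf's implementation, and perf's exact conventions may differ. In particular, the natural logarithm in the cross-entropy, the strict "greater than" comparison against the threshold, and the half credit for ties in the ROC area are assumptions made for this example. SLQ is omitted because its definition is not given here.

```python
import math


def accuracy(targets, preds, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the target."""
    hits = sum((p > threshold) == (t == 1) for t, p in zip(targets, preds))
    return hits / len(targets)


def rms(targets, preds):
    """Root-mean-squared error between targets and predicted probabilities."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets))


def cross_entropy(targets, preds, eps=1e-12):
    """Mean cross-entropy (natural log assumed); predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(targets, preds):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


def roc_area(targets, preds):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [p for t, p in zip(targets, preds) if t == 1]
    neg = [p for t, p in zip(targets, preds) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# The 16-row sample input shown earlier on this page.
sample_targets = [1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
sample_preds = [0.80962, 0.48458, 0.65812, 0.16117, 0.47375, 0.26587,
                0.71517, 0.63866, 0.36296, 0.89639, 0.35936, 0.22413,
                0.36402, 0.41459, 0.83148, 0.23271]

print("ACC", accuracy(sample_targets, sample_preds))
print("RMS", rms(sample_targets, sample_preds))
print("CXE", cross_entropy(sample_targets, sample_preds))
print("ROC", roc_area(sample_targets, sample_preds))
```

The pairwise ROC computation is quadratic in the number of cases, which is fine for a small illustration but slow for large test sets.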

If you specify no options, perf prints a variety of performance measures. Typically you will specify options so that perf only calculates the performance metric(s) you are interested in, but here is sample output of perf when run on one of the test data sets included in the perf_sample_test_data directory with no options specified:

    [caruana] perf < testperfdata
    ACC    0.83292   pred_thresh  0.500000
    PPV    0.35294   pred_thresh  0.500000
    NPV    0.96203   pred_thresh  0.500000
    SEN    0.71429   pred_thresh  0.500000
    SPC    0.84680   pred_thresh  0.500000
    PRE    0.35294   pred_thresh  0.500000
    REC    0.71429   pred_thresh  0.500000
    PRF    0.47244   pred_thresh  0.500000
    LFT    3.36975   pred_thresh  0.500000
    SAR    0.78902   pred_thresh  0.500000  wacc 1.000000  wroc 1.000000  wrms 1.000000

    ACC    0.90524   freq_thresh  0.617802
    PPV    0.54762   freq_thresh  0.617802
    NPV    0.94708   freq_thresh  0.617802
    SEN    0.54762   freq_thresh  0.617802
    SPC    0.94708   freq_thresh  0.617802
    PRE    0.54762   freq_thresh  0.617802
    REC    0.54762   freq_thresh  0.617802
    PRF    0.54762   freq_thresh  0.617802
    LFT    5.22846   freq_thresh  0.617802
    SAR    0.81313   freq_thresh  0.617802  wacc 1.000000  wroc 1.000000  wrms 1.000000

    ACC    0.91521   max_acc_thresh  0.712250
    PPV    0.68182   max_acc_thresh  0.712250
    NPV    0.92876   max_acc_thresh  0.712250
    SEN    0.35714   max_acc_thresh  0.712250
    SPC    0.98050   max_acc_thresh  0.712250
    PRE    0.68182   max_acc_thresh  0.712250
    REC    0.35714   max_acc_thresh  0.712250
    PRF    0.46875   max_acc_thresh  0.712250
    LFT    6.50974   max_acc_thresh  0.712250
    SAR    0.81645   max_acc_thresh  0.712250  wacc 1.000000  wroc 1.000000  wrms 1.000000

    PRB    0.54762
    APR    0.51425
    ROC    0.88380
    R50    0.49954
    RKL    273
    TOP1   1.00000
    TOP10  1.00000
    SLQ    0.80851   Bin_Width  0.010000
    RMS    0.34966
    CXE    0.57335
    CA1    0.22115   19_0.05_bins
    CA2    0.22962   Bin_Size 100

To make the output simpler, you can specify only the measure(s) you are interested in. For example, to compute just the ROC Area or just the average precision:

    [caruana] perf -roc < testperfdata
    ROC    0.88380

    [caruana] perf -apr < testperfdata
    APR    0.51425

To compute the accuracy, cross-entropy, and root-mean-squared-error:

    [caruana] perf -acc -cxe -rms < testperfdata
    ACC    0.83292   pred_thresh  0.500000
    RMS    0.34966
    CXE    0.57335

Note that accuracy needed a threshold and perf used a default threshold of 0.5. If you want to use a different threshold (e.g. a threshold of 0 when using SVMs), the threshold can be specified with a "-threshold" option:

    [caruana] perf -acc -threshold 0.0 -cxe -rms < testperfdata
    ACC    0.10474   pred_thresh  0.000000
    RMS    0.34966
    CXE    0.57335

Note that the threshold changed only the accuracy, not the RMS or CXE. Predictions below the threshold are treated as class 0 and predictions above the threshold are treated as class 1. When submitting predictions for the KDD-CUP for accuracy (the only performance measure we are using in the cup that depends on a threshold), you will be asked to submit a threshold as well.
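
Since a threshold must be submitted along with the predictions for accuracy, one plausible way to pick it is to scan candidate thresholds on a train or validation set and keep the one with the highest accuracy. The Python sketch below illustrates that idea only. It is not how perf selects its own thresholds, the names `accuracy_at` and `best_accuracy_threshold` are hypothetical, and the treatment of predictions exactly equal to the threshold (counted as class 0 here) is an assumption.

```python
def accuracy_at(targets, preds, threshold):
    """Accuracy when predictions strictly above the threshold count as class 1."""
    hits = sum((p > threshold) == (t == 1) for t, p in zip(targets, preds))
    return hits / len(targets)


def best_accuracy_threshold(targets, preds):
    """Return (threshold, accuracy) with the highest accuracy on the given data."""
    values = sorted(set(preds))
    # Candidates: one value below the minimum (everything predicted positive),
    # the midpoints between adjacent distinct predictions, and one value above
    # the maximum (everything predicted negative).
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)
    return max(((c, accuracy_at(targets, preds, c)) for c in candidates),
               key=lambda pair: pair[1])


# Hypothetical validation data with SVM-style scores centred on 0.
val_targets = [0, 0, 1, 1, 0, 1]
val_scores = [-1.2, -0.4, 0.3, 1.5, 0.1, -0.2]

threshold, acc = best_accuracy_threshold(val_targets, val_scores)
print("threshold", threshold, "validation accuracy", acc)
```

The chosen threshold would then be fixed and reused unchanged when scoring test predictions.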

Perf can read from files instead of from the standard input:

    [caruana] perf -acc -threshold 0.0 -cxe -rms -file testperfdata
    ACC    0.10474   pred_thresh  0.000000
    RMS    0.34966
    CXE    0.57335

Note that the file option must be the last option specified.
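
If you drive perf from a script, a small wrapper can keep the file option in last position and turn the output into numbers. The Python sketch below is only an illustration. `build_perf_command` and `parse_perf_output` are hypothetical helper names, the default executable name `perf` assumes the program is on your path, and the parser assumes perf prints one metric per line with the name first and the value second.

```python
def build_perf_command(metrics, input_file, threshold=None, perf_path="perf"):
    """Return an argument list for perf with the -file option placed last."""
    argv = [perf_path]
    argv += [f"-{m}" for m in metrics]
    if threshold is not None:
        argv += ["-threshold", str(threshold)]
    argv += ["-file", input_file]  # must come after every other option
    return argv


def parse_perf_output(text):
    """Map each metric name to its value, e.g. {'ACC': 0.10474, 'RMS': 0.34966}."""
    results = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            results[fields[0]] = float(fields[1])
        except ValueError:
            continue  # not a 'NAME value' line, e.g. an error message
    return results


argv = build_perf_command(["acc", "cxe", "rms"], "testperfdata", threshold=0.0)
print(" ".join(argv))

# With perf installed, the command could be run with
# subprocess.run(argv, capture_output=True, text=True) and its stdout parsed.
# Here the output shown above is parsed directly instead.
sample_output = "ACC 0.10474 pred_thresh 0.000000\nRMS 0.34966\nCXE 0.57335\n"
print(parse_perf_output(sample_output))
```

This parser keeps only the last value seen for each name, so it suits runs that request specific metrics rather than the full default output, where ACC appears once per threshold.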

Perf has a variety of other options not described here. Perf can plot ROC curves and precision-recall plots, display confusion matrices, calculate cost when unequal costs apply to false positives and false negatives, etc. It can also automatically select thresholds that maximize accuracy, or that make the frequency of the cases predicted to be positive match the number of cases that are positive in the data set. Both of these should be used to find thresholds on train or validation sets; you should then specify that threshold with the "-threshold" option when testing on test sets, because finding optimal thresholds directly on test sets usually is a no-no. A tutorial on perf is currently being prepared, but you really won't need it for the KDD-CUP. To see a list of perf's options, run perf with an illegal option such as "-help":

    [caruana] perf -help

    Error: Unrecognized program option -help
    Version 5.00 [KDDCUP-2004]

    Usage:
       ./perf [options] < input
    OR ./perf [options] -file <input file>
    OR ./perf [options] -files <targets file> <predictions file>

    Allowed options:

     PERFORMANCE MEASURES
       -ACC                      Accuracy
       -RMS                      Root Mean Squared Error
       -CXE                      Mean Cross-Entropy
       -ROC                      ROC area [default, if nothing else selected]
       -R50                      ROC area up to 50 negative examples
       -SEN                      Sensitivity
       -SPC                      Specificity
       -NPV                      Negative Predictive Value
       -PPV                      Positive Predictive Value
       -PRE                      Precision
       -REC                      Recall
       -PRF                      F1 score
       -PRB                      Precision/Recall Break Even Point
       -APR                      Mean Average Precision
       -LFT                      Lift (at threshold)
       -TOP1                     Top 1: is the top ranked case positive
       -TOP10                    Top 10: is there a positive in the top 10 ranked cases
       -NTOP <N>                 How many positives in the top N ranked cases
       -RKL                      Rank of *last* (poorest ranked) positive case
       -NRM <arg>                Norm error using L<arg> metric
       -CST <tp> <fn> <fp> <tn>  Total cost using these cost values, plus min-cost results
       -SAR <wACC> <wROC> <wRMS>
                                 SAR = (wACC*ACC + wROC*ROC + wRMS(1-RMS))/(wACC+wROC+wRMS)
                                 typically wACC = wROC = wRMS = 1.0
       -CAL <bin_size>           CA1/CA2 scores
       -SLQ <bin_width>          Slac Q-score

     METRIC GROUPS
       -all                      display most metrics (the default if no options are specified)
       -easy                     ROC, ACC and RMS
       -stats                    Accuracy, confusion table metrics, lift
       -confusion                Confusion table plus all metrics in stat

     PLOTS (Only one plot is drawn at a time)
       -plot roc                 Draw ROC plot
       -plot pr                  Draw Precision/Recall plot
       -plot lift                Draw Lift versus threshold plot
       -plot cost                Draw Cost versus threshold plot
       -plot acc                 Draw Accuracy versus threshold plot

     PARAMETERS
       -t <arg>                  Set threshold [default threshold is 0.5 if not set]
       -percent <arg>            Set threshold so <arg> percent of data falls above threshold (predicted positive)

     INPUT
       -blocks                   Input has BLOCK ID numbers in first column. Calculate performance for each block
                                 and report the mean performance across the blocks.
                                 Only works with APR, TOP1, RKL, and RMS.
                                 If using separate files for target and predictions input (-file option),
                                 the BLOCK ID numbers must be the first column of the target file,
                                 with no block numbers in the predictions file.
       -file <file>              Read input from one file (1st col targets, 2nd col predictions)
       -files <targets file> <predictions file>
                                 Read input from two separate files
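
The block option described in the help text computes a measure within each block and then averages across blocks. The Python sketch below illustrates that idea for TOP1, RKL, and APR using their usual definitions. It is not perf's code. Here ties are broken by input order, which may differ from how perf handles ties, and every block is assumed to contain at least one positive case. The function names and the example rows are made up for illustration.

```python
from collections import defaultdict


def ranked_targets(targets, preds):
    """Targets sorted by descending prediction (ties keep input order)."""
    order = sorted(range(len(preds)), key=lambda i: -preds[i])
    return [targets[i] for i in order]


def top1(targets, preds):
    """1.0 if the top-ranked case is positive, else 0.0."""
    return 1.0 if ranked_targets(targets, preds)[0] == 1 else 0.0


def rank_of_last(targets, preds):
    """Rank (1 = best) of the poorest-ranked positive case."""
    ranked = ranked_targets(targets, preds)
    return max(rank for rank, t in enumerate(ranked, start=1) if t == 1)


def average_precision(targets, preds):
    """Mean of the precision values measured at the rank of each positive case."""
    hits, total = 0, 0.0
    for rank, t in enumerate(ranked_targets(targets, preds), start=1):
        if t == 1:
            hits += 1
            total += hits / rank
    return total / hits  # assumes the block has at least one positive


def mean_over_blocks(rows, metric):
    """rows holds (block_id, target, prediction); average the metric over blocks."""
    blocks = defaultdict(lambda: ([], []))
    for block_id, t, p in rows:
        blocks[block_id][0].append(t)
        blocks[block_id][1].append(p)
    scores = [metric(ts, ps) for ts, ps in blocks.values()]
    return sum(scores) / len(scores)


# Two small made-up blocks: block id, target, prediction.
rows = [
    (1, 1, 0.9), (1, 0, 0.8), (1, 1, 0.1),
    (2, 0, 0.7), (2, 1, 0.6),
]
print("TOP1", mean_over_blocks(rows, top1))
print("RKL", mean_over_blocks(rows, rank_of_last))
print("APR", mean_over_blocks(rows, average_precision))
```

Averaging per block gives every block equal weight regardless of how many cases it contains.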
