This tool is designed for the problem of peer grading (peer reviewing). Given a set of assignments that need to be graded, it aggregates the grades provided by the peer graders/reviewers. The peer-grading toolkit takes as input a set of orderings provided by the reviewers, indicating their preferences over the different assignments. From these orderings the tool produces an overall ranking of all assignments as well as an estimate of how reliable each reviewer was.

On this page you will find the Python code to run the peer-grading toolkit offline. Documentation for the code, along with illustrative examples and details of the input .pgf format, can also be found on this page.

To learn more about the machine learning techniques we use please check out our papers. If you have further questions regarding the tool or the code please contact us.



This Python-based toolkit provides more advanced functionality for the peer-grading problem and is designed for offline use. It implements different peer-grading techniques and produces estimates of the assignment scores and grader reliabilities. Further documentation of the toolkit is given below.


Version 1.0: Download as ZIP

Compiling and Running

PeerGrading-Toolkit can be run in Windows, Linux and Mac environments.

PeerGrading-Toolkit requires Python version 2.7 or newer in order to run properly.
Additionally certain methods require NumPy and SciPy.

You can download the latest version of Python here.

To run the code:

Usage: [-h] -i INPUTFILE [-f FORMAT] [-doccol DOCID-COLUMN] [-grcol GRADERID-COLUMN] [-vcol VALUE/GRADE-COLUMN] [-m METHOD] [-iter NUM-ITERATIONS] [--borda] [--kemen] [--all_pairs] [--model_ties] -o OUTPUT-PREFIX -log LOG-FILE [-v VERBOSITY]

-h, --help show this help message and exit
-i INPUTFILE Input data file (PGF/TSV/CMTXLS format).
-f FORMAT Input data format. Options: PGF,TSV,CMTXLS
-doccol DOCID-COLUMN Document ID column which contains the ID of the document (index starts from 1). Applicable only for TSV and CMTXLS format files.
-grcol GRADERID-COLUMN Grader/Reviewer ID column (index starts from 1). Applicable only for TSV and CMTXLS format files.
-vcol VALUE/GRADE-COLUMN Data value column which contains the grade given (index starts from 1). Applicable only for TSV and CMTXLS format files.
-m METHOD Choice of methods to run include MAL (Mallows Model), MALS (Mallows with Scores), BT (Bradley-Terry), THUR (Thurstone Model), PL (Plackett-Luce Model). Also included is the cardinal method: Score-Averaging (SCAVG).
-iter NUM-ITERATIONS Number of iterations for estimating reliabilities
--borda Use Borda Count for Mallows Model
--kemen Use Kemenization for Mallows Model
--all_pairs Use All Pairs
--model_ties Use variant that models ties
-o OUTPUT-PREFIX Output file prefix (two files will be generated: A scores file and a reliabilities file).
-log LOG-FILE Log file path.
-v VERBOSITY Level of verbosity. Options (in decreasing order): DEBUG/INFO/WARNING/ERROR/CRITICAL
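
As a rough illustration of how the options above fit together, the following sketch assembles an argument list that could be passed to subprocess.run. The script name "peer_grading.py", the file names and the helper build_command are placeholders invented for this example; only the flags themselves come from the usage text above.

```python
import sys


def build_command(script, input_file, output_prefix, log_file,
                  method="BT", fmt="PGF", iterations=None, verbosity="INFO"):
    """Assemble an argument list using only the flags documented above."""
    cmd = [
        sys.executable, script,
        "-i", input_file,      # input data file
        "-f", fmt,             # PGF, TSV or CMTXLS
        "-m", method,          # e.g. MAL, MALS, BT, THUR, PL, SCAVG
        "-o", output_prefix,   # prefix for the two output files
        "-log", log_file,      # log file path
        "-v", verbosity,       # DEBUG/INFO/WARNING/ERROR/CRITICAL
    ]
    if iterations is not None:
        cmd += ["-iter", str(iterations)]
    return cmd


# "peer_grading.py" is a placeholder, not the toolkit's real script name.
command = build_command("peer_grading.py", "grades.pgf", "results", "run.log",
                        method="MAL", iterations=10)
print(" ".join(command[1:]))
```

Passing the list to subprocess.run(command) would launch the run; substitute the actual script shipped in the ZIP package for the placeholder.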


The toolkit takes an input file in one of three formats:
a) CMTXLS FORMAT: The file exported from the Microsoft CMT (Conference-Management Toolkit) as an XLS file. If you use this format, you need to provide the column indices of the grader, document and value columns.
b) TSV FORMAT: Tab-separated file format. As with the CMTXLS format, you need to provide the column indices of the grader, document and value columns.
c) PGF FORMAT: The custom Peer-Grade File Format used by our toolkit. The data from the other formats is converted into this format.
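
To make the role of the 1-based column indices concrete, here is a minimal sketch of turning TSV rows into PGF-style lines. It is not the toolkit's own conversion code: the function name tsv_to_pgf, the fixed task identifier, the absence of a header row, and the conventions that a higher grade is better and that equal grades become a '?' are all assumptions made for illustration.

```python
import csv
import io
from collections import OrderedDict


def tsv_to_pgf(tsv_text, doccol, grcol, vcol, task_id="task1"):
    """Build PGF-style lines from TSV text; column indices start from 1."""
    by_grader = OrderedDict()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        doc, grader = row[doccol - 1], row[grcol - 1]
        by_grader.setdefault(grader, []).append((doc, float(row[vcol - 1])))

    lines = []
    for grader, graded in by_grader.items():
        # Assumption: a higher grade means a better assignment.
        graded.sort(key=lambda pair: -pair[1])
        parts = []
        for idx, (doc, score) in enumerate(graded):
            if idx > 0:
                # Assumption: equal grades are written as '?' (no preference).
                parts.append(">" if graded[idx - 1][1] > score else "?")
            parts.append("%s (%.1f)" % (doc, score))
        lines.append("%s %s %s" % (task_id, grader, " ".join(parts)))
    return lines


sample = ("assgnid_1\trvwrid_3\t8.0\n"
          "assgnid_2\trvwrid_3\t6.0\n"
          "assgnid_3\trvwrid_3\t6.0\n")
pgf_lines = tsv_to_pgf(sample, doccol=1, grcol=2, vcol=3)
print("\n".join(pgf_lines))
```

Here the document ID is in column 1, the grader ID in column 2 and the grade in column 3, matching what would be passed as -doccol 1 -grcol 2 -vcol 3.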


The peer-grade file format is simply a succinct line-by-line description of the orderings provided by each grader.

Each line has the following format:
[TASK-ID] [GRADER-ID] [ORDERING]

The task identifier is used if multiple grading tasks are performed and a single grader reliability is desired.
The grader identifier is for identifying the different graders.

The ORDERING has the following format:
[ASSIGNMENT-ID] (Optional-Cardinal-Score) ['>'|'?'] [REMAINING-ORDERING]

The '>' indicates a strict preference.
The '?' represents an unknown preference or no preference.
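
A line in this format can be read with a few lines of Python. The parser below is only a sketch written for this page, not code from the toolkit: the name parse_pgf_line is hypothetical, and it assumes that fields are separated by whitespace and that identifiers contain no whitespace or parentheses.

```python
import re

# An item is an assignment ID optionally followed by a score in parentheses.
ITEM = re.compile(r"^([^\s()]+)\s*(?:\(([^)]+)\))?$")


def parse_pgf_line(line):
    """Split one PGF line into task ID, grader ID, items and relations."""
    task_id, grader_id, ordering = line.strip().split(None, 2)
    # The capturing group keeps the '>' and '?' separators in the result,
    # so pieces alternates: item, relation, item, ..., item.
    pieces = re.split(r"\s+(>|\?)\s+", ordering)
    items = []
    for piece in pieces[0::2]:
        match = ITEM.match(piece.strip())
        if match is None:
            raise ValueError("cannot parse item: %r" % piece)
        score = float(match.group(2)) if match.group(2) else None
        items.append((match.group(1), score))
    relations = pieces[1::2]
    return task_id, grader_id, items, relations


parsed = parse_pgf_line(
    "task1 rvwrid_3 assgnid_1 (8.0) > assgnid_3 (6.0) ? assgnid_2 (6.0)")
print(parsed)
```

The items list holds (assignment ID, score) pairs in the grader's order, with None for a missing score, and relations holds the '>' or '?' symbol between each consecutive pair.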

An example PGF file is given below:
task1 rvwrid_1 assgnid_1 > assgnid_2 > assgnid_3
task1 rvwrid_2 assgnid_1 > assgnid_2 > assgnid_3
task1 rvwrid_3 assgnid_1 > assgnid_3 > assgnid_2

In this example: Reviewer 1 rates assignment 1 as being better than assignment 2 which in turn is better than assignment 3.
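
For intuition about how such orderings can be combined into one overall ranking, the sketch below applies a plain Borda count to the three orderings above. This is the textbook Borda rule, shown only as an illustration; it is not the toolkit's implementation, and the toolkit's models additionally estimate grader reliabilities. The function name borda_scores is made up for this example.

```python
from collections import defaultdict


def borda_scores(orderings):
    """Plain Borda count: in an ordering of n items, the item at
    position p (0-based) earns n - 1 - p points."""
    totals = defaultdict(int)
    for ordering in orderings:
        n = len(ordering)
        for position, item in enumerate(ordering):
            totals[item] += n - 1 - position
    return dict(totals)


# The three reviewer orderings from the example PGF file above.
orderings = [
    ["assgnid_1", "assgnid_2", "assgnid_3"],
    ["assgnid_1", "assgnid_2", "assgnid_3"],
    ["assgnid_1", "assgnid_3", "assgnid_2"],
]
scores = borda_scores(orderings)
ranking = sorted(scores, key=lambda item: -scores[item])
print(scores)
print(ranking)
```

Assignment 1 collects 6 points, assignment 2 collects 2 and assignment 3 collects 1, so the aggregate ranking agrees with the majority of the reviewers.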

Another example with the cardinal score provided:
task1 rvwrid_1 assgnid_1 (8.0) > assgnid_2 (7.0) > assgnid_3 (5.0)
task1 rvwrid_2 assgnid_1 (9.0) > assgnid_2 (7.0) > assgnid_3 (6.0)
task1 rvwrid_3 assgnid_1 (8.0) > assgnid_3 (6.0) ? assgnid_2 (6.0)
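
When cardinal scores are present, the simplest way to aggregate them is to average the scores each assignment received. The sketch below does this for the three lines above with a plain arithmetic mean; whether the toolkit's SCAVG method computes exactly this is an assumption, and the name average_scores is hypothetical.

```python
import re
from collections import defaultdict

# Matches "assignment-id (score)" pairs inside an ordering.
SCORED_ITEM = re.compile(r"([^\s()]+)\s*\(([-+]?\d+(?:\.\d+)?)\)")


def average_scores(pgf_lines):
    """Average the cardinal scores given to each assignment."""
    collected = defaultdict(list)
    for line in pgf_lines:
        # Skip the task and grader identifiers; keep only the ordering.
        ordering = line.split(None, 2)[2]
        for assignment, score in SCORED_ITEM.findall(ordering):
            collected[assignment].append(float(score))
    return {a: sum(v) / len(v) for a, v in collected.items()}


pgf_lines = [
    "task1 rvwrid_1 assgnid_1 (8.0) > assgnid_2 (7.0) > assgnid_3 (5.0)",
    "task1 rvwrid_2 assgnid_1 (9.0) > assgnid_2 (7.0) > assgnid_3 (6.0)",
    "task1 rvwrid_3 assgnid_1 (8.0) > assgnid_3 (6.0) ? assgnid_2 (6.0)",
]
averages = average_scores(pgf_lines)
for assignment in sorted(averages, key=lambda a: -averages[a]):
    print("%s %.2f" % (assignment, averages[assignment]))
```

Note that reviewer 3 gave assignments 2 and 3 the same score, which is why the ordering uses '?' between them.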

Sample files are provided in the package.


Two files are produced as output.

One is the grade file, which has the suffix '_docscores.txt'. For each task, it contains the aggregated score of each assignment, as computed by the selected method from the peer grades provided as input. It is of the form:

with the scores sorted in decreasing order.

The second file contains the grader reliabilities (as predicted by the method) and has the suffix '_userrels.txt'. It has the format

Have further questions?

In case you have problems with the code you can look for error messages in the log file generated.

If you would like to contact us about bugs/problems with the code please email us at


Last Edited: April 21st, 2014