Paper Artifacts

Intelligent Code Completion with Bayesian Networks
by Sebastian Proksch, Johannes Lerch, and Mira Mezini

This page lists all available artifacts for the paper and links to the source code of the implementation.

These files are made available for academic purposes only. Do not redistribute these files. Do not distribute direct links to these files, link to this page instead.

We do not claim authorship of the source code contained in the snapshot taken from the Eclipse update site. Please respect the licence information of all included software.

Replication

We provide all required data and tools to replicate the results presented in the paper. To get started, please download the "Extracted Object Usages" (see links) and put them in the following folder structure.
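As a rough sketch (the paths here are inferred from the "./evalDir/0500" example given below, so treat them as an assumption): the downloaded .zip files go into a subfolder of the evaluation directory that is named after the rough dataset size.

```shell
# Assumed layout, inferred from the examples in the text:
# the downloaded .zip files live in a subfolder named after the dataset size.
mkdir -p evalDir/5700
# mv /path/to/downloaded/*.zip evalDir/5700/
```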

The file organization is a bit involved, but it exists for a reason: the "5700" merely reflects the rough size of the full dataset and could be any label or number. In our experiments, it has proven helpful to provide datasets of different sizes to speed up testing. For example, you could create a subset of the data for quicker debug runs (e.g., only 500 MB of randomly selected .zip files) and make it available in another folder (e.g., ./evalDir/0500). You do need to make sure that the selected subset contains usages of (a) the SWT API from (b) at least 10 different projects (important to enable cross-folding), and that this "selector" is then used consistently later on.
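Such a debug subset could be created with a short script along these lines (a sketch: the folder names follow the text, the random selection via GNU shuf is an assumption, and you still need to verify the SWT/10-projects constraint on the result yourself):

```shell
# Sketch: copy randomly selected .zip files into ./evalDir/0500
# until roughly 500 MB are accumulated. Assumes GNU coreutils (shuf, du).
mkdir -p evalDir/0500
budget_kb=$((500 * 1024))
used_kb=0
for f in $(shuf -e evalDir/5700/*.zip); do
  [ -e "$f" ] || continue                        # skip if the glob matched nothing
  size_kb=$(du -k "$f" | cut -f1)
  [ $((used_kb + size_kb)) -gt "$budget_kb" ] && break
  cp "$f" evalDir/0500/
  used_kb=$((used_kb + size_kb))
done
```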

As the next step, check out the Source Code. Once you have a copy on your machine, import it into Eclipse (Import... » General » Existing Projects into Workspace) and select your checkout folder.

All projects should compile out of the box, and you should now have the following projects in your workspace.

As the next step, copy or rename our template to evaluation.properties and adapt the evaluation settings. Make sure that the entry points to a folder and ends in a "/".
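The template itself defines the valid keys; purely as an illustration (the key name below is made up, the trailing slash is the point):

```properties
# Illustrative sketch only; use the key names from the shipped template.
# The value must point to a folder and must end in a "/".
evaluationDir=/home/you/evalDir/
```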

To get things started, you need to preprocess the usages that you have just downloaded. The easiest way to achieve this is to open the "runPreprocessing" class in the exec.plm15 project. The "SELECTOR" field in this file should contain the name of the subfolder in which you placed the usages (e.g., "5700"). If everything is configured correctly, you can run the class (Right click » Run as... » Java Application); you will see lengthy output listing the preprocessed .zip files and statistics about the extracted object usages. The script runs for several minutes, filters out invalid usages (printed in red), and keeps only usages related to SWT. Not all .zips will have "remaining" usages after preprocessing, because many do not contain any SWT API usages.

Once the script has finished, you can replicate our experiments. First, build the whole project with Maven: go to the checkout folder and invoke "mvn clean install", which should build the project successfully. Then change directory to the exec/exec.plm15 subfolder and invoke the ./build.sh script. After this script completes, the target folder contains everything needed to run the evaluations. Switch into "target" and run, for example, "start-local.sh". Variables within that script control which evaluation is performed (a full list of available experiments is in DistributedServer.getProviders()). Please note that some configuration options can only be changed in "Module.java", e.g., the selection of the QueryBuilder. After the execution ends, you will see a CSV dump of the evaluation results.
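Collected in one place, the command sequence described above looks like this (run from the checkout root; script and folder names as given in the text):

```shell
mvn clean install        # build the whole project
cd exec/exec.plm15
./build.sh               # assembles everything needed into target/
cd target
./start-local.sh         # runs the configured experiment, prints a CSV dump
```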

The execution has two phases: first, each evaluation instantiates several "tasks" that contain parts of the evaluation; then the program computes the tasks one by one until all are done. To speed up the experiments, you can distribute them over several machines in the same network. Simply execute "start-server.sh" (configured similarly to the local script) and start as many workers as you have CPU cores available ("start-worker.sh"). You can even distribute the workers across the local network, but then you also need to copy the evaluation directory to each machine (you do not need to repeat the preprocessing, though).
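On a single machine, that could look as follows (the script names are as given in the text; starting exactly one worker per CPU core via a loop is an assumption):

```shell
./start-server.sh &                # one server per network
for _ in $(seq "$(nproc)"); do     # one worker per CPU core (assumption)
  ./start-worker.sh &
done
wait
```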

(this page was last updated on June 12, 2017)