General info
In this part of the practical course Sequence Analysis you will be confronted with the task of integrating several NGS analysis programs (which you implemented in the previous block) into a typical workflow engine, namely KNIME.
Date | Content | Lecturer
19.06 | Introduction to KNIME and how to use SeqAn in KNIME | Knut Reinert, Stephan Aiche
26.06 | Pulling your data into KNIME | Stephan Aiche
03.07 | Advanced KNIME workflows | Stephan Aiche
Day 1 (19.06)
- Introduction to workflow systems and KNIME.
- Assignment 1: Install KNIME and work through the IRIS data set tutorial using KNIME_quickstart.pdf. The data set can be found in the KNIME distribution or here; an explanation of the data set is available here.
- Assignment 2: Build the SeqAn KNIME plugin including your apps.
- Download the KNIME SDK for your platform and install/extract it.
- Open the KNIME SDK and install the File Handling Extensions
- Click Help → Install new Software
- Select the "KNIME Update Site"
- In KNIME Labs Extensions you will find the "KNIME File Handling Nodes"
- Download the GenericKNIMENodes (GKN) source code (see below)
- Create the file build.properties in the GKN directory with the following content:
knime.sdk=/path/to/your/knime/sdk
Note: On Windows you need to use forward slashes instead of backslashes
- Import the GKN source code into the KNIME SDK (File → Import → Existing project into workspace)
- Enable your apps for KNIME integration by adding the following line to the CMakeLists.txt of your apps
set (SEQAN_CTD_EXECUTABLES ${SEQAN_CTD_EXECUTABLES} <name of your app> CACHE INTERNAL "")
- Prepare your SeqAn installation to build a KNIME plugin, e.g.,
make prepare_workflow_plugin
- Execute the GKN node generator in the GenericKNIMENodes directory
ant -Dplugin.dir=/path/to/your/seqan/build/workflow_plugin_dir
- Import the generated nodes into the KNIME SDK (File → Import → Existing project into workspace; the directory is <GKN-directory>/generated_plugin)
- Start KNIME from within the KNIME SDK (see here)
- Your nodes should be available under Community Nodes/SeqAn
- Nodes for input and output files can be found under Community Nodes/GenericKnimeNodes
- In case your nodes are not there:
- Check if you used the argument parser for your app
- Check if you set valid values (i.e., filetypes) for every input and output file of your app
- Check the output of the node generator call (ant ...) for error messages (e.g., Exception …)
- After you fixed all these errors, rerun the prepare_workflow_plugin target and the node generator (ant).
- In the KNIME SDK refresh all SeqAn projects (right click → Refresh) and force the KNIME SDK to rebuild all the plugins (Project → Clean → Clean all projects)
- Assignment 3: Create a simple pipeline combining your tools (trimming, de-multiplexing, read mapping)
- Assignment 4: Load the mapping results of Razers3 and try to visualise the coverage of mapped reads w.r.t. the genome location.
- Use Razers3 to map the bee_reads against the Varroa destructor genome (data see below).
- Import the mapping results into KNIME (hint: the Razers3 output format is described here).
- Visualise the mapped reads (hint: have a look at the JFreeChart nodes (no hiliting) and the KNIME histograms).
- (optional) Compare the coverage of different read qualities.
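The coverage computation behind Assignment 4 can also be sketched in plain Java, independent of KNIME. This is a minimal sketch that assumes a whitespace-separated mapping format in which the sixth and seventh columns hold the begin and (exclusive) end position of the match on the genome; check the Razers3 format description for the actual column layout, and treat the sample records and column indices below as illustrative.

```java
import java.util.*;

public class CoverageSketch {
    // Count, for every genome position, how many mapped reads cover it.
    // Each line is assumed whitespace-separated with the genome begin/end
    // in columns 6 and 7 (0-based indices 5 and 6) -- adjust the indices
    // to the output format your mapper actually produces.
    static int[] coverage(List<String> lines, int genomeLength) {
        int[] cov = new int[genomeLength];
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            int begin = Integer.parseInt(f[5]);
            int end = Integer.parseInt(f[6]);   // end is exclusive here
            for (int pos = begin; pos < end && pos < genomeLength; ++pos)
                cov[pos]++;
        }
        return cov;
    }

    public static void main(String[] args) {
        // Two made-up records covering the overlapping intervals [0,4) and [2,6).
        List<String> lines = Arrays.asList(
            "read1 0 4 F genome 0 4 100",
            "read2 0 4 F genome 2 6 95");
        System.out.println(Arrays.toString(coverage(lines, 8)));
    }
}
```

The resulting array is exactly the kind of per-position table you can feed into a KNIME histogram or JFreeChart node.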
Day 2 (26.06)
- Assignment 1: Work through KNIME node generation tutorial (KNIME developer guide).
- Assignment 2: Implement your own table reader for the output of your read mapper.
- Use the New Node Wizard to create a new node
- Open the plugin.xml file of the project containing the node
- Go to the Dependencies tab
- In the section Required Plug-ins click on Add and add the org.knime.core.data.uritype plugin.
- Change the signature of the NodeModel::configure method from
protected DataTableSpec[] configure(final DataTableSpec[] inSpecs) throws InvalidSettingsException
{
// ..
}
to
protected DataTableSpec[] configure(final PortObjectSpec[] inSpecs) throws InvalidSettingsException
{
// ..
}
- Change the signature of the NodeModel::execute method from
protected BufferedDataTable[] execute(final BufferedDataTable[] inData, final ExecutionContext exec) throws Exception
{
// ..
}
to
protected BufferedDataTable[] execute(final PortObject[] inData, final ExecutionContext exec) throws Exception
{
// ..
}
- Change the NodeModel constructor to
protected YourNodeModel() {
    super(new PortType[] { URIPortObject.TYPE },
          new PortType[] { new PortType(BufferedDataTable.class) });
}
- Adapt the configure() method to the layout of your output file. Your configure method needs to return a DataTableSpec that corresponds to the structure of your file; e.g., a file with an integer and a string column could be realised with the following DataTableSpec:
DataColumnSpec[] columnsSpec = new DataColumnSpec[2];
columnsSpec[0] = new DataColumnSpecCreator("int-column", IntCell.TYPE).createSpec();
columnsSpec[1] = new DataColumnSpecCreator("string-column", StringCell.TYPE).createSpec();
DataTableSpec outputSpec = new DataTableSpec(columnsSpec);
- Read the content of the incoming file in the execute method and fill the DataTable accordingly. Hint: You can extract the File from the inData using the following code snippet:
File theFileToRead = new File(((URIPortObject) inData[0]).getURIContents().get(0).getURI());
With the following code snippet you can create a table:
BufferedDataContainer container = exec.createDataContainer(outputSpec);
// for each line in your file
RowKey key = new RowKey("Row " + rowIdx);
DataCell[] cells = new DataCell[2];
cells[0] = new IntCell(yourIntValue);
cells[1] = new StringCell(yourStringValue);
DataRow row = new DefaultRow(key, cells);
container.addRowToTable(row);
// .. end for each
// at the end of execute
container.close();
BufferedDataTable out = container.getTable();
return new BufferedDataTable[] { out };
outputSpec is the same data table spec you created above.
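Stripped of the KNIME API, the reading loop inside execute boils down to parsing the file line by line and turning each line into the cell values of one row. A minimal, KNIME-free sketch, assuming a simple two-column tab-separated file that matches the integer/string spec above (the file layout is illustrative; adapt the parsing to your mapper's output):

```java
import java.io.*;
import java.util.*;

public class TableReaderSketch {
    // Parse each tab-separated line into the cell values for one table row.
    // In the real NodeModel you would wrap these values into an IntCell and
    // a StringCell and add them to the BufferedDataContainer.
    static List<Object[]> readRows(BufferedReader reader) throws IOException {
        List<Object[]> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) continue;        // skip blank lines
            String[] fields = line.split("\t");
            int intValue = Integer.parseInt(fields[0]);
            String stringValue = fields[1];
            rows.add(new Object[] { intValue, stringValue });
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(
            new StringReader("1\tchr1\n2\tchr2\n"));
        for (Object[] row : readRows(reader))
            System.out.println(row[0] + " " + row[1]);
    }
}
```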
- Assignment 3: Since your read mapper produces more than one entry per read, use KNIME to filter the read table.
- Assignment 4: Create a workflow using your tools, the reader node, and the filter to read the results of a mapping run into a KNIME table.
- Assignment 5: Visualise the number of mapped reads per location.
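The filtering in Assignment 3 is best done with KNIME nodes (e.g., a GroupBy or Row Filter node), but the underlying idea can be sketched in plain Java: keep only the best-scoring entry per read. The record layout below (read name plus a percent-identity score) is a made-up illustration, not your mapper's actual format.

```java
import java.util.*;

public class BestHitFilter {
    // One mapping entry: read name plus percent identity of the match.
    // (Illustrative record layout; your mapper output may differ.)
    record Hit(String readName, double percentIdentity) {}

    // Keep only the highest-identity hit per read.
    static Collection<Hit> bestPerRead(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit h : hits)
            best.merge(h.readName(), h, (a, b) ->
                b.percentIdentity() > a.percentIdentity() ? b : a);
        return best.values();
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("read1", 95.0),
            new Hit("read1", 100.0),
            new Hit("read2", 98.0));
        System.out.println(bestPerRead(hits));
    }
}
```

In KNIME the same effect is achieved by grouping on the read-name column and aggregating with a maximum on the score column.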
Day 3 (03.07)
- Exam assignment lottery.
- Assignment 1: Benchmark your read mapper against SeqAn's Razers3. Razers3, if configured correctly, is a fully sensitive read mapper, so your tool should report all hits that Razers3 reports. If it does not, check your tool. Use the Drosophila data set Reads&Genome (drosophila_*).
- Integrate Razers3 parallel to your own mapper into your read mapping pipeline.
- Create a reader node for Razers3's output.
- Map the results of Razers3 to your own filtered results.
- Customise your read mapper to also map against the reverse strand and compare the results to those of Razers3. Note: Your read mapper should always report the position on the forward strand.
- (optional) Customise your read mapper to also report the number of errors and compare them to the ones reported by Razers3.
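The benchmark in Assignment 1 amounts to a set comparison of hits; for reverse-strand matches the position must first be translated back to forward-strand coordinates, as noted above. A minimal sketch, assuming half-open coordinates and a made-up "readName:forwardBegin" key encoding for hits:

```java
import java.util.*;

public class HitComparison {
    // Convert the begin position of a reverse-strand match to the
    // corresponding forward-strand begin position. Assumes half-open
    // coordinates and that the match length is known.
    static int toForwardBegin(int reverseBegin, int matchLength, int genomeLength) {
        return genomeLength - (reverseBegin + matchLength);
    }

    // Report all hits of the reference mapper (e.g., Razers3) that are
    // missing from your own mapper's results; since Razers3 is fully
    // sensitive, this set should be empty for a correct tool.
    static Set<String> missingHits(Set<String> reference, Set<String> own) {
        Set<String> missing = new TreeSet<>(reference);
        missing.removeAll(own);
        return missing;
    }

    public static void main(String[] args) {
        // Hit keys are "readName:forwardBegin" -- an illustrative encoding.
        Set<String> razers = Set.of("read1:10", "read2:42");
        Set<String> mine = Set.of("read1:10");
        System.out.println(missingHits(razers, mine)); // hits your mapper still misses
    }
}
```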
Day 4 (10.07)
Resources and links