MarDRe: MapReduce Duplicate Removal tool (GPLv3)


News

2019/01/23: MarDRe v1.4 released

MarDRe v1.4 has been released and can be obtained from the Downloads section.

The main changes included in this version are:

  • Improved overall performance by optimizing the writing of Sequence objects
  • Disable merge operation by default
  • Fix typos in user's guide
  • Fix typos in the program's usage help when using -v

2017/12/19: MarDRe v1.3 released

MarDRe v1.3 has been released and can be obtained from the Downloads section.

The main changes included in this version are:

  • MarDRe now uses the Hadoop Sequence Parser (HSP) library to read input FASTQ/FASTA datasets. Overall performance is improved due to the custom sequence record readers provided by this library, especially in the paired-end mode when using a map-side join. Using HSP, this mode now executes a single MapReduce job (note that paired-end mode using a reduce-side join still needs to execute two MapReduce jobs)
  • Fix bug in paired-end mode scenarios

2017/07/28: MarDRe v1.2 released

MarDRe v1.2 has been released and can be obtained from the Downloads section.

The main changes included in this version are:

  • Optimized paired-end mode using a map-side join (enabled by default)
  • Support for input/output datasets compressed with Gzip (.gz extension) and BZip2 (.bz2 extension) codecs. See the user's guide for more information
  • New command-line options (-q and -f) to specify the input file format for compressed datasets
  • The user can decide to compare only a certain number of bases for each read using a new command-line option (-c)
  • The number of reducers can now be specified using a new command-line option (-nr)
  • New configuration parameter to enable Snappy compression of the map output phase
  • New configuration parameter to specify HDFS block replication factor for output sequence files
  • New configuration parameter to specify the HDFS base path used by MarDRe to store both the output and intermediate files
  • Fix bug in paired-end mode when processing FASTA datasets
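
The idea behind the -c option listed above (comparing only a certain number of bases per read) can be illustrated with a minimal sketch. This is not MarDRe's implementation: the function name `dedup_by_prefix` and its parameters are hypothetical, it assumes exact matching on the compared bases, and it simply keeps the first read seen for each prefix.

```python
def dedup_by_prefix(reads, compare_bases=None):
    """Remove duplicate reads, comparing only the first `compare_bases` bases.

    Hypothetical helper for illustration only. If `compare_bases` is None,
    the whole read is compared. The first read seen for each key is kept.
    """
    seen = set()
    unique = []
    for read in reads:
        # The comparison key is either the full read or just its prefix.
        key = read if compare_bases is None else read[:compare_bases]
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


reads = ["ACGTAA", "ACGTCC", "TTTTAA"]
full = dedup_by_prefix(reads)        # all three reads differ when compared in full
prefix = dedup_by_prefix(reads, 4)   # the first two share the prefix "ACGT"
```

Comparing fewer bases makes more reads count as duplicates, so the chosen value trades sensitivity against how aggressively the dataset is reduced.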

2017/05/15: MarDRe v1.1 released

MarDRe v1.1 has been released and can be obtained from the Downloads section.

The main changes included in this version are:

  • Improved overall performance by reducing the memory footprint, GC overhead and sequence parsing cost
  • Optimized paired-end mode
  • When two FASTQ reads are collapsed, the one with the higher average quality is now kept
  • The user can specify the output file names using new command-line options (-o and -r)
  • Sanity checks for the value of some configuration parameters
  • Minor bug fixes
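
The quality-based selection described in the list above can be sketched as follows. This is only an illustration under stated assumptions, not MarDRe's actual code: the function names are hypothetical, quality strings are assumed to use the Phred+33 encoding, each read is represented as a `(bases, quality)` pair, and ties are resolved arbitrarily in favour of the first read.

```python
from typing import Tuple

Read = Tuple[str, str]  # (bases, quality string); hypothetical representation


def average_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_best(read_a: Read, read_b: Read) -> Read:
    """Return the read with the higher average quality.

    On a tie, read_a is kept (an arbitrary choice for this sketch).
    """
    if average_quality(read_a[1]) >= average_quality(read_b[1]):
        return read_a
    return read_b


# Two duplicate reads with identical bases but different qualities.
high = ("ACGT", "IIII")   # 'I' encodes Phred 40 under Phred+33
low = ("ACGT", "!!!!")    # '!' encodes Phred 0 under Phred+33
best = keep_best(low, high)
```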

2017/03/29: MarDRe v1.0.1 released

MarDRe v1.0.1 has been released and can be obtained from the Downloads section.

The main changes included in this version are:

  • The user's guide has been updated to specify that the current version of MarDRe does not take quality scores into account when selecting the representative sequence among duplicates
  • Fix the distribution tarball so that it includes the parent directory

2017/02/01: MarDRe v1.0 released

The first public release of MarDRe (v1.0) is now available and can be obtained from the Downloads section.