MarDRe: MapReduce Duplicate Removal tool (GPLv3)


Other Projects

  • BDEv

    BDEv is a tool for evaluating Big Data processing solutions in terms of performance and resource efficiency. It includes several ready-to-use frameworks (e.g. Hadoop, Spark, Flink) and manages the configuration needed to make use of the available computational resources, such as CPU, memory and network interfaces. These frameworks can be evaluated with the benchmarks included in the BDEv distribution (e.g. TeraSort, WordCount), and BDEv can also run user-defined commands. In addition, BDEv simplifies running experiments and collecting results by generating graphs automatically.

  • Flame-MR

    Flame-MR is a MapReduce framework that transparently improves the performance of Hadoop applications. It employs several kinds of optimizations, such as avoiding memory copies, efficient sort and merge algorithms, and flexible use of resources. Moreover, its event-driven architecture overlaps data transfer with processing. Flame-MR also keeps binary compatibility with Hadoop, so applications do not have to be modified or recompiled to run on it. Experimental results show that Flame-MR can reduce the execution time of iterative workloads by half.

  • BDWatchdog

    BDWatchdog is a novel framework that allows real-time and scalable analysis of Big Data applications. It combines two approaches to get an accurate picture of how an application uses the resources it has available (e.g., CPU, memory, disk and network): per-process resource monitoring using time series, and mixed system and JVM low-level profiling using flame graphs.
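
The idea of per-process resource monitoring as a time series, mentioned for BDWatchdog above, can be illustrated with a small sketch. This is not BDWatchdog's code or API: the names `Sample`, `collect_cpu_series` and `cpu_utilisation` are hypothetical, and the sketch only samples the CPU time of the current Python process using the standard library.

```python
import time
from dataclasses import dataclass


@dataclass
class Sample:
    """One point of the time series (hypothetical structure)."""
    timestamp: float    # monotonic clock reading, in seconds
    cpu_seconds: float  # cumulative CPU time consumed by this process


def collect_cpu_series(num_samples, interval_s=0.1):
    """Sample this process's cumulative CPU time at fixed intervals."""
    series = []
    for _ in range(num_samples):
        series.append(Sample(time.monotonic(), time.process_time()))
        time.sleep(interval_s)
    return series


def cpu_utilisation(series):
    """Turn cumulative CPU time into a per-interval utilisation rate.

    Each value is CPU seconds used divided by wall-clock seconds elapsed
    between two consecutive samples.
    """
    rates = []
    for prev, curr in zip(series, series[1:]):
        elapsed = curr.timestamp - prev.timestamp
        used = curr.cpu_seconds - prev.cpu_seconds
        rates.append(used / elapsed if elapsed > 0 else 0.0)
    return rates
```

Calling `cpu_utilisation(collect_cpu_series(10))` yields nine utilisation values, one per interval between samples. A real monitoring tool would record such series for every process of interest and for more resources (memory, disk, network) and store them in a time series database.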