Class DBSCANClusterer<T extends Clusterable>

java.lang.Object
org.apache.commons.math3.ml.clustering.Clusterer<T>
org.apache.commons.math3.ml.clustering.DBSCANClusterer<T>
Type Parameters:
T - type of the points to cluster

public class DBSCANClusterer<T extends Clusterable> extends Clusterer<T>
DBSCAN (density-based spatial clustering of applications with noise) algorithm.

The DBSCAN algorithm forms clusters based on the idea of density connectivity, i.e. a point p is density connected to another point q, if there exists a chain of points pi, with i = 1 .. n and p1 = p and pn = q, such that each pair <pi, pi+1> is directly density-reachable. A point q is directly density-reachable from point p if it is in the ε-neighborhood of this point.

Any point that is not density-reachable from a formed cluster is treated as noise, and will thus not be present in the result.

The algorithm requires two parameters:

  • eps: the distance that defines the ε-neighborhood of a point
  • minPoints: the minimum number of density-connected points required to form a cluster
Since:
3.2
See Also:
  • Field Details

    • eps

      private final double eps
      Maximum radius of the neighborhood to be considered.
    • minPts

      private final int minPts
      Minimum number of points needed for a cluster.
  • Constructor Details

    • DBSCANClusterer

      public DBSCANClusterer(double eps, int minPts) throws NotPositiveException
      Creates a new instance of a DBSCANClusterer.

      The euclidean distance will be used as default distance measure.

      Parameters:
      eps - maximum radius of the neighborhood to be considered
      minPts - minimum number of points needed for a cluster
      Throws:
      NotPositiveException - if eps < 0.0 or minPts < 0
    • DBSCANClusterer

      public DBSCANClusterer(double eps, int minPts, DistanceMeasure measure) throws NotPositiveException
      Creates a new instance of a DBSCANClusterer.
      Parameters:
      eps - maximum radius of the neighborhood to be considered
      minPts - minimum number of points needed for a cluster
      measure - the distance measure to use
      Throws:
      NotPositiveException - if eps < 0.0 or minPts < 0
  • Method Details

    • getEps

      public double getEps()
      Returns the maximum radius of the neighborhood to be considered.
      Returns:
      maximum radius of the neighborhood
    • getMinPts

      public int getMinPts()
      Returns the minimum number of points needed for a cluster.
      Returns:
      minimum number of points needed for a cluster
    • cluster

      public List<Cluster<T>> cluster(Collection<T> points) throws NullArgumentException
      Performs DBSCAN cluster analysis.
      Specified by:
      cluster in class Clusterer<T extends Clusterable>
      Parameters:
      points - the points to cluster
      Returns:
      the list of clusters
      Throws:
      NullArgumentException - if the data points are null
    • expandCluster

      private Cluster<T> expandCluster(Cluster<T> cluster, T point, List<T> neighbors, Collection<T> points, Map<Clusterable,DBSCANClusterer.PointStatus> visited)
      Expands the cluster to include density-reachable items.
      Parameters:
      cluster - Cluster to expand
      point - Point to add to cluster
      neighbors - List of neighbors
      points - the data set
      visited - the set of already visited points
      Returns:
      the expanded cluster
    • getNeighbors

      private List<T> getNeighbors(T point, Collection<T> points)
      Returns a list of density-reachable neighbors of a point.
      Parameters:
      point - the point to look for
      points - possible neighbors
      Returns:
      the List of neighbors
    • merge

      private List<T> merge(List<T> one, List<T> two)
      Merges two lists together.
      Parameters:
      one - first list
      two - second list
      Returns:
      merged lists