2. Find a function that quantifies the similarity between any two data points as a number.

3. Define a similarity metric.

4. Calculate the Euclidean distance between two points.

5. Define the weighting scheme: TF-IDF (term frequency-inverse document frequency).

6. Preprocess the data, convert it to vectors, and write the vectors to a SequenceFile as input for Mahout.

7. Choosing the initial cluster centres is an important step in K-means clustering.

8. A vector is an ordered list of numbers, which is all a point or a physics vector is anyway.

Vectors have a number of dimensions and a numeric value for each dimension.

9. The K-means algorithm readjusts the cluster centres at the end of each iteration by computing the average (the centroid) of all points in each cluster.

10. A key factor in clustering is the choice of distance measure.

11. Other distance measures include the Manhattan distance measure, the Cosine distance measure, and the **Weighted distance measure**.
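To make the TF-IDF weighting and the "data to vectors" steps above more concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization and one common textbook formula (relative term frequency multiplied by `log(N / df)`); Mahout's own weighting may use a different variant, and the function name `tfidf_vectors` is made up for this illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors): one TF-IDF vector per document.

    Assumed formula (one textbook variant, not necessarily Mahout's):
    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [doc.lower().split() for doc in documents]
    # Each vocabulary term becomes one dimension of the vector.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


docs = ["the cat sat", "the dog sat", "the dog barked"]
vocabulary, vectors = tfidf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, [round(w, 3) for w in vec])
```

A term such as "the" that occurs in every document gets weight zero, which is the point of the inverse document frequency factor: common words contribute nothing to the distance between documents.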
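The distance measures named in the list can be sketched in a few lines of plain Python. These are the standard mathematical definitions, not Mahout's classes; the weighted version shown is a weighted Euclidean distance, which is only one possible reading of "weighted distance measure", and all function names are chosen for this example.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def weighted_euclidean(a, b, weights):
    """Euclidean distance with a per-dimension weight (an assumed variant)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


p, q = [0.0, 0.0], [3.0, 4.0]
print(euclidean(p, q))
print(manhattan(p, q))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
print(weighted_euclidean(p, q, [1.0, 0.0]))
```

Cosine distance ignores vector length and looks only at direction, which is why it is often a better fit than Euclidean distance for TF-IDF document vectors of very different sizes.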
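The K-means steps in the list (pick initial centres, assign points, recompute each centre as the average of its points) can be illustrated with a small, self-contained sketch. It is not Mahout's implementation: it assumes Euclidean distance, takes the initial centres as an argument, and stops when the centres no longer change or after `max_iterations`, a parameter introduced here for the example.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Average of a non-empty list of points, dimension by dimension."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points, initial_centres, max_iterations=10):
    centres = [list(c) for c in initial_centres]
    clusters = [[] for _ in centres]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: euclidean(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the average of its points.
        # An empty cluster keeps its previous centre.
        new_centres = [mean_point(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:
            break  # centres stopped moving, so the assignment is stable
        centres = new_centres
    return centres, clusters


points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
centres, clusters = kmeans(points, initial_centres=[[1, 1], [8, 8]])
print(centres)
print(clusters)
```

Because the result depends on where the centres start, a poor initial choice can leave K-means in a worse local solution, which is why picking the initial clusters matters.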
