Commit e82eacd

improve the docs and move around docstrings
1 parent c922d3e commit e82eacd

2 files changed: 36 additions and 38 deletions


README.md

Lines changed: 3 additions & 38 deletions
@@ -13,40 +13,9 @@ which interplay with the functions:
 - `cluster_labels`
 - `cluster_probs`
 
-## `cluster` documentation
-
-```julia
-cluster(ca::ClusteringAlgortihm, data) cr::ClusteringResults
-```
-
-Cluster input `data` according to the algorithm specified by `ca`.
-All options related to the algorithm are given as keyword arguments when
-constructing `ca`.
-
-The input `data` is a length-m iterable of "vectors" (data points).
-"Vector" here is considered in the generalized sense, i.e., any objects that
-a distance can be defined on them so that they can be clustered.
-In the majority of cases these are vectors of real numbers.
-If you have a matrix with each row a data point, simply pass in `eachrow(matrix)`.
-
-The output is always a subtype of `ClusteringResults` that can be further queried.
-The cluster labels are always the
-positive integers `1:n` with `n::Int` the number of created clusters,
-Data points that couldn't get clustered (e.g., outliers or noise)
-get assigned negative integers, typically just `-1`.
-
-`ClusteringResults` subtypes always implement the following functions:
-
-- `cluster_labels(cr)` returns a length-m vector `labels::Vector{Int}` containing
-the clustering labels , so that `data[i]` has label `labels[i]`.
-- `cluster_probs(cr)` returns `probs` a length-m vector of length-`n` vectors
-containing the "probabilities" or "score" of each point belonging to one of
-the created clusters (useful for fuzzy clustering algorithms).
-- `cluster_number(cr)` returns `n`.
-
-Other algorithm-related output can be obtained as a field of the result type,
-or by using other specific functions of the result type.
-This is described in the individual algorithm implementations docstrings.
+The specification of the API is based on `cluster`: given
+a `ClusteringAlgorithm` and some data in the form of iterable of "vectors", `cluster` returns a `ClusteringResult`.
+The result can be queried with the functions `cluster_number, cluster_labels, cluster_probs`.
 
 ## For developers
 
@@ -56,8 +25,4 @@ so that it returns a new subtype of `ClusteringResult`.
 This result must extend `cluster_number, cluster_labels`
 and optionally `cluster_probs`.
 
-See also the two helper functions `each_data_point, input_data_size`
-which help you can support matrix input while abiding the declared api
-of iterable of vectors as input.
-
 For more, see the docstring of `cluster`.
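
The developer contract described in the README diff above (a new algorithm type, a `cluster` method returning a result subtype, and the two required query functions) can be sketched roughly as follows. This is only an illustration under stated assumptions: `ThresholdClustering` and `ThresholdResult` are hypothetical names that do not appear in the commit, and the abstract types are re-declared locally, spelled as in `src/ClusteringAPI.jl` (`ClusteringAlgorithm`, `ClusteringResults`), so the snippet runs without the package installed.

```julia
# Stand-in declarations mirroring the names exported by src/ClusteringAPI.jl,
# so that this sketch runs without the package installed.
abstract type ClusteringAlgorithm end
abstract type ClusteringResults end
function cluster end
function cluster_number end
function cluster_labels end

# Hypothetical algorithm: options are keyword arguments of the constructor.
Base.@kwdef struct ThresholdClustering <: ClusteringAlgorithm
    threshold::Float64 = 1.0
end

# Hypothetical result type holding one label per data point.
struct ThresholdResult <: ClusteringResults
    labels::Vector{Int}
end

# Points whose Euclidean norm is at most `threshold` form cluster 1;
# all other points are treated as noise and labelled -1.
function cluster(ca::ThresholdClustering, data)
    labels = [sum(abs2, x) <= ca.threshold^2 ? 1 : -1 for x in data]
    return ThresholdResult(labels)
end

cluster_labels(cr::ThresholdResult) = cr.labels
# Number of created clusters: distinct positive labels only.
cluster_number(cr::ThresholdResult) = length(unique(filter(l -> l > 0, cr.labels)))

# Each row of the matrix is one data point.
M = [0.1 0.2; 3.0 4.0; 0.0 0.5]
cr = cluster(ThresholdClustering(threshold = 1.0), eachrow(M))
cluster_labels(cr)  # [1, -1, 1]
cluster_number(cr)  # 1
```

The optional `cluster_probs` is left out here because this toy algorithm assigns hard labels only.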

src/ClusteringAPI.jl

Lines changed: 33 additions & 0 deletions
@@ -6,7 +6,40 @@ export cluster, cluster_number, cluster_labels, cluster_probs
 abstract type ClusteringAlgorithm end
 abstract type ClusteringResults end
 
+"""
+```julia
+cluster(ca::ClusteringAlgortihm, data) → cr::ClusteringResults
+```
+
+Cluster input `data` according to the algorithm specified by `ca`.
+All options related to the algorithm are given as _keyword_ arguments when
+constructing `ca`.
+
+The input `data` is a length-`m` iterable of "data points".
+A data point is something generic: any objects that
+a distance can be defined on them so that they can be clustered.
+In the majority of cases data points are just vectors of real numbers.
+If you have a matrix with each row a data point, simply pass in `eachrow(matrix)` as `data`.
+
+The output is always a subtype of `ClusteringResults` that can be further queried.
+The cluster labels are always the
+positive integers `1:n` with `n::Int` the number of created clusters,
+Data points that couldn't get clustered (e.g., outliers or noise)
+get assigned negative integers, typically just `-1`.
 
+`ClusteringResults` subtypes always implement the following functions:
+
+- `cluster_labels(cr)` returns a length-`m` vector `labels::Vector{Int}` containing
+the clustering labels , so that `data[i]` has label `labels[i]`.
+- `cluster_probs(cr)` returns a length-`m` vector `probs`, whose elements are length-`n` vectors
+containing the "probabilities" or "scores" for each point belonging to one of
+the created clusters (useful for fuzzy clustering algorithms).
+- `cluster_number(cr)` returns `n`.
+
+Other algorithm-related output can be obtained as a field of the result type,
+or by using other specific functions of the result type.
+This is described in the individual algorithm implementations docstrings.
+"""
 function cluster(ca::ClusteringAlgorithm, data)
     throw(ArgumentError("No implementation for `cluster` for $(typeof(ca))."))
 end
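
As a rough illustration of the fallback method that the new docstring is attached to: an algorithm type that does not define its own `cluster` method ends up in the generic method and gets an `ArgumentError`. The type `Unimplemented` below is a hypothetical placeholder, and the abstract type and fallback are re-declared locally so the sketch is self-contained.

```julia
# Local copy of the abstract type and fallback method from the diff above.
abstract type ClusteringAlgorithm end

function cluster(ca::ClusteringAlgorithm, data)
    throw(ArgumentError("No implementation for `cluster` for $(typeof(ca))."))
end

# Hypothetical algorithm type that has no `cluster` method of its own.
struct Unimplemented <: ClusteringAlgorithm end

# Dispatch falls through to the generic method, which throws.
err = try
    cluster(Unimplemented(), [[0.0, 1.0], [2.0, 3.0]])
    nothing
catch e
    e
end

err isa ArgumentError  # true
```

Defining `cluster(ca::Unimplemented, data)` would take precedence over the fallback through ordinary multiple dispatch.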
