This issue is to discuss what functions should be ported from StatsBase to Statistics (#2). Some functions would better move to a separate package:
- statmodels.jl: should go to StatsAPI.jl
Most APIs have passed the test of time so they are probably good enough, but I find some of them are not completely satisfying:
- hist.jl: I don't know this part of the code well enough to judge whether the API is OK. There have been proposals to move these to a separate package (Proposal: Move histograms to separate package, StatsBase.jl#650).
- weights.jl: Weighted `sum` cannot be implemented via a `weights` keyword argument like other functions since the function lives in Base (RFC: Add weights argument to sum, JuliaLang/julia#33310). We could either export `wsum` or keep it internal and not support it for now.
- counts.jl: `counts` sounds a bit too generic of a term for a function that only allows counting integer values. `countmap` is more general and its name is explicit. That said, `counts` could easily be extended to allow any type of levels; its limitation is just that it returns a vector without names, so the mapping to the levels has to be done by hand, which isn't user-friendly. APIs provided by FreqTables.jl are nicer to use, but they need NamedArrays.jl (or a similar package). Then there's the issue that `countmap` uses radix sort for performance with some types, but this needs SortingAlgorithms.jl, which isn't a stdlib (yet?).
- deviation.jl: Do we really need all of these small convenience functions? `counteq` and `countne` don't really sound like statistical functions and I'm not sure how commonly they are used. `sqL2dist`, `L2dist`, `L1dist`, `Linfdist` have an uppercase letter in their names; these and the remaining functions are redundant with functions provided in Distances.jl. That only leaves `psnr`.
- misc.jl: `indexmap` is just `indexin`, so remove it. `levelsmap` and `indicatormat` sound a bit limited compared with what StatsModels provides. `rle` and `inverse_rle` are not really related to statistics.
- scalarstats.jl: `mean_and_var` and `mean_and_std` have weird names, so I'm not sure whether we should keep them. `zscore` and `zscore!` are convenient but redundant with (more general and more verbose) functions in transformations.jl.
- transformations.jl: `transform` and `transform!` are too generic names; I propose overloading LinearAlgebra's `normalize` and `normalize!`, since that name is actually the commonly used term for such transformations. I wonder whether we really need `reconstruct` and `reconstruct!` (which could be called `unnormalize` if we keep them). I'm also not sure what's the use of allowing a separate `fit` operation before actually applying the transformation (I'd imagine one would always normalize the data immediately).
- moments.jl: `moment` is redundant with specific functions so I'd drop it.
- robust.jl: `trimvar(x)` could be `var(trim(x))` if `trim(x)` returned a special iterator type to dispatch on.
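
To make the `robust.jl` idea concrete, here is a minimal sketch of the dispatch mechanism only. The `Trimmed` type, the `kept` helper, and this `trim` definition are hypothetical names made up for illustration, and the `var` method simply takes the plain variance of the retained values, which is not necessarily the quantity StatsBase's `trimvar` computes.

```julia
using Statistics

# Hypothetical lazy wrapper: stores the data and how many values to drop
# from each end, without doing any work up front.
struct Trimmed{T<:AbstractVector}
    data::T
    k::Int
end

# Hypothetical `trim`: only records the request (assumed keyword `count`).
trim(x::AbstractVector; count::Integer=0) = Trimmed(x, Int(count))

# Hypothetical helper: materialize the retained values by sorting a copy
# and dropping `k` elements from each end.
function kept(t::Trimmed)
    n = length(t.data)
    return sort(t.data)[(t.k + 1):(n - t.k)]
end

# Dispatch hook: `var` gets a dedicated method for the wrapper type, so a
# trimmed-specific formula could live here instead of in a separate function.
# Assumption: this sketch just forwards to the plain variance.
Statistics.var(t::Trimmed; kwargs...) = var(kept(t); kwargs...)

x = [1.0, 2.0, 3.0, 4.0, 100.0]
v = var(trim(x; count=1))   # variance of [2.0, 3.0, 4.0], i.e. 1.0
```

The point of the wrapper is that `var(trim(x))` reaches a method that knows the data were trimmed, rather than receiving an ordinary vector that has lost that information.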
See also my previous notes at JuliaLang/julia#27152 (comment).
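
As a rough illustration of the `counts.jl` point above, the following sketch contrasts a bare vector of counts with an explicit value-to-count mapping. Both functions are hypothetical Base-only stand-ins written for this example (`counts_sketch` and `countmap_sketch` are made-up names), not the StatsBase implementations.

```julia
# Hypothetical stand-in for an integer-only `counts`: returns a plain vector
# with one slot per integer in `levels`, and no names attached.
function counts_sketch(x::AbstractVector{<:Integer}, levels::UnitRange{<:Integer})
    out = zeros(Int, length(levels))
    for v in x
        if v in levels
            out[v - first(levels) + 1] += 1
        end
    end
    return out
end

# Hypothetical stand-in for `countmap`: accepts any hashable values and keeps
# the mapping from value to count explicit.
function countmap_sketch(x)
    out = Dict{eltype(x),Int}()
    for v in x
        out[v] = get(out, v, 0) + 1
    end
    return out
end

x = [2, 3, 3, 5]
c = counts_sketch(x, 2:5)              # [1, 2, 0, 1]

# With the vector form, the caller pairs counts with levels by hand.
named = Dict(zip(2:5, c))

m = countmap_sketch(["a", "b", "a"])   # "a" maps to 2, "b" maps to 1
```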