
Conversation

@alyst (Contributor) commented Aug 11, 2024

Another iteration of cherry-picking the changes from #193.

The main part of this PR is a simplification of the evaluation API.
Currently, SemImply and SemLoss subtypes declare a separate method for each combination of objective, gradient, or hessian requested to be evaluated (objective!(), objective_gradient!(), and so forth).
This introduces a lot of code duplication and complicates maintenance, as any fix has to be applied in multiple places.
The PR replaces these methods with a single evaluate!(objective, gradient, hessian, ...) call for in-place evaluation, where the evaluation of the objective, gradient, or hessian is skipped if nothing is supplied in the corresponding argument.
Since Julia compiles a separate version of the function for each combination of input argument types, the new API does not incur any performance penalty and effectively achieves what the old code did.
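The pattern described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the evaluate! name and nothing-skipping semantics follow the PR description, while QuadLoss and its quadratic objective are invented for the example.

```julia
using LinearAlgebra

# Toy stand-in for a SemLoss subtype, with objective f(p) = p'Ap
# (A assumed symmetric).
struct QuadLoss
    A::Matrix{Float64}
end

# One method covers every objective/gradient/hessian combination:
# passing `nothing` for a target skips its computation. Julia specializes
# the method for each argument-type combination, so the skipped branches
# are eliminated at compile time.
function evaluate!(objective, gradient, hessian, loss::QuadLoss, params)
    Ap = loss.A * params
    isnothing(gradient) || (gradient .= 2 .* Ap)    # ∇f = 2Ap
    isnothing(hessian) || (hessian .= 2 .* loss.A)  # ∇²f = 2A
    return isnothing(objective) ? nothing : dot(params, Ap)
end
```

For example, evaluate!(0.0, grad, nothing, loss, p) computes the objective and fills grad in place, while the Hessian branch is compiled away entirely.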

For updating the internal state of SemImply subtypes, an equivalent update!(targets::EvaluationTargets, imply::SemImply, ...) method is introduced,
where EvaluationTargets is a simple structure that specifies (via its type parameters, i.e. at compile time) whether the SemImply internal fields required for the objective, gradient, and/or hessian need to be updated.
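A minimal reconstruction of this idea is sketched below; the EvaluationTargets name follows the PR, while the constructor and accessor names are assumptions for illustration.

```julia
# Boolean type parameters record, at compile time, which of
# objective/gradient/hessian was requested.
struct EvaluationTargets{O,G,H} end

# Construct from evaluate!-style arguments: `nothing` means "skip".
EvaluationTargets(objective, gradient, hessian) =
    EvaluationTargets{!isnothing(objective), !isnothing(gradient), !isnothing(hessian)}()

# These queries resolve to compile-time constants for a concrete targets
# type, so update!(targets, imply, ...) can branch without runtime cost.
is_objective_required(::EvaluationTargets{O,G,H}) where {O,G,H} = O
is_gradient_required(::EvaluationTargets{O,G,H}) where {O,G,H} = G
is_hessian_required(::EvaluationTargets{O,G,H}) where {O,G,H} = H
```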

The PR also converts meanstructure and approximatehessian from Val{T} struct fields of SemImply into plain type parameters (without corresponding fields).
That seems to be a more "idiomatic" way of declaring traits in Julia.
For some SEM loss types, the ApproximateHessian trait is fixed to either NoHessian or ExactHessian; for others, like SemWLS, the type of Hessian (approximate or exact) can be specified at construction.
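The trait-as-type-parameter pattern looks roughly like this sketch. The MeanStructure/HasMeanStructure names follow the PR discussion; MyImply is an invented example type, not the package's.

```julia
abstract type MeanStructure end
struct HasMeanStructure <: MeanStructure end
struct NoMeanStructure <: MeanStructure end

# MS is a pure type parameter -- there is no corresponding struct field.
struct MyImply{MS <: MeanStructure}
    Σ::Matrix{Float64}
end

# The trait query needs only type information, so it resolves at compile
# time; code can branch on MeanStructure(imply) === HasMeanStructure
# without any runtime cost.
MeanStructure(::Type{<:MyImply{MS}}) where {MS} = MS
MeanStructure(imply::MyImply) = MeanStructure(typeof(imply))
```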

As another immediate benefit of the evaluation API refactoring, the Optim.jl and NLopt.jl wrappers are simplified, since the new evaluate!() call directly implements the semantics these packages expect.

alyst and others added 13 commits August 11, 2024 13:13
does not seem to be used anywhere;
also the method signature does not match Julia conventions
* replace has_meanstructure and approximate_hessian fields with trait-like type params
* remove methods for has_meanstructure-based dispatch
the intent of this commit is to refactor the API for objective,
gradient and hessian evaluation, such that the evaluation code
does not have to be duplicated across functions that calculate different
combinations of those quantities

* introduce EvaluationTargets struct that handles selection of what to evaluate
* add evaluate!(EvalTargets, ...) methods for loss and imply objs
that evaluate only what is required
* objective!(), obj_grad!() etc. calls are just wrappers of evaluate!() with proper targets
* explicitly use Cholesky factorization
remove unnecessary arguments
by dispatching over optimizer
by dispatching over optimizer

codecov bot commented Aug 11, 2024

Codecov Report

Attention: Patch coverage is 88.60759% with 36 lines in your changes missing coverage. Please review.

Project coverage is 71.69%. Comparing base (2f6e8b7) to head (0cecaa8).
Report is 34 commits behind head on devel.

Files with missing lines Patch % Lines
src/objective_gradient_hessian.jl 79.22% 16 Missing ⚠️
src/frontend/fit/standard_errors/hessian.jl 70.83% 7 Missing ⚠️
src/loss/WLS/WLS.jl 90.47% 4 Missing ⚠️
src/imply/empty.jl 0.00% 2 Missing ⚠️
src/loss/ML/ML.jl 97.67% 2 Missing ⚠️
src/loss/regularization/ridge.jl 66.66% 2 Missing ⚠️
src/additional_functions/helper.jl 0.00% 1 Missing ⚠️
src/loss/constant/constant.jl 80.00% 1 Missing ⚠️
src/optimizer/documentation.jl 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #214      +/-   ##
==========================================
+ Coverage   70.52%   71.69%   +1.17%     
==========================================
  Files          52       52              
  Lines        2446     2120     -326     
==========================================
- Hits         1725     1520     -205     
+ Misses        721      600     -121     


Alexey Stukalov added 3 commits August 11, 2024 14:44
to reduce allocations
to reduce allocations
to avoid extra allocation
Co-authored-by: Maximilian Ernst <[email protected]>
@Maximilian-Stefan-Ernst (Collaborator) commented Sep 18, 2024

Thanks a lot for those changes, and sorry it took that long to review. I think this greatly simplifies the codebase (and also improves some of the linear algebra computations)!

Something that still needs to be done is adding these changes to the online documentation on how to implement new loss functions and imply types - but I will do this later when I make a new release, and will just note it in the corresponding issue for now.

Regarding the MeanStructure and HessianEvaluation traits: I think it's great to have them. The one thing I'm unsure about is whether to have them as type parameters of SemImply and SemLossFunction, or whether we should keep those abstract types without parameters. The reason I'm hesitating is that I would like to make it as easy as possible for new package users to implement their own loss functions / imply types. I imagine that most people who try out the package are former R users who are not familiar with Julia's type system in general, even less so with parametric types. So I'm a bit afraid that if the documentation on implementing a new loss function starts with implementing a parametric type that shares type parameters with the supertype SemImply, this might be too confusing. What do you think about that?

@alyst (Contributor, Author) commented Sep 20, 2024

Regarding the MeanStructure and HessianEvaluation traits

Thanks for bringing it up! My two main motivations for using the type parameters were:

  1. to have a common mechanism that mandates declaring and querying these traits. E.g. traits such as the presence of a mean structure have to be aligned across the implied structure, the loss term, and the observed data.
  2. to enable Julia's compile-time type inference to compile efficient versions of the functions (evaluate!() in particular). Mostly this is about eliminating branches that are not required for the given input arguments or for the given SEM problem structure.

To achieve just the 2nd goal, one can potentially avoid extra type parameters in the base abstract type and keep them at the level of individual implementations (e.g. replace MeanStructure(imply) === HasMeanStructure with isnothing(imply.M)).
That would mean the rule that the field M should be declared with its own type parameter for better performance (as well as the other rules regarding the interaction of SemImply and AbstractSemLoss types)
would have to be documented somewhere, and users would have to read, remember, and follow it to get more efficient code.
I think there is a high chance that user implementations would not actually follow it.

That said, I don't know how critical it is to have a Hessian-related trait. Potentially, it is a mechanism to query whether all loss terms support exact Hessian evaluation, which might be important for methods like se_hessian().
This trait could also be used to declare, and then check, whether a SEM model supports Hessian calculation at all -- that may help with, e.g., automatic selection of the optimization method.
For my problems, Hessian evaluation is challenging, so I have not looked much into it.

Also, the design decision depends on your vision of future SemImply implementations and their new features.
If you expect that there will be many traits similar to MeanStructure, then I would agree that trying to support all of them via SemImply type parameters is not a good design.

I am not too worried about a steep learning curve for new users in this particular case.
In my own experience, as long as the documentation contains an example of a custom SemImply implementation, I just copy it and adjust it to my needs without necessarily understanding all the details :)

@Maximilian-Stefan-Ernst (Collaborator)
Okay, some thoughts on this:
Maybe we could keep the MeanStructure and HessianEvaluation types, just not as type parameters of SemImply and SemLossFunction, but have something like HessianEvaluation(lossfun::SemLossFunction) = lossfun.hessian_evaluation and document this. There is a decent chance users won't follow it, but whenever we call a function that needs to know this trait, we can throw an error if it's not implemented. We should then also have something like HessianEvaluation(sl::SemLoss) = all(HessianEvaluation(lf) === ExactHessian for lf in sl.functions) ? ExactHessian : ApproximateHessian
I know this is not the most elegant way, but as you said, if we have them as type parameters of SemLossFunction, every time we want to add a new trait this way we would break all user code that defines a new loss function.
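A minimal sketch of this field-based variant, under the assumptions of the proposal above: the trait names follow the discussion, while the fallback error and the MyLoss example type are invented.

```julia
abstract type HessianEvaluation end
struct ExactHessian <: HessianEvaluation end
struct ApproximateHessian <: HessianEvaluation end

abstract type SemLossFunction end

# Default: read the trait from a field, and fail loudly for loss types
# that neither declare the field nor override this method.
HessianEvaluation(lf::SemLossFunction) =
    hasfield(typeof(lf), :hessian_evaluation) ? lf.hessian_evaluation :
    error("$(nameof(typeof(lf))) does not declare its HessianEvaluation trait")

# Aggregate over loss terms: exact only if every term is exact.
HessianEvaluation(fns::AbstractVector{<:SemLossFunction}) =
    all(HessianEvaluation(lf) === ExactHessian for lf in fns) ?
        ExactHessian : ApproximateHessian

# Toy loss term holding the trait in a field
struct MyLoss <: SemLossFunction
    hessian_evaluation::Type{<:HessianEvaluation}
end
```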

Some thoughts on Hessian computation and standard errors:
In terms of performance, I think it almost never makes sense to use Hessian-based optimizers, even for small problems - although it might make sense if you have convergence problems. I have not invested too much time into standard error computation so far, but it seems like an important feature to be able to check whether a model supports exact Hessian calculation - it should never be the case that we silently compute standard errors from an approximation of the Hessian without telling the user.
However, users might want to explicitly use an approximation.
So maybe we should add se_hessian(fit::SemFit; method = :analytic_approx), and every time a user wants to compute standard errors with method = :analytic while HessianEvaluation(loss) === ApproximateHessian, we throw an error; and if method = :analytic_approx while HessianEvaluation(loss) === ExactHessian, we throw a warning?
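The proposed check could look like the sketch below. se_hessian() exists in the package, but this helper function, its name, and the use of symbols to encode the trait are hypothetical illustrations of the error/warning logic.

```julia
# Validate the requested standard-error method against the model's
# Hessian-evaluation capability:
#   :analytic requires exact Hessian support -> error otherwise;
#   :analytic_approx on an exact-capable model -> warn, but proceed.
function check_se_hessian_method(method::Symbol, hessian_eval::Symbol)
    if method === :analytic && hessian_eval === :approximate
        error("method = :analytic requested, but the model only supports " *
              "approximate Hessian evaluation")
    elseif method === :analytic_approx && hessian_eval === :exact
        @warn "Using an approximate Hessian although the model supports exact Hessian evaluation"
    end
    return method
end
```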

@alyst (Contributor, Author) commented Sep 26, 2024

but have something like HessianEvaluation(lossfun::SemLossFunction) = lossfun.hessian_evaluation and document this.

Just to clarify -- how do you see the type and declaration of the hessian_evaluation field?
If HessianEvaluation(lossfun::SemLossFunction) cannot be evaluated at compile time, it can potentially affect the type inference of all evaluate!() calls (because the compiler would not be able to decide which branches will be executed), so all operations inside evaluate!() might incur a performance penalty, which will be especially noticeable for small problems.
That was the reason to use the traits approach and have HessianEvaluation(::Type{<:SemImply}), so that it works with type information alone.
If you envision something like hessian_evaluation::HE, where HE is a type parameter, then it is similar to the current approach; we just get an extra struct field that we don't really need, and the type parameter applies to the user's type rather than to the abstract SemImply or SemLossFunction.
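The compromise being weighed here can be sketched as follows (MyLoss is an invented example type): the trait lives in a field, but since HE is a parameter of the concrete type, reading the field is still statically resolvable for the compiler.

```julia
abstract type SemLossFunction end

# Singleton trait values
struct ExactHessian end
struct ApproximateHessian end

# The trait is stored in a field whose type HE is a parameter of the
# concrete (user-defined) type, so for any MyLoss{HE} instance the
# compiler knows the trait value statically.
struct MyLoss{HE} <: SemLossFunction
    hessian_evaluation::HE
end

# Default accessor: reads the field; inferred as a constant for MyLoss{HE}.
HessianEvaluation(lf::SemLossFunction) = lf.hessian_evaluation
```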

For MeanStructure it is a bit more straightforward, as the mean structure implies the existence of the M field, which could be type-parameterized.
So I can tweak the PR and remove the MS type parameter, but keep the static evaluation of MeanStructure(::Type{<:SemImply}).
I agree that the MS type parameter is a bit of over-engineering, and for fast code one needs to add a type parameter for M anyway.
But I think we should keep the requirement of static evaluation, as it guides users towards writing faster code.

So maybe we should add se_hessian(fit::SemFit; method = :analytic_approx) ...

I agree that it is inconvenient when the type of Hessian evaluation is fixed at loss function creation, as it may result in an se_hessian() error quite late in the data analysis, when the model has already been fitted.

A far-fetched solution is to separate the constant part of SemImply or SemLossFunction from the current state of the evaluation, i.e. move fields like Σ, I_A⁻¹ etc. into a new implementation-dependent "state" object (there could also be a base abstract SemImplyState type). The method of Hessian evaluation would then belong to the "state", and the state could be regenerated without changing the model to match the requested evaluation mode. MeanStructure, on the other hand, belongs to the model -- allowing or disallowing the mean structure means changing the SEM.
A state would also allow for parallel evaluation -- it would be possible to call evaluate!(state[i], ...) in parallel for i = 1, 2, 3, ..., n.

@Maximilian-Stefan-Ernst (Collaborator)
If you envision something like hessian_evaluation::HE, where HE is a type parameter, then it is similar to the current approach, we just have an extra struct field that we don't really need, and the type parameter applies to the user's type rather than to the abstract SemImply or to SemLossFunction.

That's exactly what I was thinking - we pay for an extra struct field, but in exchange, if we add another trait in the future, we don't break all users' code that implements a custom imply type. (Sorry, my code was wrong.)

I agree that it is inconvenient, when the type of Hessian evaluation is fixed at the loss function creation, and it may result in the se_hessian() error quite late in the data analysis, when the model is already inferred.

It is... A workaround for now would be to construct a new loss function with all the fields from the old one, except swapping the hessian_evaluation field.

@alyst (Contributor, Author) commented Sep 29, 2024

That's exactly what I was thinking - we pay an extra struct field, but we buy that if we add another trait in the future

Maybe one final clarification -- do you mean that imply.hessian_evaluation should be part of the SemImply public interface, and that it is OK to use it outside of the private code of a custom SemImply implementation?
My idea was that all public trait querying should be done via methods, and direct field access should not be encouraged -- so it does not really matter whether there is a .hessian_evaluation field or not.

I can change the PR to add a const hessian_evaluation::HE field to the existing types, move the HE type parameter into the custom type, and make the default HessianEvaluation() method use the hessian_evaluation field.
Given that Julia does not support non-abstract type inheritance out of the box, this requires adding more boilerplate code to each user-defined type.
But if you think it would be easier for new users coming from R to understand, I can do that.

@Maximilian-Stefan-Ernst (Collaborator) commented Oct 1, 2024

My idea was that all public traits querying should be done via methods, and direct field access is not encouraged -- so it does not really matter if there is a .hessian_evaluation field or not.

That's exactly what I meant.

make the default HessianEvaluation() method use the hessian_evaluation field.

Yes.

But if you think it would be easier for the new users coming from R to understand, I can do that.

Thanks a lot!

So I can tweak the PR and remove the MS type parameter, but keep the static evaluation of the MeanStructure(::Type{<:SemImply}).

This would also be nice.

@alyst (Contributor, Author) commented Oct 8, 2024

@Maximilian-Stefan-Ernst I've switched from type parameters to the fields, as we discussed. You can take a look at the last commit to confirm whether this is how you imagined it and whether it makes it easier for R users. I still think it's not "idiomatic" Julia :), but I am fine with this approach.

I've also renamed MeanStructure to MeanStruct, HessianEvaluation to HessianEval, and ApproximateHessian to ApproxHessian -- these are quite standard abbreviations that are used in Julia a lot.
But I can revert it if you think the previous full versions are better.

@Maximilian-Stefan-Ernst Maximilian-Stefan-Ernst merged commit a073af5 into StructuralEquationModels:devel Oct 29, 2024
4 of 5 checks passed
@Maximilian-Stefan-Ernst (Collaborator)

Perfect, merged!

@alyst alyst deleted the refactor_evaluate branch November 23, 2024 16:54