-
Notifications
You must be signed in to change notification settings - Fork 6
Enhance SemObserved #230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance SemObserved #230
Conversation
define a single source_to_dest_perm() function
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## devel #230 +/- ##
==========================================
- Coverage 73.28% 72.88% -0.40%
==========================================
Files 49 50 +1
Lines 2231 2213 -18
==========================================
- Hits 1635 1613 -22
- Misses 596 600 +4 ☔ View full report in Codecov by Sentry. |
557c49d
to
a3b8bc8
Compare
src/frontend/specification/Sem.jl
Outdated
############################################################################################ | ||
|
||
function Sem(; | ||
specification = RAMMatrices, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should make ParameterTable
the default here - I know you are relying on the RAM notation to specify your models, but I believe that most users of the package (~90%) would rather use the ParameterTable
specification.
I'm also thinking about always storing the specification object in the Sem
model, maybe as it's own field in Sem
- what do you think about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it, but actually specifying the class is no-op, as internally most SEM terms convert it into RAMMatrices anyway.
As for a separate specification object -- I am just not sure what additional features it will enable in addition to the existing mechanisms of conversion DataFrame into SemSpecification and SemFit into DataFrame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to change that to ParameterTable
- I'll write something below on the specification object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed (also for SemFiniteDiff
below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I also added a keyword-only constructor for ParameterTable
- so this works:
model = Sem(
observed_vars = observed_vars,
latent_vars = latent_vars,
graph = graph,
data = data,
loss = SemML,
meanstructure = false,
)
Thanks again for this PR, I really like the streamlining of FIML and how the missing patterns are now computed/stored/used. I also like the changes to observed - I'm still thinking about whether we should have a field |
Thank you for the review -- thorough and helpful as usual!
To me the efficiency of the model inference and the cleanliness of the user API are the priorities in the design of Also, I am mostly working with SEM ensemble/multigroup models. Does your suggestion imply having specification objects both at the SemEnsemble level and Sem levels? |
My thoughts about having this specification field were that it would just simplify things for the user bcs they don't have to manage a separate specification object. I expect that the overwhelming majority would have a partable to convert to a DataFrame, and they always have to create a partable -> create the model, and then at the end, update partable -> convert to data frame. Assuming the partable is living in the model, we could have a function But this is not high priority right now, so after the last open comment is resolved, I would merge this 🥳 |
reduces code duplication; also annotate types of ctor args now samples(SemObsCov) returns nothing
add observed_vars(data::SemObserved)
* use SemObsMissingPattern struct to simplify code * replace O(Nvars^2) common pattern detection with Dict{} * don't store row-wise, store sub-matrices of non-missing data instead * use StatsBase.mean_and_cov()
StatsBase.mean_and_cov() is used instead
- SemObservedData: parameterize by cov/mean eltype instead of the whole container types Co-authored-by: Maximilian Ernst <[email protected]>
to match the update data preparation behaviour
to specify the prefix of the generated observed_vars if none provided could be inferred, defaults to :obs
e16d68f
to
673aa2b
Compare
I've changed the Sem default specification type to ParameterTable. I think that was the last remaining item from the review, so I have cleaned up the commit history a bit.
I think we actually agree on how the workflow should look from the user perspective: most of the time the user works with the parameter table representation - both for specifying the model and processing the results. |
Thanks! I added one last commit, if this one is fine with you, I would merge. Also, the workflow you suggest seems like a good idea - so maybe writing a function like |
Thank you! |
0d255ea
to
8c7cd14
Compare
v0.7 changed the diff interface (v0.6 was skipped)
3fc0a0e
to
ff67cf7
Compare
I've realized that the tests were failing with the commit that switches Sem to ParameterTable by default. I've fixed it, but I think it is another indication that SEM.jl should move away from kw-arg based constructors. |
ProximalAlgorithms.jl were update to 0.7 yesterday (from 0.5.4). |
Thanks a lot for the fix! |
d3915c1
into
StructuralEquationModels:devel
Another set of changes from #193 related to SemObserved:
observed_vars(observed)
method.That makes things simpler internally and, I think, also for the SEM users -- in particular it is now easy to check the SemSpecification, SemImply, and SemObserved compatibility.
obs_colnames
kwarg renamed toobserved_vars
for consistency with the other functions (but I would like to hear feedback on the better name).The idea is that now it should be more clear what this argument is for.
SemObserved
ctors accept bothspecification
andobserved_vars
. But with this PR havingSEM.observed_vars(spec) != observed_vars
would not raise an error.Instead,
observed_vars
kwarg takes precedence in mapping thedata
matrix columns to the observed variable names, whereas thespec
takes precedence in defining the order of the variables in the imported data(also, observed variables not referenced in
spec
will not be imported). So if the correct SEM specification is already defined, the user will not have to do an extra manual step to match the order of the columnsin the data matrix.
:obs1
,:obs2
etc instead of:1
,:2
), that could be tweaked withobserved_var_prefix
kw arg.prepare_data()
function that is used by allSemObserved
ctorsSemObservedCovariation
is made a type alias ofSemObservedData{Nothing}
(i.e.nothing
as data matrix) instead of being its own type.That reduces some code duplication.
SemObservedMissing
is refactored to incapsulate all properties of a missing pattern into the singleSemObservedMissingPattern
object, soSemObservedMissing
just stores a vector of these objects.In the context of the missing pattern, the observed variables that have values are called measured (so there are
measured_cov
,measured_mask
etc fields); the vars without values are termed missing (miss_mask
).That refactoring simplifies some code for EM, FIML, and minLL; and is also required for further enhancements of EM & FIML.