
Conversation

@milancurcic
Member

This is the first step toward decoupling the optimizer logic from the concrete layers.

This PR only introduces the abstract optimizer_base_type and a concrete sgd type.

The update of weights is still hardcoded in network % train and the concrete layer implementations; decoupling that remains a TODO.
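For context, here is a minimal sketch of what such a stub could look like. Only the names optimizer_base_type, sgd, and nf_optimizers.f90 come from this PR; the learning_rate component and its default are illustrative assumptions, not the merged code.

```fortran
! Hypothetical sketch of the optimizer stub in nf_optimizers.f90;
! component names and defaults are assumptions for illustration.
module nf_optimizers

  implicit none

  private
  public :: optimizer_base_type, sgd

  ! Abstract parent for all optimizers; concrete types extend it.
  type, abstract :: optimizer_base_type
    real :: learning_rate = 0.01
  end type optimizer_base_type

  ! Plain stochastic gradient descent; constructed with the default
  ! structure constructor, e.g. sgd(learning_rate=0.01).
  type, extends(optimizer_base_type) :: sgd
  end type sgd

end module nf_optimizers
```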

In a nutshell, the idea is to have:

  • Concrete optimizer types such as sgd, adam, etc. in nf_optimizers.f90 (and, eventually, its submodules);
  • Type constructors that take the optimizer parameters from the user, e.g. adam(learning_rate, beta1, beta2, epsilon, ...);
  • An update subroutine on each concrete type that takes the needed gradients (dw, db) as input and updates the weights and biases arrays passed as intent(inout); see the sketch after this list.
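To make the last bullet concrete, here is a hedged sketch of how such a deferred update binding might look once the weight update moves out of network % train. The argument names dw and db come from the description above; the binding name, argument ranks, and intents are assumptions, not an agreed design.

```fortran
! Hypothetical sketch of the future update interface (not part of this PR).
module nf_optimizers

  implicit none

  private
  public :: optimizer_base_type, sgd

  type, abstract :: optimizer_base_type
    real :: learning_rate = 0.01
  contains
    ! Each concrete optimizer must implement its own parameter update.
    procedure(update_interface), deferred :: update
  end type optimizer_base_type

  abstract interface
    subroutine update_interface(self, weights, biases, dw, db)
      import :: optimizer_base_type
      class(optimizer_base_type), intent(in) :: self
      real, intent(inout) :: weights(:,:), biases(:)
      real, intent(in) :: dw(:,:), db(:)
    end subroutine update_interface
  end interface

  type, extends(optimizer_base_type) :: sgd
  contains
    procedure :: update => sgd_update
  end type sgd

contains

  subroutine sgd_update(self, weights, biases, dw, db)
    ! Vanilla SGD: subtract the scaled gradients in place.
    class(sgd), intent(in) :: self
    real, intent(inout) :: weights(:,:), biases(:)
    real, intent(in) :: dw(:,:), db(:)
    weights = weights - self % learning_rate * dw
    biases = biases - self % learning_rate * db
  end subroutine sgd_update

end module nf_optimizers
```

A layer would then do something like call self % optimizer % update(self % weights, self % biases, dw, db) instead of applying the SGD rule inline (a hypothetical call, for illustration only).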

@rweed let me know if this approach seems reasonable to you.

@milancurcic merged commit edd3f70 into modern-fortran:main on Jan 19, 2023
@milancurcic deleted the refactor-optimizer-stub branch on Jan 19, 2023 at 15:30