Skip to content

Conversation

@alexdewar
Copy link
Collaborator

@alexdewar alexdewar commented Nov 22, 2024

Description

Currently commodity costs are specified in a vector belonging to the corresponding commodity. At some point though we will need to actually access these values and this current data structure is a bit limited. Accordingly, I've reorganised the code so that they're specified in a map instead (still assigned to the relevant commodity, though) with a combination of commodity ID, region ID and milestone year used as the key. For the sake of encapsulating this functionality a bit, I've used the new type idiom to provide a simple wrapper around a HashMap, with a get method for accessing its contents.

This is in some ways similar to the refactor I want to do for demand slicing (#231), but unfortunately I've got myself in a bit of a tangle with that by trying to do other things first 😞, so I thought this was a better starting point.

Things I'm unsure about:

  • I've made a rule that if costs are provided for a given commodity + region pair, then there must be entries covering every milestone year. It seemed like a good idea to try to constrain the input to make the coding easier for now, but I dunno if it makes sense.
  • I've made it so that you can look up costs with commodity + region + time slice, but I'm not including balance type in that as I was assuming we won't be usually wanting to look up costs by balance type. Not sure if that's right.
  • I don't know what the time_slice_level field of commodities actually means and I couldn't figure it out from the docs. I don't think it's relevant to this PR, but could be wrong.

Closes #153.

Type of change

  • Bug fix (non-breaking change to fix an issue)
  • New feature (non-breaking change to add functionality)
  • Refactoring (non-breaking, non-functional change to improve maintainability)
  • Optimization (non-breaking change to speed up the code)
  • Breaking change (whatever its nature)
  • Documentation (improve or add documentation)

Key checklist

  • All tests pass: $ cargo test
  • The documentation builds and looks OK: $ cargo doc

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@codecov
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 99.28315% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.68%. Comparing base (3414e17) to head (f6de922).
Report is 13 commits behind head on main.

Files with missing lines Patch % Lines
src/commodity.rs 99.56% 0 Missing and 1 partial ⚠️
src/model.rs 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #243      +/-   ##
==========================================
+ Coverage   94.04%   94.68%   +0.63%     
==========================================
  Files          13       13              
  Lines        2267     2464     +197     
  Branches     2267     2464     +197     
==========================================
+ Hits         2132     2333     +201     
+ Misses        101       97       -4     
  Partials       34       34              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

I don't think panicking in these cases really adds anything.
@alexdewar alexdewar marked this pull request as ready for review November 25, 2024 09:49
@tsmbland
Copy link
Collaborator

tsmbland commented Nov 26, 2024

I haven't looked at the code yet, but just a few comments based on the description. Overall, I think it's probably all good for now, but there are definitely some things that we'll have to revisit, so worth discussing upfront.

I've made a rule that if costs are provided for a given commodity + region pair, then there must be entries covering every milestone year. It seemed like a good idea to try to constrain the input to make the coding easier for now, but I dunno if it makes sense.

The main issue is that MUSE also calculates commodity prices endogenously, so prices that you put in the input files may end up being overwritten. This depends on the commodity. You can see here for a discussion of how this works in MUSE1, but briefly:

  • Non-environmental commodities that are outputted by a process/processes will only use input prices for the first year, and any future prices will be recalculated endogenously
  • Commodities that are not outputted by any process will not be recalculated (will use input data for the full simulation)
  • If a carbon budget is being used, input CO2 prices will not be used at all, otherwise input prices will be used for all years

For now it's probably ok to mandate that there's data for all commodities in all years. But this will lead to a frustrating situation (like in MUSE1) where users are forced to input data that they know is going to be overwritten. Long run, we're probably going to have to do a check for each commodity to determine whether data is needed for all years or just the first year (or not at all?). We could also give users some control over this, i.e. user's can specify which commodities they want to use endogenous prices for (if possible) and which they want to use input prices for.

I've made it so that you can look up costs with commodity + region + time slice, but I'm not including balance type in that as I was assuming we won't be usually wanting to look up costs by balance type. Not sure if that's right.

I actually have no idea what balance type is, so can't help with this, but I imagine you're correct!

I don't know what the time_slice_level field of commodities actually means and I couldn't figure it out from the docs. I don't think it's relevant to this PR, but could be wrong.

time_slice_level is the timeframe over which supply/demand are balanced for a commodity (e.g. if "annual", then supply/demand of a commodity must be balanced over the whole year, but not necessarily in each timeslice). I haven't quite figured out how this is going to be implemented in MUSE2, but I'm in the process of adding a similar feature to MUSE1 (see here). In this case it's a sector-level parameter, so it works by compressing data to the appropriate timeslice level (summing demand, averaging prices) and passing it to the sector to try to meet those demands. In MUSE2 we don't have sectors and the whole system is solved together, so the implementation will be very different (I imagine this will all be built into the constraints?).

I don't think it's relevant here, but I imagine at some point we might want to group commodities according to their timeslice level (e.g. selecting all commodities with a level of "annual"), and gather their prices accordingly.

Copy link
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ok, but I'm a bit unsure if this new structure will be better in practice when we actually need to use the data. Time will tell.

I've a couple of questions, but my main comments are:

  • The organisation of the code makes hard to follow what's going on. I'd put together the structures and their methods. Not sure if that rusty but I feel it makes the code way more readable.
  • As you're modifying existing functions, you should use this opportunity to add missing docstrings, so we are completing that work bit by bit.

// entries for all the other milestone years.
let mut used_milestone_years = HashMap::new();

for cost in iter {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this right, this is storing the commodity costs in something that, in Python, would be a dictionary where the keys are namedtuples of the form (region, year, timeslice), correct?

This looks ok as a generic storage solution, but I'm not sure how practical will be in reality. I can easily see use cases where you are interested in all costs for all regions and timeslices in a particular year. I guess we will cross that bridge when/if needed, but I would imagine we will need to implement methods for getting all the data corresponding to only one specific key of the tupple-key.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this right, this is storing the commodity costs in something that, in Python, would be a dictionary where the keys are namedtuples of the form (region, year, timeslice), correct?

Yep. We could have used a tuple here too, but I thought I'd be more explicit about it.

This looks ok as a generic storage solution, but I'm not sure how practical will be in reality. I can easily see use cases where you are interested in all costs for all regions and timeslices in a particular year. I guess we will cross that bridge when/if needed, but I would imagine we will need to implement methods for getting all the data corresponding to only one specific key of the tupple-key.

We can still do that if we want, though it will require iterating over the whole map, so it won't be as fast. Hopefully this data structure will do us for the time being, though 🤞

}

/// Iterate over the subset of [`TimeSliceID`] indicated by `selection`
pub fn iter_selection<'a>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the output iterator there's a copy of the data, right? Why do we need to copy it and not just pass a reference to the original or something like that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's Rc, the "copying" only consists of increasing a reference count, so it's not too expensive, but I think you're right that it's not necessary. I'll change it to return references.

@alexdewar alexdewar requested a review from dalonsoa November 26, 2024 15:26
@alexdewar
Copy link
Collaborator Author

alexdewar commented Nov 27, 2024

The main issue is that MUSE also calculates commodity prices endogenously, so prices that you put in the input files may end up being overwritten. This depends on the commodity. You can see here for a discussion of how this works in MUSE1, but briefly:

  • Non-environmental commodities that are outputted by a process/processes will only use input prices for the first year, and any future prices will be recalculated endogenously
  • Commodities that are not outputted by any process will not be recalculated (will use input data for the full simulation)
  • If a carbon budget is being used, input CO2 prices will not be used at all, otherwise input prices will be used for all years

For now it's probably ok to mandate that there's data for all commodities in all years. But this will lead to a frustrating situation (like in MUSE1) where users are forced to input data that they know is going to be overwritten. Long run, we're probably going to have to do a check for each commodity to determine whether data is needed for all years or just the first year (or not at all?). We could also give users some control over this, i.e. user's can specify which commodities they want to use endogenous prices for (if possible) and which they want to use input prices for.

Ah, that's interesting to know. I think in the case of input data that will definitely be overwritten, we'd actually want to forbid users from supplying it or at least issue a warning if they do. Let's leave it as is for now though. I'll open an issue for removing this restriction, once we are able to do this validation.

time_slice_level is the timeframe over which supply/demand are balanced for a commodity (e.g. if "annual", then supply/demand of a commodity must be balanced over the whole year, but not necessarily in each timeslice). I haven't quite figured out how this is going to be implemented in MUSE2, but I'm in the process of adding a similar feature to MUSE1 (see here). In this case it's a sector-level parameter, so it works by compressing data to the appropriate timeslice level (summing demand, averaging prices) and passing it to the sector to try to meet those demands. In MUSE2 we don't have sectors and the whole system is solved together, so the implementation will be very different (I imagine this will all be built into the constraints?).

I don't think it's relevant here, but I imagine at some point we might want to group commodities according to their timeslice level (e.g. selecting all commodities with a level of "annual"), and gather their prices accordingly.

Good to know! Thanks.

Copy link
Collaborator

@dalonsoa dalonsoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Collaborator

@AdrianDAlessandro AdrianDAlessandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a big PR... I've just spend an hour going through it and realised I'm not even a quart of the way through the code changes. Might need to go through it in person to understand what it's doing

@AdrianDAlessandro AdrianDAlessandro linked an issue Nov 28, 2024 that may be closed by this pull request
Copy link
Collaborator

@AdrianDAlessandro AdrianDAlessandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy that this seems sensible

};

ensure!(
map.0.insert(key, value).is_none(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm: The reason this works is .insert will return something if there was already a value at key?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

@alexdewar
Copy link
Collaborator Author

Thanks for reviews!

@alexdewar alexdewar merged commit 2276dc8 into main Nov 28, 2024
7 checks passed
@alexdewar alexdewar deleted the refactor-commodity-cost-slices branch November 28, 2024 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Use anyhow::Result for commodity.rs Refactor commodity cost time slices

5 participants