Refactor commodity costs for easy lookup #243

alexdewar · 2024-11-22T17:47:57Z

Description

Currently commodity costs are specified in a vector belonging to the corresponding commodity. At some point though we will need to actually access these values and this current data structure is a bit limited. Accordingly, I've reorganised the code so that they're specified in a map instead (still assigned to the relevant commodity, though) with a combination of commodity ID, region ID and milestone year used as the key. For the sake of encapsulating this functionality a bit, I've used the new type idiom to provide a simple wrapper around a HashMap, with a get method for accessing its contents.

This is in some ways similar to the refactor I want to do for demand slicing (#231), but unfortunately I've got myself in a bit of a tangle with that by trying to do other things first 😞, so I thought this was a better starting point.

Things I'm unsure about:

I've made a rule that if costs are provided for a given commodity + region pair, then there must be entries covering every milestone year. It seemed like a good idea to try to constrain the input to make the coding easier for now, but I dunno if it makes sense.
I've made it so that you can look up costs with commodity + region + time slice, but I'm not including balance type in that as I was assuming we won't be usually wanting to look up costs by balance type. Not sure if that's right.
I don't know what the time_slice_level field of commodities actually means and I couldn't figure it out from the docs. I don't think it's relevant to this PR, but could be wrong.

Closes #153.

Type of change

Bug fix (non-breaking change to fix an issue)
New feature (non-breaking change to add functionality)
Refactoring (non-breaking, non-functional change to improve maintainability)
Optimization (non-breaking change to speed up the code)
Breaking change (whatever its nature)
Documentation (improve or add documentation)

Key checklist

All tests pass: $ cargo test
The documentation builds and looks OK: $ cargo doc

Further checks

Code is commented, particularly in hard-to-understand areas
Tests added that prove fix is effective or that feature works

Closes #153.

…ion combo

codecov · 2024-11-22T17:50:53Z

Codecov Report

Attention: Patch coverage is 99.28315% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.68%. Comparing base (3414e17) to head (f6de922).
Report is 13 commits behind head on main.

Files with missing lines	Patch %	Lines
src/commodity.rs	99.56%	0 Missing and 1 partial ⚠️
src/model.rs	50.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #243      +/-   ##
==========================================
+ Coverage   94.04%   94.68%   +0.63%     
==========================================
  Files          13       13              
  Lines        2267     2464     +197     
  Branches     2267     2464     +197     
==========================================
+ Hits         2132     2333     +201     
+ Misses        101       97       -4     
  Partials       34       34

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

I don't think panicking in these cases really adds anything.

tsmbland · 2024-11-26T10:58:04Z

I haven't looked at the code yet, but just a few comments based on the description. Overall, I think it's probably all good for now, but there are definitely some things that we'll have to revisit, so worth discussing upfront.

I've made a rule that if costs are provided for a given commodity + region pair, then there must be entries covering every milestone year. It seemed like a good idea to try to constrain the input to make the coding easier for now, but I dunno if it makes sense.

The main issue is that MUSE also calculates commodity prices endogenously, so prices that you put in the input files may end up being overwritten. This depends on the commodity. You can see here for a discussion of how this works in MUSE1, but briefly:

Non-environmental commodities that are outputted by a process/processes will only use input prices for the first year, and any future prices will be recalculated endogenously
Commodities that are not outputted by any process will not be recalculated (will use input data for the full simulation)
If a carbon budget is being used, input CO2 prices will not be used at all, otherwise input prices will be used for all years

For now it's probably ok to mandate that there's data for all commodities in all years. But this will lead to a frustrating situation (like in MUSE1) where users are forced to input data that they know is going to be overwritten. Long run, we're probably going to have to do a check for each commodity to determine whether data is needed for all years or just the first year (or not at all?). We could also give users some control over this, i.e. user's can specify which commodities they want to use endogenous prices for (if possible) and which they want to use input prices for.

I've made it so that you can look up costs with commodity + region + time slice, but I'm not including balance type in that as I was assuming we won't be usually wanting to look up costs by balance type. Not sure if that's right.

I actually have no idea what balance type is, so can't help with this, but I imagine you're correct!

I don't know what the time_slice_level field of commodities actually means and I couldn't figure it out from the docs. I don't think it's relevant to this PR, but could be wrong.

time_slice_level is the timeframe over which supply/demand are balanced for a commodity (e.g. if "annual", then supply/demand of a commodity must be balanced over the whole year, but not necessarily in each timeslice). I haven't quite figured out how this is going to be implemented in MUSE2, but I'm in the process of adding a similar feature to MUSE1 (see here). In this case it's a sector-level parameter, so it works by compressing data to the appropriate timeslice level (summing demand, averaging prices) and passing it to the sector to try to meet those demands. In MUSE2 we don't have sectors and the whole system is solved together, so the implementation will be very different (I imagine this will all be built into the constraints?).

I don't think it's relevant here, but I imagine at some point we might want to group commodities according to their timeslice level (e.g. selecting all commodities with a level of "annual"), and gather their prices accordingly.

dalonsoa

This looks ok, but I'm a bit unsure if this new structure will be better in practice when we actually need to use the data. Time will tell.

I've a couple of questions, but my main comments are:

The organisation of the code makes hard to follow what's going on. I'd put together the structures and their methods. Not sure if that rusty but I feel it makes the code way more readable.
As you're modifying existing functions, you should use this opportunity to add missing docstrings, so we are completing that work bit by bit.

src/commodity.rs

dalonsoa · 2024-11-26T11:12:59Z

src/commodity.rs

+    // entries for all the other milestone years.
+    let mut used_milestone_years = HashMap::new();
+
+    for cost in iter {


If I understand this right, this is storing the commodity costs in something that, in Python, would be a dictionary where the keys are namedtuples of the form (region, year, timeslice), correct?

This looks ok as a generic storage solution, but I'm not sure how practical will be in reality. I can easily see use cases where you are interested in all costs for all regions and timeslices in a particular year. I guess we will cross that bridge when/if needed, but I would imagine we will need to implement methods for getting all the data corresponding to only one specific key of the tupple-key.

If I understand this right, this is storing the commodity costs in something that, in Python, would be a dictionary where the keys are namedtuples of the form (region, year, timeslice), correct?

Yep. We could have used a tuple here too, but I thought I'd be more explicit about it.

This looks ok as a generic storage solution, but I'm not sure how practical will be in reality. I can easily see use cases where you are interested in all costs for all regions and timeslices in a particular year. I guess we will cross that bridge when/if needed, but I would imagine we will need to implement methods for getting all the data corresponding to only one specific key of the tupple-key.

We can still do that if we want, though it will require iterating over the whole map, so it won't be as fast. Hopefully this data structure will do us for the time being, though 🤞

src/commodity.rs

dalonsoa · 2024-11-26T11:17:59Z

src/time_slice.rs

    }
+
+    /// Iterate over the subset of [`TimeSliceID`] indicated by `selection`
+    pub fn iter_selection<'a>(


In the output iterator there's a copy of the data, right? Why do we need to copy it and not just pass a reference to the original or something like that?

As it's Rc, the "copying" only consists of increasing a reference count, so it's not too expensive, but I think you're right that it's not necessary. I'll change it to return references.

alexdewar · 2024-11-27T09:20:43Z

The main issue is that MUSE also calculates commodity prices endogenously, so prices that you put in the input files may end up being overwritten. This depends on the commodity. You can see here for a discussion of how this works in MUSE1, but briefly:

Non-environmental commodities that are outputted by a process/processes will only use input prices for the first year, and any future prices will be recalculated endogenously

Commodities that are not outputted by any process will not be recalculated (will use input data for the full simulation)

If a carbon budget is being used, input CO2 prices will not be used at all, otherwise input prices will be used for all years

For now it's probably ok to mandate that there's data for all commodities in all years. But this will lead to a frustrating situation (like in MUSE1) where users are forced to input data that they know is going to be overwritten. Long run, we're probably going to have to do a check for each commodity to determine whether data is needed for all years or just the first year (or not at all?). We could also give users some control over this, i.e. user's can specify which commodities they want to use endogenous prices for (if possible) and which they want to use input prices for.

Ah, that's interesting to know. I think in the case of input data that will definitely be overwritten, we'd actually want to forbid users from supplying it or at least issue a warning if they do. Let's leave it as is for now though. I'll open an issue for removing this restriction, once we are able to do this validation.

time_slice_level is the timeframe over which supply/demand are balanced for a commodity (e.g. if "annual", then supply/demand of a commodity must be balanced over the whole year, but not necessarily in each timeslice). I haven't quite figured out how this is going to be implemented in MUSE2, but I'm in the process of adding a similar feature to MUSE1 (see here). In this case it's a sector-level parameter, so it works by compressing data to the appropriate timeslice level (summing demand, averaging prices) and passing it to the sector to try to meet those demands. In MUSE2 we don't have sectors and the whole system is solved together, so the implementation will be very different (I imagine this will all be built into the constraints?).

I don't think it's relevant here, but I imagine at some point we might want to group commodities according to their timeslice level (e.g. selecting all commodities with a level of "annual"), and gather their prices accordingly.

Good to know! Thanks.

dalonsoa

LGTM!

AdrianDAlessandro

This is a big PR... I've just spend an hour going through it and realised I'm not even a quart of the way through the code changes. Might need to go through it in person to understand what it's doing

src/commodity.rs

…st-slices

This makes it consistent with the other error handling in this file.

AdrianDAlessandro

Happy that this seems sensible

AdrianDAlessandro · 2024-11-28T16:28:26Z

src/commodity.rs

+            };
+
+            ensure!(
+                map.0.insert(key, value).is_none(),


To confirm: The reason this works is .insert will return something if there was already a value at key?

alexdewar · 2024-11-28T16:40:12Z

Thanks for reviews!

alexdewar added 6 commits November 22, 2024 16:42

Use anyhow crate for commodity.rs

aeaac30

Disallow non-milestone years for commodity cost entries

96fd1be

Add method to iterate over time slices in selection

8652f9c

Return commodity costs as map

f946da8

Closes #153.

Add test that non-milestone years fail

7559188

Require that all milestone years are covered for each commodity + reg…

3c026d4

…ion combo

alexdewar requested review from AdrianDAlessandro, HarmonicReflux, Sahil590, TinyMarsh, dalonsoa and tsmbland November 22, 2024 21:54

Replace use of todo! with regular errors

399f653

I don't think panicking in these cases really adds anything.

alexdewar marked this pull request as ready for review November 25, 2024 09:49

dalonsoa requested changes Nov 26, 2024

View reviewed changes

alexdewar added 2 commits November 26, 2024 15:14

Move CommodityCostMap's impl block to be beside its definition

2ae199d

Add method to iterate over all time slices + return by reference

f2939b2

alexdewar requested a review from dalonsoa November 26, 2024 15:26

dalonsoa approved these changes Nov 27, 2024

View reviewed changes

AdrianDAlessandro reviewed Nov 27, 2024

View reviewed changes

src/commodity.rs Show resolved Hide resolved

src/commodity.rs Show resolved Hide resolved

src/commodity.rs Show resolved Hide resolved

src/commodity.rs Show resolved Hide resolved

AdrianDAlessandro linked an issue Nov 28, 2024 that may be closed by this pull request

Use anyhow::Result for commodity.rs #252

Closed

alexdewar added 3 commits November 28, 2024 15:19

Merge remote-tracking branch 'origin/main' into refactor-commodity-co…

182d6d6

…st-slices

Return Result from read_commodity_costs too

812f28e

This makes it consistent with the other error handling in this file.

Fix warning about elided lifetime

f6de922

AdrianDAlessandro approved these changes Nov 28, 2024

View reviewed changes

alexdewar merged commit 2276dc8 into main Nov 28, 2024
7 checks passed

alexdewar deleted the refactor-commodity-cost-slices branch November 28, 2024 16:40

Refactor commodity costs for easy lookup #243

Refactor commodity costs for easy lookup #243

Uh oh!

Conversation

alexdewar commented Nov 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Key checklist

Further checks

Uh oh!

codecov bot commented Nov 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tsmbland commented Nov 26, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dalonsoa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dalonsoa Nov 26, 2024

Choose a reason for hiding this comment

Uh oh!

alexdewar Nov 26, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dalonsoa Nov 26, 2024

Choose a reason for hiding this comment

Uh oh!

alexdewar Nov 26, 2024

Choose a reason for hiding this comment

Uh oh!

alexdewar commented Nov 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dalonsoa left a comment

Choose a reason for hiding this comment

Uh oh!

AdrianDAlessandro left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AdrianDAlessandro left a comment

Choose a reason for hiding this comment

Uh oh!

AdrianDAlessandro Nov 28, 2024

Choose a reason for hiding this comment

Uh oh!

alexdewar Nov 28, 2024

Choose a reason for hiding this comment

Uh oh!

alexdewar commented Nov 28, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

alexdewar commented Nov 22, 2024 •

edited

Loading

codecov bot commented Nov 22, 2024 •

edited

Loading

tsmbland commented Nov 26, 2024 •

edited

Loading

alexdewar commented Nov 27, 2024 •

edited

Loading