-
Notifications
You must be signed in to change notification settings - Fork 15
Fix ssp correction #69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ssp correction #69
Conversation
Note that I won't be in the office tomorrow. We can continue on Thursday or Friday. |
146d642
to
8d6f0e6
Compare
Dear Kris, thanks for the PR. There were indeed many conflicts (my suggestion to start from the previous branch was bad). I can now check the PR contents... 😉 |
I added a couple of commits to move common interpolation code to a function and to remove the unnecessary |
I force-pulled the branch. |
There is still a problem in my tests when computing spectral weights from S/N: signal spectrum and noise spectrum sometimes do not have the same length. Today's is holiday in France and tomorrow I'll take the day off. |
On my side it still works. |
I think the problem is here:
This assumption is no longer valid if the frequency range A possible solution would be to pass the frequency range as an extra optional argument when calling So something like: specnoise = _build_spectrum(config, trace_noise, interpolate_freqs=spec.freqs) |
OK, I will have a look as well. Enjoy your long weekend! |
I understand the problem now. I think it would be better not to change the frequency range of the signal spectrum, and set NaN values outside the frequency range of the residual for the linear-spaced spectrum as well, thus also keeping linear-spaced and log-spaced versions in sync. We then need to check if the NaN values do not give any problem when calculating spectral SNR. I will improve the |
Claudio, I changed the station correction function in a way that the original frequency range is not modified. Can you check if this solves your problem with noise weighting? |
Hi Kris, thanks for this additional work. I had to make some further modifications:
Now it seems to work, but I need to test it more extensively. Could you also check on your side? |
OK |
One more commit related to NaN data 😉 |
My results look identical. |
Great!
Ok, so maybe I can use a similar approach for weight computation, instead of replacing NaN with 1e-9... |
I did it like this:
This is possible because we know the NaN values only occur at the start and/or end. The interpolated spectrum will also be NaN where the original one is NaN, because we don't allow extrapolation. |
…quency range before correcting in station_correction function. Note that the logspaced version of the corrected spectrum is not truncated, but filled with NaN values outside the common frequency range.
…ight before fitting in _spec_inversion function.
c7cdfeb
to
e818baa
Compare
Hi Kris, I just force-pushed to this branch a commit that moves all the interpolation into the I still have to rework the weighting part (where I replaced In particular, I reimplemented your search and removal of Oh, and if you have better names for the new methods in the Looking forward to your feedback 😉 P.S. You must force-pull since I rebased this PR over a few additional commits added to |
Claudio, I ran my test case again after force-pulling the branch. The results (in terms of source parameters) are practically the same. However, there again appear some edge effects that were mostly gone in the previous version. I also noticed that the frequency ranges of the corrected spectra reported in the log file are not the same: Previous run:
New run:
Any idea what could be causing this? |
Thanks, Kris for the test. The frequency ranges are actually the same, when you sort the two files 😉 I need to think on why there is again this edge case on BE.MOLT... |
Oops, I didn't see that... |
Note that it's not just BE.MOLT. Also BE.MOL1, and a number of other stations (I did not include all the plots). |
Probably the best would be if you can send me a working example |
Claudio, Sorry, I have been very busy the past few weeks. I had another look at the issue, and I think the edge effects are still due to extrapolation. We can discuss next week if this solution is OK for you or if we can further improve it. |
…f Spectrum class.
…inear frequency range in make_freq_logspaced method of Spectrum.
08fe6ef
to
0ad5908
Compare
Yes! (I just force-pushed again, but just to fix a typo in a commit message 😉 ) |
OK, I will update and test. |
I force-pulled and ran again. |
That's surprising! Let's continue tomorrow, if that's ok for you |
OK! |
Here are the different outputs: Before force-push:
After force-push:
|
Hi Kris, From your outputs, I could see that the "pre force-push" version was So I made a diff with that version through
Anyway, this is the ouptut of the diff diff --git a/CHANGELOG.md b/CHANGELOG.md
index d412158..8a43c5d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -25,7 +25,7 @@ Copyright (c) 2011-2025 Claudio Satriano <[email protected]>
trace amplitude below which the trace is not checked for clipping
- Use spectral interpolation to compute and apply station residuals
- Limit spectrum and residual to common frequency range when applying
- correction before fitting
+ correction before fitting (see [#69])
### Post-Inversion
@@ -828,3 +828,4 @@ Initial Python port.
[#49]: https://github.com/SeismicSource/sourcespec/issues/49
[#67]: https://github.com/SeismicSource/sourcespec/issues/67
[#68]: https://github.com/SeismicSource/sourcespec/issues/68
+[#69]: https://github.com/SeismicSource/sourcespec/issues/69
diff --git a/sourcespec/spectrum.py b/sourcespec/spectrum.py
index e145a70..e9f9df9 100644
--- a/sourcespec/spectrum.py
+++ b/sourcespec/spectrum.py
@@ -112,7 +112,7 @@ def _find_nan_edges(data):
def _interpolate_data_to_new_freq(
- data, freq, new_freq, fill_value='extrapolate', fix_negative=True):
+ data, freq, new_freq, fill_value=np.nan, fix_negative=True):
"""
Helper function to interpolate data to a new frequency range.
@@ -120,6 +120,7 @@ def _interpolate_data_to_new_freq(
:param freq: The original frequency range.
:param new_freq: The new frequency range.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency range.
See scipy.interpolate.interp1d for more details.
:param fix_negative: If True, negative values are replaced by the
@@ -393,6 +394,11 @@ class Spectrum():
n = np.ceil((log_freq[-1] - log_freq[0]) / log_df)
freq_logspaced =\
np.logspace(log_freq[0], log_freq[-1], int(n)+1)
+ # Make sure first and last frequencies match exactly between
+ # logspaced and linear frequency axes (since the code above might have
+ # numerical errors)
+ freq_logspaced[0] = self.freq[0]
+ freq_logspaced[-1] = self.freq[-1]
# If logspaced frequencies already exist,
# reinterpolate the data to the new logspaced frequencies
if (
@@ -432,34 +438,15 @@ class Spectrum():
raise ValueError('Data axis is empty')
if self.freq_logspaced.size == 0:
raise ValueError('Logspaced frequency axis is empty')
- # Find NaN indexes at the beginning and end of the data
+ # Find and remove NaN values at the beginning and end of the data
nan_idxs = _find_nan_edges(data)
- # If there are NaN values at the beginning or end,
- # do not use extrapolation
- fill_value = np.nan if nan_idxs.size else 'extrapolate'
data = np.delete(data, nan_idxs)
freq = np.delete(self.freq, nan_idxs)
# Reinterpolate data using log10 frequencies
data_logspaced = _interpolate_data_to_new_freq(
data, freq, self.freq_logspaced,
- fill_value=fill_value, fix_negative=fix_negative
+ fix_negative=fix_negative
)
- # If extrapolation was applied, mask (set to NaN) the values that lie
- # outside the original frequency range to avoid relying on unreliable
- # extrapolated data.
- if fill_value == 'extrapolate':
- # Spacing between the first two log-spaced frequencies.
- # Used as a margin to identify values that fall below
- # the valid range.
- lower_spacing = self.freq_logspaced[1] - self.freq_logspaced[0]
- data_logspaced[
- self.freq_logspaced < freq[0] - lower_spacing] = np.nan
- # Spacing between the last two log-spaced frequencies.
- # Used as a margin to identify values that fall above
- # the valid range.
- upper_spacing = self.freq_logspaced[-1] - self.freq_logspaced[-2]
- data_logspaced[
- self.freq_logspaced > freq[-1] + upper_spacing] = np.nan
if which == 'data':
self.data_logspaced = data_logspaced
else:
@@ -506,12 +493,13 @@ class Spectrum():
else:
self.data_mag = data
- def interp_data_to_new_freq(self, new_freq, fill_value='extrapolate'):
+ def interp_data_to_new_freq(self, new_freq, fill_value=np.nan):
"""
Interpolate the linear data to a new frequency axis.
:param new_freq: The new frequency axis.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency
range. See scipy.interpolate.interp1d for more details.
"""
@@ -532,13 +520,13 @@ class Spectrum():
)
self.freq = new_freq
- def interp_data_logspaced_to_new_freq(
- self, new_freq, fill_value='extrapolate'):
+ def interp_data_logspaced_to_new_freq(self, new_freq, fill_value=np.nan):
"""
Interpolate the logspaced data to a new frequency axis.
:param new_freq: The new frequency axis.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency
range. See scipy.interpolate.interp1d for more details.
""" The main changes are:
This later modification affects all the calls to However, those changes were already in the latest version before the force-push. So I think that your reference run was an older one. Do you think that the new solution, even if different, remains acceptable? I think we should accept that this new code produces slightly different spectra and that the solutions might be slightly (hopefully) different. Let me know your thoughts |
In fact, you can make a precise diff between your two versions,
The result is basically the same as above (without the change in CHANGELOG): diff --git a/sourcespec/spectrum.py b/sourcespec/spectrum.py
index e145a70..e9f9df9 100644
--- a/sourcespec/spectrum.py
+++ b/sourcespec/spectrum.py
@@ -112,7 +112,7 @@ def _find_nan_edges(data):
def _interpolate_data_to_new_freq(
- data, freq, new_freq, fill_value='extrapolate', fix_negative=True):
+ data, freq, new_freq, fill_value=np.nan, fix_negative=True):
"""
Helper function to interpolate data to a new frequency range.
@@ -120,6 +120,7 @@ def _interpolate_data_to_new_freq(
:param freq: The original frequency range.
:param new_freq: The new frequency range.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency range.
See scipy.interpolate.interp1d for more details.
:param fix_negative: If True, negative values are replaced by the
@@ -393,6 +394,11 @@ class Spectrum():
n = np.ceil((log_freq[-1] - log_freq[0]) / log_df)
freq_logspaced =\
np.logspace(log_freq[0], log_freq[-1], int(n)+1)
+ # Make sure first and last frequencies match exactly between
+ # logspaced and linear frequency axes (since the code above might have
+ # numerical errors)
+ freq_logspaced[0] = self.freq[0]
+ freq_logspaced[-1] = self.freq[-1]
# If logspaced frequencies already exist,
# reinterpolate the data to the new logspaced frequencies
if (
@@ -432,34 +438,15 @@ class Spectrum():
raise ValueError('Data axis is empty')
if self.freq_logspaced.size == 0:
raise ValueError('Logspaced frequency axis is empty')
- # Find NaN indexes at the beginning and end of the data
+ # Find and remove NaN values at the beginning and end of the data
nan_idxs = _find_nan_edges(data)
- # If there are NaN values at the beginning or end,
- # do not use extrapolation
- fill_value = np.nan if nan_idxs.size else 'extrapolate'
data = np.delete(data, nan_idxs)
freq = np.delete(self.freq, nan_idxs)
# Reinterpolate data using log10 frequencies
data_logspaced = _interpolate_data_to_new_freq(
data, freq, self.freq_logspaced,
- fill_value=fill_value, fix_negative=fix_negative
+ fix_negative=fix_negative
)
- # If extrapolation was applied, mask (set to NaN) the values that lie
- # outside the original frequency range to avoid relying on unreliable
- # extrapolated data.
- if fill_value == 'extrapolate':
- # Spacing between the first two log-spaced frequencies.
- # Used as a margin to identify values that fall below
- # the valid range.
- lower_spacing = self.freq_logspaced[1] - self.freq_logspaced[0]
- data_logspaced[
- self.freq_logspaced < freq[0] - lower_spacing] = np.nan
- # Spacing between the last two log-spaced frequencies.
- # Used as a margin to identify values that fall above
- # the valid range.
- upper_spacing = self.freq_logspaced[-1] - self.freq_logspaced[-2]
- data_logspaced[
- self.freq_logspaced > freq[-1] + upper_spacing] = np.nan
if which == 'data':
self.data_logspaced = data_logspaced
else:
@@ -506,12 +493,13 @@ class Spectrum():
else:
self.data_mag = data
- def interp_data_to_new_freq(self, new_freq, fill_value='extrapolate'):
+ def interp_data_to_new_freq(self, new_freq, fill_value=np.nan):
"""
Interpolate the linear data to a new frequency axis.
:param new_freq: The new frequency axis.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency
range. See scipy.interpolate.interp1d for more details.
"""
@@ -532,13 +520,13 @@ class Spectrum():
)
self.freq = new_freq
- def interp_data_logspaced_to_new_freq(
- self, new_freq, fill_value='extrapolate'):
+ def interp_data_logspaced_to_new_freq(self, new_freq, fill_value=np.nan):
"""
Interpolate the logspaced data to a new frequency axis.
:param new_freq: The new frequency axis.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency
range. See scipy.interpolate.interp1d for more details.
""" |
Digging deeper, you made your reference test before integrating commit You can see it locally (it is no more on GitHub) through: commit fa0959a09488f9b0af73de47c310e0a622a515f6
Author: Claudio Satriano <[email protected]>
Date: Wed Aug 20 17:47:17 2025 +0200
Use np.nan as default fill_value in Spectrum interpolation methods
diff --git a/sourcespec/spectrum.py b/sourcespec/spectrum.py
index e145a70..e9f9df9 100644
--- a/sourcespec/spectrum.py
+++ b/sourcespec/spectrum.py
@@ -112,7 +112,7 @@ def _find_nan_edges(data):
def _interpolate_data_to_new_freq(
- data, freq, new_freq, fill_value='extrapolate', fix_negative=True):
+ data, freq, new_freq, fill_value=np.nan, fix_negative=True):
"""
Helper function to interpolate data to a new frequency range.
@@ -120,6 +120,7 @@ def _interpolate_data_to_new_freq(
:param freq: The original frequency range.
:param new_freq: The new frequency range.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency range.
See scipy.interpolate.interp1d for more details.
:param fix_negative: If True, negative values are replaced by the
@@ -393,6 +394,11 @@ class Spectrum():
n = np.ceil((log_freq[-1] - log_freq[0]) / log_df)
freq_logspaced =\
np.logspace(log_freq[0], log_freq[-1], int(n)+1)
+ # Make sure first and last frequencies match exactly between
+ # logspaced and linear frequency axes (since the code above might have
+ # numerical errors)
+ freq_logspaced[0] = self.freq[0]
+ freq_logspaced[-1] = self.freq[-1]
# If logspaced frequencies already exist,
# reinterpolate the data to the new logspaced frequencies
if (
@@ -432,34 +438,15 @@ class Spectrum():
raise ValueError('Data axis is empty')
if self.freq_logspaced.size == 0:
raise ValueError('Logspaced frequency axis is empty')
- # Find NaN indexes at the beginning and end of the data
+ # Find and remove NaN values at the beginning and end of the data
nan_idxs = _find_nan_edges(data)
- # If there are NaN values at the beginning or end,
- # do not use extrapolation
- fill_value = np.nan if nan_idxs.size else 'extrapolate'
data = np.delete(data, nan_idxs)
freq = np.delete(self.freq, nan_idxs)
# Reinterpolate data using log10 frequencies
data_logspaced = _interpolate_data_to_new_freq(
data, freq, self.freq_logspaced,
- fill_value=fill_value, fix_negative=fix_negative
+ fix_negative=fix_negative
)
- # If extrapolation was applied, mask (set to NaN) the values that lie
- # outside the original frequency range to avoid relying on unreliable
- # extrapolated data.
- if fill_value == 'extrapolate':
- # Spacing between the first two log-spaced frequencies.
- # Used as a margin to identify values that fall below
- # the valid range.
- lower_spacing = self.freq_logspaced[1] - self.freq_logspaced[0]
- data_logspaced[
- self.freq_logspaced < freq[0] - lower_spacing] = np.nan
- # Spacing between the last two log-spaced frequencies.
- # Used as a margin to identify values that fall above
- # the valid range.
- upper_spacing = self.freq_logspaced[-1] - self.freq_logspaced[-2]
- data_logspaced[
- self.freq_logspaced > freq[-1] + upper_spacing] = np.nan
if which == 'data':
self.data_logspaced = data_logspaced
else:
@@ -506,12 +493,13 @@ class Spectrum():
else:
self.data_mag = data
- def interp_data_to_new_freq(self, new_freq, fill_value='extrapolate'):
+ def interp_data_to_new_freq(self, new_freq, fill_value=np.nan):
"""
Interpolate the linear data to a new frequency axis.
:param new_freq: The new frequency axis.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency
range. See scipy.interpolate.interp1d for more details.
"""
@@ -532,13 +520,13 @@ class Spectrum():
)
self.freq = new_freq
- def interp_data_logspaced_to_new_freq(
- self, new_freq, fill_value='extrapolate'):
+ def interp_data_logspaced_to_new_freq(self, new_freq, fill_value=np.nan):
"""
Interpolate the logspaced data to a new frequency axis.
:param new_freq: The new frequency axis.
:param fill_value: The value to use for extrapolation.
+ Default is np.nan.
If 'extrapolate', the data is extrapolated to the new frequency
range. See scipy.interpolate.interp1d for more details.
""" |
I'll have a look. |
OK, I agree this is the likely reason. I manually set the fill value to NaN, as I didn't see you had a commit for it. So it was only used in this particular call, while the general interpolation function was still using extrapolation as the default. The differences are minimal in terms of magnitude, but more pronounced for corner frequency (and related parameters), which does raise a question how large the uncertainties are for these parameters. |
Yes, I agree that the uncertainty of single-station estimates of fc and t* is often underestimated, as these values can vary significantly with small changes in the spectral shape or with the choice of inversion method. On the other hand, using multiple stations largely mitigates this issue, since inter-station variability is much greater than within-station uncertainty. This is supported by the fact that your summary values for fc and t* are consistent within their respective errors: Before:
After:
|
Claudio, |
If you're curious, you can try using grid search. You should get much larger single-station uncertainties |
So, are you OK to merge this PR? |
Yes, let's finalize this one. |
Merged! That was a long ride! Thank you so much Kris for all this work 🙏 |
Yes, that was a long one. Glad we solved it! |
Yes! I'm currently directly pushing to Then, I'll:
|
Yes, I will attend the IASPEI meeting in Lisbon: https://c-in.floq.live/event/iagaiaspei-2025/search?objectClass=timeslot&objectId=685539310879475d4c6a57c6&type=detail A pity you are not attending. I will chat with Aurore then. |
Claudio,
Here is my pull request to fix the interpolation issue when residual correction is applied. As you suggested, I created the branch from the
fix_source_residuals
branch, but now it says that it can't automatically merge... Let me know if I need to do it differently.I ran the fix with my test case and it appears to work, but you should be aware that in the spectrum returned by the
station_correction
function, the linear-spaced and log-spaced versions may not have the same length and frequency range. The linear one is truncated to the common frequency range with the station residual, while the logarithmic one keeps the original length, with values outside the common frequency range set to NaN. I'm not sure if this could cause any problems further in the code, but I did not notice anything.