Query to slot routines #108

stemangiola · 2025-08-31T02:24:02Z

No description provided.

…d refactor update_SE_from_tibble to utilize these new functions. Include tests for the new extraction functions to ensure correct functionality.

…ate_features and mutate_samples to modify_features and modify_samples, respectively. Introduce a new helper function, mutate_assay, for handling assay-only mutations. Add query scope analysis functions to determine the target data types for mutate operations. Remove deprecated documentation for the old mutate functions.

Enhance the analyze_query_scope_mutate function to include a new dependency analysis step that detects mixed operations in mutate expressions. Introduce analyze_expression_dependencies function to evaluate the scope of variables referenced in expressions, determining if they span multiple data types (assays, colData, rowData). This addition improves the accuracy of query scope analysis by allowing for more nuanced handling of mixed data types.

Enhance the mutate.SummarizedExperiment function to support mixed scope operations by introducing a new helper function, modify_se_plyxp, which converts expressions to plyxp form for efficient execution. Add a utility function, convert_to_plyxp_form, to handle the conversion of colData and rowData references. This update improves the flexibility and performance of mutate operations involving multiple data types.

Revise the mutate.SummarizedExperiment function to improve scope handling by consolidating logic into a single case_when structure. Introduce a new internal helper function, mutate_via_tibble, for efficient tibble conversion during mutations. This update enhances code clarity and maintains functionality across various data types.

…ures Introduce a new function, decompose_tidy_operation, to handle the decomposition of dplyr operations into individual steps for better analysis. Update the mutate.SummarizedExperiment function to utilize this new decomposition approach, improving the handling of mixed scope operations. Additionally, add a new utility function, map_to_plyxp, to convert mixed scope expressions into plyxp form for efficient execution. This update enhances the flexibility and performance of mutate operations across various data types.

Introduce a new test file for the mutate function, covering a wide range of scenarios including simple and complex operations on colData, rowData, and assays. The tests validate functionality, scope detection, error handling, and integration with other dplyr operations, ensuring robust performance and accuracy in various use cases.

Update the mutate.SummarizedExperiment function to store the latest mutate scope report in the metadata of the result. This addition facilitates testing and introspection of the mutate operations, improving the overall functionality and usability of the mutate feature.

Introduce a new vignette titled "Speed-up `tidySummarizedExperiment` with `plyxp`" that benchmarks various `mutate()` scenarios across two branches of the repository. The vignette outlines the motivation, design principles, and setup for running benchmarks, including helper functions and visualization of results. This addition supports ongoing efforts to enhance the performance of `mutate()` operations in the `tidySummarizedExperiment` package.

Refactor the filter.SummarizedExperiment function to support decomposition of filter queries and analyze the scope of filtering operations. Introduce a new internal function, analyze_query_scope_filter, to determine whether filters target colData, rowData, or assays. Update the modify_se_plyxp function to handle additional arguments, improving flexibility. Additionally, add comprehensive tests to validate the new features and ensure correct scope reporting in metadata.

…marizedExperiment Introduce the analyze_query_scope_filter function to assess the scope of filter queries, determining whether they target colData, rowData, or assays. Integrate this analysis into the filter.SummarizedExperiment function to enhance its capability in handling mixed operations. Additionally, remove the previous implementation of analyze_query_scope_filter from query_scope_analysis.R to streamline the codebase. Update tests and vignettes to reflect these changes and ensure comprehensive coverage of new filtering functionalities.

Remove deprecated filter and join methods from dplyr_methods.R, streamlining the codebase. Introduce a new rename.SummarizedExperiment function in rename.R, enhancing the ability to rename columns in colData and rowData while ensuring proper handling of special/view-only columns. This update improves the overall functionality and usability of the package.

…ty functions Remove deprecated join methods from dplyr_methods.R to streamline the codebase. Introduce a new join utility function, join_efficient_for_SE, in join.R to handle various join operations efficiently based on the scope of join keys. Update tests to validate the functionality of left_join, inner_join, right_join, and full_join for SummarizedExperiment, ensuring comprehensive coverage and improved performance.

Update the join_efficient_for_SE function in join.R to improve clarity and efficiency in join operations. Modify comments for better understanding and remove unnecessary lines. Enhance test cases for left_join, inner_join, right_join, and full_join to ensure accurate metadata reporting of join scopes. Clean up unused code in test files to streamline the testing process.

Introduce a new select.SummarizedExperiment function that allows users to select columns from colData, rowData, and assays while analyzing the scope of the selection. Implement an internal function, analyze_query_scope_select, to determine the domains affected by the selection. Add comprehensive tests to validate the new functionality and ensure correct metadata reporting. Update the benchmark vignette to include select scenarios, enhancing the overall usability and performance of the package.

… comments from dplyr_methods.R to streamline the codebase. This change enhances clarity and prepares for future improvements in column selection functionality.

Introduce a parameter to enable caching of benchmark results, improving efficiency by loading existing results from a file if available. This change enhances the performance of the benchmarking process and reduces redundant computations.

Copilot

Pull Request Overview

This PR introduces "Query to slot routines" which refactors tidySummarizedExperiment's dplyr operations to use scope analysis for improved performance. The changes decompose multi-expression operations, analyze query scope to determine target data types (colData, rowData, assays), and implement optimized execution paths including plyxp-backed mixed operations.

Query scope analysis system to classify operations by target data type
Decomposition of multi-expression operations for better routing
Optimized execution paths with plyxp support for mixed-scope operations

Reviewed Changes

Copilot reviewed 34 out of 35 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
vignettes/introduction.Rmd	Add dplyr dependency
vignettes/benchmark-scoping.Rmd	Comprehensive benchmarking suite for new implementation
tests/testthat/test-*.R	New test suites for select, mutate, join, filter, and helper functions
R/*.R	Major refactoring: separate files for each operation with scope analysis
man/*.Rd	Updated documentation reflecting new file organization
DESCRIPTION	Add plyxp dependency

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-09-01T09:22:11Z

R/utilities.R

  # Comply to CRAN notes 
  . <- NULL 

-  # Get the colnames of samples and feature datasets
-  colnames_col <- 
-    colnames(colData(se)) %>% 
-    c(s_(se)$name) %>%
-
-    # Forcefully add the column I know the source. This is useful in nesting 
-    # where a unique value cannot be linked to sample or feature
-    c(names(column_belonging[column_belonging == s_(se)$name]))
-
-  colnames_row <- se %>%
-    when(
-      .hasSlot(., "rowData") | .hasSlot(., "elementMetadata") ~ colnames(rowData(.)), 
-      TRUE ~ c()
-    ) %>% 
-    c(f_(se)$name) %>%
-
-    # Forcefully add the column I know the source. This is useful in nesting 
-    # where a unique value cannot be linked to sample or feature
-    c(names(column_belonging[column_belonging == f_(se)$name]))
-
-  special_columns <- get_special_columns(
-    # Decrease the size of the dataset
-    se[1:min(100, nrow(se)), min(1, ncol(se)):min(20, ncol(se))]
-  ) 
-
-  # This is used if I have one column with one value that can be mapped to rows and columns
-  new_colnames_col = c()
-
-  # This condiction is because if I don't have any samples, the new column 
-  # could be mapped to samples and return NA in the final SE
-  if(ncol(se) > 0)
-    new_colnames_col <- 
-    .data_mutated %>%
-    select_if(!colnames(.) %in% setdiff(colnames_col, s_(se)$name)) %>% 
-
-    # Eliminate special columns that are read only. Assays
-    select_if(!colnames(.) %in% special_columns) %>%
-    select_if(!colnames(.) %in% colnames_row) %>%
-    # Replace for subset
-    select(!!s_(se)$symbol, get_subset_columns(., !!s_(se)$symbol)) %>% 
-    colnames()
-
-  col_data <-
-    .data_mutated %>%
-
-    select(c(colnames_col, new_colnames_col)) %>%
-
-
-    # Filter Sample is NA from SE that have 0 samples
-    filter(!is.na(!!s_(se)$symbol)) 
-
-  # This works even if I have 0 samples
-  duplicated_samples = col_data |> pull(!!s_(se)$symbol) %>% duplicated() 
-  if(duplicated_samples |> which() |>  length() > 0)
-    # Make fast distinct()
-    col_data = col_data |> filter(!duplicated_samples)
-
-  col_data = 
-    col_data |> 
-
-    # In case unitary SE subset does not work
-    data.frame(row.names = pull(col_data, !!s_(se)$symbol), check.names = FALSE) %>%
-    select(-!!s_(se)$symbol) %>%
-    DataFrame(check.names = FALSE)
+  # Extract column data
+  col_data <- extract_col_data(.data_mutated, se, column_belonging)


[nitpick] The comment 'Comply to CRAN notes' followed by '. <- NULL' is repeated throughout the codebase. Consider creating a utility function or documenting this pattern more clearly, as the purpose isn't immediately obvious.

Copilot · 2025-09-01T09:22:12Z

R/mutate.R

+mutate_assay <- function(.data, ...) {
+    # Get the mutation expressions
+    dots <- enquos(...)
+
+    # For each new assay column, add it directly to assays
+    for (i in seq_along(dots)) {
+        new_assay_name <- names(dots)[i]
+        expr <- dots[[i]]
+
+        # Evaluate the expression to create the new assay
+        # This assumes the expression references existing assay names
+        new_assay_data <- rlang::eval_tidy(expr, data = as.list(assays(.data)))
+
+        # Add the new assay
+        assays(.data)[[new_assay_name]] <- new_assay_data
+    }
+
+    return(.data)
+}


The function assumes expression references existing assay names but doesn't validate this assumption. If the expression references non-existent assays, eval_tidy will fail with unclear error messages.

Copilot · 2025-09-01T09:22:12Z

R/link_query_slot.R

+extract_col_data <- function(.data_mutated, se, column_belonging = NULL, 
+                             colnames_col = NULL, special_columns = NULL, 
+                             colnames_row = NULL) {


[nitpick] The function has many optional parameters with defaults computed inside the function. Consider using a more explicit parameter design or providing clearer documentation about when each parameter should be provided.

Copilot · 2025-09-01T09:22:12Z

tests/testthat/test-link_query_slot.R

+test_that("extract_col_data works correctly", {
+    # Use testthat edition 3
+    local_edition(3)


[nitpick] Setting testthat edition should be done at the package level (e.g., in testthat.R) rather than in individual test files to ensure consistency across all tests.

Copilot · 2025-09-01T09:22:13Z

R/decompose_tidy_operations.R

+#' @examples
+#' \dontrun{
+#' library(dplyr)
+#' library(airway)
+#' data(airway)
+#' 
+#' # Basic mutate decomposition
+#' mutate_fn <- decompose_tidy_operation("mutate", new_dex = dex, new_cell = cell)
+#' attr(mutate_fn, "pipeline_text")
+#' # "mutate(new_dex = dex) |> mutate(new_cell = cell)"


[nitpick] The examples use \dontrun{} but these appear to be valid executable examples that would help users understand the functionality. Consider using \donttest{} instead if the concern is execution time.

Copilot · 2025-09-01T09:22:13Z

dev/benchmark-scoping.Rmd

+### Visualise with ggplot boxplots
+
+We cap at 1 second to avoid 


The sentence 'We cap at 1 second to avoid' is incomplete and should be finished or removed.

Suggested change

### Visualise with ggplot boxplots

We cap at 1 second to avoid

We cap at 1 second to avoid skewing the plot with extreme outliers and to improve readability.

…in introduction vignette Replace devtools with remotes for GitHub installation and specify dplyr::slice for clarity in the examples. This enhances the documentation for better user guidance.

…s domain

…ng dplyr::rename for improved clarity in examples.

…eamline codebase and improve maintainability.

stemangiola · 2025-10-02T07:53:56Z

@jtlandis @mikelove if you have throughput to test the new tidySE before I merge this branch will be amazing.

stemangiola added 3 commits August 30, 2025 18:33

Add functions to extract column and row data from mutated tibbles, an…

62b5917

…d refactor update_SE_from_tibble to utilize these new functions. Include tests for the new extraction functions to ensure correct functionality.

adapt rename

7a2a77c

stemangiola mentioned this pull request Aug 31, 2025

Attempt using plyxp for some cases in tidySummarizedExperiment tidyomics/genomics-todos#19

Open

stemangiola added 8 commits August 31, 2025 12:33

version UP

b3cd45a

version UP

9bd3589

stemangiola linked an issue Aug 31, 2025 that may be closed by this pull request

plyxp backend #106

Open

stemangiola added 14 commits August 31, 2025 21:43

Remove filter and join methods from dplyr_methods.R

b7e5f7f

Rename scoping vignette

5a9de75

Remove deprecated select.SummarizedExperiment function and associated…

9ae788d

… comments from dplyr_methods.R to streamline the codebase. This change enhances clarity and prepares for future improvements in column selection functionality.

Add left_join documentation and examples for SummarizedExperiment

94cdc85

fix CHECKS

9487269

Merge branch 'master' into query-to-slot-routines

ee7aacd

stemangiola requested review from Copilot and mikelove September 1, 2025 09:19

Copilot AI reviewed Sep 1, 2025

View reviewed changes

stemangiola added 8 commits October 2, 2025 13:57

Moving benchmark vignette to dev

5d4de8b

Update installation instructions and clarify usage of slice function …

b0b7bab

…in introduction vignette Replace devtools with remotes for GitHub installation and specify dplyr::slice for clarity in the examples. This enhances the documentation for better user guidance.

Solve error for count() in vignette

caa3fcb

Update website links in introduction vignette to reflect new tidyomic…

6b36fb7

…s domain

Clarify usage of rename function in introduction vignette by specifyi…

728ece1

…ng dplyr::rename for improved clarity in examples.

Add 'airway' to DESCRIPTION suggests for improved functionality

33d2504

Add NEWS

a10d0f0

Remove deprecated tests for old vocabulary from the test suite to str…

79dcfd9

…eamline codebase and improve maintainability.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Query to slot routines #108

Query to slot routines #108

Uh oh!

stemangiola commented Aug 31, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Sep 1, 2025

Uh oh!

Copilot AI Sep 1, 2025

Uh oh!

Copilot AI Sep 1, 2025

Uh oh!

Copilot AI Sep 1, 2025

Uh oh!

Copilot AI Sep 1, 2025

Uh oh!

Copilot AI Sep 1, 2025

Uh oh!

stemangiola commented Oct 2, 2025

Uh oh!

Uh oh!

		### Visualise with ggplot boxplots

		We cap at 1 second to avoid

Query to slot routines #108

Are you sure you want to change the base?

Query to slot routines #108

Uh oh!

Conversation

stemangiola commented Aug 31, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 1, 2025

Choose a reason for hiding this comment

Uh oh!

stemangiola commented Oct 2, 2025

Uh oh!

Uh oh!