Possible VRAM Control Issue #234

@alan-he-494165

Description

I was running the following code to optimise 500 Zn crystal structures:

# Imports added for completeness; the autobatching helpers are assumed to live in
# torch_sim.autobatching (consistent with the traceback below).
import torch
import torch_sim as ts
from torch_sim.autobatching import (
    InFlightAutoBatcher,
    calculate_memory_scaler,
    estimate_max_memory_scaler,
)

# compound_init, mace_model, and device are set up earlier in the notebook.
state = ts.initialize_state(compound_init, device=device, dtype=torch.float64)
state_list = state.split()

# Estimate the memory metric per structure and the largest total metric the model can handle.
memory_metric_values = [
    calculate_memory_scaler(s, memory_scales_with="n_atoms_x_density") for s in state_list
]
max_memory_metric = estimate_max_memory_scaler(
    mace_model, state_list, metric_values=memory_metric_values
)
print("Max memory metric", max_memory_metric)

batcher = InFlightAutoBatcher(
    mace_model,
    max_memory_padding=1,
    max_memory_scaler=max_memory_metric * 0.8,  # 20% safety margin
)

convergence_fn = ts.generate_force_convergence_fn(0.025)
relaxed_state = ts.optimize(
    system=state,
    model=mace_model,
    optimizer=ts.frechet_cell_fire,
    autobatcher=batcher,
    max_steps=1000,
    convergence_fn=convergence_fn,
)

However, the following error was observed:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 22
     15 batcher = InFlightAutoBatcher(
     16     mace_model,
     17     max_memory_padding=1,
     18     max_memory_scaler=max_memory_metric * 0.8
     19 )
     21 convergence_fn = ts.generate_force_convergence_fn(0.025)
---> 22 relaxed_state = ts.optimize(
     23     system=state,
     24     model=mace_model,
     25     optimizer=ts.frechet_cell_fire,
     26     autobatcher=batcher,
     27     max_steps=1000,
     28     convergence_fn=convergence_fn,
     29 )
     30 # extract the final energy from the trajectory file
     31 print(relaxed_state.energy)

File ~/miniconda3/envs/X/lib/python3.11/site-packages/torch_sim/runners.py:439, in optimize(system, model, optimizer, convergence_fn, trajectory_reporter, autobatcher, max_steps, steps_between_swaps, pbar, **optimizer_kwargs)
    436     pbar_kwargs.setdefault("disable", None)
    437     tqdm_pbar = tqdm(total=state.n_batches, **pbar_kwargs)
--> 439 while (result := autobatcher.next_batch(state, convergence_tensor))[0] is not None:
    440     state, converged_states, batch_indices = result
    441     all_converged_states.extend(converged_states)

File ~/miniconda3/envs/X/lib/python3.11/site-packages/torch_sim/autobatching.py:1062, in InFlightAutoBatcher.next_batch(self, updated_state, convergence_tensor)
   1060 if updated_state.n_batches > 0:
   1061     next_states = [updated_state, *next_states]
-> 1062 next_batch = concatenate_states(next_states)
   1064 if self.return_indices:
   1065     return next_batch, completed_states, self.current_idx

File ~/miniconda3/envs/X/lib/python3.11/site-packages/torch_sim/state.py:839, in concatenate_states(states, device)
    836 # Concatenate collected tensors
    837 for prop, tensors in per_atom_tensors.items():
    838     # if tensors:
--> 839     concatenated[prop] = torch.cat(tensors, dim=0)
    841 for prop, tensors in per_batch_tensors.items():
    842     # if tensors:
    843     concatenated[prop] = torch.cat(tensors, dim=0)

TypeError: expected Tensor as element 1 in argument 0, but got NoneType

This sort of TypeError seems to show up when CUDA memory is heavily used, and watching nvidia-smi during the run supported VRAM pressure as the likely cause.
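
For reference, a minimal way to check this from inside the process is to log the allocator state around the optimize call (a sketch; report_vram is just a hypothetical helper built on standard torch.cuda calls):

import torch

def report_vram(tag: str) -> None:
    # allocated = memory held by live tensors; reserved = what the caching
    # allocator has claimed from the driver (roughly what nvidia-smi shows).
    alloc = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f"[{tag}] allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB, peak={peak:.2f} GiB")

report_vram("before optimize")
# ... ts.optimize(...) ...
report_vram("after optimize")

If the reserved figure is close to the card's capacity right before the crash, that would support the memory-pressure explanation.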

In my own tests, once I understood the autobatching and wrote the batching and optimisation cycles out manually (as introduced in the tutorial), adding VRAM-cleaning commands between cycles and running the optimisation in a subprocess where needed, the problem went away at this calculation scale (see the sketch below). Interestingly, convergence also becomes easier for the smaller-scale calculations.
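
For concreteness, a sketch of the shape of that workaround (not the tutorial code verbatim): the 500 structures are processed in chunks, each chunk is optimised without the autobatcher, and the CUDA caching allocator is emptied between chunks. The chunk_size value, the gc/empty_cache calls, and collecting results in relaxed_chunks are my own choices; state_list, mace_model, and convergence_fn come from the snippet above, and concatenate_states is imported from torch_sim.state as seen in the traceback.

import gc

import torch
import torch_sim as ts
from torch_sim.state import concatenate_states

chunk_size = 50  # hypothetical; pick something that comfortably fits in VRAM
relaxed_chunks = []

for start in range(0, len(state_list), chunk_size):
    chunk = concatenate_states(state_list[start : start + chunk_size])
    relaxed = ts.optimize(
        system=chunk,
        model=mace_model,
        optimizer=ts.frechet_cell_fire,
        max_steps=1000,
        convergence_fn=convergence_fn,
    )
    relaxed_chunks.append(relaxed)
    # Drop live references and return cached blocks to the driver
    # before the next chunk is loaded.
    del chunk
    gc.collect()
    torch.cuda.empty_cache()

The subprocess variant goes one step further: each chunk is optimised in its own Python process, so all VRAM is returned to the driver when that process exits.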

Labels: bug (Something isn't working), geo-opt (Geometry optimization)
