I was running the following code to optimise 500 Zn crystal structures:
import torch
import torch_sim as ts
# InFlightAutoBatcher and the memory-scaler helpers are assumed to come from
# torch_sim.autobatching (the module shown in the traceback below)
from torch_sim.autobatching import (
    InFlightAutoBatcher,
    calculate_memory_scaler,
    estimate_max_memory_scaler,
)

# compound_init, device and mace_model are defined earlier in the notebook
state = ts.initialize_state(compound_init, device=device, dtype=torch.float64)
state_list = state.split()

# per-structure memory metric and an estimate of the largest metric that fits on the GPU
memory_metric_values = [
    calculate_memory_scaler(s, memory_scales_with="n_atoms_x_density") for s in state_list
]
max_memory_metric = estimate_max_memory_scaler(
    mace_model, state_list, metric_values=memory_metric_values
)
print("Max memory metric", max_memory_metric)

batcher = InFlightAutoBatcher(
    mace_model,
    max_memory_padding=1,
    max_memory_scaler=max_memory_metric * 0.8,
)

convergence_fn = ts.generate_force_convergence_fn(0.025)
relaxed_state = ts.optimize(
    system=state,
    model=mace_model,
    optimizer=ts.frechet_cell_fire,
    autobatcher=batcher,
    max_steps=1000,
    convergence_fn=convergence_fn,
)
However, the error below was observed:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[9], line 22
15 batcher = InFlightAutoBatcher(
16 mace_model,
17 max_memory_padding=1,
18 max_memory_scaler=max_memory_metric * 0.8
19 )
21 convergence_fn = ts.generate_force_convergence_fn(0.025)
---> 22 relaxed_state = ts.optimize(
23 system=state,
24 model=mace_model,
25 optimizer=ts.frechet_cell_fire,
26 autobatcher=batcher,
27 max_steps=1000,
28 convergence_fn=convergence_fn,
29 )
30 # extract the final energy from the trajectory file
31 print(relaxed_state.energy)
File ~/miniconda3/envs/X/lib/python3.11/site-packages/torch_sim/runners.py:439, in optimize(system, model, optimizer, convergence_fn, trajectory_reporter, autobatcher, max_steps, steps_between_swaps, pbar, **optimizer_kwargs)
436 pbar_kwargs.setdefault("disable", None)
437 tqdm_pbar = tqdm(total=state.n_batches, **pbar_kwargs)
--> 439 while (result := autobatcher.next_batch(state, convergence_tensor))[0] is not None:
440 state, converged_states, batch_indices = result
441 all_converged_states.extend(converged_states)
File ~/miniconda3/envs/X/lib/python3.11/site-packages/torch_sim/autobatching.py:1062, in InFlightAutoBatcher.next_batch(self, updated_state, convergence_tensor)
1060 if updated_state.n_batches > 0:
1061 next_states = [updated_state, *next_states]
-> 1062 next_batch = concatenate_states(next_states)
1064 if self.return_indices:
1065 return next_batch, completed_states, self.current_idx
File ~/miniconda3/envs/X/lib/python3.11/site-packages/torch_sim/state.py:839, in concatenate_states(states, device)
836 # Concatenate collected tensors
837 for prop, tensors in per_atom_tensors.items():
838 # if tensors:
--> 839 concatenated[prop] = torch.cat(tensors, dim=0)
841 for prop, tensors in per_batch_tensors.items():
842 # if tensors:
843 concatenated[prop] = torch.cat(tensors, dim=0)
TypeError: expected Tensor as element 1 in argument 0, but got NoneType
This sort of TypeError seems to appear when CUDA memory is heavily used, and nvidia-smi output supported the suspicion that GPU memory pressure is the cause.
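For reference, a minimal sketch (not from the original run) of checking GPU memory pressure from inside the notebook, which mirrors what nvidia-smi reports for the process:

import torch

# allocated = memory held by live tensors; reserved = memory kept by PyTorch's caching allocator
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")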
In my own testing, when I wrote the batching and optimisation cycles manually (as introduced in the tutorial), added VRAM-clearing commands between batches, and ran the optimisation in a subprocess when needed, the problem was resolved at this calculation scale. Interestingly, convergence also becomes easier for smaller-scale calculations.
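A rough sketch of that workaround, assuming the batches are built manually as in the tutorial; optimize_one_batch and batches are hypothetical stand-ins for the per-batch optimisation call and the manually prepared batch list:

import gc
import torch

results = []
for sub_state in batches:                    # hypothetical: batches prepared manually as in the tutorial
    relaxed = optimize_one_batch(sub_state)  # hypothetical stand-in for the per-batch optimisation
    results.append(relaxed)
    del relaxed, sub_state                   # drop references to intermediate GPU tensors
    gc.collect()
    torch.cuda.empty_cache()                 # release cached VRAM back to the driver between batches

Running each optimisation in a separate subprocess goes one step further, since all VRAM is released when the child process exits.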