Asking for help running dpgen #1728
maruru0902 started this conversation in General
Replies: 1 comment
This seems to be the error message.
I got the following error message when I tested the CH4 example from the tutorial. Please give me some advice. Thank you.
INFO:dpgen:start running
INFO:dpgen:continue from iter 000 task 06
INFO:dpgen:=============================iter.000000==============================
INFO:dpgen:-------------------------iter.000000 task 07--------------------------
2025-03-18 21:22:54,596 - INFO : info:check_all_finished: False
2025-03-18 21:22:54,640 - INFO : job: 3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 submit; job_id is 788079
2025-03-18 21:22:54,644 - INFO : job: c93b47a7b407b4803c4c2a44fb8673c46f55c85f submit; job_id is 788082
2025-03-18 21:22:54,650 - INFO : job: 6d98526c0f06b02d25c558fd742c2b32dbb80e07 submit; job_id is 788086
2025-03-18 21:22:54,656 - INFO : job: 8516491a1e772723c5ec1004ae1e76156bf2b4c9 submit; job_id is 788091
2025-03-18 21:22:54,661 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d submit; job_id is 788097
2025-03-18 21:22:54,667 - INFO : job: badaae76be609089ec6ad1ae55b09bb028f08e3e submit; job_id is 788101
2025-03-18 21:22:54,672 - INFO : job: e10a4c341ffe1e63cd43d989a050ba40cf3a159b submit; job_id is 788106
2025-03-18 21:22:54,678 - INFO : job: 18522bc59005e9de2fc8f8b45bfd767ab1de078a submit; job_id is 788111
2025-03-18 21:22:54,684 - INFO : job: 7032c9cdf23ef7581450e8bc438e1c0a372ac3cd submit; job_id is 788116
2025-03-18 21:22:54,690 - INFO : job: ef0ce53101b463dfcee47babff5266d9226bba8c submit; job_id is 788122
2025-03-18 21:22:54,696 - INFO : job: 31af1fac65eb1a27519a11c5e9e39e8608da4d16 submit; job_id is 788126
2025-03-18 21:22:54,704 - INFO : job: 8b4363367a5acec248ab37374aef5f39a71818c7 submit; job_id is 788132
2025-03-18 21:22:54,711 - INFO : job: 2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 submit; job_id is 788137
2025-03-18 21:22:54,717 - INFO : job: 7dd38e4d96ad25bf6644a82ba79a15093866d5a4 submit; job_id is 788142
2025-03-18 21:22:54,723 - INFO : job: 78b3c01f18c6c214e19a179ac6bd678fa92cd416 submit; job_id is 788146
2025-03-18 21:22:54,731 - INFO : job: c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 submit; job_id is 788152
2025-03-18 21:22:54,738 - INFO : job: 8f7e7186419dce4573872b512b651c54a2600d5b submit; job_id is 788157
2025-03-18 21:22:54,744 - INFO : job: 76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 submit; job_id is 788162
2025-03-18 21:22:54,752 - INFO : job: c35f9d8b932e3c3922ebac3b714af87404bd01dc submit; job_id is 788167
2025-03-18 21:22:54,761 - INFO : job: 044e451365c4edc0ec0b91625c4e17967926880e submit; job_id is 788172
2025-03-18 21:22:56,808 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d 788097 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:56,814 - INFO : job:0d1382560fb12b8c32b145c683f7967712e9c29d re-submit after terminated; new job_id is 798465
2025-03-18 21:22:57,102 - INFO : job:0d1382560fb12b8c32b145c683f7967712e9c29d job_id:798465 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:57,102 - INFO : job: badaae76be609089ec6ad1ae55b09bb028f08e3e 788101 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:57,108 - INFO : job:badaae76be609089ec6ad1ae55b09bb028f08e3e re-submit after terminated; new job_id is 798476
2025-03-18 21:22:57,397 - INFO : job:badaae76be609089ec6ad1ae55b09bb028f08e3e job_id:798476 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:57,397 - INFO : job: e10a4c341ffe1e63cd43d989a050ba40cf3a159b 788106 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:57,401 - INFO : job:e10a4c341ffe1e63cd43d989a050ba40cf3a159b re-submit after terminated; new job_id is 798516
2025-03-18 21:22:57,690 - INFO : job:e10a4c341ffe1e63cd43d989a050ba40cf3a159b job_id:798516 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:57,691 - INFO : job: 18522bc59005e9de2fc8f8b45bfd767ab1de078a 788111 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:57,696 - INFO : job:18522bc59005e9de2fc8f8b45bfd767ab1de078a re-submit after terminated; new job_id is 798943
2025-03-18 21:22:57,981 - INFO : job:18522bc59005e9de2fc8f8b45bfd767ab1de078a job_id:798943 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:57,981 - INFO : job: 7032c9cdf23ef7581450e8bc438e1c0a372ac3cd 788116 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:57,986 - INFO : job:7032c9cdf23ef7581450e8bc438e1c0a372ac3cd re-submit after terminated; new job_id is 799433
2025-03-18 21:22:58,276 - INFO : job:7032c9cdf23ef7581450e8bc438e1c0a372ac3cd job_id:799433 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:58,276 - INFO : job: ef0ce53101b463dfcee47babff5266d9226bba8c 788122 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:58,281 - INFO : job:ef0ce53101b463dfcee47babff5266d9226bba8c re-submit after terminated; new job_id is 799947
2025-03-18 21:22:58,570 - INFO : job:ef0ce53101b463dfcee47babff5266d9226bba8c job_id:799947 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:58,570 - INFO : job: 31af1fac65eb1a27519a11c5e9e39e8608da4d16 788126 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:58,575 - INFO : job:31af1fac65eb1a27519a11c5e9e39e8608da4d16 re-submit after terminated; new job_id is 800476
2025-03-18 21:22:58,863 - INFO : job:31af1fac65eb1a27519a11c5e9e39e8608da4d16 job_id:800476 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:58,863 - INFO : job: 8b4363367a5acec248ab37374aef5f39a71818c7 788132 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:58,868 - INFO : job:8b4363367a5acec248ab37374aef5f39a71818c7 re-submit after terminated; new job_id is 801003
2025-03-18 21:22:59,158 - INFO : job:8b4363367a5acec248ab37374aef5f39a71818c7 job_id:801003 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:59,158 - INFO : job: 2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 788137 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:59,163 - INFO : job:2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 re-submit after terminated; new job_id is 801521
2025-03-18 21:22:59,452 - INFO : job:2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 job_id:801521 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:59,453 - INFO : job: 7dd38e4d96ad25bf6644a82ba79a15093866d5a4 788142 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:59,458 - INFO : job:7dd38e4d96ad25bf6644a82ba79a15093866d5a4 re-submit after terminated; new job_id is 802051
2025-03-18 21:22:59,746 - INFO : job:7dd38e4d96ad25bf6644a82ba79a15093866d5a4 job_id:802051 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:59,746 - INFO : job: 78b3c01f18c6c214e19a179ac6bd678fa92cd416 788146 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:59,751 - INFO : job:78b3c01f18c6c214e19a179ac6bd678fa92cd416 re-submit after terminated; new job_id is 802572
2025-03-18 21:23:00,040 - INFO : job:78b3c01f18c6c214e19a179ac6bd678fa92cd416 job_id:802572 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:00,041 - INFO : job: c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 788152 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:00,045 - INFO : job:c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 re-submit after terminated; new job_id is 803088
2025-03-18 21:23:00,333 - INFO : job:c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 job_id:803088 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:00,334 - INFO : job: 8f7e7186419dce4573872b512b651c54a2600d5b 788157 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:00,338 - INFO : job:8f7e7186419dce4573872b512b651c54a2600d5b re-submit after terminated; new job_id is 803599
2025-03-18 21:23:00,627 - INFO : job:8f7e7186419dce4573872b512b651c54a2600d5b job_id:803599 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:00,627 - INFO : job: 76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 788162 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:00,632 - INFO : job:76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 re-submit after terminated; new job_id is 804108
2025-03-18 21:23:00,921 - INFO : job:76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 job_id:804108 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:00,921 - INFO : job: c35f9d8b932e3c3922ebac3b714af87404bd01dc 788167 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:00,926 - INFO : job:c35f9d8b932e3c3922ebac3b714af87404bd01dc re-submit after terminated; new job_id is 804644
2025-03-18 21:23:01,214 - INFO : job:c35f9d8b932e3c3922ebac3b714af87404bd01dc job_id:804644 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:01,214 - INFO : job: 044e451365c4edc0ec0b91625c4e17967926880e 788172 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:01,219 - INFO : job:044e451365c4edc0ec0b91625c4e17967926880e re-submit after terminated; new job_id is 805183
2025-03-18 21:23:01,505 - INFO : job:044e451365c4edc0ec0b91625c4e17967926880e job_id:805183 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:32,310 - INFO : job: 3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 788079 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:32,317 - INFO : job:3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 re-submit after terminated; new job_id is 806847
2025-03-18 21:23:32,603 - INFO : job:3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 job_id:806847 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:32,604 - INFO : job: c93b47a7b407b4803c4c2a44fb8673c46f55c85f 788082 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:32,609 - INFO : job:c93b47a7b407b4803c4c2a44fb8673c46f55c85f re-submit after terminated; new job_id is 806858
2025-03-18 21:23:32,898 - INFO : job:c93b47a7b407b4803c4c2a44fb8673c46f55c85f job_id:806858 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:32,899 - INFO : job: 6d98526c0f06b02d25c558fd742c2b32dbb80e07 788086 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:32,904 - INFO : job:6d98526c0f06b02d25c558fd742c2b32dbb80e07 re-submit after terminated; new job_id is 806898
2025-03-18 21:23:33,193 - INFO : job:6d98526c0f06b02d25c558fd742c2b32dbb80e07 job_id:806898 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:33,193 - INFO : job: 8516491a1e772723c5ec1004ae1e76156bf2b4c9 788091 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:33,198 - INFO : job:8516491a1e772723c5ec1004ae1e76156bf2b4c9 re-submit after terminated; new job_id is 807304
2025-03-18 21:23:33,487 - INFO : job:8516491a1e772723c5ec1004ae1e76156bf2b4c9 job_id:807304 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:33,488 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d 798465 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:33,492 - INFO : job:0d1382560fb12b8c32b145c683f7967712e9c29d re-submit after terminated; new job_id is 807825
2025-03-18 21:23:33,782 - INFO : job:0d1382560fb12b8c32b145c683f7967712e9c29d job_id:807825 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:33,782 - INFO : job: badaae76be609089ec6ad1ae55b09bb028f08e3e 798476 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:33,787 - INFO : job:badaae76be609089ec6ad1ae55b09bb028f08e3e re-submit after terminated; new job_id is 808339
2025-03-18 21:23:34,075 - INFO : job:badaae76be609089ec6ad1ae55b09bb028f08e3e job_id:808339 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:34,075 - INFO : job: e10a4c341ffe1e63cd43d989a050ba40cf3a159b 798516 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:34,080 - INFO : job:e10a4c341ffe1e63cd43d989a050ba40cf3a159b re-submit after terminated; new job_id is 808852
2025-03-18 21:23:34,369 - INFO : job:e10a4c341ffe1e63cd43d989a050ba40cf3a159b job_id:808852 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:34,369 - INFO : job: 18522bc59005e9de2fc8f8b45bfd767ab1de078a 798943 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:34,374 - INFO : job:18522bc59005e9de2fc8f8b45bfd767ab1de078a re-submit after terminated; new job_id is 809392
2025-03-18 21:23:34,663 - INFO : job:18522bc59005e9de2fc8f8b45bfd767ab1de078a job_id:809392 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:34,663 - INFO : job: 7032c9cdf23ef7581450e8bc438e1c0a372ac3cd 799433 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:34,668 - INFO : job:7032c9cdf23ef7581450e8bc438e1c0a372ac3cd re-submit after terminated; new job_id is 809910
2025-03-18 21:23:34,957 - INFO : job:7032c9cdf23ef7581450e8bc438e1c0a372ac3cd job_id:809910 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:34,957 - INFO : job: ef0ce53101b463dfcee47babff5266d9226bba8c 799947 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:34,962 - INFO : job:ef0ce53101b463dfcee47babff5266d9226bba8c re-submit after terminated; new job_id is 810427
2025-03-18 21:23:35,250 - INFO : job:ef0ce53101b463dfcee47babff5266d9226bba8c job_id:810427 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:35,250 - INFO : job: 31af1fac65eb1a27519a11c5e9e39e8608da4d16 800476 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:35,255 - INFO : job:31af1fac65eb1a27519a11c5e9e39e8608da4d16 re-submit after terminated; new job_id is 810951
2025-03-18 21:23:35,542 - INFO : job:31af1fac65eb1a27519a11c5e9e39e8608da4d16 job_id:810951 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:35,542 - INFO : job: 8b4363367a5acec248ab37374aef5f39a71818c7 801003 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:35,547 - INFO : job:8b4363367a5acec248ab37374aef5f39a71818c7 re-submit after terminated; new job_id is 811451
2025-03-18 21:23:35,835 - INFO : job:8b4363367a5acec248ab37374aef5f39a71818c7 job_id:811451 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:35,835 - INFO : job: 2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 801521 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:35,840 - INFO : job:2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 re-submit after terminated; new job_id is 811960
2025-03-18 21:23:36,126 - INFO : job:2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 job_id:811960 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:36,126 - INFO : job: 7dd38e4d96ad25bf6644a82ba79a15093866d5a4 802051 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:36,131 - INFO : job:7dd38e4d96ad25bf6644a82ba79a15093866d5a4 re-submit after terminated; new job_id is 812493
2025-03-18 21:23:36,418 - INFO : job:7dd38e4d96ad25bf6644a82ba79a15093866d5a4 job_id:812493 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:36,418 - INFO : job: 78b3c01f18c6c214e19a179ac6bd678fa92cd416 802572 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:36,423 - INFO : job:78b3c01f18c6c214e19a179ac6bd678fa92cd416 re-submit after terminated; new job_id is 812999
2025-03-18 21:23:36,712 - INFO : job:78b3c01f18c6c214e19a179ac6bd678fa92cd416 job_id:812999 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:36,712 - INFO : job: c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 803088 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:36,717 - INFO : job:c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 re-submit after terminated; new job_id is 813517
2025-03-18 21:23:37,005 - INFO : job:c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 job_id:813517 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:37,005 - INFO : job: 8f7e7186419dce4573872b512b651c54a2600d5b 803599 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:37,010 - INFO : job:8f7e7186419dce4573872b512b651c54a2600d5b re-submit after terminated; new job_id is 814066
2025-03-18 21:23:37,299 - INFO : job:8f7e7186419dce4573872b512b651c54a2600d5b job_id:814066 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:37,299 - INFO : job: 76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 804108 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:37,304 - INFO : job:76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 re-submit after terminated; new job_id is 814601
2025-03-18 21:23:37,592 - INFO : job:76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 job_id:814601 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:37,592 - INFO : job: c35f9d8b932e3c3922ebac3b714af87404bd01dc 804644 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:37,597 - INFO : job:c35f9d8b932e3c3922ebac3b714af87404bd01dc re-submit after terminated; new job_id is 815114
2025-03-18 21:23:37,885 - INFO : job:c35f9d8b932e3c3922ebac3b714af87404bd01dc job_id:815114 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:37,886 - INFO : job: 044e451365c4edc0ec0b91625c4e17967926880e 805183 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:37,890 - INFO : job:044e451365c4edc0ec0b91625c4e17967926880e re-submit after terminated; new job_id is 815631
2025-03-18 21:23:38,177 - INFO : job:044e451365c4edc0ec0b91625c4e17967926880e job_id:815631 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:08,984 - INFO : job: 3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 806847 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:24:08,989 - INFO : job:3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 re-submit after terminated; new job_id is 817309
2025-03-18 21:24:09,273 - INFO : job:3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 job_id:817309 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:09,273 - INFO : job: c93b47a7b407b4803c4c2a44fb8673c46f55c85f 806858 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:24:09,280 - INFO : job:c93b47a7b407b4803c4c2a44fb8673c46f55c85f re-submit after terminated; new job_id is 817322
2025-03-18 21:24:09,568 - INFO : job:c93b47a7b407b4803c4c2a44fb8673c46f55c85f job_id:817322 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:09,568 - INFO : job: 6d98526c0f06b02d25c558fd742c2b32dbb80e07 806898 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:24:09,573 - INFO : job:6d98526c0f06b02d25c558fd742c2b32dbb80e07 re-submit after terminated; new job_id is 817337
2025-03-18 21:24:09,859 - INFO : job:6d98526c0f06b02d25c558fd742c2b32dbb80e07 job_id:817337 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:09,860 - INFO : job: 8516491a1e772723c5ec1004ae1e76156bf2b4c9 807304 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:24:09,864 - INFO : job:8516491a1e772723c5ec1004ae1e76156bf2b4c9 re-submit after terminated; new job_id is 817708
2025-03-18 21:24:10,153 - INFO : job:8516491a1e772723c5ec1004ae1e76156bf2b4c9 job_id:817708 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:10,153 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d 807825 terminated; fail_cout is 3; resubmitting job
Traceback (most recent call last):
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpdispatcher/submission.py", line 356, in handle_unexpected_submission_state
    job.handle_unexpected_job_state()
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpdispatcher/submission.py", line 855, in handle_unexpected_job_state
    raise RuntimeError(err_msg)
RuntimeError: job:0d1382560fb12b8c32b145c683f7967712e9c29d 807825 failed 3 times.
Possible remote error message: ==> /homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/task.000.000009/fp.log <==
/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/0d1382560fb12b8c32b145c683f7967712e9c29d.sub.run: line 6: mpirun: command not found
/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/0d1382560fb12b8c32b145c683f7967712e9c29d.sub.run: line 6: mpirun: command not found
/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/0d1382560fb12b8c32b145c683f7967712e9c29d.sub.run: line 6: mpirun: command not found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/homea/wangyl/miniconda3/envs/deepmd/bin/dpgen", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/main.py", line 255, in main
    args.func(args)
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/generator/run.py", line 5474, in gen_run
    run_iter(args.PARAM, args.MACHINE)
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/generator/run.py", line 4826, in run_iter
    run_fp(ii, jdata, mdata)
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/generator/run.py", line 4048, in run_fp
    run_fp_inner(
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/generator/run.py", line 4027, in run_fp_inner
    submission.run_submission()
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpdispatcher/submission.py", line 260, in run_submission
    self.handle_unexpected_submission_state()
  File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpdispatcher/submission.py", line 360, in handle_unexpected_submission_state
    raise RuntimeError(
RuntimeError: Meet errors will handle unexpected submission state.
Debug information: remote_root==/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b.
Debug information: submission_hash==c6ba8e6375537415ea5880ac6caf51ae51946e1b.
Please check error messages above and in remote_root. The submission information is saved in /homea/wangyl/.dpdispatcher/submission/c6ba8e6375537415ea5880ac6caf51ae51946e1b.json.
For furthur actions, run the following command with proper flags: dpdisp submission c6ba8e6375537415ea5880ac6caf51ae51946e1b
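The log above has two parts: dpdispatcher resubmitting every fp job each time it terminates (the `fail_cout` counter climbs 1, 2, 3), and, once one job reaches three failures, the RuntimeError whose "Possible remote error message" shows `mpirun: command not found` on line 6 of the generated `.sub.run` script. The sketch below is a hypothetical helper, not part of dpgen or dpdispatcher, that summarizes such a log assuming the exact line formats shown above, and checks whether `mpirun` resolves on `PATH` in the environment where it is run.

```python
import re
import shutil

# Matches the retry lines printed by dpdispatcher in the log above, e.g.
# "job: <40-char hash> 788097 terminated; fail_cout is 1; resubmitting job"
# ("fail_cout" is spelled exactly as it appears in the log).
RETRY_RE = re.compile(
    r"job: (?P<job_hash>[0-9a-f]{40}) \d+ terminated; fail_cout is (?P<count>\d+)"
)


def summarize_dpgen_log(text: str) -> tuple[dict[str, int], list[str]]:
    """Return (highest fail count per job hash, unique 'command not found' messages)."""
    fail_counts: dict[str, int] = {}
    missing_commands: list[str] = []
    for line in text.splitlines():
        match = RETRY_RE.search(line)
        if match:
            job_hash = match.group("job_hash")
            count = int(match.group("count"))
            fail_counts[job_hash] = max(fail_counts.get(job_hash, 0), count)
        elif "command not found" in line:
            # Drop the script path prefix, keeping e.g. "line 6: mpirun: command not found".
            message = line.split(": ", 1)[-1].strip()
            if message not in missing_commands:
                missing_commands.append(message)
    return fail_counts, missing_commands


def mpirun_on_path() -> bool:
    """True if an `mpirun` executable resolves on PATH for *this* Python process."""
    return shutil.which("mpirun") is not None


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample_log = "\n".join([
        "2025-03-18 21:22:56,808 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d 788097 terminated; fail_cout is 1; resubmitting job",
        "2025-03-18 21:24:10,153 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d 807825 terminated; fail_cout is 3; resubmitting job",
        "/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/0d1382560fb12b8c32b145c683f7967712e9c29d.sub.run: line 6: mpirun: command not found",
    ])
    counts, missing = summarize_dpgen_log(sample_log)
    print(counts)   # {'0d1382560fb12b8c32b145c683f7967712e9c29d': 3}
    print(missing)  # ['line 6: mpirun: command not found']
    print("mpirun on PATH:", mpirun_on_path())
```

Note that `shutil.which` only inspects the environment of the Python process that calls it. The `.sub.run` script runs with whatever environment the batch scheduler gives the job, so `mpirun` can still be missing there even when it resolves in an interactive shell.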