Commit 2c2b991

Added some more README Info
1 parent 8c034ce commit 2c2b991

1 file changed: +15 -8 lines changed

README.md

Lines changed: 15 additions & 8 deletions
@@ -42,19 +42,19 @@ docker-compose -f docker-compose-minimal.yml up
 
 
 When shutting down the process before completion, make sure to **clean up your containers** !
-Otherwise, using docker run it might just restart the stopped container.
+Otherwise, using `docker run` it might just restart the stopped container.
 
 Also, if you run the experiment multiple times, **extract the outputs** beforehand.
 Otherwise, the output will be overwritten.
 
-I have also tested this to work with podman and podman-compose on debian 10.
+Version 1.2 was also tested to work with podman and podman-compose on debian 10.
 For running with podman, make sure to have the output folder created first.
 
 ## Requirements / Non Docker
 
-In older versions (1.0) this contained an `environment.yml` and instructions how to run this without docker on your own machine.
-In theory, this is still possible, but the requirements is (intentionally) reduced to work flawless with the pre-existing dependencies in the container.
-To work, you should be good starting from Python 3.6 and installing Pytorch 1.4.
+The contained `environment.yml` is a starting point how to run this *without docker* on your own machine.
+The provided `requirements.txt` is meant for docker-only as important parts (pytorch) are missing, to align with pre-existing dependencies in the container.
+To work, you should be good starting from Python 3.6 and make a fresh conda env from the `environment.yml`.
 
 ## Licence
 
@@ -67,11 +67,11 @@ The original python files from microsoft follow (different) licences, and any ch
 
 For the container to run properly, it needs 15 to 25 gigabyte memory.
 On our servers, one cpu epoch on the java data takes ~30h.
-The Containers starts ~20 threads for training and your server should have >20 cores.
+The CPU Containers starts ~20 threads for training and your server should have >20 cores.
 
 In comparison, training on a RTX 1070 took 7h per epoch.
 Training on a 3080ti took 6h per epoch.
-Training on an A40 took ~3h per epoch. In general, GPU tries to allocate around 12gb of memory.
+Training on an A40 took ~4h per epoch. In general, GPU tries to allocate around 12gb of memory.
 
 In general, despite being a good first step, GPU Containers turned out to be quite fragile.
 We have seen multiple problems with Framework-Versions, Hardware and OS combinations.
@@ -142,4 +142,11 @@ You can narrow down whether this is your problem by
 4. the time that you see the above message is suspiciously different from the numbers reported above
 
 To adress this, just mount **one** GPU in.
-Only one GPU should be picked up, printed as such at the beginning of the container logs.
+Only one GPU should be picked up, printed as such at the beginning of the container logs.
+
+## Version History
+
+- 1.0 was the first version with everything hardcoded
+- 1.1 had some elements hardcoded, others configurable
+- 1.2 was fully configurable but hardcoded to **CPU only**
+- 1.3 changed the base image and allows **GPU usage**
