- Apr 2020
-
osf.io
-
build an image stack
Or contribute to an existing one!
-
Use a Dockerfile per project and publish it with a version control system
Given the overlap in concerns - maybe this should be re-organized to be near point (2) on using versioned Docker images? Having them together would clarify how the two concerns are distinct but overlapping.
-
Kitematic
AFAIK Kitematic is totally deprecated. I would recommend Portainer or (on Mac or Windows) the Docker dashboard.
-
ENTRYPOINT ["python"]
CMD ["/workspace/run-all.sh"]
This is a little obtuse. Wouldn't a python script as the CMD be more obvious / clear?
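E.g., a sketch of that alternative (script name hypothetical):

    FROM python:3.8
    COPY run-all.py /workspace/run-all.py
    CMD ["python", "/workspace/run-all.py"]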
-
In any case you should document different variants very well, potentially capture build and run commands in a Makefile [26]
This seems a little inconsistent with the argument to include commands in the Dockerfile itself above (where I commented that maybe you should use Makefiles ;)
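For instance, a minimal Makefile along these lines (image name hypothetical; recipe lines must be tab-indented):

    IMAGE := myproject:1.0.0

    build:
    	docker build -t $(IMAGE) .

    run: build
    	docker run --rm -it $(IMAGE)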
-
You should avoid installing software packages from source after COPYing the code into the image, because the connection between the file outside of the image and the one copied in is easily lost (cf. Rule 7)
Often, though, you are developing the package while working with it. In this case, installing over a bind-mount (as you suggest) in developer mode (e.g., pip install -e dir) can be a good way to go. Such an install can be included in an entrypoint script, or just keep in mind that you can stop (and not delete) a container and keep things configured how you left them.
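A sketch of such an entrypoint script, assuming the package source is bind-mounted at /workspace/mypkg (path hypothetical):

    #!/bin/sh
    # Editable install from the bind-mounted source tree, so edits
    # on the host take effect without rebuilding the image.
    pip install -e /workspace/mypkg
    # Hand off to whatever command the container was asked to run.
    exec "$@"
-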
bind mounts
There are performance considerations here as well - bind mounts save space, and performance is equal on Linux. Unfortunately you can get very bad performance on Docker for Windows / Mac, and you might want to use a volume mount containing your data or similar. WSL2 on Windows also promises improved performance.
-
Conda
And also pip, ya? This feels like pushing people towards Conda for funny reasons (I tend to use pip because it is faster)
-
RUN pip install geopy==1.20.0 && \
    pip install uszipcode==0.2.2
Is there a reason you're recommending two separate pip invocations here? Since it's the same layer, they'll still both run if you change the command. If you give pip all requirements at once, there's less chance of version thrashing (potentially even installing incompatible versions of some things)
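I.e., a single invocation:

    RUN pip install geopy==1.20.0 uszipcode==0.2.2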
-
You should regularly re-build the image using the --no-cache option
And perhaps make sure to tag your good / working container before you do!
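Something along the lines of (image name hypothetical):

    # Preserve the last known-good image under a separate tag...
    docker tag myimage:latest myimage:known-good
    # ...then rebuild from scratch.
    docker build --no-cache -t myimage:latest .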
-
Therefore you should add instructions in order of least likely to change to most likely to change
One complaint about Docker is that it is slow. If you tend to append while building your image, your iterations will be fast. You can re-organize the layers at the end.
Your guidelines still seem good even when you're iterating! But I'd lean towards appending at the end until I figure things out.
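A sketch of what the final ordering might look like (packages borrowed from the examples above):

    FROM python:3.8
    # Rarely changes: system-level tools.
    RUN apt-get update && apt-get install -y --no-install-recommends git
    # Changes occasionally: pinned dependencies.
    RUN pip install geopy==1.20.0 uszipcode==0.2.2
    # Changes most often: the project code itself.
    COPY . /workspace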
-
volume mounts, specific names, or ports are important for using the container, see for example the final lines of Listing 1
It's also reasonable to include external commands that capture this information - scripts, a Makefile, or a docker-compose.yaml, for example (all of these allow the use of relative paths, which aren't allowed by the docker command directly).
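A minimal docker-compose.yaml along these lines (names and ports hypothetical):

    services:
      analysis:
        image: myimage:1.0.0
        ports:
          - "8888:8888"
        volumes:
          - ./data:/workspace/data  # relative path, resolved by compose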
-
ARG
I would introduce ARG before including it in code, or explain it immediately after. It's not part of the "core" that everyone familiar with Docker will know
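E.g., a brief sketch (variable name hypothetical):

    # VERSION can be overridden at build time:
    #   docker build --build-arg VERSION=3.9 .
    ARG VERSION=3.8
    FROM python:${VERSION}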
-
custom metadata to images
Similarly, ENV can provide metadata to programs running inside the container (I don't think you can access LABELs from inside?)
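A sketch of the distinction (values hypothetical):

    # LABEL attaches metadata to the image; readable from outside via
    # `docker inspect`, but not exposed inside the container.
    LABEL maintainer="me@example.org"
    # ENV is visible to every process inside the running container.
    ENV DATA_DIR="/workspace/data"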
-
one scoped action
one scoped, documented action?
-
keep the script in the container for a future user to inspect
...and a script is really small, so it's not a big deal for size concerns!
You might also note that if you use Docker's COPY command, you can never fully get rid of the data even if you delete it in a later layer - it'll hang around in the COPY layer.
-
especially when connecting multiple commands in a RUN instruction with &&
Inspired by the standard style-guide in Elm, I tend to put connecting syntactic elements at the beginning, e.g.:

    RUN some-command \
        && another-command

This can dramatically reduce the chance of accidentally removing a needed && or leaving one lingering around...
-
Do not docker push a locally built image,
I think that doing a docker push on a locally built image is generally fine... and probably better to do so for archival (with caveats) vs. not doing it?
There are other ways to address security concerns - e.g., running the container in a cloud docker service? (and I'm not shilling for Gigantum here - we don't quite support this very well unless you create your own base, which is more complex than just publishing to a registry)
-
for images that you build yourself and then run
This seems worth expanding - how do you do this? (I know you specify it in the docker build command, but you could give an example, just as you include a versioned FROM later on)
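E.g., something like (name and version hypothetical):

    docker build -t myname/myanalysis:1.0.0 .
    docker run --rm myname/myanalysis:1.0.0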
-
only use images where you have access to the Dockerfile
How do you verify the Dockerfile was actually used to build the image?
-
optimised for high performance computing
I would say that it's optimized for the security needs of traditional HPC environments. People use Docker (esp. Kubernetes) in novel HPC contexts, and there is even national infrastructure that supports Docker!
-
Research Software Engineers (RSEs) are not the target audience for this work, but we want to encourage you to reach out to your local or national RSE community if your needs go beyond the rules of this work.
Not sure why you'd want to suggest RSEs not read the paper? Even if it's head-nodding in total agreement, presumably they might use this as a resource at least?
-
- Mar 2018
-
paperpile.com
-
Paperpile + hypothesis seems rad
-