A tiny script to explore package sizes in Anaconda

Today at work, I was trying to shrink a container image and wanted to check what was inside some conda environments, especially which packages were taking up all that sweet storage (looking at you, TensorFlow).

I thought, surely conda has a way to show installed packages along with their sizes, and not just at install time? It doesn't seem to, so I hacked together something silly with coreutils, awk and sed.

Here is my conda-size-report.sh:

#!/usr/bin/env bash
# Usage: bash conda-size-report.sh <env-name>
env=$1

# Select the conda environment path matching the name passed as argument.
# Take the last field: the active environment has an extra '*' column before the path.
path=$(conda env list | awk -v env="$env" '$1==env {print $NF}')

if [[ -d $path ]]
then
    for p in "$path"/lib/python*
    do
        # Skip symlinks and anything that is not a directory
        [[ -d $p && ! -L $p ]] || continue
        echo "Python version: ${p##*/}"
        # Print sizes one level deep -> sort by human-readable size -> print size and pkg name.
        # du's last line is site-packages itself, which is relabelled as the total.
        # No --unique here: with -h it would drop packages that happen to have the same size.
        du -h --max-depth=1 "$p/site-packages" | sort -h | awk -F/ '{print $1, $NF}' | sed -e 's/site-packages$/+Total+/'
    done
else
    echo "No conda environment: $env"
fi
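
The lookup step is easy to try without conda installed. Below is a minimal sketch that runs an awk filter over hand-written text shaped like `conda env list` output and picks the last field rather than the second, since conda puts a `*` between the name and the path of the active environment. The function name `env_path` and the paths are made up for illustration, and paths containing spaces would not survive this.

```shell
# Hypothetical helper: read `conda env list`-style text on stdin and print
# the path of the environment whose name equals $1.
env_path() {
    awk -v env="$1" '$1 == env { print $NF }'
}

# Hand-written sample imitating the layout of `conda env list`
# (the paths are illustrative, not taken from a real machine).
sample='# conda environments:
#
base                  *  /opt/conda
my_env                   /opt/conda/envs/my_env'

printf '%s\n' "$sample" | env_path base      # active env: the path is the 3rd field
printf '%s\n' "$sample" | env_path my_env    # inactive env: the path is the 2nd field
```

Both calls print just the path; a name that is not listed prints nothing, which is what makes the directory test in the script fail.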

Now, for an environment called my_env, I can simply run:

bash conda-size-report.sh my_env

and get something simple like this:

Python version: python3.8
770K    gunicorn
.
.
34M     numpy
71M     pandas
535M    tensorflow
1203M   +Total+
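
If you are curious how du's full paths become those short names, here is a rough sketch of the sort/awk/sed stages run on canned input instead of a real environment. The helper name `format_sizes` and the /tmp/demo directory are assumptions for the example; the sizes are invented so that the total adds up.

```shell
# Hypothetical helper: turn `du -h` lines (size<TAB>path) on stdin into
# "size name" lines, smallest first, relabelling the site-packages row.
format_sizes() {
    sort -h | awk -F/ '{print $1, $NF}' | sed -e 's/site-packages$/+Total+/'
}

# Canned input in du's format; the directory is made up for illustration.
printf '%s\t%s\n' \
    71M  /tmp/demo/site-packages/pandas \
    105M /tmp/demo/site-packages \
    34M  /tmp/demo/site-packages/numpy \
    | format_sizes
```

Splitting on `/` leaves the size in the first field and the package directory in the last one, so numpy comes out first and the +Total+ row last.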

And that's it, you can enjoy your time again, one hacky script at a time.