For years, the young developer had heard tales of a wise penguin master living in the frozen Linux tundra. One day he set out to meet the master and finally get answers to the many questions he had accumulated. He hiked for several days through the bone-chilling snow before he finally reached the wise master.
Young developer: Oh master, I have endured days of hardship to find you. I was told you have the answers to all my questions?
Kernel Penguin: I’m not sure, but I certainly shall try.
Young developer: I have so many questions.
Kernel Penguin: Let’s start with one, shall we?
Young developer: Ok, I have something that I’ve been curious about for a while now. For the last 5 years I’ve been using Ubuntu, and whenever I had to install a new program I had to run something like sudo apt-get install XYZ. I’m curious, what exactly am I doing when I run that command?
Kernel Penguin: Ah, I see. Let’s take a real example. Say you’re trying to install something like jq. So you’d run sudo apt-get install jq. Right?
Young developer: Yeah
Kernel Penguin: I assume you know what sudo does?
Young developer: Yeah, it runs the command that follows it with root privileges.
Kernel Penguin: We’ll get to that. But let’s go over the other bits first, apt-get and install.
apt-get is your package manager. A package manager helps you…
Young developer: Let me guess, manage packages 😒?
Kernel Penguin: Haha. Yes. But I’ll be specific. Package managers solve quite a few problems for you. For starters, they simplify the whole installation process. When you run the install subcommand, apt-get will automatically download a .deb file from a repository. Once the package is downloaded, apt-get will install it on your system. This usually involves extracting and configuring the contents of the package.
Young developer: Woah, hey, .deb file? Repository? dpkg?
Kernel Penguin: Right, sorry. A .deb file is a Debian software package file. It contains the actual executable along with other things like metadata about the package, documentation, and dependency information. As for a repository, think of it as a place that stores packages and related information.
Actually, you know what? Let’s get your hands dirty. Did you run sudo apt-get install jq on your machine yet?
Young developer: Yes.
Kernel Penguin: apt-get usually downloads the package into /var/cache/apt/archives/. You’ll probably find the package for jq there.
Young developer: Let me see
frodo@shire-labs:~$ ls /var/cache/apt/archives/ | grep jq
jq_1.6-1ubuntu0.20.04.1_amd64.deb
libjq1_1.6-1ubuntu0.20.04.1_amd64.deb
Hmm, there it is.
Kernel Penguin: So when you ran the install subcommand, apt-get pulled the .deb file from a repository into the path we mentioned earlier. apt-get then uses another package manager called dpkg to install your packages.
Young developer: If dpkg is a package manager in itself, why are we using apt-get? What does apt-get do that dpkg doesn’t?
Kernel Penguin: Multiple things, but the most important in my opinion is dependency management. You see, most packages require other packages to function. Sometimes these are shared libraries that are dynamically linked when your new program is run.
You may have noticed this line while an installation is running:
...
Building dependency tree
...
This is apt-get figuring out what your package’s dependencies are. apt-get will then automatically download and install those as well.
Dependency management is a tricky concept. Your dependencies may themselves have other dependencies, or there may be dependencies that are incompatible with other packages already on your system. I mean, the term “dependency hell” exists for a reason.
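To make the idea of dependencies having their own dependencies concrete, here is a rough sketch of how an install order could be worked out. It is not how apt-get is implemented. The deps_of function is a hypothetical stand-in for repository metadata, and the entry for libjq1 is an assumption made up for illustration.

```shell
# Toy lookup table: print the direct dependencies of a package.
# Hypothetical data; a real package manager reads this from repository metadata.
deps_of() {
  case $1 in
    jq)     echo "libjq1 libc6" ;;
    libjq1) echo "libc6" ;;      # assumed for illustration
    *)      echo "" ;;
  esac
}

# Space-delimited list of packages that were already visited.
seen=" "

# Print a package after all of its dependencies, visiting each package once.
resolve() {
  case $seen in
    *" $1 "*) return 0 ;;        # already handled, skip it
  esac
  seen="$seen$1 "
  for dep in $(deps_of "$1"); do
    resolve "$dep"
  done
  echo "$1"
}

order=$(resolve jq)
echo "$order"
```

This prints libc6, then libjq1, then jq: every package appears only after the things it needs. Real resolvers also have to deal with version constraints and conflicts, which this sketch leaves out.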
Young developer: Is there a way for me to view the dependencies of a package?
Kernel Penguin: Sure, you can use dpkg with the -I option. It’ll display the dependencies along with other information.
Young developer:
frodo@shire-labs:~$ dpkg -I /var/cache/apt/archives/jq_1.6-1ubuntu0.20.04.1_amd64.deb
new Debian package, version 2.0.
size 50196 bytes: control archive=960 bytes.
875 bytes, 24 lines control
229 bytes, 4 lines md5sums
Package: jq
Version: 1.6-1ubuntu0.20.04.1
Architecture: amd64
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Installed-Size: 97
Depends: libjq1 (= 1.6-1ubuntu0.20.04.1), libc6 (>= 2.4)
...
Original-Maintainer: ChangZhuo Chen (陳昌倬) <czchen@debian.org>
I see, there they are, libjq1 and libc6. I can also see some other information like the version and maintainer details.
Kernel Penguin: That is correct. All this data is pulled from the control file of your Debian package.
Young developer: You’re doing it again...
Kernel Penguin: What?
Young developer: Control file…?
Kernel Penguin: Oh sorry. A control file contains metadata and configuration data for the package. You can extract the control file of a Debian package using
dpkg -e <your-package.deb>
This should extract a DEBIAN folder which contains your control file among other things.
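Since a control file is just lines of "Field: value" text, you can pick it apart with ordinary shell tools. The sketch below is only an illustration: control_field and dep_names are hypothetical helper names, the sample text copies a few fields from the output you saw earlier, and it ignores multi-line fields and alternative dependencies written with |.

```shell
# Print the value of one field from control-file text read on stdin.
control_field() {
  awk -v field="$1" -F': ' '$1 == field { print substr($0, length(field) + 3) }'
}

# Reduce a Depends value to bare package names, one per line,
# by dropping the version constraints in parentheses.
dep_names() {
  tr ',' '\n' | sed -e 's/([^)]*)//' -e 's/^ *//' -e 's/ *$//'
}

# A few fields copied from the dpkg -I output above.
control='Package: jq
Version: 1.6-1ubuntu0.20.04.1
Depends: libjq1 (= 1.6-1ubuntu0.20.04.1), libc6 (>= 2.4)'

printf '%s\n' "$control" | control_field Depends | dep_names
```

This prints libjq1 and libc6 on separate lines. On a real system you would not need the helper for the first step, since dpkg-deb -f <your-package.deb> Depends prints a single field directly.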
Young developer: I see. Looks like we’ve gone down a rabbit hole. Let’s go back to what apt-get does. We’ve discussed dependency management, what else does it do?
Kernel Penguin: Repository management. dpkg works on individual .deb files and has no concept of repositories. apt-get, on the other hand, deals directly with repositories. apt-get keeps its list of repositories in /etc/apt/sources.list (and in files under /etc/apt/sources.list.d/). When an install is requested, apt-get searches for the requested package in all of these repositories. The repositories provide a single point of contact for all package information. When you run sudo apt-get update, the latest package information is pulled from these repositories and your local copy is refreshed. Also, if newer versions of your packages are available, you can upgrade them using
sudo apt-get install --only-upgrade package-name
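If you are curious what that repository list looks like, here is a small sketch that reads a file in the classic one-line sources.list format and prints the repository URL and release name of each binary package entry. list_repos is a hypothetical helper, and the sample entries are illustrative rather than copied from a real machine; newer releases may use a different file format.

```shell
# Print "uri suite" for every binary-package ("deb") entry in a
# sources.list-style file. Comments and [option] blocks are stripped first.
list_repos() {
  sed -e 's/#.*//' -e 's/\[[^]]*]//' "$1" | awk '$1 == "deb" { print $2, $3 }'
}

# Illustrative sample file in the one-line format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# deb lines describe binary packages, deb-src lines describe sources
deb http://archive.ubuntu.com/ubuntu focal main restricted
deb-src http://archive.ubuntu.com/ubuntu focal main restricted
deb [arch=amd64] http://archive.ubuntu.com/ubuntu focal-updates main
EOF

list_repos "$sample"
rm -f "$sample"
```

Pointing list_repos at /etc/apt/sources.list on your own machine should show the repositories apt-get consults, assuming the file uses this one-line format.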
Young developer: Before I ask you more about apt-get I have a question, possibly a silly question.
Kernel Penguin: There is no such thing as a silly question. Ask away!
Young developer: Why am I able to run the newly installed executable from anywhere in the shell? I mean, if I had an executable that I created at ~/some/long/path/my-program, I’d have to type out that full path to run it. If I simply run ./my-program from some other directory, my shell throws
-bash: ./my-program: No such file or directory
On the other hand, I just installed jq, and I can run it from anywhere.
Kernel Penguin: I see, now that’s not a silly question at all. Could you run the following command for me?
echo $PATH
Young developer:
frodo@shire-labs:~$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin
Okay, what’s this?
Kernel Penguin: What you’re looking at is the $PATH variable. If you look closely, you’ll see that it’s actually many different paths separated by colons. Whenever you invoke a command from the shell, it searches for the command in each of these paths, in order. If the executable is found in any of them, it is run; otherwise the shell throws a command not found error. The reason you’re able to run jq from anywhere is that the installation process has either copied the jq executable into one of the directories in $PATH or created a symlink to it there.
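The search the shell performs can be approximated in a few lines. This is only a sketch of the idea, not what bash does internally: find_in_path is a hypothetical helper, it skips details such as the shell’s command cache, and it takes an optional second argument so you can try it against any colon-separated list.

```shell
# Print the first executable file named $1 found in the colon-separated
# list $2 (defaults to $PATH). Runs in a subshell so the IFS change and
# "set -f" do not leak into the calling shell.
find_in_path() (
  set -f          # do not expand wildcards while splitting the list
  IFS=:           # split the list on colons
  for dir in ${2:-$PATH}; do
    if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
      echo "$dir/$1"
      exit 0
    fi
  done
  echo "$1: command not found" >&2
  exit 1
)

find_in_path ls
```

The first match wins, which is why the order of the entries in $PATH matters. For everyday use, command -v jq asks the shell itself where it would find jq.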
Let’s, in fact, dig a bit deeper. Why don’t you extract the contents of the .deb file? This time use the -x option and a target folder.
Young developer: Hmm, didn’t know I could do that.
frodo@shire-labs:~$ dpkg -x jq_1.6-1ubuntu0.20.04.1_amd64.deb extracted
frodo@shire-labs:~$ ls extracted
usr
Kernel Penguin: If you explore the extracted directory, you’ll see that the executable jq is at usr/bin/jq within it. When dpkg installs the package, it copies the contents of the package into exactly the same paths under the root folder. In this case, whatever is inside extracted is copied into the root folder with its path preserved, i.e., jq will end up at /usr/bin/jq.
Remember seeing /usr/bin anywhere? It is one of the paths in $PATH. This is why you’re able to run the newly installed executable from anywhere.
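You can see that mapping for yourself by listing every file in an extracted tree with the folder name stripped off the front. The sketch below builds a small stand-in tree so it runs on its own. installed_paths is a hypothetical helper, and the file names in the demo tree are placeholders rather than the real contents of the jq package.

```shell
# For an extracted package tree, print the absolute path each file
# would have once its contents are copied under the root folder.
installed_paths() (
  cd "$1" || exit 1
  find . -type f | sed 's/^\.//' | sort
)

# Stand-in for the "extracted" folder produced by dpkg -x.
demo=$(mktemp -d)
mkdir -p "$demo/usr/bin" "$demo/usr/share/doc/jq"
: > "$demo/usr/bin/jq"
: > "$demo/usr/share/doc/jq/copyright"

installed_paths "$demo"
rm -rf "$demo"
```

For the stand-in tree this prints /usr/bin/jq and /usr/share/doc/jq/copyright. Running installed_paths extracted on your real folder should show where each file of the package lands, and dpkg -L jq lists the files of the package that is already installed.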
Young developer: Hmm, thanks for clearing that up. That was always a mystery to me.
Kernel Penguin: You also had some other questions ?
Young developer: Are dpkg and apt-get the only package managers?
Kernel Penguin: Oh no. There are many more. synaptic and aptitude are a couple that use the same .deb package format that we have discussed so far. Then there is pacman, which is the default package manager for Arch Linux and its derivatives. It uses a .pkg.tar.zst package format. And then there’s yum, which uses the .rpm package format. This is the go-to package manager on Red Hat-based distributions.
Young developer: Wow, so each kind of distribution has its own manager, huh? It’d be convenient if there was a common package manager.
Kernel Penguin: I’m glad you asked. There are a couple of new kids on the block: snap and flatpak. Both are meant to work across different Linux distributions. While flatpak is more geared towards GUI applications, snap can be used for CLI applications as well, just like the package managers we discussed earlier. They offer some additional features like parallel installations, i.e., multiple installations of the same package. Both also offer sandboxed environments for the installed programs to run in.
Young developer: Thank you for your answers. I do have many more questions though.
Kernel Penguin: I’m sure you do, but it is now late in the day, and wise penguins need some rest. You look like you could use some rest too.
Young developer: I agree, I shall come back with more questions at a later time. Goodbye then, Mr. Penguin.
Kernel Penguin: Goodbye, young man!
Further reading
FYI: Why still use apt-get and not apt?
The difference between apt and apt-get is not just that apt is a newer version of apt-get. The apt command was designed as a more user-friendly alternative to apt-get, combining the functionality of multiple package management tools for user convenience.