Introduction for Packagers
Java is a programming language which is usually compiled into bytecode for JVM (Java Virtual Machine). For more details about the JVM and bytecode specification see JVM documentation.
Example Java Project
To better illustrate various parts of Java packaging we will dissect simple Java
`Hello world'' application. Java sources are usually organized using directory
hierarchies. Shared directory hierarchy creates a namespace called `package
in
Java terminology. To understand naming mechanisms of Java packages
see
Java
package naming conventions.
Let’s create a simple hello world application that will execute following steps when run:
-
Ask for a name
-
Print out ``Hello World from'' and the name from previous step
To illustrate certain points we artificially complicate things by creating:
-
Input
class used only for input of text from terminal -
Output
class used only for output on terminal -
HelloWorldApp
class used as main application
$ find .
.
./Makefile
./src
./src/org
./src/org/fedoraproject
./src/org/fedoraproject/helloworld
./src/org/fedoraproject/helloworld/output
./src/org/fedoraproject/helloworld/output/Output.java
./src/org/fedoraproject/helloworld/input
./src/org/fedoraproject/helloworld/input/Input.java
./src/org/fedoraproject/helloworld/HelloWorld.java
In this project all packages are under src/
directory hierarchy.
package org.fedoraproject.helloworld;
// import Input class from org.fedoraproject.helloworld.input package
import org.fedoraproject.helloworld.input.Input;
// import Output class from org.fedoraproject.helloworld.output package
import org.fedoraproject.helloworld.output.Output;
class HelloWorld {
public static void main(String[] args) {
System.out.print("What is your name?: ");
String reply = Input.getInput();
Output.output(reply);
}
}
org/fedoraproject/helloworld/input/Input.java
org/fedoraproject/helloworld/output/Output.java
org/fedoraproject/helloworld/HelloWorld.java
Although the directory structure of our package is hierarchical, there is no real parent-child relationship between packages. Each package is therefore seen as independent. Above example makes use of three separate packages:
-
org.fedoraproject.helloworld.input
-
org.fedoraproject.helloworld.output
-
org.fedoraproject.helloworld
Environment setup consists of two main parts:
-
Telling JVM which Java class contains
main()
method -
Adding required JAR files on JVM classpath
The sample project can be compiled to a bytecode by Java compiler. Java
compiler can be typically invoked from command line by command javac
.
$ javac $(find -name '*.java')
For every .java
file corresponding .class
file will be created. The
.class
files contain Java bytecode which is meant to be executed on
JVM.
One could put invocation of javac
to Makefile and simplify the
compilation a bit. It might be sufficient for such a simple project, but
it would quickly become hard to build more complex projects with this
approach. Java world knows several high-level build systems which can
highly simplify building of Java projects. Among the others, probably
most known are Apache Maven,
Apache Ant and Gradle.
Having our application split across many .class
files wouldn’t be very
practical, so those .class
files are assembled into ZIP files with
specific layout called JAR files. Most commonly these special ZIP files
have .jar
suffix, but other variations exist (.war
, .ear
). They
contain:
-
Compiled bytecode of our project
-
Additional metadata stored in
META-INF/MANIFEST.MF
file -
Resource files such as images or localisation data
-
Optionaly the source code of our project (called source JAR then)
They can also contain additional bundled software which is something we don’t want to have in packages. You can inspect the contents of given JAR file by extracting it. That can be done with following command:
jar xf something.jar
The detailed description of JAR file format is in the JAR File Specification.
The classpath is a way of telling JVM where to look for user classes
and 3rd party libraries. By default, only current directory is searched,
all other locations need to be specified explicitly by setting up
CLASSPATH environment variable, or via -cp
(-classpath
) option of a
Java Virtual Machine.
java -cp /usr/share/java/log4j.jar:/usr/share/java/junit.jar mypackage/MyClass.class
CLASSPATH=/usr/share/java/log4j.jar:/usr/share/java/junit.jar java mypackage/MyClass.class
Please note that two JAR files are separated by colon in a classpath definition.
See official documentation for more information about classpath. |
Classic compiled applications use dynamic linker to find dependencies (linked libraries), Python interpreter has predefined directories where it searches for imported modules. JVM itself has no embedded knowledge of installation paths and thus no automatic way to resolve dependencies of Java projects. This means that all Java applications have to use wrapper shell scripts to setup the environment before invoking the JVM and running the application itself. Note that this is not necessary for libraries.
Build System Identification
Build system used by upstream can be usually identified by looking at
their configuration files, which reside in project directory structure,
usually in its root or in specialized directories with names such as
build
or make
.
Build managed by Apache Maven is configured by an XML file that is by
default named pom.xml
. In its simpler form it usually looks like this:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.example.myproject</groupId>
<artifactId>myproject</artifactId>
<packaging>jar</packaging>
<version>1.0</version>
<name>myproject</name>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.1</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
It describes project’s build process in a declarative way, without explicitly specifying exact steps needed to compile sources and assemble pieces together. It also specifies project’s dependencies which are usually the main point of interest for packagers. Other important feature of Maven that packagers should know about are plugins. Plugins extend Maven with some particular functionality, but unfortunately some of them may get in the way of packaging and need to be altered or removed. There are RPM macros provided to facilitate modifying Maven dependencies and plugins.
Apache Ant is also configured by an XML file. It is by convention named
build.xml
and in its simple form it looks like this:
<project name="MyProject" default="dist" basedir=".">
<property name="src" location="src"/>
<property name="build" location="build"/>
<property name="dist" location="dist"/>
<target name="init" description="Create build directory">
<mkdir dir="${build}"/>
</target>
<target name="compile" depends="init"
description="Compile the source">
<javac srcdir="${src}" destdir="${build}"/>
</target>
<target name="dist" depends="compile"
description="Generate jar">
<mkdir dir="${dist}/lib"/>
<jar jarfile="${dist}/myproject.jar" basedir="${build}"/>
</target>
<target name="clean" description="Clean build files">
<delete dir="${build}"/>
<delete dir="${dist}"/>
</target>
</project>
Ant build file consists mostly of targets, which are collections of
steps needed to accomplish intended task. They usually depend on each
other and are generally similar to Makefile targets. Available targets
can be listed by invoking ant -p
in project directory containing
build.xml
file. If the file is named differently than build.xml
you
have to tell Ant which file should be used by using -f
option with the
name of the actual build file.
Some projects that use Apache Ant also use Apache Ivy to simplify
dependency handling. Ivy is capable of resolving and downloading
artifacts from Maven repositories which are declaratively described in
XML. Project usually contains one or more ivy.xml
files specifying the
module Maven coordinates and its dependecies. Ivy can also be used
directly from Ant build files. To detect whether the project you wish to
package is using Apache Ivy, look for files named ivy.xml
or nodes in
the ivy
namespace in project’s build file.
Gradle is configured by a script written in Groovy (dynamic language
built on top of JVM and extending Java’s syntax). It’s configuration
file is build.gradle
.
apply plugin: 'java'
task customJar(type: Jar) {
archiveName = 'myproject.jar'
destinationDir = file("${buildDir}/jars")
from sourceSets.main.output
}
While unlikely, it’s still possible that you encounter a project whose
build is managed by plain old Makefiles. They contain a list of targets
which consist of commands (marked with tab at the begining of line) and
are invoked by make
target or simply make
to run the default
target.
Quiz for Packagers
At this point you should have enough knowledge about Java to start packaging. If you are not able to answer following questions return back to previous sections or ask experienced packagers for different explanations of given topics.
-
What is the difference between JVM and Java?
-
What is a CLASSPATH environment variable and how can you use it?
-
Name two typical Java build systems and how you can identify which one is being used
-
What is the difference between
java
andjavac
comands? -
What are contents of a typical
JAR
file? -
What is a pom.xml file and what information it contains?
-
How would you handle packaging software that contains
lib/junit4.jar
inside source tarball? -
Name at least three methods for bundling code in Java projects