Introduction for Packagers

Java is a programming language which is usually compiled into bytecode for JVM (Java Virtual Machine). For more details about the JVM and bytecode specification see JVM documentation.

Example Java Project

To better illustrate various parts of Java packaging we will dissect simple Java `Hello world'' application. Java sources are usually organized using directory hierarchies. Shared directory hierarchy creates a namespace called `package in Java terminology. To understand naming mechanisms of Java packages see Java package naming conventions.

Let’s create a simple hello world application that will execute following steps when run:

  1. Ask for a name

  2. Print out ``Hello World from'' and the name from previous step

To illustrate certain points we artificially complicate things by creating:

  • Input class used only for input of text from terminal

  • Output class used only for output on terminal

  • HelloWorldApp class used as main application

Directory listing of example project
$ find .
.
./Makefile
./src
./src/org
./src/org/fedoraproject
./src/org/fedoraproject/helloworld
./src/org/fedoraproject/helloworld/output
./src/org/fedoraproject/helloworld/output/Output.java
./src/org/fedoraproject/helloworld/input
./src/org/fedoraproject/helloworld/input/Input.java
./src/org/fedoraproject/helloworld/HelloWorld.java

In this project all packages are under src/ directory hierarchy.

HelloWorld.java listing
package org.fedoraproject.helloworld;

// import Input class from org.fedoraproject.helloworld.input package
import org.fedoraproject.helloworld.input.Input;
// import Output class from org.fedoraproject.helloworld.output package
import org.fedoraproject.helloworld.output.Output;

class HelloWorld {
    public static void main(String[] args) {
        System.out.print("What is your name?: ");
        String reply = Input.getInput();
        Output.output(reply);
    }
}
Java packages
org/fedoraproject/helloworld/input/Input.java
org/fedoraproject/helloworld/output/Output.java
org/fedoraproject/helloworld/HelloWorld.java

Although the directory structure of our package is hierarchical, there is no real parent-child relationship between packages. Each package is therefore seen as independent. Above example makes use of three separate packages:

  • org.fedoraproject.helloworld.input

  • org.fedoraproject.helloworld.output

  • org.fedoraproject.helloworld

Environment setup consists of two main parts:

  • Telling JVM which Java class contains main() method

  • Adding required JAR files on JVM classpath

Compiling our project

The sample project can be compiled to a bytecode by Java compiler. Java compiler can be typically invoked from command line by command javac.

$ javac $(find -name '*.java')

For every .java file corresponding .class file will be created. The .class files contain Java bytecode which is meant to be executed on JVM.

One could put invocation of javac to Makefile and simplify the compilation a bit. It might be sufficient for such a simple project, but it would quickly become hard to build more complex projects with this approach. Java world knows several high-level build systems which can highly simplify building of Java projects. Among the others, probably most known are Apache Maven, Apache Ant and Gradle.

See also Maven and Ant sections.

JAR files

Having our application split across many .class files wouldn’t be very practical, so those .class files are assembled into ZIP files with specific layout called JAR files. Most commonly these special ZIP files have .jar suffix, but other variations exist (.war, .ear). They contain:

  • Compiled bytecode of our project

  • Additional metadata stored in META-INF/MANIFEST.MF file

  • Resource files such as images or localisation data

  • Optionaly the source code of our project (called source JAR then)

They can also contain additional bundled software which is something we don’t want to have in packages. You can inspect the contents of given JAR file by extracting it. That can be done with following command:

jar xf something.jar

The detailed description of JAR file format is in the JAR File Specification.

Classpath

The classpath is a way of telling JVM where to look for user classes and 3rd party libraries. By default, only current directory is searched, all other locations need to be specified explicitly by setting up CLASSPATH environment variable, or via -cp (-classpath) option of a Java Virtual Machine.

Setting the classpath
java -cp /usr/share/java/log4j.jar:/usr/share/java/junit.jar mypackage/MyClass.class
CLASSPATH=/usr/share/java/log4j.jar:/usr/share/java/junit.jar java mypackage/MyClass.class

Please note that two JAR files are separated by colon in a classpath definition.

See official documentation for more information about classpath.

Wrapper scripts

Classic compiled applications use dynamic linker to find dependencies (linked libraries), Python interpreter has predefined directories where it searches for imported modules. JVM itself has no embedded knowledge of installation paths and thus no automatic way to resolve dependencies of Java projects. This means that all Java applications have to use wrapper shell scripts to setup the environment before invoking the JVM and running the application itself. Note that this is not necessary for libraries.

Build System Identification

Build system used by upstream can be usually identified by looking at their configuration files, which reside in project directory structure, usually in its root or in specialized directories with names such as build or make.

Maven

Build managed by Apache Maven is configured by an XML file that is by default named pom.xml. In its simpler form it usually looks like this:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example.myproject</groupId>
    <artifactId>myproject</artifactId>
    <packaging>jar</packaging>
    <version>1.0</version>
    <name>myproject</name>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

It describes project’s build process in a declarative way, without explicitly specifying exact steps needed to compile sources and assemble pieces together. It also specifies project’s dependencies which are usually the main point of interest for packagers. Other important feature of Maven that packagers should know about are plugins. Plugins extend Maven with some particular functionality, but unfortunately some of them may get in the way of packaging and need to be altered or removed. There are RPM macros provided to facilitate modifying Maven dependencies and plugins.

Ant

Apache Ant is also configured by an XML file. It is by convention named build.xml and in its simple form it looks like this:

<project name="MyProject" default="dist" basedir=".">
  <property name="src" location="src"/>
  <property name="build" location="build"/>
  <property name="dist" location="dist"/>

  <target name="init" description="Create build directory">
    <mkdir dir="${build}"/>
  </target>

  <target name="compile" depends="init"
        description="Compile the source">
    <javac srcdir="${src}" destdir="${build}"/>
  </target>

  <target name="dist" depends="compile"
        description="Generate jar">
    <mkdir dir="${dist}/lib"/>

    <jar jarfile="${dist}/myproject.jar" basedir="${build}"/>
  </target>

  <target name="clean" description="Clean build files">
    <delete dir="${build}"/>
    <delete dir="${dist}"/>
  </target>
</project>

Ant build file consists mostly of targets, which are collections of steps needed to accomplish intended task. They usually depend on each other and are generally similar to Makefile targets. Available targets can be listed by invoking ant -p in project directory containing build.xml file. If the file is named differently than build.xml you have to tell Ant which file should be used by using -f option with the name of the actual build file.

Some projects that use Apache Ant also use Apache Ivy to simplify dependency handling. Ivy is capable of resolving and downloading artifacts from Maven repositories which are declaratively described in XML. Project usually contains one or more ivy.xml files specifying the module Maven coordinates and its dependecies. Ivy can also be used directly from Ant build files. To detect whether the project you wish to package is using Apache Ivy, look for files named ivy.xml or nodes in the ivy namespace in project’s build file.

Gradle

Gradle is configured by a script written in Groovy (dynamic language built on top of JVM and extending Java’s syntax). It’s configuration file is build.gradle.

apply plugin: 'java'

task customJar(type: Jar) {
  archiveName = 'myproject.jar'
  destinationDir = file("${buildDir}/jars")
  from sourceSets.main.output
}
Make

While unlikely, it’s still possible that you encounter a project whose build is managed by plain old Makefiles. They contain a list of targets which consist of commands (marked with tab at the begining of line) and are invoked by make target or simply make to run the default target.

Quiz for Packagers

At this point you should have enough knowledge about Java to start packaging. If you are not able to answer following questions return back to previous sections or ask experienced packagers for different explanations of given topics.

  1. What is the difference between JVM and Java?

  2. What is a CLASSPATH environment variable and how can you use it?

  3. Name two typical Java build systems and how you can identify which one is being used

  4. What is the difference between java and javac comands?

  5. What are contents of a typical JAR file?

  6. What is a pom.xml file and what information it contains?

  7. How would you handle packaging software that contains lib/junit4.jar inside source tarball?

  8. Name at least three methods for bundling code in Java projects