PRC2 - 04 Database Access JDBC

Java DataBase Connectivity

DataBase connection credentials and Java Properties

Some things do NOT belong in source code. In particular do not put credentials of any kind inside files that are committed to a version control system, such as source code. Make sure you configure your version control system such that such files are excluded from commits.

Instead put credentials in separate files, that are easily understood by both a human that uses it to configure access to a resource, and also by the programming language.
In java the tradition is to use so called properties files, which, also traditionally, have a file extension .properties. It also helps to give such files well known names, so the program can refer to them by that name.

For the demonstrations in this part we will use the following properties file.

db.properties file specifying connection details for dev and prod

# You can add comments here.

# hosts and ports are plural
# to be able to connect to multiple hosts and ports
# which are probably used for pooling


jdbc.pg.dev.dbname=dev
jdbc.pg.dev.user=exam
jdbc.pg.dev.password=exam
jdbc.pg.dev.hosts=localhost

jdbc.pg.prod.dbname=prod
jdbc.pg.prod.user=appdbuser
jdbc.pg.prod.password=verySecretPassw00rd
jdbc.pg.prod.hosts=cluster1,cluster2,cluster3,fallback1
jdbc.pg.presidents.ports=5432,5433,5434


jdbc.pg.presidents.dbname=presidents
jdbc.pg.presidents.user=exam
jdbc.pg.presidents.password=exam
jdbc.pg.presidents.hosts=localhost
jdbc.pg.presidents.ports=5433

You can see that the properties file supports two environments, development and production.

Configuring a datasource. Put it in a utility class.

public class DBProvider {

    static Map<String, DataSource> cache = new HashMap<>();

    static DataSource getDataSource(final String sourceName) {

        return cache.computeIfAbsent(sourceName,
                (s) -> {
                    Properties props = getProperties("db.properties");

                    PGSimpleDataSource dataSource = new PGSimpleDataSource();

                    String prefix = sourceName + ".";

                    String[] serverNames = {props.getProperty(prefix + "dbhost")};
                    dataSource.setServerNames(serverNames);

                    int[] portNumbers = {Integer.parseInt(props.getProperty(prefix + "port"))};
                    dataSource.setPortNumbers(portNumbers);

                    dataSource.setUser(props.getProperty(prefix + "username"));
                    dataSource.setDatabaseName(props.getProperty(prefix + "dbname"));
                    dataSource.setPassword(props.getProperty(prefix + "password"));
                    dataSource.setCurrentSchema(props.getProperty(prefix + "schema"));

                    return dataSource;
                }
        );
    }

    static Properties getProperties(String propertiesFileName){

        System.out.println("The user working dir is " + System.getProperty("user.dir"));

        // Usage of resource file is preferred way. No issues with working dir.
        // Uses the default location of resources (in src/main/java/resources dir)
        // getClassLoader() is necessary, unless you store your property file in a
        // subfolder according to package name
        // (src/main/resources/fontys/customerdbdemo in this case).

        Properties properties = new Properties();
        try (InputStream dbProperties = DBProvider.class.getClassLoader().getResourceAsStream(propertiesFileName);) {
            properties.load(dbProperties);
        } catch (IOException ex) {
            Logger.getLogger(DBProvider.class.getName()).log(Level.SEVERE, null, ex);
        }
        return properties;
    }
}

Using a Data source

There are some traditional ways to obtain a database connection. We use a DataSource, which itself can be seen as a resource. The data source can then be used to obtain a connection. In the example you see a class that needs a DataSource that is provided at construction time of the class, so it is available when the instance is created. A connection is AutoClosable so candidate for try-with-resources.

Using a datasource to obtain a connection and use the connection

package simplejdbc;

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/**
 *
 * @author hom
 */
public class DataSourceDemo {

    final DataSource datasource; 1

    public DataSourceDemo( DataSource datasource ) { 2
        this.datasource = datasource;
    }

    void demo() throws SQLException {  3
        String query
            = """
            select state.name as state
              ,p.name as president
              ,state.year_entered
            from president p
            join state s on(p.state_id_born=s.id)
                where s.name like 'N%'
            """;
            doQuery( query, System.out );
    }
}

Resource used in methods of this class.
Constructor receives the DataSource.
The method uses the DataSource to get a connection in the try-with-resources block
and passes it on to the method that executes the query and deals with the result by printing it.

The doQuery(…) method uses the supplied connection to create a statement which is then executed to produce a ResultSet. You will see some similarities in what you have seen in project 1, using php PDO.

doQuery

void doQuery( String query,
                PrintStream out ) throws SQLException {
    try ( Connection con = datasource.getConnection();
            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery( query ) ) {

        new ResultSetPrinter( rs ).printTable( out );

    }
}

The ResultSetPrinter tries to make a nice looking table of the result of the query. You can imagine that this is a bit of code, but that is not relevant for this demo of JDBC.

ResultSet

For all queries that return a tabular result, the first JDBC class you will use is the ResultSet. The ResultSet also provides so called Meta Information that describes the types of the values in the columns, the number of columns, display size etc.
This can be used to:

produce a nice tabular format
by using a translation or mapping between the database types and Java types, how the column data is to be used, type wise.

Anatomy of a prepared statement

In the example earlier, the sql text used to create the statement is constant, because it needs no user input. If a statement does, you should always use a PreparedStatement.

In the JDBC standard you fill in the parameters from the user in multiple ways, but the simplest is to just use question marks (?) as placeholders and specify which columns and column values you want to insert or update, or you want to use in your select or delete query.

sql query text to insert a student.

insert into students (student_id , lastname, firstname, dob, gender) 1
            values   (?        ,          ?,         ?,   ?,      ?) 2
returning *                                                          3

Fields can be supplied in any order, including definition order.
You do not need the spaces before the commas, we added them for readability.
It is smart to always expect back what has been inserted by the database, including generated id and other fields such as the database’s notion of date or time. Even smarter, but a little more work //' is to specify the columns instead of a *, so that the order in which you receive them is independent of database schema organization and stable.

Now assume the SQL text contains the above. Also assume that we have an array of student data, simply as an array of objects.

student data in array

 Object[] studentData = {123513, "Klaassen", "Jan", "1993-05-12" , "M"}; 1

Then creating the prepared statement and filling it with values is a simple loop:

loop to substitute placeholders.

try (
    Connection con = datasource.getConnection();
    PreparedStatement stmt = con.prepareStatement( query ); ) {

    int count = 0;
    for ( Object param : studentData ) {
        stmt.setObject( ++count, param ); 1
    }
    return stmt.executeUpdate();
}

note the pre-increment, so count starts with column 1.

You see that this is quite simple and other than what is specified in the query, there is no extra need for data conversion or named column use.

This approach can be used to make database access even simpler, so you only have to provide the data in an array and the rest can be packed into utility methods.

The holy grail is to find a way to do all kind of queries against tables, and the only thing you need to know is the table name and what entities as Java objects can be expected to be read from or written to the table. However, start simple first! At some point you’ll find duplicated code and you’ll find a way to optimize your code. Typically, you’ll use Generics, Lambda’s and streams and a bit of reflection at some point. We think it’s good to first face the issue of code that is difficult to maintain and afterwards find a solution for that, instead of providing you with a very generic and maybe complex solution without knowing which underlying problems it solves. //'

Database Meta Information

DO Mention the metadata…

In the previous part we have seen how to use reflection on java classes.

A similar and standardized concept also exists for databases. You can retrieve all kind of meta information about the objects (such as tables and views) defined in your database. Accessing that data can be done with selecting data from special relations in a special schema, called the information_schema.

Suppose we have the following definition of a table students in the schema public:

CREATE TABLE public.students (
    student_id INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, 1
    firstname TEXT NOT NULL,  2
    lastname TEXT NOT NULL,
    dob DATE NOT NULL,
    cohort integer NOT NULL DEFAULT  EXTRACT('year' from now()), 3
    email TEXT NOT NULL,
    gender CHARACTER(1) NOT NULL check (gender in ('F','M','U')), 4
    student_grp TEXT NOT NULL,
    active BOOLEAN DEFAULT false NOT NULL
);

ISO SQL for a serial column that by default is generated, and also is primary key.
All columns that 'traditionally' would be varchar are now text. Just as efficient less hassle. It will always fit. Only use varchar if your business requires a length constraint.
The cohort is the year of registration, which can be derived from the insertion moment.
Gender is a one character value with a restricted value set, much like an enum, but simpler.

If after the definition your would ask the database what it knows about this table with

SELECT ordinal_position, column_name,data_type, column_default, is_nullable,
       is_generated, datetime_precision
FROM   information_schema.columns WHERE table_name ='students' ORDER BY 1;

you would get the following as output:

┌──────────────────┬─────────────┬───────────┬──────────────────────────┬─────────────┬──────────────┬────────────────────┐
│ ordinal_position │ column_name │ data_type │      column_default      │ is_nullable │ is_generated │ datetime_precision │
╞══════════════════╪═════════════╪═══════════╪══════════════════════════╪═════════════╪══════════════╪════════════════════╡
│                1 │ student_id  │ integer   │                          │ NO          │ NEVER        │                    │
│                2 │ firstname   │ text      │                          │ NO          │ NEVER        │                    │
│                3 │ lastname    │ text      │                          │ NO          │ NEVER        │                    │
│                4 │ dob         │ date      │                          │ NO          │ NEVER        │                  0 │
│                5 │ cohort      │ integer   │ EXTRACT(year FROM now()) │ NO          │ NEVER        │                    │
│                6 │ email       │ text      │                          │ NO          │ NEVER        │                    │
│                7 │ gender      │ character │                          │ NO          │ NEVER        │                    │
│                8 │ student_grp │ text      │                          │ NO          │ NEVER        │                    │
│                9 │ active      │ boolean   │ false                    │ NO          │ NEVER        │                    │
└──────────────────┴─────────────┴───────────┴──────────────────────────┴─────────────┴──────────────┴────────────────────┘

From this information you can imagine that it is relatively easy to generate the matching Java types as record type.

The resulting record would look like this:

package entities;

import java.time.LocalDate;

public record Student (
   int student_id,
   String firstname,
   String lastname,
   LocalDate dob,
   int cohort,
   String email,
   char gender,
   String student_grp,
   bool active
){} 1

indeed, no body whatsoever.

Because in the Java layer you would like to have the meta information handy in a Mapper, you can generate the mapper at the same time from the database information instead of using reflection to the same effect.

Check constraints

Similarly you can also get the information of the check constraints with another, a bit more involved query.

-- retrieve the check constraints for the user schemas
select tc.table_schema,                                                          
       tc.table_name,
       string_agg(col.column_name, ', ') as columns,
       tc.constraint_name,
       cc.check_clause
from information_schema.table_constraints tc
join information_schema.check_constraints cc
     on tc.constraint_schema = cc.constraint_schema
     and tc.constraint_name = cc.constraint_name
join pg_namespace nsp on nsp.nspname = cc.constraint_schema
join pg_constraint pgc on pgc.conname = cc.constraint_name
                       and pgc.connamespace = nsp.oid
                       and pgc.contype = 'c'
join information_schema.columns col
     on col.table_schema = tc.table_schema
     and col.table_name = tc.table_name
     and col.ordinal_position = ANY(pgc.conkey)
where tc.constraint_schema not in('pg_catalog', 'information_schema')
group by tc.table_schema,
         tc.table_name,
         tc.constraint_name,
         cc.check_clause
order by tc.table_schema,
         tc.table_name;

which, because only one column in this schema actually declares a check constraint, results in:

┌──────────────┬────────────┬─────────┬───────────────────────┬─────────────────────────────────────────────────────────────────┐
│ table_schema │ table_name │ columns │    constraint_name    │                          check_clause                           │
╞══════════════╪════════════╪═════════╪═══════════════════════╪═════════════════════════════════════════════════════════════════╡
│ public       │ students   │ gender  │ students_gender_check │ ((gender = ANY (ARRAY['F'::bpchar, 'M'::bpchar, 'U'::bpchar]))) │
└──────────────┴────────────┴─────────┴───────────────────────┴─────────────────────────────────────────────────────────────────┘

You can use the check constraints in your business code too, but actually deferring those checks as final checks to the database is just fine. The database layer with throw an appropriate exception when a value is not to the liking of a constraint.

You may want to use the check information in the user interface layer, to warn the user that a transaction will not succeed with a illegal or missing value. Parsing the check clause is of course a bit complex, but with a regex you can do a lot, and also know that most check constraint are relatively simple.

Transactions

Things that can go wrong will go wrong. This problem is aggravated by having to use multiple things. Murphy says hello again :-).

A transaction is defined as a sequence of operations that can either succeed completely or be undone without any residual effect.
In plain English: It either happens fully, or we can forget about it.

For a persistence layer that means: if it cannot fulfill the obligation it should make sure that nothing happens to compromise the data already stored before the transaction.

The simple explanation of how this works is that the party that is busy with a resource gets the resource all for itself, locking any other parties out. If something fails, the mess created can either be undone (rollback) or not committed, which in effect is the same.

Locking out sounds serious, and it is, on multiple accounts.

When using a BIG lock, as locking out all other parties, things become VERY slow.
When lock less, like just a table, or better still, only the records we want to modify, then DEADlocks can occur, which is even more problematic than having to wait.
In all cases, when something goes awry, one may have to clean up mess in multiple places.

All this can be related to quite difficult problems. Luckily there are solutions to that and, not unimportant, as far as the database goes, that can typically deal with the issues pretty well. Have a look at paragraph 5.9 for more information about transaction programming.

Using Stored Procedures

From the DBS1 course, you should know about stored procedures. Don’t hesitate to use functionalities of your DBMS.