Saturday 8 November 2014

A Fun Camel Ride With Apache Camel Part Six: Reading Tags, Inserting Values And Setting Headers

This is probably the last sizeable entry in this tutorial series. This part involves some design decisions and a lot more code. I will try to be brief, but some parts need more than a couple of words to justify why I made certain decisions. The next step is to analyse the tags and insert them into the database. Once this step is complete, we want to be able to route the files to their final location.

In the previous step I also showed you how to declare a bean and use that bean as a processor. Let's expand the functionality of this class to actually read the tags and then set some headers that we will use later on to write the file to the right location. To read the tags associated with an audio file, the JAudioTagger library needs a File object.

When Camel reads a file via the file component, it does not actually send the file contents on the exchange. This might sound counter-intuitive, but unless you access the body of the message there is no need to send the entire file along the exchange. This keeps the message small and thus improves throughput. I discussed this concept in an answer on StackOverflow if you want more information.

However, as I mentioned, the JAudioTagger library requires a File object to read. I had two choices for dealing with this requirement.

My first choice was to convert the message body to a binary file and then point the JAudioTagger library at the temporary file I just created. However, that would require me to read the source file, write it to some temporary location, and then point JAudioTagger at that temporary copy. And since I accessed the message body, I would send a much larger message across the exchange.

My second option was to use the information Camel passed in the header and use that information to read the tags from the original file.

Let's review these two options and the steps (operations) involved to see which one is more efficient:

  1. Access the Camel message body and then save the body to a location for tag analysis:
    • Access the message body, forcing Camel to read the source file as a byte[]
    • Write this byte[] to some temporary location
    • Load the file from the temporary location into the JAudioTagger library
    • Analyse the tags
    • Delete the temporary file
    • Send the message on to the next step; because it now carries the file data, it is much bigger
  2. Use the file path information passed to me by Camel in the header to do tag analysis:
    • Use the file path passed in the header of the message to read the original source file
    • Analyse the tags
    • Send the original small message on to the next step

Option two has three steps and option one has six. So we will go with option two, as it involves fewer operations. Remember, when deciding on a course of action, look for the path of least resistance. There is also another reason I decided on this option.

Later on I want to be able to process audio files in parallel. If I used option one, I would be sending the entire audio file as a byte[] along the exchange. Audio files can be rather large (more than 1 megabyte is normal), and I don't want Camel to send such large messages in parallel, as this is a recipe for reducing my scalability right away.
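
Option two can be sketched in a few lines. This is only a minimal illustration, assuming the headers arrive as a plain Map and that the CamelFilePath header holds the path of the source file. The class name HeaderFileResolver and its resolve method are hypothetical and not part of Camel.

```java
import java.io.File;
import java.util.Map;

// Hypothetical helper: builds a File from the path Camel put in the headers,
// so the message body (the file contents) is never touched.
class HeaderFileResolver {

    // Name of the header that carries the file path.
    static final String FILE_PATH_HEADER = "CamelFilePath";

    static File resolve(Map<String, Object> headers) {
        Object path = headers.get(FILE_PATH_HEADER);
        if (!(path instanceof String) || ((String) path).trim().isEmpty()) {
            throw new IllegalArgumentException("Missing header: " + FILE_PATH_HEADER);
        }
        // Only the path travels with the message; the file stays on disk.
        return new File((String) path);
    }
}
```

The File returned here is what gets handed to JAudioTagger, and the message that moves on to the next step stays as small as it was when it arrived.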

Remember, as a developer you must try to solve problems efficiently, not just hack your way through them. Keep scalability, performance, security and ease of use as a mantra and you will be OK. Always think before you code; the results are much more professional than those of someone who hacks at it for hours. Enough justification of my design decisions. Let's see how we will get the tag data from the file and insert it into the database.

In the previous part of this tutorial I showed you how to declare a connection pool, and we also declared a couple of stored procedures for the MySQL database. Let's review how we will use these in our route.

The connection pool is actually very easy to use, as CDI injection handles it for us. But how do you get a connection from a declared connection pool? The code for this is too easy (as the Australians say). The tagAnalyser bean has a property called dataSource, into which the connection pool is injected. To get a connection from this pool, we simply access the dataSource property and ask it to give us a connection.

This can be achieved with the following lines of code:
    Connection conn = null; //declare connection object
    conn= dataSource.getConnection(); //get a connection from the pool
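
A slightly fuller sketch of the same idea, assuming Java 7 or later: try-with-resources closes the statement and hands the connection back to the pool even when the call fails. The class name ArtistWriter is hypothetical; the setArtist call is the stored procedure used in this tutorial.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

// Hypothetical helper showing one stored procedure call against the pool.
class ArtistWriter {

    private static final String SQL_INSERT_ARTIST = "CALL setArtist(?)";

    static int insertArtist(DataSource dataSource, String artist) throws SQLException {
        // Resources are closed in reverse order: first the statement, then the connection.
        try (Connection conn = dataSource.getConnection();
             CallableStatement cstmt = conn.prepareCall(SQL_INSERT_ARTIST)) {
            cstmt.setString(1, artist);
            return cstmt.executeUpdate();
        }
    }
}
```

Closing a pooled connection does not tear it down; it simply returns it to the pool so the next exchange can reuse it.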

Pretty simple, right? The next step is to declare a File object that points to the source file for tag analysis. Here I ran into some problems, as some audio files in my collection are from other countries such as Japan, China and other places with interesting alphabets and characters. This had me stumped for a while, and then I realised I had forgotten to tell Java which charset encoding to use for my file system. Some JVMs run under a default charset encoding that may differ from your actual file system encoding. This can be a very difficult bug to spot. Add the option -Dfile.encoding=UTF-8 to your VM arguments to rectify this. That is assuming your file system uses UTF-8.
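
If you want to confirm what your JVM actually reports before and after adding the option, a small check like the following can help. It is only a sketch; the class name EncodingCheck is hypothetical, and it uses nothing beyond the standard file.encoding property and Charset.defaultCharset().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: prints the charset settings the JVM is running with.
class EncodingCheck {

    static boolean isUtf8(Charset charset) {
        return StandardCharsets.UTF_8.equals(charset);
    }

    public static void main(String[] args) {
        Charset current = Charset.defaultCharset();
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("Default charset:        " + current);
        if (!isUtf8(current)) {
            System.out.println("Warning: default charset is not UTF-8; consider -Dfile.encoding=UTF-8");
        }
    }
}
```

Run it once with and once without the VM argument to see the difference on your machine.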

The code for pointing JAudioTagger at the source file and reading the tags will generally follow this pattern:

    File audioFile = new File(filepath);
    AudioFile f = AudioFileIO.read(audioFile);
    Tag tag = f.getTag();

Right, so let's look at the complete code for the processor that implements the needed functionality.
package com.tutorial.namphibiansoftware.camelmusic;

import java.io.File;
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.Map;

import org.apache.camel.Exchange;
import org.apache.camel.Handler;
import org.apache.camel.Headers;
import org.apache.commons.dbcp.BasicDataSource;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.commons.lang.StringEscapeUtils;
import org.jaudiotagger.audio.AudioFile;
import org.jaudiotagger.audio.AudioFileIO;
import org.jaudiotagger.tag.FieldKey;
import org.jaudiotagger.tag.Tag;
import org.jaudiotagger.tag.TagField;

public class tagAnalyser {
 
private static final String SQL_INSERT_ARTIST="CALL setArtist(?);";    //This is the call to the stored procedure which will insert or update an artist
private static final String SQL_INSERT_GENRE="CALL setGenre(?)";
private static final String SQL_INSERT_SONG_ARTIST ="CALL setSongArtist(?,?);";
private static final String SQL_INSERT_GENRE_ARTIST ="CALL setGenreArtist(?,?);";


//We will use dependency injection to inject the connection pool into this property.
private BasicDataSource dataSource;
//Logger for this class, obtained from the commons-logging LogFactory
private static final Log LOG = LogFactory.getLog(tagAnalyser.class);
public BasicDataSource getDataSource() {
 return dataSource;
}

public void setDataSource(BasicDataSource dataSource) {
 this.dataSource = dataSource;
}
 @Handler
 public void readTag(@Headers Map header,Exchange exchange) throws Exception{
        
    
        Connection conn   = null;
        CallableStatement cstmt = null;
        String artist   = null;
        String genre   = null;
        String album   = null;
        String title   = null;
        String camelFilePath  = null; 
        String filepath   = null;
        File   audioFile        = null;
        try
        {
            //get the header information once so we dont have to look it up again
            camelFilePath = (String)header.get("CamelFilePath");
            //the path Camel passed in the header points at the original source file
            filepath = camelFilePath;
            LOG.info("Reading tag information for the file: "+camelFilePath);
 
            //create the file object used by JAudioTagger to read tags
            audioFile = new File(filepath);
            //create the audiofile object based on the file
            AudioFile f = AudioFileIO.read(audioFile);      
            Tag tag = f.getTag();
            LOG.info("Read Tag Information For The File:"+camelFilePath);
            /*
             * This line(s) of code might need some explaining. 
             * The tag value can be a lot of different combinations of empty, null or zero length strings.
             * Thus I check for the following conditions
             * 1) NULL Strings
             * 2) Empty Strings
             * 3) Length of zero
             * If any of these conditions are true I set the artist name to UNKNOWN
             * If none of these conditions are met I just use the first artist name.
             * I know there can be multiple artists per song
             * However I am keeping this simple you can expand on this if needed
            */
            title=(tag.getFirst(FieldKey.TITLE)==null||tag.getFirst(FieldKey.TITLE).trim().equalsIgnoreCase("")||tag.getFirst(FieldKey.TITLE).length()==0?"Unknown":tag.getFirst(FieldKey.TITLE));
            LOG.info("The Title Of This File Is: "+title);
            artist = ((tag.getFirst(FieldKey.ARTIST)==null||tag.getFirst(FieldKey.ARTIST).trim().equalsIgnoreCase("")||tag.getFirst(FieldKey.ARTIST).length()==0)?"Unknown":tag.getFirst(FieldKey.ARTIST));
            LOG.info("The First Artist Tag Name is: "+artist);
            if(tag.getFields(FieldKey.ARTIST).size()>1)
            {
             
             LOG.info("Multiple artists listed for "+title+" other artist names are as follows:");
             for (Iterator iterator = tag.getFields(FieldKey.ARTIST).iterator(); iterator.hasNext();)
             {
                 TagField artistField = (TagField) iterator.next();
                 LOG.info("Other Artist Name: "+artistField);
             }
             
            }
            genre=(tag.getFirst(FieldKey.GENRE)==null||tag.getFirst(FieldKey.GENRE).trim().equalsIgnoreCase("")||tag.getFirst(FieldKey.GENRE).length()==0?"Unknown":tag.getFirst(FieldKey.GENRE));
            LOG.info("Song "+title +" is of genre: "+genre);
            album=(tag.getFirst(FieldKey.ALBUM)==null||tag.getFirst(FieldKey.ALBUM).trim().equalsIgnoreCase("")||tag.getFirst(FieldKey.ALBUM).length()==0?"Unknown":tag.getFirst(FieldKey.ALBUM));
            LOG.info("Song "+title +" was released on the album: "+album);
        }
        catch (Exception tagException)
        {
         LOG.info("Error in reading tag information please check error log .");
            LOG.error(tagException.getMessage());
            // Here we throw a cannot read tag exception.
        }
        LOG.info("Read Tag Information Setting Header Values For Exchange.");
        header.put("Artist", artist);
        header.put("Album" ,album);
        header.put("Title",title);
        header.put("Genre",genre);
        try
        {
         int numOfRows =0;
         conn  = dataSource.getConnection(); //get a connection from the pool
         cstmt = conn.prepareCall(SQL_INSERT_ARTIST);
         cstmt.setString(1, artist);
         numOfRows=cstmt.executeUpdate();
         LOG.info("Inserted: "+numOfRows+" Artist");
         cstmt.close(); //close each statement before preparing the next one
         cstmt = conn.prepareCall(SQL_INSERT_GENRE);
         cstmt.setString(1, genre);
         numOfRows=cstmt.executeUpdate();
         LOG.info("Inserted: "+numOfRows+" Genre");
         cstmt.close();
         cstmt = conn.prepareCall(SQL_INSERT_SONG_ARTIST);
         cstmt.setString(1, title);
         cstmt.setString(2, artist);
         numOfRows=cstmt.executeUpdate();
         LOG.info("Inserted: "+numOfRows+" Songs/Tracks");
         cstmt.close();
         cstmt = conn.prepareCall(SQL_INSERT_GENRE_ARTIST);
         cstmt.setString(1, genre);
         cstmt.setString(2, artist);
         numOfRows=cstmt.executeUpdate();
         LOG.info("Inserted: "+numOfRows+" Genre And Artist Combinations");
         //the last statement is closed in the finally block below

             
            }
            catch(Exception dbException)
            {
             LOG.error(dbException.getMessage());
            }
         finally
         {
             try
             {
                     if (cstmt!=null)
                     {
                      cstmt.close();
                     }
                     if (conn!=null)
                     {
                         conn.close();
                     }
             }
             catch(SQLException e)
             {
                 LOG.error(e.getMessage());
             }
         }
  }
 
}
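
The repeated null and empty checks in the listing could be folded into one small helper. This is only a sketch; TagDefaults and orUnknown are hypothetical names and not part of JAudioTagger.

```java
// Hypothetical helper: returns "Unknown" for null, empty or whitespace-only tag values.
class TagDefaults {

    static final String UNKNOWN = "Unknown";

    static String orUnknown(String value) {
        if (value == null || value.trim().isEmpty()) {
            return UNKNOWN;
        }
        return value;
    }
}
```

With a helper like this, a line such as title = TagDefaults.orUnknown(tag.getFirst(FieldKey.TITLE)); would do the work of the long conditional, and every tag would be treated the same way.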


The code is a little longer than the previous listing of the processor. The process is pretty easy to follow, though: first we try to read the tags of the file, then we set the headers, and finally we do the database inserts. I have split the reading of the tags and the inserting of the data into two separate try/catch blocks so that I can raise different types of exceptions for the exchange and have the route respond differently to each. At this point the exceptions just get logged to the logging system. Go ahead and run the route and watch the log output.
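
One way the two failure types could be told apart is with one small exception class per try/catch block. This is a sketch under my own assumptions: TagReadException, TagStoreException and TwoStageProcessor are hypothetical names, and the Stage interface only stands in for the real tag reading and database code.

```java
// Hypothetical exception for failures while reading tags from the audio file.
class TagReadException extends Exception {
    TagReadException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical exception for failures while writing tag data to the database.
class TagStoreException extends Exception {
    TagStoreException(String message, Throwable cause) {
        super(message, cause);
    }
}

class TwoStageProcessor {

    // Stand-in for a block of work that may fail.
    interface Stage {
        void run() throws Exception;
    }

    // Runs both stages and wraps any failure in the exception type of its stage.
    static void process(Stage readTags, Stage storeTags)
            throws TagReadException, TagStoreException {
        try {
            readTags.run();
        } catch (Exception e) {
            throw new TagReadException("Could not read tags", e);
        }
        try {
            storeTags.run();
        } catch (Exception e) {
            throw new TagStoreException("Could not store tags", e);
        }
    }
}
```

Because each stage has its own exception type, whoever calls process can decide separately what to do with a file whose tags cannot be read and with a database that cannot be reached.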

That's it for this part, which I hope is the last of the long parts. In the next part we will look at dealing with the exceptions and at how to write the files to their final destination. After that we will look at some techniques for scaling file processing in an Apache Camel route.

Finally, just to make sure that you can apply these techniques in a real-life deployment, we will also cover how to deploy this route to an Apache Karaf installation.
