Posts tagged ‘database’

Importance of being indexed

We noticed the other day that the MySQL process on one of our MSG servers was using up far more resources that it should do. After a little investigation it turned out that the problem was due to one very slow running query. The query in question had around 200k records in one table joined to a table with approx 80k records. Running the same query against my test database (with 70k and 80k records in the tables) ran in about 0.2 secs, but with the increased no of records it could take up to 630 seconds (yes – over 10 minutes), so no wonder then MySQL process was going into overdrive.

There were indexes on each of the tables, but only through a compound primary key, but the columns used for the join in the 2 tables in question didn’t have their own index. As soon as I added an index on these columns the query started to run in a much more respectable time (<0.2 secs).

So the lesson for me is to check that indexes are created for any columns used for joins on tables which are going to get big!

On a side note, this was in the OpenFire database and I’m not keen on the fact that their database structure uses the username column (a varchar(64) field) as the primary key for the user, rather than a more usual integer value. I suspect that joining two large tables on a varchar type column rather than integer column contributed to the slow running of the query.

BTW… can anyone recommend any (free/open-source) tools for analysing the performance and use of the indexes on a MySQL database? There seem to be loads MySQL admin type tools around, many paid-for products, but hard to tell which are any good for what I’m looking to do. Really I guess I’m looking for the similar features as MS SQL Query Analyser provides! Update 26/11/07 – I’m looking for something which gives me information about the query execution plan (i.e. how MySQL will use the indexes etc in the construction of the query) rather than general db admin tools for simply executing queries.

Floating point hell…!

I updated the map grids so they are divided equally, and all appears to be working in that the markers divided and combined in an expected way. However I did find there were quite a few little bugs etc with the reclustering algorithm which were all related to floating points and the number of decimal places.

The problems appear to be due to rounding errors when I’m storing the coordinates in the database. For example, when I think I’m storing say “51.2″, actually “51.2000000000001″ gets stored in the database, so then when I look up all the points matching 51.2 the database doesn’t return anything!

I have found a way around this, though I’m not sure how much of a hack it is! The solution seemed to be to treat the coordinates as strings when adding them to the prepared statement for updating and querying on. So instead of using:

String sql ="SELECT * FROM mytable WHERE lat = ?";
Connection conn = conn.getConnection();
PreparedStatement pstmt = conn.prepareStatement(sql);
pstmt.setDouble(1,51.2);

I’m using:

String sql ="SELECT * FROM mytable WHERE lat = ?";
Connection conn = conn.getConnection();
PreparedStatement pstmt = conn.prepareStatement(sql);
DecimalFormat f = new DecimalFormat("#.000000000000");
pstmt.setString(1,f.format(51.2));

This all appears to be working for storing/retrieving the correct values/records – note that the values are still stored as datatype decimal(18,15). My only concern is that I’ve only tested it against a MySQL database and I’m not sure whether other rdbmss would accept decimals/floats passed in string format.

Another possible solution that was suggested to me was to multiply up the coordinates so they are stored as integers in the database – but as this would be a fairly significant change at this stage (and may introduce more bugs!!), I’ll stick with the solution I’ve implemented for now ;-)

Floats, numerics, java, mysql, precision and rounding…

Spent the last couple of days trying to figure out why my clustering algorithm in Java wasn’t working properly – was working for some grids, but was totally messed up on others. I think I’ve tracked down why… the problem I think is to do with the precision of the datatypes I’m using in Java and MySQL. In Java I’m using the float type to store a latitude or longitude, then in the MySQL database I’m using decimal(15,12).

There appears to be a bit of a rounding problem somewhere along the line which I need to figure out. For some reason if I try and send a latitude of 52.2 from Java to be stored in the database, it gets stored as something like 52.2000007623. This means when I try and find all the points with latitude 52.2 it doesn’t pick up the record I’ve just inserted/updated.

I’ve not had much experience using this amount of decimal precision before, so the problems are probably due to my lack of understanding of the precision/accuracy of the various data types.

I think the way around this is to change my Java floats to be doubles for more precision and hopefully the problem won’t then occur  – not totally sure what to try if that doesn’t work, so please let me know if you have any suggestions :-)

Tuesday 11 April 06

Now that I’ve got most of the webservices etc sorted out for the JoinIn enterprise database, I started to look at creating the standalone interface for doing the grouping. I started off by creating a blanks struts application (on Tomcat 4.1- as I had been using for the JoinIN enterprise database), but soon noticed that I can’t use the filters in the servlet 2.3 spec that I’d have liked to have used for the page layout (header & footer), the reason being that the struts framework uses a forward to map the action in the URL to the actual jsp page. This functionality is only available in servlet spec 2.4, which then also means upgrading to Tomcat 5.5.

So, after a bit of ‘fun’, I’ve got both the JoinIn enterprise database/webservice and the standalone grouping interface now running on tomcat 5.5 on my tablet PC. I now need to make sure that I update the tomcat on our dev server and get both the apps running correctly on that too.

How a datasource is configured in tomcat 5.5 is slightly different to tomcat 4.1 too. Rather than creating a joinindb.xml file in the webapps folder to store the context data, this file must now be within the META-INF directory, called context.xml.

Monday 27 March 06

Have spent some time this morning looking at linking up a MySQL database up to Tomcat 4.1, using database connection pooling (DBCP). I had a fair bit of teething troubles getting it to work (using the instructions that I’d found to edit the tomcat server.xml)  as I kept getting the message that:

org.apache.commons.dbcp.SQLNestedException: Cannot create JDBC driver of class ” for connect URL ‘null’

It seems that this is quite a common problem (from all the forum postings about it), and different solutions appear to work for different people (depending I guess on exactly what version of tomcat, which database etc etc). So anyway, here’s what worked for me in the end…. (for info, here I’m using Tomcat v4.1 on Windows, and the MySQL connector in mysql-connector-java-3.1.12.jar – which I put in $CATALINA_HOME/common/lib/)

I did not edit the $CATALINA_HOME/conf/server.xml file, instead I created an xml file in the webapps directory named the same as my webapp that would like to use the database (joinindb.xml). This file looks contains this (for info, this is the entire file)

<Context path="/joinindb" docBase="joinindb"
        debug="5" reloadable="true" crossContext="true">

  <Logger className="org.apache.catalina.logger.FileLogger"
             prefix="localhost_JoinInDB_log." suffix=".txt"
             timestamp="true"/>

  <Resource name="jdbc/JoinInDB"
               auth="Container"
               type="javax.sql.DataSource"/>

  <ResourceParams name="jdbc/JoinInDB">
    <parameter>
      <name>factory</name>
      <value>org.apache.commons.dbcp.BasicDataSourceFactory</value>
    </parameter>

    <!– Maximum number of dB connections in pool. Make sure you
         configure your mysqld max_connections large enough to handle
         all of your db connections. Set to 0 for no limit.
         –>
    <parameter>
      <name>maxActive</name>
      <value>100</value>
    </parameter>

    <!– Maximum number of idle dB connections to retain in pool.
         Set to 0 for no limit.
         –>
    <parameter>
      <name>maxIdle</name>
      <value>30</value>
    </parameter>

    <!– Maximum time to wait for a dB connection to become available
         in ms, in this example 10 seconds. An Exception is thrown if
         this timeout is exceeded.  Set to -1 to wait indefinitely.
         –>
    <parameter>
      <name>maxWait</name>
      <value>10000</value>
    </parameter>

    <!– MySQL dB username and password for dB connections  –>
    <parameter>
     <name>username</name>
     <value>myusername</value>
    </parameter>
    <parameter>
     <name>password</name>
     <value>xxxxxxxxxx</value>
    </parameter>

    <!– Class name for mm.mysql JDBC driver –>
    <parameter>
       <name>driverClassName</name>
       <value>org.gjt.mm.mysql.Driver</value>
    </parameter>

    <!– The JDBC connection url for connecting to your MySQL dB.
         The autoReconnect=true argument to the url makes sure that the
         mm.mysql JDBC Driver will automatically reconnect if mysqld closed the
         connection.  mysqld by default closes idle connections after 8 hours.
         –>
    <parameter>
      <name>url</name>
      <value>jdbc:mysql://localhost:3306/joinin?autoReconnect=true</value>
    </parameter>
  </ResourceParams>
</Context>

I then added the following to my webapp web.xml file (this was added just before the closing </web-app> tag):

 <resource-ref>
      <description>DB Connection</description>
      <res-ref-name>jdbc/JoinInDB</res-ref-name>
      <res-type>javax.sql.DataSource</res-type>
      <res-auth>Container</res-auth>
</resource-ref>

I created a class to test the connection and the method that is used is as follows… (it just returns the name of the class):

public String testConnection() {
        String retvar="nothing here";
        try {
            Context context = new InitialContext();
            if (context == null ) {
                retvar = "Boom – No Context";
            }
            DataSource datasource = (DataSource)context.lookup("java:comp/env/jdbc/JoinInDB");
            if (datasource == null ) {
                retvar= "Boom – No DataSource";
            }
            retvar= "Database connected!";
            retvar = datasource.getConnection().getClass().getName();
    } catch (Exception e) {
            e.printStackTrace() ;
            retvar = "Database Not Available…. aarrgh …. why? – see the stack trace" ;
    }
        return retvar;
    }

When I called this method from a JSP in my webapp, the code run without error returned the value: org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper as the classname.

Friday 3 March 06

I’ve spent the last couple of days sorting out how we’re going to move some of our websites and databases to our new cluster server and database server, this all seems to be going quite well, and I’ve moved 4 of the sites over to the cluster already, but several more to go. Apart from that I’ve spent the last couple of weeks doing work on the JoinIn project and on our internal project planning database.

Thursday 16 February 06

Spent most of this week working on our internal project planning database, which is all going well – it’s moving forward much quicker than I thought it would take – so that’s always good!!

Had really good meeting today with Ernie from the LAMS team about our JoinIn project, and I’ll post up more information about that on the JoinIn blog/webpage