August 24, 2013

James Bond and the CUBRID search



Ok – so today you are James Bond who just got itself (again!) messed into a very dangerous endeavor and one of your tasks is to find some information hidden in a CUBRID database! :)
There is some data in there that you need to get your hands on, and the only thing you know is that the text you are looking for contains a special word - “Estiarte”.
... and just to make things worse, you don’t have any Hollywood-magic USB stick which copies and indexes 1 TB in 1 sec!!! (ha-ha) …or any other spy-wonder-tools…!
But you are not that helpless… …because you already know some things about CUBRID, right? :)

So you broke into the mystical computer’s room, and you have right in front of you a nice, (already unlocked!) CUBRID SQL command prompt window – so let's START! …just make sure you don’t waste any time – the bad guys are coming for you…!! (If you wonder why it’s unlocked, that’s because the administrator forgot to disable the “public” user or to change its default password!)



First of all, James Bond needs a plan – because he is always prepared!!!… So let’s work on the plan…
What do we need to be able to discover the text which contains the “keyword”?

  • We need to write down some SELECT(s) using a LIKE clause
  • We need to know on which columns we will execute the SELECT(s)
  • We need to know on which tables we will execute the SELECT(s)

Also, we need to look into:

  • Minimizing the effort – time is an issue here!
  • Optimize the execution of SQLs, in any way we can

The list of user tables can be obtained simple as that, from the db_class system metadata table:

SELECT * FROM db_class WHERE class_type='CLASS' and is_system_class='NO';



And the list of the columns can be similarly obtained by querying the db_columns metadata table.

But let’s do better – let’s combine tables and columns in a single query, to find out what columns and from which tables we need to query:

SELECT
 class_name, attr_name
FROM
 db_attribute
WHERE
 class_name IN (select class_name from db_class where class_type='CLASS' and is_system_class='NO')
ORDER BY class_name ASC, attr_name ASC



Another piece of the puzzle is how we do the LIKE filter:

SELECT * FROM code WHERE s_name LIKE '%Estiarte%'

Please note that for simplicity we are assuming that the column is of type CHAR/VARCHAR/STRING.
But what if isn’t…? Shouldn’t we do a TO_CHAR….? (if we don’t, http://www.youtube.com/watch?v=ks7-A-7Zvak… :) )

OK – so we have the tables, the columns and we know how to search for our keyword within the columns data.
How do we assemble these pieces efficient, simple and fast…?

Well, what you can do immediately is to generate the set of SQL queries which must be used in the CUBRID search:

SELECT
 CONCAT('SELECT * FROM ', class_name, ' WHERE ', attr_name,  ' LIKE ''%Estiarte%'';') AS qry
FROM
 db_attribute
WHERE
 class_name IN (select class_name from db_class where class_type='CLASS' and is_system_class='NO')
ORDER BY class_name ASC, attr_name ASC

What we obtain is a query for each column in the database that might contain our information:



So far so good! But we can do better than copy/paste/execute, for each SELECT returned!

Final piece – how to execute all these in a "better" way…? Let’s see – there are a couple of options:

  • Use the PREPARE & EXECUTE CUBRID functions. But in our case, this will not work, because PREPARE does not support multiple SQL statements.
  • Save the results in an external file, eliminate the surrounding ‘-s, copy back the commands to the SQL prompt and wait for the execution to complete… Simple, isn’t it? :)
  • Find a way to have just one single query!  How do we do that…? Not that complicated – using GROUP_CONCAT and UNION!

SELECT
 GROUP_CONCAT(CONCAT('SELECT TO_CHAR(`',  attr_name,'`) FROM `', class_name, '` WHERE `', attr_name,  '` LIKE ''%Estiarte%''') SEPARATOR ' UNION ')
FROM
 db_attribute
WHERE
 class_name IN (select class_name from db_class where class_type='CLASS' and is_system_class='NO')
ORDER BY class_name ASC, attr_name ASC

(Please note that we did some other things as well, like surrounding the column and the tables’ names with `, to prevent trouble when we are dealing with CUBRID reserved words.)

We will obtain a single SELECT we can execute in one step… …and that’s it – Estiarte was found!



Final thoughts – all these can be further optimized, and that is exactly what the James Bond would most probably do in a next similar task:
       
  •  Eliminate from the queries those columns which can’t contain the keyword:
    • The NUMERIC columns
    • The DATE, TIME columns
    • Etc.
  • Eliminate columns duplicates from search; Foreign keys columns, for example.


That’s it – James Bond is happy for now – he completed the mission and he can move on, to the next one.
Now, can you find a better way to solve this task…?

In the meantime, see you next time – Bye-bye! :)
/Dee

P.S. Hey, did you figured out what or who is “Manuel Estiarte”…? :D