Article Source: Read More
In an ongoing effort to help our partners write reports in Koha, I’d like to highlight the relationships between database tables in Koha.
A review of the basics of SQL
Just for review, let’s look at some elementary SQL. Let’s say that we have two tables, person
and person_dates
. The person
table contains an id number, the person’s first name, and the person’s surname, like so:
+----+---------+-----------+
| id | surname | firstname |
+----+---------+-----------+
| 1 | Doe | John |
| 2 | Doe | Jane |
+----+---------+-----------+
The person_dates
table has its own id number, but also contains a person_id
number that refers to the id number from the person
table:
+----+-----------+---------------+------------+
| id | person_id | label | date |
+----+-----------+---------------+------------+
| 1 | 1 | Date of Birth | 1970-11-22 |
| 2 | 1 | Married | 2004-05-12 |
| 3 | 2 | Married | 2004-05-12 |
| 4 | 2 | Date of Birth | 1975-12-26 |
+----+-----------+---------------+------------+
We can look up all of the values for firstname
in the person
table by using a SELECT
statement:
SELECT
firstname
FROM
person
Which would give us
+-----------+
| firstname |
+-----------+
| John |
| Jane |
+-----------+
The sections in a SQL query are known as ‘clauses’. Fields in the SELECT clause are separated by commas. If we want to see both the first name and the surname, we can write
SELECT
firstname,
surname
FROM
person
Which would give us
+-----------+---------+
| firstname | surname |
+-----------+---------+
| John | Doe |
| Jane | Doe |
+-----------+---------+
If we only want to see that information for Jane Doe, we can use a WHERE
clause:
SELECT
firstname, surname
FROM
person
WHERE
firstname = 'Jane'
+-----------+---------+
| firstname | surname |
+-----------+---------+
| Jane | Doe |
+-----------+---------+
Running queries on a single table doesn’t give us much information… who are John and Jane Doe? What’s their relationship? Inquiring minds want to know! Taking a page from the SELECT
clause, let’s list the tables separated by commas:
SELECT
*
FROM
person,
person_dates
+----+---------+-----------+----+-----------+---------------+------------+
| id | surname | firstname | id | person_id | label | date |
+----+---------+-----------+----+-----------+---------------+------------+
| 1 | Doe | John | 1 | 1 | Date of Birth | 1970-11-22 |
| 1 | Doe | John | 2 | 1 | Married | 2004-05-12 |
| 1 | Doe | John | 3 | 2 | Married | 2004-05-12 |
| 1 | Doe | John | 4 | 2 | Date of Birth | 1975-12-26 |
| 2 | Doe | Jane | 1 | 1 | Date of Birth | 1970-11-22 |
| 2 | Doe | Jane | 2 | 1 | Married | 2004-05-12 |
| 2 | Doe | Jane | 3 | 2 | Married | 2004-05-12 |
| 2 | Doe | Jane | 4 | 2 | Date of Birth | 1975-12-26 |
+----+---------+-----------+----+-----------+---------------+------------+
Well… that doesn’t look quite right… we’ve got two people, but four lines that say Date of Birth
. Unless John and Jane Doe have the rather exceptional quality of being born twice, there’s a problem with our logic.
Let’s do a bit of trouble shooting…
SELECT
*
FROM
person_dates;
+----+-----------+---------------+------------+
| id | person_id | label | date |
+----+-----------+---------------+------------+
| 1 | 1 | Date of Birth | 1970-11-22 |
| 2 | 1 | Married | 2004-05-12 |
| 3 | 2 | Married | 2004-05-12 |
| 4 | 2 | Date of Birth | 1975-12-26 |
+----+-----------+---------------+------------+
So … we know that John Doe’s Date of Birth is 1970-11-22, but line 4 shows his date of birth as 1975-12-26. Anything stand out on that line? Ah! the person_id
is 2
… that belongs to Jane Doe. So maybe we need to add a where clause:
SELECT
*
FROM
person, person_dates
WHERE
person.id = person_dates.person_id
+----+---------+-----------+----+-----------+---------------+------------+
| id | surname | firstname | id | person_id | label | date |
+----+---------+-----------+----+-----------+---------------+------------+
| 1 | Doe | John | 1 | 1 | Date of Birth | 1970-11-22 |
| 1 | Doe | John | 2 | 1 | Married | 2004-05-12 |
| 2 | Doe | Jane | 3 | 2 | Married | 2004-05-12 |
| 2 | Doe | Jane | 4 | 2 | Date of Birth | 1975-12-26 |
+----+---------+-----------+----+-----------+---------------+------------+
Ah, that looks better!
It turns out, there’s a different way of writing the FROM
clause, that makes this more explicit:
SELECT
*
FROM
person
INNER JOIN person_dates ON person.id = person_dates.person_id
Let’s say that we’re doing some detective work… we’re thinking that people who are married on the same day are probably married to each other. We can use a WHERE
clause to suss that out:
SELECT
*
FROM
person
INNER JOIN person_dates ON person.id = person_dates.person_id
WHERE
label = 'Married'
+----+---------+-----------+----+-----------+---------+------------+
| id | surname | firstname | id | person_id | label | date |
+----+---------+-----------+----+-----------+---------+------------+
| 1 | Doe | John | 2 | 1 | Married | 2004-05-12 |
| 2 | Doe | Jane | 3 | 2 | Married | 2004-05-12 |
+----+---------+-----------+----+-----------+---------+------------+
Oh, look they’re married to each other. How sweet.
… and …
SELECT
*
FROM
person
WHERE
id > 2
+----+---------+-----------+
| id | surname | firstname |
+----+---------+-----------+
| 3 | Doe | J.K. |
+----+---------+-----------+
The miracle of life!
Now, let’s try an INNER JOIN
again … we won’t use the WHERE
clause, because we know that little J.K. is too young to be married…
SELECT
*
FROM
person
INNER JOIN person_dates ON person.id = person_dates.person_id
+----+---------+-----------+----+-----------+---------------+------------+
| id | surname | firstname | id | person_id | label | date |
+----+---------+-----------+----+-----------+---------------+------------+
| 1 | Doe | John | 1 | 1 | Date of Birth | 1970-11-22 |
| 1 | Doe | John | 2 | 1 | Married | 2004-05-12 |
| 2 | Doe | Jane | 3 | 2 | Married | 2004-05-12 |
| 2 | Doe | Jane | 4 | 2 | Date of Birth | 1975-12-26 |
+----+---------+-----------+----+-----------+---------------+------------+
Wait? Where’s little J.K.??? Before we call the cops, let’s take a careful look around…
SELECT count(*) FROM person WHERE firstname = 'J.K.'
Shows count(*)
is 1, so we know that ‘J.K.’ is still in the person
table. (oh, thank goodness, I would never forgive myself… ). So what happened?
SELECT * from person_dates where person_id = 3;
Returns no rows… the doctor hasn’t filled out the birth certificate yet. In this case, we want a LEFT JOIN
, instead of an INNER JOIN
:
SELECT
*
FROM
person
LEFT JOIN person_dates ON person.id = person_dates.person_id
+----+---------+-----------+----+-----------+---------------+------------+
| id | surname | firstname | id | person_id | label | date |
+----+---------+-----------+----+-----------+---------------+------------+
| 1 | Doe | John | 1 | 1 | Date of Birth | 1970-11-22 |
| 1 | Doe | John | 2 | 1 | Married | 2004-05-12 |
| 2 | Doe | Jane | 3 | 2 | Married | 2004-05-12 |
| 2 | Doe | Jane | 4 | 2 | Date of Birth | 1975-12-26 |
| 3 | Doe | J.K. | | | | |
+----+---------+-----------+----+-----------+---------------+------------+
… and J.K. Doe makes an appearance!
A Koha specific example
Most SQL queries are of the form SELECT ... from table1 LEFT JOIN table2 ... WHERE ...
there are a few more wrinkles that can be thrown in, but that’s all you need to know about SQL. Here’s an example of finding out who checked out what, and when:
SELECT
p.firstname,
p.surname,
b.title,
co.issuedate as 'checked out'
FROM
borrowers p
LEFT JOIN issues co using (borrowernumber)
LEFT JOIN items i using (itemnumber)
LEFT JOIN biblio b using (biblionumber)
There are a couple of things to notice here: in the SELECT
clause, we’ve written p.firstname
… the p
in this case is an alias, set in the FROM
clause. Writing borrowers p
says, essentially “We’re calling the borrowers table p
“.
A word about aliases
Common aliases used in queries against the Koha database:
biblio b
orbiblio bib
items i
biblioitems bi
statistics s
borrowers p
— for patronsissues co
— for check-outsreserves h
— for holdsdeleteditems di
— for deleted itemsborrower_attributes a
— for attributeaqorders o
— for orders
These are not required, but if you do use aliases, these work well, because they don’t conflict with each other (i.e. you can’t use biblio b
and borrowers b
in the same query).
The use of aliases isn’t strictly necessary… you could write SELECT borrowers.firstname, borrowers.surname, biblio.title...
, and the query would work just as well. In fact, as long as the column names are unique, you can leave off the table names entirely: SELECT firstname, surname, ...
. There are cases where the column names are not unique … for instance, both borrowers and biblio have a title
field. They’re entirely un-related. In this case, you must specify which table you’re asking for:
SELECT
firstname,
surname,
title,
issuedate as 'checked out'
FROM
borrowers p
LEFT JOIN issues co using (borrowernumber)
LEFT JOIN items i using (itemnumber)
LEFT JOIN biblio b using (biblionumber)
Gives the error message Column 'title' in field list is ambiguous
.
The other thing to note is that rather than writing LEFT JOIN issues co ON co.borrowernumber = p.borrowernumber
, we’re using LEFT JOIN statistics co using (borrowernumber)
. This also has the advantage of making SQL aware of the fact that borrowernumber is exactly the same between both tables … i.e. it’s unique. You can only do this if the column names are exactly the same between tables.
The rubber meets the road
You can almost always join on biblionumber
, itemnumber
, borrowernumber
or branchcode
. In Koha, there are 42 tables that use borrowernumber
, 32 that use branchcode
, 26 that use biblionumber
and 20 that use itemnumber
. This makes it very easy to join the biblio
, biblioitems
, items
and borrowers
tables with any other table that uses one of those keys.
The biblio
table is the basic unit of bibliographic information — it contains biblio.title
, which is the best way to show the title of a book in Koha. The items
table shows actual copies that show up in Koha. The issues
table shows currently checked out items — it contains both itemnumber
and borrowernumber
Notice that the items
table above is in the FROM
clause, but not mentioned in the SELECT
clause… this is necessary because there is no biblionumber
field in the issues
table. The art of writing tables lies in thinking about how to connect tables such as borrowers
and items
or biblio
and issues
.
In the case of Borrowers and Items, you have to think “What does a borrower do with an item?” … the answer is that a borrower checks out an item… so we look at the checkouts table … which, when Koha was designed, was thought of as ‘issuing an item to a borrower’, so we look in the issues table. Looking at the issues table, you can see that it has both borrowernumber
and itemnumber
, so it’s well suited to joining the two.
The case of biblio
and issues
is slightly more subtle. In this case, The biblio record contains information, but it doesn’t point to anything real that can be checked out… the biblio record is more like the idea of a book rather than a paper copy that you can lay your hands on. The item record actually points to the physical copy… you can’t check out the idea of a book, you have to take the physical book to the circ desk and check that out … and that’s what the item record represents… so the item record has an itemnumber as well as a biblionumber. The issues table only contains the itemnumber. If you want to see the title of the book that was checked out, you have to join biblios to issues via items.
Less obvious joins
After you get away from the big four columns biblionumber
, itemnumber
, borrowernumber
and branchcode
, things get a bit more challenging. Here are a few more that you should know about.
Authorized Values
This is a lookup table for various codes in the system — it holds human readable text to be displayed on the staff client or opac, for various codes in the system like collection codes or shelving location. Be aware, when using this table that it uses the British spelling authorised_values
rather than the American authorised_values
. The table consists of a category
field, which specifies whether you’re looking up collection codes ( category ccode
), lost statuses (category lost
), shelving locations (category loc
), etc. The column authorised_value
will match to the code being looked up in the database, and the columns lib
and lib_opac
are the values that display in the staff client and opac, respectively.
Items
items.itype
is used instead ofitems.itemtype
. No one knows why.items.homebranch
,items.holdingbranch
. Both can join tobranchcode
.items.itemlost
links toauthorised_values.authorised_value
whereauthorised_values.category = 'lost'
.items.location
links toauthorised_values
usingcategory = 'loc'
.items.ccode
(Collection code) links toauthorised_values
usingcategory = 'ccode'
.
Serials
The subscription
table is the backbone of the serials module. subscriptionid
is the primary key, analogous to biblionumber
for a biblio record – it is unique id). A subscription links to a bib record via biblionumber
(allowing us to see the title). The subscription table also tracks the frequency with which serials are supposed to arrive (via subscription_frequencies
) and the numbering of the serials (via subscription_numberpatterns
).
Individual issues for a subscription are tracked as rows in the serials
table. These may link to items using the serialitems
table (if the subscription is set to create item records upon receipt of an issue). The serial table contains the key serialid
as well as the subscriptionid
and biblionumber
(though not as foreign keys).
Logs
The action_logs
table shows information logged when various things change in Koha — bibs and items cataloged, borrower information changed, item check-in and check out, system preference changes, etc. Because so many different aspects of Koha can be logged, it’s not always clear how action_logs
links to other tables. You can find a table showing what logs are available, how they’re enabled, and what fields are linked to in the action_logs section of the Koha Reports Library.
Zebraqueue
The zebraqueue
table contains information about whether or not an item has been indexed and is search-able in Koha. Because the zebra queue contains information about bout biblio data and authority record data, it must join to biblionumber
via zebraqueue.biblio_auth_id
.
Where to go from here.
There are a few advanced topics in SQL that I didn’t cover here — some operations such as min
, max
, sum
and count
are called “aggregate operations” … i.e. they work on groups of data rather than individual rows. These often require the use of a GROUP BY
clause. You can find a simple example at Circ actions on date, which specifies a date in the where clause, then counts circulation grouped by type (issue, renew, return, etc.).
Querying MARC data: biblioitems.marcxml
contains MARC data in XML format. This can be queried using MySQL’s ExtractValue
function. See Query MARC. Please note that in Koha 17.05, MARC XML data will be stored in a table called biblio_metadata
. Koha will have a tool available which will convert your old biblioitems.marcxml
queries to use the new meta data table instead.
Cheat Sheet
Most queries have the form
SELECT
A.aaa,
B.bbb
FROM
ASDF A
LEFT JOIN BEEN_THERE_DONE_THAT B using (key)
WHERE
A.xyz = 100
LEFT JOIN
allows you to see the results from A and B, even if there is no matching row in B.(you will see blank fields on any rows selected from B).
INNER JOIN
will not show any rows if data in B does not match (think about poor J.K. Doe, missing because her birthday hadn’t been entered into the database yet).
If you are joining on two tables where the names of the keys differ, you must use the syntax ... on T1.some_key = T2.some_other_key
rather than ... USING (nifty_key)
Many tables in Koha can be joined by itemnumber
, biblionumber
, borrowernumber
or branchcode
.
These aliases are used a lot by the community because they don’t conflict with each other:
biblio b
or biblio bib
, items i
, biblioitems bi
, statistics s
, borrowers p
, issues co
, reserves h
, deleteditems di
, borrower_attributes a
, aqorders o
.