11 Dec 2012, 23:06

Postgres Array type

Beyond traditionnal data types (integer, text, boolean, etc), Postgres support some other interesting types (network, geographical, etc) and one I discovered a few weeks ago : arrays.

Idea is simple : instead of storing a single value in a colum, you can store several ones. Yes several ! You don't need to set up a "one to many relation" or "many to many relation" with others tables.

Simple example : let's imagine a blog application with posts.

Each post has :

  • A unique ID
  • A title
  • A content
  • Tags

Tags are generally managed via a "many to many relation" and thus would require at least two tables, nor to say three. (1 for post, 1 for tag and 1 for the post/tag relation)

With arrays in Postgres, you only need a single table :

create table post (
id         serial unique
title      text,
content    text,
tag        text[]

Any content type (at least traditional ones like text, integer, etc) can be an array, juste add "[]" after.

Back to my example ; let's create some content

INSERT INTO post(title, content, tag) VALUES ('Blog Post 1', 'Some Content for Post 1', '{"test", "post-1"}')
INSERT INTO post(title, content, tag) VALUES ('Blog Post 2', 'Some Content for Post 2', '{"test", "post-2", "another_test"}')
INSERT INTO post(title, content, tag) VALUES ('Blog Post 3', 'Some Content for Post 3', '{"test", "post-3", "a third test", "really !"}');

You can see that for each entry, the number of tags is not the same to illustrate that it can adapt to your needs if you did not define constraints.

Now, let's imagine you want to retrive post depending on tag value, quite easy :

# Select all posts with tags "test" and "post-1" (ie the first one)
select * from post where tag @> ARRAY['test', 'post-1']
# Select all post with tags "post-2" OR "post-3" (ie two last ones)
select * from post where tag && ARRAY['post-2', 'post-3']

For traditionnal schema, you may be used to use the "IN" statement for implementing the "OR" sample above, like :

select * from post where id in (1,2)

But for the "AND", a traditionnal approach would be  :

  1. Do a 1st query on post, filtering on the one having the tag "test"
  2. Do a 2nd query on the resultset from previous and filtering on "post-1"

If number of filters and join are known from beginning, it can be managed. If number of filters is dynamic (let's say user can filter on 1,2,3,...,n tags, you have to manage it on the fly.

With arrays in postgres, I find quite interesting to find a pure and simple SQL solution to manage both "And" and "Or" filtering on values and avoid also multiple tables or (complex) join queries. For dynamic queries (ie user will use from 1 to n tags, it would be a single function to manage it)

If you are interested in such content, I may speak later about hstore, the key/value system in Postgres.

PS : for those who know MongoDB, it can remain you the $all operatoir.

PS2 : You can also have a multiple dimension arrays with "text[][]" So for example, if you have a CRM database and want to keep the last monthly invoices for a customer for the last 3 years, you could define it this way : decimal[3][12]