In a previous blogpost, we talked about barcodes and EAN-numbers, and how in some cases barcodes with different numbers still reference the same product. The solution for searching there was not very performant, and wouldn’t scale to more than a handful of barcodes to check.
This post describes how to store barcodes in the PostgreSQL database in a normalized way, so that searching can be done without slowing down queries. As a bonus, we get a stronger uniqueness constraint.
Let’s quickly summarize why this is important again. Barcodes, we’ll focus on EAN-numbers, consist of
a 3-digit country code, and a product identifier. Some of these country codes are reserved, however,
and each country can use them as they wish. In The Netherlands, the country codes
28 are reserved for codes that include a short product indentifier as well as a price or weight.
This allows for example supermarkets to weigh apples, and print a barcode with product identifier
Since we are just interested in the product identifier, and want to find any matching product (regardless its price or weight); and we want to be sure that we only have one product in our database, we’d like to store barcodes without price or weight.
First we need a set of functions to normalize a barcode, so that for partial barcodes any price or weight is cleared:
--- number of digits of the product identification part of the barcode CREATE FUNCTION ean13_nl_id_length(text) RETURNS integer LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE AS $_$ BEGIN RETURN CASE WHEN LENGTH($1) = 13 THEN CASE LEFT($1, 2) WHEN '21' THEN 6 WHEN '22' THEN 6 WHEN '23' THEN 7 WHEN '28' THEN 6 ELSE 13 END ELSE LENGTH($1) END; END; $_$; --- barcode with weight or price removed CREATE FUNCTION ean13_nl_id(text) RETURNS text LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE AS $_$ BEGIN RETURN LEFT($1, ean13_nl_id_length($1)); END; $_$; --- normalized barcode CREATE FUNCTION ean13_nl_normalize(text) RETURNS text LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE AS $_$ BEGIN RETURN CASE WHEN LENGTH($1) = 13 THEN RPAD(ean13_nl_id($1), 13, '0') ELSE $1 END; END; $_$;
What do these functions do? Let’s find out:
SELECT ean, ean13_nl_id_length(ean) AS id_length, ean13_nl_id(ean) AS id, ean13_nl_normalize(ean) AS normalized FROM UNNEST(ARRAY['1234', '12345', '0000000000000', '2201234567890', '2301234567890']) ean; --- => ean id_length id normalized --- => '1234' 4 '1234' '1234' --- => '12345' 5 '12345' '12345' --- => '0000000000000' 13 '0000000000000' '0000000000000' --- => '2201234567890' 6 '220123' '2201230000000' --- => '2301234567890' 7 '2301234' '2301234000000'
ean13_nl data type
In PostgreSQL, one can define a new data type (which is called a
DOMAIN), and place restrictions on
its contents. This references the last function we just created:
CREATE DOMAIN ean13_nl AS text CONSTRAINT ean13_nl_check CHECK ((VALUE = ean13_nl_normalize(VALUE)));
Setup a sample database
Let’s put it to use in a table:
CREATE TABLE barcodes (barcode ean13_nl PRIMARY KEY, name VARCHAR); INSERT INTO barcodes VALUES (ean13_nl_normalize('8712439020109'), 'Jori Pindakaas 350g'), (ean13_nl_normalize('8713576100129'), 'TerraSana Pindakaas 250g'), (ean13_nl_normalize('2164941000227'), 'EkoPlaza Appel Elstar'), (ean13_nl_normalize('2232072000000'), 'Jumbo Jonagold'), (ean13_nl_normalize('2232140001472'), 'Jumbo Elstar Appels'), (ean13_nl_normalize('125645'), 'Jumbo Appel Jonagold');
Let’s have a look at what’s in the table now:
SELECT * FROM barcodes`; --- => '8712439020109' 'Jori Pindakaas 350g' --- => '8713576100129' 'TerraSana Pindakaas 250g' --- => '2164940000000' 'EkoPlaza Appel Elstar' --- => '2232070000000' 'Jumbo Jonagold' --- => '2232140000000' 'Jumbo Elstar Appels' --- => '125645' 'Jumbo Appel Jonagold'
Note that it isn’t possible to insert a non-normalized barcode, which helps to maintain consistency:
INSERT INTO barcodes VALUES ('2164941000238', 'Foo'); --- ERROR: value for domain ean13_nl violates check constraint "ean13_nl_check"
Find by barcode
Just make sure to feed each barcode through the
SELECT name FROM barcodes WHERE barcode = ean13_nl_normalize('2232140001481'); --- => 'Jumbo Elstar Appels'
One can also join by barcode:
SELECT a.* FROM barcodes a INNER JOIN UNNEST(ARRAY['8712439020109', '2232140001481', '21987']) b ON a.barcode = ean13_nl_normalize(b); --- => '8712439020109' 'Jori Pindakaas 350g' --- => '2232140001472' 'Jumbo Elstar Appels'
Finding multiple barcodes
To make it just a bit easier to find multiple barcodes, one can create another function:
--- normalized barcodes CREATE FUNCTION ean13_nl_normalize(text) RETURNS text LANGUAGE plpgsql IMMUTABLE AS $_$ BEGIN RETURN ARRAY(SELECT DISTINCT ean13_nl_normalize(g.i) FROM unnest($1) g(i)); END; $_$;
This can be used to search for multiple barcodes:
SELECT * FROM barcodes WHERE barcode = ANY(ean13_nl_normalize(ARRAY['8712439020109', '2232140001481', '21987'])); --- => '8712439020109' 'Jori Pindakaas 350g' --- => '2232140001472' 'Jumbo Elstar Appels'
This may be slightly easier in complex queries.