Comparison between popular Go libraries and ORMs for the database access layer.
We look at how normal and abnormal usage of these libraries and ORMs helps protect (or not!) against
SQL injection. Remember that we accept various query parameters when listing a resource, `users`
records in particular. The result is limited to a maximum of 30 records. We are going to try to list
all records in one resource by sneaking a malicious query parameter into the request.
http GET http://localhost:3080/api/sqlx/user?favourite_colour=blue;select%20*%20FROM%20users;,--;
We know that favourite_colour is a valid and accepted query parameter. We are going to see if
appending to the URL an apostrophe, then a semicolon (;), followed by a select all
(select * FROM users), closed off with another semicolon and an sql comment (--) will work.
apostrophe: a common way of closing off a string concatenation
semicolon: the sql way of closing off the previous sql statement
select%20*%20FROM%20users;: selects everything in the users table; %20 is a space
double dash: the sql way of commenting out the rest of the sql query
This malicious URL is not the only way of trying to inject sql queries. Others include
AND 1=1, which makes the query always true, and the dreaded ';DROP DATABASE users--, which
appends a second sql statement to drop the database. You can try different URLs by running
the examples in the provided example/rest.http file.
The post is incomplete without the addition of this comic:
There are two things we want to observe in how these libraries and ORMs behave: first, whether
the malicious sql gets injected, and second, what output gets returned.
sqlx
Testing against sqlx is the most interesting amongst all the libraries and ORMs because we have made some
dodgy examples with string concatenation.
At the point of QueryxContext(), the fullQuery is correctly formed, but the arguments carry
the attempted sql injection.
fullQuery:
SELECT * FROM users WHERE favourite_colour = $1 ORDER BY id LIMIT 30 OFFSET 0
arguments[0]:
blue;select * from users;,--;
It gives off an error saying invalid input value for enum valid_colours: "blue;select * from users;,--;".
This error comes from the pgx sql driver that we use.
Stepping back to the handler layer in the db/filters.go file, the url.Values.Get("favourite_colour")
call we use when parsing query parameters simply returns the user's input, which is why the first
argument contains that string.
Fortunately, sqlx returns an error, although for a different reason:
invalid input value for enum valid_colours: "blue';select * from users;,--;"
Note that we can make a prepared statement like below, but it does nothing to prevent this type
of sql injection, because the malicious part is in the argument, not in the sql query itself.

stmt, err := r.db.PrepareContext(ctx, fullQuery)
if err != nil {
    return nil, err
}
// no error
sqlc
Curiously, sqlc returns neither a result nor an error. With the first_name filter set to
Bruce';select * FROM users;,--;, sqlc gives no result and no error at all; we simply get an empty
array as a response.
http GET http://localhost:3080/api/sqlc/user?first_name=Bruce;select%20*%20FROM%20users;,--;
squirrel
GET http://localhost:3080/api/squirrel/user?favourite_colour=blue';select%20*%20FROM%20users;,--;
Just like sqlx, it returns an error saying invalid input. It simply does not accept such a parameter
and returns an error along with an empty array result.
invalid input value for enum valid_colours: "blue';select * FROM users;,--;"
gorm
Like sqlx and squirrel, gorm also returns the same error, coming from the pgx sql driver.
invalid input value for enum valid_colours: "blue;select * from users;,--;"
sqlboiler
models: failed to assign all query results to User slice
It appears that the query did run but failed when scanning the results into a struct. To find out
which query was generated, we switch on debug mode with boil.DebugMode = true.
SELECT * FROM "users" WHERE ("users"."favourite_colour" = $1) ORDER BY id;
[{blue;select * from users;,--; true}]
But it is not clear whether the argument was executed in the database. The next thing we can do is
enable logging for our database and check its standard output.

-- enable logging for database db_test
ALTER DATABASE db_test SET log_statement = 'all';
Watch the logs at standard output with docker logs -f db_container and re-run the HTTP request.
2022-09-14 12:08:33.858 UTC [173403] ERROR: invalid input value for enum valid_colours: "blue;select * from users;,--;"
2022-09-14 12:08:33.858 UTC [173403] STATEMENT: SELECT * FROM "users" WHERE ("users"."favourite_colour" = $1) ORDER BY id;
It appears that the select * from users; statement did not run in the database.
ent
Ent, on the other hand, silently returns an empty array because it cannot find any match for the value
blue;select * FROM users;,--; in our favourite_colour column.
SQL injection vulnerabilities in a codebase should not happen anymore. There are many well-known best
practices to prevent this security vulnerability. Among others: always sanitise user input in
the handler layer, where user input is validated; and, secondly, parameterize sql arguments
instead of joining the strings manually, by using placeholders - either question marks (?) for
mysql or dollar signs ($1) for postgres.
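As a minimal illustration of the second practice (a sketch assuming a plain database/sql handle named db):

// Dodgy: the user's input is concatenated straight into the SQL string.
query := "SELECT * FROM users WHERE favourite_colour = '" + colour + "'" // vulnerable

// Safe: the value travels separately from the query as an argument,
// so the driver never interprets it as SQL.
rows, err := db.QueryContext(ctx, "SELECT * FROM users WHERE favourite_colour = $1", colour)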
Among all these behaviours, none executed the malicious injected sql query. In terms of error
handling, I would prefer to receive an error when we receive an invalid or malicious
http request, the reason being that we can log the request in the error handler. So the worst
are sqlc and ent, because they do not return any error at all.
Conclusion
What initially started as eight posts has become twelve over the course of a year, but I hope
I have covered the common use cases of interacting with a database.
We have seen how easy ORMs, especially ent and gorm, make eager-loading child
relationships. Writing code with ent is much easier because everything is typed, so you benefit
from code hinting in your IDE - as well as avoiding typos when dealing with gorm's magic strings.
There is a significant learning curve because each is another system to learn. In my opinion, it is
worth it compared to writing many-to-many queries by hand using sqlx or sqlc, but not if you do not need
dynamic queries or eager-loading.
If you need dynamic queries, a pure query builder like squirrel is excellent. Writing with it feels
close to writing an sql query. However, it does no code generation, so there are magic strings everywhere.
Also, the use of structs for equality and LIKE operations is strange. sqlboiler is better, and you gain
code completion hints because it knows your database schema. Unfortunately, it falls short at
many-to-many relationships.
Finally, we have seen how powerful raw SQL is. The examples I have given are rather simple, but if
you are required to write complex queries, you may find them hard to translate to an ORM or,
worse, not supported at all. In many cases, you test the actual query before converting it because
you want to make sure it works as intended. If you do not need dynamic queries or eager-loading,
why not just use that working raw query? Everyone already knows sql anyway, and you already know
how to protect against sql injection, right?
For my first technical blog series, it took me a year of writing, researching, and re-writing many
times before I finally settled and felt I could publish. I hope you find these posts as useful as
they have been to me.
Comparison between popular Go libraries and ORMs for the database access layer.
Typically, to make a transaction, we first start by calling the BeginTx() method
from the database/sql package, which returns a Tx transaction struct.
Following the pattern from the Go blog at
https://go.dev/doc/database/execute-transactions,
we defer the rollback operation immediately after starting a transaction and end with a commit.
This ensures that a rollback is always called even if an error or panic happens in the function.
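A minimal sketch of that pattern (assuming a *sql.DB named db and an example UPDATE query):

tx, err := db.BeginTx(ctx, nil)
if err != nil {
    return err
}
// Safe to call even after a successful Commit(); see below.
defer tx.Rollback()

if _, err := tx.ExecContext(ctx, "UPDATE users SET email = $1 WHERE id = $2", email, id); err != nil {
    return err
}

return tx.Commit()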
If you carefully analyse the pattern above, you will notice that a rollback would happen after a commit.
Fortunately, if the transaction is already done, Go ignores the rollback and does not make
another round trip to the database - Rollback() is ignored if the transaction has been
committed, as seen in the source code:
// rollback aborts the transaction and optionally forces the pool to discard
// the connection.
func (tx *Tx) rollback(discardConn bool) error {
    if !atomic.CompareAndSwapInt32(&tx.done, 0, 1) {
        return ErrTxDone
    }
    ...
While knowing the pattern above is important, we want to find out a couple of things when
creating a transaction across multiple SQL queries:
Reuse existing methods, collect them, and run them in a transaction.
Adjust transaction isolation levels.
It would be nice if we could just reuse existing methods: why should we repeat ourselves if
the operations are the same but we need a transaction over them?
Understanding isolation levels is essential when deciding which level we want our transaction to run
at. Isolation is the 'I' in ACID. There are four isolation
levels in SQL. In practice, there are only three in postgres, as Read Uncommitted behaves like
Read Committed. You can read about isolation levels
in further detail. The default in postgres is Read Committed.
This means that a SELECT query will only see data committed before the query began; in other
words, it will not see any in-progress uncommitted changes.
In this post, we will see how these libraries and ORMs fare with creating and using a transaction as
well as changing an isolation level.
sqlx
Immediately defer Rollback(). The error is ignored because if the rollback fails, the transaction
is not going to be committed anyway.

defer tx.Rollback()

Call the sql queries:

var u db.UserDB
err = tx.GetContext(ctx, &u, getSQLQuery, userID)

Commit:

if err = tx.Commit(); err != nil {
    return nil, err
}
Note that it is possible to use the Begin() function from the database/sql package. We use sqlx's
Beginx() because we want the GetContext() convenience function. Otherwise, Begin() is
perfectly fine.
To answer the two questions we set out in the introduction, let us see how they do.
Notice that we cannot compose the existing Get() and Update() methods together and apply
a transaction over them. We would need to either change the implementation of both methods to accept a tx
transaction variable, or make the method receiver include the tx transaction.
Moving on to changing the transaction isolation level, there are two ways of doing it:
Execute an sql query, for example tx.Exec("SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;")
Use the sql.TxOptions{} struct from the database/sql package.
There is no drama when setting the isolation level through an sql query.
In the second option, the built-in database/sql package provides a struct called sql.TxOptions
in which you can set the isolation level and a read-only flag.
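A sketch of how that looks (the isolation level chosen here is just an example):

tx, err := db.BeginTx(ctx, &sql.TxOptions{
    Isolation: sql.LevelRepeatableRead,
    ReadOnly:  false,
})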
However, there is a caveat. We are now using a method from database/sql instead of sqlx. That
means we cannot use sqlx's convenient scanning features like GetContext() that automatically
scan results into a Go struct.
sqlc
The pattern for a transaction is similar to sqlx's, using the methods provided by the database/sql
package, with the addition of a constructor that accepts a transaction.
The secret sauce is qtx := pg.New(tx). Unlike sqlx, we can now compose the existing GetUser() and
UpdateUser() methods with this tx transaction variable.
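A rough sketch of that composition (the params struct is an assumption; sqlc generates one per annotated query):

tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
    return nil, err
}
defer tx.Rollback()

qtx := pg.New(tx) // the generated New() accepts a DBTX, which *sql.Tx satisfies

user, err := qtx.GetUser(ctx, userID)
if err != nil {
    return nil, err
}

// ... fill in the generated update params from the request ...

if err := qtx.UpdateUser(ctx, params); err != nil {
    return nil, err
}

return &user, tx.Commit()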
Just like sqlx, we can either use the BeginTx() method to set an isolation level or run an sql
query as usual.
squirrel
Like sqlx and sqlc, we need to use the underlying transaction struct of database/sql.
If you look at the code below, it is similar to how we did a Get operation, but in the query
builder we tell squirrel to use the tx transaction instead by passing it to the RunWith() method.
tx, err := r.forTx.Beginx()
if err != nil {
    return nil, fmt.Errorf("fails to start transaction error: %w", err)
}
defer tx.Rollback()

rows := r.db.
    Select("*").
    From("users").
    Where(sq.Eq{"id": id}).
    RunWith(tx). // <- ensures that you use the transaction.
    QueryRowContext(ctx)

...

_ = tx.Commit()
Using RunWith() means we cannot compose existing methods. Fortunately, setting the isolation level
is as simple as executing the sql query.
gorm
There are two ways to perform database transactions: the first follows the pattern described in
the Go blog, and the second uses an anonymous function.
Anonymous Function
Transaction() accepts an anonymous function and provides the transaction tx from gorm.
var u User
err := r.db.Transaction(func(tx *gorm.DB) error {
    if err := tx.First(&u).Error; err != nil {
        return err
    }

    u.FirstName = req.FirstName
    u.MiddleName = req.MiddleName
    u.LastName = req.LastName
    u.Email = req.Email
    u.FavouriteColour = req.FavouriteColour

    err := tx.WithContext(ctx).Save(&u).Error
    if err != nil {
        return err
    }
    return nil
})
if err != nil {
    return nil, err
}

return &u, nil
This approach means no composing of existing methods. To change the isolation level, simply execute
the appropriate sql query.
Sqlboiler
Like above, sqlboiler can accept a transaction instead of a normal sql.DB, thanks to sqlboiler's
methods accepting the ContextExecutor interface.

tx, err := r.db.BeginTx(ctx, nil)
if err != nil {
    return nil, fmt.Errorf("fails to start transaction error: %w", err)
}
defer tx.Rollback()

user, err := models.FindUser(ctx, tx, id) // pass in tx instead of sql.DB
if err != nil {
    return nil, err
}

...

_ = tx.Commit()
However, like squirrel, you cannot compose the existing Get() and Update() methods. We would need
to refactor both. For example, we can add a new field to our database struct so that we can 'save' a
transaction to be used in our refactored Get() method.
type database struct {
    db   *sqlx.DB
    exec boil.ContextExecutor // create a new field
}

Set the exec field with this transaction:

tx, err := r.db.BeginTx(ctx, nil)
r.exec = tx // Set our executor with this transaction

Our refactored Get() method now uses exec instead.
It begins by creating a transaction. Any error that happens within the anonymous
function fn(tx) causes a rollback; otherwise, the transaction is committed.
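A minimal sketch of such a Tx() helper, based on the description above:

// Tx runs fn inside a transaction: it rolls back if fn returns an error,
// and commits otherwise.
func Tx(ctx context.Context, db *sql.DB, fn func(tx *sql.Tx) error) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // no-op once the transaction has been committed

    if err := fn(tx); err != nil {
        return err
    }
    return tx.Commit()
}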
To use it, we call the Tx() function, which returns an error. Within the business logic, we replace
all r.db with the tx parameter. Note that we need to declare models.User outside of
this function and assign the result we want with u = user.
func (r *database) TransactionUsingHelper(ctx context.Context, id int64, req db.UserUpdateRequest) (*models.User, error) {
    u := &models.User{}

    err := Tx(ctx, r.db.DB, func(tx *sql.Tx) error {
        user, err := models.FindUser(ctx, tx, id)
        if err != nil {
            return err
        }

        user.FirstName = req.FirstName
        user.MiddleName = null.String{
            String: req.MiddleName,
            Valid:  req.MiddleName != "",
        }
        user.LastName = req.LastName
        user.Email = req.Email
        user.FavouriteColour = null.String{
            String: req.FavouriteColour,
            Valid:  req.FavouriteColour != "",
        }

        _, err = user.Update(ctx, tx, boil.Infer())
        if err != nil {
            return err
        }

        u = user // assigns the value we want to `u`, outside of this `Tx()` function.
        return nil
    })
    if err != nil {
        return nil, err
    }

    return u, nil
}
Ent
The documentation at https://entgo.io/docs/transactions
describes many ways to perform a transaction. The following method follows the established pattern:
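A sketch of what that looks like with ent (assuming r.db is an *ent.Client; the field setters are illustrative):

tx, err := r.db.Tx(ctx)
if err != nil {
    return nil, err
}
defer tx.Rollback() // the error is ignored; rolling back after a commit fails harmlessly

u, err := tx.User.
    UpdateOneID(id).
    SetFirstName(req.FirstName).
    SetEmail(req.Email).
    Save(ctx)
if err != nil {
    return nil, err
}

return u, tx.Commit()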
In conclusion, creating a transaction is rather straightforward. We can use the established pattern
from https://go.dev/doc/database/execute-transactions
with all of these libraries and ORMs. However, only sqlc easily allows us to reuse the existing Get()
and Update() methods by composing them and applying a transaction.
Comparison between popular Go libraries and ORMs for the database access layer.
In this post, we will look at selecting records based on a column with multiple conditions. To keep
it simple, we will select users whose last names are 'Donovan' and 'Campbell'.
sqlx
We cannot pass a slice into sqlx because the sql driver does not understand Go slices. Instead,
given a series of IDs, we need to create a series of ? placeholders matching the length of that slice. In
sqlx, a helper function called sqlx.In() has to be called before doing the main query.
err = r.db.SelectContext(ctx, &dbScan, "SELECT * FROM users WHERE last_name IN (?)", f.LastNames...)
The above code won’t work. First, SelectContext() only accepts a slice of interface, not a
slice of strings. Next, there is only one question mark placeholder. Given that we are using
Postgres, we need to use dollar placeholders instead.
So the above query won’t generate a valid sql query. The reason is I think sqlx touches your query
as little as possible. So it needs to be a valid sql query first.
query, args, err := sqlx.In("SELECT * FROM users WHERE last_name IN (?)", f.LastNames)
The In() function expands the ? bindvar to the length of the f.LastNames slice. By default, it expands
into a series of ?. For postgres, we must do yet another transformation so that it uses dollar placeholders.
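Putting it together, a sketch of the whole dance looks like this:

query, args, err := sqlx.In("SELECT * FROM users WHERE last_name IN (?)", f.LastNames)
if err != nil {
    return nil, err
}
query = r.db.Rebind(query) // rewrites the expanded ?, ?, ... into $1, $2, ... for postgres

var users []userDB
err = r.db.SelectContext(ctx, &users, query, args...)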
sqlc
Dynamic queries are tricky with sqlc. First of all, there is no WHERE IN()
support for mysql in sqlc; there is an open issue for this.
I suppose there is a dodgy way by extrapolating the number of ? placeholders
mysqlQuery := fmt.Sprintf(`SELECT * FROM users WHERE last_name IN (?%v);`, strings.Repeat(",?", len(f.LastNames)-1))
The only issue here is that the f.LastNames slice needs to be sanitized because it originates from
user input and thus can potentially cause an sql injection.
Anyhow, since we are demonstrating postgres, we can make use of a postgres array! When writing the SQL
query, we declare that we accept an array of text for last_name using last_name = ANY($1::text[]).
SELECT id, first_name, middle_name, last_name, email, password, favourite_colour
FROM users
WHERE last_name = ANY($1::text[])
The code that gets generated by sqlc now accepts an array using pq.Array.
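The call site then just passes the slice; a sketch (the generated method name depends on your query annotation):

users, err := q.ListUsersByLastNames(ctx, f.LastNames) // pq.Array($1) is applied inside the generated code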
squirrel
We use the sq.Eq{} struct, giving it the column name we want and the slice of arguments.
Using sq.Eq{} may be unintuitive because we would expect to see a WHERE IN wording somewhere.
Moreover, the sq.Eq{} struct expects a map of string to any type as its value. Since we supply a
list of strings, squirrel will generate a WHERE IN sql statement. We will later see, in the
Get section, that sq.Eq{} can accept a single
integer value to get one record.
By default, squirrel creates question mark placeholders. Since we are using postgres, we need
dollar placeholders instead. We can add a method to each query builder or, to avoid repetition,
set it once during repository initialisation, as the sketch below shows.
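A sketch of both pieces together (sqlDB is an assumed *sql.DB handle):

// Set once during repository initialisation.
builder := sq.StatementBuilder.PlaceholderFormat(sq.Dollar).RunWith(sqlDB)

rows, err := builder.
    Select("*").
    From("users").
    Where(sq.Eq{"last_name": f.LastNames}). // a slice value renders as last_name IN ($1, $2, ...)
    QueryContext(ctx)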
Squirrel does not have a StructScan() or SelectContext() convenience method like sqlx, so we need
to do the scanning manually.
var dbScan []*userDB
for rows.Next() {
    var u userDB
    err := rows.Scan(&u.ID, &u.FirstName, &u.MiddleName, &u.LastName, &u.Email, &u.Password, &u.FavouriteColour)
    if err != nil {
        return nil, err
    }
    dbScan = append(dbScan, &u)
}
gorm
Gorm will happily take a slice for your WHERE IN (?). It even uses the correct dollar
placeholders, because we have imported "gorm.io/driver/postgres" somewhere in our code.
err = r.db.WithContext(ctx).
    Where("last_name IN ?", f.LastNames).
    Find(&users).
    Error
Unlike sqlx, you do not need to put parentheses around the question mark placeholder.
sqlboiler
Sqlboiler has a WhereIn() query modifier which takes column_name IN ?. In spite of using this
method, you still need to type IN ? as a string in the argument. I imagine the API could be
simpler by accepting something like WhereIn("last_name", args...).
The variadic arguments require an interface type instead. Since sqlboiler is not generic yet, we
need to transform our string slice into a slice of interface{}.

import "github.com/samber/lo"

args := lo.Map(f.LastNames, func(t string, _ int) any {
    return t
})
sqlboiler will take the dollar placeholder without parentheses.

all, err := models.Users(
    //qm.WhereIn("last_name", args...), // Does not work. Needs IN operator
    //qm.WhereIn("last_name IN ($1, $2)", "Donovan", "Campbell"), // Is what we want
    qm.WhereIn("last_name IN ?", args...), // instead, just give it a `?`.
).
    All(ctx, r.db)
ent
Doing a WHERE IN (?,...) in ent is as easy as in Gorm, but with the added advantage of having no magic
strings whatsoever.
The LastNameIn() function is generated by the ent cli based on the schema we have defined. If we look
at its function signature, LastNameIn(vs ...string) predicate.User, we can clearly see that it
takes zero or more strings.
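A sketch of the query (user here is ent's generated predicate package):

users, err := r.db.User.
    Query().
    Where(user.LastNameIn(f.LastNames...)).
    All(ctx)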
Like all the others, by default ent adds edges{} to the response even when you did not load any
relationship.
Ent is again the easiest to use amongst them all, with Gorm coming in a close second. There is too much work
involved in sqlx. Squirrel is quite OK; the use of the sq.Eq{} struct is interesting, but scanning can
be a major headache. Sqlboiler's syntax could be a little friendlier.
Comparison between popular Go libraries and ORMs for the database access layer.
Often when you are listing a resource, a client may want to perform various filter operations and
sorting. They may want to retrieve all addresses in a specific postcode, or sort the users by
last name followed by first name. A common way of making such a request is by passing query
parameters in the URL of a GET request.
For example, GET /api/user?email=john@example.com only returns user record with john@example.com
as its email. GET /api/users?sort=first_name,asc sorts users by first name in ascending order. We
can add multiple columns to sort by adding more sort key: GET /api/users?sort=first_name,asc&sort=email,desc.
For pagination, we can either define offset and limit or use the convenience page key:
GET /api/users?page=2. Please note that there is a huge penalty to LIMIT and OFFSET when you are
retrieving records where the offset is in the millions. See Why You Shouldn't Use OFFSET and LIMIT For Your Pagination.
An ORDER BY clause must also accompany every LIMIT and OFFSET.
Before going further into the implementation, we add a convenience function to our handler (controller)
layer that receives client requests. This function parses a request's query parameters.
Here, we declare what field(s) we accept for filtering the users. Then we use the url.Values.Get() method
to retrieve them. We decided that we can put multiple names separated by commas in one query parameter,
so we have a helper function that turns a comma-separated string into a slice of strings.
package db

type Filter struct {
    Base filter.Filter

    Email           string
    FirstName       string
    FavouriteColour string
    LastName        []string
}

func Filters(v url.Values) *Filter {
    f := filter.New(v)
    lastNames := param.ToStrSlice(v, "last_name")

    return &Filter{
        Base:            *f,
        Email:           v.Get("email"),
        FirstName:       v.Get("first_name"),
        FavouriteColour: v.Get("favourite_colour"),
        LastName:        lastNames,
    }
}
The filter/base.go file, which is shared by all implementations, handles parsing of common query parameters.
To use it, we call the package's db.Filters() function and pass in the URL query from the handler's http.Request.
func (h *handler) List(w http.ResponseWriter, r *http.Request) {
    f := db.Filters(r.URL.Query())
Then we inject the f variable down to our data access layer.
sqlx
First we show a dodgy way of doing dynamic filtering by a field: building the strings ourselves.
Care must be taken that we do not concatenate the user's input into the query itself, because that would be
vulnerable to sql injection. Instead, we must use placeholders. In the subsequent sqlc section, we
will show a better way - using a conditional sql query.
1. Filter By Email
We start by declaring a couple of sql clauses. We select all records from the users table and limit
the result to the first 30 records.
So what we want is basically the following query, but with a dynamic WHERE clause according to what is
passed in the URL query parameters.

SELECT * FROM users WHERE LOWER(email) = ? ORDER BY id LIMIT 30 OFFSET 0;
Note that there are spaces in the strings because we need some spacing when we combine the strings.
selectClause := "SELECT * FROM users "   // notice the space at the end?
paginateClause := " LIMIT 30 OFFSET 0"   // and at the beginning?
Then we start appending the LOWER(email) = ? clause to our empty whereClauses string. We have to use
postgres's LOWER() function because our schema is not case-insensitive. This is unlike mysql, where a database
can be built using a ci (case-insensitive) collation, e.g. ...collate utf8mb4_unicode_ci;. We also
use the ? placeholder instead of the $ placeholder because we cannot be sure of the order when
filtering by a column is optional. We can always rebind the whole query with sqlx's Rebind() method.
Once the whereClauses concatenation is completed, we concatenate it into the full query.
Then we loop over filters.Base.Sort to concatenate the column and order strings along with the arguments.
Looping over filters.Base.Sort gives the client the flexibility to determine which column order comes
first.
for col, order := range filters.Base.Sort {
    sortClauses += fmt.Sprintf(" %s ", col)
    sortClauses += fmt.Sprintf(" %s ", order)
}
Concatenating strings makes me feel uneasy. Is this vulnerable to sql injection? You can try hitting
the API with the kind of malicious query parameters shown in the sql injection post.
All in all, there is a lot of boilerplate needed to perform these three types of operations. Care must be
taken so that we do not allow any sql injection.
sqlc
There is a terrible way to do it in sqlc: writing all combinations of columns.
We do not really want to write them all, because 2 optional columns already means
2² = 4 combinations, for example:
Query without filtering
Query with first_name filtering
Query with email filtering
Query with both first_name and email filtering.
If we have 3 columns, that is 2³ = 8 combinations. And 4 columns gives 16 combinations!
So we try a conditional SQL query instead:
-- name: ListDynamicUsers :many
SELECT id, first_name, middle_name, last_name, email, password, favourite_colour
FROM users
WHERE (@first_name::text = '' OR first_name ILIKE '%' || @first_name || '%')
  AND (@email::text = '' OR email = LOWER(@email))
ORDER BY (CASE WHEN @first_name_desc::text = 'first_name' THEN first_name
               WHEN @email_desc::text = 'email' THEN email
          END) DESC,
         (CASE WHEN @first_name_asc::text = 'first_name' THEN first_name
               WHEN @email_asc::text = 'email' THEN email
          END),
         id
OFFSET @sql_offset LIMIT @sql_limit;
For example, whenever @first_name is empty, the whole (@first_name::text = '' OR first_name ILIKE '%' || @first_name || '%')
condition is true for every row, so no filtering happens; when a value is supplied, only rows matching the ILIKE pattern pass.
Ordering is tricky when we want to decide whether we are sorting a column in ascending or
descending order. We use CASE ... THEN ... END to decide whether we want to include the clause or
not.
One thing missing is filtering and ordering by favourite colour, with which sqlc does not play well
with the pgx driver, as far as I know.
Since we store the sort information in a struct, we need to loop over it and assign to the generated
ListDynamicUsersParams struct:
p := ListDynamicUsersParams{
    SqlOffset: int32(f.Base.Offset),
    SqlLimit:  int32(f.Base.Limit),
}

if len(f.Base.Sort) > 0 {
    for col, order := range f.Base.Sort {
        if col == "first_name" {
            if order == "ASC" {
                p.FirstNameAsc = col
            } else {
                p.FirstNameDesc = col
            }
        }
    }
}
Obviously this only handles first_name. You need to repeat the col check for each column you want
to sort by. One thing I have not managed to do is a conditional query on an enum column.
3. Pagination
Not much drama with pagination. We simply provide the values from the base filter struct.
p := ListDynamicUsersParams{
    SqlOffset: int32(f.Base.Offset),
    SqlLimit:  int32(f.Base.Limit),
}
As you can see, using conditional sql greatly reduces the boilerplate compared to our sqlx example.
It is just a matter of getting the initial sql query right first.
This also means we could have used conditional sql for sqlx too.
squirrel
Since squirrel is a query builder, it should shine with dynamic queries.
Filter by Email
Chaining the methods that we want is exactly how we build the query, as the sketch below shows.
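A sketch of that chaining for the email filter (the builder methods are squirrel's; the surrounding fields and the strings import are assumptions):

builder := r.db.
    Select("*").
    From("users").
    OrderBy("id").
    Limit(30)

if f.Email != "" {
    builder = builder.Where("LOWER(email) = ?", strings.ToLower(f.Email))
}

rows, err := builder.QueryContext(ctx)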
gorm
Note that requesting ?first_name=jake won't yield any result, because the case does not match:
Gorm does not apply LOWER() to the input or ILIKE to the column. To fix this, we cannot
use the struct map; instead we need to use the Where() and Or() methods.
In conclusion, only sqlc did not pass all the functionality we required, such as dealing with dynamic
filtering and case-insensitive user input. Concatenating strings in sqlx is potentially error-prone
(and dangerous!). The order of methods is important in Gorm. Squirrel is straightforward. Finally, both
sqlboiler and ent offer type-safe queries and easy query building.
Comparison between popular Go libraries and ORMs for the database access layer.
In this post, we will look at listing users with all their addresses. The users and addresses
tables are joined through a pivot table, user_addresses, which holds foreign keys to the
users and addresses tables' primary keys.
As the ERD suggests, a user can have many addresses, and an address can belong to many users.
There are many ways to do this. The most straightforward method (what most ORMs do) is to perform
a query on each of the related tables:
Get 30 records of users.
For each user, get the IDs of the addresses from the pivot table.
Get all records of addresses for the IDs retrieved in the previous step.
Begin attaching addresses to users.
ORMs often do this automatically for you. Writing sql manually means you have to do all the
steps above yourself.
The other way is to perform left joins on the tables, starting from users and ending with addresses.
A left join is important: if we performed an inner or right join instead and a user did not have
an address, that user record would not be returned. While doing left joins, aliases, especially on IDs,
must be set to prevent name conflicts. In my opinion, this method is a poor fit for a
many-to-many operation. The following method is better.
Finally, it is possible to use the array_agg function (for postgres) or group_concat for mysql/mariadb.
You write the least amount of Go code, but it requires a sufficient amount of SQL knowledge
to craft the correct query.
sqlx
We will take a look at two methods: method 1, which does a query on each table, and
the third method, which relies on a single query. I will omit the second method because it is
much more complicated than method 1.
Method 1: The Long Way
If you look at the full source code, it is two pages long,
so I will only include the relevant snippets here.
Starting with the sql queries, we limit the number of users to 30.
To find the addresses of each user, we get the relationship from the pivot table. For an array
of user IDs, we could always do a SELECT DISTINCT u.id FROM users u LIMIT 30;, but this is an
additional query when we can extract the IDs from query #1.
-- #2
SELECT DISTINCT ua.user_id AS user_id, ua.address_id AS address_id
FROM "addresses" a
         LEFT JOIN "user_addresses" ua ON a.id = ua.address_id
WHERE ua.user_id IN (?);
We then get all the address IDs, pass them into the following query, and execute it.

-- #3
SELECT a.* FROM addresses a
WHERE a.id IN (?);
Notice that the last two queries use ? instead of $1. This relates to sqlx's ability (or inability)
to properly expand array values for the IN operator. What we can do is use the In() function to
expand the slice into individual arguments.
sqlx.In() accepts a slice of interface{}, but executing QueryContext() expects concrete types like int8;
only by using the ? placeholder does sqlx not complain about the types.
Once we perform the first sql query, we can scan the result into a custom struct
([]*UserResponseWithAddressesSqlx), one with an address field.

type UserResponseWithAddressesSqlx struct {
    ID         uint                `json:"id,omitempty"`
    FirstName  string              `json:"first_name"`
    MiddleName string              `json:"middle_name,omitempty"`
    LastName   string              `json:"last_name"`
    Email      string              `json:"email"`
    Address    []AddressForCountry `json:"address"`
}
var all []*UserResponseWithAddressesSqlx
for users.Next() {
    var u userDB
    if err := users.Scan(&u.ID, &u.FirstName, &u.MiddleName, &u.LastName, &u.Email); err != nil {
        return nil, fmt.Errorf("db scanning error")
    }
    all = append(all, &UserResponseWithAddressesSqlx{
        ID:         u.ID,
        FirstName:  u.FirstName,
        MiddleName: u.MiddleName.String,
        LastName:   u.LastName,
        Email:      u.Email,
    })
}
We retrieve all user IDs and get their associated addresses with the second sql query. We need to scan
and hold the data in another struct so that we can use it to find the associations later.

type userAddress struct {
    UserID    int `db:"user_id"`
    AddressID int `db:"address_id"`
}
Then we get all the address IDs and execute the third sql query. We scan the results into a new address struct -
it needs the db struct tags.
The final step is to loop through all the arrays of structs and attach their associations:

for _, u := range uas {
    for _, user := range all {
        if u.UserID == int(user.ID) {
            for _, addr := range allAddresses {
                if addr.ID == uint(u.AddressID) {
                    user.Address = append(user.Address, AddressForCountry{
                        ID:       addr.ID,
                        Line1:    addr.Line1,
                        Line2:    addr.Line2.String,
                        Postcode: addr.Postcode.Int32,
                        City:     addr.City.String,
                        State:    addr.State.String,
                    })
                }
            }
        }
    }
}
Note that if a user record does not have any address, the value for the address key will be null
instead of an empty array.
So you need to remember to initialise the addresses for a user:
all = append(all, &UserResponseWithAddressesSqlx{
    ID:         u.ID,
    FirstName:  u.FirstName,
    MiddleName: u.MiddleName.String,
    LastName:   u.LastName,
    Email:      u.Email,
    Address:    []*AddressForCountry{}, // instead of leaving it empty
})
In my opinion, this is too much code for just two tables plus a pivot table.
Imagine having to eager-load 7 tables deep; that would be a nightmare.
Method 3: The SQL way
So let us take a look at the third method, which uses both array_agg and row_to_json,
and see if it is any simpler. If you take a look at the SQL query below, it looks similar to
the query we wrote for one-to-many, but we are now joining three tables:
SELECT u.id,
       u.first_name,
       u.middle_name,
       u.last_name,
       u.email,
       u.favourite_colour,
       array_to_json(array_agg(row_to_json(a.*))) AS addresses
FROM addresses a
         JOIN user_addresses ua ON ua.address_id = a.id
         JOIN users u ON u.id = ua.user_id
GROUP BY u.id;
The relevant bit is that the JOIN operation starts from the 'lower' aggregated records
(the addresses table) and we work our way up to the users table.
With array_to_json(array_agg(row_to_json(a.*))), we turn the rows of addresses into a json
array. After performing the joins, we select the user columns we want and finally group them by
user IDs. The result is that we only get users that have at least one address.
Notice that this SQL query omits users that do not have any addresses. To include them, simply
change the JOIN above to a RIGHT JOIN.
One caveat, just like we showed in the one-to-many example, is that we need to parse the json array from
the database back into a Go struct if we want to play around with it. If we don't, we can simply scan into
json.RawMessage.
sqlc
We already know the exact SQL query we need to perform this many-to-many query, so it is similar to method 3
of sqlx, but with less boilerplate. We do not need to manually scan database results into a struct
because that code is all generated for us, but we still need to extract the IDs from each result. For method 1,
it takes only a page to perform many-to-many, compared to almost two pages for sqlx.
You still need to manually attach addresses to a user like in sqlx. Also, you need to remember to
initialise the addresses for a user, like above, instead of leaving them empty.
Upon sqlc compilation, it generates a method that you can simply use:

dbResponse, err := r.db.ListM2MOneQuery(ctx)

However, since it outputs a struct consisting of sql.NullString for middle_name and an enum
for favourite_colour, we want to convert them into another struct.
squirrel
Like sqlx and sqlc, method 3 is going to have far less boilerplate than method 1. The long method 1
is uninteresting because we have already shown how to do it with sqlx. Let us see how this query
builder fares.
rows, err := r.db.
Select(
"users.id",
"users.first_name",
"users.middle_name",
"users.last_name",
"users.email",
"users.favourite_colour",
"array_to_json(array_agg(row_to_json(a.*))) AS addresses",
).
From("addresses AS a").
InnerJoin("user_addresses ON user_addresses.address_id = a.id").
InnerJoin("users ON users.id = user_addresses.user_id").
GroupBy("users.id").
QueryContext(ctx)
One look and you can see that this is almost a one-to-one translation of the SQL query we wanted
into Go code. There are a few differences from the approach we took in the one-to-many section.
First of all, in the one-to-many section, the whole database response was in JSON format. Here, only
one column is in JSON format, namely the last addresses column.
At first, I wanted to make the array_to_json(array_agg(row_to_json(a.*))) AS addresses statement into
its own select using s := r.db.Select("array_to_json(array_agg(row_to_json(a.*)))") and then chain it
with FromSelect(s, "addresses"), but that did not work.
The inner join argument here is interesting. We pass only one string, unlike other ORMs in other
languages where the parts can be separated with commas. Here it can take optional arguments. That means
we can do something like this…

InnerJoin("user_addresses ON user_addresses.address_id = a.id AND a.id = ?", 7).

… and it will only return records in the user_addresses join where the address ID is equal to 7.
The trickiest part is the scanning. Like in the one-to-many section, we have to construct a Go struct
specifically for this query. The first six columns, for the users table, are clear. addresses,
on the other hand, is actually a JSON response, so we need to implement the Scanner interface.
gorm
The amount of code that needs to be written is greatly reduced compared to sqlx, sqlc, and squirrel
so far. The key is to use the Preload() method. What is not obvious is which magic string you need to
supply to it. The answer is the name of the field itself - it has to match. For example, the User
struct has Addresses as its field, so we supply the string Addresses inside Preload().
type User struct {
    ...
    Addresses []Address `json:"address" gorm:"many2many:user_addresses;"`
}
We also need to tell Gorm the pivot table that links the users and addresses tables.
This is done with the struct tag gorm:"many2many:user_addresses;".
Gorm has no code generation like sqlboiler and ent. Instead, it relies on structs (and struct tags)
as its point of reference.
sqlboiler
Unfortunately, I could not find an easy way to eager-load many-to-many relationships with sqlboiler.
The current recommendation is to fall back to raw sql queries.
ent
Like its one-to-many operation, ent shines at loading relationships.
We use the same pattern as we did with one-to-many - the With... pattern on the generated model.
If a user has no address, the edges key will simply contain an empty object.
Ent makes 3 separate sql queries, just like what we did for sqlx and gorm.
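A sketch of the eager-loading call (WithAddresses() is generated from the addresses edge on the User schema):

users, err := r.db.User.
    Query().
    WithAddresses().
    Limit(30).
    All(ctx)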
In my opinion, the API is very straightforward, and there isn't really a way to get it wrong. Everything
is statically typed with no interface{} anywhere, which makes it predictable. The only downside I can
think of is that you need to spend time getting the relationships in the model schemas right.
In conclusion, using raw sql is too much work when it comes to loading many-to-many relationships.
Imagine if we wanted to load 7 tables deep; it would be a nightmare. While sqlc improves upon sqlx
by reducing a good deal of boilerplate, it still pales in comparison to using an ORM. You need to
be careful with struct tags in Gorm, and it was not obvious that you had to use a field name for
Preload(). Ent does not have Gorm's issues, as everything is typed. You will need to learn the
With... pattern, though, which isn't a big deal.
Comparison between popular Go libraries and ORMs for the database access layer.
In this post, we will look at getting a list of countries along with their addresses.
There are three ways of loading this kind of relationship:
Perform one sql query on each of the tables.
Perform a left join from the countries to the addresses table.
Use the array_agg function (for postgres) or group_concat for mysql/mariadb.
Option one is what most ORMs do. They query the parent table, then get its IDs. With those IDs,
they query the child table relationships. The more relationships you require, the more sql queries you
make. Each child relationship is then attached to the parent based on the foreign keys that
have been defined. As a result, you get a struct you can walk through from the parent to the child
table.
Using joins can be an easy way of getting all the data in a single query. But you potentially return more
data than option one when a child record belongs to many parent records. Furthermore, the type
of join is important. If you make an INNER JOIN between the parent and child tables, and the
child record is NULL, you may end up losing the parent record in your final result.
Joining a number of tables with millions of records can result in huge memory usage. This is because,
in the SQL order of execution, the tables are
joined in memory first, before the relevant columns are selected. And even though you make a single SQL
query, you still need to loop through the results and scan child records to the correct parent record.
In my opinion, tracking the relationships is hard and error-prone.
Finally, we have the option of using native functions to aggregate child records to their
related parent tables: array_agg for postgres and group_concat for mysql or mariadb.
For this, I create a view called country_address that aggregates addresses to their country.
CREATE VIEW country_address AS
SELECT c.id, c.code, c.name,
       (
           SELECT array_to_json(array_agg(row_to_json(addresslist.*))) AS array_to_json
           FROM (
                    SELECT a.*
                    FROM addresses a
                    WHERE c.id = a.country_id
                ) addresslist
       ) AS address
FROM countries AS c;
Running select * from country_address; will give id, code, name, and address in a JSON
format.
To get everything in JSON format, you may query with select row_to_json(row) from (select * from country_address) row;.
However, you still need to parse this JSON response back into a Go struct.
const GetWithAddresses2 = "select row_to_json(row) from (select * from country_address) row"

func (r *database) Countries(ctx context.Context) ([]*CountryResponseWithAddress, error) {
    var resp []*CountryResponseWithAddress

    rows, err := r.db.QueryContext(ctx, GetWithAddresses2)
    if err != nil {
        return nil, fmt.Errorf(`{"message": "db error"}`)
    }
    defer rows.Close()

    for rows.Next() {
        var i CountryResponseWithAddress
        err = rows.Scan(&i)
        if err != nil {
            return nil, err
        }
        resp = append(resp, &i)
    }

    return resp, nil
}
The GetWithAddresses2 query is a statement that queries the view we created. On top of that,
we use the row_to_json() function, which turns the whole response into json. Before we go ahead,
let us take a step back and define our Go struct.
type CountryResponseWithAddress struct {
    Id        int                  `json:"id,omitempty"`
    Code      string               `json:"code,omitempty"`
    Name      string               `json:"name,omitempty"`
    Addresses []*AddressForCountry `json:"address"`
}
To figure out how to form this struct, we need to know the response coming from the database.
The key (pun intended) here is to look at the JSON keys, namely id, code, name, and
address. That is how we decide to name the fields in the CountryResponseWithAddress struct;
Go will automatically figure out which json key maps to which field.
Notice that the last field is the plural Addresses even though the json key is the singular
address. We map between them using the json struct tag - which uses reflection.
Because the whole response is in JSON format, we need to tell the runtime how to handle and parse
the result into a Go struct.
To achieve this, we implement one of database/sql's interfaces, called Scanner. It contains one method,
Scan(src any) error. So we create our own implementation of the Scanner interface
that unmarshals the JSON.
// From standard library `database/sql` package
type Scanner interface {
    Scan(src any) error
}

// CountryResponseWithAddress is our own custom struct that reflects the result we want
type CountryResponseWithAddress struct {
    Id        int                  `json:"id"`
    Code      string               `json:"code"`
    Name      string               `json:"name"`
    Addresses []*AddressForCountry `json:"address"`
}

// Scan implements the Scanner interface using our CountryResponseWithAddress struct
func (m *CountryResponseWithAddress) Scan(src any) error {
    val := src.([]byte) // []byte is an alias of []uint8
    return json.Unmarshal(val, &m)
}
This means that every time err = rows.Scan(&i) is called, it goes into this Scan(src any) error
method of CountryResponseWithAddress instead of the default one on the Rows struct from the
database/sql package.
We can try the second approach by using a struct like the sketch below.
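A hypothetical scan target for that second (left join) approach; every address column must be nullable because a country may have no address, and the field names are assumptions:

type countryAddressRow struct {
    CountryID int            `db:"country_id"`
    Code      string         `db:"code"`
    Name      string         `db:"name"`
    AddressID sql.NullInt64  `db:"address_id"`
    Line1     sql.NullString `db:"line_1"`
    Postcode  sql.NullInt32  `db:"postcode"`
}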
If no field names or aliases conflict, we should be able to attach the addresses to their countries by
looping through the records while keeping track of the country IDs.
And of course, we need to perform two queries for the first method. Then we loop over the
results from addresses and attach them to a country (or countries) by looking at the foreign keys.
sqlc
In sqlc, we do not have to worry about writing boilerplate because the code is auto-generated from
the sql query we already know.

-- name: CountriesWithAddressAggregate :many
select row_to_json(row) from (select * from country_address) row;
So only three lines are needed to get the correct result.
However, sqlc returns json.RawMessage. If you need to play around with the returned database records,
you need to unmarshal this JSON response into a Go struct, an example of which I show in the squirrel
section below.
squirrel
The obvious way to do this is approach number 1: query each table, then attach the relevant
addresses (if they exist) to a country.
But since squirrel is a query builder, we are more interested in finding out how building such a
query looks. Let us take a closer look at the query:

select row_to_json(row) from (select * from country_address) row;

First off, we have the row_to_json postgres function, which takes row as its argument. Second,
we select everything from a view called country_address, aliased to row.
To build this query using squirrel, we build each of these two parts:

s := r.db.Select("* from country_address")

rows, err := r.db.
    Select("row_to_json(row)").
    FromSelect(s, "row").
    QueryContext(ctx)

First, we select the view and store it in a variable (s) - this is the select * from country_address
part. Then we do a Select("row_to_json(row)") and chain it to the FromSelect() method.
Once this is done, we can scan it into either a raw JSON message or a custom Go struct.

var items []json.RawMessage
for rows.Next() {
    var rowToJson json.RawMessage
    if err := rows.Scan(&rowToJson); err != nil {
        return nil, err
    }
    items = append(items, rowToJson)
}
Using a custom Go struct is much more elaborate. First you have to figure out the correct struct
for this JSON payload:

type CustomGoStruct struct {
    Id      int    `json:"id"`
    Code    string `json:"code"`
    Name    string `json:"name"`
    Address []struct {
        Id        int    `json:"id"`
        Line1     string `json:"line_1"`
        Line2     string `json:"line_2"`
        Postcode  int    `json:"postcode"`
        City      string `json:"city"`
        State     string `json:"state"`
        CountryId int    `json:"country_id"`
    } `json:"address"`
}

And then make sure to implement the Scanner interface for this struct.
gorm
So far, gorm has been the hardest to get working. The code below looks
deceptively simple, but getting to this point was not.
It is true that we simply call the Preload() method to load a child relationship, but before we dive
further into the code, let us see how we declare the structs for our tables.

type Country struct {
    ID      int       `json:"id"`
    Code    string    `json:"code"`
    Name    string    `json:"name"`
    Address []Address `json:"address" gorm:"foreignkey:country_id"`
}
The struct declares three columns plus the relationship. To define the relationship, one cannot think of it
like an sql schema. Here, it says that a Country has many Address records, denoted by
the []Address slice. Thus, the Address field is a reference to the addresses table.
Next, take a look at the Address struct:

type Address struct {
    ID        int    `json:"ID,omitempty"`
    Line1     string `json:"line_1,omitempty" gorm:"Column:line_1"`
    Line2     string `json:"line_2,omitempty" gorm:"Column:line_2"`
    Postcode  int32  `json:"postcode,omitempty" gorm:"default:null"`
    City      string `json:"city,omitempty" gorm:"default:null"`
    State     string `json:"state,omitempty" gorm:"default:null"`
    CountryID int    `json:"countryID,omitempty"`
}
This almost looks alright, except that the CountryID field should be a foreign key to the
countries table; just by looking at the struct above, there is no indication that it is one.
By Gorm convention, the CountryID field maps to the country_id column in this addresses table.
Let us take a second look at the following snippet.
One must be careful with the type you put into the Find() method.
We normally give it a slice of country pointers ([]*Country); if you instead give it a single
Country value, it will return a single record!
Also, we made a mistake in the code below even though the Go compiler reports no error: the Find()
finisher is called too soon, so Select() is ignored. We are lucky because we were doing a SELECT *
anyway, but if we were doing a SELECT id, for example, it would be ignored.
var country Country

// country will be filled with the first record in the database.
err := r.db.WithContext(ctx).
    Preload("Address").
    Select("*").
    Find(&country).
    Error
There is no enforcement of the type that can be accepted inside Find(), which is not surprising
because it accepts an empty interface. Gorm uses a lot of reflection to get at the type.
You can also omit the Select("*") builder, and it will work the same! It is forgiving, but on the
other hand it loses explicitness when it comes to building a query.
sqlboiler
One-to-many in sqlboiler is more verbose than what you would expect an ORM to do. However,
everything is typed - no magic strings anywhere, which prevents typos.
It starts with models.Countries(). In its argument, we use the generated package to load the
relationship, denoted by models.CountryRels.Addresses. The All() 'finisher'
executes the query.
func (r *database) Countries(ctx context.Context) ([]*db.CountryResponseWithAddress, error) {
    countries, err := models.Countries(
        qm.Load(models.CountryRels.Addresses),
        qm.Limit(30),
    ).
        All(ctx, r.db)
    if err != nil {
        return nil, err
    }

    var all []*db.CountryResponseWithAddress

    for _, country := range countries {
        resp := &db.CountryResponseWithAddress{
            Id:        int(country.ID),
            Code:      country.Code,
            Name:      country.Name,
            Addresses: getAddress(country.R.Addresses),
        }
        all = append(all, resp)
    }

    return all, err
}
So why are we looping over countries in the code above?
The reason is that the generated models.Country struct's relationship (or edge) has - as its json
struct tag. This means that when we return the result to the client, the marshalled json would not
contain any addresses.
type Country struct {
    ID   int64  `boil:"id" json:"id" toml:"id" yaml:"id"`
    Code string `boil:"code" json:"code" toml:"code" yaml:"code"`
    Name string `boil:"name" json:"name" toml:"name" yaml:"name"`

    R *countryR `boil:"-" json:"-" toml:"-" yaml:"-"` // this edge won't be returned to the client
                                                      // because json is tagged with -
    L countryL  `boil:"-" json:"-" toml:"-" yaml:"-"`
}
For this reason, we need to create a new struct and copy the values over if we want to send the child
relationships to the client.
ent
In ent, we use the With... method to eager-load a relationship. Each model in a json response
will contain an edges key, whether there are any relationships or not.
To define the relationship, or 'edges' in ent's lingo, you write an Edges() method on the User
struct.
// Edges of the User.
func (User) Edges() []ent.Edge {
    return []ent.Edge{
        edge.From("addresses", Address.Type).Ref("users"),
    }
}
It is compulsory to set a link back to users from an address:

// Edges of the Address.
func (Address) Edges() []ent.Edge {
    return []ent.Edge{
        edge.To("users", User.Type),
    }
}
Once the schema is defined, all we need to do is run go generate ./..., and practically
one line is all that is needed to query all Country records with their Address relationships, as sketched below.
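A sketch of that one-liner (assuming the Country schema defines an addresses edge, so ent generates WithAddresses()):

countries, err := r.db.Country.
    Query().
    WithAddresses().
    All(ctx)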
In this post, we have seen how each library and ORM varies wildly in its approach. Sqlx
is quite normal, with ordinary scanning. Sqlc, on the other hand, removes all the boilerplate present in
sqlx. Squirrel's documentation is contained inside its source code. You cannot be sure of Gorm's
method-chaining order and its struct keys. Sqlboiler is similar to squirrel in its ease of query
building, but its addresses child records are not serialised by default.
Ent, however, shines over the other libraries and ORMs. If you have good sql expertise,
sqlc is pretty awesome. Otherwise, ent's eager-loading API is a breeze to use.
Comparison between popular Go libraries and ORMs for the database access layer.
In this post, we will look at how deleting a single record in a table works for these various
libraries and ORMs. In general, this operation is simple because we only care about deleting and
checking for an error.
gorm
We pass a &User{} struct, which indicates to gorm that it is deleting from the users table. Gorm
magically figures out the name of the table to use from
the name of the struct.
sqlboiler
In sqlboiler, deleting is the opposite of gorm when compared to doing an update operation. Here, sqlboiler
deletes a record through query building, which means
two queries are made. The Delete() method must be called on the models.User struct.
Comparison between popular Go libraries and ORMs for the database access layer.
In this post we will see a simple update of a resource by using its ID. We will try to perform an
UPDATE <TABLE> SET <COLUMN1> = <VALUE1>, ... where possible; if not, we simply retrieve the record first,
set the new values, and then save it.
We only show how to do a full PUT update instead of a PATCH, where a patch means we update
only certain fields, leaving the other columns untouched.
As good practice, we scan the user's request into a custom UserUpdateRequest struct. Then we hand
this struct over to our database access layer.
Our request struct looks like the following. We are making a PUT request instead of a PATCH.
That means we have to include the changed values as well as all other fields in our update request to
the database. If we only include one field, the other fields will be zeroed out.
So the client must do a GET, update what is necessary, and finally call the PUT endpoint.
The struct that receives the client request will look like the sketch below. We use json.NewDecoder(r.Body).Decode(&request)
to parse the PUT json payload into a Go struct. The json struct tags help determine which json
key maps to which Go field.
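A sketch of that request struct; the field set is assumed from the columns used throughout this series:

type UserUpdateRequest struct {
    FirstName       string `json:"first_name"`
    MiddleName      string `json:"middle_name"`
    LastName        string `json:"last_name"`
    Email           string `json:"email"`
    FavouriteColour string `json:"favourite_colour"`
}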
Although the client did a GET prior to this PUT request, we still need to do another Get()
because we need to retrieve the hashed password. If we straight away assigned new values, the password
column would become empty, since the client never had this information! Also, we cannot trust the client
to send complete data for this PUT request.
Finally, we call our own Get() method to return the updated record with the new updated_at
column value. Three separate sql queries means this is a good candidate for a transaction.
Named Query
Sqlx has a feature called named queries. To use it, we
need to modify our sql query with colon-prefixed parameters. The good thing is that this is still
a valid postgres query.
const UpdateNamed = "UPDATE users set first_name=:first_name, middle_name=:middle_name, last_name=:last_name, email=:email WHERE id=:id;"
Then we create a new struct with db struct tags to tell which field belongs to which column.

type updateNamed struct {
    ID         int64  `db:"id"`
    FirstName  string `db:"first_name"`
    MiddleName string `db:"middle_name"`
    LastName   string `db:"last_name"`
    Email      string `db:"email"`
}
Finally, we can call the NamedExecContext() method without worrying about the order.
Scanning is a cinch at the expense of writing more boilerplate beforehand.
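A sketch of that call with the struct above:

update := updateNamed{
    ID:         userID,
    FirstName:  req.FirstName,
    MiddleName: req.MiddleName,
    LastName:   req.LastName,
    Email:      req.Email,
}

_, err := r.db.NamedExecContext(ctx, UpdateNamed, update)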
sqlc
Unlike sqlx, we do not have to worry about the order, because we fill in the update through the
struct generated by sqlc generate. Like sqlx, we need to retrieve the record before
updating the values.
In many cases, we would want to perform a transaction over these three sql queries - something
we cover in the transaction section of this blog series.
gorm
Once we retrieve the record from the database, we begin to update the fields that we want.
The Save() method can be chained with Error to propagate the error up to our handler. It is easy
to forget this, because it is not compulsory and the Go compiler won't complain about it.
sqlboiler
Updating in sqlboiler, like in gorm, behaves like a common ORM: you have to find the record and then
update the desired fields.
Note that we omit setting the password, so that it will not be changed. Since middle_name is
an optional field, you have to use the null.String{} struct to fill in the new value. We ignore
the first return value with an underscore unless we want to know the number of rows affected.
Important: You need to find the record before updating. If you straightaway call Update() and not
setting a password, that password field will be deleted from the database!
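A sketch of the whole flow, assuming sqlboiler's generated models package and a *sql.DB in r.db; field names are illustrative.

u, err := models.FindUser(ctx, r.db, userID)
if err != nil {
	return nil, err
}
u.FirstName = req.FirstName
u.MiddleName = null.String{String: req.MiddleName, Valid: req.MiddleName != ""}
u.LastName = req.LastName
u.Email = req.Email
// Password is left untouched, so it survives the update. The first return
// value (rows affected) is ignored here.
_, err = u.Update(ctx, r.db, boil.Infer())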
In this post, we will look at how these libraries and ORMs deal with fetching a single record given
an ID. There should not be any drama, as we will only be doing a simple SELECT by ID.
For scanning to work, this struct needs a db struct tag annotated on each field at the moment.
middle_name is nullable, so we declare the type to be sql.NullString to allow the value to be
scanned as either a string or a null value.
type userDB struct {
	ID         uint           `db:"id"`
	FirstName  string         `db:"first_name"`
	MiddleName sql.NullString `db:"middle_name"`
	LastName   string         `db:"last_name"`
	Email      string         `db:"email"`
	Password   string         `db:"password"`
}
Note that usage of the db struct tag will cause reflection to be
used. There is a way to avoid it by matching the field names with database column names. For
example, a field named First_Name would match against the first_name database column. That naming
style is not a convention in Go though.
One last thing is the way we handle errors. It is possible that the client requests a user ID
that does not exist, or has been removed from the database. So we want to return an appropriate message
and the correct HTTP status. We can use the errors.Is() function to compare the err value against the
built-in error provided in the sql package.
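A sketch using sqlx's GetContext(), assuming the userDB struct above; the query text is illustrative.

var u userDB
err := r.db.GetContext(ctx, &u, "SELECT * FROM users WHERE id = $1;", userID)
if err != nil {
	if errors.Is(err, sql.ErrNoRows) {
		// no such user: return a 404 with an appropriate message
	}
	// anything else: return a 500
}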
sqlc
While inflexible, sqlc continues to impress with its API usage. The GetUser() method is generated for us
by annotating the sql query with the name that we want and how many records it returns.
-- name: GetUser :one
SELECT id, first_name, middle_name, last_name, email, favourite_colour
FROM users
WHERE id = $1;
To use it, we simply call the generated method. The rest is error handling and the DTO transform.
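A sketch of that call, assuming the generated Queries struct is embedded in r.db; the ID type depends on your schema.

u, err := r.db.GetUser(ctx, userID)
if err != nil {
	// handle sql.ErrNoRows and other errors as shown above
}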
squirrel
Like gorm, squirrel's Where() method accepts an interface{}, so the compiler cannot help you there.
The Select() and From() methods, on the other hand, are ok because their method signatures clearly
state what type they require.
We then finish off with a finisher, QueryRowContext(), which returns a RowScanner interface. That
means we have to manually scan and take care of the column order.
var u db.UserDB
err := rows.Scan(&u.ID, &u.FirstName, &u.MiddleName, &u.LastName, &u.Email, &u.Password, &u.FavouriteColour, &u.UpdatedAt)
if err != nil {
	if errors.Is(err, sql.ErrNoRows) {
		return &db.UserResponse{}, &db.Err{Msg: message.ErrRecordNotFound.Error(), Status: http.StatusNotFound}
	}
	log.Println(err)
	return &db.UserResponse{}, &db.Err{Msg: message.ErrInternalError.Error(), Status: http.StatusInternalServerError}
}
gorm
Gorm is relatively simpler, using a basic First() method to obtain a record by its id.
We must remember to chain Error after the First() method. This is easy to forget because it is not
compulsory.
Remember in the previous post that Find() can also accept a slice of users? The First() method
can also accept a slice. So be careful with what you are putting into it: the number of records
being returned is not determined by the method we use, but by the type given to Find() or First().
func (r *repo) Get(ctx context.Context, userID int64) (*User, error) {
	var user User
	err := r.db.WithContext(ctx).
		// First() can also accept a `var user []*User`, which can return more than one record!
		First(&user, userID).
		Error
	if err != nil {
		if errors.Is(err, gorm.ErrRecordNotFound) {
			return &User{}, &db.Err{Msg: message.ErrRecordNotFound.Error(), Status: http.StatusNotFound}
		}
		log.Println(err)
		return &User{}, &db.Err{Msg: message.ErrInternalError.Error(), Status: http.StatusInternalServerError}
	}
	return &user, nil
}
The second argument of First() accepts a variadic interface. So unless you refer to gorm's website
for documentation, you cannot infer what type it needs.
As our User model already has the - json struct tag on password, we do not have to worry about it
leaking to the client.
type User struct {
	ID         uint   `json:"id"`
	FirstName  string `json:"first_name"`
	MiddleName string `json:"middle_name"`
	LastName   string `json:"last_name"`
	Email      string `json:"email"`
	Password   string `json:"-"`
}
sqlboiler
You do not have to write much boilerplate to get a single item in sqlboiler. FindUser()'s argument
types are laid out specifically, and the name helps in inferring what you need to supply.
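A sketch, assuming the generated models package and an integer primary key:

u, err := models.FindUser(ctx, r.db, uint(userID))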
ent
Discounting error handling, it is also a one-liner for ent.
u, err := r.db.User.Query().Where(user.ID(uint(userID))).First(ctx)
if err != nil {
	if gen.IsNotFound(err) {
		return &gen.User{}, &db.Err{Msg: message.ErrRecordNotFound.Error(), Status: http.StatusNotFound}
	}
	log.Println(err)
	return &gen.User{}, &db.Err{Msg: message.ErrInternalError.Error(), Status: http.StatusInternalServerError}
}
return u, nil
No copying to a new struct is needed since the password field is already made sensitive.
Getting a specific resource is a lot simpler than creating and listing them. All of these libraries
have their own convenience method to scan a database result into a Go struct.
In sqlx, listing an array of json looks similar to how sqlx creates a record. We use the QueryxContext()
method, which allows us to use StructScan() to make scanning easier - no longer do you need to
type out all the field names.
const list = "SELECT * FROM users LIMIT 30 OFFSET 0;"func (r *database) List(ctx context.Context) (users []*UserResponse, err error) {
rows, err := r.db.QueryxContext(ctx, List)
if err !=nil {
returnnil, fmt.Errorf("error retrieving user records")
}
for rows.Next() {
var u userDB
err = rows.StructScan(&u)
if err !=nil {
returnnil, errors.New("db scanning error")
}
users = append(users, &UserResponse{
ID: u.ID,
FirstName: u.FirstName,
MiddleName: u.MiddleName.String,
LastName: u.LastName,
Email: u.Email,
UpdatedAt: u.UpdatedAt.String(),
})
}
return users, nil}
We collect the records as an array of UserResponse because we do not want to expose the
password to the client.
This is fine for returning users' details, but sometimes you want to play
around with that protected field and only convert to UserResponse at the handler layer, just before
returning the data to the client. In that case, instead of scanning into a UserResponse struct, you
should scan into a struct with a Password field.
text[] data type
Since we work with postgres, we should also explore one of its data types not present in mysql,
sql server, and oracle: the array. We basically want to select the tags of a specific user.
In the users schema, we have a text[] data type. It is interesting to see how we can select this
kind of data type.
To make it simple, we want to select its tags based on user ID:
SELECT tags FROM users WHERE users.id = $1
The result coming from the database for the tags column is in binary instead of text (the data
type is text[]). So in the code, data coming from the database is in bytes (byte is an alias for
uint8), but we are scanning the result into a slice of strings. We cannot simply scan using
&values because it is a mismatched data type - *[]string vs []uint8. What we do instead is
use pq.Array(), which chooses a suitable data type for scanning on our behalf.
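A sketch of that scan, using lib/pq's pq.Array() helper:

// pq.Array wraps the slice in a sql.Scanner that understands the
// postgres array wire format, so text[] can land in a []string.
var tags []string
err := r.db.QueryRowContext(ctx,
	"SELECT tags FROM users WHERE users.id = $1",
	userID,
).Scan(pq.Array(&tags))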
That’s it. Only 3 lines, or 1 line, however you may want to look at it!
Since the query omits selecting the password, that field cannot leak. So I cheated a bit. If you
need to play around with the password value, you will need to remember to transform to a
userResponse struct.
squirrel
Since we now use methods instead of writing raw SQL, it suddenly becomes easier to create a
dynamic query, as you will see in the dynamic list section.
Squirrel requires a finisher, which then returns sql rows and a possible error.
gorm
Listing user records looks deceptively simple. It looks like a normal query builder, but there are
many ways it can go wrong. Have a look at this:
At a glance, there isn't anything obviously wrong with the code above. In fact, if you run it,
you will get a result with no error. However, there are subtle mistakes or quirks compared to the
query builders we have seen so far. You need to:
Tell the model you are asking for a list with []*User.
But you can pass in a *User instead, and it will return 1 record.
The Find() method accepts an interface, so it will happily accept anything, whether a slice or a single item (or something totally different).
var user User
err = r.db.WithContext(ctx).Find(&user).Limit(30).Error // <- passing &user returns one record
You are doing an implied select operation, selecting all fields using *. If you want to select just a few fields:
Gorm does not provide a constant for each database field's name.
Instead of passing a struct, you pass a slice of strings. You need to refer to the documentation
because the method's signature does not help - it says it accepts a query interface and
a list of arguments: Select(query interface{}, args ...interface{}) (tx *DB)
You can also pass plain strings to Select() and it will give the same output, even though that doesn't appear to conform to the method signature and the IDE tells you it is wrong.
// this
err := r.db.WithContext(ctx).Select([]string{"id", "first_name", "last_name"}).Find(&users).Limit(30).Error
// also works - the first argument is a query and the rest are args as expected, yet it returns the same result
err = r.db.WithContext(ctx).Select("id", "first_name", "last_name").Find(&users).Limit(30).Error
If you want a WHERE clause, you can use the Where() method. But you can also put a
User struct as the second parameter to the Find() method!
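Two sketches of the same filter, based on gorm's documented API; the name "John" is illustrative, and Limit() is placed before the Find() finisher so it actually takes effect.

var users []*User
// Option 1: an explicit Where() with a struct condition.
err := r.db.WithContext(ctx).Where(&User{FirstName: "John"}).Limit(30).Find(&users).Error
// Option 2: pass the struct as an inline condition to Find() itself.
err = r.db.WithContext(ctx).Limit(30).Find(&users, &User{FirstName: "John"}).Error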
On the upside, you do not have to worry about leaking the password to the client because the User
model sets its json struct tag to -. Thus, no copying to a new struct is needed.
type User struct {
	ID         uint
	FirstName  string
	MiddleName string
	LastName   string
	Email      string
	Password   string `json:"-"`
}
WithContext() is optional.
In Go, we tend to use context for cancellation, deadlines, and timeouts. WithContext() is optional,
which means we can lose these features by forgetting to chain it.
‘Finisher’ methods are optional, and the order is important.
Finisher methods like Find(), First(), and Save() are optional when building a query.
err = r.db.
	WithContext(ctx).
	Find(&users). // <- called too early! The subsequent Limit() is ignored, so this returns all records in the database.
	Limit(30).
	Error
If you do not include them, Gorm will do nothing, and thus will not return an error or any result.
The order in which you call those finishers is also important. If you call one too early, Gorm will
ignore subsequent methods. For example, the examples I have shown so far have Limit() called
after Find(). That means Gorm ignores anything after the Find() finisher and returns all records
in the database! If you scroll back to point number 2, you may have noticed that the code
returned 32 records (everything I have in the database) instead of returning 10
records. Unlike other libraries, calling other methods after a finisher is invalid.
Using the API requires a bit of discovery and careful reading of its documentation, since the method
signatures do not help.
In my opinion, thanks to its many pitfalls, Gorm is certainly the hardest to use so far.
sqlboiler
In sqlboiler, listing a resource is done with the All() method. To limit the records, you have to use
the query modifier (qm) package.
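A sketch, assuming the generated models and qm packages:

// All() is the finisher; qm.Limit caps the result set.
users, err := models.Users(qm.Limit(30)).All(ctx, r.db)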
sqlboiler is a lot nicer than gorm for sure. You avoid magic strings, which makes it more type-safe
compared to gorm. It will complain if you do not provide a context. You cannot ignore the error
return unless you explicitly do so with an underscore, _.
But a fair bit of discovery is needed to know all the qm helpers - the {{model}}Where and
{{model}}Columns patterns. For example, in my opinion it would be more intuitive to use
models.Users().Limit(30).All(ctx, r.db) to limit, instead of using a query modifier.
The Password field is automatically generated, and you cannot set it to private. So you will need to
remember to perform a data transform object (DTO) conversion before returning to the client - as you
should always do.
ent
In ent, it is similar to sqlc, but we do not have to worry about copying the fields to a new struct
without a password, because that field is already protected (or in ent's lingo, sensitive) from json
marshalling.
Limiting the number of records is done with the Limit() builder method instead of another
package like sqlboiler's qm.
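A sketch, assuming an ent client in r.db:

// Limit() is part of the query builder itself; All() executes the query.
users, err := r.db.User.Query().Limit(30).All(ctx)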
The advantage with ent, like gorm, is that we can continue using that sensitive field while being
assured it will not be leaked to the client. However, the API is a lot more intuitive and looks more
like an ORM in other languages.
In conclusion, sqlc and ent were the easiest to use. sqlc is pretty much a one-liner once you have
written the correct sql query. ent gives the best API for query building. With both, we do not have to
worry about leaking the password field to the client. However, you cannot customise sqlc queries
at runtime.
While the examples here are very simple - list at most 30 user records - real world use cases
are usually more complex. We will see how to eager load a relationship, and in the dynamic list
section we will explore parsing query params for sorting and pagination.
Before going to those sections, we will finish off WHERE IN(?) and the rest of the CRUD stuff.
In this post, we compare and contrast how these libraries and ORMs handle a record insertion.
As a standard approach to all examples in this blog series, our controller accepts and parses the
client request into a custom ‘request’ struct and, if required, parses query parameter(s). We hash
the password, and together with the CreateUserRequest we try to insert into the database. The struct
is passed down to our data access layer.
type CreateUserRequest struct {
	ID         uint   `json:"id"`
	FirstName  string `json:"first_name"`
	MiddleName string `json:"middle_name"`
	LastName   string `json:"last_name"`
	Email      string `json:"email"`
	Password   string `json:"password"`
}
All CRUD operations are done in crud.go files of each of the libraries and ORMs directories.
You should perform validation in production. In our case, we skip this part to keep this blog series
focused.
In sqlx, we could call ExecContext() to perform an insertion; it requires a context, an SQL query, and
its arguments. In this example, however, we use QueryRowContext(), which not only performs the insertion
but also lets us obtain the newly inserted data thanks to postgres' RETURNING clause.
Care needs to be taken that the order of your arguments matches your sql query. Any order
mismatch will cause an error. That means if the column order has changed, you will have to check
your Go code manually, since neither the library nor the compiler warns you.
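A sketch of such an insert; the column list is assumed from the schema, and hash is the hashed password.

const insert = `INSERT INTO users (first_name, middle_name, last_name, email, password)
VALUES ($1, $2, $3, $4, $5)
RETURNING id;`

// RETURNING lets us scan the generated ID straight back.
var id uint
err := r.db.QueryRowContext(ctx, insert,
	req.FirstName, req.MiddleName, req.LastName, req.Email, hash,
).Scan(&id)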
It is a similar story when scanning database values into Go code. Scanning is tedious because not only
do you need to get the order right, you also need to work out which fields have to be scanned by
looking at the sql query. For example, middle_name is nullable, which means the value
can be either null or a string. For that reason, we need to scan the value into something like
sql.NullString.
type userDB struct {
	ID         uint           `db:"id"`
	FirstName  string         `db:"first_name"`
	MiddleName sql.NullString `db:"middle_name"`
	LastName   string         `db:"last_name"`
	Email      string         `db:"email"`
	Password   string         `db:"password"`
}
Also, the userDB struct is a new custom struct created
specifically for scanning the values into Go. We use a db struct tag for each field; sqlx uses
reflection on these tags to know which database column maps to which struct field. The value you
give to the db struct tag must match the column name you have in the database.
Note that we are returning the userDB type, which has a password field. We should not expose this
field to the client. We have already omitted scanning the password field into the userDB struct, but
a good discipline is to have one struct for db scanning and another for the client response. In this
example, we make a data transform object (DTO) from the db struct by copying it into a UserResponse
struct that does not have a password field.
sqlc
To use sqlc, we put all of our sql queries in an .sql file and tell sqlc to generate from that file.
You only need to give each query a name and state how many records it returns.
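The annotated query itself is not shown in this extract; a reconstruction consistent with the generated CreateUser code below could look like this:

-- name: CreateUser :one
INSERT INTO users (first_name, middle_name, last_name, email, password, favourite_colour)
VALUES ($1, $2, $3, $4, $5, $6)
RETURNING id, first_name, middle_name, last_name, email, password, favourite_colour;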
The name we want for our method is CreateUser, and it returns one record. Once sqlc.yaml
is properly configured, we run
$ sqlc generate
and the generated code is created in query.sql.go.
func (q *Queries) CreateUser(ctx context.Context, arg CreateUserParams) (User, error) {
row := q.db.QueryRowContext(ctx, createUser,
arg.FirstName,
arg.MiddleName,
arg.LastName,
arg.Email,
arg.Password,
arg.FavouriteColour,
)
var i User
err := row.Scan(
&i.ID,
&i.FirstName,
&i.MiddleName,
&i.LastName,
&i.Email,
&i.Password,
&i.FavouriteColour,
)
return i, err
}
sqlc automatically generates the Go code containing the arguments to QueryRowContext() as well
as the scanning. To use it, we call this CreateUser() method, filling in the parameters through
the provided CreateUserParams struct.
Notice how the order no longer matters, because we are filling in the parameters in a struct. But
with any schema change (column ordering, creation, etc.), you will have to re-run sqlc generate.
Note that CreateUserParams generated by sqlc requires the optional field middle_name to be
declared as a sql.NullString, and we have to mark it as valid! The Go documentation in the
database/sql package says that NullString implements the Scanner interface so
it can be used as a scan destination:
var s NullString
err := db.QueryRow("SELECT name FROM foo WHERE id=?", id).Scan(&s)
...
if s.Valid {
	// use s.String
} else {
	// NULL value
}
What I often do is use my IDE to automatically fill in all the required fields.
Overall, sqlc looks less error-prone than sqlx. You still need to copy the fields over to
UserResponse, like with sqlx, to prevent leakage of the password field.
squirrel
Writing squirrel feels like writing a hand-rolled query, but with Go methods. The methods reflect
their sql counterparts. However, you still need to write magic strings for column and table names.
Scanning order is also important, and there is no generated helper struct for inserting the parameters.
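A sketch of what such an insert could look like; ToSql() hands back the query string and ordered arguments for the standard database/sql calls.

// Build the insert; PlaceholderFormat(sq.Dollar) emits $1, $2, ... for postgres.
query, args, err := sq.Insert("users").
	Columns("first_name", "middle_name", "last_name", "email", "password").
	Values(req.FirstName, req.MiddleName, req.LastName, req.Email, hash).
	Suffix("RETURNING id").
	PlaceholderFormat(sq.Dollar).
	ToSql()
if err != nil {
	return nil, err
}
var id uint
err = r.db.QueryRowContext(ctx, query, args...).Scan(&id)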
gorm
Creating with gorm looks similar to sqlx. But chaining Error is not compulsory, and neither is
WithContext(), so it is easy to forget to call them.
In Gorm, we need to declare a struct for Gorm to scan into. No special struct tag is needed. Here,
the json struct tags do two things: 1. decide the json key for each field; and 2. stop the Password
field (annotated with the - struct tag) from being serialized and returned to the user.
This User struct
is also the same struct used for gorm's auto migration. Since Gorm tends to use a single model struct
for everything, it is important to keep this password field from leaking to the client. Of course,
a good practice is to have separate structs for the database and the client response so that we can
control what the client sees.
type User struct {
	ID         int    `json:"id"`
	FirstName  string `json:"first_name"`
	MiddleName string `json:"middle_name"`
	LastName   string `json:"last_name"`
	Email      string `json:"email"`
	Password   string `json:"-"`
}
Gorm has a feature where embedding a gorm.Model field automates the creation of the ID, created_at,
and updated_at columns.
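A sketch of the insert, assuming the User model above:

user := User{
	FirstName:  req.FirstName,
	MiddleName: req.MiddleName,
	LastName:   req.LastName,
	Email:      req.Email,
	Password:   hash,
}
// Create() inserts and back-fills the ID; chaining Error surfaces any failure.
err := r.db.WithContext(ctx).Create(&user).Error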
sqlboiler
Inserting a record also looks similar to sqlc and gorm. As with sqlx's userDB struct, you need to
use a user struct - here, the models.User generated by the library. Like sqlc, assigning
middle_name is awkward.
The third parameter of Insert() lets you choose whether sqlboiler infers which columns to insert,
or whether you set them manually:
Blacklist: infer the column list, but ensure these columns are not inserted
Greylist: infer the column list, and ensure these columns are inserted
Then, you use the Insert() method on the User struct to perform the insertion.
You do not have to worry about whether sqlboiler issues a RETURNING clause, because it will fill
the ID into the User struct for you.
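A sketch, assuming the generated models package:

u := models.User{
	FirstName: req.FirstName,
	// The nullable middle_name needs the null.String dance.
	MiddleName: null.String{String: req.MiddleName, Valid: req.MiddleName != ""},
	LastName:   req.LastName,
	Email:      req.Email,
	Password:   hash,
}
// boil.Infer() lets sqlboiler work out the column list.
err := u.Insert(ctx, r.db, boil.Infer())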
Since the Password field is not annotated with the - struct tag, you will need to copy the fields
over to a UserResponse struct, like sqlx above.
ent
Instead of using a struct to set the values, ent uses a builder pattern. Whether middle_name gets an
empty string or a null depends on which method you choose.
func (r *database) Create(ctx context.Context, request db.CreateUserRequest, hash string) (*gen.User, error) {
	saved, err := r.db.User.Create().
		SetFirstName(request.FirstName).
		SetNillableMiddleName(&request.MiddleName). // Does not insert anything to this column
		//SetMiddleName(request.MiddleName).        // Inserts an empty string
		SetLastName(request.LastName).
		SetEmail(request.Email).
		SetPassword(hash).
		Save(ctx)
	if err != nil {
		return nil, fmt.Errorf("error saving user")
	}
	return saved, nil
}
Ent uses a code-first approach (defining models and relationships in code, as opposed to using the
database as the single source of truth), so before using ent we need to define the fields and
relationships (edges) with methods.
Notice that the password field is made sensitive, which means the json output will not contain the
password. You no longer have to worry about forgetting to copy the values into a new
struct just to hide the password value.
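A sketch of the schema definition, assuming ent's field package; the field list is illustrative, and the point is the Sensitive() call on password.

// in ent/schema/user.go
func (User) Fields() []ent.Field {
	return []ent.Field{
		field.String("first_name"),
		field.String("middle_name").Optional(), // nullable column
		field.String("last_name"),
		field.String("email").Unique(),
		field.String("password").Sensitive(), // excluded from json marshalling
	}
}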
Like all ent output, there is an additional field called edges added to the client response. There
isn't a way to turn off the edges key in ent's output, so you will need to transform to a new
response struct before returning to the client.
If table column order keeps changing, then both sqlx and squirrel will be a nightmare, because the
scanning order needs to keep track of your database schema. The others do not suffer from this issue.
All of them return an updated model with its ID after insertion, which is great when we
want to return that record back to the client.
Setting a value on a nullable column is awkward in both sqlc and sqlboiler (checking
request.MiddleName != "" and marking the value as valid), while sqlx, squirrel, and gorm handle this
nil value transparently.
Ent gives the option of setting the middle_name column to null (SetNillableMiddleName(&request.MiddleName)) or
to an empty string (SetMiddleName(request.MiddleName)). There is a bit of a learning curve, especially in
knowing how to define the relationships between tables.
There are a multitude of ways to interact with a SQL database in Go. The most obvious path is to
simply use the database/sql package from the standard library plus a database driver. It is easy to use
and can be sufficient to meet all of your requirements. Using raw SQL directly means you can
leverage what you already know, SQL, and craft complex queries in ways that an ORM may not
support. However, retrieving (scanning, in Go lingo) results from the database is verbose and can be
tedious - something at which the alternatives shine. This series of posts will show how different popular
libraries and ORMs available for Go are used. They are sqlx, sqlc,
squirrel, gorm,
sqlboiler, and ent.
This post is only an introduction to these packages. I will briefly touch on what each of them is
and the database operations that we are going to test. In the next posts, we will cover
common use cases for each. Full source code is available at https://github.com/gmhafiz/golang-database-library-orm-example.
sqlx
This is the most popular library for purists and often the most recommended one. It requires you to write your own SQL queries,
yet it has some convenience methods to make scanning results into a Go struct easier.
sqlc
Like sqlx, you also write your own SQL, but sqlc generates a lot of the Go boilerplate that you
would otherwise have to write with sqlx. You only need to install sqlc once and set up a
config file. Then each of your queries needs to be annotated with the desired method name and
the expected number of records - one or many. Every time you run sqlc generate, it creates
Go methods from your sql queries that both query (or execute) and scan.
squirrel
update (12 July 2022)
Squirrel is an sql query builder. Unlike both sqlx and sqlc, you do not need to manually write
sql queries. Instead, squirrel provides helper functions like Select(),
Where(), and From(), as well as sq.Eq{} for equality.
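For a taste, a sketch of the builder style; column and table names follow this series' schema.

// Each method mirrors its sql counterpart; RunWith attaches the database
// handle so the finisher can execute the query directly.
rows, err := sq.Select("id", "first_name", "last_name").
	From("users").
	Where(sq.Eq{"favourite_colour": "blue"}).
	PlaceholderFormat(sq.Dollar).
	RunWith(r.db).
	QueryContext(ctx)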
gorm
Gorm is among the earliest and most popular ORM libraries that is still actively maintained today.
Instead of writing SQL queries, you write a Go struct with the correct struct tags for each of your
tables, and gorm will handle creating the tables in the database. Further tuning is done using
struct tags. Among the many ORMs, this is an example of a code-first library. Query building is fairly
decent - although a lot of (type) guessing is needed because of the interface{} usage everywhere, and
the order of methods is important. Scanning database results, however, is easy thanks to
reflection.
sqlboiler
Sqlboiler is the opposite of gorm in its approach: it is database-first. You provide
a config file (sqlboiler.toml) with your database credentials, and it will learn the schema and
generate a custom-tailored ORM specific to your database, including all structs for all CRUD
operations. As a result, you can rely on generated methods and constants, which means you hardly
need any magic strings.
Like sqlc, you must remember to re-generate to update this ORM. It is a
good idea to put this command in your build pipeline.
ent
Ent is the newest entry in the list; it used to be part of the facebook community repository, but now
lives in its own. It has the most features and is the easiest ORM to use.
It uses a code-first approach. You write Go methods to define a table's fields and its relationships,
which involves some learning curve. Like both sqlc and sqlboiler, you must remember to re-generate
the ORM.
You get a more statically typed ORM compared to sqlboiler. Handling one-to-many or
many-to-many relations is easy. Query building, however, can be a lot more verbose.
Spectrum
With the many libraries and ORMs available for Go, it can be hard to visualise where these
tools stand on a spectrum ranging from pure sql (hand-rolled) queries to a traditional ORM.
Above is an illustration comparing the mentioned libraries on a spectrum from close-to-the-metal,
raw SQL queries, to full ORM capabilities.
Both sqlx and sqlc are very close to the standard database/sql library, with sqlc inches ahead.
Anything you give to these libraries is sent over the wire almost without any changes.
I put squirrel in the middle as the most builder-ish library of them all.
Ent has to be the most ORM-ish of all, so it is placed furthest to the right. Sqlboiler is not
too far off, because it still requires some raw SQL in some cases.
That leaves gorm, a weak-ish ORM (full of interface{} everywhere!) but with excellent query
builder capabilities.
Operations
This blog series will demonstrate several operations that I think are most applicable to a common
Go CRUD api. The operations that we will compare and contrast are:
Simple CRUD operations
We test two things:
How easy or hard it is to create, read, update, and delete records in the users table.
Given an array of IDs, how each deals with WHERE IN(?, ...)
1-to-Many queries
We list all addresses of a particular country.
Many-to-many queries
Since a user can have many addresses, and an address can hold many users, we look at how we can
list them all.
Dynamic list filter from query parameter
Pagination is a common use case. We will see how these libraries and ORMs deal with limiting the
number of records being returned.
Say the client wants to sort our users list by last name in ascending order, expressed as a url
query parameter.
We may also want to return a list filtered on specified properties. For example, we want to return
users whose last name is ‘Campbell’. Going further, we may want to combine several properties, say
last name is ‘Campbell’ and favourite colour is ‘blue’, all within a single URL.
Nowadays, safety against sql injection should not be an issue for any developer. Correct use of
placeholders and prepared statements protects against this security vulnerability. Sanitisation of
user input also helps prevent it. Still, we will look at how these tools behave when we
purposely give them malicious user input.
Schema
We have a simple schema that still allows us to query one-to-many and many-to-many relationships.
A country has many addresses.
A user can have many addresses, and an address can hold many users. These tables are joined by
the pivot user_addresses table.
To make things interesting, we have nullable columns, denoted with ? in the erd above, so we can
see how serialization to json works.
The enum is an interesting case, especially for the libraries and ORMs that generate code: will
they create a type for that enum or not? If so, how do they handle it, given that Go does not have
strong enum support?
We will also see how we prevent the password field from being serialized and sent to the client.
That is something we do not want to leak. Generally, we want to apply a data transform so that
we can control what the client receives. This technique also prevents accidental leaking
of data when we add new columns to a table. Nevertheless, we will see if these libraries and ORMs
have any mechanism to prevent such sensitive data leaks.
Files
In the repository, each library/orm is placed inside the /db folder. Each folder under /db contains
a file for each section in this blog series, such as crud.go for the CRUD stuff. Each folder also
contains helper files, such as database.go for library initialisation and handler.go for the
controller/handler that receives a request, transforms it into an internal request struct, calls
the data access layer, and finally transforms the result into a JSON response.
Outside of this folder, we have several files shared by all. A model.go contains any database structs.
Any query param or JSON payload is parsed by filter.go.
A quick note on the handler layer - a real-world API must validate client requests, but validation
is omitted here for simplicity.
First up, we see how common CRUD operations are done, starting with Create: