Golang Database Library Orm Example - Many to Many

Comparison between popular go libraries and ORM for database access layer.
January 2, 2022

In this post, we will look at listing users with all their addresses. Both users and addresses tables are joined with a pivot table, user_addresses, which holds the foreign key of each of users and address table primary key.

Database

As the erd suggests, a user can have many addresses, and an address can belong to many users.

Table of Contents

This is part of a series of blog posts including:

The runnable code is available in the repository https://github.com/gmhafiz/golang-database-library-orm-example

There are many ways to do this. The most straight away method (what most ORM does) is to perform a query on each of the related tables.

  1. Get 30 records of users
  2. For each user, get the IDs of the address from the pivot table
  3. Get all records of addresses from the IDs retrieved in the previous step.
  4. Begin attaching address to users.

ORMs often do this automatically for you. Writing sql manually means you have to do all the steps above.

The other way is to perform left joins on the tables, starting from users, and ends with addresses. Left join is important because if we perform an inner or right join instead, if a user do not have an address, that user record will not be returned. While doing left joins, alias, especially on IDs must be set to prevent name conflict. In my opinion, this method is a poor fit when doing a many-to-many operation. The following method is better.

Finally, it is possible to use array_agg (for postgres) function or group_concat for mysql/mariadb. You write the least amount of Go code but requires you to have a sufficient amount of SQL knowledge to craft the correct query.

sqlx

We will take a look at two methods, method 1 which by which is by doing a query on each table, and the third method, which is relying on a single query. I will omit the second method because it is much more complicated than method 1.

Method 1: The Long Way

If you look at the full source code, it is 2 pages long. So, I will only include relevant snippets here.

Starting with sql queries, we try to limit number of users to 30.

-- #1
SELECT u.id, u.first_name, u.middle_name, u.last_name, u.email 
FROM "users" u 
LIMIT 30;

To find out the address of each user, we get that relationship from the pivot table. For an array of user IDs, we can always do a SELECT DISTINCT u.id FROM users u LIMIT 30;. But this is an additional query while we can extract the IDs from query #1.

-- #2
SELECT DISTINCT ua.user_id AS user_id, ua.address_id AS address_id 
FROM "addresses" a 
    LEFT JOIN "user_addresses" ua ON a.id = ua.address_id 
WHERE ua.user_id IN (?);

We then get all address IDs, pass into the following query, and execute.

-- #3
SELECT a.* 
FROM addresses a
WHERE a.id IN (?);

Notice that the two last queries uses ? instead of $1. It relates to sqlx’s ability (or inability) to properly turn array values for IN operator. So what w can do is to use the In() function to expand into a slice of arguments.

query, args, err := sqlx.In(UsersAddress, userIDs)

sqlx.In accepts an array of interface but executing QueryContext() expects int8.

Only using ? placeholder sqlx won’t complain about types.

Once we perform the first sql query, we can the result into a custom struct ([]*UserResponseWithAddressesSqlx), one with address field.

type UserResponseWithAddressesSqlx struct {
	ID         uint                `json:"id,omitempty"`
	FirstName  string              `json:"first_name"`
	MiddleName string              `json:"middle_name,omitempty"`
	LastName   string              `json:"last_name"`
	Email      string              `json:"email"`
	Address    []AddressForCountry `json:"address"`
}

var all []*UserResponseWithAddressesSqlx
for users.Next() {
    var u userDB
    if err := users.Scan(&u.ID, &u.FirstName, &u.MiddleName, &u.LastName, &u.Email); err != nil {
        return nil, fmt.Errorf("db scanning error")
    }
    all = append(all, &UserResponseWithAddressesSqlx{
        ID:         u.ID,
        FirstName:  u.FirstName,
        MiddleName: u.MiddleName.String,
        LastName:   u.LastName,
        Email:      u.Email,
    })
}

We retrieve all user IDs and get its associated addresses with the second sql query. We need to scan and hold the data with another struct so that wen use it to find the associations later.

type userAddress struct {
	UserID    int `db:"user_id"`
	AddressID int `db:"address_id"`
}

Then get all address IDs and execute the third sql query. Scan the results into a new address struct - it needs the db struct tag.

The final step is to loop through all array of structs and attach their associations

for _, u := range uas {
    for _, user := range all {
        if u.UserID == int(user.ID) {
            for _, addr := range allAddresses {
                if addr.ID == uint(u.AddressID) {
                    user.Address = append(user.Address, AddressForCountry{
                        ID:       addr.ID,
                        Line1:    addr.Line1,
                        Line2:    addr.Line2.String,
                        Postcode: addr.Postcode.Int32,
                        City:     addr.City.String,
                        State:    addr.State.String,
                    })
                }
            }
        }
    }
}

Note that if a user record do not have any address, the value for address key will be null, instead of an empty array.

{
    "id": 3,
    "first_name": "Jake",
    "last_name": "Doe",
    "email": "jake@example.com",
    "address": null
}

So you need to remember to initialize the addresses for a user

all = append(all, &UserResponseWithAddressesSqlx{
        ID:         u.ID,
        FirstName:  u.FirstName,
        MiddleName: u.MiddleName.String,
        LastName:   u.LastName,
        Email:      u.Email,
        Address:    []*AddressForCountry{}, // instead of leaving it empty
    })

In my opinion, this is too much code for just a simple two tables plus a pivot table to deal with. Imagine having to eager-load 7 tables deep. That is a nightmare.

Method 3: The SQL way

So let us take a look at using the third method, which is using both array_agg and row_to_json and see if it is anything simpler. If you take a look at the SQL query below, it looks similar to the query we have done for 1-many, but we are now joining three tables:

SELECT u.id,
       u.first_name,
       u.middle_name,
       u.last_name,
       u.email,
       u.favourite_colour,
       array_to_json(array_agg(row_to_json(a.*))) AS addresses
FROM addresses a
         JOIN user_addresses ua ON ua.address_id = a.id
         JOIN users u on u.id = ua.user_id
GROUP BY u.id;

The relevant bits are that the JOIN operation starts from the ’lower’ aggregated records (addresses table) and we work our way up to the users table.

With array_to_json(array_agg(row_to_json(a.*))), we turn each row in addresses as a json array. After performing the left joins, we select user columns we want and finally group them by user ids. The result is we only get users that have at least one address.

[
  {
    "id": 1,
    "first_name": "John PATCHED",
    "middle_name": "middle",
    "last_name": "Does",
    "email": "jogn-doe@example.com",
    "favourite_colour": "green",
    "addresses": [
      {
        "id": 1,
        "line_1": "Sydney Opera House",
        "line_2": "Bennelong Point",
        "postcode": 2000,
        "city": "Sydney",
        "state": "NSW",
        "country_id": 1
      }
    ]
  },
  {
    "id": 2,
    "first_name": "Jane",
    "middle_name": null,
    "last_name": "Doe",
    "email": "jane@example.com",
    "favourite_colour": "green",
    "addresses": [
      {
        "id": 2,
        "line_1": "Petronas Twin Towers",
        "line_2": "",
        "postcode": 50088,
        "city": "Kuala Lumpur",
        "state": "Wilayah Persekutuan",
        "country_id": 2
      },
      {
        "id": 1,
        "line_1": "Sydney Opera House",
        "line_2": "Bennelong Point",
        "postcode": 2000,
        "city": "Sydney",
        "state": "NSW",
        "country_id": 1
      }
    ]
  }
]

Notice that this SQL query omits users that do not have any addresses. To include them, simply change the JOIN above to RIGHT JOIN.

One caveat, just like we showed in one-to-many example is that we need to parse the json array from database back to a Go struct if we want to play around with it. If you don’t we can simply scan into json.RawMessage.

sqlc

We already know the exact SQL query we need to perform this M2M query so, it is similar to method 3 of sqlx, but with less boilerplate. We do not need to manually scan database results to a struct because they are all generated for you, but you still need to extract IDs from each. For method 1, it only takes a page long to perform many-to-many compared to almost 2 pages long for sqlx.

You still need to manually attach address to a user like sqlx. Also, you need to remember to initialize addresses for a user like above instead of leaving it empty.

Upon sqlc compilation, it generates a method that you can simply use:

dbResponse, err := r.db.ListM2MOneQuery(ctx)

However, since it outputs a struct consisting of sql.Nullstring for middle_name and an enum for favourite_colour, we want to convert them into another struct

resp := make([]*db.UserResponseWithAddressesSqlxSingleQuery, 0)

	for _, dbRow := range dbResponse {
		row := &db.UserResponseWithAddressesSqlxSingleQuery{
			ID:              uint(dbRow.ID),
			FirstName:       dbRow.FirstName,
			MiddleName:      dbRow.MiddleName.String,
			LastName:        dbRow.LastName,
			Email:           dbRow.Email,
			FavouriteColour: string(dbRow.FavouriteColour),
			Address:         dbRow.Addresses,
		}
		resp = append(resp, row)
	}

squirrel

Like sqlx and sqlc, method 3 is going to have far fewer boilerplate than method 1. The long method 3 is uninteresting because we have already shown how to do it with sqlx. Let us see how this query builder fare.

rows, err := r.db.
	Select(
		"users.id",
		"users.first_name",
		"users.middle_name",
		"users.last_name",
		"users.email",
		"users.favourite_colour",
		"array_to_json(array_agg(row_to_json(a.*))) AS addresses",
	).
	From("addresses AS a").
	InnerJoin("user_addresses ON user_addresses.address_id = a.id").
	InnerJoin("users ON users.id = user_addresses.user_id").
	GroupBy("users.id").
	QueryContext(ctx)

One look, and you can see that this is almost a one-to-one translation from the SQL query we wanted into Go code. There are a few differences to the approach we made in the 1-to-many section.

First of all, in the 1-to-many section, the whole database response is in JSON format. Here, only one column is in JSON format, namely the last addresses column.

At first, I wanted to make array_to_json(array_agg(row_to_json(a.*))) AS addresses statement into its own select using s := r.db.Select("array_to_json(array_agg(row_to_json(a.*)))") and then chain with FromSelect(s, "addresses") but that did not work.

The inner join argument here is interesting. We pass one string only, unlike other ORM in other languages where they can be separated with commas. Here it can take optional arguments. That means we can do something like this…

InnerJoin("user_addresses ON user_addresses.address_id = a.id AND a.id = ?", 7).

… and it will only return records on the user_addresses join where address ID is equal to 7.

The trickiest part is the scanning, like in one-to-many section, we have to construct a Go struct specifically for this query. The first six columns for users table are clear. addresses on the other hand is actually a JSON response - so we need to implement the Scanner interface.

type CustomM2mStruct struct {
	Id              int             `json:"id" db:"id"`
	FirstName       string          `json:"first_name" db:"first_name"`
	MiddleName      any             `json:"middle_name" db:"middle_name"`
	LastName        string          `json:"last_name" db:"last_name"`
	Email           string          `json:"email" db:"email"`
	FavouriteColour string          `json:"favourite_colour" db:"favourite_colour"`
	Addresses       json.RawMessage `json:"addresses" db:"addresses"`
}

func (m *CustomM2mStruct) Scan(src interface{}) error {
	val := src.([]uint8)
	return json.Unmarshal(val, &m)
}

Scanning is just looping and appending.

var items []*CustomM2mStruct
for rows.Next() {
	var rowToJson CustomM2mStruct
	if err := rows.Scan(
		&rowToJson.Id,
		&rowToJson.FirstName,
		&rowToJson.MiddleName,
		&rowToJson.LastName,
		&rowToJson.Email,
		&rowToJson.FavouriteColour,
		&rowToJson.Addresses,
	); err != nil {
		return nil, err
	}
	items = append(items, &rowToJson)
}

gorm

Performing many-to-many here is very similar to its one-to-many. We supply a slice to User pointer and gorm will use reflection to infer what we want.

func (r *repo) ListM2M(ctx context.Context) ([]*User, error) {
	var users []*User

    err := r.db.WithContext(ctx).
        Preload("Addresses").
        Find(&users).
        Select("*").
        Limit(30).
        Error
	if err != nil {
		return nil, fmt.Errorf("error loading countries: %w", err)
	}

	return users, nil
}

Unlike sqlx and sqlc, a user without address will be marshalled to an empty array

  ...
  {
    "id": 3,
    "first_name": "Jake",
    "middle_name": "",
    "last_name": "Doe",
    "email": "jake@example.com",
    "address": []
  }
]

The amount of code needed to be written is also greatly reduced compared to sqlx, sqlc, and squirrel so far. The key is to use Preload() method. What is not obvious is what magic string you need to supply to it. The answer is the name of the field itself - it has to match. For example, User struct has Addresses in its field, so we supply the string Addresses inside Preload().

type User struct {
    ...
	Addresses []Address `json:"address" gorm:"many2many:user_addresses;"`
}

We also need to tell Gorm the pivot table that is used to link between users and addresses table. It is done by using a struct tag gorm:"many2many:user_addresses;".

Gorm has no code generation like sqlboiler and ent. Instead, it relies on structs (and struct tags) as its point of reference.

sqlboiler

Unfortunately I cannot find an easy way to eager load many-to-many relationships with sqlboiler. Current recommendation is to fall back to raw sql queries.

ent

Like its one-to-many operation, ent shines at loading relationships.

func (r *database) ListM2M(ctx context.Context) ([]*gen.User, error) {
	return r.db.User.Query().
		Limit(30).
		WithAddresses().
		All(ctx)
}

We use the same pattern as what we did with one-to-many - {{ With{{Model}} }} pattern. If a user has no address, the edges key will only contain an empty object.

[
  {
    "id": 3,
    "first_name": "Jake",
    "last_name": "Doe",
    "email": "jake@example.com",
    "edges": {}
  },
  {
    "id": 1,
    "first_name": "John",
    "last_name": "Doe",
    "email": "john@example.com",
    "edges": {
      "addresses": [
        {
          "id": 1,
          ...

Ent makes 3 separate sql queries, just like what we have done for sqlx, and gorm.

In my opinion, the API is very straightforward and there isn’t a way to get it wrong. Everything is statically typed with no interface{} anywhere which makes it predictable. Only downside I can think of is you need to spend time getting the relationship in the model schemas correctly.

In conclusion, using raw sql is too much work when it comes to loading many-to-many relationships. Imagine if we want to load 7 tables deep, it’d be a nightmare. While using sqlc improves upon sqlx by reducing a good deal of boilerplate, it still pales in comparison to using an ORM. You need to be careful with struct tag with Gorm, and it was not obvious that you had to use a field name for Preload(). Ent does not have Gorm’s issues as everything is typed. You will need to learn to using With... pattern though which isn’t a big deal.

Next > Golang Database Library ORM Example - Dynamic List