In the realm of modern web applications, database performance is not just a technical requirement but a cornerstone of a seamless user experience. Among the arsenal of tools available for optimizing database queries, materialized views stand out for their efficiency and effectiveness. Let’s delve into how materialized views can enhance query performance in a Rails application, illustrated with a practical example.
Understanding Materialized Views
Materialized views are special database objects that cache the result of a query physically. Unlike standard views, which re-execute the query upon each access, materialized views store the result set, thus speeding up data retrieval significantly.
Integrating Materialized Views in Rails
Incorporating materialized views into a Rails project can be done directly using SQL in a migration or through gems like scenic
for easier management. Here’s how to create a materialized view with a Rails migration:
class CreateMaterializedView < ActiveRecord::Migration[6.0]
def up
execute <<-SQL
CREATE MATERIALIZED VIEW my_materialized_view AS
SELECT * FROM my_table;
SQL
end
def down
execute "DROP MATERIALIZED VIEW IF EXISTS my_materialized_view;"
end
end
The Problem
Imagine a Rails application with a feature that reports aggregated data such as total sales, order counts, and average order values for each shop. Computing these aggregates on the fly can become increasingly demanding as data scales. For illustration, consider a test scenario with 100 shops and over 2 million orders.
To set up a similar environment, follow these steps:
- Create a new rails project with:
rails new shop_reporting_app --database=postgresql
- Create a shop model:
rails g model Shop name
- Create an order model:
rails g model Order shop:references total_sales:decimal order_value:decimal
- Run the migrations:
rails db:migrate
- Add the
faker
gem to the gemfile:
gem 'faker' # group test/development
Place this content on the seeds file, so we can populate the database with a lot of values:
return if Rails.env.production?
if Shop.count == 0
puts "populating shops..."
100.times do |i|
Shop.create(name: Faker::Company.name)
puts "shop #{i} created"
end
puts "shops populated"
end
if Order.count == 0
puts "populating orders..."
shops = Shop.all
50_000.times do |i| # change this 50_000 according to the amount of data you want for testing
shop = shops.sample
100.times do |j|
shop.orders.create(
total_sales: Faker::Number.decimal(l_digits: 3, r_digits: 2),
order_value: Faker::Number.decimal(l_digits: 3, r_digits: 2)
)
puts "order #{i} - #{j} created"
end
end
puts "orders populated"
end
- Run the seeds:
rails db:seed
. This may take some time, depending on how much data you want for test.
After this steps you should have a project that you can use to test.
Implementing the Solution
To address this issue, we can use materialized views to precompute and store the aggregated data. We can then query the materialized view instead of performing expensive calculations on the fly.
Notice that, there is a downside here, which would be having to refresh this data constantly if you want it to be strictly equal to the real time data. You can do that refreshing the view every time a new variable that affects calculations enters in. Or, the scenario that I think would be more applicable here, would be if we didn't need the data to be exactly what we have now. For example, if we can have approximate values, or if we can have a "last updated in 4 hours" label to the report.
Here's how you can create a materialized view for this scenario:
- generate a new migration:
rails g migration create_shop_reporting_materialized_view
- Edit the migration with the following:
class CreateShopReportingMaterializedView < ActiveRecord::Migration[6.0]
def up
execute <<-SQL
CREATE MATERIALIZED VIEW shop_reporting_data AS
SELECT shop_id, SUM(total_sales) AS total_sales, COUNT(*) AS total_orders,
AVG(order_value) AS avg_order_value
FROM orders
GROUP BY shop_id;
SQL
end
def down
execute "DROP MATERIALIZED VIEW IF EXISTS shop_reporting_data;"
end
end
This will run a query to create a materialized view with the reporting query that gets the information that we need.
- Run the migrations:
rails db:migrate
Now we can query directly from this materialized view, performing this SQL:
SELECT * FROM shop_reporting_data;
To refresh this materialized view, you can run:
REFRESH MATERIALIZED VIEW shop_reporting_data;
You can choose where to call this refresh in your application. I would suggest to do that in a worker that runs every X minutes / hours, if you can have this kind of information a little bit off the realtime one. This will avoid a lot of calculations on your workers to refresh the view periodically, and still will give you an almost up to date data.
Query Performance Comparison
Let's compare the performance of querying the orders table directly versus querying the materialized view. We'll measure the time taken to fetch the total sales, total orders, and average order value for a specific shop:
First query:
SELECT shop_id, SUM(total_sales) AS total_sales, COUNT(*) AS total_orders, AVG(order_value) AS avg_order_value
FROM orders
GROUP BY shop_id;
For the first query, without the materialized view, we got an average time of 700ms, with the highest being 916ms (depending on computer usage at the moment)
Second query:
SELECT * FROM shop_reporting_data;
For the query from the materialized shop_reportin_data view, we have now reached an average of 100ms, with a lowest total time of 51ms!
This can be even more beneficial in a production environment, where we would have more data and concurrent requests.
Conclusion:
Materialized views are a potent solution for enhancing query performance in Rails applications, especially when dealing with extensive and complex data sets. By storing precomputed data, they minimize the processing load and accelerate data retrieval, thus significantly improving the user experience.
Incorporating materialized views into your Rails projects can lead to substantial performance gains and a smoother, more responsive application.