Introduction
On a production environment I watched a query I had just attempted to optimize take longer and longer as the days went by. I would usually look at WHERE clauses to tell me which indexes needed creating or which queries needed restructuring, but I often overlooked ORDER BY clauses. It turns out both need equal attention.
Reproduction
I will be using Laravel for this because barryvdh/laravel-debugbar provides easy EXPLAIN output, and FakerPHP lets us seed the database with lots of records.
Create a Laravel Project
```shell
$ composer create-project laravel/laravel order-by-gotcha
```

Update `.env` and set `DB_DATABASE`, `DB_USERNAME` and `DB_PASSWORD`. I will be using `mysql`. You might need to create a database.

Create a `Client` model together with a corresponding migration and controller:

```shell
$ php artisan make:model Client -mc
```

Update the migration. Only the `email` column is indexed, for future demonstration purposes:

```php
// ...
$table->string('name');
$table->string('email')->index();
$table->string('phone');
$table->timestamps();
// ...
```

Update the `Client` model to make the columns fillable:

```php
// ...
protected $fillable = [
    'name',
    'email',
    'phone',
];
// ...
```

Update the `DatabaseSeeder` to seed 1,000,000 clients. This might take a bit, 6 min on my end (SSD, 8 cores):

```php
public function run(): void
{
    $records = [];
    $start_time = microtime(true);
    $iter_start_time = microtime(true);

    for ($i = 0; $i < 1_000_000; $i++) {
        // For logging, just to see stuff works
        if ($i > 0 && (($i % 10_000) == 0)) {
            $end_time = microtime(true);
            $cumulative = $end_time - $start_time;
            $iter = $end_time - $iter_start_time;
            $this->command->info("$i - Cumulative: $cumulative, Iter: $iter");
            $iter_start_time = microtime(true);
        }

        $records[] = [
            'name' => fake()->name(),
            'email' => fake()->email(),
            'phone' => fake()->phoneNumber(),
            'created_at' => fake()->dateTimeBetween('-10 Years', 'now'),
            'updated_at' => fake()->dateTimeBetween('-10 Years', 'now'),
        ];

        // Bulk insert every 1,000 records
        // (note: the final partial batch is never flushed)
        if ($i > 0 && (($i % 1_000) == 0)) {
            DB::table('clients')->insert($records);
            $records = [];
        }
    }
}
```

Now on to the `ClientController` class, with the following methods:

```php
public function index()
{
    $client = Client::query()
        ->latest()
        ->first();

    return view('welcome', compact('client'));
}

public function index_with_where_clause()
{
    $filter = 'ca';

    $client = Client::query()
        ->where('email', 'like', "%{$filter}%")
        ->latest()
        ->first();

    return view('welcome', compact('client'));
}

public function index_order_by_email()
{
    $client = Client::query()
        ->latest('email')
        ->first();

    return view('welcome', compact('client'));
}

public function index_order_by_id()
{
    $client = Client::query()
        ->latest('id')
        ->first();

    return view('welcome', compact('client'));
}
```

Clean up `welcome.blade.php` to just show the client name:

```html
<!DOCTYPE html>
<html>
    <head>
        <title>Laravel</title>
    </head>
    <body>
        {{ $client->name }}
    </body>
</html>
```

Update `routes/web.php` to add the routes (remember to import the controller at the top of the file):

```php
use App\Http\Controllers\ClientController;

Route::get('/', [ClientController::class, 'index']);
Route::get('/with_where', [ClientController::class, 'index_with_where_clause']);
Route::get('/order_by_email', [ClientController::class, 'index_order_by_email']);
Route::get('/order_by_id', [ClientController::class, 'index_order_by_id']);
```

Add `laravel-debugbar` and publish its config:

```shell
$ composer require barryvdh/laravel-debugbar --dev
$ php artisan vendor:publish --provider="Barryvdh\Debugbar\ServiceProvider"
```

Enable EXPLAIN output by editing `config/debugbar.php`. Scroll until you find the setting, then update it as follows:

```php
'explain' => [ // Show EXPLAIN output on queries
    'enabled' => true,
    'types' => ['SELECT'], // Deprecated setting, is always only SELECT
],
```

Run the migrations and seed the database:

```shell
$ php artisan migrate --seed
```

Start the server:

```shell
$ php artisan serve
```
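Optionally, sanity-check the seed before moving on. Because the seeder only flushes on every 1,000th iteration, the final partial batch is never inserted, so the table ends up with slightly under one million rows (this count query is just an illustration; run it in your MySQL client of choice):

```sql
select count(*) from clients;
```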
Exploring Performance
When we open http://localhost:8000/, this is what the debug bar shows us as the EXPLAIN output:

```sql
explain select * from `clients` order by `created_at` desc limit 1
```

| table   | type | possible_keys | key | key_len | ref | rows   | Extra          |
|---------|------|---------------|-----|---------|-----|--------|----------------|
| clients | ALL  |               |     |         |     | 999001 | Using filesort |

A `type` of `ALL` means a full table scan, and `Using filesort` means MySQL has to sort the rows itself because no index can supply them in the requested order: 999,001 records are being loaded into memory! The query takes 904ms. Not good. On production, a comparable query was taking 1.7 seconds.

You might think that a query with a WHERE clause would help things. Let's open http://localhost:8000/with_where to see if that's the case:
```sql
explain select * from `clients` where `email` like '%ca%' order by `created_at` desc limit 1
```

| table   | type | possible_keys | key | key_len | ref | rows   | Extra                       |
|---------|------|---------------|-----|---------|-----|--------|-----------------------------|
| clients | ALL  |               |     |         |     | 999001 | Using where; Using filesort |

Still way too many records being loaded, with the operation taking 767ms. The `email` index can't help here: a LIKE pattern with a leading wildcard (`'%ca%'`) can't use a B-tree index, and the sort on `created_at` is unindexed either way.

We added an index on the `email` column though? Maybe let's use it when ordering instead of `created_at`. Let's open http://localhost:8000/order_by_email:

```sql
explain select * from `clients` order by `email` desc limit 1
```

| table   | type  | possible_keys | key                 | key_len | ref | rows | Extra |
|---------|-------|---------------|---------------------|---------|-----|------|-------|
| clients | index |               | clients_email_index | 1022    |     | 1    |       |

Only one row is loaded and the index is used, with the select operation taking 2.46ms. Just like that we get a significant bump in performance. On production, the query went from 1.7 seconds to ~70ms. Now you can either add an index to the `created_at` column or leverage the primary key, which is indexed by default. The `email` index in this example was just to show that indexes help in ORDER BY statements.

Let's use the PRIMARY KEY for ordering, http://localhost:8000/order_by_id:

```sql
explain select * from `clients` order by `id` desc limit 1
```

| table   | type  | possible_keys | key     | key_len | ref | rows | Extra |
|---------|-------|---------------|---------|---------|-----|------|-------|
| clients | index |               | PRIMARY | 8       |     | 1    |       |

This one takes 1.74ms and doesn't load close to a million records into memory. Looks alright.
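One caveat worth flagging (a general observation, not from the original benchmark): ordering by `id` is only a drop-in replacement for ordering by `created_at` when rows are inserted in chronological order. In this seeded dataset `created_at` is randomized, so the two queries below can return different "latest" clients; verify the equivalence holds for your data before swapping the sort column:

```sql
-- These may disagree when created_at does not follow insertion order:
select id, created_at from `clients` order by `created_at` desc limit 1;
select id, created_at from `clients` order by `id` desc limit 1;
```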
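And if you do need ordering by `created_at` itself to be fast, the fix is to index that column. A minimal sketch of what such a migration could look like (the file name and resulting index name are illustrative assumptions, not from the original article):

```php
// database/migrations/xxxx_add_created_at_index_to_clients_table.php
Schema::table('clients', function (Blueprint $table) {
    // Laravel names this index clients_created_at_index by default
    $table->index('created_at');
});
```

With that in place, the original `order by created_at desc limit 1` query should examine a single row via the index, much like the `email` and PRIMARY KEY cases above.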
TL;DR
Use indexed columns in your ORDER BY clauses: an index lets MySQL read rows already in sorted order instead of scanning and filesorting the whole table.