How to Implement the Repository Pattern
How to Implement the Repository Pattern
Most applications start with database queries scattered throughout the codebase—controllers directly calling SQL, business logic mixed with data access, and domain concepts buried in WHERE clauses. This works initially, then becomes unmaintainable when you need to switch databases, add caching, change query logic, or test business logic without hitting the database. Refactoring at this point means touching dozens of files and risking subtle breakage.
The Repository pattern solves this by creating a clean abstraction layer between your business logic and data access code. Repositories encapsulate all database operations for an entity, providing a collection-like interface that business logic uses without knowing whether data comes from PostgreSQL, MongoDB, or an in-memory cache. This guide covers how to implement repositories that actually improve code quality rather than adding pointless abstraction layers.
We'll focus on practical implementation: structuring repositories for real use cases, handling complex queries without leaking SQL into business logic, implementing caching effectively, and avoiding the common mistakes that make repositories more complex than direct database access.
Why the Repository Pattern Matters
Direct database access from business logic creates tight coupling. Your user registration code directly calls database.query() with SQL strings, which means that code knows about database schema, query syntax, and connection management. Testing requires a real database. Switching from PostgreSQL to MongoDB means rewriting every query in every file that touches users.
The Repository pattern inverts this dependency. Business logic calls userRepository.create(user) without knowing or caring how that user gets persisted. The repository handles all data access concerns—SQL generation, connection pooling, error handling, caching—while exposing a simple domain-oriented interface. Business logic depends on the repository interface, not the implementation, which enables testing with fake repositories and switching implementations without touching business logic.
The Core Problem: Scattered Data Access
Without repositories, data access code spreads everywhere. A user management feature has queries in the registration controller, the profile update handler, the authentication middleware, and the admin dashboard. Each implements similar logic slightly differently, creating inconsistencies and bugs.
// Scattered queries throughout the codebase
// In registration controller
const user = await db.query(
'INSERT INTO users (email, password) VALUES ($1, $2) RETURNING *',
[email, hashedPassword]
);
// In profile controller
const user = await db.query(
'SELECT * FROM users WHERE id = $1',
[userId]
);
// In admin dashboard
const users = await db.query(
'SELECT * FROM users WHERE created_at > $1 ORDER BY created_at DESC',
[thirtyDaysAgo]
);
This pattern fails when requirements change. Adding a soft-delete feature means updating every SELECT query to filter deleted_at IS NULL. Moving to a different database means rewriting SQL in dozens of files. Testing any business logic requires database setup because data access is embedded in the logic.
Centralized Data Access with Repositories
Repositories centralize all data access for an entity in one place. All code that needs user data goes through UserRepository, which provides domain-oriented methods instead of raw SQL.
// All user data access centralized in repository
class UserRepository {
async create(userData) {
const result = await db.query(
'INSERT INTO users (email, password) VALUES ($1, $2) RETURNING *',
[userData.email, userData.password]
);
return this.mapToUser(result.rows[0]);
}
async findById(id) {
const result = await db.query(
'SELECT * FROM users WHERE id = $1 AND deleted_at IS NULL',
[id]
);
return result.rows[0] ? this.mapToUser(result.rows[0]) : null;
}
async findRecentUsers(days) {
const cutoff = new Date(Date.now() - days * 24 * 60 * 60 * 1000);
const result = await db.query(
'SELECT * FROM users WHERE created_at > $1 AND deleted_at IS NULL ORDER BY created_at DESC',
[cutoff]
);
return result.rows.map(row => this.mapToUser(row));
}
mapToUser(row) {
return {
id: row.id,
email: row.email,
createdAt: row.created_at
};
}
}
Now soft-delete logic lives in one place (the repository), all queries consistently filter deleted users, and business logic uses clean domain methods. Testing becomes trivial—swap in a fake repository.
Repository Interface Design
Repository interfaces should reflect domain concepts, not database operations. Don't expose SQL queries or database-specific details. Think about what business logic needs, then design the interface to provide that.
Domain-Oriented Methods
Good repository methods express business intent. findActiveUsersWithExpiredTrials() is better than findWhere({ status: 'active', trial_end: { lt: new Date() } }) because it expresses what you're looking for in domain terms.
class UserRepository {
// Good: expresses business intent
async findActiveUsersWithExpiredTrials() { }
async findUsersNeedingPasswordReset() { }
async countUsersBySubscriptionTier(tier) { }
// Bad: exposes database implementation
async query(sql, params) { }
async findWhere(conditions) { }
async executeStoredProcedure(name, args) { }
}
Domain-oriented methods make code self-documenting. When you read findActiveUsersWithExpiredTrials(), you understand what data you're getting. When you read query('SELECT ...'), you must parse SQL to understand intent.
Return Domain Objects, Not Database Rows
Repositories should return domain objects, not raw database rows. This decouples business logic from database schema. If your users table has 20 columns but business logic only needs 5 fields, the repository should return objects with those 5 fields.
// Good: returns domain object
async findById(id) {
const row = await db.query('SELECT * FROM users WHERE id = $1', [id]);
if (!row) return null;
return {
id: row.id,
email: row.email,
displayName: row.display_name,
createdAt: new Date(row.created_at),
isActive: row.status === 'active'
};
}
// Bad: returns database row directly
async findById(id) {
const row = await db.query('SELECT * FROM users WHERE id = $1', [id]);
return row; // Exposes database schema to business logic
}
Mapping to domain objects means business logic never depends on database column names. Renaming display_name to full_name requires changing only the repository, not every file that uses user data.
Standard Repository Methods
Most repositories need a common set of methods for basic CRUD operations. Standardize these across repositories for consistency.
| Method | Purpose | Returns |
|---|---|---|
| findById(id) | Retrieve single entity by primary key | Entity or null |
| findAll() | Retrieve all entities (use with caution) | Array of entities |
| create(data) | Create new entity | Created entity |
| update(id, data) | Update existing entity | Updated entity |
| delete(id) | Delete entity (or soft delete) | Success boolean |
| exists(id) | Check if entity exists | Boolean |
| count() | Count total entities | Number |
Beyond these basics, add domain-specific methods as needed. Each method should have a clear purpose and return type.
Handling Complex Queries
The challenge with repositories is handling complex queries without creating method explosion or exposing SQL to callers. You need flexible querying while maintaining abstraction.
Specific Methods for Common Queries
For queries you run frequently, create specific methods. This makes code readable and keeps query logic centralized.
class OrderRepository {
async findByCustomerId(customerId) {
return this.query(
'SELECT * FROM orders WHERE customer_id = $1 ORDER BY created_at DESC',
[customerId]
);
}
async findPendingOrdersOlderThan(days) {
const cutoff = new Date(Date.now() - days * 24 * 60 * 60 * 1000);
return this.query(
'SELECT * FROM orders WHERE status = $1 AND created_at < $2',
['pending', cutoff]
);
}
async findHighValueOrders(minAmount) {
return this.query(
'SELECT * FROM orders WHERE total >= $1 ORDER BY total DESC',
[minAmount]
);
}
}
These methods express business concepts (pending orders older than X days) rather than database concepts (WHERE status = 'pending' AND created_at < cutoff). They're self-documenting and testable.
Query Object Pattern for Complex Filtering
When you need flexible querying, use the Query Object pattern instead of exposing raw SQL. Create a query builder that constructs queries using domain concepts.
class OrderQuery {
constructor() {
this.filters = {};
this.sortField = 'created_at';
this.sortOrder = 'DESC';
this.limitValue = 100;
}
forCustomer(customerId) {
this.filters.customer_id = customerId;
return this;
}
withStatus(status) {
this.filters.status = status;
return this;
}
createdAfter(date) {
this.filters.created_after = date;
return this;
}
sortBy(field, order = 'DESC') {
this.sortField = field;
this.sortOrder = order;
return this;
}
limit(count) {
this.limitValue = count;
return this;
}
}
class OrderRepository {
async find(query) {
// Build SQL from query object
let sql = 'SELECT * FROM orders WHERE 1=1';
const params = [];
if (query.filters.customer_id) {
params.push(query.filters.customer_id);
sql += ` AND customer_id = $${params.length}`;
}
if (query.filters.status) {
params.push(query.filters.status);
sql += ` AND status = $${params.length}`;
}
if (query.filters.created_after) {
params.push(query.filters.created_after);
sql += ` AND created_at > $${params.length}`;
}
sql += ` ORDER BY ${query.sortField} ${query.sortOrder}`;
sql += ` LIMIT ${query.limitValue}`;
return this.query(sql, params);
}
}
// Usage
const orders = await orderRepository.find(
new OrderQuery()
.forCustomer('customer_123')
.withStatus('pending')
.createdAfter(new Date('2024-01-01'))
.sortBy('total', 'DESC')
.limit(50)
);
This provides flexibility without exposing SQL. Business logic composes queries using domain terms, and the repository handles SQL generation. You can swap database implementations by changing how the repository interprets the query object.
Repository Implementation Patterns
Repository implementation involves managing database connections, handling errors, mapping between database and domain representations, and ensuring consistent behavior across repositories.
Database Connection Management
Repositories need database connections but shouldn't manage connection lifecycle. Inject a database connection or connection pool into repositories at construction time.
class UserRepository {
constructor(db) {
this.db = db; // Database connection or pool
}
async findById(id) {
const result = await this.db.query(
'SELECT * FROM users WHERE id = $1',
[id]
);
return result.rows[0] ? this.mapToUser(result.rows[0]) : null;
}
mapToUser(row) {
return {
id: row.id,
email: row.email,
createdAt: new Date(row.created_at)
};
}
}
// Repository initialization
const db = createDatabasePool(config);
const userRepository = new UserRepository(db);
const orderRepository = new OrderRepository(db);
This pattern keeps repositories focused on data access logic while connection management stays separate. It also enables easy testing—inject a test database connection or mock.
Transaction Support
Operations that modify multiple entities need transaction support. Repositories should accept an optional transaction parameter that allows coordinating multiple repository calls in a single transaction.
class UserRepository {
constructor(db) {
this.db = db;
}
async create(userData, transaction = null) {
const conn = transaction || this.db;
const result = await conn.query(
'INSERT INTO users (email, password) VALUES ($1, $2) RETURNING *',
[userData.email, userData.password]
);
return this.mapToUser(result.rows[0]);
}
}
class AccountRepository {
constructor(db) {
this.db = db;
}
async create(accountData, transaction = null) {
const conn = transaction || this.db;
const result = await conn.query(
'INSERT INTO accounts (user_id, plan) VALUES ($1, $2) RETURNING *',
[accountData.userId, accountData.plan]
);
return this.mapToAccount(result.rows[0]);
}
}
// Using repositories in a transaction
async function registerUser(userData, accountData) {
return this.db.transaction(async (tx) => {
const user = await userRepository.create(userData, tx);
const account = await accountRepository.create({
...accountData,
userId: user.id
}, tx);
return { user, account };
});
}
Repositories work with or without transactions. Business logic that needs atomicity across multiple operations uses transactions. Simple operations use repositories directly.
Error Handling and Mapping
Repositories should catch database-specific errors and throw domain exceptions. Business logic shouldn't handle PostgreSQL error codes or MySQL constraint violations.
class UserRepository {
async create(userData) {
try {
const result = await this.db.query(
'INSERT INTO users (email, password) VALUES ($1, $2) RETURNING *',
[userData.email, userData.password]
);
return this.mapToUser(result.rows[0]);
} catch (error) {
if (error.code === '23505') { // PostgreSQL unique violation
throw new DuplicateEmailError(userData.email);
}
throw new RepositoryError('Failed to create user', error);
}
}
}
// Business logic handles domain exceptions
try {
const user = await userRepository.create({ email, password });
} catch (error) {
if (error instanceof DuplicateEmailError) {
return { error: 'Email already registered' };
}
throw error;
}
This keeps database-specific details out of business logic. Business logic handles domain errors (duplicate email) without knowing about PostgreSQL error codes.
Caching Layer Integration
Repositories are ideal places to implement caching because they centralize all data access. Caching in repositories is transparent to business logic—code calls userRepository.findById() and doesn't know whether the result comes from cache or database.
Read-Through Cache Pattern
Read-through caching checks the cache before hitting the database. On cache miss, load from database and populate cache.
class UserRepository {
constructor(db, cache) {
this.db = db;
this.cache = cache;
this.cacheKeyPrefix = 'user:';
this.cacheTTL = 300; // 5 minutes
}
async findById(id) {
const cacheKey = `${this.cacheKeyPrefix}${id}`;
// Try cache first
const cached = await this.cache.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Cache miss: load from database
const result = await this.db.query(
'SELECT * FROM users WHERE id = $1',
[id]
);
if (!result.rows[0]) {
return null;
}
const user = this.mapToUser(result.rows[0]);
// Populate cache
await this.cache.set(
cacheKey,
JSON.stringify(user),
{ ttl: this.cacheTTL }
);
return user;
}
async update(id, data) {
const result = await this.db.query(
'UPDATE users SET email = $2 WHERE id = $1 RETURNING *',
[id, data.email]
);
const user = this.mapToUser(result.rows[0]);
// Invalidate cache on update
const cacheKey = `${this.cacheKeyPrefix}${id}`;
await this.cache.delete(cacheKey);
return user;
}
}
Caching is invisible to callers. Business logic calls findById() and gets data fast without knowing about Redis. Cache invalidation happens automatically on updates.
Testing Repositories
Repositories make testing easier by providing clear boundaries. Test repositories themselves against a real database to ensure data access works correctly. Test business logic with fake repositories to isolate business rules.
Testing Repository Implementation
Repository tests should use a real test database to verify SQL correctness, connection handling, and data mapping.
describe('UserRepository', () => {
let db;
let repository;
beforeEach(async () => {
db = await createTestDatabase();
repository = new UserRepository(db);
});
afterEach(async () => {
await db.close();
});
test('findById returns user when exists', async () => {
const user = await repository.create({
email: '[email protected]',
password: 'hashed'
});
const found = await repository.findById(user.id);
expect(found.id).toBe(user.id);
expect(found.email).toBe('[email protected]');
});
test('findById returns null when not exists', async () => {
const found = await repository.findById('nonexistent-id');
expect(found).toBeNull();
});
test('create throws DuplicateEmailError for duplicate email', async () => {
await repository.create({
email: '[email protected]',
password: 'hashed'
});
await expect(
repository.create({
email: '[email protected]',
password: 'hashed2'
})
).rejects.toThrow(DuplicateEmailError);
});
});
Testing Business Logic with Fake Repositories
Business logic tests should use fake repositories (in-memory implementations) to avoid database dependencies and make tests fast.
class FakeUserRepository {
constructor() {
this.users = new Map();
this.nextId = 1;
}
async create(userData) {
const id = `user_${this.nextId++}`;
const user = {
id,
email: userData.email,
createdAt: new Date()
};
this.users.set(id, user);
return user;
}
async findById(id) {
return this.users.get(id) || null;
}
async findByEmail(email) {
return Array.from(this.users.values())
.find(u => u.email === email) || null;
}
}
// Business logic test using fake repository
test('user registration creates user and sends welcome email', async () => {
const userRepository = new FakeUserRepository();
const emailService = new FakeEmailService();
const registrationService = new RegistrationService(
userRepository,
emailService
);
const result = await registrationService.register({
email: '[email protected]',
password: 'password123'
});
expect(result.success).toBe(true);
expect(userRepository.users.size).toBe(1);
expect(emailService.sentEmails).toHaveLength(1);
expect(emailService.sentEmails[0].to).toBe('[email protected]');
});
Fake repositories keep tests fast and deterministic. You control exactly what data exists, making edge cases easy to test.
Common Repository Anti-Patterns
Several patterns seem reasonable but create problems. Avoid these common mistakes.
Generic Repository Base Class
Creating a base Repository class with generic CRUD methods sounds like good code reuse but creates awkward APIs.
// Anti-pattern: generic repository
class Repository {
async findById(id) { }
async findAll() { }
async save(entity) { }
async delete(id) { }
}
class UserRepository extends Repository { }
class OrderRepository extends Repository { }
// Problem: all repositories have same interface regardless of domain needs
// Orders need findByCustomerId, users need findByEmail,
// but generic base class doesn't know about these
Instead, define repository interfaces independently based on each entity's needs. Share implementation utilities if needed, but don't force a common interface.
Repository per Database Table
Creating one repository per table treats repositories like ORMs. Repositories should be organized by domain aggregates, not database tables.
// Anti-pattern: table-based repositories
class UsersRepository { } // users table
class UserProfilesRepository { } // user_profiles table
class UserSettingsRepository { } // user_settings table
// Better: aggregate-based repository
class UserRepository {
async findById(id) {
// Joins users, user_profiles, user_settings
// Returns complete User aggregate
}
}
One repository per aggregate means business logic works with meaningful domain objects, not database tables. Joining multiple tables to build a User aggregate is the repository's job.
Exposing IQueryable or Query Builders
Returning queryable objects that let callers build arbitrary queries defeats the purpose of repositories.
// Anti-pattern: exposing query builder
class UserRepository {
query() {
return this.db.queryBuilder('users'); // Returns query builder
}
}
// Caller builds queries directly
const users = await userRepository
.query()
.where('status', 'active')
.where('created_at', '>', cutoff)
.orderBy('email')
.limit(10);
// This is just database access with extra steps
Repositories should provide domain-specific methods, not generic query interfaces. If you need flexible querying, use the Query Object pattern that expresses domain concepts.
Repository Pattern with ORMs
ORMs like Prisma, TypeORM, and Sequelize already provide data access abstraction. You might not need explicit repositories, or you might want thin repositories that wrap ORM models to provide domain-specific methods.
When ORMs Replace Repositories
If your ORM provides good domain modeling and you're comfortable with business logic depending on the ORM, you might not need repositories. ORMs offer querying, relationships, and transactions already.
// Using Prisma directly without repository
const user = await prisma.user.findUnique({
where: { id: userId },
include: { profile: true, settings: true }
});
const recentOrders = await prisma.order.findMany({
where: {
customerId: userId,
createdAt: { gte: thirtyDaysAgo }
},
orderBy: { createdAt: 'desc' }
});
This works fine if you're happy depending on Prisma throughout your codebase. The tradeoff: switching ORMs means updating every file that uses Prisma.
Thin Repositories Over ORMs
You can wrap ORM models with thin repositories that provide domain-specific methods while using the ORM for actual data access.
class UserRepository {
constructor(prisma) {
this.prisma = prisma;
}
async findById(id) {
const user = await this.prisma.user.findUnique({
where: { id },
include: { profile: true, settings: true }
});
return user ? this.mapToUser(user) : null;
}
async findActiveUsersWithExpiredTrials() {
const users = await this.prisma.user.findMany({
where: {
status: 'active',
trialEndsAt: { lt: new Date() }
}
});
return users.map(u => this.mapToUser(u));
}
mapToUser(prismaUser) {
return {
id: prismaUser.id,
email: prismaUser.email,
displayName: prismaUser.profile?.displayName,
// Domain object structure, not Prisma structure
};
}
}
This balances abstraction with practicality. You get domain-oriented interfaces and independence from ORM specifics without reinventing data access.
FAQ
Do I really need repositories if I'm using an ORM?
It depends on your abstraction goals. ORMs already provide data access abstraction, so repositories might be unnecessary overhead. Use repositories over ORMs when you want domain-specific interfaces, need to hide ORM details, or plan to swap ORMs. Otherwise, using the ORM directly is simpler.
Should each repository have create, read, update, delete methods?
Only if those operations make sense for the entity. Some entities are read-only (reference data), some are append-only (audit logs), and some support full CRUD. Design each repository's interface based on business operations, not a generic template.
How do I handle complex queries that join multiple tables?
Create specific methods for common complex queries, or use the Query Object pattern for flexible querying. Don't expose raw SQL or query builders directly—that defeats the abstraction. Repositories should handle joins internally and return domain objects.
Where should transaction management happen—in repositories or business logic?
Business logic coordinates transactions across repository calls. Repositories accept optional transaction parameters so they can participate in transactions, but business logic decides when transactions are needed. This keeps repositories focused on data access while business logic orchestrates operations.
Should repositories return domain models or DTOs?
Repositories should return domain models (entities representing business concepts). DTOs are for API layer communication—repositories operate at the domain layer. Return objects with domain-relevant properties, not database table structures or API response formats.
How do I test repositories without hitting a real database?
Test repository implementations against a real test database to verify SQL correctness. Test business logic using fake repositories (in-memory implementations) to avoid database dependencies. This gives you confidence that data access works while keeping business logic tests fast.
Is it okay to have multiple repositories for related entities?
Yes, organize repositories by domain aggregate roots. If User and UserProfile are part of the same aggregate, one UserRepository handles both. If Order and OrderItem are part of the Order aggregate, one OrderRepository handles both. Separate repositories when entities are independent aggregates.
How should repositories handle pagination?
Add pagination parameters to methods that return collections: findAll({ page, limit }) or use a query object with pagination methods. Return both results and total count to enable UI pagination. Make pagination optional with sensible defaults to keep common cases simple.
Should repositories handle caching or should that be separate?
Repositories are ideal places for caching because they centralize data access. Implement read-through caching in repositories to make caching transparent to business logic. This keeps cache invalidation logic close to data modification code.
What's the difference between repository and DAO patterns?
DAOs (Data Access Objects) provide CRUD operations per database table. Repositories provide domain-oriented operations per aggregate. DAOs map closely to database structure; repositories map to domain concepts. Repositories are higher-level abstractions focused on business needs, not storage implementation.
Conclusion
The Repository pattern provides valuable abstraction when implemented with domain needs in mind. Good repositories express business intent through their interfaces, hide data access details, and enable testing by providing clear boundaries. Poor repositories just wrap database calls without adding value, creating complexity without benefit.
Design repository interfaces around domain operations, not database operations. Think about what business logic needs and provide methods that express those needs clearly. Return domain objects, not database rows. Handle database-specific concerns (connection management, error mapping, caching) within repositories so business logic stays clean.
Use repositories where they add value—complex domains with rich business logic, systems where testability matters, or applications that might switch databases. For simple CRUD applications or when using feature-rich ORMs, direct data access might be simpler. Choose patterns that solve real problems in your specific context.