15 Dec 2023

Django with PostgreSQL Full-Text Search

Context + Problem

Full-text search serves as a crucial tool for swiftly retrieving information from a vast collection of text or an unstructured data dump. The approach often involves sifting through a plethora of words or text entries to locate specific information promptly. Some databases, such as Elasticsearch or Opensearch, are explicitly designed or prominently feature full-text search capabilities. These solutions exhibit exceptional performance, allowing users to navigate through extensive datasets containing millions of rows in mere milliseconds. However, a notable drawback associated with these systems is their substantial memory requirements, rendering them less suitable for minimal environments, such as those equipped with only 1 core and 1 GB of RAM, where robustness becomes a concern.

Addressing this challenge, PostgreSQL’s full-text search functionality emerges as a valuable alternative. While it may not match the sheer performance levels of its counterparts, like Elasticsearch or Opensearch, PostgreSQL’s full-text search offers the advantage of simplicity and integration with existing PostgreSQL installations. This means users can leverage the full-text search capabilities without the need for a separate database dedicated solely to this purpose. In essence, PostgreSQL’s solution provides a balance between functionality and resource efficiency, making it a practical choice for scenarios where maintaining a single database is preferable to managing an additional one exclusively for full-text search. This adaptability allows users to optimize their database environments based on their specific needs and resource constraints.

Setup

Cloud + Postman

The performance evaluation uses Django, Django Rest Framework and PostgreSQL and was conducted in a cloud environment using a Digital Ocean droplet equipped with a small hardware resources: 1 core processor and 1 GB of RAM. The operating system of choice was Ubuntu 22.04 LTS, with Docker serving as the primary containerization runtime to ensure a consistent and reproducible testing environment.

The application stack comprised Django as the web framework along with PostgreSQL as the relational database management system. Here, the Django Object-Relational Mapping (ORM) played a pivotal role, functioning as the chief translator responsible for converting user requests into full-text search queries executed on the PostgreSQL database.

For the setup above, I’ve used Postman to simulate users querying or hitting the service. Here is the configuration of Postman to simulate the testing and benchmarking of 100 users simultaneously querying or accessing the service.

postman

Code

Django provides a very nice wrapper to handle the full-text search queries in PostgreSQL. You can use the following code to do this.

from django.http import JsonResponse
from django.contrib.postgres.search import SearchVector
from rest_framework import permissions, viewsets
from rest_framework.decorators import action
from rest_framework_tracking.mixins import LoggingMixin
from .serializers import MyModel, MyModelSerializer

class MyModelViewset(LoggingMixin, viewsets.ModelViewSet):
    queryset = MyModel.objects.all()
    serializer_class = MyModelSerializer
    def search(self, request):
        text_search = request.data.get("q", "*")
        results = MyModel.objects.annotate(
            search=SearchVector(
                "my_model_field_1",
                "my_model_field_2",
                ...
                # more model fields here
            )
        )
        if text_search != "*":
            results = results.filter(search=text_search)
        else:
            results = results.filter()
        results = results[start_from: start_from+size]
        res = self.serializer_class(results, many=True).data
        template["features"] = res
        template["total"] = results.count()
        return JsonResponse(template)

You can still improve the code above by adding pagination and permission classes. Pagination is a technique that allows you to split your data into smaller chunks and display them on different pages. This can improve the performance and usability of your application, especially when you have a large amount of data. Permission classes are a way to control who can access or modify your data. You can define different levels of permissions based on the user’s role, authentication, or other criteria. This can enhance the security and privacy of your application, as well as prevent unauthorized or malicious actions.

We can notice that the code above is easier to implement. Some integration like Elasticsearch will require you to install a DSL library and implement a documents.py just to properly index the fields. Elasticsearch is a popular and powerful full-text search engine that can handle large-scale and distributed data. However, it also requires more resources and setup to use. You need to install and run Elasticsearch as a separate service, and communicate with it using a REST API or a DSL library. You also need to define and update the schema of your data, and create and manage the documents that represent your data. This can add more complexity and overhead to your application development and maintenance.

Results

postman-performance

PostgreSQL’s full text search is a powerful feature that allows you to perform complex queries on text data. It provides the performance above, which means that it can handle a large number of requests per second with low latency. During the performance test, we measured the CPU and memory usage of our system. The CPU usage was pegged at 100%, which indicates that the CPU is the limiting factor for our system’s performance. The memory usage was about 45%, which means that we still have some free memory available. We can infer that our system is bottlenecked by the CPU, and that we could improve the performance by upgrading the CPU or adding more CPU cores.

Overall, this is already a good enough result for small system deployments, such as personal projects or prototypes. You can have a feature rich application that is very functional, without requiring a lot of resources or configuration.


Tags: