How to determine the appropriate concurrency number in your background job system?
To follow this topic, you need to be familiar with the concepts of concurrency, parallelism, and background jobs. In this article, I will assume that you already know at least the basics of these concepts. As you may know, an application needs a background job processing system to handle heavy, time-consuming tasks such as importing, exporting, sending emails, crawling, and data migration. It helps improve user experience and application performance. Many libraries and frameworks support background jobs, for example Sidekiq for Ruby on Rails, Celery for Python (commonly used with Django), and Laravel's queues for PHP. Each of them operates differently. Here, I will mainly focus on Sidekiq and how to determine the appropriate concurrency level for it.
1. What is concurrency in Sidekiq?
In Sidekiq, concurrency is the number of threads that a single Sidekiq process uses to run jobs (depending on where you read about it, a process may also be called an instance or a worker). In simple terms, it's like someone who cooks rice and washes dishes at the same time. While waiting for the rice to cook, they use that "free time" to wash dishes. Concurrency works in a similar way. When Sidekiq runs a job that has to wait (for a DB query, a network response, etc.), it uses the waiting time to run other jobs, up to the number of concurrent jobs you have configured. For example, if you set the concurrency number to 10, a single Sidekiq process can run up to 10 jobs at once (although they will not truly overlap if none of them spends any time waiting).
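To make the rice-and-dishes idea concrete, here is a minimal sketch in plain Ruby. It does not use Sidekiq itself: the run_jobs helper is hypothetical, and sleep stands in for a DB query or network call. It only illustrates how threads overlap waiting time.

```ruby
# Simulates I/O-bound jobs. A sleeping thread does not block the others,
# so several "jobs" can wait at the same time.
def run_jobs(job_count:, concurrency:, wait_seconds:)
  queue = Queue.new
  job_count.times { |i| queue << i }
  queue.close # after close, pop returns nil once the queue is empty

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  workers = Array.new(concurrency) do
    Thread.new do
      while queue.pop
        sleep(wait_seconds) # stands in for a DB query or an HTTP call
      end
    end
  end
  workers.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

one_thread  = run_jobs(job_count: 10, concurrency: 1,  wait_seconds: 0.05)
ten_threads = run_jobs(job_count: 10, concurrency: 10, wait_seconds: 0.05)
puts format("1 thread: %.2fs, 10 threads: %.2fs", one_thread, ten_threads)
```

With one thread the waits add up one after another; with ten threads they overlap, so the total time should be close to that of a single job.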
2. How to configure it?
There are 2 ways to configure the concurrency in Sidekiq:
You can configure it in the config/sidekiq.yml file, with the concurrency option.
You can also configure it when starting Sidekiq, using the -c command-line option.
For example, to start Sidekiq with a concurrency of 20:
bundle exec sidekiq -c 20
3. How to calculate the appropriate number?
What are your needs?
In other words, how much job volume does your application need to handle? You don't need an exact number, but you should be able to estimate one. For example, in my system, there are times when up to 100 import and export jobs run simultaneously.
How quickly do you want a job to complete?
Let's say you have a file with 10,000 rows to import, and you want Sidekiq to finish this import within 5 minutes. If you split the file into smaller parts and import each part as a separate job, with each job importing 100 rows, you will have 100 jobs. To import the 10,000 rows within 5 minutes, Sidekiq needs to complete 2,000 rows per minute, which corresponds to 20 jobs per minute. If each 100-row job takes about one minute, those 20 jobs have to run at the same time, so 20 is also the concurrency (number of threads) you need, assuming a single process handles the import.
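The arithmetic above can be sketched as a small Ruby helper. The name required_concurrency and its parameters are made up for illustration, and the one-minute job duration is an assumption; measure your own jobs before relying on the result.

```ruby
# Hypothetical helper: threads needed to finish an import before a deadline,
# assuming the jobs are I/O-bound and each takes about seconds_per_job.
def required_concurrency(total_rows:, rows_per_job:, seconds_per_job:, deadline_seconds:)
  jobs = (total_rows.to_f / rows_per_job).ceil

  # How many jobs one thread can finish back to back before the deadline.
  jobs_per_thread = (deadline_seconds / seconds_per_job).floor
  raise ArgumentError, "a single job takes longer than the deadline" if jobs_per_thread.zero?

  (jobs.to_f / jobs_per_thread).ceil
end

puts required_concurrency(
  total_rows: 10_000,
  rows_per_job: 100,
  seconds_per_job: 60,   # assumed: one 100-row job takes about a minute
  deadline_seconds: 300  # 5 minutes
)
# prints 20
```

If the same jobs took only 30 seconds each, each thread could finish 10 of them in 5 minutes, and 10 threads would be enough.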
What are your system resources?
The example above gives a basic estimate of the concurrency you need. But you also have to check whether your system resources can handle running that many jobs simultaneously. You need enough RAM and CPU to meet the demand, without over-provisioning and wasting them.
4. What should you pay attention to?
One extremely important thing to consider is the number of database connections. An inappropriate concurrency number can cause problems here. Sidekiq recommends that you keep the concurrency number close to or equal to the database connection pool size (the pool option in config/database.yml). If the number of threads that need a database connection is greater than the pool size, errors like this can occur:
ActiveRecord::ConnectionTimeoutError - could not obtain a database connection within 5 seconds. The max pool size is currently 5; consider increasing it
A similar situation can occur if your job code deliberately spawns additional threads that run DB queries.
Conversely, a pool that is much larger than the number of threads needing a connection can waste resources.
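As a rough sanity check, the relationship between threads and pool size can be sketched as below. Both helpers are hypothetical, and the formula is only a worst-case estimate that assumes every job thread, and every extra thread a job spawns, holds one connection at the same time.

```ruby
# Worst-case number of DB connections a single Sidekiq process may need at once.
def connections_needed(concurrency:, extra_db_threads_per_job: 0)
  concurrency * (1 + extra_db_threads_per_job)
end

# Compares the estimate with the pool value from config/database.yml.
def pool_status(pool_size:, concurrency:, extra_db_threads_per_job: 0)
  needed = connections_needed(
    concurrency: concurrency,
    extra_db_threads_per_job: extra_db_threads_per_job
  )

  if needed > pool_size
    :too_small          # risk of ActiveRecord::ConnectionTimeoutError
  elsif needed < pool_size
    :larger_than_needed # some connections may never be used
  else
    :ok
  end
end

puts pool_status(pool_size: 5, concurrency: 10)   # prints too_small
puts pool_status(pool_size: 10, concurrency: 10)  # prints ok
```

Note that a job spawning just one extra querying thread doubles the estimate, which is why a pool sized exactly to the concurrency can still run out.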
If your jobs don't run any DB queries or network operations and only do computation, increasing the concurrency number will have little to no effect. This is because Sidekiq's concurrency works by using the time one job spends waiting to run other jobs on other threads. On standard Ruby (MRI), only one thread executes Ruby code at a time, so a purely computational job leaves no idle time for the other threads to use.
So, to determine the appropriate concurrency number, you need to consider the following factors:
Whether you have sufficient resources (RAM, CPU) for the configuration
The number of jobs to run and the desired completion time for each job
Issues related to the DB (connection pool) and network