A common issue and a quick approach to avoid it
Enterprise applications generally rely on database transactions to ensure that a group of database operations is applied as a whole, from the first operation to the last, so that the group is never partially applied.
Consequently, any read of data modified by a transaction reflects that transaction’s changes, provided the read is performed after the transaction has been successfully committed.
Many enterprise applications have to deal with a large amount of traffic, which leads to complex read/write scenarios where concurrency issues are common.
If a read operation is performed on a data item while a transaction is still modifying it, the read will probably retrieve an old version of the item, one that doesn’t yet have the transaction’s changes applied. Every read operation performed before the transaction has completely finished is likely to behave this way.
A special case of these concurrent read/write scenarios arises when asynchronous task processing is used in conjunction with transactions. Often, when a specific business operation is carried out, one or more asynchronous tasks need to be performed.
Most of the time, those tasks are related to specific application implementation details, so they can be performed at any moment in the future. To make that possible, tasks are usually enqueued using a queuing mechanism, from which one or more task runners take them and execute them asynchronously. A common setup of this environment would look like this:
The problem arises when tasks are triggered before transactions successfully finish, and those tasks perform reading/writing operations on the same pieces of data those transactions might be using.
This scenario leads to tasks running against old versions of the data, which generates inconsistencies across the application. The following diagram attempts to better depict this situation:
Note that for this scenario to happen, a task enqueued during a transaction must be picked up by a task runner almost immediately. If the task runner takes longer to start the task, the transaction may already have finished by the time the task starts, and the problem would not occur. An empty queue at the moment the task is enqueued is therefore the situation in which this issue is most likely to appear, since the task would be taken immediately by an idle task runner.
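As a rough illustration of that timing, the following sketch simulates the race in plain PHP, without a real database or queue. The FakeDatabase class, the salary:42 key, and the array used as a queue are all hypothetical stand-ins invented for this sketch. A real setup would involve separate database connections and a real queuing mechanism, and the exact behavior would depend on the database's isolation level.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for a transactional database.
// Writes made inside a transaction stay invisible to other readers until commit().
final class FakeDatabase
{
    /** @var array<string, int> Rows visible to every reader. */
    private array $committed = [];

    /** @var array<string, int>|null Writes of the open transaction, or null if none is open. */
    private ?array $pending = null;

    public function beginTransaction(): void
    {
        $this->pending = [];
    }

    public function write(string $key, int $value): void
    {
        if ($this->pending === null) {
            $this->committed[$key] = $value; // no open transaction: apply at once
            return;
        }
        $this->pending[$key] = $value;
    }

    public function commit(): void
    {
        // Publish all pending writes together, then close the transaction.
        $this->committed = array_replace($this->committed, $this->pending ?? []);
        $this->pending = null;
    }

    // What a different connection (for example a task runner) would read.
    public function readCommitted(string $key): ?int
    {
        return $this->committed[$key] ?? null;
    }
}

$db = new FakeDatabase();
$db->write('salary:42', 1000); // initial, already committed value

$queue = []; // hypothetical task queue, empty at this point

$db->beginTransaction();
$db->write('salary:42', 1500);

// The task is enqueued while the transaction is still open.
$queue[] = fn () => $db->readCommitted('salary:42');

// An idle task runner picks the task up immediately, before the commit.
$task = array_shift($queue);
$seenByEarlyTask = $task(); // 1000: the task works with the old value

$db->commit();

$seenAfterCommit = $db->readCommitted('salary:42'); // 1500

printf("task saw %d, value after commit is %d\n", $seenByEarlyTask, $seenAfterCommit);
```

In this sketch, moving the enqueue step after the commit() call would be enough for the task to read the new value, which is why the moment at which tasks are enqueued matters.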
To better describe the issue, a quick, unsophisticated example is provided below. Treat it as a quick aid for understanding what the problem might look like in a practical situation, rather than as a precise description of what it actually looks like. In practice, this issue could give rise to more complex and hard-to-diagnose scenarios.
Often, large computation results are stored in their own database tables to avoid repeating the same computations, again and again, every time those results are requested by some application feature.
Suppose that an employee’s salary is rarely modified, that a company has thousands of employees, and that the average salary per department is frequently requested. An asynchronous task that updates the average_per_department table every time an employee’s salary is updated could then be implemented. A quick PHP skeleton for that implementation would look like the following: