fsync Unlocked: PHP & Advanced Memory Synchronization

Hard Drive Sync using fsync in PHP. Image generated by DALL-E using ChatGPT4

Back in 2021, PHP released version 8.1 and, with it, a very important feature that I want to address here: fsync and fdatasync. It always catches me off guard when such fundamental features are introduced into a language at such a late stage: fsync is a native operating system function, and all that is needed to make it available in PHP is a thin wrapper function. But one thing after the other: let’s start with some basics.

What is fsync doing?

fsync is a system call on POSIX-like systems. It tells the operating system to commit any modifications or additions to a file. Those modifications and additions can be temporarily held in a buffer, and fsync asks for them to be written to the hard drive before it returns. This ensures that any updates are stored on the disk once fsync completes.

PHP itself is written in C, where fsync is available through the POSIX API (declared in unistd.h) rather than through the ISO C standard library. Most major programming languages offer native fsync support; until version 8.1, PHP was the notable exception.

fsync in Practice: A Hands-On Illustration

Consider the following piece of PHP code handling files:
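The listing is missing from this copy of the post, so here is a minimal sketch of the kind of function meant. The name `writeToFile` and its parameters are assumptions made for illustration, not the original code.

```php
<?php

/**
 * Hypothetical helper: writes $content to $path and reports success.
 * "Success" here only means PHP accepted the data; nothing in this
 * function forces the data onto the disk.
 */
function writeToFile(string $path, string $content): bool
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        return false;
    }

    // fwrite() returns the number of bytes written, or false on failure.
    $written = fwrite($handle, $content);
    fclose($handle);

    return $written !== false;
}
```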

Imagine calling this function and it returns true, indicating that the file writing process was successful. The write returned a value not equal to false, which seems to say the content was written to the file. So this means the content is on disk, right?

Unfortunately, this is not the case.

In PHP, as in other languages with similar functionality, writing to a file does not guarantee that the data is immediately stored on disk. What actually happens is that the data first enters a buffer: PHP’s internal buffer, the operating system’s, or even the hard disk’s own. Once certain conditions are met, such as a given time having passed or the buffer reaching a given threshold, the content makes its way to the disk.

Take a moment to think about what this means: if data is still in the buffer and not yet written to disk, any abrupt interruption like a script failure, webserver crash, or power loss can lead to the loss of that data.

Even fflush cannot guarantee that data is written to the disk: the operating system, which aims to optimize disk usage, also temporarily stores data in its kernel buffer before physically saving it to the disk.

The fsync and fdatasync functions are here to help in these cases. They work with file handles (resources) and try to save pending changes to the disk. Each returns true on success and false on failure, and triggers a warning if the resource is not a file.

Here’s an example function demonstrating its use:
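The listing is missing from this copy of the post, so the following is a minimal sketch rather than the original code. The name `writeToFileDurably` is an assumption; `fopen`, `fwrite`, `fflush`, `fsync` and `fclose` are the built-in PHP functions, and `fsync` requires PHP 8.1 or later.

```php
<?php

/**
 * Hypothetical helper: writes $content to $path and reports success
 * only once fsync() has confirmed the sync.
 */
function writeToFileDurably(string $path, string $content): bool
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        return false;
    }

    try {
        // fwrite() may write fewer bytes than requested, so compare lengths.
        if (fwrite($handle, $content) !== strlen($content)) {
            return false;
        }

        // fflush() hands PHP's buffered data to the operating system;
        // fsync() then asks the OS to commit data and metadata to disk.
        return fflush($handle) && fsync($handle);
    } finally {
        // Runs on every path, including the early returns above.
        fclose($handle);
    }
}
```

A caller can then treat a `false` return value as "not safely stored" and react accordingly, for example by reporting an error instead of confirming the upload.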

In addition to fsync, there is fdatasync, which offers a more targeted approach to synchronizing data. While fsync synchronizes both file data and metadata to the disk, fdatasync focuses solely on the file data, excluding metadata unless it’s essential for data retrieval. This makes fdatasync a bit more efficient than fsync, but in most cases the difference is not relevant.
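As a rough illustration of where fdatasync might fit, here is a sketch of an append-only log writer. The helper name `appendLine` and the log-file scenario are assumptions made for this example; `fdatasync` is the built-in function available since PHP 8.1.

```php
<?php

/**
 * Hypothetical helper: appends one line to a file and syncs only the
 * file data (plus whatever metadata is needed to read it back).
 */
function appendLine(string $path, string $line): bool
{
    $handle = fopen($path, 'a');
    if ($handle === false) {
        return false;
    }

    try {
        $payload = $line . PHP_EOL;
        if (fwrite($handle, $payload) !== strlen($payload)) {
            return false;
        }

        // Same pattern as with fsync(), but metadata such as the
        // modification time is not necessarily forced to disk.
        return fflush($handle) && fdatasync($handle);
    } finally {
        fclose($handle);
    }
}
```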

Some Heads Up

When handling files, and especially file writes, there are some critical aspects that often go unnoticed but can have substantial impacts on your system’s reliability and data integrity. These “Heads Up” sections are designed to shed light on such often-overlooked details, providing valuable insights and precautionary measures to help you navigate these complex scenarios more effectively.

Heads Up 1

Be aware that using fsync() for frequent, high-volume file writes is not practical. Attempting to fsync() after every one of many hundreds or thousands of writes per second will severely hamper I/O performance.
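One common way to soften this cost is to batch writes and sync once per batch. The sketch below assumes that losing an unfinished batch on a crash is acceptable for your use case; the helper name `writeBatch` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: writes every line in $lines and calls fsync()
 * once at the end instead of after each fwrite().
 *
 * @param iterable<string> $lines
 */
function writeBatch(string $path, iterable $lines): bool
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        return false;
    }

    try {
        foreach ($lines as $line) {
            $payload = $line . "\n";
            if (fwrite($handle, $payload) !== strlen($payload)) {
                return false;
            }
        }

        // A single flush and sync covers the whole batch.
        return fflush($handle) && fsync($handle);
    } finally {
        fclose($handle);
    }
}
```

The trade-off is that nothing in the batch is guaranteed to be on disk until the final fsync() returns true.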

Heads Up 2

To ensure maximum file durability on Linux systems, consider also opening the directory that contains your file and calling fsync() on it. Without this, there’s a risk that, although the file changes are synchronized, the directory structure is not, potentially leading to data recovery challenges after a system restart.
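A sketch of how this could look in PHP follows. It rests on an assumption you should verify on your own platform: that `fopen()` can open a directory read-only and that `fsync()` accepts the resulting handle. This is typically the case on Linux but not on Windows. The helper name `syncParentDirectory` is hypothetical, and the function simply returns false where the approach does not work.

```php
<?php

/**
 * Hypothetical helper: tries to fsync the directory containing $filePath.
 * Assumption: directories can be opened with fopen() on this platform.
 */
function syncParentDirectory(string $filePath): bool
{
    $directory = dirname($filePath);

    // '@' silences the warning raised where directories cannot be opened.
    $handle = @fopen($directory, 'r');
    if ($handle === false) {
        return false;
    }

    try {
        return fsync($handle);
    } finally {
        fclose($handle);
    }
}
```

It would be called after the file itself has been written and synced, so that both the file content and its directory entry have been handed to the disk.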

Heads Up 3

Keep in mind that modern disk drives also have their own internal buffers, adding another layer to the buffering process. While most operating systems are designed to handle this by instructing the disk to flush these buffers during fsync(), some storage devices might falsely report successful writes.

As a result, even after fsync() confirms data persistence, there’s a small chance the data might not actually be saved. This is relatively rare (for instance, in environments with low power supplies or unreliable hardware), but it is a possibility to consider.

Conclusion

Integrating fsync into PHP is a notable enhancement, especially considering the function’s basic yet vital role in data management. Its introduction, though surprisingly late, is a significant development. fsync is essential for scenarios where immediate data writing is crucial, as there can be a delay before data is physically written to the disk. In most file-writing tasks, this delay is negligible, but fsync addresses those critical instances where delay is not acceptable.

Moreover, it’s important to understand the limitations of fsync in PHP. It applies only to standard file handles, meaning its use is restricted to certain types of file operations. Custom stream wrappers are incompatible with fsync, and it won’t work on resources that are not regular files or cannot be treated as such. This limitation places fsync in a specific context within PHP’s functionality.

fsync also comes with the caveats we addressed in this blog post. While they are rare or edge-case scenarios, it is important to at least be aware of them and to remember them when you run into otherwise inexplicable situations.

Incorporating fsync into Keestash has allowed us to see its practical impact, even if it is a minor use case. Alongside passwords, file attachments (such as recovery records or public/private keys) are sometimes also part of a credential in the broader sense.
Using fsync, Keestash improves its robustness and stability in terms of atomicity and consistency. When adding a file as an attachment to a password, Keestash now returns a non-200 HTTP status code if the file is not synced to the disk.

Talking about Keestash: if you are interested in an open source password manager, then you’ll find Keestash to be an invaluable tool. It stands out in the realm of cybersecurity solutions by offering robust, user-friendly features that prioritize the safety of your digital credentials. As an open source platform, Keestash fosters a community-driven approach to security, allowing users to benefit from continuous improvements and transparent, peer-reviewed updates. Whether you’re an individual concerned about personal password management or a business looking for an efficient way to safeguard sensitive data, Keestash is designed to meet your needs. Its intuitive interface makes managing your passwords a breeze, while advanced encryption ensures that your information remains secure. Moreover, being open source, it offers the flexibility to customize and adapt to your specific requirements. Take control of your digital security today by choosing Keestash – the smart choice for anyone serious about protecting their online presence.