Question - Are there any limitations I need to consider when ZIPping a folder on my server? (PHP)
I have a folder with thousands of images inside it. I can now ZIP this folder with Server Connect's Folder to ZIP action. (* local machine development)
But what are the limitations or memory issues I might need to consider?
(* production / live server)
Do I have to worry about 'memory_limit' and how much memory a PHP script can use?
As I'm not uploading files and then zipping them in one go, I guess I don't have to worry about upload_max_filesize and post_max_size.
BUT if I have a folder that is, say, 900MB in size and want to make a ZIP of that on the server, will it cause issues? Is there anything I need to adjust, besides having enough space on the server?
How does the ZIP work? Does it write into a temp directory first and then add that to the zip? In a nutshell: in theory, will it work without worries to ZIP an image folder of 900MB?
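(For context, this is roughly what a folder-to-ZIP step looks like with plain PHP's ZipArchive - a minimal sketch with made-up paths, not necessarily what the Server Connect action does internally. Notably, addFile() only registers the path; the file data is streamed from disk when close() runs, so the whole 900MB never has to sit in PHP memory at once.)

```php
<?php
// Minimal sketch: zip every file in one folder (non-recursive).
// $src and $dest are made-up paths.
$src  = '/var/www/uploads/images';
$dest = '/var/www/backups/images.zip';

$zip = new ZipArchive();
if ($zip->open($dest, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
    exit("Cannot create $dest\n");
}

foreach (new DirectoryIterator($src) as $item) {
    if ($item->isFile()) {
        // addFile() only registers the path; the file data is
        // streamed from disk when close() runs, so a 900MB folder
        // does not need 900MB of PHP memory.
        $zip->addFile($item->getPathname(), $item->getFilename());
    }
}

$zip->close(); // the archive is actually written to disk here
```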
JUST SOME FEEDBACK:
I was worried about memory limits and so on, but so far I have managed to create a ZIP file on the server of 850MB without any issues.
I actually have a different experience. On an older PHP-based project I am having issues where Folder to ZIP attempts to zip a folder with 641 PDF files, but the resulting zip file only contains 230 files.
Kind of weird, because the zip is not corrupt.
The zip is 146MB, while it should end up around 450MB.
I have tried editing some php.ini settings in the Docker container. But no luck so far.
I guess it could be "server" dependent, maybe because of the PHP settings that allow for the "temp" data, or even the memory limit. Hence I asked the question, but nobody responded, so I had to just do my own tests, and "luckily" it worked.
I did find that if there is a folder within the folder I want to zip, those nested folders are not zipped; only the files directly in the folder end up in the zip. E.g. when zipping images/, the images/thumbs/ folder will not be added to the zip file.
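(If you need the nested folders too, the usual plain-PHP workaround is to walk the tree yourself with the SPL iterators. A sketch, again with hypothetical paths:)

```php
<?php
// Sketch: include nested folders (e.g. images/thumbs/) by walking
// the tree with SPL's recursive iterators. Paths are made up.
$src = '/var/www/uploads/images';

$zip = new ZipArchive();
$zip->open('/var/www/backups/images.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE);

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($src, FilesystemIterator::SKIP_DOTS)
);

foreach ($files as $file) {
    if ($file->isFile()) {
        // Store each file under its path relative to $src so that
        // thumbs/a.jpg keeps its thumbs/ prefix inside the archive.
        $relative = substr($file->getPathname(), strlen($src) + 1);
        $zip->addFile($file->getPathname(), $relative);
    }
}

$zip->close();
```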
I also found that filenames like 1(2 copy).jpeg didn't work, but if I used a filename like 1.jpeg or 1-2-copy.jpeg, then I could see those files in the zip. So I ended up forcing all my files to use a UID for the filename when uploading to the server, so there is no "funny" business with the filenames, and it worked.
So my suggestion to you @jellederijke would be to start with the filenames: check that they don't have spaces or funny brackets or something, and give that a test.
EDIT -- After doing some more tests, it does indeed accept 1 (1).jpg and 1(1 copy).jpg.
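(Even so, the UID-on-upload approach sidesteps the question entirely. A minimal sketch of that idea, assuming a standard PHP upload handler; the 'upload' field name and the target path are made up:)

```php
<?php
// Sketch: rename an upload to a safe unique name so spaces and
// brackets never reach the zip step. The 'upload' field name and
// the target path are hypothetical.
$original = $_FILES['upload']['name'] ?? 'file.jpg';
$ext      = strtolower(pathinfo($original, PATHINFO_EXTENSION));

// 16 hex chars, e.g. 9f86d081884c7d65.jpg
$safeName = bin2hex(random_bytes(8)) . '.' . $ext;

move_uploaded_file(
    $_FILES['upload']['tmp_name'],
    '/var/www/uploads/images/' . $safeName
);
```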
So maybe @patrick or @Teodor can shed some more light on this, so that we know what we can adjust or look out for when doing this.
Thanks @Mozzi, this is really helpful. Another example where more documentation can save us all a great deal of time testing and trying things just to see what may or may not work. I would never have guessed the filenames would affect the zipping up of files so I will keep that in mind. But a definitive list of what is acceptable and what isn't would be extremely useful so we can structure our filenames without the risk of them not being included in the zip.
I would set up an environment where I could watch the system performance and memory usage of the system & PHP process during the zip process versus performing the zip process outside of PHP, such as inside a shell or cron job. If zipping thousands of images will be a recurring process, I would want to architect the most efficient method and some testing would be necessary.
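(If you do test the PHP route first, memory_get_peak_usage() gives a quick first reading before reaching for a full profiler. A sketch, where zipFolder() is just a stand-in for whichever zip routine you're testing:)

```php
<?php
// Quick measurement of the zip step. zipFolder() is a stand-in
// for whichever zip routine is being tested.
$t0 = microtime(true);

zipFolder('/var/www/uploads/images', '/var/www/backups/images.zip');

printf(
    "Took %.1fs, peak PHP memory %.1f MB\n",
    microtime(true) - $t0,
    memory_get_peak_usage(true) / 1048576
);
```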
The standard zip tool writes directly to output, not a temp file location. However, memory is used for calculating the compression and performing the archiving process. If you run out of RAM during the archive process, past experience tells me that the system will use the paging file (temp storage) to move other processes out of memory, or to help complete the archive. Sometimes increasing the paging file size default improves overall system performance as it reduces swapping between RAM and disk.
How deep you go really just depends on the frequency of the zip process you described, the amount of RAM on the server available when the archive starts, and if performance of the system remains acceptable to you or its users during that time. HTH
Thanks for your response. Highly appreciate your input!
I can actually monitor the CPU/RAM usage in Docker Desktop, and change the container's PHP settings in the php.ini inside the container. But it does not seem to use a lot of RAM. The laptop running Docker Desktop has 64GB RAM, so we have some room.
It is a yearly process where around 16,000 PDFs are created. And in some cases there are around 600 or more PDFs for one zip.
Would you have any ideas on which php settings would impact this zipping process most?
I wouldn't, unfortunately. It's been quite a while since I knew my way around PHP configs. Your favorite flavor of ChatGPT is usually quite good at giving some direction with decent prompting. Here's an example of the prompt I would start with:
You are my PHP and linux web server administrator expert. I need your assistance in understanding what PHP configurations, Docker settings, and operating system components I should consider for the Use Case described below. I want the process to be as efficient and fast as possible. You may ask me questions to help facilitate the best possible answer.
[Use Case]
Once yearly, I want to run a process that zips (archives) thousands of images and PDF files from within a folder. Multiple zip files will be created containing upwards of 600 files each. The folder contents before archiving average around 900MB. My web server is running current versions of PHP, Linux, Apache, and Docker.
The below text was the output from Gemini:
Analyzing Your Use Case for Efficient Archiving

Understanding Your Requirements:
To optimize the archiving process, we'll need to consider the following factors:
- File Size and Quantity: You mentioned an average of 900MB per folder and multiple zip files with 600 files each.
- Frequency: The process will run once yearly.
- Performance Goals: You want the process to be as efficient and fast as possible.

Key Considerations for Optimization:
- PHP Configuration:
  - Memory Limits: Ensure PHP has sufficient memory allocated to handle large file operations. You can adjust the memory_limit setting in your php.ini file.
- Parallel Processing: If your hardware allows, consider using parallel processing techniques to compress multiple files simultaneously.
- Compression Level: Experiment with different compression levels to find the optimal balance between compression ratio and processing time.
- Batch Processing: If you have a large number of files, consider breaking them into smaller batches to improve performance and reduce memory usage (a sketch combining this with compression settings follows this list).
- Testing and Profiling: Thoroughly test your archiving process under load to identify bottlenecks and optimize performance. Use profiling tools to analyze CPU and memory usage.
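(To make the "Compression Level" and "Batch Processing" points concrete, here is a hedged ZipArchive sketch: PDFs are usually compressed internally already, so storing them uncompressed can be much faster than deflating them. $allPdfs is a hypothetical array of file paths; setCompressionName() needs PHP 7.0+ built against libzip.)

```php
<?php
// Sketch combining batching with per-file compression settings.
// $allPdfs is a hypothetical array of absolute PDF paths;
// setCompressionName() requires PHP 7.0+ built against libzip.
$batch = array_slice($allPdfs, 0, 600); // one batch of up to 600 files

$zip = new ZipArchive();
$zip->open('/var/www/archives/batch-001.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE);

foreach ($batch as $path) {
    $name = basename($path);
    $zip->addFile($path, $name);
    // PDFs are usually compressed internally already, so storing
    // them as-is (CM_STORE) can be much faster than CM_DEFLATE for
    // little size penalty. Test both against your own data.
    $zip->setCompressionName($name, ZipArchive::CM_STORE);
}

$zip->close();
```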
We have managed to find a solution. Our findings so far:
This issue does not occur on a NodeJS project. If we try to zip the exact same folder in NodeJS, it works perfectly.
If we do an "add files to zip" instead of "folder to zip", we don't have the issue on the PHP project anymore. Practically, for us it comes down to the same thing, but we don't have the issue.
Hi, thanks for your feedback and the testing on your side. I'm glad you found a workaround.
"If we do a add files to zip, instead of folder to zip, we don't have the issue on the PHP project, anymore. Practically for us, it comes down to the same thing. But we don't have the issue"