I have always just made small projects with a (very) limited amount of database entries.
This new project requires a lot more.
The one with the most entries is probably the one where I would like to store the prices of collections a few times a day, and that for a growing number of collections. Let’s say about 100, but possibly growing to a thousand…
So that means at least 200 entries per day if I only record the prices twice a day. But if I were to record the prices every hour, it would already be 2400 entries per day.
If the number of collections grows to 1000 it would mean 24000 entries… per day… (if recorded hourly)
“If your data is inherently relational, and subject to queries that work well with SQL, you should be able to scale to hundreds of millions of records without exotic hardware requirements.
You will need to invest in indexing, query tuning, and making the occasional sacrifice to the relational model in the interests of speed. You should at least nod to performance when designing tables – preferring integers to strings for keys, for instance.”
There are databases specialized in time-series data, such as TimescaleDB, which is based on PostgreSQL.
For my next project I’m looking forward to using Graphite.
A specialized time-series database helps you handle the case where you want to “compress” old data (e.g. transform daily price points into weekly ones after a year, and weekly into monthly after 5 years). Graphite also has built-in graphing functionality.
Usually you can’t run these kinds of apps on regular shared hosting, so that’s something to keep in mind.
Edit: I’ve actually done something similar to your project (regularly inserting data) a very long time ago in a MariaDB database. It did become slow, but I was also doing it way worse: I was inserting around 300 rows every minute. I made a cron job to erase old data, and I think that helped keep things sane.
So, naturally, for me using a time-series database is a logical evolution. For you, I think it’s fine to continue using MySQL. When you feel it no longer fits, I’m sure it won’t be difficult to migrate away.
Thanks htatd and apple.
Good to hear that for the time being I’m good
And yes, a time-series database would be great. The high precision is only valuable to me in the short term; for data from a few months back there is no need for hourly data points. Daily or weekly would be enough.
Maybe there is an easy way to delete old data so that only 1 data point per day is kept?
Not needed right now but would be great to know for the future…
In case you want that, you could just create a scheduled action that checks once a day/week/month for data older than x time, then performs actions on that older data. You could then delete all same-day points and keep just the last point of each day, along the lines of the sketch below.
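A rough sketch of what that could look like in MySQL, assuming a hypothetical table price_collections with columns collection_id and recorded_at (your real names will differ). It deletes every row older than a month except the last reading of each day per collection:

-- keep only the last price point per collection per day, for rows older than one month
DELETE p FROM price_collections p
JOIN (
    SELECT collection_id, DATE(recorded_at) AS d, MAX(recorded_at) AS keep_ts
    FROM price_collections
    WHERE recorded_at < NOW() - INTERVAL 1 MONTH
    GROUP BY collection_id, DATE(recorded_at)
) k ON p.collection_id = k.collection_id
   AND DATE(p.recorded_at) = k.d
   AND p.recorded_at < k.keep_ts;

(The derived table is needed because MySQL won’t let you delete from a table you reference directly in a subquery.)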
In any case I think you'll find the best solution for you once you get there
Since you are using MySQL you can use (depending on what your hosting server techies have allowed at your account level) the MySQL equivalent of a cron job --
Summary: in this tutorial, you will learn about the MySQL event scheduler and how to create events to automate repetitive database tasks.
MySQL Events are tasks that execute according to a specified schedule. Therefore, sometimes MySQL events are referred to as scheduled events.
MySQL Events are named objects which contain one or more SQL statements. They are stored in the database and executed at one or more intervals.
For example, you can create an event that optimizes all tables in the database and runs at 1:00 AM every Sunday.
You don’t need any database other than MySQL.
mysql> delete from price_collections where creationDate < now() - interval 30 DAY;
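To have the server run that on a schedule, a minimal sketch (assuming the event scheduler is enabled, e.g. via SET GLOBAL event_scheduler = ON, and reusing the table and column names above):

-- purge rows older than 30 days, once a day
CREATE EVENT purge_old_prices
    ON SCHEDULE EVERY 1 DAY
    DO
        DELETE FROM price_collections
        WHERE creationDate < NOW() - INTERVAL 30 DAY;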
Using the referenced page link you could duplicate what a cron job is usually suggested for in this forum:
Make sure you have a timestamp field that records when each row was created,
and make sure your key id can handle thousands of rows, e.g. an AUTO_INCREMENT integer primary key (see the table sketch below),
then run the delete checking the timestamp field, letting the server compare against its local now() time.
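A hypothetical table definition covering both points might look like this (the column names are just examples):

CREATE TABLE price_collections (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    slug VARCHAR(64) NOT NULL,                                  -- which collection
    floor_price DECIMAL(20,8) NOT NULL,
    creationDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_creationDate (creationDate)                       -- speeds up the timestamp-based delete
) ENGINE=InnoDB;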
Running PHP & MySQL I made a copy of my main records table with a different name like “archive_main_records” to hold a backup table that was never called for by my php application.
I ran events or triggers or even periodic php scripts to command the mysql database and did regular backups of older records by timestamps. Then keeping only records in my online queried “main_records” table. (Names are examples) that were, say, created within the last week, 2 weeks, or 2 days – whatever.
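Roughly, with the same example names and assuming a creation-timestamp column, here called recordtime (and ideally run inside a transaction, in case new rows arrive in between):

-- copy week-old rows into the archive table, then remove them from the live table
INSERT INTO archive_main_records
    SELECT * FROM main_records
    WHERE recordtime < NOW() - INTERVAL 1 WEEK;
DELETE FROM main_records
    WHERE recordtime < NOW() - INTERVAL 1 WEEK;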
MySQL queries & functions offer many different ways or sequence of procedures to solve needs.
But CREATE EVENT is the start of ways to get the server itself to do the job for you. Different procedures might vary according to whether you are running ISAM or INNODB schemas.
My example is just an example of possibilities. Using chron events then schedules WHEN it runs and you tell it to delete itself after running or you tell it to hang around and be ready to run again.
If your record has a creation timestamp named "recordtime", then
DELETE FROM price_collections WHERE recordtime <= NOW() - INTERVAL 1 DAY;
would trim from the table the entries older than 24 hours, relative to whatever your server time is. Set up a server event to run it at midnight, every midnight, as sketched below.
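For instance (again assuming the event scheduler is enabled):

-- starts at the next midnight, server time, then repeats every 24 hours
CREATE EVENT nightly_trim
    ON SCHEDULE EVERY 1 DAY
    STARTS CURRENT_DATE + INTERVAL 1 DAY
    DO
        DELETE FROM price_collections
        WHERE recordtime <= NOW() - INTERVAL 1 DAY;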
You should combine it with an export to sql or csv, with a timestamp in the file name, as in "2022-Feb-21.sql" or whatever.
A small cron script wrapping the export can also send this nightly dump to your chosen email address.
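For the CSV side, MySQL itself can write the file. This needs the FILE privilege and a directory allowed by secure_file_priv, and the file name must be a literal, so a wrapper script usually fills in the date; the path below is just a common default:

-- dump day-old rows to a CSV file on the database server
SELECT * FROM price_collections
WHERE recordtime <= NOW() - INTERVAL 1 DAY
INTO OUTFILE '/var/lib/mysql-files/price_collections_2022-02-21.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';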
My current cronjob records the floor price of all collections every hour.
What I would like is to keep all of this for one month; from the records older than one month I would like to keep just one record per day per collection (so the other 23 records from that day can be deleted).
If I understand you then “floor_price” is the biggie.
One record that shows the lowest price for that day or 24 hour period?
And is “date_time” a real UTC timestamp that equates to 2022-02-28 13:16:29?
I would like to keep all data points (all records recorded) for one month (or so); after that, this kind of detail is no longer necessary, so I would like to keep only one record (out of the 24) per day.
Saving this kind of data, as you already know, is not your problem.
But are you saying that you want the existing online database, the one being inserted into with each submission, to, after 24 hours or something, simply contain One Lowest Floor Price PER “slug”, so that each slug category has its own lowest floor price?
You want to start each new post-cleanup day of the online table with maybe 5 records, one for each distinct slug with its lowest floorprice, and then let all the new day’s entries commence, maybe another thousand-plus new entries added into this same table?
Then after 24 hours or something you strip out those thousands of new records, using the same filtering, until only the NEW 5 category records with their new floorprices are left over to start a new day’s entries?