How do you solve cron job planning?

Let’s say every hour I want to clean up some data in the database: split up full names into a first_name, last_name.
For example:

first_name: "John Doe",
last_name: ""

Becomes:

first_name: "John",
last_name: "Doe"

How would you do this in a lean way?
I would need to keep track if a certain name has been ‘cleaned up’.
But I don’t want to just add a name_cleaned_up col to my database, that would get messy quickly with more different clean ups to do.
I think a system (table) that records the clean ups would be better. Where it simply saves “record with id 523 has been cleaned up on 2023-01-01 10:00”.

I have a system like that.

But, I feel like this is a common issue, and I don’t want to re-invent the wheel.
How do you solve this type of issue?

Why would you want to do cleanup in a recurring job? Normally you would do the cleanup directly on insert/update of the data. When you still have old data in the database that needs the cleanup it could be done with a one-time job instead of scheduled.

If you want to do it with a scheduled job then you have to keep track of the dirty records, this can be done with an extra field in the table or having a separate table which you use as a queue of records that needs to be cleaned or keep track of the records that were cleaned. Having a separate table will make the query a bit more complex since you need to join them.

Why not check the value of last_name?

If NULL you need to process it.
If not empty string you have processed it already.
If empty string you have processed but for whatever reason there is no last name…if that can be a case.

Although if you need to keep a time record of when it was last processed you need a new column.

But of course I would do this on client side(if applicable) first and then validate and correct on server if needed as patrick suggests.

This is a very good question. And makes me realize I still have some things to learn :wink:
You’re right. The whole question came from the idea that I might miss cleaning up the data BEFORE inserting into the db.
But my attention should go into finding all the events where it gets inserted into the database and making suer it’s cleaned up before hand.

Makes sense. Thanks !:smiley:
And yes, as Patrick suggests, I should fix the core issue!

Set aside the discussion why you’re not cleaning up on data creation, the quoted approach is called “auditing trail” and a very common approach. Many systems from CMS to Customer Management Systems do it this way, it’s just a simple database log of which (backend-/front end) user or process has changed what data.

And to Wappler-ise this, create an API action file in your library which you can easily call in all your other scripts. Set the $_PARAM fields you’ll need (userid, description, etc.) and then create the Database Insert to add the record. Use the NOW variable for the datetime field or set it to default to current_timestamp in the database structure. Then you just need to use the Exec action and run that library script in all your other scripts. Nice and tidy and quick to implement.

Also, it might come in handy. I don’t know if there is a way to add this to wappler. But it would be very nice to manage such events on the database side…

1 Like

Awesome tips everyone!

Would you create a subtable for these audit trails?
Let’s say we have a table users and want to see who/what changed data.
We can make users_audit_trail and make records like said:

user_id: 253, 
datetime_changed: 2023-02-17 12:20:23, 
who: system,
description: splitting first & last name

Or do you have a different structure in mind?

With this structure we might end up with lots of subtables. As many different items in our database could use audit trails.

I keep things nice and simple. I have my audit table (I call it ‘log’) and it has these fields: logid (int11 AI), userid, relatedRecordid, description, datetime.

So a typical record would consist of: 123, 5, 341, Updated a record, 2023-02-17 11:25:07

123 is the auto-increment id of the log table, 5 is the logged in user id, 341 is the related record id (in this case the record that was updated), the description so I know which table the related record refers to, the current date/time.

You could have one more field to contain the name of the table which the log refers to to make things even clearer. I find the description is sufficient.

1 Like

Yep, something like this. Unless you only need to audit, but don’t want / need to recover settings, this is more than enough. Compliance Departments usually doesn’t ask for more.

1 Like

You can provide less space and provide a wider usage area by adding error codes or operation codes of the operations performed.