Data sync from API, what's the best method?

I’m working on getting data from an API, to insert records into the DB.

I would like to get data every minute, making sure that after getting the data all records are synchronised.

Right now I’m thinking to do this:

  1. Get the Data
  2. Compare the API records with records in DB
  3. Insert modified / new records in DB
  4. Delete records in DB that are not in the API records

This seems to be very “heavy” to do every minute though.

Anyone any recommendations on the best “algorithm” for this case?

Hi.
This is not the best method. But a way:

  1. Mark all rows in the table as “deleted” using a flag column.
  2. Run an UPSERT custom query to insert/update table using API data as the source. Make sure to set flag column null or 0.
  3. Delete all rows in the table which are still marked as “deleted”.
  4. You can run these set of steps in a 1 minute scheduler, if using NodeJS. There, first step could be to check if there are any “deleted” rows in the table. If yes, it means previous execution is still running - so don’t start another execution. This will help with avoiding overlapping steps.

Thanks for the suggestion! It’s better but I think still not 100% optimal.

I’ll have two types of data integration api’s

  1. Records that can’t/shouldn’t be changed (phone calls, live chat rating etc).

This should be ”easy” as I think I can just request api data from the last record’s timestamp

  1. Records that can be changed like contacts, sales etc

I’m thinking now that most api’s provide a “modified” timestamp field, which I can maybe compare with the last modified time in my db

Thats exactly how upsert works actually.
You can define which fields to update incase the row already exists in the table, else new row is inserted.

1 Like