Is there a workable solution for importing large datasets into a DB in Server Connect?

Hey All,

I’ve gone through all the posts and spent countless hours trying to find a better solution, but I’m coming up short. Just wondering if I’ve missed anything.

As best I can tell, there still doesn’t appear to be a good working solution for creating Server Connect workflows that import large amounts of data (e.g. from a user-uploaded CSV) into the database.

Please let me know if I’ve missed something… I’ve tried using transactions (BEGIN and COMMIT) and it only took 7 seconds off a test run that was 1.07 minutes, and I’ve got other tests that have taken 30+ minutes.
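For context, the transaction attempt was essentially just wrapping the repeat step’s per-row INSERTs in a single BEGIN/COMMIT (table and column names below are placeholders):

```sql
-- Rough sketch of the transaction attempt (placeholder table/columns).
-- A single transaction avoids a commit per row, but each CSV row is still
-- its own INSERT statement and its own round trip to the database.
BEGIN;
INSERT INTO import_rows (name, email, amount) VALUES ('Alice', 'alice@example.com', 10.00);
INSERT INTO import_rows (name, email, amount) VALUES ('Bob', 'bob@example.com', 20.00);
-- ...one INSERT per CSV row, generated by the repeat step...
COMMIT;
```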

I can’t get the Postgres COPY command to work, seemingly because the CSV file sits on the web server while Postgres runs on its own DB server.

Is there anything?

I assume the workflow’s CSV Import server action is not suitable?

Have a look at this

Thanks Ken. I’ve definitely read through that numerous times, and while it gave me some ideas (e.g. the BEGIN/COMMIT), I don’t think anyone there really came to a solution, and it ended up getting cloned as a feature request (which I voted for).

It seemed they weren’t able to get the COPY or \COPY query to work, and they also tried to create a custom module, but I didn’t see that get resolved either.

Maybe, but not in a way I’ve seen how to incorporate. Obviously the current way is to use the CSV Import feature and then run a repeat step over each row with an INSERT statement inside the loop. But that is what causes such lengthy response times.

One option I’m still to explore is using JSON in an INSERT statement… so that is still something, but it gets to a point where you’ve bashed your head against the wall for so many hours that you want to make sure there is a possible resolution somewhere before continuing with said head bashing :slight_smile:

No no no!
The CSV Import feature does it all for you. You are trying to reinvent the wheel!

I think I probably didn’t explain myself well :slight_smile: I’m definitely using CSV Import. That’s how I get the data into the Wappler Server Action… and then I run a multi-insert step, which is just a repeat over the CSV Import with an UPSERT statement inside it.

But that is just way too slow for large sets of data.

So unless I’m missing something, I am already using that feature and not trying to reinvent that particular wheel :slight_smile:

Sorry Philip, I thought you were trying to do the import via SQL.
As I don’t have access to Wappler at present I can’t do much more, although issues like this have been reported in other contexts in the past and have turned out to be due to slow connections between the web server and the database server. For day-to-day use the connection speed is not very noticeable, but when you hit it with a large data set…

Hey @Philip_J,

What exactly does “large datasets” mean?
A huge number of records or a large amount of data per record?
Sorry if this is a stupid question, but the first thing I’m thinking is to divide those datasets into smaller groups (groups of, let’s say, 100 records) or something like that and execute them one at a time (see the sketch below).
Just saying…
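Just to illustrate the batching idea (table and columns are made up), each group of, say, 100 records would go over as one multi-row INSERT instead of 100 separate INSERTs:

```sql
-- Hypothetical example: one statement per batch instead of one per record
-- (extend the VALUES list up to ~100 tuples per batch).
INSERT INTO import_rows (name, email, amount)
VALUES
  ('Alice', 'alice@example.com', 10.00),
  ('Bob',   'bob@example.com',   20.00);
-- For the UPSERT case, an ON CONFLICT (...) DO UPDATE clause can be appended,
-- assuming a suitable unique constraint exists.
```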

Hey @famousmag

Large numbers of rows, each with perhaps 5–10 fields/columns per row. Who knows exactly what clients would use, but it could be tens or hundreds of thousands of rows.

It all needs to be done by the end user through the UI, and it also includes mapping field names from the imported CSV file, etc.

I’ve got it working, but it’s just set up as a repeat over the imported CSV file, so it can take quite a while.

Not a solution per se for a speedy import, but I know a lot of SaaS products that work with large volumes of imported data will pass the task to a worker (or something else, depending on the stack) so that it doesn’t block the user, and then the system notifies the user once it’s complete. Could be a UX workaround for you to think about.

I agree @mgaussie, if it’s not mission critical I would hand it off to the scheduler and do it in the background.

I agree with Brian…

I suppose this is a matter of memory… This huge amount of data is loaded into memory, taking up so much of it that there are few resources left for processing!

The problem must be hidden here:

This topic has a few very good ideas and suggestions:

Although memory could play a role, the most notable performance bottleneck is actually what I described here:

What didn’t work when trying LOAD DATA? I can’t recall if I ended up using this method, but it seems workable, as you can specify that the text file be loaded from the client or from a location local to the MySQL server. It would be extremely fast.
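From memory it looks something like this (path, table and columns are just illustrative); the LOCAL keyword is what makes MySQL read the file from the connecting client (i.e. the web server) rather than from the database server’s filesystem, though local_infile has to be enabled on both sides:

```sql
-- MySQL only. With LOCAL, the client streams the file to the server;
-- without LOCAL, the path must exist on the MySQL server itself.
-- Requires local_infile to be enabled on both client and server.
LOAD DATA LOCAL INFILE '/var/www/uploads/data.csv'
INTO TABLE import_rows
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(name, email, amount);
```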

Thanks Matt. This is something I’ve wanted to introduce for other workflows anyway, so now might be the time to get serious about it. I think someone may have even built an extension for it.

Agreed!

This is perhaps something I glossed over, as I haven’t given it a proper attempt to get working… I’ll look into this idea a bit more today.

Actually, I realise now that I did look into that. Naturally, LOAD DATA isn’t available for Postgres, but I spent a lot of time trying to get the equivalent COPY command to work. Unfortunately, like others, I struggled to get it working, seemingly because the CSV file is located on the web server and Postgres doesn’t have direct access to it.
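For anyone else who hits this, the distinction seems to be that plain COPY resolves the file path on the database server, whereas psql’s \copy (or COPY ... FROM STDIN driven from the client library) streams the file from the client side, which is what you’d need when the CSV only exists on the web server. Paths and table names below are just placeholders:

```sql
-- Plain COPY reads the path on the *Postgres* server, so this fails when the
-- CSV only exists on the web server:
COPY import_rows (name, email, amount)
FROM '/var/www/uploads/data.csv' WITH (FORMAT csv, HEADER true);

-- psql's \copy meta-command (or COPY ... FROM STDIN via the driver) reads the
-- file client-side and streams it to the server instead:
\copy import_rows (name, email, amount) FROM '/var/www/uploads/data.csv' WITH (FORMAT csv, HEADER true)
```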

I feel it’s doable, but it’s not something I’ve been able to crack yet. The same goes for trying to run the INSERT INTO statement using JSON (i.e. the CSV file parsed into JSON in Wappler).
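For the JSON route, what I have in mind is something like this (table, columns and the conflict key are placeholders), using Postgres’s jsonb_to_recordset so the whole parsed CSV goes over as a single parameter and a single statement:

```sql
-- $1 is a single parameter holding the parsed CSV as a JSON array of objects,
-- e.g. [{"name":"Alice","email":"alice@example.com","amount":10}, ...].
-- Table, columns and the unique key (email) are placeholders.
INSERT INTO import_rows (name, email, amount)
SELECT r.name, r.email, r.amount
FROM jsonb_to_recordset($1::jsonb) AS r(name text, email text, amount numeric)
ON CONFLICT (email) DO UPDATE
  SET name   = EXCLUDED.name,
      amount = EXCLUDED.amount;
```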