Technical question on updating data via sockets in Wappler

Note: I am describing a deliberately unrealistic scenario in order to isolate the technical question I am interested in.

Conditions: there are two apps. They are identical except for one detail. In the first application, a data table is built from a query limited to 100 records, and this query executes very quickly (say 200 ms). In the second application, the same query runs against the same data, but is limited to 1 million records and executes very slowly (say 10 minutes). Both applications use sockets to update data in real time. Now, after the data has already been loaded, user A (in the first application) and user B (in the second application) change data in different records of their tables at the same moment.

Questions:

  1. Do user A and user B receive the socket message with the new data equally quickly, or is user A faster?
  2. If user A receives the socket data faster, how much faster? Will the difference be 10 minutes, or less/more?

So, I tested it and confirmed my concerns. A socket refresh completely re-runs the server action on the server and only then sends the message to the client side.

What follows from this?

Imagine you have a complex database query. Processing it takes a long time and returns a large array of data; the whole thing takes about a minute. That is not so bad for the initial load, because afterwards you can work with this array on the client side at very high speed: filter the data however you like and get instant results, with no waiting on the server. Besides the convenience for the user, it also saves server resources, since it does not generate a flood of requests every time a user filters the data (and filters are used very often).

In other words, it seems to be all advantages, until we get to the question of updating data. Any change in the database, even the smallest one, causes the sockets to re-run the entire server action, which delays sending updates to the client until the server has finished the large query again. For the user, this looks like a huge lag on the client side. Say a user sends a message: they will only see that message in the interface after a minute. Until then they will not understand what happened, because the message was sent correctly and a positive response was received from the server.

The main question: when a record is changed or added in the database, is it possible to make the sockets send only that small change to the client side, without re-running the entire server action? @George, @patrick, is this possible? I am very attracted to the idea of rare, heavy requests to the server followed by filtering exclusively on the client side. It works incredibly fast and does not load the server, which opens up fantastic opportunities to scale the application. However, updating data over sockets spoils everything, because a full re-run of the query on every socket refresh effectively makes it impossible for users to receive updated data quickly.
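For illustration, here is a rough sketch of the behaviour I am asking about, written against plain Node.js + Socket.IO rather than Wappler's built-in socket refresh; the room names, event name and the broadcastRecordChange helper are assumptions, not existing Wappler functionality:

```ts
// Sketch only: plain Node.js + Socket.IO, not Wappler's socket refresh.
import { createServer } from "http";
import { Server } from "socket.io";

const httpServer = createServer();
const io = new Server(httpServer);

io.on("connection", (socket) => {
  // Each client declares which tables it has already loaded and wants live patches for.
  socket.on("subscribe", (table: string) => socket.join(`table:${table}`));
});

// Hypothetical helper to call after a single record has been inserted/updated/deleted.
// It pushes only the changed row (a few dozen bytes), not the full result set.
function broadcastRecordChange(
  table: string,
  type: "insert" | "update" | "delete",
  record: Record<string, unknown>
) {
  io.to(`table:${table}`).emit("record-change", { table, type, record });
}

httpServer.listen(3000);

// e.g. after an update query succeeds (values are illustrative):
// broadcastRecordChange("orders", "update", { id: 42, status: "shipped" });
```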


I continued the research and found new problems with the current method of updating data using sockets.

Even if the application uses a large number of fast queries, this does not help when there are many active users making changes. The application still starts to slow down. The reason is that the data being updated usually participates in many server actions (5-6 or more), so all of those actions have to be refreshed. With a lot of users, many server actions end up in a state of constant re-execution. This puts a heavy load on the server, and the application slows down both for primary/regular requests and for socket updates.

@patrick, what can be done to solve the problem? How can we get rid of this monstrous overhead when updating data?

The first problem is a difficult one. You could, after an insert/update, send only the inserted/updated record to the client, but we currently don't have any component that can handle that.

The way it currently works is that in the insert/update action you trigger a refresh event which is sent to all clients; each client then requests the new data from the server by calling the server action again.
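A rough client-side sketch of that current pattern, with socket.io-client and fetch standing in for Wappler's internals (the endpoint and event name are illustrative):

```ts
// The socket event carries no data; it is only a signal to re-run the server action.
import { io } from "socket.io-client";

const socket = io("https://app.example.com");

socket.on("refresh", async () => {
  // Every connected client re-requests the full result set,
  // so one small database change fans out into N heavy queries.
  const response = await fetch("/api/big_query");
  const rows: unknown[] = await response.json();
  renderTable(rows);
});

function renderTable(rows: unknown[]): void {
  // Placeholder: replace the previously loaded data array / re-render the table.
  console.log("reloaded", rows.length, "rows");
}
```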

The second problem arises because of this refresh: each client requests new data. The data is probably the same for all of them, so one optimization would be to cache the query results. With caching, the updated data only has to be queried once and the cached version is sent to everyone else. The problem here is that all the requests arrive at the same time, so even with caching most requests come in before the query has finished. This could perhaps be optimized by keeping track of which queries are currently executing and not starting a new query when the same one is already running; the result of that single query could then be shared across all the pending requests.
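That last idea is essentially a "single-flight" pattern; a minimal sketch, assuming a generic async runQuery callback (none of this is existing Wappler code):

```ts
// Concurrent callers asking for the same query share one in-flight promise.
const inFlight = new Map<string, Promise<unknown>>();

async function runQueryOnce<T>(key: string, runQuery: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>; // query already running: reuse its result

  const pending = runQuery().finally(() => inFlight.delete(key));
  inFlight.set(key, pending);
  return pending;
}

// Usage: 100 clients refreshing at the same moment all await the same single query.
// const rows = await runQueryOnce("orders:list", () => db.query("SELECT ..."));
```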

I will have to do some research into which optimization helps and how it scales. The current caching implementation only works for HTTP requests and doesn't help with simultaneous requests when the result wasn't cached before. Also, you can only set a cache time on the route endpoint; there is no way to invalidate the cache. I think improving the caching of server actions with large queries would improve performance a lot.

Add a random-duration wait() in the client flow that processes the socket event. This increases the chance that a cached result set is already in Redis for the majority of refresh calls. (Assuming it's not a custom query, which never seem to be cached.)
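A minimal client-side sketch of that jitter idea, with refreshData standing in for whatever call reloads the Server Connect data (names are assumptions):

```ts
// Wait a random amount of time before re-requesting, so most clients
// arrive after the first one has already warmed the cache.
function onRefreshEvent(refreshData: () => void, maxDelayMs = 2000): void {
  const jitter = Math.random() * maxDelayMs;
  setTimeout(refreshData, jitter);
}
```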

Yes, maybe that will improve the situation a little. But doesn't the inefficiency of the approach itself bother you? Here is an example with a little calculation. Even if you make the database queries very compact, with maximum filtering on the server side, the total payload (from all the requests a page needs) will still be 300-700 KB. Take 100 users who each make 100 changes in the application during the day. That is 100*100 = 10,000 refresh messages sent to everyone. Suppose 7,000 of the resulting requests never go through because of collision conflicts and only 3,000 do. That means each user downloads the data package 3,000 times from the server during the day, which is about 1.2-1.7 GB of data. Plain text data. Per user. In one day. Across all users that is 120-170 GB per day. AWS will be delighted with such traffic! :tada:

And all of this because of what? Because of small updates that amount to a couple of dozen bytes each. Even if we calculate the total volume of all the changes, it comes to 400-600 KB, and even sent to every user that is only about 50 MB in total. In reality it would be far less, since the messages would be targeted. But even comparing 50 MB with 150 GB, the difference is obvious.
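The same back-of-envelope numbers as code, purely to make the arithmetic explicit (every input is an assumption from the paragraph above, not a measurement):

```ts
const users = 100;
const changesPerDay = users * 100;       // 100 users x 100 changes = 10,000 refresh events
const refreshesPerUser = 3_000;          // after assumed collisions, per the estimate above
const payloadKb = 500;                   // 300-700 KB full result set, midpoint

const perUserGb = (refreshesPerUser * payloadKb) / 1_000_000;   // ~1.5 GB per user per day
const totalGb = perUserGb * users;                              // ~150 GB per day overall

const changeBytes = 50;                  // a single changed record: a few dozen bytes
const allChangesMb = (changesPerDay * changeBytes) / 1_000_000; // ~0.5 MB of actual changes
const broadcastMb = allChangesMb * users;                       // ~50 MB even if sent to everyone

console.log({ perUserGb, totalGb, broadcastMb });
```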

Also, on the concept itself: what is being used now is essentially the approach practiced 20 years ago in chats written in PHP, where a timer made the client request data from the server every 1-2 seconds. The only difference is that the Wappler approach uses a message received over the socket as the trigger for re-running the request instead of a timer. And it is actually a dirtier approach, which can lead to very bad consequences; more on those below.

This needs immediate modification and optimization. I have only just realized how much danger my application is in. Tomorrow, first of all, I will tell the administrators to stop adding new users, otherwise there is a risk of a server crash. Right now, a Wappler application with sockets connected is effectively the source of a DDoS attack on itself. This will not be noticeable with a small number of users, or if the data in the application does not change very actively. But if the data changes actively as the number of users grows, the problem compounds rapidly. With 1,000 active users actively changing data, there can easily be a moment when 100 users change data at the same time. In that case, each user receives 100 data-reload messages: 1,000 users x 100 messages = 100,000 simultaneous requests. And as I wrote above, each update usually triggers 5-6 or more server actions, so in the end 500,000 or more simultaneous requests from different clients land on the server. :exploding_head:

Sending only the inserted/updated record is the cleanest and most correct approach. It is hundreds of times more efficient in terms of traffic, many times faster, and scales easily to any amount of data and any number of users. Moreover, much of what is needed for it already exists in Wappler. Even now you can determine which users should receive the data update, and the updated record itself. There are also all the tools for sending targeted messages to users and receiving them on the client side. The only thing missing is additional settings on the client-side Server Connect component. I see this as extra dynamic events in the data/socket events section: three events for adding, changing, and deleting a record. The record comes from the message received over the socket from the server, and then, depending on the event type, the change is applied to the data array that was downloaded earlier (when the application first loaded). This would dramatically speed up the application, unload the server, and open the way to confident, calm scaling.
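A rough sketch of what those three client-side events could look like, written directly against socket.io-client; the event names and record shape are illustrative, since nothing like this exists in Wappler yet:

```ts
import { io } from "socket.io-client";

type Row = { id: number } & Record<string, unknown>;

const socket = io("https://app.example.com");
let rows: Row[] = []; // filled once by the initial (heavy) server action

// Apply the small targeted messages to the already-loaded array.
socket.on("record-insert", (record: Row) => {
  rows = [...rows, record];
});

socket.on("record-update", (record: Row) => {
  rows = rows.map((r) => (r.id === record.id ? { ...r, ...record } : r));
});

socket.on("record-delete", (record: Pick<Row, "id">) => {
  rows = rows.filter((r) => r.id !== record.id);
});
```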

In my case, all the queries are custom. But even if they were not, the current socket approach has the critical problems I wrote about above.

Sockets are a powerful tool, but you do need to consider and mitigate the potential side effects. There are plenty of algorithms to smooth the curve - that's why we have computer science: to be aware of, and avoid, what naive developers don't consider.


I would be really grateful if you would share a couple of algorithms for smoothing out the problems I have described. :beers:

Perhaps there will be other naive developers, like me, who will face similar problems. :face_with_head_bandage:

That’s not how this works. You need to zoom out, look at the issues, design smoothing algorithms and trial them. Despite this being a well-known and well-understood problem, there’s rarely, if ever, an off-the-shelf silver bullet that covers every specific case. And it’s a common issue in no/low-code platforms that become successful. At some point, you need to employ real developers who can mitigate the issues. I wouldn’t panic just yet.

Well, then it makes sense to discuss it and sort it out here. We all want Wappler to have highly effective solutions, or close to them, in every area. Right now sockets look to me like a link that can create a lot of problems - not only for me, but in principle for any developer - and it makes sense to improve it.

At the moment, one of the quickest solutions I can see is to abandon the automatic socket-refresh approach implemented in Wappler and instead trigger data updates by sending targeted messages to users. This would solve the problem of the massive number of requests, although in terms of data exchange it would still be a suboptimal solution.

Or look at your app design and data model. That’s usually the easiest thing to fix.

I agree, but not this time.

Whatever I do with the application design or data model, it will not solve the problem of the cascade of requests generated by socket updates, simply because that is how automatic socket updates currently work in Wappler.

What about a simple queue to smooth the peak?

That would indeed smooth out the peak of updates by stretching them over time, if we put a timer on queue execution (for example, no more than once per second). But it would kill usability. As the queue fills with socket refresh actions, users receive new data more and more slowly across the different channels, because the refresh request for a particular piece of data may end up at the end of a queue many thousands of entries long.

At the same time, such a complicated implementation would, by and large, end up producing something like a once-per-second refresh of server actions anyway. Which raises a reasonable question: why complicate everything so much when you could just set a timer on the client, as 20 years ago, and refresh the data every second?
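For reference, a minimal sketch of the kind of queue being discussed here: refresh triggers are coalesced per server action and broadcast at most once per second (all names are illustrative, and this is not how Wappler currently behaves):

```ts
import { Server } from "socket.io";

const io = new Server(3000);
const pendingRefreshes = new Set<string>();

// Called from insert/update/delete handlers instead of emitting immediately.
function queueRefresh(action: string) {
  pendingRefreshes.add(action); // duplicates within the same second collapse into one entry
}

// Flush at most once per second: one refresh per server action per tick,
// no matter how many changes arrived in the meantime.
setInterval(() => {
  for (const action of pendingRefreshes) {
    io.emit("refresh", { action });
  }
  pendingRefreshes.clear();
}, 1000);
```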

Whatever works for you.

Maybe using PouchDB with CouchDB on the server would be a more efficient approach to the problem of large datasets that need to stay up to date on all connected clients all the time.

This is because PouchDB creates a client-side database that is synchronized on the fly with the server-side CouchDB, so only a small set of diff data is exchanged and everything is kept in sync across all clients.
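A minimal sketch of that replication setup using PouchDB's standard sync API (the database name and CouchDB URL are placeholders):

```ts
import PouchDB from "pouchdb";

const local = new PouchDB("orders");                            // client-side store
const remote = new PouchDB("https://couch.example.com/orders"); // CouchDB on the server

// Live, continuous replication: only changed documents travel over the wire.
local
  .sync(remote, { live: true, retry: true })
  .on("change", (info) => {
    // info.change.docs holds just the documents that changed in this batch
    console.log("synced", info.direction, info.change.docs.length, "docs");
  })
  .on("error", (err) => console.error("sync error", err));
```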

We hope to soon offer integration of those databases in Wappler.

As those databases are NoSQL and document-based, we have to take a somewhat different approach to integrating them in Wappler.


That’s a good idea. We have recently integrated the Bryntum calendar framework - it has a client-side store that can be used to sync updates - a very efficient approach.

Yes, I’m looking forward to PouchDB and CouchDB being integrated into Wappler; we’ve been discussing this for quite some time. I understand their integration is not a trivial task at all, so I am waiting patiently. :)

However, in my current project I would not be able to use them even if they were already available in Wappler. The thing is, I am not allowed to store data on the client side: the data may only be available to the client during an active session.

For my current project I will come up with something and find workarounds. What worries me more is something else. The current socket implementation in Wappler is a threat to absolutely any project that uses sockets to update data, no matter how small and compact the application’s requests are. If users actively work with data in the application, the developer now has to build a huge reserve of capacity into the server infrastructure, or accept that the server is likely to fall over. So if you have 100 active users, deploy a server capable of handling 2,000-4,000 simultaneous requests. If you have 1,000 active users, deploy infrastructure that can handle 200,000-400,000 simultaneous requests. With 10,000 active users you will need computing power capable of handling tens of millions of requests. If you do not build in such a reserve, your application will not work normally. And that is without even considering traffic, which has to be unlimited and free, otherwise the traffic bills will be huge.

I’m not insisting on anything, but maybe it can be improved somehow.

@patrick, what about adding an option to work in binary mode? Some batching could also be included.
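As an illustration of the batching part, a minimal server-side sketch that buffers the small change messages and flushes them as one socket frame every 250 ms (all names are assumptions, not existing Wappler options):

```ts
import { Server } from "socket.io";

const io = new Server(3000);

type Change = { table: string; type: "insert" | "update" | "delete"; record: unknown };
const buffer: Change[] = [];

// Queued instead of emitted immediately.
function queueChange(change: Change) {
  buffer.push(change);
}

// Flush at most every 250 ms: many small changes ride in a single frame.
setInterval(() => {
  if (buffer.length === 0) return;
  io.emit("record-changes", buffer.splice(0, buffer.length));
}, 250);
```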