Is it correct that every API action I create (Node.js) could be called by someone outside my site/app, and that this is the reason we need to apply security restrictions? If that is the case, how do you stop people from simply harvesting the data on the public-facing parts of your site by connecting to the API? For example, if you ended up with a massive site full of content that someone might want to harvest for themselves (for their own apps/sites, or to sell), how would you ever stop this if the API serving the public data has to be unrestricted?
I think I may just be totally misunderstanding how this works, as I am still building my first app and do not fully understand it all yet.
You are correct. The security restrictions are there to limit access to an API via permissions or a simple login requirement.
For public APIs the risk of scraping is certainly real. You could look at obfuscation (not a Wappler thing) or reduce the risk by checking for human interaction to block bots.
You could check the HTTP_REFERER superglobal (available via Server Connect) to ensure the call came from your own site and not another.
Not 100% (it can be spoofed), but pretty effective.
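Outside Wappler, the same idea as plain Express middleware might look like this (a minimal sketch; the hostname and route are placeholders):

```js
// Minimal sketch (plain Express, not Wappler-specific) of a Referer
// check. ALLOWED_HOST and the route path are placeholders.
const express = require('express');
const app = express();

const ALLOWED_HOST = 'www.example.com'; // your own domain

function refererCheck(req, res, next) {
  const referer = req.get('referer') || '';
  let host = '';
  try {
    host = new URL(referer).hostname; // throws on empty/malformed values
  } catch (e) {
    // leave host empty so the request is rejected below
  }
  if (host !== ALLOWED_HOST) {
    return res.status(403).json({ error: 'Forbidden' });
  }
  next();
}

// Hypothetical public data route protected by the check.
app.get('/api/news', refererCheck, (req, res) => {
  res.json({ articles: [] });
});

app.listen(3000);
```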
Thanks both. I am wondering how other sites handle it then? Let's use an example that's been in the news recently: Reddit. They have started charging for their data via API, even though you can access the data on the frontend without logging in. Now, I am guessing they do not use Node.js, or wouldn't they have similar problems? Is the API access thing a Node.js issue, or would it still be the same if I created my site in Wappler and used PHP/.NET? Is Node.js just not a good idea for user-generated-content websites like Reddit?
I know I would never get as big as something like Reddit, so people would probably never even want to steal my content; I was more curious about how it works.
It's not a Node.js thing; it's common to all the language platforms.
Basically, if you make the info public, it is public.
You can limit the ability of automatic harvesting systems to some degree, but by definition, public means exactly that.
Ideally we could benefit from some mechanism, similar to CORS, to at least mitigate the issue by controlling external site access.
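For reference, the closest existing mechanism is the cors package, though only browsers enforce it; curl and server-side scrapers ignore CORS entirely. A sketch with a placeholder origin:

```js
// Sketch of restricting cross-origin browser calls with the "cors"
// npm package. Only browsers honour CORS; it does not stop curl or
// server-side scrapers. The origin value is a placeholder.
const express = require('express');
const cors = require('cors');
const app = express();

app.use('/api', cors({ origin: 'https://www.example.com' }));

app.get('/api/news', (req, res) => {
  res.json({ articles: [] });
});

app.listen(3000);
```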
There is also the option of using server-side binding, so the API is called by your own server and the result is sent as a rendered page. That still doesn't stop all harvesting, but it makes it trickier.
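A rough sketch of that pattern in plain Express, assuming an EJS template and a hypothetical getArticles() helper:

```js
// Sketch of server-side binding: the server queries the data itself
// and ships rendered HTML, so no public JSON endpoint is exposed.
// getArticles() is hypothetical and a views/news.ejs template is assumed.
const express = require('express');
const app = express();

app.set('view engine', 'ejs');

async function getArticles() {
  // in practice a database query, not an HTTP call
  return [{ title: 'Hello world' }];
}

app.get('/news', async (req, res) => {
  const articles = await getArticles();
  res.render('news', { articles }); // data leaves only as rendered HTML
});

app.listen(3000);
```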
I'm just starting out with Wappler, and this is one of the questions I'm trying to understand.
Some APIs, even though they are public, should only be consumed within the website itself. I still don't know how to approach this.
Restrict access to the server's IP only? Restrict access to a specific URL only? How is this done?
I looked for tutorials on YouTube but couldn't find any. I found some for users who log in and out, which is fine, but I haven't found any for restricting access by other means.
Is there any content that addresses this?
I'm sorry if this is something silly, but I don't know how to proceed. I've only been using Wappler for a short time, but I think it's incredible, and it has a strong community. My questions may seem confusing because I don't speak English well, but I'll go ahead.
Thanks for the answer, @Hyperbytes. I always watch your videos on the channel.
I've thought about something along these lines. But then wouldn't it be possible to check whether the API call is coming from outside?
Let's say the server IP is 178.56...: if a call to the API comes from a completely different IP, I block it; if it matches the server's IP, the call is answered.
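For illustration, that check might look like this in plain Express. One caveat: your visitors' browsers call the API from their own IPs, not the server's, so an allow-list like this only suits true server-to-server endpoints (the values are placeholders):

```js
// Sketch of the IP allow-list idea in plain Express. The list below
// only admits the server calling itself; visitors' browsers call from
// THEIR OWN IPs, so this suits server-to-server endpoints only.
const express = require('express');
const app = express();

const ALLOWED_IPS = ['127.0.0.1', '::1', '::ffff:127.0.0.1'];

app.get('/api/server-only', (req, res) => {
  if (!ALLOWED_IPS.includes(req.ip)) {
    return res.status(403).json({ error: 'Forbidden' });
  }
  res.json({ ok: true });
});

app.listen(3000);
```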
Or maybe the public queries shouldn't be made through the public APIs at all, but by some other function within the site.
We know that nowadays anyone can scrape web content, but making some things difficult ends up discouraging harvesting via an API call.
When someone visits the site, the news is public; there is no need to log in, and everything can be seen without any blocking. I show advertising alongside the news.
However, what I don't want is for someone to query the API externally and get all the news published on the portal as a JSON file. If that happened, they would have all the news without ever seeing the advertising, and they could repurpose that query to publish it on other channels.
Look, I know web-scraping methods exist, but this would at least make some things more difficult.
Going back to the WordPress example: there, the news is visible within the site, publicly, but an external call via the API only works if the web service has been enabled; there is no public query otherwise.
Maybe my thinking about the API is wrong and the queries for internal consumption within the site are done in another way.
If it comes from some other function within the site, then the IP is going to be the same anyway; the call will also share the same session and therefore the same user identity will be returned, so effectively all internal calls will be authorised.
If it comes from another site, then a separate session will be created and no user identity will be defined, so the API will return an unauthorised error.
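A minimal sketch of that behaviour, assuming express-session (the secret, routes, and field names are hypothetical):

```js
// Minimal sketch with express-session. Same-site calls carry the
// session cookie and its identity; a call from elsewhere arrives with
// a fresh, empty session and gets a 401.
const express = require('express');
const session = require('express-session');
const app = express();

app.use(session({
  secret: 'change-me', // placeholder secret
  resave: false,
  saveUninitialized: false,
}));

// Hypothetical login step that stores an identity in the session.
app.post('/login', (req, res) => {
  req.session.identity = 'user-123';
  res.json({ ok: true });
});

app.get('/api/internal-data', (req, res) => {
  if (!req.session.identity) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  res.json({ data: 'only returned when a session identity exists' });
});

app.listen(3000);
```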
I actually use the sort of method @Hyperbytes mentioned. I have a guest user that I can set permissions on; if identity is false, then the guest user's information is used as the logged-in user's info/permissions, even though no one is truly logged in.
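Something like this sketch, assuming express-session; the field and role names are placeholders:

```js
// Sketch of the guest-user fallback: every request resolves to SOME
// identity, dropping to a restricted "guest" account when nobody is
// logged in. Field and role names are placeholders.
const express = require('express');
const session = require('express-session');
const app = express();

app.use(session({ secret: 'change-me', resave: false, saveUninitialized: false }));

function resolveUser(req) {
  if (req.session && req.session.identity) {
    return { id: req.session.identity, role: req.session.role };
  }
  return { id: 'guest', role: 'public-read' }; // limited permissions
}

app.get('/api/news', (req, res) => {
  const user = resolveUser(req); // guest or real user, never "no user"
  res.json({ viewer: user.id, role: user.role, articles: [] });
});

app.listen(3000);
```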
Yes, I mentioned WordPress because there the API is only open if it has been enabled. Otherwise, your only access is by viewing the site itself, page by page.
Maybe I'm looking for something that isn't possible. But it's confusing not being able to restrict access on the endpoint for external queries when I want it consumed only on the domain itself.