Is it correct that every API action I create (Node.js) could be called by someone outside my site/app, and that this is the reason we need to apply security restrictions? If that is the case, how do you stop people from simply harvesting the data on the public-facing parts of your site by connecting to the API? For example, if you ended up with a massive site full of content that someone might want to harvest for themselves (for their own apps/sites, or to sell), how would you ever stop this if the API serving the public data has to be unrestricted?
I think I may just be totally misunderstanding how this works, as I am still building my first app and do not fully understand it all yet.
You are correct. The security restrictions are there to limit access to an API via permissions or a simple login requirement.
For public APIs the risk of scraping is certainly real. You could look at obfuscation (not a Wappler thing) or reduce the risk by checking for human interaction to block bots.
You could check the HTTP_REFERER superglobal (available via Server Connect) to ensure the call came from your own site and not another.
Not 100% (it can be spoofed), but pretty effective.
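Outside Wappler, the same idea as plain Express middleware might look like this (a minimal sketch; the hostname and route are placeholders):

```js
// Minimal sketch (plain Express, not Wappler-specific) of a Referer
// check. ALLOWED_HOST and the route path are placeholders.
const express = require('express');
const app = express();

const ALLOWED_HOST = 'www.example.com'; // your own domain

function refererCheck(req, res, next) {
  const referer = req.get('referer') || '';
  let host = '';
  try {
    host = new URL(referer).hostname; // throws on empty/malformed values
  } catch (e) {
    // leave host empty so the request is rejected below
  }
  if (host !== ALLOWED_HOST) {
    return res.status(403).json({ error: 'Forbidden' });
  }
  next();
}

// Hypothetical public data route protected by the check.
app.get('/api/news', refererCheck, (req, res) => {
  res.json({ articles: [] });
});

app.listen(3000);
```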
Thanks both. I am wondering how other sites handle it then? Let's use an example that's been in the news recently: Reddit. They have started charging for their data via API, even though you can access the data on the frontend without logging in. Now, I am guessing they do not use Node.js, or wouldn't they have similar problems? Is the API access thing a Node.js issue, or would it still be the same if I created my site in Wappler and used PHP/.NET? Is Node.js just not a good idea for user-generated-content websites like Reddit?
I know I would never get as big as something like Reddit, so people would probably never even want to steal my content; I was more curious about how it works.
It's not a Node.js thing; it's common to all the language platforms.
Basically, if you make the info public, it is public.
You can limit the ability of automatic harvesting systems to some degree, but by definition, public means exactly that.
Ideally we could benefit from some mechanism, similar to CORS, to at least mitigate the issue by controlling external site access.
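For reference, the closest existing mechanism is the cors package, though only browsers enforce it; curl and server-side scrapers ignore CORS entirely. A sketch with a placeholder origin:

```js
// Sketch of restricting cross-origin browser calls with the "cors"
// npm package. Only browsers honour CORS; it does not stop curl or
// server-side scrapers. The origin value is a placeholder.
const express = require('express');
const cors = require('cors');
const app = express();

app.use('/api', cors({ origin: 'https://www.example.com' }));

app.get('/api/news', (req, res) => {
  res.json({ articles: [] });
});

app.listen(3000);
```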
There is also the option of using server-side binding, so the API is called by your own server and the result is sent as a rendered page. That still doesn't stop all harvesting, but it makes it trickier.
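A rough sketch of that pattern in plain Express, assuming an EJS template and a hypothetical getArticles() helper:

```js
// Sketch of server-side binding: the server queries the data itself
// and ships rendered HTML, so no public JSON endpoint is exposed.
// getArticles() is hypothetical and a views/news.ejs template is assumed.
const express = require('express');
const app = express();

app.set('view engine', 'ejs');

async function getArticles() {
  // in practice a database query, not an HTTP call
  return [{ title: 'Hello world' }];
}

app.get('/news', async (req, res) => {
  const articles = await getArticles();
  res.render('news', { articles }); // data leaves only as rendered HTML
});

app.listen(3000);
```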
I'm just starting out with Wappler, and this is one of the questions I'm trying to understand.
Some APIs, even though they are public, should only be consumed within the website itself. I still don't know how to approach this.
Restrict access to the server's IP only? Restrict access to a specific URL only? How is this done?
I looked for tutorials on YouTube but couldn't find any. I found some for users who log in and out, which is fine, but I haven't found any for restricting access by other means.
Is there any content that addresses this?
I'm sorry if this is something silly, but I don't know how to proceed. I've only been using Wappler for a short time, but I think it's incredible, and it has a strong community. My questions may seem confusing because I don't speak English well, but I'll go ahead.
Thanks for the answer, @Hyperbytes. I always watch your videos on the channel.
I've thought about something along these lines. But then wouldn't it be possible to check whether the API call is coming from outside?
Let's say the server IP is 178.56...: if a call to the API comes from a completely different IP, I block it; if it matches the server's IP, the call is answered.
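For illustration, that check might look like this in plain Express. One caveat: your visitors' browsers call the API from their own IPs, not the server's, so an allow-list like this only suits true server-to-server endpoints (the values are placeholders):

```js
// Sketch of the IP allow-list idea in plain Express. The list below
// only admits the server calling itself; visitors' browsers call from
// THEIR OWN IPs, so this suits server-to-server endpoints only.
const express = require('express');
const app = express();

const ALLOWED_IPS = ['127.0.0.1', '::1', '::ffff:127.0.0.1'];

app.get('/api/server-only', (req, res) => {
  if (!ALLOWED_IPS.includes(req.ip)) {
    return res.status(403).json({ error: 'Forbidden' });
  }
  res.json({ ok: true });
});

app.listen(3000);
```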
Or maybe the public queries shouldn't be made through the public APIs at all, but by some other function within the site.
We know that nowadays anyone can scrape web content, but making some things difficult ends up discouraging harvesting via an API call.
When someone visits the site, the news is public; there is no need to log in, and everything can be seen without any blocking. I show advertising alongside the news.
However, what I don't want is for someone to query the API externally and get all the news published on the portal as a JSON file. If that happened, they would have all the news without ever seeing the advertising, and they could repurpose that query to publish it on other channels.
Look, I know web-scraping methods exist, but this would at least make some things more difficult.
Going back to the WordPress example: there, the news is visible within the site, publicly, but an external call via the API only works if the web service has been enabled; there is no public query otherwise.
Maybe my thinking about the API is wrong and the queries for internal consumption within the site are done in another way.
If it comes from some other function within the site, then the IP is going to be the same anyway; the call will also share the same session and therefore the same user identity will be returned, so effectively all internal calls will be authorised.
If it comes from another site, then a separate session will be created and no user identity will be defined, so the API will return an unauthorised error.
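A minimal sketch of that behaviour, assuming express-session (the secret, routes, and field names are hypothetical):

```js
// Minimal sketch with express-session. Same-site calls carry the
// session cookie and its identity; a call from elsewhere arrives with
// a fresh, empty session and gets a 401.
const express = require('express');
const session = require('express-session');
const app = express();

app.use(session({
  secret: 'change-me', // placeholder secret
  resave: false,
  saveUninitialized: false,
}));

// Hypothetical login step that stores an identity in the session.
app.post('/login', (req, res) => {
  req.session.identity = 'user-123';
  res.json({ ok: true });
});

app.get('/api/internal-data', (req, res) => {
  if (!req.session.identity) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  res.json({ data: 'only returned when a session identity exists' });
});

app.listen(3000);
```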
I actually use the sort of method @Hyperbytes mentioned. I have a guest user that I can set permissions on; if identity is false, then the guest user's information is used as the logged-in user's info/permissions, even though no one is truly logged in.
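Something like this sketch, assuming express-session; the field and role names are placeholders:

```js
// Sketch of the guest-user fallback: every request resolves to SOME
// identity, dropping to a restricted "guest" account when nobody is
// logged in. Field and role names are placeholders.
const express = require('express');
const session = require('express-session');
const app = express();

app.use(session({ secret: 'change-me', resave: false, saveUninitialized: false }));

function resolveUser(req) {
  if (req.session && req.session.identity) {
    return { id: req.session.identity, role: req.session.role };
  }
  return { id: 'guest', role: 'public-read' }; // limited permissions
}

app.get('/api/news', (req, res) => {
  const user = resolveUser(req); // guest or real user, never "no user"
  res.json({ viewer: user.id, role: user.role, articles: [] });
});

app.listen(3000);
```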
Yes, I mentioned WordPress because there the API is only open if it has been enabled. Otherwise, your only access is by viewing the site itself, page by page.
Maybe I'm looking for something that isn't possible. But it's confusing not being able to restrict access on the endpoint for external queries when I want it consumed only on the domain itself.