Overcoming Shadow AI with Local, AI-Generated Automations

Navigate the Shadow AI dilemma by understanding the risks and empowering your team with compliant local automation.

Written by Jona Farache | 4 min read

For those new to the term, Shadow AI is when an employee uses unsanctioned AI tools for their work. Generally speaking, this is done in good faith, the reasoning being, “AI can make me so much more efficient.”

This is not a rare phenomenon: surveys vary, from 47 percent of people admitting to using Shadow AI to 78 percent of employees saying they use AI tools not approved by their employer.

Some industries might be fine with glossing over these numbers, but for labs handling patient samples, proprietary research, or clinical trial data, this creates serious compliance exposure.

Why does Shadow AI prevail in workplaces?

The use of Shadow AI isn’t surprising when “researchers spend half their time on administrative tasks”, or, as the Harvard Business Review put it slightly more conservatively, knowledge workers (including researchers and scientists) spend 41 percent of their workday on “discretionary activities” that require little of their expertise.

These “discretionary activities” can include manual data entry, fighting with Excel formulas, transforming data from one format to another, and batch renaming files: work that machines are really good at and that makes most humans want to pull their hair out.

Certain LIMS exist to help, but they are generally pre-packaged tools that always come with some limitations. As one Reddit user stated so eloquently: “I'll tolerate a good LIMS, loathe a bad LIMS.”

Ideally, you would write custom software for all your “discretionary activities,” but this is not feasible without a developer on hand, an expensive resource in and of itself.

The only solution is, therefore, to turn to AI. Compliance be damned, as it were.

The risk of cloud-based AI

The fundamental issue with cloud-based AI is the architecture. When you paste data into ChatGPT, Claude, or any cloud service, that data travels to a third-party server for processing. The provider might promise not to train on it. They might even be telling the truth. But it’s not always up to them.

Just recently, 50 major firms were breached after attackers logged in using a single stolen employee password at each company. In other words, if your data is on someone else's server, its safety is no longer in your hands.

And even without hackers, your data might still be at risk when using LLMs.

In one widely reported incident, Samsung engineers pasted proprietary code into a chatbot, and that code became part of the model's memory.

Could you risk the same with patient PII? Confidential lab results?

Enter local, AI-generated automations

Instead of having AI do the work itself (thereby exposing your data), you can have AI build the tool that does the work, so that the tool runs entirely on your device. Your sensitive files are never sent to a server, never stored in an external database, and certainly never processed by an LLM.

Let’s take a simple example to see how this would work.

Let’s say that every week you need to consolidate several CSV files containing raw data outputs from various sources. Instead of uploading the files to an LLM and hoping for the best, you can prompt it like this:

Write a Python script that processes CSV files with the following headers: <your CSV headers here>, and consolidates them into one Excel file following these guidelines: <your guidelines here>
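
For illustration, here is roughly what such a prompt might produce. This is a minimal sketch, not a definitive implementation: the folder name, output file, and the sample_id/date/result headers are hypothetical placeholders standing in for your own headers and guidelines.

# consolidate_csvs.py: a sketch of an LLM-generated consolidation script.
# Column names (sample_id, date, result) and paths are hypothetical placeholders.
from pathlib import Path

import pandas as pd

INPUT_DIR = Path("raw_exports")          # folder holding this week's CSV exports
OUTPUT_FILE = Path("consolidated.xlsx")

def main() -> None:
    frames = []
    for csv_path in sorted(INPUT_DIR.glob("*.csv")):
        df = pd.read_csv(csv_path)
        df["source_file"] = csv_path.name    # keep provenance for auditing
        frames.append(df)

    if not frames:
        raise SystemExit(f"No CSV files found in {INPUT_DIR}")

    combined = pd.concat(frames, ignore_index=True)
    # Example guideline: parse dates and sort chronologically.
    combined["date"] = pd.to_datetime(combined["date"], errors="coerce")
    combined = combined.sort_values("date")
    combined.to_excel(OUTPUT_FILE, index=False)    # writing .xlsx requires openpyxl
    print(f"Wrote {len(combined)} rows to {OUTPUT_FILE}")

if __name__ == "__main__":
    main()

Once pandas and openpyxl are installed, nothing in this script touches the network: the files are read, merged, and written entirely on your machine.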


In this way, you aren’t sharing the data itself but rather the instructions to complete your task. If you do this right, you now have a little script that you can run any time this consolidation task crops up. And because it’s just a Python script, the code runs locally on your device, and your data never leaves your machine.

This is valuable from a privacy standpoint. Take HIPAA, for example: the Privacy Rule protects "individually identifiable health information" from unauthorized disclosure. If the data is never transmitted anywhere, there's no disclosure to govern.

The same logic applies to GDPR's requirements around lawful processing and data controller accountability. Local processing means there's no third party to audit.

There's also a practical reliability advantage. With a local tool that runs on your device, labs don't need to worry about API rate limits during peak hours, LLM hallucinations, internet connectivity issues during critical workflows, or sudden service changes when a provider updates their terms of service.


Sounds too good to be true?

There are, of course, some caveats here. Chiefly, if you’re doing this on your own (as opposed to using a dedicated platform, for example), you need to be at least a little tech-savvy to get code running on your device.

But even this is pretty easily offset with a bit of guidance from your LLM of choice. A simple “explain how I can run this code as a non-coder” prompt will often suffice.

The other main drawback is that because these tools run on “simple code,” they lack the powerful features that LLMs offer, such as OCR, text summarization, and reasoning. That said, there is ample filework out there that doesn’t need all the bells and whistles of AI. It just needs to get done.

Practical implementation for lab leaders

What does this look like in practice? Three categories of lab work are especially well suited to this kind of local automation:

Data cleaning and standardization. This is the big one. Taking inconsistent sample IDs, reformatting date columns, removing duplicate rows, standardizing naming conventions—all the tedious work that happens between "raw export" and "usable dataset." Local tools can handle these transformations without the data ever touching a cloud service.

Report generation. Weekly status reports, monthly summaries, and compliance documentation. Much of this involves pulling together information from multiple sources and dropping it into a template. Automation can handle the extraction and formatting, leaving researchers to focus on interpretation and decision-making.

Data extraction and manipulation. Converting PDFs to spreadsheets, restructuring nested data, and batch processing files. Mind-numbing stuff, and a classic job for a machine.

Take any of these tasks, explain them to the AI, and have it build a tool for you that you can run locally.
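
To make the first category concrete, here is a minimal sketch of what such a locally run cleaning tool could look like. The rules and column names (sample_id, collected_on) are hypothetical; a real version would encode your lab's own conventions.

# clean_dataset.py: a sketch of a local data-cleaning tool (rules are hypothetical).
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Standardize sample IDs: strip stray whitespace and force uppercase.
    df["sample_id"] = df["sample_id"].str.strip().str.upper()
    # Normalize mixed date formats to ISO dates; unparseable values become NaT.
    df["collected_on"] = pd.to_datetime(df["collected_on"], errors="coerce").dt.date
    # Drop exact duplicate rows, keeping the first occurrence.
    return df.drop_duplicates()

if __name__ == "__main__":
    raw = pd.read_csv("raw_export.csv")
    clean(raw).to_csv("cleaned.csv", index=False)

Run it once a week, or wire it into whatever export step produces the raw file; either way, the data never leaves the device.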

Making the approved path the easy path

Lab leaders don't have to choose between productivity and compliance. That forced choice only exists when the available options all involve sending data to third parties.

Understanding the difference between AI that requires data transmission and AI that builds a tool that processes data locally opens up a middle path. Staff get the automation they need to handle routine work efficiently. Leadership maintains the data controls that compliance requires. IT gets visibility instead of shadow deployments.

That’s how you overcome the Shadow AI problem: not by dumping files into any AI and hoping for the best, but by leveraging AI to build a safe solution for your particular pain point.


Frequently Asked Questions (FAQs)

  • What is Shadow AI?

    Shadow AI refers to the use of unsanctioned AI tools by employees in the workplace, typically done in good faith to improve efficiency without official approval from their employer.

  • What are the risks associated with using cloud-based AI for sensitive data?

    Using cloud-based AI can expose sensitive data to risks since it involves sending data to third-party servers for processing, which may lead to data breaches or unauthorized disclosure, especially if proper security measures are not in place.

  • What types of tasks can be automated using local AI tools?

    Tasks that are well-suited for local AI automations include data cleaning and standardization, report generation, and data extraction and manipulation, allowing for efficiency without compromising data security.

About the Author

  • Jona Farache is an AI consultant with over a decade of experience as a software developer, product manager, and educator. He founded Gruntless, a privacy-first AI platform that automates filework without risking your data. His goal is to help people focus on the more meaningful aspects of their work, so they can live richer lives, because we are humans, after all, and that’s what really matters.

