How AI Uses Artistic Works as Training Datasets: Mechanism, Risks, and Legal Issues

December 03, 2025

1. Introduction: AI Learns from Human-Made Art

Generative AI technologies such as Stable Diffusion, Midjourney, and DALL·E have rapidly transformed the digital art landscape. These systems can create new images, illustrations, and artistic styles in seconds.

But behind these capabilities lies one crucial fact:
AI models are trained on massive datasets containing millions of artistic works created by humans.

Many of these works are collected from the internet without the creators’ permission, which creates serious ethical and legal concerns. As my thesis reveals:

“AI transforms entire datasets—including copyrighted artworks—so they can be processed and learned by the system.”

This foundational process is precisely where legal disputes emerge.

2. How AI Processes Artwork in a Training Dataset

Generative AI learns through machine learning and deep learning techniques. Here is a simplified breakdown:

a. Developers collect artworks from various online sources

These sources include:
• websites
• social media
• image platforms
• open datasets (e.g., LAION-5B, ImageNet, COCO)

Some datasets are licensed and proprietary (closed-source), but many open-source datasets are assembled by scraping images indiscriminately, so they include copyrighted content.
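To make the difference between licensed and scraped data concrete, here is a minimal sketch of how a developer could separate dataset records by license before training. The record layout, the field names, and the allow-list of licenses are assumptions made for illustration; real datasets use their own schemas and often carry no license information at all.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layout, not the schema of any real dataset.
@dataclass
class ImageRecord:
    url: str
    caption: str
    license: Optional[str]  # None means no license information was found


# Assumed allow-list, for illustration only.
PERMITTED_LICENSES = {"CC0", "CC-BY", "licensed-by-contract"}


def split_by_license(records):
    """Separate records with a known permitted license from everything else."""
    cleared, uncleared = [], []
    for record in records:
        if record.license in PERMITTED_LICENSES:
            cleared.append(record)
        else:
            uncleared.append(record)
    return cleared, uncleared


records = [
    ImageRecord("https://example.com/a.png", "watercolor landscape", "CC0"),
    ImageRecord("https://example.com/b.png", "character illustration", None),
    ImageRecord("https://example.com/c.png", "oil portrait", "all-rights-reserved"),
]
cleared, uncleared = split_by_license(records)
print(len(cleared), "cleared;", len(uncleared), "need permission")
```

In this sketch, any record without a recognized license is treated as uncleared. That is the conservative choice: a missing license does not mean the work is free to use.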

b. AI copies, scans, and analyzes the entire artwork

According to the thesis:

“AI uses all parts of the artwork to improve its learning quality.”

AI extracts information such as shapes, colors, brushstrokes, textures, and stylistic features.
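As a rough illustration of what "extracting information" can mean, the sketch below reduces a tiny image to a coarse color histogram. This is a deliberately simple stand-in: real generative models learn far richer features inside neural networks. The function name and the four-pixel "artwork" are invented for this example.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Return the share of pixels that falls into each coarse RGB bin."""
    if not pixels:
        raise ValueError("image has no pixels")
    step = 256 // bins_per_channel
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel to its bin, then combine the three bins into one index.
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        counts[index] += 1
    return [count / len(pixels) for count in counts]


# A toy four-pixel "artwork": three bright red pixels and one blue pixel.
artwork = [(250, 10, 10), (240, 20, 15), (230, 5, 30), (10, 10, 200)]
features = color_histogram(artwork)
print(features)
```

Note that every pixel of the work is read to compute the feature, which matches the point that the whole artwork is used during training.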

c. AI builds internal statistical models that capture artistic patterns

Once training is complete, the AI does not keep the image itself. Instead, it retains a “representation” of the image’s characteristics in the model’s parameters.
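The idea of keeping statistics rather than images can be sketched with a toy example. Here, several per-artwork feature vectors are collapsed into a per-dimension mean and variance. This is only an analogy under simplifying assumptions: actual models adjust millions of parameters through training, and the feature values below are made up.

```python
def summarize(feature_vectors):
    """Reduce many per-artwork feature vectors to a per-dimension mean and variance."""
    n = len(feature_vectors)
    dims = len(feature_vectors[0])
    means = [sum(v[d] for v in feature_vectors) / n for d in range(dims)]
    variances = [
        sum((v[d] - means[d]) ** 2 for v in feature_vectors) / n
        for d in range(dims)
    ]
    return {"mean": means, "variance": variances}


# Hypothetical features (e.g. warmth, contrast, line density) for three artworks.
training_features = [
    [0.8, 0.2, 0.5],
    [0.6, 0.4, 0.5],
    [0.7, 0.3, 0.5],
]
model = summarize(training_features)
print(model)
```

The summary has the same size whether it is built from three artworks or three million, which is the sense in which the model holds patterns rather than copies. The artworks still had to be read in full to produce it.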

d. When prompted by the user, AI generates a new image

The output often resembles the style of specific artists, sometimes so closely that the generated work is nearly identical to an original.
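One way to put a number on "resembles" is to compare feature vectors. The sketch below uses cosine similarity to find the training work closest to a generated output. The feature values and work titles are hypothetical, and a similarity score is not a legal test of infringement; it only illustrates how closeness could be measured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_training_work(output_features, training_set):
    """Return (title, similarity) of the training work most similar to the output."""
    scored = (
        (title, cosine_similarity(output_features, features))
        for title, features in training_set.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical feature vectors for two training works and one generated image.
training_set = {
    "Work A": [0.9, 0.1, 0.3],
    "Work B": [0.1, 0.8, 0.6],
}
generated = [0.88, 0.12, 0.31]
title, score = closest_training_work(generated, training_set)
print(title, round(score, 3))
```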

3. The Legal Problems: Copyright Violations

This process raises a series of legal issues under Indonesian Copyright Law (UU Hak Cipta), as explained in the thesis:

(1) Violation of Economic Rights

Under Article 8 of Indonesian Copyright Law, creators have the exclusive right to receive economic benefits from their works.

Using artworks in AI training without permission deprives artists of this right, especially when the AI service is commercial.

(2) Violation of Moral Rights

AI does not credit the original creators and may alter their artistic style, damaging artistic integrity.

This violates Articles 5 and 7 of the UU Hak Cipta (UUHC).

(3) Commercial Use → Criminal Liability

If AI developers monetize systems trained on unlicensed artworks, Article 113(3) applies:

Unauthorized commercial use of copyrighted works can be subject to criminal sanctions.

(4) AI Cannot Be Held Legally Responsible

Some developers argue that “the AI system, not the developer, created the output.”

However, the law is clear:

AI is not a legal subject. Responsibility lies entirely with the developer.

Developers design the algorithms, collect the datasets, and profit from the outputs—making them legally accountable.

4. Why Open-Source Datasets Are Legally Risky

Most generative AI models rely heavily on open-source datasets such as LAION-5B, which are compiled by automatically crawling the internet for images and their captions.

The thesis notes:

“Open-source datasets may contain copyrighted works taken without consent.”

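This has triggered international lawsuits, such as:

• Getty Images vs. Stability AI (2023)
• Sarah Andersen et al. vs. Stability AI, Midjourney & DeviantArt (2023)

By contrast, some developers have chosen to license their training data, as in OpenAI’s licensing agreement with Shutterstock (2022).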

Open datasets are powerful tools but legally fragile.

5. Conclusion: AI Is Powerful, but the Law Protects Artists

AI can only generate high-quality artwork because it learns from human creators.

However, when developers use artworks without permission:

• it violates economic rights,
• it violates moral rights,
• it becomes criminal when commercialized,
• and developers, not AI, bear full legal responsibility.

To move forward ethically, the thesis proposes licensing systems and compensation models that ensure:

• developers can legally access training data,
• artists receive fair payment,
• and AI innovation can continue without harming creators.

This balance—not conflict—will shape the future of AI and copyright.

LegalTech Insight Fauzan Iraldi