By Nguyễn Lê Bảo Ngọc (Ngọc Prinny) · Reviewed by Ls. Lê Thị Kim Dung & Ls. Nguyễn Văn Điệp
📖 Etymology Corner: "Intellectual" — The Mind's Own Work
The word "intellectual" traces to the Latin intellectus — the act of understanding, perception, from intelligere (to understand, to choose between). When we speak of intellectual property, we are speaking of property that originates in the mind — in a deliberate act of creative understanding. The legal question at the heart of this article is genuinely philosophical: when an AI system learns from human creative work, is that learning a tribute to the mind that created it, or is it a form of taking without giving back? Vietnam's amended IP Law 2025 has decided it can be both — depending on what you do and how you do it. 🧠📜
🎬 In a Nutshell
Every AI system needs training data. Language models need text. Image generators need images. Medical AI systems need patient records and research papers. The question for every Vietnamese business building or deploying AI is: where can that data legally come from?
The answer just got clearer — and more conditional — with Clause 5, Article 7 of the amended IP Law 2025 (Law 131/2025/QH15). This provision introduces an explicit text and data mining exception for AI training: a legal basis for using IP-protected works to train AI systems, subject to three cumulative conditions. Miss any one of them, and the legal protection disappears.
This post breaks down what those conditions are, what the AI-specific IP ownership rules say, and what Vietnam's state policy on IP tells us about the direction of travel.
📋 Section 1: The Three-Condition Rule — All or Nothing
Article 7, Clause 5 of the IP Law 2025 creates the following permission:
Organisations and individuals may use texts and data relating to IP-protected subject matter that has been lawfully published and made accessible to the public, for the purposes of scientific research, testing, and training artificial intelligence systems — provided that such use does not unreasonably affect the rights and legitimate interests of authors and IP rights holders.
Three conditions. All mandatory. Here they are in plain language:
Condition 1 — Lawfully published and publicly accessible: The data must have been published through lawful means and be accessible to the public. This is not just "available on the internet." It means the data was legitimately released to the public or made genuinely accessible: not scraped from paywalled sources, not extracted from databases the user has no access rights to, not pulled from private repositories. Note that "publicly accessible" is not the same as "in the public domain": a work can be freely readable online and still be fully protected. If accessing the data would itself require bypassing a paywall, a licence restriction, or any form of access control, the data is not "publicly accessible" in the required sense.
Condition 2 — Correct purpose: The use must be for scientific research (nghiên cứu khoa học), testing (thử nghiệm), or AI training (huấn luyện hệ thống trí tuệ nhân tạo). These are the three permitted purposes — and they are listed exhaustively, not illustratively. Using data to train a model that will then be commercialised raises questions about whether the training falls within these purposes or goes beyond them. This is an area where the implementing Government decree (still pending) will be critical.
Condition 3 — No unreasonable harm to IP rights holders: The use must not "unreasonably affect" the rights and legitimate interests of authors and IP owners. This is the most interpretively flexible of the three conditions, and therefore the one that carries the most legal risk. "Unreasonable" is a proportionality standard: some degree of impact on an author's market or interests may be acceptable; systematic substitution for the original work, or training that enables mass reproduction of protected works without a licence, is unlikely to be considered reasonable. The wording echoes the three-step test of international copyright law (Berne Convention Article 9(2), TRIPS Article 13), which is the likely interpretive framework behind it.
The additional rule for copyright-protected data: For texts and data that are subject to copyright and related rights specifically, compliance with all three conditions above is necessary but not sufficient. Additional requirements will be set out in a Government decree — which has not yet been issued. Until that decree is published, businesses using copyright-protected data for AI training are operating in a zone of residual regulatory uncertainty even if they satisfy the three main conditions.
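For engineering teams, the all-or-nothing structure of the rule can be sketched as a simple pre-training check. This is only an illustration, not a legal test: the class `DatasetAssessment`, its field names, the status strings, and the three sample datasets are all hypothetical, and each boolean stands in for a judgment that a lawyer would need to make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetAssessment:
    """Hypothetical record of a legal review of one dataset (not a statutory form)."""
    name: str
    lawfully_published_and_public: bool  # Condition 1
    permitted_purpose: bool              # Condition 2: research, testing, or AI training
    no_unreasonable_harm: bool           # Condition 3: a proportionality judgment
    copyright_protected: bool            # if True, the pending Government decree also matters


def assess(d: DatasetAssessment) -> str:
    """Return a coarse status. The three conditions are cumulative."""
    conditions = (
        d.lawfully_published_and_public,
        d.permitted_purpose,
        d.no_unreasonable_harm,
    )
    if not all(conditions):
        return "outside exception"
    if d.copyright_protected:
        # For copyright-protected data the three conditions are necessary but not sufficient.
        return "conditions met, decree pending"
    return "conditions met"


# Illustrative inputs only; the True/False values are assumptions, not legal findings.
open_gov = DatasetAssessment("open_gov_text", True, True, True, copyright_protected=False)
paywalled = DatasetAssessment("paywalled_news", False, True, True, copyright_protected=True)
music = DatasetAssessment("licensed_music", True, True, True, copyright_protected=True)

for d in (open_gov, paywalled, music):
    print(f"{d.name}: {assess(d)}")
```

The point of the sketch is the `all(...)` call: a single failed condition takes the dataset outside the exception, and a copyright-protected dataset that passes all three still carries an open question until the decree is issued.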
🏛️ Section 2: Who Owns What an AI Creates?
Separately from the training data question, the amended IP Law 2025 also addresses a question that has plagued IP lawyers globally: if an AI system creates something — a text, an image, a musical composition — who owns it?
Article 6 of the IP Law 2025 (as amended) adds an important new provision: the Government will set out rules on how IP rights arise and are established where the subject matter was created using an AI system.
This is a significant policy signal. Vietnam is not ignoring the question — but it is delegating the answer to subordinate legislation. The current law does not directly declare that AI can or cannot be an author or IP rights holder. It leaves that determination for the Government's implementing decree.
What we do know from the existing framework:
Copyright (quyền tác giả) arises automatically when a work is created and expressed in a tangible form — regardless of whether it has been published or registered. The question of whether an AI-generated work qualifies for copyright protection turns on whether the creation process involves a human author in a meaningful way.
Industrial property rights (patents, trademarks, design rights) are established through formal registration — and the question of who may register an AI-generated invention remains open pending the implementing decree.
Trade secrets and well-known marks follow their own logic (use-based for the latter; lawful acquisition and maintenance of confidentiality for the former) and are less directly affected by the AI authorship question.
🏛️ Section 3: State Policy — The Direction of Travel
Article 8 of the IP Law 2025 sets out the Vietnamese state's IP policy — and it contains several provisions that signal where things are headed for AI-related IP:
The state policy emphasises promoting innovation while balancing the interests of rights holders with the public interest. Financial support, tax incentives, and preferential investment treatment are available for IP creation, protection, and exploitation — including for IP developed using AI systems, once the implementing decree clarifies the rules.
There is explicit support for helping Vietnamese individuals and organisations value, transfer, and contribute IP rights as capital, which is relevant for AI companies whose primary assets are trained models and datasets. The policy also encourages cooperation between the state, researchers, S&T organisations, and enterprises on IP sharing, a framework that could apply to publicly funded AI training datasets.
The emphasis on developing an "integrated and efficient IP ecosystem" and investing in IP management and enforcement bodies suggests that the regulatory infrastructure for AI-specific IP compliance is being built in parallel with the substantive rules.
🏠🚗 Real-Life Examples
Example 1 — The legal training set: ✅ A Vietnamese legaltech startup wants to train a contract analysis model. It uses publicly available court decisions from the official judicial portal (free, publicly accessible, lawfully published), academic legal articles from open-access journals, and government gazette text. All three conditions are met: the materials are lawfully published and publicly accessible, the use is AI training, and relying on official and open-access materials does not unreasonably harm the original publishers. Permitted, though the startup should monitor the Government decree on copyright-protected data.
Example 2 — The scraped news corpus: ⚠️ A media monitoring company scrapes the full archives of 50 Vietnamese news websites — including articles behind subscription paywalls — to train a news summarisation AI. The paywall content is not "publicly accessible" in the required sense. Condition 1 fails for the paywalled content. The company faces IP infringement risk for using that data, regardless of whether the training itself is for an AI system.
Example 3 — The music training dataset: 🎵 A Vietnamese music streaming startup wants to train a generative music AI using its catalogue of licensed Vietnamese pop music. The music is lawfully published and publicly accessible (it's on the platform). The use is for AI training. But does training a generative model that will produce music similar in style to the original works "unreasonably affect" the rights of songwriters and labels? This is exactly the grey zone where the Government decree on copyright-protected data will be critical. Until that decree is issued, the legal risk is real.
Example 4 — The in-house dataset: ✅ An AI company creates its own training data: text written by its own employees and images commissioned from freelancers under appropriate work-for-hire agreements. Because no third-party IP is involved, the three-condition framework does not come into play. Clean from an IP perspective, though personal data protection rules may apply separately.
🤔 Did You Know?
The text and data mining exception in Vietnam's amended IP Law 2025 closely resembles the text and data mining (TDM) provisions of the European Union's Copyright in the Digital Single Market Directive (Directive (EU) 2019/790). Article 3 of that Directive covers TDM for scientific research by research organisations and cultural heritage institutions; Article 4 allows TDM for any purpose, including commercial ones, unless rights holders have expressly reserved their rights (the opt-out). Vietnam's version is slightly narrower, in that it does not explicitly include a commercial TDM exception separate from the research one, but the conceptual framework is the same. Vietnam is aligning its IP framework with international norms at a moment when the global legal landscape for AI training data is still being actively litigated in courts from the US to the EU. 🌐
🌿 Law in Nature — The Pollination Parallel
The text and data mining exception works a little like the relationship between bees and flowers. Bees collect nectar from flowers: they "use" the flower's resources. But the flower does not suffer unreasonably: the bees also pollinate, the ecosystem benefits, and the flower continues to produce. Nobody expects bees to pay royalties on nectar. But if a commercial beekeeper were to destroy the flowers to extract nectar directly, causing genuine harm to the plant's reproductive capacity, that would be a different matter. Vietnam's AI training exception draws a similar line: using publicly accessible data for AI training is the bee collecting nectar. Systematically replacing or undermining the original works is the beekeeper destroying the flowers. 🐝🌸
💡 Tips for Businesses Using Data to Train AI
Audit your training data sources now: Before your next training run, document where every dataset came from, whether it was lawfully published and publicly accessible, and whether you have any additional licences or terms of service governing its use. Build this into your ML pipeline as standard practice.
Purpose matters — document it: If your AI system is trained for internal research and then commercially deployed, ensure the documentation reflects the training purpose accurately. The exception is framed around research, testing, and training; it does not expressly cover the subsequent commercial exploitation of the model, and the pending decree may clarify how far it reaches. The line between the two is where legal risk concentrates.
Copyrighted data needs extra care: Until the Government decree implementing Article 7(5) for copyright-protected material is published, any training data that carries copyright (essentially anything creative) should be treated with additional caution. Consider whether licences or opt-in arrangements with content owners are available.
Watch the Government decree pipeline: Article 6's provision on AI-generated IP and Article 7(5)'s requirement for a Government decree on copyright-protected data are the two most significant pending pieces of the puzzle. Subscribe to updates from the Ministry of Science and Technology and the Ministry of Justice.
Consider synthetic data and open-licensed sources: Training on data you own, data generated internally, or data released under permissive open licences (Creative Commons, government open data portals) substantially reduces IP risk. It also builds a more defensible training data provenance record.
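The audit and provenance tips above can be sketched as a small manifest that travels with each training run. This is a minimal sketch under stated assumptions: the `SourceRecord` fields, the access labels ("public", "paywalled", "first_party"), the `needs_review` rule, and the three sample sources are all hypothetical choices for illustration, not anything the law prescribes.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class SourceRecord:
    """Hypothetical provenance entry for one training data source."""
    source: str       # where the data came from
    access: str       # e.g. "public", "paywalled", "licensed", "first_party"
    licence: str      # licence or terms governing use; "" if unknown
    purpose: str      # documented purpose of the training run
    reviewed_on: str  # ISO date of the last review


def needs_review(record: SourceRecord) -> bool:
    """Flag sources that were not openly accessible or whose record is incomplete."""
    return record.access == "paywalled" or not record.licence or not record.purpose


def to_manifest(records: list[SourceRecord]) -> str:
    """Serialise records as JSON Lines so the manifest can be versioned with the run."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


today = date.today().isoformat()

# Sample entries are placeholders, not real datasets.
records = [
    SourceRecord("official court decisions portal", "public",
                 "official publication", "contract analysis model", today),
    SourceRecord("news archive crawl", "paywalled",
                 "", "news summarisation model", today),
    SourceRecord("in-house annotated contracts", "first_party",
                 "company-owned", "contract analysis model", today),
]

flagged = [r.source for r in records if needs_review(r)]
print(to_manifest(records))
print("Needs legal review:", flagged)
```

Keeping the manifest as plain JSON Lines is a deliberate choice here: it can be committed alongside the training configuration, so the provenance record and the run it describes stay in step.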
📝 Quick Quiz — AI Training Data IQ Test
Question 1: Under Art. 7(5) IP Law 2025, which of the following is a permitted use of third-party data to train an AI?
a) Using paywalled academic papers scraped without a subscription · b) Using open-access government legal texts to train a legal AI for research purposes, without substituting the original works · c) Using any data found on the internet, as long as it's for AI training · d) Using licensed music to train a commercial generative music AI (pending the Government decree)
Question 2: For copyright-protected training data, what additional requirement applies?
a) Nothing — the three conditions are sufficient · b) Compliance with a forthcoming Government decree providing additional rules · c) Explicit consent from every rights holder · d) Registration with the Ministry of Science and Technology
Question 3: Does the IP Law 2025 directly answer the question of whether AI-generated works can be copyrighted?
a) Yes: AI cannot be an author · b) Yes: AI-generated works are automatically in the public domain · c) No: the law delegates this question to a forthcoming Government decree · d) Yes: AI can hold copyright if registered
Question 4: Which condition is most likely to require case-by-case legal analysis rather than a clear yes/no answer?
a) Condition 1 — lawfully published · b) Condition 2 — correct purpose · c) Condition 3 — no unreasonable harm to IP rights holders · d) All conditions are equally clear
🗣️ Call to Action
Are you building AI products in Vietnam, managing a data science team, or advising on AI compliance? Is your company already using third-party data for model training — and have you mapped that against the new IP Law 2025 framework? 💬
Drop your questions and real-world scenarios in the comments — Ngọc Prinny reads every one. And share this post with your engineering leads, legal team, and anyone responsible for ML compliance. The rules are here. The Government decrees are coming. The time to build good data governance habits is before enforcement begins. 📤
🚨 Fun But Serious: A Brief Legal Disclaimer 🚨
Hey there, legal explorer! 🕵️‍♂️ Before you go...
- This article explains the current statutory framework — the Government decrees implementing Art. 7(5) and Art. 6 have not yet been issued and will add important detail 🗺️
- AI and IP law is evolving rapidly — this is one of the fastest-moving areas of legal practice globally 🦄
- For compliance advice specific to your AI training pipeline, please consult a qualified IP lawyer 🧙‍♂️ (may we suggest Thầy Điệp & Associates Law Firm)
- Need certified translations of technical documents or IP registration materials? Thu Thiem Notary Office is ready 🖊️
Full disclaimer: ngocprinny.blogspot.com/2024/08/disclaimer.html
#LegalInfo #delulu.vn #NotLegalAdvice #ConsultAPro #NgocPrinny
💝 Support Your Legal Ninja's Wellness Fund! 🍵
Every article is powered by:
- Hours of deep-dive research into IP law, AI policy, and comparative international frameworks 📚
- 10+ years of legal expertise ⚖️
- Creative storytelling that makes IP law actually readable for tech teams 📝
- And an extraordinary volume of herbal tea 🍵
If these posts have helped you navigate Vietnam's legal landscape, consider buying me a green tea ☕ Your support keeps this ninja sharp for the next article! 🌱
If you're reading this at night — sweet dreams, and may your training data always be lawfully sourced! 🌙✨
If you're reading this in the morning — wishing you a productive day, clean datasets, and a Government decree that arrives sooner rather than later! ☀️🤖
If you're reading this at lunch — enjoy every bite, and may your model's loss function converge as smoothly as this meal goes down! 🍱📉
Whenever you're reading this — may your IP be protected, your training data be clean, and your AI be compliant! 🔬⚖️
#AILaw #IntellectualProperty #VietnamTech #NgocPrinny #IPLaw2025 #AITraining #SởHữuTríTuệ #delulu_vn #VietnamAI #TechLaw #DataMining #MachineLearning #VietnamLaw2026


