AI assistance and authorship attribution
As LLM-based code assistants become routine, the question "should we mark that AI helped, and how?" has become a decision each team must make for itself. This article organizes the relevant facts objectively: lawsuits, OSS license positions, and commit trailer conventions.
1. About AI attribution
Since GitHub Copilot's release (technical preview June 2021, general availability June 2022), debate over the copyright/license position of AI-assisted code has intensified. Core issues:
- Training-data licensing — does a model trained on copyleft code (e.g. GPL) emit derived code in ways that bypass the license's obligations?
- Copyright in the output — who, if anyone, holds rights over AI-generated output, and do rights arise at all?
- Attribution duty — is there an obligation to disclose publicly that AI assistance was used?
2. Litigation and policy cases
DOE 1 v. GitHub (2022–) — a class action filed against GitHub, OpenAI, and Microsoft in November 2022 in the US District Court for the Northern District of California (multiple anonymous plaintiffs; counsel including Matthew Butterick). At issue is whether Copilot violated the attribution requirements and other conditions of the OSS licenses (GPL, etc.) covering its training data. The court has dismissed some claims and allowed others to proceed in stages; as of 2024 some claims remain live, and the final outcome may still shift.
US Copyright Office (USCO) — published Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence in March 2023. Its position: only the human creative contribution is copyrightable, material generated by AI alone is not protected, and applicants should disclose AI-generated portions at registration. Other jurisdictions, such as Korea and the EU, hold somewhat different positions that are still evolving.
FSF · OSI:
- FSF (Free Software Foundation) publicly raised concerns that Copilot may bypass GPL obligations (a funded call for papers in 2021, further statements in 2022).
- OSI (Open Source Initiative) released Open Source AI Definition 1.0 in October 2024, defining "open source AI" criteria. It specifies the openness scope for data, code, and weights.
These two organizations do not ban the use of AI assistance itself. Their positions are closer to emphasizing license compliance and data transparency.
3. Conventions of attribution
git commit message trailers (Co-authored-by:, Signed-off-by:, etc.) are RFC 822-style key-value lines at the end of a commit message. git parses them as structured data via git interpret-trailers, but the individual keys are conventions rather than part of git itself: Signed-off-by comes from the Linux kernel's DCO process, Co-authored-by from GitHub. GitHub displays the Co-authored-by trailer as a co-author on commit and PR pages.
One way of marking AI-assisted use that has emerged is a trailer like:
Co-Authored-By: Claude <noreply@anthropic.com>
This is a de facto convention, not a standard. Some tools (e.g. Claude Code, by default) add it automatically; some teams disable it by policy.
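As a concrete illustration, the trailer can be attached with a second -m flag. A minimal, runnable sketch in a throwaway repository (the commit message and identity are placeholders):

```shell
#!/bin/sh
set -e
# Sketch: commit in a throwaway repo with an AI-assistance trailer.
# The trailer value follows the de facto convention described above.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name=dev -c user.email=dev@example.com \
    commit --allow-empty -q \
    -m "Fix race condition in cache invalidation" \
    -m "Co-Authored-By: Claude <noreply@anthropic.com>"
# git can read trailers back out as structured data:
git log -1 --format='%(trailers:key=Co-Authored-By)'
```

git itself only requires that trailers sit in the final block of the message; GitHub reads the same trailer when rendering co-authors.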
4. To mark or not to mark
Arguments for not marking:
- The view that AI is an assistance tool like IDE autocomplete or search engines, not subject to attribution.
- The view that the responsibility for the final output lies with whoever committed it anyway.
- The view that a visible marker carries unintended signaling, positive or negative, to outside readers.
Grounds for marking:
- Transparency about provenance, since model output may reproduce training data.
- Traceability in case of license disputes.
- The view that it contributes to forming social consensus.
There is no consensus yet on which side is right. Whatever position a team adopts, the common advice is that the reasoning behind it, and consistency in applying it, matter more than the choice itself.
5. Where responsibility lies
Apart from technical and legal debate, points often emphasized in practice:
- Review responsibility lies with humans — even for AI-suggested code, the reviewer of the PR carries the same review responsibility.
- External library citation — if AI-suggested code closely matches part of a specific OSS project, the prevailing view is that that project's license obligations apply as-is.
- Secrets and internal code — code assistants backed by external APIs send input text off-site. A policy is needed to keep secrets and confidential code out of that flow.
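That last point can be enforced mechanically before text ever leaves the machine. A minimal sketch, assuming a grep-based pattern list (the patterns and the scan_for_secrets helper are illustrative, not a real tool):

```shell
#!/bin/sh
# Minimal sketch: scan a file for obvious secret patterns before its
# contents are sent to an external assistant. Patterns are illustrative;
# a real deployment would use a dedicated secret scanner.
scan_for_secrets() {
  grep -nE '(AKIA[0-9A-Z]{16}|-----BEGIN (RSA |EC )?PRIVATE KEY-----|api[_-]?key[[:space:]]*[:=])' "$1"
}

# Demo: a file containing a fake key assignment is flagged.
f=$(mktemp)
printf 'api_key = "not-a-real-key"\n' > "$f"
if scan_for_secrets "$f" >/dev/null; then
  echo "blocked: possible secret in input"
fi
rm -f "$f"
```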
6. The spectrum of team policies
| Position | Marking | Usage |
|---|---|---|
| Active marking | Trailer on every AI-assisted commit | Free use |
| Implicit use | No marking | Free use |
| Restricted use | No marking | Forbidden in secrets/specific areas |
| External-tool blocked | — | Internal models only |
Suitable positions vary by company policy, customer contracts, and regulation (finance, healthcare).
7. Common pitfalls
- It's hard to retroactively distinguish which parts of a PR were AI-assisted and which were human-written — if you mark, do it by a consistent rule.
- AI-suggested license headers or SPDX identifiers can be wrong — the license is a human decision, not a tool decision.
- "The AI suggested it, so I'm not responsible" doesn't hold anywhere — responsibility for the output rests with the PR author.
- If policy changes too often, people start ignoring it — once you set a position, hold it for a while and document the reason when it changes.
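The "consistent rule" pitfall is one place where automation helps. A hypothetical commit-msg hook sketch (the policy, the "AI-assisted" keyword, and the trailer key are assumptions, not a standard):

```shell
#!/bin/sh
# Hypothetical commit-msg hook logic: if a message claims AI assistance
# in prose, require the team's agreed trailer to also be present.
check_msg() {
  if grep -qi 'ai-assisted' "$1" && ! grep -q '^Co-Authored-By:' "$1"; then
    echo "commit-msg: AI-assisted commit is missing its trailer" >&2
    return 1
  fi
}

# Demo on a temporary message file.
msg=$(mktemp)
printf 'Refactor parser (AI-assisted)\n' > "$msg"
check_msg "$msg" || echo "rejected"
printf '\nCo-Authored-By: Claude <noreply@anthropic.com>\n' >> "$msg"
check_msg "$msg" && echo "accepted"
rm -f "$msg"
```

Installed as .git/hooks/commit-msg, this turns the team's marking rule into a check rather than a memory exercise.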
Closing thoughts
Attribution policy for AI assistance is an area where social consensus is still forming. Both marking and not marking are defensible when backed by a consistent reason. Either way, three points form the common foundation of every policy: review responsibility lies with humans, secrets stay inside, and the license is a human decision.
Next
- feature-flag-skeptic
- naming-readability
References: DOE 1 v. GitHub litigation tracker · U.S. Copyright Office AI guidance · OSI Open Source AI Definition 1.0 · FSF statements on Copilot · git commit trailers (git-interpret-trailers) · Conventional Commits · SPDX License List · GitHub docs on AI in code review.