BUILDER'S SANDBOX

Build This Paper

Use an AI coding agent to implement this research.

OpenAI Codex (AI Agent)

Lightweight coding agent in your terminal.

Claude Code (AI Agent)

Agentic coding tool for terminal workflows.

AntiGravity IDE (Scaffolding)

AI agent mindset installer and workflow scaffolder.

Cursor (IDE)

AI-first code editor built on VS Code.

VS Code (IDE)

Free, open-source editor by Microsoft.

MVP Investment

$9K - $12K, 6-10 weeks

Engineering: $8,000
Cloud Hosting: $240
SaaS Stack: $300
Domain & Legal: $100

6mo ROI: 0.5-1.5x

3yr ROI: 5-12x

Computer vision products require more validation time. Hardware integrations may slow early revenue, but deals of $100K or more are common by year three.
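The investment and ROI figures above can be checked with a few lines of arithmetic. The sketch below sums the four listed line items and applies the ROI multiples to the headline range. It assumes that "ROI" here means a gross multiple of the initial investment, which the page does not state. Note that the itemized total comes to $8,640, slightly under the $9K lower bound; the page does not explain the difference.

```python
# Line items as listed on the page (USD).
line_items = {
    "Engineering": 8_000,
    "Cloud Hosting": 240,
    "SaaS Stack": 300,
    "Domain & Legal": 100,
}
itemized_total = sum(line_items.values())

# Headline investment range and ROI multiples as listed on the page.
investment_range = (9_000, 12_000)
roi_multiples = {"6mo": (0.5, 1.5), "3yr": (5, 12)}


def projected_return(investment, multiples):
    """Assumption: return = investment * multiple (gross, not net of cost)."""
    low_inv, high_inv = investment
    low_mult, high_mult = multiples
    return (low_inv * low_mult, high_inv * high_mult)


returns = {
    horizon: projected_return(investment_range, multiples)
    for horizon, multiples in roi_multiples.items()
}

print(f"Itemized total: ${itemized_total:,}")
for horizon, (low, high) in returns.items():
    print(f"{horizon}: ${low:,.0f} to ${high:,.0f}")
```

Under that assumption, the widest plausible band pairs the low investment with the low multiple and the high investment with the high multiple, which is what `projected_return` computes.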

Talent Scout

Quoc Khoa Tran

Monash University

Thanh Thi Nguyen

Monash University

Campbell Wilson

Monash University

Founder's Pitch

"A tool using LLMs for accurate, multilingual illicit content detection on e-commerce platforms."

Illicit Content Detection · Score: 9

Commercial Viability Breakdown (0-10 scale)

High Potential: 5 (2/4 signals)

Quick Build: 10 (4/4 signals)

Series A Potential: 10 (4/4 signals)
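The page does not say how signal counts turn into scores. One reading that reproduces all three numbers shown is a linear mapping of signals met onto the 0-10 scale. The sketch below implements that reading; the formula and the name `signal_score` are assumptions made for illustration, not the site's documented method, and the overall score of 9 is not derived here.

```python
def signal_score(signals_met: int, signals_total: int = 4, scale: int = 10) -> float:
    """Assumed mapping: score = scale * signals_met / signals_total."""
    if not 0 <= signals_met <= signals_total:
        raise ValueError("signals_met must be between 0 and signals_total")
    return scale * signals_met / signals_total


# Signal counts as listed in the breakdown.
breakdown = {"High Potential": 2, "Quick Build": 4, "Series A Potential": 4}
scores = {name: signal_score(count) for name, count in breakdown.items()}

for name, score in scores.items():
    print(f"{name}: {score:g}")
```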

Sources used for this analysis

arXiv Paper

Full-text PDF analysis of the research paper

GitHub Repository

Code availability, stars, and contributor activity

Citation Network

Semantic Scholar citations and co-citation patterns

Community Predictions

Crowd-sourced unicorn probability assessments

Analysis model: GPT-4o · Last scored: 3/5/2026

Why It Matters

This research targets the detection and removal of illicit content from online marketplaces, a task made difficult by multilingual and semantically complex communication. Without better detection, illegal activity can persist unchecked, causing societal harm and eroding trust in e-commerce.

Product Angle

The research can be productized into a SaaS platform or API that integrates with e-commerce sites to monitor and moderate illicit content automatically, supporting compliance and enhancing user trust.
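As a rough illustration of the API shape described above, the sketch below parses a JSON moderation request, runs a classifier, and returns a JSON response. Everything in it is hypothetical: the field names (`listing_id`, `text`, `label`, `confidence`), the `handle_request` function, and the keyword-based `stub_classifier`, which only stands in for a fine-tuned model so the example runs without any model weights.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Tuple


@dataclass
class ModerationResult:
    listing_id: str
    label: str
    confidence: float


# A classifier maps text to (label, confidence).
Classifier = Callable[[str], Tuple[str, float]]


def handle_request(body: str, classify: Classifier) -> str:
    """Parse a JSON request, classify its text, and return a JSON response."""
    payload = json.loads(body)
    label, confidence = classify(payload["text"])
    result = ModerationResult(payload["listing_id"], label, round(confidence, 3))
    return json.dumps(asdict(result))


def stub_classifier(text: str) -> Tuple[str, float]:
    """Placeholder only: a keyword match standing in for a fine-tuned LLM."""
    if "counterfeit" in text.lower():
        return ("illicit", 0.9)
    return ("benign", 0.6)


request = json.dumps(
    {"listing_id": "abc-123", "text": "Counterfeit watches, bulk discount"}
)
response = json.loads(handle_request(request, stub_classifier))
print(response)
```

Passing the classifier in as a function keeps the request handling independent of whichever model sits behind it, so the stub can later be swapped for a real inference call.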

Disruption

This approach could replace labor-intensive manual moderation and rule-based automated systems, both of which struggle with linguistic nuance and large-scale deployment. It also outperforms traditional ML models on complex tasks.

Product Opportunity

The increasing reliance on online marketplaces globally has created a demand for solutions that ensure safe transactions by detecting illicit activities, which both platforms and regulatory bodies are willing to invest in. The tool addresses a multibillion-dollar market affected by fraud and illicit trade.

Use Case Idea

A monitoring tool for e-commerce platforms that automatically flags and removes illicit items and communications in real time, and that adapts to multiple languages and new obfuscation tactics.
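Two pieces of such a tool can be sketched without a model: normalizing obfuscated text before classification, and turning a classifier's confidence into a flag or remove action. The substitution table, the thresholds (0.95 and 0.70), and the function names below are illustrative assumptions, not values from the research.

```python
import unicodedata

# Hypothetical substitution table for common character swaps.
LEET = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(text: str) -> str:
    """Fold Unicode look-alikes (NFKC), lowercase, and undo simple swaps."""
    folded = unicodedata.normalize("NFKC", text).lower()
    return folded.translate(LEET)


def decide(label: str, confidence: float,
           remove_at: float = 0.95, flag_at: float = 0.70) -> str:
    """Map a (label, confidence) pair to an action using assumed thresholds."""
    if label == "benign":
        return "allow"
    if confidence >= remove_at:
        return "remove"
    if confidence >= flag_at:
        return "flag_for_review"
    return "allow"


print(normalize("C0UNT3RF31T"))
print(decide("illicit", 0.97), decide("illicit", 0.80), decide("illicit", 0.50))
```

The digit substitutions also rewrite legitimate numbers such as prices and dosages, so a real system would more likely feed both the raw and the normalized text to the classifier than replace one with the other.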

Science

The approach fine-tunes two large language models, Meta's Llama 3.2 and Google's Gemma 3, on the multilingual DUTA10K dataset to detect and classify illicit content on online marketplaces. These models outperform traditional machine learning methods on complex multi-class classification tasks, which the analysis attributes to their stronger grasp of language nuance.
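Fine-tuning an LLM for classification usually starts by turning labeled texts into prompt and completion pairs. The sketch below shows one way to do that and serialize the result as JSONL. The label set, the prompt wording, and the record layout are assumptions for illustration; the page does not describe the dataset's categories or the paper's actual training format.

```python
import json

# Hypothetical label set; the page does not list the dataset's categories.
LABELS = ["benign", "drugs", "counterfeit", "weapons"]


def to_training_record(text: str, label: str) -> dict:
    """Build one prompt/completion pair for supervised fine-tuning."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    prompt = (
        "Classify the marketplace text into one of: "
        + ", ".join(LABELS)
        + ".\n\nText: " + text.strip() + "\nCategory:"
    )
    return {"prompt": prompt, "completion": " " + label}


examples = [
    ("Replica designer handbags, wholesale", "counterfeit"),
    ("Hand-knitted wool scarf", "benign"),
]

# ensure_ascii=False keeps multilingual text readable in the output file.
jsonl = "\n".join(
    json.dumps(to_training_record(text, label), ensure_ascii=False)
    for text, label in examples
)
print(jsonl)
```

Restricting completions to a fixed label list makes the model's output easy to parse at inference time and lets malformed training rows fail early.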

Method & Eval

The models were evaluated on the DUTA10K dataset, which consists of multilingual entries from illicit online sources. The study benchmarked fine-tuned Llama 3.2 and Gemma 3 against SVM, Naive Bayes, and BERT; the LLMs performed better, especially in multi-class classification.
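Comparisons like this are typically scored with accuracy and a class-averaged F1. The page does not say which metrics the study reported, so the choice of macro-F1 below is an assumption, and the labels and predictions are made-up toy data rather than results from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data for illustration only.
y_true = ["drugs", "drugs", "benign", "weapons", "benign", "weapons"]
y_pred = ["drugs", "benign", "benign", "weapons", "benign", "drugs"]

print(f"accuracy={accuracy(y_true, y_pred):.3f}  macro_f1={macro_f1(y_true, y_pred):.3f}")
```

Macro averaging matters for multi-class moderation data because illicit categories are often much rarer than benign content, and plain accuracy can hide poor performance on them.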

Caveats

The results may not generalize beyond similar multilingual datasets in the illicit-content domain. The models will also need continuous adaptation as illicit communication tactics evolve.

Author Intelligence

Quoc Khoa Tran

Monash University
qtra0027@student.monash.edu

Thanh Thi Nguyen

Monash University
thanh.nguyen9@monash.edu

Campbell Wilson

Monash University
campbell.wilson@monash.edu