A few months ago I used Streamlit to build a simple UI so I could collect manually labeled data for an LLM fine-tuning task at work. Streamlit is fine, but building a nice UI with the functionality needed for data annotation and data storage management wasn’t trivial.
Today I found out about label-studio, an easy-to-use framework (backend and frontend) for data annotation tasks. It provides annotation templates for text, image, audio, and video data!
I tried it with uv locally, but it can be self-hosted using docker-compose as well:
uv add label-studio
uv run label-studio
Then a very nice-looking UI opens in the browser. You need to sign up (why?) and create a new project. When creating the project, I selected Labeling setup > Natural Language Processing > Text Classification and saved it with the default choices (positive, negative, neutral). I then imported a sample dataset from Hugging Face and ended up with this clean and user-friendly interface for labeling the data.
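If your data is not already in a file that the import dialog accepts, one option is to write it out as a JSON list of tasks first. The snippet below is a minimal sketch, not what I actually ran: the sample sentences and the tasks.json file name are made up, and it assumes the project's labeling config reads the text from a field named "text" (which I believe is what the Text Classification template uses, but check your own config).

```python
import json


def to_label_studio_tasks(texts):
    """Wrap raw strings as import tasks, skipping empty or blank entries.

    Assumption: the labeling config reads its input from a "text" field.
    """
    return [{"data": {"text": t.strip()}} for t in texts if t and t.strip()]


# Hypothetical sample records, for illustration only.
sample_texts = [
    "I love this phone",
    "   ",
    "Terrible battery life",
]

tasks = to_label_studio_tasks(sample_texts)

# Write the tasks to a file that can be uploaded from the project's import dialog.
with open("tasks.json", "w", encoding="utf-8") as f:
    json.dump(tasks, f, ensure_ascii=False, indent=2)

print(f"Wrote {len(tasks)} tasks")
```

The blank entry is dropped on purpose, so annotators are not shown empty records.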
Label-studio supports multiple users, and it records which user annotated which record and when. You can export the labeled data in various formats, including CSV. Another cool feature of label-studio is its integration with ML and AI tools, so you can run the same annotated records through a model and compare its output with the human annotations.
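The comparison idea can also be tried offline on an exported CSV. The sketch below is not label-studio's ML integration, just a plain-Python illustration: the column names text and sentiment are assumptions about the export (yours may differ), the inline CSV is made-up data, and keyword_model is a hypothetical stand-in for a real model.

```python
import csv
import io


def agreement(rows, predict, text_col="text", label_col="sentiment"):
    """Return the share of rows where the model matches the human label,
    plus the rows where they disagree."""
    total = 0
    matches = 0
    disagreements = []
    for row in rows:
        human = row[label_col].strip().lower()
        model = predict(row[text_col]).strip().lower()
        total += 1
        if human == model:
            matches += 1
        else:
            disagreements.append((row[text_col], human, model))
    rate = matches / total if total else 0.0
    return rate, disagreements


def keyword_model(text):
    """Hypothetical stand-in for a real model: a trivial keyword rule."""
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "positive"
    if "hate" in lowered or "terrible" in lowered:
        return "negative"
    return "neutral"


# Made-up export; column names are assumptions, not a documented schema.
EXPORT = """text,sentiment,annotator
I love this phone,Positive,1
Terrible battery life,Negative,1
It arrived on Tuesday,Neutral,2
Not what I expected,Negative,2
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))
rate, disagreements = agreement(rows, keyword_model)

print(f"Agreement: {rate:.0%}")
for text, human, model in disagreements:
    print(f"human={human} model={model} text={text!r}")
```

With a real export you would replace io.StringIO(EXPORT) with an open file handle and swap keyword_model for a call to your model.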
I will use label-studio the next time I have data labeling work to do.