Never Been Easier to Learn

In a discussion about LLMs and their impact on learning, gchamonlive writes on Hacker News: We can finally just take a photo of a textbook problem that has no answer reference and no discussion about it and prompt an LLM to help us understand what’s missing in our understanding of the problem, if our solution is plausible and how we could verify it. LLM changed nothing though. It’s just boosting people’s intention. If your intention is to learn, you are in luck! It’s never been easier to teach yourself some skill for free. But if you just want to be a poser and fake it until you make it, you are gonna be brainrot waaaay faster than usual. ...

2025-04-26 · 1 min

Released a new tool: llm-url-markdown

Recently I started using Simon Willison’s CLI tool, conveniently called llm. He introduced a particularly useful fragments feature in a recent version of the tool that allows the user to provide extra information to the model when working with long-context models. Simon himself developed an llm-hacker-news plugin that fetches all the comments in a Hacker News discussion and provides them to the model as extra context (i.e. a fragment). ...
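To give a rough idea of what a URL-fragment plugin does under the hood, here’s a toy sketch of converting fetched HTML into markdown-ish text. The function name and regexes are mine for illustration; the real llm-url-markdown uses a proper converter:

```python
import re

def html_to_markdownish(html: str) -> str:
    """Crude illustration of the kind of conversion a URL-fragment
    plugin performs before handing text to the model."""
    # Headings: <h1>..</h1> -> "# .."
    text = re.sub(r"<h1[^>]*>(.*?)</h1>", r"# \1", html, flags=re.S)
    # Links: <a href="url">label</a> -> [label](url)
    text = re.sub(r'<a href="([^"]+)"[^>]*>(.*?)</a>', r"[\2](\1)", text, flags=re.S)
    # Drop any remaining tags, then collapse blank lines
    text = re.sub(r"<[^>]+>", "\n", text)
    return re.sub(r"\n+", "\n", text).strip()

html = '<h1>Hello</h1><p>See <a href="https://example.com">this</a>.</p>'
print(html_to_markdownish(html))
```

The cleaned-up markdown then gets passed to the model as one fragment alongside your prompt.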

2025-04-16 · 1 min

Comparing local large language models for alt-text generation

I’m always interested in reading how people use language models to automate boring tasks or do things they wouldn’t be able to do manually. In his post, Dries explores using several local language models to generate alt text for 10,000 photos! This topic is doubly interesting to me, since I also love photography. I recommend reading the original post if you’re interested in generating alt text with local LLMs, especially since he has posted a couple of follow-up posts updating his approach. ...

2025-03-31 · 2 min

Building a Personal Content Recommendation System, Part Two: Data Processing and Cleaning

In part one of this blog series, I explored the motivation behind developing a personal recommendation system. The main goals are to learn how recommendation systems work and to build a tool that helps me find interesting blog posts and articles from feeds where only 1 in 20 posts might match my content interests. If you are interested in the technical implementation, the complete codebase is available in this GitHub repository. ...

2025-03-26 · 5 min · Saeed Esmaili

Enhancing Text-to-SQL With Synthetic Summaries

LLMs are being experimented with for so many things today, and one of the use cases that sounds compelling is getting their help to generate insights from data. What if you could find the answer to your data question without begging a data analyst in your company? But this is easier said than done. To perform this task properly, LLMs need to know about your datasets: the tables, their schemas, and the values stored in them. You can provide this information in the prompt itself if your dataset is tiny, but in most real-life scenarios that isn’t possible: the information will be huge, and it either won’t fit in the LLM’s context window or will be prohibitively expensive. ...
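The idea in the title can be sketched roughly: generate short synthetic summaries per table offline, then retrieve only the relevant ones into the prompt. The table names, summaries, and naive keyword scoring below are all made up for illustration; a real system would retrieve with embeddings:

```python
# Toy sketch: pick which table summaries to include in a text-to-SQL prompt.
# The summaries would be LLM-generated offline; retrieval here is naive
# keyword overlap purely for illustration.
TABLE_SUMMARIES = {
    "orders": "One row per customer order: order_id, customer_id, total_amount, created_at.",
    "customers": "One row per customer: customer_id, name, country, signup_date.",
    "page_views": "Web analytics events: url, visitor_id, viewed_at.",
}

def relevant_tables(question: str, k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    scored = []
    for table, summary in TABLE_SUMMARIES.items():
        words = set(summary.lower().replace(",", "").replace(":", "").split())
        scored.append((len(q_words & words), table))
    return [t for _, t in sorted(scored, reverse=True)[:k]]

def build_prompt(question: str) -> str:
    context = "\n".join(f"- {t}: {TABLE_SUMMARIES[t]}" for t in relevant_tables(question))
    return f"Tables:\n{context}\n\nQuestion: {question}\nWrite a SQL query."

print(build_prompt("How many orders did each customer place last month?"))
```

This keeps the prompt small no matter how many tables the warehouse holds, at the cost of a retrieval step that can miss the right table.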

2025-03-18 · 2 min

Building a Personal Content Recommendation System, Part One: Introduction

Every morning, my RSS reader greets me with hundreds of new posts. Tech blogs, indie developers’ journals, photography content - they all compete for attention. While I’ve gotten good at quickly scanning through these feeds, I keep wondering about all the great content I might be missing from sources I’ve had to ignore simply because their signal-to-noise ratio doesn’t justify daily checking. On the other hand, the posts that I shortlist from my RSS feeds and read or listen to end up in a curated repository of articles that have passed my personal quality threshold, so I have access to a valuable collection of content (on Pocket) that is relevant to my interests. This made me wonder: can I utilize this data and create a content recommendation system tailored to my preferences? Can I build a system that would review new posts from feeds where only 1 in 20 posts might match my content priorities, and filter those for me? ...

2025-03-16 · 4 min · Saeed Esmaili

Add Logprobs to OpenAI Structured Output

When working with LLMs, sometimes you want to know whether the model itself is at least somewhat confident about the response you’re getting. For example, I recently worked on classifying pull requests into categories like “feature”, “bugfix”, “infrastructure”, etc. with LLMs, and as part of the process we wanted to know how many categories we should assign to each PR. We were interested in assigning any number of categories that are relevant to the PR (a PR can be both a “bugfix” and “infrastructure”). It’s hard to get a proper confidence score from an LLM, but logprobs are probably the closest we can get. The problem is, in structured response generation (e.g. when you prompt the model to generate its response in a JSON format), you’re only interested in the logprobs of the values, not everything. In the example generation below, we are only interested in the logprobs of “bugfix”, “testing”, and “infrastructure”, but not “primary_category”, etc.: ...
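The core trick can be sketched with mock data: map each token’s span in the generated JSON text and keep only the logprobs of tokens that overlap a value. The token split and logprob numbers below are invented; real ones come from `response.choices[0].logprobs.content`:

```python
import json, math

# Mock of token-level logprobs roughly as the OpenAI API returns them
# (token text + logprob). All values here are made up.
tokens = [
    ('{"', -0.01), ('primary', -0.02), ('_category', -0.01), ('":"', -0.01),
    ('bug', -0.15), ('fix', -0.05), ('"}', -0.01),
]

raw = "".join(t for t, _ in tokens)
parsed = json.loads(raw)  # {'primary_category': 'bugfix'}

def value_logprob(tokens, raw, value):
    """Average the logprobs of only the tokens that make up `value`."""
    start = raw.index(value)
    end = start + len(value)
    pos, picked = 0, []
    for tok, lp in tokens:
        tok_start, tok_end = pos, pos + len(tok)
        pos = tok_end
        if tok_start < end and tok_end > start:  # token overlaps the value span
            picked.append(lp)
    return sum(picked) / len(picked)

lp = value_logprob(tokens, raw, parsed["primary_category"])
print(round(math.exp(lp), 3))  # → 0.905, a rough per-token confidence for "bugfix"
```

Exponentiating the averaged logprob gives something probability-like per value, which is enough to flag low-confidence category assignments for review.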

2025-03-03 · 3 min

Label-Studio: Annotate Text and Image Data for AI and ML training

A few months ago I used Streamlit to build a simple UI so I could collect manually labeled data for an LLM fine-tuning task at work. Streamlit is fine, but the full process of creating a nice UI with the required functionality for data annotation and data storage management wasn’t trivial. Today I found out about label-studio, an easy-to-use framework (backend and frontend) for data annotation tasks. It provides various annotation templates for text, image, audio, and video data! ...
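To give a flavour of those templates, a minimal Label Studio labeling config for a text classification task looks roughly like this (the category choices here are made up for illustration):

```xml
<View>
  <Text name="text" value="$text"/>
  <Choices name="category" toName="text" choice="single">
    <Choice value="feature"/>
    <Choice value="bugfix"/>
    <Choice value="infrastructure"/>
  </Choices>
</View>
```

You paste a config like this into the project settings, and Label Studio renders the annotation UI and handles storage for you.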

2024-12-19 · 2 min

Pydantic Logfire for LLM and API Observability

I’ve been using Sentry to automatically log the errors and exceptions of my Python projects. A few months ago I needed to log some information when a specific condition was true in my side project’s backend, but I wasn’t able to do this with Sentry. It apparently only works when something fails, and you can’t capture log messages if there’s no failure or exception. I looked for an affordable and user-friendly observability tool and settled on Axiom. It has a generous 500GB ingestion allowance on the free tier, but you can only view events from the past 30 days. So I’ve been exporting the logs every month into a CSV file, since I want to be able to view the trend of some behaviours over time. ...

2024-12-19 · 2 min

Build a search engine, not a vector DB

If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves. This is exactly what I’ve been trying to communicate in my org in the past few months. It’s 2024 and we still can’t have a proper search engine in organizations for finding relevant information across various sources. While this problem remains unsolved, organizations are adopting RAG and AI into their tooling, but are missing the important R of RAG: Retrieval. I’ve been an advocate of prioritizing search engines over any AI-related tool in the past few months, and I found it refreshing to read about this somewhere else: ...
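The “search engine first” idea can be sketched in a few lines: an inverted index with term-frequency scoring. The documents and scoring are toy stand-ins; a real deployment would use BM25 via something like Elasticsearch:

```python
from collections import defaultdict

# Tiny inverted index over made-up internal docs, purely illustrative.
docs = {
    "onboarding": "how to set up your laptop and request VPN access",
    "deploy": "deploying services to production with the CI pipeline",
    "vpn": "troubleshooting VPN access and connection drops",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query: str) -> list[str]:
    # Score each document by how many query terms it contains.
    scores = defaultdict(int)
    for word in query.lower().split():
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(search("production pipeline"))
```

If this ranking is good enough for a human to find the right document, the same retrieval step will feed an LLM well; if it isn’t, no amount of generation quality will save the RAG tool.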

2024-12-04 · 2 min

Access Google Gemini LLM via OpenAI Python Library

Google Gemini can now be accessed via the OpenAI Python library:

```python
from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

## rest of the code as you would use openai
```

It supports basic text generation, image input, function calling, structured output, and embeddings. More info and code examples can be found in the Gemini docs.

2024-12-01 · 1 min

Understanding Input Masking in LLM Finetuning

I’ve been using the conversational alpaca or sharegpt formats for fine-tuning LLMs with Axolotl, but it always felt unnecessary to limit the model to a conversational format when the use case doesn’t require it. I’m currently working on a project to classify pull requests in my company’s code repositories. The model needs to look at the PR title, description, and code changes, then categorize the PR and explain its reasoning. I thought there must be a way to fine-tune these models with whatever format fits this specific use case, and sure enough there is: template-free Axolotl ...
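The input masking in the title boils down to one convention: compute the loss only on the tokens the model must produce, not on the prompt. In Hugging Face-style training this means setting prompt positions in `labels` to -100, the index cross-entropy ignores. A minimal sketch with made-up token ids:

```python
IGNORE_INDEX = -100  # the label value PyTorch cross-entropy skips

def build_labels(prompt_ids: list[int], completion_ids: list[int]):
    """The model sees the full sequence, but only completion tokens
    contribute to the loss."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels

prompt = [101, 2023, 2003]   # e.g. tokenized PR title + diff (ids invented)
completion = [4189, 102]     # e.g. tokenized category + reasoning
input_ids, labels = build_labels(prompt, completion)
print(labels)  # → [-100, -100, -100, 4189, 102]
```

Template-free configuration is essentially about controlling which spans get masked this way, instead of letting a chat template decide for you.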

2024-06-29 · 5 min · Saeed Esmaili

To Chunk or Not to Chunk With the Long Context Single Embedding Models

In his excellent write-up on state-of-the-art embedding models, Aapo Tanskanen compares the retrieval scores for when the source documents are split into chunks and when they’re not: Transformer-based single embedding models have traditionally had 512 token context windows because of the usage of the original BERT encoder. Newer models, like the BGE-M3, have expanded the token window to much larger scales. It could be tempting to forget chunking and just embed long texts as they are. However, that would mean mashing many topics and entities you might want to search for into a single vector representation, which doesn’t sound like a good idea. ...
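For reference, the chunking being weighed here is usually a sliding window with overlap. A toy word-level version (real pipelines chunk by tokens or sentences; the sizes here are tiny so the behaviour is visible):

```python
def chunk(text: str, size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into fixed-size word windows; consecutive chunks
    share `overlap` words so no boundary context is lost."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

for c in chunk("one two three four five six seven eight nine"):
    print(c)
```

Each chunk then gets its own embedding, so a query about one topic matches a focused vector instead of a mashed-together one.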

2024-06-02 · 2 min

Lessons After a Half Billion GPT Tokens

Ken writes about the lessons they’ve learned building new LLM-based features into their product: When it comes to prompts, less is more. Not enumerating an exact list or instructions in the prompt produces better results, if that thing was already common knowledge. GPT is not dumb, and it actually gets confused if you over-specify. This has been my experience as well. For a recent project, I first started with a very long and detailed prompt, asking the LLM to classify a text and produce a summary. GPT-4, GPT-3.5, Claude-3-Opus, and Claude-3-Haiku all performed average at best. I then experimented with shorter prompts, and with some adjustments I was able to get much better responses from a much shorter prompt. ...

2024-05-27 · 2 min

LLMs Shouldn't Write SQL

Every day a new tool pops up claiming “Throw away your company’s data analysts and data scientists, you don’t need to write SQL anymore, everyone can use data with our groundbreaking ’talk to your data’ tool”, and Benn discusses this: There are thousands of computational devils in details like how to handle nulls. For analysts, describing these specifics in English is inefficient and inexact. For everyone else, they wouldn’t know they need to describe them at all. ...
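One concrete NULL devil, runnable with Python’s built-in sqlite3 (the tables and rows are made up): `NOT IN` against a set containing a NULL matches nothing, which trips up a plain-English request like “customers with no orders”.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
con.execute("CREATE TABLE orders (customer_id INTEGER)")
con.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
con.executemany("INSERT INTO orders VALUES (?)", [(1,), (None,)])

# Looks right, but the NULL in orders makes NOT IN return zero rows:
bad = con.execute(
    "SELECT name FROM customers WHERE id NOT IN (SELECT customer_id FROM orders)"
).fetchall()

# NULL-safe version with NOT EXISTS:
good = con.execute(
    "SELECT name FROM customers WHERE NOT EXISTS "
    "(SELECT 1 FROM orders o WHERE o.customer_id = customers.id)"
).fetchall()

print(bad, good)  # → [] [('Bob',)]
```

An analyst knows to reach for `NOT EXISTS`; someone prompting a “talk to your data” tool wouldn’t even know the question needed asking.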

2024-03-18 · 1 min

Optimizing Technical Docs for LLMs

Many companies are integrating LLM question-answering tools into their DevEx toolchain. If you’re writing documentation and you’d like to assist these tools to serve people with proper responses to the questions related to what you own, kapa.ai has a few practical tips on optimizing the technical docs for LLMs:

- A clear hierarchy of headings and subheadings on a page helps LLMs understand the relationships between different sections of your documentation.
- Troubleshooting sections formatted as Q&A are an effective source for LLMs, as they mirror the questions users often ask, making it easier for LLMs to understand and respond to similar questions.
- Including small, self-standing code snippets can be helpful, especially for products that rely on large and often complex SDKs or APIs. Have a brief description above the code to clarify its purpose and usage, and include comments within the code to explain the logic and functionality.
- Keep relevant content directly in your docs rather than in linked files such as PDFs, as LLMs have a harder time parsing these.
- Ensure information conveyed through screenshots is also described in text, as LLMs parse text more efficiently.
- Clarify all acronyms and specialized terminology within your documentation to aid LLM comprehension.

source ...

2024-03-12 · 1 min

Fuck You Show Me the Prompt

Hamel dives deep into how LLM frameworks like langchain, instructor, and guidance perform tasks like formatting the response as valid JSON output. He intercepts the API calls from these Python libraries to shed some light on how many API calls (to OpenAI’s GPT services) they make and what prompts they use. I’ve always been skeptical of the usefulness of many of the LLM “wrapper” libraries, especially for larger and more serious projects; they are fine for quick prototypes. ...

2024-02-29 · 3 min

There Is a Huge Gap in Generative AI

There is a huge gap in generative AI between the quality you observe when you’re playing with it open endedly, and the quality you observe when you try to use it for a task where you have a specific end goal in mind. This is I think where most of the hype/reality mismatch occurs. This accurately sums up my experience using generative AI on a daily basis and building products with this technology. ...

2024-02-20 · 1 min

A Rant on Arc Search

Manu writes about apps and businesses that claim to replace search engines by feeding web page content to a language model and returning the response to the user: Firstly, without a search engine in the mix, the AI has no way to search for anything. So if the goal is to replace the traditional search engine then we’re already failing. Because we’re not replacing anything, we’re just hiding it behind some AI tool. ...

2024-02-18 · 2 min

Generating text embeddings locally using sentence-transformers

Recently, I’ve been working on a side project where I use OpenAI’s text-embedding-ada-002 model to generate vector embeddings for text snippets. While this model is inexpensive, the cost can add up when dealing with thousands or millions of text snippets. Therefore, I decided to explore alternatives, particularly those that would allow me to run similar models locally instead of relying on OpenAI’s API. In this post, I’ll share my experience using the sentence-transformers library for this purpose and discuss the pros and cons. ...
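Whichever model generates the vectors, comparing snippets afterwards is typically cosine similarity. A stdlib-only sketch (the 4-dimensional vectors are made up; real models like the ones sentence-transformers serves produce hundreds of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three snippets; the numbers are invented so that
# the two related snippets point in roughly the same direction.
emb_cat = [0.9, 0.1, 0.0, 0.2]
emb_kitten = [0.8, 0.2, 0.1, 0.3]
emb_invoice = [0.0, 0.9, 0.8, 0.1]

print(round(cosine(emb_cat, emb_kitten), 3))   # high: related snippets
print(round(cosine(emb_cat, emb_invoice), 3))  # low: unrelated snippets
```

The same comparison works identically on OpenAI and local embeddings, which is what makes swapping the provider behind a side project straightforward.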

2023-07-02 · 4 min · Saeed Esmaili