Released a new tool: llm-url-markdown

Recently I started using Simon Willison’s CLI tool, conveniently called llm. A recent version introduced a particularly useful fragments feature, which lets the user provide extra information to the LLM when working with long-context models. Simon himself developed an llm-hacker-news plugin that fetches all the comments in a Hacker News discussion and provides them to llm as extra context (i.e. a fragment). ...

2025-04-16 · 1 min

Building a Personal Content Recommendation System, Part Two: Data Processing and Cleaning

In part one of this blog series, I explored the motivation behind developing a personal recommendation system. The main goals are to learn how recommendation systems work and to build a tool that helps me find interesting blog posts and articles from feeds where only 1 in 20 posts might match my content interests. If you are interested in the technical implementation, the complete codebase is available in this GitHub repository. ...

2025-03-26 · 5 min · Saeed Esmaili

Add Logprobs to OpenAI Structured Output

When working with LLMs, sometimes you want to know whether the model itself is at least somewhat confident about the response it’s giving you. For example, I recently worked on classifying pull requests into categories like “feature”, “bugfix”, “infrastructure”, etc. with LLMs, and as part of the process we wanted to know how many categories we should assign to each PR. We were interested in assigning any number of categories that are relevant to the PR (a PR can be both a “bugfix” and “infrastructure”). It’s hard to get a proper confidence score from an LLM, but logprobs are probably the closest we can get. The problem is that in structured response generation (e.g. when you prompt the model to generate its response in JSON format), you’re only interested in the logprobs of the values, not everything. In the example generation below, we are only interested in the logprobs of “bugfix”, “testing”, and “infrastructure”, but not “primary_category”, etc.: ...
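The idea behind extracting value-level confidence can be sketched with plain Python, no API call needed: given the token/logprob pairs an API returns, keep only the tokens that spell the JSON values and convert their summed logprobs back to a linear probability. The token list below is made up for illustration, and it assumes each value happens to be a single token (real tokenizers often split values, so a production version has to track character offsets):

```python
import math

# Hypothetical token/logprob pairs, as an API might return them with
# logprobs enabled. Together the tokens spell:
#   {"categories": ["bugfix", "testing"]}
tokens = [
    ('{"', -0.01), ('categories', -0.02), ('":', -0.01), (' ["', -0.03),
    ('bugfix', -0.22), ('",', -0.01), (' "', -0.02),
    ('testing', -1.10), ('"]', -0.01), ('}', -0.01),
]

def value_confidence(tokens, values):
    """For each value of interest, sum the logprobs of the tokens that
    spell it, then exponentiate to get a 0-1 confidence."""
    confidences = {}
    for value in values:
        logprob = sum(lp for tok, lp in tokens if tok == value)
        confidences[value] = math.exp(logprob)
    return confidences

conf = value_confidence(tokens, ["bugfix", "testing"])
# "bugfix" (logprob -0.22) comes out more confident than "testing" (-1.10)
```

The structural tokens (`{"`, `":`, brackets) are simply ignored, which is the whole point: only the category values contribute to the score.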

2025-03-03 · 3 min

Adding new entries to a Supabase postgres table via REST API

I read this blog post from Jordan on how he created a simple feature request form in his iOS app using Supabase’s Swift library. I wanted to try this out in Python to see how easy it would be to implement when I need a quick way to add form functionality to a backend service. Here is what I did:

1. Created a free Supabase account.
2. Created a new project.
3. Created a new table in the project, called it feature_request, and added a new column called text.
4. Grabbed the unique API URL and API Key from Project Settings > API.
5. Made an API call in Python:

```python
import requests

supabase_url = "https://ufcgesssclkrciknuaey.supabase.co"
table_name = "feature_request"
endpoint = f"{supabase_url}/rest/v1/{table_name}"
supabase_key = "<my-supabase-project-api-key>"

# The request will fail if either apikey or Authorization is not provided
headers = {
    "apikey": supabase_key,
    "Authorization": f"Bearer {supabase_key}",
    "Content-Type": "application/json",
    "Prefer": "return=representation",  # This returns the inserted row
}

data = {"text": "this is a new feature request ..."}
response = requests.post(endpoint, json=data, headers=headers)
response.json()
# [{'id': 1,
#   'created_at': '2025-01-02T16:52:44.738549+00:00',
#   'text': 'this is a new feature request ...'}]
```

And visiting the table in the Supabase dashboard confirms that the data is inserted as a new row. ...

2025-01-02 · 1 min

Quickly Filter and Aggregate Python Lists

Today I came across this brilliant Python library called leopards, which allows you to do some basic but frequently used filters and aggregations on Python lists. I've always used pandas for any quick and ad-hoc work with large lists or CSV files, but leopards sounds like a quick and performant alternative when you don't need to do any fancy data analysis work. Leopards provides the following filters:

- eq: equals; this is the default filter.
- gt: greater than.
- gte: greater than or equal.
- lt: less than.
- lte: less than or equal.
- in: the value is in a list or a tuple, e.g. age__in=[10,20,30].
- contains: contains a substring.
- icontains: case-insensitive contains.
- startswith: checks if a value starts with a query string.
- istartswith: case-insensitive startswith.
- endswith: checks if a value ends with a query string.
- iendswith: case-insensitive endswith.
- isnull: checks if the value matches any of NULL_VALUES, which are ("", ".", None, "None", "null", "NULL"), e.g. filter__isnull=True or filter__isnull=False.

A quick example of its usage with filters: ...
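To show what those Django-style double-underscore filters mean, here is a minimal plain-Python re-implementation of a few of them. This is a sketch of the semantics, not leopards' actual code, and the sample data is made up:

```python
NULL_VALUES = ("", ".", None, "None", "null", "NULL")

# Map filter suffixes to predicates; "eq" is the default when no
# suffix is given (e.g. name="Ada" means name__eq="Ada").
OPS = {
    "eq": lambda value, query: value == query,
    "gt": lambda value, query: value > query,
    "contains": lambda value, query: query in value,
    "isnull": lambda value, query: (value in NULL_VALUES) == query,
}

def filter_list(items, **filters):
    """Keep the dicts in `items` that satisfy every field__op=query filter."""
    def matches(item):
        for key, query in filters.items():
            field, _, op = key.partition("__")
            if not OPS[op or "eq"](item[field], query):
                return False
        return True
    return [item for item in items if matches(item)]

people = [
    {"name": "Ada", "age": 36, "email": ""},
    {"name": "Alan", "age": 41, "email": "alan@example.com"},
]
filter_list(people, age__gt=40)        # keeps Alan only
filter_list(people, email__isnull=True)  # keeps Ada (empty email)
```

The real library presumably does more (type coercion, the case-insensitive variants, aggregations), but the key splitting on `__` is the core trick.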

2024-12-19 · 2 min

Pydantic Logfire for LLM and API Observability

I’ve been using Sentry for automatically logging the errors and exceptions of my Python projects. A few months ago I needed to log some information when a specific condition was true in my side project’s backend, but I wasn’t able to do this with Sentry. It apparently only works when something fails; you can’t capture log messages if there’s no failure or exception. I looked for an affordable and user-friendly observability tool and settled on Axiom. It has a generous 500 GB of ingestion on the free tier, but you can only view events from the past 30 days. So I’ve been exporting the logs every month into a CSV file, since I want to be able to view the trend of some behaviours over time. ...

2024-12-19 · 2 min

Access Google Gemini LLM via OpenAI Python Library

Google Gemini can now be accessed via the OpenAI Python library:

```python
from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
# rest of the code as you would use openai
```

It supports basic text generation, image input, function calling, structured output, and embeddings. More info and code examples can be found in the Gemini docs.

2024-12-01 · 1 min

Running Python on a serverless GPU instance for machine learning inference

I was experimenting with some speech-to-text work using OpenAI’s Whisper models today, and transcribing a 15-minute audio file with the Whisper tiny model on AWS Lambda (3 vCPUs) took 120 seconds. I was curious how much faster this could be if I ran the same transcription model on a GPU instance, and with a quick search, modal.com seemed like a nice option to spin up a GPU machine, run the code, and shut the machine down, similar to how AWS Lambda works. ...

2024-04-22 · 5 min · Saeed Esmaili

Fuck You Show Me the Prompt

Hamel dives deep into how LLM frameworks like langchain, instructor, and guidance perform tasks like formatting the response as valid JSON output. He intercepts the API calls from these Python libraries to shed some light on how many API calls (to OpenAI’s GPT services) they make and what prompts they use. I’ve always been skeptical of the usefulness of many of the LLM “wrapper” libraries, especially for larger and more serious projects, though they are fine for quick prototypes. ...

2024-02-29 · 3 min

Hand-drawn xkcd style charts with matplotlib

I’m a big fan of unique charting styles and I avoid using the default matplotlib style whenever possible, as I find it boring and soulless. This preference is not limited to charts: I also like hand-drawn styles for fonts and diagrams (Excalidraw is a fantastic tool that I use frequently). The hand-drawn style is especially useful when presenting a proof-of-concept idea. Something very interesting that I’ve recently stumbled upon is an xkcd chart style for matplotlib. This is a fantastic style that I can see myself using frequently going forward. All you need to do is add plt.xkcd() to your existing matplotlib code. ...

2023-12-09 · 2 min · Saeed Esmaili

Topic Classification of Texts Locally Using BERTopic

I’ve recently been working on survey response data that, in addition to aggregatable question types like Likert-scale and multiple-choice questions, includes optional free-text questions. Although we are lucky that thousands of respondents spend time elaborating on questions and leaving comprehensive free-text responses, getting insights from these text responses is challenging. While investigating how to enrich this text data with proper metadata related to their topics, I came across BERTopic, which introduces itself as a topic modeling technique for creating clusters that allow for easily interpretable topics. In this post, I’ll explore BERTopic and go through an example to explain what adjustments worked for me. ...

2023-09-12 · 7 min · Saeed Esmaili

Text Chunking and Headings Grouping: A Guide to Parsing Documents with Pandoc and Python

In my previous blog post I explored using the unstructured Python library for loading and parsing documents. As I mentioned in that post, although unstructured seems like a very useful library, it has a few issues. Since I’m planning to do a semantic search on the paragraphs and feed the relevant ones to a large language model, the library’s inability to reliably identify headings and paragraphs was a big problem for me. ...

2023-07-08 · 7 min · Saeed Esmaili

Demystifying Text Data with the unstructured Python Library (+alternatives)

In the world of data, textual data stands out as being particularly complex. It doesn’t fall into neat rows and columns like numerical data does. As a side project, I’m in the process of developing my own personal AI assistant. The objective is to use the data within my notes and documents to answer my questions. The important benefit is that all data processing will occur locally on my computer, ensuring that no documents are uploaded to the cloud and my documents remain private. ...

2023-07-05 · 6 min · Saeed Esmaili

Generating text embeddings locally using sentence-transformers

Recently, I’ve been working on a side project where I use OpenAI’s text-embedding-ada-002 model to generate vector embeddings for text snippets. While this model is inexpensive, the cost can add up when dealing with thousands or millions of text snippets. Therefore, I decided to explore alternatives, particularly those that would allow me to run similar models locally instead of relying on OpenAI’s API. In this post, I’ll share my experience using the sentence-transformers library for this purpose and discuss the pros and cons. ...

2023-07-02 · 4 min · Saeed Esmaili

TIL: Simplifying URL Parsing with Python's urlparse Library

Background For quite a while now, I’ve been using Pocket as my go-to read-it-later app. A few months back, I found myself wanting a solution to listen to my saved articles. This led me to explore the text-to-speech feature in the Reader app. Reader gave me the convenience of linking with my Pocket account, synchronizing its inbox automatically with my saved articles. This system has worked well. I’ve been listening to my saved articles on Reader while continuing to save new articles on Pocket. This lets me easily revisit them later if needed. ...
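The teaser cuts off before the code, but the stdlib function the title refers to is urllib.parse.urlparse. A minimal sketch of the kind of URL cleanup it enables (the URL here is made up for illustration):

```python
from urllib.parse import urlparse

url = "https://example.com/articles/2023/my-post?utm_source=pocket#comments"
parts = urlparse(url)

parts.scheme   # 'https'
parts.netloc   # 'example.com'
parts.path     # '/articles/2023/my-post'
parts.query    # 'utm_source=pocket'
parts.fragment # 'comments'

# Rebuild the URL without the query string or fragment, e.g. to match up
# saved articles that differ only in tracking parameters:
clean = f"{parts.scheme}://{parts.netloc}{parts.path}"
# 'https://example.com/articles/2023/my-post'
```

Because urlparse splits the URL into named components, there is no need for fragile string slicing to strip tracking parameters or anchors.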

2023-06-25 · 2 min · Saeed Esmaili

Exploring OpenAI's Whisper with Non-English Voices

TL;DR: Whisper.cpp is the fastest option when you’re trying to use the large Whisper model on a Mac. For top-quality results with languages other than English, I recommend asking the model to translate into English. About Whisper: Whisper is OpenAI’s speech-to-text model, and it’s well-known for its impressive results. Although I had known about it for a while, I didn’t get to test its real-world performance until recently. So I spent a weekend seeing how it could handle converting speech, in both English and other languages, into text. ...

2023-06-21 · 6 min · Saeed Esmaili