[{"content":"Why build a WhatsApp RAG? I have a very active group chat with my friends on WhatsApp. At the time of writing, it holds a bit over half a million messages. Since LLMs became a thing, I always wondered how I could use this data for something useful—or at the very least, prank my friends.\nLast year I tried a few different approaches to fine-tune a model using the chat data, but it didn\u0026rsquo;t work all that well. Fine‑tuning a model on commodity hardware is a challenge in itself and the results were underwhelming. So I dropped that idea for a while. While going through the material for the Hugging Face Agents Course though, it became very clear that RAG (Retrieval Augmented Generation) would be a perfect fit for what I was trying to do.\nThis post shows how easy it is to set up a RAG on top of your WhatsApp chat logs. You are going to export your messages, parse the .txt files, index them with LlamaIndex, generate embeddings and store them in DuckDB, and ask questions using a local model served by Ollama. You’ll end up with a small chat application that you can use to ask questions about your conversation log. The best part is that everything can run from your local machine, so you don\u0026rsquo;t have to upload any of this sensitive data to the cloud.\nRAG, embeddings, and vector databases Before diving into the implementation, here are some definitions of key concepts used in this experiment:\nRAG: Retrieval‑Augmented Generation; first retrieves relevant chunks from your data, then generates an answer grounded in those snippets. Embeddings: Numeric vector representations of text; semantically similar texts map to nearby vectors, enabling semantic similarity search. Vector database: A store/index optimized for embeddings and fast similarity search (e.g., retrieving the top‑k most relevant chunks). Here LlamaIndex’s DuckDB vector store is used to persist vectors locally. 
This is what the workflow looks like once fully implemented:\nflowchart %% Ingestion subgraph Ingestion A[WhatsApp exports] --\u0026gt; B[Parse/Cleanup] B --\u0026gt; C[Chunking] C --\u0026gt; D[Generate embeddings] D --\u0026gt; E[(DuckDB Vector Store)] end %% Query subgraph Query Q[User question] --\u0026gt; Qe[Embed query] Qe --\u0026gt; E E --\u0026gt; K[Top-k similar chunks] K --\u0026gt; L[LLM generation] L --\u0026gt; A2[Grounded answer] end Prerequisites Python 3.10+ One or more WhatsApp chat exports in .txt format ollama running a local model (e.g., llama3 or gpt-oss) uv to manage Python dependencies Create a new project using uv and add the dependencies:\nuv init whatsapp-rag cd whatsapp-rag uv add \\ llama-index-llms-ollama \\ llama-index-vector-stores-duckdb \\ llama-index-embeddings-huggingface \\ gradio Export your chats Create a new directory for the chat data inside your project:\nmkdir input You need to export your chat messages to a text file.\niOS: Chat → Contact info → Export Chat → Without Media → Save/Share .txt Android: Chat → More → Export chat → Without media Name each file clearly: family.txt, work.txt, etc., and place them in the ./input folder. 
Ingest chat logs Create a new file named ingest.py and populate with this content:\nfrom llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex from llama_index.core.node_parser import TokenTextSplitter from llama_index.embeddings.huggingface import HuggingFaceEmbedding from llama_index.vector_stores.duckdb import DuckDBVectorStore vector_store = DuckDBVectorStore(\u0026#34;duck.db\u0026#34;, persist_dir=\u0026#34;./data/\u0026#34;) storage_context = StorageContext.from_defaults(vector_store=vector_store) embed_model = HuggingFaceEmbedding(model_name=\u0026#34;BAAI/bge-m3\u0026#34;) splitter = TokenTextSplitter(chunk_size=512, separator=\u0026#34;\\r\\n\u0026#34;) documents = SimpleDirectoryReader(\u0026#34;./input/\u0026#34;).load_data() index = VectorStoreIndex.from_documents( documents, storage_context=storage_context, transformations=[splitter], embed_model=embed_model, show_progress=True, ) The script loads your exported .txt chats, splits them into retrieval‑friendly chunks, calculates the embeddings, and persists everything to DuckDB:\nVector store: DuckDBVectorStore(\u0026quot;duck.db\u0026quot;, persist_dir=\u0026quot;./data/\u0026quot;) stores both vectors and metadata on disk under ./data/, so you can reuse the index without re‑ingesting. Embeddings: BAAI/bge-m3 is a strong multilingual embedding model that runs locally via Hugging Face. You can swap it for a smaller/faster model if needed. Chunking: TokenTextSplitter(chunk_size=512, separator=\u0026quot;\\r\\n\u0026quot;) breaks the raw chat text along line breaks, keeping messages together while limiting token length for better retrieval. Reader: SimpleDirectoryReader(\u0026quot;./input/\u0026quot;) loads every .txt file in the folder and attaches basic file metadata (e.g., filename). Index build: VectorStoreIndex.from_documents(...) generates embeddings for each chunk and writes them to DuckDB with progress reporting. 
After running this once, the built index is persisted and can be opened later for querying without reprocessing the input files.\nTo run the script, use:\nuv run ingest.py Main chat app Next, you have the actual RAG application that uses the index generated on the previous step.\nimport gradio from llama_index.core import VectorStoreIndex from llama_index.core.prompts import ChatMessage from llama_index.embeddings.huggingface import HuggingFaceEmbedding from llama_index.llms.ollama import Ollama from llama_index.vector_stores.duckdb import DuckDBVectorStore embed_model = HuggingFaceEmbedding(model_name=\u0026#34;BAAI/bge-m3\u0026#34;, device=\u0026#34;cpu\u0026#34;) vector_store = DuckDBVectorStore.from_local(\u0026#34;./data/duck.db\u0026#34;) index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model) llm = Ollama( model=\u0026#34;gpt-oss:20b\u0026#34;, request_timeout=300, context_window=1024 * 10, ) engine = index.as_chat_engine( llm=llm, similarity_top_k=5, system_prompt=( \u0026#34;You are a helpful assistant that searches WhatsApp \u0026#34; \u0026#34;messages to answer questions\u0026#34; ), streaming=True, ) def stream(input: str, history: list[dict[str, str]]): chat_history = [ ChatMessage(role=item[\u0026#34;role\u0026#34;], content=item[\u0026#34;content\u0026#34;]) for item in history ] content = \u0026#34;\u0026#34; for token in engine.stream_chat(input, chat_history=chat_history).response_gen: content += token yield content chat = gradio.ChatInterface( fn=stream, type=\u0026#34;messages\u0026#34;, title=\u0026#34;RacinhoGPT\u0026#34;, ).launch() This file wires the stored index to an LLM and a simple chat UI:\nEmbed model: here the embedding model is loaded on the CPU to save VRAM for the LLM model. Also, only the user\u0026rsquo;s prompt needs to be processed so using the GPU won\u0026rsquo;t provide much performance benefit. 
Load index: DuckDBVectorStore.from_local(\u0026quot;./data/duck.db\u0026quot;) reopens the previously persisted vectors, and VectorStoreIndex.from_vector_store(...) prepares a retriever over them using the same embedding model. Local LLM: Ollama(model=\u0026quot;gpt-oss:20b\u0026quot;) runs a local model for generation. You can replace it with another Ollama model (e.g., llama3) if preferred. Make sure to configure an appropriate context window that fits in your hardware budget. I\u0026rsquo;m using a GeForce RTX 4060 Ti 16 GB, so 10k tokens was the right number to fit the model, system prompt, the RAG context, and the user prompt. Chat engine: index.as_chat_engine(...) handles retrieval‑augmented generation, retrieving 5 similar chunks, with a concise system prompt. Streaming: engine.stream_chat(...) yields tokens as they are generated; the stream function accumulates and streams them back to Gradio for a live UI. History: Incoming history messages are converted to ChatMessages so the LLM can keep context across turns. UI: gradio.ChatInterface provides a minimal chat app you can open in the browser. Title is arbitrary—rename freely. uv run main.py Once launched, type a question like “Are there any discussions of a ski trip?” The assistant retrieves relevant messages from your chats and answers grounded in those snippets.\nWhere to go from here Here are some ideas on how to improve on this example:\nPlay around with different LLM and embedding models. Tune the system_prompt, chunk_size, top_k, and the context_window values according to your hardware, and compare which combinations deliver the most reliable results. Turn the application into an agent. So far, the application only performs a single‑shot call to the model with the retrieved context. You can improve results by using an agentic loop to retrieve information. This can be achieved by transforming the query engine into a tool using the QueryEngineTool and AgentWorkflow classes, both from LlamaIndex. 
Wrap‑up With a few dozen lines of parsing and LlamaIndex’s indexing/query APIs, you get a private, semantic interface to your WhatsApp history. This example isn’t limited to WhatsApp chats; you can easily adapt it to other file formats using LlamaIndex‑provided parsers. I highly recommend checking it out.\n","permalink":"https://blog.juzam.pro/posts/2025-09-05/zaprag/","summary":"\u003ch2 id=\"why-build-a-whatsapp-rag\"\u003eWhy build a WhatsApp RAG?\u003c/h2\u003e\n\u003cp\u003eI have a very active group chat with my friends on WhatsApp. At the time of\nwriting, it is a bit over half a million messages. Since LLMs became a thing, I\nalways wondered how I could use this data for something useful—or at the very\nleast, prank my friends.\u003c/p\u003e\n\u003cp\u003eLast year I tried a few different approaches to fine tune a model using the chat\ndata, but it didn\u0026rsquo;t work all that well. Fine‑tuning a model on commodity\nhardware is a challenge in itself and the results were underwhelming. So I\ndropped that idea for a while. While going through the material for the\n\u003ca href=\"https://huggingface.co/learn/agents-course/unit2/llama-index/components\"\u003eHuggingFace Agents Course\u003c/a\u003e\nthough, it became very clear that RAG (Retrieval Augmented Generation) would be\na perfect fit for what I was trying to do.\u003c/p\u003e","title":"Ask your WhatsApp: build a private RAG with LlamaIndex"},{"content":" Update 2025-09-05: Added a section for fixes for Fedora and Rocky Linux\nI\u0026rsquo;ve gone through a few iterations of tools to manage the VMs in my homelab: VirtualBox, Vagrant, Proxmox—you name it. 
Even a cobbled‑together solution with Libvirt, qcow2 images, and a Makefile.\nOver the years, I\u0026rsquo;ve homed in on a few requirements that make sense for my use cases:\nIntegrates well with my workflow (i.e., has a good CLI) Supports name resolution Makes downloading images easy Good support for cloud-config Supports OpenZFS With my previous attempts, something was always missing or wasn\u0026rsquo;t fully supported. LXD checks all these boxes for me. I\u0026rsquo;m not a huge fan of the Snap installation, and after Canonical’s takeover, I\u0026rsquo;ll migrate to Incus when time allows. But I am pretty happy with the tool itself.\nI use LXD mostly to manage virtual machines. I don\u0026rsquo;t have much use for OS‑level containers because most of the services that I run on my homelab are application containers running with Podman. I spin up a VM whenever I want to experiment with something new and need the full isolation it provides. So I play around with different distros like Ubuntu, Debian, Fedora, Rocky, Alma, and CentOS.\nVMs hanging during boot But what\u0026rsquo;s the fun in technology without an itch to scratch? I noticed an issue where some of those distros would sometimes get stuck during boot. I would see a message like this in the lxc console:\n[ *** ] A start job is running for /dev/loop4p2 (1w 15h 22min 46s / no limit) In particular, this happens with Red Hat–based distros like Fedora and Rocky Linux. As you can see, this VM has been stuck on this job for more than a week. Yes, I hear you—I should have better monitoring for my homelab. But like I said, these are usually ephemeral boxes that I use for testing.\nAfter digging around a bit, I found that some people were having a similar issue with brand‑new VMs, and it had to do with how the images were created. 
That wasn\u0026rsquo;t my case—these VMs were working fine for a while, and then, all of a sudden, this would happen.\nLooking more closely at that forum post, the last message states that after running yum update, the user would experience the same behavior. Experimenting with a few installations, I was able to zero in on the root cause. When there\u0026rsquo;s a kernel update, the generated /boot/loader/entries/\u0026lt;uuid\u0026gt;-\u0026lt;version\u0026gt;-\u0026lt;arch\u0026gt;.conf looks a little like this:\ntitle Rocky Linux (5.14.0-570.26.1.el9_6.x86_64) 9.6 (Blue Onyx) version 5.14.0-570.26.1.el9_6.x86_64 linux /boot/vmlinuz-5.14.0-570.26.1.el9_6.x86_64 initrd /boot/initramfs-5.14.0-570.26.1.el9_6.x86_64.img options root=/dev/loop4p2 ro console=tty1 console=ttyS0 grub_users $grub_users grub_arg --unrestricted grub_class rocky Comparing it to the previous version, the error is clear:\ntitle Rocky Linux (5.14.0-570.25.1.el9_6.x86_64) 9.6 (Blue Onyx) version 5.14.0-570.25.1.el9_6.x86_64 linux /boot/vmlinuz-5.14.0-570.25.1.el9_6.x86_64 initrd /boot/initramfs-5.14.0-570.25.1.el9_6.x86_64.img options root=UUID=11111111-2222-3333-4444-555555555555 ro console=tty1 console=ttyS0 grub_users $grub_users grub_arg --unrestricted grub_class rocky The fix is quite simple: you need to replace the root=/dev/loopXpY parameter in the options section with the correct UUID for your root device. In my case, this would be:\noptions root=UUID=11111111-2222-3333-4444-555555555555 ro console=tty1 console=ttyS0 You can use a working entry—usually the previous kernel\u0026rsquo;s—as a reference to update the value.\nMounting a ZVOL on the host system Here\u0026rsquo;s the tricky part: how do you update the value if the OS won\u0026rsquo;t boot? If you are using LXD with an OpenZFS storage pool, you need to mount the ZVOL on the host to make those changes. 
Follow these steps:\nMake sure to stop the machine before executing these steps:\nlxc stop rocky Set the volmode to full so it can be read and mounted by the host system:\nzfs set volmode=full tank/lxd/virtual-machines/rocky.block Mount the device at a temporary mount point:\nmount /dev/zvol/tank/lxd/virtual-machines/rocky.block-part2 /mnt Edit the entry file and update the root parameter:\nvi /mnt/boot/loader/entries/a53795788afc466894e149f71508881b-5.14.0-570.26.1.el9_6.x86_64.conf Unmount the filesystem and clean up:\numount /mnt zfs set volmode=none tank/lxd/virtual-machines/rocky.block That\u0026rsquo;s it—you should be able to boot the VM as usual now:\nlxc start rocky This is a quick workaround to get you up and running again. Ideally, whatever generates the entry file would set the correct root parameter, but I\u0026rsquo;m not sure where this is configured. I\u0026rsquo;ll update this post in the future if I find a definitive solution.\nFix After spending some time experimenting with kernel upgrades on both Fedora and Rocky, I managed to come up with a solution.\nFirst, check whether the filename prefix of the entries under /boot/loader/entries matches the output of systemd-machine-id-setup --print. I believe that, due to the way the image is generated, the machine ID is reset on the VM’s first boot. 
This causes an unrelated issue where, after a kernel update, the machine boots into the older kernel.\nFor the root filesystem issue, determine the UUID of the rootfs device using blkid:\nblkid Take note of the UUID value for the partition labeled as rootfs.\nRocky On Rocky and similar distros, you can use grubby to update the boot configuration.\ngrubby --update-kernel ALL --args \u0026#39;root=UUID=11111111-2222-3333-4444-555555555555\u0026#39; Fedora Fedora doesn\u0026rsquo;t ship with grubby by default, but you can achieve the same result by updating /etc/default/grub and adding the following setting:\n# Set device UUID GRUB_DEVICE_UUID=\u0026#34;11111111-2222-3333-4444-555555555555\u0026#34; ","permalink":"https://blog.juzam.pro/posts/2025-08-29/lxd-rhel-images/","summary":"RHEL-based LXD VMs (Rocky, Fedora) can hang at boot after kernel updates.\nThis post explains the root cause and shows how to fix the boot entry, and get the VM booting again.","title":"LXD: Rocky Linux VM gets stuck during boot"},{"content":"I\u0026rsquo;ve been working through the Hugging Face agents course, and I’m enjoying it quite a bit. Highly recommended! First, it’s rounding out my knowledge of LLMs, transformers, and AI in general. Second, it paints a very clear picture of what agentic AI is all about—while staying away from the hype. I’ll try to summarize here, but I really recommend checking out the full course.\nThis is not a formal definition, but I think the crucial feature of agents is the ability to use tools to interact with the environment. Instead of relying solely on the knowledge of the model itself, agents can search the web, access web pages, and use Unix commands like find, ls, and grep to help answer your questions. 
Another key characteristic is that this all happens in a loop, giving the agent the ability to course-correct when things don\u0026rsquo;t go as planned so it can still achieve its goal.\nThis is known as the ReAct (Reason + Act) loop, and it looks something like this:\nstateDiagram-v2 direction LR [*] --\u0026gt; Prompt Prompt --\u0026gt; Think Think --\u0026gt; Act Act --\u0026gt; Observe Observe --\u0026gt; Think Observe --\u0026gt; [*] It all starts with a prompt that gives the agent a task to complete. The agent \u0026ldquo;thinks\u0026rdquo; using an LLM to decide which tools should be used to complete the task. It acts by executing those tools and collecting the observations. It then decides whether the task is complete; if not, it goes back for another iteration until it solves the problem. Of course this is the happy path and many things can go wrong here. But conceptually that\u0026rsquo;s what the agent does.\nFirst agent using smolagents If that seems too abstract, let me walk you through a real example using a simple agent that uses a couple of tools to calculate the distance between two cities. Allow me to dive into the code.\nThis example is based on the smolagents framework from Hugging Face. It focuses on the CodeAgent class, which is a special kind of agent that writes and executes Python code to carry out its actions. 
I\u0026rsquo;ll be using uv to manage the dependencies for this tutorial, so if you don\u0026rsquo;t have it already, go ahead and install it following the installation steps first.\nCreate a new project and add the required dependencies:\nuv init ai-agent cd ai-agent uv add smolagents[openai,toolkit] Update your main.py file\nimport math from smolagents import CodeAgent, WebSearchTool, OpenAIServerModel, tool @tool def distance(point1: tuple[float, float], point2: tuple[float, float]) -\u0026gt; float: \u0026#34;\u0026#34;\u0026#34; Haversine formula to calculate distance between two points on Earth Args: point1: a tuple containing the latitude and longitude of point 1 in decimal format point2: a tuple containing the latitude and longitude of point 2 in decimal format Output: Returns the distance in km between the 2 points. \u0026#34;\u0026#34;\u0026#34; # Convert decimal degrees to radians lat1, lon1, lat2, lon2 = map(math.radians, [*point1, *point2]) # Haversine formula dlat = lat2 - lat1 dlon = lon2 - lon1 a = ( math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2 ) c = 2 * math.asin(math.sqrt(a)) # Radius of earth in kilometers r = 6371 return c * r model = OpenAIServerModel( api_base=\u0026#34;http://localhost:11434/v1\u0026#34;, model_id=\u0026#34;qwen3-coder:30b\u0026#34;, api_key=\u0026#34;ollama\u0026#34;, ) agent = CodeAgent(tools=[WebSearchTool(), distance], model=model) res = agent.run(\u0026#34;\u0026#34;\u0026#34; I want to calculate the distance between Toronto and New York City. - You should find the geo coordinates for Toronto - Find the geo coordinates for New York City - Calculate the distance between those 2 coordinates \u0026#34;\u0026#34;\u0026#34;) Let me break down the code:\nImports: bring in the smolagents classes and functions that are used to implement the agent. Tool definition: defines a tool that calculates the distance between two coordinates using the Haversine formula. 
Model API: I\u0026rsquo;m using Ollama to run this test on an RTX 4060 Ti with 16 GB of VRAM on a Debian 13 system. The qwen3-coder:30b model was the best performing option that would run on my hardware, but your mileage may vary. You can also use the InferenceClient and the Hugging Face API to run these tests. I had to find an alternative solution because I ran out of credits. Instantiate the agent: here the agent is created with all the necessary tools and also a reference to the model. Run the prompt and monitor the logs Execute the file by running:\nuv run main.py Example output:\n╭─────────────────────────────────── New run ────────────────────────────────────╮ │ │ │ I want to calculate the distance between Toronto and New York City. │ │ - You should find the geo coordinates for Toronto │ │ - Find the geo coordinates for New York City │ │ - Calculate the distance between those 2 coordinates │ │ │ ╰─ OpenAIServerModel - qwen3-coder:30b ──────────────────────────────────────────╯ The agent loads the model and executes the prompt. It tries to break the task in a series of steps that it can more easily reason about.\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ─ Executing parsed code: ─────────────────────────────────────────────────────── toronto_coords = web_search(query=\u0026#34;Toronto latitude longitude\u0026#34;) print(\u0026#34;Toronto coordinates:\u0026#34;, toronto_coords) ──────────────────────────────────────────────────────────────────────────────── Execution logs: Toronto coordinates: ## Search Results [Latitude and longitude of Toronto, Canada - GPS Coordinates](https://latlong.info/canada/ontario/toronto) What is the latitude and longitude code of Toronto ? The latitude of Toronto , Canada is 43.70011000, and the longitude is -79.41630000. Toronto is located at Canada country in the states place category with the gps coordinates of 43° 42\u0026#39; 0.396\u0026#39;\u0026#39; N and -79° 24\u0026#39; 58.68 E. 
Geographic coordinates are a way of specifying the location of a place on Earth, using a pair of numbers to represent a ... (more results) ... Out: None [Step 1: Duration 4.40 seconds| Input tokens: 2,110 | Output tokens: 77] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ─ Executing parsed code: ─────────────────────────────────────────────────────── nyc_coords = web_search(query=\u0026#34;New York City latitude longitude\u0026#34;) print(\u0026#34;New York City coordinates:\u0026#34;, nyc_coords) ──────────────────────────────────────────────────────────────────────────────── Execution logs: New York City coordinates: ## Search Results [New York City Latitude and Longitude Map - Maps of World](https://www.mapsofworld.com/lat_long/new-york-city.html) Latitude and longitude of New York City is 40.71278 N and -74.00594 E. Map showing the geographic coordinates of New York City , in United States. (more results) ... Out: None [Step 2: Duration 4.79 seconds| Input tokens: 5,342 | Output tokens: 127] On steps 1 and 2, the agent uses the web_search tool to crawl the web and look for the geographic coordinates for the two cities of interest. That information gets added to the agent memory and is used in the following steps.\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ─ Executing parsed code: ─────────────────────────────────────────────────────── toronto = (43.70011, -79.4163) nyc = (40.712775, -74.005973) distance = distance(toronto, nyc) print(\u0026#34;Distance between Toronto and New York City:\u0026#34;, distance, \u0026#34;km\u0026#34;) ──────────────────────────────────────────────────────────────────────────────── Code execution failed at line \u0026#39;distance = distance(toronto, nyc)\u0026#39; due to: InterpreterError: Cannot assign to name \u0026#39;distance\u0026#39;: doing this would erase the existing tool! 
[Step 3: Duration 7.77 seconds| Input tokens: 9,382 | Output tokens: 231] Armed with the coordinates from steps 1 and 2, the agent now tries to use the provided distance tool to perform the calculation. Pay close attention to the error message at the end of step 3 though. The agent made a mistake when naming the variable: it reused the name of the distance tool, and the Python interpreter rejected the assignment. This gets corrected on the next step. That speaks to the ability of the ReAct loop to self-correct and find the right answer, as long as you provide the correct feedback mechanisms. In this case the Python interpreter provided an error message.\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 4 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ─ Executing parsed code: ─────────────────────────────────────────────────────── toronto = (43.70011, -79.4163) nyc = (40.712775, -74.005973) distance_result = distance(toronto, nyc) print(\u0026#34;Distance between Toronto and New York City:\u0026#34;, distance_result, \u0026#34;km\u0026#34;) ──────────────────────────────────────────────────────────────────────────────── Execution logs: Distance between Toronto and New York City: 555.6065996863686 km Out: None [Step 4: Duration 9.93 seconds| Input tokens: 12,736 | Output tokens: 348] The correct code is now executed and the final answer is found.\n━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 5 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ─ Executing parsed code: ─────────────────────────────────────────────────────── final_answer(555.61) ──────────────────────────────────────────────────────────────────────────────── Final answer: 555.61 [Step 5: Duration 2.67 seconds| Input tokens: 16,369 | Output tokens: 396] Conclusion I was very skeptical of the current AI hype cycle, so I was mostly following it from a safe distance, for lack of a better term. But since I started experimenting with Claude Code and similar tools, something clicked. The multi-step approach combined with tool usage and verifications is really interesting. 
As a software engineer, this also resonates a lot with me because now I can shape and give tools to the LLM so it can complete tasks and verify its outputs.\nI haven\u0026rsquo;t completed the course yet and I am curious about other tools like LlamaIndex and how RAG plays a role in the agent\u0026rsquo;s workflow. The course also clearly favors code agents over JSON-based tool calling, but it does very little to compare the two approaches. I can only imagine the kinds of attacks a code-generating agent would be susceptible to, so I\u0026rsquo;m not convinced that it is a clear winner. Also, the fact that most other players like Anthropic prefer tool-calling agents gives me pause. I might do a follow-up article as I advance through the course, but that is it for now.\n","permalink":"https://blog.juzam.pro/posts/2025-08-25/hugging-face-agents/","summary":"\u003cp\u003eI\u0026rsquo;ve been working through the Hugging Face\n\u003ca href=\"https://huggingface.co/learn/agents-course/\"\u003eagents course\u003c/a\u003e, and I’m enjoying\nit quite a bit. Highly recommended! First, it’s rounding out my knowledge of\nLLMs, transformers, and AI in general. Second, it paints a very clear picture of\nwhat agentic AI is all about—while staying away from the hype. I’ll try to\nsummarize here, but I really recommend checking out the full course.\u003c/p\u003e\n\u003cp\u003eThis is not a formal definition, but I think the crucial feature of agents is\nthe ability to use tools to interact with the environment. Instead of relying\nsolely on the knowledge of the model itself, agents can search the web, access\nweb pages, and use Unix commands like find, ls, and grep to help answer your\nquestions. 
Another key characteristic is that this all happens in a loop, giving\nthe agent the ability to course correct in case things don\u0026rsquo;t go as planned in\norder to achieve its goal.\u003c/p\u003e","title":"Hugging Face agents"},{"content":"This post describes the technique I use to organize my configuration files (i.e., dotfiles).\nWhat are dotfiles Dotfiles are hidden configuration files on a Unix system that usually live in your home folder and are prefixed by a dot. Things like your .vimrc, .bashrc, or even folders like .config are examples of these configuration files.\nStoring files in git Since these are usually text files, storing them in a version control system makes a lot of sense. I used to have a GitHub repository where I would store all these files and then symlink them whenever I was setting up a new system. That process became tedious and error-prone very quickly, so I decided to create an install script to automate the whole process.\nThat workflow served me well for a while. The biggest challenge was that every time I made a change to the repository, for example adding a new file, I would have to pull the changes from the remote and also create new symlinks for the new files. At some point I made the install script idempotent so I could always safely rerun it and create any missing links. Still, I wasn\u0026rsquo;t fully satisfied.\nA new approach Around the same time I was growing tired of zsh and oh-my-zsh because of their performance and complexity. I started experimenting with fish and hit it off right away. I was also ditching vscode for the ridiculous amount of bloat it had accrued over the years and replacing it with neovim. So my base configuration was growing quite a bit as well.\nThere are plenty of tools to manage your configuration files with all sorts of features. 
But in my case, nothing could match the simplicity of using a bare git repository as described in this article.\nBare repository The idea is quite simple: you create a bare repository in a hidden folder and then set the work tree to the $HOME folder. The article also suggests creating an alias to deal with this as in:\nalias cfg=\u0026#39;/usr/bin/git --git-dir=$HOME/.cfg/ --work-tree=$HOME\u0026#39; In my case, since I\u0026rsquo;m using fish, I\u0026rsquo;ve created a function that wraps git so I still get the completion suggestions. I also expanded it to add some quality-of-life features like pulling the base repository, plus all submodules, and updating fish plugins.\nIt is also important to configure git so it doesn\u0026rsquo;t show untracked files. Otherwise, cfg status lists every other file in your home directory as untracked. This is done by:\ncfg config --local status.showUntrackedFiles no Installation To perform the installation on a fresh system like a VM or a new box, all I need to do is run the bootstrap script on the new system. Usually this is done by curl -fsSL $URL | sh. It makes sure to install all dependencies and perform the initialization steps required to make it all work. After that I have the cfg function available to keep things up to date.\nFinal thoughts I\u0026rsquo;ve been using this system for some time now and I\u0026rsquo;m very happy with it. It is super easy to bootstrap new machines and pull in all the configuration I am used to having. It is also pretty simple to keep everything consistent on the different systems I interact with. 
No more missing vim plugins or shell functions anywhere.\n","permalink":"https://blog.juzam.pro/posts/2025-08-22/dotfiles/","summary":"\u003cp\u003eThis post describes the technique I use to organize my configuration files\n(i.e., dotfiles).\u003c/p\u003e\n\u003ch2 id=\"what-are-dotfiles\"\u003eWhat are dotfiles\u003c/h2\u003e\n\u003cp\u003eDotfiles are hidden configuration files of a Unix system that live usually in\nyour home folder and are prefixed by a dot. Things like your \u003ccode\u003e.vimrc\u003c/code\u003e,\n\u003ccode\u003e.bashrc\u003c/code\u003e, or even folders like \u003ccode\u003e.config\u003c/code\u003e and so on are examples of these\nconfiguration files.\u003c/p\u003e\n\u003ch2 id=\"storing-files-in-git\"\u003eStoring files in git\u003c/h2\u003e\n\u003cp\u003eSince these are usually text files, using a source version control system makes\na lot of sense to store these files. I used to have a github\n\u003ca href=\"https://github.com/fabiojmendes/shell-goodies\"\u003erepository\u003c/a\u003e that I would store\nall these files and then symlink them whenever I was setting up a new system.\nThat process became tedious and error prone very quickly, so I decided to create\nan install script to automate the whole process.\u003c/p\u003e","title":"Managing my dotfiles"},{"content":"Hi, my name is Fabio. I\u0026rsquo;m passionate about all things related to hardware, software, programming languages, and distributed systems. As a home lab enthusiast, I also enjoy running my own infrastructure.\nProfessionally, I am a principal architect with a focus on complex distributed systems leveraging Kafka and a variety of different data stores like Redis, MySQL, OpenSearch, and Databricks. 
I use this space as a conduit to express my interests and creativity about embedded systems, hardware design, and low level programming that I don\u0026rsquo;t get to do in my day job.\nI was motivated to start blogging after a few insights from a couple of podcasts that I listened to: Oxide and Friends - Technical Blogging and Screaming in the Cloud with Simon Willison. That\u0026rsquo;s a muscle I would like to develop. Writing helps me document my journey as I learn new things.\n","permalink":"https://blog.juzam.pro/about/","summary":"\u003cp\u003eHi, my name is Fabio. I\u0026rsquo;m passionate about all things related to hardware,\nsoftware, programming languages, and distributed systems. As a home lab\nenthusiast, I also enjoy running my own infrastructure.\u003c/p\u003e\n\u003cp\u003eProfessionally, I am a principal architect with a focus on complex distributed\nsystems leveraging Kafka and a variety of different data stores like Redis,\nMySQL, OpenSearch, and Databricks. I use this space as a conduit to express my\ninterests and creativity about embedded systems, hardware design, and low level\nprogramming that I don\u0026rsquo;t get to do in my day job.\u003c/p\u003e","title":"About"}]