Blog

Using Jupyter from VS Code or a Standalone Project

VS Code

To connect a uv-managed project to a Jupyter notebook within VS Code, VS Code creates a kernel from the project environment, which requires ipykernel to be present in that environment.

# Create a project.
uv init project

# Move into the project directory.
cd project

# Add ipykernel as a dev dependency.
uv add --dev ipykernel

# Open the project in VS Code.
code .

Press Ctrl+Shift+P and select "Create: New Jupyter Notebook". This creates an .ipynb file; then select the project's Python kernel from the kernel picker at the top right of the notebook.
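
To confirm the notebook is using the project environment, you can run a quick check in a notebook cell (a minimal sketch; the exact path depends on where you created the project):

# Run in a notebook cell: the interpreter should live inside the project's .venv.
import sys
print(sys.executable)  # e.g. ...\project\.venv\Scripts\python.exe on Windows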

Standalone

If you want a standalone setup that does not need a pyproject.toml or uv.lock, run the commands below:

uv venv --seed
uv pip install pydantic
uv pip install jupyterlab
.venv\Scripts\jupyter lab

This launches JupyterLab at http://localhost:8888/lab. You can install additional packages from a notebook cell via !uv pip install, or even !pip install. To open your existing notebooks again, rerun .venv\Scripts\jupyter lab.

Jupyter Notebook


Create a Python Project Using uv and Manage Dependencies

Create a Python Project

uv init pyprojects

This command automatically creates the following project structure inside the pyprojects folder:

.git
.gitignore
.python-version
main.py
pyproject.toml
README.md

pyproject.toml Content

[project]
name = "pyprojects"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = []

Run the Entry Script

uv run main.py
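
At the time of writing, uv init generates a minimal main.py similar to the following (the exact template may vary between uv versions):

def main():
    print("Hello from pyprojects!")


if __name__ == "__main__":
    main()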

Install Dependencies

Add the latest requests package to pyproject.toml:

uv add requests

This updates the file as follows:

dependencies = [
    "requests>=2.32.5",
]
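
Once requests is added, you can use it from the project code and run it with uv run. A minimal sketch, replacing the body of the generated main.py (the URL is just an example):

# main.py: exercise the newly added requests dependency.
import requests


def main():
    response = requests.get("https://httpbin.org/get", timeout=10)
    print(response.status_code)


if __name__ == "__main__":
    main()

Running uv run main.py again resolves and installs requests into the project environment before executing the script.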

Upgrade a package:

uv add --upgrade requests

Remove a package:

uv remove requests

Add Development Dependencies

uv add --dev pytest

This creates a [dependency-groups] section in pyproject.toml:

[dependency-groups]
dev = [
    "pytest>=8.4.2",
]
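
With pytest in the dev group, you can add a simple test file and run it through uv. A minimal sketch (test_main.py is a hypothetical file name):

# test_main.py: a trivial pytest example for the project.
def add(a: int, b: int) -> int:
    return a + b


def test_add():
    assert add(2, 3) == 5

Run it with uv run pytest; uv installs the dev dependency group into the environment before invoking pytest.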

Add Dependencies from requirements.txt

uv add -r requirements.txt

List Packages in the Virtual Environment

uv pip list

Lock and Sync

Generate a lock file:

uv lock

Synchronize the environment with the lock file:

uv sync

Python Package and Project Manager Tool – uv

uv is a single tool that replaces pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more.

I’ve mostly used pip, pipenv, and virtualenv in the past, but uv combines most of these tasks into one tool. It’s faster, easier to learn, and highly efficient.

You can find installation instructions and documentation here: https://docs.astral.sh/uv/


Why uv?

  • uv is a Python package and project manager that integrates multiple functionalities into one tool.
  • It enables fast dependency installation, virtual environment management, Python version management, and project initialization, all designed to boost productivity.
  • uv can build and publish Python packages to repositories like PyPI, streamlining the process from development to distribution.
  • It automatically creates and manages virtual environments, ensuring clean and isolated project dependencies.

Usage

Installation

You can install uv from the command line:

  • On Windows, use PowerShell.
  • On macOS/Linux, use curl.
  • You can also install it using pipx:
pipx install uv

Refer to the official documentation for detailed installation options.


Verify installation

uv --version

Upgrade to the latest version

uv self update
# or, if installed via pipx
pipx upgrade uv

Install multiple Python versions

uv python install 3.10 3.11 3.12

Create a virtual environment with a specific version

uv venv --python 3.12.0

Pin a specific Python version in the current directory

uv python pin 3.11

View available and installed Python versions

uv python list

Remove the uv, uvx, and uvw binaries

For Windows,

rm $HOME\.local\bin\uv.exe
rm $HOME\.local\bin\uvx.exe
rm $HOME\.local\bin\uvw.exe

For macOS,

rm ~/.local/bin/uv ~/.local/bin/uvx


Configure MCP Server in Visual Studio and VS Code

This article explains how to configure MCP servers in Visual Studio and Visual Studio Code. The Model Context Protocol (MCP) enables servers to expose tools and services that MCP clients — such as GitHub Copilot or Claude — can consume.

This guide demonstrates how to configure the MCP Database server, which queries a SQL Server database and returns results to a chat agent. The agent can then interact with a large language model (LLM) to display results in the chat window or perform actions such as generating SQL or saving query results to a CSV file.

The MCP database server used in this guide is a Node package and can be installed globally with npm:

npm install -g @executeautomation/database-server

VS Code configuration

  1. Open VS Code and press Ctrl+Shift+P. Run MCP: Add Server….
  2. Select the MCP server type. Choose "Command (stdio)" from the list (other types include http, npm, pip, and docker).
  3. For the command, enter: npx.
  4. For Server Id, enter: advworks-db.
  5. Choose the installation scope: Global or Workspace.
    • Global — available to all workspaces and runs locally. This creates mcp.json at C:\Users\<user>\AppData\Roaming\Code\User\mcp.json.
    • Workspace — available only in the current workspace and runs locally. This creates .vscode\mcp.json inside the workspace.

Example mcp.json configuration

Replace hostname\sqlexpress, database name, and credentials with your own values.

{
  "servers": {
    "advworks-db": {
      "type": "stdio",
      "command": "npx",
      "args": [
        "-y",
        "@executeautomation/database-server",
        "--sqlserver",
        "--server", "hostname\\sqlexpress",
        "--database", "AdventureWorks",
        "--user", "sa",
        "--password", "testpwd"
      ]
    }
  },
  "inputs": []
}

Visual Studio configuration

  1. Open the GitHub Copilot chat window and choose an agent (for example, GPT-4.1 or Claude).

GitHub Copilot - Setup MCP Server

  2. Click the tool icon in the chat window and choose "Select tools (+)".
  3. In the "Configure MCP Server" dialog, add the server configuration.

GitHub Copilot - Setup MCP Server

You can configure the MCP server at either the Global level (available to all solutions) or the Solution level (available only to the current solution). The JSON format is the same as the example shown in the VS Code section.

Example usage

In the GitHub Copilot chat window, select an agent and an LLM model, then ask a query such as:

"List the tables in the Person schema."

The agent will interact with the MCP database server to run the query and return the list of tables in the Person schema in the chat window.


LLM Models and Parameters

Models

Many LLMs are available on the market, which can make it overwhelming to choose the right one. In this article, we will look at the main categories of LLMs, their functionalities, and the important parameters for controlling and fine-tuning them. LLMs revolve around the Transformer architecture, which powers many generative AI applications like chat, summarization, and translation. This architecture has three important components:

  1. Tokenizer: Converts text into tokens.
  2. Embedding: Turns tokens into vectors representing meaning and position.
  3. Transformer: Uses encoders, decoders, and self-attention to process and generate text.
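
A small sketch with the Hugging Face Transformers library illustrates the tokenizer step (assuming the transformers package is installed; gpt2 is used here only because it is small):

# Tokenize text and inspect the resulting tokens and ids.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Transformers power generative AI."
tokens = tokenizer.tokenize(text)   # sub-word tokens
ids = tokenizer.encode(text)        # token ids passed to the embedding layer
print(tokens)
print(ids)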


Transformer Architectures

The Transformer supports three main architectures:

  1. Encoder-only: Best for understanding and classifying text (e.g., BERT, RoBERTa, DistilBERT).
  2. Decoder-only: Best for generating new text from prompts (e.g., GPT, Llama, Mistral).
  3. Encoder-decoder: Versatile for tasks like translation, summarization, and Q&A (e.g., T5, BART).
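
As a rough illustration, each architecture maps to a different Transformers pipeline task. A sketch assuming the transformers package and small public checkpoints (the model names are examples, not recommendations):

from transformers import pipeline

# Encoder-only: understanding tasks such as masked-word prediction.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")
print(fill_mask("Paris is the [MASK] of France.")[0]["token_str"])

# Decoder-only: free-form text generation from a prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("Once upon a time", max_new_tokens=20)[0]["generated_text"])

# Encoder-decoder: sequence-to-sequence tasks such as translation.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("How are you?")[0]["translation_text"])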

OpenAI APIs make it easy to use powerful language models without managing infrastructure. In contrast, the Hugging Face Transformers library allows for more customization and direct model control, including fine-tuning. Both are used for integrating AI into applications.

Model Parameters

Below are the key parameters that influence text generation with Transformers:

  • Prompt: Sets the context for the model’s output. In chat models, prompts are split into system (instructions/role), user (questions/commands), and assistant (model’s responses).
  • max_new_tokens: Limits the number of tokens generated in the output, controlling the output length.
  • max_length: Limits the total length of the input plus the output. Use either max_new_tokens or max_length, not both.
  • temperature: Controls randomness. Higher values (closer to 1) make outputs more creative and varied; lower values make them more predictable and focused.
  • do_sample: If false, the model always picks the most likely next token (deterministic). If true, it samples from the probability distribution, resulting in more creative output.
  • top_k: When sampling, this restricts choices to the K most likely tokens. Lower values lead to more focused output, while higher values create more diverse results.
  • top_p: Samples from the smallest set of tokens whose cumulative probability exceeds a threshold, which allows for more dynamic and varied outputs.

Other parameters (like beam search, repetition penalties, etc.) offer further control. More details can be found in the Hugging Face Transformers documentation. These parameters let you balance creativity, coherence, and length in generated text, depending on your application’s needs.
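
A minimal sketch with the Transformers generate API shows how these parameters combine (gpt2 is used only as a small example model; the values are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Write a short tagline for a coffee shop:"
inputs = tokenizer(prompt, return_tensors="pt")

# temperature/top_k/top_p only take effect when do_sample=True.
output_ids = model.generate(
    **inputs,
    max_new_tokens=30,   # cap the number of newly generated tokens
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.8,     # higher = more random
    top_k=50,            # restrict sampling to the 50 most likely tokens
    top_p=0.95,          # nucleus sampling threshold
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))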

Categories of Models

Code Generation Models

Code generation tools like GitHub Copilot and Cursor are powered by Transformer models trained on vast code datasets from sources like GitHub, Stack Overflow, and documentation. Many LLMs use reinforcement learning with human feedback (RLHF) to align outputs with professional coding standards. There are two main categories of code generation models:

  • General-purpose LLMs (e.g., GPT-4, Claude): These can generate both natural language and code, assisting with code completion, error correction, and test creation. They are best for prototyping and tasks involving both code and natural language.
  • Specialized code models (e.g., CodeLlama, CodeT5, CodeBERT): Trained specifically on code, these models excel at advanced tasks such as code search, clone detection, bug detection, and code translation. They are preferred for high-precision or domain-specific code tasks.
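
As a quick sketch, a specialized code model can be driven through the same Transformers pipeline API (the checkpoint below is a small public code model chosen for illustration; larger models such as CodeLlama follow the same pattern):

from transformers import pipeline

# Complete a function signature with a code-generation model.
code_gen = pipeline("text-generation", model="Salesforce/codegen-350M-mono")
completion = code_gen("def fibonacci(n):", max_new_tokens=40)[0]["generated_text"]
print(completion)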

Image Generation Models

DALL·E and Stable Diffusion create images from natural language prompts using diffusion, a process where noise is gradually removed to form a coherent image.

  • DALL·E (by OpenAI):
    • Key parameters:
      • quality: HD or standard
      • size: Output image dimensions
      • style: vivid (hyperrealistic) or natural (authentic look)
  • Stable Diffusion (open source):
    • Key parameters:
      • seed: Ensures reproducibility of images.
      • negative prompt: Excludes unwanted elements.
      • width/height: Image dimensions (optimized for 512x512).
      • steps: Number of denoising steps (more steps = more detail).
      • scheduler: Controls the denoising rhythm, affecting style and quality.
      • CFG scale: Balances creativity and prompt adherence.
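
The sketch below exercises these parameters: the DALL·E call assumes the openai Python SDK and an API key in the environment, and the Stable Diffusion call assumes the diffusers library with a public checkpoint; all model names and values are examples only.

# DALL·E via the OpenAI SDK (reads OPENAI_API_KEY from the environment).
from openai import OpenAI

client = OpenAI()
image = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor painting of a lighthouse at dawn",
    size="1024x1024",   # output dimensions
    quality="hd",       # "hd" or "standard"
    style="vivid",      # "vivid" or "natural"
)
print(image.data[0].url)

# Stable Diffusion via diffusers (runs locally; a GPU is strongly recommended).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
generator = torch.Generator("cuda").manual_seed(42)  # seed for reproducibility
result = pipe(
    "A watercolor painting of a lighthouse at dawn",
    negative_prompt="blurry, low quality",  # elements to exclude
    width=512, height=512,                  # native resolution for this model
    num_inference_steps=30,                 # denoising steps
    guidance_scale=7.5,                     # CFG scale
    generator=generator,
)
result.images[0].save("lighthouse.png")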

Text-to-Speech (TTS) Models

TTS converts written text into natural-sounding audio. The TTS pipeline involves several steps:

  1. Text is cleaned and normalized.
  2. Words are mapped to phonemes, and prosody (rhythm, stress, intonation) is analyzed.
  3. Acoustic modeling generates a mel spectrogram.
  4. A neural vocoder (like WaveNet or HiFi-GAN) converts the spectrogram to raw audio.
  5. Optional post-processing can be applied.

Two popular TTS options are:

  • OpenAI’s TTS API: Simple and high-quality.
    • Key Parameters:
      • voice: Choose from preset voices (e.g., Nova, Alloy).
      • speed: Controls speech rate.
      • instructions: Guide delivery style (e.g., "excited," "positive").
  • Chatterbox (Resemble AI): Open source and offers more control.
    • Key Parameters:
      • exaggeration: Controls expressiveness.
      • CFG weight: Balances faithfulness to text.
      • audio prompt path: Reference audio for voice adaptation.
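
A minimal sketch of the OpenAI TTS API (assumes the openai package and an API key; the instructions parameter for delivery style is only supported by newer TTS models, so it is omitted here):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",   # preset voice
    speed=1.0,       # speech rate
    input="Welcome to the demo. Let's get started!",
)

# Save the returned audio bytes to an MP3 file.
with open("speech.mp3", "wb") as f:
    f.write(response.content)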

Multimodal Models (MLLMs)

Multimodal models (MLLMs) can process and understand multiple data types (text, images, audio) within a single system. Training is a two-phase process: aligning new modalities and then fine-tuning the full system.

  • Unified Embedding Approach: Modalities are encoded into the same embedding space as text. This is simple but can lose detail. Used in models like Pix2Struct and Fuyu.
  • Cross-Attention Approach: Image encodings are connected to the LLM via cross-attention layers, allowing selective focus. This is more complex but more powerful. Used in models like Flamingo and LLaVA.

Fine-tuning Models

Fine-tuning large models is resource-intensive. Parameter-Efficient Fine-Tuning (PEFT) techniques adapt models by updating only a small number of parameters. The benefits include:

  • Much lower memory and compute requirements.
  • Tiny storage for task-specific parameters.
  • Preservation of original model knowledge.
  • LoRA (Low-Rank Adaptation): Adds small, trainable matrices to each layer.
  • QLoRA (Quantized Low-rank Adaptation): Combines LoRA with quantization to fine-tune very large models on a single GPU.
  • Prompt Tuning: Learns a small set of trainable embeddings (“soft prompts”) prepended to the input.
  • Prefix Tuning: Injects trainable vectors (“prefixes”) into the attention mechanism of each layer.
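
A minimal LoRA sketch using the Hugging Face peft library (assumes transformers and peft are installed; gpt2 and the hyperparameter values are placeholders):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA attaches small low-rank adapter matrices instead of updating all weights.
lora_config = LoraConfig(
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection layers in GPT-2
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable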

Costs

Model choice is only part of the challenge; cost is equally important.

  • Hosted APIs (e.g., OpenAI, Anthropic, Google):
    • Easy setup with no infrastructure to manage.
    • Pricing is usually per token (text) or per request (image/audio).
    • Use pricing calculators and token-counting libraries (like tiktoken) to estimate costs.
  • Self-hosting open-source models (e.g., Llama, Mistral):
    • More control and flexibility, but you manage the infrastructure.
    • Costs are based on GPU rental (hourly billing).
    • Use tools like Hugging Face’s model memory estimator to plan VRAM requirements.
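
For hosted APIs, a rough cost estimate can be derived from the token count. A sketch using tiktoken (the per-token price is a placeholder; check the provider's current pricing page):

import tiktoken

PRICE_PER_1K_INPUT_TOKENS = 0.005  # placeholder USD value; check current pricing

encoding = tiktoken.encoding_for_model("gpt-4o")
prompt = "Summarize the following document in three bullet points..."
num_tokens = len(encoding.encode(prompt))

estimated_cost = num_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
print(f"{num_tokens} input tokens, estimated ${estimated_cost:.6f}")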

2025-09-06 Introduction

👦About Me.

I’m Yuvaraj Kesavan, a hands-on engineering leader with over two decades of experience building scalable, distributed applications across healthcare, RCM, and enterprise domains. I specialize in architecting cloud-native solutions on Azure, modernizing monolithic systems into microservices, and designing secure, interoperable APIs. With a strong background in the ASC and hospital domains, I’ve led teams to deliver high-performance applications handling millions of transactions per day. I’m also passionate about mentoring teams, driving full product life cycle delivery, and fostering cross-functional collaboration to transform complex challenges into successful outcomes. 👍👍👍🎉