How to Add Data to a Data Frame in Python

Python is a powerful and versatile programming language with a rich ecosystem of libraries. One of the most popular libraries for data manipulation and analysis is pandas. Pandas makes it easy to work with structured data, such as tables or spreadsheets, through the use of DataFrames. In this tutorial, we’ll explore how to add a column to a DataFrame in Python, a common task when working with data.

Getting Started with Pandas

Before diving into the process of adding a column to a DataFrame, it’s essential to have pandas installed on your system. You can install pandas using the following pip command:


pip install pandas

Once pandas is installed, you’ll need to import it in your Python script. You can do this with the following line of code:


import pandas as pd

By convention, pandas is usually imported as ‘pd’ to make it easier to work with.

Creating a DataFrame

To demonstrate how to add a column to a DataFrame, let’s first create a simple DataFrame with some sample data:


data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

This code snippet creates a DataFrame with three columns: ‘Name’, ‘Age’, and ‘City’. The DataFrame is printed and should look like this:

      Name  Age           City
0    Alice   24       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles
3    David   28        Chicago
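Because the next steps depend on how many rows the DataFrame has, it can help to confirm its shape and column names first. The snippet below is a minimal sketch that rebuilds the same sample data so it runs on its own; the variable names `n_rows`, `n_cols`, and `column_names` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago'],
})

# df.shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# df.columns holds the column labels in order
column_names = list(df.columns)

print(n_rows, n_cols)
print(column_names)
```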

Adding a Column to a DataFrame

Now that we have a DataFrame to work with, let’s add a new column to it. There are multiple ways to add a column to a DataFrame in Python, but we’ll focus on the most straightforward method using direct assignment.

Suppose we want to add a column called ‘Salary’ to our DataFrame. We can do this by assigning a new list of values to a new column name:


df['Salary'] = [60000, 80000, 55000, 75000]
print(df)

After executing this code, the DataFrame should now include the new ‘Salary’ column:

      Name  Age           City  Salary
0    Alice   24       New York   60000
1      Bob   30  San Francisco   80000
2  Charlie   22    Los Angeles   55000
3    David   28        Chicago   75000

Adding a column to a DataFrame in Python is simple and straightforward using pandas. Keep in mind that the length of the list assigned to the new column must match the number of rows in the DataFrame; otherwise, pandas raises a ValueError and the column is not added.
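To make the length rule concrete, here is a small sketch of what happens when the list is too short, along with two related options: assigning a single value to every row, and using `DataFrame.insert` to place the new column at a specific position. The ‘Country’ column and the `mismatch_error` variable are made up for illustration and are not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 28],
})

# Three values for a four-row DataFrame: pandas raises a ValueError
mismatch_error = None
try:
    df['Salary'] = [60000, 80000, 55000]
except ValueError as exc:
    mismatch_error = exc
    print(f"Assignment failed: {exc}")

# A single scalar is allowed; it is repeated for every row
df['Country'] = 'USA'  # hypothetical column, for illustration only

# insert() adds the column at a chosen position (here, index 1)
df.insert(1, 'Salary', [60000, 80000, 55000, 75000])
print(df)
```

Direct assignment always appends the new column at the end, so `insert` is mainly useful when column order matters.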

Adding a Derived Column

In some cases, you may want to add a column that is derived from existing columns in the DataFrame. For example, let’s add a column called ‘Tax’ that calculates the tax based on the ‘Salary’ column:


tax_rate = 0.3
df['Tax'] = df['Salary'] * tax_rate
print(df)

After executing this code, the DataFrame should now include the new ‘Tax’ column:

      Name  Age           City  Salary      Tax
0    Alice   24       New York   60000  18000.0
1      Bob   30  San Francisco   80000  24000.0
2  Charlie   22    Los Angeles   55000  16500.0
3    David   28        Chicago   75000  22500.0

By using operations on existing columns, you can quickly create new columns derived from your existing data.
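If you would rather not modify the original DataFrame, pandas also provides `DataFrame.assign`, which returns a new DataFrame with the extra columns. The sketch below assumes the same sample salaries; the ‘NetSalary’ column and the `result` variable are hypothetical additions to show that a later column can build on an earlier one.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [60000, 80000, 55000, 75000],
})

tax_rate = 0.3

# assign() returns a new DataFrame; df itself is left unchanged.
# Each lambda receives the DataFrame built so far, so 'NetSalary'
# can refer to the 'Tax' column created just before it.
result = df.assign(
    Tax=lambda d: d['Salary'] * tax_rate,
    NetSalary=lambda d: d['Salary'] - d['Tax'],  # hypothetical column
)

print(result)
```

Direct assignment is shorter for one-off changes, while `assign` can be convenient when you want to keep the original data intact or chain several steps together.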

Now that you’ve learned how to add a column to a DataFrame in Python, you can continue exploring other powerful features of pandas.

Keep experimenting with pandas and Python, and you’ll be able to handle even more complex data manipulation tasks with ease. Happy coding!
