Create Pandas DataFrame from a string

python

pandas

dataframe

data-cleaning

byAnton Shumikhin·Oct 20, 2024

Turn a multiline string into a DataFrame in the blink of an eye using pd.read_csv(StringIO(data)):

import pandas as pd
from io import StringIO

data = "A,B,C\n1,2,3\n4,5,6"  # Three men in a boat (to say nothing of the dog)

df = pd.read_csv(StringIO(data))
print(df)

To display your DataFrame:

   A  B  C
0  1  2  3
1  4  5  6

But what if you're dealing with different separators? Specify the correct sep in pd.read_csv() to keep your data in check:

df_semicolon = pd.read_csv(StringIO(data), sep=';')  # Just like splitting a pizza

Embracing StringIO and read_csv

When you're grappling with strings in your program's memory, io.StringIO provides a virtuous object that pd.read_csv() happily chomps down. This allows for seamless transformation of a string into a Pandas DataFrame. Now, isn't that a nifty trick?

Wrestling with separators

Your data might be separated by commas, semicolons, or even something outrageous like a tilde. For semicolon-separated, you'll need to be specific with sep=';':

string_data = "Column1;Column2;Column3\n7;8;9\n10;11;12"
df_semicolon = pd.read_csv(StringIO(string_data), sep=';')  # Semicolon: the silent hero 
print(df_semicolon)

Remember, clean data is happy data. Look out for extra or missing delimiters and ensure your DataFrame fits like a glove.

Tackling irregular data formats

Sometimes your data just won't sit still, with spaces around separators or inconsistent patterns. Using regex in the 'sep' parameter is a big help:

string_data = "Column1 ; Column2 ; Column3\n7 ; 8 ; 9\n10 ; 11 ; 12"
df_semicolon = pd.read_csv(StringIO(string_data), sep=r'\s*;\s*')  # Regular Expressions: the ultimate lifesaver
print(df_semicolon)

Piping data into the DataFrame

When dealing with pipe-separated data, don't fall for the allure of pd.read_fwf. pd.read_csv is the true champ with its versatile keyword arguments ('kwargs'):

pipe_data = "A|B|C\n1|2|3\n4|5|6"
df_pipe = pd.read_csv(StringIO(pipe_data), sep='|')  # Piping hot data coming right up!
print(df_pipe)

More useful twists and turns

Need to load data from clipboard to DataFrame? Here's a shortcut:

pd.read_clipboard(sep=';') lands you a DataFrame from copied content for testing, but you shouldn't rely on it in production.

PyCharm users can tap into a great visualization tool:

The "Pipe Table Formatter" plugin helps you visualize your data, so you can spot oddities and fix them before loading the data into a DataFrame.

Because clean data forms a clean DataFrame:

You should always review and clean up your data before creating a DataFrame.

And the cherry on top:

pd.read_csv(), besides being easy to use, automatically extracts column headers and preserves data types. This is a bonanza for efficiency and accuracy.

explain-codes / Python / Create Pandas DataFrame from a string

Linked

Load data from txt with pandas



Split (explode) pandas dataframe string entry to separate rows



Pandas read in table without headers



How do I read a large csv file with pandas?



Pandas read_csv from url

