Explain Codes LogoExplain Codes Logo

Pandas query function not working with spaces in column names

python
pandas
query-function
dataframe
Alex KataevbyAlex Kataev·Nov 25, 2024
TLDR

To handle column names with spaces in pandas queries, use backticks:

df.query('`Column Name` == "value"')

This tells pandas to interpret "Column Name" as a single identifier, meaning your queries run flawlessly.

However, this method applies to Pandas 0.25 and later. If the version is earlier, refer to the alternative methods provided later.

Grasping the details

Working with pandas query() method and backticks for columns containing spaces is a promising enhancement. However, challenges may arise in certain scenarios that demand a workaround.

Alternative selections for earlier versions

For Pandas versions earlier than 0.25, you can't use backticks in query(). In such cases, employ the following:

  • Boolean indexing:

    df[df['Column Name'] == value] # old is gold effect
  • The .loc accessor:

    df.loc[df['Column Name'] == value] # navigator Avenger to the rescue

These functionalities are sturdy, although they might lack query()'s succinctness.

Non-identifier name challenges

Some column names might not follow Python identifier rules – starting with numbers or carrying special characters. Backticks will still fail you with a query(), so consider renaming the columns or use alternative methods as previously illustrated.

Community knowledge and documents references

The pandas community on GitHub often shares some light on these aspects, especially in GitHub thread #6508. Moreover, the Pandas documentation is a trove of continually updated practices.

Extra tips and tricks

Column name consistency

Maintain column names that are consistent and avoid spaces. This proactive step evades potential issues with querying and other data operations.

Handling multiple conditions

For multi-condition queries involving complex column names containing spaces, always wrap each identifier separately:

df.query('`First Column` > 10 and `Second Column` < 20') # like ordering a combo at a restaurant

Complex expressions and limitations

While the query() function is powerful, it has its own share of limitations. Complex expressions, especially those involving non-vectorized functions or operations, might require you to resort to traditional methods.