I'm new to Pandas and I'm doing something wrong. I'm running the code below to replace cells in the "data" column that don't contain the string "field" with empty strings. Instead of getting two columns (id, data) back, the whole id column disappears and every row starts with a delimiter. My guess is that this happens because, when I write the chunk back to CSV, I only write chunk_results, which doesn't include "id". The problem is that I don't know how to fix it.
import pandas as pd

in_csv = "out.csv"
out_csv = "out_1.csv"

reader = pd.read_csv(in_csv, chunksize=100, sep='|', header=None, names=['id', 'data'], encoding='utf-8')
for chunk_df in reader:
    chunk_results = chunk_df['data'].astype(str).str.replace('^((?!field).)*$','', regex=True)
    chunk_results.to_csv(out_csv, mode='a', sep='|', encoding='utf-8', header=None, index=False)
What I have tried:
I guessed that I needed to create chunk_id = chunk_df['id'] and concatenate it with chunk_results before calling to_csv, but that just gave me an error. Any idea what I'm doing wrong?
How to solve:
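You need to assign the results back to the chunk DataFrame's column. When you assign to chunk_results, you're setting it to a Series that holds only the data column, so the id column never reaches to_csv. Assign to chunk_df['data'] inside the loop instead, and write the whole chunk: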
chunk_df['data'] = chunk_df['data'].astype(str).str.replace('^((?!field).)*$','', regex=True)
chunk_df.to_csv(out_csv, mode='a', sep='|', encoding='utf-8', header=None, index=False)
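For reference, here is a minimal end-to-end sketch of the same idea. It is not the asker's exact code: the function name blank_cells_without and its parameters are hypothetical, and it swaps the negative-lookahead regex for str.contains plus Series.where, which should give the same result for single-line cells. It also assumes both columns can be read as plain strings (dtype=str, keep_default_na=False), and it overwrites the output on the first chunk so that a rerun doesn't append to stale rows.

```python
import pandas as pd


def blank_cells_without(in_csv, out_csv, needle="field", chunksize=100):
    """Copy in_csv to out_csv, blanking 'data' cells that lack `needle`.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    reader = pd.read_csv(
        in_csv,
        chunksize=chunksize,
        sep="|",
        header=None,
        names=["id", "data"],
        encoding="utf-8",
        dtype=str,              # assumption: keep ids and data as text
        keep_default_na=False,  # read empty cells as "" rather than NaN
    )
    for i, chunk_df in enumerate(reader):
        # True where the cell contains the literal substring.
        keep = chunk_df["data"].str.contains(needle, regex=False)
        # Assign back to the column so 'id' stays part of the chunk.
        chunk_df["data"] = chunk_df["data"].where(keep, "")
        chunk_df.to_csv(
            out_csv,
            mode="w" if i == 0 else "a",  # overwrite once, then append
            sep="|",
            encoding="utf-8",
            header=False,
            index=False,
        )
```

Calling blank_cells_without("out.csv", "out_1.csv") would then write both columns for every row, with the data cell left empty wherever "field" does not appear.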