
Pandas - append

Appending to a DataFrame makes a copy every time, so there's a big performance hit when used iteratively. Collecting rows as plain dicts and building the DataFrame once at the end is fastest.

It's been a long time, but I faced the same problem too, and found a lot of interesting answers here, so I was confused about which method to use. When adding a lot of rows to a DataFrame I was interested in speed, so I tried the 3 most popular methods and checked their speed.

SPEED PERFORMANCE 

  1. Using .append (NPE's answer)
  2. Using .loc (fred's answer and FooBar's answer)
  3. Using dict and creating the DataFrame at the end (ShikharDua's answer)

Results (in secs): 

Adding     1000 rows   5000 rows   10000 rows
.append    1.04        4.84        9.56
.loc       1.16        5.59        11.50
dict       0.23        0.26        0.34

From <https://stackoverflow.com/questions/10715965/add-one-row-in-a-pandas-dataframe>
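For a concrete picture of why dict wins, here's a minimal sketch of ShikharDua's pattern (collect plain dicts in a list, build the DataFrame once at the end) timed against the row-by-row .append loop. The column names and values are made up for illustration, and the .append variant assumes a pandas version before 2.0, where DataFrame.append was removed:

import timeit

import pandas as pd

def via_dicts(n):
    # no DataFrame copies inside the loop: gather plain dicts,
    # then build the frame once at the end
    rows = []
    for i in range(n):
        rows.append({'a': i, 'b': 2 * i})
    return pd.DataFrame(rows)

def via_append(n):
    # each .append copies every existing row, so the cost grows per iteration
    df = pd.DataFrame(columns=['a', 'b'])
    for i in range(n):
        df = df.append({'a': i, 'b': 2 * i}, ignore_index=True)
    return df

print('dict:   ', timeit.timeit(lambda: via_dicts(1000), number=1))
print('.append:', timeit.timeit(lambda: via_append(1000), number=1))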

It is worth noting that concat() (and therefore append()) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.

frames = [ process_your_file(f) for f in files ] 
result = pd.concat(frames) 

From <https://pandas.pydata.org/pandas-docs/stable/merging.html>

Expressed as a for loop:

import glob
import pandas as pd

# 'here' is the base directory, defined elsewhere in these notes.
# Note: .append returns a new DataFrame rather than modifying in place,
# so the result must be reassigned each pass (a full copy every iteration).
csv_data = pd.DataFrame()
for csv in glob.iglob(here + '/logs/**/*.csv', recursive=True):
    csv_data = csv_data.append(pd.read_csv(csv,
                                           error_bad_lines=False,
                                           warn_bad_lines=True,
                                           index_col=False),
                               ignore_index=True)  # .append kwarg, not read_csv's

As list comprehension:

csv_data = [
    pd.read_csv(csv, error_bad_lines=False, warn_bad_lines=True, index_col=False)
    for csv in glob.iglob(here + '/logs/**/*.csv', recursive=True)
]
merged_csv = pd.concat(csv_data, ignore_index=True)

And the Spyder Python IDE is awesome. I'm a convert. :) The winning killer feature over the many others I've used to date: the interactive live Variable Explorer.
