Writing Efficient Python Code
1. Foundations for efficiencies
unpacking: expands a single object that contains multiple objects into its individual elements (uses *)
# Create a range object that goes from 0 to 5
nums = range(0,6)
print(type(nums))
# Convert nums to a list
nums_list = list(nums)
print(nums_list)
# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,13,2)]
print(nums_list2)
enumerate practice
# Rewrite the for loop to use enumerate
indexed_names = []
for i, name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name)
print(indexed_names)
# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp)
# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names, 1)]
print(indexed_names_unpack)
Applying str.upper to names with a map object
# Use map to apply str.upper to each element in names
names_map = map(str.upper, names )
# Print the type of the names_map
print(type(names_map))
# Unpack names_map into a list
names_uppercase = [*names_map]
# Print the list created above
print(names_uppercase)
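One detail worth noting about the map object above: it is a lazy iterator, so it can only be unpacked once. A minimal sketch, assuming a made-up sample_names list (not the exercise's data):

```python
# Hypothetical sample data; the exercise's own names list is not shown here
sample_names = ['Jerry', 'Kramer', 'Elaine']

# map returns a lazy iterator; nothing is computed until it is consumed
upper_map = map(str.upper, sample_names)

# The first unpacking consumes the iterator
first_pass = [*upper_map]
print(first_pass)    # ['JERRY', 'KRAMER', 'ELAINE']

# The iterator is now exhausted, so a second unpacking yields nothing
second_pass = [*upper_map]
print(second_pass)   # []
```

If the results are needed more than once, keeping the unpacked list (as names_uppercase does) avoids this problem.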
numpy practice
numpy: a NumPy array holds a single, homogeneous data type (which reduces memory consumption) and supports broadcasting, so an operation can be applied to every element at once
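The two properties above can be sketched roughly as follows. The demo array and its values are made up for illustration and are not the exercise's nums array:

```python
import numpy as np

# Hypothetical 2-D array; the values are illustrative only
demo = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Mixed int/float input is coerced to one dtype (homogeneous storage)
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)       # float64

# Broadcasting: the scalar operation is applied to every element, no loop
doubled = demo * 2
print(doubled)

# Comparisons are element-wise too, which enables boolean indexing
print(demo[demo > 3])    # [4 5 6]

# A 1-D array is broadcast across each row of the 2-D array
shifted = demo + np.array([10, 20, 30])
print(shifted)
```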
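import numpy as np
# Assumption: nums here is a 2-D NumPy array preloaded by the exercise (not the range object from earlier)
# Print second row of nums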
print(nums[1,:])
# Print all elements of nums that are greater than six
print(nums[nums > 6])
# Double every element of nums
nums_dbl = nums * 2
print(nums_dbl)
# Replace the third column of nums
nums[:,2] = nums[:,2] + 1
print(nums)
# Create a list of arrival times
arrival_times = [*range(10,60,10)]
# Convert arrival_times to an array and update the times
arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3
# Use list comprehension and enumerate to pair guests to new times
guest_arrivals = [(names[i],time) for i,time in enumerate(new_times)]
# Map the welcome_guest function to each (guest,time) pair
welcome_map = map(welcome_guest, guest_arrivals)
guest_welcomes = [*welcome_map]
print(*guest_welcomes, sep='\n')
2. Timing and profiling code
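How to time Python code
Runtime can be measured with %timeit, an IPython magic command (not a Python built-in function).
-r option: number of runs
-n option: number of loops per run
e.g. %timeit -r5 -n25 set(heroes)
The literal [] is faster than calling list().
The total time reported by %lprun (from the line_profiler package) may not match the time reported by %timeit.
%timeit executes the code over multiple runs and loops to compute a mean and standard deviation, so the two measurements will not agree exactly.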
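%timeit only works inside IPython or Jupyter. As a rough equivalent for plain Python scripts, the standard-library timeit module can be used; this is a sketch of the list() versus [] comparison, where repeat plays the role of -r and number the role of -n. The loop counts are arbitrary choices:

```python
import timeit

# repeat ~ -r (number of runs), number ~ -n (loops per run)
literal_times = timeit.repeat("[]", repeat=5, number=100_000)
call_times = timeit.repeat("list()", repeat=5, number=100_000)

# Each entry is the total time in seconds for one run of `number` loops
print("[]     best of 5:", min(literal_times))
print("list() best of 5:", min(call_times))
```

The exact numbers vary by machine and run, so only the relative difference is meaningful.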
3. Gaining efficiencies
zip practice
# Combine five items from names and three items from primary_types
differing_lengths = [*zip(names[:5], primary_types[:3])]
print(*differing_lengths, sep='\n')
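The zip call above receives inputs of different lengths. A small sketch of what happens in that case, using made-up sample lists in place of the exercise's data:

```python
from itertools import zip_longest

# Hypothetical sample data standing in for the exercise's lists
sample_names = ['Abomasnow', 'Abra', 'Absol', 'Accelgor', 'Aerodactyl']
sample_types = ['Grass', 'Psychic', 'Dark']

# zip stops as soon as the shortest input is exhausted
pairs = [*zip(sample_names, sample_types)]
print(pairs)
# [('Abomasnow', 'Grass'), ('Abra', 'Psychic'), ('Absol', 'Dark')]

# zip_longest keeps every item and pads the missing values with fillvalue
padded = [*zip_longest(sample_names, sample_types, fillvalue=None)]
print(len(padded))   # 5
print(padded[-1])    # ('Aerodactyl', None)
```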
Counter practice
# Import Counter
from collections import Counter
# Collect the count of primary types
type_count = Counter(primary_types)
print(type_count, '\n')
# Collect the count of generations
gen_count = Counter(generations)
print(gen_count, '\n')
# Use list comprehension to get each Pokémon's starting letter
starting_letters = [name[:1] for name in names]
# Collect the count of Pokémon for each starting_letter
starting_letters_count = Counter(starting_letters)
print(starting_letters_count)
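For reference, a minimal self-contained Counter sketch on a made-up list (the sample_types values are illustrative, not the exercise's data):

```python
from collections import Counter

# Hypothetical sample data
sample_types = ['Grass', 'Water', 'Fire', 'Water', 'Grass', 'Water']

# Counter tallies each distinct element in a single pass
counts = Counter(sample_types)
print(counts)                  # Counter({'Water': 3, 'Grass': 2, 'Fire': 1})

# most_common(n) returns the n most frequent (element, count) pairs
print(counts.most_common(1))   # [('Water', 3)]

# Missing keys return 0 instead of raising KeyError
print(counts['Electric'])      # 0
```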
combinations practice
# Import combinations from itertools
from itertools import combinations
# Create a combination object with pairs of Pokémon
combos_obj = combinations(pokemon, 2)
print(type(combos_obj), '\n')
# Convert combos_obj to a list by unpacking
combos_2 = [*combos_obj]
print(combos_2, '\n')
# Collect all possible combinations of 4 Pokémon directly into a list
combos_4 = [*combinations(pokemon, 4)]
print(combos_4)
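A quick sanity check for combinations: the number of results should equal "n choose r". The sketch below uses a made-up five-element list and math.comb (Python 3.8+):

```python
from itertools import combinations
from math import comb

# Hypothetical sample data
sample_pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']

# combinations yields tuples in the order of the input, without repeats
pairs = [*combinations(sample_pokemon, 2)]
print(pairs[0])      # ('Geodude', 'Cubone')

# 5 choose 2 = 10 pairs
print(len(pairs))    # 10
print(len(pairs) == comb(len(sample_pokemon), 2))   # True
```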
set practice
# Convert both lists to sets
ash_set = set(ash_pokedex)
misty_set = set(misty_pokedex)
# Find the Pokémon that exist in both sets
both = ash_set.intersection(misty_set)
print(both)
# Find the Pokémon that Ash has and Misty does not have
ash_only = ash_set.difference(misty_set)
print(ash_only)
# Find the Pokémon that are in only one set (not both)
unique_to_set = ash_set.symmetric_difference(misty_set)
print(unique_to_set)
# Convert Brock's Pokédex to a set
brock_pokedex_set = set(brock_pokedex)
print(brock_pokedex_set)
# Check if Psyduck is in Ash's list and Brock's set
print('Psyduck' in ash_pokedex)
print('Psyduck' in brock_pokedex_set)
# Check if Machop is in Ash's list and Brock's set
print('Machop' in ash_pokedex)
print('Machop' in brock_pokedex_set)
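The set methods used above also have operator forms. A small sketch with made-up sample lists (sorted is used only to make the printed order deterministic, since sets are unordered):

```python
# Hypothetical sample data
ash_sample = ['Pikachu', 'Bulbasaur', 'Squirtle', 'Psyduck']
misty_sample = ['Psyduck', 'Staryu', 'Squirtle']

ash_s = set(ash_sample)
misty_s = set(misty_sample)

# & is intersection, - is difference, ^ is symmetric difference
print(sorted(ash_s & misty_s))   # ['Psyduck', 'Squirtle']
print(sorted(ash_s - misty_s))   # ['Bulbasaur', 'Pikachu']
print(sorted(ash_s ^ misty_s))   # ['Bulbasaur', 'Pikachu', 'Staryu']

# Membership gives the same answer for a list and a set; the set lookup
# is hash-based, while the list is scanned element by element
print('Psyduck' in ash_sample, 'Psyduck' in ash_s)   # True True
```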
# Use find_unique_items() to collect unique Pokémon names
uniq_names_func = find_unique_items(names)
print(len(uniq_names_func))
# Convert the names list to a set to collect unique Pokémon names
uniq_names_set = set(names)
print(len(uniq_names_set))
# Check that both unique collections are equivalent
print(sorted(uniq_names_func) == sorted(uniq_names_set))
# Use the best approach to collect unique primary types and generations
uniq_types = set(primary_types)
uniq_gens = set(generations)
print(uniq_types, uniq_gens, sep='\n')
# Collect Pokémon that belong to generation 1 or generation 2
gen1_gen2_pokemon = [name for name,gen in zip(poke_names, poke_gens) if gen < 3]
# Create a map object that stores the name lengths
name_lengths_map = map(len, gen1_gen2_pokemon)
# Combine gen1_gen2_pokemon and name_lengths_map into a list
gen1_gen2_name_lengths = [*zip(gen1_gen2_pokemon, name_lengths_map)]
print(gen1_gen2_name_lengths_loop[:5])
print(gen1_gen2_name_lengths[:5])
# Create a total stats array
total_stats_np = stats.sum(axis=1)
# Create an average stats array
avg_stats_np = stats.mean(axis=1)
# Combine names, total_stats_np, and avg_stats_np into a list
poke_list_np = [*zip(names, total_stats_np, avg_stats_np)]
print(poke_list_np == poke_list, '\n')
print(poke_list_np[:3])
print(poke_list[:3], '\n')
top_3 = sorted(poke_list_np, key=lambda x: x[1], reverse=True)[:3]
print('3 strongest Pokémon:\n{}'.format(top_3))
# Import Counter
from collections import Counter
# Collect the count of each generation
gen_counts = Counter(generations)
# Improve for loop by moving one calculation above the loop
total_count = len(generations)
for gen,count in gen_counts.items():
    gen_percent = round(count / total_count * 100, 2)
    print('generation {}: count = {:3} percentage = {}'
          .format(gen, count, gen_percent))
# Collect all possible pairs using combinations()
possible_pairs = [*combinations(pokemon_types, 2)]
# Create an empty list called enumerated_tuples
enumerated_tuples = []
# Add a line to append each enumerated_pair_tuple to the empty list above
for i,pair in enumerate(possible_pairs, 1):
    enumerated_pair_tuple = (i,) + pair
    enumerated_tuples.append(enumerated_pair_tuple)
# Convert all tuples in enumerated_tuples to a list
enumerated_pairs = [*map(list, enumerated_tuples)]
print(enumerated_pairs)
# Calculate the total HP avg and total HP standard deviation
hp_avg = hps.mean()
hp_std = hps.std()
# Use NumPy to eliminate the previous for loop
z_scores = (hps - hp_avg)/hp_std
# Combine names, hps, and z_scores
poke_zscores2 = [*zip(names, hps, z_scores)]
print(*poke_zscores2[:3], sep='\n')
# Use list comprehension with the same logic as the highest_hp_pokemon code block
highest_hp_pokemon = [(name, hp, zscore) for name,hp,zscore in poke_zscores2 if zscore > 2]
print(*highest_hp_pokemon, sep='\n')
4. Basic pandas optimizations
# Print the row and type of each row
for row_tuple in pit_df.iterrows():
    print(row_tuple)
    print(type(row_tuple))
# Create an empty list to store run differentials
run_diffs = []
# Write a for loop and collect runs allowed and runs scored for each row
for i,row in giants_df.iterrows():
    runs_scored = row['RS']
    runs_allowed = row['RA']
    # Use the provided function to calculate run_diff for each row
    run_diff = calc_run_diff(runs_scored, runs_allowed)
    # Append each run differential to the output list
    run_diffs.append(run_diff)
giants_df['RD'] = run_diffs
print(giants_df)
itertuples practice
# Loop over the DataFrame and print each row's Index, Year and Wins (W)
for row in rangers_df.itertuples():
    i = row.Index
    year = row.Year
    wins = row.W
    # Check if rangers made Playoffs (1 means yes; 0 means no)
    if row.Playoffs == 1:
        print(i, year, wins)
# Convert numeric playoffs to text by applying text_playoffs()
textual_playoffs = rays_df.apply(lambda row: text_playoffs(row['Playoffs']), axis=1)
print(textual_playoffs)
win_perc_preds_loop = []
# Use a loop and .itertuples() to collect each row's predicted win percentage
for row in baseball_df.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)
# Apply predict_win_perc to each row of the DataFrame
win_perc_preds_apply = baseball_df.apply(lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)
# Calculate the win percentage predictions using NumPy arrays
win_perc_preds_np = predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)
baseball_df['WP_preds'] = win_perc_preds_np
print(baseball_df.head())
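The three approaches above (itertuples loop, .apply, NumPy arrays) can be checked against each other on a tiny DataFrame. This is only a sketch: demo_df and its numbers are made up, and the body of predict_win_perc is an assumed formula (runs scored squared over the sum of squares), since the exercise's own function is not shown in these notes.

```python
import numpy as np
import pandas as pd

def predict_win_perc(RS, RA):
    # Assumed formula for illustration; the exercise's real function may differ
    return np.round(RS**2 / (RS**2 + RA**2), 2)

# Hypothetical data; the values are illustrative only
demo_df = pd.DataFrame({'RS': [734, 700, 712], 'RA': [688, 600, 705]})

# 1) Row-by-row loop with itertuples
loop_preds = [predict_win_perc(row.RS, row.RA) for row in demo_df.itertuples()]

# 2) Row-wise apply
apply_preds = demo_df.apply(
    lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)

# 3) Vectorized: pass whole columns as NumPy arrays, no Python-level loop
np_preds = predict_win_perc(demo_df['RS'].values, demo_df['RA'].values)

# All three approaches produce the same predictions
print(np.allclose(loop_preds, np_preds))
print(np.allclose(apply_preds.values, np_preds))
```

The vectorized version is usually the fastest of the three on large DataFrames because the arithmetic runs over whole arrays instead of one row at a time.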