What Are Python's Most Used Functions?

We analyzed 200K+ IPython notebooks to find out.

The "Most Python" report provides insights into how Python is used for data science. We analyzed 200K+ IPython (aka Jupyter) notebooks and 2M+ StackOverflow questions, resulting in:

  1. A list of the most-used Python functions in IPython notebooks

  2. For each function, the most common questions from StackOverflow

  3. For each function, code samples from the most popular notebooks on GitHub


Methodology

GitHub makes code from open-source licensed projects available as a public dataset in BigQuery. We filtered this data to ".ipynb" files and, after extensive regex and data manipulation, parsed them into notebook cells, functions, and libraries. This gave us a clean chain from GitHub repositories → file paths → files (i.e. notebooks) → notebook cells → code, functions, or libraries.
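As a rough illustration of the notebook → cell → function step, here is a minimal sketch. It is not the pipeline used for the report: it swaps the regex approach described above for Python's built-in ast module, and the sample notebook, count_calls, and call_name are hypothetical names made up for this example.

```python
import ast
import json
from collections import Counter

# Hypothetical, minimal notebook standing in for one ".ipynb" file
notebook_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code",
         "source": ["import numpy as np\n", "x = np.zeros(3)\n", "print(len(x))"]},
    ]
})


def call_name(func):
    """Return a dotted name such as 'np.zeros' for a call target, or None."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        parent = call_name(func.value)
        return parent + "." + func.attr if parent else None
    return None


def count_calls(notebook_text):
    """Count function calls across the code cells of one notebook."""
    totals = Counter()
    for cell in json.loads(notebook_text).get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files usually store a list of lines
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            continue  # skip cells that are not plain Python (e.g. IPython magics)
        for node in ast.walk(tree):
            if isinstance(node, ast.Call):
                name = call_name(node.func)
                if name:
                    totals[name] += 1
    return totals


counts = count_calls(notebook_json)
print(counts)
```

Summing such counters over many notebooks would give a usage ranking of the kind reported below, though the real analysis may have counted things differently.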


Top 10 Most-Used Python Functions

We compiled the top 10 most-used Python functions from 200K+ open-source Jupyter notebooks on GitHub. We then cross-referenced this list with 2M+ StackOverflow questions to identify the most common questions about each function.

10) np.zeros

Returns a new NumPy array of a given shape, filled with zeros. This is particularly useful when creating vectors in TensorFlow or for other machine learning applications.

StackOverflow's Most Common Questions

  • My data type is not understood when using np.zeros.

  • How do I delete a row in a numpy array which contains a zero?
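Both questions above can be illustrated with a short sketch. The "data type not understood" error typically appears when the dimensions are passed as separate arguments, so that the second one is read as the dtype. The array names and values here are made up for illustration.

```python
import numpy as np

# Pass the shape as one tuple. np.zeros(2, 3) would read 3 as the dtype
# argument and raise a TypeError.
grid = np.zeros((2, 3))
print(grid.shape)

# Hypothetical data: keep only the rows that contain no zero
data = np.array([[1, 2, 3],
                 [4, 0, 6],
                 [7, 8, 9]])
rows_without_zero = data[~(data == 0).any(axis=1)]
print(rows_without_zero)
```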

9) float

float() converts a string or number to a floating-point number. It is used in 9.5% of notebooks, 19 times per notebook on average.

StackOverflow's Most Common Questions

  • How to limit floats to two decimal points.

  • How to check if a number is a float.
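A minimal sketch of one way to approach both questions, using only built-ins. The variable names and values are illustrative.

```python
price = float("3.14159")  # string -> float

# Limit to two decimal places: format for display, or round the value itself
formatted = "{:.2f}".format(price)
rounded = round(price, 2)
print(formatted, rounded)

# Check whether a value is a float
is_float = isinstance(price, float)
int_is_float = isinstance(3, float)
print(is_float, int_is_float)
```

Formatting only changes how the number is displayed, while round() returns a new float.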

8) __init__

When we create a Python object by calling a class, __init__ initializes the data stored in the object. From there, we can call methods (functions defined on the class) to read or transform that data.

# Sample class with init method
class Person:
    # init method or constructor
    def __init__(self, name):
        self.name = name
    
    # user-defined method
    def say_hello(self):
        print('Hello, my name is', self.name)

p = Person('Roger')
p.say_hello()

StackOverflow's Most Common Questions

  • How to fix "Attempted relative import in non-package" even with __init__.py.

  • How to return a value from __init__ in Python?
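On the second question: __init__ must return None, so a value is usually stored on the instance or computed in a regular method instead. The sketch below shows this with two hypothetical classes, Circle and Broken, made up for illustration.

```python
import math


class Circle:
    def __init__(self, radius):
        # __init__ only sets up the instance; it implicitly returns None
        self.radius = radius

    def area(self):
        # Compute and return values from a regular method instead
        return math.pi * self.radius ** 2


class Broken:
    def __init__(self):
        return 42  # returning a non-None value is not allowed


circle = Circle(2)
print(circle.area())

init_error = None
try:
    Broken()
except TypeError as err:
    init_error = err
    print("TypeError:", err)
```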

7) np.array

NumPy's array function builds an n-dimensional array from the inputs specified, such as a list or nested lists. NumPy arrays are often used in machine learning, as they allow for more compact storage and faster processing than built-in Python lists.

StackOverflow's Most Common Questions

  • How do I convert a PIL Image into a NumPy array?

  • How do I get indices of N maximum values in a NumPy array?

  • Concatenating two one-dimensional NumPy arrays.
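A short sketch for the last two questions, with made-up values. It uses np.argsort for the N largest values, which is one of several possible approaches.

```python
import numpy as np

scores = np.array([10, 50, 30, 40, 20])

# Indices of the N largest values, largest first
top_n = 2
top_indices = np.argsort(scores)[::-1][:top_n]
print(top_indices)

# Concatenate two one-dimensional arrays; the inputs go in as one sequence
a = np.array([1, 2, 3])
b = np.array([4, 5])
joined = np.concatenate([a, b])
print(joined)
```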

6) format

The str.format() method fills the {} placeholders in a string with the values passed to it, which provides a simple way to build and print strings dynamically.

import numpy as np

x, y = np.full(4, 1.0), np.full(4, 2.0)
print("{} + {} = {}".format(x, y, x + y))

Here, a data scientist uses .format() to build filepath references and display the first image within each subfolder.

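import os
from IPython.display import Image, display

root = 'notMNIST_small'
# "letter" avoids shadowing the built-in str
for letter in ['A', 'B', 'C', 'D', 'E', 'F']:
    first_file = os.listdir('{}/{}'.format(root, letter))[0]
    display(Image('{}/{}/{}'.format(root, letter, first_file)))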

StackOverflow's Most Common Questions

  • How do I print curly-brace characters in a string while using .format?

  • How to print a string at a fixed width?
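Both questions come down to the format specification mini-language. A minimal sketch with illustrative values:

```python
# Double the braces to get literal { and } in the output
literal = "{{{}}}".format("key")
print(literal)

# Pad to a fixed width: < aligns left, > aligns right
left = "{:<10}".format("abc")
right = "{:>10}".format("abc")
print("[" + left + "]")
print("[" + right + "]")
```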

5) int

The function int() converts an input into an integer. This input can be a string, number, or bytes object. It's used in 15% of notebooks.

StackOverflow's Most Common Questions

  • Convert all strings in a list to int.

  • How can I read inputs as numbers?

  • How to convert an int to a hex string?
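The three questions can be sketched as follows. The values are illustrative, and a fixed string stands in for input() so that the snippet runs unattended.

```python
# Convert every string in a list to int
raw = ["1", "2", "30"]
numbers = [int(s) for s in raw]
print(numbers)

# input() always returns a string, so wrap the result in int().
# A fixed string stands in for user input here.
user_text = "42"
age = int(user_text)
print(age + 1)

# int -> hex string, with and without the "0x" prefix, and back again
with_prefix = hex(255)
without_prefix = format(255, "x")
parsed_back = int("ff", 16)
print(with_prefix, without_prefix, parsed_back)
```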

4) str

The function str() converts inputs into a string. It is found in 14% of notebooks and is used an average of 25 times per notebook. The example below converts a set of dates into strings to be included in filenames.

for row, item in publications.iterrows():
    md_filename = str(item.pub_date) + "-" + item.url_slug + ".md"
    html_filename = str(item.pub_date) + "-" + item.url_slug
    year = item.pub_date[:4]

StackOverflow's Most Common Questions

  • Does Python have a string 'contains' substring method?

  • How to read a file line-by-line into a list?
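A minimal sketch for both questions. The title string and the temporary file are made up so that the example is self-contained.

```python
import os
import tempfile

# Substring check: use the "in" operator, or find() for the position
title = "Most-used Python functions"
has_python = "Python" in title
position = title.find("Python")  # -1 would mean "not found"
print(has_python, position)

# Read a file line by line into a list (a temporary file stands in for real data)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as f:
        f.write("first\nsecond\nthird\n")
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]
print(lines)
```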

3) range

range returns a lazy sequence of integers (a range object in Python 3, not a list). It takes the inputs start, stop, and step, where:

  • Start is the first number in the sequence, 0 by default

  • Stop is the number at which the sequence ends; stop itself is not included

  • Step is the amount added at each step, 1 by default
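The three inputs can be seen in a quick sketch. The values are illustrative, and list() is used only to display the numbers.

```python
stop_only = list(range(5))            # start defaults to 0, step to 1
start_stop = list(range(2, 5))        # stop (5) is not included
with_step = list(range(0, 10, 3))     # every third number below 10

print(stop_only)
print(start_stop)
print(with_step)
```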

range() is commonly used in for loops, which is partly why it's found in 36% of notebooks. Here's an application of range in a for loop:

# Yield (x, y) batches of batch_size items, dropping any incomplete final batch
def get_batches(x, y, batch_size=100):
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]

StackOverflow's Most Common Questions

  • How do I use a decimal step value for range()?

  • Print a list in reverse order with range()?
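Since range() only accepts integers, a decimal step needs a workaround. The sketch below scales an integer range, which is one option among several, and shows two ways to count down. Values are illustrative.

```python
# Decimal step: iterate over integers and scale each one
decimal_steps = [i / 10 for i in range(0, 5)]
print(decimal_steps)

# Reverse order: use a negative step, or wrap the range in reversed()
countdown = list(range(4, -1, -1))
also_countdown = list(reversed(range(5)))
print(countdown)
print(also_countdown)
```

Scaling integers avoids the rounding drift that can build up when a float step is added repeatedly.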

2) len

len() returns the number of items in a string, list, dataframe, or any other object that defines a length. Unsurprisingly, it is commonly used in IPython notebooks, as data scientists regularly check the size of the data they're transforming. 38% of notebooks use len().

In the example below, len() is used to find the indices of the non-empty reviews in a list, and used again to print how many there are.

# Get all index values with non-empty reviews
non_zero_idx = [
    ii for ii, review in enumerate(reviews_ints) 
    if len(review) != 0
]

# Print the number of non-empty reviews
print(len(non_zero_idx))
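len() works on any object whose class defines __len__, and raises a TypeError otherwise. A minimal sketch, where Playlist is a hypothetical class made up for illustration:

```python
class Playlist:
    """Hypothetical container, used only for illustration."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # len(playlist) calls this method
        return len(self.songs)


sizes = [
    len("notebook"),              # characters in a string
    len([1, 2, 3]),               # items in a list
    len({"a": 1}),                # keys in a dict
    len(Playlist(["x", "y"])),    # custom object with __len__
]
print(sizes)

len_error = None
try:
    len(42)  # an int has no length
except TypeError as err:
    len_error = err
    print("TypeError:", err)
```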

1) print

No other function is used as often as print() in IPython notebooks. This makes sense, as data scientists output information at the end of nearly every cell. One-third of notebooks use print() and it's used 31 times per notebook on average.

import numpy as np

x, y = np.full(4, 1.0), np.full(4, 2.0)
print("{} + {} = {}".format(x, y, x + y))

# Print using list comprehension
print([x + y for x, y in zip([1.0] * 4, [2.0] * 4)])

In the following example, the data scientist uses print to provide status updates during a data load.

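# Note: tensorflow.examples.tutorials was removed in TensorFlow 2.x,
# so this import only works on TensorFlow 1.x
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data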

print('Getting MNIST Dataset...')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print('Data Extracted.')

StackOverflow's Most Common Questions

  • How can I print variable and string on same line in Python?

  • Print multiple arguments in Python
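Both questions have the same few answers: pass several arguments to print(), change the separator, or build the string first. A short sketch with made-up values:

```python
name, count = "notebooks", 200

# Multiple arguments are joined with a space by default
print("Analyzed", count, name)

# sep changes the separator; end replaces the trailing newline
print("Analyzed", count, name, sep=" | ")
print("Loading", end="... ")
print("done")

# Or build the whole line first with str.format()
line = "Analyzed {} {}".format(count, name)
print(line)
```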

Tip: Understanding the most-used functions helps you write cleaner code and solve common problems more efficiently. Focus on mastering these functions first, since they appear in a large share of the notebooks we analyzed.


Conclusion

These 10 functions form the foundation of most Python data science work. Whether you're just starting out or looking to improve your Python skills, understanding how and when to use these functions will significantly accelerate your ability to work with data effectively.


