What Are Python's Most Used Functions?

We analyzed 200K+ IPython notebooks to find out.
The "Most Python" report provides insights into how Python is used for data science. We analyzed 200K+ IPython (aka Jupyter) notebooks and 2M+ StackOverflow questions, resulting in:
A list of the most-used Python functions in IPython notebooks
For each function, the most common questions from StackOverflow
For each function, code samples from the most popular notebooks on Github
Methodology
GitHub makes code from open-source licensed projects available in BigQuery's public data. We filtered this data to ".ipynb" files and, after extensive regex and data manipulation, parsed each notebook into cells, functions, and libraries. This gave us a clean chain from GitHub repositories → file paths → files (i.e. notebooks) → notebook cells → code, functions, or libraries.
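The parsing step can be illustrated locally. The sketch below is not the BigQuery and regex pipeline used for the report; it assumes a notebook in the standard nbformat 4 JSON layout and uses Python's ast module to count call targets. The helper names code_cells and called_names and the sample notebook are hypothetical.

```python
import ast
import json
from collections import Counter


def code_cells(notebook_json):
    """Return the source text of every code cell in an .ipynb document."""
    notebook = json.loads(notebook_json)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # "source" may be stored as one string or as a list of lines.
            cells.append("".join(source) if isinstance(source, list) else source)
    return cells


def called_names(code):
    """Count call targets such as 'print' or 'np.zeros' in one cell."""
    counts = Counter()
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Cells with IPython magics (%matplotlib, !pip) are not valid Python.
        return counts
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            counts[ast.unparse(node.func)] += 1  # ast.unparse needs Python 3.9+
    return counts


# A tiny hypothetical notebook with one markdown cell and one code cell.
sample = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code",
         "source": ["import numpy as np\n", "x = np.zeros(3)\n", "print(len(x))"]},
    ],
})

totals = Counter()
for cell in code_cells(sample):
    totals.update(called_names(cell))
print(totals.most_common())
```

Parsing with ast rather than regex avoids counting function names that only appear in comments or strings, at the cost of skipping cells that are not valid Python.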
Top 10 Most-Used Python Functions
We collected a list of the top 10 most-used Python functions in Jupyter notebooks based on 200K+ open-source Jupyter notebooks on Github. We then cross-referenced this data with 2M+ StackOverflow questions to identify the most common questions about the most common Python functions.
10) np.zeros
Produces a NumPy array of zeros. This is particularly useful when creating vectors in TensorFlow or for other machine learning applications.
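A minimal sketch of typical np.zeros calls, assuming NumPy is installed. The variable names are illustrative. The last part shows one common cause of the data type error mentioned in the first question below: passing dimensions as separate arguments instead of as one tuple.

```python
import numpy as np

vector = np.zeros(3)              # 1-D array of three zeros, float64 by default
matrix = np.zeros((2, 3))         # multi-dimensional shapes are passed as one tuple
counts = np.zeros(4, dtype=int)   # integer zeros instead of floats

# Passing the dimensions as separate arguments makes NumPy read the second
# argument as the dtype, which raises a TypeError.
try:
    np.zeros(2, 3)
    shape_error = None
except TypeError as exc:
    shape_error = exc

print(matrix.shape, counts.dtype, type(shape_error).__name__)
```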
StackOverflow's Most Common Questions
My data type is not understood when using np.zeros.
How do I delete a row in a numpy array which contains a zero?
9) float
Converts a string or integer to a floating-point number. float() is used in 9.5% of notebooks, 19 times per notebook on average.
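A short sketch of float() in use, which also touches on the two questions listed below. The values are made up for illustration.

```python
pi_text = "3.14159"
pi_value = float(pi_text)            # string to float
seven = float(7)                     # integer to float: 7.0

rounded = round(pi_value, 2)         # numeric rounding to two decimals
label = "{:.2f}".format(pi_value)    # two decimals for display only

is_float = isinstance(pi_value, float)   # True
int_is_float = isinstance(7, float)      # False: 7 is an int

print(pi_value, seven, rounded, label, is_float, int_is_float)
```

round() changes the number itself, while the format specifier only changes how it is displayed.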
StackOverflow's Most Common Questions
How to limit floats to two decimal points.
How to check if a number is a float.
8) __init__
When we create a Python object by instantiating a class, __init__ initializes the data stored in the object. From there, we can apply methods (functions defined on the class) to transform the data in our object.
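As a minimal sketch, the hypothetical Dataset class below stores its data in __init__ and exposes one method that works on that data.

```python
class Dataset:
    """A hypothetical container used only for illustration."""

    def __init__(self, name, values):
        # __init__ runs automatically when Dataset(...) is called.
        self.name = name
        self.values = list(values)

    def mean(self):
        """Return the average of the stored values."""
        return sum(self.values) / len(self.values)


scores = Dataset("scores", [2, 4, 6])
print(scores.name, scores.mean())
```

Note that __init__ only sets up the object; it must return None, so computed results belong in regular methods such as mean().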
StackOverflow's Most Common Questions
How to fix "Attempted relative import in non-package" even with __init__.py.
How to return a value from __init__ in Python?
7) np.array
NumPy's array function creates an n-dimensional array from the inputs specified. NumPy arrays are often used in machine learning, as they allow for more compact storage and faster processing than built-in Python lists.
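A brief sketch of np.array, assuming NumPy is installed and using made-up values. It also illustrates two of the questions below: concatenating one-dimensional arrays and finding the indices of the N largest values.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
grid = np.array([[1, 2, 3], [4, 5, 6]])   # nested lists give a 2-D array

joined = np.concatenate([a, b])           # one 1-D array with six elements

values = np.array([10, 50, 20, 40])
# argsort orders indices from smallest to largest value; take the last two
# and reverse them to get the indices of the two largest values.
top2 = np.argsort(values)[-2:][::-1]

print(grid.shape, joined, top2)
```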
StackOverflow's Most Common Questions
How do I convert a PIL Image into a NumPy array?
How do I get indices of N maximum values in a NumPy array?
Concatenating two one-dimensional NumPy arrays.
6) format
The string method .format() provides a simple way to build strings dynamically by substituting values into placeholders.
In notebooks, .format() is often used to build file path references, for example to display the first image within each subfolder.
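A minimal sketch of this kind of path building follows. The folder names and file pattern are hypothetical and not taken from the analyzed notebooks. The last lines cover the two questions listed below.

```python
folders = ["cats", "dogs"]  # hypothetical subfolder names

# Build the path of the first image in each subfolder, e.g. images/cats/001.jpg
first_images = ["images/{}/{:03d}.jpg".format(folder, 1) for folder in folders]

# Doubled braces produce literal curly braces in the output.
braces = "{{{}}}".format("key")

# A width in the format spec pads the value to a fixed width (right-aligned here).
padded = "{:>8}".format("abc")

print(first_images, braces, repr(padded))
```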
StackOverflow's Most Common Questions
How do I print curly-brace characters in a string while using .format?
How to print a string at a fixed width?
5) int
The function int() converts an input into an integer. This input can be a string, number, or bytes object. It's used in 15% of notebooks.
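A short sketch of common int() conversions with made-up inputs, including the list conversion and hex questions listed below.

```python
answer = int("42")            # string to integer
truncated = int(3.9)          # floats are truncated toward zero, not rounded
from_hex = int("ff", 16)      # a second argument sets the base of the string

numbers = [int(s) for s in ["1", "2", "3"]]   # convert every string in a list

hex_prefixed = hex(255)       # includes the "0x" prefix
hex_plain = format(255, "x")  # no prefix

print(answer, truncated, from_hex, numbers, hex_prefixed, hex_plain)
```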
StackOverflow's Most Common Questions
Convert all strings in a list to int.
How can I read inputs as numbers?
How to convert an int to a hex string?
4) str
The function str() converts its input into a string. It is found in 14% of notebooks and is used an average of 25 times per notebook. A common use is converting a set of dates into strings to be included in filenames.
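A minimal sketch of the date-to-filename pattern, with hypothetical dates and a hypothetical "report_" prefix. The last line shows the substring check from the first question below.

```python
from datetime import date

dates = [date(2020, 1, 1), date(2020, 1, 2)]  # hypothetical dates

# str() on a date gives its ISO form, e.g. "2020-01-01".
filenames = ["report_" + str(d) + ".csv" for d in dates]

# The "in" operator checks whether a string contains a substring.
has_year = "2020" in filenames[0]

print(filenames, has_year)
```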
StackOverflow's Most Common Questions
Does Python have a string 'contains' substring method?
How to read a file line-by-line into a list?
3) range
range() returns a sequence of evenly spaced integers (a lazy range object in Python 3, a list in Python 2). It takes the inputs start, stop, and step, where:
Start is the first number of the sequence, 0 by default
Stop is the number at which the range ends; stop itself is not included
Step is the amount added at each step, 1 by default
range() is commonly used in for loops, which is partly why it's found in 36% of notebooks.
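A minimal sketch of range() in a for loop, followed by the start, stop, and step arguments and the two situations raised in the questions below. All values are illustrative.

```python
squares = []
for i in range(5):            # 0, 1, 2, 3, 4 (stop is excluded)
    squares.append(i * i)

stepped = list(range(2, 10, 3))        # start=2, stop=10, step=3
reversed_order = list(range(4, -1, -1))  # a negative step counts down

# range() only accepts integers; for decimal steps, scale an integer range.
tenths = [i / 10 for i in range(5)]

print(squares, stepped, reversed_order, tenths)
```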
StackOverflow's Most Common Questions
How do I use a decimal step value for range()?
Print a list in reverse order with range()?
2) len
len() returns the number of items in a string, list, DataFrame, or any other object that defines a length. Unsurprisingly, it is commonly used in IPython notebooks, as data scientists regularly check the size of the data they're transforming. 38% of notebooks use len().
A typical pattern is to use len() to keep only the non-empty values from a list, and then again to print summary information about the output list.
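A minimal sketch of that pattern, using a made-up list of strings:

```python
raw_values = ["alice", "", "bob", "", "carol"]  # hypothetical input

# Keep only the entries whose length is greater than zero.
non_empty = [value for value in raw_values if len(value) > 0]

# Use len() again to summarize how much data survived the filter.
summary = "Kept {} of {} values".format(len(non_empty), len(raw_values))
print(summary)
```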
1) print
No other function is used as often as print() in IPython notebooks. This makes sense, as data scientists output information at the end of nearly every cell. One-third of notebooks use print() and it's used 31 times per notebook on average.
A typical use is printing status updates during a data load.
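A minimal sketch of status updates during a load. The file names and the status_line helper are hypothetical, and the loading step itself is left out.

```python
def status_line(position, total, name):
    """Build one progress message for a file being loaded."""
    return "Loading file {} of {}: {}".format(position, total, name)


files = ["jan.csv", "feb.csv"]  # hypothetical file names

for position, name in enumerate(files, start=1):
    print(status_line(position, len(files), name))
    # The actual loading step would go here.

# print() accepts several arguments and separates them with spaces,
# so variables and strings can share one line.
print("Done:", len(files), "files loaded")
```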
StackOverflow's Most Common Questions
How can I print variable and string on same line in Python?
Print multiple arguments in Python
Tip: Understanding the most-used functions helps you write cleaner code and solve common problems more efficiently. Focus on mastering these functions first; they appear in a large share of data science notebooks.
Conclusion
These 10 functions form the foundation of most Python data science work. Whether you're just starting out or looking to improve your Python skills, understanding how and when to use these functions will significantly accelerate your ability to work with data effectively.
The Data Strategist helps startups and scaleups become data-driven. Get a data scientist on-demand, or advice on building your analytical data stack.