pyan is a Python module that performs static analysis of Python code to determine a call dependency graph between functions and methods. This is different from running the code and seeing which functions are called and how often; there are various tools that generate a call graph that way, usually using debugger or profiling trace hooks (for example, https://pycallgraph.readthedocs.org/). This code was originally written by Edmund Horner and then modified by Juha Jeronen. See README for the original blog posts and links to their repositories.
Offline call graph generator for Python 3
Pyan takes one or more Python source files, performs a (rather superficial) static analysis, and constructs a directed graph of the objects in the combined source, and how they define or use each other. The graph can be output for rendering by GraphViz or yEd.
This project has 2 official repositories:
The PyPI package pyan3 is built from development
Defines relations are drawn with dotted gray arrows.
Uses relations are drawn with black solid arrows. Recursion is indicated by an arrow from a node to itself. Mutual recursion between nodes X and Y is indicated by a pair of arrows, one pointing from X to Y, and the other from Y to X.
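As a rough illustration of these conventions, the sketch below writes defines and uses edges as a GraphViz DOT digraph. The function name to_dot and the exact attribute values are assumptions for illustration, not Pyan's actual output writer.

```python
def to_dot(defines, uses):
    """Render (source, target) edge lists as a GraphViz DOT digraph (illustrative).

    defines edges are drawn dotted gray, uses edges solid black.
    Recursion needs no special case: it is just an edge whose source equals its target.
    """
    lines = ["digraph G {"]
    for src, dst in defines:
        lines.append(f'    "{src}" -> "{dst}" [style="dotted", color="gray"];')
    for src, dst in uses:
        lines.append(f'    "{src}" -> "{dst}" [style="solid", color="black"];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    defines=[("mod", "mod.f"), ("mod", "mod.g")],
    uses=[("mod.f", "mod.f"),   # recursion: arrow from a node to itself
          ("mod.f", "mod.g"),   # mutual recursion: one arrow in each direction
          ("mod.g", "mod.f")],
)
print(dot)
```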
Nodes are always filled, and made translucent to clearly show any arrows passing underneath them. This is especially useful for large graphs with GraphViz’s fdp filter. If colored output is not enabled, the fill is white.
In node coloring, the HSL color model is used. The hue is determined by the filename the node comes from. The lightness is determined by depth of namespace nesting, with darker meaning more deeply nested. Saturation is constant. The spacing between different hues depends on the number of files analyzed; better results are obtained for fewer files.
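A minimal sketch of this coloring scheme, using the standard colorsys module, is shown below. The lightness step, saturation and alpha values are assumptions chosen for illustration, not the constants Pyan uses. Note that colorsys names the model HLS and takes its arguments in (hue, lightness, saturation) order.

```python
import colorsys


def node_color(file_index, n_files, depth, saturation=0.5, alpha=0.7):
    """Return an RGBA hex color for a node (illustrative constants, not Pyan's).

    Hue is spread evenly over the analyzed files, lightness decreases with
    namespace nesting depth, and saturation is constant.
    """
    hue = file_index / max(n_files, 1)           # one hue per file
    lightness = max(0.3, 0.9 - 0.1 * depth)      # deeper nesting -> darker
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)  # note: HLS argument order
    return "#{:02x}{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255), round(alpha * 255))


print(node_color(0, 3, 0), node_color(0, 3, 2), node_color(1, 3, 0))
```

GraphViz accepts #RRGGBBAA color strings, so the trailing alpha byte can provide the translucent fill described above. With more files, neighbouring hues get closer together, which is why fewer files give better results.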
Groups are filled with translucent gray to avoid clashes with any node color.
The nodes can be annotated by filename and source line number information.
The static analysis approach Pyan takes is different from running the code and seeing which functions are called and how often. There are various tools that will generate a call graph that way, usually using a debugger or profiling trace hooks, such as Python Call Graph.
In Pyan3, the analyzer was ported from the old compiler module (good riddance) to a combination of the ast and symtable modules, and slightly extended.
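For readers unfamiliar with symtable, the sketch below shows the kind of scope information it reports, using only the standard library. The helper list_scopes is hypothetical and is not part of Pyan.

```python
import symtable

SOURCE = '''
def outer():
    def inner():
        pass
    return inner

class MyClass:
    def method(self):
        pass
'''


def list_scopes(table, prefix=""):
    """Recursively yield (qualified_name, scope_type) for every nested scope."""
    for child in table.get_children():
        name = f"{prefix}{child.get_name()}"
        kind = child.get_type()
        # Newer Pythons return an enum member here; normalize to its string value.
        kind = getattr(kind, "value", kind)
        yield name, kind
        yield from list_scopes(child, prefix=name + ".")


top = symtable.symtable(SOURCE, "<example>", "exec")
for name, kind in list_scopes(top):
    print(name, kind)
```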
pip install pyan3
See pyan3 --help.
Example:
pyan *.py --uses --no-defines --colored --grouped --annotated --dot >myuses.dot
Then render using your favorite GraphViz filter, mainly dot or fdp:
dot -Tsvg myuses.dot >myuses.svg
Or generate the SVG directly:
pyan *.py --uses --no-defines --colored --grouped --annotated --svg >myuses.svg
You can also export the graph as an interactive HTML page:
pyan *.py --uses --no-defines --colored --grouped --annotated --html > myuses.html
Alternatively, you can call pyan from a script:
import pyan
from IPython.display import HTML
HTML(pyan.create_callgraph(filenames="**/*.py", format="html"))
You can integrate callgraphs into Sphinx.
Install GraphViz (e.g. via sudo apt-get install graphviz) and modify source/conf.py so that:
# modify extensions
extensions = [
    ...  # your existing extensions
    "sphinx.ext.graphviz",  # the comma is required, or Python concatenates the two strings
    "pyan.sphinx",
]
# add graphviz options
graphviz_output_format = "svg"
Now, there is a callgraph directive which has all the options of the graphviz directive and in addition:
Example: create a callgraph for the function pyan.create_callgraph that is zoomable, is laid out from left to right, and links each node to the API documentation that was created at the toctree path api:
.. callgraph:: pyan.create_callgraph
   :toctree: api
   :zoomable:
   :direction: horizontal
If GraphViz reports trouble in init_rank, try adding -Gnewrank=true, as in:
dot -Gnewrank=true -Tsvg myuses.dot >myuses.svg
Usually either old or new rank (but often not both) works; this is a long-standing GraphViz issue with complex graphs.
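Because usually one of the two ranking modes works, a rendering script can simply retry. The sketch below is a hypothetical helper, not part of Pyan. It assumes the dot executable is on PATH and uses dot's -o option instead of shell redirection. The run parameter exists only so that a fake can be substituted when testing.

```python
import subprocess


def render_svg(dot_file, svg_file, run=subprocess.run):
    """Render DOT to SVG with the old ranker, retrying with -Gnewrank=true on failure.

    Returns the command line that succeeded; raises RuntimeError if both fail.
    """
    base = ["dot", "-Tsvg", dot_file, "-o", svg_file]
    for extra in ([], ["-Gnewrank=true"]):
        cmd = base[:1] + extra + base[1:]
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return cmd
    raise RuntimeError(f"dot failed with both old and new rank: {result.stderr}")


# Usage (requires GraphViz): render_svg("myuses.dot", "myuses.svg")
```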
If the graph is visually unreadable due to too much detail, consider visualizing only a subset of the files in your project. Any references to files outside the analyzed set will be considered as undefined, and will not be drawn.
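The effect of restricting the analyzed set can be pictured as an edge filter. Anything that is not defined in the analyzed files is dropped. The sketch below illustrates that idea only. The helper prune_undefined and the dotted node names are hypothetical, and this is not necessarily how Pyan implements it internally.

```python
def prune_undefined(defined, edges):
    """Keep only edges whose endpoints are both defined in the analyzed files."""
    return [(src, dst) for src, dst in edges if src in defined and dst in defined]


defined = {"app.main", "app.helper"}
edges = [("app.main", "app.helper"), ("app.main", "requests.get")]
print(prune_undefined(defined, edges))  # [('app.main', 'app.helper')]
```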
Currently Pyan always operates at the level of individual functions and methods; an option to visualize only relations between namespaces may (or may not) be added in a future version.
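Pyan does not offer such an option. As a hypothetical post-processing step, function-level edges could be collapsed onto their enclosing namespaces by dropping the last dotted component of each node name. The sketch assumes fully qualified dotted names. A method such as pkg.mod.Cls.meth then maps to its class pkg.mod.Cls.

```python
def collapse_to_namespaces(edges):
    """Collapse function/method-level edges to edges between enclosing namespaces.

    'pkg.mod.func' maps to 'pkg.mod'; edges that land inside a single namespace
    are dropped, and duplicates are merged.
    """
    collapsed = set()
    for src, dst in edges:
        src_ns, dst_ns = src.rpartition(".")[0], dst.rpartition(".")[0]
        if src_ns != dst_ns:
            collapsed.add((src_ns, dst_ns))
    return sorted(collapsed)


print(collapse_to_namespaces([("a.mod.f", "b.mod.g"),
                              ("a.mod.h", "b.mod.k"),
                              ("a.mod.f", "a.mod.h")]))  # [('a.mod', 'b.mod')]
```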
Items tagged with ☆ are new in Pyan3.
Graph creation:
Analysis:
self.a.b ☆
contract_nonexistents() followed by expand_unknowns(), but that often generated spurious uses edges (because the wildcard to *.name expands to X.name for all X that have an attribute called name).
super() based on the static type at the call site ☆
super() ☆
self.a = MyFancyClass(), the analyzer knows that any references to self.a point to MyFancyClass
for loop counter variables and functions or classes defined elsewhere no longer confuse Pyan.
self is defined by capturing the name of the first argument of a method definition, like Python does. ☆
x,y,z = a,b,c ☆
a = b = c ☆
del name (probably seen as isinstance(node.ctx, ast.Del) in visit_Name(), visit_Attribute())
self.last_value?
self.visit().
self.last_value is the simplest implementation that extracts a value from an expression, and it only needs to be used in a controlled manner (as analyze_binding() currently does); i.e. reset before visiting, and reset immediately when done.
The analyzer does not currently support:
.append() and similar.
a,*b,c = d,e,f,g,h
ast.Subscript
Enum in bases during analysis of ClassDef; then tag the class as an enum and handle differently.
super().
lambda that has been stored in self.something.
From the viewpoint of graphing the defines and uses relations, the interesting parts of the AST are bindings (defining new names, or assigning new values to existing names), and any name that appears in an ast.Load context (i.e. a use). The latter includes function calls; the function’s name then appears in a load context inside the ast.Call node that represents the call site.
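To make the bindings-versus-uses distinction concrete, here is a minimal sketch built on the standard ast module. It is a NodeVisitor that records two things for each enclosing function or class. The first is the names it binds (def, class, or a Name in Store context). The second is the names it uses (a Name in Load context, which includes the callee inside an ast.Call). The class UseCollector is hypothetical and far simpler than Pyan's analyzer; in particular, it does no name resolution.

```python
import ast

SOURCE = '''
def g():
    return 1

def f():
    x = g()
    return x
'''


class UseCollector(ast.NodeVisitor):
    """Record bound and used names per enclosing namespace (simplified sketch)."""

    def __init__(self):
        self.namespace = []   # stack of enclosing function/class names
        self.uses = {}        # namespace -> names used in a Load context
        self.bindings = {}    # namespace -> names bound there

    def _current(self):
        return ".".join(self.namespace) or "<module>"

    def _visit_scope(self, node):
        # def/class binds its name in the enclosing namespace, then opens a new one.
        # Simplification: decorators and defaults are attributed to the inner scope.
        self.bindings.setdefault(self._current(), set()).add(node.name)
        self.namespace.append(node.name)
        self.generic_visit(node)
        self.namespace.pop()

    visit_FunctionDef = visit_AsyncFunctionDef = visit_ClassDef = _visit_scope

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load):     # a use (also the callee of ast.Call)
            self.uses.setdefault(self._current(), set()).add(node.id)
        elif isinstance(node.ctx, ast.Store):  # a binding
            self.bindings.setdefault(self._current(), set()).add(node.id)


collector = UseCollector()
collector.visit(ast.parse(SOURCE))
print(collector.uses, collector.bindings)
```

Here the call g() inside f shows up as a use of g by f, because the callee's Name node sits in a Load context within the ast.Call node.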
Bindings are tracked, with lexical scoping, to determine which type of object, or which function, each name points to at any given point in the source code being analyzed. This allows tracking things like:
def some_func():
    pass

class MyClass:
    def __init__(self):
        self.f = some_func

    def dostuff(self):
        self.f()
By tracking the name self.f, the analyzer will see that MyClass.dostuff() uses some_func().
The analyzer also needs to keep track of what type of object self currently points to. In a method definition, the literal name representing self is captured from the argument list, as Python does; then in the lexical scope of that method, that name points to the current class (since Pyan cares only about object types, not instances).
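The two paragraphs above can be sketched with the standard ast module as follows. The sketch takes the self name from each method's first argument, records bindings of the form self.attr = some_name, and then resolves calls self.attr() to the bound function. The helper resolve_self_calls is hypothetical and handles only this one simple binding form. The example deliberately calls the first argument this in dostuff to show that the capture does not depend on the literal name self.

```python
import ast

SOURCE = '''
def some_func():
    pass

class MyClass:
    def __init__(self):
        self.f = some_func

    def dostuff(this):       # the "self" name is whatever the first argument is called
        this.f()
'''


def resolve_self_calls(source):
    """Map (class, method) -> callee names for calls of the form <self>.attr()."""
    tree = ast.parse(source)
    result = {}
    for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
        attr_values = {}  # attribute name -> name of the function it was bound to
        methods = [n for n in cls.body
                   if isinstance(n, ast.FunctionDef) and n.args.args]
        # Pass 1: record bindings of the form <self>.attr = some_name.
        for meth in methods:
            self_name = meth.args.args[0].arg  # capture the first argument, as Python does
            for node in ast.walk(meth):
                if isinstance(node, ast.Assign) and isinstance(node.value, ast.Name):
                    for target in node.targets:
                        if (isinstance(target, ast.Attribute)
                                and isinstance(target.value, ast.Name)
                                and target.value.id == self_name):
                            attr_values[target.attr] = node.value.id
        # Pass 2: resolve calls of the form <self>.attr() using those bindings.
        for meth in methods:
            self_name = meth.args.args[0].arg
            for node in ast.walk(meth):
                func = getattr(node, "func", None)
                if (isinstance(node, ast.Call) and isinstance(func, ast.Attribute)
                        and isinstance(func.value, ast.Name)
                        and func.value.id == self_name
                        and func.attr in attr_values):
                    result.setdefault((cls.name, meth.name), []).append(
                        attr_values[func.attr])
    return result


print(resolve_self_calls(SOURCE))  # {('MyClass', 'dostuff'): ['some_func']}
```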
Of course, this simple approach cannot correctly track cases where the current binding of self.f depends on the order in which the methods of the class are executed. To keep things simple, Pyan ignores this complication: it reads through the code linearly (twice, so that any forward references are picked up) and uses the most recent binding that is currently in scope.
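Below is a toy model of this linear, two-pass strategy, limited to module-level functions and simple name-to-name assignments. It is a hypothetical sketch, not Pyan's code. Bindings persist across passes, so a call to a function defined further down resolves only on the second pass, and a rebinding later in the file overrides an earlier one.

```python
import ast

SOURCE = '''
def main():
    return helper()      # forward reference: helper is defined further down

def helper():
    return 42

handler = main
handler = helper         # rebinding: the most recent binding wins
'''


def analyze(source, passes=2):
    """Linear read of module-level code; returns (bindings, uses) after the last pass."""
    tree = ast.parse(source)
    bindings = {}  # name -> function it currently points to; persists across passes
    uses = set()
    for _ in range(passes):
        uses = set()                               # recompute uses on every pass
        for stmt in tree.body:                     # top to bottom, in source order
            if isinstance(stmt, ast.FunctionDef):
                bindings[stmt.name] = stmt.name
                for node in ast.walk(stmt):
                    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                        target = bindings.get(node.func.id)  # None if not bound yet
                        if target is not None:
                            uses.add((stmt.name, target))
            elif (isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Name)
                  and stmt.value.id in bindings):
                for t in stmt.targets:
                    if isinstance(t, ast.Name):
                        bindings[t.id] = bindings[stmt.value.id]
    return bindings, uses


print(analyze(SOURCE, passes=1)[1])  # set()
print(analyze(SOURCE)[1])            # {('main', 'helper')}
```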
When a binding statement is encountered, the current namespace determines in which scope to store the new value for the name. Similarly, when encountering a use, the current namespace determines which object type or function to tag as the user.
See AUTHORS.md.
GPL v2, as per comments here.