Easy-to-use data analysis / manipulation framework for humans
DaPy is a data analysis library designed with ease of use in mind. It lets you implement your ideas smoothly by providing well-designed data structures and a rich set of professional ML models. There are already many well-known data manipulation modules, such as Pandas, but there is no module which
Thus, DaPy is better suited to data analysts, statistics professors, and people who work with big data but have limited computing knowledge than to engineers. In DaPy, our data structure offers 70 APIs for data mining, including 40+ data operation functions, 10+ feature engineering functions, and 15+ data exploration functions.
This example briefly shows DaPy's characteristic features: chained calls, a working log, and simple feature engineering methods. Our goal in this example is to train a classifier for the Iris classification task. Detailed information can be found here.
We already have plenty of great libraries for data science, so why do we need DaPy?
The answer is that DaPy is designed for data analysts, not for coders. In DaPy, users only need to focus on how they want to handle their data and can pay less attention to coding tricks. For example, in contrast with Pandas, DaPy lets you manipulate data by rows, just as you would in SQL. Here are just a few of the things that make DaPy simple:
DaPy is also efficient enough to help you handle real-world situations. The following diagram shows a test result indicating that DaPy's efficiency is comparable to that of some existing libraries written in C. The details of the test can be found here.
The latest version, 1.11.1, has been uploaded to PyPI.
pip install DaPy
Some functions in DaPy depend on additional requirements.
sheet = DaPy.read(file_addr)
sheet.show(lines=5)
sheet.info
sheet.count_values('gender')
sheet.groupby('city')
sheet.corr(['age', 'income'])
sheet.drop_duplicates(col, keep='first')
sheet.fillna(method='linear')
sheet.dropna(axis=0, how=0.5)
sheet.drop('ID', axis=1)
sheet = sheet.sort('Age', 'DESC')
sheet.merge(sheet2, left_key='ID', other_key='ID', keep_key='self', keep_same=False)
sheet.join(sheet2)
sheet.append_row(new_row)
sheet.append_col(new_col)
sheet[:10, 20: 30, 50: 100]
sheet['age', 'income', 'name']
sheet.get_date_label('birth')
sheet.get_categories(cols='age', cutpoints=[18, 30, 50], group_name=['Juveniles', 'Adults', 'Wrinkly', 'Old'])
sheet.get_dummies(['city', 'education'])
sheet.get_interactions(n_power=3, col=['income', 'age', 'gender', 'education'])
sheet.get_ranks(cols='income', duplicate='mean')
sheet.normalized(col='age')
sheet.normalized('log', col='salary')
sheet.apply(func=tax_rate, col=['salary', 'income'])
DaPy.diff(sheet.income)
m = MLP(), m = LinearRegression(), m = DecisionTree() or m = DiscriminantAnalysis()
m.fit(X_train, Y_train)
m.report.show()
m.plot_error() or DecisionTree.export_graphviz()
DaPy.methods.Performance(m, X_test, Y_test, mode)
m.save(addr)
sheet.save(addr)
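To make the feature engineering calls above more concrete, here is a minimal pure-Python sketch of the ideas behind three of them: binning by cutpoints (as in get_categories), dummy variables (as in get_dummies), and ranking with ties averaged (as in get_ranks with duplicate='mean'). This is not DaPy's implementation. The helper names bin_by_cutpoints, one_hot and average_ranks are hypothetical, and the boundary rule (a value equal to a cutpoint goes to the upper group) is an assumption that may differ from what DaPy actually does.

```python
from bisect import bisect_right


def bin_by_cutpoints(values, cutpoints, group_names):
    """Map each number to a group label; n cutpoints define n + 1 groups."""
    if len(group_names) != len(cutpoints) + 1:
        raise ValueError("need exactly len(cutpoints) + 1 group names")
    # Assumption: a value equal to a cutpoint falls into the upper group.
    return [group_names[bisect_right(cutpoints, v)] for v in values]


def one_hot(values):
    """Return one 0/1 indicator column per distinct category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all positions holding the same value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


# Illustrative data only; the values are made up for this sketch.
ages = [12, 18, 25, 47, 63]
age_groups = bin_by_cutpoints(ages, [18, 30, 50], ["Juveniles", "Adults", "Wrinkly", "Old"])
city_dummies = one_hot(["Paris", "Tokyo", "Paris"])
income_ranks = average_ranks([300, 100, 300, 200])
```

Note that three cutpoints produce four groups, which is why the get_categories call above passes four names in group_name for the cutpoints [18, 30, 50].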
Xuansheng WU (@JacksonWoo)
Feichi YANG (@Nick Yang)
The following programs are also great data analysis / manipulation frameworks in Python: