While APIs have become a pervasive component of software, a core challenge for developers is to identify and use the right existing APIs. Doing so requires either a deep understanding of the API landscape or access to high-quality documentation and usage examples. The former is infeasible, and the latter is often limited in practice.
CodeScholar
CodeScholar (📝 Paper: Preprint) is a tool that generates idiomatic code examples for query APIs, either a single API or several APIs used together. It finds these examples by searching a large code corpus and growing program graphs into idiomatic usages, guided by a neural model.
For example, to find idiomatic examples for json.load:
python search.py --dataset <dataset_name> --seed json.load
- 🔥 Fast neural-guided search over graphs.
- 🧠 Idiomatic code generation by graph growing for representative examples.
- 🪢 Single- and multi-API support, easily extensible to new APIs.
- 🚀 Streamlit app for interactive search.
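To give a feel for what "growing program graphs guided by a model" can look like, here is a generic beam search that grows connected node sets outward from a seed node and keeps only the highest-scoring candidates at each size. This is an illustration under stated assumptions, not CodeScholar's implementation: the adjacency-dict graph, the `grow_subgraphs` function, and the `score` callback (a stand-in for the neural model) are all hypothetical, and the toy graph and weights are made up.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# Hypothetical toy representation: a program graph as an adjacency dict.
Graph = Dict[str, Set[str]]


def grow_subgraphs(
    graph: Graph,
    seed: str,
    score: Callable[[FrozenSet[str]], float],
    beam_width: int = 3,
    max_size: int = 4,
) -> List[FrozenSet[str]]:
    """Beam search that grows connected node sets outward from `seed`.

    At each step every candidate in the beam is extended by one neighbouring
    node, and only the `beam_width` highest-scoring extensions are kept.
    `score` stands in for a learned model; here it is any callable.
    """
    beam: List[FrozenSet[str]] = [frozenset([seed])]
    for _ in range(max_size - 1):
        candidates: Set[FrozenSet[str]] = set()
        for nodes in beam:
            # Frontier: neighbours of the current node set that are not in it yet.
            frontier = set().union(*(graph.get(n, set()) for n in nodes)) - nodes
            for neighbour in frontier:
                candidates.add(nodes | {neighbour})
        if not candidates:
            break  # nothing left to grow
        # Highest score first; ties broken by sorted node names for determinism.
        beam = sorted(candidates, key=lambda c: (-score(c), sorted(c)))[:beam_width]
    return beam


if __name__ == "__main__":
    # Toy program graph around json.load; edges and weights are invented.
    toy_graph: Graph = {
        "json.load": {"open", "f"},
        "open": {"json.load", "path"},
        "f": {"json.load"},
        "path": {"open"},
    }
    weights = {"json.load": 1.0, "open": 0.9, "f": 0.2, "path": 0.5}
    best = grow_subgraphs(
        toy_graph, "json.load", lambda s: sum(weights[n] for n in s), beam_width=2, max_size=3
    )
    for sub in best:
        print(sorted(sub))
```

The beam width trades breadth for speed: a width of 1 is greedy growth, while larger widths keep alternative idioms alive for longer.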
- How to install CodeScholar
- How to use CodeScholar
- How to run CodeScholar App
- How to train CodeScholar
- Reproducibility of CodeScholar Evaluation
How to install CodeScholar
# clone the repository
git clone git@github.com:tart-proj/codescholar.git
# cd into the codescholar directory
cd codescholar
# install basic requirements
pip install -r requirements-dev.txt
# install PyTorch Geometric requirements: requirements-pyg.txt for GPU, requirements-torch.txt for CPU
pip install -r requirements-pyg.txt      # GPU
# pip install -r requirements-torch.txt  # CPU (use instead of the line above)
# install codescholar
pip install -e .
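If you script the installation, the GPU/CPU choice of requirements file can be automated. This is a minimal sketch, assuming that the presence of NVIDIA's `nvidia-smi` tool on PATH is a good-enough signal for a usable GPU (it does not check CUDA versions); `choose_requirements_file` is a hypothetical helper, not part of CodeScholar.

```python
import shutil
from typing import Optional


def choose_requirements_file(has_gpu: Optional[bool] = None) -> str:
    """Return the PyTorch Geometric requirements file to install.

    If `has_gpu` is not given, use a rough heuristic (an assumption):
    treat the machine as GPU-capable when `nvidia-smi` is on PATH.
    """
    if has_gpu is None:
        has_gpu = shutil.which("nvidia-smi") is not None
    return "requirements-pyg.txt" if has_gpu else "requirements-torch.txt"


if __name__ == "__main__":
    # Print the command rather than running it, so you can review it first.
    print(f"pip install -r {choose_requirements_file()}")
```

Passing `has_gpu` explicitly overrides the heuristic, which is useful on machines where `nvidia-smi` is present but you still want the CPU build.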
How to use CodeScholar
- Starting services
./services.sh start
What this does:
# start an Elasticsearch server (hosts programs) in a tmux session
docker run --rm -p 9200:9200 -p 9300:9300 -e "xpack.security.enabled=false" -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:8.7.0
# start a Redis server (hosts embeddings)
docker run --rm -p 6379:6379 redis
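Elasticsearch in particular can take a while to come up, so indexing immediately after starting the services may fail. The sketch below polls a TCP port until something accepts a connection or a timeout expires; `wait_for_port` is a hypothetical helper, and the ports it would be used with (9200 for Elasticsearch, 6379 for Redis) are the ones published by the docker commands above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` expires.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is accepting connections on this port
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("localhost", 9200)` and `wait_for_port("localhost", 6379)` before `./services.sh index <dataset_name>`. Note that an accepted connection only means something is listening: Docker's port proxy can accept connections before the service inside the container is ready, so for Elasticsearch you may prefer to also poll its HTTP endpoint.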
- Indexing
./services.sh index <dataset_name>
What this does:
# index the dataset using /search/elastic_search.py
cd codescholar/search
python elastic_search.py --dataset <dataset_name>
TODO: index all embeddings into Redis; currently, embeddings are indexed before each search.
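After indexing, you can check that the dataset actually landed in Elasticsearch. This sketch queries Elasticsearch's standard `_cat/indices` endpoint, assuming the single-node server started above on port 9200 with security disabled. The index name CodeScholar uses is not documented here, so it simply lists all indices with their document counts; `list_indices` and `parse_indices` are hypothetical helpers.

```python
import json
import urllib.request
from typing import List, Optional, Tuple


def parse_indices(body: str) -> List[Tuple[str, Optional[str]]]:
    """Extract (index name, document count) pairs from a _cat/indices JSON body."""
    # _cat APIs report counts as strings; the count may be missing or null,
    # for example for closed indices, so .get() is used.
    return [(row["index"], row.get("docs.count")) for row in json.loads(body)]


def list_indices(es_url: str = "http://localhost:9200") -> List[Tuple[str, Optional[str]]]:
    """Ask Elasticsearch for all indices via GET /_cat/indices?format=json."""
    with urllib.request.urlopen(f"{es_url}/_cat/indices?format=json", timeout=10) as resp:
        return parse_indices(resp.read().decode("utf-8"))
```

For example, `for name, count in list_indices(): print(name, count)` should show a non-empty index after `./services.sh index <dataset_name>` has finished.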
- Searching
# run a CodeScholar query (e.g., np.mean) using /search/search.py
python search.py --dataset <dataset_name> --seed np.mean
You can also pass the following arguments to the search:
--min_idiom_size <int>    # minimum size of idioms to be saved
--max_idiom_size <int>    # maximum size of idioms to be saved
--max_init_beams <int>    # maximum number of beams used to initialize the search
--stop_at_equilibrium     # stop the search when idiom diversity equals idiom reusability
Note: see /search/search_config.py for more configuration options.
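To run several single-API searches in one go (for example, from a script), the sketch below assembles `search.py` invocations from the flags listed above and runs them one after another. It uses only the documented flags and passes the numeric ones only when you set them, so search.py's own defaults apply otherwise. The helper names are hypothetical, and `run_searches` assumes it is launched from the repository root so that `codescholar/search` is the right working directory.

```python
import subprocess
from typing import List, Optional


def build_search_command(
    dataset: str,
    seed: str,
    min_idiom_size: Optional[int] = None,
    max_idiom_size: Optional[int] = None,
    max_init_beams: Optional[int] = None,
    stop_at_equilibrium: bool = False,
) -> List[str]:
    """Assemble a search.py invocation using only the documented flags."""
    cmd = ["python", "search.py", "--dataset", dataset, "--seed", seed]
    # Numeric flags are added only when explicitly set.
    for flag, value in [
        ("--min_idiom_size", min_idiom_size),
        ("--max_idiom_size", max_idiom_size),
        ("--max_init_beams", max_init_beams),
    ]:
        if value is not None:
            cmd += [flag, str(value)]
    if stop_at_equilibrium:
        cmd.append("--stop_at_equilibrium")
    return cmd


def run_searches(dataset: str, seeds: List[str], **options) -> None:
    """Run one search per seed API, sequentially, stopping on the first failure."""
    for seed in seeds:
        subprocess.run(
            build_search_command(dataset, seed, **options),
            cwd="codescholar/search",  # assumption: started from the repository root
            check=True,
        )
```

For example, `run_searches("<dataset_name>", ["np.mean", "json.load"], max_idiom_size=10, stop_at_equilibrium=True)`. Running seeds sequentially keeps resource use predictable, since each search shares the same Elasticsearch and Redis services.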
How to run CodeScholar App
- Set up services
./services.sh start
./services.sh index <dataset_name>
- Start server and application
cd codescholar/apps
./app.sh start
What this does:
# start a celery backend to handle tasks asynchronously
celery -A app_decl.celery worker --pool=solo --loglevel=info
# start a flask server to handle HTTP API requests
# note: runs flask on port 3003
python flask_app.py
You can now make API requests to the Flask server. For example, to search for idioms of size 10 for pd.merge, run:
curl -X POST -H "Content-Type: application/json" -d '{"api": "pd.merge", "size": 10}' http://localhost:3003/search
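The same request can be sent from Python using only the standard library. This sketch mirrors the curl call (endpoint `/search` on port 3003, JSON body with `api` and `size`); the helper names are hypothetical. Because the response format is not described here (and searches are handled by the asynchronous Celery backend), it returns the raw response body instead of assuming a schema.

```python
import json
import urllib.request


def build_search_request(api: str, size: int, base_url: str = "http://localhost:3003") -> urllib.request.Request:
    """Build the same POST /search request as the curl example."""
    payload = json.dumps({"api": api, "size": size}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_search(api: str, size: int, base_url: str = "http://localhost:3003", timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_search_request(api, size, base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(submit_search("pd.merge", 10))` while the Flask server and Celery worker are running.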
Finally, start the Streamlit app:
# start the streamlit app on localhost:8501
streamlit run streamlit_app.py
View details about the app using:
./app.sh show
How to train CodeScholar
Refer to the training README for a detailed description of how to train CodeScholar.
Reproducibility of CodeScholar Evaluation
Refer to the evaluation README for a detailed description of how to reproduce the evaluation results reported in the paper.