
= The Elixir Cross Referencer
:doctype: book
:pp: {plus}{plus}
:toc:
:toc-placement!:

image::https://travis-ci.com/bootlin/elixir.svg?branch=master[Build Status,link=https://travis-ci.com/bootlin/elixir]

Elixir is a source code cross-referencer inspired by https://en.wikipedia.org/wiki/LXR_Cross_Referencer[LXR]. It's written in Python and its main purpose is to index every release of a C or C{pp} project (like the Linux kernel) while keeping a minimal footprint.

It uses Git as a source-code file store and Berkeley DB for cross-reference data. Internally, it indexes Git blobs rather than trees of files to avoid duplicating work and data. It has a straightforward data structure (reminiscent of older LXR releases) to keep queries simple and fast.

You can see it in action on https://elixir.bootlin.com/

NOTE: this documentation applies to version 2.0 of Elixir.

toc::[]

= Requirements

  • Python >= 3.6
  • Git >= 1.9
  • The Jinja2 and Pygments (>= 2.2) Python libraries
  • Berkeley DB (and its Python binding)
  • Universal Ctags
  • Perl (for non-greedy regexes and automated testing)
  • Falcon and `mod_wsgi` (for the REST API)

= Architecture

Elixir has the following architecture:

    .---------------.----------------.
    | CGI interface | REST interface |
    |---------------|----------------.
    | Query command | Update command |
    |---------------|----------------|
    |          Shell script          |
    '--------------------------------'

The shell script (`script.sh`) is the lower layer and provides commands to interact with Git and other Unix utilities. The Python commands use the shell script's services to provide access to the annotated source code and identifier lists (`query.py`) or to create and update the databases (`update.py`). Finally, the CGI interface (`web.py`) and the REST interface (`api.py`) use the query interface to generate HTML pages and to answer REST queries, respectively.

When installing the system, you should test each layer manually and make sure it works correctly before moving on to the next one.

== Database design

`./update.py` stores a bidirectional mapping between git object hashes ("blobs") and a sequential key. The goal of indexing such hashes is to reduce their storage footprint (20 bytes for a SHA-1 hash versus 4 bytes for a 32-bit integer).
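
The idea can be illustrated with a minimal in-memory sketch. It is not Elixir's actual code: the `BlobIndex` class and its method names are hypothetical, the two hashes are arbitrary examples, and plain Python containers stand in for the Berkeley DB files that `update.py` really writes.

```python
import struct


class BlobIndex:
    """Toy bidirectional mapping between Git blob hashes and sequential keys."""

    def __init__(self):
        self._key_by_hash = {}   # hex SHA-1 string -> integer key
        self._hash_by_key = []   # integer key -> hex SHA-1 string

    def add(self, blob_hash):
        """Return the key for blob_hash, allocating the next one if it is new."""
        if blob_hash not in self._key_by_hash:
            self._key_by_hash[blob_hash] = len(self._hash_by_key)
            self._hash_by_key.append(blob_hash)
        return self._key_by_hash[blob_hash]

    def hash_of(self, key):
        """Reverse lookup: integer key -> blob hash."""
        return self._hash_by_key[key]


index = BlobIndex()
first = index.add("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
second = index.add("8b137891791fe96927ad78e64b0aad7bded08bdc")

# A raw SHA-1 digest takes 20 bytes; a packed 32-bit key takes 4.
raw_hash_size = len(bytes.fromhex(index.hash_of(first)))
packed_key_size = len(struct.pack("<I", first))
print(raw_hash_size, packed_key_size)
```

Every other record can then refer to a blob by its small integer key, and the hash is looked up only when the blob itself has to be read from Git.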

A detailed diagram of the databases will be provided. Until then, just use the Source, Luke.

= Manual Installation

== Install Dependencies


For RedHat/CentOS



yum install python36-pip python36-pytest python36-jinja2 python36-bsddb3 python36-falcon git httpd perl perl-autodie jansson libyaml rh-python36-mod_wsgi


For Debian



sudo apt install python3 python3-jinja2 python3-bsddb3 python3-falcon python3-pytest universal-ctags perl git apache2 libapache2-mod-wsgi-py3 libjansson4

To enable the REST API, follow the installation instructions on https://github.com/GrahamDumpleton/mod_wsgi[`mod_wsgi`] and connect it to the Apache installation as detailed in https://github.com/GrahamDumpleton/mod_wsgi#connecting-into-apache-installation.

To find out which packages Elixir needs in your favorite distribution, you can also read the Docker files in the `docker/` directory.

== Special dependencies

=== Device Tree and Kconfig Source syntax highlighting

The service on https://elixir.bootlin.com relies on a modified version of https://pygments.org/[Pygments], to enable syntax highlighting on Device Tree Source (DTS) files and on all Kconfig files.

The changes have been sent upstream and should appear in future distributions.

In the meantime, you can either rebuild this package from its source on https://github.com/MaximeChretien/pygments/tree/dev[GitHub], or download this https://bootlin.com/pub/elixir/Pygments-2.6.1.elixir-py3-none-any.whl[binary module] and install it as follows:


sudo pip3 install Pygments-2.6.1.elixir-py3-none-any.whl

=== Kconfig identifiers support

The service on https://elixir.bootlin.com relies on a modified version of https://ctags.io/[Universal-ctags], to enable indexing of Kconfig identifiers.

The changes have been sent upstream and should appear in future distributions.

In the meantime, you can either rebuild this package from its source on https://github.com/universal-ctags/ctags[GitHub], or download this https://bootlin.com/pub/elixir/universal-ctags_0+git20200526-0ubuntu1_amd64.deb[deb binary package] or https://bootlin.com/pub/elixir/universal-ctags-0+git~20e934e3-1.6.x86_64.rpm[rpm binary package] and install it as follows:

sudo dpkg -i universal-ctags_0+git20200526-0ubuntu1_amd64.deb

or

sudo rpm -iv universal-ctags-0+git~20e934e3-1.6.x86_64.rpm

Then tell `apt` to hold this package and block future updates of the normal package:

sudo apt-mark hold universal-ctags

== Download Elixir Project


git clone https://github.com/bootlin/elixir.git /usr/local/elixir/

== Create Directory


mkdir -p /path/elixir-data/linux/repo

mkdir -p /path/elixir-data/linux/data

== Set environment variables

Two environment variables are used to tell Elixir where to find the project's local git repository and its databases:

  • LXR_REPO_DIR
    (the git repository directory for your project)
  • LXR_DATA_DIR
    (the database directory for your project)

Now open `/etc/profile` and append the following content:

export LXR_REPO_DIR=/path/elixir-data/linux/repo

export LXR_DATA_DIR=/path/elixir-data/linux/data

Then run `source /etc/profile`.
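
Before moving on, it can help to confirm that both variables are visible and point to real directories. The following is a minimal sketch, not part of Elixir: the `check_elixir_environment()` helper is a hypothetical name, and it only checks that the two variables described above are set and refer to existing directories.

```python
import os


def check_elixir_environment(environ=os.environ):
    """Return a list of problems with the two Elixir variables (empty if fine)."""
    problems = []
    for name in ("LXR_REPO_DIR", "LXR_DATA_DIR"):
        value = environ.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


# Report any problem found in the current shell environment.
for problem in check_elixir_environment():
    print(problem)
```

If the script prints nothing, both variables are set and their directories exist.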

== Clone Kernel source code

First clone the master tree released by Linus Torvalds:


cd /path/elixir-data/linux

git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git repo

Then, you should also declare a `stable` remote corresponding to the `stable` tree, to get all release updates:

cd repo

git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

git fetch stable

From the same `repo` directory, you can also declare a `history` remote corresponding to the old Linux versions not present in the other repositories, to get all the old versions that are still available:

git remote add history https://github.com/bootlin/linux-history.git

git fetch history --tags

Feel free to add more remote branches in this way, as Elixir will consider tags from all remote branches.

== First Test


cd /usr/local/elixir/

./script.sh list-tags

== Create Database


./update.py


Generating the full database can take a long time: it takes about 15 hours on a Xeon E3-1245 v5 to index 1800 tags in the Linux kernel. For that reason, you may want to tweak the script (for example, by limiting the number of tags with a "head") in order to test the update and query commands. You can even create a new Git repository and just create one tag instead of using the official kernel repository which is very large.


== Second Test

Verify that the queries work:

$ ./query.py v4.10 ident raw_spin_unlock_irq C

$ ./query.py v4.10 file /kernel/sched/clock.c

NOTE: `v4.10` can be replaced with any other tag.

== Configure httpd

The CGI interface (`web.py`) is meant to be called from your web server. Since it includes support for indexing multiple projects, it expects a different variable (`LXR_PROJ_DIR`) which points to a directory with a specific structure:

* `LXR_PROJ_DIR`
** first project (for example `linux`)
*** `data`
*** `repo`
** second project
*** `data`
*** `repo`
** third project
*** `data`
*** `repo`
It will then generate the other two variables upon calling the query command.
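
The derivation of the two per-project variables can be sketched as follows. This is an illustration of the layout described above, not the code used by `web.py`: the `project_environment()` function is a hypothetical name, and the paths reuse the `/path/elixir-data` example from earlier in this document.

```python
import os


def project_environment(proj_dir, project):
    """Build the per-project variables from the multi-project root."""
    base = os.path.join(proj_dir, project)
    return {
        "LXR_REPO_DIR": os.path.join(base, "repo"),
        "LXR_DATA_DIR": os.path.join(base, "data"),
    }


env = project_environment("/path/elixir-data", "linux")
print(env["LXR_REPO_DIR"])
print(env["LXR_DATA_DIR"])
```

A project name that has no matching subdirectory under `LXR_PROJ_DIR` therefore cannot be served.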

Now open `/etc/httpd/conf.d/elixir.conf` and write the following content. Note: if you are using apache2 (Ubuntu/Debian) instead of httpd (RedHat/CentOS), the default config file to edit is `/etc/apache2/sites-enabled/000-default.conf`.

    HttpProtocolOptions Unsafe
    # Required for HTTP
    <Directory /usr/local/elixir/http/>
        Options +ExecCGI
        AllowOverride None
        Require all granted
        SetEnv PYTHONIOENCODING utf-8
        SetEnv LXR_PROJ_DIR /path/elixir-data
    </Directory>
    # Required for the REST API
    <Directory /usr/local/elixir/api/>
        SetHandler wsgi-script
        Require all granted
        SetEnv PYTHONIOENCODING utf-8
        SetEnv LXR_PROJ_DIR /path/elixir-data
    </Directory>
    AddHandler cgi-script .py
    Listen 80
    <VirtualHost *:80>
        ServerName xxx
        DocumentRoot /usr/local/elixir/http
        # To enable REST api after installing mod_wsgi: Fill path and uncomment:
        #WSGIScriptAlias /api /usr/local/elixir/api/api.py
        AllowEncodedSlashes On
        RewriteEngine on
        RewriteRule "^/$" "/linux/latest/source" [R]
        RewriteRule "^/(?!api|acp).*/(source|ident|search)" "/web.py" [PT]
        RewriteRule "^/acp" "/autocomplete.py" [PT]
    </VirtualHost>

CGI and rewrite support is enabled by default in RHEL/CentOS, but you have to enable it manually if your distribution is Debian/Ubuntu.


a2enmod cgi rewrite

Finally, start the httpd server.


systemctl start httpd

== Configure lighttpd

Here's a sample configuration for lighttpd:

    server.document-root = server_root + "/elixir/http"
    url.redirect = ( "^/$" => "/linux/latest/source" )
    url.rewrite = ( "^/(?!api|acp).*/(source|ident|search)" => "/web.py/$1")
    url.rewrite = ( "^/acp" => "/autocomplete.py")
    setenv.add-environment = ( "PYTHONIOENCODING" => "utf-8",
        "LXR_PROJ_DIR" => "/path/to/elixir-data" )

= REST API usage

After configuring httpd, you can test the API usage:

== ident query

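Send a GET request to `/api/ident/<Project>/<Ident>?version=<version>&family=<family>`. For example (the URL is quoted so that the shell does not interpret the `&`):

curl 'http://127.0.0.1/api/ident/barebox/cdev?version=latest&family=C'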

The response body is of the following structure:


    {
        "definitions":
            [{"path": "commands/loadb.c", "line": 71, "type": "variable"}, ...],
        "references":
            [{"path": "arch/arm/boards/cm-fx6/board.c", "line": "64,64,71,72,75", "type": null}, ...]
    }
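
A client can build the query URL and decode the response along these lines. This is only a sketch based on the example above: the `ident_url()` and `reference_lines()` helpers are hypothetical names, the sample body is trimmed from the structure shown above, and no request is actually sent.

```python
import json
from urllib.parse import quote, urlencode


def ident_url(base, project, ident, version="latest", family="C"):
    """Build the URL of an ident query."""
    query = urlencode({"version": version, "family": family})
    return f"{base}/api/ident/{quote(project)}/{quote(ident)}?{query}"


def reference_lines(body):
    """Map each referencing path to its list of integer line numbers."""
    result = {}
    for ref in json.loads(body)["references"]:
        # "line" may hold several comma-separated line numbers in one string.
        result[ref["path"]] = [int(n) for n in str(ref["line"]).split(",")]
    return result


sample = """{
  "definitions": [{"path": "commands/loadb.c", "line": 71, "type": "variable"}],
  "references": [{"path": "arch/arm/boards/cm-fx6/board.c",
                  "line": "64,64,71,72,75", "type": null}]
}"""

print(ident_url("http://127.0.0.1", "barebox", "cdev"))
print(reference_lines(sample))
```

Note that in the example response a definition carries a single integer in `line`, while a reference carries a comma-separated string; the sketch converts the latter to a list of integers.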

= Maintenance and enhancements

== Using a cache to improve performance

At Bootlin, we're using the https://varnish-cache.org/[Varnish http cache] as a front-end to reduce the load on the server running the Elixir code.

    .-------------.           .---------------.           .-----------------------.
    | Http client | --------> | Varnish cache | --------> | Apache running Elixir |
    '-------------'           '---------------'           '-----------------------'

== Keeping Elixir databases up to date

To keep your Elixir databases up to date and index new versions as they are released, we suggest using a script like `utils/update-elixir-data`, called through a daily cron job.

You can set `$ELIXIR_THREADS` if you want to change the number of threads used by `update.py` for indexing (default is 10 and minimum is 5).
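
The default and minimum can be pictured with a short sketch. It is not taken from `update.py`: the `thread_count()` helper is hypothetical, and both raising too-small values to the minimum and falling back to the default on unparsable input are assumptions about reasonable behavior, not documented facts.

```python
import os

DEFAULT_THREADS = 10  # default stated above
MINIMUM_THREADS = 5   # minimum stated above


def thread_count(environ=os.environ):
    """Read ELIXIR_THREADS, falling back to the default and enforcing the minimum."""
    raw = environ.get("ELIXIR_THREADS", "")
    try:
        requested = int(raw)
    except ValueError:
        # Unset or not a number: use the default (assumed behavior).
        return DEFAULT_THREADS
    return max(requested, MINIMUM_THREADS)


print(thread_count({"ELIXIR_THREADS": "16"}))
```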

== Keeping git repository disk usage under control

As you keep updating your git repositories, you may notice that some can become considerably bigger than they originally were. This seems to happen when a `gc.log` file appears in a big repository, apparently causing git's garbage collector (`git gc`) to fail, and therefore causing the repository to consume disk space at a fast pace every time new objects are fetched.

When this happens, you can save disk space by packing the affected repository. From inside its git directory, run:

git prune

rm gc.log

git gc --aggressive

A second pass with the above commands saves even more space.

To process multiple git repositories in a loop, you may use the `utils/pack-repositories` script that we provide, run from the directory where all repositories are found.
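
To find out which repositories are affected before packing them, a small scan for `gc.log` files is enough. The sketch below is not the `utils/pack-repositories` script: the `repositories_with_gc_log()` helper is a hypothetical name, and it only lists candidates without running any git command.

```python
import os


def repositories_with_gc_log(root):
    """List directories directly under root whose git directory holds a gc.log."""
    found = []
    for entry in sorted(os.listdir(root)):
        repo = os.path.join(root, entry)
        candidates = (
            os.path.join(repo, "gc.log"),          # bare repository
            os.path.join(repo, ".git", "gc.log"),  # repository with a work tree
        )
        if any(os.path.isfile(path) for path in candidates):
            found.append(repo)
    return found


# Scan the current directory, as if run from where all repositories are found.
for repo in repositories_with_gc_log("."):
    print(repo)
```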

= Building Docker images

Docker files are provided in the `docker/` directory. To generate your own Docker image for indexing the sources of a project (for example the Musl project, which is much faster to index than Linux), download the `Dockerfile` file for your target distribution and run:

$ docker build -t elixir --build-arg GIT_REPO_URL=git://git.musl-libc.org/musl --build-arg PROJECT=musl .

Then you can use your new container as follows (you get the container id from the output of `docker build`):

$ docker run

You can then open the following URL in a browser on your host: http://172.17.0.2/musl/latest/source (change the container IP address if you don't get the default one).

= Hardware requirements

Performance requirements depend mostly on the amount of traffic that you get on your Elixir service. However, a fast server also helps for the initial indexing of the projects.

SSD storage is strongly recommended because of the frequent access to git repositories.

Here are a few details about the server we use at Bootlin:

  • As of July 2019, our Elixir service consumes 17 GB of data (supporting all projects), or for the Linux kernel alone (version 5.2 being the latest), 12 GB for indexing data, and 2 GB for the git repository.
  • We're using an LXD instance with 8 GB of RAM on a cloud server with 8 CPU cores running at 3.1 GHz.

= Contributing to Elixir

== Supporting a new project

Elixir has a very simple modular architecture that lets you support a new source code project by just adding a new file to the Elixir sources.

Elixir's assumptions:

  • Project sources have to be available in a git repository.
  • All project releases are associated with a given git tag. Elixir only considers such tags.

First install Elixir by following the above instructions. See the `projects` subdirectory for projects that are already supported.

Once Elixir works for at least one project, it's time to clone the git repository for the project you want to support:

cd /srv/git

git clone --bare https://github.com/zephyrproject-rtos/zephyr

After doing this, you may also reference and fetch remote branches for this project, for example corresponding to the `stable` tree for the Linux kernel (see the instructions for Linux earlier in this document).

Now, in your `LXR_PROJ_DIR` directory, create a new directory for the new project, with its `data` and `repo` entries inside it:

cd $LXR_PROJ_DIR

mkdir -p zephyr/data

ln -s /srv/git/zephyr.git zephyr/repo

export LXR_DATA_DIR=$LXR_PROJ_DIR/zephyr/data

export LXR_REPO_DIR=$LXR_PROJ_DIR/zephyr/repo

Now, go back to the Elixir sources and test that tags are correctly extracted:

./script.sh list-tags

Depending on how you want to show the available versions on the Elixir pages, you may have to apply substitutions to each tag string, for example to add a `v` prefix if missing, for consistency with how other project versions are shown. You may also decide to ignore specific tags. All this can be done by redefining the default `list_tags()` function in a new `projects/<projectname>.sh` file. Here's an example (`projects/zephyr.sh` file):

    list_tags()
    {
        echo "$tags" |
        grep -v '^zephyr-v'
    }

Note that the base name of this file (`zephyr` here) must match the name of the directory that you created under `LXR_PROJ_DIR`.

The next step is to make sure that versions are classified as you wish in the version menu. This classification work is done through the `list_tags_h()` function which generates the output of the `./script.sh list-tags -h` command. Here's what you get for the Linux project:

    v4 v4.16 v4.16
    v4 v4.16 v4.16-rc7
    v4 v4.16 v4.16-rc6
    v4 v4.16 v4.16-rc5
    v4 v4.16 v4.16-rc4
    v4 v4.16 v4.16-rc3
    v4 v4.16 v4.16-rc2
    v4 v4.16 v4.16-rc1
    ...

The first column is the top level menu entry for versions. The second one is the next level menu entry, and the third one is the actual version that can be selected by the menu. Note that this third entry must correspond to the exact name of the tag in git.
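
The way these three columns turn into a two-level menu can be sketched as follows. This is an illustration only, not the code Elixir uses to render its pages: the `parse_version_menu()` helper is a hypothetical name, and the sample reuses the first rows of the output shown above.

```python
def parse_version_menu(output):
    """Turn three-column `list-tags -h` output into {top: {sub: [tags]}}."""
    menu = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        top, sub, tag = fields
        menu.setdefault(top, {}).setdefault(sub, []).append(tag)
    return menu


sample = """\
v4 v4.16 v4.16
v4 v4.16 v4.16-rc7
v4 v4.16 v4.16-rc6
"""

menu = parse_version_menu(sample)
print(menu)
```

Only the innermost strings are used as git tag names, which is why the third column has to match the tags exactly.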

If the default behavior is not what you want, you will have to customize the `list_tags_h()` function.

You should also make sure that Elixir properly identifies the most recent versions:

./script.sh get-latest

If needed, customize the `get_latest()` function.

If you want to enable support for `compatible` properties in Devicetree files, add `dts_comp_support=1` at the beginning of `projects/<projectname>.sh`.

You are now ready to generate Elixir's database for your new project:

./update.py

You can then check that Elixir works through your http server.

== Coding style

If you wish to contribute to Elixir's Python code, please follow the https://www.python.org/dev/peps/pep-0008/[official coding style for Python].

== How to send patches

The best way to share your contributions with us is to https://github.com/bootlin/elixir/pulls[file a pull request on GitHub].

= Automated testing

Elixir includes a simple test suite in `t/`. To run it, from the top-level Elixir directory, run:

prove

The test suite uses code extracted from Linux v5.4 in `t/tree`.

== Licensing of code in `t/tree`

The copied code is licensed as described in the https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/COPYING[COPYING] file included with Linux. All the files copied carry SPDX license identifiers of `GPL-2.0+` or `GPL-2.0-or-later`. Per https://www.gnu.org/licenses/gpl-faq.en.html#AllCompatibility[GNU's compatibility table], GPL 2.0+ code can be used under GPLv3 provided the combination is under GPLv3. Moreover, https://www.gnu.org/licenses/license-list.en.html#AGPLv3.0[GNU's overview of AGPLv3] indicates that its terms "effectively consist of the terms of GPLv3" plus the network-use paragraph. Therefore, the developers have a good-faith belief that licensing these files under AGPLv3 is authorized. (See also https://github.com/Freemius/wordpress-sdk/issues/166#issuecomment-310561976[this issue comment] for another example of a similar situation.)

= License

Elixir is copyright (c) 2017--2020 its contributors. It is licensed AGPLv3. See the `COPYING` file included with Elixir for details.
