You may have read in the previous blogs that we are currently working on a new worker for Augur to calculate Libyear. Weeks 8 and 9 have been really productive, and we have made good progress in that direction.
How the Worker Works
We can divide the worker into 3 parts:
1- Parsers: Parsers parse the package files and get the list of dependencies and their version representations. Currently, we have only built parsers for the PyPI package manager, and the supported file formats are setup.py, requirements.txt, Pipfile, Pipfile.lock, and Poetry files. I will be adding support for conda soon. Formats like TOML and JSON are easy to handle, since Python code can parse them directly. The tricky part was parsing setup.py and requirements.txt. These parsers rely heavily on string pattern matching with regular expressions. setup.py was the trickiest one, as it contains a lot more than just dependencies. The conda parsers will be parsing YAML files, so I expect them to be straightforward.
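To give a feel for the regular-expression approach described above, here is a minimal sketch of a requirements.txt parser. It is not the worker's actual code: the pattern, the `parse_requirements` name, and the sample package names are all assumptions made for illustration, and it ignores many things a real file can contain (URLs, editable installs, line continuations).

```python
import re

# Hypothetical pattern: a package name, optional extras, then the spec string.
REQUIREMENT_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)"  # package name
    r"\s*(?:\[[^\]]*\])?"                         # optional extras, e.g. [security]
    r"\s*(?P<spec>[^;#]*)"                        # spec, up to a marker or comment
)


def parse_requirements(text):
    """Return a list of (name, spec) pairs from requirements-style text."""
    dependencies = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and pip options such as "-r other.txt".
        if not line or line.startswith(("#", "-")):
            continue
        match = REQUIREMENT_RE.match(line)
        if match:
            dependencies.append((match.group("name"), match.group("spec").strip()))
    return dependencies


sample = """
# made-up example file
alpha
beta>=1.0  # trailing comment
gamma[extra]<8.0,>=5.1
-r base.txt
delta==1.2.3 ; python_version > '3.6'
"""
print(parse_requirements(sample))
```

Each pair is a dependency name plus its raw spec string, which is the shape the next part of the worker expects.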
2- Getting metadata for packages: This part does a lot of things. First, we get the dependency name and spec string from the parsers. Then we check and clean the spec string according to the way it is written. For example, people might give no version at all, only a lower bound with `>=`, or a range such as `<8.0,>=5.1`.
There are different kinds of spec strings for the dependencies. When no version is specified, we assume it to be the latest. When we have a format like `<8.0,>=5.1`, we take the latest released version just before `8.0`, picked from the list of all released versions that PyPI gives us. This way we get the current, cleaned version according to the way it is represented.
Once we have the cleaned version, we need the current version's release date, the latest version, and the latest version's release date to calculate Libyear. As I have mentioned before, right now we only support PyPI, so we get all the data from a PyPI URL, which takes the dependency name and the version (if needed) as parameters and returns a JSON response. This response has a lot of metadata, more than we need. It includes the list of releases with their release dates and times, which we look up by version. I have also tried my best to minimize the calls to PyPI, as we don't want to burden their servers, and fewer calls also reduce the risk of getting a bad response such as a 404 error.
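The version-picking rule described above can be sketched roughly like this. It is a simplified illustration under my own assumptions, not the worker's code: `version_key`, `resolve_version`, and the sample version list are hypothetical, only plain numeric versions are considered (no pre-releases), and operators such as `~=` are not handled.

```python
import operator
import re

OPERATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
CLAUSE_RE = re.compile(r"(==|!=|>=|<=|>|<)\s*([0-9][0-9.]*)")
FINAL_RE = re.compile(r"^\d+(\.\d+)*$")  # plain releases such as 7.9.1


def version_key(version):
    """Turn '7.9.1' into (7, 9, 1); trailing zeros are dropped so 8.0 == 8."""
    parts = [int(p) for p in re.findall(r"\d+", version)]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def resolve_version(spec, available, latest):
    """Pick the version a spec string points at, or None if nothing matches."""
    clauses = CLAUSE_RE.findall(spec)
    # No version given, or only ">=" bounds: assume the latest release.
    if all(op == ">=" for op, _ in clauses):
        return latest
    candidates = [
        v for v in available
        if FINAL_RE.match(v)
        and all(OPERATORS[op](version_key(v), version_key(bound))
                for op, bound in clauses)
    ]
    # Highest version that satisfies every clause.
    return max(candidates, key=version_key) if candidates else None


# Made-up release list for illustration.
versions = ["5.0", "5.1", "7.9.1", "8.0", "8.1"]
print(resolve_version("<8.0,>=5.1", versions, latest="8.1"))
print(resolve_version("", versions, latest="8.1"))
```

With this made-up list, the range `<8.0,>=5.1` resolves to the highest release below `8.0`, while an empty spec falls back to the latest version.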
Now there are scenarios where the release date or version doesn't exist. This is fairly common for really old PyPI packages. I have added error handling for this: if we do not find the release date, it is automatically set to 1970-01-01 00:00:00 and the Libyear is set to -1. If we see -1 as the Libyear, we know there is some problem with the data, and with this approach we also do not miss any dependencies in the files.
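Here is a rough sketch of the metadata lookup with the fallback date described above. The function names and the caching with `lru_cache` are my own assumptions about how this could be done, not the worker's code. The URL shapes follow PyPI's public JSON API, where the project-level response carries a `releases` mapping whose file entries have an `upload_time` field.

```python
import json
from functools import lru_cache
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

FALLBACK_DATE = "1970-01-01 00:00:00"  # sentinel when no release date is found


def pypi_url(name, version=None):
    """Build the PyPI JSON URL for a package, optionally for one version."""
    if version:
        return f"https://pypi.org/pypi/{name}/{version}/json"
    return f"https://pypi.org/pypi/{name}/json"


@lru_cache(maxsize=None)
def fetch_releases(name):
    """Fetch the 'releases' mapping once per package; repeats hit the cache."""
    try:
        with urlopen(pypi_url(name), timeout=10) as response:
            return json.load(response).get("releases", {})
    except (HTTPError, URLError):
        # A bad response (for example a 404) is treated as "no data".
        return {}


def release_date(releases, version):
    """Return the earliest upload time for a version, or the fallback date."""
    files = releases.get(version) or []
    times = [f["upload_time"] for f in files if f.get("upload_time")]
    if not times:
        return FALLBACK_DATE
    # ISO timestamps sort correctly as strings.
    return min(times).replace("T", " ")


# Made-up data in the shape of the 'releases' mapping.
sample_releases = {
    "1.0": [{"upload_time": "2019-05-01T10:00:00"},
            {"upload_time": "2019-05-01T09:00:00"}],
    "0.1": [],
}
print(release_date(sample_releases, "1.0"))
print(release_date(sample_releases, "0.1"))
```

Because a missing date becomes the sentinel instead of raising an error, the dependency still shows up in the results and can be flagged later.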
3- Libyear Calculator: As the name suggests, this calculates the Libyear. It takes all the release dates and first checks the data: if we do not have the latest version or the release dates, it returns -1. If the file does not specify any version or just has `>=`, we assume the latest version, and in this case we return a Libyear of 0. For all other scenarios, we calculate Libyear by subtracting the current version's release date from the latest version's release date and dividing the difference in days by 365. The dates are parsed with the Python library `dateutil.parser` before the subtraction.
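The calculation described above can be sketched as follows. This is an illustration under my own assumptions: the `libyear` function and its `pinned` flag are hypothetical names, and I use the standard library's `datetime.fromisoformat` in place of `dateutil.parser` so that the example has no third-party dependency.

```python
from datetime import datetime

FALLBACK_DATE = "1970-01-01 00:00:00"  # sentinel meaning "date not found"


def libyear(current_date, latest_date, pinned=True):
    """Return Libyear in years, 0 for unpinned dependencies, -1 for bad data."""
    # Missing or sentinel dates mean we cannot trust the result.
    if (not current_date or not latest_date
            or FALLBACK_DATE in (current_date, latest_date)):
        return -1
    # No version or only ">=": we assume the latest version is in use.
    if not pinned:
        return 0
    current = datetime.fromisoformat(current_date)
    latest = datetime.fromisoformat(latest_date)
    # Difference in whole days, divided by 365.
    return (latest - current).days / 365


print(libyear("2019-01-01 00:00:00", "2020-01-01 00:00:00"))
print(libyear(FALLBACK_DATE, "2020-01-01 00:00:00"))
```

For example, a dependency whose current release came out exactly 365 days before the latest release has a Libyear of 1.0.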
This is how the current code works. It is written separately from Augur and not as a worker, because starting directly with the worker could get much more complicated and time-consuming. Augur already does a lot of things, so debugging can get difficult, and for every small change we have to rebuild the entire instance to see it working. So it is better to build a separate program first and then wrap it up as a worker. That is what I am currently doing: wrapping everything up as a worker. I hope this works as a worker too, fingers crossed!