Serialization: A week-long struggle

Hello folks,

I have been away from my blog because there was nothing much to discuss. I kept trying things and kept failing. But after a week-long struggle and some help, I got through this rough patch and have now moved on to the next task on my list.

On the whole, this month was well spent learning new things: first unit tests and then serializers. Those who have worked with Django Rest Framework will get what I am trying to say in this post.

First things first: why do we need serializers?

To answer this question, we need to know why serializers were created in the first place.

According to reliable sources like Wikipedia, serialization is the process of converting data into a format that can be stored or transmitted and then reconstructed later.

We know that our data lives in the models. We also know that model instances cannot be sent over the network as they are. So we use serialization to convert the models’ data, or any other data, into a format such as JSON, XML or YAML that can easily be transmitted over the network.
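To make the idea concrete, here is a minimal sketch that uses only Python's standard json module rather than Django Rest Framework. The scan dictionary is a hypothetical stand-in for the data held in one model row.

```python
import json

# Hypothetical stand-in for one row of model data.
scan = {"id": 1, "scan_type": "URL", "is_complete": True}

# Serialize: Python object -> JSON text that can be sent over the network.
payload = json.dumps(scan, sort_keys=True)
print(payload)  # {"id": 1, "is_complete": true, "scan_type": "URL"}

# Deserialize: JSON text -> Python object on the receiving side.
restored = json.loads(payload)
```

The round trip gives back the same dictionary, which is the whole point: the receiver can rebuild the data from the text it was sent.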

Easy, right?

Let’s dive in and see some code snippets.

from django.conf import settings
from django.db import models


class ScanInfo(models.Model):
    def __str__(self):
        return self.scan_type

    scan_types = (
        ('URL', 'URL'),
        ('Local Scan', 'localscan'),
    )

    scan_type = models.CharField(max_length=20, choices=scan_types, default='URL')
    is_complete = models.BooleanField()

class UserInfo(models.Model):
    def __str__(self):
        return self.user.username

    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    scan_info = models.ForeignKey(ScanInfo, on_delete=models.CASCADE)

class URLScanInfo(models.Model):
    def __str__(self):
        return self.URL

    scan_info = models.ForeignKey(ScanInfo, on_delete=models.CASCADE)
    URL = models.URLField(max_length=2000)

class LocalScanInfo(models.Model):
    def __str__(self):
        return self.folder_name

    scan_info = models.ForeignKey(ScanInfo, on_delete=models.CASCADE)
    folder_name = models.CharField(max_length=200)

class CodeInfo(models.Model):
    def __str__(self):
        # __str__ must return a string, and total_code_files is an integer
        return str(self.total_code_files)

    scan_info = models.ForeignKey(ScanInfo, on_delete=models.CASCADE)
    total_code_files = models.IntegerField(null=True, blank=True)
    code_size = models.IntegerField(null=True, blank=True, default=0)

Well, those are not all of the models, but you get the idea, right? We have multiple levels of relationships between these models (not really inheritance, but chains of foreign keys, which in simple words we can think of that way). Now the real test is to write the serializers for them.

I decided to start with plain ModelSerializer classes.

from rest_framework import serializers


class ScanInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = ScanInfo
        fields = '__all__'


class UserInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = UserInfo
        fields = '__all__'


class URLScanInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = URLScanInfo
        fields = '__all__'


class LocalScanInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = LocalScanInfo
        fields = '__all__'


class CodeInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = CodeInfo
        fields = '__all__'

When I checked the sample output of these serializers, to my surprise I did not get the result I wanted. The JSON they produced was the opposite of what we were expecting.

So, as an experiment, I created a GodSerializer (which was the literal name of the serializer) along with a helper for it. The helper tells the serializer what data it is going to work with.

class GodSerializer(serializers.Serializer):
    """
    Another good serializer to handle all the serialization activities
    """
    code_info = CodeInfoSerializer()
    url_scan = URLScanInfoSerializer()
    local_scan = LocalScanInfoSerializer()
    scan_result = ScanResultSerializer()
    scan_file_info = ScanFileInfoSerializer(many=True)
    license = LicenseSerializer(many=True)
    matched_rule = MatchedRuleSerializer(many=True)
    matched_rule_license = MatchedRuleLicenseSerializer(many=True)
    copyright = CopyrightSerializer(many=True)
    copyright_holder = CopyrightHolderSerializer(many=True)
    copyright_statement = CopyrightStatementSerializer(many=True)
    copyright_author = CopyrightAuthorSerializer(many=True)
    package = PackageSerializer(many=True)
    scan_error = ScanErrorSerializer(many=True)

After this, I created the GodSerializerHelper, which gathers the data and hands it to the serializer in the shape it expects. Here is the code for the helper.

class GodSerializerHelper(object):
    def __init__(self, scan_info):
        self.scan_info = scan_info
        self.code_info = CodeInfo.objects.get(scan_info=scan_info)
        self.url_scan = URLScanInfo.objects.get(scan_info=scan_info)
        self.local_scan = None
        self.scan_result = ScanResult.objects.get(code_info=self.code_info)
        self.scan_file_info = ScanFileInfo.objects.filter(scan_result=self.scan_result)
        self.license = License.objects.filter(scan_file_info__in=self.scan_file_info)
        self.matched_rule = MatchedRule.objects.filter(license__in=self.license)
        self.matched_rule_license = MatchedRuleLicenses.objects.filter(matched_rule__in=self.matched_rule)
        self.copyright = Copyright.objects.filter(scan_file_info__in=self.scan_file_info)
        self.copyright_holder = CopyrightHolders.objects.filter(copyright__in=self.copyright)
        self.copyright_statement = CopyrightStatements.objects.filter(copyright__in=self.copyright)
        self.copyright_author = CopyrightAuthor.objects.filter(copyright__in=self.copyright)
        self.package = Package.objects.filter(scan_file_info__in=self.scan_file_info)
        self.scan_error = ScanError.objects.filter(scan_file_info__in=self.scan_file_info)

Note the use of __in. It avoids an error that occurs when a ForeignKey lookup is given several rows at once. That might seem like a weird explanation, so let me try once more. We know that objects.filter() returns a QuerySet, which can hold more than one row. A plain lookup such as scan_file_info=... expects a single object, so a multi-row QuerySet cannot be passed to the next objects.filter() that way. The __in lookup matches against every row in the QuerySet instead.
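At the database level, an __in lookup on a QuerySet becomes roughly an SQL IN (subquery) clause. The sketch below shows that idea with Python's built-in sqlite3 module instead of Django, so it can run on its own. The table and column names are simplified, hypothetical versions of the models above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scan_file_info (id INTEGER PRIMARY KEY, scan_result_id INTEGER);
    CREATE TABLE license (id INTEGER PRIMARY KEY, scan_file_info_id INTEGER, name TEXT);
    INSERT INTO scan_file_info VALUES (1, 10), (2, 10), (3, 99);
    INSERT INTO license VALUES (1, 1, 'MIT'), (2, 2, 'GPL'), (3, 3, 'BSD');
""")

# Roughly what License.objects.filter(scan_file_info__in=<queryset>) asks for:
# every license whose file belongs to ANY of the rows in the inner query.
rows = conn.execute(
    """
    SELECT name FROM license
    WHERE scan_file_info_id IN (
        SELECT id FROM scan_file_info WHERE scan_result_id = ?
    )
    ORDER BY name
    """,
    (10,),
).fetchall()

license_names = [name for (name,) in rows]
print(license_names)  # ['GPL', 'MIT']
```

Two files belong to scan result 10, and IN matches the licenses of both, which a single-value comparison could not do.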

After this, I used the following code to check that things looked right.

helper = GodSerializerHelper(ScanInfo.objects.get(pk=51))
serializer = GodSerializer(helper)
serializer.data
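A plain helper object can be handed to the serializer because a serializer reads each declared field by name from whatever instance it is given. The sketch below mimics that idea in plain Python. It is not DRF's actual implementation, and Helper, serialize and the sample values are hypothetical.

```python
class Helper:
    """Hypothetical stand-in for GodSerializerHelper: one object, many attributes."""

    def __init__(self):
        self.code_info = {"total_code_files": 3}
        self.url_scan = {"URL": "https://example.com"}


def serialize(instance, field_names):
    # Look each declared field up by name on the instance.
    return {name: getattr(instance, name) for name in field_names}


data = serialize(Helper(), ["code_info", "url_scan"])
print(data)
```

As long as the attribute names on the helper match the field names declared on the serializer, the helper does not need to be a model at all.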

I hope this post helps someone in the future. If you are still stuck on something, join the conversation in the comments.

Have a good day.

Writing unit tests for the models

You have probably heard of test-driven development if you do any development work. It is a style of development in which you write the tests before writing the logic. That means you first write the checks that the code must pass, and then write the real code until those checks stop failing.

I hope this makes sense. If not, keep reading and it should become clearer.

Why do we need tests?

At the intermediate level of development, where I am right now, we rarely write tests for our code. But as the saying goes:

Untested code is broken code.

That being said, I found a great presentation that supports my argument for writing automated tests.

https://www.slideshare.net/wooda/philipp-von-weitershausen-untested-code-is-broken-code

No need to go beyond the first 7-8 slides.

So the basic idea of automated testing is to stop someone from breaking our code in the future. It also helps us find issues in the code that were not visible when we wrote it. Logical errors in a repository with thousands of lines of code are hard to detect. That is why testing was introduced for developers.

Similarly, if someone writes code for us in the future and we add it to our main repository without testing, it can break everything. Automated testing is there to save us: run the tests before adding anything new to the main repository, and move forward without worrying about your code.

What happens during testing?

During testing, the code is run against a set of test cases. The actual output of the code is compared with the output the developer expects. If the two outputs match, the test passes; otherwise it fails. As simple as that.
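Here is a minimal sketch of that compare-with-expected-output idea, using Python's built-in unittest module without Django. The total_code_size function is a hypothetical piece of code under test, not something from the project.

```python
import unittest


def total_code_size(file_sizes):
    """Hypothetical function under test: add up the sizes of the scanned files."""
    return sum(file_sizes)


class TotalCodeSizeTestCase(unittest.TestCase):
    def test_sums_all_files(self):
        # Test case input on the left, the output the developer expects on the right.
        self.assertEqual(total_code_size([10, 20, 30]), 60)

    def test_empty_scan_has_zero_size(self):
        self.assertEqual(total_code_size([]), 0)


# Run the tests programmatically and collect the result.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalCodeSizeTestCase)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If someone later changes total_code_size so that it returns a wrong value, one of these assertions fails and the problem shows up before the change is merged.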

In the GSoC project, we wrote tests using Python's unittest module. Unit testing is a software engineering term for testing each unit of the code separately. Django's default test framework is built on unittest.

To begin with, we used the module to write tests for the models in the code. Here is the commit.

https://github.com/singh1114/scancode-server/commit/99a36d8fe0c9289a5fac608f02cbf34171abdf28

After running the tests, I found a few errors in the code, which I fixed in the same commit.

What should we test in models?

When testing models, we should test all the custom methods on them. We should also test that a record cannot be saved when a required field is missing. Django takes care of most of the rest.

Because Django provides most of the model behaviour out of the box, there is little need to test it yourself in recent versions. You should, however, test __str__ and the plural name of each model as shown in the admin panel.

As your tests start taking shape you will feel more confident about your code.

Let’s have a coding sample:

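from django.test import TestCase

from .models import ScanInfo  # assumes tests.py sits next to models.py


class ScanInfoTestCase(TestCase):
    def test_scan_info_added(self):
        scan_info = ScanInfo.objects.create(scan_type='URL', is_complete=True)
        self.assertTrue(scan_info.is_complete)
        # __str__ returns scan_type, so both sides should be 'URL'
        self.assertEqual(scan_info.scan_type, str(scan_info))
        # Assumes ScanInfo.Meta sets verbose_name_plural = 'Scan Info' (not shown above)
        self.assertEqual('Scan Info', scan_info._meta.verbose_name_plural)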

In the first line, we import TestCase from django.test. ScanInfoTestCase then inherits from TestCase and uses the assertTrue and assertEqual methods to decide whether the test passes. assertTrue checks that a value is true, and assertEqual checks that two values are equal.

To run the tests in Django, use:

$ python manage.py test

Conclusion

It is not difficult to write automated tests for your code. You just have to be patient. Writing tests takes time, and it often feels useless. But in the long run it will benefit you and save you a lot of time.