Serialization: A week-long struggle

Hello folks,

I have been away from my blog because there was nothing much to discuss. I kept trying things and kept failing. But after a week-long struggle and some help, I got past that rough patch and have now moved on to the next item in my task list.

So, on the whole, this month was well spent learning new things: first unit tests, then serializers. Those who have worked with Django REST Framework will know what I am talking about in this post.

First things first: why do we need serializers?

To answer this question, we need to know why serializers were created in the first place.

According to Wikipedia, serialization is the process of converting data into a format that can be stored or transmitted and reconstructed later.

We know that our data lives in the models. We also know that model instances cannot be sent over the network as they are. So we use serialization to convert the model data, or any other data, into a format such as JSON, XML or YAML that can be transmitted easily over the network.
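To make the idea concrete before getting to Django, here is a minimal sketch using only Python's standard json module. The dictionary is a made-up stand-in for one row of a model; the field names and values are assumptions for illustration.

```python
import json

# A plain dictionary standing in for one row of a model (hypothetical values).
scan = {"id": 1, "scan_type": "URL", "is_complete": False}

# Serialize: Python object -> JSON text that can be sent over the network.
payload = json.dumps(scan, sort_keys=True)
print(payload)

# Deserialize: JSON text -> Python object on the receiving side.
restored = json.loads(payload)
print(restored == scan)
```

A serializer in Django REST Framework does the first half of this job for model instances: it turns them into plain dictionaries and lists that can then be rendered as JSON.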

Easy, right?

Let’s dive in and see some code snippets.

from django.conf import settings
from django.db import models


class ScanInfo(models.Model):
    def __str__(self):
        return self.scan_type

    scan_types = (
        ('URL', 'URL'),
        ('Local Scan', 'localscan'),
    )

    scan_type = models.CharField(max_length=20, choices=scan_types, default='URL')
    is_complete = models.BooleanField()

class UserInfo(models.Model):
    def __str__(self):
        return self.user.username

    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    scan_info = models.ForeignKey(ScanInfo, on_delete=models.CASCADE)

class URLScanInfo(models.Model):
    def __str__(self):
        return self.URL

    scan_info = models.ForeignKey(ScanInfo, on_delete=models.CASCADE)
    URL = models.URLField(max_length=2000)

class LocalScanInfo(models.Model):
    def __str__(self):
        return self.folder_name

    scan_info = models.ForeignKey(ScanInfo, on_delete=models.CASCADE)
    folder_name = models.CharField(max_length=200)

class CodeInfo(models.Model):
    def __str__(self):
        # __str__ must return a string, and total_code_files is an integer
        return str(self.total_code_files)

    scan_info = models.ForeignKey(ScanInfo, on_delete=models.CASCADE)
    total_code_files = models.IntegerField(null=True, blank=True)
    code_size = models.IntegerField(null=True, blank=True, default=0)

Well, those are not all of the models, but you get the idea, right? The models are chained together through foreign keys (it is not really inheritance, but it is easiest to think of it as several levels of parent and child). Now the real test is to write serializers for them.

I decided to use plain ModelSerializers.

from rest_framework import serializers


class ScanInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = ScanInfo
        fields = '__all__'


class UserInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = UserInfo
        fields = '__all__'


class URLScanInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = URLScanInfo
        fields = '__all__'


class LocalScanInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = LocalScanInfo
        fields = '__all__'


class CodeInfoSerializer(serializers.ModelSerializer):
    class Meta:
        model = CodeInfo
        fields = '__all__'

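I then checked the sample output of these serializers and, to my surprise, did not get the desired result. The JSON they produced was nothing like the structure we were expecting. By default a ModelSerializer emits each model as a flat record with related objects shown only as primary keys, whereas we wanted everything about a scan in a single response.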

So, as an experiment, I created a GodSerializer (that was its literal name) along with a helper for it. The helper tells the serializer what data it is going to work with.

class GodSerializer(serializers.Serializer):
    """
    Another good serializer to handle all the serialization activities
    """
    code_info = CodeInfoSerializer()
    url_scan = URLScanInfoSerializer()
    local_scan = LocalScanInfoSerializer()
    scan_result = ScanResultSerializer()
    scan_file_info = ScanFileInfoSerializer(many=True)
    license = LicenseSerializer(many=True)
    matched_rule = MatchedRuleSerializer(many=True)
    matched_rule_license = MatchedRuleLicenseSerializer(many=True)
    copyright = CopyrightSerializer(many=True)
    copyright_holder = CopyrightHolderSerializer(many=True)
    copyright_statement = CopyrightStatementSerializer(many=True)
    copyright_author = CopyrightAuthorSerializer(many=True)
    package = PackageSerializer(many=True)
    scan_error = ScanErrorSerializer(many=True)

After this, I created the GodSerializerHelper, which gathers the data the serializer is going to work on. Here is the code for the helper.

class GodSerializerHelper(object):
    def __init__(self, scan_info):
        self.scan_info = scan_info
        self.code_info = CodeInfo.objects.get(scan_info=scan_info)
        self.url_scan = URLScanInfo.objects.get(scan_info=scan_info)
        self.local_scan = None
        self.scan_result = ScanResult.objects.get(code_info=self.code_info)
        self.scan_file_info = ScanFileInfo.objects.filter(scan_result=self.scan_result)
        self.license = License.objects.filter(scan_file_info__in=(self.scan_file_info))
        self.matched_rule = MatchedRule.objects.filter(license__in=(self.license))
        self.matched_rule_license = MatchedRuleLicenses.objects.filter(matched_rule__in=(self.matched_rule))
        self.copyright = Copyright.objects.filter(scan_file_info__in=(self.scan_file_info))
        self.copyright_holder = CopyrightHolders.objects.filter(copyright__in=(self.copyright))
        self.copyright_statement = CopyrightStatements.objects.filter(copyright__in=(self.copyright))
        self.copyright_author = CopyrightAuthor.objects.filter(copyright__in=(self.copyright))
        self.package = Package.objects.filter(scan_file_info__in=(self.scan_file_info))
        self.scan_error = ScanError.objects.filter(scan_file_info__in=(self.scan_file_info))

Note the use of __in. We know that objects.filter() returns a QuerySet that can hold more than one row. A plain lookup such as scan_file_info=... expects a single object, so handing it a QuerySet with several rows does not work. The __in lookup instead matches every row whose foreign key points at any of the rows in that QuerySet, which is exactly what we need when the result of one filter is fed into the next.
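Roughly speaking, when a QuerySet is passed to __in, the database ends up running an SQL IN (...) subquery. The sketch below shows that idea with the standard sqlite3 module instead of Django, so it can be run on its own. The table names, column names and rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scan_file_info (id INTEGER PRIMARY KEY, scan_result_id INTEGER);
    CREATE TABLE license (id INTEGER PRIMARY KEY, scan_file_info_id INTEGER, name TEXT);
    INSERT INTO scan_file_info VALUES (1, 10), (2, 10), (3, 99);
    INSERT INTO license VALUES (1, 1, 'MIT'), (2, 2, 'GPL-2.0'), (3, 3, 'BSD');
""")

# The inner SELECT returns several rows (files 1 and 2), so "=" would not do;
# IN matches a license if it belongs to ANY of those files.
rows = conn.execute(
    """
    SELECT name FROM license
    WHERE scan_file_info_id IN (
        SELECT id FROM scan_file_info WHERE scan_result_id = ?
    )
    ORDER BY id
    """,
    (10,),
).fetchall()
licenses = [name for (name,) in rows]
print(licenses)
```

The license attached to file 3 is left out because that file belongs to a different scan result.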

After this, for testing, I used the following code to see whether things looked right.

s = GodSerializerHelper(ScanInfo.objects.get(pk=51))
s = GodSerializer(s)
s.data
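The helper works because a serializer looks up each declared field by name on the object it is given, so all the helper has to do is expose attributes with matching names. The toy classes below are a plain-Python stand-in to show that idea. They are not Django REST Framework's real implementation, and ToyHelper and ToySerializer are hypothetical names.

```python
class ToyHelper:
    """Stand-in for GodSerializerHelper: plain attributes, no database."""

    def __init__(self):
        self.code_info = {"total_code_files": 3}
        self.scan_error = []


class ToySerializer:
    """Stand-in for a serializer: reads one attribute per declared field."""

    fields = ("code_info", "scan_error")

    def __init__(self, instance):
        self.instance = instance

    @property
    def data(self):
        # Look up every declared field by name on the wrapped object.
        return {name: getattr(self.instance, name) for name in self.fields}


result = ToySerializer(ToyHelper()).data
print(result)
```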

Hope this post helps someone in the future. If you are still in a dilemma, join the conversation in the comments.

Have a good day.

Using mutt to send a few hundred emails with a Python script

This post was first published at http://ranvir.xyz/blog/using-mutt-to-send-a-few-hundred-of-emails-using-python-script/. Read it over there for a better reading experience.

GitHub repository:

https://github.com/singh1114/csiMuttScript

So this was my first encounter with the powerful email client MUTT. According to the official mutt website, “All mail clients suck. This one just sucks less.” That is about the best way of explaining it.

So, let’s go through the build up and share the story of WHY.

What was the reason to use MUTT

I went to my friend’s room in the hostel, where he was asking his roommate to send emails to a list of people.

He told me that he had a database of the registered students with all their details in it. He gave the database to his roommate and asked him to email every one of them. I asked about the procedure he was going to follow. Very sincerely, he told me that he would pick the addresses one by one and send each person the email.

I laughed at this, as I knew there were more than a hundred people on the list. If he went ahead with this procedure, he might not finish before morning. So I asked him to let me do the work for him, and assured him that I would write a program to get it done faster.

So he gave me the database and I started working on it. The first and most obvious idea is to convert the database into a CSV (comma-separated values) file and import it into Google Mail, which allows contacts to be imported directly from a CSV file.

Note: Now that the work is done, I think that could have been a good solution.

I asked my friend, and he said that they didn’t want everyone to see each other’s email IDs.

What he wanted was to send the emails in a loop, one recipient at a time, so that the privacy of each member is preserved.

I remembered that Rai sir used to talk about MUTT, which lets us send emails from the terminal. And if we can do something from the terminal, we can script it according to our needs.

So I went ahead and installed MUTT and started reading about it. Eventually I found a great article, which I followed during the setup. The following link will take you to the article.

http://nickdesaulniers.github.io/blog/2016/06/18/mutt-gmail-ubuntu/

After that, I wrote a small Python program that reads each line of the CSV file, splits it into its values and stores the email address in a variable. Before that, I searched for how to execute bash commands from a Python program. The “subprocess” module was the solution to my problem, so I imported it.
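The approach can be sketched roughly as follows. This is not the script from the repository, only an illustration under a few assumptions: the email address is taken to be the second column of the CSV, the function names are made up, and the sample addresses are fake. It relies on mutt accepting a subject with -s and reading the message body from standard input. By default it runs in dry-run mode and only builds the commands, so nothing is sent.

```python
import csv
import io
import subprocess


def read_emails(csv_file, email_column=1):
    """Return the email address from every non-empty row of a CSV file."""
    return [row[email_column].strip() for row in csv.reader(csv_file) if row]


def build_mutt_command(recipient, subject):
    """Build the argument list for one mutt invocation (one recipient)."""
    return ["mutt", "-s", subject, recipient]


def send_all(recipients, subject, body, dry_run=True):
    """Send one separate email per recipient so addresses stay private."""
    commands = []
    for recipient in recipients:
        command = build_mutt_command(recipient, subject)
        commands.append(command)
        if not dry_run:
            # mutt reads the message body from standard input.
            subprocess.run(command, input=body, text=True, check=True)
    return commands


# Hypothetical sample data standing in for the exported database.
sample = io.StringIO("Asha,asha@example.com\nRavi,ravi@example.com\n")
emails = read_emails(sample)
commands = send_all(emails, "Hello", "Message body", dry_run=True)
print(commands)
```

Passing the command as a list, instead of one shell string, avoids quoting problems if a subject or address contains spaces or special characters.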

I tested it by sending a few emails to my friends, and it worked fine. Then I created another file and asked my friend about the content he wanted to send to the list. He told me a few things, and I used my open source experience to write the content (collaboration on a big project starts with a good conversation).

I tested the script and found that for every mail we had to confirm that what we were doing was correct and that we wanted to go ahead. So there was a lot of pressing the “Enter” key. I wanted to remove that too, but wasn’t able to in time. I explained to my friend how to carry out the procedure, as I was in no mood to do all that key pressing myself.

After he had finished sending all the emails, I found the solution and corrected a single character in my file. This commit shows how I solved the problem.

https://github.com/singh1114/csiMuttScript/commit/75f2eccb6928fdb3994390283ebe205cf17d4bbb

 

Automating the creation of presentations

GitHub repository : https://github.com/singh1114/automatingPresentations/

8th February 2017

I came back from my so-called holidays and sir was ready to excite me with a new task. As progress on the chatting app is really slow, I decided to work on this side project. The basic aim of the project is to create a markdown presentation from a PowerPoint presentation. The presentations only need to be good enough to get the work done; we do not want to create something extraordinary. Sir was able to extract all of the text from the presentation, and now the job is to produce something that is good enough in terms of formatting.

The file that contains all of the text separates the slides with some references at the end of each one, so telling two slides apart will be easy. The next task is to figure out which tags to use for the content. I saw that all the headings were written in capitals, but we cannot differentiate between the different levels of headings, so this might have to be done manually.

9th February 2017

Now it was time to think about the procedure we were going to follow. I wrote some pseudo code to start with.

Read the file:

  whenever the line starts with *:

    read each character:

      if the letters are capital:
        give heading h1

      if the * is first in the group:
        raise the height by 50%

      if there is nothing after *
        do nothing

  if the line starts with References:
    write the code for new slides

We decided to use Python to complete the job. So let the work begin.
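As a rough idea of how the pseudo code could translate into Python, here is a minimal sketch. It is not the code from the repository. It covers only the heading, empty-bullet and References rules and leaves out the height rule. The function name, the sample lines and the use of --- as a slide separator are all assumptions made for illustration.

```python
def to_markdown(lines):
    """Convert extracted slide text into markdown lines."""
    output = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("References"):
            # A References line closes the slide, so start a new one.
            output.append("---")
        elif stripped.startswith("*"):
            text = stripped[1:].strip()
            if not text:
                continue  # nothing after the *, so do nothing
            if text.isupper():
                output.append("# " + text)  # all capitals: treat as a heading
            else:
                output.append("* " + text)  # otherwise keep it as a bullet
    return output


sample = ["* INTRODUCTION", "* first point", "*", "References: slide 1", "* NEXT SLIDE"]
markdown = to_markdown(sample)
print("\n".join(markdown))
```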

10th February 2017

I wrote the code for the pseudo code above. In the end, the code turned out somewhat different from the pseudo code. While writing it I learned a few things about file handling in Python, which I want to share quickly. First of all, file handling in Python does not require any library to be imported.

To open a file, we use the open() function. The parameters we pass are:

  1. The file name
  2. The mode in which we want to open the file

If you know file handling in C, you will recognise modes like “r”, “w” and “a”, which stand for read, write and append respectively. The first two are self-explanatory (note that “w” overwrites an existing file), while append adds new text to the end of the file.

The read() function reads the content of the file and the write() function writes content to it. The seek() function moves the current read/write position to another place in the file, and tell() reports where that position currently is. Do remember to close() the file at the end.
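Here is a short sketch that puts these functions together. The file name and its contents are made up, and a temporary directory is used so that no real file is touched.

```python
import os
import tempfile

# Use a throwaway file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "slides.txt")

f = open(path, "w")          # "w": create the file, or overwrite it
f.write("FIRST SLIDE\n")
f.close()

f = open(path, "a")          # "a": append to the end of the file
f.write("* a bullet point\n")
f.close()

f = open(path, "r")          # "r": read only
content = f.read()           # read everything; the position is now at the end
f.seek(0)                    # move the position back to the start
first_line = f.readline()
f.close()

print(content)
print(first_line)
```

In everyday code it is common to write `with open(path) as f:` instead, which closes the file automatically even if an error occurs.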

Now, about the script: I used a temporary variable to check for all kinds of cases, but two cases are still not handled (there might be more, but for now we only know about these two). They are as follows:

  1. Even if only the first word is in upper case, the whole line is treated as header text.
  2. Paragraphs are not handled properly.

The commit for the code is here.

In the end, I wrote some documentation on how this code is to be used.

The commit for the documentation is here.

11th February 2017

We were able to resolve the issues discussed earlier. Further information will be shared in follow-up posts.

Please comment on the post if there is some confusion.