How to Respond to Rogue Websites

Continuing the discussion on deliberate misinformation spread on the web and the resulting poisoning of AI models, I would like to reproduce here an article I had written some time back.

This article also refers to an earlier article from 2000 in which I had discussed the Dalistan.org issue.

The thoughts expressed in these articles are even more relevant today, since AI can itself be used to generate blog posts and flood the web with biased articles, and the bias will be further amplified when search engines pick up these posts. Such posts can be SEO-optimized so that search engines rank them above sites like naavi.org, and hence the fake narrative will proliferate.

This will be like the time capsule that Mrs Indira Gandhi wanted to bury to influence future civilization. What is being created on the web is a similar “Time Capsule”, and in due course Indian history will be retold through it.

The problem of fake news that we encountered on Twitter, and tried to mitigate through the Digital Media Ethics Code, will be nothing compared to a future in which AI-algorithm-based search engines are poisoned.

We need to think of a way to resolve this issue.

Naavi

Posted in Cyber Law | Leave a comment

New Digital India Act in the Making-2: Integrity of ChatGPT-like Models

On May 26th, 2022, MeitY released the “National Data Governance Framework Policy” for public consultation. Mr Rajeev Chandrashekar referred to this policy while introducing the proposed Digital India Act and stated that it would be part of India's upcoming Cyber Law ecosystem.

The objective of this policy was to ensure that non-personal data and anonymised data from Government and private entities are safely accessible to the research and innovation ecosystem. According to the Government press release of 27th July 2022 in this regard, the policy was meant to provide an institutional framework for rules, standards, guidelines and protocols governing data, datasets and metadata, and for sharing non-personal datasets while ensuring privacy, security and trust.

This policy, a draft of which was made public in 2022 (Refer here), now becomes integral to the country's Cyber Law ecosystem and will have an impact even on DPDPB2022. Its objectives included building a platform where “Data” can be made available for processing by the Big Data industry. It would also impact AI development by contributing data for Machine Learning models. (Also refer here for more on the policy)

With Large Language Models such as ChatGPT penetrating the ecosystem, the need for unbiased datasets for Machine Learning has become critical. Public opinion in future would be shaped automatically by large language models, which have the capability of making people believe untruth as truth. Products such as the new Bing search engine, built on these “Idiotic but pretending to be Intelligent” language models, may rationalize fake narratives found on the web if web content becomes the predominant training dataset. These models are vulnerable to the threat of “AI Training Data Poisoning”, which needs to be prevented if the integrity of ChatGPT-like models is to be preserved.

As an example arising out of the narratives currently floating on the internet, it would not be surprising if a language model confidently says that “Indian Democracy is under threat because of Mr Modi”, while the truth could be that it is under threat because of Rahul Gandhi. The “Garbage In, Garbage Out” model of training AI would arrive at this conclusion because the fake narrative may be more prolific than the true one. This would hold for all social issues, since people spread negative information faster than true information and the media sees this as an opportunity to increase its TRP. The result would be a distorted view of society.

At any point in time, the web will contain more negative and false information than truthful and positive information. This will creep into search engines that depend on language models and, over a period, corrupt society irretrievably.

I reproduce here the first paragraph of an article on “intelligencesquared.com” (Is it a George Soros sponsored article?). The article is presented as an introduction to a debate in which Siddarth Varadarajan, Founding Editor of The Wire, and Bobby Ghosh, a member of Bloomberg’s editorial board, participate.

Quote

India may be the world’s largest democracy, but under Prime Minister Narendra Modi the country is sliding inexorably towards autocracy. In his six years in office, Modi has presided over an increase in arrests, intimidation and the alleged torture of lawyers, journalists and activists who speak out against him. His Hindu nationalist government has amended its citizenship laws to favour Hindus over Muslims and has pledged to create a national register of citizens, prompting concern that millions of Muslims with inadequate paperwork will be unable to qualify for citizenship. Modi doesn’t like to hear dissent: while in power he has not held a single press conference or given any unscripted interviews. Several international organisations have now marked India as only ‘partly free’ or as a ‘flawed democracy’. This great, vibrant, argumentative country with a proud history of debate has never seen anything like this prime minister: Narendra Modi is the most serious threat Indian democracy has ever faced.

Unquote

The second paragraph briefly mentions the opposing view and then goes on to introduce an event in which the two speakers will speak. This is a clever presentation, since search engines and AI algorithms will pick up the heading and the first paragraph as the narrative.

Any AI model that receives this paragraph as input would definitely produce a biased output. As the biased output percolates into the public mind through search engines such as Bing, the public will slowly start believing the fake narrative.

The new DIA flags the need to prevent fake social media narratives as one of its objectives in creating an “Open but Safe” Internet. At the same time, lawmakers tend to exempt “Online Search Engines” from most regulatory controls in the belief that search engine output is unbiased. This presumption is, however, incorrect in the era of ChatGPT, since “Fake planted articles on the web” will reinforce the learning of ChatGPT-like platforms and increase the bias in each cycle of learning.

Hence, the dataset used for machine learning must be filtered to prevent bias from creeping into the AI model's decision making.
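The article does not specify how such filtering should be done. As one illustration only, here is a minimal Python sketch of a volume-limiting filter, assuming training documents arrive as (URL, text) pairs. It drops documents whose normalized text is identical and caps how many documents any single domain can contribute, so that a flood of re-posted or SEO-driven articles from one source cannot dominate the dataset. The function names and the quota of two documents per domain are hypothetical choices made for this sketch.

```python
import hashlib
from collections import Counter
from urllib.parse import urlparse


def normalize(text: str) -> str:
    # Lower-case and collapse whitespace so trivially re-spun copies hash the same.
    return " ".join(text.lower().split())


def filter_training_docs(docs, max_per_domain=2):
    """Hypothetical filter: drop exact duplicates (after normalization) and cap
    how many documents any single domain may contribute.
    `docs` is a list of (url, text) pairs; returns the kept pairs in order."""
    seen_hashes = set()
    per_domain = Counter()
    kept = []
    for url, text in docs:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # identical content was already kept from some source
        domain = urlparse(url).netloc
        if per_domain[domain] >= max_per_domain:
            continue  # this domain has already used up its quota
        seen_hashes.add(digest)
        per_domain[domain] += 1
        kept.append((url, text))
    return kept
```

Such a filter limits the influence of sheer volume; it does not judge whether a document is true, which would still need other controls.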

Regulation of AI is also one of the DIA's objectives, and “Prevention of Bias” is declared as one of the most important ethical challenges.

The DIA therefore needs to ensure that reliable datasets are created out of unbiased basic data. This is an important regulatory requirement for maintaining the integrity of search engines and large language models.

We have already suggested that AI algorithms need to be made accountable to their creators through a system of labelling, licensing and registration. The use of a reliable dataset for training is one of the parameters for accreditation of AI algorithms to be registered with the AI regulatory authority.

MeitY had released certain reports on AI (Refer here). Of the four reports published, the Report of Committee D, which addressed the ethical issues of AI, suggested the need for measures to avoid bias in AI. Some of these suggestions now need to be made part of the DIA.

We draw the attention of MeitY to this issue and request it to consider measures to prevent “Poisoning of AI Training”.

Naavi


Concurrent Compliance and Continuous Compliance

The audit community (e.g. in ISO 27001 audits) generally conducts an audit as a snapshot at a point in time and issues a certificate that the subject entity is compliant. The certificate would normally be valid for 3 years, with a condition that the entity should maintain compliance through internal audits at periodic intervals. Most auditors also add that the audit should be repeated in case of any significant change in operations. As a result, responsibility for maintaining the controls after the audit vests with the organization.

The internal audit team of an organization normally follows an audit schedule, such as quarterly or half-yearly audits, depending on its own risk perception. This “Intermittent Audit” is like quarterly financial reporting through Balance Sheets drawn up once a quarter.

In some industries, a system of “Continuous Audit” is in vogue, where checks are conducted more frequently and critical parameters are observed on a transaction-by-transaction basis. In such a system, each transaction passes through an audit check before being recorded. For example, in a Financial Audit, each voucher may be checked for appropriate permissions and authority and taken on record only after clearance. In simple decision-making environments this can be automated to the extent that the audit becomes almost a “Continuous Audit”.
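To make the voucher example concrete, the following is a minimal Python sketch of a transaction-level check, assuming a simple authority matrix in which each approver role may clear vouchers only up to a fixed amount. The roles, limits and function names are hypothetical; the point is only that a voucher is taken on record when it clears the check and is otherwise routed to an exception list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voucher:
    voucher_id: str
    amount: float
    approved_by: Optional[str]  # role that authorised the voucher, if any


# Hypothetical authority matrix: approver role -> maximum amount it may clear.
APPROVAL_LIMITS = {"manager": 50_000.0, "cfo": 500_000.0}


def audit_check(v: Voucher) -> list:
    """Return a list of audit exceptions; an empty list means the voucher clears."""
    issues = []
    if v.approved_by is None:
        issues.append("missing approval")
    elif v.approved_by not in APPROVAL_LIMITS:
        issues.append(f"unknown approver {v.approved_by!r}")
    elif v.amount > APPROVAL_LIMITS[v.approved_by]:
        issues.append("amount exceeds approver's authority")
    return issues


def record(ledger: list, exceptions: list, v: Voucher) -> None:
    # Every transaction passes the check before it is taken on record.
    issues = audit_check(v)
    if issues:
        exceptions.append((v.voucher_id, issues))
    else:
        ledger.append(v)
```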

However, in Techno Legal Audits such as GDPR, ITA 2000 or DPDPB audits, the filters involve legal interpretation, which needs human intervention more often than simple financial decisions do. In personal data protection, a “Transaction” may mean the collection of a personal data set or the accumulation of identifiers. Sometimes a new process or a disclosure in which personal data is processed may also constitute a transaction.

Despite the emergence of AI tools, it is difficult to fully automate the verification of personal-data-related transactions on a continuous, transaction-by-transaction basis. The effort would therefore be to reduce the intermittent audit period from around 3 months to a shorter duration of, say, one month or, more ideally, one day. Such auditing may require some affirmative action by a human and cannot rely entirely on an automated system.

How this “compression” of the audit period can be achieved is a complex decision and may also depend on the risk perception of the entity. Further, in enterprise-level legal compliance, compliance can be measured only across the totality of operations and not on individual transactions. It is therefore necessary to have a compliance index that serves as a barometer to be watched. Hence Concurrent Audit in the Techno Legal scenario cannot be done without first developing a compliance measurement index and tracking its changes.
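A compliance index of this kind could be computed in many ways. As a minimal sketch, assuming each compliance parameter is scored between 0 and 1 and given a weight reflecting the entity's risk perception, the index below is a weighted average scaled to 100, and a second function flags movements between successive readings that exceed a chosen threshold. The weights, the 0 to 1 scale and the 5-point threshold are hypothetical and are not taken from the DTS system.

```python
from __future__ import annotations


def compliance_index(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-parameter scores (each 0..1), scaled to 0..100."""
    total_weight = sum(weights[p] for p in scores)
    if total_weight == 0:
        raise ValueError("no weighted parameters")
    return 100.0 * sum(scores[p] * weights[p] for p in scores) / total_weight


def index_changes(history: list[tuple[str, float]], threshold: float = 5.0):
    """Given (date, index) readings in time order, flag moves larger than
    `threshold` points as (date, change) pairs."""
    alerts = []
    for (_, prev), (when, cur) in zip(history, history[1:]):
        if abs(cur - prev) > threshold:
            alerts.append((when, round(cur - prev, 2)))
    return alerts
```

Tracking the change in the index, rather than only its level, is what turns a periodic score into something the management can watch as a barometer.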

The Ujvala Concurrent Audit system uses the DTS system developed by Ujvala Consultants through an online mechanism that has already been built. Some finer details of how to tag and monitor changes to certain parameters are being finalized, and the system will shortly be announced as an automated online system for Certification.

The Concurrent DTS evaluation of Ujvala will follow the steps of “Self Assessment”, “Mentor Assisted Self Assessment” and “Summary Assessment based on documentary evidence”. Subsequently, Certification can be passed on to a qualified auditor accredited by a suitable organization such as FDPPI.

Watch out for the launch of the “Personal Data Certification” system based on Concurrent Audit shortly.

Naavi


Concept of Concurrent Compliance

In our earlier article, we introduced the term “Concurrent Compliance” as one of the goals of PDPSI. This new term was coined after the more commonly used term “Concurrent Auditing”. Under PDPB 2019, apart from the mandatory annual data audit by an external data auditor, Significant Data Fiduciaries were required to conduct “Concurrent Audits”.

Essentially, “Concurrent Audit” means that the organization maintains ongoing supervision of its activities (in this instance, compliance with data protection law) rather than relying on intermittent audits conducted from time to time.

This means that if there are 50 principles of Digital Personal Data Protection Audit, which an external auditor would check once a year, the management has to keep checking these 50 parameters every day and every moment.

While a DPIA is conducted as and when a new process is contemplated, Concurrent Audit should monitor the DPIA on a daily basis, identifying changes that might occur in data processing, such as a new employee joining, an existing employee exiting, or new technology devices being purchased or sold.
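The daily monitoring described above can be pictured as a feed of change events, each mapped to the areas of the DPIA it affects. The Python sketch below assumes hypothetical event names and a hypothetical mapping table; events it does not recognise are routed for manual review, since techno-legal checks often need human judgment.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping of change-event types to the DPIA areas they touch.
DPIA_TRIGGERS = {
    "employee_joined": ["access rights", "training"],
    "employee_exited": ["access revocation", "device return"],
    "device_purchased": ["asset register", "security controls"],
    "device_sold": ["data wiping", "asset register"],
}


@dataclass
class ChangeEvent:
    kind: str
    detail: str
    on: date


def daily_dpia_review(events, today: date):
    """Collect the given day's events and the DPIA areas each one asks to re-check."""
    review = []
    for e in events:
        if e.on != today:
            continue  # only the day being reviewed
        areas = DPIA_TRIGGERS.get(e.kind)
        if areas is None:
            areas = ["manual review: unrecognised change"]
        review.append((e.kind, e.detail, areas))
    return review
```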

Hence Concurrent Audit envisages an integrated system in which relevant parameters are monitored on an ongoing basis and a dashboard is available for the management to follow. Admittedly, this is a complex challenge when business parameters are continuously changing. However, organizations can set up such systems initially at a higher level and fine-tune them later as needed.

Under PDPSI, we are trying to use the online DTS system, which we developed some time back, as a tool for Concurrent Auditing. The DTS system assesses an organization's compliance with a given data protection law across 50 different Model Implementation Specifications (MIS). It was developed to assist the Data Auditor who makes the annual assessment. The same system can also be used by the management through a dashboard in which the DTS is continuously monitored and fine-tuned.

We had earlier introduced the online DTS system for PDPB 2019/DPA 2021 and GDPR and presented it on the Ujvala.com website. This will now be suitably automated to generate the DTS on a continuing basis. Whenever an external auditor makes an assessment, the self-assessed DTS would be modified to reflect the audited DTS. This will synchronize the internal approach managed by the DPO with the external auditor’s approach, and both would learn through a mutual exchange of views during the audit.
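The synchronization step can be sketched as a simple merge, assuming the self-assessed and audited DTS are both held as per-MIS scores. In the hypothetical Python sketch below, the auditor's score replaces the self-assessed score wherever the auditor has assessed a specification, and every disagreement is listed so that the DPO and the auditor have a concrete agenda for their exchange of views. The MIS labels and the scoring scale are illustrative only.

```python
def synchronize(self_assessed: dict, audited: dict):
    """Hypothetical merge: the auditor's score overrides the self-assessed score
    for every MIS the auditor assessed; returns (merged_scores, disagreements),
    where each disagreement is (mis, self_score, audited_score)."""
    merged = dict(self_assessed)
    gaps = []
    for mis, audit_score in audited.items():
        own = self_assessed.get(mis)
        if own is not None and own != audit_score:
            gaps.append((mis, own, audit_score))
        merged[mis] = audit_score
    return merged, gaps
```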

Await more information on this service….


The New Digital India Act in the Making-1: Cyber Crimes under IPC?

A few months back, Naavi.org had started a discussion on “Shape of Things to Come”, in which several aspects of Data Protection Law were discussed through a series of articles. A total of 23 articles were published, ending with “Cut paste approach or Zero based approach?..Shape of Things to Come-23”.

We also carried a series of 8 articles on the Telecom Act, which is still in draft status, ending with “The New Telecom Act-8: Right of Way”.

The Government had at that time announced its intention of revising ITA 2000 and introducing a new Act titled the Digital India Act (DIA). We had published 4 articles in this series, ending with https://www.naavi.org/wp/digital-india-act-4-online-gaming/

Many suggestions had been made earlier as well, when the T K Vishwanathan Committee was working on the amendments. One such article was “Suggestions on Modification of ITA 2008”.

On 9th March 2023, the Honourable Minister of State for IT, Sri Rajeev Chandrashekar (RC), unveiled the contours of the new Digital India Act proposed to replace the current ITA 2000. Mr RC made a PowerPoint presentation outlining the “Proposed Digital India Act 2023” and called for suggestions to be sent to the Ministry.

We can therefore continue our DIA series on the basis of this new draft. A copy of the presentation made by Mr RC is already available here.

One of the first observations that can be made is that the DIA is set to be “Principle Based” and not “Prescriptive”. This indicates that the Act would focus more on regulating the industry and restrict its penal provisions to Civil Wrongs alone. It is likely that the entire Chapter XI of ITA 2000 may be moved into the IPC by way of amendments. This incidentally explains the logic behind the new DPDPB2022 dropping the criminal offence of “Re-identification of Anonymized Information”, as well as the amendments sought to be made to ITA 2000 through the JanVishwas Bill (yet to be passed).

It is perhaps a good idea to make all Cyber Crimes part of the IPC. At present, any crime under the IPC in which an Electronic Document is an instrument or a target of the crime is treated as a “Cyber Crime”, along with the specific crimes defined in ITA 2000.

However, the Police were often confused about invoking the proper sections of ITA 2000, since the names given to Cyber Crimes by the tech industry had to be deciphered and mapped to the “Intention based violations” that formed the basis for invoking the IPC. The legal education system was also not geared to teach ITA 2000 in as much detail as lawyers needed. These things may now change for the better if Cyber Crimes become part of the IPC.

(P.S.: The movement of Chapter XI of ITA 2000 to the IPC is an expectation, and we need to watch the next draft of the DIA for confirmation.)

…Discussions continue


“Concurrent Compliance” under PDPSI

While the Government of India is finalizing the Digital Personal Data Protection Bill (DPDPB), Naavi is busy finalizing the new version of PDPSI, incorporating the changes brought in by DPDPB2022. Once the final Bill is ready and presented in Parliament, the new version will be released, and a training program for auditors will be started in April 2023 as a Certification program.

The essence of this new version of PDPSI (version 2023) would be the concept of “Concurrent Compliance”, in which the management of a data fiduciary monitors compliance parameters on an ongoing basis.

The Concurrent Compliance Tool, which would be available to companies online, would also enable Data Auditors to conduct audits.

If the audits are to be certified by FDPPI, certain requirements will apply. Otherwise, the tool can be used as a Self Assessment tool.

We look forward to the Government coming up with the new version of the Bill.

In parallel, FDPPI will also commence a program on Module I (Indian Data Protection Law) in April, as soon as the Bill is ready.

Watch out for necessary information here shortly.
